HN Gopher Feed (2017-10-12) - page 1 of 10 ___________________________________________________________________
Exploding Git Repositories
275 points by ingve
https://kate.io/blog/git-bomb/
___________________________________________________________________
JoshMnem - 5 hours ago
Because that page is AMP by default, it takes about 7 seconds to
load the page on my laptop. AMP is really slow in some cases.
Edit: see my comment below before you downvote me.
katee - 4 hours ago
Huh, I've tested on a bunch of devices/connections and haven't
encountered that. Do you know what causes AMP to be that slow for
you? I'll take a look at serving non-AMP pages by default. It
will require tweaking how image inclusion works.
JoshMnem - 3 hours ago
For people who use extensions or browsers that block third-party
JS, AMP pages will take many seconds to load in non-mobile Web
browsers. Here is information about some of the other problems
with AMP:
https://www.theregister.co.uk/2017/05/19/open_source_insider...
https://danielmiessler.com/blog/google-amp-not-good-thing/
https://ethanmarcotte.com/wrote/ampersand/
https://css-tricks.com/need-catch-amp-debate/
https://daringfireball.net/linked/2017/01/17/schreiber-amp
Sir_Cmpwn - 3 hours ago
Would you please remove amp entirely?
Retr0spectrum - 4 hours ago
What happens if you try to make a recursive tree?
ethomson - 4 hours ago
As in a tree that points to itself? You cannot, since a tree
would have to point to its own SHA1. So this would require you
to know your own tree's SHA and embed it in the tree.
mv4 - 4 hours ago
Reminded me of the GIF that displays its own MD5 hash:
https://twitter.com/bascule/status/838927719534477312
katee - 4 hours ago
You can't make a valid recursive tree without a pre-image attack
against SHA1. However `git` doesn't actually verify the SHA1s
when it does most commands. If you make a recursive tree and try
`git status` it will segfault because the directory walking gets
stuck in infinite recursion.
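As an illustration of that chicken-and-egg problem, here is a
minimal plumbing-level sketch (the names `blob`, `subtree`, and
`tree` are just placeholders for this example): every `git mktree`
call needs the SHA-1 of whatever the new tree points to, so a tree
can never list itself.

  # Each object must be hashed before anything can reference it:
  blob=$(echo hello | git hash-object -w --stdin)
  subtree=$(printf '100644 blob %s\tfile\n' "$blob" | git mktree)
  tree=$(printf '040000 tree %s\tdir\n' "$subtree" | git mktree)
  # A self-referencing tree would have to list its own SHA-1 in its
  # entries, which you cannot know until mktree has already run --
  # hence the need for a SHA-1 pre-image attack.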
timdorr - 6 hours ago
I'm curious how this was uploaded to GitHub successfully. I guess
they do less actual introspection on the repo's contents than I
thought. Did it wreak havoc on any systems behind the scenes
(similar to big repos like Homebrew's)?
stolee - 6 hours ago
There isn't anything wrong with the objects. A 'fetch' succeeds
but the 'checkout' is what blows up.
yes_or_gnome - 4 hours ago
Good point. For those that are curious:

Clone (--no-checkout):

  $ git clone --no-checkout https://github.com/Katee/git-bomb.git
  Cloning into 'git-bomb'...
  remote: Counting objects: 18, done.
  remote: Compressing objects: 100% (6/6), done.
  remote: Total 18 (delta 2), reused 0 (delta 0), pack-reused 12
  Unpacking objects: 100% (18/18), done.

From there, you can do some operations like `git log` and
`git cat-file -p HEAD` (I use the "dump" alias[1];
`git config --global alias.dump 'cat-file -p'`), but not others
like `git checkout` or `git status`.

[1] Thanks to Jim Weirich and Git-Immersion,
http://gitimmersion.com/lab_23.html. I never knew the guy, but,
~~8yrs~~ (corrected below) 3.5yrs after his passing, I still go
back to his presentations on Git and Ruby often.

Edit: And, to see the whole tree:

  NEXT_REF=HEAD
  while [ -n "$NEXT_REF" ]; do
    echo "$NEXT_REF"
    git dump "${NEXT_REF}"
    echo
    NEXT_REF=$(git dump "${NEXT_REF}"^{tree} 2>/dev/null |
      awk '{ if($4 == "d0" || $4 == "f0"){ print $3 } }')
  done
matthewrudy - 4 hours ago
Sad one to nitpick, but Jim died in 2014. So ~3.5 years ago.
Had the pleasure of meeting him in Singapore in 2013. Still so
much great code of his we use all the time.
yes_or_gnome - 3 hours ago
Thanks for the correction, he truly was a brilliant mind.
One of my regrets was not being active and outgoing enough
to go meet him myself. I lived in the Cincinnati area from
2007-2012. I first got started with Ruby in 2009, and quickly
became aware of who he was (Rake, Bundler, etc) and that he
lived/worked close by. But, at the time, I wasn't interested
in conferences, meetups, or simply emailing someone to say
thanks.
enzanki_ars - 6 hours ago
I too was curious about this.
https://github.com/Katee/git-bomb/commit/45546f17e5801791d4b...
shows: "Sorry, this diff is taking too long to generate. It may
be too large to display on GitHub." ...so they must have some
kind of backend limits that may have prevented this from becoming
an issue.

I wonder what would happen if it was hosted on a GitLab instance?
Might have to try that sometime...
ethomson - 4 hours ago
Yes, hosting providers need rate limiting mitigations in place.
GitHub's is called gitmon (at least unofficially), and you can
learn more at https://m.youtube.com/watch?v=f7ecUqHxD7o

Visual Studio Team Services has a fundamentally different
architecture, but we use some similar mechanisms despite that.
(I should do some talks about it - but it's always hard to know
how much to say about your defenses lest it give attackers
clever new ideas!)
deckar01 - 6 hours ago
GitLab uses a custom Git client called Gitaly [0].

> Project Goals
> Make the git data storage tier of large GitLab instances, and
> GitLab.com in particular, fast.

[0]: https://gitlab.com/gitlab-org/gitaly

Edit: It looks like Gitaly still spawns git for low level
operations. It is probably affected.
jychang - 2 hours ago
Spawning git doesn't mean that it can't just check for a timeout
and stop the task with an error. Someone will probably have to
actually try an experiment with GitLab.
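A sketch of what such a guard might look like with standard shell
tooling, assuming the host wraps each spawned git command; the
60-second and 4 GiB limits are arbitrary values chosen for
illustration:

  # Cap the spawned git process rather than trusting the repo:
  ulimit -v 4194304              # ~4 GiB of virtual memory (value in KiB)
  timeout --signal=KILL 60s git checkout HEAD \
    || echo "checkout aborted: repository exceeded resource limits"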
ballenf - 5 hours ago
Since GitHub paid a bounty and OK'd release, perhaps they've
patched some aspects of it already. Might be impossible to
recreate the issue now.

My naive question is whether CLI "git" would need or could
benefit from a patch. Part of me thinks it doesn't, since there
are legitimate reasons for each individual aspect of creating
the problematic repo. But I probably don't understand god deeply
enough to know for sure.
mnx - 5 hours ago
is this a git->god typo, or a statement about your feelings
towards Linus?
warent - 5 hours ago
Please don't let Linus read this
shade23 - 6 hours ago
To save folks from searching :
https://github.com/cocoapods/cocoapods/issues/4989
styfle - 4 hours ago
Thanks. Here is the comment from a GitHub engineer addressing
the root cause:
https://github.com/cocoapods/cocoapods/issues/4989#issuecomm...
warent - 5 hours ago
Odd. It's surprising to me that this example runs out of memory.
What would be a possible solution? Admittedly I don't know that
much about the inner workings of git, but off the top of my head,
perhaps something with traversing the tree depth-first and
releasing resources as you hit the bottom?
ericfrederich - 2 hours ago
You need a problem to have a solution to it. What do you consider
to be the problem here? This is essentially something that can be
expressed in relatively few bytes that expands to something much
larger.

Imagine I had a compressed file format for blank files ("0x00"
the whole way). It is implemented by writing, in ASCII, the size
of the uncompressed file. So the contents of a file called
terabyte.blank is just the ASCII "1000000000000"... or the
contents of a file called petabyte.blank is "1000000000000000".
I cannot decompress these files... what is the solution?
warent - 1 hours ago
I'm not following; why can't you decompress it? Of course you
can't decompress it into memory, but if it's trying to do that
then there's a problem in the code (problem identified). Naive
solution: just write to the end of the file and make sure you
have enough disk. More sophisticated solution: shard the file
across multiple disks.
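A minimal sketch of that naive approach for the hypothetical
".blank" format above (the file names are invented for the
example): read the declared size and materialize the zero-filled
output on disk rather than in memory.

  size=$(cat petabyte.blank)          # e.g. "1000000000000000"
  # A run of 0x00 bytes can live as a sparse file, so no RAM is
  # needed; the command fails cleanly if the size exceeds what the
  # filesystem can represent.
  truncate -s "$size" petabyte.raw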
peff - 4 hours ago
Git assumes it can keep a small struct in memory for each file in
the repository (not the file contents, just a fixed amount of
bookkeeping per file). This repository simply has a very large
number of files.
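For a sense of scale, a scaled-down sketch of how that kind of
fan-out can be assembled with git plumbing commands (this assumes
the nested-tree structure described in the post; the depth and
width here are far smaller than the real repository's):

  # One blob, reused 10 times per tree, trees nested 4 levels deep
  blob=$(echo "one line of content" | git hash-object -w --stdin)
  tree=$(for i in $(seq 0 9); do
           printf '100644 blob %s\tf%d\n' "$blob" "$i"
         done | git mktree)
  for level in 1 2 3; do
    tree=$(for i in $(seq 0 9); do
             printf '040000 tree %s\td%d\n' "$tree" "$i"
           done | git mktree)
  done
  # 4 levels x 10 entries = 10,000 logical files from only a handful
  # of unique objects; a checkout still needs an in-memory index
  # entry for every one of them.
  git commit-tree -m "mini git bomb" "$tree"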
ericfrederich - 2 hours ago
Run this to create a 40K file which expands to 1 GiB:

  yes | head -n536870912 | bzip2 -c > /tmp/foo.bz2

I would imagine you could do something really creative with
ImageMagick to create a giant PNG file that'll make browsers,
viewers, and editors crash as well.
tedunangst - 1 hours ago
PNG has dimensions in the header so the decoder should know when
it's decompressed enough.
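For reference, an illustrative way to peek at those header fields
from the shell (image.png is a placeholder name); per the PNG
spec, the width and height sit right after the 8-byte signature
and the IHDR chunk header:

  # Bytes 16-23 are the 4-byte big-endian width and height, so a
  # decoder can bound its output before inflating any pixel data.
  xxd -s 16 -l 8 image.png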
Hupriene - 1 hours ago
You can also make archives that contain themselves:
https://research.swtch.com/zip
kowdermeister - 6 hours ago
I thought it would self destruct after cloning or forking before
clicking :)
gwerbin - 1 hours ago
Would this be possible with a patch-based version control system
like Darcs or Pijul? Does patch-based version control have other
analogous security risks, or is it "better" in this case?
hathawsh - 5 hours ago
I wonder what the author means by "a lot" of RAM and storage. I
tried it for fun. The git process pegged one CPU core and swelled
to 26 GB of RAM over 8 minutes, after which I had to kill it.
gabesullice - 1 hours ago
Humblebrag ;)
wscott - 3 hours ago
Yeah I tried it too. Killed at 65G. Disappointed that Linux
killed Chrome first.

  Oct 12 15:47:52 x99 kernel: [552390.074468] Out of memory:
    Kill process 7898 (git) score 956 or sacrifice child
  Oct 12 15:47:52 x99 kernel: [552390.074471] Killed process
    7898 (git) total-vm:65304212kB, anon-rss:63789568kB,
    file-rss:1384kB, shmem-rss:0kB

Edit: Interesting. Linux didn't kill Chrome, it died on its own.

  Oct 12 15:42:21 x99 kernel: [552060.423448]
    TaskSchedulerFo[8425]: segfault at 0 ip 000055618c430740
    sp 00007f344cc093f0 error 6 in chrome[556188a1d000+55d1000]
  Oct 12 15:42:21 x99 kernel: [552060.439116] Core dump to
    |/usr/share/apport/apport 16093 11 0 16093 pipe failed
  Oct 12 15:42:21 x99 kernel: [552060.450561] traps:
    chrome[16409] trap invalid opcode ip:55af00f34b4c
    sp:7ffee985fb20 error:0
  Oct 12 15:42:21 x99 kernel: [552060.450564] in
    chrome[55aeffb76000+55d1000]
  Oct 12 15:47:52 x99 kernel: [552390.074289] syncthing invoked
    oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD),
    nodemask=0, order=0, oom_score_adj=0

Seems Chrome faulted first, but it was probably capturing all
signals and didn't handle OOM. Then next, syncthing faulted and
it started the oom-killer which correctly selected 'git' to kill.
porfirium - 5 hours ago
If we all click "Download ZIP" on this repo we can crash GitHub
together! Just click here:
https://codeload.github.com/Katee/git-bomb/zip/master
AceJohnny2 - 4 hours ago
I hope and expect that GitHub has the basic infrastructure to
monitor excessive processes and kill them.