<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/http-walker.c, branch v2.16.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.16.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.16.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2017-08-23T22:12:07Z</updated>
<entry>
<title>pack: move find_sha1_pack()</title>
<updated>2017-08-23T22:12:07Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2017-08-18T22:20:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d6fe0036fd5e0cf7f51aa84381ebd321e898350a'/>
<id>urn:sha1:d6fe0036fd5e0cf7f51aa84381ebd321e898350a</id>
<content type='text'>
Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/http-walker-buffer-underflow-fix'</title>
<updated>2017-03-17T20:50:26Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2017-03-17T20:50:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2af882be0181ee29aa3a4fb3181b94e9bca53c51'/>
<id>urn:sha1:2af882be0181ee29aa3a4fb3181b94e9bca53c51</id>
<content type='text'>
"Dumb http" transport used to misparse a nonsense http-alternates
response, which has been fixed.

* jk/http-walker-buffer-underflow-fix:
  http-walker: fix buffer underflow processing remote alternates
</content>
</entry>
<entry>
<title>http-walker: fix buffer underflow processing remote alternates</title>
<updated>2017-03-13T17:20:29Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-03-13T14:04:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d61434ae813cc86a1a87d05cc61e36e87b0e20a9'/>
<id>urn:sha1:d61434ae813cc86a1a87d05cc61e36e87b0e20a9</id>
<content type='text'>
If we parse a remote alternates (or http-alternates), we
expect relative lines like:

  ../../foo.git/objects

which we convert into "$URL/../foo.git/" (and then use that
as a base for fetching more objects).

But if the remote feeds us nonsense like just:

  ../

we will try to blindly strip the last 7 characters, assuming
they contain the string "objects". Since we don't _have_ 7
characters at all, this results in feeding a small negative
value to strbuf_add(), which converts it to a size_t,
resulting in a big positive value. This should consistently
fail (since we can't generally allocate the max size_t minus
7 bytes), so there shouldn't be any security implications.

Let's fix this by using strbuf_strip_suffix() to drop the
characters we want. If they're not present, we'll ignore the
alternate (in theory we could use it as-is, but the rest of
the http-walker code unconditionally tacks "objects/" back
on, so it is not prepared to handle such a case).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>http: release strbuf on disabled alternates</title>
<updated>2017-03-06T18:52:57Z</updated>
<author>
<name>Eric Wong</name>
<email>e@80x24.org</email>
</author>
<published>2017-03-04T01:50:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5cae73d5d23ec109322948478a957e77d2733f94'/>
<id>urn:sha1:5cae73d5d23ec109322948478a957e77d2733f94</id>
<content type='text'>
This likely has no real-world impact on memory usage,
but it is cleaner for future readers.

Fixes: abcbdc03895f ("http: respect protocol.*.allow=user for http-alternates")
Signed-off-by: Eric Wong &lt;e@80x24.org&gt;
Reviewed-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>http: inform about alternates-as-redirects behavior</title>
<updated>2017-03-06T18:52:15Z</updated>
<author>
<name>Eric Wong</name>
<email>e@80x24.org</email>
</author>
<published>2017-03-04T08:36:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=de46138826cbc8229e25691e613e09ae1f683235'/>
<id>urn:sha1:de46138826cbc8229e25691e613e09ae1f683235</id>
<content type='text'>
It is disconcerting for users to not notice the behavior
change in handling alternates from commit cb4d2d35c4622ec2
("http: treat http-alternates like redirects").

Give the user a hint about the config option so they can
see the URL and decide whether or not they want to enable
http.followRedirects in their config.

Signed-off-by: Eric Wong &lt;e@80x24.org&gt;
Reviewed-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>http: respect protocol.*.allow=user for http-alternates</title>
<updated>2016-12-15T17:29:13Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2016-12-14T22:39:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=abcbdc03895ff3f00280e54af11fee92d6877044'/>
<id>urn:sha1:abcbdc03895ff3f00280e54af11fee92d6877044</id>
<content type='text'>
The http-walker may fetch the http-alternates (or
alternates) file from a remote in order to find more
objects. This should count as a "not from the user" use of
the protocol. But because we implement the redirection
ourselves and feed the new URL to curl, it will use the
CURLOPT_PROTOCOLS rules, not the more restrictive
CURLOPT_REDIR_PROTOCOLS.

The ideal solution would be for each curl request we make to
know whether or not it is directly from the user or part of an
alternates redirect, and then set CURLOPT_PROTOCOLS as
appropriate. However, that would require plumbing that
information through all of the various layers of the http
code.

Instead, let's check the protocol at the source: when we are
parsing the remote http-alternates file. The only downside
is that if there's any mismatch between what protocol we
think it is versus what curl thinks it is, it could violate
the policy.

To address this, we'll make the parsing err on the picky
side, and only allow protocols that it can parse
definitively. So for example, you can't elude the "http"
policy by asking for "HTTP://", even though curl might
handle it; we would reject it as unknown. The only unsafe
case would be if you have a URL that starts with "http://"
but curl interprets as another protocol. That seems like an
unlikely failure mode (and we are still protected by our
base CURLOPT_PROTOCOLS setting, so the worst you could do is
trigger one of https, ftp, or ftps).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Brandon Williams &lt;bmwill@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>http-walker: complain about non-404 loose object errors</title>
<updated>2016-12-06T20:43:34Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2016-12-06T18:25:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=3680f16f9d6832dc78c6b80f1e8a546385d946f9'/>
<id>urn:sha1:3680f16f9d6832dc78c6b80f1e8a546385d946f9</id>
<content type='text'>
Since commit 17966c0a6 (http: avoid disconnecting on 404s
for loose objects, 2016-07-11), we turn off curl's
FAILONERROR option and instead manually deal with failing
HTTP codes.

However, the logic to do so only recognizes HTTP 404 as a
failure. This is probably the most common result, but if we
were to get another code, the curl result remains CURLE_OK,
and we treat it as success. We still end up detecting the
failure when we try to zlib-inflate the object (which will
fail), but instead of reporting the HTTP error, we just
claim that the object is corrupt.

Instead, let's catch anything in the 300's or above as an
error (300's are redirects which are not an error at the
HTTP level, but are an indication that we've explicitly
disabled redirects, so we should treat them as such; we
certainly don't have the resulting object content).

Note that we also fill in req-&gt;errorstr, which we didn't do
before. Without FAILONERROR, curl will not have filled this
in, and it will remain a blank string. This never mattered
for the 404 case, because in the logic below we hit the
"missing_target()" branch and print nothing. But for other
errors, we'd want to say _something_, if only to fill in the
blank slot in the error message.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ew/http-walker' into jk/http-walker-limit-redirect</title>
<updated>2016-12-06T20:43:23Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2016-12-06T20:40:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=43ec089eea1d8eee125718b0b7a83720b036ae3e'/>
<id>urn:sha1:43ec089eea1d8eee125718b0b7a83720b036ae3e</id>
<content type='text'>
* ew/http-walker:
  list: avoid incompatibility with *BSD sys/queue.h
  http-walker: reduce O(n) ops with doubly-linked list
  http: avoid disconnecting on 404s for loose objects
  http-walker: remove unused parameter from fetch_object
</content>
</entry>
<entry>
<title>http: treat http-alternates like redirects</title>
<updated>2016-12-06T20:32:48Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2016-12-06T18:24:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=cb4d2d35c4622ec2513c1c352d30ff8f9f9cdb9e'/>
<id>urn:sha1:cb4d2d35c4622ec2513c1c352d30ff8f9f9cdb9e</id>
<content type='text'>
The previous commit made HTTP redirects more obvious and
tightened up the default behavior. However, there's another
way for a server to ask a git client to fetch arbitrary
content: by having an http-alternates file (or a regular
alternates file, which is used as a backup).

Similar to the HTTP redirect case, a malicious server can
claim to have refs pointing at object X, return a 404 when
the client asks for X, but point to some other URL via
http-alternates, which the client will transparently fetch.
The end result is that it looks from the user's perspective
like the objects came from the malicious server, as the
other URL is not mentioned at all.

Worse, because we feed the new URL to curl ourselves, the
usual protocol restrictions do not kick in (neither curl's
default of disallowing file://, nor the protocol
whitelisting in f4113cac0 (http: limit redirection to
protocol-whitelist, 2015-09-22)).

Let's apply the same rules here as we do for HTTP redirects.
Namely:

  - unless http.followRedirects is set to "always", we will
    not follow remote redirects from http-alternates (or
    alternates) at all

  - set CURLOPT_PROTOCOLS alongside CURLOPT_REDIR_PROTOCOLS to
    restrict ourselves to a known-safe set and respect any
    user-provided whitelist.

  - mention alternate object stores on stderr so that the
    user is aware another source of objects may be involved

The first item may prove to be too restrictive. The most
common use of alternates is to point to another path on the
same server. While it's possible for a single-server
redirect to be an attack, it takes a fairly obscure setup
(victim and evil repository on the same host, host speaks
dumb http, and evil repository has access to edit its own
http-alternates file).

So we could make the checks more specific, and only cover
cross-server redirects. But that means parsing the URLs
ourselves, rather than letting curl handle them. This patch
goes for the simpler approach. Given that they are only used
with dumb http, http-alternates are probably pretty rare.
And there's an escape hatch: the user can allow redirects on
a specific server by setting http.&lt;url&gt;.followRedirects to
"always".

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>http-walker: reduce O(n) ops with doubly-linked list</title>
<updated>2016-07-12T22:17:42Z</updated>
<author>
<name>Eric Wong</name>
<email>e@80x24.org</email>
</author>
<published>2016-07-11T20:51:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=94e99012fc7a02c5504214294279fa49b4cc8ce3'/>
<id>urn:sha1:94e99012fc7a02c5504214294279fa49b4cc8ce3</id>
<content type='text'>
Using a Linux-kernel-derived doubly-linked list
implementation from the Userspace RCU library allows us to
enqueue and delete items from the object request queue in
constant time.

This change reduces enqueue times in the prefetch() function
where the object request queue could grow to several thousand
objects.

I left out the list_for_each_entry* family macros from list.h
which relied on the __typeof__ operator as we support platforms
without it.  Thus, list_entry (aka "container_of") needs to be
called explicitly inside macro-wrapped for loops.

The downside is this costs us an additional pointer per object
request, but this is offset by reduced overhead on queue
operations leading to improved performance and shorter queue
depths.

Signed-off-by: Eric Wong &lt;e@80x24.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
