git/http-backend.c, branch v2.4.9

http-backend: spool ref negotiation requests to buffer

2015-05-26T03:43:18Z

When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

http-backend: fix die recursion with custom handler

2015-05-15T18:13:47Z

When we die() in http-backend, we call a custom handler that writes an HTTP 500 response to stdout, then reports the error to stderr. Our routines for writing out the HTTP response may themselves die, leading to us entering die() again. When it was originally written, that was OK; our custom handler keeps a variable to notice this and does not recurse. However, since cd163d4 (usage.c: detect recursion in die routines and bail out immediately, 2012-11-14), the main die() implementation detects recursion before we even get to our custom handler, and bails without printing anything useful. We can handle this case by doing two things: 1. Installing a custom die_is_recursing handler that allows us to enter up to one level of recursion. Only the first call to our custom handler will try to write out the error response. So if we die again, that is OK. If we end up dying more than that, it is a sign that we are in an infinite recursion. 2. Reporting the error to stderr before trying to write out the HTTP response. In the current code, if we do die() trying to write out the response, we'll exit immediately from this second die(), and never get a chance to output the original error (which is almost certainly the more interesting one; the second die is just going to be along the lines of "I tried to write to stdout but it was closed"). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

Merge branch 'rs/run-command-env-array'

2014-10-24T21:57:54Z

Add managed "env" array to child_process to clarify the lifetime rules. * rs/run-command-env-array: use env_array member of struct child_process run-command: add env_array, an optional argv_array for env

use env_array member of struct child_process

2014-10-19T22:26:34Z

Convert users of struct child_process to using the managed env_array for specifying environment variables instead of supplying an array on the stack or bringing their own argv_array. This shortens and simplifies the code and ensures automatically that the allocated memory is freed after use. Signed-off-by: Rene Scharfe Signed-off-by: Junio C Hamano

refs.c: change resolve_ref_unsafe reading argument to be a flags field

2014-10-15T17:47:24Z

resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref resolves successfully for writing but not for reading). Change this to be a flags field instead, and pass the new constant RESOLVE_REF_READING when we want this behaviour. While at it, swap two of the arguments in the function to put output arguments at the end. As a nice side effect, this ensures that we can catch callers that were unaware of the new API so they can be audited. Give the wrapper functions resolve_refdup and read_ref_full the same treatment for consistency. Signed-off-by: Ronnie Sahlberg Signed-off-by: Jonathan Nieder Signed-off-by: Junio C Hamano

Merge branch 'rs/child-process-init'

2014-09-11T17:33:27Z

Code clean-up. * rs/child-process-init: run-command: inline prepare_run_command_v_opt() run-command: call run_command_v_opt_cd_env() instead of duplicating it run-command: introduce child_process_init() run-command: introduce CHILD_PROCESS_INIT

run-command: introduce CHILD_PROCESS_INIT

2014-08-20T16:53:37Z

Most struct child_process variables are cleared using memset first after declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to initialize them statically instead. That's shorter, doesn't require a function call and is slightly more readable (especially given that we already have STRBUF_INIT, ARGV_ARRAY_INIT etc.). Helped-by: Johannes Sixt Signed-off-by: Rene Scharfe Signed-off-by: Junio C Hamano

http-backend.c: replace `git_config()` with `git_config_get_bool()` family

2014-08-07T20:33:26Z

Use `git_config_get_bool()` family instead of `git_config()` to take advantage of the config-set API which provides a cleaner control flow. Signed-off-by: Tanay Abhra Reviewed-by: Matthieu Moy Signed-off-by: Junio C Hamano

Merge branch 'maint'

2014-07-21T19:35:39Z

* maint: use xmemdupz() to allocate copies of strings given by start and length use xcalloc() to allocate zero-initialized memory

use xmemdupz() to allocate copies of strings given by start and length

2014-07-21T17:37:02Z

Use xmemdupz() to allocate the memory, copy the data and make sure to NUL-terminate the result, all in one step. The resulting code is shorter, doesn't contain the constants 1 and '\0', and avoids duplicating function parameters. For blame, the last copied byte (o->file.ptr[o->file.size]) is always set to NUL by fake_working_tree_commit() or read_sha1_file(), so no information is lost by the conversion to using xmemdupz(). Signed-off-by: Rene Scharfe Signed-off-by: Junio C Hamano