git/http-backend.c, branch v2.7.3

Convert struct object to object_id

2015-11-20T13:02:05Z

struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson Signed-off-by: Jeff King

prefer git_pathdup to git_path in some possibly-dangerous cases

2015-08-10T22:37:12Z

Because git_path uses a static buffer that is shared with calls to git_path, mkpath, etc, it can be dangerous to assign the result to a variable or pass it to a non-trivial function. The value may change unexpectedly due to other calls. None of the cases changed here has a known bug, but they're worth converting away from git_path because: 1. It's easy to use git_pathdup in these cases. 2. They use constructs (like assignment) that make it hard to tell whether they're safe or not. The extra malloc overhead should be trivial, as an allocation should be an order of magnitude cheaper than a system call (which we are clearly about to make, since we are constructing a filename). The real cost is that we must remember to free the result. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

convert "enum date_mode" into a struct

2015-06-29T18:39:07Z

In preparation for adding date modes that may carry extra information beyond the mode itself, this patch converts the date_mode enum into a struct. Most of the conversion is fairly straightforward; we pass the struct as a pointer and dereference the type field where necessary. Locations that declare a date_mode can use a "{}" constructor. However, the tricky case is where we use the enum labels as constants, like: show_date(t, tz, DATE_NORMAL); Ideally we could say: show_date(t, tz, &{ DATE_NORMAL }); but of course C does not allow that. Likewise, we cannot cast the constant to a struct, because we need to pass an actual address. Our options are basically: 1. Manually add a "struct date_mode d = { DATE_NORMAL }" definition to each caller, and pass "&d". This makes the callers uglier, because they sometimes do not even have their own scope (e.g., they are inside a switch statement). 2. Provide a pre-made global "date_normal" struct that can be passed by address. We'd also need "date_rfc2822", "date_iso8601", and so forth. But at least the ugliness is defined in one place. 3. Provide a wrapper that generates the correct struct on the fly. The big downside is that we end up pointing to a single global, which makes our wrapper non-reentrant. But show_date is already not reentrant, so it does not matter. This patch implements 3, along with a minor macro to keep the size of the callers sane. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

Merge branch 'bc/object-id'

2015-06-05T19:17:37Z

for_each_ref() callback functions were taught to name the objects not with "unsigned char sha1[20]" but with "struct object_id". * bc/object-id: (56 commits) struct ref_lock: convert old_sha1 member to object_id warn_if_dangling_symref(): convert local variable "junk" to object_id each_ref_fn_adapter(): remove adapter rev_list_insert_ref(): remove unneeded arguments rev_list_insert_ref_oid(): new function, taking an object_oid mark_complete(): remove unneeded arguments mark_complete_oid(): new function, taking an object_oid clear_marks(): rewrite to take an object_id argument mark_complete(): rewrite to take an object_id argument send_ref(): convert local variable "peeled" to object_id upload-pack: rewrite functions to take object_id arguments find_symref(): convert local variable "unused" to object_id find_symref(): rewrite to take an object_id argument write_one_ref(): rewrite to take an object_id argument write_refs_to_temp_dir(): convert local variable sha1 to object_id submodule: rewrite to take an object_id argument shallow: rewrite functions to take object_id arguments handle_one_ref(): rewrite to take an object_id argument add_info_ref(): rewrite to take an object_id argument handle_one_reflog(): rewrite to take an object_id argument ...

http-backend: spool ref negotiation requests to buffer

2015-05-26T03:43:18Z

When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

show_head_ref(): convert local variable "unused" to object_id

2015-05-25T19:19:33Z

Signed-off-by: Michael Haggerty Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano

http-backend: rewrite to take an object_id argument

2015-05-25T19:19:33Z

Signed-off-by: Michael Haggerty Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano

each_ref_fn: change to take an object_id parameter

2015-05-25T19:19:27Z

Change typedef each_ref_fn to take a "const struct object_id *oid" parameter instead of "const unsigned char *sha1". To aid this transition, implement an adapter that can be used to wrap old-style functions matching the old typedef, which is now called "each_ref_sha1_fn"), and make such functions callable via the new interface. This requires the old function and its cb_data to be wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be used as the cb_data for an adapter function, each_ref_fn_adapter(). This is an enormous diff, but most of it consists of simple, mechanical changes to the sites that call any of the "for_each_ref" family of functions. Subsequent to this change, the call sites can be rewritten one by one to use the new interface. Signed-off-by: Michael Haggerty Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano

http-backend: fix die recursion with custom handler

2015-05-15T18:13:47Z

When we die() in http-backend, we call a custom handler that writes an HTTP 500 response to stdout, then reports the error to stderr. Our routines for writing out the HTTP response may themselves die, leading to us entering die() again. When it was originally written, that was OK; our custom handler keeps a variable to notice this and does not recurse. However, since cd163d4 (usage.c: detect recursion in die routines and bail out immediately, 2012-11-14), the main die() implementation detects recursion before we even get to our custom handler, and bails without printing anything useful. We can handle this case by doing two things: 1. Installing a custom die_is_recursing handler that allows us to enter up to one level of recursion. Only the first call to our custom handler will try to write out the error response. So if we die again, that is OK. If we end up dying more than that, it is a sign that we are in an infinite recursion. 2. Reporting the error to stderr before trying to write out the HTTP response. In the current code, if we do die() trying to write out the response, we'll exit immediately from this second die(), and never get a chance to output the original error (which is almost certainly the more interesting one; the second die is just going to be along the lines of "I tried to write to stdout but it was closed"). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

Merge branch 'rs/run-command-env-array'

2014-10-24T21:57:54Z

Add managed "env" array to child_process to clarify the lifetime rules. * rs/run-command-env-array: use env_array member of struct child_process run-command: add env_array, an optional argv_array for env