git/urlmatch.c, branch v2.26.1

credential: allow wildcard patterns when matching config

2020-02-20T21:05:43Z

In some cases, a user will want to use a specific credential helper for a wildcard pattern, such as https://*.corp.example.com. We have code that handles this already with the urlmatch code, so let's use that instead of our custom code. Since the urlmatch code is a superset of our current matching in terms of capabilities, there shouldn't be any cases of things that matched previously that don't match now. However, in addition to wildcard matching, we now use partial path matching, which can cause slightly different behavior in the case that a helper applies to the prefix (considering path components) of the remote URL. While different, this is probably the behavior people were wanting anyway. Since we're using the urlmatch code, we need to encode the components we've gotten into a URL to match, so add a function to percent-encode data and format the URL with it. We now also no longer need to the custom code to match URLs, so let's remove it. Additionally, the urlmatch code always looks for the best match, whereas we want all matches for credential helpers to preserve existing behavior. Let's add an optional field, select_fn, that lets us control which items we want (in this case, all of them) and default it to the best-match code that already exists for other users. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano

urlmatch: use hex2chr() in append_normalized_escapes()

2017-07-09T16:43:01Z

Simplify the code by using hex2chr() to convert and check for invalid characters at the same time instead of doing that sequentially with one table lookup for each. Signed-off-by: Rene Scharfe Signed-off-by: Junio C Hamano

urlmatch: allow globbing for the URL host part

2017-02-01T21:22:50Z

The URL matching function computes for two URLs whether they match not. The match is performed by splitting up the URL into different parts and then doing an exact comparison with the to-be-matched URL. The main user of `urlmatch` is the configuration subsystem. It allows to set certain configurations based on the URL which is being connected to via keys like `http..*`. A common use case for this is to set proxies for only some remotes which match the given URL. Unfortunately, having exact matches for all parts of the URL can become quite tedious in some setups. Imagine for example a corporate network where there are dozens or even hundreds of subdomains, which would have to be configured individually. Allow users to write an asterisk '*' in place of any 'host' or 'subdomain' label as part of the host name. For example, "http.https://*.example.com.proxy" sets "http.proxy" for all direct subdomains of "https://example.com", e.g. "https://foo.example.com", but not "https://foo.bar.example.com". Signed-off-by: Patrick Steinhardt Helped-by: Junio C Hamano Signed-off-by: Junio C Hamano

urlmatch: include host in urlmatch ranking

2017-02-01T21:22:46Z

In order to be able to rank positive matches by `urlmatch`, we inspect the path length and user part to decide whether a match is better than another match. As all other parts are matched exactly between both URLs, this is the correct thing to do right now. In the future, though, we want to introduce wild cards for the domain part. When doing this, it does not make sense anymore to only compare the path lengths. Instead, we also want to compare the domain lengths to determine which of both URLs matches the host part more closely. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

urlmatch: split host and port fields in `struct url_info`

2017-01-31T18:06:54Z

The `url_info` structure contains information about a normalized URL with the URL's components being represented by different fields. The host and port part though are to be accessed by the same `host` field, so that getting the host and/or port separately becomes more involved than really necessary. To make the port more readily accessible, split up the host and port fields. Namely, the `host_len` will not include the port length anymore and a new `port_off` field has been added which includes the offset to the port, if available. The only user of these fields is `url_normalize_1`. This change makes it easier later on to treat host and port differently when introducing globs for domains. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

urlmatch: enable normalization of URLs with globs

2017-01-31T18:06:54Z

The `url_normalize` function is used to validate and normalize URLs. As such, it does not allow for some special characters to be part of the URLs that are to be normalized. As we want to allow using globs in some configuration keys making use of URLs, namely `http..`, but still normalize them, we need to somehow enable some additional allowed characters. To do this without having to change all callers of `url_normalize`, where most do not actually want globbing at all, we split off another function `url_normalize_1`. This function accepts an additional parameter `allow_globs`, which is subsequently called by `url_normalize` with `allow_globs=0`. As of now, this function is not used with globbing enabled. A caller will be added in the following commit. Signed-off-by: Patrick Steinhardt Signed-off-by: Junio C Hamano

urlmatch.c: make match_urls() static

2015-01-15T19:05:48Z

No external callers exist. Signed-off-by: Junio C Hamano

isxdigit: cast input to unsigned char

2014-10-16T17:10:36Z

Otherwise, callers must do so or risk triggering warnings -Wchar-subscript (and rightfully so; a signed char might cause us to use a bogus negative index into the hexval_table). While we are dropping the now-unnecessary casts from the caller in urlmatch.c, we can get rid of similar casts in actually parsing the hex by using the hexval() helper, which implicitly casts to unsigned (but note that we cannot implement isxdigit in terms of hexval(), as it also casts its return value to unsigned). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

refactor skip_prefix to return a boolean

2014-06-20T17:44:43Z

The skip_prefix() function returns a pointer to the content past the prefix, or NULL if the prefix was not found. While this is nice and simple, in practice it makes it hard to use for two reasons: 1. When you want to conditionally skip or keep the string as-is, you have to introduce a temporary variable. For example: tmp = skip_prefix(buf, "foo"); if (tmp) buf = tmp; 2. It is verbose to check the outcome in a conditional, as you need extra parentheses to silence compiler warnings. For example: if ((cp = skip_prefix(buf, "foo")) /* do something with cp */ Both of these make it harder to use for long if-chains, and we tend to use starts_with() instead. However, the first line of "do something" is often to then skip forward in buf past the prefix, either using a magic constant or with an extra strlen(3) (which is generally computed at compile time, but means we are repeating ourselves). This patch refactors skip_prefix() to return a simple boolean, and to provide the pointer value as an out-parameter. If the prefix is not found, the out-parameter is untouched. This lets you write: if (skip_prefix(arg, "foo ", &arg)) do_foo(arg); else if (skip_prefix(arg, "bar ", &arg)) do_bar(arg); Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

urlmatch.c: recompute pointer after append_normalized_escapes

2013-09-12T22:27:01Z

When append_normalized_escapes is called, its internal strbuf_add* calls can cause the strbuf's buf to be reallocated changing the value of the buf pointer. Do not use the strbuf buf pointer from before any append_normalized_escapes calls afterwards. Instead recompute the needed pointer. Signed-off-by: Thomas Rast Signed-off-by: Kyle J. McKay Signed-off-by: Junio C Hamano