Replace atoi() with strtoi_with_tail() #1646
base: master
Welcome to GitGitGadget

Hi @mohit-marathe, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests. Please make sure that your Pull Request has a good description, as it will be used as the cover letter. You can CC potential reviewers by adding a footer to the PR description with the following syntax: …

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form: …

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that by adding a comment to your PR of the form /allow. Both the person who commented … An alternative is the channel …

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit. If you want to see what email(s) would be sent for a … After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the …); for a GMail account, the raw message can then be uploaded to your inbox with:

    curl -g --user "<EMailAddress>:<Password>" \
        --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description): …

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join. You may also be able to find help in real time in the developer IRC channel, …
Branch updated from c952b72 to 1ece724.
/allow
User mohit-marathe is now allowed to use GitGitGadget. WARNING: mohit-marathe has no public email address set on GitHub; …
/preview
Preview email sent as [email protected]
/submit
Submitted as [email protected]
@@ -1,3 +1,4 @@
#include "git-compat-util.h"
On the Git mailing list, Junio C Hamano wrote:
"Mohit Marathe via GitGitGadget" <[email protected]> writes:
> static const char digits[] = "0123456789";
> const char *q, *r;
> + char *endp;
> int n;
>
> q = p + 4;
> n = strspn(q, digits);
> if (q[n] == ',') {
> q += n + 1;
> - *p_before = atoi(q);
> + if (strtol_i2(q, 10, p_before, &endp) != 0)
> + return 0;
> n = strspn(q, digits);
> } else {
> *p_before = 1;
> }
Looking at this code again, because we upfront run strspn() to make
sure q[] begins with a run of digits *and* followed by a comma
(which is not a digit), I think it is safe to use atoi() and assume
it would slurp all the digits. So the lack of another check the use
of new helper allows us to do, namely
if (endp != q + n)
return 0;
is probably OK, but that is one of the two reasons why you would
favor the use of new helper over atoi(), so the upside of this
change is not all that great as I originally hoped for X-<.
Not your fault, of course. We would still catch when the digit
string that starts q[] is too large to fit in an int, which is an
upside.
> - if (n == 0 || q[n] != ' ' || q[n+1] != '+')
> + if (q[n] != ' ' || q[n+1] != '+')
> return 0;
When we saw q[] that begins with ',' upon entry to this function, we
used to say *p_before = 1 and then saw n==0 and realized it is not a
good input and returned 0 from the function.
Now we instead peek q[0] and the check says q[0] is not SP so we
will return 0 the same way so there is no behaviour change from the
upper hunk? The conversion may be correct, but it wasn't explained
in the proposed commit log message.
How are the change to stop caring about n==0 here ...
> r = q + n + 2;
> n = strspn(r, digits);
> if (r[n] == ',') {
> r += n + 1;
> - *p_after = atoi(r);
> - n = strspn(r, digits);
> + if (strtol_i2(r, 10, p_after, &endp) != 0)
> + return 0;
> } else {
> *p_after = 1;
> }
> - if (n == 0)
> - return 0;
... and this change here, linked to the switch from atoi() to
strtol_i2()[*]?
It looks like an unrelated behaviour change that is left
unexplained.
> return 1;
> }
Thanks for working on this one.
[Footnote]
* by the way, what a horrible name for a public function. Yuck.
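To make the behaviour change Junio asks about concrete, the following sketch rebuilds the hunk-header scanner from the "-" and "+" lines quoted above. It is an illustration, not Git's code: the names scan_old() and scan_v1() are hypothetical, and scan_v1() deliberately keeps atoi() so that the only difference from scan_old() is the pair of dropped `n == 0` checks (the strtol_i2() calls live in the comma branches, which the interesting inputs below never reach).

```c
#include <stdlib.h>
#include <string.h>

static const char digits[] = "0123456789";

/* Scanner as it was before the patch (the "-" lines of the quoted diff). */
static int scan_old(const char *p, int *p_before, int *p_after)
{
	const char *q, *r;
	int n;

	q = p + 4;                      /* skip "@@ -" */
	n = strspn(q, digits);
	if (q[n] == ',') {
		q += n + 1;
		*p_before = atoi(q);
		n = strspn(q, digits);
	} else {
		*p_before = 1;
	}
	if (n == 0 || q[n] != ' ' || q[n+1] != '+')
		return 0;               /* no digits, or not followed by " +" */
	r = q + n + 2;                  /* skip " +" */
	n = strspn(r, digits);
	if (r[n] == ',') {
		r += n + 1;
		*p_after = atoi(r);
		n = strspn(r, digits);
	} else {
		*p_after = 1;
	}
	if (n == 0)
		return 0;               /* the "+" range had no digits */
	return 1;
}

/* Same scanner with both "n == 0" checks removed, as in the v1 hunks. */
static int scan_v1(const char *p, int *p_before, int *p_after)
{
	const char *q, *r;
	int n;

	q = p + 4;
	n = strspn(q, digits);
	if (q[n] == ',') {
		q += n + 1;
		*p_before = atoi(q);
		n = strspn(q, digits);
	} else {
		*p_before = 1;
	}
	if (q[n] != ' ' || q[n+1] != '+')
		return 0;
	r = q + n + 2;
	n = strspn(r, digits);
	if (r[n] == ',') {
		r += n + 1;
		*p_after = atoi(r);
	} else {
		*p_after = 1;
	}
	return 1;
}
```

With these two functions, a well-formed header such as "@@ -29,14 +30,18 @@" parses identically in both, but a header whose range has no digits at all, such as "@@ - +1 @@" or "@@ -1 + @@", is rejected by scan_old() and accepted by scan_v1(). That is the kind of unexplained change the review points at.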
On the Git mailing list, Mohit Marathe wrote:
On Tuesday, January 23rd, 2024 at 1:02 AM, Junio C Hamano <[email protected]> wrote:
> "Mohit Marathe via GitGitGadget" [email protected] writes:
>
> > static const char digits[] = "0123456789";
> > const char *q, *r;
> > + char *endp;
> > int n;
> >
> > q = p + 4;
> > n = strspn(q, digits);
> > if (q[n] == ',') {
> > q += n + 1;
> > - *p_before = atoi(q);
> > + if (strtol_i2(q, 10, p_before, &endp) != 0)
> > + return 0;
> > n = strspn(q, digits);
> > } else {
> > *p_before = 1;
> > }
>
>
> Looking at this code again, because we upfront run strspn() to make
> sure q[] begins with a run of digits and followed by a comma
> (which is not a digit), I think it is safe to use atoi() and assume
> it would slurp all the digits. So the lack of another check the use
> of new helper allows us to do, namely
>
> if (endp != q + n)
> return 0;
>
> is probably OK, but that is one of the two reasons why you would
> favor the use of new helper over atoi(), so the upside of this
> change is not all that great as I originally hoped for X-<.
>
> Not your fault, of course. We would still catch when the digit
> string that starts q[] is too large to fit in an int, which is an
> upside.
>
> > - if (n == 0 || q[n] != ' ' || q[n+1] != '+')
> > + if (q[n] != ' ' || q[n+1] != '+')
> > return 0;
>
>
> When we saw q[] that begins with ',' upon entry to this function, we
> used to say *p_before = 1 and then saw n==0 and realized it is not a
> good input and returned 0 from the function.
Uh oh, I just looked at the `if` block and concluded that it was just
to check if it has numbers after the ',', which `strtol_i2()` already
does. But I totally missed this one.
> Now we instead peek q[0] and the check says q[0] is not SP so we
> will return 0 the same way so there is no behaviour change from the
> upper hunk? The conversion may be correct, but it wasn't explained
> in the proposed commit log message.
>
> How are the change to stop caring about n==0 here ...
>
> > r = q + n + 2;
> > n = strspn(r, digits);
> > if (r[n] == ',') {
> > r += n + 1;
> > - *p_after = atoi(r);
> > - n = strspn(r, digits);
> > + if (strtol_i2(r, 10, p_after, &endp) != 0)
> > + return 0;
> > } else {
> > *p_after = 1;
> > }
> > - if (n == 0)
> > - return 0;
>
>
> ... and this change here, linked to the switch from atoi() to
> strtol_i2()[*]?
>
> It looks like an unrelated behaviour change that is left
> unexplained.
>
> > return 1;
> > }
>
>
> Thanks for working on this one.
>
>
> [Footnote]
>
> * by the way, what a horrible name for a public function. Yuck.
Yeah, I thought so too /:D How does `strtol_i_updated` sound?
Thanks for your feedback! I will send v2 with the corrections soon.
Branch updated from 2dddb73 to c3b202a.
/preview
Preview email sent as [email protected]
Branch updated from 8d32119 to f3a03d6.
/submit
Submitted as [email protected]
Branch updated from f3a03d6 to 0e11719.
/submit
Submitted as [email protected]
Branch updated from 0e11719 to 17f2dda.
/submit
Submitted as [email protected]
git-compat-util.h
@@ -1309,6 +1309,29 @@ static inline int strtol_i(char const *s, int base, int *result)
return 0;
On the Git mailing list, Junio C Hamano wrote:
"Mohit Marathe via GitGitGadget" <[email protected]> writes:
> From: Mohit Marathe <[email protected]>
>
> This function is an updated version of strtol_i() function. It will
> give more control to handle parsing of the characters after the
> integer and better error handling while parsing numbers.
i2 was horrible but this is worse. What would you call an even
newer variant when you need to add one? strtol_i_updated_twice?
To readers who are reading the code in 6 months, it is totally
uninteresting that strtol_i() is an older function and the new thing
was invented later as its update. What they want to learn is how
these two are different, what additional things this new one lets
them do compared to the old one, namely: we can optionally learn
where the run of the digits has ended.
Perhaps call it "strtoi_with_tail" or something, unless others
suggest even better names?
Thanks.
@@ -1,3 +1,4 @@
#include "git-compat-util.h"
On the Git mailing list, Junio C Hamano wrote:
"Mohit Marathe via GitGitGadget" <[email protected]> writes:
> q = p + 4;
> n = strspn(q, digits);
> if (q[n] == ',') {
> q += n + 1;
So, we saw "@@ -" and skipped over these four bytes, skipped the
digits from there, and found a comma.
For "@@ -29,14 +30,18 @@", for example, our q is now "14 +30,18 @@"
as we have skipped over that comma after 29.
> - *p_before = atoi(q);
> + if (strtol_i_updated(q, 10, p_before, &endp) != 0)
> + return 0;
We parse out 14 and store it to *p_before. endp points at " +30..."
now.
> n = strspn(q, digits);
> + if (endp != q + n)
> + return 0;
Is this necessary? By asking strtol_i_updated() where the number ended,
we already know endp without skipping the digits in q with strspn().
Shouldn't these three lines become more like
n = endp - q;
instead?
After all, we are not trying to find a bug in strtol_i_updated(),
which would be the only reason how this "return 0" would trigger.
> } else {
> *p_before = 1;
> }
> @@ -48,8 +53,11 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
> n = strspn(r, digits);
> if (r[n] == ',') {
> r += n + 1;
> - *p_after = atoi(r);
> + if (strtol_i_updated(r, 10, p_after, &endp) != 0)
> + return 0;
> n = strspn(r, digits);
> + if (endp != r + n)
> + return 0;
Likewise.
> } else {
> *p_after = 1;
> }
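Junio's suggestion can be sketched as a standalone function that takes the length of the digit run from the tail pointer (`n = endp - q`) instead of calling strspn() a second time. This is a hypothetical rewrite, not the code that was eventually merged: strtoi_with_tail() here is modelled on the helper quoted later in this thread, and the original `n == 0` checks are kept so that, as far as this sketch goes, only counts too large for an int are newly rejected. One assumption worth flagging: strtol() skips leading whitespace and accepts a sign, so `endp - q` could count more than digits; the sketch therefore insists that each count starts with a digit.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Modelled on the strtoi_with_tail() quoted later in this thread. */
static int strtoi_with_tail(const char *s, int base, int *result, char **endp)
{
	long ul;
	char *dummy = NULL;

	if (!endp)
		endp = &dummy;
	errno = 0;
	ul = strtol(s, endp, base);
	if (errno || (dummy && *dummy) || *endp == s || (int) ul != ul)
		return -1;
	*result = (int) ul;
	return 0;
}

/* Hypothetical scan_hunk_header() using the tail pointer for n. */
static int scan_hunk_header(const char *p, int *p_before, int *p_after)
{
	static const char digits[] = "0123456789";
	const char *q, *r;
	char *endp;
	int n;

	q = p + 4;                              /* skip "@@ -" */
	n = strspn(q, digits);
	if (q[n] == ',') {
		q += n + 1;
		/* strtol() would skip blanks and accept a sign: demand a digit */
		if (!isdigit((unsigned char) *q) ||
		    strtoi_with_tail(q, 10, p_before, &endp))
			return 0;
		n = (int) (endp - q);           /* digits consumed; no second strspn() */
	} else {
		*p_before = 1;
	}
	if (n == 0 || q[n] != ' ' || q[n + 1] != '+')
		return 0;

	r = q + n + 2;                          /* skip " +" */
	n = strspn(r, digits);
	if (r[n] == ',') {
		r += n + 1;
		if (!isdigit((unsigned char) *r) ||
		    strtoi_with_tail(r, 10, p_after, &endp))
			return 0;
		n = (int) (endp - r);
	} else {
		*p_after = 1;
	}
	if (n == 0)
		return 0;
	return 1;
}
```

For Junio's example "@@ -29,14 +30,18 @@", endp lands on " +30,18 @@" after the first count, so n is 2 and the " +" check proceeds exactly as before; a count such as "99999999999" now makes the function return 0 instead of relying on atoi().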
On the Git mailing list, Mohit Marathe wrote:
On Thursday, January 25th, 2024 at 2:32 AM, Junio C Hamano <[email protected]> wrote:
> "Mohit Marathe via GitGitGadget" [email protected] writes:
>
> > q = p + 4;
> > n = strspn(q, digits);
> > if (q[n] == ',') {
> > q += n + 1;
>
>
> So, we saw "@@ -" and skipped over these four bytes, skipped the
> digits from there, and found a comma.
>
> For "@@ -29,14 +30,18 @@", for example, our q is now "14 +30,18 @@"
> as we have skipped over that comma after 29.
>
> > - *p_before = atoi(q);
> > + if (strtol_i_updated(q, 10, p_before, &endp) != 0)
> > + return 0;
>
>
> We parse out 14 and store it to *p_before. endp points at " +30..."
> now.
>
> > n = strspn(q, digits);
> > + if (endp != q + n)
> > + return 0;
>
>
> Is this necessary? By asking strtol_i_updated() where the number ended,
> we already know endp without skipping the digits in q with strspn().
> Shouldn't these three lines become more like
>
> n = endp - q;
>
> instead?
>
> After all, we are not trying to find a bug in strtol_i_updated(),
> which would be the only reason how this "return 0" would trigger.
>
I was confused about what an invalid hunk header of a corrupted patch would
look like; this was just an attempt at a sanity check. But after
taking another look, I agree that it's unnecessary.
> > } else {
> > *p_before = 1;
> > }
> > @@ -48,8 +53,11 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
> > n = strspn(r, digits);
> > if (r[n] == ',') {
> > r += n + 1;
> > - *p_after = atoi(r);
> > + if (strtol_i_updated(r, 10, p_after, &endp) != 0)
> > + return 0;
> > n = strspn(r, digits);
> > + if (endp != r + n)
> > + return 0;
>
>
> Likewise.
>
> > } else {
> > *p_after = 1;
> > }
Branch updated from 17f2dda to ee8f4ae.
/submit
Submitted as [email protected]
git-compat-util.h
@@ -1309,6 +1309,29 @@ static inline int strtol_i(char const *s, int base, int *result)
return 0;
On the Git mailing list, Junio C Hamano wrote:
"Mohit Marathe via GitGitGadget" <[email protected]> writes:
> From: Mohit Marathe <[email protected]>
>
> This function is an updated version of strtol_i() function. It will
> give more control to handle parsing of the characters after the
> numbers and better error handling while parsing numbers.
>
> Signed-off-by: Mohit Marathe <[email protected]>
> ---
> git-compat-util.h | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 7c2a6538e5a..c576b1b104f 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1309,6 +1309,29 @@ static inline int strtol_i(char const *s, int base, int *result)
> return 0;
> }
Are we leaving the original one above? Shouldn't this step instead
remove it, as strtol_i() is now a C preprocessor macro as seen below?
> +#define strtol_i(s,b,r) strtoi_with_tail((s), (b), (r), NULL)
> +static inline int strtoi_with_tail(char const *s, int base, int *result, char **endp)
> +{
> + long ul;
> + char *dummy = NULL;
> +
> + if (!endp)
> + endp = &dummy;
> + errno = 0;
> + ul = strtol(s, endp, base);
> + if (errno ||
> + /*
> + * if we are told to parse to the end of the string by
> + * passing NULL to endp, it is an error to have any
> + * remaining character after the digits.
> + */
> + (dummy && *dummy) ||
> + *endp == s || (int) ul != ul)
> + return -1;
> + *result = ul;
> + return 0;
> +}
> +
> void git_stable_qsort(void *base, size_t nmemb, size_t size,
> int(*compar)(const void *, const void *));
> #ifdef INTERNAL_QSORT
This function is an updated version of the strtol_i() function. It gives
more control over handling the characters after the number and better
error handling while parsing numbers.

Signed-off-by: Mohit Marathe <[email protected]>
The change is made to improve error handling during the conversion of
strings to integers. The `strtoi_with_tail` function offers a more robust
mechanism for converting strings to integers by providing enhanced error
detection. Unlike `atoi`, `strtoi_with_tail` lets the code distinguish a
valid conversion from an invalid one, offering better resilience against
potential issues such as reading the hunk header of a corrupted patch.

Signed-off-by: Mohit Marathe <[email protected]>
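The difference this commit message describes can be seen in a small sketch. It copies strtoi_with_tail() and the strtol_i() macro from the patch quoted above into a standalone file (nothing else from git-compat-util.h is assumed). atoi() returns 0 both for "0" and for "abc", so a caller cannot tell a parse failure from a real zero; strtoi_with_tail() returns -1 for the latter and also treats a value that does not fit in an int as an error. Passing NULL as endp, which is what strtol_i() now does, additionally rejects trailing characters, while a non-NULL endp hands the tail back to the caller.

```c
#include <errno.h>
#include <stdlib.h>

/* Copied from the patch quoted above (git-compat-util.h). */
static inline int strtoi_with_tail(char const *s, int base, int *result, char **endp)
{
	long ul;
	char *dummy = NULL;

	if (!endp)
		endp = &dummy;          /* caller wants the whole string parsed */
	errno = 0;
	ul = strtol(s, endp, base);
	if (errno ||                /* strtol() reported an error, e.g. ERANGE */
	    (dummy && *dummy) ||    /* endp was NULL: trailing junk is an error */
	    *endp == s ||           /* no digits at all */
	    (int) ul != ul)         /* value does not fit in an int */
		return -1;
	*result = ul;
	return 0;
}

/* The old interface becomes a wrapper that forbids trailing characters. */
#define strtol_i(s,b,r) strtoi_with_tail((s), (b), (r), NULL)
```

For example, strtol_i("14 +30", 10, &v) fails because of the trailing " +30", whereas strtoi_with_tail("14 +30", 10, &v, &end) stores 14 in v and leaves end pointing at the space, which is the piece of information scan_hunk_header() needs.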
Branch updated from ee8f4ae to 858d6f9.
/submit
Submitted as [email protected]
Hello,
This patch series replaces atoi() with an updated version of
strtol_i() called strtoi_with_tail() (Credits: Junio C Hamano). The
reasoning behind this is to improve error handling by not allowing
non-numerical characters in the hunk header (which might happen
in case of a corrupt patch, although rarely).
There is still a change to be made, as Junio says:
"A corrupt patch may be getting a nonsense patch-ID with the current
code and hopefully is not matching other patches that are not
corrupt, but with such a change, a corrupt patch may not be getting
any patch-ID and a loop that computes patch-ID for many files and
try to match them up might need to be rewritten to take the new
failure case into account."
I'm not sure where this change needs to be made (maybe
get_one_patchid()?). It would be great if anyone could point me to
the correct place.
Thanks,
Mohit Marathe