Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-128271: fix incorrect handling of negative read sizes in HTTPResponse.read() #128270

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

manushkin
Copy link

@manushkin manushkin commented Dec 26, 2024

Param value amt can be negative, e.x. -1.
In that case we read all data, with chunk separators, instead of correct reading.

P.S. example of using -1 here:
https://github.com/Textualize/rich/blob/master/rich/progress.py#L247

Param value `amt` can be negative, e.x. -1.
In that case we read all data, with chunk separators, instead of correct reading.

P.S. example of using -1 here:
https://github.com/Textualize/rich/blob/master/rich/progress.py#L247
Copy link

cpython-cla-bot bot commented Dec 26, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Dec 26, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@manushkin
Copy link
Author

Should I create an issue/bug before that PR?

@picnixz
Copy link
Contributor

picnixz commented Dec 26, 2024

Should I create an issue/bug before that PR?

Yes.


Now, the prototype of HTTPResponse.read() is wrong and this is the one that should be fixed instead. If you look at the call chain, you'll see that passing amt=-1 actually leads to a lot of issues. For instance:

while (chunk_left := self._get_chunk_left()) is not None:
    if amt is not None and amt <= chunk_left:
        value.append(self._safe_read(amt))
        self.chunk_left = chunk_left - amt
        break

We have an issue with self.chunk_left = chunk_left - amt and self._safe_read. Instead, we should:

  • assert that amt is None or amt >= 0 in private methods
  • change amt = -1 to amt = None when calling read(). Namely read(-1) should be equivalent to read(None).

Btw, negative reading size means "read everything you can" (see: https://docs.python.org/3/library/io.html#io.BufferedIOBase.read).

@manushkin
Copy link
Author

I created issue: #128271

@picnixz picnixz changed the title Fixed http read_chunked for amt with negative value gh-128271: fix incorrect handling of negative read sizes in HTTPResponse.read() Dec 26, 2024
@picnixz
Copy link
Contributor

picnixz commented Dec 26, 2024

Considering the tests are failing and that the implementation should handle negative reading sizes as being equivalent to None, I'm converting this PR to a draft PR. When everything is corrected, you can ask for a review from me.

@picnixz picnixz marked this pull request as draft December 26, 2024 10:11
@picnixz
Copy link
Contributor

picnixz commented Dec 26, 2024

By the way, we need some tests for that. But focus on fixing the implementation first and I can help you for the tests later.

@bedevere-app
Copy link

bedevere-app bot commented Dec 26, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@picnixz
Copy link
Contributor

picnixz commented Dec 26, 2024

  • You can forget about hypothesis tests (the failure is known)

  • You should add a NEWS entry via blurb. Something like:

    Fix incorrect handling of negative read sizes in :meth:`HTTPResponse.read
    <http.client.HTTPResponse.read>`. Patch by YOUR_NAME.

    If you don't want to, you can leave out "Patch by [...]".

  • We can now test this. You can improve BasicTest.test_chunked in test_httplib.py by adding a test case for None and -1. You can take inspiration from

    for n in range(1, 12):
        sock = FakeSocket(chunked_start + last_chunk + chunked_end)
        resp = client.HTTPResponse(sock, method="GET")
        resp.begin()
        self.assertEqual(resp.read(n) + resp.read(n) + resp.read(), expected)
        resp.close()

    where you'll need to modify n and adapt the assertEqual assertion.

Lib/test/test_httplib.py Outdated Show resolved Hide resolved
@picnixz picnixz marked this pull request as ready for review December 26, 2024 13:23
@picnixz picnixz self-requested a review December 26, 2024 13:24
Copy link
Contributor

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is something I haven't seen in the test_chunked(), but shouldn't we set {'Transfer-Encoding': 'chunked'} as well? Just to be sure, can you locally check that resp.chunked is true? (to check that we are indeed using the method you fixed)

Lib/test/test_httplib.py Outdated Show resolved Hide resolved
@illia-v
Copy link
Contributor

illia-v commented Dec 26, 2024

FYI, we've had a similar change done in urllib3 urllib3/urllib3#3356

@picnixz
Copy link
Contributor

picnixz commented Dec 26, 2024

Thanks for the information. We could also change our own _read1_chunked but the latter already handles negative n so... it should be fine on our side. However, @manushkin could you perhaps add some tests for read1(None) or read1(-123) as well (if there are none?) [you can also decide to make a follow-up PR for that]

@manushkin
Copy link
Author

manushkin commented Dec 27, 2024

read1 does not support None as param.
So I think it's better to improve it in another PR.

@picnixz
Copy link
Contributor

picnixz commented Dec 27, 2024

read1 does not support None as param.

Oh yes, the docs don't mention that None is accepted. It's fine now (though I'm wondering whether we should accept it or not. Since no one complained about it (or maybe there is an opened issue for that), let's not change it for now (neither here, nor in an another PR)).

Copy link
Contributor

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could refactor one day the tests since we always do the same thing but let's leave it for another (possible) PR. Otherwise, address the last comment on the NEWS entry and I think we're good to go (although a core dev (which I'm not) would need to merge your PR).

Lib/test/test_httplib.py Outdated Show resolved Hide resolved
@picnixz picnixz self-requested a review December 27, 2024 15:06
Copy link
Contributor

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Implementation-wise, I don't know whether it's better to handle a negative size in read() or in _read_chunked() but considering we're handling negative sizes in _read1_chunked(), it makes sense to do the same in _read_chunked().

@picnixz
Copy link
Contributor

picnixz commented Dec 27, 2024

Maybe some comments from @SethMichaelLarson or @illia-v (who were involved in the urllib3 PR)?

@illia-v
Copy link
Contributor

illia-v commented Dec 27, 2024

Implementation-wise, I don't know whether it's better to handle a negative size in read() or in _read_chunked() but considering we're handling negative sizes in _read1_chunked(), it makes sense to do the same in _read_chunked().

It looks like handling this in read will make negative ints and None equivalent for non-chunked responses too

cpython/Lib/http/client.py

Lines 472 to 501 in 64173cd

if self.chunked:
return self._read_chunked(amt)
if amt is not None:
if self.length is not None and amt > self.length:
# clip the read to the "end of response"
amt = self.length
s = self.fp.read(amt)
if not s and amt:
# Ideally, we would raise IncompleteRead if the content-length
# wasn't satisfied, but it might break compatibility.
self._close_conn()
elif self.length is not None:
self.length -= len(s)
if not self.length:
self._close_conn()
return s
else:
# Amount is not given (unbounded read) so we must check self.length
if self.length is None:
s = self.fp.read()
else:
try:
s = self._safe_read(self.length)
except IncompleteRead:
self._close_conn()
raise
self.length = 0
self._close_conn() # we read everything
return s

@picnixz
Copy link
Contributor

picnixz commented Dec 27, 2024

Note: I didn't quite understand whether you confirmed or not my suggestion of keeping it _read_chunked().

I think it's probably better to keep it in _read_chunked. While making the check in read() would make the two versions (non-chunked and chunked) eventually equivalent most of the time, we would still be reading fp in terms of chunk and not in one go, doing multiple calls (I've extracted the relevant lines of _read_chunked):

while (chunk_left := self._get_chunk_left()) is not None:
    value.append(self._safe_read(chunk_left))

If the HTTP response is meant to be read by chunks, we should do it so since "read as many bytes you can in chunks" and "read as many bytes you can in one go" may not be equivalent (depending on how the server sends the data, right?)

@illia-v
Copy link
Contributor

illia-v commented Dec 27, 2024

Let me clarify my point 🙂

This PR makes read handle negative integers and None the same for the if self.chunked case.

But if the response is not chunked, there are still different code paths for negative ints and None. So to fix read fully, the read method needs to handle negative integers anyway, either like in the following diff or in its beginning to pass only positive integers or None to self._read_chunked.

diff --git a/Lib/http/client.py b/Lib/http/client.py
index 1c0332d82bd..33a858d34ae 100644                                                                                                                                                                                            
--- a/Lib/http/client.py                                                                                                                                                                                                         
+++ b/Lib/http/client.py                                                                                                                                                                                                         
@@ -472,7 +472,7 @@ def read(self, amt=None):                                                                                                                                                                                    
         if self.chunked:                                                                                                                                                                                                        
             return self._read_chunked(amt)                                                                                                                                                                                      
                                                                                                                                                                                                                                 
-        if amt is not None:                                                                                                                                                                                                     
+        if amt is not None and amt >= 0:                                                                                                                                                                                        
             if self.length is not None and amt > self.length:                                                                                                                                                                   
                 # clip the read to the "end of response"                                                                                                                                                                        
                 amt = self.length   

@picnixz
Copy link
Contributor

picnixz commented Dec 27, 2024

if amt is not None and amt >= 0:

This is one is not really needed because we would bypass amt > self.length and directly call self.fp.read(amt) which would delegate to BufferIOBase.read() which handles negative ones correctly. But I agree that it's probably better to use the None path.

@illia-v
Copy link
Contributor

illia-v commented Dec 27, 2024

if amt is not None and amt >= 0:

This is one is not really needed because we would bypass amt > self.length and directly call self.fp.read(amt) which would delegate to BufferIOBase.read() which handles negative ones correctly. But I agree that it's probably better to use the None path.

In most cases it's not needed indeed, but there is some logic related to raising IncompleteRead in the None path which is skipped for negative ints currently

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants