gh-128271: fix incorrect handling of negative read sizes in `HTTPResponse.read()` #128270

manushkin · 2024-12-26T09:46:16Z

Param value `amt` can be negative, e.g. -1.
In that case we read all the data, including the chunk separators, instead of decoding the chunks correctly.

P.S. example of using -1 here:
https://github.com/Textualize/rich/blob/master/rich/progress.py#L247

Issue: Incorrect handling of negative reading sizes in HTTPResponse.read() #128271

cpython-cla-bot · 2024-12-26T09:46:19Z

All commit authors signed the Contributor License Agreement.

bedevere-app · 2024-12-26T09:46:21Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

manushkin · 2024-12-26T09:47:26Z

Should I create an issue/bug before that PR?

picnixz · 2024-12-26T09:55:12Z

> Should I create an issue/bug before that PR?

Yes.

Now, the prototype of HTTPResponse.read() is wrong and this is the one that should be fixed instead. If you look at the call chain, you'll see that passing amt=-1 actually leads to a lot of issues. For instance:

while (chunk_left := self._get_chunk_left()) is not None:
    if amt is not None and amt <= chunk_left:
        value.append(self._safe_read(amt))
        self.chunk_left = chunk_left - amt
        break

We have an issue with self.chunk_left = chunk_left - amt and self._safe_read. Instead, we should:

- assert that amt is None or amt >= 0 in private methods
- change amt = -1 to amt = None when calling read(). Namely, read(-1) should be equivalent to read(None).

Btw, negative reading size means "read everything you can" (see: https://docs.python.org/3/library/io.html#io.BufferedIOBase.read).
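A minimal sketch of the two points above, assuming a hypothetical helper named `normalize_amt` (it is not part of `http.client`). It shows that `io` streams already treat -1 and None alike, and why the unguarded branch goes wrong when `amt` is negative.

```python
import io


def normalize_amt(amt):
    """Hypothetical helper: map any negative size to None ("read everything")."""
    if amt is not None and amt < 0:
        return None
    return amt


# io streams already treat a negative size and None the same way.
data = b"hello world"
assert io.BytesIO(data).read(-1) == io.BytesIO(data).read(None) == data

# With amt = -1 the guard `amt <= chunk_left` is always true, and the
# subtraction makes chunk_left grow instead of shrink.
chunk_left, amt = 5, -1
assert amt <= chunk_left
assert chunk_left - amt == 6
```

Normalizing once at the public entry point would let the private methods assume `amt is None or amt >= 0`.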

manushkin · 2024-12-26T10:07:28Z

I created issue: #128271

picnixz · 2024-12-26T10:11:19Z

Considering the tests are failing and that the implementation should handle negative reading sizes as being equivalent to None, I'm converting this PR to a draft PR. When everything is corrected, you can ask for a review from me.

picnixz · 2024-12-26T10:12:13Z

By the way, we need some tests for that. But focus on fixing the implementation first and I can help you for the tests later.

picnixz · 2024-12-26T10:44:03Z

You can forget about hypothesis tests (the failure is known)

You should add a NEWS entry via blurb. Something like:

Fix incorrect handling of negative read sizes in :meth:`HTTPResponse.read
<http.client.HTTPResponse.read>`. Patch by YOUR_NAME.

If you don't want to, you can leave out "Patch by [...]".

We can now test this. You can improve BasicTest.test_chunked in test_httplib.py by adding a test case for None and -1. You can take inspiration from

for n in range(1, 12):
    sock = FakeSocket(chunked_start + last_chunk + chunked_end)
    resp = client.HTTPResponse(sock, method="GET")
    resp.begin()
    self.assertEqual(resp.read(n) + resp.read(n) + resp.read(), expected)
    resp.close()

where you'll need to modify n and adapt the assertEqual assertion.

picnixz

Actually, this is something I haven't seen in the test_chunked(), but shouldn't we set {'Transfer-Encoding': 'chunked'} as well? Just to be sure, can you locally check that resp.chunked is true? (to check that we are indeed using the method you fixed)
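A quick local check along those lines could look like the sketch below. The `FakeSocket` class, the `is_chunked` helper, and both sample responses are assumptions made for illustration, not code from the test suite.

```python
import io
from http import client


class FakeSocket:
    """Hypothetical in-memory socket; HTTPResponse only needs makefile()."""

    def __init__(self, data):
        self.data = data

    def makefile(self, mode, bufsize=None):
        return io.BytesIO(self.data)


def is_chunked(raw_response):
    """Parse the headers and report whether the chunked code path is used."""
    resp = client.HTTPResponse(FakeSocket(raw_response), method="GET")
    resp.begin()
    try:
        return resp.chunked
    finally:
        resp.close()


# Assumed samples: one response with the header, one with Content-Length only.
WITH_HEADER = (
    b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
    b"5\r\nhello\r\n0\r\n\r\n"
)
WITHOUT_HEADER = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"

print(is_chunked(WITH_HEADER), is_chunked(WITHOUT_HEADER))
```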

illia-v · 2024-12-26T16:49:09Z

FYI, we've had a similar change done in urllib3 urllib3/urllib3#3356

picnixz · 2024-12-26T17:40:58Z

Thanks for the information. We could also change our own _read1_chunked, but it already handles negative n, so it should be fine on our side. However, @manushkin, could you perhaps add some tests for read1(None) or read1(-123) as well, if there are none? (You can also decide to make a follow-up PR for that.)
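As a rough idea of what a negative-size read1() check might look like, here is a sketch. The `FakeSocket` class, the sample response, and the `read1_all` helper are hypothetical and only illustrate the behaviour described above.

```python
import io
from http import client


class FakeSocket:
    """Hypothetical in-memory socket; HTTPResponse only needs makefile()."""

    def __init__(self, data):
        self.data = data

    def makefile(self, mode, bufsize=None):
        return io.BytesIO(self.data)


# Assumed sample response with two chunks: b"hello " and b"world".
CHUNKED_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n\r\n"
    b"6\r\nhello \r\n"
    b"5\r\nworld\r\n"
    b"0\r\n\r\n"
)


def read1_all(n):
    """Call resp.read1(n) repeatedly until it returns b"" and collect the pieces."""
    resp = client.HTTPResponse(FakeSocket(CHUNKED_RESPONSE), method="GET")
    resp.begin()
    pieces = []
    try:
        while piece := resp.read1(n):
            pieces.append(piece)
    finally:
        resp.close()
    return pieces


# A negative n is clamped to what is left of the current chunk, so the
# separators never leak into the returned data.
assert b"".join(read1_all(-123)) == b"hello world"
```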

manushkin · 2024-12-27T08:16:07Z

read1 does not support None as param.
So I think it's better to improve it in another PR.

picnixz · 2024-12-27T10:09:09Z

> read1 does not support None as param.

Oh yes, the docs don't mention that None is accepted. It's fine now. I'm wondering whether we should accept it or not, but since no one has complained about it (or maybe there is an open issue for that), let's not change it for now, neither here nor in another PR.

picnixz

We could refactor one day the tests since we always do the same thing but let's leave it for another (possible) PR. Otherwise, address the last comment on the NEWS entry and I think we're good to go (although a core dev (which I'm not) would need to merge your PR).

picnixz

Looks good to me. Implementation-wise, I don't know whether it's better to handle a negative size in read() or in _read_chunked() but considering we're handling negative sizes in _read1_chunked(), it makes sense to do the same in _read_chunked().

picnixz · 2024-12-27T15:10:00Z

Maybe some comments from @SethMichaelLarson or @illia-v (who were involved in the urllib3 PR)?

illia-v · 2024-12-27T15:30:23Z

> Implementation-wise, I don't know whether it's better to handle a negative size in read() or in _read_chunked() but considering we're handling negative sizes in _read1_chunked(), it makes sense to do the same in _read_chunked().

It looks like handling this in read will make negative ints and None equivalent for non-chunked responses too

cpython/Lib/http/client.py

Lines 472 to 501 in 64173cd

        if self.chunked:
            return self._read_chunked(amt)

        if amt is not None:
            if self.length is not None and amt > self.length:
                # clip the read to the "end of response"
                amt = self.length
            s = self.fp.read(amt)
            if not s and amt:
                # Ideally, we would raise IncompleteRead if the content-length
                # wasn't satisfied, but it might break compatibility.
                self._close_conn()
            elif self.length is not None:
                self.length -= len(s)
                if not self.length:
                    self._close_conn()
            return s
        else:
            # Amount is not given (unbounded read) so we must check self.length
            if self.length is None:
                s = self.fp.read()
            else:
                try:
                    s = self._safe_read(self.length)
                except IncompleteRead:
                    self._close_conn()
                    raise
                self.length = 0
            self._close_conn()        # we read everything
            return s

picnixz · 2024-12-27T18:28:50Z

Note: I didn't quite understand whether or not you confirmed my suggestion of keeping it in _read_chunked().

I think it's probably better to keep it in _read_chunked(). While making the check in read() would make the two versions (non-chunked and chunked) equivalent most of the time, we would still be reading fp chunk by chunk rather than in one go, making multiple calls (I've extracted the relevant lines of _read_chunked()):

while (chunk_left := self._get_chunk_left()) is not None:
    value.append(self._safe_read(chunk_left))

If the HTTP response is meant to be read in chunks, we should do so, since "read as many bytes as you can in chunks" and "read as many bytes as you can in one go" may not be equivalent (depending on how the server sends the data, right?).

illia-v · 2024-12-27T18:59:56Z

Let me clarify my point 🙂

This PR makes read handle negative integers and None the same for the if self.chunked case.

But if the response is not chunked, there are still different code paths for negative ints and None. So to fix read fully, the read method needs to handle negative integers anyway, either as in the following diff or at its beginning, so that only non-negative integers or None are passed to self._read_chunked.

diff --git a/Lib/http/client.py b/Lib/http/client.py
index 1c0332d82bd..33a858d34ae 100644
--- a/Lib/http/client.py
+++ b/Lib/http/client.py
@@ -472,7 +472,7 @@ def read(self, amt=None):
         if self.chunked:
             return self._read_chunked(amt)

-        if amt is not None:
+        if amt is not None and amt >= 0:
             if self.length is not None and amt > self.length:
                 # clip the read to the "end of response"
                 amt = self.length

picnixz · 2024-12-27T19:12:12Z

> if amt is not None and amt >= 0:

This one is not really needed because we would bypass amt > self.length and directly call self.fp.read(amt), which would delegate to BufferedIOBase.read(), which handles negative sizes correctly. But I agree that it's probably better to use the None path.

illia-v · 2024-12-27T19:19:15Z

> > if amt is not None and amt >= 0:
>
> This one is not really needed because we would bypass amt > self.length and directly call self.fp.read(amt), which would delegate to BufferedIOBase.read(), which handles negative sizes correctly. But I agree that it's probably better to use the None path.

In most cases it's not needed indeed, but there is some logic related to raising IncompleteRead in the None path which is skipped for negative ints currently

Fixed http read_chunked for amt with negative value

974f928

bedevere-app bot added the awaiting review label Dec 26, 2024

picnixz changed the title ~~Fixed http read_chunked for amt with negative value~~ gh-128271: fix incorrect handling of negative read sizes in HTTPResponse.read() Dec 26, 2024

bedevere-app bot mentioned this pull request Dec 26, 2024

Incorrect handling of negative reading sizes in HTTPResponse.read() #128271

Open

picnixz marked this pull request as draft December 26, 2024 10:11

bedevere-app bot removed the awaiting review label Dec 26, 2024

Fixed http read_chunked for amt with negative value

2685fe6

📜🤖 Added by blurb_it.

343a888

picnixz reviewed Dec 26, 2024

picnixz reviewed Dec 26, 2024

manushkin added 3 commits December 26, 2024 14:12

Added test

f082c53

Improved test

2cd0ead

Simple commit to change email for cpython-cla-bot

48bec8a

manushkin force-pushed the patch-1 branch from f0cc7a0 to 48bec8a Compare December 26, 2024 13:12

picnixz marked this pull request as ready for review December 26, 2024 13:23

bedevere-app bot added the awaiting review label Dec 26, 2024

picnixz self-requested a review December 26, 2024 13:24

picnixz reviewed Dec 26, 2024

Improved test

26b7b2e

picnixz reviewed Dec 27, 2024

Improved test

4cd1232

picnixz self-requested a review December 27, 2024 15:06

picnixz approved these changes Dec 27, 2024

bedevere-app bot added awaiting core review and removed awaiting review labels Dec 27, 2024

Improved case for reading not chunked data

e348cec
