Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial ConnectionLayers sketch #8

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rousskov
Copy link
Contributor

@rousskov rousskov commented Oct 3, 2023

No description provided.

Copy link
Contributor Author

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unpolished PR contains a sketch of base connection layer classes that I promised to post during a Tuesday meeting a few weeks ago. The corresponding C++ declarations are based on existing/working Web Polygraph classes, but I hand-edited them to improve documentation and reduce noise. These classes will need to be adjusted/enhanced to be usable in Squid, of course.


```
SocketLayer - SocksLayer - SecurityLayer (with origin) - SecurityLayer (with proxy) - HTTP/1 Client job
```
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This layering approach, where each layer does not know what comes before or after it, allows dynamic creation/manipulation of processing chains (based on configuration and/or negotiated protocols), such as a long chain shown in the above example. The existing Squid code essentially has hard-coded processing chains, which makes adding support for "TLS inside TLS" or SOCKS framing very expensive/risky.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how does this work with HTTP/2 where the framing is one layer and the semantics another?

And HTTP/3 where the message framing is sent as QUIC layer details?

Copy link
Contributor Author

@rousskov rousskov Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how does this work with HTTP/2 where the framing is one layer and the semantics another?

The class responsible for HTTP/2 framing is an UpperIoLayer child. It is named HttpTwoLayer in the sketch below. It converts I/O bytes from prev->readBuffer (after they were PROXY protocol stripped and TLS decoded, for example) into HTTP/2 frames (and HTTP/2 frames into prev->writeBuffer -- not shown here).

Each HTTP/2 transaction stream is handled by an object of another class (in the sketch below, these objects are stored in the streams_ container). That class does not operate on bytes. It operates on HTTP frames. It is not a part of the IoLayer hierarchy sketched in this PR, but you can see some rudimentary interactions between the two classes in the handleStreamFrame() sketch below.

void
HttpTwoLayer::parseFrames()
{
    while (const auto frame = extractFrame(prev->readBuffer)) {
        if (frame.type() == Frame::tpWindowUpdate)
            handleWindowUpdateFrame(frame);
        else
        if (frame.type() == Frame::tpRstStream)
            handleResetFrame(frame);
        else
        if (const auto streamId = frame.streamId())
            handleStreamFrame(frame, streamId);
        else
            handleConnFrame(frame);
    }
}

void
HttpTwoLayer::handleStreamFrame(const Frame frame, const uint32_t streamId)
{
    const auto it = streams_.find(streamId);

    if (it != streams_.end())
        return it->handleFrame(frame);

    if (frame.type() == Frame::tpHeaders || frame.type() == Frame::tpPushPromise)
        openStream(frame, streamId);
    else
        skipFrame(frame);
}

And HTTP/3 where the message framing is sent as QUIC layer details?

I would expect a similar arrangement. FWIW, in Polygraph, SocketLayer does support UDP. HttpThreeLayer will probably handle QUIC framing. Hopefully, HTTP/2 and HTTP/3 transaction stream managers will be able to share a lot of code -- all that logic happens above I/O layers.

P.S. HttpTwoLayer and similar names in these sketches are not what Squid should use. These Polygraph-derived names are used here for sketching convenience only. Many other names should be changed as well. These sketches are not about naming.

* LowerIoLayer: SocketLayer
* MiddleIoLayer: SocksLayer and SecurityLayer
* UpperIoLayer: HTTP/1, HTTP/2, FTP control, FTP data, etc. clients and server jobs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not paste existing Polygraph implementations of these layers. Their Squid implementations will differ quite a bit, of course, so there is probably little point in pasting that existing code here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first change for Squid is that HTTP (1-3) should all be MiddleIoLayer these days. Consider DoH and HTTP-over-HTTP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP (1-3) should all be MiddleIoLayer these days. Consider DoH and HTTP-over-HTTP.

That remains to be seen, but I doubt that HTTP agents will derive from MiddleIoLayer because things on top of HTTP (e.g., DoH) will want to operate HTTP messages rather than bytes. There are certainly layers other than these byte-focused I/O layers; an UpperIoLayer child may also play a role of some hypothetical LowerHttpLayer.

This proposal does not try to cover everything from the Squid-accepted connection to the peer-accepted connection in a single chain of IoLayer objects:

// This is NOT being proposed here:
client socket LowerIoLayer -- TLS MiddleIoLayer -- HTTP MiddleIoLayer -- cache MiddleIoLayer -- HTTP MiddleIoLayer -- TLS MiddleIoLayer -- peer socket LowerIoLayer

Similar chains naturally exist in Squid, but they should not be based on IoLayers alone.


// Only the [last] UpperIoLayer may initiate the closing sequences, including
// both synchronous (closeNow()) and asynchronous (startClosing()).
```
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few other rules and caveats (usually learned the hard way) that I will need to collect and record if we decide to go forward with this design.

```

## TODO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My primary concern with this (or any other significant) redesign is that it will take an enormous effort to refactor Squid code accordingly because even straightforward Squid code changes usually require a lot of work, nerve cells, and time to land due to current Project idiosyncrasies. Making a lot of complex decisions and significant refactoring changes may be prohibitively expensive in the current Project environment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amos: I thought you said that the "clientStreams" design was terrible. This is essentially the same design, just naming things LowerIo and UpperIo instead of Producer and Consumer.

The comment quoted above illustrates the point of this change request thread: Technically problematic and emotionally charged arguments drain a lot of Project resources, making progress very costly. For example, responding to that comment alone has cost me more than two hours.

Virtually any design dealing with reading and writing protocol messages has I/O buffer(s) and some notification functions about data availability. Any design dealing with nested protocols will have some sort of sequence for processing steps. Those things are pretty much unavoidable because the problem domain itself effectively requires them. And, yes, clientStreams and IoLayers have that in common. Those common things are not what makes clientStreams "terrible".

I see at least two key fundamental differences between clientStreams and IoLayers:

  • clientStreams are unidirectional (like Unix pipes);
    IoLayers are bidirectional (like a socket; each layer writes and reads).
  • clientStreams buffers contain plain text HTTP response bytes;
    IoLayer buffers contain layer-specific bytes (a given layer may buffer, for example, TLS records, PROXY protocol header, or HTTP request bytes).

Those key differences are not what makes current clientStreams "terrible" either.

clientStream problems are in the details of their design, implementation, and use. AFAICT, those details are not relevant to this PR/discussion; we do not need to agree on them (anywhere) to make progress here.

@rousskov rousskov marked this pull request as draft October 3, 2023 19:13
virtual void startWriting() = 0;

RdBuf readBuffer; // filled by us; emptied by the upper layer
WrBuf writeBuffer; // filled by the upper layer; emptied by us
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Francesco noted during the meeting, the current buffer names do not tell the code reader which layer is filling buffer X and which layer is emptying buffer X. One has to read these data member descriptions to learn that.

FWIW, the current "read/write" names are based on how objects of this class (i.e. buffer owners) are using these buffers -- a LowerIoLayer object (e.g., SocketLayer) uses the readBuffer for calls like read(2) and uses writeBuffer for calls like write(2).

Here are a few illustrations from derived classes (Squid code will differ in many details):

void SocketLayer::readNow() // this is a LowerIoLayer child
{
    const auto sz = socket_.read(readBuffer.space(), readBuffer.spaceSize());
    if (sz > 0)
    ...
}
void HttpTwoLayer::parseFrames() // this is a UpperIoLayer child (which does not own I/O buffers)
{
    while (const auto frame = extractFrame(prev->readBuffer))
        ...
}

A better naming scheme is welcome.

@yadij
Copy link
Contributor

yadij commented Oct 3, 2023

I thought you said that the "clientStreams" design was terrible. This is essentially the same design, just naming things LowerIo and UpperIo instead of Producer and Consumer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants