Async encoding of Node.js Stream #82
I am open to this. Can you talk more about how you would want the API to look?

- Would you want to be able to send some other CBOR to prefix or wrap the stream?
- I assume you would want to set the type to either bytes or text on creation, to be able to write buffers or strings into one side, and to get CBOR bytes out the other, right?
- Would you be willing to have a maximum chunk size?
- Would you mind the stream adding chunk boundaries at will?
- In string mode, would you want to ensure that only fully-formed UTF-8 sequences were written?

I'm imagining a transform stream. In backpressure situations, it's possible that the stream would get called with different boundaries than the writer intended. |
Something simple, for example an `encodeAsync` with the same parameters as the `encode` function, but returning a readable stream.
I am not sure I understand this.
Seems good to me. Maybe with bytes as the default.
I think when implementing this with a transform stream: when `encodeAsync` finds a stream-typed value, it should push major type 2 with indefinite length (additional information 31); then each `_transform` call gets a chunk of data, which can be written out as a definite-length type 2 chunk of the same length `_transform` received. Maybe it can do some optimizations, for example if `_transform` gets very small buffers (say, 1 byte each), it could concatenate them into a bigger CBOR chunk. I do not see a reason for a maximum chunk size (but it could be added as an option). Yes, chunk boundaries at the will of the stream are OK; a stream reader or decoder of the output CBOR on the other side should not care how an indefinite-length byte string is chunked. A rough sketch of this framing follows below.
Yes, maybe invalid UTF-8 written to type 3 should emit an error (as decoding probably does).
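A minimal sketch of that byte-mode framing, assuming a transform stream as described; the class name and header helper are made up for illustration, not the library's API:

```js
const { Transform } = require("stream")

// Wraps each incoming Buffer as a definite-length CBOR byte-string chunk
// inside one indefinite-length byte string (illustrative sketch only).
class ByteStreamEncoder extends Transform {
  constructor(options) {
    super(options)
    this.push(Buffer.from([0x5f])) // major type 2, additional info 31: indefinite length
  }

  _transform(chunk, encoding, callback) {
    this.push(byteStringHead(chunk.length)) // definite-length chunk header
    this.push(chunk)                        // the chunk's bytes, unchanged
    callback()
  }

  _flush(callback) {
    this.push(Buffer.from([0xff])) // "break" stop code closes the indefinite string
    callback()
  }
}

// CBOR header for a definite-length byte string (major type 2) of `len` bytes.
function byteStringHead(len) {
  if (len < 24) return Buffer.from([0x40 | len])
  if (len < 0x100) return Buffer.from([0x58, len])
  const head = Buffer.alloc(len < 0x10000 ? 3 : 5)
  if (len < 0x10000) {
    head[0] = 0x59
    head.writeUInt16BE(len, 1)
  } else {
    head[0] = 0x5a
    head.writeUInt32BE(len, 1)
  }
  return head
}
```

With this, `fs.createReadStream("big file").pipe(new ByteStreamEncoder()).pipe(out)` never needs to hold the whole file in memory. |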
I like the idea of just sticking a ReadableStream in the input. That makes things a lot more straightforward. If you wanted to write Plan:
|
I do not think this would be very useful, and it can also be done on the user's side. I hope I was understood right about what I meant: an input stream should be possible not only as the input argument to the encode function, but also inside a nested structure. For example, right now if I want to send this:
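(a reconstruction of the idea; `socket` stands in for wherever the bytes go)

```js
const fs = require("fs")
const cbor = require("cbor")

// Illustrative only: both files are read fully into memory, and cbor.encode
// then produces one more full copy as the encoded Buffer, before anything
// is written out.
const buf = cbor.encode({
  a: fs.readFileSync("some big file"),
  b: [{
    c: fs.readFileSync("some other big file")
  }]
})
socket.write(buf)
```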
Then I first need to hold 2x the size of both files together in memory before sending. My idea was to make something like this possible:
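(a sketch using the hypothetical `encodeAsync` suggested earlier; the function does not exist yet, and would return a readable stream of encoded bytes)

```js
cbor.encodeAsync({
  a: fs.createReadStream("some big file"),
  b: [{
    c: fs.createReadStream("some other big file")
  }]
}).pipe(socket)
```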
And the memory footprint is about as small as it would be when sending a file as a stream directly. |
`cbor.Encoder` is already a Transform stream:

```js
const stream = new cbor.Encoder()
stream.pipe(process.stdout)
stream.end({foo: 1})
```
|
In my proposed approach, you would write:

```js
const e = new cbor.Encoder()
e.pipe(stream)
e.write({
  a: fs.createReadStream("some big file"),
  b: [{
    c: fs.createReadStream("some other big file")
  }]
})
```

And you would get the memory usage you expect, along with appropriate backpressure. |
But `fs.createReadStream` returns a stream. Using buffers, this would still hold 1x the size of both files together in memory, but stream support here would avoid that. Say I want to stream two files, from the file system or from two network connections, directly into the CBOR stream and send it on to another location, without first loading the files into memory. |
I'll put together a quick patch, and we'll see if it solves your problem. If not, I'll ask a bunch more questions until I understand better. |
I suppose this would require a more asynchronous approach during encoding. Here is what I expect should happen, step by step:

```js
e.write({          // send CBOR map header, size 3
  a:               // send text string header, size 1; send "a"
    stream1,       // send a byte string/text string/array header with indefinite
                   // length, then send chunks as they arrive on the stream as
                   // buffers/strings/any values (could be done by piping stream1
                   // to the output through a transform stream that prefixes each
                   // chunk with a CBOR type+size header and encodes the data);
                   // asynchronously wait until stream1 ends, then send the
                   // "break" stop code, then continue with key b
  b:               // send text string header, size 1; send "b"
    [              // send array header, size 1
      {            // send map header, size 2
        c:         // send text string header, size 1; send "c"
          stream2, // do the same as for stream1: async wait for stream2, then
                   // after the break stop code continue with key d, then key e
        d: 10      // ...
      }
    ],
  e: 20            // ...
})
```

So stream1, stream2 and the output CBOR are never wholly present in memory (at least as long as they do not arrive whole as a single chunk). Streams can be nested at any sane depth, inside maps and arrays, just like other types.
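For concreteness, here is what the non-stream parts of that object encode to on the wire (stream chunk bytes elided):

```
a3              -- map, 3 pairs
  61 61         -- text(1) "a"
  5f .. ff      -- indefinite-length byte string: stream1's chunks, then break
  61 62         -- text(1) "b"
  81            -- array, 1 element
    a2          -- map, 2 pairs
      61 63     -- text(1) "c"
      5f .. ff  -- indefinite-length byte string: stream2's chunks, then break
      61 64     -- text(1) "d"
      0a        -- unsigned(10)
  61 65         -- text(1) "e"
  14            -- unsigned(20)
```
|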
Any update on this? I'm trying to encode |
I presume you can't transform the |
Well yes, that's the current work-around. But I was under the impression that the whole point of implementing the |
It should be possible to asynchronously encode an object that has some values of type Stream, returning a stream which emits encoded data continuously.
CBOR supports indefinite-length byte strings (https://tools.ietf.org/html/rfc7049#section-2.2.2), so the size of an input stream does not need to be known before writing data.
CBOR also supports indefinite-length arrays and maps, so input object streams are doable as well.
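The example from that section of the RFC shows the framing, an indefinite-length byte string carrying the 7 bytes 0xaabbccddeeff99 in two chunks:

```
5f             -- start indefinite-length byte string (major type 2, additional info 31)
  44 aabbccdd  -- definite-length chunk of 4 bytes
  43 eeff99    -- definite-length chunk of 3 bytes
ff             -- "break" stop code
```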