Performance optimization of StreamSource #675
base: main
Conversation
Hi Marius,
Thank you for the quick response on this.
To reproduce the issue:
1) Create a ReadableStream for some large file.
2) Start uploading the file.
3) Pause the upload halfway through.
4) Refresh your browser window - **this is the important part**.
5) Resume the upload (by selecting the same file again). The StreamSource now has to process all of the data up to the middle (which I have no problem with and is impossible to get around), but at the moment it also buffers that data in memory, which is unnecessary and can actually hang the browser if the file is big enough.
Pausing and resuming within the same session is not a problem.
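To make the cost of step 5 concrete, here is a minimal sketch contrasting slicing a Blob/File with seeking in a stream (assumptions: Node 18+ or a browser, where `Blob` and web streams are globals; the actual `tus.Upload` wiring from this thread is omitted):

```javascript
// Stand-in for a large file on disk.
const data = new Uint8Array(1024 * 1024);

// With a File/Blob, slicing is cheap: bytes before `start` are never read.
const blob = new Blob([data]);
const tail = blob.slice(data.length - 10, data.length);

// With a ReadableStream, every byte before `start` must be consumed first.
// Discarding those bytes is unavoidable; buffering them all is not.
async function bytesConsumedToReach(stream, start) {
  const reader = stream.getReader();
  let consumed = 0;
  while (consumed < start) {
    const { done, value } = await reader.read();
    if (done) break;
    consumed += value.length;
  }
  return consumed;
}
```

Reading up to a resume offset near the end of a 2 GB stream therefore consumes nearly 2 GB either way; the issue in this PR is only about whether those bytes are additionally kept in memory.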
Kind regards,
Matthew
…On Fri, Mar 15, 2024 at 2:59 PM Marius ***@***.***> wrote:
This is a serious problem if the stream has never been read from and you
call slice with a start and end position near the end of the stream.
In which situation would this occur with real-world usage of
tus-js-client? Does that happen when you resume an upload that is nearly
finished?
As to your question about real-world use - yes, I am creating a new tus.Upload object. When dealing with a standard JavaScript File object, the scenario I described above works without issue (with a File you can get a slice without having to read any data prior to `start`). I then started playing around with the idea of on-the-fly compression using the Compression Streams API and started observing the massive slowdown in the scenario above when using a stream. After some debugging I found out that all the concatenation and buffering in memory was causing the issue. I don't think my solution is correct at the moment, however, as I am seeing incorrect file sizes uploaded in the scenario I described above. Busy investigating.
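The on-the-fly compression idea mentioned above can be sketched with the standard Compression Streams API (assumptions: `CompressionStream` is available as a global, as in modern browsers and Node 18+; the `tus.Upload` call that would consume the resulting stream is omitted):

```javascript
// Pipe a source stream (e.g. from File.prototype.stream()) through gzip.
// The compressed stream would then be handed to tus-js-client as the
// upload source, which is where StreamSource comes into play.
function gzipStream(sourceStream) {
  return sourceStream.pipeThrough(new CompressionStream('gzip'));
}
```

Because the compressed size is unknown up front, such an upload must use a stream source rather than a File, which is why the buffering behavior of StreamSource matters here.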
@Acconut I fixed the bug in my code and files are uploaded correctly now. The optimization definitely helps for the use case I described (in a real-world application we are developing). Would you like any further clarification or code changes?
That change itself looks good and also makes sense to me. Before we can merge this, we should also add a test case for it. In https://github.com/tus/tus-js-client/blob/f410679e8ae4b3821ffb919a5cfe7f04acd1528d/test/spec/test-browser-specific.js/#L251, we have test cases specific to readers in browsers. A test case can be added there where a stream is supplied for resuming an upload with a non-zero offset. Seeking to this offset should result in multiple calls to the reader. In the end, an assertion should confirm that the data in the PATCH request is correct. Could you look into this?
Hi @Acconut thanks for checking out my updates. I will try and push a test for the changes in the next few days. |
@Acconut I have added a test for you - it checks that the data sent in the PATCH request is correct when starting at a non-zero offset. I guess we could also add some tests for the StreamSource itself, but I'm not sure of the best way to verify that the concat method is NOT called when we don't expect it to be without exposing some internals of the class just for the sake of testing. Please let me know if you would like any further changes.
At the moment, when a call to `slice(start, end)` occurs on StreamSource, it buffers in memory all data from the stream's current position up to `end`.
This is a serious problem if the stream has never been read from and you call `slice` with a start and end position near the end of the stream. E.g. if I have a 2 gigabyte file on disk that I am reading via a Web ReadableStream, am using that as the data source for the StreamSource, and I ask for the last megabyte slice at the end of the StreamSource, it will currently buffer the entire 2 gigabytes into memory before returning the last megabyte!
Obviously with a stream you will always have to read the entire 2 gigabytes to get to the end, but buffering the data in memory is much more problematic and wasteful than just reading through the stream.
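A possible shape of the fix can be sketched as follows. This is a hypothetical, simplified stand-in for StreamSource's slicing, not the actual tus-js-client code: chunks that lie entirely before `start` are read and discarded, and only the requested `start..end` window is ever concatenated in memory.

```javascript
// Read from `reader` (a ReadableStreamDefaultReader of Uint8Array chunks)
// and return only the bytes in [start, end). Bytes before `start` are
// consumed but immediately discarded instead of being buffered.
async function sliceStream(reader, start, end) {
  let position = 0;
  const chunks = [];
  while (position < end) {
    const { done, value } = await reader.read();
    if (done) break;
    const chunkStart = position;
    position += value.length;
    if (position <= start) continue; // chunk is entirely before the slice
    const from = Math.max(start - chunkStart, 0);
    const to = Math.min(end - chunkStart, value.length);
    chunks.push(value.subarray(from, to));
  }
  // Concatenate only the requested window.
  let size = 0;
  for (const c of chunks) size += c.length;
  const out = new Uint8Array(size);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```

With this approach, peak memory is bounded by the slice size (plus one in-flight chunk) rather than by the distance from the stream's current position to `end`.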