-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't preallocate a full vector, allocate on demand #154
base: master
Are you sure you want to change the base?
Conversation
Will this in any way also deprecate the
I don't think it's used anywhere yet. Not sure if it was intended for allocation or some sort of seeking strategy? |
Hi @5225225, Apologies for how long it took to come around to this, but got very busy and had to put Symphonia on a bit of a hiatus. Catching up now! If I understand this PR, the motivation is to protect against malicious files from allocating more bytes than could be read from the file? It's a good first go at a solution, but I think there are some additional considerations that need to be made:
What do you think? |
Hey @roderickvd, that option controls Symphonia's read-ahead/seekback buffer. The documentation for Generally, you shouldn't need to modify it. |
Am I correct that it is not used anywhere today? If there's anything I can tweak to improve |
No, it is used. However, it will not affect seek performance since Are you having issues with seek performance? The problem with OGG is that it's primarily a streaming format and the proper way to seek is either a linear scan or a bisection search of the file for the desired timestamp. I've implemented the latter, but this method does perform a lot of seeks. I've read that Spotify adds a non-standard seek index packet to their OGG files reduce the number of seeks. If you have any docs on that I could look into implementing support for it. Other than that, there are a few constants in |
yes The solution here is to mainly protect against "lol you were sent a 50KB file that says it's 100GB, you're going to OOM now", and not make any behavioral changes. I was originally planning to do a more clever scheme where we allow the decoder to ask the source how much is left, but that doesn't play nice with streaming sources. I could just bump up the limit here to 1MB which should be more reasonable. And yeah, this doesn't help with long files/streaming sources being too long. A limit might be smart, in that case. Though what would, say, a CLI player do? How do other players handle Very Large files? I forget if we require the whole file to fit in memory at once if we're playing it, but I assume we don't? |
My bad I could not find any occurrences doing a quick search.
Not issues perse but I have been at working optimising it. In librespot <= 0.4 we have been downloading 16 kB chunks over Mercury (which is cheap to setup headerless), so seek performance never was an issue. Now that we switched to HTTPS things are different, and it's a balancing act between larger chunk sizes (better throughput, less overhead) and latency (when seeking). Right now I have tuned it so that it initially grabs 64 kB chunks (which I took from
Spotify does prepend the file with its own Ogg header which is 166 bytes in length. Right now we get the normalisation values from it (they are not there with normal ReplayGain tags, but as specific bytes), then skip ahead and act as if byte 168 were actually byte 0 and we feed that to Symphonia. This is the Spotify Ogg header deciphered as far as I know: https://sourceforge.net/p/despotify/code/346/tree/java/trunk/src/main/java/se/despotify/client/player/SpotifyOggHeader.java |
Yeah, unfortunately streaming sources throw a wrench into many ideas.
If a limit is hit then the file would just be unreadable by Symphonia. The app. developer would need a way to allow the user to bump the limit or disable limits all together. For the most part, there are reasonable limits we could apply to metadata that would probably never be reached but not strain modern systems.
Generally they will just blindly allocate/load the file. The idea of a limit in this case would be an added safety feature of Symphonia beyond the typical Rust memory safety.
Symphonia doesn't require the entire file to be loaded into memory. Generally, it only has a 32 kB (by default, configurable by Looping back to the original problem though. I think the limit system as described is probably the way to go for metadata since it gives users flexibility. But metadata is also the only user of Another thing we should consider is having an option to skip a metadata item that's too large instead of throwing an error. Since metadata needs to be loaded to memory, memory constrained device may want to skip large metadata items rather than make the file unplayable. |
Hey @roderickvd, We should probably move this discussion to either a new Issue or a Discussion thread. Would you like to start one?
I see. I assume you're using range requests now? You could try just using the read length Symphonia issues your Probably the most ideal solution would be to have a background thread download the file into a cache, from which Symphonia can read. But on seek you can immediately move to where Symphonia requests. You'd probably need to track the ranges you have cached, but this would probably be an improvement. Ultimately though, the fastest solution would be to decode the Spotify seek table since instead of many seeks we can only do one.
This is just a guess from looking at the code, but... It looks like the So, based off this I assume that the seek table splits the file up into 100 segments, and stores the byte (or kB) offset to the start of each segment in the table. This is my hypothesis, but perhaps there's more too it. |
|
This avoids OOM issues, at the cost of resizing a bit more.
I originally had a design where streams could say how long they are, but I figured just letting vec resize here is the best solution.
I looked through for any other places where
vec![expr; length]
is called wherelength
is untrusted, and did only see these. If there's any more I'll fix em.Sorry, no test files for this, I kinda did the work a while ago and forgot I did it, so didn't keep any test files.