Optimized decoder for WebAssembly #4
Comments
Hi @thelamer! I don't know all that much about the latter two platforms, but I went into the x86 stuff without knowing anything either, so I can just learn the same way. What I am aware of, however, is that Rust only targets wasm32 at the moment (correct me if I'm wrong). Some of the optimizations (including the single-pixel hash function inspired by rapid-qoi) depend on working within a 64-bit integer, so those would have to be stripped. I also don't know what range of SIMD options WASM offers. I assume they have to be more general to stay platform-independent. Some of the instructions I would need to optimize easily may not be there, but I can definitely try. The potential is potentially there, I think (lol). If you want, I can try WASM after I finish x86 (once I'm done with the base stuff and ssse3, the rest of the instructions won't take very long).
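For readers following along, here is a minimal sketch of what a single-pixel hash packed into a 64-bit integer can look like. The reference hash `(r*3 + g*5 + b*7 + a*11) % 64` comes from the QOI specification; the multiply-based variant, its constants, and the names `hash_ref` and `hash_mul64` are illustrative assumptions and are not taken from the hardqoi or rapid-qoi sources. As far as I know, plain `u64` arithmetic also compiles for wasm32 targets, since WebAssembly has a native `i64` value type, though whether it is faster there would need measuring.

```rust
/// Reference QOI index hash, as defined in the QOI specification.
pub fn hash_ref(px: [u8; 4]) -> u8 {
    let [r, g, b, a] = px.map(u32::from);
    ((r * 3 + g * 5 + b * 7 + a * 11) % 64) as u8
}

/// Hypothetical 64-bit variant: spread the four channels across one u64 and
/// let a single multiplication accumulate all four products in the top byte.
pub fn hash_mul64(px: [u8; 4]) -> u8 {
    // v holds r in byte 0, g in byte 1, b in byte 2, a in byte 3.
    let v = u32::from_le_bytes(px) as u64;
    // Keep r at bit 0, b at bit 16, g at bit 40, a at bit 56.
    let s = ((v << 32) | v) & 0xFF00_FF00_00FF_00FF;
    // The constant places 3, 7, 5 and 11 so that each channel's product
    // lands in bits 56..63; the lower partial sums never carry that far.
    (s.wrapping_mul(0x0300_0700_0005_000B) >> 56) as u8 & 63
}
```

Both functions return the same slot for every pixel, because the top byte holds the weighted sum modulo 256 and 64 divides 256.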
@AstroFloof sorry for the delay, I missed this ping. Yes, SIMD instructions in WebAssembly are very limited. So in this case my focus is on any optimizations, even small ones, that could be made to the decoding reference implementation I used, which translates directly to lower client-side CPU usage and higher FPS. I thought I would reach out to anyone trying to improve on the QOI v1 spec for decoding/encoding, which is a very small list. Right now performance is pretty good on higher-end modern CPUs: fpsdemo.mp4 An easy way to see this first hand would be to run From a development standpoint it would just involve building and swapping out the wasm blob and function names in: If you set up a GitHub Sponsorship on your account I would be happy to toss you some money just for looking into it. I'm interested in whether it is even possible.
Hello again @thelamer! I'm honoured that you'd consider sponsoring this project, and I would love to work on a WASM decoder (and later an encoder). I should warn you not to expect anything, however. I don't know anything about WASM yet (although I knew nothing about x86 before starting the project, so I will learn, of course), and I'm not sure how long it'll take me to make an MVP to start optimizing. Additionally, most if not all of the SIMD-related optimizations are focused on hashing every pixel before beginning the encoding process. I do know of techniques I've used to speed up decoding that I can attempt to use, though. Whatever the final product may be, I'll certainly do my best! I intend to release a new version of the x86-64 encoder/decoder very soon, so I'll start after that.
Hey @thelamer! I'm a friend of Floof that introduced him to QOI and followed his hardqoi developments since
Welp, quick update on this. Did some more investigating and I think QOI on GPU really is a dead-end :P
@AstroFloof so I have been pumping the decode code through different AI language models, and some of them seem to think the hashing is not needed and that it can be performed more efficiently with an array. This is all Greek to me, but does this make any sense to you?
This doesn't make any sort of sense. LLMs just predict the next token and have no idea what they're doing.
Thanks for taking a look. I figured as much; I could not get this to run.
Feel free to simply close out this issue if you are not interested but we just implemented QOI image format for VNC to deliver lossless remote desktops using Rust WASM clientside here:
https://github.com/kasmtech/noVNC/tree/master/core/decoders/qoi
Some docs here:
https://www.kasmweb.com/docs/latest/how_to/lossless.html
I have been wondering for some time now whether SIMD optimizations were even possible on the server side. I tried out the stable branch with ssse3 and saw roughly ±10% in encoding speed vs rapid-qoi, depending on what image you feed it. Offloading the hashing looks promising, especially once the AVX stuff is implemented.
I am specifically reaching out, though, to ask whether you think decoding could be sped up in a web browser. The compiled blob linked earlier in noVNC is a modified version of this implementation:
https://github.com/lukeflima/qoi-viewer
This is all functional, but under high-load scenarios you need a pretty beefy client to maintain FPS at a gigabit. Even a small improvement on the WebAssembly side would have a large impact on the overall smoothness of desktop delivery. Anything we do for desktop delivery is open source, including these changes if possible.
Essentially, I am wondering if you would be interested in some side work to put together a highly optimized open source WASM QOI decoder that takes a Uint8Array as input and returns "ImageData" as a Uint8ClampedArray plus size information. We use 24-bit QOI without the alpha channel.
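To make the requested shape concrete, here is a rough, unoptimized sketch of such a decoder core in Rust, following the chunk layout in the QOI specification. The names `Decoded` and `decode_qoi` are hypothetical, and this is not the code used in noVNC or qoi-viewer. It assumes the JavaScript glue (for example via wasm-bindgen) would pass the Uint8Array in as a byte slice and wrap the returned RGBA bytes in a Uint8ClampedArray; that glue is omitted, and the end marker is not checked.

```rust
/// Decoded image: RGBA bytes in the layout a browser `ImageData` expects.
pub struct Decoded {
    pub width: u32,
    pub height: u32,
    pub rgba: Vec<u8>,
}

/// Slot in the 64-entry index of previously seen pixels (QOI spec hash).
fn slot(px: [u8; 4]) -> usize {
    let [r, g, b, a] = px.map(usize::from);
    (r * 3 + g * 5 + b * 7 + a * 11) % 64
}

/// Returns `None` on a bad magic or truncated input.
pub fn decode_qoi(data: &[u8]) -> Option<Decoded> {
    if data.len() < 14 || !data.starts_with(b"qoif") {
        return None;
    }
    let width = u32::from_be_bytes([data[4], data[5], data[6], data[7]]);
    let height = u32::from_be_bytes([data[8], data[9], data[10], data[11]]);
    let total = (width as usize).checked_mul(height as usize)?.checked_mul(4)?;

    let mut index = [[0u8; 4]; 64];
    let mut px = [0u8, 0, 0, 255]; // start pixel defined by the spec
    let mut rgba = Vec::new();
    let mut chunks = data[14..].iter().copied();

    while rgba.len() < total {
        let b1 = chunks.next()?;
        let mut run = 1usize;
        match b1 {
            0xFE => {
                // QOI_OP_RGB: alpha stays unchanged (always 255 for 24-bit input)
                for c in px.iter_mut().take(3) {
                    *c = chunks.next()?;
                }
            }
            0xFF => {
                // QOI_OP_RGBA
                for c in px.iter_mut() {
                    *c = chunks.next()?;
                }
            }
            _ => match b1 >> 6 {
                0 => px = index[(b1 & 0x3F) as usize], // QOI_OP_INDEX
                1 => {
                    // QOI_OP_DIFF: 2-bit deltas with a bias of 2
                    px[0] = px[0].wrapping_add((b1 >> 4) & 3).wrapping_sub(2);
                    px[1] = px[1].wrapping_add((b1 >> 2) & 3).wrapping_sub(2);
                    px[2] = px[2].wrapping_add(b1 & 3).wrapping_sub(2);
                }
                2 => {
                    // QOI_OP_LUMA: green delta (bias 32), red/blue relative to it (bias 8)
                    let b2 = chunks.next()?;
                    let dg = (b1 & 0x3F).wrapping_sub(32);
                    px[0] = px[0].wrapping_add(dg).wrapping_add(b2 >> 4).wrapping_sub(8);
                    px[1] = px[1].wrapping_add(dg);
                    px[2] = px[2].wrapping_add(dg).wrapping_add(b2 & 0x0F).wrapping_sub(8);
                }
                _ => run = (b1 & 0x3F) as usize + 1, // QOI_OP_RUN: repeat previous pixel
            },
        }
        index[slot(px)] = px;
        for _ in 0..run {
            if rgba.len() < total {
                rgba.extend_from_slice(&px);
            }
        }
    }
    Some(Decoded { width, height, rgba })
}
```

Since the input is 24-bit, one obvious thing to try from here would be a specialized path that skips the alpha handling, but any gain would have to be measured in the browser.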