Currently, the only way to write a large amount of data into volatile memory is to iterate over individual VolAddress values and write a single T to memory one after another.
I haven't inspected the generated code myself, but it seems reasonable to assume this results in slower-than-optimal code (it seems fast, so maybe the compiler actually is optimizing it!). I've read your amazing writeup on how to hand-optimize load/store ASM code, and it seems plausible that write_slice and read_slice methods could make use of better intrinsics such as volatile_copy_memory and volatile_set_memory (a pointer cast might just work??).
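For reference, here is a minimal sketch of the element-by-element approach described above, written against raw pointers with core::ptr::write_volatile and core::ptr::read_volatile rather than the crate's own types. The names write_slice_volatile and read_slice_volatile are hypothetical and are not the crate's API; this only illustrates the stable, unoptimized baseline.

```rust
use core::ptr;

/// Hypothetical helper (not part of any crate): writes `src` to `dst`
/// with one volatile store per element.
///
/// # Safety
/// `dst` must be valid for writes of `src.len()` properly aligned `T`s.
pub unsafe fn write_slice_volatile<T: Copy>(dst: *mut T, src: &[T]) {
    for (i, &value) in src.iter().enumerate() {
        // Each element gets its own volatile store, so the compiler
        // may not elide or merge these accesses.
        unsafe { ptr::write_volatile(dst.add(i), value) };
    }
}

/// Hypothetical helper (not part of any crate): fills `dst` from `src`
/// with one volatile load per element.
///
/// # Safety
/// `src` must be valid for reads of `dst.len()` properly aligned `T`s.
pub unsafe fn read_slice_volatile<T: Copy>(src: *const T, dst: &mut [T]) {
    for (i, slot) in dst.iter_mut().enumerate() {
        // One volatile load per element, stored into ordinary memory.
        *slot = unsafe { ptr::read_volatile(src.add(i)) };
    }
}
```

Because every element is a separate volatile access, the compiler is not allowed to fuse them into a single block copy, which is where the suspicion about suboptimal code comes from.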
The problem with the volatile_copy_memory and volatile_set_memory intrinsics is that they're still unstable, and likely to stay unstable for a long time. That's why I made the design as simple as possible, to kinda work around those intrinsics being missing.
I think some of this sort of thing is in the VolSlice type, which is behind a feature gate right now, but it probably would be fine to stabilize into the main library. And the VolBlock type will deref into the VolSlice type, once things are ready.
Oh, I see now. I completely missed the VolRegion struct. That looks exactly like what I'm describing in this issue. Amazing! I guess it's still possible to feature-gate the optimized version and keep the sequential read/write in the stable implementation.
Looking at the volatile crate, I see that's exactly what they came up with.
Though I wonder whether casting to a VolAddress<[T; N]> and then writing/reading the whole array to/from memory compiles down into volatile_copy_memory/volatile_set_memory. That would be an alright stable alternative (although it requires specifying the exact length).
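To make the array idea concrete, here is a rough sketch using raw pointers and core::ptr instead of VolAddress. The helper names are hypothetical, and whether this lowers to a block copy or to per-element accesses is exactly the open question; I haven't verified the generated code, and the compiler is free to split a volatile access to a large type into several smaller ones.

```rust
use core::ptr;

/// Hypothetical helper: writes a whole `[T; N]` with a single
/// `write_volatile` call by viewing the destination as an array.
///
/// # Safety
/// `dst` must be valid for writes of `N` properly aligned `T`s.
pub unsafe fn write_array_volatile<T: Copy, const N: usize>(dst: *mut T, src: [T; N]) {
    // `[T; N]` has the same alignment as `T`, so the cast does not
    // introduce a stricter alignment requirement.
    unsafe { ptr::write_volatile(dst.cast::<[T; N]>(), src) };
}

/// Hypothetical helper: reads a whole `[T; N]` with a single
/// `read_volatile` call.
///
/// # Safety
/// `src` must be valid for reads of `N` properly aligned `T`s.
pub unsafe fn read_array_volatile<T: Copy, const N: usize>(src: *const T) -> [T; N] {
    unsafe { ptr::read_volatile(src.cast::<[T; N]>()) }
}
```

The length N has to be known at compile time, which is the limitation mentioned above; a slice of runtime length would still need the per-element loop.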
I've already written such methods (without optimizations, just the API), and I'd be happy to contribute them back if this is wanted.