Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for presser #138

Merged
merged 15 commits into from
May 10, 2023
Merged

Conversation

fu5ha
Copy link
Contributor

@fu5ha fu5ha commented Oct 17, 2022

For CPU-to-GPU staging & update buffers, one must copy data into them. This currently requires unsafe raw pointer copies or using an unsound byte-slice interface. presser solves this by providing helper functions for safely copying data into raw allocated buffers while validating layout requirements and not creating invalid references to uninitialised memory.

Implementation

This PR adds support for presser in two ways, and for now only for Vulkan.

  1. Allocation implements presser::Slab directly, meaning you can directly pass it into presser's helper functions. However, because not all Allocations are valid Slabs, if you attempt to use a non-mapped or too-large allocation as a Slab, you'll cause a panic.
  2. For the case where you want to handle not being able to convert to a Slab in some way other than panicking, Allocation::try_as_mapped_slab will optionally return a MappedAllocationSlab which pre-checks its conditions and so infallibly implements Slab upon succeeding.

The advantage of implementing Slab on Allocation directly, in addition to the added level of convenience it provides, is that once EmbarkStudios/presser#5 lands and we upgrade to that version here, you'll be able to share an &Allocator and use the read_X helpers from presser from multiple places, rather than requiring &mut Allocator everywhere.

Future path to adoption

  1. Land this PR
  2. Design an equivalent API for the d3d12 Allocation type. I think this likely will look like adding a mapped_ptr: Option<NonNull<c_void>> field much like the vulkan::Allocation has, and then a pub fn map(&mut self, resource: *mut ID3D12Resource) which would fill in that field. But, that's for another PR's discussion...
  3. At this point we could probably do a new gpu-allocator release just with the copy apis.
  4. Land Add read helpers EmbarkStudios/presser#5 and upgrade gpu-allocator to a version 0.4 which includes those changes
  5. Do another gpu-allocator (minor) release which adds #[deprecated] attributes to mapped_slice[_mut], documentation detailing why they've been deprecated, and linking to an upgrade guide in some release notes or something (or just a document in the repo).
  6. Once we've established that presser is actually replacing all use cases, do a gpu-allocator major release which actually removes the #[deprecated] methods and links to the migration guide again in the changelog/release notes.

@fu5ha fu5ha changed the title ignore tmp dir Add support for presser Oct 17, 2022
Copy link
Member

@MarijnS95 MarijnS95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds optional support for presser under the presser feature.

Thanks! Can you describe in the description/commit, in a single line, what presser provides for users and why it should be used in gpu-allocator, without having them click through to the crate?

Depending on how this pans out we might want to land https://github.com/Traverse-Research/gpu-allocator/compare/uninit at some point or remove the currently-UB API altogether.
EDIT: One flaw here is that the API requires mapped memory to be initialized with &[u8], which presser points out to be unsafe when having to transmute some repr(C) struct to such a slice while containing padding. Perhaps we can implement these initialization functions on top of presser too?

Doing this is quite straightforward for Vulkan, but there didn't seem to be an equivalent for the d3d12 Allocation type and I'm not familiar enough with that api to say how that should be implemented

IIRC we have a d3d12::ID3D12Resource::Map() call in our custom implementation. I'm not sure why this isn't provided by gpu-allocator since this is an explicit (vkMapMemory()) operation on Vulkan as well... @manon-traverse any idea, perhaps worth to add in a separate PR?

I could implement it if one told me how to acquire a mapped base pointer for some allocation...

upload_buffer
.as_ref()
.unwrap()
.Map(0, std::ptr::null(), &mut mapped_ptr);

(But in a separate, preliminary PR, if we end up going this route 😉)

src/vulkan/presser.rs Outdated Show resolved Hide resolved
src/vulkan/presser.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
src/vulkan/presser.rs Outdated Show resolved Hide resolved
@fu5ha
Copy link
Contributor Author

fu5ha commented Oct 18, 2022

IIRC we have a d3d12::ID3D12Resource::Map() call in our custom implementation. I'm not sure why this isn't provided by gpu-allocator since this is an explicit (vkMapMemory()) operation on Vulkan as well... @manon-traverse any idea, perhaps worth to add in a separate PR?

Yeah so when looking through the d3d12 visualizer example, it seems like the main issue is that in order to get a mapped pointer you have to first create the actual ID3D12Resource ptr and then Map that, rather than mapping just the memory object that the Allocation knows about.. or something. So not sure exactly how an api there should look.

The Allocation could maybe contain an optional mapped ptr much like the vulkan Allocation does and then add a function like unsafe fn map(&mut self, res: *mut ID3D12Resource) which sets the internal mapped ptr and then allows access to the same as_mapped_slab api I introduced here.

@MarijnS95
Copy link
Member

MarijnS95 commented Oct 18, 2022

Right, gpu-allocator would have to go through CreatePlacedResource() (and expose the returned ID3D12Resource for this) which seemingly isn't as clear-cut to use here: it is apparently allowed to be called multiple times on the same heap memory/offset to overlap resources.

@fu5ha
Copy link
Contributor Author

fu5ha commented Oct 18, 2022

Depending on how this pans out we might want to land https://github.com/Traverse-Research/gpu-allocator/compare/uninit at some point or remove the currently-UB API altogether.

Interesting. I'm sorta unconvinced that abstracting this is much of a benefit because it's so easy to make an initialized buffer uninit again by copying data that has padding bytes into it, as you mention. I'm also not sure exactly how that works in terms of whether data the gpu would write would make padding bytes uninit (I think it won't since it's FFI so compiler has to be conservative) for GPU-to-CPU. I think that case of a readback buffer is probably the most compelling use case for actually tracking the initialization state of the memory as a whole since that would allow actually-safe access as a &[u8] for reads.

And yeah, I think using presser to implement the initialization api would be a great fit, and I'm in favor of eventually removing the mapped_slice_mut api eventually :)

@fu5ha
Copy link
Contributor Author

fu5ha commented Oct 18, 2022

Right, gpu-allocator would have to go through CreatePlacedResource() (and expose the returned ID3D12Resource for this) which seemingly isn't as clear-cut to use here: it is apparently allowed to be called multiple times on the same heap memory/offset to overlap resources.

Not necessarily, it could require the user to do CreatePlacedResource() themselves and then pass the created resource back in to the map api on Allocation

@MarijnS95
Copy link
Member

Not necessarily, it could require the user to do CreatePlacedResource() themselves and then pass the created resource back in to the map api on Allocation

Meant in the theoretical case that we extend gpu-allocator to perform CreatePlacedResource() + Map(), instead of accepting it as an external argument (that would then have to be asserted "somehow" to belong to the allocation), sorry for not making that more clear :)

@fu5ha
Copy link
Contributor Author

fu5ha commented Oct 18, 2022

Ah right, makes sense :)

@MarijnS95
Copy link
Member

I'm sorta unconvinced that abstracting this is much of a benefit

Agreed. I'm inclined to just make fn mapped_slice unsafe (since it really, really is!), slap a /// # Safety\n/// Only to be called when the memory is known to be _fully_ initialized and leave out this whole MaybeUninit mess simply because It's Complicated™ and so easy to invalidate in the various ways you pointed out.

I think that case of a readback buffer is probably the most compelling use case for actually tracking the initialization state of the memory as a whole since that would allow actually-safe access as a &[u8] for reads.

This is especially tricky since yes, you typically want to consume the data in some &[T] way, and have to assume the GPU has written exactly that T to it and is synchronized with the CPU timeline (and isn't mutably aliased, isn't being written to by "another thing" on the GPU timeline, ...).

MarijnS95 added a commit that referenced this pull request Oct 18, 2022
As discussed long ago, and recently in #138, it is undefined behaviour
to create or transmute to `&[u8]` when the underlying data is possibly
uninit.  This also holds true for transmuting arbitrary `T: Copy`
structures to `&[u8]` where eventual padding bytes are considered
uninitialized, hence invalid for `u8`.

Instead of coming up with a massive safety API that distinguishes
between uninitialized and initialized buffers - which turn out to be
really easy to invalidate by copying structures with padding bytes -
place the onus on the user to keep track of initialization status by
only ever providing mapped slices in an `unsafe` context.  Users are
expected to initialize the buffer using `ptr::copy(_nonoverlapping)()`
when used from a CPU context instead of calling `.mapped_mut_slice()`,
or switch to the new [presser] API from #138.

[presser]: https://crates.io/crates/presser
@fu5ha
Copy link
Contributor Author

fu5ha commented Oct 18, 2022

Agreed. I'm inclined to just make fn mapped_slice unsafe (since it really, really is!), slap a /// # Safety\n/// Only to be called when the memory is known to be _fully_ initialized and leave out this whole MaybeUninit mess simply because It's Complicated™ and so easy to invalidate in the various ways you pointed out.

snip moved to #139 discussion.

Copy link
Member

@Jasper-Bekkers Jasper-Bekkers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very glad with this contribution since it's alwasy bothered me a little bit that we're effectively having to unsafe transmute our data to get it into gpu-allocator, however I have a few remarks that I'd like to share;

  1. I think this is important enough to require feature parity with Dx12, it doesn't have to appear in this PR, but I will block the release of a new gpu-allocator version until we've landed Dx12 support as well.
  2. I would like us to come up with a plan so we can start deprecating the old way of doing things and slowly move towards having presser be the default API here, so I'm not sure if we should hide it behind a feature toggle in the first place.

.gitignore Outdated Show resolved Hide resolved
@fu5ha
Copy link
Contributor Author

fu5ha commented Oct 19, 2022

Sounds good to me @Jasper-Bekkers! I'll clean this up a bit more based on that plan and can start experimenting with a dx12 api soon.

@fu5ha
Copy link
Contributor Author

fu5ha commented Oct 19, 2022

Addressed Jasper's feedback and updated the original PR description with the new direction :)

@fu5ha fu5ha requested review from Jasper-Bekkers and MarijnS95 and removed request for Jasper-Bekkers and MarijnS95 October 19, 2022 15:44
@fu5ha
Copy link
Contributor Author

fu5ha commented Oct 19, 2022

... github is being dumb and not letting me re-request from both of you at the same time I guess lol.

///
/// This type should be acquired by calling [`Allocation::try_as_mapped_slab`].
pub struct MappedAllocationSlab<'a> {
_borrowed_alloc: &'a mut Allocation,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't actually use _borrowed_alloc then we can hold a PhantomData<&'a mut Allocation>. It will give us the same ownership semantics without actually using any memory.

MarijnS95 added a commit that referenced this pull request Nov 30, 2022
As discussed long ago, and recently in #138, it is undefined behaviour
to create or transmute to `&[u8]` when the underlying data is possibly
uninit.  This also holds true for transmuting arbitrary `T: Copy`
structures to `&[u8]` where eventual padding bytes are considered
uninitialized, hence invalid for `u8`.

Instead of coming up with a massive safety API that distinguishes
between uninitialized and initialized buffers - which turn out to be
really easy to invalidate by copying structures with padding bytes -
place the onus on the user to keep track of initialization status by
only ever providing mapped slices in an `unsafe` context.  Users are
expected to initialize the buffer using `ptr::copy(_nonoverlapping)()`
when used from a CPU context instead of calling `.mapped_mut_slice()`,
or switch to the new [presser] API from #138.

[presser]: https://crates.io/crates/presser
@Jasper-Bekkers Jasper-Bekkers merged commit 006ade8 into Traverse-Research:main May 10, 2023
@MarijnS95
Copy link
Member

NonNull::slice_from_raw_parts() just stabilized which we might also use to represent a valid range of bytes on a raw pointer, without the UB of mapping possibly uninitialized data as safe slice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants