
Selective merged prefill #643

Open: wants to merge 22 commits from selective_merged_prefill into mlperf_features
Conversation

xuechendi
No description provided.

@xuechendi xuechendi marked this pull request as draft December 18, 2024 01:23
@xuechendi xuechendi marked this pull request as ready for review December 19, 2024 00:06
@xuechendi xuechendi changed the title [Draft]Selective merged prefill Selective merged prefill Dec 19, 2024
Signed-off-by: Chendi Xue <[email protected]> (7 commits)
@xuechendi xuechendi force-pushed the selective_merged_prefill branch from 6ddfcac to f6c0c84 Compare December 20, 2024 04:40
Signed-off-by: Chendi.Xue <[email protected]>
Signed-off-by: Chendi Xue <[email protected]>
@xuechendi xuechendi force-pushed the selective_merged_prefill branch from b6f6961 to 2d6ceb9 Compare December 20, 2024 23:09
key_cache, value_cache = HPUPagedAttention.split_kv_cache(
kv_cache, self.num_kv_heads, self.head_size)

key_cache = self.k_cache(padded_key_tensor, key_cache, block_indices,
                         block_offsets)


It seems that when decoding, padded_key_tensor is not defined. Would this be a problem?

xuechendi (Author)


I think you're right, but it didn't trigger any error; I'll look into it.

xuechendi (Author)


@yangw1234, after checking the code: since enable_merged_prefill is only enabled in prefill_fwd, I'll clean up the code to make it more readable.
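The fix the author describes amounts to making sure the padded tensor is only built and referenced on the merged-prefill path, so the decode path never touches an undefined name. A minimal sketch of that gating, with the cache writer and padding function injected so the sketch stays self-contained (the control flow and helper names here are assumptions for illustration, not the PR's actual code):

```python
# Hypothetical sketch of gating the padded-cache write on merged prefill.
# k_cache_fn and pad_fn are injected stand-ins for the PR's self.k_cache
# and padding helper; the branch structure is the point being illustrated.
def write_key_cache(key, key_cache, block_indices,
                    is_prompt, enable_merged_prefill,
                    k_cache_fn, pad_fn):
    """Build padded_key_tensor only on the merged-prefill path; the decode
    path writes the unpadded key directly, so the name is never undefined."""
    if is_prompt and enable_merged_prefill:
        padded_key_tensor = pad_fn(key)  # exists only during prefill
        return k_cache_fn(padded_key_tensor, key_cache, block_indices)
    return k_cache_fn(key, key_cache, block_indices)

# toy usage: the "cache write" just records which tensor was handed to it
calls = []
k_cache_fn = lambda t, cache, idx: calls.append(t) or cache
pad_fn = lambda k: "padded_" + k
write_key_cache("key", "cache", [0], True, True, k_cache_fn, pad_fn)   # prefill
write_key_cache("key", "cache", [0], False, True, k_cache_fn, pad_fn)  # decode
print(calls)  # ['padded_key', 'key']
```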

Comment on lines 279 to 286
max_len = attn_metadata.slot_mapping.size(1)
seq_lens_tensor_list = attn_metadata.seq_lens_tensor.tolist()
# we need to copy the key and value tensors to the padded tensors
# shape is [batch_size, entire_seq_len, num_kv_heads, head_size]
padded_key_tensor = split_and_pad_to_length(key, max_len, seq_lens_tensor_list)
padded_value_tensor = split_and_pad_to_length(value, max_len, seq_lens_tensor_list)
padded_key_tensor = padded_key_tensor.flatten(0, 1).unflatten(0, (block_indices.size(0), -1))
padded_value_tensor = padded_value_tensor.flatten(0, 1).unflatten(0, (block_indices.size(0), -1))
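split_and_pad_to_length is internal to this PR, but its semantics are implied by the comment above: split a packed [sum(seq_lens), num_kv_heads, head_size] tensor into per-sequence chunks and zero-pad each to max_len. A minimal sketch under that assumption (the function body here is a reconstruction, not the PR's implementation):

```python
import torch

def split_and_pad_to_length(packed, max_len, seq_lens):
    """Split a packed [sum(seq_lens), ...] tensor into per-sequence chunks
    and zero-pad each chunk to max_len along the token dimension.
    Returns a tensor of shape [len(seq_lens), max_len, ...]."""
    chunks = torch.split(packed, seq_lens, dim=0)
    padded = packed.new_zeros((len(seq_lens), max_len) + packed.shape[1:])
    for i, chunk in enumerate(chunks):
        padded[i, : chunk.size(0)] = chunk
    return padded

# packed keys for two sequences of length 3 and 2, with 4 kv heads, head size 8
key = torch.randn(5, 4, 8)
padded = split_and_pad_to_length(key, max_len=4, seq_lens=[3, 2])
print(padded.shape)  # torch.Size([2, 4, 4, 8])
```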
@yangw1234 yangw1234 Jan 6, 2025


Is it possible to get rid of these lines if we prepare block_indices and block_offsets in a way that excludes the padded tokens?
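The suggestion is that if the slot mapping were built only for real tokens, the packed key/value tensors could be written to the cache directly, with no padded intermediate. A hedged sketch of that idea, assuming a layout where each sequence owns a contiguous run of blocks (the layout and function name are hypothetical, for illustration only):

```python
# Hypothetical sketch of the reviewer's suggestion: build block_indices /
# block_offsets only for the real (unpadded) tokens, so no padded copy of
# the key/value tensors is needed. The contiguous per-sequence block layout
# is an assumption made for this example.
def unpadded_slot_mapping(seq_lens, max_len, block_size):
    blocks_per_seq = max_len // block_size
    block_indices, block_offsets = [], []
    for seq_id, n_tokens in enumerate(seq_lens):
        for tok in range(n_tokens):  # real tokens only; padding is skipped
            block_indices.append(seq_id * blocks_per_seq + tok // block_size)
            block_offsets.append(tok % block_size)
    return block_indices, block_offsets

# two sequences of length 3 and 2, padded length 4, block size 2
idx, off = unpadded_slot_mapping(seq_lens=[3, 2], max_len=4, block_size=2)
print(idx)  # [0, 0, 1, 2, 2]
print(off)  # [0, 1, 0, 0, 1]
```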

@xuechendi xuechendi force-pushed the selective_merged_prefill branch from d18df2d to 92bf903 Compare January 7, 2025 22:32
Signed-off-by: Chendi Xue <[email protected]>
Signed-off-by: Chendi.Xue <[email protected]>
@xuechendi xuechendi force-pushed the selective_merged_prefill branch 2 times, most recently from b7d0931 to ce48860 Compare January 8, 2025 01:49
@xuechendi xuechendi force-pushed the selective_merged_prefill branch from ce48860 to a3602f2 Compare January 8, 2025 01:55
Signed-off-by: Chendi Xue <[email protected]> (3 commits)
Signed-off-by: Chendi.Xue <[email protected]> (2 commits)
2 participants