Selective merged prefill #643
base: mlperf_features
Conversation
Force-pushed da935cf to d6bdc90
Force-pushed 6ddfcac to f6c0c84
Force-pushed b6f6961 to 2d6ceb9
vllm/attention/backends/hpu_attn.py
Outdated
key_cache, value_cache = HPUPagedAttention.split_kv_cache(
    kv_cache, self.num_kv_heads, self.head_size)

key_cache = self.k_cache(padded_key_tensor, key_cache, block_indices,
It seems that when decoding, padded_key_tensor is not defined. Would this be a problem?
I think you're right, but it didn't trigger any error. I'll look into it.
@yangw1234, after checking the code: since enable_merged_prefill is only enabled in prefill_fwd, this path is never reached during decode. I'll clean up the code to make it more readable.
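To make the resolution concrete, here is a minimal, self-contained sketch of the control flow described above. The names here (is_prompt, enable_merged_prefill, the write_kv helper) are illustrative assumptions, not the actual hpu_attn.py implementation: the point is only that padded_key_tensor exists solely on the merged-prefill prompt path, so decode never references an undefined name.

import torch
import torch.nn.functional as F

def write_kv(key: torch.Tensor, is_prompt: bool,
             enable_merged_prefill: bool) -> torch.Tensor:
    # Illustrative only: padded_key_tensor is created and consumed
    # entirely inside the merged-prefill branch, so the decode path
    # below can never hit an undefined name.
    if is_prompt and enable_merged_prefill:
        # merged prefill: right-pad the sequence dim before the cache write
        padded_key_tensor = F.pad(key, (0, 0, 0, 2))
        return padded_key_tensor
    # decode: write the unpadded keys directly
    return key

print(write_kv(torch.ones(3, 4), True, True).shape)    # torch.Size([5, 4])
print(write_kv(torch.ones(3, 4), False, True).shape)   # torch.Size([3, 4])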
vllm/attention/backends/hpu_attn.py
Outdated
max_len = attn_metadata.slot_mapping.size(1)
seq_lens_tensor_list = attn_metadata.seq_lens_tensor.tolist()
# we need to copy the key and value tensors to the padded tensors
# shape is [batch_size, entire_seq_len, num_kv_heads, head_size]
padded_key_tensor = split_and_pad_to_length(key, max_len, seq_lens_tensor_list)
padded_value_tensor = split_and_pad_to_length(value, max_len, seq_lens_tensor_list)
padded_key_tensor = padded_key_tensor.flatten(0, 1).unflatten(0, (block_indices.size(0), -1))
padded_value_tensor = padded_value_tensor.flatten(0, 1).unflatten(0, (block_indices.size(0), -1))
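For readers without the full diff, here is a plausible reconstruction of split_and_pad_to_length, inferred from the shape comment above; the actual helper in the PR may differ in its details:

import torch

def split_and_pad_to_length(x: torch.Tensor, max_len: int,
                            seq_lens: list) -> torch.Tensor:
    # Split a packed [total_tokens, num_kv_heads, head_size] tensor into
    # per-sequence chunks and right-pad each to max_len, producing
    # [batch_size, max_len, num_kv_heads, head_size].
    chunks = torch.split(x, seq_lens, dim=0)
    padded = x.new_zeros((len(seq_lens), max_len) + x.shape[1:])
    for i, chunk in enumerate(chunks):
        padded[i, :chunk.size(0)] = chunk
    return padded

# two sequences of 3 and 5 tokens, 2 kv heads, head size 4
packed = torch.randn(8, 2, 4)
print(split_and_pad_to_length(packed, 6, [3, 5]).shape)  # torch.Size([2, 6, 2, 4])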
Is it possible to get rid of these lines if we prepare block_indices and block_offsets in a way that excludes the padded tokens?
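A rough sketch of that suggestion (all names and shapes here are assumed for illustration, not taken from the PR): derive the block indices and offsets through a padding mask, so only real tokens are scattered into the cache and the padded key/value copies above become unnecessary.

import torch

def unpadded_block_indices(slot_mapping: torch.Tensor,
                           seq_lens: torch.Tensor, block_size: int):
    # Keep only the cache slots of real tokens (mask out the right
    # padding), so the packed key/value tensors could be written into
    # the cache directly, without building padded copies first.
    max_len = slot_mapping.size(1)
    mask = torch.arange(max_len)[None, :] < seq_lens[:, None]  # [B, max_len]
    valid_slots = slot_mapping[mask]  # [total_real_tokens]
    return valid_slots // block_size, valid_slots % block_size

slot_mapping = torch.tensor([[0, 1, 2, -1], [8, 9, 10, 11]])
idx, off = unpadded_block_indices(slot_mapping, torch.tensor([3, 4]), 4)
print(idx.tolist(), off.tolist())  # [0, 0, 0, 2, 2, 2, 2] [0, 1, 2, 0, 1, 2, 3]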
Force-pushed d18df2d to 92bf903
Force-pushed b7d0931 to ce48860
Force-pushed ce48860 to a3602f2