[AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext/sext #121145

vg0204 · 2024-12-26T08:30:06Z

In both the ISEL under generic combines, various select with constants combine into binary ops with zext/sext operand like

select Cond, C1, C1-1 --> add (zext Cond), (C1-1)
select Cond, Pow2, 0  -->  shl  (zext Cond), log2(Pow2)  
select Cond, C1, C1+1 --> add (sext Cond), (C1+1)

For various architecture, instruction materialization for zext/sext might be cheaper as compared to select, thus making sense for above combine optimization.

But in case of AMDGPU, both the zext/sext & select (for f32 with inline constants) materializes into v_cndmask_b32_e64. Thus the above optimization increases the cost by introducing an additional binary instruction.

If you look from different persepective, as in AMDGPU both the Zext/Sext and Select boils down to same machine instruction canonincally, thus really undoing the folding of binOp into Select. For example :

Select Cond, 7, 6 --> add ( zext Cond ), 6 materializes as :

v_cndmask_b32_e64 v0, 0, 1, vcc
v_add_u32_e32 v0, 6, v0

instead of

v_cndmask_b32_e64 v0, 6, 7, vcc

on which the binOp into Select combine is really missed, as Select is eliminated, but nevertheless (Zext cond) materializes as same as (Select cond 1, 0). So for AMDGPU : add ( zext Cond ), 6 <==> add ( Select 1, 0 ), 6 after the instruction selection is done. This really showcases that zext introduction (via select's combine) really caused the skip of BinOp fold into select, introducing the additional binary instruction.

It is the root cause of SWDEV-505394, as increases the code length.

The text was updated successfully, but these errors were encountered:

llvmbot · 2024-12-26T10:55:35Z

@llvm/issue-subscribers-backend-amdgpu

Author: Vikash Gupta (vg0204)

In both the ISEL under generic combines, various select with constants combine into binary ops with zext/sext operand like

select Cond, C1, C1-1 --&gt; add (zext Cond), (C1-1)
select Cond, Pow2, 0  --&gt;  shl  (zext Cond), log2(Pow2)  
select Cond, C1, C1+1 --&gt; add (sext Cond), (C1+1)

For various architecture, instruction materialization for zext/sext might be cheaper as compared to select, thus making sense for above combine optimization.

But in case of AMDGPU, both the zext/sext & select (for f32 with inline constants) materializes into v_cndmask_b32_e64. Thus the above optimization increases the cost by introducing an additional binary instruction.

If you look from different persepective, as in AMDGPU both the Zext/Sext and Select boils down to same machine instruction canonincally, thus really undoing the folding of binOp into Select. For example :

Select Cond, 7, 6 --> add ( zext Cond ), 6 materializes as :

v_cndmask_b32_e64 v0, 0, 1, vcc
v_add_u32_e32 v0, 6, v0

on which the binOp into Select combine is really missed, as Select is eliminated, but nevertheless (Zext cond)
materializes as same as (Select cond 1, 0). So for AMDGPU : add ( zext Cond ), 6 <==> add ( Select 1, 0 ), 6 after the instruction selection is done. This really showcases that zext introduction (via select's combine) really caused the skip of BinOp fold into select, introducing the additional binary instruction.

It is the root cause of SWDEV-505394, by increasing the code length.

vg0204 · 2024-12-26T11:07:49Z

Suggested Solutions :

Solution-0 : A target-specific combine to fold BinOp into Zext(in form of Select). This should be invoked after the completion of generic combines postDAGLegalization.

BinOp (zext (Cond), const0) -> select Cond, 1<BinOp>const0, 0<BinOp>const0 
BinOp (const0, zext (Cond)) -> select Cond, const0<BinOp>1, const0<BinOp>0

Solution-1 : A target-specific hook which prevents the combine of select with constants which give rise to binOp with zext/sext operand for AMDGPU, really eliminating the problem from source. But, have shaky thoughts on its implementation in a cleaner way without affecting the other target's pipeline.

vg0204 · 2024-12-26T11:14:32Z

CC : @arsenm, @cdevadas , @aemerson , @jayfoad

arsenm · 2024-12-26T12:20:58Z

In globalisel, all combines are explicitly opt-in. This doesn't need a new hook, and is already a generic combine in the DAG. We can just add it to the default sets

vg0204 · 2024-12-26T14:02:46Z

In globalisel, all combines are explicitly opt-in. This doesn't need a new hook, and is already a generic combine in the DAG. We can just add it to the default sets

Can you say specifically, what are you referring to be already in generic combine in the DAG; and can be just added to default set? Do you referring to an already exiting combine OR talking about the one I mentioned in the Solution-1 ?

vg0204 self-assigned this Dec 26, 2024

github-actions bot added the new issue label Dec 26, 2024

vg0204 assigned vg0204 and unassigned vg0204 Dec 26, 2024

vg0204 changed the title ~~[AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext~~ [AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext/sext Dec 26, 2024

vg0204 added backend:AMDGPU llvm:globalisel llvm:SelectionDAG SelectionDAGISel as well and removed new issue labels Dec 26, 2024

vg0204 mentioned this issue Dec 26, 2024

[WIP] [AMDGPU] [GlobalIsel] Combine Fmul with Select into ldexp instruction. #120104

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext/sext #121145

[AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext/sext #121145

vg0204 commented Dec 26, 2024 •

edited

Loading

llvmbot commented Dec 26, 2024

vg0204 commented Dec 26, 2024 •

edited

Loading

vg0204 commented Dec 26, 2024

arsenm commented Dec 26, 2024

vg0204 commented Dec 26, 2024 •

edited

Loading

[AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext/sext #121145

[AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext/sext #121145

Comments

vg0204 commented Dec 26, 2024 • edited Loading

llvmbot commented Dec 26, 2024

vg0204 commented Dec 26, 2024 • edited Loading

vg0204 commented Dec 26, 2024

arsenm commented Dec 26, 2024

vg0204 commented Dec 26, 2024 • edited Loading

vg0204 commented Dec 26, 2024 •

edited

Loading

vg0204 commented Dec 26, 2024 •

edited

Loading

vg0204 commented Dec 26, 2024 •

edited

Loading