Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext/sext #121145

Open
vg0204 opened this issue Dec 26, 2024 · 5 comments
Assignees

Comments

@vg0204
Copy link
Contributor

vg0204 commented Dec 26, 2024

In both the ISEL under generic combines, various select with constants combine into binary ops with zext/sext operand like

select Cond, C1, C1-1 --> add (zext Cond), (C1-1)
select Cond, Pow2, 0  -->  shl  (zext Cond), log2(Pow2)  
select Cond, C1, C1+1 --> add (sext Cond), (C1+1)

For various architecture, instruction materialization for zext/sext might be cheaper as compared to select, thus making sense for above combine optimization.

But in case of AMDGPU, both the zext/sext & select (for f32 with inline constants) materializes into v_cndmask_b32_e64. Thus the above optimization increases the cost by introducing an additional binary instruction.

If you look from different persepective, as in AMDGPU both the Zext/Sext and Select boils down to same machine instruction canonincally, thus really undoing the folding of binOp into Select. For example :

Select Cond, 7, 6 --> add ( zext Cond ), 6 materializes as :

v_cndmask_b32_e64 v0, 0, 1, vcc
v_add_u32_e32 v0, 6, v0

instead of

v_cndmask_b32_e64 v0, 6, 7, vcc

on which the binOp into Select combine is really missed, as Select is eliminated, but nevertheless (Zext cond) materializes as same as (Select cond 1, 0). So for AMDGPU : add ( zext Cond ), 6 <==> add ( Select 1, 0 ), 6 after the instruction selection is done. This really showcases that zext introduction (via select's combine) really caused the skip of BinOp fold into select, introducing the additional binary instruction.

It is the root cause of SWDEV-505394, as increases the code length.

@vg0204 vg0204 self-assigned this Dec 26, 2024
@vg0204 vg0204 assigned vg0204 and unassigned vg0204 Dec 26, 2024
@vg0204 vg0204 changed the title [AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext [AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext/sext Dec 26, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 26, 2024

@llvm/issue-subscribers-backend-amdgpu

Author: Vikash Gupta (vg0204)

In both the ISEL under generic combines, various select with constants combine into binary ops with zext/sext operand like
select Cond, C1, C1-1 --&gt; add (zext Cond), (C1-1)
select Cond, Pow2, 0  --&gt;  shl  (zext Cond), log2(Pow2)  
select Cond, C1, C1+1 --&gt; add (sext Cond), (C1+1)

For various architecture, instruction materialization for zext/sext might be cheaper as compared to select, thus making sense for above combine optimization.

But in case of AMDGPU, both the zext/sext & select (for f32 with inline constants) materializes into v_cndmask_b32_e64. Thus the above optimization increases the cost by introducing an additional binary instruction.

If you look from different persepective, as in AMDGPU both the Zext/Sext and Select boils down to same machine instruction canonincally, thus really undoing the folding of binOp into Select. For example :

Select Cond, 7, 6 --&gt; add ( zext Cond ), 6 materializes as :

v_cndmask_b32_e64 v0, 0, 1, vcc
v_add_u32_e32 v0, 6, v0

on which the binOp into Select combine is really missed, as Select is eliminated, but nevertheless (Zext cond)
materializes as same as (Select cond 1, 0). So for AMDGPU : add ( zext Cond ), 6 &lt;==&gt; add ( Select 1, 0 ), 6 after the instruction selection is done. This really showcases that zext introduction (via select's combine) really caused the skip of BinOp fold into select, introducing the additional binary instruction.

It is the root cause of SWDEV-505394, by increasing the code length.

@vg0204
Copy link
Contributor Author

vg0204 commented Dec 26, 2024

Suggested Solutions :

Solution-0 : A target-specific combine to fold BinOp into Zext(in form of Select). This should be invoked after the completion of generic combines postDAGLegalization.

BinOp (zext (Cond), const0) -> select Cond, 1<BinOp>const0, 0<BinOp>const0 
BinOp (const0, zext (Cond)) -> select Cond, const0<BinOp>1, const0<BinOp>0

Solution-1 : A target-specific hook which prevents the combine of select with constants which give rise to binOp with zext/sext operand for AMDGPU, really eliminating the problem from source. But, have shaky thoughts on its implementation in a cleaner way without affecting the other target's pipeline.

@vg0204
Copy link
Contributor Author

vg0204 commented Dec 26, 2024

CC : @arsenm, @cdevadas , @aemerson , @jayfoad

@arsenm
Copy link
Contributor

arsenm commented Dec 26, 2024

In globalisel, all combines are explicitly opt-in. This doesn't need a new hook, and is already a generic combine in the DAG. We can just add it to the default sets

@vg0204
Copy link
Contributor Author

vg0204 commented Dec 26, 2024

In globalisel, all combines are explicitly opt-in. This doesn't need a new hook, and is already a generic combine in the DAG. We can just add it to the default sets

Can you say specifically, what are you referring to be already in generic combine in the DAG; and can be just added to default set? Do you referring to an already exiting combine OR talking about the one I mentioned in the Solution-1 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants