author Sebastian Neubauer <sebastian.neubauer@amd.com> 2021-04-12 10:25:54 +0200
committer Sebastian Neubauer <sebastian.neubauer@amd.com> 2021-04-12 11:01:38 +0200
commit f9a8c6a0e50540f68e6740a849a7caf5e4d46ca6
tree ddd01dac8b35608ae506a18434ec16074d3cf46e /llvm/test/CodeGen/AMDGPU/spill-m0.ll
parent [OpenCL] Accept .rgba in OpenCL 3.0
[AMDGPU] Save VGPR of whole wave when spilling
Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot
determine if a VGPR is used in other lanes or not, so we need to save
all lanes of the VGPR. We even need to save the VGPR if it is marked as
dead.

The generated code depends on two things:
- Can we scavenge an SGPR to save EXEC?
- And can we scavenge a VGPR?

If we can scavenge an SGPR, we
- save EXEC into the SGPR
- set the needed lane mask
- save the temporary VGPR
- write the spilled SGPR into VGPR lanes
- save the VGPR again to the target stack slot
- restore the VGPR
- restore EXEC

If we were not able to scavenge an SGPR, we do the same operations, but
every time the temporary VGPR is written to memory, we
- write VGPR to memory
- flip exec (s_not exec, exec)
- write VGPR again (previously inactive lanes)

Surprisingly often, we are able to scavenge an SGPR, even though we are
at the brink of running out of SGPRs.

Scavenging a VGPR does not have a great effect (saves three instructions
if no SGPR was scavenged), but we need to know whether the VGPR we use
is live beforehand; otherwise the machine verifier complains.

Differential Revision: https://reviews.llvm.org/D96336
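
The two code paths described in the commit message can be sketched as a small Python model. This is only an illustration of the step ordering, not the LLVM implementation: the names `STEPS_WITH_SGPR` and `spill_steps` are hypothetical, and the step strings are the message's own descriptions rather than emitted instructions. The message does not say how the EXEC save and restore steps change when no SGPR is available, so the sketch leaves them in the list unchanged and only expands the memory writes.

```python
# Steps listed in the commit message for the case where an SGPR was scavenged.
# Each entry is (description, writes_vgpr_to_memory).
STEPS_WITH_SGPR = [
    ("save EXEC into the SGPR", False),
    ("set the needed lane mask", False),
    ("save the temporary VGPR", True),
    ("write the spilled SGPR into VGPR lanes", False),
    ("save the VGPR again to the target stack slot", True),
    ("restore the VGPR", False),
    ("restore EXEC", False),
]


def spill_steps(sgpr_scavenged):
    """Return the ordered step descriptions for one SGPR spill (hypothetical helper)."""
    steps = []
    for description, writes_memory in STEPS_WITH_SGPR:
        if writes_memory and not sgpr_scavenged:
            # Without a scavenged SGPR, every write of the temporary VGPR to
            # memory becomes: write, flip exec, write the other lanes.
            steps.append(f"{description}: write VGPR to memory")
            steps.append("flip exec (s_not exec, exec)")
            steps.append(f"{description}: write VGPR again (previously inactive lanes)")
        else:
            # Assumption: all other steps are kept as-is, because the message
            # only says "the same operations" for the no-SGPR case.
            steps.append(description)
    return steps


if __name__ == "__main__":
    for scavenged in (True, False):
        print(f"SGPR scavenged: {scavenged}")
        for step in spill_steps(scavenged):
            print(f"  - {step}")
```

In this model the no-SGPR path is longer only because each of the two VGPR stores turns into three steps.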
Diffstat (limited to 'llvm/test/CodeGen/AMDGPU/spill-m0.ll')
-rw-r--r-- llvm/test/CodeGen/AMDGPU/spill-m0.ll 6
1 file changed, 3 insertions, 3 deletions
diff --git a/llvm/test/CodeGen/AMDGPU/spill-m0.ll b/llvm/test/CodeGen/AMDGPU/spill-m0.ll
index 474461d2ae12..91d3f8c98c8d 100644
--- a/llvm/test/CodeGen/AMDGPU/spill-m0.ll
+++ b/llvm/test/CodeGen/AMDGPU/spill-m0.ll
@@ -14,11 +14,11 @@
; TOVGPR: v_writelane_b32 [[SPILL_VREG:v[0-9]+]], [[M0_COPY]], [[M0_LANE:[0-9]+]]
+; TOVMEM: s_mov_b64 [[COPY_EXEC:s\[[0-9]+:[0-9]+\]]], exec
+; TOVMEM: s_mov_b64 exec, 1
; TOVMEM: v_writelane_b32 [[SPILL_VREG:v[0-9]+]], [[M0_COPY]], 0
-; TOVMEM: s_mov_b32 [[COPY_EXEC_LO:s[0-9]+]], exec_lo
-; TOVMEM: s_mov_b32 exec_lo, 1
; TOVMEM: buffer_store_dword [[SPILL_VREG]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4 ; 4-byte Folded Spill
-; TOVMEM: s_mov_b32 exec_lo, [[COPY_EXEC_LO]]
+; TOVMEM: s_mov_b64 exec, [[COPY_EXEC]]
; GCN: s_cbranch_scc1 [[ENDIF:BB[0-9]+_[0-9]+]]