Commit message | Author | Age | Files | Lines
* [mlir][StandardToSPIRV] Add support for lowering std.xor on bool to SPIR-V (main) | Hanhan Wang | 2021-04-20 | 2 | -1/+32
| std.xor ops on bool are lowered to spv.LogicalNotEqual. For Boolean values, xor and not-equal are the same thing.
|
| Reviewed By: antiagainst
|
| Differential Revision: https://reviews.llvm.org/D100817
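The equivalence this message relies on can be checked exhaustively. The following is a minimal Python sketch rather than anything from the patch; the function names are illustrative only.

```python
from itertools import product

def logical_xor(a: bool, b: bool) -> bool:
    """Exclusive or on two Boolean values."""
    return a ^ b

def logical_not_equal(a: bool, b: bool) -> bool:
    """Stand-in for what a logical not-equal computes on scalar Booleans."""
    return a != b

# Evaluate both functions on all four input combinations.
truth_table = [
    (a, b, logical_xor(a, b), logical_not_equal(a, b))
    for a, b in product([False, True], repeat=2)
]
```

Every row of `truth_table` has identical third and fourth entries, which is why a single not-equal operation can stand in for xor on Booleans.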
* [gn build] reformat all gn files | Nico Weber | 2021-04-20 | 4 | -5/+8
| $ git ls-files '*.gn' '*.gni' | xargs llvm/utils/gn/gn.py format
|
| (and manually wrap two comments)
* [AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions | Bradley Smith | 2021-04-20 | 9 | -7/+1452
| Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types, and lower them to the appropriate SVE instructions.
|
| Additionally, now that the MULH nodes are legal, integer divides can be expanded into a more performant code sequence.
|
| Differential Revision: https://reviews.llvm.org/D100487
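As background on why a legal multiply-high helps integer division: division by a constant can be rewritten as a multiply-high followed by a shift. The sketch below shows the well-known 32-bit unsigned divide-by-3 case in Python. It is a scalar illustration of the general technique, not the code sequence the patch emits, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def mulhu32(a: int, b: int) -> int:
    """High 32 bits of the unsigned 64-bit product of two 32-bit values."""
    return ((a & MASK32) * (b & MASK32)) >> 32

def udiv3(x: int) -> int:
    """Unsigned 32-bit divide by 3 without a divide instruction."""
    # 0xAAAAAAAB == ceil(2**33 / 3), so (x * magic) >> 33 == x // 3
    # for every 32-bit unsigned x. The multiply-high supplies the first
    # 32 bits of that shift; the remaining shift is by 1.
    return mulhu32(x, 0xAAAAAAAB) >> 1
```

For example, `udiv3(10)` evaluates to `3` and agrees with `10 // 3`.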
* Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." | Alexey Bataev | 2021-04-20 | 3 | -168/+128
| This reverts commit b232771acad6225574a2eaf9f860a0fed7ef0804 to fix buildbots.
* [ARM] Create VMOVRRD from adjacent vector extracts | David Green | 2021-04-20 | 83 | -8619/+7329
| This adds a combine for extract(x, n); extract(x, n+1) -> VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the same time in a single instruction, and thanks to the other VMOVRRD folds we have added recently can help reduce the number of executed instructions. Floating point types are very similar, but will include a bitcast to an integer type.
|
| This also adds a shouldRewriteCopySrc, to prevent copy propagation from DPR to SPR, which can break as not all DPR regs can be extracted from directly. Otherwise the machine verifier is unhappy.
|
| Differential Revision: https://reviews.llvm.org/D100244
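The data movement behind this combine can be modelled in a few lines. The Python sketch below assumes a 128-bit vector of four 32-bit lanes in little-endian layout and an even lane index n; the helper names are hypothetical and the code only models values, not instructions.

```python
import struct

def extract_i32(vec: bytes, n: int) -> int:
    """Read 32-bit lane n of a little-endian vector."""
    return struct.unpack_from("<I", vec, 4 * n)[0]

def move_pair_from_i64_lane(vec: bytes, lane64: int) -> tuple:
    """Model one two-register move: read a 64-bit lane and split it in halves."""
    (wide,) = struct.unpack_from("<Q", vec, 8 * lane64)
    return wide & 0xFFFFFFFF, wide >> 32

vec = struct.pack("<4I", 0x11111111, 0x22222222, 0x33333333, 0x44444444)
n = 2  # assumed even, so lanes n and n+1 share one 64-bit lane
separate = (extract_i32(vec, n), extract_i32(vec, n + 1))
combined = move_pair_from_i64_lane(vec, n // 2)
```

Both `separate` and `combined` hold the same pair of values, which is the property that lets two extracts be replaced by one move of 64-bit lane n/2.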
* [flang][driver] Refactor methods for parsing options (nfc) | Andrzej Warzynski | 2021-04-20 | 2 | -18/+25
| This is just a small update that makes sure that errors arising from parsing command-line options are captured more visibly. Also, all parsing methods will now consistently return either a bool ("may fail") or void ("never fails").
|
| An instance of `InputKind` coming from `-x` is added to `FrontendOptions` rather than being returned from `ParseFrontendArgs`. It's currently not used, but we will require it shortly. In particular, once code-generation is available we will use it to differentiate between LLVM IR and Fortran input. `FrontendOptions` is a very suitable place to keep it.
|
| These changes don't affect the error reporting in the driver. In this respect these are non-functional changes. However, they will simplify things in the forthcoming patches in which we may need a better error tracking/recovery mechanism.
|
| Differential Revision: https://reviews.llvm.org/D100556
* [SLP] Add detection of shuffled/perfect matching of tree entries. | Alexey Bataev | 2021-04-20 | 3 | -128/+168
| SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree.
|
| Reviewed By: RKSimon
|
| Differential Revision: https://reviews.llvm.org/D100495
* [AArch64][AsmParser] NFC: Remove unused ExtendOp struct | Cullen Rhodes | 2021-04-20 | 1 | -4/+0
| Left over from 2625a993f926 when extend and shift were merged.
* Fix PR46880: Fail CHECK-NOT with undefined variable | Thomas Preud'homme | 2021-04-20 | 8 | -82/+77
| Currently a CHECK-NOT directive succeeds whenever the corresponding match fails. However, a match can fail due to an error rather than a lack of match, for instance if a variable is undefined. This commit makes match error a failure for CHECK-NOT.
|
| Reviewed By: jdenny
|
| Differential Revision: https://reviews.llvm.org/D86222
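The behaviour change can be summarised as a three-valued match outcome. The Python sketch below is a toy model written for illustration; the enum and function names are hypothetical and do not correspond to FileCheck's internal API.

```python
from enum import Enum

class MatchResult(Enum):
    FOUND = "found"          # the pattern matched the input
    NOT_FOUND = "not found"  # the pattern was valid but did not match
    ERROR = "error"          # e.g. the pattern uses an undefined variable

def check_not_passes_old(result: MatchResult) -> bool:
    """Old behaviour: anything other than a successful match passes."""
    return result is not MatchResult.FOUND

def check_not_passes_new(result: MatchResult) -> bool:
    """New behaviour: only a clean 'no match' passes; a match error fails."""
    return result is MatchResult.NOT_FOUND
```

The two functions differ only for `MatchResult.ERROR`, which is the case the commit turns into a failure.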
* [AMDGPU] Add TransVALU to gfx10 | Sebastian Neubauer | 2021-04-20 | 4 | -43/+136
| Instructions on the transcendental unit are executed in parallel to the normal VALU, so add this as an extra resource. This doesn't seem to have any effect, but it should be more correct.
|
| Differential Revision: https://reviews.llvm.org/D100123
* [RISCV][NFC] Add tests for scalable-vector DAGCombiner improvements | Fraser Cormack | 2021-04-20 | 6 | -1/+125
| These will all be improved by future patches.
* [AMDGPU] Use if instead of foreach in a few places. NFC. | Jay Foad | 2021-04-20 | 2 | -7/+7
|
* [flang][nfc] Port 2 tests to use the new driver when enabled | Andrzej Warzynski | 2021-04-20 | 2 | -2/+2
| This is similar to https://reviews.llvm.org/D100309, i.e. `%f18` is replaced with `%flang_new`.
|
| resolve105.f90 wasn't in tree when D100309 was worked on, so it's updated here instead.
|
| label14.f90 requires `-fsyntax-only`. I didn't notice that when submitting D100309, hence updating it now instead. `-fsyntax-only` is required to prevent `%f18` from calling an external compiler (which then fails and returns a non-zero exit code).
|
| Differential Revision: https://reviews.llvm.org/D100655
* [libc++][ci] Re-split the CI pipeline to try and reduce load on more builders | Louis Dionne | 2021-04-20 | 1 | -29/+35
|
* [MCA][LSUnit] Fix a potential use after free in the logic that updates memory groups. | Andrea Di Biagio | 2021-04-20 | 2 | -2/+7
| Make sure that the `CriticalMemoryInstruction` of a memory group is invalidated if it references an already executed instruction.
|
| This avoids a potential use-after-free if the critical memory info becomes stale, and the value is read after the instruction has executed.
* [PowerPC] Canonicalize shuffles on big endian targets as well | Nemanja Ivanovic | 2021-04-20 | 28 | -2902/+3646
| Extend shuffle canonicalization and conversion of shuffles fed by vectorized scalars to big endian subtargets. For big endian subtargets, loads and direct moves of scalars into vector registers put the data in the correct element for SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is narrower, the value still ends up in the wrong place - although a different wrong place than on little endian targets.
|
| This patch extends the combine that keeps values where they are if they feed a shuffle to big endian targets.
|
| Differential revision: https://reviews.llvm.org/D100478
* [llvm-objdump] Add an llvm-otool tool | Nico Weber | 2021-04-20 | 29 | -68/+509
| This implements an LLVM tool that's flag- and output-compatible with macOS's `otool` -- except for bugs, but from testing with both `otool` and `xcrun otool-classic`, llvm-otool matches vanilla otool's behavior very well already. It's not 100% perfect, but it's a very solid start.
|
| This uses the same approach as llvm-objcopy: llvm-objdump uses a different OptTable when it's invoked as llvm-otool. This is possible thanks to D100433.
|
| Differential Revision: https://reviews.llvm.org/D100583
* [ValueTypes] Fix sizes of v256i32 and v256f32 (8182 -> 8192) | Cullen Rhodes | 2021-04-20 | 1 | -2/+2
|
* [AMDGPU] Use simpler alternatives to !foldl. NFC. | Jay Foad | 2021-04-20 | 1 | -4/+4
|
* [mlir][linalg] lower index operations during linalg to vector lowering. | Tobias Gysi | 2021-04-20 | 7 | -27/+134
| The patch extends the vectorization pass to lower linalg index operations to vector code. It allocates constant 1d vectors that enumerate the indexes along the iteration dimensions and broadcasts/transposes these 1d vectors to the iteration space.
|
| Differential Revision: https://reviews.llvm.org/D100373
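The broadcast-and-transpose idea described here can be illustrated on a 2-D iteration space. The Python sketch below uses nested lists as a stand-in for vectors; `index_vector` is a hypothetical helper, not something from the patch, and it handles only the 2-D case.

```python
def index_vector(shape, dim):
    """Value of the iteration index along `dim` at every point of a 2-D `shape`."""
    rows, cols = shape
    iota = list(range(shape[dim]))  # constant 1-D vector 0, 1, ..., extent-1
    if dim == 1:
        # Broadcast the 1-D vector along the leading dimension.
        return [list(iota) for _ in range(rows)]
    # dim == 0: broadcast along the trailing dimension, then transpose.
    broadcast = [list(iota) for _ in range(cols)]   # shape (cols, rows)
    return [list(row) for row in zip(*broadcast)]   # shape (rows, cols)

idx0 = index_vector((2, 3), 0)
idx1 = index_vector((2, 3), 1)
```

Here `idx0` is `[[0, 0, 0], [1, 1, 1]]` and `idx1` is `[[0, 1, 2], [0, 1, 2]]`, so element `[i][j]` of each result holds the value of the corresponding loop index at iteration point `(i, j)`.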
* [DAG] SelectionDAG.cpp - breakup if-else chains where each block returns. NFCI. | Simon Pilgrim | 2021-04-20 | 1 | -22/+19
| Match style guide that requests that if+return blocks are separate.
* Fix Wdocumentation warning by consistently using '///' comment blocks. NFCI. | Simon Pilgrim | 2021-04-20 | 1 | -4/+4
|
* [mlir] test gather/scatter index vector of type index. | Tobias Gysi | 2021-04-20 | 1 | -6/+11
| Test the vector to llvm lowering of index vectors with index element type.
|
| Differential Revision: https://reviews.llvm.org/D100827
* [lit, test] Fix test cancellation feature detection | Thomas Preud'homme | 2021-04-20 | 2 | -3/+24
| A lit feature guards tests for the lit timeout functionality because on most systems it depends on the availability of the psutil Python module. However, that feature is defined based on the ability of the testing lit to cancel tests, which does not necessarily apply to the ability of the tested lit. In particular, RUN commands have a cleared PYTHONPATH and user site packages are disabled. In the case where psutil is found by the testing lit from one of those two sources of Python path, the tested lit would not be able to find it, causing timeout tests to fail.
|
| This commit fixes the issue by testing the ability to cancel tests in the RUN command environment.
|
| Reviewed By: yln
|
| Differential Revision: https://reviews.llvm.org/D99728
* clang-format: [JS] do not merge imports and exports. | Martin Probst | 2021-04-20 | 2 | -0/+10
| Previously, clang-format would erroneously merge import and export statements. These need to be kept separate, as the semantics differ.
|
| Differential Revision: https://reviews.llvm.org/D100752
* [C++, test] Fix typo in NSS* vars | Thomas Preud'homme | 2021-04-20 | 1 | -3/+2
| The NSS FileCheck variables at the end of the CodeGenCXX/split-stacks.cpp clang testcase are off by 1, resulting in the use of an undefined variable (NSS3). One of the CHECK-NOT directives is also redundant because _Z8tnosplitIiEiv uses the same attribute as _Z3foov without split stack. This commit fixes that.
|
| Reviewed By: ChuanqiXu
|
| Differential Revision: https://reviews.llvm.org/D99839
* [AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability. | hsmahesha | 2021-04-20 | 1 | -16/+4
| Reviewed By: foad
|
| Differential Revision: https://reviews.llvm.org/D100773
* [MemoryBuiltins] Added support for memalign | Dávid Bolvanský | 2021-04-20 | 2 | -0/+53
| memalign is an older form of aligned_alloc.
* [Support] APInt.h - remove <algorithm> include. NFCI. | Simon Pilgrim | 2021-04-20 | 1 | -3/+4
| Replace std::min use which should allow us to avoid including the <algorithm> header in every include of APInt.h.
* [CodeGen] CodeGenPassBuilder.h - remove unnecessary <string> include. NFCI. | Simon Pilgrim | 2021-04-20 | 1 | -1/+0
| We only use StringRef so include that.
* [RISCV] Refactor an optimization of addition with immediate | Ben Shi | 2021-04-20 | 3 | -23/+41
| Reviewed By: craig.topper
|
| Differential Revision: https://reviews.llvm.org/D100769
* [AArch64] Constant fold sve_convert_from_svbool(zero) to zero | Joe Ellis | 2021-04-20 | 3 | -17/+45
| Co-authored-by: Paul Walker <paul.walker@arm.com>
|
| Differential Revision: https://reviews.llvm.org/D100463
* [AArch64][SVE][InstCombine] Replace last{a,b} intrinsics with extracts... | Joe Ellis | 2021-04-20 | 3 | -0/+250
| when the predicate used by last{a,b} specifies a known vector length.
|
| For example:
|   aarch64_sve_lasta(VL1, D) -> extractelement(D, #1)
|   aarch64_sve_lastb(VL1, D) -> extractelement(D, #0)
|
| Co-authored-by: Paul Walker <paul.walker@arm.com>
|
| Differential Revision: https://reviews.llvm.org/D100476
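The two example folds follow from how the intrinsics pick an element. The Python sketch below is a simplified model, assuming lastb returns the last active element and lasta returns the element after it; it only handles predicates with at least one active lane, and the wrap-around in `lasta` is an assumption of this sketch.

```python
def last_active_lane(pred):
    """Index of the highest active lane; assumes at least one lane is active."""
    return max(i for i, active in enumerate(pred) if active)

def lastb(pred, data):
    """Model of last-b: the last active element."""
    return data[last_active_lane(pred)]

def lasta(pred, data):
    """Model of last-a: the element after the last active one."""
    return data[(last_active_lane(pred) + 1) % len(data)]

vl1 = [True, False, False, False]   # a VL1-style predicate: only lane 0 active
data = [10, 20, 30, 40]
```

With the `vl1` predicate, `lastb(vl1, data)` selects `data[0]` and `lasta(vl1, data)` selects `data[1]`, matching the lane numbers in the message. Because the predicate pattern is known at compile time, the lane index is a constant and a plain element extract suffices.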
* [libcxx][test] Split off debug mode tests | Kristina Bessonova | 2021-04-20 | 50 | -562/+1168
| This continues the work started by @ldionne in 2908eb20ba7.
|
| The debug mode tests from
| - libcxx/containers/sequences/vector/
| - libcxx/strings/basic.string/string.access/
| - libcxx/strings/basic.string/string.iterators/
| similarly contain two tests in every file making the second test never run. The patch splits the tests into separate files.
|
| Reviewed By: Quuxplusone, ldionne
|
| Differential Revision: https://reviews.llvm.org/D100592
* [ARM] Regenerate a couple of tests. NFC | David Green | 2021-04-20 | 2 | -611/+1905
|
* [mlir] Progressively lower vector to SCF | Matthias Springer | 2021-04-20 | 6 | -0/+641
| Add a new ProgressiveVectorToSCF pass that lowers vector transfer ops to SCF by gradually unpacking one dimension at a time. Unpacking stops at 1D, but can be configured to stop earlier, should the HW support (N>1)-d vectors.
|
| The current implementation cannot handle permutation maps, masks, tensor types and unrolling yet. These will be added in subsequent commits. Once features are on par with VectorToSCF, this implementation will replace VectorToSCF.
|
| Differential Revision: https://reviews.llvm.org/D100622
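The "unpack one dimension at a time" strategy can be pictured as a recursion that peels the outermost dimension until a target rank is reached. The Python sketch below models this on nested lists; `rank`, `progressive_read`, and the `log` parameter are hypothetical names for illustration and say nothing about how the pass itself is structured.

```python
import copy

def rank(value):
    """Number of nested list levels; a stand-in for vector rank."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]
    return depth

def progressive_read(source, target_rank=1, log=None):
    """Copy `source`, peeling one outer dimension per recursion step."""
    if rank(source) <= target_rank:
        if log is not None:
            log.append(rank(source))   # record one "native" transfer
        return copy.deepcopy(source)
    # One loop per unpacked dimension, analogous to a loop over the outer dim.
    return [progressive_read(sub, target_rank, log) for sub in source]

src = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # a 2x2x2 "vector"
log_1d, log_2d = [], []
out_1d = progressive_read(src, target_rank=1, log=log_1d)
out_2d = progressive_read(src, target_rank=2, log=log_2d)
```

Stopping at rank 1 performs four 1-D transfers for this input, while stopping at rank 2 performs two 2-D transfers, which mirrors the configurable stopping point mentioned in the message.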
* [mlir] Add patterns to lower Math operations to LLVM based libm calls. | Tres Popp | 2021-04-20 | 7 | -0/+277
| Some Math operations do not have an equivalent in LLVM. In these cases, allow a low priority fallback of calling the libm functions. This is to give functionality and is not a performant option.
|
| Differential Revision: https://reviews.llvm.org/D100367
* [Support] BinaryStreamReader.h - remove unnecessary <string> include. NFCI. | Simon Pilgrim | 2021-04-20 | 1 | -2/+1
| We only use StringRef so include that.
* Re-land [GreedyRA ORE] Add Cost of spill locations into remark | Serguei Katkov | 2021-04-20 | 5 | -15/+71
| Re-land the patch with a fix of clang test.
|
| Cost of spill location is computed based on the relative branch frequency where the corresponding spill/reload/copy are located.
|
| While the number itself depends heavily on the incoming IR, the total cost can be used when making changes in RA.
|
| Revert "Revert "[GreedyRA ORE] Add Cost of spill locations into remark""
|
| This reverts commit 680f3d6de79f7dd75ee0cda256a541d18e504a22.
* [RISCV] Fix missing emergency slots for scalable stack offsets | Fraser Cormack | 2021-04-20 | 6 | -37/+275
| This patch adds an additional emergency spill slot to RVV code. This is required as RVV stack offsets may require an additional register to compute.
|
| This patch includes an optimization by @HsiangKai <kai.wang@sifive.com> to reduce the number of registers required for the computation of stack offsets from 3 to 2. Otherwise we'd need two additional emergency spill slots.
|
| Reviewed By: HsiangKai
|
| Differential Revision: https://reviews.llvm.org/D100574
* [LV] Let selectVectorizationFactor reason directly on VectorizationFactor. | Sander de Smalen | 2021-04-20 | 5 | -36/+62
| Rather than maintaining two separate values, a `float` for the per-lane cost and a Width for the VF, maintain a single VectorizationFactor which comprises the two and also removes the need for converting an integer value to float.
|
| This simplifies the query when asking if one VF is more profitable than another when we want to extend this for scalable vectors (which may require additional options to determine if e.g. a scalable VF of the same cost, is more profitable than a fixed VF of the same cost).
|
| The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs no longer truncates the floating-point cost from `float` to `unsigned` to then perform the calculation on the truncated cost. It now does a cost comparison with the correct precision.
|
| Reviewed By: dmgreen
|
| Differential Revision: https://reviews.llvm.org/D100121
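One way to compare per-lane costs without any floating-point conversion is to cross-multiply. The Python sketch below illustrates that idea and the precision loss that truncation introduces. The class mirrors the name used in the message, but its fields and the `is_more_profitable` helper are assumptions of this sketch, not the patch's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VectorizationFactor:
    width: int  # number of lanes
    cost: int   # integer cost of one vector iteration

def is_more_profitable(a: VectorizationFactor, b: VectorizationFactor) -> bool:
    """True if a's per-lane cost is strictly lower than b's."""
    # a.cost / a.width < b.cost / b.width, rewritten without division.
    return a.cost * b.width < b.cost * a.width

vf4 = VectorizationFactor(width=4, cost=7)  # per-lane cost 1.75
vf2 = VectorizationFactor(width=2, cost=3)  # per-lane cost 1.5

# Truncating the per-lane cost to an integer hides the difference.
truncated_tie = int(vf4.cost / vf4.width) == int(vf2.cost / vf2.width)
```

With these numbers the truncated costs are equal, while `is_more_profitable(vf2, vf4)` correctly reports that the width-2 factor is cheaper per lane.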
* [PowerPC] Use mtvsrdd to put callee-saved GPR into VSR | Qiu Chaofan | 2021-04-20 | 3 | -27/+103
| This patch exploits mtvsrdd instruction (available in ISA3.0+) to save two callee-saved GPR registers into a single VSR, making it more efficient.
|
| Reviewed By: jsji, nemanjai
|
| Differential Revision: https://reviews.llvm.org/D62565
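The saving comes from fitting two 64-bit values into one 128-bit register. The Python sketch below models only that packing and unpacking on plain integers; the helper names are hypothetical, and which doubleword each GPR occupies is an assumption of this sketch.

```python
MASK64 = (1 << 64) - 1

def pack_pair(gpr_hi: int, gpr_lo: int) -> int:
    """Combine two 64-bit register values into one 128-bit value."""
    return ((gpr_hi & MASK64) << 64) | (gpr_lo & MASK64)

def unpack_pair(vsr: int) -> tuple:
    """Recover both 64-bit register values from the 128-bit value."""
    return (vsr >> 64) & MASK64, vsr & MASK64

saved = pack_pair(0x0123456789ABCDEF, 0xFEDCBA9876543210)
restored = unpack_pair(saved)
```

Round-tripping through `pack_pair` and `unpack_pair` returns the original pair, so one vector register can hold two callee-saved GPRs instead of one.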
* [DAGCombiner] Support fold zero scalar vector. | Jun Ma | 2021-04-20 | 7 | -56/+73
| This patch changes ISD::isBuildVectorAllZeros to ISD::isConstantSplatVectorAllZeros which handles zero scalar vector.
|
| TestPlan: check-llvm
|
| Differential Revision: https://reviews.llvm.org/D100813
* [AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used | Jay Foad | 2021-04-20 | 2 | -0/+27
| Don't shrink VOP3 instructions if there are any uses of a carry-out operand, because the shrunken form of the instruction would write the carry-out to vcc instead of to a virtual register.
|
| Differential Revision: https://reviews.llvm.org/D100760
* [X86][AMX] Verify illegal types or instructions for x86_amx. | Luo, Yuanke | 2021-04-20 | 13 | -19/+90
| This patch is related to https://reviews.llvm.org/D100032 which defines some illegal types or operations for x86_amx. There are no arguments, arrays, pointers, vectors or constants of x86_amx.
|
| Reviewed By: pengfei
|
| Differential Revision: https://reviews.llvm.org/D100472
* Explicitly pass type to cast load constant folding result | Arthur Eubanks | 2021-04-20 | 5 | -32/+37
| Previously we would use the type of the pointee to determine what to cast the result of constant folding a load. To aid with opaque pointer types, we should explicitly pass the type of the load rather than looking at pointee types.
|
| ConstantFoldLoadThroughBitcast() converts the const prop'd value to the proper load type (e.g. [1 x i32] -> i32). Instead of calling this in every intermediate step like bitcasts, we only call this when we actually see the global initializer value.
|
| In some existing uses of this API, we don't know the exact type we're loading from immediately (e.g. first we visit a bitcast, then we visit the load using the bitcast). In those cases we have to manually call ConstantFoldLoadThroughBitcast() when simplifying the load to make sure that we cast to the proper type.
|
| Reviewed By: dblaikie
|
| Differential Revision: https://reviews.llvm.org/D100718
* [PowerPC] Support f128 under VSX | Qiu Chaofan | 2021-04-20 | 15 | -1634/+1499
| This patch is the last one in the backend to support the fp128 type in pre-POWER9 subtargets with VSX, removing the temporary option and updating remaining tests.
|
| Reviewed By: steven.zhang
|
| Differential Revision: https://reviews.llvm.org/D92374
* [SelectionDAG] Relax constraints on STEP_VECTOR step operand | Fraser Cormack | 2021-04-20 | 4 | -248/+36
| This patch relaxes the requirement that the STEP_VECTOR step constant must be of a type at least as large as the vector element type. That requirement does not permit its use on targets which have legal vector element types larger than the largest legal scalar type, such as i64 vectors on RV32.
|
| As such, the requirement has been loosened so that the step operand may be of any scalar type so long as the constant immediate is non-negative and the value fits inside the vector element type. This limits combining optimizations in certain circumstances but in practice it's unlikely to be a hindrance.
|
| Reviewed By: paulwalker-arm
|
| Differential Revision: https://reviews.llvm.org/D100660
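The loosened rule constrains the step's value rather than the type that carries it. The Python sketch below expresses that check; `step_is_valid` and `step_vector` are hypothetical helpers, and treating "fits inside the vector element type" as a signed range is an assumption of this sketch.

```python
def step_is_valid(step: int, element_bits: int) -> bool:
    """Non-negative and representable in a signed element of `element_bits`."""
    return 0 <= step <= (1 << (element_bits - 1)) - 1

def step_vector(num_elements: int, step: int, element_bits: int) -> list:
    """Model of the result: <0, step, 2*step, ...> wrapped to the element width."""
    if not step_is_valid(step, element_bits):
        raise ValueError("step must be non-negative and fit the element type")
    mask = (1 << element_bits) - 1
    return [(i * step) & mask for i in range(num_elements)]

# A small step is fine for 64-bit elements even if the step itself would be
# carried in a 32-bit scalar, as on a target without legal 64-bit scalars.
v = step_vector(4, 5, element_bits=64)
```

Here `v` is `[0, 5, 10, 15]`, whereas a negative step or a step of 200 with 8-bit elements is rejected by `step_is_valid`.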
* [CSKY 6/n] Add support branch and symbol series instruction | Zi Xuan Wu | 2021-04-20 | 13 | -11/+817
| This patch adds basic CSKY branch instructions and symbol address series instructions. These two kinds of instructions are related to each other, and supporting them involves considerable fixup work.
|
| For now, basic instructions are enabled except for disassembler support. Basic codegen asm generation will be supported first, with disassembler work deferred until later.
|
| Differential Revision: https://reviews.llvm.org/D95029
* [CSKY 5/n] Add support for all CSKY basic integer instructions except for branch series | Zi Xuan Wu | 2021-04-20 | 5 | -46/+407
| This patch adds basic CSKY integer instructions except for branch series such as bsr, br. It mainly includes basic ALU, load & store, compare and data move instructions.
|
| Branch series instructions need to handle complex symbol operands and are left to a following patch.
|
| Differential Revision: https://reviews.llvm.org/D94007