Add the fourth set of patches for the PPC ISA 3.1 support
Previous set of patches is in https://bugs.kde.org/show_bug.cgi?id=426123
Created attachment 132166 [details] testsuite support for VSX scalar min max compare quad precision operations
Created attachment 132167 [details] testsuite support for 128-bit binary integer operations
Created attachment 132168 [details] testsuite support for bit manipulation operations
Created attachment 132169 [details] Functional support for ISA 3.1 VSX scalar min max compare quad precision
Created attachment 132170 [details] Functional support for ISA 3.1 128-bit Binary integer operations
Created attachment 132171 [details] Functional support for ISA 3.1 Bit manipulation operations
Next set of patches are in https://bugs.kde.org/show_bug.cgi?id=427401
All 6 patches look good to me; OK to land. I'm always impressed by how many test cases you have!

One comment about the helper function deposit_bits_under_mask_helper and its friends, though. As written, you are at (serious) risk of false-positive undefined-value reports from Memcheck. The reason is that Memcheck doesn't know how definedness flows through the function (from args to results), so it assumes the worst: all 64 result bits are assumed to be undefined if any of the 128 input bits are. This is imprecise and will give false positives: if an undefined input bit in `src` is not selected by `mask`, that undefinedness doesn't propagate to the result, but Memcheck can't see that.

One hack you can do is, when generating IR to call the function, instead of generating just

   deposit_bits_under_mask_helper(src, mask)

generate

   deposit_bits_under_mask_helper(src & mask, mask)

That should help, because it forces any bits of `src` that are not used to be defined zeroes. Then Memcheck's imprecise instrumentation of the function is OK, because it won't be fooled by undefined `src` bits that are not used.

I would add: if you do decide to make this refinement, I suggest you make a test case that checks my theory is correct. So far this is just speculation.
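To make the idea concrete, here is a hypothetical C sketch (not the actual VEX helper from the patch) of an extract-style helper whose result depends only on the src bits that sit under a 1 in mask. For any helper with that property, pre-ANDing src with mask cannot change the computed value, which is why the call-site hack is value-preserving while forcing the unused bits to defined zeroes:

```c
#include <stdint.h>

/* Hypothetical extract-style helper: the result depends only on the
   bits of src selected by mask.  (Illustration only; not the actual
   deposit_bits_under_mask_helper from the patch.) */
static uint64_t extract_bits_under_mask(uint64_t src, uint64_t mask)
{
   uint64_t res = 0;
   unsigned j = 0;
   for (int i = 0; i < 64; i++) {
      if ((mask >> i) & 1) {
         /* Only src bits under a 1 in mask influence the result. */
         res |= ((src >> i) & 1ULL) << j;
         j++;
      }
   }
   return res;
}

/* Because unselected src bits are never read, calling
      extract_bits_under_mask(src & mask, mask)
   returns the same value as passing src unmasked, but guarantees the
   unused (possibly undefined) src bits are defined zeroes. */
```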
Also, I see you have a helper to compute popcount64 (population_count64_helper). You know there are some fairly short IR sequences that will do that in-line, right? Even if you don't want to use those and instead want to continue with the helper, maybe at least use the fast sequence in the helper? It is probably about 15-20 instructions long. (See gen_POPCOUNT in guest_amd64_toIR.c.)
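For reference, the kind of branch-free SWAR sequence gen_POPCOUNT emits can be sketched in C as follows (a sketch of the classic algorithm only; the actual IR in guest_amd64_toIR.c differs in form):

```c
#include <stdint.h>

/* Classic SWAR population count: sum adjacent bits into 2-bit fields,
   then 4-bit fields, then bytes, and finally accumulate all byte sums
   into the top byte with one multiply. */
static uint64_t popcount64(uint64_t x)
{
   x = x - ((x >> 1) & 0x5555555555555555ULL);                          /* 2-bit sums */
   x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
   x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                          /* byte sums  */
   return (x * 0x0101010101010101ULL) >> 56;                            /* total      */
}
```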
Reply to comment 9. I created a test case for deposit_bits_under_mask_helper() that passes undefined bits. I tested with and without the hack; the hack did reduce the number of positive results, as expected. I added an extensive comment at the call where I added the hack to explain why we were ANDing src and mask. I borrowed extensively from your explanation. :-) Thanks for the help.
Follow-on: my initial test cases passed, but then the regression test had some failures. Looking at the instruction a bit deeper: this instruction uses the mask bits to sort the bits of the source. The src bits that correspond to a 1 in the mask bit position are packed into the lower n bits, where n is the number of 1 bits in the mask. The src bits that correspond to a zero in the mask are packed into the upper 64-n bits. The values of the src bits are not changed. Thus the hack of ANDing the src and mask bits will not work, as it changes the values of the src bits. Given that we don't actually know where the undefined src bits will end up, it is probably best to assume the worst, namely that all 64 bits are undefined, unless we can explicitly calculate and set the shadow bits in the helper function.
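The bit-sorting behavior described above can be sketched in C as a reference model (a sketch based purely on this description, not the Valgrind helper itself; the hardware instruction's exact bit numbering may differ):

```c
#include <stdint.h>

/* Sort the bits of src by mask: bits under a 1 in mask are packed into
   the low n positions (n = popcount(mask)); bits under a 0 are packed
   into the upper 64-n positions.  Bit values are unchanged. */
static uint64_t centrifuge64(uint64_t src, uint64_t mask)
{
   uint64_t lo = 0, hi = 0;
   unsigned n_lo = 0, n_hi = 0;
   for (int i = 0; i < 64; i++) {
      uint64_t bit = (src >> i) & 1;
      if ((mask >> i) & 1)
         lo |= bit << n_lo++;   /* selected bits pack into the low part  */
      else
         hi |= bit << n_hi++;   /* unselected bits pack into the high part */
   }
   if (n_lo == 64)              /* avoid a shift by 64, undefined in C */
      return lo;
   return (hi << n_lo) | lo;
}
```

Since an undefined src bit can land at any result position depending on mask, no value-level transformation at the call site can preserve this sort while zeroing the unused bits, which is why assuming all 64 result bits undefined is the conservative choice.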
Created attachment 132820 [details] Functional support for ISA 3.1 128-bit Binary integer operations
Created attachment 132821 [details] Functional support for ISA 3.1 128-bit Binary integer operations. Update patch.
Created attachment 132822 [details] Functional support for ISA 3.1 Bit manipulation operations. Update patch.
Created attachment 132841 [details] Functional support for ISA 3.1 128-bit Binary integer operations. Update the 64-bit popcount helper to use the fast algorithm.
Created attachment 132843 [details] Functional support for ISA 3.1 Bit manipulation operations. Update patch, as it depends on the 128-bit binary integer operations patch that was updated to do fast popcount. Note: could not use the hack on deposit_bits_under_mask_helper(), as the hack ANDs the mask and src values. The instruction actually uses the mask to sort the bits, so the src bits must not be altered for the instruction to work. We can't update the shadow bits based on the sorted result, so the code is left unchanged, allowing Memcheck to set all 64 bits to undefined if any of the input bits are undefined.
Looks OK to me. Thanks for the rework. OK to land.
Patches committed.

commit cd01d5eb0c4f1d7440c45ed9595fc1ab99165910 (HEAD -> master, origin/master, origin/HEAD)
Author: Carl Love <cel@us.ibm.com>
Date:   Tue Oct 6 12:01:35 2020 -0500

    Bit Manipulation Operation tests

commit 62f62b7ce64cf7f29df9ecb5607ec3a47b38722f
Author: Carl Love <cel@us.ibm.com>
Date:   Tue Oct 6 12:00:07 2020 -0500

    128-bit Binary Integer Operation tests

commit 7a52b46d1a5c306ed16acc5c006969fce5144683
Author: Carl Love <cel@us.ibm.com>
Date:   Tue Oct 6 11:58:44 2020 -0500

    VSX Scalar Min/Max/Compare Quad-Precision operation tests

commit ea228057d07056b1ce817baa51b6346ad6738e4b
Author: Carl Love <cel@us.ibm.com>
Date:   Wed May 6 20:09:38 2020 -0500

    ISA 3.1 Bit-Manipulation Operations

    Add support for:
      cfuged   Centrifuge Doubleword
      cntlzdm  Count Leading Zeros Doubleword under bit Mask
      cnttzdm  Count Trailing Zeros Doubleword under bit Mask
      pdepd    Parallel Bits Deposit Doubleword
      pextd    Parallel Bits Extract Doubleword
      vcfuged  Vector Centrifuge Doubleword
      vclzdm   Vector Count Leading Zeros Doubleword under bit Mask
      vctzdm   Vector Count Trailing Zeros Doubleword under bit Mask
      vgnb     Vector Gather every Nth Bit
      vpdepd   Vector Parallel Bits Deposit Doubleword
      vpextd   Vector Parallel Bits Extract Doubleword
      xxeval   VSX Vector Evaluate

    Dependent on RFC2609 patch being applied first.

commit 42321f97e81af24e5112d72c16d9b2523182b04d
Author: Carl Love <cel@us.ibm.com>
Date:   Thu Sep 24 10:29:56 2020 -0500

    ISA 3.1 128-bit Binary Integer Operations

    Add support for:
      vmuleud, vmuloud, vmulesd, vmulosd
      vextsd2q, vcmpuq, vcmpsq
      vcmpequq, vcmpequq., vcmpgtuq, vcmpgtuq., vcmpgtsq, vcmpgtsq.
      vrlq, vrlqnm, vlqmi, vslq, vsrq, vsraq

commit 3ec034d7465fcddecc323a7012038f6b6987a264
Author: Carl Love <cel@us.ibm.com>
Date:   Thu Apr 30 10:37:07 2020 -0500

    ISA 3.1 VSX Scalar Minimum/Maximum/Compare Quad-Precision Operations

    Add support for:
      xscmpeqqp  VSX Scalar Compare Equal Quad-Precision
      xscmpgeqp  VSX Scalar Compare Greater Than or Equal Quad-Precision
      xscmpgtqp  VSX Scalar Compare Greater Than Quad-Precision
      xsmaxcqp   VSX Scalar Maximum Type-C Quad-Precision
      xsmincqp   VSX Scalar Minimum Type-C Quad-Precision
No issues seen with these patches in the last couple of weeks. Closing.