Bug 427400 - PPC ISA 3.1 support is missing, part 4
Summary: PPC ISA 3.1 support is missing, part 4
Status: CLOSED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-06 19:44 UTC by Carl Love
Modified: 2020-11-18 23:05 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
testsuite support for VSX scalar min max compare quad precision operations (98.13 KB, patch)
2020-10-06 19:47 UTC, Carl Love
Details
testsuite support for 128-bit binary integer operations (361.74 KB, patch)
2020-10-06 19:47 UTC, Carl Love
Details
testsuite support for bit manipulation opterations (469.96 KB, patch)
2020-10-06 19:48 UTC, Carl Love
Details
Functional support for ISA 3.1 VSX scalar min max compare quad precision (13.26 KB, patch)
2020-10-06 19:50 UTC, Carl Love
Details
Functional support for ISA 3.1 128-bit Binary integer operations (31.46 KB, patch)
2020-10-06 19:50 UTC, Carl Love
Details
Functional support for ISA 3.1 Bit manipulation operations (41.33 KB, patch)
2020-10-06 19:51 UTC, Carl Love
Details
Functional support for ISA 3.1 128-bit Binary integers operations (31.46 KB, patch)
2020-10-28 03:12 UTC, Carl Love
Details
Functional support for ISA 3.1 128-bit Binary integers operations (31.46 KB, patch)
2020-10-28 03:17 UTC, Carl Love
Details
functional suppor for ISA 3.1 Bit manipulaion operations (41.33 KB, patch)
2020-10-28 03:19 UTC, Carl Love
Details
Functional support for ISA 3.1 128-bit Binary integers operations (31.78 KB, patch)
2020-10-28 15:41 UTC, Carl Love
Details
functional suppor for ISA 3.1 Bit manipulaion operations (41.33 KB, patch)
2020-10-28 15:46 UTC, Carl Love
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Carl Love 2020-10-06 19:44:57 UTC
Add the fourth set of patches for the PPC ISA 3.1 support
Comment 1 Carl Love 2020-10-06 19:46:11 UTC
Previous set of patchs is in https://bugs.kde.org/show_bug.cgi?id=426123
Comment 2 Carl Love 2020-10-06 19:47:11 UTC
Created attachment 132166 [details]
testsuite support for VSX scalar min max compare quad precision operations
Comment 3 Carl Love 2020-10-06 19:47:54 UTC
Created attachment 132167 [details]
testsuite support for 128-bit binary integer operations
Comment 4 Carl Love 2020-10-06 19:48:30 UTC
Created attachment 132168 [details]
testsuite support for bit manipulation opterations
Comment 5 Carl Love 2020-10-06 19:50:07 UTC
Created attachment 132169 [details]
Functional support for ISA 3.1 VSX scalar min max compare quad precision
Comment 6 Carl Love 2020-10-06 19:50:51 UTC
Created attachment 132170 [details]
Functional support for ISA 3.1 128-bit Binary integer operations
Comment 7 Carl Love 2020-10-06 19:51:34 UTC
Created attachment 132171 [details]
Functional support for ISA 3.1 Bit manipulation operations
Comment 8 Carl Love 2020-10-06 22:16:47 UTC
Next set of patches are in https://bugs.kde.org/show_bug.cgi?id=427401
Comment 9 Julian Seward 2020-10-23 15:29:54 UTC
All 6 patches look good to me .. OK to land.
I'm always impressed how many test cases you have!

One comment about the helper function deposit_bits_under_mask_helper and
its friends, though.  The way you have it, you are at (serious) risk of
false positive undefined-value uses from Memcheck.  The reason is that
Memcheck doesn't know how definedness flows though the function (from
args to results).  So it assumes the worst: all 64 result bits are
assumed to be undefined if any of the 128 input bits are.

This will give false positives because it's imprecise: if you have an
undefined input bit in `src` that is not selected by `mask`, then that 
undefinedness doesn't propagate to the result.  But Memcheck can't
see that.

One hack you can do is, when generating IR to call the function, instead
of generating just

    deposit_bits_under_mask_helper(src, mask)

generate

    deposit_bits_under_mask_helper(src & mask, mask)

That should help, because it forces any bits of `src` that are not used,
to be defined zeroes.  Then Memcheck's imprecise instrumentation of the
function is OK, because it won't be fooled by undefined `src` bits that
are not used.

I would add -- if you do decide to make this refinement, I suggest you
make a test case that checks my theory is correct.  So far this is just
speculation.
Comment 10 Julian Seward 2020-10-23 15:50:56 UTC
Also I see you have a helper to compute popcount64
(population_count64_helper).  You know there are some fairly short IR
sequences that will do that in-line, right?  Even if you don't want to do use
that, and instead want to continue with the helper, maybe at least use the
fast sequences in the helper?  The are probably about 15-20 instructions long.
(see gen_POPCOUNT in guest_amd64_toIR.c)
Comment 11 Carl Love 2020-10-27 19:35:07 UTC
Reply to comment 9

I created a test case for the deposit_bits_under_mask_helper() to pass undefined bits.  I tested with and without the hack.  The hack did reduce the number of positive results as expected.  I add an extensive comment to the call where I added the hack to explain why we were ANDing the src and msk.  I borrowed extensively from your explanation.  :-)  Thanks for the help.
Comment 12 Carl Love 2020-10-28 01:40:31 UTC
Follow on.  My initial test cases passed but then the regression test had some failures.  Looking at the instruction a bit deeper, This instruction uses the mask bits to sort the bits from the source.  The src bits that correspond to a 1 in the mask bit position are packed into the lower n-bits where n is the number of 1 bits in the mask.  The src bits that correspond to a zero in the mask are packed into the uppper 64-n bits.  The values of the src bits are not changed.  Thus the hack of ANDing the src and mask bits will not work as it changes the value of the src.  Given that we don't actually know where the undefined src bits will end up, it is probably best to assume the worst that all 64-bits will be undefined unless we can explicitly calculate and set the shadow bits in the helper function.
Comment 13 Carl Love 2020-10-28 03:12:27 UTC
Created attachment 132820 [details]
Functional support for ISA 3.1 128-bit Binary integers operations
Comment 14 Carl Love 2020-10-28 03:17:59 UTC
Created attachment 132821 [details]
Functional support for ISA 3.1 128-bit Binary integers operations

Update patch
Comment 15 Carl Love 2020-10-28 03:19:44 UTC
Created attachment 132822 [details]
functional suppor for ISA 3.1 Bit manipulaion operations

Update patch
Comment 16 Carl Love 2020-10-28 15:41:38 UTC
Created attachment 132841 [details]
Functional support for ISA 3.1 128-bit Binary integers operations

Update the popcount 64-bit helper to use fast algorithm.
Comment 17 Carl Love 2020-10-28 15:46:03 UTC
Created attachment 132843 [details]
functional suppor for ISA 3.1 Bit manipulaion operations

Update patch as it is dependent on the 128-bit binary integer operation patch that was updated to do fast pop count.  Note, could not use the hack on thedeposit_bits_under_mask_helper() as the hack and the mask and src values.  The instruction actually uses the mask to sort the bits.  Thus the src bits must not be altered for the instruction to work.  Can't update the shadow bits based on the sorted result so leaving the code unchanged to allow memcheck to set all 64-bits to undefined if any of the input bits are undefined.
Comment 18 Julian Seward 2020-11-05 17:34:05 UTC
Looks OK to me.  Thanks for the rework.  OK to land.
Comment 19 Carl Love 2020-11-09 23:48:17 UTC
Patches committed.

ommit cd01d5eb0c4f1d7440c45ed9595fc1ab99165910 (HEAD -> master, origin/master, origin/HEAD)
Author: Carl Love <cel@us.ibm.com>
Date:   Tue Oct 6 12:01:35 2020 -0500

    Bit Manipulation Operation tests

commit 62f62b7ce64cf7f29df9ecb5607ec3a47b38722f
Author: Carl Love <cel@us.ibm.com>
Date:   Tue Oct 6 12:00:07 2020 -0500

    128-bit Binary Integer Operation tests

commit 7a52b46d1a5c306ed16acc5c006969fce5144683
Author: Carl Love <cel@us.ibm.com>
Date:   Tue Oct 6 11:58:44 2020 -0500

    VSX Scalar Min/Max/Compare Quad-Precision operation tests

commit ea228057d07056b1ce817baa51b6346ad6738e4b
Author: Carl Love <cel@us.ibm.com>
Date:   Wed May 6 20:09:38 2020 -0500

    ISA 3.1 Bit-Manipulation Operations
    
    Add support for:
    
    cfuged Centrifuge Doubleword
    cntlzdm Count Leading Zeros Doubleword under bit Mask
    cnttzdm Count Trailing Zeros Doubleword under bit Mask
    pdepd Parallel Bits Deposit Doubleword
    pextd Parallel Bits Extract Doubleword
    vcfuged Vector Centrifuge Doubleword
    vclzdm Vector Count Leading Zeros Doubleword under bit Mask
    vctzdm Vector Count Trailing Zeros Doubleword under bit Mask
    vgnb Vector Gather every Nth Bit
    vpdepd Vector Parallel Bits Deposit Doubleword
    vpextd Vector Parallel Bits Extract Doubleword
    xxeval VSX Vector Evaluate
    
    dependent on RFC2609 patch being applied first.
commit 42321f97e81af24e5112d72c16d9b2523182b04d
Author: Carl Love <cel@us.ibm.com>
Date:   Thu Sep 24 10:29:56 2020 -0500

    ISA 3.1 128-bit Binary Integer Operations
    
    Add support for:
    
    vmuleud, vmuloud, vmulesd, vmulosd
    vextsd2q, vcmpuq, vcmpsq
    vcmpequq, vcmpequq., vcmpgtuq, vcmpgtuq., vcmpgtsq, vcmpgtsq.
    vrlq, vrlqnm, vlqmi, vslq, vsrq, vsraq

commit 3ec034d7465fcddecc323a7012038f6b6987a264
Author: Carl Love <cel@us.ibm.com>
Date:   Thu Apr 30 10:37:07 2020 -0500

    ISA 3.1 VSX Scalar Minimum/Maximum/Compare Quad-Precision Operations
    
    Add support for:
    
    xscmpeqqp VSX Scalar Compare Equal Quad-Precision
    xscmpgeqp VSX Scalar Compare Greater Than or Equal Quad-Precision
    xscmpgtqp VSX Scalar Compare Greater Than Quad-Precision
    xsmaxcqp VSX Scalar Maximum Type-C Quad-Precision
    xsmincqp VSX Scalar Minimum Type-C Quad-Precision
Comment 20 Carl Love 2020-11-18 23:05:49 UTC
No issues seen with these patches in the last couple of weeks.  Closing.