Bug 435665 - PPC ISA 3.0 copy, paste, cpabort instructions are not supported
Summary: PPC ISA 3.0 copy, paste, cpabort instructions are not supported
Status: CLOSED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-04-12 16:52 UTC by Carl Love
Modified: 2021-06-03 15:37 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
PPC add copy, paste, cpabort support (11.47 KB, patch)
2021-04-12 16:57 UTC, Carl Love
Details
fixes for the copy, paste patch (1.36 KB, patch)
2021-05-05 00:18 UTC, Carl Love
Details
fixes for the copy, paste patch (1.24 KB, patch)
2021-05-05 15:02 UTC, Carl Love
Details
PPC add copy, paste, cpabort support (14.14 KB, patch)
2021-05-28 19:44 UTC, Carl Love
Details
PPC add copy, paste, cpabort support (15.11 KB, patch)
2021-06-02 19:34 UTC, Carl Love
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Carl Love 2021-04-12 16:52:46 UTC
The copy, paste and cpabort instructions are not supported.  They require kernel support which has recently been added.  These instructions can be used to do a 128byte block memcopy.  They can also be used to communicate with hardware accelerators if they exist on the host system.
Comment 1 Carl Love 2021-04-12 16:57:31 UTC
Created attachment 137535 [details]
PPC add copy, paste, cpabort support

The patch uses a dirty helper to support the copy, paste, abort instructions.  A dirty helper was used.  Hardware accelerators require the use of the copy, paste and cpabort instructions for the user program to communicate.  Thus the instructions must be issued on the host system with the address of the data to be copied/pasted.  Cannot emulate the instructions with Iops.  

Julian, please review the patch and let me know if it looks OK.  Thanks.
Comment 2 Julian Seward 2021-05-04 09:13:51 UTC
(In reply to Carl Love from comment #1)
> Created attachment 137535 [details]
> PPC add copy, paste, cpabort support

+   d = unsafeIRDirty_1_N (
+      cr0,
+      0/*regparms*/,
+      "copy_paste_abort_dirty_helper",
+      fnptr_to_fnentry( vbi, &copy_paste_abort_dirty_helper ),
+      args );
+   stmt( IRStmt_Dirty(d) );

Doesn't this need some statement of effects?  (mem/regs read/written)
Without that Memcheck won't know what effect the instruction has.
Comment 3 Carl Love 2021-05-05 00:18:51 UTC
Created attachment 138153 [details]
fixes for the copy, paste patch

Yes, the dirty helper effects for the guest memory state is missing.  My bad.  The copy instruction reads from the guest memory and the paste instruction writes to guest memory.  The cpabort instruction does not touch the guest state.  Rather it resets the copy paste state in the underlying host hardware.  Not sure how to reflect that in the dirty helper effects.  I set the effect to read from address 0 which is really not accurate.  Really need another identifier for "no change"?

I have attached a patch with the fixes for comments.  Please let me know if the changes for copy and paste look OK, and any thoughts how to fix the cpabort effects.
Comment 4 Carl Love 2021-05-05 15:02:10 UTC
Created attachment 138168 [details]
fixes for the copy, paste patch

Updated fixes patch
Comment 5 Julian Seward 2021-05-07 14:12:44 UTC
I'm somewhat concerned, at a fundamental level .. if I understand
correctly, these instructions allow one to copy memory from one
place to another (so why bother?  Why not just do normal loads/stores?)
but -- at least as implemented -- there's no way to tell Memcheck
or any other tool, that the values in the destination area are
derived from values in the source area.  So you'll wind up with
definedness false positives or negatives as a result of using them.

Can you explain the background to the insns, what they are used for,
etc, so we can see if there's an implementation that fits better
in the instrumentation framework?
Comment 6 Carl Love 2021-05-07 16:05:50 UTC
Julian:

On Fri, 2021-05-07 at 14:12 +0000, Julian Seward wrote:
> https://bugs.kde.org/show_bug.cgi?id=435665 
> 
> --- Comment #5 from Julian Seward <jseward@acm.org> ---
> I'm somewhat concerned, at a fundamental level .. if I understand
> correctly, these instructions allow one to copy memory from one
> place to another (so why bother?  Why not just do normal
> loads/stores?)
> but -- at least as implemented -- there's no way to tell Memcheck
> or any other tool, that the values in the destination area are
> derived from values in the source area.  So you'll wind up with
> definedness false positives or negatives as a result of using them.
> 
> Can you explain the background to the insns, what they are used for,
> etc, so we can see if there's an implementation that fits better
> in the instrumentation framework?
> 

The instructions were added to communicate with optional hardware
accelerator units that a system may have.  The copy/paste instructions
an also be used to implement memcopy.

To use a hardware accelerator a the user program makes an OS call to
register the hardware accelerator.  The user program maps a memory
region for the accellerator into its program space.  The user program
can then communicate with the accelerator reading/writing to the memory
region,  The copy and paste instruction are used to move the data
to/from the mapped memory for the hardware accelerator.  The hardware
only allows the copy and paste instructions to be used to communicate
with the accelerators.  Normal loads and stores can not be used.  If
you try to use normal loads/stores you get a bus error.

The document:

 https://github.com/libnxz/power-gzip/blob/master/doc/power_nx_gzip_um.pdf

describes this in more detail.   I haven't actually written an code for
an accelerator but I was given some test programs for use in the
Valgrind support development.  One of the tests is a simple memcopy
program without an accelerator.  The other test communicate with an
accelerator.  The specific instructions are described in the ISA

https://ibm.ent.box.com/s/1hzcwkwf8rbju5h9iyf44wm94amnlcrv

I believe the layout for the memory used to communicate could be
different for each accelerator.  The memory will be changed by the
accelerator.  I don't think there is anyway we could know what parts of
the memory were changed (initialized or not) by the accelerator prior
to a copy from the mapped memory region back to a data structure in the
user program.  The instructions require the entire 128byes to be
copied/pasted.  You can not do a subset of the 128-byte memory.

In my first attempt at supporting the instructions, I created a new
128-byte "copy buffer register' in the guest state.  I had the copy and
paste instructions explicitly copy and paste to the guest state copy
buffer register.  I also had a guest state register to track the status
of a copy/paste in progress or not so I could detect errors.  This
worked for the memory copy program.  I would still have to have the
host do a copy/paste of the guest copy buffer register.  I would have
to get the mapped address from the instruction to then do the
copy/paste of the guest copy buffer.  It was all rather a mess.  I saw
no benefit in having a guest copy buffer.  

Since I have to use the copy and paste instructions, I really didn't
see any other way to handle these instructions other then using a dirty
helper to actually do the operations on the host.  

I can send you the test programs if that helps.

                        Carl
Comment 7 Julian Seward 2021-05-10 07:19:22 UTC
I looked at the PowerISA 3.0b documentation you linked to.
Given that this is a copy from "normal" memory to an accelerator,
I think you don't have the option to implement it precisely.

I'd say the least worst is to implement it so that, for the copy,
Memcheck will flag an error if any of the 128 bytes are undefined,
and for the paste we make it look as if the 128 bytes are
completely defined.  That way, at least you'll know if you're
sending undefined values to the accelerator.

Exactly how to achieve this I am not sure.  It will take some
fiddling with the annotations on the dirty helper calls.  I'll
contemplate it more.
Comment 8 Tulio Magno Quites Machado Filho 2021-05-10 18:49:40 UTC
(In reply to Julian Seward from comment #5)
> there's no way to tell Memcheck
> or any other tool, that the values in the destination area are
> derived from values in the source area.  So you'll wind up with
> definedness false positives or negatives as a result of using them.

Yes.  I think the user will always have to ignore at least one false positive when the program tries to read the memory from the accelerator, regardless if copy/paste is being used.

(In reply to Julian Seward from comment #7)
> I'd say the least worst is to implement it so that, for the copy,
> Memcheck will flag an error if any of the 128 bytes are undefined,
> and for the paste we make it look as if the 128 bytes are
> completely defined.  That way, at least you'll know if you're
> sending undefined values to the accelerator.

As someone that plans to use this feature, I'd say the minimum I expect is that Valgrind doesn't abort when copy/paste/cpabort is used.

But getting extra errors like these would be great indeed.
Comment 9 Julian Seward 2021-05-27 12:19:12 UTC
So .. I had an idea about how to get the annotations right.  Problem is, it's a
nasty hack (although simple), and I had hoped that the passage of a bit of time
would cause me to think of something cleaner.  But nothing came to mind, so
here it is.  I *think* it'll work, but it needs implementing and testing, to
check that if you do pass undefined values to the accelerator, you do actually
get a Memcheck error.

The basic idea is for your helper function -- the one that copies 128 bytes [or
whatever] to the accelerator -- to return a fake return value.  At present it
doesn't return anything.  Specifically, you change the helper to return a UWord
and that UWord should be (for example) zero.  Then, add a fake use of the fake
value, in a way that will cause Memcheck to test it, as follows.  Furthermore
this fake use must not be something that IR optimisation can later remove.

On the IR side (in guest_ppc_toIR.c), you allocate an IRTemp of type I64
(assuming 64-bit only operation) to receive the fake value, and set the
IRTemp::tmp field to be that temp.  Also for the IRDirty structure, you must
set mFx/mAddr/mSize so as to declare the memory area it is reading.

Immediately after emitting the IRDirty (in guest_ppc_toIR.c), place an
IRStmt_Exit that uses the fake return value.  Something very similar to this:

      stmt( IRStmt_Exit( 
               binop(Iop_CmpNE64, fake_return_val_tmp, mkU64(0)),
               Ijk_SigTRAP,
               mode64 ? IRConst_U64(cia) : IRConst_U32((UInt)cia),
               OFFB_CIA
      ));

where cia is the current instruction address (is often passed around, see (eg)
do_trap in guest_ppc_toIR.c).

The effects of this trickery are as follows:

(1) the above stmt() asks the IR to exit, asking Valgrind to hand the program a
    SIGTRAP at this point, if the fake return value is nonzero, however ..

(2) .. that never happens, because your helper always returns zero.

(3) Memcheck will believe that any undefinedness in the area read by the helper
    will be propagated through to the fake return value, and will generate
    instrumentation to cause that to happen.

(4) Memcheck will instrument the IRStmt_Exit to check the definedness computed
    by (3) and emit an error if the fake return value contains any undefined
    bits.  Hence you get the warning we want.

(5) We note that the IR optimisation passes do not know what value the helper
    call will return.  Hence we are guaranteed that they can't optimise away
    the IRStmt_Exit and its associated check.

Bad, huh?!  I wish there were a cleaner way.

I suggest you test it at least as follows:

(a) check that, if the memory area is fully defined, no error is printed.

(b) check that, if the memory area contains even one undefined bit, an error is
    printed.

(c) check that, if the helper function does return nonzero, then you really do
    get SIGTRAP synthesised at the instruction.  Not that this is important,
    but just as a kind of sanity check.

Good luck!  I am happy to look at patches, etc, as ever.
Comment 10 Carl Love 2021-05-28 19:44:49 UTC
Created attachment 138846 [details]
PPC add copy, paste, cpabort support

Updated patch to add the Memcheck trickery to get Memcheck to detect undefined bits in the copy paste buffer.

In comment 9, you mentioned that the dirty handler didn't return a value.  Actually it returns the 8-bit CR0 value in a UInt.  

The suggestion was to have the dirty helper return a UWord.  Unfortunately UWord is not defined.  I tested using a ULong and a UInt as the dirty helper return value and both seem to work.  I went with the UInt as it is a little easier to extract the CR0 value.  Not sure if there is something in the trickery that requires ULong instead of UInt?

The IRStmt_Exit() call was modified to mask out the lower 8-bits where the CR0 value is returned before comparing against zero.  The dirty helper ensures that the upper 24-bits are all zero.  

Per the suggested testing, I tested with all bits defined.  Valgrind does not report an error.  When one or all of the buffer bytes are undefined Valgrind reports:

=885408== Conditional jump or move depends on uninitialised value(s)
==885408==    at 0x180890: test_copy (test_copy_paste1.c:18)
==885408==    by 0x180963: main (test_copy_paste1.c:50)

as expected.  

I also tested removing the masking out of the CR0 field and verified the sig trap is generated. The message was:

==836609== Memcheck, a memory error detector
==836609== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==836609== Using Valgrind-3.18.0.GIT and LibVEX; rerun with -h for copyright inf
o
==836609== Command: ./test_copy_paste2
==836609== 
==836609== Conditional jump or move depends on uninitialised value(s)
==836609==    at 0x180890: test_copy (test_copy_paste2.c:18)
==836609==    by 0x180A0F: main (test_copy_paste2.c:79)
==836609== 
==836609== 
==836609== Process terminating with default action of signal 5 (SIGTRAP)
==836609==    at 0x180890: test_copy (test_copy_paste2.c:18)
==836609==    by 0x180A0F: main (test_copy_paste2.c:79)
==836609== 
==836609== HEAP SUMMARY:
==836609==     in use at exit: 0 bytes in 0 blocks
==836609==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==836609== 
==836609== All heap blocks were freed -- no leaks are possible
==836609== 
==836609== Use --track-origins=yes to see where uninitialised values come from
==836609== For lists of detected and suppressed errors, rerun with: -s
==836609== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Trace/breakpoint trap (core dumped)

The code was commented based on Julian's explanation in comment 9 to document how the trickery works.
Comment 11 Julian Seward 2021-06-02 15:48:46 UTC
(In reply to Carl Love from comment #10)
> Created attachment 138846 [details]
> PPC add copy, paste, cpabort support
> 
> Updated patch to add the Memcheck trickery to get Memcheck to detect
> undefined bits in the copy paste buffer.



I *think* this is OK now, with two caveats:

+UInt copy_paste_abort_dirty_helper(UInt addr, UInt op) {

This will not build on anything except a Power target.  It needs to be
ifdef'd in the usual way.

Ok to land if you fix the above plus sanity check the following issue.

Here's another clarifying comment; I think you should add something
like this.  It's about the "paste" effects annotations.  If I understand
the code correctly, for "paste" you will now have annotations

  Ifx_Write at (wherever the target address is) for 128 bytes
  and also
  the function takes two args `EA_base` and `operation`
  and it returns the uint32_t that you mention.

How Memcheck will instrument that is, if there is any undefinedness in
the inputs, then all of the outputs will be undefined.  Hence

  if    EA_base or operation contain any undefined bits

  then  the return value is undefined and the specified 128-byte memory area
        are undefined after the call
        
  else  the return value is undefined and the specified 128-byte memory area
        are defined after the call

The point is .. if I understand "paste" correctly, the destination address is
not part of the "normal" program address space.  So the Ifx_Write annotation
in this case is irrelevant.  Indeed, if the destination address is part of
*some other* address space, then we actually need to omit that Ifx_Write
annotation, because othewise Memcheck will update its status bits for that
address range in this program's address space, which would be wrong.

So .. I hope that makes sense.
Comment 12 Carl Love 2021-06-02 16:54:35 UTC
Tulio:

I think you are probably better able to clarify the comment from Julian in comment 11.

> The point is .. if I understand "paste" correctly, the destination address is
> not part of the "normal" program address space.  So the Ifx_Write annotation
> in this case is irrelevant.  Indeed, if the destination address is part of
> *some other* address space, then we actually need to omit that Ifx_Write
> annotation, because othewise Memcheck will update its status bits for that
> address range in this program's address space, which would be wrong.

The destination address for the paste instruction is always imapped into the current user address space, correct?   

Assuming Tulio agrees, than we will always want to have the Ifx_Write annotation.
Comment 13 Tulio Magno Quites Machado Filho 2021-06-02 17:05:44 UTC
(In reply to Carl Love from comment #12)
> The destination address for the paste instruction is always imapped into the
> current user address space, correct?   

Correct.  The address is always mmap'ed.
Comment 14 Julian Seward 2021-06-02 18:14:43 UTC
(In reply to Tulio Magno Quites Machado Filho from comment #13)
> (In reply to Carl Love from comment #12)
> Correct.  The address is always mmap'ed.

Ok.  Then I think you're good to go as-is, apart from the ifdef thing.
Comment 15 Carl Love 2021-06-02 19:34:04 UTC
Created attachment 138960 [details]
PPC add copy, paste, cpabort support

Updated patch 

- Added #if defined(__powerpc__) around body of copy_paste_abort_dirty_helper() function.
- Changed "asm volitile" to "__asm__ __volitile__" for the copy, paste and cpabort instructions in the copy_paste_abort_dirty_helper() function to make it consistent with other inline assembly.  

- Added clarifying comments about the paste instruction per comment 11

I retested the patch with the above changes.  

I believe all concerns have now been addressed.
Comment 16 Carl Love 2021-06-03 15:36:54 UTC
Patch committed.

commit fa4d21af44861e504a398412c9a120b28d6ceb4e (HEAD -> master, origin/master, origin/HEAD)
Author: Carl Love <cel@us.ibm.com>
Date:   Thu May 27 11:54:22 2021 -0500

    PPC64: Add support for copy, cpabort, paste instructions
Comment 17 Carl Love 2021-06-03 15:37:11 UTC
Closing