The copy, paste and cpabort instructions are not supported. They require kernel support which has recently been added. These instructions can be used to do a 128byte block memcopy. They can also be used to communicate with hardware accelerators if they exist on the host system.
Created attachment 137535 [details] PPC add copy, paste, cpabort support The patch uses a dirty helper to support the copy, paste, abort instructions. A dirty helper was used. Hardware accelerators require the use of the copy, paste and cpabort instructions for the user program to communicate. Thus the instructions must be issued on the host system with the address of the data to be copied/pasted. Cannot emulate the instructions with Iops. Julian, please review the patch and let me know if it looks OK. Thanks.
(In reply to Carl Love from comment #1) > Created attachment 137535 [details] > PPC add copy, paste, cpabort support + d = unsafeIRDirty_1_N ( + cr0, + 0/*regparms*/, + "copy_paste_abort_dirty_helper", + fnptr_to_fnentry( vbi, ©_paste_abort_dirty_helper ), + args ); + stmt( IRStmt_Dirty(d) ); Doesn't this need some statement of effects? (mem/regs read/written) Without that Memcheck won't know what effect the instruction has.
Created attachment 138153 [details] fixes for the copy, paste patch Yes, the dirty helper effects for the guest memory state is missing. My bad. The copy instruction reads from the guest memory and the paste instruction writes to guest memory. The cpabort instruction does not touch the guest state. Rather it resets the copy paste state in the underlying host hardware. Not sure how to reflect that in the dirty helper effects. I set the effect to read from address 0 which is really not accurate. Really need another identifier for "no change"? I have attached a patch with the fixes for comments. Please let me know if the changes for copy and paste look OK, and any thoughts how to fix the cpabort effects.
Created attachment 138168 [details] fixes for the copy, paste patch Updated fixes patch
I'm somewhat concerned, at a fundamental level .. if I understand correctly, these instructions allow one to copy memory from one place to another (so why bother? Why not just do normal loads/stores?) but -- at least as implemented -- there's no way to tell Memcheck or any other tool, that the values in the destination area are derived from values in the source area. So you'll wind up with definedness false positives or negatives as a result of using them. Can you explain the background to the insns, what they are used for, etc, so we can see if there's an implementation that fits better in the instrumentation framework?
Julian: On Fri, 2021-05-07 at 14:12 +0000, Julian Seward wrote: > https://bugs.kde.org/show_bug.cgi?id=435665 > > --- Comment #5 from Julian Seward <jseward@acm.org> --- > I'm somewhat concerned, at a fundamental level .. if I understand > correctly, these instructions allow one to copy memory from one > place to another (so why bother? Why not just do normal > loads/stores?) > but -- at least as implemented -- there's no way to tell Memcheck > or any other tool, that the values in the destination area are > derived from values in the source area. So you'll wind up with > definedness false positives or negatives as a result of using them. > > Can you explain the background to the insns, what they are used for, > etc, so we can see if there's an implementation that fits better > in the instrumentation framework? > The instructions were added to communicate with optional hardware accelerator units that a system may have. The copy/paste instructions an also be used to implement memcopy. To use a hardware accelerator a the user program makes an OS call to register the hardware accelerator. The user program maps a memory region for the accellerator into its program space. The user program can then communicate with the accelerator reading/writing to the memory region, The copy and paste instruction are used to move the data to/from the mapped memory for the hardware accelerator. The hardware only allows the copy and paste instructions to be used to communicate with the accelerators. Normal loads and stores can not be used. If you try to use normal loads/stores you get a bus error. The document: https://github.com/libnxz/power-gzip/blob/master/doc/power_nx_gzip_um.pdf describes this in more detail. I haven't actually written an code for an accelerator but I was given some test programs for use in the Valgrind support development. One of the tests is a simple memcopy program without an accelerator. The other test communicate with an accelerator. The specific instructions are described in the ISA https://ibm.ent.box.com/s/1hzcwkwf8rbju5h9iyf44wm94amnlcrv I believe the layout for the memory used to communicate could be different for each accelerator. The memory will be changed by the accelerator. I don't think there is anyway we could know what parts of the memory were changed (initialized or not) by the accelerator prior to a copy from the mapped memory region back to a data structure in the user program. The instructions require the entire 128byes to be copied/pasted. You can not do a subset of the 128-byte memory. In my first attempt at supporting the instructions, I created a new 128-byte "copy buffer register' in the guest state. I had the copy and paste instructions explicitly copy and paste to the guest state copy buffer register. I also had a guest state register to track the status of a copy/paste in progress or not so I could detect errors. This worked for the memory copy program. I would still have to have the host do a copy/paste of the guest copy buffer register. I would have to get the mapped address from the instruction to then do the copy/paste of the guest copy buffer. It was all rather a mess. I saw no benefit in having a guest copy buffer. Since I have to use the copy and paste instructions, I really didn't see any other way to handle these instructions other then using a dirty helper to actually do the operations on the host. I can send you the test programs if that helps. Carl
I looked at the PowerISA 3.0b documentation you linked to. Given that this is a copy from "normal" memory to an accelerator, I think you don't have the option to implement it precisely. I'd say the least worst is to implement it so that, for the copy, Memcheck will flag an error if any of the 128 bytes are undefined, and for the paste we make it look as if the 128 bytes are completely defined. That way, at least you'll know if you're sending undefined values to the accelerator. Exactly how to achieve this I am not sure. It will take some fiddling with the annotations on the dirty helper calls. I'll contemplate it more.
(In reply to Julian Seward from comment #5) > there's no way to tell Memcheck > or any other tool, that the values in the destination area are > derived from values in the source area. So you'll wind up with > definedness false positives or negatives as a result of using them. Yes. I think the user will always have to ignore at least one false positive when the program tries to read the memory from the accelerator, regardless if copy/paste is being used. (In reply to Julian Seward from comment #7) > I'd say the least worst is to implement it so that, for the copy, > Memcheck will flag an error if any of the 128 bytes are undefined, > and for the paste we make it look as if the 128 bytes are > completely defined. That way, at least you'll know if you're > sending undefined values to the accelerator. As someone that plans to use this feature, I'd say the minimum I expect is that Valgrind doesn't abort when copy/paste/cpabort is used. But getting extra errors like these would be great indeed.
So .. I had an idea about how to get the annotations right. Problem is, it's a nasty hack (although simple), and I had hoped that the passage of a bit of time would cause me to think of something cleaner. But nothing came to mind, so here it is. I *think* it'll work, but it needs implementing and testing, to check that if you do pass undefined values to the accelerator, you do actually get a Memcheck error. The basic idea is for your helper function -- the one that copies 128 bytes [or whatever] to the accelerator -- to return a fake return value. At present it doesn't return anything. Specifically, you change the helper to return a UWord and that UWord should be (for example) zero. Then, add a fake use of the fake value, in a way that will cause Memcheck to test it, as follows. Furthermore this fake use must not be something that IR optimisation can later remove. On the IR side (in guest_ppc_toIR.c), you allocate an IRTemp of type I64 (assuming 64-bit only operation) to receive the fake value, and set the IRTemp::tmp field to be that temp. Also for the IRDirty structure, you must set mFx/mAddr/mSize so as to declare the memory area it is reading. Immediately after emitting the IRDirty (in guest_ppc_toIR.c), place an IRStmt_Exit that uses the fake return value. Something very similar to this: stmt( IRStmt_Exit( binop(Iop_CmpNE64, fake_return_val_tmp, mkU64(0)), Ijk_SigTRAP, mode64 ? IRConst_U64(cia) : IRConst_U32((UInt)cia), OFFB_CIA )); where cia is the current instruction address (is often passed around, see (eg) do_trap in guest_ppc_toIR.c). The effects of this trickery are as follows: (1) the above stmt() asks the IR to exit, asking Valgrind to hand the program a SIGTRAP at this point, if the fake return value is nonzero, however .. (2) .. that never happens, because your helper always returns zero. (3) Memcheck will believe that any undefinedness in the area read by the helper will be propagated through to the fake return value, and will generate instrumentation to cause that to happen. (4) Memcheck will instrument the IRStmt_Exit to check the definedness computed by (3) and emit an error if the fake return value contains any undefined bits. Hence you get the warning we want. (5) We note that the IR optimisation passes do not know what value the helper call will return. Hence we are guaranteed that they can't optimise away the IRStmt_Exit and its associated check. Bad, huh?! I wish there were a cleaner way. I suggest you test it at least as follows: (a) check that, if the memory area is fully defined, no error is printed. (b) check that, if the memory area contains even one undefined bit, an error is printed. (c) check that, if the helper function does return nonzero, then you really do get SIGTRAP synthesised at the instruction. Not that this is important, but just as a kind of sanity check. Good luck! I am happy to look at patches, etc, as ever.
Created attachment 138846 [details] PPC add copy, paste, cpabort support Updated patch to add the Memcheck trickery to get Memcheck to detect undefined bits in the copy paste buffer. In comment 9, you mentioned that the dirty handler didn't return a value. Actually it returns the 8-bit CR0 value in a UInt. The suggestion was to have the dirty helper return a UWord. Unfortunately UWord is not defined. I tested using a ULong and a UInt as the dirty helper return value and both seem to work. I went with the UInt as it is a little easier to extract the CR0 value. Not sure if there is something in the trickery that requires ULong instead of UInt? The IRStmt_Exit() call was modified to mask out the lower 8-bits where the CR0 value is returned before comparing against zero. The dirty helper ensures that the upper 24-bits are all zero. Per the suggested testing, I tested with all bits defined. Valgrind does not report an error. When one or all of the buffer bytes are undefined Valgrind reports: =885408== Conditional jump or move depends on uninitialised value(s) ==885408== at 0x180890: test_copy (test_copy_paste1.c:18) ==885408== by 0x180963: main (test_copy_paste1.c:50) as expected. I also tested removing the masking out of the CR0 field and verified the sig trap is generated. The message was: ==836609== Memcheck, a memory error detector ==836609== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==836609== Using Valgrind-3.18.0.GIT and LibVEX; rerun with -h for copyright inf o ==836609== Command: ./test_copy_paste2 ==836609== ==836609== Conditional jump or move depends on uninitialised value(s) ==836609== at 0x180890: test_copy (test_copy_paste2.c:18) ==836609== by 0x180A0F: main (test_copy_paste2.c:79) ==836609== ==836609== ==836609== Process terminating with default action of signal 5 (SIGTRAP) ==836609== at 0x180890: test_copy (test_copy_paste2.c:18) ==836609== by 0x180A0F: main (test_copy_paste2.c:79) ==836609== ==836609== HEAP SUMMARY: ==836609== in use at exit: 0 bytes in 0 blocks ==836609== total heap usage: 0 allocs, 0 frees, 0 bytes allocated ==836609== ==836609== All heap blocks were freed -- no leaks are possible ==836609== ==836609== Use --track-origins=yes to see where uninitialised values come from ==836609== For lists of detected and suppressed errors, rerun with: -s ==836609== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) Trace/breakpoint trap (core dumped) The code was commented based on Julian's explanation in comment 9 to document how the trickery works.
(In reply to Carl Love from comment #10) > Created attachment 138846 [details] > PPC add copy, paste, cpabort support > > Updated patch to add the Memcheck trickery to get Memcheck to detect > undefined bits in the copy paste buffer. I *think* this is OK now, with two caveats: +UInt copy_paste_abort_dirty_helper(UInt addr, UInt op) { This will not build on anything except a Power target. It needs to be ifdef'd in the usual way. Ok to land if you fix the above plus sanity check the following issue. Here's another clarifying comment; I think you should add something like this. It's about the "paste" effects annotations. If I understand the code correctly, for "paste" you will now have annotations Ifx_Write at (wherever the target address is) for 128 bytes and also the function takes two args `EA_base` and `operation` and it returns the uint32_t that you mention. How Memcheck will instrument that is, if there is any undefinedness in the inputs, then all of the outputs will be undefined. Hence if EA_base or operation contain any undefined bits then the return value is undefined and the specified 128-byte memory area are undefined after the call else the return value is undefined and the specified 128-byte memory area are defined after the call The point is .. if I understand "paste" correctly, the destination address is not part of the "normal" program address space. So the Ifx_Write annotation in this case is irrelevant. Indeed, if the destination address is part of *some other* address space, then we actually need to omit that Ifx_Write annotation, because othewise Memcheck will update its status bits for that address range in this program's address space, which would be wrong. So .. I hope that makes sense.
Tulio: I think you are probably better able to clarify the comment from Julian in comment 11. > The point is .. if I understand "paste" correctly, the destination address is > not part of the "normal" program address space. So the Ifx_Write annotation > in this case is irrelevant. Indeed, if the destination address is part of > *some other* address space, then we actually need to omit that Ifx_Write > annotation, because othewise Memcheck will update its status bits for that > address range in this program's address space, which would be wrong. The destination address for the paste instruction is always imapped into the current user address space, correct? Assuming Tulio agrees, than we will always want to have the Ifx_Write annotation.
(In reply to Carl Love from comment #12) > The destination address for the paste instruction is always imapped into the > current user address space, correct? Correct. The address is always mmap'ed.
(In reply to Tulio Magno Quites Machado Filho from comment #13) > (In reply to Carl Love from comment #12) > Correct. The address is always mmap'ed. Ok. Then I think you're good to go as-is, apart from the ifdef thing.
Created attachment 138960 [details] PPC add copy, paste, cpabort support Updated patch - Added #if defined(__powerpc__) around body of copy_paste_abort_dirty_helper() function. - Changed "asm volitile" to "__asm__ __volitile__" for the copy, paste and cpabort instructions in the copy_paste_abort_dirty_helper() function to make it consistent with other inline assembly. - Added clarifying comments about the paste instruction per comment 11 I retested the patch with the above changes. I believe all concerns have now been addressed.
Patch committed. commit fa4d21af44861e504a398412c9a120b28d6ceb4e (HEAD -> master, origin/master, origin/HEAD) Author: Carl Love <cel@us.ibm.com> Date: Thu May 27 11:54:22 2021 -0500 PPC64: Add support for copy, cpabort, paste instructions
Closing