Bug 369175 - jm_vec_isa_2_07 test crashes on ppc64
Summary: jm_vec_isa_2_07 test crashes on ppc64
Status: CLOSED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.12 SVN
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-21 22:27 UTC by Mark Wielaard
Modified: 2016-10-10 18:18 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
trace flags output (12.93 KB, text/x-log)
2016-09-23 14:08 UTC, Will Schmidt
Details
output from monitor v.translate on segfaulting instruction (74.59 KB, text/x-log)
2016-09-23 14:36 UTC, Will Schmidt
Details
comment 3 step 3 output (278.97 KB, text/x-log)
2016-09-23 15:11 UTC, Will Schmidt
Details
simple test program (1.82 KB, text/x-csrc)
2016-09-23 15:21 UTC, Carl Love
Details
comment 3 step 3 output respin (337.24 KB, text/x-log)
2016-09-23 15:28 UTC, Will Schmidt
Details
Trace for smaller badness2 reproducer (208.54 KB, text/plain)
2016-09-26 11:49 UTC, Mark Wielaard
Details
Patch to wrap &is_BCDstring128_helper address in fnptr_to_fnentry (6.88 KB, patch)
2016-09-26 12:25 UTC, Mark Wielaard
Details
patch, replace isa2_07 bcdadd/bcdsub with isa2_06 vand inst to we can run bcdadd test on Power 7 (2.42 KB, patch)
2016-09-26 16:51 UTC, Carl Love
Details
binary for bcd_add.c test (68.55 KB, application/x-executable)
2016-09-26 16:53 UTC, Carl Love
Details
Patch to fix missing fnptr_to_fnentry() wrapper function calls for clean and dirty helpers (19.89 KB, patch)
2016-09-28 17:19 UTC, Carl Love
Details
Some BE fixes (2.44 KB, patch)
2016-09-30 21:12 UTC, Carl Love
Details
ISA 3.0 BE fixes for various new instructions (34.01 KB, patch)
2016-10-07 22:57 UTC, Carl Love
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Wielaard 2016-09-21 22:27:19 UTC
none/tests/ppc64/jm_vec_isa_2_07.vgtest runs fine on ppc64le, but crashes on a ppc64[be] setup:

gcc-4.8.5-11.el7.ppc64
binutils-2.25.1-22.base.el7.ppc64
glibc-2.17-157.el7.ppc64
glibc-2.17-157.el7.ppc
kernel-3.10.0-508.el7.ppc64

processor	: 0
cpu		: POWER8 (architected), altivec supported
clock		: 3425.000000MHz
revision	: 2.1 (pvr 004b 0201)

timebase	: 512000000
platform	: pSeries
model		: IBM,8247-22L
machine		: CHRP IBM,8247-22L

none/tests/ppc64/jm_vec_isa_2_07.stderr.diff says:

+Process terminating with default action of signal 11 (SIGSEGV)
+ Bad permissions for mapped region at address 0x........
+   at 0x........: test_bcdadd (test_isa_2_07_part1.c:710)
+   by 0x........: test_av_bcd (test_isa_2_07_part1.c:1419)
+   by 0x........: generic_start_main.isra.0 (in /...libc...)
+   by 0x........: (below main)

Up to that point the stdout output looks like expected.

Reproducible: Always
Comment 1 Mark Wielaard 2016-09-21 22:32:20 UTC
Note that running the program itself (not under valgrind) seems fine.
It is only when ran under valgrind that it produces the SIGSEGV:

==28394== Process terminating with default action of signal 11 (SIGSEGV)
==28394==  Bad permissions for mapped region at address 0x3828C978
==28394==    at 0x100016D4: test_bcdadd (test_isa_2_07_part1.c:710)
==28394==    by 0x10001A2B: test_av_bcd (test_isa_2_07_part1.c:1419)
==28394==    by 0x420646B: generic_start_main.isra.0 (in /usr/lib64/libc-2.17.so)
==28394==    by 0x4206693: (below main) (in /usr/lib64/libc-2.17.so)

static int PS_bit;
static void test_bcdadd (void)
{
   if (PS_bit)
      __asm__ __volatile__ ("bcdadd. %0, %1, %2, 1" : "=v" (vec_out): "v" (vec_inA),"v" (vec_inB));
   else
=>      __asm__ __volatile__ ("bcdadd. %0, %1, %2, 0" : "=v" (vec_out): "v" (vec_inA),"v" (vec_inB));
}

=> is line 710.
Comment 2 Carl Love 2016-09-22 22:31:32 UTC
I have some what isolated where the issue when running on Power 7 BE (issue does not occur on LE) is but at this point could use some help getting to the root cause.  

in VEX/priv/guest_ppc_toIR.c in the function dis_av_bcd() there is a case statement that handles the bcdadd and bcdsub instructions.  Once the result of the operation is completed, we have to compute some values to set the CR field 6.  The calculation of the valid bit is as follows:

valid =
            unop( Iop_64to32,
                  binop( Iop_And64,
                         is_BCDstring128( /* Signed */True, mkexpr( vA ) ),
                         is_BCDstring128( /* Signed */True, mkexpr( vB ) ) ) );

Note,  is_BCDstring128() calls a clean helper written in C.   The issue is making the two calls above leads to a segmentation fault.  I rewrote the computation as follows to isolate out the
computations:

          assign( A_isBCDstr, is_BCDstring128( /* Signed */True, mkexpr( vA ) ) );
         //      assign( A_isBCDstr, mkU64( 0 ) );                              
         assign( B_isBCDstr, is_BCDstring128( /* Signed */True, mkexpr( vB ) ) );
         //      assign( B_isBCDstr, mkU64( 0 ) );                              

         valid = unop( Iop_64to32, binop( Iop_And64, mkexpr( A_isBCDstr ), mkexpr( B_isBCDstr ) ) );
Now if I make the call to is_BCDstring128() for vA and vB I get the segmentation fault.  If I comment out either computation and put in the appropriate commented out line to just explicitly set the result to zero, I do not get a segmentation fault.  The segmentation fault occurs later when the IR code is actually executed.  I am guessing that somehow the generation of the IR code with two back to back clean helper calls results in bad code that causes the segmentation fault.  Unfortunately, the issue is a bit deeper in Valgrind then I am familiar with.  I am looking for help to debug this issue.  Please let me know if there is a way we can dump out the generated code and so we can examine it.  Thanks.
Comment 3 Julian Seward 2016-09-23 08:45:35 UTC
This kind of thing could well be due to incorrect register allocation around
the calls, perhaps corrupting the values passed to the calls or corrupting 
values in registers around the call site, that both caller and callee expect the
other to preserve.  Maybe something along those lines.

I tried to reproduce this on gcc112.fsffrance.org (a P8 system) but failed --
it runs OK.  I'd be happy to chase this if I could reproduce on a machine
that I can access.

Failing that .. I'd suggest:

(1) make the smallest possible testcase that segfaults

(2) make sure it still segfaults with --tool=none

(3) with --tool=none, run it with --trace-flags=10001110 and
   --trace-notbelow= set to the appropriate value so you actually
   get output for the failing block.  From that we might be able
   to guess what the problem is.

(4) Find the failing instruction in GDB by setting env var
   VALGRIND_LAUNCHER=    (to nothing)
   and then running
   gdb none/none-ppc64be-linux
   run --tool=none ./path/to/the/test/binary
   and disassemble say 10 insns each side of the failing insn, so we
   can locate it in the JIT generated code.  Be aware that GDB may
   catch segfaults that are unrelated to this crash, prior to the crash
   point; allow it to pass those on to valgrind to be handled.

(5) Post the results from (3) and (4) here.
Comment 4 Julian Seward 2016-09-23 08:57:14 UTC
Comment 3 assumes that the block that segfaults is the same one where
the (we assume) mis-translation occurred.  It might be that some previous
block was mis-translated and causes the simulated machine state to be
corrupted, and only in some later block does it crash.  But that's a less
likely scenario.
Comment 5 Julian Seward 2016-09-23 09:39:50 UTC
Still can't repro it, but with a test case for this insn, the two calls look like
this:

IR and virtual-registerised code:

 -- t127 = 1Sto32(32to1(64to32(And64(is_BCDstring128_helper{0x38174610}(0x1:I64,t117,t118):I64,is_BCDstring128_helper{0x38174610}(0x1:I64,t120,t121):I64))))
li_word %vR721,0x0000000000000001
mr %r3,%vR721
mr %r4,%vR117
mr %r5,%vR118
call: { li_word %r10,0x0000000038174610 ; mtctr r10 ; bctrl [r3,r4,r5,RLPri_Int] }
mr %vR722,%r3
li_word %vR723,0x0000000000000001
mr %r3,%vR723
mr %r4,%vR120
mr %r5,%vR121
call: { li_word %r10,0x0000000038174610 ; mtctr r10 ; bctrl [r3,r4,r5,RLPri_Int] }
mr %vR724,%r3
and %vR720,%vR722,%vR724
andi. %vR725,%vR720,1
cmplwi %cr7,%vR725,1
set (cr7.eq=1),%vR719: { mfcr r0 ; rlwinm %vR719,r0,31,31,31 }
slwi %vR719,%vR719,31
srawi %vR719,%vR719,31
mr %vR127,%vR719

Register allocation around the two calls:

194   li_word %r14,0x0000000000000001
195   mr %r3,%r14
196   mr %r4,%r15
197   mr %r5,%r16
198   call: { li_word %r10,0x0000000038174610 ; mtctr r10 ; bctrl [r3,r4,r5,RLPri_Int] }
199   mr %r14,%r3
200   li_word %r19,0x0000000000000001
201   mr %r3,%r19
202   mr %r4,%r17
203   mr %r5,%r18
204   call: { li_word %r10,0x0000000038174610 ; mtctr r10 ; bctrl [r3,r4,r5,RLPri_Int] }
205   mr %r19,%r3
206   and %r20,%r14,%r19
207   andi. %r14,%r20,1

So the result of the first call is parked in r14, and then we make the second call.  That
seems OK because r14 (in fact, r14 to r28) are callee saved so the result of the first
call won't get trashed.  Also, the args for the second call enter this fragment in 
r19/r18/r17, so we can be fairly sure the first call doesn't trash them.  So there's
nothing *obviously* wrong here that I can see.  We need the real crash block.
Comment 6 Mark Wielaard 2016-09-23 11:52:03 UTC
(In reply to Julian Seward from comment #3)
> I tried to reproduce this on gcc112.fsffrance.org (a P8 system) but failed --
> it runs OK.  I'd be happy to chase this if I could reproduce on a machine
> that I can access.

gcc112 is ppc64le which seems to work fine. Unfortunately the ppc64[be] machines in the GCC compile farm are all power7, not power8 (as gcc112 is).

> (3) with --tool=none, run it with --trace-flags=10001110 and
>    --trace-notbelow= set to the appropriate value so you actually
>    get output for the failing block.  From that we might be able
>    to guess what the problem is.

It is sometimes convenient to do that from gdb. Run with ./vg-in-place --vgdb-error=0 none/tests/ppc64/test_isa_2_07_part1 in one terminal gdb none/tests/ppc64/test_isa_2_07_part1 in another. Attach in gdb with target remote | coregrind/vgdb. continue. It gets the SEGV. Find the address where it crashed. Then use monitor v.translate <address> [<traceflags>] to dump the block. The traceflags are the same. See http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.valgrind-monitor-commands
Comment 7 Will Schmidt 2016-09-23 13:11:35 UTC
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000100016d8 in test_bcdadd () at test_isa_2_07_part1.c:710
710	      __asm__ __volatile__ ("bcdadd. %0, %1, %2, 0" : "=v" (vec_out): "v" (vec_inA),"v" (vec_inB));

(gdb) disas
Dump of assembler code for function test_bcdadd:
   0x0000000010001694 <+0>:	nop
   0x0000000010001698 <+4>:	lwz     r9,-32228(r2)
   0x000000001000169c <+8>:	cmpwi   cr7,r9,0
   0x00000000100016a0 <+12>:	beq     cr7,0x100016c8 <test_bcdadd+52>
   0x00000000100016a4 <+16>:	nop
   0x00000000100016a8 <+20>:	addi    r9,r2,-32328
   0x00000000100016ac <+24>:	li      r10,32
   0x00000000100016b0 <+28>:	lxvd2x  vs32,r10,r9
   0x00000000100016b4 <+32>:	lxvd2x  vs33,0,r9
   0x00000000100016b8 <+36>:	bcdadd. v0,v1,v0,1
   0x00000000100016bc <+40>:	li      r10,16
   0x00000000100016c0 <+44>:	stxvd2x vs32,r9,r10
   0x00000000100016c4 <+48>:	b       0x100016e8 <test_bcdadd+84>
   0x00000000100016c8 <+52>:	nop
   0x00000000100016cc <+56>:	addi    r9,r2,-32328
   0x00000000100016d0 <+60>:	li      r10,32
   0x00000000100016d4 <+64>:	lxvd2x  vs32,r10,r9
=> 0x00000000100016d8 <+68>:	lxvd2x  vs33,0,r9
   0x00000000100016dc <+72>:	bcdadd. v0,v1,v0,0
   0x00000000100016e0 <+76>:	li      r10,16
   0x00000000100016e4 <+80>:	stxvd2x vs32,r9,r10
   0x00000000100016e8 <+84>:	blr
   0x00000000100016ec <+88>:	.long 0x0
   0x00000000100016f0 <+92>:	.long 0x0
   0x00000000100016f4 <+96>:	.long 0x0
End of assembler dump.

(gdb) info reg r10
r10            0x10001958	268441944
(gdb) info reg r9
r9             0x10020f40	268570432
(gdb) info reg vs32
vs32           {uint128 = 0x6090180378642006002244669113354d, v2_double = {0xffffffffffffffff, 0x0}, v4_float = {
    0xffffffff, 0xffffffff, 0x0, 0x0}, v4_int32 = {0x60901803, 0x78642006, 0x224466, 0x9113354d}, v8_int16 = {
    0x6090, 0x1803, 0x7864, 0x2006, 0x22, 0x4466, 0x9113, 0x354d}, v16_int8 = {0x60, 0x90, 0x18, 0x3, 0x78, 
    0x64, 0x20, 0x6, 0x0, 0x22, 0x44, 0x66, 0x91, 0x13, 0x35, 0x4d}}
(gdb) info reg vs33
vs33           {uint128 = 0x8045090189321003001122334556677d, v2_double = {0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 
    0xd66}, v4_int32 = {0x80450901, 0x89321003, 0x112233, 0x4556677d}, v8_int16 = {0x8045, 0x901, 0x8932, 
    0x1003, 0x11, 0x2233, 0x4556, 0x677d}, v16_int8 = {0x80, 0x45, 0x9, 0x1, 0x89, 0x32, 0x10, 0x3, 0x0, 0x11, 
    0x22, 0x33, 0x45, 0x56, 0x67, 0x7d}}
(gdb)
Comment 8 Will Schmidt 2016-09-23 13:21:14 UTC
off-by-one on my disassembly "==>" pointer, the SIGSEGV is actually on the bcdadd instruction.   (showed up differently while single-stepping).

   0x00000000100016d4 <+64>:	lxvd2x  vs32,r10,r9
   0x00000000100016d8 <+68>:	lxvd2x  vs33,0,r9
=> 0x00000000100016dc <+72>:	bcdadd. v0,v1,v0,0
   0x00000000100016e0 <+76>:	li      r10,16

 still poking..
Comment 9 Mark Wielaard 2016-09-23 13:40:50 UTC
Assuming that is with remote vgdb attached then you should be able to get the generated IR and assembly with:
 monitor v.translate 0x00000000100016dc 0b00100001

(See the manual or valgrind --help-debug for the explanation of the --trace-flags)
Comment 10 Will Schmidt 2016-09-23 14:08:28 UTC
Created attachment 101242 [details]
trace flags output

last portion of output from a run in attachment.

$ ./vg-in-place --tool=none  --trace-flags=10001110 --trace-notbelow=2000 none/tests/ppc64/test_isa_2_07_part1 -a

very last bit of output also inline:
377   std %r14,1296(%r31)
378   ld %r14,1304(%r31)
379   li_word %r15,0xFFFFFFFFFFFFFFFC
380   and %r16,%r14,%r15
381   (xIndir) if (always) { std %r16,1296(%r31); imm64 r30,$disp_cp_xindir; mtctr r30; bctr }

VexExpansionRatio 36 1872   520 :10

==4341== 
==4341== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==4341==  Bad permissions for mapped region at address 0x3824B2B0
==4341==    at 0x100016D8: test_bcdadd (test_isa_2_07_part1.c:710)
==4341==    by 0x10001A2F: test_av_bcd (test_isa_2_07_part1.c:1419)
==4341==    by 0x41E4D87: generic_start_main.isra.0 (libc-start.c:289)
==4341==    by 0x41E4FB7: (below main) (libc-start.c:80)
==4341==
Comment 11 Will Schmidt 2016-09-23 14:36:01 UTC
(In reply to Mark Wielaard from comment #9)
> Assuming that is with remote vgdb attached then you should be able to get
> the generated IR and assembly with:
>  monitor v.translate 0x00000000100016dc 0b00100001
> 
> (See the manual or valgrind --help-debug for the explanation of the
> --trace-flags)

Ok.   So given this: 
Program received signal SIGSEGV, Segmentation fault.
0x00000000100016d8 in test_bcdadd () at test_isa_2_07_part1.c:710
710	      __asm__ __volatile__ ("bcdadd. %0, %1, %2, 0" : "=v" (vec_out): "v" (vec_inA),"v" (vec_inB));
I did the monitor command against the ... 16d8 addr.
(gdb) monitor v.translate 0x00000000100016d8 0b00100001

The output is larger than I had expected...    attaching it momentarily.
Comment 12 Will Schmidt 2016-09-23 14:36:46 UTC
Created attachment 101243 [details]
output from monitor v.translate on segfaulting instruction
Comment 13 Will Schmidt 2016-09-23 15:11:08 UTC
Created attachment 101244 [details]
comment 3 step 3 output

results from comment3 step 3
Comment 14 Carl Love 2016-09-23 15:21:17 UTC
Created attachment 101245 [details]
simple test program

The attachment is a stripped down test for the bcdadd instruction.  It was pulled out of the jm_vec_isa_2_07 test.
Comment 15 Will Schmidt 2016-09-23 15:28:55 UTC
Created attachment 101246 [details]
comment 3 step 3 output respin

comment 3 step 3 respin with --trace-notbelow=2092
Comment 16 Will Schmidt 2016-09-23 16:02:23 UTC
A bit of an info dump from the IRC discussions occurring on this bug:

A gdb session revealed:

Program received signal SIGSEGV, Segmentation fault.
 0x000000003824b2b0 in is_BCDstring128_helper ()
 (gdb) disas 0x000000003824b2b0
 Dump of assembler code for function is_BCDstring128_helper:
 => 0x000000003824b2b0 <+0>: .long 0x0
    0x000000003824b2b4 <+4>: addi    r0,r23,-11904

Disassembly of the function by name shows a different address.
 (gdb) disas is_BCDstring128_helper
 Dump of assembler code for function is_BCDstring128_helper:
    0x000000003817d180 <+0>: cmpdi   cr7,r3,1
    0x000000003817d184 <+4>: li      r3,1

per some disassembly, I believe we have confirmed we just tried to execute the OPD...

000000003824b2b0 g     F .opd   0000000000000080 is_BCDstring128_helper

> objdump -h none/none-ppc64be-linux 

none/none-ppc64be-linux:     file format elf64-powerpc

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .note.gnu.build-id 00000024  0000000038000158  0000000038000158  00000158  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         001c6ed0  0000000038000180  0000000038000180  00000180  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .rodata       00059198  00000000381c7050  00000000381c7050  001c7050  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .eh_frame     00000030  00000000382201e8  00000000382201e8  002201e8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data.rel.ro  00000030  000000003823ffd0  000000003823ffd0  0022ffd0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  5 .data         000025f0  0000000038240000  0000000038240000  00230000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  6 .opd          0000ad98  00000000382425f0  00000000382425f0  002325f0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  7 .got          00000030  000000003824d388  000000003824d388  0023d388  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  8 .bss          009da2a0  000000003824d3c0  000000003824d3c0  0023d3b8  2**4
                  ALLOC
  9 .comment      0000002c  0000000000000000  0000000000000000  0023d3b8  2**0
                  CONTENTS, READONLY
 10 .debug_aranges 00001c20  0000000000000000  0000000000000000  0023d3f0  2**4
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_info   0030b9a3  0000000000000000  0000000000000000  0023f010  2**0
                  CONTENTS, READONLY, DEBUGGING
 12 .debug_abbrev 000156fa  0000000000000000  0000000000000000  0054a9b3  2**0
                  CONTENTS, READONLY, DEBUGGING
 13 .debug_line   00055a2f  0000000000000000  0000000000000000  005600ad  2**0
                  CONTENTS, READONLY, DEBUGGING
 14 .debug_frame  00023b30  0000000000000000  0000000000000000  005b5ae0  2**3
                  CONTENTS, READONLY, DEBUGGING
 15 .debug_str    0002e1c5  0000000000000000  0000000000000000  005d9610  2**0
                  CONTENTS, READONLY, DEBUGGING
 16 .debug_loc    002b2e89  0000000000000000  0000000000000000  006077d5  2**0
                  CONTENTS, READONLY, DEBUGGING
 17 .debug_ranges 00051bb0  0000000000000000  0000000000000000  008ba65e  2**0
                  CONTENTS, READONLY, DEBUGGING
Comment 17 Mark Wielaard 2016-09-23 16:03:10 UTC
Since some of the output looks as if we jump into (or just read from?) the ODP and that gives a permission error/SEGV, could you provide the section and segments of the tool to double check everything is at the right place? eu-readelf -Sl  none-ppc64be-linux
Comment 18 Mark Wielaard 2016-09-23 16:07:14 UTC
Never mind the objdump -h gives the right information. The 0x3824B2B0 is indeed in the middle of the ODP. The ODP is loaded and DATA (not CODE), which I assume means it is readable (so we can load/read the actual function address and jump to that) but not execute it.
Comment 19 Will Schmidt 2016-09-23 20:33:33 UTC
Chased this one around under gdb for a while, looking for where the R2 or the stack frame are getting messed up.    Something seems to be going horribly wrong here, but it's possible valgrind just always does this and I'm totally misunderstanding what I am seeing.

With that said:

It looks like something is going badly in  m_dispatch/dispatch-ppc64be-linux.S, line 225, in or around the vspltisw instruction.   Noting the changes between the backtrace before and the backtrace after the step over that instruction.

(gdb) 
223	        beq     .LafterVMX2
=> 0x0000000038092e88 <.vgPlain_disp_run_translations+320>:	41 82 00 0c	beq     0x38092e94 <.vgPlain_disp_run_translations+332>
(gdb) bt
#0  .vgPlain_disp_run_translations () at m_dispatch/dispatch-ppc64be-linux.S:223
#1  0x00000000380a3114 in run_thread_for_a_while (two_words=two_words@entry=0x802b3fe20, dispatchCtrP=dispatchCtrP@entry=0x802b3fe34, tid=tid@entry=1, 
    alt_host_addr=alt_host_addr@entry=0, use_alt_host_addr=use_alt_host_addr@entry=0 '\000') at m_scheduler/scheduler.c:947
#2  0x00000000380a5a18 in vgPlain_scheduler (tid=<optimized out>) at m_scheduler/scheduler.c:1336
#3  0x00000000380bf3c8 in thread_wrapper (tidW=<optimized out>) at m_syswrap/syswrap-linux.c:103
#4  run_a_thread_NORETURN (tidW=<optimized out>) at m_syswrap/syswrap-linux.c:156
#5  0x0000000000000000 in ?? ()
(gdb) stepi
.vgPlain_disp_run_translations () at m_dispatch/dispatch-ppc64be-linux.S:225
225	        vspltisw 3,0x0  /* generate zero */
=> 0x0000000038092e8c <.vgPlain_disp_run_translations+324>:	10 60 03 8c	vspltisw v3,0
(gdb) bt
#0  .vgPlain_disp_run_translations () at m_dispatch/dispatch-ppc64be-linux.S:225
#1  0x00000000380a3114 in run_thread_for_a_while (two_words=0x0, dispatchCtrP=0x0, tid=<optimized out>, alt_host_addr=0, use_alt_host_addr=<optimized out>)
    at m_scheduler/scheduler.c:947
#2  0x0000000000000000 in ?? ()

(gdb) stepi
226	        mtvscr  3
=> 0x0000000038092e90 <.vgPlain_disp_run_translations+328>:	10 00 1e 44	mtvscr  v3
(gdb) 
230	        stdu    1,-48(1)
=> 0x0000000038092e94 <.vgPlain_disp_run_translations+332>:	f8 21 ff d1	stdu    r1,-48(r1)
(gdb) 
.vgPlain_disp_run_translations () at m_dispatch/dispatch-ppc64be-linux.S:233
233	        mr      31,4      /* r31 (generated code gsp) = r4 */
=> 0x0000000038092e98 <.vgPlain_disp_run_translations+336>:	7c 9f 23 78	mr      r31,r4
(gdb) 
240	        mtctr   5
=> 0x0000000038092e9c <.vgPlain_disp_run_translations+340>:	7c a9 03 a6	mtctr   r5
(gdb) 
241	        bctr
=> 0x0000000038092ea0 <.vgPlain_disp_run_translations+344>:	4e 80 04 20	bctr
(gdb) 
0x0000000803240000 in ?? ()
=> 0x0000000803240000:	83 df 00 08	lwz     r30,8(r31)
(gdb) 

... at which point I suspect we're off in the weeds with our IP/SP.    Did a bit more single-stepping after that,..   Neither I nor gdb are able to map whatever we are executing to anything..  

0x0000000803240000 in ?? ()
=> 0x0000000803240000:	83 df 00 08	lwz     r30,8(r31)
(gdb) 
0x0000000803240004 in ?? ()
=> 0x0000000803240004:	37 de ff ff	addic.  r30,r30,-1
(gdb) 
0x0000000803240008 in ?? ()
=> 0x0000000803240008:	93 df 00 08	stw     r30,8(r31)
(gdb) 
0x000000080324000c in ?? ()
=> 0x000000080324000c:	40 80 00 10	bge     0x80324001c
(gdb) 

eventually we stumble across ppc32g_dirtyhelper_MFSPR_....
(gdb) 
0x00000008032401ec in ?? ()
=> 0x00000008032401ec:	4e 80 04 21	bctrl
(gdb) 
ppc32g_dirtyhelper_MFSPR_268_269 (r269=0) at priv/guest_ppc_helpers.c:98
98	   if (r269) {
=> 0x000000003817ce80 <ppc32g_dirtyhelper_MFSPR_268_269+0>:	2f a3 00 00	cmpdi   cr7,r3,0
   0x000000003817ce84 <ppc32g_dirtyhelper_MFSPR_268_269+4>:	40 9e 00 1c	bne     cr7,0x3817cea0 <ppc32g_dirtyhelper_MFSPR_268_269+32>
....
a bctrl in that takes us back into dispatch-ppc64be-linux.S
(gdb) 
0x000000080324026c in ?? ()
=> 0x000000080324026c:	4e 80 04 21	bctrl
(gdb) 
.vgPlain_disp_cp_chain_me_to_fastEP () at m_dispatch/dispatch-ppc64be-linux.S:435
435	        li   6, VG_TRC_CHAIN_ME_TO_FAST_EP
=> 0x0000000038093034 <.vgPlain_disp_cp_chain_me_to_fastEP+0>:	38 c0 00 33	li      r6,51
(gdb) 
436	        mflr 7
....
eventually we branch into postamble/vgPlain_disp_run_translations...

(gdb) 
442	        b    .postamble
=> 0x0000000038093040 <.vgPlain_disp_cp_chain_me_to_fastEP+12>:	4b ff fe 64	b       0x38092ea4 <.vgPlain_disp_run_translations+348>
(gdb) 
.vgPlain_disp_run_translations () at m_dispatch/dispatch-ppc64be-linux.S:256
256	        addi    1,1,48
Comment 20 Mark Wielaard 2016-09-26 11:49:43 UTC
Created attachment 101295 [details]
Trace for smaller badness2 reproducer

Trace output for a smaller reproducer:

$ cat badness2.c

#include <string.h>

static vector unsigned long long vec_out, vec_inA, vec_inB;

int main ( void )
{
   memset(&vec_inA, 0x12, sizeof(vec_inA));
   memset(&vec_inB, 0x34, sizeof(vec_inB));
   memset(&vec_out, 0x56, sizeof(vec_out));
   __asm__ __volatile__ ("bcdadd. %0, %1, %2, 0"
                         : "=v" (vec_out): "v" (vec_inA),"v" (vec_inB));
   return 0;
}

$ gcc -Winline -Wall -g -O -mregnames -maltivec -m64 -o badness2 badness2.c -mvsx -Wa,-mvsx -mcpu=power8 -Wa,-mpower8
Comment 21 Mark Wielaard 2016-09-26 12:25:14 UTC
Created attachment 101296 [details]
Patch to wrap &is_BCDstring128_helper address in fnptr_to_fnentry

After some debugging Julian suggested this fix. It makes the original testcase PASS. But Julian claims this is impossible :)

There are other calls that arguably should use fnptr_to_fnentry to get/call the actual function entry (instead the function descriptor). So either or testcases are not stress testing things enough, or we still don't fully understand what is going on.
Comment 22 Julian Seward 2016-09-26 12:29:47 UTC
Looking for helper calls in the the whole of guest_ppc_toIR.c, by searching for the
string "mkIRExprVec_", I found the following non-wrapped uses of function
pointers.  They should all be wrapped in fnptr_to_fnentry().  The reason
for my claim of "impossible" in comment 21 is that I would have thought that
the BCD test cases exercise all of these helper functions, so any missing
wrapper would cause the case to segfault.  So maybe we don't really yet
understand what is going on.


static IRExpr * is_BCDstring128 (UInt Signed, IRExpr *src)
                          &is_BCDstring128_helper,

static IRTemp increment_BCDstring (IRExpr *src, IRExpr *carry_in)
                          &increment_BCDstring32_helper,
                          &increment_BCDstring32_helper,
                          &increment_BCDstring32_helper,
                          &increment_BCDstring32_helper,

static IRExpr * convert_to_zoned ( IRExpr *src, IRExpr *upper_byte )
                          &convert_to_zoned_helper,
                          &convert_to_zoned_helper,

static IRExpr * convert_to_national ( IRExpr *src ) {
                          &convert_to_national_helper,
                          &convert_to_national_helper,

static IRExpr * convert_from_zoned ( IRExpr *src ) {
                                        &convert_from_zoned_helper,

static IRExpr * convert_from_national ( IRExpr *src ) {
                          &convert_from_national_helper,

static Bool dis_av_load ( const VexAbiInfo* vbi, UInt theInstr )
                           &ppc64g_dirtyhelper_LVS,
(although this one doesn't matter, because it is LE only.  Still, inconsistent)

same for lvsr a few lines later
Comment 23 Ulrich Weigand 2016-09-26 15:57:50 UTC
Will asked me to look at this bug and comment from the perspective of the Power ABI ...

Now, with the ppc64(be) (or AIX) ABI, to call a function via function pointer you need to do two things:
- read the target functions' TOC pointer from the function descriptor (word 1) and load it into r2, *and*
- read the target functions' entry point address from the function descriptor (word 0) and transfer control to it

(And of course you need to save the old value of r2 before the call and restore it afterwards.  To be fully correct, you also need to load the static chain pointer from word 2 of the function descriptor and load it into r11 -- but that only matters when calling a nested function.)

If you're jumping into a function descriptor (as in comment #19), then you're clearly treating a function descriptor address as if it were an entry point address, which is of course wrong.

However, adding calls to fnptr_to_fnentry at a high level likewise seems wrong, since once you've done that, you've forgotten where the function descriptor was and have no chance of then retrieving the correct TOC value to load into r2 before the call.  And in fact, the crash in comment #7 seems a typical example of what happens when calling a function without setting up its TOC pointer correctly.

I'm not very familiar with valgrind internals, but what I'd have expected to happen is that the function pointer value (i.e. function descriptor address) is being kept around in the IR until that (platform-specific) point where actual function call instructions (bctrl etc.) are emitted; and only at *that* point look into the function descriptor and use it to load both TOC pointer and entry point address.
Comment 24 Julian Seward 2016-09-26 16:09:10 UTC
(In reply to Ulrich Weigand from comment #23)
> However, adding calls to fnptr_to_fnentry at a high level likewise seems
> wrong, since once you've done that, you've forgotten where the function
> descriptor was and have no chance of then retrieving the correct TOC value
> to load into r2 before the call.  And in fact, the crash in comment #7 seems
> a typical example of what happens when calling a function without setting up
> its TOC pointer correctly.

Ulrich, you're right.  Valgrind actually kludges this on ppc64be, by completely
ignoring the TOC pointer question and not saving or restoring r2 across calls.
I suspect that the reason this works is that Valgrind itself is compiled into a
statically linked executable, and so there is only ever one required r2 value,
which is set up at start time and stays the same for ever more.  Does that
sound plausible?
Comment 25 Ulrich Weigand 2016-09-26 16:39:48 UTC
(In reply to Julian Seward from comment #24)
> (In reply to Ulrich Weigand from comment #23)
> > However, adding calls to fnptr_to_fnentry at a high level likewise seems
> > wrong, since once you've done that, you've forgotten where the function
> > descriptor was and have no chance of then retrieving the correct TOC value
> > to load into r2 before the call.  And in fact, the crash in comment #7 seems
> > a typical example of what happens when calling a function without setting up
> > its TOC pointer correctly.
> 
> Ulrich, you're right.  Valgrind actually kludges this on ppc64be, by completely
> ignoring the TOC pointer question and not saving or restoring r2 across calls.
> I suspect that the reason this works is that Valgrind itself is compiled into a
> statically linked executable, and so there is only ever one required r2 value,
> which is set up at start time and stays the same for ever more.  Does that
> sound plausible?

I see.  For the "host" side (valgrind itself), I guess that's a valid approach, as long as you make sure that there is indeed never any shared library involved (not even via dlopen), and the main executable actually only uses a single TOC (the linker in fact supports generating executables using multiple TOCs).  With code generated by a current compiler (in particular using the medium memory model), you'll probably always get single-TOC executables anyway, but if you actually rely on that property, I guess it cannot hurt to make sure by passing --no-multi-toc to the linker.

Of course on the "target" side (the program running under valgrind), you'll have shared libraries and thus multiple TOCs, so if you inject code on that side, you may need to consider reloading the emulated r2 at some point.  That probably isn't an issue with the "helpers" discussed in this bug though.

Given that, simply doing the fnptr_to_fentry actually may just be the right fix.
Comment 26 Carl Love 2016-09-26 16:51:22 UTC
Created attachment 101299 [details]
patch, replace isa2_07 bcdadd/bcdsub with isa2_06 vand inst to we can run bcdadd test on Power 7

Replace bcdadd, bcdsub with vand so we can run bcdadd test on
 a power 7 to test the clean helper issue.

Ignore compiler warnings about unused variables.
Comment 27 Carl Love 2016-09-26 16:53:32 UTC
Created attachment 101300 [details]
binary for bcd_add.c test

Binary for the bcd_test.c program.  Runs under valgrind on Power 7 with the valgrind bcdadd patch previously posted.
Comment 28 Carl Love 2016-09-26 16:58:30 UTC
Posted a patch to replace the bcdadd instruction generation in Valgrind with vand.  This patch allows us to run a bcd add test on a power 7 (big endian) machine to test the ISA 3.0 code that calls is_BCDstring128() function to see if the function pointer stuff for clean helpers is broken on the older big endian machines.   

Also posted the compiled bcd_test.c test program.  I ran the patch and binary bcd_test program on a Power 7.  It does appear to segfault the same as we are seeing on Power 8 BE.  Note, this was done WITH OUT Mark's patch applied.
Comment 29 Carl Love 2016-09-28 17:19:08 UTC
Created attachment 101332 [details]
Patch to fix missing fnptr_to_fnentry() wrapper function calls for clean and dirty helpers

The attached patch fixes the missing fnptr_to_fnentry() calls for the clean and dirty calls.  

This patch is based on patch 101296 by Mark Wielaard but with the fnptr_to_fentry() call added to all of the other clean and dirty handlers where it was missing.  Note, the order of the arguments in the function calls was also changed but that is just a "cosmetic" change

The fnptr_to_fnentry() fix was tested and verified on a Power 7 (Big endian) system by applying patch 101299.  The patch basically allows us to test the control flow for the bcdadd and bcdsub instructions without actually generating the bcdadd or bcdsub instructions on a Power 7.  Instead of the bcdadd and bcdsub instructions being generated, the vand instruction which is supported by Power 7 is generated.  With patch 101299,  the bug in the clean helper was verified on a big endian system.  Then when the fnptr_to_fentry() fix (patch 101296) was applied the clean helper executes correctly.

The full regtest suite has been run with the attached patch applied to the current upstream code base on a Power 8 Big endian and Power 8 Little endian systems.  The regtest suite on the patched upstream code has been tested and verified on an ISA 3.0 Little endian simulator.  The ISA 3.0 Big endian test is currently still running.  Unfortunately, building Valgrind from scratch and running the regtest literally takes all day.  

Please review the patch and let me know if you see any issues.  If everyone approves the patch and the ISA 3.0 Big endian test passes, I will commit the fix.  The fix will need to get staged for the upcoming Valgrind release.
Comment 30 Mark Wielaard 2016-09-30 21:11:45 UTC
(In reply to Carl Love from comment #29)
> Please review the patch and let me know if you see any issues.  If everyone
> approves the patch and the ISA 3.0 Big endian test passes, I will commit the
> fix.  The fix will need to get staged for the upcoming Valgrind release.

Looks good to me and it passes the testsuite on my setup (power8 both ppc64 and ppc64le).
Comment 31 Carl Love 2016-09-30 21:12:45 UTC
Created attachment 101364 [details]
Some BE fixes

The BE simulator runs found a few more issues that cause Valgrind to fail.  These are minor things that were easy to fix.  The attached patch addresses these issues.


The BE testing has also detected that some of the load and store instructions are not loading the data in the expected BE order.  Working on a patch for these instructions.
Comment 32 Carl Love 2016-10-03 15:33:53 UTC
Patch to fix missing fnptr_to_fnentry() wrapper funion calls for clean and dirty helpers was committed,  VEX commit  3251.

The BE fixes for the function FPU_rounding_mode_isOdd(), stxvl and stxvx instructions was committed, VEX commit 3252

Patch for the load and store instructions in BE mode is still pending.
Comment 33 Carl Love 2016-10-07 22:57:12 UTC
Created attachment 101482 [details]
ISA 3.0 BE fixes for various new instructions

This is an additional commit to fix issues found with the 
new Power ISA 3.0 instructions for BE mode.  The instructions
fixed in this patch include: lxvl, lxvx, lxvwsx, lxvh8x, lxvh16x,
stxvx, stxvh8x, stxvh16x, lxsibzx, lxsihzx, xscvqpdp, xscvqpdp0,
xvcvsphp.
Comment 34 Carl Love 2016-10-07 22:58:21 UTC
ISA 3.0 BE fixes for various new instructions patch committed   VEX commit 3260
Comment 35 Carl Love 2016-10-07 23:12:14 UTC
testsuite fix to give more unique values.

In testing issues with the new ISA 3.0 instructions in BE mode, it was
found that we needed some more unique values in the operands to catch
various errors.  The issue is a sigle 32-bit value was replicated four
times for a V128 operand.  The result is testing loads and stores where
the word or half word order was swizzled couln't be detected because
they were the same.  By making the 32-bit chunks unique we were able
to catch additional errors.
 
The initialize_buffer() function in none/tests/ppc64/ppc64_helpers.h was 
changed to ensure we got more unique values.  The result is the ISA 3.0 expected
output files changed.  The expect files were for LE.  The expect files were
updated for the BE results and the LE result file was added with the "-LE"
on the end of the file name per convention.  The patch is large so I did not attach it
to the bug.

Committed in Valgrind commit 16032