Bug 371966 - No uninitialised values reported with PGI -Mstack_arrays
Summary: No uninitialised values reported with PGI -Mstack_arrays
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.11.0
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-02 07:11 UTC by Carl Ponder
Modified: 2016-12-01 10:39 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
Simple Fortran test-case using array with dynamic bound. (1.77 KB, text/x-fortran)
2016-11-02 07:11 UTC, Carl Ponder
Assembly generated without stack-arrays, where valgrind works (18.36 KB, text/plain)
2016-11-23 12:59 UTC, Carl Ponder
Assembly generated with stack arrays, where valgrind doesn't work (18.20 KB, text/plain)
2016-11-23 13:00 UTC, Carl Ponder

Description Carl Ponder 2016-11-02 07:11:25 UTC
Created attachment 101954
Simple Fortran test-case using array with dynamic bound.

I have a simple Fortran test-case that allocates an array and uses uninitialized values from it. Using the PGI compiler, if I compile it using the -Mstack_arrays option, valgrind reports 0 errors.

I also have a HUGE program (WRF) where valgrind is likewise not reporting anything in spite of the fact that uninitialized array-elements are being used, so I'm trying to track down issues like this one.

Can you guys explain what's going on? I'm also checking with PGI on this.
Comment 1 Carl Ponder 2016-11-02 07:15:45 UTC
I attached the test-case here. You can reproduce the issue as follows:

pgfortran -o test03.pgi test03.f90 -O0 -gopt
valgrind test03.pgi                     # 12 errors.

pgfortran -o test03.pgi test03.f90 -O0 -gopt -Mstack_arrays
valgrind test03.pgi                     # 0 errors.

I'm using the PGI 16.9 compiler running on CentOS 7.2. The valgrind was built with GCC 4.8.5.
Comment 2 Philippe Waroquiers 2016-11-02 20:17:32 UTC
(In reply to Carl Ponder from comment #0)
> Created attachment 101954 [details]
> Simple Fortran test-case using array with dynamic bound.
> 
> I have a simple Fortran test-case that allocates an array and uses
> uninitialized values from it. Using the PGI compiler, if I compile it using
> the -Mstack_arrays option, valgrind reports 0 errors.
> 
> I also have a HUGE program (WRF) where valgrind is likewise not reporting
> anything in spite of the fact that uninitialized array-elements are being
> used, so I'm trying to track down issues like this one.
> 
> Can you guys explain what's going on? I'm also checking with PGI on this.
No idea, and when I try to reproduce on my Debian box, it says:
No command 'pgfortran' found, did you mean:
 Command 'gfortran' from package 'gfortran' (main)


So, here is what I suggest:
Compile your application with debugging information.

Then use gdb+vgdb to step in your application
(see http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
for more information)

Use the xb monitor command: 
  xb <addr> [<len>] shows the definedness (V) bits and values for <len> (default 1) bytes starting at <addr>
to see at which moment the memory of x(6..10) becomes initialised.
You should probably use --vgdb=full to be sure to step precisely (and maybe even
use stepi when relevant).
Comment 3 Carl Ponder 2016-11-02 22:53:25 UTC
This "pgfortran" is the PGI Fortran compiler.
What I'm puzzled about is why valgrind is finding more uninitialized array-elements when I compiled with gfortran than with pgfortran, and if I use

pgfortran -O0 -gopt -Mstack_arrays ...

valgrind doesn't find any uninitialized array-elements at all.
So this "gdb+vgdb" will show me the valgrind internal tables that keep track of what's initialized and what isn't?
Comment 4 Carl Ponder 2016-11-03 00:51:13 UTC
Can you please list out the commands more precisely?
I ran these commands in one window:

      module purge
      module load pgi/16.9
      module load gcc/4.8.5
      module load valgrind

      pgfortran -o test03.pgi test03.f90 -O0 -gopt -Mstack_arrays
      valgrind --tool=memcheck --vgdb=full --vgdb-error=0 test03.pgi

Then in the second window I ran these commands:

      module purge
      module load pgi/16.9
      module load gcc/4.8.5
      module load valgrind

      gdb test03.pgi
      target remote | vgdb

      b 77
      c

so far so good. But now:

      print N

gives

      Cannot access memory at address 0x4011a0000000

Why is this? And

      print x(1)

gives

      value being subranged must be in memory

And

      xb 0x4011a0000000

gives

      Undefined command: "xb".  Try "help".
Comment 5 Philippe Waroquiers 2016-11-03 06:20:17 UTC
(In reply to Carl Ponder from comment #3)
> This "pgfortran" is the PGI Fortran compiler.
> What I'm puzzled about is why valgrind is finding more uninitialized
> array-elements when I compiled with gfortran than with pgfortran, and if I
> use
> 
> pgfortran -O0 -gopt -Mstack_arrays ...
> 
> valgrind doesn't find any uninitialized array-elements at all.
> So this "gdb+vgdb" will show me the valgrind internal tables that keep track
> of what's initialized and what isn't?
The best is to read the manual:
see http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
and the section in the memcheck part describing the memcheck specific
monitor commands
http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.monitor-commands

The manual is (supposed to be) clear/complete and understandable, but
can for sure always be improved (so, file a new bug if something is not clear)

Thanks
Comment 6 Philippe Waroquiers 2016-11-03 06:42:00 UTC
(In reply to Carl Ponder from comment #4)
> Can you please list out the commands more precisely?
> I ran these commands in one window:
> 
>       module purge
>       module load pgi/16.9
>       module load gcc/4.8.5
>       module load valgrind
> 
>       pgfortran -o test03.pgi test03.f90 -O0 -gopt -Mstack_arrays
>       valgrind --tool=memcheck --vgdb=full --vgdb-error=0 test03.pgi
> 
> Then in the second window I ran these commands:
> 
>       module purge
>       module load pgi/16.9
>       module load gcc/4.8.5
>       module load valgrind
> 
>       gdb test03.pgi
>       target remote | vgdb
> 
>       b 77
>       c
> 
> so far so good. But now:
> 
>       print N
> 
> gives
> 
>       Cannot access memory at address 0x4011a0000000
Strange.
Do you see the same when debugging test03.pgi natively ?
   (i.e. when not using target remote | vgdb)?

Maybe gdb does not properly understand the debugging info
generated by pgfortran ?
If gdb can properly print e.g. N when natively debugging
but cannot when using target remote, then that looks
like a bug (in gdb and/or in valgrind gdbserver)
What is the version of gdb you are using ?


> 
> Why is this? And
> 
>       print x(1)
> 
> gives
> 
>       value being subranged must be in memory
I guess the problem here is similar to the print N.

An alternative is to modify your program so that it prints
the addresses of the variables to examine.
Then you should be able to use xb monitor command without
having to use e.g. (gdb) print &X(1)

> 
> And
> 
>       xb 0x4011a0000000
> 
> gives
> 
>       Undefined command: "xb".  Try "help".
See valgrind user manual, explaining what is a monitor command
and how to use them.
Basically, a monitor command is a string that gdb will send
to the remote gdbserver. This string is sent by gdb using
   'monitor'
e.g.
(gdb) monitor xb 0x1234
The manual explains it all, and gives examples.
Comment 7 Carl Ponder 2016-11-03 13:00:17 UTC
Ok here's better -- I can see the data if I compile using "-O0 -g" rather than "-O0 -gopt", which I'd assumed would be the same thing.
Here's what I'm seeing in the step-through: at line 77, the array contains

      (gdb) print x
      $1 = (0, 1, 2, 3, 4, 0, 69349896, 0, 19, 0)

where x(6:10) are uninitialized values. Here are the bits for the 40-byte range of x:

(gdb) print &x
$6 = (PTR TO -> ( integer (10))) 0xffeffed90
(gdb) monitor xb 0xffeffed90 40
		  00	  00	  00	  00	  00	  00	  00	  00
0xFFEFFED90:	0x00	0x00	0x00	0x00	0x01	0x00	0x00	0x00
		  00	  00	  00	  00	  00	  00	  00	  00
0xFFEFFED98:	0x02	0x00	0x00	0x00	0x03	0x00	0x00	0x00
		  00	  00	  00	  00	  00	  00	  00	  00
0xFFEFFEDA0:	0x04	0x00	0x00	0x00	0x00	0x00	0x00	0x00
		  00	  00	  00	  00	  00	  00	  00	  00
0xFFEFFEDA8:	0x08	0x32	0x22	0x04	0x00	0x00	0x00	0x00
		  00	  00	  00	  00	  00	  00	  00	  00
0xFFEFFEDB0:	0x13	0x00	0x00	0x00	0x00	0x00	0x00	0x00

This doesn't look right to me, given that x(4) is assigned but x(8) is not:

(gdb) print x(4)
$18 = 3
(gdb) print &x(4)
$19 = (PTR TO -> ( integer )) 0xffeffed9c
(gdb) monitor xb 0xffeffed9c 4
		  00	  00	  00	  00
0xFFEFFED9C:	0x03	0x00	0x00	0x00

(gdb) print x(8)
$20 = 0
(gdb) print &x(8)
$21 = (PTR TO -> ( integer )) 0xffeffedac
(gdb) monitor xb 0xffeffedac 4
		  00	  00	  00	  00
0xFFEFFEDAC:	0x00	0x00	0x00	0x00

Based on the explanation in the manual (a V bit of 0 means defined, 1 means undefined), I would expect all the bytes to show 00 for X(1:5) and FF for the rest.
Comment 8 Carl Ponder 2016-11-03 13:04:40 UTC
If I *don't* compile with the -Mstack_arrays, I get this at line 77 instead:

(gdb) print x
$1 = (0, 1, 2, 3, 4, 0, 0, 0, 0, 0)
(gdb) print &x
$2 = (PTR TO -> ( integer (10))) 0x70881d0

(gdb) monitor xb 0x70881d0 40
		  00	  00	  00	  00	  00	  00	  00	  00
0x70881D0:	0x00	0x00	0x00	0x00	0x01	0x00	0x00	0x00
		  00	  00	  00	  00	  00	  00	  00	  00
0x70881D8:	0x02	0x00	0x00	0x00	0x03	0x00	0x00	0x00
		  00	  00	  00	  00	  ff	  ff	  ff	  ff
0x70881E0:	0x04	0x00	0x00	0x00	0x00	0x00	0x00	0x00
		  ff	  ff	  ff	  ff	  ff	  ff	  ff	  ff
0x70881E8:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
		  ff	  ff	  ff	  ff	  ff	  ff	  ff	  ff
0x70881F0:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
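The dumps are easier to compare per array element than per byte. Below is a minimal Python sketch, not part of valgrind, that reads `monitor xb` output in the layout shown in this report. The function names `parse_xb` and `element_report` are made up for illustration. It assumes 4-byte little-endian integers and the memcheck convention that a V bit of 0 means defined and 1 means undefined, with `__` marking a non-addressable byte.

```python
def parse_xb(text):
    """Split `monitor xb` output into per-byte V bits and per-byte values.

    Assumes the layout shown in this report: a line of V-bit bytes followed
    by a line starting with `<address>:` that holds the data bytes.
    """
    vbits, values = [], []
    for line in text.strip().splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].endswith(":"):
            # Data line: skip the address, keep the 0x.. byte values.
            values.extend(int(tok, 16) for tok in tokens[1:])
        else:
            # V-bit line: "__" marks a non-addressable byte, kept as None.
            vbits.extend(None if tok == "__" else int(tok, 16) for tok in tokens)
    return vbits, values


def element_report(vbits, values, elem_size=4):
    """Return (index, value, status) per element, using 1-based indices."""
    report = []
    for start in range(0, len(values), elem_size):
        v = vbits[start:start + elem_size]
        raw = bytes(values[start:start + elem_size])
        value = int.from_bytes(raw, "little", signed=True)
        if any(b is None for b in v):
            status = "noaccess"
        elif any(v):
            status = "undefined"
        else:
            status = "defined"
        report.append((start // elem_size + 1, value, status))
    return report


# Dump from comment 8 (heap allocation, compiled without -Mstack_arrays).
HEAP_DUMP = """
            00   00   00   00   00   00   00   00
0x70881D0:  0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00
            00   00   00   00   00   00   00   00
0x70881D8:  0x02 0x00 0x00 0x00 0x03 0x00 0x00 0x00
            00   00   00   00   ff   ff   ff   ff
0x70881E0:  0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00
            ff   ff   ff   ff   ff   ff   ff   ff
0x70881E8:  0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
            ff   ff   ff   ff   ff   ff   ff   ff
0x70881F0:  0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
"""

if __name__ == "__main__":
    for index, value, status in element_report(*parse_xb(HEAP_DUMP)):
        print(f"x({index}) = {value}: {status}")
```

For the heap dump this marks x(1) through x(5) as defined and x(6) through x(10) as undefined. Fed the -Mstack_arrays dump from comment 7 instead, the same code reports all ten elements as defined, including the junk value x(7) = 69349896.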
Comment 9 Philippe Waroquiers 2016-11-03 20:45:59 UTC
(In reply to Carl Ponder from comment #8)
> If I *don't* compile with the -Mstack_arrays, I get this at line 77 instead:
> 
> (gdb) print x
> $1 = (0, 1, 2, 3, 4, 0, 0, 0, 0, 0)
> (gdb) print &x
> $2 = (PTR TO -> ( integer (10))) 0x70881d0
> 
> (gdb) monitor xb 0x70881d0 40
> 		  00	  00	  00	  00	  00	  00	  00	  00
> 0x70881D0:	0x00	0x00	0x00	0x00	0x01	0x00	0x00	0x00
> 		  00	  00	  00	  00	  00	  00	  00	  00
> 0x70881D8:	0x02	0x00	0x00	0x00	0x03	0x00	0x00	0x00
> 		  00	  00	  00	  00	  ff	  ff	  ff	  ff
> 0x70881E0:	0x04	0x00	0x00	0x00	0x00	0x00	0x00	0x00
> 		  ff	  ff	  ff	  ff	  ff	  ff	  ff	  ff
> 0x70881E8:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
> 		  ff	  ff	  ff	  ff	  ff	  ff	  ff	  ff
> 0x70881F0:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00

So, the code generated is different. You should now debug at asm instruction
level, using e.g.
   disp /i $pc
 then repeat
   stepi
   monitor xb ...
till you identify which instruction is effectively initialising the array.

At this point, nothing seems abnormal on valgrind side.

So very probably the compiler is generating some code that initialises this
memory. You should discuss with the compiler people to ask why.
Comment 10 Carl Ponder 2016-11-03 21:14:07 UTC
Stopping at line 70 puts it right after the array-allocation but before the array-writes are happening:

     62   implicit none
     63   integer, intent(in) :: N
     64   integer ( kind = 4 ) i
     65   integer ( kind = 4 ) :: x(1:N)
     66 
     67 !
     68 !  X = { 0, 1, 2, 3, 4, ?a, ?b, ?c, ?d, ?e }.
     69 !
     70   do i = 1, 5

The data-state still says initialized, even though the array contains junk values:

(gdb) print x
$2 = (40, 0, 117993993, 0, 117993992, 0, 69349896, 0, 19, 0)
(gdb) print &x
$3 = (PTR TO -> ( integer (10))) 0xffeffed90
(gdb) monitor xb 0xffeffed90 40
		  00	  00	  00	  00	  00	  00	  00	  00
0xFFEFFED90:	0x28	0x00	0x00	0x00	0x00	0x00	0x00	0x00
		  00	  00	  00	  00	  00	  00	  00	  00
0xFFEFFED98:	0x09	0x72	0x08	0x07	0x00	0x00	0x00	0x00
		  00	  00	  00	  00	  00	  00	  00	  00
0xFFEFFEDA0:	0x08	0x72	0x08	0x07	0x00	0x00	0x00	0x00
		  00	  00	  00	  00	  00	  00	  00	  00
0xFFEFFEDA8:	0x08	0x32	0x22	0x04	0x00	0x00	0x00	0x00
		  00	  00	  00	  00	  00	  00	  00	  00
0xFFEFFEDB0:	0x13	0x00	0x00	0x00	0x00	0x00	0x00	0x00

I'm checking with the compiler guys on this.
Comment 11 Carl Ponder 2016-11-22 17:36:25 UTC
Back to comment #9, there *is* no instruction initializing the array, which is why it has some junk entries, regardless of valgrind's lack of mention.

Talking to the PGI people, the -Mstack_arrays flag causes the local arrays to be allocated on the stack, so the allocation is just a matter of adjusting the stack-pointer, rather than invoking "malloc" or equivalent.

Does valgrind work by intercepting the malloc calls and then tabulating the uninitialized memory-cells? And if the arrays are allocated off of the stack in gfortran or gcc, how would valgrind keep track of this?
Comment 12 Julian Seward 2016-11-22 17:51:16 UTC
Probably your least-worst option at this point is to compile the test
program in the configuration where the errors are not reported, and hope
that it all gets compiled into a single function (which it looks like
it will).  Then disassemble it and maybe we can see if the compiler
stuck in some instructions to zero out the array on the stack before
use.  That strikes me as the most likely outcome.
Comment 13 Carl Ponder 2016-11-22 19:33:58 UTC
Given that there's junk in the array, I know that the contents aren't being zero'd out, and the PGI people confirm that -Mstack_arrays are not initialized. How does valgrind recognize that an array is being initialized under the circumstances? Is it following the control-flow instruction-by-instruction?
Comment 14 Philippe Waroquiers 2016-11-22 20:02:36 UTC
(In reply to Carl Ponder from comment #13)
> Given that there's junk in the array, I know that the contents aren't being
> zero'd out, and the PGI people confirm that -Mstack_arrays are not
> initialized. How does valgrind recognize that an array is being initialized
> under the circumstances? Is it following the control-flow
> instruction-by-instruction?

For arrays allocated on the heap, the memory is marked uninitialised
when allocated.

For arrays on the stack (more generally, for all stack variables),
the variables are marked uninitialised when the stack pointer is decreased
to create the frame.

So, what might be happening with the Fortran compiler is that
it does not decrease/increase the SP for each function call
and/or for each scope, or something similar.
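The stack rule described in this comment can be sketched as a toy model. The Python below is only an illustration of that description, not of valgrind's actual implementation. The class `ToyStackShadow`, the helper `frame_with_local_array`, and the stack address 0x1000 are all invented for the example, and the model ignores details such as the x86_64 red zone.

```python
NOACCESS, UNDEFINED, DEFINED = "noaccess", "undefined", "defined"


class ToyStackShadow:
    """Per-byte shadow state for a downward-growing stack (toy model only)."""

    def __init__(self, stack_top):
        self.sp = stack_top
        self.state = {}  # address -> state; absent addresses count as NOACCESS

    def move_sp(self, new_sp):
        if new_sp < self.sp:
            # SP decreased: newly exposed bytes are addressable but undefined.
            for addr in range(new_sp, self.sp):
                self.state[addr] = UNDEFINED
        else:
            # SP increased: released bytes are no longer part of the stack.
            for addr in range(self.sp, new_sp):
                self.state[addr] = NOACCESS
        self.sp = new_sp

    def store(self, addr, size):
        # A store makes the written bytes defined.
        for a in range(addr, addr + size):
            self.state[a] = DEFINED

    def status(self, addr):
        return self.state.get(addr, NOACCESS)


def frame_with_local_array(nbytes, initialised_bytes):
    """Lower SP by nbytes, store to the first initialised_bytes, report all."""
    shadow = ToyStackShadow(stack_top=0x1000)
    shadow.move_sp(shadow.sp - nbytes)
    shadow.store(shadow.sp, initialised_bytes)
    return [shadow.status(shadow.sp + off) for off in range(nbytes)]


if __name__ == "__main__":
    # 40-byte array, first 20 bytes written (like x(1:5) in the test-case).
    states = frame_with_local_array(nbytes=40, initialised_bytes=20)
    print(states[0], states[20])
```

Under this model a 40-byte local array whose first five 4-byte elements are written would end up with 20 defined bytes followed by 20 undefined ones, which is the pattern seen in the heap case but not in the -Mstack_arrays dumps.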
Comment 15 Carl Ponder 2016-11-23 12:59:24 UTC
Created attachment 102408
Assembly generated without stack-arrays, where valgrind works
Comment 16 Carl Ponder 2016-11-23 13:00:06 UTC
Created attachment 102409
Assembly generated with stack arrays, where valgrind doesn't work
Comment 17 Carl Ponder 2016-11-23 13:11:37 UTC
I uploaded the two assembly-files. From the "sdiff", I think this is where the allocations vary:

              -Mnostack_arrays                             -Mstack_arrays
        --------------------------------           -------------------------------
    494 ..Dcfi3:                                   ..Dcfi3:
    495         subq    $48, %rsp                |         subq    $32, %rsp
    496         movq    %rbx, -24(%rbp)          |         movq    %rbx, -16(%rbp)
    497         movq    %r12, -32(%rbp)          |         movq    %r12, -24(%rbp)
    498         movq    %r13, -40(%rbp)          |         movq    %r13, -32(%rbp)
    499 ##  lineno: 38                             ##  lineno: 38
    500         movq    %rdi, %rbx                         movq    %rdi, %rbx
    501         movl    (%rbx), %eax                       movl    (%rbx), %eax
    502         movl    %eax, -16(%rbp)          |         movl    %eax, -8(%rbp)
    503         movslq  -16(%rbp), %rax          |         movslq  -8(%rbp), %rdi
    504         movq    %rax, -8(%rbp)           |         shlq    $2, %rdi
    505         leaq    -8(%rbp), %rdi           |         call    __builtin_aa
                xorl    %eax, %eax               <
                movl    $.C2_299, %esi           <
                call    pgf90_auto_alloc04       <
                movq    %rax, %r12                         movq    %rax, %r12

(I'm including the line-numbers, up to the point where they correspond between the two files).
I'm guessing that these pgf90_auto_alloc04 / __builtin_aa are performing the allocations, I'll check with PGI on this.
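In the -Mstack_arrays column, the value left in %rdi before the call is N sign-extended and shifted left by 2, that is N * 4, which is the array size in bytes for 4-byte integers. That %rdi carries a byte count to __builtin_aa is an assumption based on the usual x86-64 calling convention, and the thread does not confirm it. A small Python check of the arithmetic (the helper name is hypothetical), using the 10-element array from the gdb sessions above:

```python
def stack_array_bytes(n, elem_shift=2):
    """Mirror `movslq -8(%rbp), %rdi` followed by `shlq $2, %rdi`."""
    # Shifting left by 2 multiplies by 4, the size of one integer element.
    return n << elem_shift


if __name__ == "__main__":
    # N = 10 elements, as in the `integer (10)` type gdb prints for x.
    print(stack_array_bytes(10))
```

For N = 10 this gives 40, the same length as the 40-byte range examined with `monitor xb`.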
Comment 18 Carl Ponder 2016-11-30 19:05:39 UTC
PGI confirms that this call to "__builtin_aa" is what's bumping the stack pointer. It's a subroutine inside the PGI runtime.

Does valgrind have a way for us to intercept this subroutine-call and then mark the array-elements as being uninitialized? I think this would solve the problem for us.
Comment 19 Tom Hughes 2016-11-30 19:53:15 UTC
They should already be marked as uninitialised when __builtin_aa adjusts the stack pointer - the problem is that they will then be changed to inaccessible when it returns because the caller is not normally supposed to rely on stack values allocated by the callee.

Basically that routine is not ABI compliant by the sounds of it, which may be fine for something generated by the compiler, but it creates problems for external tools like valgrind.

In principle it should be possible to intercept it though, so long as it appears in the symbol table.
Comment 20 Tom Hughes 2016-11-30 19:56:01 UTC
Actually, given that the return from the call will unwind the stack again, the caller will be accessing values below the stack pointer. That is unsafe if a signal fires, as the signal may trash the stack below the stack pointer (there's a small extra redzone below sp that is safe on x86_64 but not on x86_32).
Comment 21 Julian Seward 2016-11-30 20:08:52 UTC
(In reply to Tom Hughes from comment #19)
My assumption about what __builtin_aa does is: it moves RSP down by the
specified amount, zeroes out the new area, and then returns.  Except ..
how does it return?  It must have to copy its own return address to just
below the newly allocated area, and only then return.

It would be possible to intercept it, but you'd have to hand-write a
replacement in assembly, since the above isn't doable in C.
Comment 22 Carl Ponder 2016-11-30 20:42:02 UTC
I know they're not zeroing out the space.
As far as trying to intercept the subroutine-call, I've worked a little on this level

      coregrind/m_syswrap

but these only intercept system-calls, right?
And you're saying that there's no analogous convention for me to intercept calls into the PGI runtime and record the uninitialized data state, right?
Comment 23 Julian Seward 2016-11-30 20:55:23 UTC
(In reply to Carl Ponder from comment #22)
> I know they're not zeroing out the space.

That doesn't sync with my understanding of the discussion above.

I think your chances of getting a definitive answer are low
without providing an executable test case, with symbols, that 
we can try out.
Comment 24 Carl Ponder 2016-12-01 10:39:11 UTC
I can upload an executable, or I can give you the source-code for the test and instructions on how to build and run it.
You'd still need to have the PGI runtime installed. I can help you get a demo copy if you need.

About the zeroing of the space, (a) I can see there's nonzero junk in the array, and (b) PGI insists that they don't zero-out stack arrays. Why do you keep insisting that they do? NVIDIA owns PGI and I've been in weekly con-calls with their compiler developers for the last 5 years.