Created attachment 101954: Simple Fortran test-case using an array with a dynamic bound.

I have a simple Fortran test-case that allocates an array and uses uninitialized values from it. With the PGI compiler, if I compile it using the -Mstack_arrays option, valgrind reports 0 errors.

I also have a HUGE program (WRF) where valgrind likewise reports nothing, even though uninitialized array elements are being used, so I'm trying to track down issues like this one.

Can you guys explain what's going on? I'm also checking with PGI on this.
I attached the test-case here. You can reproduce the issue as follows:

    pgfortran -o test03.pgi test03.f90 -O0 -gopt
    valgrind test03.pgi        # 12 errors

    pgfortran -o test03.pgi test03.f90 -O0 -gopt -Mstack_arrays
    valgrind test03.pgi        # 0 errors

I'm using the PGI 16.9 compiler on CentOS 7.2. Valgrind was built with GCC 4.8.5.
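To compare the two builds without reading through the full logs, the count can be pulled from memcheck's closing summary line. A minimal Python sketch, assuming the usual "==PID== ERROR SUMMARY: N errors from M contexts ..." format and that each run's output was saved to a file (for example with valgrind's --log-file option); the helper name error_count is hypothetical.

```python
import re

# memcheck's closing line looks like:
#   ==1234== ERROR SUMMARY: 12 errors from 12 contexts (suppressed: 0 from 0)
_SUMMARY = re.compile(r"ERROR SUMMARY: (\d+) errors? from (\d+) contexts?")


def error_count(log_text: str) -> int:
    """Return the error count from the last ERROR SUMMARY line in a valgrind log.

    Raises ValueError if no summary line is found (e.g. the run was killed).
    """
    matches = _SUMMARY.findall(log_text)
    if not matches:
        raise ValueError("no memcheck ERROR SUMMARY line found")
    errors, _contexts = matches[-1]
    return int(errors)


if __name__ == "__main__":
    import sys

    # Usage: python error_count.py plain.log stack_arrays.log
    for path in sys.argv[1:]:
        with open(path) as f:
            print(path, error_count(f.read()))
```

Per the counts reported above, this should give 12 for the plain build and 0 for the -Mstack_arrays build.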
(In reply to Carl Ponder from comment #0)
> Can you guys explain what's going on? I'm also checking with PGI on this.

No idea, and when I try to reproduce on my Debian box, it tells me:
  No command 'pgfortran' found, did you mean:
   Command 'gfortran' from package 'gfortran' (main)

So, here is what I suggest:
Compile your application with debugging information.
Then use gdb+vgdb to step through your application (see http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver for more information).
Use the xb monitor command:
  xb <addr> [<len>]
      shows the definedness (V) bits and values for <len> (default 1)
      bytes starting at <addr>
to see at which moment the memory of x(6..10) becomes initialised.
You should probably use --vgdb=full to be sure of stepping precisely (and maybe even use stepi when relevant).
This "pgfortran" is the PGI Fortran compiler.
What puzzles me is why valgrind finds more uninitialized array elements when I compile with gfortran than with pgfortran, and why, if I use

    pgfortran -O0 -gopt -Mstack_arrays ...

valgrind doesn't find any uninitialized array elements at all.
So will this "gdb+vgdb" approach show me valgrind's internal tables that keep track of what's initialized and what isn't?
Can you please list the commands more precisely? I ran these commands in one window:

    module purge
    module load pgi/16.9
    module load gcc/4.8.5
    module load valgrind

    pgfortran -o test03.pgi test03.f90 -O0 -gopt -Mstack_arrays
    valgrind --tool=memcheck --vgdb=full --vgdb-error=0 test03.pgi

Then in a second window I ran these commands:

    module purge
    module load pgi/16.9
    module load gcc/4.8.5
    module load valgrind

    gdb test03.pgi
    (gdb) target remote | vgdb
    (gdb) b 77
    (gdb) c

So far so good. But now

    (gdb) print N

gives

    Cannot access memory at address 0x4011a0000000

Why is this? And

    (gdb) print x(1)

gives

    value being subranged must be in memory

And

    (gdb) xb 0x4011a0000000

gives

    Undefined command: "xb". Try "help".
(In reply to Carl Ponder from comment #3)
> So this "gdb+vgdb" will show me the valgrind internal tables that keep track
> of what's initialized and what isn't?

The best thing is to read the manual: see http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver and the section of the memcheck chapter describing the memcheck-specific monitor commands: http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.monitor-commands
The manual is (supposed to be) clear, complete and understandable, but it can certainly always be improved (so file a new bug if something is not clear).
Thanks
(In reply to Carl Ponder from comment #4)
> Can you please list out the commands more precisely?
> [...]
> gdb test03.pgi
> target remote | vgdb
>
> b 77
> c
>
> so far so good. But now:
>
> print N
>
> gives
>
> Cannot access memory at address 0x4011a0000000

Strange. Do you see the same thing when debugging test03.pgi natively (i.e. when not using target remote | vgdb)? Maybe gdb does not properly understand the debugging info generated by pgfortran? If gdb can properly print e.g. N when debugging natively but cannot when using target remote, then that looks like a bug (in gdb and/or in the valgrind gdbserver). What version of gdb are you using?

>
> Why is this? And
>
> print x(1)
>
> gives
>
> value being subranged must be in memory

I guess the problem here is similar to the print N one. An alternative is to modify your program so that it prints the addresses of the variables to examine. Then you should be able to use the xb monitor command without having to use e.g.
  (gdb) print &X(1)

>
> And
>
> xb 0x4011a0000000
>
> gives
>
> Undefined command: "xb". Try "help".

See the valgrind user manual, which explains what a monitor command is and how to use it. Basically, a monitor command is a string that gdb sends to the remote gdbserver. gdb sends this string using 'monitor', e.g.
  (gdb) monitor xb 0x1234
The manual explains it all, and gives examples.
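Once the base address of x is known, the arguments for monitor xb follow from the element size. A small Python sketch, assuming x is integer(kind=4) (4 bytes per element) with lower bound 1, as in the test case; the helper names are hypothetical. It only prints the gdb command to type; nothing here talks to gdb.

```python
def element_address(base: int, i: int, elem_size: int = 4, lower: int = 1) -> int:
    """Byte address of x(i) in a contiguous 1-D Fortran array whose first
    element x(lower) starts at `base`."""
    return base + (i - lower) * elem_size


def xb_command(base: int, first: int, last: int,
               elem_size: int = 4, lower: int = 1) -> str:
    """gdb command that dumps the V bits of x(first:last)."""
    start = element_address(base, first, elem_size, lower)
    length = (last - first + 1) * elem_size
    return f"monitor xb {start:#x} {length}"


if __name__ == "__main__":
    base = 0xFFEFFED90  # example only: substitute what `print &x` reports
    print(xb_command(base, 6, 10))  # prints: monitor xb 0xffeffeda4 20
```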
OK, here's something better: I can see the data if I compile using "-O0 -g" rather than "-O0 -gopt", which I'd assumed would be the same thing. Here's what I'm seeing in the step-through. At line 77, the array contains

    (gdb) print x
    $1 = (0, 1, 2, 3, 4, 0, 69349896, 0, 19, 0)

where x(6:10) are uninitialized values. Here are the V bits for the 40-byte range of x:

    (gdb) print &x
    $6 = (PTR TO -> ( integer (10))) 0xffeffed90
    (gdb) monitor xb 0xffeffed90 40
                     00    00    00    00    00    00    00    00
    0xFFEFFED90:   0x00  0x00  0x00  0x00  0x01  0x00  0x00  0x00
                     00    00    00    00    00    00    00    00
    0xFFEFFED98:   0x02  0x00  0x00  0x00  0x03  0x00  0x00  0x00
                     00    00    00    00    00    00    00    00
    0xFFEFFEDA0:   0x04  0x00  0x00  0x00  0x00  0x00  0x00  0x00
                     00    00    00    00    00    00    00    00
    0xFFEFFEDA8:   0x08  0x32  0x22  0x04  0x00  0x00  0x00  0x00
                     00    00    00    00    00    00    00    00
    0xFFEFFEDB0:   0x13  0x00  0x00  0x00  0x00  0x00  0x00  0x00

This doesn't look right to me, given that x(4) is assigned but x(8) is not:

    (gdb) print x(4)
    $18 = 3
    (gdb) print &x(4)
    $19 = (PTR TO -> ( integer )) 0xffeffed9c
    (gdb) monitor xb 0xffeffed9c 4
                     00    00    00    00
    0xFFEFFED9C:   0x03  0x00  0x00  0x00
    (gdb) print x(8)
    $20 = 0
    (gdb) print &x(8)
    $21 = (PTR TO -> ( integer )) 0xffeffedac
    (gdb) monitor xb 0xffeffedac 4
                     00    00    00    00
    0xFFEFFEDAC:   0x00  0x00  0x00  0x00

Based on the explanation in the manual (V bits 00 mean defined, ff mean undefined), I would expect the V bits to be 00 for X(1:5) and ff for X(6:10).
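Per the memcheck manual, xb prints two lines per 8 bytes: first the V bits (two hex digits per byte, where a 0 bit means defined, a 1 bit means undefined, and __ means unaddressable), then the address and the byte values. A minimal Python parser for that layout, so dumps like the one above can be checked mechanically; parse_xb and undefined_bytes are hypothetical helpers, and the layout is assumed from the manual's examples.

```python
import re

_VBIT = re.compile(r"[0-9A-Fa-f_]{2}")
_ADDR_LINE = re.compile(r"^\s*(0x[0-9A-Fa-f]+):\s*(.*)$")


def parse_xb(output: str) -> list:
    """Parse `monitor xb` output into (address, vbits, value) tuples.

    vbits is the two-character V-bit string for the byte: "00" fully defined,
    "ff" fully undefined, "__" unaddressable (anything else: partly defined).
    Lines that are neither V-bit lines nor address lines (gdb prompts, the
    "Address ... unaddressable" footer) are ignored.
    """
    result = []
    pending = None  # V bits waiting for their address/value line
    for line in output.splitlines():
        m = _ADDR_LINE.match(line)
        if m:
            addr = int(m.group(1), 16)
            values = m.group(2).split()
            if pending is None or len(pending) != len(values):
                raise ValueError(f"no matching V-bit line for {m.group(1)}")
            for off, (vb, val) in enumerate(zip(pending, values)):
                result.append((addr + off, vb.lower(), int(val, 16)))
            pending = None
            continue
        tokens = line.split()
        if tokens and all(_VBIT.fullmatch(t) for t in tokens):
            pending = tokens
    return result


def undefined_bytes(parsed) -> list:
    """Addresses that are not fully defined (undefined, partly defined,
    or unaddressable)."""
    return [addr for addr, vb, _ in parsed if vb != "00"]
```

Fed the 40-byte dump above, undefined_bytes returns an empty list: memcheck considers every byte of x, including x(6:10), fully defined at that point.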
If I *don't* compile with -Mstack_arrays, I get this at line 77 instead:

    (gdb) print x
    $1 = (0, 1, 2, 3, 4, 0, 0, 0, 0, 0)
    (gdb) print &x
    $2 = (PTR TO -> ( integer (10))) 0x70881d0
    (gdb) monitor xb 0x70881d0 40
                   00    00    00    00    00    00    00    00
    0x70881D0:   0x00  0x00  0x00  0x00  0x01  0x00  0x00  0x00
                   00    00    00    00    00    00    00    00
    0x70881D8:   0x02  0x00  0x00  0x00  0x03  0x00  0x00  0x00
                   00    00    00    00    ff    ff    ff    ff
    0x70881E0:   0x04  0x00  0x00  0x00  0x00  0x00  0x00  0x00
                   ff    ff    ff    ff    ff    ff    ff    ff
    0x70881E8:   0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
                   ff    ff    ff    ff    ff    ff    ff    ff
    0x70881F0:   0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
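Since each element of x is 4 bytes, an element is fully defined only when all four of its V-bit bytes are 00. A Python sketch that folds the per-byte V bits (read left to right, starting at &x(1)) into Fortran indices; undefined_elements is a hypothetical helper.

```python
def undefined_elements(vbits, elem_size=4, lower=1):
    """Return the Fortran indices of elements that are not fully defined.

    vbits: per-byte V-bit strings in address order, starting at x(lower),
           e.g. ["00", "00", "00", "00", "ff", ...].
    """
    if len(vbits) % elem_size:
        raise ValueError("V-bit count is not a multiple of the element size")
    bad = []
    for k in range(len(vbits) // elem_size):
        chunk = vbits[k * elem_size:(k + 1) * elem_size]
        if any(vb.lower() != "00" for vb in chunk):
            bad.append(lower + k)
    return bad


# V-bit rows of the 40-byte dump above (build without -Mstack_arrays).
rows = [
    "00 00 00 00 00 00 00 00",
    "00 00 00 00 00 00 00 00",
    "00 00 00 00 ff ff ff ff",
    "ff ff ff ff ff ff ff ff",
    "ff ff ff ff ff ff ff ff",
]
vbits = " ".join(rows).split()
print(undefined_elements(vbits))  # [6, 7, 8, 9, 10]
```

For the -Mstack_arrays dump at line 77, where every V-bit byte is 00, the same call returns an empty list.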
(In reply to Carl Ponder from comment #8)
> If I *don't* compile with the -Mstack_arrays, I get this at line 77 instead:
> [...]

So, the generated code is different. You should now debug at the asm instruction level, using e.g.
  disp /i $pc
then repeat
  stepi
  monitor xb ...
until you identify which instruction is effectively initialising the array.

At this point, nothing seems abnormal on the valgrind side. So very probably the compiler is generating some code that initialises this memory. You should discuss this with the compiler people and ask them why.
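The stepi / monitor xb loop can be scripted with gdb's embedded Python API, gdb.execute(command, to_string=True). A sketch, assuming gdb has Python support and is connected via target remote | vgdb, and assuming to_string=True also captures the monitor command's output in your gdb version (worth checking by hand first). step_until_defined is a hypothetical helper; the gdb call is passed in as execute so the logic does not depend on the gdb module. It reports the instruction that was about to execute when a previously undefined byte in the range first became defined.

```python
import re

_VBIT = re.compile(r"[0-9A-Fa-f_]{2}")


def vbit_tokens(xb_output):
    """Per-byte V-bit strings from `monitor xb` output (V-bit lines only)."""
    out = []
    for line in xb_output.splitlines():
        toks = line.split()
        if toks and all(_VBIT.fullmatch(t) for t in toks):
            out.extend(t.lower() for t in toks)
    return out


def step_until_defined(execute, addr, length, max_steps=10000):
    """Single-step until some byte in [addr, addr+length) goes from not fully
    defined to defined ("00").

    execute: callable(cmd) -> str, e.g. lambda c: gdb.execute(c, to_string=True)
    Returns (steps_taken, instruction_text) or None after max_steps.
    """
    xb = f"monitor xb {addr:#x} {length}"
    before = vbit_tokens(execute(xb))
    for step in range(1, max_steps + 1):
        insn = execute("x/i $pc")  # instruction about to execute
        execute("stepi")
        after = vbit_tokens(execute(xb))
        if any(b != "00" and a == "00" for b, a in zip(before, after)):
            return step, insn.strip()
        before = after
    return None

# Inside gdb (file name is hypothetical; 0x70881e4 would be &x(6) in the run above):
#   (gdb) source step_until_defined.py
#   (gdb) python print(step_until_defined(lambda c: gdb.execute(c, to_string=True), 0x70881e4, 20))
```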
Stopping at line 70 puts it right after the array allocation but before the array writes happen:

    62   implicit none
    63   integer, intent(in) :: N
    64   integer ( kind = 4 ) i
    65   integer ( kind = 4 ) :: x(1:N)
    66
    67 !
    68 ! X = { 0, 1, 2, 3, 4, ?a, ?b, ?c, ?d, ?e }.
    69 !
    70   do i = 1, 5

The V bits still say initialized, even though the array contains junk values:

    (gdb) print x
    $2 = (40, 0, 117993993, 0, 117993992, 0, 69349896, 0, 19, 0)
    (gdb) print &x
    $3 = (PTR TO -> ( integer (10))) 0xffeffed90
    (gdb) monitor xb 0xffeffed90 40
                     00    00    00    00    00    00    00    00
    0xFFEFFED90:   0x28  0x00  0x00  0x00  0x00  0x00  0x00  0x00
                     00    00    00    00    00    00    00    00
    0xFFEFFED98:   0x09  0x72  0x08  0x07  0x00  0x00  0x00  0x00
                     00    00    00    00    00    00    00    00
    0xFFEFFEDA0:   0x08  0x72  0x08  0x07  0x00  0x00  0x00  0x00
                     00    00    00    00    00    00    00    00
    0xFFEFFEDA8:   0x08  0x32  0x22  0x04  0x00  0x00  0x00  0x00
                     00    00    00    00    00    00    00    00
    0xFFEFFEDB0:   0x13  0x00  0x00  0x00  0x00  0x00  0x00  0x00

I'm checking with the compiler guys on this.
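The junk values that gdb prints are just the leftover stack bytes read as little-endian 32-bit integers: decoding the value bytes of the dump at line 70 reproduces gdb's print x exactly, while every V-bit byte is 00. A Python sketch using the standard struct module; decode_int32_le is a hypothetical helper name.

```python
import struct


def decode_int32_le(byte_values):
    """Interpret a flat list of byte values as little-endian signed 32-bit ints."""
    if len(byte_values) % 4:
        raise ValueError("byte count is not a multiple of 4")
    n = len(byte_values) // 4
    return struct.unpack(f"<{n}i", bytes(byte_values))


# Value bytes of the line-70 dump (0xFFEFFED90 .. 0xFFEFFEDB7).
dump = [0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x09, 0x72, 0x08, 0x07, 0x00, 0x00, 0x00, 0x00,
        0x08, 0x72, 0x08, 0x07, 0x00, 0x00, 0x00, 0x00,
        0x08, 0x32, 0x22, 0x04, 0x00, 0x00, 0x00, 0x00,
        0x13, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
print(decode_int32_le(dump))
# (40, 0, 117993993, 0, 117993992, 0, 69349896, 0, 19, 0)
```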
Regarding comment #9: there *is* no instruction initializing the array, which is why it contains junk entries, regardless of valgrind's silence about them. According to the PGI people, the -Mstack_arrays flag causes local arrays to be allocated on the stack, so the allocation is just a matter of adjusting the stack pointer rather than calling "malloc" or an equivalent. Does valgrind work by intercepting malloc calls and then tabulating the uninitialized memory cells? And if arrays are allocated on the stack by gfortran or gcc, how does valgrind keep track of them?
Probably your least-worst option at this point is to compile the test program in the configuration where the errors are not reported, and hope that it all gets compiled into a single function (which it looks like it will). Then disassemble it and maybe we can see if the compiler stuck in some instructions to zero out the array on the stack before use. That strikes me as the most likely outcome.
Given that there's junk in the array, I know the contents aren't being zeroed out, and the PGI people confirm that arrays allocated with -Mstack_arrays are not initialized. How does valgrind decide that an array has been initialized in this situation? Is it following the control flow instruction by instruction?
(In reply to Carl Ponder from comment #13)
> How does valgrind recognize that an array is being initialized
> under the circumstances? Is it following the control-flow
> instruction-by-instruction?

For arrays allocated on the heap, the memory is marked uninitialised when it is allocated. For arrays on the stack (more generally, for all stack variables), the variables are marked as uninitialised when the stack pointer is decreased to create the frame.
So what might be happening with the Fortran compiler is that it does not decrease/increase the SP for each function call and/or for each scope or whatever.
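A toy Python model of the stack rule described above, useful for reasoning about which byte states to expect. It is not valgrind's implementation: it only tracks SP movement and writes, and it assumes (as memcheck does) that bytes released when SP moves back up become inaccessible. ToyStackShadow and its methods are hypothetical names. The last lines show the situation suggested above: if a later array reuses the same bytes without SP moving down again, the stale 00 state persists.

```python
NOACCESS, UNDEFINED, DEFINED = "__", "ff", "00"


class ToyStackShadow:
    """Toy per-byte shadow state for a downward-growing stack."""

    def __init__(self, top: int):
        self.sp = top
        self.state = {}  # address -> NOACCESS / UNDEFINED / DEFINED

    def set_sp(self, new_sp: int) -> None:
        if new_sp < self.sp:   # stack grows: newly exposed bytes are undefined
            for a in range(new_sp, self.sp):
                self.state[a] = UNDEFINED
        else:                  # stack shrinks: released bytes become inaccessible
            for a in range(self.sp, new_sp):
                self.state[a] = NOACCESS
        self.sp = new_sp

    def write(self, addr: int, size: int) -> None:
        for a in range(addr, addr + size):
            self.state[a] = DEFINED

    def read_status(self, addr: int, size: int) -> list:
        return [self.state.get(a, NOACCESS) for a in range(addr, addr + size)]


s = ToyStackShadow(top=0x1000)
s.set_sp(0x1000 - 40)                 # frame holding a 10-element integer(4) array
s.write(0x1000 - 40, 20)              # x(1:5) assigned
print(s.read_status(0x1000 - 20, 4))  # x(6): ['ff', 'ff', 'ff', 'ff']
s.set_sp(0x1000)                      # frame popped
print(s.read_status(0x1000 - 40, 4))  # ['__', '__', '__', '__']

t = ToyStackShadow(top=0x1000)
t.set_sp(0x1000 - 40)
t.write(0x1000 - 40, 40)              # an earlier use fills all 40 bytes
# A new array placed in the same bytes *without* SP moving down again:
print(t.read_status(0x1000 - 20, 4))  # ['00', '00', '00', '00']: stale junk looks defined
```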
Created attachment 102408: Assembly generated without stack-arrays, where valgrind works
Created attachment 102409: Assembly generated with stack arrays, where valgrind doesn't work
I uploaded the two assembly files. From the sdiff, I think this is where the allocations differ:

         -Mnostack_arrays                   -Mstack_arrays
         ------------------------------     ------------------------------
    494  ..Dcfi3:                           ..Dcfi3:
    495  subq    $48, %rsp              |   subq    $32, %rsp
    496  movq    %rbx, -24(%rbp)        |   movq    %rbx, -16(%rbp)
    497  movq    %r12, -32(%rbp)        |   movq    %r12, -24(%rbp)
    498  movq    %r13, -40(%rbp)        |   movq    %r13, -32(%rbp)
    499  ## lineno: 38                      ## lineno: 38
    500  movq    %rdi, %rbx                 movq    %rdi, %rbx
    501  movl    (%rbx), %eax               movl    (%rbx), %eax
    502  movl    %eax, -16(%rbp)        |   movl    %eax, -8(%rbp)
    503  movslq  -16(%rbp), %rax        |   movslq  -8(%rbp), %rdi
    504  movq    %rax, -8(%rbp)         |   shlq    $2, %rdi
    505  leaq    -8(%rbp), %rdi         |   call    __builtin_aa
         xorl    %eax, %eax             <
         movl    $.C2_299, %esi         <
         call    pgf90_auto_alloc04     <
         movq    %rax, %r12                 movq    %rax, %r12

(I'm including the line numbers up to the point where the two files stop corresponding.) I'm guessing that pgf90_auto_alloc04 / __builtin_aa are what perform the allocations; I'll check with PGI on this.
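The same comparison can be done with Python's standard difflib module instead of sdiff, listing only the instructions that differ and ignoring column alignment. A sketch; changed_lines is a hypothetical helper, and the two short listings below are copied from the table above (in practice you would read the two attached assembly files).

```python
import difflib


def changed_lines(left, right):
    """Return (only_left, only_right) between two assembly listings,
    comparing lines with internal whitespace normalised."""
    norm_l = [" ".join(line.split()) for line in left]
    norm_r = [" ".join(line.split()) for line in right]
    only_l, only_r = [], []
    for d in difflib.ndiff(norm_l, norm_r):
        tag, text = d[:2], d[2:]
        if tag == "- ":
            only_l.append(text)
        elif tag == "+ ":
            only_r.append(text)
    return only_l, only_r


nostack = ["movq %rdi, %rbx", "movl (%rbx), %eax", "movl %eax, -16(%rbp)",
           "movslq -16(%rbp), %rax", "movq %rax, -8(%rbp)", "leaq -8(%rbp), %rdi",
           "xorl %eax, %eax", "movl $.C2_299, %esi", "call pgf90_auto_alloc04",
           "movq %rax, %r12"]
stack = ["movq %rdi, %rbx", "movl (%rbx), %eax", "movl %eax, -8(%rbp)",
         "movslq -8(%rbp), %rdi", "shlq $2, %rdi", "call __builtin_aa",
         "movq %rax, %r12"]
only_nostack, only_stack = changed_lines(nostack, stack)
print("only -Mnostack_arrays:", only_nostack)
print("only -Mstack_arrays:  ", only_stack)
```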
PGI confirms that this call to __builtin_aa is what's bumping the stack pointer. It's a subroutine inside the PGI runtime. Does valgrind have a way for us to intercept this subroutine call and then mark the array elements as uninitialized? I think this would solve the problem for us.
They should already be marked as uninitialised when __builtin_aa adjusts the stack pointer. The problem is that they will then be changed to inaccessible when it returns, because the caller is not normally supposed to rely on stack values allocated by the callee. Basically, by the sound of it, that routine is not ABI compliant, which may be fine for something generated by the compiler, but it creates problems for external tools like valgrind. In principle it should be possible to intercept it, though, as long as it appears in the symbol table.
Actually, given that the return from the call unwinds the stack again, the caller will be accessing values below the stack pointer, which is unsafe if a signal fires, because the signal handler may trash the stack below the stack pointer (there's a small extra red zone below SP that is safe on x86_64 but not on x86_32).
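The argument can be expressed as a small check. The System V AMD64 ABI reserves a 128-byte red zone below %rsp that signal and interrupt handlers must not modify; the 32-bit i386 ABI has no red zone. A Python sketch, with hypothetical names, that classifies the lowest address a caller touches relative to SP after a routine like __builtin_aa has returned:

```python
# Bytes below the stack pointer that the ABI protects from signal handlers.
RED_ZONE_BYTES = {"x86_64": 128, "x86_32": 0}


def below_sp_is_safe(lowest_addr: int, sp: int, arch: str = "x86_64") -> bool:
    """True if memory from lowest_addr upward cannot be clobbered by a signal
    handler: it is at/above SP or within the ABI red zone just below SP."""
    return lowest_addr >= sp - RED_ZONE_BYTES[arch]


sp = 0x7FFF0000
print(below_sp_is_safe(sp - 40, sp))            # True: 40 bytes fit in the red zone
print(below_sp_is_safe(sp - 4000, sp))          # False: a large array does not
print(below_sp_is_safe(sp - 40, sp, "x86_32"))  # False: no red zone on 32-bit x86
```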
(In reply to Tom Hughes from comment #19)

My assumption about what __builtin_aa does is: it moves RSP down by the specified amount, zeroes out the new area, and then returns. Except... how does it return? It would have to copy its own return address to just below the newly allocated area, and only then return.

It would be possible to intercept it, but you'd have to hand-write a replacement in assembly, since the above isn't doable in C.
I know they're not zeroing out the space. As for intercepting the subroutine call: I've worked a little at this level, in coregrind/m_syswrap, but those wrappers only intercept system calls, right? And you're saying there's no analogous convention that would let me intercept calls into the PGI runtime and record the uninitialized data state, right?
(In reply to Carl Ponder from comment #22)
> I know they're not zeroing out the space.

That doesn't match my understanding of the discussion above. I think your chances of getting a definitive answer are low without providing an executable test case, with symbols, that we can try out.
I can upload an executable, or I can give you the source code for the test and instructions on how to build and run it. You'd still need to have the PGI runtime installed; I can help you get a demo copy if you need one.

About the zeroing of the space: (a) I can see there's nonzero junk in the array, and (b) PGI insists that they don't zero out stack arrays. Why do you keep insisting that they do? NVIDIA owns PGI, and I've been in weekly con-calls with their compiler developers for the last 5 years.