Bug 404843 - s390x: backtrace sometimes ends prematurely
Summary: s390x: backtrace sometimes ends prematurely
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-26 14:25 UTC by Andreas Arnez
Modified: 2019-04-08 05:02 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Proposed fix (24.17 KB, patch)
2019-04-03 15:17 UTC, Julian Seward
Details
New GET_STARTREGS implementation for s390x (1.88 KB, text/plain)
2019-04-05 14:27 UTC, Andreas Arnez
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Arnez 2019-02-26 14:25:14 UTC
Sometimes Valgrind cannot unwind the stack correctly on s390x.

For instance, consider this program:

  int fun_b(int *p) { return *p; }
  int fun_a(int *p) { return fun_b(p); }
  int main() { return fun_a(0); }

when compiling with gcc 8.0.1 and running under Valgrind, the backtrace looks like this:

==100497==    at 0x1000516: fun_b (in /home/arnez/tmp/backtrace-test)
==100497==    by 0x1000551: fun_a (in /home/arnez/tmp/backtrace-test)
==100497==    by 0x4866107: ??? (in /usr/lib64/libc-2.27.so)

Note that "main" is missing from the backtrace.

This problem also affects various test cases in Valgrind's test suite.
Comment 1 Andreas Arnez 2019-03-06 19:19:27 UTC
The assembly of fun_b from the sample program above starts like this:
        ldgr    %f2,%r11
        .cfi_register 11, 17
        ldgr    %f0,%r15
        .cfi_register 15, 16
Which means that the entry values of r11 (frame pointer) and r15 (stack pointer) must be retrieved from floating-point registers f2 and f0, respectively.  In leaf functions like this it is pretty common for newer GCCs to spill general-purpose registers into floating-point registers.
Comment 2 Julian Seward 2019-03-10 10:03:28 UTC
This is ungood; but that said: if the number of FP registers
involved is small and fixed (eg, it's only ever f0/f1/f2/f3 usw)
then we might be able to fix it within the existing unwind 
framework, by adding the relevant FP registers to the set of registers
that are unwound for s390.

Basically that would mean adding the relevant registers to struct
D3UnwindRegs, and then adding code to update the new values.
Comment 3 Andreas Arnez 2019-03-13 19:17:31 UTC
To summarize some of the discussion on IRC about this topic today (please correct if I misunderstood anything):

* Valgrind's unwinder needs to be extra fast.  For instance, Memcheck takes a stack trace for each block allocated, and then again when it's freed.

* In order to be fast, the unwinder limits the number of registers to unwind.  Currently no platform unwinds more than 5 registers.

* Initial register values are copied into an UnwindStartRegs in VG_(get_UnwindStartRegs) from the registers of the specified simulated threads.

* In order to use FPRs for unwinding, they have to be added to the UnwindStartRegs and D3UnwindRegs structures, and summary rule fields for the s390 variant of DiCfSI must be added as well.  Preferably we only add a few FPRs, to limit the performance impact.

I'll probably look into that soon.
Comment 4 Andreas Arnez 2019-04-02 15:37:58 UTC
Some background info: Spilling into FPRs has been around for a while.  In upstream GCC it was introduced for s390x with this commit:

  https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=203303

Which is described by this post on the gcc-patches mailing list:

  https://gcc.gnu.org/ml/gcc-patches/2013-10/msg00534.html
Comment 5 Julian Seward 2019-04-03 15:17:56 UTC
Created attachment 119229 [details]
Proposed fix

For s390x, adds unwinding of f0..f7, so as to be able the handle the case
where GPRs are saved in caller-saved fp registers.
Comment 6 Andreas Arnez 2019-04-04 18:18:17 UTC
(In reply to Julian Seward from comment #5)
> Created attachment 119229 [details]
> Proposed fix
> 
> For s390x, adds unwinding of f0..f7, so as to be able the handle the case
> where GPRs are saved in caller-saved fp registers.
Thanks!  This certainly helps a lot.  In my limited testing the unwinding seems to have worked consistently with this patch.  Note that I've only tried GCC 8.0.1.  A few comments:

> [...]
> --- a/coregrind/m_debuginfo/debuginfo.c
> +++ b/coregrind/m_debuginfo/debuginfo.c
> @@ -3199,6 +3199,15 @@ Addr ML_(get_CFA) ( Addr ip, Addr sp, Addr fp,
> [...]
>       uregs.sp = sp;
>       uregs.fp = fp;
> +     /* JRS FIXME 3 Apr 2019: surely we can do better for f0..f7 */
> +     uregs.f0 = 0;
> +     uregs.f1 = 0;
I think we can skip the FPR initialization here.  IIUC, we'd only need it
if the CFA depended on an FPR.  But this would be a very unusual compiler
decision, because an FPR can't be used for addressing.  I'd say we can
rule that out.

> [...]
> @@ -3259,6 +3268,8 @@ void VG_(ppUnwindInfo) (Addr from, Addr to)
>     For arm, the unwound registers are: R7 R11 R12 R13 R14 R15.
>  
>     For arm64, the unwound registers are: X29(FP) X30(LR) SP PC.
> +
> +   For s390, the unwound registers are: R11(FP) R14(LR) R15(SP) F0 F2 F4 …
This doesn't match the current implementation, right?  In fact it seems
that F0-F7 are unwound, although F0, F2, and F4 are probably sufficient (I
hope).  Theoretically we could even avoid unwinding the FPRs at all; we
just need them for the initial frame.  Since all of F0-F7 are volatile,
the caller can't really have saved anything there.

By the way, the PC doesn't need to be unwound either, since unwound frames
have PC == R14.

> [...]
> --- a/coregrind/m_debuginfo/storage.c
> +++ b/coregrind/m_debuginfo/storage.c
> @@ -138,7 +138,31 @@ void ML_(ppDiCfSI) ( const XArray* /* of CfiExpr */ e…
> [...]
> +            VG_(printf)("oldF6");                \
> +         } else                                  \
> +         if (_how == CFIR_S390X_F7) {            \
> +            VG_(printf)("oldF7");                \
> +         } else{                                 \
Missing space.

> [...]
> --- a/coregrind/m_libcassert.c
> +++ b/coregrind/m_libcassert.c
> @@ -176,6 +176,15 @@
> [...]
>          (srP)->misc.S390X.r_fp = fp;                      \
>          (srP)->misc.S390X.r_lr = lr;                      \
> +        /* ANDREAS 3 Apr 2019 FIXME r_f0..r_f7 */         \
> +        (srP)->misc.S390X.r_f0 = 0;/*FIXME*/              \
> +        (srP)->misc.S390X.r_f1 = 0;/*FIXME*/              \
Let me look into this, I'll provide a suitable inline assembly.

> [...]
> --- a/coregrind/m_machine.c
> +++ b/coregrind/m_machine.c
> @@ -118,6 +118,23 @@ void VG_(get_UnwindStartRegs) ( /*OUT*/UnwindStartReg…
> [...]
>     regs->misc.S390X.r_lr
>        = VG_(threads)[tid].arch.vex.guest_LR;
> +   /* ANDREAS 3 Apr 2019 FIXME r_f0..r_f7: is this correct? */
> +   regs->misc.S390X.r_f0
> +      = VG_(threads)[tid].arch.vex.guest_v0.w64[0];
Looks correct to me.  The FPRs overlap with the lower-addressed halves of
the respective vector registers.

> [...]
> --- a/coregrind/m_signals.c
> +++ b/coregrind/m_signals.c
> @@ -526,6 +526,15 @@ typedef struct SigQueue {
>          (srP)->r_sp = (ULong)((uc)->uc_mcontext.regs.gprs[15]);    \
>          (srP)->misc.S390X.r_fp = (uc)->uc_mcontext.regs.gprs[11];  \
>          (srP)->misc.S390X.r_lr = (uc)->uc_mcontext.regs.gprs[14];  \
> +        /* ANDREAS 3 Apr 2019 FIXME r_f0..r_f7: is this correct? */ \
> +        (srP)->misc.S390X.r_f0 = (uc)->uc_mcontext.fpregs.fprs[0]; \
Looks OK.
Comment 7 Julian Seward 2019-04-04 18:30:37 UTC
Thanks for the testing and review.

> +   For s390, the unwound registers are: R11(FP) R14(LR) R15(SP) F0 F2 F4 …
> This doesn't match the current implementation, right?

That's correct.  The comment is out of date.  I'll fix it.

> In fact it seems that F0-F7 are unwound, although F0, F2, and F4 are
> probably sufficient (I hope).

I at first implemented only F0, F2 and F4, but then I thought it would be
worth checking for "summarisation failures" (failures to represent the
unwind info in V's internal structures) when reading all the unwind data
for libc/libpthread/whatever when starting a hello-world program.  From that
it was obvious that the compiler really does use all of F0-F7 at least
once.  So I decided to do them all.
Comment 8 Andreas Arnez 2019-04-05 14:27:34 UTC
Created attachment 119257 [details]
New GET_STARTREGS implementation for s390x

(In reply to Julian Seward from comment #7)
> [...]  From that
> it was obvious that the compiler really does use all of F0-F7 at least
> once.  So I decided to do them all.
Oh, my.

OK, here's a suggested new implementation of GET_STARTREGS for s390x in m_libcassert.c.
Comment 9 Julian Seward 2019-04-08 05:02:49 UTC
Fixed, d36ea889d8d8a1646be85c30ab5771af6912b7a1.