Bug 479996 - Segmentation fault on aarch64 checking programs built with -fstack-check
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: 3.20.0
Platform: Other Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Julian Seward
Reported: 2024-01-18 15:17 UTC by Emanuele Rocca
Modified: 2024-01-24 12:56 UTC

Description Emanuele Rocca 2024-01-18 15:17:54 UTC
Hi,

On aarch64, using valgrind on the following program built with -fstack-check results in a segmentation fault:
  
  // example.c
  void a_function() { char buf[10752]; }
  int main() { a_function(); }

gcc -fstack-check example.c -o example && valgrind ./example

==2743238== Memcheck, a memory error detector
==2743238== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==2743238== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==2743238== Command: ./example
==2743238== 
==2743238== Invalid write of size 8
==2743238==    at 0x10873C: main (in /tmp/example)
==2743238==  Address 0x1ffefff9c0 is on thread 1's stack
==2743238==  4112 bytes below stack pointer
==2743238== 
==2743238== Invalid write of size 8
==2743238==    at 0x108718: a_function (in /tmp/example)
==2743238==    by 0x10874B: main (in /tmp/example)
==2743238==  Address 0x1ffeffe9c0 is on thread 1's stack
==2743238==  8192 bytes below stack pointer
==2743238== 
==2743238== Invalid write of size 8
==2743238==    at 0x108720: a_function (in /tmp/example)
==2743238==    by 0x10874B: main (in /tmp/example)
==2743238==  Address 0x1ffeffdfc0 is not stack'd, malloc'd or (recently) free'd
==2743238== 
==2743238== 
==2743238== Process terminating with default action of signal 11 (SIGSEGV)
==2743238==  Access not within mapped region at address 0x1FFEFFDFC0
==2743238==    at 0x108720: a_function (in /tmp/example)
==2743238==    by 0x10874B: main (in /tmp/example)
==2743238==  If you believe this happened as a result of a stack
==2743238==  overflow in your program's main thread (unlikely but
==2743238==  possible), you can try to increase the size of the
==2743238==  main thread stack using the --main-stacksize= flag.
==2743238==  The main thread stack size used in this run was 8388608.
==2743238== 
==2743238== HEAP SUMMARY:
==2743238==     in use at exit: 0 bytes in 0 blocks
==2743238==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==2743238== 
==2743238== All heap blocks were freed -- no leaks are possible
==2743238== 
==2743238== For lists of detected and suppressed errors, rerun with: -s
==2743238== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
Segmentation fault
Comment 1 Adhemerval Zanella 2024-01-24 12:56:02 UTC
I think the issue is how -fstack-check is implemented on aarch64, and I am not sure how to fix this in valgrind without reducing its coverage somehow. On aarch64, the example code is translated as:

---
a_function:
        sub     x10, sp, #8192
        str     xzr, [x10]
        sub     x10, x10, #4096
        str     xzr, [x10, 1536]
        mov     x12, 10752
        sub     sp, sp, x12
        nop
        add     sp, sp, x12
        ret
main:
        sub     x10, sp, #8192
        str     xzr, [x10, 4080]
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      a_function
        mov     w0, 0
        ldp     x29, x30, [sp], 16
        ret
---

So the stack is probed *without* updating the stack pointer, which is technically a stack overflow. This differs from x86_64, where the stack pointer is moved down before the probe and restored afterwards:

---
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 8216
        or      QWORD PTR [rsp], 0
        add     rsp, 8216
        mov     eax, 0
        call    a_function
        mov     eax, 0
        leave
        ret
---

Both ABIs define specific code generation (-fstack-check=specific), but x86_64 also internally defines STACK_CHECK_MOVING_SP (https://gcc.gnu.org/onlinedocs//gccint/Stack-Checking.html). I think this is because the x86_64 frame generation code requires the SP to be updated in order to generate FP and local variable accesses correctly. The aarch64 backend does not require this, so the gcc developers could turn it off for better code generation.

The SP update is not strictly required, since on Linux and other OSes the kernel guarantees a minimum stack size that is lazily allocated. Ideally the probe would either trigger lazy allocation through a soft page fault or hit the guard page/invalid memory. The problem is that the -fstack-check strategy has multiple corner cases, which is why recent gcc provides -fstack-clash-protection instead.

On aarch64, -fstack-clash-protection updates the SP in the expected increments when the stack allocation requires probing. Using -fstack-clash-protection --param stack-clash-protection-probe-interval=12 --param stack-clash-protection-guard-size=12:

---
a_function:
        sub     sp, sp, #4096
        str     xzr, [sp, 1024]
        sub     sp, sp, #4096
        str     xzr, [sp, 1024]
        sub     sp, sp, #2560
        nop
        mov     x12, 10752
        add     sp, sp, x12
        ret
main:
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      a_function
        mov     w0, 0
        ldp     x29, x30, [sp], 16
        ret
---

So one option for a proper fix would be to change the aarch64 code generation to always update and roll back the SP, as on x86_64. I am not sure this is really worth it, since -fstack-clash-protection has a better strategy for stack probing. The Ada compiler uses -fstack-check by default, but I am not sure whether that is a language runtime requirement or whether -fstack-clash-protection could be used instead.

As a side note, this issue affects not only aarch64 but potentially every architecture that does not set STACK_CHECK_MOVING_SP. For instance, on powerpc64le:

$ gcc -fstack-check example.c -o example && valgrind ./example
[...]
==2197773== Invalid write of size 8
==2197773==    at 0x100005FC: main (in /home/azanella/projects/valgrind/bz479996/example-fstack-check)
==2197773==  Address 0x1fff00a760 is on thread 1's stack
==2197773==  16496 bytes below stack pointer
==2197773==
[...]

This is because the stack probing there is also done without updating the SP (r0 is used as the scratch register):
---
main:
.LFB1:
	.cfi_startproc
.LCF1:
0:	addis 2,12,.TOC.-.LCF1@ha
	addi 2,2,.TOC.-.LCF1@l
	.localentry	main,.-main
	std 0,-16496(1)
	mflr 0
	std 0,16(1)
	std 31,-8(1)
	stdu 1,-112(1)
	.cfi_def_cfa_offset 112
	.cfi_offset 65, 16
	.cfi_offset 31, -8
	mr 31,1
	.cfi_def_cfa_register 31
	bl a_function
	li 9,0
	extsw 9,9
	mr 3,9
	addi 1,31,112
	.cfi_def_cfa 1, 0
	ld 0,16(1)
	mtlr 0
	ld 31,-8(1)
	blr
---