| Summary: | LTP testcase mknod06 crashes memcheck | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | mcermak |
| Component: | memcheck | Assignee: | mcermak |
| Status: | ASSIGNED --- | ||
| Severity: | normal | CC: | pjfloyd |
| Priority: | NOR | ||
| Version First Reported In: | 3.25 GIT | ||
| Target Milestone: | --- | ||
| Platform: | Other | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: |
proposed patch
updated patch updated patch |
||
|
Description
mcermak
2025-10-03 13:46:45 UTC
Probably the first test with a pathname that is longer than PATH_MAX. Right ... I've tweaked the LTP testsuite so that the testcase doesn't fork (ugh :), and then:
$ gdb -q -ex 'set auto-load safe-path /' -ex 'set remote exec-file ./mknod06' -ex 'set sysroot /' -ex 'target extended-remote | vgdb --multi --vargs -q --tool=memcheck' ./mknod06
Reading symbols from ./mknod06...
Remote debugging using | vgdb --multi --vargs -q --tool=memcheck
(gdb) b run
Breakpoint 1 at 0x401ec0: file mknod06.c, line 42.
(gdb) r
Starting program: /home/mcermak/WORK/LTP/testcases/kernel/syscalls/mknod/mknod06
relaying data between gdb and process 4179032
Loaded /usr/local/libexec/valgrind/valgrind-monitor.py
Type "help valgrind" for more info.
tst_buffers.c:57: TINFO: Test is using guarded buffers
tst_tmpdir.c:316: TINFO: Using /tmp/LTP_mknIKjroF as tmpdir (tmpfs filesystem)
tst_test.c:2024: TINFO: LTP version: 20250930
tst_test.c:2027: TINFO: Tested kernel: 6.12.11-200.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jan 24 04:59:58 UTC 2025 x86_64
tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.12.11-200.fc41.x86_64/config'
tst_test.c:1843: TINFO: Overall timeout per run is 8h 19m 30s
Breakpoint 1, run (i=0) at mknod06.c:42
42 TST_EXP_FAIL(mknod(tc->pathname, MODE_FIFO_RWX, 0), tc->exp_errno, "%s",
(gdb) p tc->pathname
$1 = 0x485dffe 'a' <repeats 200 times>...
(gdb) s
__GI___errno_location () at errno-loc.c:25
25 {
(gdb) n
26 return &errno;
(gdb) n
--4179032-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--4179032-- si_code=2; Faulting address: 0x485F000; sp: 0x1002ef6d00
valgrind: the 'impossible' happened:
Killed by fatal signal
host stacktrace:
==4179032== at 0x5801028D: mc_is_defined_asciiz (mc_main.c:4412)
==4179032== by 0x5801028D: check_mem_is_defined_asciiz (mc_main.c:4490)
==4179032== by 0x58099D91: vgPlain_client_syscall (syswrap-main.c:2410)
==4179032== by 0x58095CFA: handle_syscall (scheduler.c:1214)
==4179032== by 0x58097BE6: vgPlain_scheduler (scheduler.c:1588)
==4179032== by 0x581014BA: thread_wrapper (syswrap-linux.c:102)
==4179032== by 0x581014BA: run_a_thread_NORETURN (syswrap-linux.c:154)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable syscall 259 (lwpid 4179032)
==4179032== at 0x495C017: mknodat (mknodat.c:33)
==4179032== by 0x401F05: run (mknod06.c:42)
==4179032== by 0x40AEDA: run_tests (tst_test.c:1686)
==4179032== by 0x40AEDA: testrun (tst_test.c:1752)
==4179032== by 0x40AEDA: fork_testrun.isra.0 (tst_test.c:1881)
==4179032== by 0x40DD3D: tst_run_tcases (tst_test.c:2044)
==4179032== by 0x401DCD: main (tst_test.h:738)
client stack range: [0x1FFEFFC000 0x1FFF000FFF] client SP: 0x1FFEFFF4B8
valgrind stack range: [0x1002DF7000 0x1002EF6FFF] top usage: 17848 of 1048576
Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.
If that doesn't help, please report this bug to: www.valgrind.org
In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.
Program terminated with signal 0, Signal 0.
The program no longer exists.
(gdb)
Created attachment 185488 [details] proposed patch FTR, the mentioned LTP testcase is here (interesting are lines 30 and 32: https://github.com/linux-test-project/ltp/blob/d2550ffbbcfe163212cd7e9c132db65ae0fa06ed/testcases/kernel/syscalls/mknod/mknod06.c#L30 Created attachment 185547 [details]
updated patch
Could you explain exactly what caused the crash?
Does this also affect other syscalls where an input path is longer than PATH_MAX?
The call to ML_(fd_at_check_allowed) should be passing in "mknodat". Also I prefer to use the same argument names as the man page (where syscalls have a direct libc equivalent). If the user sees our error and then looks at "man {error-syscall -function}" then its clearer if all the names match. So that would be "mknodat(dirfd)".
The testcase sets the path argument to 4098 'a' characters. Those are followed by null terminator: >> (gdb) p tcases[0]->pathname >> $1 = 0x7ffff7fbcffe 'a' <repeats 200 times>... >> (gdb) set print elements 0 >> (gdb) p tcases[0]->pathname >> $2 = 0x7ffff7fbcffe 'a' <repeats 4098 times> >> (gdb) p *tcases[0]->pathname >> $3 = 97 'a' >> (gdb) p *(tcases[0]->pathname + 4097) >> $4 = 97 'a' >> (gdb) p *(tcases[0]->pathname + 4098) >> $5 = 0 '\000' >> (gdb) p *(tcases[0]->pathname + 4099) >> $6 = 0 '\000' >> (gdb) p *(tcases[0]->pathname + 4100) >> $7 = 0 '\000' >> (gdb) l >> 47 { >> 48 SAFE_MKNOD("tnode_1", MODE_FIFO_RWX, 0); >> 49 SAFE_MKNOD("tnode", MODE_FIFO_RWX, 0); >> 50 >> 51 for (int i = 0; i <= (PATH_MAX + 1); i++) >> 52 longpathname[i] = 'a'; >> 53 tcases[0].pathname = longpathname; >> 54 } >> 55 >> 56 static struct tst_test test = { >> (gdb) The host stacktrace points finger to mc_is_defined_asciiz(): >> host stacktrace: >> ==2284884== at 0x580102AB: mc_is_defined_asciiz (mc_main.c:4393) >> ==2284884== by 0x580102AB: check_mem_is_defined_asciiz (mc_main.c:4494) >> ==2284884== by 0x58099E11: vgPlain_client_syscall (syswrap-main.c:2410) >> ==2284884== by 0x58095D7A: handle_syscall (scheduler.c:1214) >> ==2284884== by 0x58097C66: vgPlain_scheduler (scheduler.c:1588) >> ==2284884== by 0x5810153A: thread_wrapper (syswrap-linux.c:102) >> ==2284884== by 0x5810153A: run_a_thread_NORETURN (syswrap-linux.c:154) I've added some debugging output to it using VG_(printf)(...). >> /* Check a zero-terminated ascii string. Tricky -- don't want to >> examine the actual bytes, to find the end, until we're sure it is >> safe to do so. */ >> >> static Bool mc_is_defined_asciiz ( Addr a, Addr* bad_addr, UInt* otag ) >> { >> UWord vabits2; >> >> PROF_EVENT(MCPE_IS_DEFINED_ASCIIZ); >> DEBUG("mc_is_defined_asciiz\n"); >> >> if (otag) *otag = 0; >> if (bad_addr) *bad_addr = 0; >> while (True) { >> PROF_EVENT(MCPE_IS_DEFINED_ASCIIZ_LOOP); >> vabits2 = get_vabits2(a); >> if (VA_BITS2_DEFINED != vabits2) { >> // Error! Nb: Report addressability errors in preference to >> // definedness errors. And don't report definedeness errors unless >> // --undef-value-errors=yes. >> if (bad_addr) { >> *bad_addr = a; >> } >> if (VA_BITS2_NOACCESS == vabits2) { >> return MC_AddrErr; >> } >> if (MC_(clo_mc_level) >= 2) { >> if (otag && MC_(clo_mc_level) == 3) { >> *otag = MC_(helperc_b_load1)( a ); >> } >> return MC_ValueErr; >> } >> } >> /* Ok, a is safe to read. */ >> if (* ((UChar*)a) == 0) { >> return MC_Ok; >> } >> a++; >> } >> } This ^^^ loops over all the 'a' characters, and then it gets to the null terminator. At that point * ((UChar*)a) bombs out. I've added some debug prints and set a conditional breakpoint. Valgrind apparently crashes when it tries to dereference the null terminator. What confuses me is that I can do the dereference in gdb without crash. But when I actually step over that line, it crashes: 4021:a 4022:a 4023:a 4024:a 4025:a 4026:a 4027:a 4028:a 4029:a 4030:a 4031:a 4032:a 4033:a 4034:a 4035:a 4036:a 4037:a 4038:a 4039:a 4040:a 4041:a 4042:a 4043:a 4044:a 4045:a 4046:a 4047:a 4048:a 4049:a 4050:a 4051:a 4052:a 4053:a 4054:a 4055:a 4056:a 4057:a 4058:a 4059:a 4060:a 4061:a 4062:a 4063:a 4064:a 4065:a 4066:a 4067:a 4068:a 4069:a 4070:a 4071:a 4072:a 4073:a 4074:a 4075:a 4076:a 4077:a 4078:a 4079:a 4080:a 4081:a 4082:a 4083:a 4084:a 4085:a 4086:a 4087:a 4088:a 4089:a 4090:a 4091:a 4092:a 4093:a 4094:a 4095:a Breakpoint 1, mc_is_defined_asciiz (a=75886590, bad_addr=<synthetic pointer>, otag=<synthetic pointer>) at mc_main.c:4415 4415 VG_(printf)("%d:%c ", c, * ((UChar*)a)); (gdb) n 4096:a 4417 c++; (gdb) n 4393 vabits2 = get_vabits2(a); (gdb) n 4394 if (VA_BITS2_DEFINED != vabits2) { (gdb) n 4412 if (* ((UChar*)a) == 0) { (gdb) n 4415 VG_(printf)("%d:%c ", c, * ((UChar*)a)); (gdb) n 4097:a 4417 c++; (gdb) n 4393 vabits2 = get_vabits2(a); (gdb) n 4394 if (VA_BITS2_DEFINED != vabits2) { (gdb) n 4412 if (* ((UChar*)a) == 0) { (gdb) p * ((UChar*)a) $21 = 0 '\000' (gdb) # now we are just before the crash. Not sure why gdb can successfully (gdb) # do the dereference ^^^. But now when I actually do the step, it will (gdb) # crash: (gdb) n --2367825-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting I realized that the nulls at the end are just by chance. And looking at mc_is_defined_asciiz() seems like what's not working right are the checks before "/* Ok, a is safe to read. */" . mc_is_defined_asciiz() seems to try hard not to examine the actual bytes (which is what my recently proposed patch does).
The fact, that I can dereference the characters at positions bigger than VKI_PATH_MAX in my test case with GDB, as shown, but it SEGVS under valgrind probably means that the SEGV is consequence of some valgrind emulation specific check, probably stepping out of some allowed buffer.
The string length constraint is specific to given syscall, so I assume it will need to be addressed somewhere in the PRE wrapper, just like PRE(sys_openat) or PRE(sys_open) do that. These wrappers mention // we need something like a "ML_(safe_to_deref_path)" that does a binary search for the addressable length, and maybe nul, which is probably related. This makes me think that my recent patch is probably doing the right thing in right place, but not really sure.
Related files seem to be coregrind/m_aspacemgr/aspacemgr-common.c and coregrind/m_aspacemgr/aspacemgr-linux.c but I'm not quite getting what happens there yet.
> Could you explain exactly what caused the crash?
Honestly, although I think I kind of get some context here, I'm still puzzled.
Created attachment 185660 [details] updated patch (In reply to Paul Floyd from comment #5) > Does this also affect other syscalls where an input path is longer than > PATH_MAX? I think so. I think PRE_MEM_RASCIIZ() only works for properly null terminated strings. > The call to ML_(fd_at_check_allowed) should be passing in "mknodat". Also I > prefer to use the same argument names as the man page (where syscalls have a > direct libc equivalent). If the user sees our error and then looks at "man > {error-syscall -function}" then its clearer if all the names match. So that > would be "mknodat(dirfd)". Understood, makes sense. Attaching updated patch. And it's now being tested here: https://builder.sourceware.org/buildbot/#/changes/98002 . |