Bug 451837 - When profiling this specific executable, valgrind fails to break down 0x62 0xF1 0x7F 0x28 0x7F 0x2 0xF 0x87 0x95 0xA1 into 2 separate instructions
Status: RESOLVED DUPLICATE of bug 383010
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: 3.19 GIT
Platform: Fedora RPMs Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Julian Seward
 
Reported: 2022-03-23 22:29 UTC by ytrezq
Modified: 2024-02-25 02:10 UTC


Attachments
Attached executable for reproducing the bug. (3.90 MB, application/x-executable)
2022-03-23 22:29 UTC, ytrezq

Description ytrezq 2022-03-23 22:29:26 UTC
Created attachment 147688
Attached executable for reproducing the bug.

I built the latest btrfs-progs on my system with profile feedback (-fprofile-use). The program runs normally on my CPU, but when I run
[liveuser@localhost-live btrfs-progs-v5.16.2]$ valgrind --tool=callgrind --dump-instr=yes --branch-sim=yes --collect-jumps=yes ./btrfs.static check -p --init-csum-tree /tmp/newly_btrfs_volume_example_with_sha256
with the attached executable, it fails with:
Starting repair.
Opening filesystem to check...
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0x7F 0x28 0x7F 0x2 0xF 0x87 0x95 0xA1
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==63828== valgrind: Unrecognised instruction at address 0x45dade.
==63828==    at 0x45DADE: btrfs_csum_data.constprop.0.isra.0 (disk-io.c:147)
==63828==    by 0x45D949: btrfs_check_super (disk-io.c:1666)
==63828==    by 0x45D876: btrfs_read_dev_super (disk-io.c:1879)
==63828==    by 0x45D5A9: btrfs_scan_one_device (volumes.c:548)
==63828==    by 0x45D4E3: check_mounted_where.constprop.0 (open-utils.c:61)
==63828==    by 0x45D489: check_mounted (open-utils.c:130)
==63828==    by 0x45CD93: cmd_check (main.c:10455)
==63828==    by 0x45C6DD: main (commands.h:125)
==63828== Your program just tried to execute an instruction that Valgrind
==63828== did not recognise.  There are two possible reasons for this.
==63828== 1. Your program has a bug and erroneously jumped to a non-code
==63828==    location.  If you are running Memcheck and you just saw a
==63828==    warning about a bad jump, it's probably your program's fault.
==63828== 2. The instruction is legitimate but Valgrind doesn't handle it,
==63828==    i.e. it's Valgrind's fault.  If you think this is the case or
==63828==    you are not sure, please let us know and we'll try to fix it.
==63828== Either way, Valgrind will now raise a SIGILL signal which will
==63828== probably kill your program.
==63828== 
==63828== Process terminating with default action of signal 4 (SIGILL): dumping core
==63828==  Illegal opcode at address 0x45DADE
==63828==    at 0x45DADE: btrfs_csum_data.constprop.0.isra.0 (disk-io.c:147)
==63828==    by 0x45D949: btrfs_check_super (disk-io.c:1666)
==63828==    by 0x45D876: btrfs_read_dev_super (disk-io.c:1879)
==63828==    by 0x45D5A9: btrfs_scan_one_device (volumes.c:548)
==63828==    by 0x45D4E3: check_mounted_where.constprop.0 (open-utils.c:61)
==63828==    by 0x45D489: check_mounted (open-utils.c:130)
==63828==    by 0x45CD93: cmd_check (main.c:10455)
==63828==    by 0x45C6DD: main (commands.h:125)
==63828== 
==63828== Events    : Ir Bc Bcm Bi Bim
==63828== Collected : 791898 132272 8055 1806 620
==63828== 
==63828== I   refs:      791,898
==63828== 
==63828== Branches:      134,078  (132,272 cond + 1,806 ind)
==63828== Mispredicts:     8,675  (  8,055 cond +   620 ind)
==63828== Mispred rate:      6.5% (    6.1%     +  34.3%   )

Contrary to what Valgrind reports, the bytes 0x62 0xF1 0x7F 0x28 0x7F 0x2 0xF 0x87 0x95 0xA1 are not a single invalid instruction. They belong to 2 separate instructions:

000000000045dac0 <btrfs_csum_data.constprop.0.isra.0>:
  45dade:       62 f1 7f 28 7f 02       vmovdqu8 YMMWORD PTR [rdx],ymm0
  45dae4:       0f 87 95 a1 fa ff       ja     407c7f <btrfs_csum_data.constprop.0.isra.0.cold+0x19>

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Fedora 37, RPM executables only

Please fix this!
Comment 1 Josef Weidendorfer 2022-03-28 19:07:16 UTC
As seen at the end of the log, this is about the instruction vmovdqu8, which is unhandled by (that is, unknown to) Valgrind.
So this is not specific to Callgrind; it is about supporting the instruction in Valgrind in general.

According to https://en.wikipedia.org/wiki/AVX-512, this is from the AVX512 ISA extension.
Valgrind does not (yet) support AVX512, so this is expected.

Your executable seems to assume unconditionally that it runs on a processor with AVX512.
It should first check with CPUID whether the processor supports AVX512 before using the instruction.
It would then find out that the virtual Valgrind CPU does not support it, and so the code would have
to use a variant that does not use AVX512 (you can ask the compiler to add such checks).
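
A minimal sketch of such a run-time check, assuming GCC or Clang on x86 and their `__builtin_cpu_init` / `__builtin_cpu_supports` builtins. The names `copy32_generic`, `copy32_avx512`, `cpu_has_avx512_bw_vl` and `select_copy32` are hypothetical and do not come from btrfs-progs. A 256-bit vmovdqu8 needs the AVX512BW and AVX512VL subsets, so the sketch tests for those in addition to AVX512F:

```c
#include <string.h>

typedef void (*copy32_fn)(void *dst, const void *src);

/* Baseline variant: built for the default target, so no AVX-512 is required. */
static void copy32_generic(void *dst, const void *src)
{
    memcpy(dst, src, 32);
}

#if defined(__x86_64__) || defined(__i386__)
/* Variant that the compiler is allowed to implement with AVX-512BW/VL. */
__attribute__((target("avx512bw,avx512vl")))
static void copy32_avx512(void *dst, const void *src)
{
    memcpy(dst, src, 32);
}

/* 1 if CPUID reports every AVX-512 subset needed by a 256-bit vmovdqu8. */
static int cpu_has_avx512_bw_vl(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f")
        && __builtin_cpu_supports("avx512bw")
        && __builtin_cpu_supports("avx512vl");
}
#else
/* Not an x86 build: there is no AVX-512 to detect. */
static int cpu_has_avx512_bw_vl(void) { return 0; }
#endif

/* Pick a variant at run time from what the (possibly emulated) CPU reports. */
static copy32_fn select_copy32(void)
{
#if defined(__x86_64__) || defined(__i386__)
    if (cpu_has_avx512_bw_vl())
        return copy32_avx512;
#endif
    return copy32_generic;
}
```

A caller would use `select_copy32()(dst, src)` instead of calling an AVX-512 routine directly. This only helps if the rest of the program is not itself built with AVX-512 enabled globally (for example through -march=native on an AVX-512 machine), because then even the baseline variant may contain such instructions.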
Comment 2 ytrezq 2022-03-28 19:21:11 UTC
The binary runs correctly without Valgrind, so are you sure about AVX512? The instruction looks like a 256-bit one.
Comment 3 Tom Hughes 2022-03-28 20:02:15 UTC
Well, it will run fine with valgrind (assuming the CPU supports the AVX512 extensions), but when running under valgrind you are running on valgrind's emulated CPU instead of the real one, and that doesn't support the AVX512 extensions yet.
Comment 4 Tom Hughes 2022-03-28 20:02:32 UTC
I meant "without valgrind" there of course...
Comment 5 ytrezq 2022-03-28 22:14:23 UTC
Except it doesn't (I have real AVX512 hardware that I can provide for testing), because Valgrind is disassembling the instruction incorrectly: 0x62 0xF1 0x7F 0x28 0x7F 0x2 0xF 0x87 0x95 0xA1 instead of 0x62 0xF1 0x7F 0x28 0x7F 0x2.
Comment 6 Tom Hughes 2022-03-28 23:13:59 UTC
That looks the same to me if you ignore the extra bytes. The valgrind decoder doesn't know how long the instruction is (because it doesn't understand it) so it just dumps enough bytes to guarantee getting the whole thing.
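
To illustrate why the length is only known to a decoder that understands the encoding, here is a rough sketch of how the length of an EVEX-encoded instruction could be derived from its bytes. It is not Valgrind code. The function name `evex_insn_length` is hypothetical, and the sketch assumes 64-bit mode (where 0x62 always starts an EVEX prefix), no legacy prefixes, and an instruction without an immediate operand:

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the EVEX instruction at p, or 0 if it cannot be determined.
 * Layout assumed: 0x62 P0 P1 P2 | opcode | ModRM | [SIB] | [displacement]. */
static size_t evex_insn_length(const uint8_t *p, size_t avail)
{
    if (p == NULL || avail < 6 || p[0] != 0x62)
        return 0;

    size_t len = 4 + 1 + 1;          /* EVEX prefix + opcode + ModRM */
    uint8_t modrm = p[5];
    uint8_t mod = modrm >> 6;
    uint8_t rm = modrm & 7;

    if (mod != 3 && rm == 4) {       /* a SIB byte follows the ModRM byte */
        if (avail < 7)
            return 0;
        len += 1;
        if (mod == 0 && (p[6] & 7) == 5)
            len += 4;                /* SIB with no base register: disp32 */
    }

    if (mod == 0 && rm == 5)
        len += 4;                    /* RIP-relative: disp32 */
    else if (mod == 1)
        len += 1;                    /* disp8 (scaled by EVEX) */
    else if (mod == 2)
        len += 4;                    /* disp32 */

    return len <= avail ? len : 0;
}
```

For the bytes in this report, the ModRM byte is 0x02 (mod = 00, rm = 010, that is [rdx] with no SIB byte and no displacement), so the function returns 6, which matches the objdump output in the description. The remaining dumped bytes 0x0F 0x87 0x95 0xA1 are the start of the following ja instruction.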
Comment 7 Tom Hughes 2022-06-14 21:30:36 UTC

*** This bug has been marked as a duplicate of bug 383010 ***