Created attachment 147688
Attached executable for reproducing the bug.

I built the latest btrfs-progs on my system with profile feedback (-fprofile-use). The program runs normally on my CPU, but when I run

[liveuser@localhost-live btrfs-progs-v5.16.2]$ valgrind --tool=callgrind --dump-instr=yes --branch-sim=yes --collect-jumps=yes ./btrfs.static check -p --init-csum-tree /tmp/newly_btrfs_volume_example_with_sha256

with the attached executable, it fails with:

Starting repair.
Opening filesystem to check...
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0x7F 0x28 0x7F 0x2 0xF 0x87 0x95 0xA1
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==63828== valgrind: Unrecognised instruction at address 0x45dade.
==63828==    at 0x45DADE: btrfs_csum_data.constprop.0.isra.0 (disk-io.c:147)
==63828==    by 0x45D949: btrfs_check_super (disk-io.c:1666)
==63828==    by 0x45D876: btrfs_read_dev_super (disk-io.c:1879)
==63828==    by 0x45D5A9: btrfs_scan_one_device (volumes.c:548)
==63828==    by 0x45D4E3: check_mounted_where.constprop.0 (open-utils.c:61)
==63828==    by 0x45D489: check_mounted (open-utils.c:130)
==63828==    by 0x45CD93: cmd_check (main.c:10455)
==63828==    by 0x45C6DD: main (commands.h:125)
==63828== Your program just tried to execute an instruction that Valgrind
==63828== did not recognise.  There are two possible reasons for this.
==63828== 1. Your program has a bug and erroneously jumped to a non-code
==63828==    location.  If you are running Memcheck and you just saw a
==63828==    warning about a bad jump, it's probably your program's fault.
==63828== 2. The instruction is legitimate but Valgrind doesn't handle it,
==63828==    i.e. it's Valgrind's fault.  If you think this is the case or
==63828==    you are not sure, please let us know and we'll try to fix it.
==63828== Either way, Valgrind will now raise a SIGILL signal which will
==63828== probably kill your program.
==63828==
==63828== Process terminating with default action of signal 4 (SIGILL): dumping core
==63828==  Illegal opcode at address 0x45DADE
==63828==    at 0x45DADE: btrfs_csum_data.constprop.0.isra.0 (disk-io.c:147)
==63828==    by 0x45D949: btrfs_check_super (disk-io.c:1666)
==63828==    by 0x45D876: btrfs_read_dev_super (disk-io.c:1879)
==63828==    by 0x45D5A9: btrfs_scan_one_device (volumes.c:548)
==63828==    by 0x45D4E3: check_mounted_where.constprop.0 (open-utils.c:61)
==63828==    by 0x45D489: check_mounted (open-utils.c:130)
==63828==    by 0x45CD93: cmd_check (main.c:10455)
==63828==    by 0x45C6DD: main (commands.h:125)
==63828==
==63828== Events    : Ir Bc Bcm Bi Bim
==63828== Collected : 791898 132272 8055 1806 620
==63828==
==63828== I   refs:      791,898
==63828==
==63828== Branches:      134,078  (132,272 cond + 1,806 ind)
==63828== Mispredicts:     8,675  (  8,055 cond +   620 ind)
==63828== Mispred rate:      6.5% (    6.1%     +  34.3%   )

But unlike what Valgrind says, 0x62 0xF1 0x7F 0x28 0x7F 0x2 0xF 0x87 0x95 0xA1 consists of 2 instructions instead of 1, and thus in reality isn't a single invalid instruction but:

000000000045dac0 <btrfs_csum_data.constprop.0.isra.0>:
  45dade: 62 f1 7f 28 7f 02    vmovdqu8 YMMWORD PTR [rdx],ymm0
  45dae4: 0f 87 95 a1 fa ff    ja 407c7f <btrfs_csum_data.constprop.0.isra.0.cold+0x19>

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Fedora 37. RPM executables only.

Please fix this!
As seen at the end of the log, this is about the unhandled (i.e. unknown to Valgrind) instruction vmovdqu8. So this is not specific to Callgrind; it is about supporting the instruction in Valgrind in general. According to https://en.wikipedia.org/wiki/AVX-512, this instruction is from the AVX512 ISA extension. Valgrind does not (yet) support AVX512, so this is expected. Your executable seems to unconditionally assume that it runs on a processor with AVX512. It should first check with CPUID whether the processor supports AVX512 before using the instruction. It would then find out that the virtual Valgrind CPU does not support it, and so the code would have to use a variant that does not use AVX512 (you can ask the compiler to add such checks).
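A minimal sketch of such a runtime check, assuming GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available. The 256-bit form of vmovdqu8 needs both AVX512BW and AVX512VL, so both are tested. The names `have_avx512_bw_vl`, `copy32_avx512`, `copy32_portable` and `copy32` are hypothetical and are not taken from btrfs-progs; the AVX512 variant is only a stand-in so that the example compiles without special target flags.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the CPU we are actually running on (under Valgrind: its
 * emulated CPU) reports both AVX512BW and AVX512VL, else 0. */
static int have_avx512_bw_vl(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512bw") &&
           __builtin_cpu_supports("avx512vl");
#else
    return 0; /* not x86 or unknown compiler: never take the AVX512 path */
#endif
}

/* Portable 32-byte copy, safe on every CPU. */
static void copy32_portable(void *dst, const void *src)
{
    memcpy(dst, src, 32);
}

/* Hypothetical stand-in for a variant compiled with AVX512 enabled.
 * Here it is a plain memcpy so the sketch builds everywhere. */
static void copy32_avx512(void *dst, const void *src)
{
    memcpy(dst, src, 32);
}

/* Dispatch: only call the AVX512 variant when the CPU reports support. */
static void copy32(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    if (have_avx512_bw_vl())
        copy32_avx512(dst, src);
    else
        copy32_portable(dst, src);
}
```

With a check like this, the AVX512 variant is only reached when the CPU the program actually runs on reports both features. Otherwise the portable variant is used.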
The binary runs correctly without Valgrind, so are you sure about AVX512? The instruction looks like a 256-bit one.
Well, it will run fine with valgrind (assuming the CPU supports the AVX512 extensions), but when running under valgrind you are running on valgrind's emulated CPU instead of the real one, and that doesn't support the AVX512 extensions yet.
I meant "without valgrind" there of course...
Except it doesn't (I have real AVX512 hardware that I can provide for testing), because Valgrind is disassembling the instruction incorrectly: it reports 0x62 0xF1 0x7F 0x28 0x7F 0x2 0xF 0x87 0x95 0xA1 instead of 0x62 0xF1 0x7F 0x28 0x7F 0x2.
That looks the same to me if you ignore the extra bytes. The valgrind decoder doesn't know how long the instruction is (because it doesn't understand it) so it just dumps enough bytes to guarantee getting the whole thing.
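To illustrate why the first six bytes are an AVX512-encoded instruction even though it operates on a 256-bit register, here is a rough sketch that pulls a few fields out of the 4-byte EVEX prefix. It assumes 64-bit mode (where a leading 0x62 byte introduces an EVEX prefix) and extracts only the fields relevant here. The struct and function names are hypothetical, and this is not how Valgrind's decoder is written.

```c
#include <stdint.h>

/* A few fields of the 4-byte EVEX prefix (illustrative subset only). */
struct evex_fields {
    int      is_evex;      /* 1 if the first byte is 0x62 */
    unsigned map;          /* opcode map: 1 = 0F, 2 = 0F38, 3 = 0F3A */
    unsigned pp;           /* implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
    unsigned w;            /* EVEX.W bit */
    unsigned vector_bits;  /* 128, 256 or 512; 0 for the reserved encoding */
};

static struct evex_fields decode_evex_prefix(const uint8_t *b)
{
    struct evex_fields f = {0, 0, 0, 0, 0};
    if (b[0] != 0x62)
        return f;                      /* not an EVEX prefix */
    f.is_evex = 1;
    f.map = b[1] & 0x03;               /* low bits of payload byte 0 */
    f.w   = (b[2] >> 7) & 0x01;        /* top bit of payload byte 1 */
    f.pp  = b[2] & 0x03;               /* low bits of payload byte 1 */
    unsigned ll = (b[3] >> 5) & 0x03;  /* L'L: vector length selector */
    f.vector_bits = (ll < 3) ? (128u << ll) : 0;
    return f;
}
```

For the bytes in the log (62 f1 7f 28), this yields map 0F, implied prefix F2, W=0 and a 256-bit vector length, which matches the vmovdqu8 with a ymm register shown in the disassembly. The prefix is followed by the opcode byte 7f and the ModRM byte 02 ([rdx], ymm0, with no SIB byte or displacement), giving six bytes in total; the remaining four bytes in the dump belong to the following ja instruction.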
*** This bug has been marked as a duplicate of bug 383010 ***