On ARM64, 16-byte store instructions such as `str q0, [...]` or `stp x1, x2, [...]` are reported as "Invalid write of size 8". The same happens for the 32-byte store `stp q0, q1, [...]`, and probably also for longer stores such as `st1`.

Example:

#include <stdlib.h>

int main(void) {
  char *ptr = malloc(256);
  ptr += 256;
  asm volatile ("fmov d0, #1.5 ; str q0, [%0]" : : "r" (ptr) : "q0", "memory");
  return 0;
}

`valgrind ./a.out` prints:

==16170== Invalid write of size 8
==16170==    at 0x1087BC: main (in /home/nate/bugs/valgrind/a.out)
==16170==  Address 0x49ec140 is 0 bytes after a block of size 256 alloc'd
==16170==    at 0x484F058: malloc (vg_replace_malloc.c:380)
==16170==    by 0x1087A3: main (in /home/nate/bugs/valgrind/a.out)

I expected "Invalid write of size 16".

Reproduced with the latest 3.18.0 from git, as well as 3.17.0 from the Ubuntu 21.04 package.
To expand on this a bit, here's a sweep through all of the 'str' sizes:

#include <stdlib.h>
int main(void) {
  char *ptr = malloc(256);
  ptr += 256;
  asm volatile ("str b0, [%0]" :: "r" (ptr) : "b0", "memory");
  asm volatile ("str h0, [%0]" :: "r" (ptr) : "h0", "memory");
  asm volatile ("str s0, [%0]" :: "r" (ptr) : "s0", "memory");
  asm volatile ("str d0, [%0]" :: "r" (ptr) : "d0", "memory");
  asm volatile ("str q0, [%0]" :: "r" (ptr) : "q0", "memory");
  free(ptr - 256);
  return 0;
}

which under the latest valgrind release shows the expected sizes except for q:

==5990== Invalid write of size 1
==5990==    at 0x400608: main (strtest.c:5)
==5990==  Address 0x4a1a140 is 0 bytes after a block of size 256 alloc'd
==5990==    at 0x48682A4: malloc (vg_replace_malloc.c:431)
==5990==    by 0x4005F3: main (strtest.c:3)
==5990==
==5990== Invalid write of size 2
==5990==    at 0x400610: main (strtest.c:6)
==5990==  Address 0x4a1a140 is 0 bytes after a block of size 256 alloc'd
==5990==    at 0x48682A4: malloc (vg_replace_malloc.c:431)
==5990==    by 0x4005F3: main (strtest.c:3)
==5990==
==5990== Invalid write of size 4
==5990==    at 0x400618: main (strtest.c:7)
==5990==  Address 0x4a1a140 is 0 bytes after a block of size 256 alloc'd
==5990==    at 0x48682A4: malloc (vg_replace_malloc.c:431)
==5990==    by 0x4005F3: main (strtest.c:3)
==5990==
==5990== Invalid write of size 8
==5990==    at 0x400620: main (strtest.c:8)
==5990==  Address 0x4a1a140 is 0 bytes after a block of size 256 alloc'd
==5990==    at 0x48682A4: malloc (vg_replace_malloc.c:431)
==5990==    by 0x4005F3: main (strtest.c:3)
==5990==
==5990== Invalid write of size 8
==5990==    at 0x400628: main (strtest.c:9)
==5990==  Address 0x4a1a140 is 0 bytes after a block of size 256 alloc'd
==5990==    at 0x48682A4: malloc (vg_replace_malloc.c:431)
==5990==    by 0x4005F3: main (strtest.c:3)

Decode for the 'q' case does look correct (GET:V128, so it would be a 16-byte value):

(arm64) 0x400628: str q0, [x0, #0]

------ IMark(0x400628, 4, 0) ------
t21 = Add64(GET:I64(16),0x0:I64)
STle(t21) = GET:V128(320)
PUT(272) = 0x40062C:I64
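Since the IR carries the full V128 value in a single store, the size is presumably lost somewhere after decode. One way a "size 8" report could arise is if wide stores are checked in pieces of at most 8 bytes and the error reports the size of the first piece that fails. The sketch below is a toy model of that idea only; it is not Valgrind code, the 8-byte chunk limit is an assumption, and `first_invalid_chunk` and `CHUNK_BYTES` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model, NOT Valgrind code: assume a store is validated in pieces of
   at most CHUNK_BYTES and the first failing piece's size is reported. */
enum { CHUNK_BYTES = 8 };  /* assumed widest unit checked at once */

/* Returns the size of the first chunk that falls outside
   [block, block + block_len), or 0 if the whole store is in bounds. */
static size_t first_invalid_chunk(uintptr_t block, size_t block_len,
                                  uintptr_t addr, size_t store_bytes)
{
    size_t done = 0;
    while (done < store_bytes) {
        size_t left = store_bytes - done;
        size_t chunk = left < CHUNK_BYTES ? left : CHUNK_BYTES;
        uintptr_t lo = addr + done;
        uintptr_t hi = lo + chunk;  /* one past the chunk's last byte */
        if (lo < block || hi > block + block_len)
            return chunk;
        done += chunk;
    }
    return 0;
}
```

With a 256-byte block and a store that starts at its end, this model returns 1, 2, 4, 8 and 8 for stores of 1, 2, 4, 8 and 16 bytes, which matches the sweep above.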
With stp (store pair), there's a similar problem. Pairs of 32-, 64- and 128-bit values write 8, 16 and 32 bytes respectively, but are reported as writes of size 4, 8 and 8:

  asm volatile ("stp s0, s1, [%0]" :: "r" (ptr) : "s0", "s1", "memory");
  asm volatile ("stp d0, d1, [%0]" :: "r" (ptr) : "d0", "d1", "memory");
  asm volatile ("stp q0, q1, [%0]" :: "r" (ptr) : "q0", "q1", "memory");

==26048== Invalid write of size 4
==26048==    at 0x400630: main (strtest.c:12)
==26048==  Address 0x4a1a140 is 0 bytes after a block of size 256 alloc'd
==26048==    at 0x48682A4: malloc (vg_replace_malloc.c:431)
==26048==    by 0x4005F3: main (strtest.c:3)
==26048==
==26048== Invalid write of size 8
==26048==    at 0x400638: main (strtest.c:13)
==26048==  Address 0x4a1a140 is 0 bytes after a block of size 256 alloc'd
==26048==    at 0x48682A4: malloc (vg_replace_malloc.c:431)
==26048==    by 0x4005F3: main (strtest.c:3)
==26048==
==26048== Invalid write of size 8
==26048==    at 0x400640: main (strtest.c:14)
==26048==  Address 0x4a1a140 is 0 bytes after a block of size 256 alloc'd
==26048==    at 0x48682A4: malloc (vg_replace_malloc.c:431)
==26048==    by 0x4005F3: main (strtest.c:3)
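For comparison against the reports, the number of bytes each tested instruction actually writes can be tabulated from the register prefix. The helper below is a small sketch for that purpose; `expected_write_bytes` is a hypothetical name, and it only covers the plain `str` and `stp` register forms used in these tests.

```c
#include <stddef.h>

/* Hypothetical helper: bytes written by `str <reg>` (pair == 0) or
   `stp <reg>, <reg>` (pair != 0), keyed on the register prefix.
   Returns 0 for prefixes or forms it does not cover. */
static size_t expected_write_bytes(char reg_prefix, int pair)
{
    size_t one;
    switch (reg_prefix) {
    case 'b': one = 1;  break;
    case 'h': one = 2;  break;
    case 's': case 'w': one = 4;  break;
    case 'd': case 'x': one = 8;  break;
    case 'q': one = 16; break;
    default:  return 0;
    }
    /* A64 has no stp form for b or h registers. */
    if (pair && (reg_prefix == 'b' || reg_prefix == 'h'))
        return 0;
    return pair ? 2 * one : one;
}
```

For example, `expected_write_bytes('q', 1)` gives 32 for `stp q0, q1`, against the reported size of 8.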