Valgrind builds itself with -mpreferred-stack-boundary=2 on x86 (32-bit) targets. SSE instructions, on the other hand, expect data to be aligned on a 16-byte boundary (i.e. -mpreferred-stack-boundary=4). The mismatch is not reported at compile time and results in crashes at runtime: for example, vgdb does not work at all and crashes immediately when it calls into glibc code that moves data to and from XMM registers (because that code was built with SSE enabled).
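To make the arithmetic behind the mismatch concrete, here is a minimal sketch in C. It assumes only that -mpreferred-stack-boundary=N requests 2^N-byte stack alignment and that aligned SSE moves need 16-byte-aligned operands. The helper names are hypothetical and are not taken from Valgrind or glibc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* -mpreferred-stack-boundary=N asks GCC to keep the stack aligned
 * to 2^N bytes, so N=2 gives 4 bytes and N=4 gives 16 bytes. */
static size_t stack_alignment_bytes(unsigned boundary_exponent)
{
    return (size_t)1 << boundary_exponent;
}

/* Aligned SSE moves (e.g. movaps, movdqa) fault unless the memory
 * operand sits on a 16-byte boundary. */
static bool is_sse_aligned(uintptr_t address)
{
    return (address % 16u) == 0;
}

/* Hypothetical helper: does a stack kept at 2^N bytes guarantee that
 * a callee built with SSE sees a 16-byte-aligned stack? */
static bool boundary_guarantees_sse(unsigned boundary_exponent)
{
    return (stack_alignment_bytes(boundary_exponent) % 16u) == 0;
}
```

With these helpers, boundary_guarantees_sse(2) is false and boundary_guarantees_sse(4) is true: a caller built with a 4-byte stack boundary can hand a callee a stack pointer such as 0x1004, which is 4-byte aligned but not 16-byte aligned.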
What platform / glibc version is this with?
Yocto/qemux86. We use the following flags across the stack: -march=core2 -mtune=core2 -msse3 -mfpmath=sse. glibc is at 2.36.