gcc puts __asm__ code within a function into a different partition, compared to __asm__code outside of functions (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84861). This means, that with Link-Time-Optimization the __asm__ outside of functions, defining a function, is not callable from other object files. commit ab773096d "Fix 338252 - building valgrind with -flto (link time optimisation) fails" compiles for this reason coregrind/m_libcsetjmp.c and coregrind/m_main.c explicitly without -flto. The compilation process will be much easier, if m_libcsetjmp.c defines the functions VG_MINIMAL_LONGJMP and VG_MINIMAL_SETJMP outside of __asm__ as suggested at the above link. (I guess the code will then also be easier to read and maintain). Evaluate the change below, apply something similar to the other targets and for m_main.c, and remove in coregrind/Makefile.am the with/without -lflto distinction logic when compiling m_libcsetjmp.c and m_main.c. Consider also the discussion at https://bugs.kde.org/show_bug.cgi?id=338252 . diff --git a/coregrind/m_libcsetjmp.c b/coregrind/m_libcsetjmp.c index 68c101e..5626748 100644 --- a/coregrind/m_libcsetjmp.c +++ b/coregrind/m_libcsetjmp.c @@ -382,25 +382,10 @@ __asm__( #if defined(VGP_amd64_linux) || defined(VGP_amd64_darwin) || \ defined(VGP_amd64_solaris) -__asm__( -".text" "\n" -"" "\n" - -#if defined(VGP_amd64_linux) || defined(VGP_amd64_solaris) -".global VG_MINIMAL_SETJMP" "\n" // rdi = jmp_buf -"VG_MINIMAL_SETJMP:" "\n" - -#elif defined(VGP_amd64_darwin) -".globl _VG_MINIMAL_SETJMP" "\n" // rdi = jmp_buf -"_VG_MINIMAL_SETJMP:" "\n" +__attribute__((returns_twice)) +UWord VG_MINIMAL_SETJMP(VG_MINIMAL_JMP_BUF(_env)) { -#else -# error "Huh?" -#endif - -" movq %rax, 0(%rdi)" "\n" -" movq %rbx, 8(%rdi)" "\n" -" movq %rcx, 16(%rdi)" "\n" +__asm__( " movq %rdx, 24(%rdi)" "\n" " movq %rdi, 32(%rdi)" "\n" " movq %rsi, 40(%rdi)" "\n" @@ -421,22 +406,13 @@ __asm__( " movq $0, %rax" "\n" " ret" "\n" "" "\n" + ); +} - -#if defined(VGP_amd64_linux) || defined(VGP_amd64_solaris) -".global VG_MINIMAL_LONGJMP" "\n" -"VG_MINIMAL_LONGJMP:" "\n" // rdi = jmp_buf - -#elif defined(VGP_amd64_darwin) -".globl _VG_MINIMAL_LONGJMP" "\n" -"_VG_MINIMAL_LONGJMP:" "\n" // rdi = jmp_buf - -#else -# error "Huh?" -#endif +__attribute__((noreturn)) +void VG_MINIMAL_LONGJMP(VG_MINIMAL_JMP_BUF(_env)) { + __asm__( // skip restoring rax; it's pointless -" movq 8(%rdi), %rbx" "\n" -" movq 16(%rdi), %rcx" "\n" " movq 24(%rdi), %rdx" "\n" // defer restoring rdi; we still need it " movq 40(%rdi), %rsi" "\n" @@ -465,12 +441,8 @@ __asm__( // address space. " jmp *%rax" "\n" "" "\n" - -#if !defined(VGP_amd64_darwin) -".previous" "\n" -#endif -); - + ); +} #endif /* VGP_amd64_linux || VGP_amd64_darwin || VGP_amd64_solaris */
Thanks for filing the bug, some little comments ... I suppose we can do similar changes for other platforms for the SETJMP/LONGJMP, I am not sure that _start can be handled the same way (i.e. defined as a C function). E.g. _start has instructions right at the beginning to setup the stack, while a normal function assumes the stack is already setup. I do not have the experience/knowledge and access to the required platforms needed to do/evaluate the needed changes.