Bug 392180

Summary: LTO build simplification: __asm__ in m_libcsetjmp.c
Product: [Developer tools] valgrind Reporter: Дилян Палаузов <dilyan.palauzov>
Component: generalAssignee: Julian Seward <jseward>
Status: REPORTED ---    
Severity: normal CC: philippe.waroquiers
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:

Description Дилян Палаузов 2018-03-22 14:20:22 UTC
gcc puts __asm__ code within a function into a different partition, compared to __asm__code outside of functions (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84861).  This means, that with Link-Time-Optimization the __asm__ outside of functions, defining a function, is not callable from other object files.

commit ab773096d "Fix 338252 - building valgrind with -flto (link time optimisation) fails" compiles for this reason coregrind/m_libcsetjmp.c and coregrind/m_main.c explicitly without -flto.

The compilation process will be much easier, if m_libcsetjmp.c defines the functions VG_MINIMAL_LONGJMP and VG_MINIMAL_SETJMP outside of __asm__ as suggested at the above link.  (I guess the code will then also be easier to read and maintain).

Evaluate the change below, apply something similar to the other targets and for m_main.c, and remove in coregrind/Makefile.am the with/without -lflto distinction logic when compiling m_libcsetjmp.c and m_main.c.

Consider also the discussion at https://bugs.kde.org/show_bug.cgi?id=338252 .

diff --git a/coregrind/m_libcsetjmp.c b/coregrind/m_libcsetjmp.c
index 68c101e..5626748 100644
--- a/coregrind/m_libcsetjmp.c
+++ b/coregrind/m_libcsetjmp.c
@@ -382,25 +382,10 @@ __asm__(
 #if defined(VGP_amd64_linux) || defined(VGP_amd64_darwin) || \
     defined(VGP_amd64_solaris)
 
-__asm__(
-".text"  "\n"
-""       "\n"
-
-#if defined(VGP_amd64_linux) || defined(VGP_amd64_solaris)
-".global VG_MINIMAL_SETJMP"  "\n"  // rdi = jmp_buf
-"VG_MINIMAL_SETJMP:"  "\n"
-
-#elif defined(VGP_amd64_darwin)
-".globl _VG_MINIMAL_SETJMP"  "\n"  // rdi = jmp_buf
-"_VG_MINIMAL_SETJMP:"  "\n"
+__attribute__((returns_twice))
+UWord VG_MINIMAL_SETJMP(VG_MINIMAL_JMP_BUF(_env)) {
 
-#else
-#   error "Huh?"
-#endif
-
-"        movq   %rax,   0(%rdi)"   "\n"
-"        movq   %rbx,   8(%rdi)"   "\n"
-"        movq   %rcx,  16(%rdi)"   "\n"
+__asm__(
 "        movq   %rdx,  24(%rdi)"   "\n"
 "        movq   %rdi,  32(%rdi)"   "\n"
 "        movq   %rsi,  40(%rdi)"   "\n"
@@ -421,22 +406,13 @@ __asm__(
 "        movq   $0, %rax"          "\n"
 "        ret"                      "\n"
 ""       "\n"
+       );
+}
 
-
-#if defined(VGP_amd64_linux) || defined(VGP_amd64_solaris)
-".global VG_MINIMAL_LONGJMP"  "\n"
-"VG_MINIMAL_LONGJMP:"  "\n"    // rdi = jmp_buf
-
-#elif defined(VGP_amd64_darwin)
-".globl _VG_MINIMAL_LONGJMP"  "\n"
-"_VG_MINIMAL_LONGJMP:"  "\n"    // rdi = jmp_buf
-
-#else
-#   error "Huh?"
-#endif
+__attribute__((noreturn))
+void  VG_MINIMAL_LONGJMP(VG_MINIMAL_JMP_BUF(_env)) {
+  __asm__(
          // skip restoring rax; it's pointless
-"        movq     8(%rdi),  %rbx"    "\n"
-"        movq    16(%rdi),  %rcx"    "\n"
 "        movq    24(%rdi),  %rdx"    "\n"
          // defer restoring rdi; we still need it
 "        movq    40(%rdi),  %rsi"    "\n"
@@ -465,12 +441,8 @@ __asm__(
          // address space.
 "        jmp *%rax"                  "\n"
 ""       "\n"
-
-#if !defined(VGP_amd64_darwin)
-".previous"       "\n"
-#endif
-);
-
+         );
+}
 #endif /* VGP_amd64_linux || VGP_amd64_darwin || VGP_amd64_solaris */
Comment 1 Philippe Waroquiers 2018-03-23 17:48:46 UTC
Thanks for filing the bug, some little comments ...

I suppose we can do similar changes for other platforms for the SETJMP/LONGJMP,

I am not sure that _start can be handled the same way (i.e. defined as
a C function).
E.g. _start has instructions right at the beginning to setup the stack,
while a normal function assumes the stack is already setup.

I do not have the experience/knowledge and access to the required platforms
needed to do/evaluate the needed changes.