Bug 392180 - LTO build simplification: __asm__ in m_libcsetjmp.c
Summary: LTO build simplification: __asm__ in m_libcsetjmp.c
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-22 14:20 UTC by Дилян Палаузов
Modified: 2018-03-23 17:48 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Дилян Палаузов 2018-03-22 14:20:22 UTC
gcc puts __asm__ code within a function into a different partition, compared to __asm__code outside of functions (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84861).  This means, that with Link-Time-Optimization the __asm__ outside of functions, defining a function, is not callable from other object files.

commit ab773096d "Fix 338252 - building valgrind with -flto (link time optimisation) fails" compiles for this reason coregrind/m_libcsetjmp.c and coregrind/m_main.c explicitly without -flto.

The compilation process will be much easier, if m_libcsetjmp.c defines the functions VG_MINIMAL_LONGJMP and VG_MINIMAL_SETJMP outside of __asm__ as suggested at the above link.  (I guess the code will then also be easier to read and maintain).

Evaluate the change below, apply something similar to the other targets and for m_main.c, and remove in coregrind/Makefile.am the with/without -lflto distinction logic when compiling m_libcsetjmp.c and m_main.c.

Consider also the discussion at https://bugs.kde.org/show_bug.cgi?id=338252 .

diff --git a/coregrind/m_libcsetjmp.c b/coregrind/m_libcsetjmp.c
index 68c101e..5626748 100644
--- a/coregrind/m_libcsetjmp.c
+++ b/coregrind/m_libcsetjmp.c
@@ -382,25 +382,10 @@ __asm__(
 #if defined(VGP_amd64_linux) || defined(VGP_amd64_darwin) || \
     defined(VGP_amd64_solaris)
 
-__asm__(
-".text"  "\n"
-""       "\n"
-
-#if defined(VGP_amd64_linux) || defined(VGP_amd64_solaris)
-".global VG_MINIMAL_SETJMP"  "\n"  // rdi = jmp_buf
-"VG_MINIMAL_SETJMP:"  "\n"
-
-#elif defined(VGP_amd64_darwin)
-".globl _VG_MINIMAL_SETJMP"  "\n"  // rdi = jmp_buf
-"_VG_MINIMAL_SETJMP:"  "\n"
+__attribute__((returns_twice))
+UWord VG_MINIMAL_SETJMP(VG_MINIMAL_JMP_BUF(_env)) {
 
-#else
-#   error "Huh?"
-#endif
-
-"        movq   %rax,   0(%rdi)"   "\n"
-"        movq   %rbx,   8(%rdi)"   "\n"
-"        movq   %rcx,  16(%rdi)"   "\n"
+__asm__(
 "        movq   %rdx,  24(%rdi)"   "\n"
 "        movq   %rdi,  32(%rdi)"   "\n"
 "        movq   %rsi,  40(%rdi)"   "\n"
@@ -421,22 +406,13 @@ __asm__(
 "        movq   $0, %rax"          "\n"
 "        ret"                      "\n"
 ""       "\n"
+       );
+}
 
-
-#if defined(VGP_amd64_linux) || defined(VGP_amd64_solaris)
-".global VG_MINIMAL_LONGJMP"  "\n"
-"VG_MINIMAL_LONGJMP:"  "\n"    // rdi = jmp_buf
-
-#elif defined(VGP_amd64_darwin)
-".globl _VG_MINIMAL_LONGJMP"  "\n"
-"_VG_MINIMAL_LONGJMP:"  "\n"    // rdi = jmp_buf
-
-#else
-#   error "Huh?"
-#endif
+__attribute__((noreturn))
+void  VG_MINIMAL_LONGJMP(VG_MINIMAL_JMP_BUF(_env)) {
+  __asm__(
          // skip restoring rax; it's pointless
-"        movq     8(%rdi),  %rbx"    "\n"
-"        movq    16(%rdi),  %rcx"    "\n"
 "        movq    24(%rdi),  %rdx"    "\n"
          // defer restoring rdi; we still need it
 "        movq    40(%rdi),  %rsi"    "\n"
@@ -465,12 +441,8 @@ __asm__(
          // address space.
 "        jmp *%rax"                  "\n"
 ""       "\n"
-
-#if !defined(VGP_amd64_darwin)
-".previous"       "\n"
-#endif
-);
-
+         );
+}
 #endif /* VGP_amd64_linux || VGP_amd64_darwin || VGP_amd64_solaris */
Comment 1 Philippe Waroquiers 2018-03-23 17:48:46 UTC
Thanks for filing the bug, some little comments ...

I suppose we can do similar changes for other platforms for the SETJMP/LONGJMP,

I am not sure that _start can be handled the same way (i.e. defined as
a C function).
E.g. _start has instructions right at the beginning to setup the stack,
while a normal function assumes the stack is already setup.

I do not have the experience/knowledge and access to the required platforms
needed to do/evaluate the needed changes.