Created attachment 88247 [details] VEX/auxprogs/genoffsets.s generated by passing -flto to CFLAGS If I set CFLAGS to contain -flto during ./configure, VEX/auxprogs/genoffsets.s is generated as attached. In turn VEX/pub/libvex_guest_offsets.h is emptry and OFFSET_amd64_R10, OFFSET_amd64_RSI in coregrind/m_syswrap/syswrap-main.c are undefined . The compilation fails. A possible fix, not necessary the best one, is to check during ./configure if CFLAGS contains -flto and abort then.
I cannot reproduce this and amd64: svn co svn://svn.valgrind.org/valgrind/trunk trunk cd trunk ./autogen.sh CFLAGS=-flto ./configure --prefix=/tmp/bug make >& LOG libvex_guest_offsets.h gets generated OK. There is however a link error: ../coregrind/link_tool_exe_linux 0x38000000 gcc -Wno-long-long -flto -Wwrite-strings -fno-stack-protector -o memcheck-amd64-linux -m64 -O2 -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wno-format-zero-length -Wno-tautological-compare -fno-strict-aliasing -fno-builtin -fomit-frame-pointer -O2 -static -nodefaultlibs -nostartfiles -u _start -m64 memcheck_amd64_linux-mc_leakcheck.o memcheck_amd64_linux-mc_malloc_wrappers.o memcheck_amd64_linux-mc_main.o memcheck_amd64_linux-mc_translate.o memcheck_amd64_linux-mc_machine.o memcheck_amd64_linux-mc_errors.o ../coregrind/libcoregrind-amd64-linux.a ../VEX/libvex-amd64-linux.a -lgcc `vgPlain_interim_stack' referenced in section `.text' of /tmp/ccGkHIT6.ltrans0.ltrans.o: defined in discarded section `COMMON' of libcoregrind_amd64_linux_a-m_main.o (symbol from plugin) collect2: error: ld returned 1 exit status
Some weeks ago, I did a (kludgy) trial to use -flto for compiling Valgrind. See http://sourceforge.net/p/valgrind/mailman/message/32641134/ that contains a (kludgy, non commitable) patch to enable -flto and solve the problems encountered (e.g. genoffsets.s and the linking problem) The conclusion was that lto might somewhat help performance, but more measurements were needed (in particular on more modern CPU)
Created attachment 88254 [details] Philippe's possibly kludgy -flto enablement patch I'm attaching Philippe's patch here for convenience. I can confirm that r14273 can be built successfully with that patch on Ubuntu 14.04.1 using gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
I used gcc-4.9.1. To reproduce the problem, just put in VEX/Makefile on line 2678 -flto, as shown below, and regenereate auxprogs/genoffsets.s + pub/libvex_guest_offsets.h 2676 rm -f auxprogs/genoffsets.s 2677 $(mkdir_p) auxprogs pub 2678 $(CC) $(CFLAGS) -flto \ 2679 $(LIBVEX_CFLAGS) \ 2680 $(AM_CFLAGS_AMD64_LINUX) \ 2681 -O -S -o auxprogs/genoffsets.s \ 2682 $(srcdir)/auxprogs/genoffsets.c I recommend abolishing ./autogen.sh and using instead "autoreconf -i" (I have used it so far, as I it worked and I didn't know that ./autogen.sh exists). It does exactly the same (+searches for gettext). The proposed patch fixes the problem with the generation of VEX/pub/libvex_guest_offsets.h . With -flto I also get link problems, even using -Wl,--plugin=/usr/lib/bfd-plugins/liblto_plugin.so make[3]: Entering directory '/mnt/new/home/svn/valgrind/memcheck' ../coregrind/link_tool_exe_linux 0x38000000 gcc -Wno-long-long -flto -pipe -O2 -Wl,--hash-style=gnu -Wl,-O1 -Wl,-z,relro -Wwrite-strings -fno-stack-protector -L/usr/lib64 -L/lib64 -Wl,--plugin=/usr/lib/bfd-plugins/liblto_plugin.so -Wl,-S -Wl,--hash-style=gnu -Wl,-O1 -Wl,-z,relro -o memcheck-amd64-linux -m64 -fuse-linker-plugin -flto -ffat-lto-objects -O2 -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wno-format-zero-length -Wno-tautological-compare -fno-strict-aliasing -fno-builtin -fomit-frame-pointer -O2 -static -nodefaultlibs -nostartfiles -u _start -m64 memcheck_amd64_linux-mc_leakcheck.o memcheck_amd64_linux-mc_malloc_wrappers.o memcheck_amd64_linux-mc_main.o memcheck_amd64_linux-mc_translate.o memcheck_amd64_linux-mc_machine.o memcheck_amd64_linux-mc_errors.o ../coregrind/libcoregrind-amd64-linux.a ../VEX/libvex-amd64-linux.a -lgcc /tmp/ccoQPwJS.ltrans8.ltrans.o: In function `error': /home/svn/valgrind/memcheck/m_gdbserver/utils.c:51: undefined reference to `VG_MINIMAL_LONGJMP' /tmp/ccoQPwJS.ltrans8.ltrans.o: In function `gdbserver_terminate': /home/svn/valgrind/memcheck/m_gdbserver/server.c:947: undefined reference to `VG_MINIMAL_SETJMP' /tmp/ccoQPwJS.ltrans8.ltrans.o: In function `handle_syscall': /home/svn/valgrind/memcheck/m_scheduler/scheduler.c:1085: undefined reference to `VG_MINIMAL_SETJMP' /tmp/ccoQPwJS.ltrans8.ltrans.o: In function `server_main': /home/svn/valgrind/memcheck/m_gdbserver/server.c:965: undefined reference to `VG_MINIMAL_SETJMP' /tmp/ccoQPwJS.ltrans8.ltrans.o: In function `call_gdbserver.lto_priv.2226': /home/svn/valgrind/memcheck/m_gdbserver/m_gdbserver.c:768: undefined reference to `VG_MINIMAL_LONGJMP' /tmp/ccoQPwJS.ltrans8.ltrans.o: In function `run_thread_for_a_while': /home/svn/valgrind/memcheck/m_scheduler/scheduler.c:931: undefined reference to `VG_MINIMAL_SETJMP' /tmp/ccoQPwJS.ltrans19.ltrans.o: In function `barf.lto_priv.450': /home/svn/valgrind/memcheck/m_debuginfo/readdwarf3.c:5061: undefined reference to `VG_MINIMAL_LONGJMP' /tmp/ccoQPwJS.ltrans19.ltrans.o: In function `vgModuleLocal_new_dwarf3_reader': /home/svn/valgrind/memcheck/m_debuginfo/readdwarf3.c:5089: undefined reference to `VG_MINIMAL_SETJMP' /tmp/ccoQPwJS.ltrans21.ltrans.o: In function `resume_scheduler.lto_priv.2050': /home/svn/valgrind/memcheck/m_signals.c:1879: undefined reference to `VG_MINIMAL_LONGJMP' /tmp/ccoQPwJS.ltrans24.ltrans.o: In function `scan_all_valid_memory_catcher': /home/svn/valgrind/memcheck/mc_leakcheck.c:937: undefined reference to `VG_MINIMAL_LONGJMP' /tmp/ccoQPwJS.ltrans24.ltrans.o: In function `lc_scan_memory': /home/svn/valgrind/memcheck/mc_leakcheck.c:1028: undefined reference to `VG_MINIMAL_SETJMP' collect2: error: ld returned 1 exit status
(In reply to Дилян Палаузов from comment #0) > > A possible fix, not necessary the best one, is to check during ./configure > if CFLAGS contains -flto and abort then. There are probably a zillion way to break Valgrind build when playing with CFLAGS/compiler options. Detecting these cases and aborting for all these cases is a significant work with little benefits. If you want to play/experiment with lto switch, you could start from attachment https://bugs.kde.org/attachment.cgi?id=88254 => keeping this bug as a wish to support -flto
(In reply to Philippe Waroquiers from comment #5) > (In reply to Дилян Палаузов from comment #0) > > > > A possible fix, not necessary the best one, is to check during ./configure > > if CFLAGS contains -flto and abort then. > There are probably a zillion way to break Valgrind build when > playing with CFLAGS/compiler options. > Detecting these cases and aborting for all these cases is a significant > work with little benefits. Right. If you're fiddling with CFLAGS you're on your own. That being said, I could imagine distributors to play with LTO as that becomes more mature. So it'd be good if we would support this somehow. For instance: (1) add an option --enable-lto to configure (2) if --enable-lto was given, do The Right Thing under the covers along the lines of Philippe's patch (3) make configure check whether compiler supports -flto (on s390 we need to support RHEL5 which uses 3.4.6 which does not support -flto) Something along those lines would have better chances for being included.
I con't play directly with CFLAGS for building valgrind . I have in /usr/local/etc/config.site : export CFLAGS="-O3 -flto -fno-fat-lto-objects" export LDFLAGS="-Wl,-S -Wl,-z,relro -Wl,--hash-style=gnu -Wl,-O1" export prefix=/usr export sysconfdir=/etc export localstatedir=/var export CXXFLAGS=$CFLAGS export enable_silent_rules=yes and every time I do anywhere ./configure this settings are taken. The error for undefined "OFFSET_amd64_R10" is very misleading and takes a lot of time to associate it with the settings in /usr/local/etc/config.site . That said, I am agains including --enable-lto to configure. Advanced users know how to pass the exact options for the built process and the rest does not know, that their builds are suboptimal. The only thing that matters is, if CFLAGS contains -flto .
I recommend abolishing ./autogen.sh and using instead "autoreconf -i" (I have used it so far, as I it worked and I didn't know that ./autogen.sh exists). It does exactly the same (+searches for gettext).
A simple story here is: Valgrind's performance is limited mostly by the quality of the code that its own jit generates and partly by the performance of your processor's caches, branch predictors and store queues. Messing with the compile flags isn't going to get you much. Just leave them at the defaults.
(In reply to Julian Seward from comment #9) > much. Just leave them at the defaults. Keeping the default also has as additional advantage that the day the valgrind configure properly detects a working -flto (and that this has been measured to be a significant benefit), you will automatically get it (or any other optimisation flags or others that will be deemed useful).
Still in 482e36cb6e-20180203, using gcc 6.4.0. Also reported downstream in Gentoo, see https://bugs.gentoo.org/641806
Note that I am busy doing a (I hope ok to commit soon)patch in valgrind source code so that -flto works 'out of the box'. The patch currently gives working tools on Debian 9 with gcc 6.3.0 What still needs to be done: * run regression tests, and see it is all ok (and if not, ivnestigate) * try on other platforms (e.g. ppc, s390, ...) * make some more perf measurements
Created attachment 111329 [details] patch adding --enable-lto=yes configure option Fix 338252 - building valgrind with -flto (link time optimisation) fails * Addition of a new configure option --enable-lto=yes or --enable-lto=no Default value is --enable-lto=no, as the build is significantly slower, so is not appropriate for valgrind development : this should be used only on buildbots and/or by packagers. * Some files containins asm functions have to be compiled without lto: coregrind/m_libcsetjmp.c coregrind/m_main.c If these are compiled with lto, that gives undefined symbols at link time. The files to compile without lto are coregrind/m_libcsetjmp.c coregrind/m_main.c To compile these files with other options, a noinst target lib is defined. The objects of this library are then added to the libcoregrind. * memcheck/mc_main.c : move the handwritten asm helpers to mc_main_asm.c. This avoids undefined symbols on some toolchains. Due to this, the preprocessor symbols that activate the fast or asm memcheck helpers are moved to mc_include.h Platforms with handwritten helpers will also have the memcheck primary map defined non static. * In VEX, auxprogs/genoffsets.c also has to be compiled without lto, as the asm produced by the compiler is post-processed to produce pub/libvex_guest_offsets.h. lto not producing asm means the generation fails if we used -flto to compile this file. * all the various Makefile*am are modified to use LTO_CFLAGS for (most) targets. LTO_CFLAGS is empty when --enable-lto=no, otherwise is set to the flags needed for gcc. If --enable-lto=no, LTO_AR and LTO_RANLIB are the standard AR and RANLIB, otherwise they are the lto capable versions (gcc-ar and gcc-ranlib). * This has been tested on: debian 9.4/gcc 6.3.0/amd64+x86 rhel 7.4/gcc 6.4.0/amd64 ubuntu 17.10/gcc 7.2.0/amd64+x86 fedora26/gcc 7.3.1/s390x No regressions on the above.
Autoconf users willing to compile everything with LTO just by calling ./configure put in /usr/local/etc/config.site: CFLAGS="-O3 -fno-fat-lto-objects -flto" CXXFLAGS="-O3 -fno-fat-lto-objects -flto" LDFLAGS="-Wl,-O1,-s -flto=4" And then any ./configure implies LTO. Just as other programs relying on autoconf doing the right thing do not offer --enable-lto option in ./configure --help, can be compiled with LTO, valgrind./configure --help shall not print --enable-lto. Programmer printing on --help "--enable-lto" don't understand autoconf. This applies also for artificially added -g or -O2 from configure.ac or contained in Makefile* ar, nm and ld from binutils can work with LTO, if they are compiled with plugin support binutils-gdb.git/configure --enable-plugins). When gcc calls ld, it passes the plugin path after -plugin, see. https://sourceware.org/binutils/docs/ld/Options.html#Options (-plugin with one dash). Please note, that gcc's "make install" will not install the LTO plugins on the place, where nm will look from it, if it thinks that plugins are needed. See https://sourceware.org/binutils/docs/binutils/nm.html -> --plugins . If `nm --plugin`, `ar --plugin`, `ranlib --plugin` or `ld -plugin` (with one dash) print 'Unknown option', then they are not compiled correctly. The right output of the above commands is "missing argument". The approach for using LTO is to compile ar/ld with plugin support, rather than using gcc-ld/gcc-ar. When CFLAGS from config.site contains -flto, then valgrind can assume that ar/nm/ld are compiled with plugin support. LTO is more than -flto, there is also -fno-fat-lto-objects, and Clang has its own LTO-world. All that said, what is needed for LTO support is stripping the LTO flags when building coregrind/m_libcsetjmp.c, coregrind/m_main.c and VEX/auxprogs/genoffsets.c, assuming that the system is configured correctly with the LTO plugins and ar/ld supporting plugins, or find out why LTO makes problems with those files. Compling everything with LTO produces: In file included from m_syswrap/syswrap-main.c:46: m_syswrap/syswrap-main.c: In function ‘putSyscallStatusIntoGuestState’: m_syswrap/syswrap-main.c:1135:14: error: ‘OFFSET_amd64_RAX’ undeclared (first use in this function) OFFSET_amd64_RAX, sizeof(UWord) ); ^~~~~~~~~~~~~~~~ The diff between VEX/auxprogs/genoffsets.s generated from VEX/auxprogs/genoffsets.c with and without LTO is: --- VEX/auxprogs/genoffsets.s-2 2018-03-12 14:33:05.473476269 +0100 +++ ../valgrind/VEX/auxprogs/genoffsets.s-2 2018-03-12 14:33:14.025483351 +0100 @@ -1,1023 +1,26 @@ .file "genoffsets.c" - .text -.Ltext0: - .globl foo - .type foo, @function -foo: -.LFB17: - .file 1 "./auxprogs/genoffsets.c" - .loc 1 84 1 view -0 - .cfi_startproc - .loc 1 86 4 view .LVU1 -#APP -# 86 "./auxprogs/genoffsets.c" 1 - -#define OFFSET_x86_EAX xyzzy$8 - -# 0 "" 2 - .loc 1 87 4 view .LVU2 -# 87 "./auxprogs/genoffsets.c" 1 - -#define OFFSET_x86_EBX xyzzy$20 - -# 0 "" 2 (- and other OFFSET_ symbols removed) To generate VEX/pub/libvex_guest_offsets.h this "gcc -O -S -o auxprogs/genoffsets.s ./auxprogs/genoffsets.c" is called, and OFFSET_amd64_RAX is in the output if -flto was not passed on this line. This change resolves the mentioned problem with OFFSET_amd64_RAX: diff --git a/Makefile.vex.am b/Makefile.vex.am index 4ad5ffa..f36708c 100644 --- a/Makefile.vex.am +++ b/Makefile.vex.am @@ -63,9 +63,9 @@ BUILT_SOURCES = pub/libvex_guest_offsets.h CLEANFILES = pub/libvex_guest_offsets.h if COMPILER_IS_CLANG -CFLAGS_FOR_GENOFFSETS = $(CFLAGS) -no-integrated-as +CFLAGS_FOR_GENOFFSETS = -no-integrated-as else -CFLAGS_FOR_GENOFFSETS = $(CFLAGS) +CFLAGS_FOR_GENOFFSETS = endif # This is very uggerly. Need to sed out both "xyzzyN" and Now I get: make[3]: Entering directory '/git/valgrind/memcheck' GEN memcheck-amd64-linux m_translate.c: In function ‘vgPlain_translate’: m_translate.c:1527:39: warning: ‘isWrap’ may be used uninitialized in this function [-Wmaybe-uninitialized] m_translate.c:1518:12: note: ‘isWrap’ was declared here m_signals.c:2024: error: undefined reference to 'VG_MINIMAL_LONGJMP' m_signals.c:2024: error: undefined reference to 'VG_MINIMAL_LONGJMP' m_signals.c:2024: error: undefined reference to 'VG_MINIMAL_LONGJMP' m_signals.c:2024: error: undefined reference to 'VG_MINIMAL_LONGJMP' m_gdbserver/server.c:1175: error: undefined reference to 'VG_MINIMAL_SETJMP' m_gdbserver/server.c:1193: error: undefined reference to 'VG_MINIMAL_SETJMP' m_debuginfo/readdwarf3.c:5231: error: undefined reference to 'VG_MINIMAL_SETJMP' m_scheduler/scheduler.c:1005: error: undefined reference to 'VG_MINIMAL_SETJMP' Without -flto: nm coregrind/libcoregrind_amd64_linux_a-m_libcsetjmp.o 0000000000000052 T VG_MINIMAL_LONGJMP 0000000000000000 T VG_MINIMAL_SETJMP with lto: nm coregrind/libcoregrind_amd64_linux_a-m_libcsetjmp.o (nothing) so the asm in coreconf/m_libcsetjmp.c shall be tweeked, in a way that on LTO VG_MINIMAL_LONGJMP and VG_MINIMAL_SETJMP are not optimized out. I don't speak assembler.
(In reply to Дилян Палаузов from comment #14) > Autoconf users willing to compile everything with LTO just by calling > ./configure put in /usr/local/etc/config.site: > > CFLAGS="-O3 -fno-fat-lto-objects -flto" CXXFLAGS="-O3 -fno-fat-lto-objects > -flto" LDFLAGS="-Wl,-O1,-s -flto=4" > > And then any ./configure implies LTO. Just as other programs relying on > autoconf doing the right thing do not offer --enable-lto option in > ./configure --help, can be compiled with LTO, valgrind./configure --help > shall not print --enable-lto. Programmer printing on --help "--enable-lto" > don't understand autoconf. This applies also for artificially added -g or > -O2 from configure.ac or contained in Makefile* Thank you for this insight and sharing your approach. It is really helpful to know that other possibilities exists. However I modifying /usr/local/etc/config.site affects all projects build on a box. Also we cannot force Valgrind users and maintainers to modify global configuration file [consider shared build machines, for example in the gcc build farm]. Perhaps a merged approach could be found?
People either want to compile everything with LTO, or nothing with LTO, or they don't know about LTO. Modifying /usr/local/etc/config.site is not the only way, calling CFLAGS="-flto" ./configure also works (if CFLAGS is not set in config.site) and is not "export"ed, as those take precedence. Doing ./configure && make CFLAGS='-flto' does work, too (https://www.gnu.org/software/make/manual/html_node/Overriding.html) So people willing LTO either know the aforementioned ways to compile with LTO, or they do not insist of having LTO. In fact valgrind and php are the only software I am aware of that doesn't work with LTO, php ignoring LDFLAGS from config.site and postgresql ignoring CFLAGS from config.site, and all the rest is working with both the autoconf principles and with LTO.
Having a global /usr/local/etc/config.site file is not necessary. It is possible to put the file somewhere else and set in env CONFIG_SITE for the current session where the file is that shall be considered.
(In reply to Дилян Палаузов from comment #16) > People either want to compile everything with LTO, or nothing with LTO, or > they don't know about LTO. Humph, that looks like a very general statement about all people in the world :). There exist at least one person (me) that knows about LTO and that does not want to compile everything with LTO, but want to compile a few things with LTO :). And I highly suspect that a lot (many ? all ?) distribution packagers also do not configure all the packages the same way. > So people willing LTO either know the aforementioned ways to compile with > LTO, or they do not insist of having LTO. Launching a valgrind configure/make with CFLAGS containing -flto fails for various mysterious reasons (see below). The attached patch is an approach to solve these problems, obtained after several other approaches failed (in particular to solve the asm related problems). > > In fact valgrind and php are the only software I am aware of that doesn't > work with LTO, php ignoring LDFLAGS from config.site and postgresql ignoring > CFLAGS from config.site, and all the rest is working with both the autoconf > principles and with LTO. The first problem to solve (as you saw) is the genoffset problem. As genoffset already had specific compiler options, I just slightly extended the current approach. => problem solved. I prefer to do the minimal change in this tricky area : I have no idea what could be the impact of more heavily changing the way genoffset is compiled. So, just compiling it with the same flags as today is the safest approach: if --enable-lto is not used, then it just works as of today. I then tried hard to solve the linking problems with VG_MINIMAL_SETJMP. Once it was solved, I had to solve the problem of another undefined reference (_start). These were solved by adding a pragma to push the option -fno-lto for the problematic files. With these fixes, I obtained something working on my 'main' distro (debian), but with a very strange warning, something like: a plugin is needed to handle lto objects And of course, my toolchain is configured with plugin support ! So this warning is completely mysterious/not understandable. But as the resulting valgrind was working, I just ignored the warning. I then tested on another distro (fedora+s390x) There the build stopped with an error message 'invalid option -fno-lto' (while -flto is properly supported). So, yet another mystery. Various other trials to fix the problem resulted in gcc-ar/ar core dumping, compilation failing, or resulting valgrind not working. So, another bunch of mysteries. I even tried to debug gcc-ar/ar, but could not make any sense of what I saw. Finally, I obtained the attached patch, that I have validated on 4 distributions, with 4 different gcc versions. As I understand, your opinion is that the way valgrind autoconf/automake is done is 'not clean', as e.g. it hardcodes some compilation options (-O2 and similar). I am not an autoconf/automake expert, so I cannot really judge, and for sure I do not have the time/knowledge to clean all that up. So, in summary: * as I understand, a clean solution would imply to heavily rework valgrind build system. * Valgrind build system also supports build a primary and secondary arch. Not too sure that this is very usual/supported out of the box when using CFLAGS on the command line (flags must be different for primary and secondary arch). So, that might make a clean solution more complex to obtain. * even if --enable-lto option is not a clean solution, at least it allows to build with lto if desired. Note that as far as I can see, several other packages have such an option --enable-lto. See e.g. https://forums.gentoo.org/viewtopic-t-1052716.html Sorry for not being able to do things better in this area, I guess we will need a real expert for this. In the meantime, if the patch is deemed reasonable enough, I will push it, as at least it allows to have an lto compiled valgrind, and points at what has to be compiled 'specially' if someone wants to do a cleaner build system.
A waring like "a plugin is needed to handle lto objects" suggests that nm/ar do not find the LTO plugin. For an object file compiled with LTO (and -fno-fat-lto-objects), `nm x.o` will print the symbols there, after it recognizes, that a plugin is needed. If the plugin is not findable by nm, it prints nm: libcoregrind_amd64_linux_a-m_poolalloc.o: plugin needed to handle lto object For the plugin support I have: ls /usr/local/lib/bfd-plugins/ -l total 4 lrwxrwxrwx 1 root staff 65 Mar 14 10:55 liblto_plugin.so -> /usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.0.1/liblto_plugin.so This location is used by ar, ranlib and nm, but not by ld. I asked at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84861 how to tweak coregrind/m_libcsetjmp.c so that -flto does not delete VG_MINIMAL_LONGJMP and VG_MINIMAL_SETJMP. I guess you have uploaded PRs with the ar crashes.
With the change below m_libcsetjmp.c links on amd64 correctly with LTO. Is the change equivalent (apart from cheating gcc restoring rax in in VG_MINIMAL_LONGJMP)? Is volatile after __asm__ necessary? diff --git a/coregrind/m_libcsetjmp.c b/coregrind/m_libcsetjmp.c index 68c101e..5626748 100644 --- a/coregrind/m_libcsetjmp.c +++ b/coregrind/m_libcsetjmp.c @@ -382,25 +382,10 @@ __asm__( #if defined(VGP_amd64_linux) || defined(VGP_amd64_darwin) || \ defined(VGP_amd64_solaris) -__asm__( -".text" "\n" -"" "\n" - -#if defined(VGP_amd64_linux) || defined(VGP_amd64_solaris) -".global VG_MINIMAL_SETJMP" "\n" // rdi = jmp_buf -"VG_MINIMAL_SETJMP:" "\n" - -#elif defined(VGP_amd64_darwin) -".globl _VG_MINIMAL_SETJMP" "\n" // rdi = jmp_buf -"_VG_MINIMAL_SETJMP:" "\n" +__attribute__((returns_twice)) +UWord VG_MINIMAL_SETJMP(VG_MINIMAL_JMP_BUF(_env)) { -#else -# error "Huh?" -#endif - -" movq %rax, 0(%rdi)" "\n" -" movq %rbx, 8(%rdi)" "\n" -" movq %rcx, 16(%rdi)" "\n" +__asm__( " movq %rdx, 24(%rdi)" "\n" " movq %rdi, 32(%rdi)" "\n" " movq %rsi, 40(%rdi)" "\n" @@ -421,22 +406,13 @@ __asm__( " movq $0, %rax" "\n" " ret" "\n" "" "\n" + ); +} - -#if defined(VGP_amd64_linux) || defined(VGP_amd64_solaris) -".global VG_MINIMAL_LONGJMP" "\n" -"VG_MINIMAL_LONGJMP:" "\n" // rdi = jmp_buf - -#elif defined(VGP_amd64_darwin) -".globl _VG_MINIMAL_LONGJMP" "\n" -"_VG_MINIMAL_LONGJMP:" "\n" // rdi = jmp_buf - -#else -# error "Huh?" -#endif +__attribute__((noreturn)) +void VG_MINIMAL_LONGJMP(VG_MINIMAL_JMP_BUF(_env)) { + __asm__( // skip restoring rax; it's pointless -" movq 8(%rdi), %rbx" "\n" -" movq 16(%rdi), %rcx" "\n" " movq 24(%rdi), %rdx" "\n" // defer restoring rdi; we still need it " movq 40(%rdi), %rsi" "\n" @@ -465,12 +441,8 @@ __asm__( // address space. " jmp *%rax" "\n" "" "\n" - -#if !defined(VGP_amd64_darwin) -".previous" "\n" -#endif -); - + ); +} #endif /* VGP_amd64_linux || VGP_amd64_darwin || VGP_amd64_solaris */
(In reply to Дилян Палаузов from comment #20) Thanks for looking at this further. > With the change below m_libcsetjmp.c links on amd64 correctly with LTO. Is > the change equivalent (apart from cheating gcc restoring rax in in > VG_MINIMAL_LONGJMP)? Is volatile after __asm__ necessary? I have tested the patch on amd64, and the regression tests are working. So, what you propose might be a way to go. However, I am not at ease to change the asm code as this is tricky, and must be done and tested for all platforms. So, I have pushed the patch which implements --enable-lto=yes|no as ab773096df7aaaf46e8883af5ed4690f4d4499af. If deemed desirable, another bug/wish could be opened to support a better configuration mechanism (i.e. use CFLAGS, instead of having predefined logic to setup CFLAGS and similar in configure).
I am against using gcc-ar or gcc-ranlib. These are probably not available, when clang is used, and it can do LTO, too. Instead of cluttering configure.ac to use gcc-ac/gcc-ranlib, it can either rely on the fact, that /usr/(local/)?/bfd-plugins contain the LTO plugin, or alternatively compile something with -flto to an object file, call "nm the-object-file.o" and if the latter prints "missing plugin", then configure shall tell the user that she has to put the linker plugin under {libdir}/bfd-plugins . This would help the user to adjust her system to be able to compile other software with LTO, even software whose authors don't know about LTO. LTO is anyway not ready for the prime time yet. emacs does not lTO-link with ld.bfd, but with ld.gold and php also doesn't work with lto. So if the build process just uses ar/ranlib and it fails, then the user shall try without LTO or debug, debug, debug... I would like to know why gcc creates gcc-ar instead of puthing the linker plug in a place, that normal ar would auto-load it, but I do not expect an answer here on this.
I just verified, that LLVM does not install gcc-ar and gcc-ranlib, but llvm-ar and llvm-ranlib . So to make LTO work with Clang, on systems where GCC is not installed, either llvm-ar and llvm-ranlib shall be used, or ar and ranlib shall be used and it must be assumed that the system is LTO-ready (the plugins are under {libdir}/bfd-plugins/). Concerning installing by default the plugins on the compiler's "make install" on the location where binutils will look for them I filled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84934 and https://bugs.llvm.org/show_bug.cgi?id=36802. Shall I open for the assembler-review in coregrind/m_libcsetjmp.c a separate ticket, or can it be kept here?
(In reply to Дилян Палаузов from comment #22) > I am against using gcc-ar or gcc-ranlib. These are probably not available, > when clang is used, and it can do LTO, too. > > Instead of cluttering configure.ac to use gcc-ac/gcc-ranlib, it can either > rely on the fact, that /usr/(local/)?/bfd-plugins contain the LTO plugin, or > alternatively compile something with -flto to an object file, call "nm > the-object-file.o" and if the latter prints "missing plugin", then configure > shall tell the user that she has to put the linker plugin under > {libdir}/bfd-plugins . Using directly ar and similar is causing problems on some platforms, (some of them that I care about). The gcc documentation (including the latest gcc 7.3) explicitly indicates to use gcc-ar and similar (I guess this ensures that in case you use a gcc in another path that the 'standard' /usr/...; that the correct/associated plugins are used. So, I do not think it is a good idea to diverge (for gcc) from the explicitly documented way to build a static lib, even if some advanced users might be able to fix/bypass/... using plugins in a directory such as /usr/(local/)?/bfd-plugins (note that on my distro, there is no such directory. So, plugins are somewhere else, I am guessing properly found by gcc-ar and similar). > > This would help the user to adjust her system to be able to compile other > software with LTO, even software whose authors don't know about LTO. > > LTO is anyway not ready for the prime time yet. emacs does not lTO-link > with ld.bfd, but with ld.gold and php also doesn't work with lto. So if the > build process just uses ar/ranlib and it fails, then the user shall try > without LTO or debug, debug, debug... Or rather, configure shall do the correct thing, whenever it can, using the way as documented in gcc manual. > > I would like to know why gcc creates gcc-ar instead of puthing the linker > plug in a place, that normal ar would auto-load it, but I do not expect an > answer here on this. As said above, if you have different gcc versions installed, and the plugin differ according to gcc version, then gcc-ar and similar can do the job properly. An 'hard-coded' path that ar would use will not give the same. At least for some gcc, I see the plugin is part of gcc itself. As I understand, the real problem with gcc-ar and similar is that this does not work with llvm, as llvm needs llvm-ar and similar. I largely prefer to keep this bug closed, and have another one bug discussing using clang+lto to build valgrind. I suppose patches for this would not be too difficult, but I am not an llvm user. So, can you open another bug related to clang/lto/llvm-ar, ... ? Thanks Philippe
(In reply to Дилян Палаузов from comment #23) > I just verified, that LLVM does not install gcc-ar and gcc-ranlib, but > llvm-ar and llvm-ranlib . > > So to make LTO work with Clang, on systems where GCC is not installed, > either llvm-ar and llvm-ranlib shall be used, or ar and ranlib shall be used > and it must be assumed that the system is LTO-ready (the plugins are under > {libdir}/bfd-plugins/). > > Concerning installing by default the plugins on the compiler's "make > install" on the location where binutils will look for them I filled > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84934 and > https://bugs.llvm.org/show_bug.cgi?id=36802. Which, as I understand, does not work if you have several gcc versions installed. > > Shall I open for the assembler-review in coregrind/m_libcsetjmp.c a separate > ticket, or can it be kept here? Yes, that would be better, i.e. to have 2 new bugs: * clang/lto/llvm-ar, ... (cdr comment 24) * allow to configure valgrind using CFLAGS (a.o. to support LTO) Thanks Philippe
With python 3.6, doing "./configure --with-lto --enable-optimizations && make" compiles with -flto, but calls ranlib/ar, not gcc-ranlib or gcc-ar. Why can python-lto live without gcc-ranlib, but valgrind cannot? Also "./configure --with-lto" alone does not imply -flto. > > Concerning installing by default the plugins on the compiler's "make install" on the location where binutils will look for them I filled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84934 and https://bugs.llvm.org/show_bug.cgi?id=36802. > Which, as I understand, does not work if you have several gcc versions installed. It works, as the plugin is backwards compatible. You just install the newest gcc-plugin, it then handles the older versions. In particular, if x86_64-pc-linux-gnu-gcc-6.4.1 is installed on a system before x86_64-pc-linux-gnu-gcc-6.4.1 is installed, and a file is then compiled with 6.4.1, strace shows that gcc-ar calls 7.3.1/liblto_plugin.so . Concerning how LTO-libraries shall be built in portable way - by detecting in ./configure whether gcc or clang is used and then switching to gcc-ranlib or llvm-ranlib and why gcc does recomment gcc-nm instead of installing the plugin at the right place I asked at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84995 and http://lists.llvm.org/pipermail/llvm-dev/2018-March/122018.html . While both llvm and gcc can do LTO, but considering that after being asked neither on clang side is described how to build the LTO plugin (https://bugs.llvm.org/show_bug.cgi?id=32759#c3) nor on gcc side the plugin is installed on the right place (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70345), honestly, I don't think that any of the llvm and gcc teams is willing to put efforts to offer LTO for the masses. For switching the __asm__ in coregrind/m_(main,libcsetjmp).c I opened https://bugs.kde.org/show_bug.cgi?id=392180 . > * allow to configure valgrind using CFLAGS (a.o. to support LTO) How shall this work? I can imagine at first place substituting -S with -E in Makefile.vex.am-92- $(CC) $(CFLAGS_FOR_GENOFFSETS) \ Makefile.vex.am-93- $(LIBVEX_CFLAGS_NO_LTO) \ Makefile.vex.am-94- $(AM_CFLAGS_@VGCONF_PLATFORM_PRI_CAPS@) \ Makefile.vex.am:95: -O -S -o auxprogs/genoffsets.s \ Makefile.vex.am-96- $(srcdir)/auxprogs/genoffsets.c Makefile.vex.am-97- grep xyzzy auxprogs/genoffsets.s | grep "^[# ]*#define" \ Makefile.vex.am-98- | sed "s/# #define/#define/g" \ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84934#c1 suggests that distributions are supposed to install the LTO-plugin on the right place. Do the distrubutuions you care about do so and if not, why?
(In reply to Дилян Палаузов from comment #26) > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84934#c1 suggests that > distributions are supposed to install the LTO-plugin on the right place. Do > the distrubutuions you care about do so and if not, why? I have no idea, but the gcc doc says to use gcc-ar. On at least some platforms, it was failing with ar, and succesful when using gcc-ar (according to the gcc user manual). If a gcc+distribution works with ar, it will also work with gcc-ar. So, I see no reason to use ar instead of gcc-ar, as this can only make lto fail on some setup, without repairing any other setup. However, as I understand, llvm is important for you, and when compiling with llvm, llvm-ar must be used. So, when I have a little bit of time, I will install llvm and see if I can make that work.
(In reply to Philippe Waroquiers from comment #27) > So, when I have a little bit of time, I will install llvm > and see if I can make that work. At least on Ubunty 17.10, with the distro llvm/clang, configuring and building valgrind does not work properly, even without lto. Without lto, some c++ regression tests fail to compile (most of this can be bypassed by adding -std=c++11. Some tests are not compiling as they seem to depend on gcc internals. To try lto, some updates have to be done to configure.ac (I will attach the tentative patch), but then the link fails with: ../coregrind/link_tool_exe_linux 0x58000000 clang -o memcheck-amd64-linux -flto -m64 -O2 -finline-functions -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wcast-align -Wcast-qual -Wwrite-strings -Wempty-body -Wformat -Wformat-security -Wignored-qualifiers -fno-stack-protector -fno-strict-aliasing -fno-builtin -Wno-cast-align -Wno-self-assign -Wno-tautological-compare -fomit-frame-pointer -O2 -static -nodefaultlibs -nostartfiles -u _start -m64 memcheck_amd64_linux-mc_leakcheck.o memcheck_amd64_linux-mc_malloc_wrappers.o memcheck_amd64_linux-mc_main.o memcheck_amd64_linux-mc_main_asm.o memcheck_amd64_linux-mc_translate.o memcheck_amd64_linux-mc_machine.o memcheck_amd64_linux-mc_errors.o ../coregrind/libcoregrind-amd64-linux.a ../VEX/libvex-amd64-linux.a -lgcc Global is external, but doesn't have external or weak linkage! i64 ()* @do_syscall_WRK LLVM ERROR: Broken module found, compilation aborted! clang: error: linker command failed with exit code 1 (use -v to see invocation) So, some more work is needed to support llvm, and even more work needed to support llvm+lto.
Created attachment 111644 [details] tentative patch to support llvm+lto With the attached patch, llvm+lto works till the link phase on ubuntu 17.10+clang version 4.0.1-6 (tags/RELEASE_401/final). However, it fails at link time with: Global is external, but doesn't have external or weak linkage! i64 ()* @do_syscall_WRK LLVM ERROR: Broken module found, compilation aborted! clang: error: linker command failed with exit code 1 (use -v to see invocation)
For the record, with /usr/local/etc/config.site containing CC=clang CXX=clang++ CFLAGS="-Wall -pipe -O3" CXXFLAGS="-pipe -O3" LDFLAGS="-L/usr/local/lib64 -Wl,-O1,-s -fuse-ld=gold" and this changes: diff --git a/configure.ac b/configure.ac --- a/configure.ac +++ b/configure.ac @@ -47,7 +47,7 @@ AC_PROG_CXX AC_PROG_RANLIB # Set LTO_RANLIB variable to an lto enabled ranlib if test "x$LTO_RANLIB" = "x"; then - AC_PATH_PROGS([LTO_RANLIB], [gcc-ranlib]) + AC_PATH_PROGS([LTO_RANLIB], [ranlib]) fi AC_ARG_VAR([LTO_RANLIB],[Library indexer command for link time optimisation]) @@ -68,7 +68,7 @@ AC_ARG_VAR([AR],[Archiver command]) # same for LTO_AR variable for lto enabled archiver if test "x$LTO_AR" = "x"; then - AC_PATH_PROGS([LTO_AR], [gcc-ar]) + AC_PATH_PROGS([LTO_AR], [ar]) fi AC_ARG_VAR([LTO_AR],[Archiver command for link time optimisation]) valgrind does compile and link using clang, after being ./configure'd with --enable-lto. Likewise, having in /usr/local/etc/config.site CFLAGS="-Wall -pipe -O3" CXXFLAGS="-pipe -O3" LDFLAGS="-L/usr/local/lib64 -Wl,-O1,-s -fuse-ld=gold" and thus implicitly using gcc, ./configure --enable-lto also compiles and links. So it is a matter of installing the linker plugin at the right place, uring the installation of the compiler, assuming a recent version of binutils (I think the support for linker plugins is there for 3-4 years now). If the compilers' linker plugins are not installed where the linker+ar+nm+ranlib searches them, consider asking your distribution to do so. I use clang 6.0.0 and have not run any tests.