Bug 338252 - building valgrind with -flto (link time optimisation) fails
Summary: building valgrind with -flto (link time optimisation) fails
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR wishlist
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-14 05:33 UTC by Дилян Палаузов
Modified: 2018-06-20 23:19 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
VEX/auxprogs/genoffsets.s generated by passing -flto to CFLAGS (19.12 KB, text/x-tex)
2014-08-14 05:33 UTC, Дилян Палаузов
Details
Philippe's possibly kludgy -flto enablement patch (1.76 KB, patch)
2014-08-14 14:26 UTC, Florian Krohm
Details
patch adding --enable-lto=yes configure option (42.86 KB, text/plain)
2018-03-11 21:32 UTC, Philippe Waroquiers
Details
tentative patch to support llvm+lto (2.68 KB, text/plain)
2018-03-25 19:22 UTC, Philippe Waroquiers
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Дилян Палаузов 2014-08-14 05:33:57 UTC
Created attachment 88247 [details]
VEX/auxprogs/genoffsets.s generated by passing -flto to CFLAGS

If I set CFLAGS to contain -flto during ./configure, VEX/auxprogs/genoffsets.s is generated as attached.  In turn VEX/pub/libvex_guest_offsets.h is emptry and OFFSET_amd64_R10, OFFSET_amd64_RSI in coregrind/m_syswrap/syswrap-main.c are undefined .  The compilation fails.

A possible fix, not necessary the best one, is to check during ./configure if CFLAGS contains -flto and abort then.
Comment 1 Florian Krohm 2014-08-14 09:18:21 UTC
I cannot reproduce this and amd64:

svn co svn://svn.valgrind.org/valgrind/trunk trunk
cd trunk
./autogen.sh
CFLAGS=-flto ./configure --prefix=/tmp/bug
make >& LOG

libvex_guest_offsets.h gets generated OK.
There is however a link error:
../coregrind/link_tool_exe_linux 0x38000000 gcc  -Wno-long-long -flto -Wwrite-strings -fno-stack-protector   -o memcheck-amd64-linux -m64 -O2 -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wno-format-zero-length -Wno-tautological-compare -fno-strict-aliasing -fno-builtin -fomit-frame-pointer -O2 -static -nodefaultlibs -nostartfiles -u _start  -m64 memcheck_amd64_linux-mc_leakcheck.o memcheck_amd64_linux-mc_malloc_wrappers.o memcheck_amd64_linux-mc_main.o memcheck_amd64_linux-mc_translate.o memcheck_amd64_linux-mc_machine.o memcheck_amd64_linux-mc_errors.o ../coregrind/libcoregrind-amd64-linux.a ../VEX/libvex-amd64-linux.a -lgcc 
`vgPlain_interim_stack' referenced in section `.text' of /tmp/ccGkHIT6.ltrans0.ltrans.o: defined in discarded section `COMMON' of libcoregrind_amd64_linux_a-m_main.o (symbol from plugin)
collect2: error: ld returned 1 exit status
Comment 2 Philippe Waroquiers 2014-08-14 13:55:23 UTC
Some weeks ago, I did a (kludgy) trial to use -flto for compiling Valgrind.

See http://sourceforge.net/p/valgrind/mailman/message/32641134/
that contains a (kludgy, non commitable) patch to enable -flto
and solve the problems encountered (e.g. genoffsets.s and the linking problem)

The conclusion was that lto might somewhat help performance, but more
measurements were needed (in particular on more modern CPU)
Comment 3 Florian Krohm 2014-08-14 14:26:24 UTC
Created attachment 88254 [details]
Philippe's possibly kludgy -flto enablement patch

I'm attaching Philippe's patch here for convenience.
I can confirm that r14273 can be built successfully with that patch on Ubuntu 14.04.1 using gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Comment 4 Дилян Палаузов 2014-08-14 20:21:11 UTC
I used gcc-4.9.1.  To reproduce the problem, just put in VEX/Makefile on line 2678 -flto, as shown below, and regenereate auxprogs/genoffsets.s + pub/libvex_guest_offsets.h


2676         rm -f auxprogs/genoffsets.s
2677        $(mkdir_p) auxprogs pub
2678        $(CC) $(CFLAGS) -flto \
2679              $(LIBVEX_CFLAGS) \
2680              $(AM_CFLAGS_AMD64_LINUX) \
2681              -O -S -o auxprogs/genoffsets.s \
2682              $(srcdir)/auxprogs/genoffsets.c


I recommend abolishing ./autogen.sh and using instead "autoreconf -i"  (I have used it so far, as I it worked and I didn't know that ./autogen.sh exists).  It does exactly the same (+searches for gettext).

The proposed patch fixes the problem with the generation of VEX/pub/libvex_guest_offsets.h .

With -flto I also get link problems, even using -Wl,--plugin=/usr/lib/bfd-plugins/liblto_plugin.so

make[3]: Entering directory '/mnt/new/home/svn/valgrind/memcheck'
../coregrind/link_tool_exe_linux 0x38000000 gcc  -Wno-long-long -flto -pipe -O2 -Wl,--hash-style=gnu -Wl,-O1 -Wl,-z,relro  -Wwrite-strings -fno-stack-protector  -L/usr/lib64 -L/lib64 -Wl,--plugin=/usr/lib/bfd-plugins/liblto_plugin.so -Wl,-S -Wl,--hash-style=gnu -Wl,-O1 -Wl,-z,relro -o memcheck-amd64-linux -m64 -fuse-linker-plugin -flto -ffat-lto-objects -O2 -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wno-format-zero-length -Wno-tautological-compare -fno-strict-aliasing -fno-builtin -fomit-frame-pointer -O2 -static -nodefaultlibs -nostartfiles -u _start  -m64 memcheck_amd64_linux-mc_leakcheck.o memcheck_amd64_linux-mc_malloc_wrappers.o memcheck_amd64_linux-mc_main.o memcheck_amd64_linux-mc_translate.o memcheck_amd64_linux-mc_machine.o memcheck_amd64_linux-mc_errors.o ../coregrind/libcoregrind-amd64-linux.a ../VEX/libvex-amd64-linux.a -lgcc

  /tmp/ccoQPwJS.ltrans8.ltrans.o: In function `error':
/home/svn/valgrind/memcheck/m_gdbserver/utils.c:51: undefined reference to `VG_MINIMAL_LONGJMP'
/tmp/ccoQPwJS.ltrans8.ltrans.o: In function `gdbserver_terminate':
/home/svn/valgrind/memcheck/m_gdbserver/server.c:947: undefined reference to `VG_MINIMAL_SETJMP'
/tmp/ccoQPwJS.ltrans8.ltrans.o: In function `handle_syscall':
/home/svn/valgrind/memcheck/m_scheduler/scheduler.c:1085: undefined reference to `VG_MINIMAL_SETJMP'
/tmp/ccoQPwJS.ltrans8.ltrans.o: In function `server_main':
/home/svn/valgrind/memcheck/m_gdbserver/server.c:965: undefined reference to `VG_MINIMAL_SETJMP'
/tmp/ccoQPwJS.ltrans8.ltrans.o: In function `call_gdbserver.lto_priv.2226':
/home/svn/valgrind/memcheck/m_gdbserver/m_gdbserver.c:768: undefined reference to `VG_MINIMAL_LONGJMP'
/tmp/ccoQPwJS.ltrans8.ltrans.o: In function `run_thread_for_a_while':
/home/svn/valgrind/memcheck/m_scheduler/scheduler.c:931: undefined reference to `VG_MINIMAL_SETJMP'
/tmp/ccoQPwJS.ltrans19.ltrans.o: In function `barf.lto_priv.450':
/home/svn/valgrind/memcheck/m_debuginfo/readdwarf3.c:5061: undefined reference to `VG_MINIMAL_LONGJMP'
/tmp/ccoQPwJS.ltrans19.ltrans.o: In function `vgModuleLocal_new_dwarf3_reader':
/home/svn/valgrind/memcheck/m_debuginfo/readdwarf3.c:5089: undefined reference to `VG_MINIMAL_SETJMP'
/tmp/ccoQPwJS.ltrans21.ltrans.o: In function `resume_scheduler.lto_priv.2050':
/home/svn/valgrind/memcheck/m_signals.c:1879: undefined reference to `VG_MINIMAL_LONGJMP'
/tmp/ccoQPwJS.ltrans24.ltrans.o: In function `scan_all_valid_memory_catcher':
/home/svn/valgrind/memcheck/mc_leakcheck.c:937: undefined reference to `VG_MINIMAL_LONGJMP'
/tmp/ccoQPwJS.ltrans24.ltrans.o: In function `lc_scan_memory':
/home/svn/valgrind/memcheck/mc_leakcheck.c:1028: undefined reference to `VG_MINIMAL_SETJMP'
collect2: error: ld returned 1 exit status
Comment 5 Philippe Waroquiers 2014-08-15 08:29:38 UTC
(In reply to Дилян Палаузов from comment #0)
> 
> A possible fix, not necessary the best one, is to check during ./configure
> if CFLAGS contains -flto and abort then.
There are probably a zillion way to break Valgrind build when
playing with CFLAGS/compiler options.
Detecting these cases and aborting for all these cases is a significant
work with little benefits.

If you want to play/experiment with lto switch, you could start from
attachment https://bugs.kde.org/attachment.cgi?id=88254

=> keeping this bug as a wish to support -flto
Comment 6 Florian Krohm 2014-08-15 09:14:00 UTC
(In reply to Philippe Waroquiers from comment #5)
> (In reply to Дилян Палаузов from comment #0)
> > 
> > A possible fix, not necessary the best one, is to check during ./configure
> > if CFLAGS contains -flto and abort then.
> There are probably a zillion way to break Valgrind build when
> playing with CFLAGS/compiler options.
> Detecting these cases and aborting for all these cases is a significant
> work with little benefits.

Right. If you're fiddling with CFLAGS  you're on your own.

That being said, I could imagine distributors to play with LTO as that becomes more mature. So it'd be good if we would support this somehow. 

For instance:
(1) add an option --enable-lto to configure
(2) if --enable-lto was given, do The Right Thing under the covers along the
    lines of Philippe's patch 
(3) make configure check whether compiler supports -flto
    (on s390 we need to support RHEL5 which uses 3.4.6 which does not support
    -flto)

Something along those lines would have better chances for being included.
Comment 7 Дилян Палаузов 2014-08-19 20:27:40 UTC
I con't play directly with CFLAGS for building valgrind . I have in 
/usr/local/etc/config.site :

export CFLAGS="-O3 -flto -fno-fat-lto-objects"
export LDFLAGS="-Wl,-S -Wl,-z,relro -Wl,--hash-style=gnu -Wl,-O1"
export prefix=/usr
export sysconfdir=/etc
export localstatedir=/var
export CXXFLAGS=$CFLAGS
export enable_silent_rules=yes

and every time I do anywhere ./configure this settings are taken.  The error for undefined "OFFSET_amd64_R10" is very misleading and takes a lot of time to associate it with the settings in /usr/local/etc/config.site .

That said, I am agains including --enable-lto to configure.  Advanced users know how to pass the exact options for the built process and the rest does not know, that their builds are suboptimal.  The only thing that matters is, if CFLAGS contains -flto .
Comment 8 Дилян Палаузов 2014-08-19 20:47:39 UTC
I recommend abolishing ./autogen.sh and using instead "autoreconf -i"  (I have used it so far, as I it worked and I didn't know that ./autogen.sh exists).  It does exactly the same (+searches for gettext).
Comment 9 Julian Seward 2014-09-03 07:15:33 UTC
A simple story here is: Valgrind's performance is limited mostly by
the quality of the code that its own jit generates and partly by the
performance of your processor's caches, branch predictors and
store queues.  Messing with the compile flags isn't going to get you
much.  Just leave them at the defaults.
Comment 10 Philippe Waroquiers 2014-09-04 17:08:31 UTC
(In reply to Julian Seward from comment #9)
> much.  Just leave them at the defaults.
Keeping the default also has as additional advantage that the day the valgrind configure
properly detects a working -flto (and that this has been measured to be a significant benefit),
you will automatically get it (or any other optimisation flags or others that will be
deemed useful).
Comment 11 Austin English 2018-02-05 23:20:59 UTC
Still in 482e36cb6e-20180203, using gcc 6.4.0.

Also reported downstream in Gentoo, see https://bugs.gentoo.org/641806
Comment 12 Philippe Waroquiers 2018-02-06 22:50:19 UTC
Note that I am busy doing a (I hope ok to commit soon)patch in
valgrind source code so that -flto works 'out of the box'.

The patch currently gives working tools on Debian 9 with gcc 6.3.0

What still needs to be done:
* run regression tests, and see it is all ok (and if not, ivnestigate)
* try on other platforms (e.g. ppc, s390, ...)
* make some more perf measurements
Comment 13 Philippe Waroquiers 2018-03-11 21:32:05 UTC
Created attachment 111329 [details]
patch adding --enable-lto=yes configure option

Fix 338252 - building valgrind with -flto (link time optimisation) fails

* Addition of a new configure option --enable-lto=yes or --enable-lto=no
  Default value is --enable-lto=no, as the build is significantly slower,
  so is not appropriate for valgrind development : this should be used
  only on buildbots and/or by packagers.

* Some files containins asm functions have to be compiled without lto:
    coregrind/m_libcsetjmp.c
    coregrind/m_main.c
  If these are compiled with lto, that gives undefined symbols at link time.
  The files to compile without lto are
    coregrind/m_libcsetjmp.c
    coregrind/m_main.c

  To compile these files with other options, a noinst target lib is defined.
  The objects of this library are then added to the libcoregrind.

* memcheck/mc_main.c : move the handwritten asm helpers to mc_main_asm.c.
  This avoids undefined symbols on some toolchains. Due to this,
  the preprocessor symbols that activate the fast or asm memcheck helpers
  are moved to mc_include.h
  Platforms with handwritten helpers will also have the memcheck primary
  map defined non static.

* In VEX, auxprogs/genoffsets.c also has to be compiled without lto,
  as the asm produced by the compiler is post-processed to produce
  pub/libvex_guest_offsets.h. lto not producing asm means the generation
  fails if we used -flto to compile this file.

* all the various Makefile*am are modified to use LTO_CFLAGS for
  (most) targets. LTO_CFLAGS is empty when --enable-lto=no,
  otherwise is set to the flags needed for gcc.
  If --enable-lto=no, LTO_AR and LTO_RANLIB are the standard AR and RANLIB,
  otherwise they are the lto capable versions (gcc-ar and gcc-ranlib).

* This has been tested on:
    debian 9.4/gcc 6.3.0/amd64+x86
    rhel 7.4/gcc 6.4.0/amd64
    ubuntu 17.10/gcc 7.2.0/amd64+x86
    fedora26/gcc 7.3.1/s390x

  No regressions on the above.
Comment 14 Дилян Палаузов 2018-03-12 14:23:23 UTC
Autoconf users willing to compile everything with LTO just by calling ./configure put in /usr/local/etc/config.site:

CFLAGS="-O3 -fno-fat-lto-objects -flto" CXXFLAGS="-O3 -fno-fat-lto-objects -flto" LDFLAGS="-Wl,-O1,-s -flto=4"

And then any ./configure implies LTO.  Just as other programs relying on autoconf doing the right thing do not offer --enable-lto option in ./configure --help, can be compiled with LTO, valgrind./configure --help shall not print --enable-lto.  Programmer printing on --help "--enable-lto" don't understand autoconf.  This applies also for artificially added -g or -O2 from configure.ac or contained in Makefile*

ar, nm and ld from binutils can work with LTO, if they are compiled with plugin support binutils-gdb.git/configure --enable-plugins).

When gcc calls ld, it passes the plugin path after -plugin, see. https://sourceware.org/binutils/docs/ld/Options.html#Options (-plugin with one dash).  Please note, that gcc's "make install" will not install the LTO plugins on the place, where nm will look from it, if it thinks that plugins are needed.  See https://sourceware.org/binutils/docs/binutils/nm.html -> --plugins .

If `nm --plugin`, `ar --plugin`, `ranlib --plugin` or `ld -plugin` (with one dash) print 'Unknown option', then they are not compiled correctly.  The right output of the above commands is "missing argument".  The approach for using LTO is to compile ar/ld with plugin support, rather than using gcc-ld/gcc-ar.

When CFLAGS from config.site contains -flto, then valgrind can assume that ar/nm/ld are compiled with plugin support.

LTO is more than -flto, there is also -fno-fat-lto-objects, and Clang has its own LTO-world.

All that said, what is needed for LTO support is stripping the LTO flags when building  coregrind/m_libcsetjmp.c, coregrind/m_main.c and  VEX/auxprogs/genoffsets.c, assuming that the system is configured correctly with the LTO plugins and ar/ld supporting plugins, or find out why LTO makes problems with those files.

Compling everything with LTO produces:
In file included from m_syswrap/syswrap-main.c:46:
m_syswrap/syswrap-main.c: In function ‘putSyscallStatusIntoGuestState’:
m_syswrap/syswrap-main.c:1135:14: error: ‘OFFSET_amd64_RAX’ undeclared (first use in this function)
              OFFSET_amd64_RAX, sizeof(UWord) );
              ^~~~~~~~~~~~~~~~

The diff between VEX/auxprogs/genoffsets.s generated from VEX/auxprogs/genoffsets.c with and without LTO is:

--- VEX/auxprogs/genoffsets.s-2 2018-03-12 14:33:05.473476269 +0100
+++ ../valgrind/VEX/auxprogs/genoffsets.s-2     2018-03-12 14:33:14.025483351 +0100
@@ -1,1023 +1,26 @@
        .file   "genoffsets.c"
-       .text
-.Ltext0:
-       .globl  foo
-       .type   foo, @function
-foo:
-.LFB17:
-       .file 1 "./auxprogs/genoffsets.c"
-       .loc 1 84 1 view -0
-       .cfi_startproc
-       .loc 1 86 4 view .LVU1
-#APP
-# 86 "./auxprogs/genoffsets.c" 1
-       
-#define OFFSET_x86_EAX xyzzy$8
-
-# 0 "" 2
-       .loc 1 87 4 view .LVU2
-# 87 "./auxprogs/genoffsets.c" 1
-       
-#define OFFSET_x86_EBX xyzzy$20
-
-# 0 "" 2
(- and other OFFSET_ symbols removed)

To generate VEX/pub/libvex_guest_offsets.h this "gcc -O -S -o auxprogs/genoffsets.s ./auxprogs/genoffsets.c" is called, and OFFSET_amd64_RAX is in the output if -flto was not passed on this line.

This change resolves the mentioned problem with OFFSET_amd64_RAX:

diff --git a/Makefile.vex.am b/Makefile.vex.am
index 4ad5ffa..f36708c 100644
--- a/Makefile.vex.am
+++ b/Makefile.vex.am
@@ -63,9 +63,9 @@ BUILT_SOURCES = pub/libvex_guest_offsets.h
 CLEANFILES    = pub/libvex_guest_offsets.h
 
 if COMPILER_IS_CLANG
-CFLAGS_FOR_GENOFFSETS = $(CFLAGS) -no-integrated-as
+CFLAGS_FOR_GENOFFSETS = -no-integrated-as
 else
-CFLAGS_FOR_GENOFFSETS = $(CFLAGS)
+CFLAGS_FOR_GENOFFSETS =
 endif
 
 # This is very uggerly.  Need to sed out both "xyzzyN" and

Now I get:
make[3]: Entering directory '/git/valgrind/memcheck'
  GEN      memcheck-amd64-linux
m_translate.c: In function ‘vgPlain_translate’:
m_translate.c:1527:39: warning: ‘isWrap’ may be used uninitialized in this function [-Wmaybe-uninitialized]
m_translate.c:1518:12: note: ‘isWrap’ was declared here
m_signals.c:2024: error: undefined reference to 'VG_MINIMAL_LONGJMP'
m_signals.c:2024: error: undefined reference to 'VG_MINIMAL_LONGJMP'
m_signals.c:2024: error: undefined reference to 'VG_MINIMAL_LONGJMP'
m_signals.c:2024: error: undefined reference to 'VG_MINIMAL_LONGJMP'
m_gdbserver/server.c:1175: error: undefined reference to 'VG_MINIMAL_SETJMP'
m_gdbserver/server.c:1193: error: undefined reference to 'VG_MINIMAL_SETJMP'
m_debuginfo/readdwarf3.c:5231: error: undefined reference to 'VG_MINIMAL_SETJMP'
m_scheduler/scheduler.c:1005: error: undefined reference to 'VG_MINIMAL_SETJMP'


Without -flto: 
nm  coregrind/libcoregrind_amd64_linux_a-m_libcsetjmp.o
0000000000000052 T VG_MINIMAL_LONGJMP
0000000000000000 T VG_MINIMAL_SETJMP
with lto:
nm coregrind/libcoregrind_amd64_linux_a-m_libcsetjmp.o
(nothing)

so the asm in coreconf/m_libcsetjmp.c shall be tweeked, in a way that on LTO VG_MINIMAL_LONGJMP and VG_MINIMAL_SETJMP are not optimized out.  I don't speak assembler.
Comment 15 Ivo Raisr 2018-03-12 17:10:41 UTC
(In reply to Дилян Палаузов from comment #14)
> Autoconf users willing to compile everything with LTO just by calling
> ./configure put in /usr/local/etc/config.site:
> 
> CFLAGS="-O3 -fno-fat-lto-objects -flto" CXXFLAGS="-O3 -fno-fat-lto-objects
> -flto" LDFLAGS="-Wl,-O1,-s -flto=4"
> 
> And then any ./configure implies LTO.  Just as other programs relying on
> autoconf doing the right thing do not offer --enable-lto option in
> ./configure --help, can be compiled with LTO, valgrind./configure --help
> shall not print --enable-lto.  Programmer printing on --help "--enable-lto"
> don't understand autoconf.  This applies also for artificially added -g or
> -O2 from configure.ac or contained in Makefile*

Thank you for this insight and sharing your approach. It is really helpful to know that other possibilities exists.
However I modifying /usr/local/etc/config.site affects all projects build on a box. Also we cannot force Valgrind users and maintainers to modify global configuration file [consider shared build machines, for example in the gcc build farm]. Perhaps a merged approach could be found?
Comment 16 Дилян Палаузов 2018-03-12 17:43:08 UTC
People either want to compile everything with LTO, or nothing with LTO, or they don't know about LTO.  Modifying /usr/local/etc/config.site is not the only way, calling

  CFLAGS="-flto" ./configure

also works (if CFLAGS is not set in config.site) and is not "export"ed, as those take precedence.  Doing

  ./configure && make CFLAGS='-flto'

does work, too (https://www.gnu.org/software/make/manual/html_node/Overriding.html)

So people willing LTO either know the aforementioned ways to compile with LTO, or they do not insist of having LTO.

In fact valgrind and php are the only software I am aware of that doesn't work with LTO, php ignoring LDFLAGS from config.site and postgresql ignoring CFLAGS from config.site, and all the rest is working with both the autoconf principles and with LTO.
Comment 17 Дилян Палаузов 2018-03-12 17:54:57 UTC
Having a global /usr/local/etc/config.site file is not necessary.  It is possible to put the file somewhere else and set in env CONFIG_SITE for the current session where the file is that shall be considered.
Comment 18 Philippe Waroquiers 2018-03-12 21:38:26 UTC
(In reply to Дилян Палаузов from comment #16)
> People either want to compile everything with LTO, or nothing with LTO, or
> they don't know about LTO.
Humph, that looks like a very general statement about all people in the
world :).
There exist at least one person (me) that knows about LTO and that does
not want to compile everything with LTO, but want to compile a few
things with LTO :).
And I highly suspect that a lot (many ? all ?) distribution packagers
also do not configure all the packages the same way.

 
> So people willing LTO either know the aforementioned ways to compile with
> LTO, or they do not insist of having LTO.
Launching a valgrind configure/make with CFLAGS containing -flto
fails for various mysterious reasons (see below).
The attached patch is an approach to solve these problems,
obtained after several other approaches failed (in particular to solve
the asm related problems).

> 
> In fact valgrind and php are the only software I am aware of that doesn't
> work with LTO, php ignoring LDFLAGS from config.site and postgresql ignoring
> CFLAGS from config.site, and all the rest is working with both the autoconf
> principles and with LTO.
The first problem to solve (as you saw) is the genoffset problem.
As genoffset already had specific compiler options, I just slightly extended
the current approach. => problem solved.
I prefer to do the minimal change in this tricky area : I have no idea what
could be the impact of more heavily changing the way genoffset is compiled.
So, just compiling it with the same flags as today is the safest approach:
if --enable-lto is not used, then it just works as of today.

I then tried hard to solve the linking problems with VG_MINIMAL_SETJMP.
Once it was solved, I had to solve the problem of another undefined reference
(_start).
These were solved by adding a pragma to push the option -fno-lto
for the problematic files.

With these fixes, I obtained something working on my 'main' distro (debian),
but with a very strange warning, something like: 
   a plugin is needed to handle lto objects
And of course, my toolchain is configured with plugin support !
So this warning is completely mysterious/not understandable.
But as the resulting valgrind was working, I just ignored the warning.

I then tested on another distro (fedora+s390x)
There the build stopped with an error message 'invalid option -fno-lto'
(while -flto is properly supported). So, yet another mystery.
Various other trials to fix the problem resulted in gcc-ar/ar core dumping,
compilation failing, or resulting valgrind not working.
So, another bunch of mysteries. I even tried to debug gcc-ar/ar, but could
not make any sense of what I saw.

Finally, I obtained the attached patch, that I have validated on 4 distributions,
with 4 different gcc versions.

As I understand, your opinion is that the way valgrind autoconf/automake is done
is 'not clean', as e.g. it hardcodes some compilation options (-O2 and similar).
I am not an autoconf/automake expert, so I cannot really judge, and for sure
I do not have the time/knowledge to clean all that up.

So, in summary:
  * as I understand, a clean solution would imply to heavily rework valgrind
    build system.
  * Valgrind build system also supports build a primary and secondary arch.
    Not too sure that this is very usual/supported out of the box when
    using CFLAGS on the command line (flags must be different for primary and
    secondary arch).
    So, that might make a clean solution more complex to obtain.
 * even if --enable-lto option is not a clean solution, at least it allows to
    build with lto if desired.
    Note that as far as I can see, several other packages have such an option
     --enable-lto. See e.g. https://forums.gentoo.org/viewtopic-t-1052716.html

Sorry for not being able to do things better in this area, I guess we will
need a real expert for this.

In the meantime, if the patch is deemed reasonable enough, I will push it, as at
least it allows to have an lto compiled valgrind, and points at what
has to be compiled 'specially'  if someone wants to do a cleaner build system.
Comment 19 Дилян Палаузов 2018-03-14 10:19:04 UTC
A waring like "a plugin is needed to handle lto objects" suggests that nm/ar do not find the LTO plugin.

For an object file compiled with LTO (and -fno-fat-lto-objects), `nm x.o` will print the symbols there, after it recognizes, that a plugin is needed.  If the plugin is not findable by nm, it prints
   nm: libcoregrind_amd64_linux_a-m_poolalloc.o: plugin needed to handle lto object

For the plugin support I have:

  ls /usr/local/lib/bfd-plugins/ -l
  total 4
  lrwxrwxrwx 1 root staff 65 Mar 14 10:55 liblto_plugin.so -> /usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.0.1/liblto_plugin.so

This location is used by ar, ranlib and nm, but not by ld.

I asked at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84861 how to tweak coregrind/m_libcsetjmp.c so that -flto does not delete VG_MINIMAL_LONGJMP and VG_MINIMAL_SETJMP.

I guess you have uploaded PRs with the ar crashes.
Comment 20 Дилян Палаузов 2018-03-14 13:41:54 UTC
With the change below m_libcsetjmp.c links on amd64 correctly with LTO.  Is the change equivalent (apart from cheating gcc restoring rax in in VG_MINIMAL_LONGJMP)?  Is volatile after __asm__ necessary?

diff --git a/coregrind/m_libcsetjmp.c b/coregrind/m_libcsetjmp.c
index 68c101e..5626748 100644
--- a/coregrind/m_libcsetjmp.c
+++ b/coregrind/m_libcsetjmp.c
@@ -382,25 +382,10 @@ __asm__(
 #if defined(VGP_amd64_linux) || defined(VGP_amd64_darwin) || \
     defined(VGP_amd64_solaris)
 
-__asm__(
-".text"  "\n"
-""       "\n"
-
-#if defined(VGP_amd64_linux) || defined(VGP_amd64_solaris)
-".global VG_MINIMAL_SETJMP"  "\n"  // rdi = jmp_buf
-"VG_MINIMAL_SETJMP:"  "\n"
-
-#elif defined(VGP_amd64_darwin)
-".globl _VG_MINIMAL_SETJMP"  "\n"  // rdi = jmp_buf
-"_VG_MINIMAL_SETJMP:"  "\n"
+__attribute__((returns_twice))
+UWord VG_MINIMAL_SETJMP(VG_MINIMAL_JMP_BUF(_env)) {
 
-#else
-#   error "Huh?"
-#endif
-
-"        movq   %rax,   0(%rdi)"   "\n"
-"        movq   %rbx,   8(%rdi)"   "\n"
-"        movq   %rcx,  16(%rdi)"   "\n"
+__asm__(
 "        movq   %rdx,  24(%rdi)"   "\n"
 "        movq   %rdi,  32(%rdi)"   "\n"
 "        movq   %rsi,  40(%rdi)"   "\n"
@@ -421,22 +406,13 @@ __asm__(
 "        movq   $0, %rax"          "\n"
 "        ret"                      "\n"
 ""       "\n"
+       );
+}
 
-
-#if defined(VGP_amd64_linux) || defined(VGP_amd64_solaris)
-".global VG_MINIMAL_LONGJMP"  "\n"
-"VG_MINIMAL_LONGJMP:"  "\n"    // rdi = jmp_buf
-
-#elif defined(VGP_amd64_darwin)
-".globl _VG_MINIMAL_LONGJMP"  "\n"
-"_VG_MINIMAL_LONGJMP:"  "\n"    // rdi = jmp_buf
-
-#else
-#   error "Huh?"
-#endif
+__attribute__((noreturn))
+void  VG_MINIMAL_LONGJMP(VG_MINIMAL_JMP_BUF(_env)) {
+  __asm__(
          // skip restoring rax; it's pointless
-"        movq     8(%rdi),  %rbx"    "\n"
-"        movq    16(%rdi),  %rcx"    "\n"
 "        movq    24(%rdi),  %rdx"    "\n"
          // defer restoring rdi; we still need it
 "        movq    40(%rdi),  %rsi"    "\n"
@@ -465,12 +441,8 @@ __asm__(
          // address space.
 "        jmp *%rax"                  "\n"
 ""       "\n"
-
-#if !defined(VGP_amd64_darwin)
-".previous"       "\n"
-#endif
-);
-
+         );
+}
 #endif /* VGP_amd64_linux || VGP_amd64_darwin || VGP_amd64_solaris */
Comment 21 Philippe Waroquiers 2018-03-18 17:29:42 UTC
(In reply to Дилян Палаузов from comment #20)
Thanks for looking at this further.

> With the change below m_libcsetjmp.c links on amd64 correctly with LTO.  Is
> the change equivalent (apart from cheating gcc restoring rax in in
> VG_MINIMAL_LONGJMP)?  Is volatile after __asm__ necessary?
I have tested the patch on amd64, and the regression tests are working.
So, what you propose might be a way to go.
However, I am not at ease to change the asm code as this is tricky,
and must be done and tested for all platforms.

So, I have pushed the patch which implements --enable-lto=yes|no
as ab773096df7aaaf46e8883af5ed4690f4d4499af.

If deemed desirable, another bug/wish could be opened to support
a better configuration mechanism (i.e. use CFLAGS, instead of having
predefined logic to setup CFLAGS and similar in configure).
Comment 22 Дилян Палаузов 2018-03-18 20:11:02 UTC
I am against using gcc-ar or gcc-ranlib.  These are probably not available, when clang is used, and it can do LTO, too.

Instead of cluttering configure.ac to use gcc-ac/gcc-ranlib, it can either rely on the fact, that /usr/(local/)?/bfd-plugins contain the LTO plugin, or alternatively compile something with -flto to an object file, call "nm the-object-file.o" and if the latter prints "missing plugin", then configure shall tell the user that she has to put the linker plugin under {libdir}/bfd-plugins .

This would help the user to adjust her system to be able to compile other software with LTO, even software whose authors don't know about LTO.

LTO is anyway not ready for the prime time yet.  emacs does not lTO-link with ld.bfd, but with ld.gold and php also doesn't work with lto.  So if the build process just uses ar/ranlib and it fails, then the user shall try without LTO or debug, debug, debug...

I would like to know why gcc creates gcc-ar instead of puthing the linker plug in a place, that normal ar would auto-load it, but I do not expect an answer here on this.
Comment 23 Дилян Палаузов 2018-03-19 17:26:21 UTC
I just verified, that LLVM does not install gcc-ar and gcc-ranlib, but llvm-ar and llvm-ranlib .

So to make LTO work with Clang, on systems where GCC is not installed, either llvm-ar and llvm-ranlib shall be used, or ar and ranlib shall be used and it must be assumed that the system is LTO-ready (the plugins are under {libdir}/bfd-plugins/).

Concerning installing by default the plugins on the compiler's "make install" on the location where binutils will look for them I filled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84934 and https://bugs.llvm.org/show_bug.cgi?id=36802.

Shall I open for the assembler-review in coregrind/m_libcsetjmp.c a separate ticket, or can it be kept here?
Comment 24 Philippe Waroquiers 2018-03-19 21:06:31 UTC
(In reply to Дилян Палаузов from comment #22)
> I am against using gcc-ar or gcc-ranlib.  These are probably not available,
> when clang is used, and it can do LTO, too.
> 
> Instead of cluttering configure.ac to use gcc-ac/gcc-ranlib, it can either
> rely on the fact, that /usr/(local/)?/bfd-plugins contain the LTO plugin, or
> alternatively compile something with -flto to an object file, call "nm
> the-object-file.o" and if the latter prints "missing plugin", then configure
> shall tell the user that she has to put the linker plugin under
> {libdir}/bfd-plugins .

Using directly ar and similar is causing problems on some platforms,
(some of them that I care about). The gcc documentation (including the latest
gcc 7.3) explicitly indicates to use gcc-ar and similar
(I guess this ensures that in case you use a gcc in another path
that the 'standard' /usr/...; that the correct/associated plugins are used.
So, I do not think it is a good idea to diverge (for gcc)
from the explicitly documented way to build a static lib, even
if some advanced users might be able to fix/bypass/... using
plugins in a directory such as /usr/(local/)?/bfd-plugins 
(note that on my distro, there is no such directory. So, plugins are
somewhere else, I am guessing properly found by gcc-ar and similar).

> 
> This would help the user to adjust her system to be able to compile other
> software with LTO, even software whose authors don't know about LTO.
> 
> LTO is anyway not ready for the prime time yet.  emacs does not lTO-link
> with ld.bfd, but with ld.gold and php also doesn't work with lto.  So if the
> build process just uses ar/ranlib and it fails, then the user shall try
> without LTO or debug, debug, debug...
Or rather, configure shall do the correct thing, whenever it can, using
the way as documented in gcc manual.

> 
> I would like to know why gcc creates gcc-ar instead of puthing the linker
> plug in a place, that normal ar would auto-load it, but I do not expect an
> answer here on this.
As said above, if you have different gcc versions installed, and the plugin
differ according to gcc version, then gcc-ar and similar can
do the job properly. An 'hard-coded' path that ar would use will not
give the same. At least for some gcc, I see the plugin is part of gcc
itself.

As I understand, the real problem with gcc-ar and similar is that this
does not work with llvm, as llvm needs llvm-ar and similar.
I largely prefer to keep this bug closed, and have another one bug
discussing using clang+lto to build valgrind.
I suppose patches for this would not be too difficult, but I am not
an llvm user.

So, can you open another bug related to clang/lto/llvm-ar, ... ?

Thanks
Philippe
Comment 25 Philippe Waroquiers 2018-03-19 21:09:48 UTC
(In reply to Дилян Палаузов from comment #23)
> I just verified, that LLVM does not install gcc-ar and gcc-ranlib, but
> llvm-ar and llvm-ranlib .
> 
> So to make LTO work with Clang, on systems where GCC is not installed,
> either llvm-ar and llvm-ranlib shall be used, or ar and ranlib shall be used
> and it must be assumed that the system is LTO-ready (the plugins are under
> {libdir}/bfd-plugins/).
> 
> Concerning installing by default the plugins on the compiler's "make
> install" on the location where binutils will look for them I filled
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84934 and
> https://bugs.llvm.org/show_bug.cgi?id=36802.
Which, as I understand, does not work if you have several gcc versions
installed.

> 
> Shall I open for the assembler-review in coregrind/m_libcsetjmp.c a separate
> ticket, or can it be kept here?
Yes, that would be better, i.e. to have 2 new bugs:
* clang/lto/llvm-ar, ... (cdr comment 24)
* allow to configure valgrind using CFLAGS (a.o. to support LTO)

Thanks
Philippe
Comment 26 Дилян Палаузов 2018-03-22 14:51:21 UTC
With python 3.6, doing "./configure --with-lto --enable-optimizations && make" compiles with -flto, but calls ranlib/ar, not gcc-ranlib or gcc-ar.  Why can python-lto live without gcc-ranlib, but valgrind cannot?  Also "./configure --with-lto" alone does not imply -flto.

> > Concerning installing by default the plugins on the compiler's "make install" on the location where binutils will look for them I filled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84934 and https://bugs.llvm.org/show_bug.cgi?id=36802.
> Which, as I understand, does not work if you have several gcc versions installed.

It works, as the plugin is backwards compatible.  You just install the newest gcc-plugin, it then handles the older versions.  In particular, if x86_64-pc-linux-gnu-gcc-6.4.1 is installed on a system before x86_64-pc-linux-gnu-gcc-6.4.1 is installed, and a file is then compiled with 6.4.1, strace shows that gcc-ar calls 7.3.1/liblto_plugin.so .

Concerning how LTO-libraries shall be built in portable way - by detecting in ./configure whether gcc or clang is used and then switching to gcc-ranlib or llvm-ranlib and why gcc does recomment gcc-nm instead of installing the plugin at the right place I asked at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84995 and http://lists.llvm.org/pipermail/llvm-dev/2018-March/122018.html .

While both llvm and gcc can do LTO, but considering that after being asked neither on clang side is described how to build the LTO plugin (https://bugs.llvm.org/show_bug.cgi?id=32759#c3) nor on gcc side the plugin is installed on the right place (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70345), honestly, I don't think that any of the llvm and gcc teams is willing to put efforts to offer LTO for the masses.

For switching the __asm__ in coregrind/m_(main,libcsetjmp).c I opened https://bugs.kde.org/show_bug.cgi?id=392180 .

> * allow to configure valgrind using CFLAGS (a.o. to support LTO)
How shall this work?  I can imagine at first place substituting -S with -E in

Makefile.vex.am-92-     $(CC) $(CFLAGS_FOR_GENOFFSETS) \
Makefile.vex.am-93-           $(LIBVEX_CFLAGS_NO_LTO) \
Makefile.vex.am-94-           $(AM_CFLAGS_@VGCONF_PLATFORM_PRI_CAPS@) \
Makefile.vex.am:95:                             -O -S -o auxprogs/genoffsets.s \
Makefile.vex.am-96-                                      $(srcdir)/auxprogs/genoffsets.c
Makefile.vex.am-97-     grep xyzzy auxprogs/genoffsets.s | grep "^[# ]*#define" \
Makefile.vex.am-98-        | sed "s/# #define/#define/g" \

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84934#c1 suggests that distributions are supposed to install the LTO-plugin on the right place.  Do the distrubutuions you care about do so and if not, why?
Comment 27 Philippe Waroquiers 2018-03-23 17:55:53 UTC
(In reply to Дилян Палаузов from comment #26)

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84934#c1 suggests that
> distributions are supposed to install the LTO-plugin on the right place.  Do
> the distrubutuions you care about do so and if not, why?
I have no idea, but the gcc doc says to use gcc-ar.
On at least some platforms, it was failing with ar, and succesful when
using gcc-ar (according to the gcc user manual).

If a gcc+distribution works with ar, it will also work with gcc-ar.

So, I see no reason to use ar instead of gcc-ar, as this can only
make lto fail on some setup, without repairing any other setup.

However, as I understand, llvm is important for you, and when compiling with
llvm, llvm-ar must  be used.

So, when I have a little bit of time, I will install llvm
and see if I can make that work.
Comment 28 Philippe Waroquiers 2018-03-25 19:18:36 UTC
(In reply to Philippe Waroquiers from comment #27)
> So, when I have a little bit of time, I will install llvm
> and see if I can make that work.

At least on Ubunty 17.10, with the distro llvm/clang, configuring and
building valgrind does not work properly, even without lto.

Without lto, some c++ regression tests fail to compile (most of this can
be bypassed by adding -std=c++11. Some tests are not compiling as
they seem to depend on gcc internals.

To try lto, some updates have to be done to configure.ac (I will attach the
tentative patch), but then the link fails with:
../coregrind/link_tool_exe_linux 0x58000000 clang     -o memcheck-amd64-linux -flto -m64 -O2 -finline-functions -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wcast-align -Wcast-qual -Wwrite-strings -Wempty-body -Wformat -Wformat-security -Wignored-qualifiers -fno-stack-protector -fno-strict-aliasing -fno-builtin -Wno-cast-align -Wno-self-assign -Wno-tautological-compare -fomit-frame-pointer -O2 -static -nodefaultlibs -nostartfiles -u _start  -m64 memcheck_amd64_linux-mc_leakcheck.o memcheck_amd64_linux-mc_malloc_wrappers.o memcheck_amd64_linux-mc_main.o memcheck_amd64_linux-mc_main_asm.o memcheck_amd64_linux-mc_translate.o memcheck_amd64_linux-mc_machine.o memcheck_amd64_linux-mc_errors.o ../coregrind/libcoregrind-amd64-linux.a ../VEX/libvex-amd64-linux.a -lgcc 
Global is external, but doesn't have external or weak linkage!
i64 ()* @do_syscall_WRK
LLVM ERROR: Broken module found, compilation aborted!
clang: error: linker command failed with exit code 1 (use -v to see invocation)

So, some more work is needed to support llvm, and even more work needed
to support llvm+lto.
Comment 29 Philippe Waroquiers 2018-03-25 19:22:42 UTC
Created attachment 111644 [details]
tentative patch to support llvm+lto

With the attached patch, llvm+lto works till the link phase on ubuntu 17.10+clang version 4.0.1-6 (tags/RELEASE_401/final).
However, it fails at link time with:
Global is external, but doesn't have external or weak linkage!
i64 ()* @do_syscall_WRK
LLVM ERROR: Broken module found, compilation aborted!
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Comment 30 Дилян Палаузов 2018-06-20 23:19:04 UTC
For the record, with /usr/local/etc/config.site containing

CC=clang CXX=clang++ CFLAGS="-Wall -pipe -O3" CXXFLAGS="-pipe -O3" LDFLAGS="-L/usr/local/lib64 -Wl,-O1,-s -fuse-ld=gold"

and this changes:

diff --git a/configure.ac b/configure.ac
--- a/configure.ac
+++ b/configure.ac
@@ -47,7 +47,7 @@ AC_PROG_CXX
 AC_PROG_RANLIB
 # Set LTO_RANLIB variable to an lto enabled ranlib
 if test "x$LTO_RANLIB" = "x"; then
-  AC_PATH_PROGS([LTO_RANLIB], [gcc-ranlib])
+  AC_PATH_PROGS([LTO_RANLIB], [ranlib])
 fi
 AC_ARG_VAR([LTO_RANLIB],[Library indexer command for link time optimisation])
 
@@ -68,7 +68,7 @@ AC_ARG_VAR([AR],[Archiver command])
 
 # same for LTO_AR variable for lto enabled archiver
 if test "x$LTO_AR" = "x"; then
-  AC_PATH_PROGS([LTO_AR], [gcc-ar])
+  AC_PATH_PROGS([LTO_AR], [ar])
 fi
 AC_ARG_VAR([LTO_AR],[Archiver command for link time optimisation])
 
valgrind does compile and link using clang, after being ./configure'd with --enable-lto.

Likewise, having in /usr/local/etc/config.site

CFLAGS="-Wall -pipe -O3" CXXFLAGS="-pipe -O3" LDFLAGS="-L/usr/local/lib64 -Wl,-O1,-s -fuse-ld=gold"

and thus implicitly using gcc, ./configure --enable-lto also compiles and links.

So it is a matter of installing the linker plugin at the right place, uring the installation of the  compiler, assuming a recent version of binutils (I think the support for linker plugins is there for 3-4 years now).

If the compilers' linker plugins are not installed where the linker+ar+nm+ranlib searches them, consider asking your distribution to do so.

I use clang 6.0.0 and have not run any tests.