Bug 401742 - unreproducible .a files should not be built with LTO
Summary: unreproducible .a files should not be built with LTO
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.14.0
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-04 15:58 UTC by Bernhard M. Wiedemann
Modified: 2018-12-05 21:03 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
fixing patch (3.23 KB, text/plain)
2018-12-04 16:16 UTC, Bernhard M. Wiedemann
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Bernhard M. Wiedemann 2018-12-04 15:58:05 UTC
SUMMARY
While working on reproducible builds for openSUSE, I found that
the valgrind package varied for every build
because its .a files contain .o files
created by gcc -flto -c
Such .o files are gcc version specific and not very useful.
Also they always differ from randomness introduced by gcc,
making it impossible to verify that the build was correct.

See also https://bugzilla.opensuse.org/show_bug.cgi?id=1118163

STEPS TO REPRODUCE
1. build openSUSE Factory valgrind package 2 times
2. compare results

OBSERVED RESULT
.a files differ

EXPECTED RESULT
It should be possible to generate identical build results
multiple times.

ADDITIONAL INFORMATION
I got a patch to fix this and will upload it to Phabricator.
Comment 1 Bernhard M. Wiedemann 2018-12-04 16:07:41 UTC
Submitted a fix at https://phabricator.kde.org/D17349
Comment 2 Tom Hughes 2018-12-04 16:10:39 UTC
The whole point of a .a file is to contain .o files surely...

I mean I know you can put any file in an ar archive but they are almost always used for .o files!

Anyway the most important point is that we don't use phabricator as we're not a KDE project so please just attach the patch here.
Comment 3 Tom Hughes 2018-12-04 16:11:33 UTC
Ah I see now I need to read that with the next line, but the odd line breaking confused me.
Comment 4 Bernhard M. Wiedemann 2018-12-04 16:16:25 UTC
Created attachment 116673 [details]
fixing patch
Comment 5 Tom Hughes 2018-12-04 16:42:57 UTC
Doesn't that just disable LTO though?

Note that you can already use --enable-lto=no when configuring if you want to disable it but judging by the comments in the configure scripts distribution builds are exactly what was expected to want to use it?
Comment 6 Bernhard M. Wiedemann 2018-12-04 21:00:07 UTC
The patch only disables LTO for .a file creation where it does not make sense to have it.
Shared objects and such still use LTO with the patch.

grep -- "-flto "
finds 134 occurrences in the build log with applied patch.
Comment 7 Tom Hughes 2018-12-04 22:09:46 UTC
Right, but those .a's are actually the main valgrind code I think? They're just temporary artefacts used as part of the build, not something we ship?

So by removing LTO from those you are removing it from most of valgrind...
Comment 8 Philippe Waroquiers 2018-12-04 22:21:59 UTC
(In reply to Bernhard M. Wiedemann from comment #6)
> The patch only disables LTO for .a file creation where it does not make
> sense to have it.
Can you give more explanations about what you mean with 
'disables LTO ... where it does not make sense to have it' ?

Something like:
It does not make sense to have LTO for these .a files because .....

Otherwise, with reference to comment 7: I effectively think that this
makes lto almost useless : very few .o, most of the code in in the coregrind and VEX .a.
See e.g. the below link command that builds memcheck: 
../coregrind/link_tool_exe_linux 0x58000000 gcc     -o memcheck-amd64-linux -flto -flto-partition=one -fuse-linker-plugin -m64 -O2 -finline-functions -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wcast-align -Wcast-qual -Wwrite-strings -Wempty-body -Wformat -Wformat-security -Wignored-qualifiers -Wmissing-parameter-type -Wlogical-op -Wold-style-declaration -fno-stack-protector -fno-strict-aliasing -fno-builtin  -fomit-frame-pointer -O2 -static -nodefaultlibs -nostartfiles -u _start  -m64 memcheck_amd64_linux-mc_leakcheck.o memcheck_amd64_linux-mc_malloc_wrappers.o memcheck_amd64_linux-mc_main.o memcheck_amd64_linux-mc_main_asm.o memcheck_amd64_linux-mc_translate.o memcheck_amd64_linux-mc_machine.o memcheck_amd64_linux-mc_errors.o ../coregrind/libcoregrind-amd64-linux.a ../VEX/libvex-amd64-linux.a -lgcc 

I am also wondering why -flto option makes something not reproducable :
if given twice the same compiler commands with the same args and same sources,
why is gcc producing different things when given -flto ?
That looks like a 'non reproducible result' produced by gcc, and gcc
should be fixed to make -flto builds reproducible ?
Comment 9 Bernhard M. Wiedemann 2018-12-05 08:04:32 UTC
According to our (SUSE) compiler guys, the .o files created with -flto are only intermediate artifacts and not meant to be shipped to users.

So maybe the better fix would then be to not ship those .a files containing them?

It would also be nice if gcc could produce deterministic results there, but apparently they need some unique ID and it is hard (from code or design) to hash inputs or such.
Comment 10 Tom Hughes 2018-12-05 08:36:04 UTC
They are shipped because you need to link against them if you are building an out-of-tree tool. I don't see why it's wrong for them to be LTO enabled though?

If you really want to ship non-LTO builds then you would ideally need to build two versions of the libraries - one to link the in-tree tools against and one to ship for any out-of-tree tools to link against.

Or you could choose to disable LTO with the configure switch and not have any tools built with it.
Comment 11 Mark Wielaard 2018-12-05 11:19:46 UTC
To add to what Tom and Philippe said above the .a files are static archives that are installed to make it possible to create your own valgrind tools. When creating such tools they will need to be statically linked (valgrind tools cannot depend on shared libraries). Since they are statically linked they seem ideal candidates for LTO. But it sounds like this isn't really an anticipated use case for LTO?

Note that this isn't really something most people do. In Fedora the archives and the header files to build your own valgrind tools are put in a separate package (valgrind-tools-devel) and I don't really know if anybody every uses this subpackage (no other package in the distro depends on it). There is a "normal" valgrind-devel package too, which contains the header files that valgrind aware programs/libraries depend on. And that one is actually used by various other packages.
Comment 12 Bernhard M. Wiedemann 2018-12-05 14:52:14 UTC
According to our compiler guys, such .a files should then be created with 
-ffat-lto-objects so that they contain actual machine code
that can be used without having the same gcc version and lto-plugins.

In openSUSE, header files and .a files are mixed together in a valgrind-devel package and that is used in builds of
anjuta ceph gperftools libqt5-qtwebengine mpich rdma-core ruby2.5 wine xmms2

But maybe they only use the .h files.
I quickly tested xmms2, rdma-core, gperftools, anjuta
and none of the 4 uses .a files.

Anyway, such a patch seems useful:
valgrind-3.14.0/configure.ac:
-TEST_LTO_CFLAGS="-flto -flto-partition=one -fuse-linker-plugin"
+TEST_LTO_CFLAGS="-flto -flto-partition=one -fuse-linker-plugin -ffat-lto-objects"

That would allow us to strip the LTO sections from the .a files and still have the normal machine code.
Or we just drop the .a files from our rpms since nobody seems to use them.
Comment 13 Mark Wielaard 2018-12-05 21:03:26 UTC
(In reply to Bernhard M. Wiedemann from comment #12)
> In openSUSE, header files and .a files are mixed together in a
> valgrind-devel package and that is used in builds of
> anjuta ceph gperftools libqt5-qtwebengine mpich rdma-core ruby2.5 wine xmms2
> 
> But maybe they only use the .h files.
> I quickly tested xmms2, rdma-core, gperftools, anjuta
> and none of the 4 uses .a files.

Independent from how we resolve this LTO issue you might want to take a peek at the Fedora valgrind.spec: https://src.fedoraproject.org/rpms/valgrind/blob/master/f/valgrind.spec

valgrind-devel only contains the callgrind.h drd.h helgrind.h memcheck.h and valgrind.h include files valgrind aware programs (like the ones you list above) need. So it is only a couple of KB.

All other development files can go into a separate subpackage (e.g. valgrind-tools-devel). which is several MB.