Bug 164298

Summary: finitel() returns false normally but true under valgrind
Product: [Developer tools] valgrind Reporter: Andrés Roldán <aroldan>
Component: general Assignee: Julian Seward <jseward>
Status: RESOLVED INTENTIONAL    
Severity: normal CC: mwelinder, newbie-02, tom
Priority: NOR    
Version: 3.3.1   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Linux   

Description Andrés Roldán 2008-06-17 15:37:09 UTC
Version:           3.3.1 (using Devel)
Installed from:    Compiled sources
Compiler:          gcc-4.3 
OS:                Linux

This is a bug forwarded from Debian BTS.

finitel is behaving incorrectly when compiled with gcc-4.3 and run under valgrind.  Change the compiler or run without valgrind and things work smoothly.

#include <math.h>
#include <assert.h>
#include <locale.h>
#include <stdio.h>

int main (void)
{
   double d;
   long double ld;

   setlocale (LC_ALL, "C");
   sscanf ("Inf", "%lf", &d);

   ld = d;
   assert (!finitel (d));
}
------------------------------------------
gcc test.c -lm
valgrind ./a.out
....
a.out: t.c:15: main: Assertion `!finitel (d)' failed.

-- System Information:
Debian Release: lenny/sid
 APT prefers unstable
 APT policy: (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.24-1-686-bigmem (SMP w/4 CPU cores)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages valgrind depends on:
ii  libc6                         2.7-12     GNU C Library: Shared libraries
Comment 1 Tom Hughes 2008-06-17 15:44:35 UTC
I can't see any evidence of valgrind asserting in the output you have provided.

What appears to be happening is that your test program is asserting, which is a completely different thing.

What your complaint therefore amounts to is that the finitel() function is returning true under valgrind but false when run normally.
Comment 2 Tom Hughes 2008-06-17 15:52:54 UTC
I can reproduce this when the test program is compiled with gcc 4.3.0 but not when it is compiled with 4.1.2, so there is obviously some difference in the generated code.

The difference appears to be that the call to finitel is now inlined, so this:

        fldl    -8(%rbp)
        fstpt   (%rsp)
        call    finitel

is replaced by this:

        fldl    -8(%rbp)
        fabs
        fldt    .LC3(%rip)
        fucomip %st(1), %st
        fstp    %st(0)
        setb    %al
        movzbl  %al, %eax
        xorl    $1, %eax

which is presumably not being emulated correctly, or perhaps relies on the full 80-bit x87 precision that valgrind does not emulate.
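
Read as C, the inlined sequence appears to take the absolute value of the argument and compare it against the constant at .LC3, treating an unordered (NaN) comparison as "not finite". A minimal sketch of that logic follows; the name inlined_finite and the explicit limit parameter are illustrative only and are not part of glibc or gcc:

```c
#include <float.h>
#include <math.h>

/* Hypothetical C model of the inlined sequence.
   fucomip sets the carry flag when limit < |x| or when either operand
   is a NaN; setb copies that flag and xorl $1 inverts it. */
int inlined_finite(long double x, long double limit)
{
    long double ax = fabsl(x);                                /* fabs */
    int carry = isunordered(limit, ax) || isless(limit, ax);  /* fucomip, setb */
    return carry ^ 1;                                         /* xorl $1, %eax */
}
```

With limit set to LDBL_MAX this returns 1 for ordinary values and 0 for infinities and NaNs, which is the behaviour expected of finitel.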
Comment 3 Tom Hughes 2008-06-17 15:53:49 UTC
Oh, and the constant at LC3 is defined thus:

.LC3:
        .long   4294967295
        .long   4294967295
        .long   32766
        .long   0
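
On x86 these four 32-bit words are the little-endian image of an 80-bit x87 extended value: the first two words form the 64-bit significand, the low 16 bits of the third hold the sign and the 15-bit biased exponent, and the rest is padding. A sketch that rebuilds the value, assuming an x87-style long double with a 64-bit significand (the helper names are made up for illustration):

```c
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Value of a positive, normal x87 extended number given its 64-bit
   significand (explicit integer bit included) and its biased exponent.
   The exponent bias is 16383 and the significand has 63 fraction bits. */
long double x87_value(uint64_t significand, unsigned biased_exp)
{
    return ldexpl((long double)significand, (int)biased_exp - 16383 - 63);
}

/* The .LC3 constant: both significand words are 4294967295 (all ones)
   and the exponent word is 32766 (0x7ffe, the largest finite exponent). */
long double lc3_value(void)
{
    return x87_value(UINT64_C(0xFFFFFFFFFFFFFFFF), 32766);
}
```

Under that assumption lc3_value() compares equal to LDBL_MAX, the largest finite long double.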
Comment 4 M Welinder 2008-06-29 03:46:33 UTC
Precision is not the issue, range is.

LC3 above should be loaded as roughly 1.18973e+4932, but under valgrind
it is loaded as +inf.  I think the code implements fabsl(x)<=LDBL_MAX.
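
The effect can be modelled without valgrind by forcing the limit through a 64-bit double. This is only a sketch: the narrowing cast stands in for a 64-bit emulation of the x87 registers and is an assumption, not valgrind's actual code, and both function names are hypothetical.

```c
#include <float.h>
#include <math.h>

/* The range check the inlined code is believed to implement. */
int finite_by_range(long double x, long double limit)
{
    return fabsl(x) <= limit;
}

/* Assumed stand-in for 64-bit emulation: round the value to double. */
long double through_double(long double v)
{
    volatile double narrowed = (double)v;  /* volatile forces a real 64-bit store */
    return narrowed;
}
```

finite_by_range(INFINITY, LDBL_MAX) is 0, as expected. Where long double has a wider range than double, through_double(LDBL_MAX) overflows to +inf, so finite_by_range(INFINITY, through_double(LDBL_MAX)) is 1, which matches the symptom reported here.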

Choices...

1. Implement long double under valgrind.  Probably a lot of work
2. Map oversize non-infinite values to DBL_MAX, not +inf.
3. Recognize the above code snippet and map it to a function call.
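
Option 2 could look roughly like the following. This is a sketch of the idea only, not valgrind code; narrow_saturating is a hypothetical helper.

```c
#include <float.h>
#include <math.h>

/* Narrow a long double to double, clamping finite out-of-range values
   to +/-DBL_MAX so that only true infinities become infinite. */
double narrow_saturating(long double x)
{
    if (isnan(x) || isinf(x))
        return (double)x;          /* NaN and real infinities pass through */
    if (x > DBL_MAX)
        return DBL_MAX;            /* finite but too large: saturate */
    if (x < -DBL_MAX)
        return -DBL_MAX;
    return (double)x;              /* in range: ordinary rounding */
}
```

With such a mapping the limit would become DBL_MAX rather than +inf, so an infinite argument would still fail the comparison. Finite values above DBL_MAX would remain indistinguishable from DBL_MAX itself.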

For the record, the issue manifests itself when running "valgrind gnumeric".
Comment 5 Julian Seward 2008-11-02 18:37:30 UTC
Known limitation.  FWIW, if gnumeric actually depends on 80-bit
FP arithmetic, then how can it be portable to ppc, sparc, etc,
which only support 64-bit IEEE 754?
Comment 6 M Welinder 2008-11-03 15:16:03 UTC
Could you elaborate why this was wonfixed, please?  (Too much work,
architectural problem, other?)

As for Gnumeric, there are three different ways it can be compiled.

1. "double"-only.
2. Mostly "double", but use "long double" in a few key places.
3. All "long double".

(In addition, libgoffice does a self check that actually checks the finitel
call in this report.  We have reduced severity from fatal to warning due to
this problem.  That will be fully deployed in a couple of years.)

Option 2 is the default, so by default a valgrinded gnumeric will produce
different results from a non-valgrinded gnumeric.  Note that "long double" is
not necessarily 80 bits.  On Solaris/Sparc it is 128 bits.
Comment 7 Julian Seward 2008-11-03 20:06:14 UTC
> Could you elaborate why this was wonfixed, please?  (Too much work,
> architectural problem, other?)

Too much work, basically.  It's certainly fixable, but it would be
considerable work -- easily a week -- to add 80-bit FP primitives,
fix up 80-bit FP code generation, register allocation, fix Memcheck,
check nothing else got broken in the process.  It's a lot of extra
code (== stuff to maintain for ever more) and the case for it is,
well, let's say, not very compelling.

> As for Gnumeric, there are three different ways it can be compiled.
>
> 1. "double"-only.
> 2. Mostly "double", but use "long double" in a few key places.
> 3. All "long double".

Can I ask .. is it really of value to your users, to be able to
build Gnumeric in three different ways, with presumably slightly
different FP results depending on the platform?  I would have
thought it would be more useful if you hardwired (1), so that it
ran on the widest range of platforms and gave identical results
on all of them.  Am I missing something?
Comment 8 M Welinder 2008-11-04 15:18:08 UTC
> Can I ask .. is it really of value to your users, to be able to build Gnumeric
> in three different ways, with presumably slightly different FP results
> depending on the platform?

First, the floating-point semantics of C are pretty weakly defined.  You
will see differences between systems whether you want it or not simply
because functions like "log" will produce different results.  (Heck, on
x86_64 you can even get one result in 32-bit mode and another in 64-bit
mode!  That's pretty weird!)

Further differences occur because a regular x86 chip will by default
compute with full 80-bit precision, even if "double" instructions are
used.  Only when gcc decides to spill a temporary value to memory will
the "double" format be used.  I seem to recall that glibc's math
functions -- even the "double" ones -- depend on this and become
significantly less accurate if the floating-point unit is told to
work only with 64 bits.  Presumably this means that they are less
accurate when valgrinded.
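
C99 exposes this evaluation behaviour through the FLT_EVAL_METHOD macro in <float.h>. A small sketch for interpreting it; the helper name is illustrative, and the value a given build reports depends on the compiler and its flags:

```c
#include <float.h>

/* Returns 1 if the given FLT_EVAL_METHOD value means that double
   expressions are evaluated in long double range and precision, and
   long double really is wider than double on this build. */
int double_evaluated_wider(int eval_method)
{
    return eval_method == 2 && LDBL_MANT_DIG > DBL_MANT_DIG;
}
```

Calling double_evaluated_wider(FLT_EVAL_METHOD) typically gives 1 for 32-bit code using the x87 unit and 0 for code using SSE arithmetic.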

Back to Gnumeric.  Gnumeric is like, say, perl in this regard: whether
the precision is useful depends on what you do with it.  For your tax
calculations it doesn't matter.  For scientific computations it does.
Using (2) over (1) gives significant improvements to a number of
statistical calculations with very little effort.  The jury is still
out on (3).

It is unpleasant, but not a show-stopper, that running under valgrind
means taking different paths through the program.  Luckily
the math-intensive parts of Gnumeric are by nature not prone to memory
errors.
Comment 9 b. 2022-03-14 06:21:54 UTC
Sorry to step in and reopen an old report; I am not yet sure the system will even allow it. Apologies!

I can't contribute to the technical side, but I can to the scope: not making valgrind universal on this point saves 1 to 2 weeks of work ...

BUT! ...

Leaving it as it is probably wastes hours per day for some 100 people around the world who expect it to work, get trapped by this shortcoming, and have to investigate. On a global scale it wastes a lot of effort by skilled people trying to do a good job.

Suggestions:
- either provide a warning that valgrind is not suited to checking such special cases, with an appropriate note in the output (or is there one already?),
- or fix it even though it takes effort; it will make valgrind a better tool and pay off on a global scale.

Maybe you could organize a collection or crowdfunding to finance the effort?

Best Regards, b.