Bug 214950 - New tool exp-floattrap
Summary: New tool exp-floattrap
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: unspecified
Platform: unspecified All
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-17 12:15 UTC by Mathias Fröhlich
Modified: 2024-02-19 14:48 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Diff to include the exp-floattrap tool (764 bytes, patch)
2009-11-17 12:15 UTC, Mathias Fröhlich
Details
The subdirectory with the exp-floattrap tool. (26.57 KB, application/octet-stream)
2009-11-17 12:16 UTC, Mathias Fröhlich
Details
Updated patch remade for revision 11686. (26.74 KB, patch)
2011-04-07 14:09 UTC, Mathias Fröhlich
Details
Updated patch remade for revision 15454. (29.93 KB, patch)
2015-07-29 17:07 UTC, Mathias Fröhlich
Details
attachment-31153-0.html (29.78 KB, text/html)
2015-07-31 19:00 UTC, Mathias Fröhlich
Details
Updated patch agains rev 15464 (31.98 KB, patch)
2015-07-31 19:01 UTC, Mathias Fröhlich
Details
attachment-25498-0.html (17.39 KB, text/html)
2015-08-01 13:54 UTC, Mathias Fröhlich
Details
Updated patch against rev 15566 (57.04 KB, patch)
2015-08-18 20:14 UTC, Mathias Fröhlich
Details
Patch rebased after 57dbcacfdbff0d4c12dcd52ff56f159631499dc6 (57.76 KB, application/mbox)
2023-04-12 15:53 UTC, Paul Floyd
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mathias Fröhlich 2009-11-17 12:15:19 UTC
Created attachment 38400 [details]
Diff to include the exp-floattrap tool

As requested on the mailing list this is now attached here.
See:

http://sourceforge.net/mailarchive/forum.php?thread_name=200909100842.37877.M.Froehlich%40science-computing.de&forum_name=valgrind-developers

and the followup mail.

Again here the original Mail:

Hi all,
 
 I have done a small valgrind tool that tests each result of a floating point 
 operation agains inf and nan values.
 
 I have written that tool to find problems in some simulation codes at a 
 customer here. Also I used that last weekend to find similar problems in 
 Flightgear (an open source flight simulation).
 So one might ask if the work of this tool could be done by the usual floating 
 point traps.
 The problem with the hardware traps is that they are asyncronous. That means 
 you get them a little later than when the problem happens - I believe at the 
 next fpu instruction past the error. Which means that you sometimes get a 
 usable hint where the problem is but sometomes you get signaled somewhere 
 completely different.
 
 That tool prooved to be much more useful since it really points you to the 
 *exact* source code line where the problem happens.
 
 I would like to know if there is any chance to get such a tool into the public 
 valgrind tree.
 And if so, what to do to get that stuff reviewed, contributed ...
 
 GReetings
 
 Mathias
Comment 1 Mathias Fröhlich 2009-11-17 12:16:26 UTC
Created attachment 38401 [details]
The subdirectory with the exp-floattrap tool.
Comment 2 Mathias Fröhlich 2011-04-07 14:09:24 UTC
Created attachment 58675 [details]
Updated patch remade for revision 11686.

Due to multiple requests in my mailbox, I have updated the attached patch to the current version of valgrind.
This patch applies and works also for the past 3.6 release version.
Comment 3 Mathias Fröhlich 2015-07-29 17:07:04 UTC
Created attachment 93794 [details]
Updated patch remade for revision 15454.
Comment 4 Florian Krohm 2015-07-29 21:31:38 UTC
Couple of comments. 
I've built this on x86-64 with GCC. There are compiler warnings such as this one:

ft_main.c:130:13: warning: function declaration isn't a prototype [-Wstrict-prototypes]
 static void detected_float_inf()

Use (void) instead.

The testcases have the same issue although there are no complaints because
the compile flags are more permissive there. I suggest you fix those nevertheless.

Indentation is 3 spaces (and no tabs). 

Running the testcases failed. There ought to be a filter_stderr script in the tests directory.
We would also want more testcases. There should be at least one for every
insert_check_XXXXX function.

Index: exp-floattrap/tests/inf-float.c
===================================================================
--- exp-floattrap/tests/inf-float.c	(revision 0)
+++ exp-floattrap/tests/inf-float.c	(revision 0)
@@ -0,0 +1,15 @@
+#include <stdio.h>
+
+int main()
+{
+  float zero = 0;
+  float one = 1;
+  float inf;
+
+  inf = one/zero;

Add volatile to the declarations to make sure the computation is not
constant-folded by some clever compiler.  Likewise for the other testcases.


Index: exp-floattrap/docs/ft-manual.xml
===================================================================
--- exp-floattrap/docs/ft-manual.xml	(revision 0)
+++ exp-floattrap/docs/ft-manual.xml	(revision 0)
@@ -0,0 +1,41 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
+
+
+<chapter id="ft-manual" xreflabel="Floattrap">
+
+<title>Floattrap: the "floatingpoint trap" tool</title>
+<subtitle>Checks each result of floating point operations for NaN or
+Inf values.</subtitle>
+
+<para>A tool that examines the result of every floating point
+operation and checks these results for NaN or inf values.</para>
+
+<para>Where is it really useful to have floating point values that
+represent invalid numbers like the IEEE NaN value or having values
+where you can represent values beyond the range of floating point
+values, it is also often catastrophic if these values occur without
+being noticed. Often when tese values show up in computational
+programs or simulation codes, NaNs propagate fast through the whole
+simulation program. Typically you will notice them at some values you
+observe past the original problem occured. But tracking the original
+line of code where the first NaN is happens is a huge problem.</para>
+
+<para>There are hardware floating point traps available on most
+cpus. But they usually happen ansyncronous and signal very often
+sometimes past the real problem happened. This really limits the cases
+where you can get useful information from that hardware traps to very
+few cases.</para>
+

Can you rephrase this a bit? The key point here is that you want to
report nans and infs as soon as possible )namely, when they are created) because
computation with those funny numbers is usually unintended and causes
unexpected output down the road. 

I also suggest to spell-check the text. There are some errors..


Index: exp-floattrap/ft_main.c
===================================================================
--- exp-floattrap/ft_main.c	(revision 0)
+++ exp-floattrap/ft_main.c	(revision 0)
@@ -0,0 +1,736 @@
+
+/*--------------------------------------------------------------------*/
+/*--- FloatTrap: The floating point trap.                ft_main.c ---*/
+/*--------------------------------------------------------------------*/
+
+/*
+   This file is part of FloatTrap, the syncronous floating point trap tracer.

the -> a
synchronous

+static void
+insert_check_float_(IRSB* bb, IRTemp value)

For consistency with other functions you defined earlier use

static void insert_check_float_(IRSB* bb, IRTemp value)

+{
+  IRTemp absValue = addBinop(bb, Ity_I32, Iop_And32,
+                             IRExpr_RdTmp(value),
+                             IRExpr_Const(IRConst_U32(0x7fffffff)));
+
+  if (check_for_inf) {
+    IRTemp isFinite = addBinop(bb, Ity_I1, Iop_CmpEQ32,
+                               IRExpr_RdTmp(absValue),
+                               IRExpr_Const(IRConst_U32(0x7f800000)));

What is this magic constant? Please #define a symbolic constant.

+    
+    IRDirty* dirty = unsafeIRDirty_0_N(0, "detected_float_inf",
+                                       detected_float_inf, mkIRExprVec_0());
+    dirty->guard = IRExpr_RdTmp(isFinite);

Perhaps that variable should be called isInfinite?

+    addStmtToIRSB(bb, IRStmt_Dirty(dirty));
+  }
+
+  if (check_for_nan) {
+    // Neat trick:
+    // A nan has all bits in its exponent set and any bit in the mantissa.
+    // So a nan's bitpattern int representation can be written as
+    //  0x7f800000 + x
+    // where x > 0.
+    // That means compute:
+    //   t = 0x7f800000 - (0x7f800000 + x) = -x
+    // and look for the sign of t.
+    // If the exponent is not 0x7f800000 (that is less than 0x7f800000), the
+    // resulting t value will be >= 0.
+
+    IRTemp nanTestTemp = addBinop(bb, Ity_I32, Iop_Sub32,
+                                  IRExpr_Const(IRConst_U32(0x7f800000)),
+                                  IRExpr_RdTmp(absValue));
+
+#if 0
+    // Would be easiest, but valgrind issues an error if I do so ...

Can you elaborate what the problem is?  

+static void
+insert_check_double_(IRSB* bb, IRTemp value)
+{
+  IRTemp absValue = addBinop(bb, Ity_I64, Iop_And64,
+                             IRExpr_RdTmp(value),
+                             IRExpr_Const(IRConst_U64(0x7fffffffffffffffll)));

Use ull suffix here.

+
+  if (check_for_inf) {
+    IRTemp isFinite = addBinop(bb, Ity_I1, Iop_CmpEQ64,
+                               IRExpr_RdTmp(absValue),
+                               IRExpr_Const(IRConst_U64(0x7ff0000000000000ll)));

and here.
What does this large constant represent? 

+
+static void
+insert_check_double128(IRSB* bb, IRTemp value)
+{
+  IRTemp tempHi, tempLo;
+
+  tempHi = addUnop(bb, Ity_F64, Iop_F128HItoF64, IRExpr_RdTmp(value));
+  insert_check_double(bb, tempHi);
+
+  tempLo = addUnop(bb, Ity_F64, Iop_F128LOtoF64, IRExpr_RdTmp(value));
+  insert_check_double(bb, tempLo);

So you're using insert_check_double twice. I doubt that is correct.
Did you test this?

+static
+IRSB* ft_instrument ( VgCallbackClosure* closure,
+                      IRSB* bb_in,
+                      const VexGuestLayout* layout, 
+                      const VexGuestExtents* vge,
+		      const VexArchInfo* archinfo_host,
+                      IRType gWordTy, IRType hWordTy )
+{
+  Int i;
+  IRSB* bb;
+  Bool modified = False;
+
+  bb = deepCopyIRSBExceptStmts(bb_in);
+  for (i = 0; i < bb_in->stmts_used; i++) {
+    IRStmt* stmt;
+    IRType tempType;
+
+    /* Append the current statement */
+    stmt = deepCopyIRStmt(bb_in->stmts[i]);
+    addStmtToIRSB(bb, stmt);
+        
+    /* Ok, we have an expression with a result */
+    if (stmt->tag != Ist_WrTmp)
+      continue;
+
+    /* We are only interrested in floaintg point assignemnt expressions */
+    tempType = typeOfIRTemp(bb_in->tyenv, stmt->Ist.WrTmp.tmp);
+    if (tempType == Ity_F16) {
+      /* Results from floating point ops are always checked */
+      IRExpr* rhs = stmt->Ist.WrTmp.data;
+      switch (rhs->tag) {
+      case Iex_Qop:
+      case Iex_Triop:
+      case Iex_Binop:
+      case Iex_Unop:
+        insert_check_float16(bb, stmt->Ist.WrTmp.tmp);
+        modified = True;

Below you do:
        if (insert_check_.........)
          modified = True;
So this here is inconsistent.
You can also simply say:
   modified = insert_check_......

More occurrences of this below..

+        break;
+      default:
+        break;
+      }
+    }
+    else if (tempType == Ity_F32) {
+      /* Results from floating point ops are always checked */
+      IRExpr* rhs = stmt->Ist.WrTmp.data;
+      switch (rhs->tag) {
+      case Iex_Qop:
+      case Iex_Triop:
+      case Iex_Binop:
+      case Iex_Unop:
+        insert_check_float(bb, stmt->Ist.WrTmp.tmp);
+        modified = True;
+        break;
+      default:
+        break;
+      }
+    }
+    else if (tempType == Ity_F64) {
+      /* Results from floating point ops are always checked */
+      IRExpr* rhs = stmt->Ist.WrTmp.data;
+      switch (rhs->tag) {
+      case Iex_Qop:
+      case Iex_Triop:
+      case Iex_Binop:
+      case Iex_Unop:
+        insert_check_double(bb, stmt->Ist.WrTmp.tmp);
+        modified = True;
+        break;
+      default:
+        break;
+      }
+    }
+    else if (tempType == Ity_F128) {
+      /* Results from floating point ops are always checked */
+      IRExpr* rhs = stmt->Ist.WrTmp.data;
+      switch (rhs->tag) {
+      case Iex_Qop:
+      case Iex_Triop:
+      case Iex_Binop:
+      case Iex_Unop:
+        insert_check_double128(bb, stmt->Ist.WrTmp.tmp);
+        modified = True;
+        break;
+      default:
+        break;
+      }
+    }
+    else if (tempType == Ity_V128) {
+      IRExpr* rhs = stmt->Ist.WrTmp.data;
+      switch (rhs->tag) {
+      case Iex_Qop:
+        break;
+      case Iex_Triop:
+        if (insert_check(bb, rhs->Iex.Triop.details->op, stmt->Ist.WrTmp.tmp))
+          modified = True;
+        break;
+      case Iex_Binop:
+        if (insert_check(bb, rhs->Iex.Binop.op, stmt->Ist.WrTmp.tmp))
+          modified = True;
+        break;
+      case Iex_Unop:
+        if (insert_check(bb, rhs->Iex.Unop.op, stmt->Ist.WrTmp.tmp))
+          modified = True;
+        break;
+
+      default:
+        break;
+      }
+    }
+    else if (tempType == Ity_V256) {
+      IRExpr* rhs = stmt->Ist.WrTmp.data;
+      switch (rhs->tag) {
+      case Iex_Qop:
+        break;
+      case Iex_Triop:
+        if (insert_check(bb, rhs->Iex.Triop.details->op, stmt->Ist.WrTmp.tmp))
+          modified = True;
+        break;
+      case Iex_Binop:
+        if (insert_check(bb, rhs->Iex.Binop.op, stmt->Ist.WrTmp.tmp))
+          modified = True;
+        break;
+      case Iex_Unop:
+        if (insert_check(bb, rhs->Iex.Unop.op, stmt->Ist.WrTmp.tmp))
+          modified = True;
+        break;
+
+      default:
+        break;
+      }
+    }
+  }
+
+  (void)modified;
+  /* if (modified) */
+  /*     ppIRSB ( bb ); */
+

   if (0 && modified)

is a pattern that we've been using in other places..

+static void ft_pre_clo_init(void)
+{
+  VG_(details_name)            ("Exp-Floattrap");

exp-floattrap
Comment 5 Mathias Fröhlich 2015-07-31 19:00:24 UTC
Created attachment 93821 [details]
attachment-31153-0.html

Hi Florian,

Thanks for the review!

On Wednesday, July 29, 2015 21:31:38 Florian Krohm wrote:
> --- Comment #4 from Florian Krohm <florian@eich-krohm.de> ---
> Couple of comments.
I have changed most of what you requested.
For the ones I left out, there is a comment below.

> Running the testcases failed. There ought to be a filter_stderr script in the
> tests directory.
> We would also want more testcases. There should be at least one for every
> insert_check_XXXXX function.
The test cases were even outdated a lot.
I have at least brought the current ones into a better shape.
I would agree on the variety of test cases.
Nevertheless, I have not yet(!?) provided these additional cases.

> Can you rephrase this a bit? The key point here is that you want to
> report nans and infs as soon as possible )namely, when they are created)
> because
> computation with those funny numbers is usually unintended and causes
> unexpected output down the road. 
> 
> I also suggest to spell-check the text. There are some errors..
Also, changed this a lot. But any suggestions/improvements are welcome. I am no English
native and I am pretty sure my English wording can stand a lot of improvements.

> +   This file is part of FloatTrap, the syncronous floating point trap tracer.
> 
> the -> a
> synchronous
Well, it is synchronous - that is the whole point *not* to be *A*synchronous.

> +{
> +  IRTemp absValue = addBinop(bb, Ity_I32, Iop_And32,
> +                             IRExpr_RdTmp(value),
> +                             IRExpr_Const(IRConst_U32(0x7fffffff)));
> +
> +  if (check_for_inf) {
> +    IRTemp isFinite = addBinop(bb, Ity_I1, Iop_CmpEQ32,
> +                               IRExpr_RdTmp(absValue),
> +                               IRExpr_Const(IRConst_U32(0x7f800000)));
> 
> What is this magic constant? Please #define a symbolic constant.
That's just +inf as a bit pattern here.
Even if the constant appears somewhere else, this pattern means something
different in the other places. So for example below, when detecting nan's
the same value appears but not really in it's meaning as +inf. Rather than that,
it does just what the comment says: nifty bit fiddling so that we find out
what we need. BTW, this check for nan is taken from glibc's isnan function.

> +  if (check_for_nan) {
> +    // Neat trick:
> +    // A nan has all bits in its exponent set and any bit in the mantissa.
> +    // So a nan's bitpattern int representation can be written as
> +    //  0x7f800000 + x
> +    // where x > 0.
> +    // That means compute:
> +    //   t = 0x7f800000 - (0x7f800000 + x) = -x
> +    // and look for the sign of t.
> +    // If the exponent is not 0x7f800000 (that is less than 0x7f800000), the
> +    // resulting t value will be >= 0.
When I see c++ comments, I would think that valgrind wants c comments, right?

> +#if 0
> +    // Would be easiest, but valgrind issues an error if I do so ...
> 
> Can you elaborate what the problem is?  
No, I don't remember. The original version dates back to 2008 I think ...
It looks like we don't need that anymore.

> IRExpr_Const(IRConst_U64(0x7ff0000000000000ll)));
> 
> What does this large constant represent? 
In this case +inf as bit pattern for double.
I have added a comment like above.

> +static void
> +insert_check_double128(IRSB* bb, IRTemp value)
> +{
> +  IRTemp tempHi, tempLo;
> +
> +  tempHi = addUnop(bb, Ity_F64, Iop_F128HItoF64, IRExpr_RdTmp(value));
> +  insert_check_double(bb, tempHi);
> +
> +  tempLo = addUnop(bb, Ity_F64, Iop_F128LOtoF64, IRExpr_RdTmp(value));
> +  insert_check_double(bb, tempLo);
> 
> So you're using insert_check_double twice. I doubt that is correct.
Twice, yes, once on the high and once on the low part.
Actually, that is something I was wondering about. I was looking for a conversion
operator from F128 to I128 so that I can do the same bitmask trick like the
others precisions do. But I could not find such a conversion.
So how is that F128 supposed to work with VEX? Is this the usual doubledouble
approach or is this a real 128 bit IEEE float?

> Did you test this?
Just no ...

Is there some more documentation on VEX? I just did match what appears plausible to me.

> +static
> +IRSB* ft_instrument ( VgCallbackClosure* closure,
> +                      IRSB* bb_in,
> +                      const VexGuestLayout* layout, 
> +                      const VexGuestExtents* vge,
> +              const VexArchInfo* archinfo_host,
> +                      IRType gWordTy, IRType hWordTy )
> +{
> +  Int i;
> +  IRSB* bb;
> +  Bool modified = False;
> +
> +  bb = deepCopyIRSBExceptStmts(bb_in);
> +  for (i = 0; i < bb_in->stmts_used; i++) {
> +    IRStmt* stmt;
> +    IRType tempType;
> +
> +    /* Append the current statement */
> +    stmt = deepCopyIRStmt(bb_in->stmts[i]);
> +    addStmtToIRSB(bb, stmt);
> +        
> +    /* Ok, we have an expression with a result */
> +    if (stmt->tag != Ist_WrTmp)
> +      continue;
> +
> +    /* We are only interrested in floaintg point assignemnt expressions */
> +    tempType = typeOfIRTemp(bb_in->tyenv, stmt->Ist.WrTmp.tmp);
> +    if (tempType == Ity_F16) {
> +      /* Results from floating point ops are always checked */
> +      IRExpr* rhs = stmt->Ist.WrTmp.data;
> +      switch (rhs->tag) {
> +      case Iex_Qop:
> +      case Iex_Triop:
> +      case Iex_Binop:
> +      case Iex_Unop:
> +        insert_check_float16(bb, stmt->Ist.WrTmp.tmp);
> +        modified = True;
> 
> Below you do:
>         if (insert_check_.........)
>           modified = True;
> So this here is inconsistent.
> You can also simply say:
>    modified = insert_check_......
Hmm, I don't think this works the same as this happens in a loop and might
reset the value to False if one of the later conversions did not
result into a modification.
I did use
  modified = insert_check...(blah) || modified;
for some OpenGL piglit tests recently, but I am not sure if this is
really better?
Anyway, these code pieces could stand some restructuring but the point in the past
provided patch was to have something that works for the ones that asked.

So, having said that, if there is a chance for this to go upstream, I can polish whatever
you want. But if not, I have to realize that there is life beyond software which I enjoy also.
:-)

Greetings

Mathias
Comment 6 Mathias Fröhlich 2015-07-31 19:01:53 UTC
Created attachment 93822 [details]
Updated patch agains rev 15464
Comment 7 Florian Krohm 2015-08-01 10:32:56 UTC
(In reply to Mathias Fröhlich from comment #5)

I'm going to answer a few quick questions here:

> When I see c++ comments, I would think that valgrind wants c comments, right?
> 

For multi-line comments we'd like C-style. For inline comments C++ comments are fine, but there is no strong preference one way or the other.

> > +insert_check_double128(IRSB* bb, IRTemp value)
> > +{
> > +  IRTemp tempHi, tempLo;
> > +
> > +  tempHi = addUnop(bb, Ity_F64, Iop_F128HItoF64, IRExpr_RdTmp(value));
> > +  insert_check_double(bb, tempHi);
> > +
> > +  tempLo = addUnop(bb, Ity_F64, Iop_F128LOtoF64, IRExpr_RdTmp(value));
> > +  insert_check_double(bb, tempLo);
> > 
> > So you're using insert_check_double twice. I doubt that is correct.
> Twice, yes, once on the high and once on the low part.
> Actually, that is something I was wondering about. I was looking for a
> conversion
> operator from F128 to I128 so that I can do the same bitmask trick like the
> others precisions do. But I could not find such a conversion.
> So how is that F128 supposed to work with VEX? Is this the usual doubledouble
> approach or is this a real 128 bit IEEE float?

F128 is a true 128 bit IEEE float. You cannot look at it as two 64 bit floats.
There is no need for a F128 to I128 conversion because there are no 128 bit wide integers on the platforms we support. You can convert to I64 or I32 or take the low and high 8 bytes of an F128.
I have access to a machine with 128 bit floating point values to I can help with testing that.

> Is there some more documentation on VEX? I just did match what appears
> plausible to me.

libvex_ir.h is the main documentation and the lackey source tree (and 'none' source tree) are good for inspiration.

> So, having said that, if there is a chance for this to go upstream, I can
> polish whatever
> you want. But if not, I have to realize that there is life beyond software
> which I enjoy also.

This is the main point we need to discuss before I take another look at the patch.
The tool itself is simple enough so, once finalised, I do not see much of a maintenance issue. We can just carry it along and adjust to VEX changes of which there are not many.

However, having a proper set of testcases is critical to getting the patch upstream. What is currently there is not good enough. Also, you need to convince us that the code is working properly. It currently does not for 128 bit floats and I'm not at all sure that the handling of the
various SIMD IROps is correct either. Testcases help here as well as do more comments in the code.
You need to figure out for yourself whether that is something you're willing to do.
Comment 8 Mathias Fröhlich 2015-08-01 13:54:23 UTC
Created attachment 93833 [details]
attachment-25498-0.html

Hi Florian,

On Saturday, August 01, 2015 10:32:56 Florian Krohm wrote:
> > When I see c++ comments, I would think that valgrind wants c comments, right?
> > 
> 
> For multi-line comments we'd like C-style. For inline comments C++ comments are
> fine, but there is no strong preference one way or the other.
Ok, so, I will adapt the longer ones at least.

> > > +insert_check_double128(IRSB* bb, IRTemp value)
> > > +{
> > > +  IRTemp tempHi, tempLo;
> > > +
> > > +  tempHi = addUnop(bb, Ity_F64, Iop_F128HItoF64, IRExpr_RdTmp(value));
> > > +  insert_check_double(bb, tempHi);
> > > +
> > > +  tempLo = addUnop(bb, Ity_F64, Iop_F128LOtoF64, IRExpr_RdTmp(value));
> > > +  insert_check_double(bb, tempLo);
> > > 
> > > So you're using insert_check_double twice. I doubt that is correct.
> > Twice, yes, once on the high and once on the low part.
> > Actually, that is something I was wondering about. I was looking for a
> > conversion
> > operator from F128 to I128 so that I can do the same bitmask trick like the
> > others precisions do. But I could not find such a conversion.
> > So how is that F128 supposed to work with VEX? Is this the usual doubledouble
> > approach or is this a real 128 bit IEEE float?
> 
> F128 is a true 128 bit IEEE float. You cannot look at it as two 64 bit floats.
> There is no need for a F128 to I128 conversion because there are no 128 bit
> wide integers on the platforms we support. You can convert to I64 or I32 or
> take the low and high 8 bytes of an F128.
> I have access to a machine with 128 bit floating point values to I can help
> with testing that.
Ok, I will provide something that that should work in theory - if you can care
for the practice it would be great!!!

> > Is there some more documentation on VEX? I just did match what appears
> > plausible to me.
> 
> libvex_ir.h is the main documentation and the lackey source tree (and 'none'
> source tree) are good for inspiration.
Ok.

> > So, having said that, if there is a chance for this to go upstream, I can
> > polish whatever
> > you want. But if not, I have to realize that there is life beyond software
> > which I enjoy also.
> 
> This is the main point we need to discuss before I take another look at the
> patch.
> The tool itself is simple enough so, once finalised, I do not see much of a
> maintenance issue. We can just carry it along and adjust to VEX changes of
> which there are not many.
> 
> However, having a proper set of testcases is critical to getting the patch
> upstream. What is currently there is not good enough. Also, you need to
> convince us that the code is working properly. It currently does not for 128
> bit floats and I'm not at all sure that the handling of the
> various SIMD IROps is correct either. Testcases help here as well as do more
> comments in the code.
> You need to figure out for yourself whether that is something you're willing to
> do.

Under these conditions, I am fine!
I can provide test cases and complete the missing stuff!
And I am around for questions at any time. My mail address did not change for the
at least 15 years, so you can really reach me and I am willing to help also
past any initial contribution.

In the initial mail regarding this tool some years ago one of the valgrind people
was talking about maintainership for this thing. And this is just one notch too
strong for me. If I personally say that I will maintain this in the future, than
I commit myself to constantly monitor the valgrind development mailing lists and
respond in time to requests coming in there. At least this is what I would mean
when I ask somebody for maintainership. But I cannot take maintainership in the
sense I understand mainteinership for each of the about 1000loc contributions
that I did in the past for OSS projects. And yes, as you said, the tool is small
and not that hard to understand.

If we are really talking about upstreaming, there is no need to convince me that
test cases are important and that it should work in all cases. I just won't spend my
spare time polishing if we end up later with this maintainership discussion again.

So, from what I can hear, expect a full set of test cases coming up soon.

Greetings from Stuttgart/Germany

Mathias
Comment 9 Florian Krohm 2015-08-09 10:49:00 UTC
Thats for the updated version. Here are a few more comments on it:

> Index: exp-floattrap/tests/inf-float.vgtest
> ===================================================================
> --- exp-floattrap/tests/inf-float.vgtest	(revision 0)
> +++ exp-floattrap/tests/inf-float.vgtest	(revision 0)
> @@ -0,0 +1,3 @@
> +prog: inf-float
> +vgopts: --trap-inf=yes

It is recommended to add -q to vgopts unless you're really interested in ERROR SUMMARY or the boiler plate stuff valgrind writes out at the beginning. And here we're not. This allows to simplify the filter_stderr script:

> Index: exp-floattrap/tests/filter_stderr
> ===================================================================
> --- exp-floattrap/tests/filter_stderr	(revision 0)
> +++ exp-floattrap/tests/filter_stderr	(revision 0)

...

> +# Remove "exp-floattrap, ..." line up to the following copyright line.
> +sed "/^exp-floattrap/,+2 d" |

This can go away, because of -q (see above)

> Index: exp-floattrap/tests/inf-float.c
> ===================================================================
> --- exp-floattrap/tests/inf-float.c	(revision 0)
> +++ exp-floattrap/tests/inf-float.c	(revision 0)
> @@ -0,0 +1,11 @@
> +int main(void)
> +{
> +   // make sure that the value computation is not optimized away by the compiler
> +   volatile float zero = 0;
> +   volatile float one = 1;
> +   volatile float inf;
> +
> +   inf = one/zero;

Compiler warns here that value is unused. Did you not see this warning ?

> +
> +   return 0;

Change to return (int)inf;
which fixes the warning.
Likewise in the other tests.

> Index: exp-floattrap/tests/Makefile.am
> ===================================================================
> --- exp-floattrap/tests/Makefile.am	(revision 0)
> +++ exp-floattrap/tests/Makefile.am	(revision 0)
> @@ -0,0 +1,13 @@
> +include $(top_srcdir)/Makefile.tool-tests.am
> +
> +SUBDIRS = .
> +DIST_SUBDIRS = .
> +
> +dist_noinst_SCRIPTS = \
> +        filter_stderr
> +
> +check_PROGRAMS = \
> +	inf-float \
> +	inf-double \
> +	nan-float \
> +	nan-double

This Makefile.am needs work.  If you had run "make regtest" then at the end
there would have been complaints like these:

...checking makefile consistency
exp-floattrap/tests/Makefile.am:1: error: inf-double.stderr.exp is missing in EXTRA_DIST
exp-floattrap/tests/Makefile.am:1: error: inf-double.vgtest is missing in EXTRA_DIST
exp-floattrap/tests/Makefile.am:1: error: inf-float.stderr.exp is missing in EXTRA_DIST
exp-floattrap/tests/Makefile.am:1: error: inf-float.vgtest is missing in EXTRA_DIST

Use this command: "make post-regtest-checks"  to check whether your makefiles
are OK.  make post-regtest-checks is run at the end of make regtest.

Then there is this warning:

...checking header files and include directives
Unknown directory 'exp-floattrap'. Please update check_headers_and_includes

You need to update that script. It's obvious what to do once you look at it.

As a final check, you need to run

make dist BUILD_ALL_DOCS=no 

 in the root of the source tree. This command builds a distribution tarball
 and, obviously, it should succeed. 


> Index: exp-floattrap/tests/inf-float.stderr.exp

The .exp files need to be updated due to the  -q  addition in the .vgtest
files.


> Index: exp-floattrap/ft_main.c
> ===================================================================
> --- exp-floattrap/ft_main.c	(revision 0)
> +++ exp-floattrap/ft_main.c	(revision 0)
> @@ -0,0 +1,700 @@

> +   This file is part of FloatTrap, the syncronous floating point trap tracer.

This file is part of FloatTrap, a synchronous floating point trap tracer.

> +
> +/*--------------------------------------------------------------------*/
> +/*--- FloatTrap: The floating point trap.                ft_main.c ---*/

I think you want to say:

FloatTrap: A floating point trap detector.


> +static void ft_tool_error_pp(const Error* e)
> +{
> +   VG_(message)(Vg_UserMsg, "%s:", VG_(get_error_string)(e));
> +   VG_(pp_ExeContext)(VG_(get_error_where)(e));
> +}
> +
> +static UInt ft_tool_error_update_extra(const Error* e)
> +{
> +   return 0;
> +}
> +
> +static Bool ft_tool_error_recog(const HChar* name, Supp* supp)
> +{
> +   return True;
> +}
> +
> +static Bool ft_tool_error_read_extra(Int fd, HChar** buf, SizeT *s, Int *i, Supp* supp)
> +{
> +   return True;
> +}
> +
> +static Bool ft_tool_error_matches(const Error* e, const Supp* supp)
> +{
> +   return True;
> +}
> +
> +static const HChar* ft_tool_error_name(const Error* e)
> +{
> +   switch (VG_(get_error_kind)(e)) {
> +   case NaNResult:
> +      return "NaN error";
> +   case InfResult:
> +      return "Inf error";
> +   default:
> +      return "unknown";
> +   }
> +}
> +
> +static SizeT ft_tool_error_get_extra_si(const Error* e, HChar * c, Int i)
> +{
> +   return 0;
> +}
> +
> +static SizeT ft_tool_error_print_extra_su(const Supp* supp, HChar * c, Int i)
> +{
> +   return 0;
> +}
> +
> +static void ft_tool_error_update_extra_su(const Error* e, const Supp* supp)
> +{
> +}
> +

I'm not sure the above is all that is needed. 
For example suppressing an error does not work. I tried this in
exp-floattrap/tests:

../../vg-in-place --gen-suppressions=all --tool=exp-floattrap ./nan-float

That runs into this assertion:
valgrind: m_errormgr.c:405 (gen_suppression): Assertion 'xtra[num_written] == '\0'' failed.


> +static void ft_print_usage(void)
> +{  
> +   VG_(printf)(
> +               "    --trap-inf=yes|no     trap if the result of a float operation is inf\n"
> +               "    --trap-nan=yes|no     trap if the result of a float operation is nan\n"
> +               );

Add the default value. Here is an example:

--show-below-main=no|yes  continue stack traces below main() [no]


> +}
> +
> +static void ft_print_debug_usage(void)
> +{  
> +   VG_(printf)(
> +               "    (none)\n"
> +               );

Write this in a single line. I know you've seen it done differently elsewhere.
But that code was written when computer displays had a 4:3 aspect ratio  :)


Change this:
> +                          True,
to
                             True,/*show TIDs for errors*/

> +/* Add an unary operation to the basic block */
> +
> +static IRTemp addUnop(IRSB* bb, IRType ty, IROp op, IRExpr* arg)
> +{

I'll look at the VEX related stuff in the next iteration in more detail.
Comment 10 Mathias Fröhlich 2015-08-18 20:12:16 UTC
Hi Florian,

Thanks for the comments.

I have attached an updated version of the patch.

I think I have incorporated all the comments.
Most notable the tests have been extended a lot.
But the SIMD tests are tied to the intel cpus together with gcc.
The gcc specifics are guarded with a gcc define, so it should compile everywhere, but obviously it does not run just everywhere. The tests are guarded with a prerequisite for these cpu classes.

Please review,
thanks

Mathias
Comment 11 Mathias Fröhlich 2015-08-18 20:14:42 UTC
Created attachment 94101 [details]
Updated patch against rev 15566
Comment 12 Florian Krohm 2015-09-10 14:37:06 UTC
Hi Matthias,

sorry for getting back to you so late...
Thanks for the new patch. Generating suppression business is working now.
Good. Using them however is not working.

../../vg-in-place --tool=exp-floattrap --gen-suppressions=all ./nan-double 2> SUPP
grep -v ^== SUPP > SUPP1
../../vg-in-place --tool=exp-floattrap --suppressions=SUPP1 ./nan-double

==6817== FATAL: in suppressions file "SUPP1" near line 4:
==6817==    location should be "...", or should start with "fun:" or "obj:"
==6817== exiting now.

I believe this line:
  Result of IEEE double operation is NaN
should not be there. Compare with memcheck suppressions..
When I remove the offending line from the suppression file, the syntax
error will go away, but the complaint is not being suppressed.
Please also make sure that a suppression for an INF value does not suppress a NAN complaint and vice versa (assuming they occur on the same line).

I spotted a few overly long lines in ft_print_usage and ft_pre_clo_init.
You can use C's string literal concatenation: "foo" " bar"  == "foo bar"
to fix this. Limit is 80 chars per line.

Testcases:
nan-float.c and a few others show 2 errors instead of one. The extra
complaint is likely due to the cast expression. That wouldn't be a 
problem per se. However, on other platforms (I tested s390) there is
only one error because the cast is implemented differently, apparently.
We could add an additional .exp file, of course. But, generally, we'd
like not to special-case stuff if it can be avoided. I suggest to rewrite
the tests like so:

volatile float nanval;

int main(void)
{
   /* make sure that the value computation is not optimized away by the compiler */
   volatile float zero = 0.0f;

   nanval = zero/zero;

   return 0;
}

Note that the global variable cannot be called nan as that conflicts with
a gcc built-in function.

As you'll be editing the testcases anyhow, can you 
(a) fix the overly long comment line  and
(b) add a comment that says which IROp is being tested here ?

I had expected the IROp to be Iop_DivF32 but that wasn't it.
Instead if was an Iop_Div64F0x2 (on x86-64) which was a surprise (to me at
least). On s390 it *is* a Iop_DivF32.
That gives us some coverage for function insert_check. Ideally we'd like to exercise one IROp
for each block of 'case' labels in that function.

Now: The testcases with inline assembler are specific to x86-64 and won't
compile on other platforms. Move them into exp-floattrap/tests/amd64.
Those tests using "long double" you can copy (not move) into
exp-floattrap/tests/s390x. That way we can test 128-bit wide floating point
ops. Only s390 has that for now.
Don't forget to update configure.ac


ft_main.c:

static void insert_check_float_F16(IRSB* bb, IRTemp value)
{
   /* Blow up the value to a F32 in the hope that this preserves inf and nan
    */

Let's not assume that. This F16 stuff seems to be ARM specific. Perhaps
somebody who knows about ARM can chime in and provide some details.
Until then I'd say, issue a warning (once) when encountering an F16 
operation and continue without instrumenting.

Please define symbolic names for 0x7f800000u and its friends:
#define NAN32 0x7f800000u
or whatever that value is. That will improve readability and avoid long lines.

It is not necessary to explan the "neat trick" three times. Once is enough
and you can have the other places refer to that instance.
I would appreciate a pointer to the place where you found it. You said
earlier it was in glibc somewhere? glibc version and file name would be good.

One more thing came to mind. How difficult would it be to be more specific
in the error message. For example, instead of:

   Result of IEEE double operation is NaN:

say this:

   Result of IEEE double division is NaN:

or multiplication, square root, etc...
This may be helpful when you have a line with lots of floating point
operations. 
because the IR level we're
looking at here is not full trees, but SSA form. But still...

Another thing I noticed is avalanche errors:

For   value = 200 + zero/zero;   there will be two errors: one for the
division and one for the addition. That can be noisy. Given the level the
IR is at at this point (SSA form and *not* trees) this will be quite
difficult to do (if it can be done reasonably at all). I just wanted to
throw it out. Perhaps you can think of something clever.
Here is a difficult example:

      value = exp1 + exp2;

where exp1 and exp2 are expressions both producing a NaN. So you want
an error for the NaN produced by exp1 and one for the NaN produced by
exp2 but no error for the addition of the two. 

When you add a new version of the patch can you mark the previous versions as obsolete?
Thanks.  Enough for today..
Comment 13 Mathias Fröhlich 2015-10-17 16:37:21 UTC
Hi,

(In reply to Florian Krohm from comment #12)
> sorry for getting back to you so late...
Same here...

> Thanks for the new patch. Generating suppression business is working now.
> Good. Using them however is not working.
Well, looking at what I have to do here to make it work. I guess I have just omitted
doing suppressions back these years ago.
You will get that with the next patch.

> I spotted a few overly long lines in ft_print_usage and ft_pre_clo_init.
> You can use C's string literal concatenation: "foo" " bar"  == "foo bar"
> to fix this. Limit is 80 chars per line.
So, this really exists? You made a comment with the wide monitors that
made me think that I do not have to care anymore for 80 chars.

> Testcases:
> nan-float.c and a few others show 2 errors instead of one. The extra
> complaint is likely due to the cast expression. That wouldn't be a 
> problem per se. However, on other platforms (I tested s390) there is
> only one error because the cast is implemented differently, apparently.
> We could add an additional .exp file, of course. But, generally, we'd
> like not to special-case stuff if it can be avoided. I suggest to rewrite
> the tests like so:
> 
> volatile float nanval;
> 
> int main(void)
> {
>    /* make sure that the value computation is not optimized away by the
> compiler */
>    volatile float zero = 0.0f;
> 
>    nanval = zero/zero;
> 
>    return 0;
> }
> 
> Note that the global variable cannot be called nan as that conflicts with
> a gcc built-in function.
> 
> As you'll be editing the testcases anyhow, can you 
> (a) fix the overly long comment line  and
> (b) add a comment that says which IROp is being tested here ?
> 
> I had expected the IROp to be Iop_DivF32 but that wasn't it.
> Instead if was an Iop_Div64F0x2 (on x86-64) which was a surprise (to me at
> least). On s390 it *is* a Iop_DivF32.
> That gives us some coverage for function insert_check. Ideally we'd like to
> exercise one IROp
> for each block of 'case' labels in that function.

Hmm, this may be fuzzy in the end.
At least I expect this to be compiler, compile flags and architecture dependent.
So, old 80 bits float registers on i386 probably translate into what you expect.
But what you see on 64 bits intel is the simd variant that works on the lower part
of the vector register but mentions the whole register. That somehow matches the F0x2 operation variant somehow. At least this is my expected match of assembly concepts to the iop descriptions I have found. 
That said, I think we have no clear mapping from c code to the assembler. If we really want to test each iop separately and mention them in the test code, I can only see writing everything in architecture dependent assembly.
Do we want purely assembler written tests to nail down the ops we get?

> ft_main.c:
> 
> static void insert_check_float_F16(IRSB* bb, IRTemp value)
> {
>    /* Blow up the value to a F32 in the hope that this preserves inf and nan
>     */
> 
> Let's not assume that. This F16 stuff seems to be ARM specific. Perhaps
> somebody who knows about ARM can chime in and provide some details.
> Until then I'd say, issue a warning (once) when encountering an F16 
> operation and continue without instrumenting.
Ok, will do.

> It is not necessary to explan the "neat trick" three times. Once is enough
> and you can have the other places refer to that instance.
Well, you wanted more documentation and this part is about the only code that is not just a direct match of widely known cpu concepts and its straight forward applications. 
But yes, we can cross reference them.

> I would appreciate a pointer to the place where you found it. You said
> earlier it was in glibc somewhere? glibc version and file name would be good.
Well, isnan is your friend :-)
Seriously, I don't know what I really looked at some years ago. But the generic, architecture independent isnan implementation looks quite similar even these days. I will drop a reference to this in the code ...

> One more thing came to mind. How difficult would it be to be more specific
> in the error message. For example, instead of:
> 
>    Result of IEEE double operation is NaN:
> 
> say this:
> 
>    Result of IEEE double division is NaN:
> 
> or multiplication, square root, etc...
> This may be helpful when you have a line with lots of floating point
> operations. 
> because the IR level we're
> looking at here is not full trees, but SSA form. But still...
> 
> Another thing I noticed is avalanche errors:
> 
> For   value = 200 + zero/zero;   there will be two errors: one for the
> division and one for the addition. That can be noisy. Given the level the
> IR is at at this point (SSA form and *not* trees) this will be quite
> difficult to do (if it can be done reasonably at all). I just wanted to
> throw it out. Perhaps you can think of something clever.
> Here is a difficult example:
> 
>       value = exp1 + exp2;
> 
> where exp1 and exp2 are expressions both producing a NaN. So you want
> an error for the NaN produced by exp1 and one for the NaN produced by
> exp2 but no error for the addition of the two. 

I know that there is room for improvements in error reporting. 
And yes, I believe it's at first sadly architecture dependent.

Ok, value tracking - the brute force method here.
I don't know enough of valgrind to see if this is easily possible.
Probably we would have to do something similar than memcheck (probably) does.
I think in memcheck must be a flag for each value we process that marks if the value got
initialized or not. That flag is then propagated across the operations and evaluated once
we hit a conditional jump. Somehow simplified probably, but basically true??
If we do this with an 'already reported' flag we could be safe.
I fear that this is a lot of work to be done.

Less brute force, we may be able to analyse the inputs and just report an output value
of NaN if it is not just a simple propagation of a NaN already available in the input.
Let me think about that a bit more ...
Inf values may be more complex with this idea.

If we can get some of that working, we do not need to go all the architecture dependent tests then ...

Have a nice weekend ...

Mathias
Comment 14 Paul Floyd 2022-06-23 12:57:54 UTC
Almost 7 years since the last patch. Hmm. Are you still following this Mathias?

I'm very interested in getting SIGFPE to work on amd64 - we use SIGFPE extensively at work and rely on it for the correct operation of our simulator.
Comment 15 Mathias Fröhlich 2022-07-06 08:49:55 UTC
Good morning,

I am still alive, and I am still using this tool every now and then for my own simulator stuff.
Yes. I think that it is still useful, nevertheless, I hear that we are the minority here.
I have not forward ported the patch for some time.
The basic idea is to insert nan and inf checks at every floating point operation.
So if you want to do that feel free. I can may be assist to grab the basic idea, but I have as of
today not enough time to forward port in the near future.

best

Mathias
Comment 16 Paul Floyd 2023-04-12 15:53:50 UTC
Created attachment 158040 [details]
Patch rebased after 57dbcacfdbff0d4c12dcd52ff56f159631499dc6

Some root dir files needed updating (removal of exp-sgcheck and renaming exp-dhat to dhat)

Use electronic address in GPL license

Add entries to .gitignore
Comment 17 Bruno Lathuilière 2024-02-19 14:48:03 UTC
As a side product of the valgrind tool verrou, we have  a similar functionality (activated by default) for all  emulated rounding mode. So if the rounding-mode nearest is selected (by default), the behavior of  verrou and exp-floattrap is similar.

The specialized tool exp-floattrap has several advantages : 
       - exp-floattrap deals with more IOP (by example Iop_Recip*, Iop_RSqst*, Iop_Neg*   ... ).
       - I think (I did not check) exp-floattrap should be faster than verrou as the NaN detection is done with IR. In verrou it is done inside the dirty call to a  function which replace the floating point operation.
       - exp-floattrap should be more portable than verrou.

There one point where verrou is maybe more user friendly (concerning only the nan/inf detection). Indeed with option --libm=instrumented (not yet the default), the dynamic libm call are intercepted and so the nan/inf detection inside a libm call are ignored : there is only one detection on the result. 
 
To conclude both tools are complementary but there is overlapping, so it is good to keep in mind. As I just discovered this tool, I think it could be a good idea to add a link in this page : https://valgrind.org/downloads/variants.html