Bug 417075

Summary: pwritev(vector[...]) suppression ignored
Product: [Developer tools] valgrind Reporter: Leonid Yuriev <leo>
Component: memcheckAssignee: Julian Seward <jseward>
Status: CLOSED INTENTIONAL    
Severity: normal CC: ahajkova, jseward, mark, philippe.waroquiers
Priority: NOR    
Version: 3.15 SVN   
Target Milestone: ---   
Platform: Fedora RPMs   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: not a fix, but detects the incompatible supp entry and produce a warning

Description Leonid Yuriev 2020-02-02 20:57:33 UTC
SUMMARY

memcheck 3.13 (ubuntu) works fine, but memcheck 3.15 (fedora 31) don't handle suppression for pwritev syscall.

STEPS TO REPRODUCE
1. git clone https://github.com/leo-yuriev/libmdbx.git
2. make -C libmdbx memcheck
3. cat libmdbx/valgrind-*.log | less

OBSERVED RESULT
Error report:
==149931== Syscall param pwritev(vector[0]) points to uninitialised byte(s)
==149931==    at 0x4D2F5CD: pwritev (in /usr/lib64/libc-2.30.so)
==149931==    by 0x486B47D: mdbx_pwritev (osal.c:790)
==149931==    by 0x486910E: mdbx_flush_iov (core.c:7507)
==149931==    by 0x486910E: mdbx_page_flush (core.c:7566)


EXPECTED RESULT
Suppression works as expected.

SOFTWARE/OS VERSIONS
$ lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	Fedora
Description:	Fedora release 31 (Thirty One)
Release:	31
Codename:	ThirtyOne

ADDITIONAL INFORMATION
Suppression:
{
   pwrite-page-flush
   Memcheck:Param
   pwritev(vector[...])
   fun:pwritev
   ...
   fun:mdbx_page_flush
}

https://raw.githubusercontent.com/leo-yuriev/libmdbx/master/test/valgrind_suppress.txt
Comment 1 Leonid Yuriev 2020-02-02 21:03:17 UTC
Actual memcheck version information:

==151028== Memcheck, a memory error detector
==151028== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==151028== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info

$ dnf list valgrind
Last metadata expiration check: 0:01:18 ago on Mon Feb  3 00:01:22 2020.
Installed Packages
valgrind.x86_64                    1:3.15.0-11.fc31                     @fedora
Comment 2 Philippe Waroquiers 2020-02-04 21:11:34 UTC
What does
  valgrind --gen-suppressions=all ...
show ?

Does it suppress if you use the suppression exactly as generated by valgrind ?

Thanks
Comment 3 Leonid Yuriev 2020-02-05 01:08:52 UTC
> What does `valgrind --gen-suppressions=all ...` show ?
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   pwritev(vector[0])
   fun:pwritev
   fun:mdbx_pwritev
   fun:mdbx_flush_iov
   fun:mdbx_page_flush
   fun:mdbx_txn_commit
   fun:_ZN8testcase16breakable_commitEv
   fun:_ZN8testcase39db_open__begin__table_create_open_cleanERj
   fun:_ZN15testcase_nested5setupEv
   fun:_Z12test_executeRK12actor_config
   fun:_Z16osal_actor_startRK12actor_configRi
   fun:main
}

{
   <insert_a_suppression_name_here>
   Memcheck:Param
   pwritev(vector[1])

...
   pwritev(vector[2])
...
   pwritev(vector[3]) and so on.


> Does it suppress if you use the suppression exactly as generated by valgrind ?
Suppressions works for an every explicitly `vector[N]`, but not for the `vector[...]`.

Thanks
Comment 4 Philippe Waroquiers 2020-02-05 05:24:10 UTC
(In reply to Leonid Yuriev from comment #3)
> > What does `valgrind --gen-suppressions=all ...` show ?
> {
>    <insert_a_suppression_name_here>
>    Memcheck:Param
>    pwritev(vector[0])
>    fun:pwritev
>    fun:mdbx_pwritev
>    fun:mdbx_flush_iov
>    fun:mdbx_page_flush
>    fun:mdbx_txn_commit
>    fun:_ZN8testcase16breakable_commitEv
>    fun:_ZN8testcase39db_open__begin__table_create_open_cleanERj
>    fun:_ZN15testcase_nested5setupEv
>    fun:_Z12test_executeRK12actor_config
>    fun:_Z16osal_actor_startRK12actor_configRi
>    fun:main
> }
> 
> {
>    <insert_a_suppression_name_here>
>    Memcheck:Param
>    pwritev(vector[1])
> 
> ...
>    pwritev(vector[2])
> ...
>    pwritev(vector[3]) and so on.
> 
> 
> > Does it suppress if you use the suppression exactly as generated by valgrind ?
> Suppressions works for an every explicitly `vector[N]`, but not for the
> `vector[...]`.

As I understand, you are expecting vector[...] in the line following
Memcheck:Param to match vector[1] or vector[2] or ...
There is no such logic.  This part of the error must match exactly.

Your suppression entry was working with 3.13 because in 3.13, the error
was generated with vector[...]

Such errors for pwritev was changed from producing vector[...]
to vector[0], vector[1], etc as part of the commit:

commit b0861063a8d2a55bb7423e90d26806bab0f78a12
Author:     Alexandra Hájková <ahajkova@redhat.com>
AuthorDate: Tue Jun 4 13:47:14 2019 +0200
Commit:     Mark Wielaard <mark@klomp.org>
CommitDate: Wed Jul 3 00:19:16 2019 +0200

As far as I can see, the fact that the new error message after this commit
contains a varying offset between brackets is what causes the problem:
this looks to me to be a backward incompatible change (as shown by your
supp that stopped working between 3.13 and 3.15) and does not match
the 'idea' of error parameters. Here are some comments extracted from
the description of void VG_(maybe_record_error):
   Note that `ekind' and `s' are also used to generate a suppression.
   `s' should therefore not contain data depending on the specific
   execution (such as addresses, values) but should rather contain
   e.g. a system call parameter symbolic name.
(where 's' is this vector[1] etc string).

Wondering how to fix this ...
If we go back to the behaviour before 3.15, we break the suppression
entries working for 3.15, and if we do not go back, we break the suppression
entries working for 3.14 and before.
Comment 5 Mark Wielaard 2020-02-05 12:10:08 UTC
This is unfortunate and an unforeseen consequence of making the the error message more useful (it is useful to know which vector contained uninitialised bytes).

Sadly we have had releases with both the old and new variant of the error message. So we would indeed break some existing suppressions picking either the old or new variant.

I wonder if we can make [...] special so that it either matches the literal string [...] or [<number>].

Note: we do use the [...] variant also in [process_vm_](readv|writev) and vmsplice.
Comment 6 Philippe Waroquiers 2020-02-05 20:59:33 UTC
(In reply to Mark Wielaard from comment #5)
> This is unfortunate and an unforeseen consequence of making the the error
> message more useful (it is useful to know which vector contained
> uninitialised bytes).
> 
> Sadly we have had releases with both the old and new variant of the error
> message. So we would indeed break some existing suppressions picking either
> the old or new variant.
Yes, unfortunate.  I will add Julian in the cc list to have some more
opinions ...

> 
> I wonder if we can make [...] special so that it either matches the literal
> string [...] or [<number>].
This can for sure be coded, but the error matching code/logic is IMO
quite complex already, and moreover, this logic will be tool and error
dependent, as the extra message matching is done by the tool, not by
the m_errormgr.c common core module.

> 
> Note: we do use the [...] variant also in [process_vm_](readv|writev) and
> vmsplice.
Comment 7 Philippe Waroquiers 2020-02-06 07:04:23 UTC
(In reply to Mark Wielaard from comment #5)
> This is unfortunate and an unforeseen consequence of making the the error
> message more useful (it is useful to know which vector contained
> uninitialised bytes).
Yes, having more precise info in this area can be very useful.

To give more information about an error without making them 'different',
it might be more appropriate to put more information in the 'extra' part
of the error.
The logic to compare this "extra" part is in the tool, and so the tool
can consider that this additional info is not to be compared in the error
equality logic.

And of course, as usual, if you have an error, you can use gdb+vgdb
to get as much details as possible about the error being reported.
Comment 8 Leonid Yuriev 2020-02-17 17:56:11 UTC
Related to https://github.com/erthink/libmdbx/issues/82
Comment 9 Philippe Waroquiers 2020-04-23 20:12:36 UTC
Created attachment 127809 [details]
not a fix, but detects the incompatible supp entry and produce a warning

The commit log explains in details what we envisaged,
and why the decision is to keep what 3.15 accepts instead
of rolling back the change in 3.15 or make the error matching logic
even more complex
Comment 10 Philippe Waroquiers 2020-04-23 20:18:45 UTC
Some further notes:
I should re-update the warning to replace the final 'or ...'
by  'or [2].

And I sincerely hope that nobody is using preadv and pwritev wrongly
with huge vectors, as otherwise they might need to type a lot of supp
entries.
(that is in fact a main reason to avoid variable extra error lines ...)
Comment 11 Philippe Waroquiers 2020-04-23 20:24:18 UTC
Updated the warning message to be:
==3170== WARNING: preadv(vector[...]) is an obsolete suppression line not supported anymore since valgrind 3.15.
==3170== You should replace [...] by a specific index such as [0] or [1] or [2] or similar
Comment 12 Philippe Waroquiers 2020-04-24 15:12:50 UTC
After much discussion, we decided to keep the 3.15 behaviour
rather than rollback to 3.14 behaviour.
But in valgrind 3.16, the incompatible supp entry will be detected,
and a warning message will be given.
That has been pushed today as d9e714812
This is not really fixing the bug, so closing it as RESOLVED INTENTIONAL.

Sorry for the backward incompatible change pushed in 3.15,
we will try harder in the future to avoid such incompatible changes.
Comment 13 Leonid Yuriev 2020-05-03 17:38:03 UTC
The recommended solution was applied.
https://github.com/erthink/libmdbx/commit/5819f7a468d8f23ebc858fe6edabeab95c2bb6d8