Bug 502679 - Use LTP for testing valgrind
Summary: Use LTP for testing valgrind
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (other bugs)
Version First Reported In: 3.25 GIT
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-04-11 14:12 UTC by mcermak
Modified: 2025-05-06 10:36 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
proposed patch (6.93 KB, patch)
2025-04-11 14:12 UTC, mcermak
Details
proposed patch (6.14 KB, patch)
2025-04-14 09:10 UTC, mcermak
Details
summary.log after make ltpchecks on fedora 41 x86_64 (92.64 KB, text/x-log)
2025-04-17 20:39 UTC, Mark Wielaard
Details
parallelize testing (2.87 KB, patch)
2025-04-25 10:58 UTC, mcermak
Details
parallelize testing (3.32 KB, patch)
2025-04-25 12:33 UTC, mcermak
Details
possible patch (4.10 KB, patch)
2025-05-05 12:27 UTC, mcermak
Details

Note You need to log in before you can comment on or make changes to this bug.
Description mcermak 2025-04-11 14:12:31 UTC
Created attachment 180173 [details]
proposed patch

I propose to use the Linux Test Project (LTP) for testing valgrind.   Attached is a patch that adds a new Makefile target make ltpchecks (modelled after make auxchecks).  It compiles the LTP testsuite sources, then runs the LTP syscall tests with valgrind and without it, checks for errors, and reports them.

For context, I've recently used a test script *similar* to this proposed patch, and uploaded resulting test logs to the Bunsen tool. This hopefully gives some idea how the resulting logs looks like, and what kind of issues can the test catch:

https://builder.sourceware.org/testruns/?op_mode=search_exact&has_keyvalue_k=testrun.gitdescribe&has_keyvalue_op=glob&has_keyvalue_v=*valgrind-ltp*
Comment 1 mcermak 2025-04-11 14:15:39 UTC
Note that the trailing asterisk needs to be part of the above URL, otherwise it won't work.
Comment 2 Mark Wielaard 2025-04-13 11:49:04 UTC
I think ltp-error-patterns.txt is missing from the patch.
Comment 3 Mark Wielaard 2025-04-13 12:00:48 UTC
+# GNU Scientific Library 1.6

copy/paste issue :)

+LTP_DIR_NAME=20250130

I would like this to be LTP_VERSION and group it together with LTP_SHA256_SUM.
In theory these two are the only things to update to get a newer ltp version integrated.

+LTP_TAR_NAME=ltp-full-$(LTP_DIR_NAME).tar.xz
+LTP_URL=https://github.com/linux-test-project/ltp/releases/download/$(LTP_DIR_NAME)/$(LTP_TAR_NAME)
+LTP_SHA256_SUM=02e4ec326be54c3fd92968229a468c02c665d168a8a673edc38a891f7395ae10
+LTP_TAR=$(AUX_CHECK_DIR)/$(LTP_TAR_NAME)
+LTP_SRC_DIR=$(AUX_CHECK_DIR)/ltp-full-$(LTP_DIR_NAME)
+# By default we like -O3 to hopefully get some loop vectorization
+# You can also override LTP_CFLAGS if you want e.g. -march=core-avx2
+# Different LTP_CFLAGS will result in different build dirs (under AUX_CHECK_DIR)
+LTP_CFLAGS=-g -O3
+# i386 needs sse to get rounding for floating point correct.
+# But we only want this if the primary isn't AMD64
+if VGCONF_ARCHS_INCLUDE_X86
+if !VGCONF_ARCHS_INCLUDE_AMD64
+# This was just blindly copied from GSL, might need tweaks...
+LTP_CFLAGS+=-mfpmath=sse -msse2
+endif
+endif

I think LTP_CFLAGS is not necessary. If set at all it should just be -g -O2 as default.
-O3 might make sense for GSL, but not for what we are testing here.
And hopefully there are no special flags needed per architecture.

 # Filter out spaces from GSL_CFLAGS to get unique build dir
 GSL_BUILD_DIR=$(AUX_CHECK_DIR)/gsl-build$(subst $(sp),,$(GSL_CFLAGS))
+# LTP_BUILD_DIR=$(AUX_CHECK_DIR)/ltp-build$(subst $(sp),,$(LTP_CFLAGS))
+# LTP_BUILD_DIR=$(AUX_CHECK_DIR)/ltp-full-$(LTP_DIR_NAME)
+# LTP_BUILD_DIR=$(AUX_CHECK_DIR)/ltp-full-$(LTP_DIR_NAME)
+LTP_BUILD_DIR=$(AUX_CHECK_DIR)/ltp-build$(subst $(sp),,$(LTP_CFLAGS))

Which would make the subst here unnecessary.
Comment 4 Mark Wielaard 2025-04-13 12:19:34 UTC
(In reply to Mark Wielaard from comment #3)

>  # Filter out spaces from GSL_CFLAGS to get unique build dir
>  GSL_BUILD_DIR=$(AUX_CHECK_DIR)/gsl-build$(subst $(sp),,$(GSL_CFLAGS))
> +# LTP_BUILD_DIR=$(AUX_CHECK_DIR)/ltp-build$(subst $(sp),,$(LTP_CFLAGS))
> +# LTP_BUILD_DIR=$(AUX_CHECK_DIR)/ltp-full-$(LTP_DIR_NAME)
> +# LTP_BUILD_DIR=$(AUX_CHECK_DIR)/ltp-full-$(LTP_DIR_NAME)
> +LTP_BUILD_DIR=$(AUX_CHECK_DIR)/ltp-build$(subst $(sp),,$(LTP_CFLAGS))
> 
> Which would make the subst here unnecessary.

Actually LTP_BUILD_DIR isn't used at all, so lets just remove it.
Comment 5 Mark Wielaard 2025-04-13 12:27:47 UTC
+# Currently 3 LTP tests are run as a demo, not under vg-in-place.
+ltp-check: $(LTP_SRC_DIR)
+       $(abs_top_srcdir)/auxprogs/ltp-tester.sh $(LTP_SRC_DIR)

This comment seems outdated.

+ltp-clean:
+       rm -rf $(LTP_SRC_DIR)

I think this target should be added to either clean-local or auxclean.
Comment 6 Mark Wielaard 2025-04-13 20:26:32 UTC
diff --git a/auxprogs/ltp-tester.sh b/auxprogs/ltp-tester.sh
new file mode 100755
index 000000000..8ee1a8778
--- /dev/null
+++ b/auxprogs/ltp-tester.sh
@@ -0,0 +1,72 @@
+#!/bin/bash

OK, /bin/bash instead of /bin/sh because of bash mapfiles usage below.

+set -e
+
+SELF_DIR=$(dirname $(readlink -f $0))
+LTP_DIR=${1:-$SELF_DIR/auxchecks/ltp-full-20250130}

Can we derive this for the setting of AUX_CHECK_DIR and LTP_VERSION (currently LTP_DIR_NAME) in the Makefile (export those variable or pass then as arguments to the script/

+ORIG_PATH=${PATH}
+OUTPUT_LOG="${OUTPUT_LOG:-$LTP_DIR/valgrind-ltp.log}"

OK, so it can be set or derived from LTP_DIR.
Pondering whether it should be set relative to SELF_DIR to make sure bunsen picks it up.
For the builders we would set the AUX_CHECK_DIR to somewhere outside the builddir, so we keep a cache of the download tar.gz and builddir.

+DIFFCMD="${DIFFCMD:-diff -u}"

Isn't this a little overkil? What would the alternative to diff -u be?
Asking because I think this makes the script less readable than it could be (IMHO).

+VALGRIND="${VALGRIND:-$SELF_DIR/../vg-in-place}"

I do like this.

+>$OUTPUT_LOG

So this means all stdout output goes to OUTPUT_LOG from this point on?
but then ...

+myLog ()
+{
+    msg="$1"
+    echo "$msg"
+    echo -e "FAIL: $msg" >> $OUTPUT_LOG
+}

Isn't this doing the output twice?

+cd $LTP_DIR

Ack.
Comment 7 Mark Wielaard 2025-04-13 21:19:06 UTC
+mapfile -t files < <(find testcases/kernel/syscalls -executable -and -type f | sort)
+c=${#files[@]}; i=0

Interesting, slightly above my bash knowledge, but I think I understand what it does.

+for test in "${files[@]}"; do
+    dir=$(dirname $test)
+    exe=$(basename $test)
+    tmp=$(mktemp -d)

Why do we need these tmp dirs? They just hold the log files? Shouldn't we just keep those?
That seems useful for local debugging what went wrong instead of having everything in one big output log file.

+    i=$((++i))
+    pushd $dir >/dev/null
+        echo "[$i/$c] Testing $exe ..." | tee -a $OUTPUT_LOG
+        PATH="$ORIG_PATH:$PWD"
+        ./$exe >$tmp/log1std 2>$tmp/log1err ||:
+        $VALGRIND -q --tool=none --log-file=$tmp/log2 ./$exe >$tmp/log2std 2>$tmp/log2err ||:
+        $VALGRIND -q --tool=memcheck --log-file=$tmp/log3 ./$exe >$tmp/log3std 2>$tmp/log3err ||:

OK, running once normally, once with the none tool and once with memcheck.
Keeping stdout and stderr, plus the valgrind none and memcheck logs.
Nice.

+        # We want to make sure that LTP syscall tests give identical
+        # results with and without valgrind.  The test logs go to the
+        # stderr.  They aren't identical across individual runs. The
+        # differences include port numbers, temporary files, test
+        # output ordering changes and more, so that they aren't trivially
+        # comparable.  So we resort to comparing at least the final
+        # summaty of individual test results
+        tail -10 $tmp/log1err | grep -E "^(passed|failed|broken|skipped|warnings):" > $tmp/log1summary ||:
+        tail -10 $tmp/log2err | grep -E "^(passed|failed|broken|skipped|warnings):" > $tmp/log2summary ||:
+        tail -10 $tmp/log3err | grep -E "^(passed|failed|broken|skipped|warnings):" > $tmp/log3summary ||:

Nice.

+        # Check logs, report errors
+        pushd $tmp >/dev/null

hmmm.

+            if test $(wc -l log2 | cut -d' ' -f1) -gt 0; then
+                myLog "${exe}: unempty log2:\n$(cat log2)"
+            fi

I think you can use test -s log2 here

       -s FILE
              FILE exists and has a size greater than zero

+            if grep -f $SELF_DIR/ltp-error-patterns.txt * > error-patterns-found.txt; then
+                myLog "${exe}: error string found:\n$(cat error-patterns-found.txt)"
+            fi

Note that ltp-error-patterns.txt seems to be missing from the patch.
So * is why you wat to do it in a separate tmp dir?
Couldn't you explicitly list the log files?
Do you need to check anything except the valgrind error logs? log2 and log3?

+            if ! ${DIFFCMD} log1summary log2summary >/dev/null; then
+                myLog "${exe}: ${DIFFCMD} log1summary log2summary:\n$(${DIFFCMD} log1summary log2summary)"
+            fi
+
+            if ! ${DIFFCMD} log2summary log3summary >/dev/null; then
+                myLog "${exe}: ${DIFFCMD} log2summary log3summary:\n$(${DIFFCMD} log2summary log3summary)"
+            fi

Yeah, makes sense.

+        popd >/dev/null
+    popd >/dev/null
+    rm -rf $tmp
+done

Still somewhat surprised the tmp dir with the logs is removed.

Sorry for the many nitpicks and questions. I do really like this. Would like to hook it up to the buildbots.
Comment 8 mcermak 2025-04-14 09:10:17 UTC
Created attachment 180249 [details]
proposed patch

Hi Mark.  Thank you for your thorough review!  I've attempted to address your concerns one by one.   Please check the attached patch.  Does it look good to you now?
Comment 9 Mark Wielaard 2025-04-17 14:50:02 UTC
Hi Martin,

(In reply to mcermak from comment #8)
> Hi Mark.  Thank you for your thorough review!  I've attempted to address
> your concerns one by one.   Please check the attached patch.  Does it look
> good to you now?

Currently testing with these changes:

diff --git a/auxprogs/Makefile.am b/auxprogs/Makefile.am
index 27c22857ede6..00c743bc11f8 100644
--- a/auxprogs/Makefile.am
+++ b/auxprogs/Makefile.am
@@ -229,7 +229,9 @@ gsl-check: $(GSL_BUILD_DIR)/gsl-build
                $(GSL_BUILD_DIR)/valgrind-gsl.out
 
 ltp-check: $(LTP_SRC_DIR)
-       LTP_SRC_DIR=$(LTP_SRC_DIR) $(abs_top_srcdir)/auxprogs/ltp-tester.sh
+       LTP_SRC_DIR=$(LTP_SRC_DIR) \
+       VALGRIND=$(abs_top_builddir)/vg-in-place \
+         $(abs_top_srcdir)/auxprogs/ltp-tester.sh
 
 
 # We keep the tarball but remove the unpacked sources and build
diff --git a/auxprogs/ltp-tester.sh b/auxprogs/ltp-tester.sh
index 45a968801757..a3c2157c56d7 100755
--- a/auxprogs/ltp-tester.sh
+++ b/auxprogs/ltp-tester.sh
@@ -26,7 +26,7 @@ myLog ()
 
 cd $LTP_SRC_DIR
 
-mapfile -t files < <(find testcases/kernel/syscalls -executable -and -type f | sort | head -10)
+mapfile -t files < <(find testcases/kernel/syscalls -executable -and -type f | sort)
 c=${#files[@]}; i=0
 
 for test in "${files[@]}"; do
@@ -55,7 +55,7 @@ for test in "${files[@]}"; do
 
         # Check logs, report errors
         pushd $l >/dev/null
-            if test $(wc -l log2 | cut -d' ' -f1) -gt 0; then
+            if test -s log2; then
                 myLog "${exe}: unempty log2:\n$(cat log2)"
             fi
Comment 10 Mark Wielaard 2025-04-17 20:39:41 UTC
Created attachment 180366 [details]
summary.log after make ltpchecks on fedora 41 x86_64

It takes a long time 3+ hours, but it does seem to work as expected.
The summary.log is really useful.

I like to push this now so people can try it out with the 3.25.0 RC1.

But we should look into an exclude list, there are a handful of tests that take most of the time (1 takes more than an hour).
And maybe we should try to run the tests in parallel.
Comment 11 Mark Wielaard 2025-04-17 20:42:48 UTC
commit 944a8c864c5e1df6388bf4b47900eb9c001ceb73
Author: Martin Cermak <mcermak@redhat.com>
Date:   Thu Apr 17 16:14:19 2025 +0200

    Use LTP for testing valgrind
    
    Add a new top level make target ltpchecks which will fetch the latest
    linux test project (ltp) release as defined by the LTP_VERSION and
    LTP_SHA256 variables in auxprogs/Makefile.am (update those when a new
    version of ltp is released). If the ltp tar.xz has already been
    downloaded, or it has already been unpacked and build, the (cached)
    file and build will be reused.
    
    The actual testing is done through the auxprogs/ltp-tester.sh script.
    It takes all executable tests from the ltp testcases under
    kernel/syscalls and runs them 3 times. Once directly, not under
    valgrind, once with -q --tool=none and once with -q
    --tool=memcheck. It then checks that valgrind didn't produce any
    messages with the none tool, that there were no fatal errors produced
    (as defined in auxprogs/ltp-error-patterns.txt) and that the ltp
    results are the same with and without valgrind.
    
    Currently there are 1472 test binaries and running them all (serially)
    takes more than three hours and detects various missing or incomplete
    syscall handlers in valgrind, plus various crashers.
    
    https://bugs.kde.org/show_bug.cgi?id=502679
Comment 12 Mark Wielaard 2025-04-18 10:43:14 UTC
(In reply to Mark Wielaard from comment #11)
>     Currently there are 1472 test binaries and running them all (serially)
>     takes more than three hours and detects various missing or incomplete
>     syscall handlers in valgrind, plus various crashers.

commit ca5a8e303b8bef7baccfc28ff8b46b65ceb50708
Author: Mark Wielaard <mark@klomp.org>
Date:   Fri Apr 18 12:22:29 2025 +0200

    Add auxprogs/ltp-excludes.txt
    
    There are a couple of ltp testcases that take a very long time to run
    (under valgrind). Add a file auxprogs/ltp-excludes.txt that is used to
    exclude them from a make ltpchecks run, containing 10 tests:
    
    bind06
    epoll-ltp
    inotify09
    msgstress01
    sendmsg03
    setsockopt06
    setsockopt07
    signal05
    signal06
    timerfd_settime02
    
    Excluding these 10 tests brings the execution time of make ltpchecks
    down to ~45 minutes.
Comment 13 mcermak 2025-04-25 10:58:49 UTC
Created attachment 180636 [details]
parallelize testing

Hi Mark, thank you for merging the patch!

Please consider the attached addendum:  It allows for running the syscall tests in parallel.  With this update, the LTP testsuite finished in less than 20 minutes for me (on a small VM with 1 CPU core).  Besides that, this patch allows for optionally running only selected testcases specified via the TESTS env var.
Comment 14 mcermak 2025-04-25 12:33:53 UTC
Created attachment 180638 [details]
parallelize testing

Re-format the previous patch using git format-patch to standardize it.  Please, review.
Comment 15 Mark Wielaard 2025-05-04 03:06:26 UTC
Note again that this is in an already closed bug, it would be better to just open a new bug so we don't loose this.
But we are close. Just some small comments.

(In reply to mcermak from comment #14)
> Created attachment 180638 [details]
> parallelize testing
> 
> Re-format the previous patch using git format-patch to standardize it. 
> Please, review.

+PARALLEL_JOBS=${PARALLEL_JOBS:-$(($(nproc --all) + 10))}

Could we pass this from the Makefile instead? So that it follows the make -jx ltpchecks from top-level?
I am thinking of something like:

diff --git a/auxprogs/Makefile.am b/auxprogs/Makefile.am
index 9cec4f222b36..3a2d0d17629e 100644
--- a/auxprogs/Makefile.am
+++ b/auxprogs/Makefile.am
@@ -231,7 +231,13 @@ gsl-check: $(GSL_BUILD_DIR)/gsl-build
        diff -u $(abs_top_srcdir)/auxprogs/gsl-1.6.out.x86.exp \
                $(GSL_BUILD_DIR)/valgrind-gsl.out
 
+# Extract -jNUM from MAKEFLAGS.
+# Note that any spaces between -j and NUM have already been removed.
+# Also there will be only one (the last) -jNUM left in MAKEFLAGS.
+PARALLEL_JOBS=$(subst -j,,$(filter -j%,$(MAKEFLAGS)))
+
 ltp-check: $(LTP_SRC_DIR)
+       PARALLEL_JOBS=$(PARALLEL_JOBS) \
        LTP_SRC_DIR=$(LTP_SRC_DIR) \
        VALGRIND=$(abs_top_builddir)/vg-in-place \
          $(abs_top_srcdir)/auxprogs/ltp-tester.sh

Also could we use just $(nproc) instead of --all and +10. I am afraid to overload a machine otherwise.
With the above, if people really want to use that then they can use make -j $[$(nproc --all) + 10]

+    while test "$(pgrep -P $$ | wc -l)" -gt $PARALLEL_JOBS; do sleep 0.1; done

I think jobs -l is slightly more efficient than pgrep -P $$
The sleep 0.1 while loop isn't great, but I don't know a good alternative.

Don't you have to pass $test and $i to doTest?
It seems they could change before they are used in a particular doTest run.
Maybe something like:

diff --git a/auxprogs/ltp-tester.sh b/auxprogs/ltp-tester.sh
index 412a1ac55089..e31f8a6b12e4 100755
--- a/auxprogs/ltp-tester.sh
+++ b/auxprogs/ltp-tester.sh
@@ -14,7 +14,7 @@ SUMMARY_LOG="$LOGDIR/summary.log"
 DIFFCMD="diff -u"
 VALGRIND="${VALGRIND:-$LTP_SRC_DIR/../../../vg-in-place}"
 # For parallel testing, consider IO intensive jobs, take nproc into account
-PARALLEL_JOBS=${PARALLEL_JOBS:-$(($(nproc --all) + 10))}
+PARALLEL_JOBS=${PARALLEL_JOBS:-$(nproc)}
 # TESTS env var may be specified to restrict testing to selected test cases
 
 # Initialize LOGDIR
@@ -29,11 +29,13 @@ myLog ()
 
 doTest ()
 {
-    dir=$(dirname $test)
-    exe=$(basename $test)
+    t=$1
+    nr=$2
+    dir=$(dirname $t)
+    exe=$(basename $t)
     l="$LOGDIR/$exe"
     mkdir -p $l
-    echo "[$i/$c] Testing $exe ..." | tee -a $l/summary
+    echo "[$nr/$c] Testing $exe ..." | tee -a $l/summary
     pushd $dir >/dev/null
         PATH="$ORIG_PATH:$PWD"
         ./$exe >$l/log1std 2>$l/log1err ||:
@@ -88,9 +90,9 @@ c=${#files[@]}; i=0
 
 # Run tests in parallel
 for test in "${files[@]}"; do
-    while test "$(pgrep -P $$ | wc -l)" -gt $PARALLEL_JOBS; do sleep 0.1; done
+    while test "$(jobs -l | wc -l)" -gt $PARALLEL_JOBS; do sleep 0.1; done
     i=$((++i))
-    doTest &
+    doTest $test $i &
 done
 
 wait
Comment 16 mcermak 2025-05-05 12:27:35 UTC
Created attachment 180947 [details]
possible patch

Hi Mark, thank you for the update, it works for me!  For completeness, I'm attaching the modified patch here.
Comment 17 Mark Wielaard 2025-05-06 10:36:14 UTC
Pushed as:

commit 0ce068434ec359324ff1e27f0c7bd5df8ae4e813
Author: Martin Cermak <mcermak@redhat.com>
Date:   Mon May 5 14:23:16 2025 +0200

    Run the LTP syscall tests in parallel
    
    - Run the LTP syscall tests in parallel to save time.
    - Allow for running only selected tests using TESTS env var.

Note that this bug is already resolved. Lets use a new bug for any followups like the automake-like bunsen output discussed on irc.