Bug 468575

Summary:	Add support for RISC-V
Product:	[Developer tools] valgrind	Reporter:	Petr Pavlu <petr.pavlu>
Component:	general	Assignee:	Julian Seward <jseward>
Status:	CONFIRMED ---
Severity:	normal	CC:	laokz, mark, pjfloyd, rjiejie, rjones, sam
Priority:	NOR
Version:	unspecified
Target Milestone:	---
Platform:	Other
OS:	Linux
Latest Commit:		Version Fixed In:
Sentry Crash Report:
Attachments:	v1-0001-riscv64-valgrind-new.patch v1-0002-riscv64-vex-new.patch v1-0003-riscv64-tests-new.patch v1-0004-riscv64-valgrind-mod.patch v1-0005-riscv64-vex-mod.patch v1-0006-riscv64-tests-mod.patch

Description Petr Pavlu 2023-04-16 14:25:50 UTC

RISC-V [1, 2] is a modern open-standard architecture which has seen increasing adoption lately.

I and a few other contributors have been working on adding support for this architecture in Valgrind, using a downstream repository [3]. The port has reached a state where I think it has a good quality to be added to the official Valgrind project and continued to be developed there.

The port adds supports for the 64-bit RISC-V architecture, specifically for the RV64GC instruction set which is what most Linux distributions currently target. Support for further extensions is expected to be incrementally implemented in the future, as more extensions get defined by architecture stewards and make it into hardware.

The work is cut into 6 patches. The first three add new files specific only to the port and they are split for Valgrind (Coregrind+tools), VEX and tests. The last three modify already present files and are split in the same way.

Implementation mostly tries to stay consistent with arm64 and mips64. No new Iops are currently added by the port but some new ones might be needed in the future. For instance, support for the Zfh extension (half-precision floating-point) will likely require a few additional Iops, although these should be needed for the complete Armv8.2-A FP16 support too.

The port passes almost completely the Valgrind test suite. A few tests still currently fail and need to be investigated.

The port gained some reasonable traction and a few other people contributed to it in the last year by implementing missing bits and fixing various bugs.

[1] https://riscv.org/
[2] https://en.wikipedia.org/wiki/RISC-V
[3] https://github.com/petrpavlu/valgrind-riscv64

Comment 1 Petr Pavlu 2023-04-16 14:26:44 UTC

Created attachment 158147 [details]
v1-0001-riscv64-valgrind-new.patch

Comment 2 Petr Pavlu 2023-04-16 14:27:03 UTC

Created attachment 158148 [details]
v1-0002-riscv64-vex-new.patch

Comment 3 Petr Pavlu 2023-04-16 14:27:26 UTC

Created attachment 158149 [details]
v1-0003-riscv64-tests-new.patch

Comment 4 Petr Pavlu 2023-04-16 14:27:39 UTC

Created attachment 158150 [details]
v1-0004-riscv64-valgrind-mod.patch

Comment 5 Petr Pavlu 2023-04-16 14:27:53 UTC

Created attachment 158151 [details]
v1-0005-riscv64-vex-mod.patch

Comment 6 Petr Pavlu 2023-04-16 14:28:05 UTC

Created attachment 158152 [details]
v1-0006-riscv64-tests-mod.patch

Comment 7 Petr Pavlu 2023-04-16 14:30:47 UTC

v1:
* attachment 158147 [details] v1-0001-riscv64-valgrind-new.patch,
* attachment 158148 [details] v1-0002-riscv64-vex-new.patch,
* attachment 158149 [details] v1-0003-riscv64-tests-new.patch,
* attachment 158150 [details] v1-0004-riscv64-valgrind-mod.patch,
* attachment 158151 [details] v1-0005-riscv64-vex-mod.patch,
* attachment 158152 [details] v1-0006-riscv64-tests-mod.patch.

Based on ab6d3928a ("regtest: warning cleanup").

Test suite results:
> == 675 tests, 4 stderr failures, 2 stdout failures, 1 stderrB failure, 1 stdoutB failure, 0 post failures ==
> gdbserver_tests/hginfo                   (stderrB)
> gdbserver_tests/hgtls                    (stdoutB)
> memcheck/tests/cdebug_zlib_gnu           (stderr)
> memcheck/tests/linux/stack_changes       (stdout)
> memcheck/tests/linux/stack_changes       (stderr)
> memcheck/tests/pointer-trace             (stderr)
> memcheck/tests/sh-mem-random             (stdout)
> memcheck/tests/sh-mem-random             (stderr)

Comment 8 Paul Floyd 2023-04-17 08:27:15 UTC

hginfo fails on several systems
TLS is, well, tricky

Are the other failures "reaonable"?

Comment 9 Petr Pavlu 2023-04-17 20:03:19 UTC

(In reply to Paul Floyd from comment #8)
> hginfo fails on several systems
> TLS is, well, tricky
> 
> Are the other failures "reaonable"?

The following is a short summary why the remaining tests fail. However, I'm not sure yet for any of them what is their underlying problem and still need to do proper investigation.

> gdbserver_tests/hginfo                   (stderrB)

The output has information about an extra lock inside _rtld_local. Likely a duplicate of bug 444487.

> gdbserver_tests/hgtls                    (stdoutB)

Fails because GDB is not able to determine an address of the variable 'local'. The test used to pass on my system, possibly something compiler-related.

> memcheck/tests/cdebug_zlib_gnu           (stderr)

Produced backtrace contains rubbish 'at 0x........: main (ng long int:3)' instead of 'at 0x........: main (cdebug.c:3)'.

> memcheck/tests/linux/stack_changes       (stdout)
> memcheck/tests/linux/stack_changes       (stderr)

Lots of unexpected invalid writes reported from hello() -> printf().

> memcheck/tests/pointer-trace             (stderr)

The second mmap() call in the test unexpectedly fails with -EINVAL.

> memcheck/tests/sh-mem-random             (stdout)
> memcheck/tests/sh-mem-random             (stderr)

Another unexpected mmap() failure.

Comment 10 JojoR 2023-04-20 01:42:13 UTC

(In reply to Petr Pavlu from comment #9)
> (In reply to Paul Floyd from comment #8)
> > hginfo fails on several systems
> > TLS is, well, tricky
> > 
> > Are the other failures "reaonable"?
> 
> > memcheck/tests/sh-mem-random             (stdout)
> > memcheck/tests/sh-mem-random             (stderr)
> 
> Another unexpected mmap() failure.

From our investigation, high address mmap is not supported by some riscv board/os if it support feature "Sv39" only,
so it is reasonable also :)

Comment 11 JojoR 2023-06-25 08:51:13 UTC

@Paul Floyd

Has anyone reviewed these patches ?

It has taken two months and there is not any feedback :)

Comment 12 laokz 2023-07-30 14:59:07 UTC

> > memcheck/tests/pointer-trace             (stderr)
> The second mmap() call in the test unexpectedly fails with -EINVAL.
The second mmap() is called with PROT_WRITE only. I'm pretty sure the result is 'resonable' because it is KERNEL relevant.

From https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8aeb7b17f04ef40f620c763502e2b644c5c73efd , it shows that kernel would return EINVAL for mmap(PROT_WRITE only) if without this commit, or the call would be treated as with an implied PROT_READ like other architectures.

I tested on openEuler 22.03 with/without this commit and got expected success/failure result.

>> with the commit
> # perl tests/vg_regtest memcheck/tests/pointer-trace.vgtest
> pointer-trace:   valgrind   -q --leak-check=yes ./pointer-trace
> 
> == 1 test, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==

>> without the commit
> # perl tests/vg_regtest memcheck/tests/pointer-trace.vgtest
> pointer-trace:   valgrind   -q --leak-check=yes ./pointer-trace
> *** pointer-trace failed (stderr) ***
> 
> == 1 test, 1 stderr failure, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
> memcheck/tests/pointer-trace             (stderr)
> 
> # cat memcheck/tests/pointer-trace.stderr.out
> trap 2 failed: Invalid argument
> 1,000 bytes in 1 blocks are definitely lost in loss record ... of ...
>    at 0x........: malloc (vg_replace_malloc.c:...)
>    by 0x........: main (pointer-trace.c:86)

Comment 13 JojoR 2024-06-17 02:23:11 UTC

@Paul Floyd

Has anyone reviewed these patches ?

It has taken long time and there is not any feedback :)

Could we review & push these basic ISAs (rv64gc) implementation first ?
other of extensions like Vector of RISCV depend on this, that will be the next steps :)

Comment 14 Paul Floyd 2024-06-17 08:24:11 UTC

I built and ran regtest on my github clone of https://github.com/petrpavlu/valgrind-riscv64 on FreeBSD amd64.

There were a couple of unexpected failures in memcheck/tests/freebsd (one of the scalars and eventfd2). I'll look into them and also try some more platforms.

Comment 15 JojoR 2024-06-18 01:35:30 UTC

(In reply to Paul Floyd from comment #14)
> I built and ran regtest on my github clone of
> https://github.com/petrpavlu/valgrind-riscv64 on FreeBSD amd64.
> 
> There were a couple of unexpected failures in memcheck/tests/freebsd (one of
> the scalars and eventfd2). I'll look into them and also try some more
> platforms.

Good news, thanks for your efforts, hope that will been fixed soon :)

Comment 16 Paul Floyd 2024-06-18 12:21:02 UTC

I think it's a regression that I added. Should be fixed. I'll check again when Petr rebases next, and I'll also give Linux amd64 a go and see if I can build and run regtest on riscv.

Comment 17 Paul Floyd 2024-06-20 11:45:48 UTC

On amd64 Linux, one warning that I've opened an issue for on Petr's github.

A few other warnings about pointer-to-int-casts.

regtest clean.

Comment 18 Paul Floyd 2024-06-21 06:42:59 UTC

On a machine that is StarFive JH7100 SoC (2x SiFive U74 at 1.20 GHz)

I have a problem with make check.

gcc -Winline -Wall -Wshadow -Wno-long-long -g -fno-stack-protector       -o compressed compressed.o  
compressed.o: in function `test_compressed_10':
/home/paulf/valgrind-riscv64/none/tests/riscv64/compressed.c:347:(.text+0x56d16): dangerous relocation: The addend isn't allowed for R_RISCV_GOT_HI20
collect2: error: ld returned 1 exit status
make[5]: *** [Makefile:741: compressed] Error 1
make[5]: Leaving directory '/home/paulf/valgrind-riscv64/none/tests/riscv64'
make[4]: *** [Makefile:899: check-am] Error 2
make[4]: Leaving directory '/home/paulf/valgrind-riscv64/none/tests/riscv64'
make[3]: *** [Makefile:2270: check-recursive] Error 1
make[3]: Leaving directory '/home/paulf/valgrind-riscv64/none/tests'
make[2]: *** [Makefile:1101: check-recursive] Error 1
make[2]: Leaving directory '/home/paulf/valgrind-riscv64/none'
make[1]: *** [Makefile:926: check-recursive] Error 1
make[1]: Leaving directory '/home/paulf/valgrind-riscv64'
make: *** [Makefile:1225: check] Error 2

Maybe a GCC problem?

gcc --version
gcc (Debian 13.2.0-13) 13.2.0

Comment 19 JojoR 2024-06-21 06:47:34 UTC

(In reply to Paul Floyd from comment #18)
> On a machine that is StarFive JH7100 SoC (2x SiFive U74 at 1.20 GHz)
> 
> I have a problem with make check.
> 
> gcc -Winline -Wall -Wshadow -Wno-long-long -g -fno-stack-protector       -o
> compressed compressed.o  
> compressed.o: in function `test_compressed_10':
> /home/paulf/valgrind-riscv64/none/tests/riscv64/compressed.c:347:(.
> text+0x56d16): dangerous relocation: The addend isn't allowed for
> R_RISCV_GOT_HI20
> collect2: error: ld returned 1 exit status
> make[5]: *** [Makefile:741: compressed] Error 1
> make[5]: Leaving directory '/home/paulf/valgrind-riscv64/none/tests/riscv64'
> make[4]: *** [Makefile:899: check-am] Error 2
> make[4]: Leaving directory '/home/paulf/valgrind-riscv64/none/tests/riscv64'
> make[3]: *** [Makefile:2270: check-recursive] Error 1
> make[3]: Leaving directory '/home/paulf/valgrind-riscv64/none/tests'
> make[2]: *** [Makefile:1101: check-recursive] Error 1
> make[2]: Leaving directory '/home/paulf/valgrind-riscv64/none'
> make[1]: *** [Makefile:926: check-recursive] Error 1
> make[1]: Leaving directory '/home/paulf/valgrind-riscv64'
> make: *** [Makefile:1225: check] Error 2
> 
> Maybe a GCC problem?
> 
> gcc --version
> gcc (Debian 13.2.0-13) 13.2.0

Yes, looks ld met some bugs :(
and I suggest you can get the latest toolchain from here:
https://github.com/riscv-collab/riscv-gnu-toolchain/releases/

Comment 20 Paul Floyd 2024-07-02 10:47:54 UTC

And regtest results

== 737 tests, 6 stderr failures, 2 stdout failures, 0 stderrB failures, 1 stdoutB failure, 0 post failures ==
gdbserver_tests/hgtls                    (stdoutB)
memcheck/tests/pointer-trace             (stderr)
none/tests/double_close_range            (stderr)
none/tests/double_close_range_sup        (stderr)
none/tests/double_close_range_xml        (stderr)
none/tests/riscv64/compressed            (stdout)
none/tests/riscv64/compressed            (stderr)
none/tests/riscv64/integer               (stdout)
none/tests/riscv64/integer               (stderr)

I've seen memcheck/tests/pointer-trace issues before where the working directory is on NFS. Not riscv related.

The double_close_range tests are fairly new and may need some extra filtering.

The two riscv64 fails look like a  problem with the toolchain. I'll see if trying a local build of the toolchain helps.

Comment 21 Petr Pavlu 2024-07-08 21:16:08 UTC

(In reply to Paul Floyd from comment #20)
> And regtest results
> 
> == 737 tests, 6 stderr failures, 2 stdout failures, 0 stderrB failures, 1
> stdoutB failure, 0 post failures ==
> gdbserver_tests/hgtls                    (stdoutB)

The hgtls test fails because GDB is confused by missing DW_AT_location attributes for TLS variables. It looks to be a problem with the RISC-V target in GCC not implementing TARGET_ASM_OUTPUT_DWARF_DTPREL. I have a preliminary patch for GCC which adds it.

> [...]
> none/tests/riscv64/compressed            (stdout)
> none/tests/riscv64/compressed            (stderr)
> none/tests/riscv64/integer               (stdout)
> none/tests/riscv64/integer               (stderr)
> 
> [...]
> The two riscv64 fails look like a  problem with the toolchain. I'll see if
> trying a local build of the toolchain helps.

The mentioned compilation failure of RISC-V tests is some problem related to them being compiled as position-independent, which is apparently the default on Debian. It is possible for me to reproduce it locally. I'll have a closer look.

Comment 22 JojoR 2024-07-18 03:20:15 UTC

> > [...]
> > none/tests/riscv64/compressed            (stdout)
> > none/tests/riscv64/compressed            (stderr)
> > none/tests/riscv64/integer               (stdout)
> > none/tests/riscv64/integer               (stderr)
> > 
> > [...]
> > The two riscv64 fails look like a  problem with the toolchain. I'll see if
> > trying a local build of the toolchain helps.
> 
> The mentioned compilation failure of RISC-V tests is some problem related to
> them being compiled as position-independent, which is apparently the default
> on Debian. It is possible for me to reproduce it locally. I'll have a closer
> look.

What's the details about the error output ?
I can help to have a look if that is obvious :)
BTW, it didn't happen in my environment.

Comment 23 JojoR 2024-10-10 06:51:05 UTC

Hi, Paul Floyd & Petr Pavlu

Any progress ? 
It's depression if there are a few small issues :(

Comment 24 Paul Floyd 2024-10-10 09:22:01 UTC

(In reply to JojoR from comment #23)
> Hi, Paul Floyd & Petr Pavlu
> 
> Any progress ? 
> It's depression if there are a few small issues :(

There are two things that I consider as requirements.

Firstly someone to get git write access to maintain this.

Secondly all the usual code requirements (review for such a big change, all regressions passing on all platforms).

Comment 25 Mark Wielaard 2024-10-11 14:59:40 UTC

https://builder.sourceware.org/ now has several riscv builders. We should setup one.
We are already testing armhf, i386, ppc64, arm64, ppc64le, s390x, x86_64 and power9/10.
https://builder.sourceware.org/buildbot/#/builders?tags=valgrind