Bug 476465 - AArch64 ARMv8.3 LDAPR/LDAPRH/LDAPRB instructions not supported
Summary: AArch64 ARMv8.3 LDAPR/LDAPRH/LDAPRB instructions not supported
Status: CONFIRMED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.22 GIT
Platform: Android Android 13.x
: NOR normal
Target Milestone: ---
Assignee: Paul Floyd
URL:
Keywords:
: 476895 (view as bug list)
Depends on:
Blocks:
 
Reported: 2023-11-02 11:23 UTC by landfillbaby69
Modified: 2025-04-15 18:22 UTC (History)
8 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
patch adding LDAPR instruction handling (1.11 KB, text/plain)
2023-11-29 11:34 UTC, Arne Juul
Details

Note You need to log in before you can comment on or make changes to this bug.
Description landfillbaby69 2023-11-02 11:23:14 UTC
SUMMARY

Exactly what is says in the title.
AArch64 ARMv8.3 LDAPR/LDAPRH/LDAPRB instructions aren't supported by Valgrind 3.22.0.
It's annoying since on my Pixel 6, the linker64 binary itself has these instructions.


STEPS TO REPRODUCE

On Termux via an ARMv8.3 device running Android 14:
echo 'int main(void) {}' | clang-17 -g -o test -xc - && valgrind ./test

Alternatively, compile and run something like this quick dirty test (just returns the first byte of itself):
.text
.globl _start
_start:
adrp    x8, _start
add     x8, x8, :lo12:_start
ldaprb  w0, [x8]
mov     x8, #93 // exit
svc     #0


OBSERVED RESULT

==9889== Memcheck, a memory error detector
==9889== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==9889== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==9889== Command: ./test
==9889==
ARM64 front end: load_store
disInstr(arm64): unhandled instruction 0x38BFC109
disInstr(arm64): 0011'1000 1011'1111 1100'0001 0000'1001
==9889== valgrind: Unrecognised instruction at address 0x411e4f0.
==9889==    at 0x411E4F0: __dl__Z26__libc_safe_arc4random_bufPvm (in /apex/com.android.runtime/bin/linker64)
==9889==    by 0x41235E7: __dl___libc_init_main_thread_late (in /apex/com.android.runtime/bin/linker64)                                                                         ==9889==    by 0x40D4F4F: __dl__ZL29__linker_init_post_relocationR19KernelArgumentBlockR6soinfo (in /apex/com.android.runtime/bin/linker64)
==9889==    by 0x40D4ECF: __dl___linker_init (in /apex/com.android.runtime/bin/linker64)
==9889==    by 0x40640B7: __dl__start (in /apex/com.android.runtime/bin/linker64)       ==9889== Your program just tried to execute an instruction that Valgrind
==9889== did not recognise.  There are two possible reasons for this.
==9889== 1. Your program has a bug and erroneously jumped to a non-code
==9889==    location.  If you are running Memcheck and you just saw a
==9889==    warning about a bad jump, it's probably your program's fault.
==9889== 2. The instruction is legitimate but Valgrind doesn't handle it,
==9889==    i.e. it's Valgrind's fault.  If you think this is the case or
==9889==    you are not sure, please let us know and we'll try to fix it.
==9889== Either way, Valgrind will now raise a SIGILL signal which will
==9889== probably kill your program.
==9889==
==9889== Process terminating with default action of signal 4 (SIGILL)
==9889==  Illegal opcode at address 0x411E4F0
==9889==    at 0x411E4F0: __dl__Z26__libc_safe_arc4random_bufPvm (in /apex/com.android.runtime/bin/linker64)
==9889==    by 0x41235E7: __dl___libc_init_main_thread_late (in /apex/com.android.runtime/bin/linker64)
==9889==    by 0x40D4F4F: __dl__ZL29__linker_init_post_relocationR19KernelArgumentBlockR6soinfo (in /apex/com.android.runtime/bin/linker64)                                     ==9889==    by 0x40D4ECF: __dl___linker_init (in /apex/com.android.runtime/bin/linker64)
==9889==    by 0x40640B7: __dl__start (in /apex/com.android.runtime/bin/linker64)       ==9889==
==9889== HEAP SUMMARY:                                                                  ==9889==     in use at exit: 0 bytes in 0 blocks
==9889==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated                       ==9889==
==9889== All heap blocks were freed -- no leaks are possible
==9889==
==9889== For lists of detected and suppressed errors, rerun with: -s
==9889== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction

Note: 0x38BFC109 is ldaprb w9, [x8]


EXPECTED RESULT

No failure.
Comment 1 landfillbaby69 2023-11-02 11:34:30 UTC
i should have set the severity higher since it renders it unusable for dynamically linked binaries on my device
Comment 2 Romain Geissler 2023-11-15 13:27:42 UTC
*** Bug 476895 has been marked as a duplicate of this bug. ***
Comment 3 Arne Juul 2023-11-29 11:34:32 UTC
Created attachment 163612 [details]
patch adding LDAPR instruction handling
Comment 4 Arne Juul 2023-11-29 11:50:27 UTC
after upgrading to GCC 13 we're seeing that the compiler generates LDAPR instructions
for simple code when "RCpc" extension is enabled, as described here:
https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/enabling-rcpc-in-gcc-and-llvm

I've attached a minimal patch that handles the simple variants of LDAPR together with LDAR, following doc here:
https://developer.arm.com/documentation/ddi0602/2023-06/Base-Instructions/LDAPR--Load-Acquire-RCpc-Register-

There's various other variants which I have not attempted handling of (including Post-index variant, also described in the doc above).
Comment 5 Arne Juul 2023-11-29 12:01:35 UTC
on RHEL8 (or almalinux 8.9 in my case) this is easily triggered by installing gcc-toolset-13 and compiling this C++ program:

#include <stdlib.h>
const char * my_getenv_once() {
    static const char * value = getenv("MYVALUE");
    return value;
}
int main(int argc, char **argv) {
    return (my_getenv_once() ? 0 : 1);
}

$ rpm -q gcc-toolset-13-gcc-c++
gcc-toolset-13-gcc-c++-13.1.1-4.3.el8.aarch64
$ /opt/rh/gcc-toolset-13/root/usr/bin/c++ -g -O2 -march=armv8.2-a+rcpc mytest.cpp -o mytest
$ /usr/bin/valgrind ./mytest 
==210903== Memcheck, a memory error detector
==210903== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==210903== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==210903== Command: ./mytest
==210903== 
ARM64 front end: load_store
disInstr(arm64): unhandled instruction 0xF8BFC260
disInstr(arm64): 1111'1000 1011'1111 1100'0010 0110'0000
==210903== valgrind: Unrecognised instruction at address 0x4101e4.
==210903==    at 0x4101E4: my_getenv_once() (myfun.cpp:3)
==210903==    by 0x4100AB: main (myfun.cpp:7)
Comment 6 Mark Wielaard 2023-12-08 14:26:09 UTC
Thanks for the patch. Still looking for a machine to test it on. Although with your patch I can now run the testcase under valgrind while my test machine doesn't actually support LDAPR.

I am not really familiar with the arm64 code, but the patch does look reasonable.

I find the arm documentation hard to navigate, which variants exactly are we missing and which are required to be supported?
Comment 7 Arne Juul 2023-12-11 13:48:57 UTC
From the ARM docs all the still-missing instructions are only available with extra feature flags FEAT_LRCPC2 or FEAT_LRCPC3, so they are probably not as important.  Someone familiar with how the "normal" load/store family of instructions are implemented in valgrind could probably add support for the unscaled immediate / post-index / pre-index variants, but that's out of my reach.

FEAT_LRCPC2: LDAPR (unscaled immediate) and STLR (unscaled immediate)
FEAT_LRCPC3 additions: post-index LDAPR, LDIAPP, STILP, and pre-index STLR instructions, and
    if advanced SIMD and floating-point is implemented, also the
    LDAPUR (SIMD&FP), LDAP1 (SIMD&FP), STLUR (SIMD&FP), and STL1 (SIMD&FP) instructions
Comment 8 Romain Geissler 2024-02-24 10:48:47 UTC
Hi,

On this patch implementing only LDAPR, is it going to be pushed in the repo or we are waiting for a patch implementing all 3 variants ?

Cheers,
Romain
Comment 9 Petteri Räty 2024-07-07 08:01:03 UTC
(In reply to Mark Wielaard from comment #6)
> Thanks for the patch. Still looking for a machine to test it on. Although
> with your patch I can now run the testcase under valgrind while my test
> machine doesn't actually support LDAPR.
> 

You can use a Hetzner Cloud CAX11 Ampere VM for practically no cost.

https://www.hetzner.com/cloud/
Comment 10 Paul Floyd 2024-07-08 07:31:41 UTC
I have some credits on AWS, so I could probably test it there.
Comment 11 William Ashley 2024-09-24 18:33:15 UTC
I was able to verify that the proposed patch does result in a successful valgrind run of Arne's production case on Graviton.

That's only verifying that it applies, compiles, doesn't hit the unrecognized instruction error in valgrind, and the program behaves as it does outside of valgrind. I haven't however, dove into the IR and surrounding logic to ensure that it is semantically correct.
Comment 12 Rob Bresalier 2024-11-20 13:52:29 UTC
I believe I ran into this issue. I applied the patch and my application gets past it, although I don't know 100% if it is correct. Would be nice if this could get reviewed and merged into the valgrind release.

I am using ARM Neoverse N2 CPU core:
https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-octeon-10-dpu-platform-product-brief.pdf

According to SIGILL diagnostic the unhandled instruction is: 0xB8BFC000

When I decode this instruction using this link: https://armconverter.com/?disasm&code=00C0BFB8

I see it is also ldapr instruction:

ldapr w0, [x0]

My backtrace looks like this:

ARM64 front end: load_store
disInstr(arm64): unhandled instruction 0xB8BFC000
disInstr(arm64): 1011'1000 1011'1111 1100'0000 0000'0000
==3466== valgrind: Unrecognised instruction at address 0xc5a7330.
==3466==    at 0xC5A7330: load (core_arch_ops_gcc_aarch64.hpp:857)
==3466==    by 0xC5A7330: load (core_ops_gcc_atomic.hpp:86)
==3466==    by 0xC5A7330: load (atomic_impl.hpp:473)
==3466==    by 0xC5A7330: boost::thread_detail::enter_once_region(boost::once_flag&) (once_atomic.cpp:39)
==3466==    by 0xC5A1187: call_once<void (&)()> (once_atomic.hpp:123)
==3466==    by 0xC5A1187: boost::detail::set_current_thread_data(boost::detail::thread_data_base*) (thread.cpp:160)
==3466==    by 0xC5A44F3: thread_proxy (thread.cpp:174)
==3466==    by 0xC837B97: start_thread (pthread_create.c:481)
==3466==    by 0xD00199B: thread_start (clone.S:79)
==3466== Your program just tried to execute an instruction that Valgrind
==3466== did not recognise.  There are two possible reasons for this.
==3466== 1. Your program has a bug and erroneously jumped to a non-code
==3466==    location.  If you are running Memcheck and you just saw a
==3466==    warning about a bad jump, it's probably your program's fault.
==3466== 2. The instruction is legitimate but Valgrind doesn't handle it,
==3466==    i.e. it's Valgrind's fault.  If you think this is the case or

==3466==    you are not sure, please let us know and we'll try to fix it.
==3466== Either way, Valgrind will now raise a SIGILL signal which will
==3466== probably kill your program.
==3466== 
==3466== Process terminating with default action of signal 4 (SIGILL): dumping core
Comment 13 Rob Bresalier 2024-11-20 13:54:49 UTC
To add to my previous comment, the unhandled instruction occurs at:

line 857 of boost-1.82.0/boost-1.82.0/libs/atomic/include/boost/atomic/detail/core_arch_ops_gcc_aarch64.hpp

which you can see below is ldapr:

854 #if defined(BOOST_ATOMIC_DETAIL_AARCH64_HAS_RCPC)
855             if (order == memory_order_consume || order == memory_order_acquire)
856             {
857                 __asm__ __volatile__
858                 (
859                     "ldapr %w[value], %[storage]\n\t"
860                     : [value] "=r" (v)
861                     : [storage] "Q" (storage)
862                     : "memory"
863                 );
864             }
865             else
866 #endif
867             {
868                 __asm__ __volatile__
869                 (
870                     "ldar %w[value], %[storage]\n\t"
871                     : [value] "=r" (v)
872                     : [storage] "Q" (storage)
873                     : "memory"
874                 );
875             }
Comment 14 Paul Floyd 2024-11-21 08:36:52 UTC
(In reply to Rob Bresalier from comment #12)
> I believe I ran into this issue. I applied the patch and my application gets
> past it, although I don't know 100% if it is correct. Would be nice if this
> could get reviewed and merged into the valgrind release.

There is also the problem of access to hardware that supports these features.
Comment 15 Johan Förberg 2024-11-22 10:34:38 UTC
The patch by Arne Juul fixes the issue for me. Please considering merging the patch.