SUMMARY
I ran the Thunderbird mail client under valgrind. While running the xpcshell-test test suite against comm-central Thunderbird, I got a "Conditional jump or move depends on uninitialised value(s)" error in one of the source files.

STEPS TO REPRODUCE
1. The original bug is filed in Mozilla's Bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1952749
2. I ran comm-central Thunderbird (compiled locally on my PC) under valgrind with the following parameters.

Please note that I am NOT using Mozilla's versatile |mach| command, which can invoke valgrind in a simple manner. It has a shell quoting problem and cannot correctly pass the complex valgrind options that I use. So I renamed the original xpcshell binary to xpcshell-bin and installed a wrapper binary that calls xpcshell-bin under valgrind with the appropriate options. With this setup, I ran Thunderbird's xpcshell-test test suite, and the suite is executed by Thunderbird running under valgrind.

Here are the options that I pass to valgrind:

run-valgrind-xpcshell invoked ...
sizeof(prepargs)=136 argc=26
finalargs[ 0] = valgrind
finalargs[ 1] = --track-origins=yes
finalargs[ 2] = --trace-children=yes
finalargs[ 3] = --trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,*/bin/certutil,*/bin/pk12util,*/bin/ssltunnel,*/bin/uname,*/bin/which,*/bin/ps,*/bin/grep,*/bin/java,*/fix-stacks,*/firefox/firefox,*/bin/firefox-esr,*/bin/python,*/bin/python3,*/bin/python2,*/bin/python2.7,*/bin/lsb_release,*/bin/bash,*/bin/nodejs,*/bin/node,*/bin/xpcshell,python3,/bin/sh
finalargs[ 4] = --vex-iropt-register-updates=allregs-at-mem-access
finalargs[ 5] = --smc-check=all-non-file
finalargs[ 6] = --gen-suppressions=all
finalargs[ 7] = --show-mismatched-frees=no
finalargs[ 8] = --fair-sched=yes
finalargs[ 9] = --num-callers=50
finalargs[ 10] = --suppressions=/NEW-SSD/NREF-COMM-CENTRAL/mozilla/build/valgrind/cross-architecture.sup
finalargs[ 11] = --suppressions=/NEW-SSD/moz-obj-dir/objdir-tb3/_valgrind/i386-pc-linux-gnu.sup
finalargs[ 12] = --suppressions=/home/ishikawa/Dropbox/myown.sup
finalargs[ 13] = --show-possibly-lost=no
finalargs[ 14] = --malloc-fill=0xA5
finalargs[ 15] = --free-fill=0xC3
finalargs[ 16] = /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/xpcshell-bin
finalargs[ 17] = -g
finalargs[ 18] = /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin
finalargs[ 19] = -a
finalargs[ 20] = /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin
finalargs[ 21] = -m
finalargs[ 22] = -e
finalargs[ 23] = const _HEAD_JS_PATH = "/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/xpcshell/head.js";
finalargs[ 24] = -e
finalargs[ 25] = const _MOZINFO_JS_PATH = "/NEW-SSD/moz-obj-dir/objdir-tb3/temp/xpc-profile-g5u3x9tf/mozinfo.json";
finalargs[ 26] = -e
finalargs[ 27] = const _PREFS_FILE = "/NEW-SSD/moz-obj-dir/objdir-tb3/temp/user.js";
finalargs[ 28] = -e
finalargs[ 29] = const _TESTING_MODULES_DIR = "/NEW-SSD/moz-obj-dir/objdir-tb3/_tests/modules/";
finalargs[ 30] = -f
finalargs[ 31] = /NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/xpcshell/head.js
finalargs[ 32] = -e
finalargs[ 33] = const _HEAD_FILES = ["/NEW-SSD/moz-obj-dir/objdir-tb3/_tests/xpcshell/comm/mailnews/imap/test/unit/head_imap_maildir.js"];
finalargs[ 34] = -e
finalargs[ 35] = const _JSDEBUGGER_PORT = 0;
finalargs[ 36] = -e
finalargs[ 37] = const _TEST_FILE = ["/NEW-SSD/moz-obj-dir/objdir-tb3/_tests/xpcshell/comm/mailnews/imap/test/unit/test_localToImapFilter.js"];
finalargs[ 38] = -e
finalargs[ 39] = const _TEST_NAME = "xpcshell-maildir.ini:comm/mailnews/imap/test/unit/test_localToImapFilter.js";
finalargs[ 40] = -e
finalargs[ 41] = _execute_test(); quit(0);

Please note that I am using fair scheduling (--fair-sched=yes), which has run Thunderbird (and presumably Firefox) without thread scheduling issues before. But this time, it may not work as expected.

OBSERVED RESULT

EXPECTED RESULT

SOFTWARE/OS VERSIONS
uname -a
Linux ip030 6.12.12-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.12-1 (2025-02-02) x86_64 GNU/Linux

ADDITIONAL INFORMATION
To summarize the issue discussed in Mozilla's Bugzilla: Thunderbird creates a lambda/function that is then passed to a different thread using SyncRunnable::DispatchToThread(). Usually the other thread executes that function in the context of THAT thread, and then returns the function value, if any. The calling thread is blocked in the meantime, as the name SyncRunnable::DispatchToThread() suggests (so it seems).

However, somehow under valgrind, this blocking of the thread that calls SyncRunnable::DispatchToThread() does not occur, and it does not wait for the completion of the function invocation that is to be done on the other thread. Thunderbird merrily proceeds without waiting, so the variable(s) that are assigned values in the function/lambda have not been assigned a value yet, and I see "Conditional jump or move depends on uninitialised value(s)".

I have investigated this far. I am puzzled since I have not seen this happen before. I have run Thunderbird under valgrind on and off for the last 7-8 years, or maybe longer.
But for the last 10 months or so I did not, because of an issue, maybe a timing issue, inside the GUI library. (See https://bugzilla.mozilla.org/show_bug.cgi?id=1880148 ) I say the problem was with Thunderbird because the issue discussed there happened with gdb, strace, and valgrind alike. Basically, the slowdown caused by these tools makes the context menu system misbehave. It was NOT a valgrind issue.

I can't rule out that this particular issue crept in during the last 10 months. BUT there is some older code with the same pattern (create a lambda/function, pass it to a different thread for execution in the context of that thread, and wait for it). Moreover, there ARE tons of such usages in the Firefox code base. So I think the code pattern is correct, and it is only now that valgrind somehow mishandles the thread switch completely.

Does anything in the newer versions (3.24, 3.25GIT) ring a bell?

I wonder if I should change "--fair-sched=yes", but I see no reason why that would solve the current issue.

I see the same issue under 3.24 and 3.25 GIT.

Thank you again for offering this great tool to the programming community.
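For readers unfamiliar with the pattern described above, here is a minimal sketch of a blocking dispatch in plain C++. It is not the Mozilla implementation: DispatchSyncSketch and ComputeOnOtherThread are hypothetical names, and a freshly spawned std::thread stands in for the target thread's event queue. It only illustrates the expectation that the caller does not get past the dispatch call until the lambda has finished on the other thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a blocking dispatch: runs `task` on another
// thread and returns only after the task has completed there.
void DispatchSyncSketch(const std::function<void()>& task) {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;

  std::thread worker([&] {
    task();  // executes in the context of the worker thread
    {
      std::lock_guard<std::mutex> lock(m);
      done = true;  // publish completion under the lock
    }
    cv.notify_one();
  });

  {
    // The calling thread blocks here until the worker sets `done`.
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [&] { return done; });
  }
  worker.join();
}

// Caller side: the variable is assigned only inside the lambda, so its
// value is well-defined only if the dispatch really waited.
int ComputeOnOtherThread() {
  int result;  // deliberately left uninitialized, as in the pattern above
  DispatchSyncSketch([&result] { result = 42; });
  return result;
}
```

If the wait were skipped, the caller would read the variable before the lambda had run, which is the kind of access memcheck flags once the value feeds a conditional.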
You can find the uses of SyncRunnable::DispatchToThread() in the comm-central Thunderbird tree:
https://searchfox.org/comm-central/search?q=SyncRunnable%3A%3ADispatchToThread%28&path=&case=false&regexp=false

Within the code used only by Thunderbird:

Textual Occurrences (43 lines across 31 files)

mailnews/base/src/nsMsgContentPolicy.cpp
212 mozilla::SyncRunnable::DispatchToThread( <--- modified in 2025

mailnews/base/src/nsNewMailnewsURI.cpp
61 mozilla::SyncRunnable::DispatchToThread( <--- modified in 2023
73 mozilla::SyncRunnable::DispatchToThread( <--- ditto
90 mozilla::SyncRunnable::DispatchToThread( <--- ditto
160 mozilla::SyncRunnable::DispatchToThread( <--- modified in 2019

mailnews/imap/src/nsImapProtocol.cpp
1292 mozilla::SyncRunnable::DispatchToThread( <--- modified in 2023 <--- THIS IS WHERE THE PROBLEM was reported.
1317 mozilla::SyncRunnable::DispatchToThread( <--- modified in 2023
1345 mozilla::SyncRunnable::DispatchToThread( <--- modified in 2023

The rest of the listing is in mozilla-central, which is used by Firefox and shared by Thunderbird. To learn when that code was modified, we need to look at the M-C tree:
https://searchfox.org/mozilla-central/search?q=SyncRunnable%3A%3ADispatchToThread%28&path=&case=false&regexp=false

Many of these calls were created in the 2010s, in 2020, etc., so if Firefox developers have run Firefox under valgrind, they may have reported the issue already.

So maybe newer versions of valgrind have issue(s) [maybe it does not handle thread sync related primitives correctly? Maybe Helgrind does and memcheck forgets to handle some primitives?].
Or Thunderbird does something wrong which Firefox gets right.

Just a thought.
(In reply to zephyrus00jp from comment #1)

> So maybe newer versions of valgrind have issue(s) [maybe it does not handle
> thread sync related primitives correctly? Maybe Helgrind does and memcheck
> forgets to handle some primitives?].
> Or Thunderbird does something wrong which Firefox gets right.

I don't see anything concrete here that indicates a bug in Valgrind. Memcheck has detected a conditional read error. I strongly suggest that you take memcheck's word that there is an error and don't start making random guesses about other causes.

It is possible that the change in scheduling that you get when running under Valgrind is revealing an underlying bug in the guest code. That's still a guest issue. The Valgrind core has to do some hacky things so that newly spawned threads also run under Valgrind. Memcheck doesn't do anything else with thread primitives. DRD and Helgrind intercept pthread functions so that they can validate and record the thread state. The intercepts still call the intercepted pthread functions.

In order to see where the error is, try using vgdb. You will need 2 terminals, one with valgrind and the other with gdb. When you hit the error you can use the memcheck monitor commands to see which part of the 'if' expression is uninitialized.
(In reply to Paul Floyd from comment #2)
> (In reply to zephyrus00jp from comment #1)
>
> > So maybe newer versions of valgrind have issue(s) [maybe it does not handle
> > thread sync related primitives correctly? Maybe Helgrind does and memcheck
> > forgets to handle some primitives?].
> > Or Thunderbird does something wrong which Firefox gets right.
>
> I don't see anything concrete here that indicates a bug in Valgrind.
> Memcheck has detected a conditional read error. I strongly suggest that you
> take memcheck's word that there is an error and don't start making random
> guesses about other causes.
>
> It is possible that the change in scheduling that you get when running under
> Valgrind is revealing an underlying bug in the guest code. That's still a
> guest issue. The Valgrind core has to do some hacky things so that newly
> spawned threads also run under Valgrind. Memcheck doesn't do anything else
> with thread primitives. DRD and Helgrind intercept pthread functions so that
> they can validate and record the thread state. The intercepts still call the
> intercepted pthread functions.
>
> In order to see where the error is, try using vgdb. You will need 2
> terminals, one with valgrind and the other with gdb. When you hit the error
> you can use the memcheck monitor commands to see which part of the 'if'
> expression is uninitialized.

The uninitialized value is |rv|.

https://searchfox.org/comm-central/source/mailnews/imap/src/nsImapProtocol.cpp#1281

```
/**
 * Dispatch socket thread to determine if connection is alive.
 */
nsresult nsImapProtocol::IsTransportAlive(bool* alive) {
  nsresult rv;  // <--- THIS is declared without an initialization.
  auto GetIsAlive = [transport = nsCOMPtr{m_transport}, &rv, alive]() mutable {
    rv = transport->IsAlive(alive);  // <--- |rv| is supposed to get set in this lambda.
  };
  nsCOMPtr<nsIEventTarget> socketThread(
      do_GetService(NS_SOCKETTRANSPORTSERVICE_CONTRACTID));
  if (socketThread) {
    mozilla::SyncRunnable::DispatchToThread(
        socketThread,
        NS_NewRunnableFunction("nsImapProtocol::IsTransportAlive", GetIsAlive));
    // <--- The calling thread does not stop here to wait for the lambda
    //      (GetIsAlive), and proceeds.
  } else {
    rv = NS_ERROR_NOT_AVAILABLE;
  }
  return rv;  // <--- Thus, |rv| is returned uninitialized since GetIsAlive
              //      has not been executed.
}
```

The above is what I found.
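Independently of why the lambda has not run by the time of the return, one defensive variant would give the status a defined failure value up front and check whether the dispatch itself succeeded. The sketch below is plain C++ with hypothetical stand-ins (Status, TryDispatchSyncSketch, IsAliveSketch, and the targetAcceptsTasks flag are all invented for illustration; none of them are Mozilla APIs). Whether the real dispatch can fail in this scenario is an assumption and has not been established here.

```cpp
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical status type standing in for a result code.
enum class Status { Ok, NotAvailable, Unexpected };

// Hypothetical blocking dispatch. Returns false when the task could not be
// handed to the target thread, in which case the task never runs.
bool TryDispatchSyncSketch(bool targetAcceptsTasks,
                           const std::function<void()>& task) {
  if (!targetAcceptsTasks) {
    return false;  // assumption: models a rejected dispatch
  }
  std::thread worker(task);
  worker.join();  // the caller waits until the task has completed
  return true;
}

Status IsAliveSketch(bool targetAcceptsTasks, bool* alive) {
  // Initialized to a failure value, so it is defined even if the lambda
  // below is never executed.
  Status rv = Status::Unexpected;
  auto getIsAlive = [&rv, alive] {
    *alive = true;  // stand-in for the real liveness query
    rv = Status::Ok;
  };
  if (!TryDispatchSyncSketch(targetAcceptsTasks, getIsAlive)) {
    return Status::NotAvailable;  // dispatch failed: report it explicitly
  }
  return rv;
}
```

With this shape, a skipped lambda shows up as a deterministic error code rather than an uninitialized read, which would at least make the failure reproducible outside valgrind.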
So not an issue with memcheck.
(In reply to Paul Floyd from comment #4)
> So not an issue with memcheck.

I am afraid that I have to disagree here. There is an issue with memcheck here, which I have not seen before: memcheck changes the behavior of the program. In normal circumstances, |rv| is set properly in the lambda/function that is executed in another thread. (That lambda/function invocation is supposed to be waited for.)

Here, I found that during the execution of Thunderbird the control flow somehow no longer follows what is supposed to happen (the caller waits until the lambda/function dispatched via DispatchToThread() completes). I don't know the details, but this is the first case I have seen where memcheck changed the behavior of Thunderbird, which resulted in a logic error that memcheck then reported.

As I noted in comment 3 (which may not have been clear), under normal circumstances, |rv| is set correctly before being returned.

HOWEVER, UNDER VALGRIND/MEMCHECK

> NS_NewRunnableFunction("nsImapProtocol::IsTransportAlive", GetIsAlive)); <--- calling thread does not stop here to wait for lambda (GetIsAlive), and proceeds.

but UNDER A NORMAL RUN WITHOUT VALGRIND, the calling thread STOPS HERE TO WAIT FOR the lambda (GetIsAlive).

Sorry, it was not clear. But since memcheck somehow changes the control flow despite the thread context switch and the wait introduced by the monitor/lock, the value returned is uninitialized.

See the synchronization code at https://searchfox.org/mozilla-central/source/xpcom/threads/SyncRunnable.h#71

This thread code was written in 2014 and I assume it has worked well for Firefox and Thunderbird for more than a decade. However, the particular instance of the DispatchToThread() call was introduced in 2023. So this particular line of code may have uncovered an issue between memcheck and the primitives used by the Thunderbird mail client.

However, please note there are tons of similar call patterns in Firefox, created a dozen years ago or more. See the calls to DispatchToThread() in the Mozilla code base.
There are several dozen such places.
https://searchfox.org/mozilla-central/search?q=DispatchToThread&path=&case=false&regexp=false

I assume they have worked under valgrind, because there were people who ran Firefox under valgrind to find errors.

So I wonder what has changed in the last couple of years, during which I could not run Thunderbird under valgrind due to the change of framework for its test suite (from Mozmill to mochitest).

Compiler-generated code? I am using GCC-14 for now.
Binary utilities?

Both valgrind 3.24.0 and valgrind-3.25.0.GIT (which I compiled locally) showed the symptom. valgrind 3.23 was too old to run today's Thunderbird code, since there are some syscalls which were not handled by valgrind 3.23.

I initially thought of inserting my own sync primitive, but then I realized that Thunderbird/Firefox has already implemented one.
https://searchfox.org/mozilla-central/source/xpcom/threads/SyncRunnable.h#71

So I am puzzled about what I can do to make valgrind and Thunderbird run together while sticking to the thread synchronization behavior observed under a normal run.
BTW, please see the use of valgrind to check for bugs in Firefox.
https://blog.mozilla.org/jseward/2015/02/11/mochitests-are-now-valgrind-clean/
(Admittedly, it is old and was written in 2015. But it was my understanding that some people have run Firefox under valgrind to check for memory errors since then.)

Also, I checked the Thunderbird code with an ASAN run. But ASAN can't detect uses of uninitialized memory, and that is why I ran Thunderbird under valgrind.

The problem with Thunderbird is that its developer community is much smaller than that of Firefox, and that is why there was not much input from the Thunderbird developer community on this particular issue.

I am following the documented usage of valgrind with Mozilla code:
https://firefox-source-docs.mozilla.org/contributing/debugging/debugging_firefox_with_valgrind.html
But there have been issues with running Thunderbird under valgrind, because not many people have run this combination.

I think I need to fix this issue one way or the other to make it possible to execute Thunderbird's mochitest suite under valgrind. Otherwise, I cannot trust the result of a test run under valgrind. Obviously, a failure to initialize a return code, which may then hold a random value, would invalidate the test run.

Hmm... As of now, the ASAN run does not show any glaring errors, but just recently the Coverity static analyzer reported various uses of uninitialized fields, so I am very uncomfortable with the current comm-central Thunderbird tree as far as uninitialized memory is concerned.

TIA
Does the code run cleanly with Helgrind, DRD and TSAN? This still looks like a thread issue to me and not a memcheck issue.
(In reply to Paul Floyd from comment #7)
> Does the code run cleanly with Helgrind, DRD and TSAN?
>
> This still looks like a thread issue to me and not a memcheck issue.

Thank you for your comment.

This indeed looks like a thread issue, but the circumstantial evidence suggests the code has worked well for Firefox and Thunderbird for quite a while.

Given the difficulty of running Thunderbird under valgrind when we execute the test suites, I will opt for running a TSAN build of Thunderbird.

Let me try TSAN first. Then Helgrind, or DRD.
I will report the result.

TIA
(In reply to zephyrus00jp from comment #8)
...
>
> Let me try TSAN first. Then Helgrind, or DRD.
> I will report the result.
>
> TIA

I am working on this issue. I am checking Thunderbird under TSAN and have fixed a few race issues. However, I have now hit race issues in WebRender, specifically in the OpenGL library, and, to my surprise, in the garbage collection subsystem. The latter should probably be whitelisted, since it is not reported in the Bugzilla of Mozilla Firefox or Thunderbird. I am trying to whitelist the graphics system's race, but so far I have not been able to do that. So the progress is very slow.
Any update on reproducing the issue? I agree with Paul that this doesn't seem a memcheck issue but probably some threading issue.
(In reply to Mark Wielaard from comment #10)
> Any update on reproducing the issue?
> I agree with Paul that this doesn't seem a memcheck issue but probably some
> threading issue.

Let's close this for now. Feel free to reopen if you are able to reproduce the issue.