Bug 85756 - x86 assembly prefix LOCK to guarantee atomicity has no effect
Status: RESOLVED DUPLICATE of bug 197793
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: 2.1.2
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on: 197793
Blocks:
Reported: 2004-07-23 09:41 UTC by smile
Modified: 2009-07-01 08:13 UTC



Attachments
assembly code using LOCK (592 bytes, text/plain)
2004-07-23 09:52 UTC, smile
driver file (1.38 KB, text/plain)
2004-07-23 09:53 UTC, smile
clients (1.07 KB, text/plain)
2004-07-23 09:53 UTC, smile
build (166 bytes, text/plain)
2004-07-23 09:55 UTC, smile

Description smile 2004-07-23 09:41:51 UTC
Version:           2.1.2 (using KDE KDE 3.2.3)
Installed from:    Compiled From Sources
Compiler:          gcc 3.2.3 
OS:                Linux

The x86 LOCK prefix, which guarantees atomicity at the instruction level, seems to have no effect under Valgrind. The following code is used by Oracle as a compare-and-swap for fast concurrent access to structures in shared memory.

-------
        .text
        .align  16
        .globl  swapit
        .type   swapit,@function
swapit:
        movl   4(%esp),%ecx             /* ecx = data address          */
        movl   8(%esp),%eax             /* eax = old_value             */
        movl   12(%esp),%edx            /* edx = new_value             */
        lock;  cmpxchgl %edx,(%ecx)     /* Atomic compare and swap     */
        setz   %al
        movzbl %al,%eax                 /* put ZF into %eax            */
        ret                             /* return                      */
        .size   swapit,.-swapit
-------
When --trace-codegen=11111 is enabled, the following is observed.
In the first stage (x86 -> UCode) I see the LOCK instruction:


   (2)     0x8048428:  movl 12(%esp,,),%edx
   (2)
   (2)       10: GETL              %ESP, t14
   (2)       11: LEA1L             12(t14), t12
   (2)       12: LDL               (t12), t16
   (2)       13: PUTL              t16, %EDX
   (2)       14: INCEIPo           $4
   (2)
   (2)     0x804842C:  cmpxchgl %edx,(%ecx)
   (2)
==>(2)       15: LOCKo
   (2)       16: GETL              %ECX, t26
   (2)       17: LDL               (t26), t22
   (2)       18: GETL              %EDX, t20
   (2)       19: GETL              %EAX, t18
   (2)       20: MOVL              t18, t24
   (2)       21: SUBL              t22, t24  (-wOSZACP)
   (2)       22: CMOVLz            t20, t22  (-rOSZACP)
   (2)       23: CMOVLnz           t22, t18  (-rOSZACP)
----------------------
At the last stage, i.e. after instrumentation, I do not see the LOCK:

(2)       12: LDL               (t12), t16 
(2)       13: PUTL              t16, %EDX
(2)       14: INCEIPo           $4
(2)       15: CCALLo            0xB72A4BB1(t4) 
(2)       16: LDL               (t4), t22
(2)       17: MOVL              t10, t24
(2)       18: SUBL              t22, t24  (-wOSZACP)
(2)       19: CMOVLz            t16, t22  (-rOSZACP)
(2)       20: CMOVLnz           t22, t10  (-rOSZACP)
-------------------



A simple test case is attached that uses the above assembly code and can be used to check whether the LOCK mechanism is in effect.

The trials were done on the following system:

$ uname -a
Linux stacj32 2.4.21-15.ELsmp #1 SMP Thu Apr 22 00:27:41 EDT 2004 i686 i686 i386 GNU/Linux

$ valgrind --version
valgrind-2.1.2
(The recently released development version)

Four files are attached: swap.s, client.c, server.c and Makefile.

swap.s is assembly code containing the swapit function, which performs an atomic swap operation. A shared memory location, old_value and new_value are passed to this function. swapit reads the shared memory and compares it with old_value; if they are equal, new_value is written over it. If they differ, swapit returns false.
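
For readers who do not read AT&T assembly, the semantics described above can be sketched in C11. This is only an illustration, not the attached code: the name swapit_c11 is hypothetical, and it assumes a compiler that provides <stdatomic.h>. On x86 a compiler would normally be expected to lower this call to a LOCK-prefixed cmpxchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical C11 counterpart of swapit() (not the attached swap.s).
   Returns 1 if *addr equalled old_value and was replaced by new_value,
   0 if the values differed and *addr was left unchanged. */
static int swapit_c11(atomic_int *addr, int old_value, int new_value)
{
    assert(addr != NULL);
    /* On failure the call stores the observed value into old_value;
       that copy is local and is simply discarded here. */
    return atomic_compare_exchange_strong(addr, &old_value, new_value);
}
```

Called twice in a row with (0, 1) on a word that starts at 0, the first call succeeds and the second fails, because the word already holds 1.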

server.c creates a shared memory segment of 5 bytes. The last byte controls the starting and stopping of the clients: when it is set to 1 the clients are ready to go; when it is set to 0 the clients exit. The server tries to swap a 0 value in the first 4 bytes (an int) with a 1.

client.c attaches to the shared memory segment and tries to swap 1 with 0. At least 2 instances of the client should be started. The tests were done with 3 clients started quickly one after the other.

When the server has successfully swapped 10000 times, it sets the 5th byte to 0 to signal the clients to exit, and then the server exits.

You have to run the server first. The server accepts an integer argument specifying the number of successful swaps it waits for before exiting. The default value is 10000.

After execution the clients and the server print a count of the number of successful swaps they performed. The server's count stays at 10000 or the value specified on the command line. The sum of the clients' counts should match the server's count. The counts differ if, for example, two clients read 1 at the same time and both update it with a 0. A differing count signifies that LOCK is not in effect.
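
The bookkeeping behind this check can be sketched in a single process. This is a simplified model, not the attached server.c or client.c: the segment struct, server_try, client_try and run_alternating are hypothetical names, the struct is padded by the compiler rather than being exactly 5 bytes, and the server and one client take strict turns instead of running concurrently. It only shows why the totals agree when every swap is atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical in-process stand-in for the shared segment. */
typedef struct {
    atomic_int  word;     /* the 4-byte int being swapped */
    atomic_char running;  /* control byte: 1 = clients run, 0 = clients exit */
} segment;

/* Server side: try to swap 0 -> 1. Returns 1 on success. */
static int server_try(segment *seg)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&seg->word, &expected, 1);
}

/* Client side: try to swap 1 -> 0. Returns 1 on success. */
static int client_try(segment *seg)
{
    int expected = 1;
    return atomic_compare_exchange_strong(&seg->word, &expected, 0);
}

/* Alternate server and client attempts until the server has made
   `target` successful swaps, then clear the control byte.
   Returns the client's total of successful swaps. */
static int run_alternating(int target)
{
    segment seg;
    int server_count = 0, client_count = 0;

    assert(target > 0);
    atomic_init(&seg.word, 0);
    atomic_init(&seg.running, 1);

    while (atomic_load(&seg.running)) {
        server_count += server_try(&seg);
        client_count += client_try(&seg);
        if (server_count == target)
            atomic_store(&seg.running, 0);  /* signal the client to exit */
    }
    return client_count;
}
```

Every successful 1 -> 0 swap consumes exactly one earlier 0 -> 1 swap, so in this model the client total equals the server's target.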

When the test was run on an SMP kernel (2.4.21-15.ELsmp) without valgrind, the counts matched. With valgrind, the sum of the clients' swaps exceeded the server's count: when the server's count was 10000, the clients' total was 10988.

When the test was run on a non-SMP kernel (2.4.21-15.EL) without valgrind, the clients' total was equal to the server's. With valgrind, I sometimes found that the clients' total fell below the server's total. I couldn't understand this behaviour. I set 1000 as the server's count and got 977 as the sum of the clients' counts.

In any case this can happen only when atomicity is absent. The trials were done on a 2-CPU box.
Comment 1 smile 2004-07-23 09:52:38 UTC
Created attachment 6792 [details]
assembly code using LOCK
Comment 2 smile 2004-07-23 09:53:18 UTC
Created attachment 6793 [details]
driver file
Comment 3 smile 2004-07-23 09:53:49 UTC
Created attachment 6794 [details]
clients
Comment 4 smile 2004-07-23 09:55:02 UTC
Created attachment 6795 [details]
build
Comment 5 Tom Hughes 2004-07-23 16:58:13 UTC
I'm not sure there's much we can do about this, as there is no guarantee that a single instruction in the executable being emulated will translate to a single instruction in the generated code, so it may not be possible to preserve a LOCK prefix in all cases.
Comment 6 smile 2004-07-28 12:47:50 UTC
The LOCK prefix guarantees instruction-level atomicity in a multi-CPU environment; it is essential for mutual exclusion and is a very important feature. The point is not just that the LOCK prefix is missing in the generated code, but that because of this atomicity is not guaranteed.
Comment 7 Tom Hughes 2004-07-28 12:55:39 UTC
I know that. What I'm saying is that it is not (in general terms) possible for valgrind to provide the atomicity you require, because there is no guarantee that the instruction in the input stream that you want to be atomic will be a single instruction in the output stream. That's just how valgrind works.

It might be possible to preserve the LOCK prefix when there is a one-to-one mapping between instructions, and valgrind should certainly warn loudly when it is ignoring a LOCK prefix.
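
The effect of splitting one atomic instruction into several can be shown with a small deterministic model. This is a sketch under stated assumptions, not valgrind's generated code: split_cas and its helpers are hypothetical names, the load step stands for the LDL in the listing, and the separate store step is assumed, since the excerpt in the report ends before any store. The model also leaves memory untouched on a failed compare, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a compare-and-swap that has been translated
   into a separate load step and a later compare-and-store step. */
typedef struct {
    int loaded;     /* value seen by the load step */
    int old_value;  /* value the caller expects */
    int new_value;  /* value to store on a match */
} split_cas;

/* Step 1: load the shared word. */
static void split_cas_load(split_cas *op, const int *addr,
                           int old_value, int new_value)
{
    assert(op != NULL && addr != NULL);
    op->loaded = *addr;
    op->old_value = old_value;
    op->new_value = new_value;
}

/* Step 2: compare the (possibly stale) loaded value and store.
   Returns 1 if this operation reports a successful swap. */
static int split_cas_store(const split_cas *op, int *addr)
{
    assert(op != NULL && addr != NULL);
    if (op->loaded == op->old_value) {
        *addr = op->new_value;
        return 1;
    }
    return 0;
}

/* Two clients both try to swap 1 -> 0, with client B's load scheduled
   between client A's load and store. Returns the number of clients
   that report success. */
static int interleaved_successes(void)
{
    int shared = 1;
    split_cas a, b;

    split_cas_load(&a, &shared, 1, 0);
    split_cas_load(&b, &shared, 1, 0);  /* B loads before A stores */
    return split_cas_store(&a, &shared) + split_cas_store(&b, &shared);
}
```

With this interleaving both clients report success for a single 1 in memory, which matches the kind of miscount described in the report, where the clients' total exceeded the server's.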
Comment 8 smile 2004-08-05 15:14:28 UTC
I got your point. Thanks. Then the alternative would be to use high-level-language mutex algorithms. This bug is a good item for the wishlist, I guess.
Comment 9 smile 2004-09-08 07:26:11 UTC
One of the local experts mentioned that a LOCK-prefixed cmpxchg can be implemented with a

  cmpxchg
  lfence

and does not require the LOCK prefix. But of course it can only work on P4.
Comment 10 Nicholas Nethercote 2009-06-30 04:36:00 UTC
Julian, will this be fixed by bug 197793?  If so, this can be marked as a dup of it.
Comment 11 Julian Seward 2009-07-01 08:13:53 UTC

*** This bug has been marked as a duplicate of bug 197793 ***