Understanding Threads: Atomic Operations and Memory Ordering

A common question about threads is:

I need a read/write access to a variable from two or more threads. Should I protect an access to a variable by critical section?

A simple answer is: yes, if you want to be on the safe side always protect the variable by a critical section. A deeper answer requires an understanding of the two concepts: atomic operations and memory ordering.

The “atomic” means that the result of the operation is always valid. Suppose we have two threads: thread A writes to a variable one of the two possible values (0 or 1); thread B reads the variable. If both writes and reads are atomic, then the result of the read operation in thread B can be only 0 or 1 – it is impossible to read a different value from the variable, even if write and read operations are performed at the same time.

That is what an official documentation on Intel processors says about guaranteed atomic operations:

Guaranteed Atomic Operations

The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:

    Reading or writing a byte
    Reading or writing a word aligned on a 16-bit boundary
    Reading or writing a doubleword aligned on a 32-bit boundary

The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:

    Reading or writing a quadword aligned on a 64-bit boundary

Does the above mean that if an access to a variable is carried out atomically by processor there is no need to protect an access to a variable by critical section? A general answer is NO. Sometimes atomic operations are not enough and we also should take a memory ordering into account.

Consider the following situation:

thread A executes the following code:

  K:= 0;
  L:= 0;
// [..] - a lot of code skipped
  K:= 5;
  L:= 1;

thread B executes the following code:

if L = 1 then N:= N + K;

Read and write access to K and L variables is atomic. What is wrong with the above code?
The result of N:= N + K; is predictable only if the processor enforces strong memory ordering, i.e. thread B reads the changed K and L values in the same order as thread A writes them, so that if thread B sees that L has changed (if L = 1) it assumes that K has also changed (and should be read as 5).
Modern processors does not enforce strong memory ordering. You can read about memory ordering (and atomic operations) in official Intel manual (currently available here), but a good practice is not to rely on a particular kind of memory ordering in a particular processor. We should not assume that when thread B reads L as 1 it will next read K as 5 – it is also possible that it will read K as 0, so the result of

if L = 1 then N:= N + K;

is unpredictable. (Of course the result is always predictable if the above code is executed in thread A).

A possible solution in the above situation is to protect read and write access to L variable by a critical section – a critical section creates a kind of memory barrier and enforces local strong memory ordering. A critical section also makes an access to a variable atomic, so if you protect multithread-accessed variables by critical sections you are on the safe side.

Advertisements

4 thoughts on “Understanding Threads: Atomic Operations and Memory Ordering

  1. You can use Atomic Operations directly available with the InterlockedIncrement, InterlockedDecrement, InterlockedExchange, and InterlockedExchangeAdd functions ,defined in SysUtils.pas and Windows.pas.

    About performance, the faster are Atomic Operations, then if you use Critical sections, you’ll need to define the protected section as short as possible.

    And since even Atomic Operations can be also very CPU-consuming (see http://synopse.info/forum/viewtopic.php?id=57 about the implicit LOCK of all CPU cores), it’s sometimes a better idea to use a thread-dependent counter, then update a global variable only if necessary. And the use of a Critical Section can be better, from the performance point of view, than a long list of atomic operations.

    In all multi-threading programming, avoiding locks as much as possible is mandatory. Consider using a RCU-like mechanism.

  2. Reference counting or other counting variables are good for atomic operations. However, the result of the atomic operator (or function) should be used since it is a copy of the last value of the shared variable. Otherwise reading from a shared variable can result in unpredictable values.

  3. It’s a shame you don’t have a donate button! I’d without a doubt donate to this fantastic blog! I guess for now i’ll settle for book-marking and adding your RSS feed
    to my Google account. I look forward to brand new updates and will talk about this blog with my Facebook group.
    Chat soon!

  4. I do not know if it’s just me or if everyone else encountering problems with your blog. It looks like some of the written text on your posts are running off the screen. Can somebody else please provide feedback and let me know if this is happening to them as well? This could be a problem with my web browser because I’ve had this happen previously.
    Thank you

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s