
Beautiful Proof – Data-Race-Free Implies Sequential Consistency

May 13, 2018

Reposted from: http://blog.csdn.net/pongba/archive/2007/03/09/1525446.aspx

The Louvre of C++ (http://blog.csdn.net/pongba)

 

(See also: A Memory Model for C++: Sequential Consistency for Race-Free Programs; Sequential Consistency for Atomics)

 



Disclaimer: You should first understand the fundamentals of multithreaded memory models to be able to follow the text below. I personally recommend you start with the related C++0x memory model proposals if you're a C++ programmer, or with the Java memory model if you're a Java programmer. That being said, the principles behind the proof apply to all kinds of the so-called data-race-free models.

If you've been slogging through all the glorious proposals of the C++0x memory model and found yourself tripped up by the proof in question, then you'll find this post useful. Otherwise, I suggest you at least read this paper before going on.

I do assume you sufficiently understand that multithreading is becoming, if not has already become, the next big event and a general skill in the programming field. There are flames all over the place. And since C++ is one of my favorite programming languages, and unfortunately one that has not included a multithreading memory model of any sort (well, not until C++0x is published), I have kept track of all the progress that has been made with respect to it. It turns out that multithreading is an extremely tricky yet interesting business.

OK, enough said, let's get down to business. Just one last thing, though: I wrote this post only to explain the proof informally, so it's not going to be an "Introduction to the C++0x memory model" post (although I do plan to write one of those, as long as there's enough time); instead, I'll go without any further preamble. For those of you who aren't familiar with the upcoming, brand-new C++0x memory model, and who do consider yourselves C++ programmers (or, you know, programmers in general), I strongly suggest you start now.

 

 

 

Anyway, the original paper is here.

Let's take a look at its conclusion first:

 

If you stick to simple synchronization rules, that is, using only lock and unlock to avoid data races, then a consistent execution of your program will exhibit behavior that you can reason about according to one of its corresponding sequentially consistent executions.

 

Well, put simply, this states that you can regard your
program as being executed by sequentially interleaving the actions of
all the threads, provided that you use only lock and unlock to avoid
data-races.

 

To state it even more loosely, it actually means that data-race-free implies sequential consistency, the most intuitive consistency model.

 

Actually, this is one of the most fundamentally important conclusions with respect to the mainstream multithreading memory models. It has two very nice implications. First, it says that we only need to make our program data-race-free to get sequential consistency; and second, it gives the implementations (compilers, VMs, and hardware) great latitude in optimizing the code, most of the optimizations being instruction reorderings, of course.

 

For example, let’s consider the code snippet below:

 

int x, y; // globals

Thread 1       Thread 2
y = 1;         x = 1;
r1 = x;        r2 = y;

 

If the code is executed sequentially consistently, r1 and r2 can never both be 0. For if we observe that r1 is 0, then it must be the case that "r1 = x" precedes "x = 1"; together with the other two relations ("y = 1" precedes "r1 = x", and "x = 1" precedes "r2 = y"), we can conclude that "y = 1" precedes "r2 = y" (since the precedes relation is transitive under the sequential-consistency interpretation), making r2 observe the value 1. The same reasoning applies the other way around.
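As a concrete check, here is a minimal sketch of this store-buffering litmus test written with C++11's std::atomic (the published form of the C++0x facilities discussed here). The function name run_once is my own; with seq_cst operations, the standard forbids the r1 == r2 == 0 outcome:

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test with sequentially consistent atomics.
// Under sequential consistency, at least one of the two loads must
// observe the other thread's store, so r1 + r2 is never 0.
int run_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        y.store(1, std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        x.store(1, std::memory_order_seq_cst);
        r2 = y.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 + r2; // 0 would mean both loads saw 0, which is forbidden
}
```

Swap the seq_cst operations for memory_order_relaxed and the forbidden outcome becomes observable on weakly ordered hardware.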

 

Unfortunately, however, sequential consistency is too strict in that it prevents many useful optimizations, such as instruction reordering. It's usually the case that, in a multithreaded program, the parts shared by two or more threads are rare. But since present CPUs don't actually know of the existence of threads, to achieve sequential consistency we'd have to compile/execute the whole program sequentially consistently, not just the inter-thread parts. So relying on hardware obviously is not an option. This leaves us with only one choice: manually specifying which parts of a program are inter-threaded. And that's where all those synchronization instructions kick in.

 

As in the example above, most of today's compilers could reorder the two actions in each thread, making "r1 == r2 == 0" a perfectly legitimate outcome; and even if your compiler didn't do it, your CPU would do it, too. But if we sprinkle some special crumbs (i.e. synchronization primitives) over our code, we'll get sequential consistency easily, without forcing the whole implementation to obey sequential-consistency rules:

 

int x, y;
mutex mx, my;

Thread 1       Thread 2
Lock(my);      Lock(mx);
y = 1;         x = 1;
Unlock(my);    Unlock(mx);
Lock(mx);      Lock(my);
r1 = x;        r2 = y;
Unlock(mx);    Unlock(my);

 

Here, we simply used lock/unlock primitives to avoid the data races that could cause the infamous undefined behavior. Since lock/unlock primitives enforce happens-before relations and, more importantly, act as one-way memory fences, this gets us sequential consistency in addition to data-race-freedom. I think the reasoning is quite simple, so I'll just leave it to you.
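For concreteness, here is a sketch of the locked pseudocode above in C++11 terms (my own translation, using std::mutex and std::lock_guard; the name run_locked is made up for illustration):

```cpp
#include <mutex>
#include <thread>

// Every access to x is guarded by mx and every access to y by my, so the
// program is data-race-free; the theorem then lets us reason about it as a
// sequentially consistent interleaving, in which r1 == r2 == 0 is impossible.
int run_locked() {
    int x = 0, y = 0, r1 = -1, r2 = -1;
    std::mutex mx, my;
    std::thread t1([&] {
        { std::lock_guard<std::mutex> g(my); y = 1; }
        { std::lock_guard<std::mutex> g(mx); r1 = x; }
    });
    std::thread t2([&] {
        { std::lock_guard<std::mutex> g(mx); x = 1; }
        { std::lock_guard<std::mutex> g(my); r2 = y; }
    });
    t1.join();
    t2.join();
    return r1 + r2; // never 0: at least one load sees the other store
}
```

Note that the plain ints x and y need no atomics here; the locks alone make the program data-race-free, which is the whole point.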

 

So what's the big deal? I mean, didn't people always do that? The big deal is that we use lock/unlock to avoid all the data races, but as a result we not only get to avoid the data races, we also get the whole sequential-consistency property. That's a really nice guarantee. It means that we don't have to worry about all the gory optimizations that might change the behavior of our code; we just have to make our code data-race-free, and the whole sequential-consistency property comes as a side effect. And maybe more importantly, the implementations still get to keep their unawareness of threads, so most of the optimizations still go on underneath, but you just don't have to worry about them anymore.

 

The principle of a data-race-free memory model is basically this: the compilers or CPUs know nothing about threads; they compile code or execute instructions as if there were just one thread, doing all those crazy optimizations that are only legitimate in a single-thread context. The programmers, however, put special instructions in their code to prevent the crazy optimizations in certain areas, which, supposedly, are the places where multiple threads communicate with each other. This way, not only do we get to preserve the correctness (sequential consistency) of our code, but we also get to allow most of the useful optimizations. This has proved to be the most practical trade-off until some super-intelligent compiler or CPU shows up that can efficiently recognize all the inter-thread communication code for us, and respect it while doing all the optimizations.

 

Anyway, I’ll just get down to the proof of “data-race-free
implies SC” before you get all bored ;)

 

Let's consider a consistent execution of a program on a given input. Recall that, informally, "consistent" means this particular execution respects all the happens-before relations that are specified in the code using lock and unlock primitives, although it might not respect the is-sequenced-before relations, which is pretty much what instruction reordering is all about.

 

So, let's call this particular execution P', and apparently it's not exactly sequentially consistent, because presumably there'd be lots of instructions that could and would be reordered. However, what's important to us is whether it exhibits the exact behavior of one of its corresponding sequentially consistent counterparts. For convenience, let's just call one of the latter P.

 

But what is this P anyway? How do we get it? Remember that we said a consistent execution should respect all the happens-before relations in the original program? So all the happens-before relations in P' must be preserved in P, which doesn't contradict the sequential consistency of P. In addition, P will have all the is-sequenced-before relations, which make all the actions from each thread appear in the order in which they appear in their own thread. But that's still not enough; to make P a sequentially consistent execution, all the actions should appear in a total order. Recall that two actions from different threads that aren't ordered by happens-before could be executed in either order, so why don't we just order them arbitrarily? This way we get to keep each thread's order and still end up with a total order.
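This construction of P can be sketched as a topological sort: take the union of each thread's is-sequenced-before order and the cross-thread happens-before edges, and break the remaining ties arbitrarily. A toy sketch follows (the function build_total_order, the string encoding of actions, and the tie-breaking rule are all my own illustration, not the paper's):

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy construction of a sequentially consistent total order P: topologically
// sort the union of program order (is-sequenced-before within each thread)
// and the cross-thread happens-before edges.  Actions left unordered by both
// relations are ordered arbitrarily (here: smallest ready name first).
std::vector<std::string> build_total_order(
    const std::vector<std::vector<std::string>>& threads,
    const std::vector<std::pair<std::string, std::string>>& happens_before) {
    std::map<std::string, std::set<std::string>> succ;
    std::map<std::string, int> indeg;
    auto edge = [&](const std::string& a, const std::string& b) {
        if (succ[a].insert(b).second) ++indeg[b];
    };
    for (const auto& t : threads) {
        for (const auto& a : t) indeg.emplace(a, 0);       // register actions
        for (size_t i = 0; i + 1 < t.size(); ++i) edge(t[i], t[i + 1]);
    }
    for (const auto& hb : happens_before) {                // cross-thread edges
        indeg.emplace(hb.first, 0);
        indeg.emplace(hb.second, 0);
        edge(hb.first, hb.second);
    }
    std::vector<std::string> order;                        // Kahn's algorithm
    std::set<std::string> ready;
    for (const auto& kv : indeg)
        if (kv.second == 0) ready.insert(kv.first);
    while (!ready.empty()) {
        std::string a = *ready.begin();                    // arbitrary tie-break
        ready.erase(ready.begin());
        order.push_back(a);
        for (const auto& b : succ[a])
            if (--indeg[b] == 0) ready.insert(b);
    }
    return order;
}
```

Any order this produces keeps each thread's actions in program order and respects every happens-before edge, which is exactly what P needs.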

 

Now, we’re going to prove that we could use P to reason
about the exact behavior of P’, which simply means that, despite all the
optimizations that have been made to our code, we could still regard it
as being executed sequential-consistently.

 

Apparently, from the standpoint of main memory, there are just two kinds of actions performed by a CPU: store and load. So an execution of a program can be seen as consisting of only stores and loads. When we're reasoning about the behavior of a program on a given input, we are actually predicting the value seen by each load.
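That reasoning can be made concrete with a toy model (entirely my own sketch, not the paper's formalism): represent an execution as a total order of store/load actions, and let each load see the last preceding store to its location.

```cpp
#include <string>
#include <vector>

// A toy sequentially consistent semantics: an execution is a total order of
// actions, memory starts zeroed, and each load sees the value written by the
// last preceding store to the same location.
struct Action {
    bool is_store;
    std::string loc;
    int value;  // only meaningful for stores
};

// Returns the values seen by the loads, in the order they occur in P.
std::vector<int> values_seen(const std::vector<Action>& P) {
    std::vector<int> seen;
    for (size_t i = 0; i < P.size(); ++i) {
        if (P[i].is_store) continue;
        int v = 0;                                    // initial memory value
        for (size_t j = 0; j < i; ++j)
            if (P[j].is_store && P[j].loc == P[i].loc) v = P[j].value;
        seen.push_back(v);
    }
    return seen;
}
```

Feeding it the interleaving y=1; x=1; r1=x; r2=y from the earlier example yields {1, 1}, and no interleaving of that example yields {0, 0}.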

 

So let's consider a load L in P', and assume the value it sees is generated by a store S_visible. To prove that we can use P to reason about the behavior of P', we just have to show that S_visible is the last preceding store in P that stores to the same location.

 

Clearly, if S_visible and L are in the same thread in P', S_visible must be sequenced before L, because if it weren't, that would either imply a violation of C++03, 5p4 (which basically says that there shall not be conflicting accesses to the same scalar object between two adjacent sequence points) or conflict with the intra-thread semantic constraints (which basically say that it should at least appear that each thread is executed sequentially, letting each load see the value written by the preceding store).

 

However, what if S_visible and L reside in different threads? Then S_visible must happen-before L, because of the no-data-race rule.

 

So it turns out that S_visible either is-sequenced-before L or happens-before L. And here comes the really brilliant part. We already know that P' respects all the happens-before relations in the original program, so if S_visible happens-before L in P', then it must happen-before L in P. But what if S_visible is-sequenced-before L? Recall that P is sequentially consistent; it respects all the is-sequenced-before relations in the original program. So if S_visible is-sequenced-before L, it must still be sequenced before L in P. Hence, in either case, S_visible comes before L in P. So, to reach the final conclusion that we can use P to reason about the behavior of P', it remains to be shown that there are no other stores to the same location sandwiched between S_visible and L in P, because then, when we decide in P that L sees the value written by S_visible, it will actually be the case that L sees the value written by S_visible in P', too, which is what "reasoning about the behavior of P' using P" is all about.

 

Let's use the powerful reductio ad absurdum. Assume that there is another store S_middle sitting between S_visible and L in P. Since P clearly is data-race-free (P and P' are both consistent executions of the original data-race-free program), S_middle, S_visible, and L must be fully ordered (by is-sequenced-before or happens-before). However, S_visible and L are already ordered. So the only possibilities left are:

 

1. S_middle is-sequenced-before/happens-before S_visible

2. L is-sequenced-before/happens-before S_middle

3. S_visible is-sequenced-before/happens-before S_middle, which is-sequenced-before/happens-before L

 

The reason the first two aren't possible is that they contradict the ordering in P. Here's the logic: if S_middle is-sequenced-before/happens-before S_visible, then it will come before S_visible in P, too (because is-sequenced-before and happens-before are the two kinds of relations that P must respect), contradicting our assumption that S_middle sits between S_visible and L. The same reasoning applies to the second case.

 

To show that the last case isn't possible either, we just have to recall that P' is a consistent execution (which means every load sees the last preceding store to the same location). So if S_middle really came between S_visible and L in P (in the sense of the third case above), then, since P' has to respect the happens-before relations (and the is-sequenced-before relations in this case), S_middle would come between them in P', too, blocking the line of sight from L to S_visible and making L see the value written by S_middle, not S_visible, which would in turn make P' an inconsistent execution.

 

So it turns out this imaginary S_middle appears neither here nor there. Then it must not exist, which concludes the proof.
