
Re: SMP, barriers, etc.

From: Samuel Thibault
Subject: Re: SMP, barriers, etc.
Date: Mon, 28 Dec 2009 13:32:53 +0100
User-agent: Mutt/1.5.12-2006-07-14

Da Zheng, on Mon 28 Dec 2009 20:08:56 +0800, wrote:
> On 09-12-28 7:28 PM, Samuel Thibault wrote:
> >>> It's not more expensive than usual operations: it already has to do it
> >>> to process cache line invalidations for everything that is in the cache.
> >> I don't understand. Do you mean processing cache line invalidation in 
> >> local cache?
> > 
> > Yes: a processor already has to listen to what other processors want to
> > do with data that it has in its cache.
> Really?

Yes, so that

#include <pthread.h>

int x, y;

void *f(void *foo) { while (1) x++; }
void *g(void *foo) { while (1) y++; }

int main(void) {
        pthread_t tf, tg;
        pthread_create(&tf, NULL, f, NULL);
        pthread_create(&tg, NULL, g, NULL);
        pthread_join(tf, NULL); /* never returns: both loops run forever */
        return 0;
}
works appropriately.

> Is it scalable?

Nope. That's why most applications do not scale well on big parallel
machines, at least partly due to cache-line false sharing like that
exemplified above. That's one of the main things we teach our students
in Bordeaux.

> Didn't you say cache coherency in most architecture is done by
> software?

I'd need to get the precise quote that could express that, but no.

> >>>> That conditional store instruction needs to do more if it succeeds. It 
> >>>> has to
> >>>> invalidate cache lines specified by the monitored address in other 
> >>>> processors.
> >>>
> >>> Locked operations on Intel have to do the same :)
> >> Doesn't the intel processor maintain cache coherency by hardware?
> > 
> > Err, yes. But I guess that's also the case with the Alpha, no?
> No, I don't think so. I believe the Alpha has a more relaxed model.

Yes it has very relaxed rules about the ordering of cache coherency
visibility, but it still _has_ to execute the program above correctly,
thus maintain cache line coherency.

> > both the writing processor and the reading processor cooperate on
> > the ordering of the visibility of the changes.  That's much more
> > lightweight for the hardware cache coherency protocol and still
> > enough to implement locks, RCU lists etc.
> OK. Combined with what you said above, when a processor writes new
> values to memory, hardware invalidates all cache lines that contain
> these variables *immediately*.

It doesn't have to be completely immediate to get cache coherency. The
actual invalidation can be deferred up to the time when other processors
try to read/write it. That's what gives you potential out of order
visibility events.

> So when a write memory barrier instruction is executed, the processor
> has to remember the order of writes, so the read memory barrier
> instruction executed on another processor can somehow get the
> information?

In principle, yes. The actual implementation in the hardware cache
coherency protocol can be a barrier that prevents reordering, or
sequence numbers, etc.
