
Re: [Gluster-devel] Architecture advice


From: Gordan Bobic
Subject: Re: [Gluster-devel] Architecture advice
Date: Mon, 12 Jan 2009 20:00:44 +0000
User-agent: Thunderbird 2.0.0.19 (X11/20090107)

Martin Fick wrote:
> --- On Mon, 1/12/09, Gordan Bobic <address@hidden> wrote:
> ...
>>> No need for fencing simply because you now use the HA
>>> translator. The assumption in this case is that the servers can
>>> still talk to each other, but that one server's connection to
>>> the clients may have died.
>>
>> That means that 50% of the scope for failure will still wipe you
>> out, because you'll start split-braining. Not the way forward at
>> all. A fencing setup will at least preserve data integrity.
>
> Fencing won't help either without cooperation, see below...


>> The correct way to handle comms channel failure between client
>> and server is to have bonded interfaces going via different
>> physical paths. _ONLY_ dealing with the situation where both
>> servers are alive and connected to each other, but we can only
>> reach one due to an obscure failure somewhere in the network
>> (e.g. a failed switch port or a failed NIC in the server), is a
>> pretty half-arsed edge case.
>
> Why is that the correct way? There's nothing wrong with having
> "bonding" at the glusterfs protocol level, is there?

The problem is that it only covers a very narrow edge case that isn't all that likely. A bonded NIC over separate switches all the way to both servers is a much more sensible option. Or else what failure are you trying to protect yourself against? It's a bit like fitting a big padlock on the door when there's a wall missing.
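Something along these lines, for example (a sketch of a Linux active-backup bond with each slave NIC cabled to a different switch; device names and addresses are illustrative, RHEL-style network-scripts assumed):

    # /etc/modprobe.conf -- load the bonding driver in active-backup
    # mode, checking link state every 100ms
    alias bond0 bonding
    options bond0 mode=active-backup miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.0.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 -- cabled to switch A;
    # ifcfg-eth1 is identical apart from DEVICE, cabled to switch B
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

With that in place, a failed NIC, cable, or switch port just fails the bond over to the surviving path, with no involvement from glusterfs at all.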

> That is somewhat what the HA translator is, except that it is
> supposed to take care of some additional failures. It is supposed
> to retransmit "in progress" operations that have not succeeded
> because of comm failures (I have yet to figure out where in the
> code this happens, though).

This is a reinvention of the wheel. NFS already handles this gracefully for the use case you are describing.
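For the sake of illustration: with a standard hard NFS mount, the client's RPC layer keeps retransmitting an in-flight operation until a server answers, so a failover behind a floating IP is invisible to applications (host and paths here are made up):

    # hard-mounted NFS: stalled operations are retried transparently
    # until the server (or its failover twin) responds
    mount -o hard,intr,timeo=600,retrans=2 nfshost:/export /mnt/data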

>> Why re-invent the wheel when the tools to deal with these
>> failure modes already exist?
>
> Are you referring to bonding here? If so, see above for why HA may
> be better (or an additional benefit).

My original point is that it doesn't add anything new that you couldn't achieve with tools that are already available.

>>> Any failures on the server side may still warrant a fencing
>>> setup, but AFR is not yet set up to work cooperatively with a
>>> fencing setup.
>>
>> It doesn't have to be. If one server in AFR dies, nothing
>> spectacular happens. Things time out and carry on. I don't see
>> what cooperation there would need to be. RHCS does its own
>> heart-beating and fencing. Mix and match as required.
>
> Yes, if a server goes down you are fine (aside from the scenario
> where the other server then goes down, followed by the first one
> coming back up). But if you are using the HA translator above and
> the communication goes down between the two servers, you may still
> get split brain (hence the need for heartbeat/fencing).

And therein lies the problem - unless you are proposing adding a complete fencing infrastructure into glusterfs, too.

> But even with the current write logging in AFR, there are possible
> split-brain scenarios which cannot be avoided even with
> heartbeat/fencing (yet). Any time two different clients try to
> write to the same area of the filesystem and the network is
> partitioned, there is a chance that they each succeed and fail on
> opposite servers, causing split brain. There is nothing heartbeat
> can do about this except attempt to mitigate the problem by
> intervening. But heartbeat has no hooks to know when this happens,
> so by the time heartbeat intervenes, "half writes" to each server
> may have occurred that cannot be undone. That is the reason you
> really need cooperation between AFR and some other tool (such as
> heartbeat).

No, that's the whole point. You DON'T need that cooperation. If AFR is server-side and the servers disconnect from each other, the cluster heartbeat will drop too, which will initiate fencing and failover, hard-power-off the failed server, and everything lives happily ever after. GlusterFS doesn't need to be aware; one node simply disappears. That's about the size of it. When it comes back, any files that were written to in the meantime will have newer timestamps, so on the next read they'll get synced back to the re-joined node.
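To make that concrete, the fencing lives entirely in the cluster stack. A stripped-down RHCS cluster.conf with IPMI power fencing, roughly like this sketch (node names, addresses, and credentials are of course placeholders), is all it takes:

    <?xml version="1.0"?>
    <cluster name="glustercluster" config_version="1">
      <cman two_node="1" expected_votes="1"/>
      <clusternodes>
        <clusternode name="server1" nodeid="1">
          <fence>
            <method name="1">
              <device name="ipmi-server1"/>
            </method>
          </fence>
        </clusternode>
        <clusternode name="server2" nodeid="2">
          <fence>
            <method name="1">
              <device name="ipmi-server2"/>
            </method>
          </fence>
        </clusternode>
      </clusternodes>
      <fencedevices>
        <fencedevice agent="fence_ipmilan" name="ipmi-server1"
                     ipaddr="10.0.0.101" login="admin" passwd="secret"/>
        <fencedevice agent="fence_ipmilan" name="ipmi-server2"
                     ipaddr="10.0.0.102" login="admin" passwd="secret"/>
      </fencedevices>
    </cluster>

When the heartbeat is lost, the surviving node power-cycles its peer over IPMI before carrying on; glusterfs itself never has to know any of this is happening.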

> AFR needs to be able to write all-or-nothing to all servers until
> some external policy machine (such as heartbeat) decides that it is
> safe (because of fencing or some other mechanism) to proceed
> writing to only a portion of the subvolumes (servers). Without
> this, I don't see how you can prevent split brain.

With server-side AFR, split brain cannot really occur (OK, there's a tiny window of opportunity for it if the server isn't really totally dead, since there's no total FS lock-out until fencing completes like on GFS, but it's probably close enough). If the servers can't heartbeat to each other, they can't AFR to each other, either. So either the write gets propagated, or it doesn't. The machine that remained operational will have the more up-to-date files, and those will get synced back as necessary. It's not quite as tight on data consistency as a DRBD+GFS solution would be, but it is probably close enough for most use cases.
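For reference, the server-side AFR layout I mean is roughly this (a sketch in the 2.0-series spec-file syntax from memory; hostnames, paths, and volume names are illustrative, and the second server gets the mirror image of it):

    # server1 glusterfsd spec -- export a local brick, mirror to server2
    volume brick
      type storage/posix
      option directory /data/export
    end-volume

    # the other server's brick, reached over the network
    volume remote-brick
      type protocol/client
      option transport-type tcp
      option remote-host server2
      option remote-subvolume brick
    end-volume

    # AFR across the local and remote bricks
    volume afr
      type cluster/afr
      subvolumes brick remote-brick
    end-volume

    # what the clients actually connect to
    volume server
      type protocol/server
      option transport-type tcp
      option auth.addr.afr.allow *
      subvolumes afr
    end-volume

Clients then mount the "afr" volume from either server; if one box is fenced off, the other keeps serving, and self-heal brings the fenced box back up to date when it returns.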

Gordan



