Re: [Gluster-devel] Re; Load balancing ...


From: Krishna Srinivas
Subject: Re: [Gluster-devel] Re; Load balancing ...
Date: Tue, 29 Apr 2008 11:29:49 +0530

We did discuss a journaling translator, but implementation-wise it leads to
a lot of complications:
* A journal has to be maintained, which would require a huge amount of disk space.
* Replaying the journal can cause race conditions (consider two or more
  clients writing to the same offset).

A better solution would be to maintain a list of dirty blocks and use it during
selfheal.
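
To make that concrete, the tracking could be as simple as a per-file bitmap
that the write path marks whenever one replica misses a write, and that
selfheal walks afterwards so only the marked blocks are copied. The sketch
below is purely illustrative; none of these structures or names exist in the
glusterfs code.

#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

#define DIRTY_BLOCK_SIZE (128 * 1024)   /* granularity of the tracking */

/* Purely illustrative; no structure like this exists in glusterfs. */
struct dirty_map {
    uint8_t *bits;       /* one bit per DIRTY_BLOCK_SIZE block */
    size_t   nblocks;
};

static struct dirty_map *dirty_map_new(size_t file_size)
{
    struct dirty_map *m = calloc(1, sizeof(*m));

    if (!m)
        return NULL;
    m->nblocks = (file_size + DIRTY_BLOCK_SIZE - 1) / DIRTY_BLOCK_SIZE;
    m->bits    = calloc((m->nblocks + 7) / 8, 1);
    if (!m->bits) {
        free(m);
        return NULL;
    }
    return m;
}

/* Write path: remember which blocks a down/failed replica missed. */
static void dirty_map_mark(struct dirty_map *m, off_t offset, size_t len)
{
    if (len == 0)
        return;

    size_t first = (size_t)offset / DIRTY_BLOCK_SIZE;
    size_t last  = ((size_t)offset + len - 1) / DIRTY_BLOCK_SIZE;

    for (size_t b = first; b <= last && b < m->nblocks; b++)
        m->bits[b / 8] |= (uint8_t)(1u << (b % 8));
}

/* Selfheal: only blocks whose bit is set need to be copied from the good copy. */
static int dirty_map_is_dirty(const struct dirty_map *m, size_t block)
{
    return (m->bits[block / 8] >> (block % 8)) & 1;
}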

Krishna

On Tue, Apr 29, 2008 at 6:11 AM, Gareth Bult <address@hidden> wrote:
> Hi,
>
>  I must say I find the idea of a journal approach quite appealing, although 
> the split-brain problem is an issue .. that said AFR volumes already have a 
> split-brain problem .. unplugging a network lead between two AFR sub-volumes 
> is an easy demonstration of this .. both servers will assume the other is 
> down and carry on .. would adding a journal make the issue any worse?
>
>  (or am I missing something?)
>
>  In terms of a real use-case, I've had lots of cluster issues relating to 
> single nodes becoming unavailable for short periods. With the exception of 
> "heartbeat" screwing up a DRBD setup (which was an internal software failure, 
> rather than anything we would be looking to protect against) I've never 
> experienced two nodes becoming isolated and potentially suffering from 
> split-brain. (I accept it can/does happen, but I'm thinking it's not an 
> everyday occurrence)
>
>  So ... a journal would not be a perfect solution; however, a very limited 
> amount of split-brain protection might be considered a "pretty good" solution 
> in context, and it would provide excellent recovery metrics in most cases.
>
>  ??
>
>  In terms of work, I'm guessing each write operation would need to put an 
> additional (serial, path, offset, bytes, data) record into the journal volume .. each 
> data volume would need to keep track of its most recent serial, then mount would 
> need to check the journal and run a playback for each sub-volume whose serial 
> isn't up to the journal's most recent serial ...
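
For illustration, such a record and the mount-time playback might look roughly
like the sketch below. The record layout, struct and function names are all
invented for the example; nothing like this exists in glusterfs today.

#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

/* Purely illustrative on-disk record; not anything glusterfs actually writes. */
struct journal_rec {
    uint64_t serial;      /* monotonically increasing per journal */
    char     path[4096];  /* file the write applied to */
    uint64_t offset;      /* where in the file */
    uint32_t bytes;       /* payload length following this header */
};

/* Replay every record newer than what this sub-volume has already applied. */
static int journal_replay(int journal_fd, uint64_t subvol_serial)
{
    struct journal_rec rec;

    while (read(journal_fd, &rec, sizeof(rec)) == (ssize_t)sizeof(rec)) {
        char *data = malloc(rec.bytes);

        if (!data || read(journal_fd, data, rec.bytes) != (ssize_t)rec.bytes) {
            free(data);
            return -1;
        }
        if (rec.serial > subvol_serial) {
            /* this sub-volume missed the write: re-apply it */
            int fd = open(rec.path, O_WRONLY);
            if (fd >= 0) {
                pwrite(fd, data, rec.bytes, (off_t)rec.offset);
                close(fd);
            }
        }
        free(data);
    }
    return 0;
}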
>
>  If all this is done in a journal translator .. it doesn't "sound" too 
> onerous, nor does it sound like it would involve changing any other code ... ??
>
>  Gareth.
>
>
>
>  ----- Original Message -----
>  From: "Gordan Bobic" <address@hidden>
>
> To: "gluster-devel" <address@hidden>
>  Sent: Monday, April 28, 2008 7:56:16 PM GMT +00:00 GMT Britain, Ireland, 
> Portugal
>  Subject: Re: [Gluster-devel] Re; Load balancing ...
>
>
>
> Martin Fick wrote:
>
>  > May I suggest an alternate approach?  The rsync model
>  > seems like a nice one when you have no idea what the
>  > changes are, but with the glusterfs AFR it is possible
>  > to keep track of the changes.  What about adding a
>  > journaling volume option to the AFR translator?
>
>  Sounds like you are effectively describing an extent based volume, very
>  similar to what DRBD does to limit the amount of sync required.
>
>  > So if changes cannot be written to Sub B, they would
>  > be recorded in Journal A.  When B comes back up and
>  > AFR notices a mismatch between a file on Sub A and Sub
>  > B, instead of immediately querying Sub A for the file
>  > contents it could query Journal A first to see if the
>  > changes to the file are stored there.  If so, Journal
>  > A could reply with just the changes instead of the
>  > whole file, and AFR could then apply the changes to Sub
>  > B.
>
>  Splitbrain handling of this would be impossible, and one version would
>  always have to win. But other than that, I can see that would work.
>
>  > The journal volume would not actually be required and
>  > would be space-limited; it would simply drop changes
>  > that it can no longer keep track of.  If the journal
>  > does not have the change logged, everything would
>  > proceed as it does today: the subvolume would be
>  > queried for the whole file.  This would be a little
>  > like the DRBD model, but more in line with the gluster
>  > way of doing things.  It would be better than what
>  > DRBD does since it would be more granular.  When space
>  > for changes runs out, whole files might have to be
>  > synced, but not necessarily the whole filesystem!
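
One way to picture that space-limited journal is as a bounded ring of change
records: new writes overwrite the oldest entries, and a stale copy can only be
caught up from the journal if its last applied sequence number has not yet
been overwritten; otherwise the whole file has to be synced, exactly as today.
The sketch below is illustrative only; every name in it is invented.

#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define RING_SLOTS 4096

/* All of this is invented for illustration; it is not glusterfs code. */
struct change_rec {
    uint64_t seq;        /* global sequence number of the write */
    char     path[256];  /* which file was touched */
    off_t    offset;
    uint32_t length;
};

static struct change_rec ring[RING_SLOTS];
static uint64_t next_seq = 1;   /* next sequence number to hand out */

/* Record a write; the oldest entries are silently overwritten when space runs out. */
static void journal_record(const char *path, off_t offset, uint32_t length)
{
    struct change_rec *r = &ring[next_seq % RING_SLOTS];

    r->seq    = next_seq++;
    r->offset = offset;
    r->length = length;
    strncpy(r->path, path, sizeof(r->path) - 1);
    r->path[sizeof(r->path) - 1] = '\0';
}

/*
 * Can the journal still bring a copy whose last applied write was
 * 'applied_seq' up to date?  If the ring has wrapped past that point the
 * answer is no, and the caller falls back to syncing whole files, just as
 * self-heal does today.
 */
static int journal_covers(uint64_t applied_seq)
{
    uint64_t oldest = next_seq > RING_SLOTS ? next_seq - RING_SLOTS : 1;

    return applied_seq + 1 >= oldest;
}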
>
>  I think having an rsync type syncing algorithm that can operate on the
>  whole file would be more flexible and potentially provide enough of an
>  improvement to make the complication of adding journals/extents not
>  worthwhile.
>
>  > I realize that this a major enhancement, and would be
>  > a lot of work, but then again, so probably would the
>  > rsync model implementation, would it not?
>
>  I haven't looked at the GlusterFS code (yet), but I would imagine that
>  implementing rsync-like file sync would be _much_ less work than
>  implementing extents/journals/undo logs.
>
>  > The
>  > advantage here is that consistency would be assured.
>
>  That is arguably fairly academic. Just use the rolling hash for rsync
>  that is big enough that the probability of a false negative in the
>  hashed block is around the same as the probability of a media error.
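
For reference, the rolling checksum rsync uses is itself only a cheap 32-bit
first-pass filter; each candidate match is then confirmed with a strong
per-block hash (MD4 in older rsync versions, MD5 in newer ones), and it is the
width of that strong hash that determines how unlikely the false matches
discussed here are. A sketch of the rolling part, purely for illustration:

#include <stdint.h>
#include <stddef.h>

/* rsync-style weak rolling checksum over a sliding window of 'len' bytes. */
struct rolling_sum {
    uint32_t a, b;
    size_t   len;        /* window size */
};

static void rsum_init(struct rolling_sum *s, const unsigned char *buf, size_t len)
{
    s->a = 0;
    s->b = 0;
    s->len = len;
    for (size_t i = 0; i < len; i++) {
        s->a += buf[i];
        s->b += (uint32_t)(len - i) * buf[i];
    }
}

/* slide the window one byte forward: drop 'out', take in 'in' */
static void rsum_roll(struct rolling_sum *s, unsigned char out, unsigned char in)
{
    s->a = s->a - out + in;
    s->b = s->b - (uint32_t)s->len * out + s->a;
}

static uint32_t rsum_digest(const struct rolling_sum *s)
{
    return (s->a & 0xffff) | ((s->b & 0xffff) << 16);
}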
>
>  > The tradeoff between the journal and the rsync model
>  > is one of disk space for the journal versus CPU time
>  > for the rsync model.  Certainly both could be
>  > implemented: the journal could be queried first, and
>  > if that fails, the rsync method could be used!
>  > Thoughts?
>
>  In the ideal world - yes. In practice, I think that just adding rsync
>  capability for partial syncs would give most of the benefits for
>  relatively little effort in terms of implementation.
>
>  Gordan
>
>
>  _______________________________________________
>  Gluster-devel mailing list
>  address@hidden
>  http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>



