
[Gluster-devel] Proposal to change locking in data-self-heal


From: Pranith Kumar Karampuri
Subject: [Gluster-devel] Proposal to change locking in data-self-heal
Date: Tue, 21 May 2013 09:10:18 -0400 (EDT)

Hi,
    This idea was proposed by Brian Foster as a solution to several hangs we
have seen when a truncate races with a self-heal, or when two self-heals are
triggered on the same file.

Problem:
Scenario-1:
At the moment, when data-self-heal is triggered on a file, any additional
full-file locks are blocked until the self-heal completes. Because of this,
truncate fops hang for the duration of the self-heal.

Scenario-2:
While a self-heal is in progress, if another self-heal is triggered on the same
file, its lock request is placed in the blocked queue. Because of the presence
of this blocked lock, subsequent locks requested by writes on the file are
moved to the blocked queue as well.

Both these scenarios lead to user-perceivable interim hangs.

A little bit of background:
At the moment data-self-heal acquires locks in the following pattern. It takes
a full-file lock, then gets the xattrs on the file on both replicas and decides
sources/sinks based on those xattrs. It then acquires a lock on the 0-128k
range and unlocks the full-file lock, syncs the 0-128k range from source to
sink, acquires a lock on 128k+1-256k and unlocks 0-128k, syncs 128k+1-256k, and
so on. Finally it takes the full-file lock again, unlocks the last small-range
lock, decrements the pending counts, and then unlocks the full-file lock.
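
Roughly, that sequence looks like the following minimal, compilable sketch (not
the actual AFR code): take_lock, release_lock and sync_chunk are hypothetical
stand-ins for the blocking inodelk and read/write calls, and, as in the locks
xlator, a length of 0 means "till end of file", i.e. a full-file lock. The
thing to notice is that the next lock is always acquired before the previous
one is released, so some lock is held at every instant.

#include <stdio.h>

#define CHUNK (128 * 1024L)

/* hypothetical stand-ins for the real blocking inodelk and I/O calls */
static void take_lock    (long start, long len) { printf ("lock   %ld +%ld\n", start, len); }
static void release_lock (long start, long len) { printf ("unlock %ld +%ld\n", start, len); }
static void sync_chunk   (long start, long len) { printf ("sync   %ld +%ld\n", start, len); }

static void
current_data_self_heal (long file_size)
{
        long off = 0;

        take_lock (0, 0);               /* full-file lock                    */
        /* ... get xattrs on both replicas, decide sources/sinks ...         */
        take_lock (0, CHUNK);           /* first chunk, overlapping          */
        release_lock (0, 0);            /* only now drop the full-file lock  */

        for (;;) {
                sync_chunk (off, CHUNK);
                if (off + CHUNK < file_size) {
                        take_lock (off + CHUNK, CHUNK); /* next chunk ...    */
                        release_lock (off, CHUNK);      /* ... then current  */
                        off += CHUNK;
                } else {
                        take_lock (0, 0);               /* full-file again   */
                        release_lock (off, CHUNK);      /* last chunk        */
                        /* ... decrement pending (changelog) counts ...      */
                        release_lock (0, 0);
                        return;
                }
        }
}

int
main (void)
{
        current_data_self_heal (300 * 1024L);
        return 0;
}
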
     This pattern of locks is chosen to prevent more than one self-heal from
being in progress at a time. BUT if another self-heal tries to take a full-file
lock while a self-heal is already in progress, it will be put in the blocked
queue, and further inodelks from writes by the application will also be put in
the blocked queue because of the way the locks xlator grants inodelks: a lock
that conflicts with an already-blocked lock is itself blocked, even if it does
not conflict with any granted lock. Here is the code:

xlators/features/locks/src/inodelk.c - line 225

        if (__blocked_lock_conflict (dom, lock) && !(__owner_has_lock (dom, lock))) {
                ret = -EAGAIN;
                if (can_block == 0)
                        goto out;

                gettimeofday (&lock->blkd_time, NULL);
                list_add_tail (&lock->blocked_locks, &dom->blocked_inodelks);

Solution:
Since we want to prevent two parallel self-heals, we let them compete in a
separate "domain". Let's call the domain in which the locks have been taken in
the previous approach the "data-domain".

In the new approach, when a self-heal is triggered it acquires a full-file lock
in a new domain, "self-heal-domain".
    After this it performs the data-self-heal using locks in "data-domain" in
the following manner (a sketch of the whole sequence follows these steps):
    Acquire the full-file lock, get the xattrs on the file, decide
source/sinks, and unlock the full-file lock.
    Acquire a lock on the range 0 - 128k, sync the data from source to sinks in
the range 0 - 128k, unlock the 0 - 128k lock.
    Acquire a lock on the range 128k+1 - 256k, sync the data from source to
sinks in the range 128k+1 - 256k, unlock the 128k+1 - 256k lock.
    .....
    Repeat until the end of the file is reached.
    Acquire the full-file lock, decrement the pending counts, then unlock the
full-file lock.
Unlock the full-file lock in "self-heal-domain".
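
For comparison, here is a matching sketch of the proposed sequence, again with
hypothetical take_lock/release_lock/sync_chunk helpers; the extra first
argument names the lock domain (in GlusterFS terms this would be the "volume"
string sent with each inodelk, and the locks xlator keeps domains independent,
so locks in different domains never conflict). Unlike the current scheme, no
"data-domain" lock is held between two chunk syncs, while the
"self-heal-domain" full lock stays held throughout to serialize self-heals:

#include <stdio.h>

#define CHUNK (128 * 1024L)

/* hypothetical stand-ins; "dom" is the lock domain the inodelk is sent in */
static void take_lock (const char *dom, long start, long len)
{
        printf ("lock   [%s] %ld +%ld\n", dom, start, len);
}

static void release_lock (const char *dom, long start, long len)
{
        printf ("unlock [%s] %ld +%ld\n", dom, start, len);
}

static void sync_chunk (long start, long len)
{
        printf ("sync   %ld +%ld\n", start, len);
}

static void
proposed_data_self_heal (long file_size)
{
        long off;

        /* serializes self-heals among themselves only */
        take_lock ("self-heal-domain", 0, 0);

        take_lock ("data-domain", 0, 0);        /* full-file lock           */
        /* ... get xattrs on both replicas, decide sources/sinks ...        */
        release_lock ("data-domain", 0, 0);

        for (off = 0; off < file_size; off += CHUNK) {
                take_lock ("data-domain", off, CHUNK);
                sync_chunk (off, CHUNK);
                release_lock ("data-domain", off, CHUNK);
                /* no data-domain lock is held here, so a waiting
                 * truncate/write full-file lock can be granted now */
        }

        take_lock ("data-domain", 0, 0);        /* full-file lock again     */
        /* ... decrement pending (changelog) counts ...                     */
        release_lock ("data-domain", 0, 0);

        release_lock ("self-heal-domain", 0, 0);
}

int
main (void)
{
        proposed_data_self_heal (300 * 1024L);
        return 0;
}
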

Scenario-1 won't happen because truncate gets a chance to acquire its full-file
lock after any 128k-range sync completes, since no "data-domain" lock is held
between ranges.
Scenario-2 won't happen because extra self-heals launched on the same file will
be blocked in "self-heal-domain", so the data path's locks are not affected by
them.

Let me know if you see any problems with this approach, or have suggestions.

Pranith.


