Re: [Gluster-devel] debugging ping timeouts

From:	Niels de Vos
Subject:	Re: [Gluster-devel] debugging ping timeouts
Date:	Wed, 26 Mar 2014 00:37:52 +0100
User-agent:	Mutt/1.5.20 (2009-12-10)

On Fri, Mar 21, 2014 at 05:25:29AM -0400, Pranith Kumar Karampuri wrote:
> hi,
>     I do not think glusterfs can currently tell why 
>     a ping-timeout happened. And by the time a user learns that such 
>     an event happened, the client will have disconnected and reconnected, 
>     so we cannot debug the issue any more. One of the reasons why 
>     ping-timeouts may happen is that the epoll thread is busy doing 
>     something, most probably waiting on a mutex lock. So I am thinking 
>     maybe we should record some extra information before and after 
>     acquiring locks, plus the duration of critical section executions, 
>     and report them at the time of disconnect.
> 
> pseudo code:
> 
> PTHREAD_MUTEX_LOCK(lock) {
>      get the current time T1;
>      pthread_mutex_lock (lock);
>      get the current time T2;
>      if T2-T1 (wait time) is greater than the recorded maximum, update it
>      // maybe we should also remember the xlator in which it happened
> }
> 
> PTHREAD_MUTEX_UNLOCK(lock) {
>      get the current time T3;
>      pthread_mutex_unlock (lock);
>      if T3-T2 (hold time) is greater than the recorded maximum, update it
> }
> 
> Something similar should be done for spin_locks as well.
> 
> When a disconnect event comes this information will be logged along 
> with disconnect messages.
> 
> If you can think of anything else, please add it to the thread, and we 
> will make a call after a while on what can be done to debug 
> such issues further.
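
For reference, the pseudo code above could be sketched in C roughly as 
follows. This is only an illustration, not glusterfs code: timed_mutex_t 
and the timed_mutex_*() helpers are hypothetical names, the maxima are 
kept per lock rather than per xlator, and the hold time is recorded just 
before unlocking (not after, as in the pseudo code) so that the counters 
stay protected by the mutex itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical wrapper type; not part of the glusterfs code base. */
typedef struct {
    pthread_mutex_t mutex;
    struct timespec acquired;   /* T2: when the lock was last obtained */
    int64_t max_wait_ns;        /* longest T2 - T1 seen so far */
    int64_t max_hold_ns;        /* longest T3 - T2 seen so far */
} timed_mutex_t;

/* Nanoseconds elapsed between two CLOCK_MONOTONIC timestamps. */
static int64_t elapsed_ns(const struct timespec *from,
                          const struct timespec *to)
{
    return (int64_t)(to->tv_sec - from->tv_sec) * 1000000000LL
           + (to->tv_nsec - from->tv_nsec);
}

static int timed_mutex_init(timed_mutex_t *tm)
{
    assert(tm != NULL);
    tm->max_wait_ns = 0;
    tm->max_hold_ns = 0;
    return pthread_mutex_init(&tm->mutex, NULL);
}

static int timed_mutex_lock(timed_mutex_t *tm)
{
    struct timespec t1, t2;
    int ret;

    assert(tm != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    ret = pthread_mutex_lock(&tm->mutex);
    if (ret != 0)
        return ret;
    clock_gettime(CLOCK_MONOTONIC, &t2);

    /* We hold the mutex here, so updating the statistics is safe. */
    int64_t waited = elapsed_ns(&t1, &t2);
    if (waited > tm->max_wait_ns)
        tm->max_wait_ns = waited;
    tm->acquired = t2;
    return 0;
}

static int timed_mutex_unlock(timed_mutex_t *tm)
{
    struct timespec t3;

    assert(tm != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t3);

    /* Record the hold time while the mutex still protects the fields. */
    int64_t held = elapsed_ns(&tm->acquired, &t3);
    if (held > tm->max_hold_ns)
        tm->max_hold_ns = held;
    return pthread_mutex_unlock(&tm->mutex);
}
```

At disconnect time, max_wait_ns and max_hold_ns could then be logged 
along with the disconnect message. Note that the two extra 
clock_gettime() calls add a small cost to every lock and unlock.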

If I were asked how to debug this ;-), I would write 
a SystemTap script. It is obviously not part of the glusterfs code base, 
but it is one of the options that we as support engineers can run 
without modifying existing deployments.

The attached script should be a good start for debugging locking issues 
in glusterfs processes. It has been tested on Fedora 20, but should 
also work on RHEL and its derivatives. You need to install 
kernel-devel for the running kernel (SystemTap builds a kernel module 
for the tracing), glusterfs-debuginfo, and possibly the -debuginfo 
packages for dependent libraries. The script itself is plain text, and 
its comments should be clear enough to explain how to use it.

Good luck!

-- 
Niels de Vos
Sr. Software Maintenance Engineer
Support Engineering Group
Red Hat Global Support Services

log-locks.stp
Description: Text document
