

From: Guido Smit
Subject: Re: [Gluster-devel] Re: Timeout settings and self-healing ? (WAS: HA failover test unsuccessful (inaccessible mountpoint))
Date: Wed, 23 Apr 2008 12:47:50 +0200
User-agent: Thunderbird 2.0.0.12 (Windows/20080213)

Krishna,

I did the test: I killed glusterfsd on one server.
All tests (ls, df, cp) worked as they should; I didn't even notice any difference. Unplugging the cable, however, blocked all operations, and after a few minutes
the "Transport endpoint is not connected" message appeared.
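
Editorial sketch: one plausible reason for the difference, not confirmed anywhere in this thread, is that a killed process has its sockets closed by the kernel, so the client sees the disconnect at once, whereas an unplugged cable sends nothing and the client can only wait for a timeout. The Python below illustrates that general TCP behaviour on localhost; it is not GlusterFS code, and `run_peer`, `probe`, and the mode names are hypothetical.

```python
import socket
import threading
import time


def run_peer(listener, mode, release):
    """Accept one connection, then either close it or stay silent."""
    conn, _ = listener.accept()
    if mode == "killed":
        conn.close()      # like a killed process: the socket is closed right away
    else:
        release.wait()    # like an unplugged cable: the peer never says anything
        conn.close()


def probe(mode, timeout=1.0):
    """Return (outcome, seconds waited) for a client reading from the peer."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    release = threading.Event()
    peer = threading.Thread(target=run_peer, args=(listener, mode, release))
    peer.start()

    client = socket.create_connection(listener.getsockname())
    client.settimeout(timeout)        # stands in for a response timeout
    start = time.monotonic()
    try:
        data = client.recv(1)
        outcome = "closed" if data == b"" else "data"
    except socket.timeout:
        outcome = "timeout"
    elapsed = time.monotonic() - start

    release.set()                     # let the silent peer finish
    peer.join()
    client.close()
    listener.close()
    return outcome, elapsed
```

Calling `probe("killed")` returns as soon as the close is seen, while `probe("silent")` only returns once the client-side timeout expires.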


Krishna Srinivas wrote:
Guido,
Do you see the same behavior if you kill one of the server processes
instead of unplugging the cable?
Can you "cd" out of glusterfs mount point and "cd" back in after you
get the first "transport endpoint not connected" and see if you
still see the error?
Do you see "transport endpoint" error for all operations you do on
the mount point?
Thanks
Krishna


On Tue, Apr 22, 2008 at 1:19 PM, Guido Smit <address@hidden> wrote:
 My server configs:

 http://glusterfs.pastebin.com/m3f82f264

 One of the client configs:
 http://glusterfs.pastebin.com/d5df7fab

 My problem is that when one of the storage servers is unplugged, I always get
 the "Transport endpoint is not connected" message.




 Krishna Srinivas wrote:
 Guido,

Can you give the setup details and conf files?
You can use http://glusterfs.pastebin.com for pasting conf files.

Thanks
Krishna

On Fri, Apr 4, 2008 at 2:40 PM, Anand Avati <address@hidden> wrote:


 Daniel/Guido,
 can you paste the relevant logs, from the time you unplugged the
 cable until the end of the experiment?

 avati

 2008/4/3, Daniel Maher <address@hidden>:



 > On Thu, 3 Apr 2008 14:55:48 +0530 "Anand Avati" <address@hidden>
 > wrote:
 >
 > > Daniel,
 > > maybe it is just taking long to detect connection failure. Can you
 > > try with 'option transport-timeout 20' (sets response timeout to 20
 > > seconds) in all your protocol/client and see if you still face the
 > > 'hang' ?
 >
 > My simple test case is as follows :
 > 1. Unplug one of the nodes (dfsD)
 > 2. Attempt to ls -l the /opt/ (in which gfs-mount/ - the mountpoint -
 > is contained)
 >
 > I set the timeout option along with every client instance in both the
 > client and server configs. I tested timeout settings of 10 and 20
 > seconds (just to see). In both cases, the 'hang' releases after a while
 > (approx 30 seconds), but the results are odd. For example :
 >
 > # ls -l
 > (hang ~ 30 seconds)
 > ls: cannot access gfs-mount: Transport endpoint is not connected
 > total 0
 > d????????? ? ? ? ? ? gfs-mount
 >
 > # ls -l
 > (immediate)
 > ls: cannot access gfs-mount: Transport endpoint is not connected
 > total 0
 > d????????? ? ? ? ? ? gfs-mount
 >
 > (user wait ~ 5 seconds)
 >
 > # ls -l
 > total 8
 > drwxr-xr-x 2 root root 4096 2008-04-03 09:43 gfs-mount
 >
 > It would appear that the "recovery" time, regardless of whether the
 > timeout is set to 10 or 20, is around 35 to 40 seconds - though, at the
 > very least, it recovered. Is there any reasonable way to bring this
 > period of time down ?
 >
 > Thank you all so much for your feedback on this topic !
 >
 >
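
Editorial sketch: the hang and recovery times in the test above were estimated by hand. A small Python probe like the following could log them more precisely. It assumes the mountpoint is /opt/gfs-mount as in the test case; `probe_path` and `watch` are hypothetical helper names, and the script only observes the mount without changing any GlusterFS setting.

```python
import errno
import os
import time


def probe_path(path):
    """Stat a path once; return (status, seconds the call blocked)."""
    start = time.monotonic()
    try:
        os.stat(path)
        status = "ok"
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            # ENOTCONN is the errno behind "Transport endpoint is not connected".
            status = "not-connected"
        else:
            status = "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return status, time.monotonic() - start


def watch(path, interval=1.0, rounds=60):
    """Probe the path repeatedly and print one timestamped line per probe."""
    for _ in range(rounds):
        status, blocked = probe_path(path)
        print(f"{time.strftime('%H:%M:%S')} {status} blocked={blocked:.1f}s")
        time.sleep(interval)
```

Running `watch("/opt/gfs-mount")` just before unplugging the node would show how long the first call blocks and when the status returns to `ok`.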








--
Kind regards,

Guido Smit
ComLog B.V.

Televisieweg 133
1322 BE Almere
T. 036 5470500
F. 036 5470481






