Re: [Gluster-devel] 2.0.9 hanging bug


From: Brian Hirt
Subject: Re: [Gluster-devel] 2.0.9 hanging bug
Date: Thu, 23 Sep 2010 00:31:00 -0600

So 3.0.5 has turned out to be even more of a nightmare than 2.0.9 -- at least 
2.0.9 worked until a machine shut down unexpectedly.

At the stroke of midnight all my clients got stale NFS file handles and were no 
longer able to access files on the filesystem.  I had to kill glusterfs and all 
programs with open files, then unmount and remount.

Here is a sample of the errors in the client logs:
        [2010-09-23 00:00:00] W [fuse-bridge.c:725:fuse_attr_cbk] glusterfs-fuse: 3408010: LOOKUP() / => -1 (Stale NFS file handle)
        [2010-09-23 00:00:16] W [fuse-bridge.c:725:fuse_attr_cbk] glusterfs-fuse: 3408056: LOOKUP() / => -1 (Stale NFS file handle)

There are no errors or warnings in the server logs.

Clients are running 3.0.5 on Ubuntu 9.04/9.10.

Does anyone have any idea what is going on?  I'm losing all confidence in this 
product.


On Sep 22, 2010, at 11:35 AM, Brian Hirt wrote:

> We have a four-node setup where remote1/remote2 and remote3/remote4 
> replicate and then the filesystem is distributed between them.  Well, our 
> remote3 crashed hard and all of the clients that mounted the filesystem 
> started to hang when trying to access anything on the mount point.
> 
> I never expected this to happen -- the whole point of a setup like this is to 
> allow for a machine to fail and still have your filesystem available and 
> accessible.
> 
> I've upgraded to 3.0.5 in the hopes that it's more reliable, but need to ask: 
>   Does anyone know if this bug has been fixed in the 3.x branch?  
> 
> Please advise
> 
> Thanks,
> 
> Brian
> 
> Client log entries at start of failure
> [2010-09-21 20:51:24] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame WRITE(14) frame sent = 2010-09-21 20:21:23. frame-timeout = 1800
> [2010-09-21 20:51:24] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:21:23. frame-timeout = 1800
> [2010-09-21 20:51:24] W [client-protocol.c:6045:protocol_client_interpret] 
> remote3: no frame for callid=629454 type=4 op=36
> [2010-09-21 20:51:44] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:21:41. frame-timeout = 1800
> [2010-09-21 20:51:44] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:21:41. frame-timeout = 1800
> [2010-09-21 20:52:04] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:22:01. frame-timeout = 1800
> [2010-09-21 20:52:04] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:22:01. frame-timeout = 1800
> [2010-09-21 20:52:34] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:22:28. frame-timeout = 1800
> [2010-09-21 20:52:34] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:22:28. frame-timeout = 1800
> [2010-09-21 20:52:54] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:22:48. frame-timeout = 1800
> [2010-09-21 20:52:54] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:22:48. frame-timeout = 1800
> [2010-09-21 20:53:04] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:23:00. frame-timeout = 1800
> [2010-09-21 20:53:04] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:23:00. frame-timeout = 1800
> [2010-09-21 20:53:24] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:23:21. frame-timeout = 1800
> [2010-09-21 20:53:24] E [client-protocol.c:309:call_bail] remote3: bailing 
> out frame FINODELK(36) frame sent = 2010-09-21 20:23:21. frame-timeout = 1800
> 
> 
> 
> 
> _______________________________________________
> Gluster-devel mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
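
For anyone reading the quoted call_bail entries above: each one records when the frame was sent and the frame-timeout in effect, so the time a request waited on remote3 before the client gave up can be computed directly. The following is a minimal sketch in Python, assuming the log format is exactly as quoted; the function name bail_delays and the SAMPLE string are hypothetical and only for illustration, not part of glusterfs.

```python
import re
from datetime import datetime

# Pattern for call_bail entries as they appear in the quoted client log,
# after wrapped lines have been rejoined.
BAIL_RE = re.compile(
    r"\[(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] E "
    r"\[client-protocol\.c:\d+:call_bail\] (?P<subvol>\S+): bailing out frame "
    r"(?P<op>\w+)\(\d+\) frame sent = "
    r"(?P<sent>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\. "
    r"frame-timeout = (?P<timeout>\d+)"
)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def bail_delays(log_text):
    """Return (subvolume, op, seconds_waited, frame_timeout) per call_bail entry."""
    # Mail clients wrap long log lines; rejoin them and drop "> " quote markers.
    flat = re.sub(r"\s*\n>?\s*", " ", log_text)
    results = []
    for m in BAIL_RE.finditer(flat):
        logged = datetime.strptime(m.group("logged"), TIME_FORMAT)
        sent = datetime.strptime(m.group("sent"), TIME_FORMAT)
        waited = int((logged - sent).total_seconds())
        results.append((m.group("subvol"), m.group("op"), waited, int(m.group("timeout"))))
    return results


# First entry of the quoted log, wrapped the same way as in the mail.
SAMPLE = (
    "> [2010-09-21 20:51:24] E [client-protocol.c:309:call_bail] remote3: bailing \n"
    "> out frame WRITE(14) frame sent = 2010-09-21 20:21:23. frame-timeout = 1800\n"
)

if __name__ == "__main__":
    for subvol, op, waited, timeout in bail_delays(SAMPLE):
        print(f"{subvol} {op}: waited {waited}s (frame-timeout {timeout}s)")
```

On the quoted entries, every frame waited roughly the full frame-timeout of 1800 seconds (30 minutes) before being bailed out, which matches the clients hanging rather than failing over promptly.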
