Re: [Gluster-devel] add-brick
From: Emmanuel Dreyfus
Subject: Re: [Gluster-devel] add-brick
Date: Sun, 19 Aug 2012 04:13:09 +0000
User-agent: Mutt/1.5.21 (2010-09-15)
> I understand this code checks that the layout covers the whole space,
> is that right? Then it must be upset that layout->list[0] does not cover
> anything. Since the error is transient, I suspect a race condition:
> the layout would be filled after that check. Is it possible? Where is
> the layout crafted?
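For reference, here is the kind of coverage test I assume
dht_layout_normalize() performs. This is a standalone, simplified
sketch with made-up names (layout_entry and layout_covers_space are
hypothetical), not the actual GlusterFS code; it rejects the broken
state seen in the logs below, where list[0] spans 0 - 0 with err = -1:

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for dht_layout_t: each entry maps a hash
 * range [start, stop] to one subvolume; err != 0 marks a missing
 * on-disk layout, as in the "err = -1" log messages below. */
struct layout_entry {
        uint32_t start;
        uint32_t stop;
        int      err;
};

/* Return 0 if the entries cover 0..0xffffffff contiguously, with no
 * holes or overlaps and no errors. Entries are assumed already
 * sorted by start. */
static int
layout_covers_space (struct layout_entry *list, int cnt)
{
        uint32_t expected = 0;
        int      i;

        for (i = 0; i < cnt; i++) {
                if (list[i].err != 0)
                        return -1;      /* missing disk layout */
                if (list[i].start != expected)
                        return -1;      /* hole or overlap */
                expected = list[i].stop + 1;    /* wraps to 0 at the end */
        }

        /* full coverage ends at 0xffffffff, so expected wrapped to 0 */
        return (cnt > 0 && expected == 0) ? 0 : -1;
}

int
main (void)
{
        /* the broken state from the logs: layout[0] is 0 - 0, err -1 */
        struct layout_entry broken[] = {
                { 0, 0, -1 },
                { 0, 0,  0 },
        };

        printf ("broken layout: %s\n",
                layout_covers_space (broken, 2) ? "rejected" : "accepted");
        return 0;
}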
I improved my test by completely deleting and re-creating the
volume before adding a brick. Here is what happens when I add a brick:
1-vndfs-client-1: Connected to 192.0.2.103:24027, attached to remote
volume '/export/vnd1a'.
1-vndfs-client-1: Server and Client lk-version numbers are not same,
reopening the fds
0-fuse: switched to graph 1
1-vndfs-client-1: Server lk version = 1
1-vndfs-dht: missing disk layout on vndfs-client-0. err = -1
1-dht_layout_merge: ==> layout[0] 0 - 0 err -1
1-dht_layout_merge: ==> layout[1] 0 - 0 err 0
1-vndfs-dht: missing disk layout on vndfs-client-1. err = -1
1-dht_layout_merge: ==> layout[0] 0 - 0 err -1
1-dht_layout_merge: ==> layout[1] 0 - 0 err -1
I am not sure this is expected behavior. The broken layout does not
raise EINVAL to processes using the filesystem at this point, but the
same treatment later on will.
After playing with it a bit, I tested the race condition theory with
this patch:
--- a/xlators/cluster/dht/src/dht-common.c
+++ b/xlators/cluster/dht/src/dht-common.c
@@ -477,6 +477,11 @@ unlock:
         ret = dht_layout_normalize (this, &local->loc, layout);
 
         if (ret != 0) {
+                if (strcmp(local->loc.path, "/") == 0) {
+                        gf_log (this->name, GF_LOG_WARNING,
+                                "wait 2s for DHT to settle...");
+                        sleep(2);
+                }
                 gf_log (this->name, GF_LOG_DEBUG,
                         "fixing assignment on %s",
                         local->loc.path);
Here is the kind of log it produces. I do not always see the EINVAL
in the log, but it is never seen by processes using the filesystem,
at least during the tests I did.
[2012-08-19 06:04:06.288131] I [fuse-bridge.c:4195:fuse_graph_setup]
0-fuse: switched to graph 1
[2012-08-19 06:04:06.289052] I [client-handshake.c:453:
client_set_lk_version_cbk] 1-vndfs-client-1: Server lk version = 1
[2012-08-19 06:04:06.294234] W [dht-common.c:482:dht_lookup_dir_cbk]
1-vndfs-dht: wait 2s for DHT to settle...
[2012-08-19 06:04:08.306937] I [client.c:2151:notify] 0-vndfs-client-0:
current graph is no longer active, destroying rpc_client
[2012-08-19 06:04:08.308114] I [client.c:2090:client_rpc_notify]
0-vndfs-client-0: disconnected
[2012-08-19 06:04:08.309833] W [fuse-resolve.c:151:fuse_resolve_gfid_cbk]
0-fuse: 4e4b4110-a585-4aae-b919-b2416355f5d1: failed to resolve
(Invalid argument)
[2012-08-19 06:04:08.310275] E [fuse-bridge.c:353:fuse_lookup_resume]
0-fuse: failed to resolve path (null)
But this probably does not really fix the problem: for instance, I got
an unreproducible ENOENT on a directory while copying a hierarchy.
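If the race theory holds, a fixed sleep(2) is only a diagnostic; a
bounded retry around the normalize step would at least re-check the
layout instead of hoping two seconds is always enough. An untested
sketch along the same lines as the patch above (same names, same
dht_layout_normalize() call, placed just before the existing
ret != 0 branch):

--- a/xlators/cluster/dht/src/dht-common.c
+++ b/xlators/cluster/dht/src/dht-common.c
@@ ... @@
         ret = dht_layout_normalize (this, &local->loc, layout);
 
+        /* sketch: retry a few times instead of one blind sleep(2) */
+        if ((ret != 0) && (strcmp(local->loc.path, "/") == 0)) {
+                int tries = 5;
+
+                while ((ret != 0) && (tries-- > 0)) {
+                        sleep(1);
+                        ret = dht_layout_normalize (this, &local->loc,
+                                                    layout);
+                }
+        }
         if (ret != 0) {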
--
Emmanuel Dreyfus
address@hidden