[Gluster-devel] 1.3.12 segfault


From: Matt McCowan
Subject: [Gluster-devel] 1.3.12 segfault
Date: Thu, 15 Jan 2009 14:02:45 +0900

Greetings. This is my first post to this list, so please bear with me while I try to 
flesh out the segfault I saw yesterday ...

Call me brave, call me stupid: without enough equipment on which to test things, I 
have plunged glusterfs 1.3.12 straight into production on a small Opteron-based 
cluster. The 14 clients are either 2-way or 4-way Opteron machines (44 cores all up) 
running amd64 Gentoo with a 2.6.20 kernel and the Gluster 2.7.3 fuse module.
The two servers run the same Gentoo as the clients; they are 4-way Opteron, 
dual-homed (GigE), with one glusterfsd per network connection, each daemon sharing 
out 250G.

Yesterday the glusterfs process on one of the 2-way clients went to 100% CPU. 
Attaching strace to it showed it repeatedly calling nanosleep. Since the machine 
needed to be back online quickly (oh for the budget of LANL!) I tried to Ctrl-C the 
strace, then sent a SIGTERM, then had to SIGKILL it.
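
For what it's worth, this is roughly what I did (the PID is just a placeholder 
here, and I didn't keep the strace output):

    # attach to the spinning glusterfs client process
    strace -tt -f -p 12345
    # Ctrl-C wouldn't stop it, so from another shell:
    kill -TERM 12345
    kill -KILL 12345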

The SIGTERM must have got through to the glusterfs process, because the client log 
contains:
"2009-01-14 14:01:53 W [glusterfs.c:416:glusterfs_cleanup_and_exit] glusterfs: 
shutting down server"
No log entries were written while it was running at 100%.
The problem on the client was first noticed when a user tried to tab-complete a 
directory listing on the gluster-mounted file system.

The gluster client was restarted. It was only a couple of hours later, when some of 
the users reported issues, that I noticed one of the glusterfsd processes had died 
on a server. The timestamp of the glusterfsd segfault on the server coincides with 
the killing of the glusterfs process on the client.

I haven't compiled gluster with debug symbols, so below are the relevant entries 
from the server log, the client config, and a backtrace of the core dump (which 
unfortunately just mirrors what's in the log).
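
If a debug build would help I'm happy to rebuild; I'm assuming the usual autotools 
flags will do the trick (I haven't checked whether 1.3.12's configure has a 
dedicated debug switch):

    CFLAGS="-g -O0" ./configure
    make && make install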

Side note: in an earlier 1.3.12 config we were running stripe across two glusterfsd 
backends. It proved quite unstable (specifically, directories sometimes did not 
sync across the backends) compared to the unify+namespace config; a rough sketch of 
that stripe setup is below. Otherwise glusterfs seems all round easier to install 
and use than my first cluster filesystem attempt, PVFS.
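
The stripe setup, reconstructed from memory (the subvolume names and the block-size 
syntax may not be exact):

    volume stripe0
      type cluster/stripe
      option block-size *:1MB   # stripe everything in 1MB blocks, as I recall the syntax
      subvolumes brick1 brick2
    end-volume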


Contents of /var/log/glusterfsd.log:
====================================
2009-01-14 14:01:53 E [protocol.c:271:gf_block_unserialize_transport] server: 
EOF from peer (172.17.231.162:1016)
2009-01-14 14:01:53 E [protocol.c:271:gf_block_unserialize_transport] server: 
EOF from peer (172.17.231.162:1017)

TLA Repo Revision: glusterfs--mainline--2.5--patch-797
Time : 2009-01-14 14:01:53
Signal Number : 11

glusterfsd -f /etc/glusterfs/glusterfs-server-shareda.vol -l 
/var/log/glusterfs/glusterfsd.log -L WARNING
volume server
  type protocol/server
  option auth.ip.nsbricka.allow *
  option auth.ip.hans.allow *
  option auth.ip.data.allow *
  option bind-address 172.17.231.170
  option transport-type tcp/server
  subvolumes data hans nsbricka 
end-volume

volume data
  type performance/io-threads
  option cache-size 128M
  option thread-count 4
  subvolumes databrick 
end-volume

volume databrick
  type storage/posix
  option directory /var/local/shareda
end-volume

volume hans
  type cluster/afr
  subvolumes nsbricka nsbrickb 
end-volume

volume nsbrickb
  type protocol/client
  option remote-subvolume nsbricka
  option remote-host maelstroma9
  option transport-type tcp/client
end-volume

volume nsbricka
  type storage/posix
  option directory /var/local/namespace
end-volume

frame : type(0) op(0)
frame : type(0) op(0)

2009-01-14 14:01:53 E [protocol.c:271:gf_block_unserialize_transport] server: 
EOF from peer (172.17.231.162:1015)
/lib/libc.so.6[0x2af3d0e0f940]
/usr/lib64/glusterfs/1.3.12/xlator/cluster/afr.so(afr_close+0x140)[0x2aaaaacd37d0]
/usr/lib64/glusterfs/1.3.12/xlator/protocol/server.so(server_protocol_cleanup+0x1af)[0x2aaaaaef80cf]
/usr/lib64/glusterfs/1.3.12/xlator/protocol/server.so(notify+0x6e)[0x2aaaaaef853e]
/usr/lib/libglusterfs.so.0(transport_unref+0x64)[0x2af3d0ab32b4]
/usr/lib64/glusterfs/1.3.12/transport/tcp/client.so(tcp_disconnect+0x7d)[0x2aaaaaffdcfd]
/usr/lib64/glusterfs/1.3.12/xlator/protocol/server.so(notify+0x61)[0x2aaaaaef8531]
/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xbb)[0x2af3d0ab3c4b]
/usr/lib/libglusterfs.so.0(poll_iteration+0x78)[0x2af3d0ab3008]
[glusterfs](main+0x67c)[0x40288c]
/lib/libc.so.6(__libc_start_main+0xf4)[0x2af3d0dfd374]
[glusterfs][0x401d59]
---------
====================================
end of glusterfsd.log


/etc/glusterfs/glusterfs-client.vol:
====================================
volume brick1
        type protocol/client
        option transport-type tcp/client     # for TCP/IP transport
        option remote-host maelstroma0
        option transport-timeout 120
        option remote-subvolume data # name of the remote volume
end-volume

volume brick2
        type protocol/client
        option transport-type tcp/client     # for TCP/IP transport
        option remote-host maelstroma0a
        option transport-timeout 120
        option remote-subvolume data # name of the remote volume
end-volume

volume brick3
        type protocol/client
        option transport-type tcp/client     # for TCP/IP transport
        option remote-host maelstroma9
        option transport-timeout 120
        option remote-subvolume data # name of the remote volume
end-volume

volume brick4
        type protocol/client
        option transport-type tcp/client     # for TCP/IP transport
        option remote-host maelstroma9a
        option transport-timeout 120
        option remote-subvolume data # name of the remote volume
end-volume

volume ns
        type protocol/client
        option transport-type tcp/client
        option remote-host gluster
        option transport-timeout 120
        option remote-subvolume hans
end-volume

volume unify
        type cluster/unify
        option scheduler rr
        option rr.limits.min-free-disk 5
        option namespace ns
        subvolumes brick1 brick2 brick3 brick4
end-volume

volume iothreads
        type performance/io-threads
        #option thread-count 8
        option thread-count 4
        option cache-size 64M
        subvolumes unify
end-volume

volume readahead
        type performance/read-ahead
        option page-size 1024kb
        option page-count 10
        subvolumes iothreads
end-volume

volume iocache
        type performance/io-cache
        option cache-size 64MB #default 32M
        option page-size 1MB #default 128kb
        subvolumes readahead
end-volume

volume writebehind
        type performance/write-behind
        option aggregate-size 1MB
        option flush-behind off
        subvolumes iocache
end-volume
====================================
end of glusterfs-client.vol
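
For completeness, the clients mount this spec with the usual invocation, roughly 
(the mount point here is just a placeholder):

    glusterfs -f /etc/glusterfs/glusterfs-client.vol -l /var/log/glusterfs/glusterfs.log /mnt/gluster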

gdb backtrace:
====================================
gdb /usr/sbin/glusterfsd /core.28935 
GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
Reading symbols from /usr/lib64/libglusterfs.so.0...(no debugging symbols 
found)...done.
Loaded symbols for /usr/lib/libglusterfs.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/glusterfs/1.3.12/xlator/storage/posix.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/xlator/storage/posix.so
Reading symbols from 
/usr/lib64/glusterfs/1.3.12/xlator/protocol/client.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/xlator/protocol/client.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/xlator/cluster/afr.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/xlator/cluster/afr.so
Reading symbols from 
/usr/lib64/glusterfs/1.3.12/xlator/performance/io-threads.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/xlator/performance/io-threads.so
Reading symbols from 
/usr/lib64/glusterfs/1.3.12/xlator/protocol/server.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/xlator/protocol/server.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/transport/tcp/client.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/transport/tcp/client.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/transport/tcp/server.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/transport/tcp/server.so
Reading symbols from /usr/lib64/glusterfs/1.3.12/auth/ip.so...done.
Loaded symbols for /usr/lib64/glusterfs/1.3.12/auth/ip.so
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib64/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1

Core was generated by `[glusterfs]                                              
'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002aaaaacd37d0 in afr_close ()
   from /usr/lib64/glusterfs/1.3.12/xlator/cluster/afr.so
(gdb) q
====================================
end of backtrace
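
Once I have a debug build I can pull a fuller trace from the core if that's useful, 
e.g.:

    gdb /usr/sbin/glusterfsd /core.28935
    (gdb) bt full
    (gdb) thread apply all bt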


Thanks for glusterfs
Regards
Matt McCowan
sysadmin
RPS MetOcean
Perth, Western Australia



