gluster-devel

[Gluster-devel] Re: cp taking 100% cpu and never terminating


From: Mickey Mazarick
Subject: [Gluster-devel] Re: cp taking 100% cpu and never terminating
Date: Sat, 14 Jun 2008 16:12:56 -0400
User-agent: Thunderbird 2.0.0.14 (Windows/20080421)

I'm still seeing the problem described below. It happens quite frequently over the ibverbs transport and only very rarely over tcp, and it is intermittent in both cases. When it occurs, the stuck process uses all the processing power of a single core on the client machine. I can repeat the command, but eventually the machine locks up with every processor stuck in a cp or tar command. We see it on both kernel 2.6.18 and 2.6.24. Has anyone there been able to replicate it?

Thanks!
-Mickey Mazarick


Mickey Mazarick wrote:
Something odd is happening when I run a shell script with cp commands in it. It happens infrequently, but when it does I have to reboot the system to get my processor back. I'm never tarring or copying more than 50 megs of data.

It either hangs on a command like:
cp --reply=yes /usr/src/linux-${kernver}/.config /tftpboot/node_root/boot/config-${kernver}
or
tar cf - etc | gzip > /tftpboot/node_root/drbl_ssi/template_etc.tgz

When I run top I see:
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1603 root      20   0 54160 1616  508 R  100  0.0  33:02.72 cp
(100% cpu time)

I'm unable to kill that process in any way. I can kill the shell script that spawned it, but the cp process keeps running afterwards.

I see the below errors on the client:
2008-05-11 17:02:32 E [client-protocol.c:1238:client_flush] system1: : returning EBADFD
2008-05-11 17:02:32 E [afr.c:2623:afr_flush_cbk] afr1: (path=/scripts/gluster/afrheal.sh child=system1) op_ret=-1 op_errno=77
2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system1: no valid fd found, returning
2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system-ns1: no valid fd found, returning
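For anyone trying to correlate these entries, here is a rough Python sketch that splits the client log text into individual entries and pulls out the source file, line, function and translator name. The field names (time, level, file, line, func, xlator, msg) and the helpers split_entries and parse_entry are my own labels for illustration, not anything defined by GlusterFS, and the layout is only inferred from the four entries above. The op_errno value is looked up in the local errno table, so the name it prints depends on the platform; on Linux, errno 77 is EBADFD, which matches the "returning EBADFD" entry from client_flush.

```python
import errno
import re

# The four client log entries quoted above, joined with single spaces.
RAW = (
    "2008-05-11 17:02:32 E [client-protocol.c:1238:client_flush] system1: : returning EBADFD "
    "2008-05-11 17:02:32 E [afr.c:2623:afr_flush_cbk] afr1: "
    "(path=/scripts/gluster/afrheal.sh child=system1) op_ret=-1 op_errno=77 "
    "2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system1: no valid fd found, returning "
    "2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system-ns1: no valid fd found, returning"
)

# Assumed layout: "<date> <time> <level> [<file>:<line>:<function>] <xlator>: <message>"
STAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
ENTRY = re.compile(
    rf"(?P<time>{STAMP}) (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>[^:]+): (?P<msg>.*)"
)


def split_entries(text):
    """Split run-together log text wherever a new timestamp and level begin."""
    return re.split(rf"\s+(?={STAMP} [A-Z] \[)", text.strip())


def parse_entry(entry):
    """Return a dict of fields for one entry, or None if it does not match."""
    match = ENTRY.fullmatch(entry)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    # Translate op_errno=<n> into a symbolic name using the local errno table.
    code = re.search(r"op_errno=(\d+)", fields["msg"])
    fields["errno_name"] = (
        errno.errorcode.get(int(code.group(1)), "unknown") if code else None
    )
    return fields


entries = [parse_entry(e) for e in split_entries(RAW)]
for e in entries:
    print(e["level"], e["file"], e["line"], e["func"], e["xlator"], e["errno_name"])
```

Read this way, the flush on system1 fails first with a bad file descriptor, afr1 reports that failure for /scripts/gluster/afrheal.sh, and the two close calls then find no valid fd.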

My client and server specs are identical to:
http://www.gluster.org/docs/index.php/Simple_High_Availability_Storage_with_GlusterFS_1.3

This happens equally over ib-verbs and tcp transports.


