[Gluster-devel] Re: cp taking 100% cpu and never terminating


From:	Mickey Mazarick
Subject:	[Gluster-devel] Re: cp taking 100% cpu and never terminating
Date:	Sat, 14 Jun 2008 16:12:56 -0400
User-agent:	Thunderbird 2.0.0.14 (Windows/20080421)

I'm still seeing the problem described below. It happens mainly over the ibverbs transport and only very infrequently over tcp. The problem is intermittent, but it occurs quite frequently over ibverbs. The stuck process uses all the processing power of a single core on the client machine. I can repeat the command, but eventually the machine locks up with all processors busy running a cp or a tar command. We see it on both kernel 2.6.18 and 2.6.24. Has anyone there been able to replicate it?


Thanks!
-Mickey Mazarick


Mickey Mazarick wrote:

Something odd is happening when I run a shell script with cp commands in it. This happens infrequently, but I have to reboot the system to get my processor back. I'm never tarring or copying more than 50 megs of data.
It either hangs on a command like:
cp --reply=yes /usr/src/linux-${kernver}/.config /tftpboot/node_root/boot/config-${kernver}
or
tar cf - etc | gzip > /tftpboot/node_root/drbl_ssi/template_etc.tgz

When I run top I see:
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1603 root      20   0 54160 1616  508 R  100  0.0  33:02.72 cp
(100% cpu time)
I'm unable to kill that process in any way, but I can kill the shell script that spawned it. The cp command is still running.
I see the below errors on the client:
2008-05-11 17:02:32 E [client-protocol.c:1238:client_flush] system1: : returning EBADFD
2008-05-11 17:02:32 E [afr.c:2623:afr_flush_cbk] afr1: (path=/scripts/gluster/afrheal.sh child=system1) op_ret=-1 op_errno=77
2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system1: no valid fd found, returning
2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system-ns1: no valid fd found, returning
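
For anyone trying to correlate these client log entries, here is a rough sketch in Python (not part of GlusterFS) that splits a line of this shape into its fields and looks up the name of any op_errno value. The regular expression is only inferred from the four entries above, and parse_line is a made-up helper name. The errno lookup uses the tables of whatever machine runs the script, so the name is only meaningful on the same platform as the client; on Linux, 77 is EBADFD, which agrees with the first entry.

```python
import errno
import os
import re

# Layout inferred from the entries above (an assumption, not a documented format):
# "<date> <time> <level> [<file>:<line>:<function>] <translator>: <message>"
LOG_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]) \[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>[^:]+):\s*(?P<msg>.*)$"
)
ERRNO_RE = re.compile(r"op_errno=(\d+)")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["line"] = int(rec["line"])
    e = ERRNO_RE.search(rec["msg"])
    if e:
        code = int(e.group(1))
        rec["errno"] = code
        # Name and text come from the local platform's errno tables.
        rec["errno_name"] = errno.errorcode.get(code, "unknown")
        rec["errno_text"] = os.strerror(code)
    return rec


if __name__ == "__main__":
    sample = (
        "2008-05-11 17:02:32 E [afr.c:2623:afr_flush_cbk] afr1: "
        "(path=/scripts/gluster/afrheal.sh child=system1) op_ret=-1 op_errno=77"
    )
    rec = parse_line(sample)
    print(rec["level"], rec["func"], rec["xlator"], rec.get("errno_name"))
```

Grouping the parsed records by timestamp and translator name should make it easier to see which child (system1 or system-ns1) reported the bad fd first.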
My client and server specs are identical to:
http://www.gluster.org/docs/index.php/Simple_High_Availability_Storage_with_GlusterFS_1.3
This happens equally over ib-verbs and tcp transports.
