[Gluster-devel] Some performance issues in mount/fuse


From: Xavier Hernandez
Date: Mon, 11 Mar 2013 11:49:47 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130221 Thunderbird/17.0.3

Hello,

I've recently run some tests with Gluster on a fast network (IP over InfiniBand) and got some unexpected results. It seems that mount/fuse becomes a bottleneck when the network and disks are very fast.

I started with a simple distributed volume with 2 bricks placed on a ramdisk to avoid possible disk bottlenecks (I repeated the tests with an SSD and, later, with a normal hard disk, and the results were the same, probably thanks to the performance translators). With this configuration, a single write reached a throughput of ~420 MB/s. That's well below the network limit, but for a single write it's quite acceptable. However, with two concurrent writes (carefully chosen so that each one goes to a different brick), the throughput was ~200 MB/s for each transfer. That was totally unexpected: with plenty of bandwidth available and no IO limitation, I expected something near 800 MB/s.
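
For reference, the test setup was roughly the following (hostnames and paths here are placeholders, not the exact ones I used):

    # on each server: put the brick on a tmpfs to rule out disk IO
    mount -t tmpfs -o size=8g tmpfs /mnt/ram

    # 2-brick distributed volume, no replication
    gluster volume create test server1:/mnt/ram/brick \
                               server2:/mnt/ram/brick
    gluster volume start test

    # on the client
    mount -t glusterfs server1:/test /mnt/test
    dd if=/dev/zero of=/mnt/test/file1 bs=128k count=32768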

In fact, any combination of concurrent writes always led to the same combined throughput of ~400 MB/s.

Trying to determine the cause of this odd behavior, I noticed that mount/fuse uses a single thread to serve kernel requests: once a request is received, it is sent down the xlator stack, and the next request is only read after the stack returns. This means that to sustain a 420 MB/s throughput with 128 KB per request (the current maximum block size), it has to serve at least 3360 requests per second (420 * 1024 / 128), i.e. process each request in about 300 us. If we take into account that every translator allocates memory and makes some system calls, it's quite plausible that serving a request really does take around 300 us.
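
Schematically, the current model looks like this (a condensed sketch, not the actual fuse code; handle_request() is a stand-in for decoding the message and winding it down the xlator stack):

    #include <unistd.h>

    extern void handle_request (char *buf, ssize_t len);  /* stand-in */

    static void
    fuse_serve_loop (int fuse_fd, char *buf, size_t bufsize)
    {
        for (;;) {
            ssize_t len = read (fuse_fd, buf, bufsize);
            if (len <= 0)
                break;
            /* nothing else is read from the kernel until this call
             * returns from the bottom of the stack */
            handle_request (buf, len);
        }
    }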

To check whether this was the case, I added the performance/io-threads xlator just below mount/fuse. It queues each request to a worker thread, freeing the reader thread to fetch the next request much sooner than those 300 us. This should improve the concurrent-writes case.
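
In practice this means making io-threads the topmost volume of the client volfile, so that it sits directly below mount/fuse (a sketch; 'dht' stands for whatever the previous top volume was):

    volume iot
        type performance/io-threads
        option thread-count 16     # 16 is the default
        subvolumes dht
    end-volume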

The results were good: with this simple modification, 2 concurrent writes performed at ~300 MB/s each. However, the throughput of a single write dropped to ~250 MB/s. In any case, this solution is not valid, because something in this configuration is incompatible and some operations misbehave (for example, a simple 'ls' does not show all the files).

Then I modified the mount/fuse xlator itself to start several threads to serve kernel requests. With this modification everything seems to work as expected and throughput is noticeably better: a single write still performs at ~420 MB/s, and 2 concurrent writes reach ~330 MB/s each. In fact, any combination of 2 or more concurrent writes gives a combined throughput of ~650 MB/s.
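
The change boils down to something like this (again a sketch with made-up names, not the actual patch). The kernel hands each request read from /dev/fuse to exactly one reader, so several requests can travel down the stack in parallel:

    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define FUSE_NUM_READERS 4    /* assumed thread count */

    extern void handle_request (char *buf, ssize_t len);  /* stand-in */

    struct reader_args {
        int    fuse_fd;           /* fd of /dev/fuse */
        size_t bufsize;           /* maximum request size */
    };

    static void *
    fuse_reader (void *data)
    {
        struct reader_args *a = data;
        char *buf = malloc (a->bufsize);  /* one buffer per thread */

        while (buf != NULL) {
            ssize_t len = read (a->fuse_fd, buf, a->bufsize);
            if (len <= 0)
                break;
            handle_request (buf, len);
        }
        free (buf);
        return NULL;
    }

    static void
    start_fuse_readers (struct reader_args *a)
    {
        pthread_t tid;
        int       i;

        for (i = 0; i < FUSE_NUM_READERS; i++)
            pthread_create (&tid, NULL, fuse_reader, a);
    }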

However, a replicate volume does not improve at all, and I'm not sure why; there seems to be some kind of serialization point in cluster/afr. A single write has a throughput of ~175 MB/s, and 2 concurrent writes reach ~85 MB/s each. I'll have to investigate this further.

Does all this make sense?

Is this something worth investing more time in?

Regards,

Xavi


