[Wp-mirror-list] Performance experiments


From: wp mirror
Subject: [Wp-mirror-list] Performance experiments
Date: Sun, 30 Dec 2012 16:06:24 -0500

Dear Jason,

Thank you very much for your e-mail of 2012-11-24 on Performance
Experiments.  You raised nine points, some of which I can answer now.

0) Profiling

WP-MIRROR 0.6 has a new command-line option, --time, which displays a
profile of real time (wall clock time) spent by each of the functions
in the Finite State Machine.  This provides a quantitative basis for
deciding what to optimize, and for evaluating the degree of success.
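
To illustrate the idea of a per-function wall-clock profile, here is a
minimal sketch in Python.  It is not WP-MIRROR code and does not show how
--time is implemented; the decorator `timed', the table `totals', and the
two stand-in state functions are hypothetical names chosen for the
example.

```python
import time
from collections import defaultdict

# Accumulated real (wall clock) seconds, keyed by function name.
totals = defaultdict(float)

def timed(fn):
    """Wrap fn so that every call adds its elapsed real time to totals."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            totals[fn.__name__] += time.perf_counter() - start
    wrapper.__name__ = fn.__name__
    return wrapper

@timed
def fsm_file_shell():
    # Hypothetical stand-in for a state function that downloads files.
    time.sleep(0.02)

@timed
def fsm_file_import():
    # Hypothetical stand-in for a state function that imports files.
    time.sleep(0.01)

def report():
    """Return (name, seconds, percent) rows, most expensive first."""
    grand_total = sum(totals.values()) or 1.0
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, secs, 100.0 * secs / grand_total) for name, secs in rows]
```

Calling the wrapped functions and then report() gives a table that shows
at a glance which function dominates the run.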

And now to your nine points:

1) cURL, metalink

A precondition for using metalink is the preparation of `.metalink' or
`.meta4' files on the server side.  I have looked in vain for evidence
that the Wikimedia Foundation (WMF) supports metalink.  I initiated
correspondence with WMF to inquire as to their future plans, but have
not received an answer as yet.

2) cURL, pipelining

WP-MIRROR 0.6 now uses HTTP/1.1 Persistent Connections (default 10
image files per TCP connection, configurable).  With the aid of the
above-mentioned profiling, I found that fsm-file-shell (which downloads
the image files) uses 64% less time with persistent connections than
without.

However, an examination of the HTTP headers shows no evidence that cURL
is using pipelining (meaning several requests are sent from client to
server before the first reply is received).
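
The difference between persistence and pipelining can be sketched as
follows.  This is a Python illustration using the standard library, not
the cURL invocation that WP-MIRROR uses; `fetch_all', `EchoHandler', and
`start_demo_server' are hypothetical names, and the local server exists
only so that the example runs without touching the network.  Each batch
of paths shares one TCP connection, but every request waits for its
response before the next is sent, so this is persistence without
pipelining.

```python
import http.client
import http.server
import threading

def batches(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def fetch_all(host, port, paths, per_connection=10):
    """Fetch every path, opening one connection per batch of paths.

    Returns (bodies by path, number of connections opened).  The count
    assumes the server keeps each connection alive; http.client would
    silently reconnect if the server closed it early.
    """
    results = {}
    connections = 0
    for batch in batches(paths, per_connection):
        conn = http.client.HTTPConnection(host, port, timeout=10)
        connections += 1
        try:
            for path in batch:
                conn.request("GET", path)
                response = conn.getresponse()
                # The body must be read before the next request is sent.
                results[path] = response.read()
        finally:
            conn.close()
    return results, connections

class EchoHandler(http.server.BaseHTTPRequestHandler):
    """Demo handler: replies with the requested path as the body."""
    protocol_version = "HTTP/1.1"  # enables keep-alive on the server side

    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def start_demo_server():
    """Start a throwaway local server on a free port and return it."""
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With 25 paths and the default of 10 per connection, fetch_all opens
three connections instead of 25.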

3) cURL, SPDY

I found no evidence that WMF supports SPDY.  The SPDY home page
indicates that SPDY is a work in progress.  I have initiated
correspondence with WMF, but have no reply as yet.

4) less granular synchronization

WP-MIRROR 0.6 now lets the user configure:  HDD write caching
(default enabled), and innodb_flush_log_at_trx_commit (default 2, not
1).  Profiling shows significant performance improvements for
fsm-file-import (22% less time) and fsm-images-rebuild (50% less
time).  The new default configuration means that up to one second of
transactions may be lost during system failure, but the use of
check-pointing makes this a non-issue.

5) ensure sequential writes

All modern computer systems have a hierarchy of memory (L1 cache, L2
cache, main memory, storage controller cache, HDD write cache,
magnetic material).  This is the result of cost vs. performance
trade-offs.

A downloaded image file will pass through a number of caches on its
way to the magnetic material (disk surface).
<http://monolight.cc/2011/06/barriers-caches-filesystems/> has a
useful discussion.

Under Linux, most file systems (e.g. ext4, ReiserFS) support barriers
(see man 8 mount).  "Write barriers enforce proper on-disk ordering of
journal commits, making volatile disk write caches safe to use, at
some performance penalty."  It is customary to disable write barriers
for database storage, because InnoDB has its own journaling and memory
management system.  Write barriers were disabled for the following
time trials:
<http://www.mysqlperformanceblog.com/2010/02/28/maximal-write-througput-in-mysql/>

I am reluctant to have WP-MIRROR disable write barriers, because most
user platforms will be laptops, where everything is written to a
single disk.

Perhaps I misunderstand your question.  Please let me know if you have
an example of some kind of caching/queuing system in mind.

6) MySQL concurrency slowdown with compressed tables

You mentioned 
<http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-downgrading-issues-doublewrite.html>.
However, the issues discussed there pertain to the Built-in InnoDB
with MySQL 5.1.  WP-MIRROR 0.1 through 0.3 used the Plug-in InnoDB
with MySQL 5.1, which did not have those issues.  WP-MIRROR 0.4 and
higher use the Built-in InnoDB with MySQL 5.5, because the features of
the older plug-in have been swept into the new built-in.

I do not yet have an explanation for what caused the high number
of deadlocks that I saw when using concurrency with Barracuda.

7) Haproxy

I have not yet looked into this enough to provide a good answer.

8) MariaDB

I have not yet looked into this enough to provide a good answer.

9) Memory allocation

The experiments that I ran from Autumn 2010 through Spring 2011
indicated that innodb_buffer_pool_size and innodb_log_file_size are
crucial to performance.  That said, the MySQL 5.5 Reference Manual,
Chapter 8 Optimization, describes a number of caches to watch out for.
In particular:  table_open_cache, key_buffer_size, and
query_cache_size.  These all appear to be adequately sized (using
default values).

key_buffer_size is interesting.  It is only used by the MyISAM storage
engine.  While InnoDB is now the default storage engine for MySQL 5.5,
it turns out that the `simple.searchindex' database table still uses
MyISAM, because it can be searched in ways not possible with InnoDB.
Try:

SELECT table_schema, table_name, engine, data_length, index_length
FROM information_schema.tables WHERE table_schema='simplewiki';


And now to a couple of additional points:

10) Compiling an expensive loop for maximum speed

Common Lisp permits the programmer great control over what forms are
to be interpreted vs. compiled; and, if compiled, what `optimize
qualities' to apply (e.g. debug, space, speed, etc).

The fsm-file-scrape function (previously called fsm-file-wikix) calls
parse-image-file-names.  This latter function has a rather expensive
LOOP, which must crawl line-by-line, character-by-character through
each xchunk to extract image file names.
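
To give a feel for why such a loop is expensive, here is a rough sketch
of a character-by-character scan, written in Python rather than Common
Lisp.  It is not the actual parse-image-file-names: the rule it applies
(collect whatever follows `[[File:' or `[[Image:' up to the next `|',
`]', or newline) is an assumption made for the example, and the real
function may recognize other forms.

```python
def parse_image_file_names(text):
    """Scan text one character at a time and collect image file names.

    Assumed rule (illustration only): a name is whatever follows
    "[[File:" or "[[Image:" up to the next "|", "]", or newline.
    """
    prefixes = ("[[File:", "[[Image:")
    names = []
    i = 0
    n = len(text)
    while i < n:
        # Check whether a link prefix starts at position i.
        matched = next((p for p in prefixes if text.startswith(p, i)), None)
        if matched is None:
            i += 1
            continue
        # Walk forward to the end of the file name.
        start = i + len(matched)
        j = start
        while j < n and text[j] not in "|]\n":
            j += 1
        name = text[start:j].strip()
        if name:
            names.append(name)
        i = j
    return names
```

Every character of every xchunk passes through the outer loop at least
once, which is why this kind of function is a good candidate for being
compiled with aggressive speed settings.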

A significant performance increase was achieved by compiling
parse-image-file-names for maximum SPEED (82% less time).

11) Closing comment

After implementing the above-mentioned optimizations,
fsm-images-validate remains the most expensive activity.  So I hope
for good news from WMF regarding metalink.

Sincerely Yours,
Kent


