
Re: [Wp-mirror-list] WP-MIRROR 0.7 feedback


From: wp mirror
Subject: Re: [Wp-mirror-list] WP-MIRROR 0.7 feedback
Date: Fri, 30 May 2014 04:15:51 -0400

Dear Luiz,

Thank you for your patience. I released WP-MIRROR 0.7.3 a few days ago, and have now turned my attention to your feature requests. Thanks also for the nicely detailed e-mail.

0) Bug report

I have not been able to reproduce the bug. That said, I set up `kvm' with my own script, rather than use VirtualBox. So I have not actually reproduced the environment in which you tested WP-MIRROR.

I will have to become familiar with VirtualBox for an additional reason. A Swiss friend who attended the recent Zurich Hackathon wrote to me about a presentation on `Vagrant', which apparently sets up a near-current version of MediaWiki using VirtualBox. I will want to check this out.

I will get back to you on this.

1) MediaWiki Extensions

Of the three extensions you mention (ProofreadPage, DynamicPageList, and Quiz), I have some concerns about the first. While doing my homework on these extensions, I came across the following discussion page:

<http://meta.wikimedia.org/wiki/Requests_for_comment/Standardize_ProofreadPage_namespaces_across_Wikisources>

The variety of namespace IDs has me concerned. I count 64 `wikisource' wikis as follows:

(shell)$ rsync ftpmirror.your.org::wikimedia-dumps/ | grep wikisource | wc -l
64

I am wondering how to determine which wiki uses which namespace IDs. I know this information can be found in the <siteinfo> element at the head of every XML data dump. Choosing three of the languages that I can read, I immediately see inconsistency:

In `dewikisource-20140519-pages-articles.xml.bz2' I see:

      <namespace key="102" case="first-letter">Seite</namespace>
      <namespace key="103" case="first-letter">Seite Diskussion</namespace>
      <namespace key="104" case="first-letter">Index</namespace>
      <namespace key="105" case="first-letter">Index Diskussion</namespace>

In `enwikisource-201405023-pages-articles.xml.bz2' and `zhwikisource-20140517-pages-articles.xml.bz2' I see:

      <namespace key="104" case="first-letter">Page</namespace>
      <namespace key="105" case="first-letter">Page talk</namespace>
      <namespace key="106" case="first-letter">Index</namespace>
      <namespace key="107" case="first-letter">Index talk</namespace>

Two problems: 1) My reading comprehension does not extend to all 64 languages. 2) When the user is browsing, and Extension:ProofreadPage is executed, that <siteinfo> element is long gone.

Possible solution: I may need to implement some kind of (language-code, namespace ID pair) look-up table.
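
To make the idea concrete, here is a minimal Python sketch of such a look-up table. It is only an illustration, not WP-MIRROR code: the names `PROOFREAD_NAMESPACES', `proofread_namespace_ids', and `namespaces_from_siteinfo' are hypothetical, and the table holds only the three wikis whose <siteinfo> excerpts are quoted above. The second helper assumes that the <siteinfo> fragment has already been extracted from the dump.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: language code -> ProofreadPage namespace IDs.
# Only the three wikis quoted above are filled in; the remaining
# wikisource wikis would have to be added from their own dumps.
PROOFREAD_NAMESPACES = {
    "de": {"page": 102, "index": 104},
    "en": {"page": 104, "index": 106},
    "zh": {"page": 104, "index": 106},
}


def proofread_namespace_ids(language_code):
    """Return (page_id, index_id), or None for a wiki not yet in the table."""
    entry = PROOFREAD_NAMESPACES.get(language_code)
    if entry is None:
        return None
    return entry["page"], entry["index"]


def namespaces_from_siteinfo(siteinfo_xml):
    """Map namespace ID -> local name for a <siteinfo> XML fragment."""
    root = ET.fromstring(siteinfo_xml)
    result = {}
    for element in root.iter():
        # Dumps declare a default XML namespace, so compare local tag names.
        local_tag = element.tag.rsplit("}", 1)[-1]
        if local_tag == "namespace":
            result[int(element.get("key"))] = element.text or ""
    return result
```

A table like this would still have to be filled in by someone who knows which local name corresponds to `Page' and which to `Index' in each language; the sketch only shows the shape of the data.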

I do not want my `LocalSettings.php' file to become excessively crufty due to a plethora of special cases. I may need advice from someone in the `wikisource' community. To begin with: Has anyone compiled a table of (language-code, namespace ID pair) that I could use? Is anyone managing the assignment of namespace IDs?

2) Special wikis

You can mirror `betawikiversity' and `sourceswiki' with the following commands:

(rootshell)# wp-mirror --add betawikiversity
(rootshell)# wp-mirror --add sourceswiki
(rootshell)# wp-mirror --mirror

Once the mirrors have been built, you may point your browser to the following URLs:

<http://beta.wikiversity.site/>
<http://sources.wikipedia.site/>

I tested both mirroring and browsing before writing this e-mail.

3) XML data dumps

There are basically three XML dumps from which to choose:

`simplewiki-yyyymmdd-pages-articles.xml.bz2',
`simplewiki-yyyymmdd-pages-meta-current.xml.bz2', and
`simplewiki-yyyymmdd-pages-meta-history.xml.bz2'.

I assume that when you ask for the `allpages' data dump, you are referring to the `pages-meta-current' data dump. This feature request can be implemented easily, and will appear in the next release (v0.7.4). Implementing `pages-meta-history' would be a little more effort.
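
The naming pattern of the three files listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not how WP-MIRROR selects its dump: `DUMP_VARIANTS' and `dump_filename' are hypothetical names, and the pattern is inferred only from the three file names in this e-mail.

```python
# The three dump variants named in this e-mail.
DUMP_VARIANTS = ("pages-articles", "pages-meta-current", "pages-meta-history")


def dump_filename(wiki, date, variant="pages-articles"):
    """Build a file name of the form <wiki>-<yyyymmdd>-<variant>.xml.bz2."""
    if variant not in DUMP_VARIANTS:
        raise ValueError("unknown dump variant: " + variant)
    if len(date) != 8 or not date.isdigit():
        raise ValueError("date must be in yyyymmdd form: " + date)
    return f"{wiki}-{date}-{variant}.xml.bz2"


if __name__ == "__main__":
    print(dump_filename("simplewiki", "20140519", "pages-meta-current"))
```

Under this sketch, switching from `pages-articles' to `pages-meta-current' amounts to changing a single argument.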

Please let me know whether I have understood the meaning of `allpages' correctly.

Sincerely Yours,
Kent



On Wed, May 14, 2014 at 3:03 PM, wp mirror <address@hidden> wrote:
Dear Luiz,

Thank you very much for your bug reports and feature requests.

I am currently working to release WP-MIRROR 0.7.3 this week. It is mostly bug fixes. Then I will start work on your requests. I have been thinking of adding a feature that may be of interest to you, namely, generating ZIM files using Parsoid.

I will get back to you in a few days, after I have released v0.7.3.

Sincerely Yours,
Kent



On Mon, May 12, 2014 at 3:24 PM, Luiz Augusto <address@hidden> wrote:
Firstly, many thanks for releasing WP-MIRROR, and congratulations on such impressive work! Your manual is also awesome: it is very instructive for those who are nerdy enough to run such a software set but not geeky enough to know everything that you have summarized in the manual.

I was experimenting with mirroring Wikisource wikis in order to create .ZIM files for Kiwix later, but I found some issues. Below are also some observations about other Wikipedia sister projects.

A) Apparently I have found a bug in your script. I wanted to try mirroring ca.wikisource (a small Wikisource wiki, but with all the relevant specific configuration that Wikisources share). Using WP-MIRROR 0.7 on Debian 7.5 within VirtualBox 4.3.10, at some point I got the error message shown in [1].

B) Additionally, your tool lacks some crucial MediaWiki extensions used on some, but not all, Wikimedia wikis:

- ProofreadPage [2], usage examples at [3] and [4]. Widely used on all Wikisource wikis.

- DynamicPageList [5], used on lots of Wikimedia wikis. Usage examples at [6] and [7].

- Quiz [8], enabled on some Wikibooks and Wikiversity wikis. Usage examples at [9].

C) Is it possible to mirror the [[:oldwikisource:]] wiki (<https://wikisource.org/wiki/>, dbname sourceswiki)? How can I do that? Is that wiki part of the Wikisource farm in your script?

D) Same question for the [[:betawikiversity:]] wiki (<https://beta.wikiversity.org/>, dbname betawikiversity): is that wiki part of the Wikiversity farm?

E) Is it possible to point WP-MIRROR at the "allpages" data dump? Some Wikisource wikis store relevant data in the Talk namespace, as in [10]. My home wiki (ptwikisource) stores it in a multi-purpose custom namespace [11].

Best,
Luiz/[[:m:User:555]]

----
[1] [ ok ]fsm-no-op cawikisource-20140509-local-media-f-ff
[ ok ]fsm-file-wget cawikisource-20140509-local-media-f-ff
[ ok ]fsm-file-remove cawikisource-20140509-local-media-f-ff
[....]fsm-file-list-missing cawikisource-20140509-remote-media-0-00
ERROR 1142 (42000) at line 1: SELECT command denied to user 'wikiadmin'@'localho[ ok ]fsm-file-list-missing cawikisource-20140509-remote-media-0-00
[ ok ]fsm-no-op cawikisource-20140509-remote-media-0-00
[ ok ]fsm-file-wget cawikisource-20140509-remote-media-0-00
[ ok ]fsm-file-remove cawikisource-20140509-remote-media-0-00
[....]fsm-file-list-missing cawikisource-20140509-remote-media-0-01
ERROR 1142 (42000) at line 1: SELECT command denied to user 'wikiadmin'@'localho[ ok ]fsm-file-list-missing cawikisource-20140509-remote-media-0-01












