
Re: [Gnu-arch-users] Wiki invasions


From: Adrien Beau
Subject: Re: [Gnu-arch-users] Wiki invasions
Date: Sat, 15 May 2004 11:08:05 +0200
User-agent: KMail/1.6.2

On Friday 14 May 2004 12:24, Jan Hudec wrote:
> 
> > I think that's what happened today.
> 
> Yes, the Google bot almost surely will do it. It is, however, the fault
> of MoinMoin. Since "revert" has side-effects, it should only be
> accessible via a POST request -- and I know of no bot that would send
> POST requests.

Wow wow wow, you're wrong. Do you think so poorly of MoinMoin
developers?! If GoogleBot (or any other widespread bot) *could*
do it, the breakage would have started on Day One of this wiki,
or any other MoinMoin wiki for that matter!

Look at the meta tag at the top of the Info page (the one which has
the revert links): <meta name="robots" content="noindex,nofollow">
(it is also present at the top of Edit pages, and other pages where
it does matter).

This means that GoogleBot (and friends) will not follow any links on
the page (no revert from them), and that they will not put the page
in their index of the web, ready to be used and abused by other
people (wouldn't it be great if a Google search for "view raw print revert"
returned a list of sites ready to be abused?).

Actually, this is only a second line of defense, since MoinMoin
has a ua_spiders setting, a regexp of HTTP User-Agent strings
that are excluded from logging in to the wiki and that receive an
HTTP FORBIDDEN response for anything except viewing a page. The
default value includes the names of many robots, the word "robot"
itself, and utilities such as wget.
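A minimal sketch of how such a User-Agent filter works (the regexp below
is an illustrative excerpt, not MoinMoin's exact shipped default):

```python
import re

# Illustrative excerpt of a ua_spiders-style blocklist: a regexp of
# User-Agent substrings to be denied anything but page views.
ua_spiders = re.compile(
    r'googlebot|msnbot|slurp|robot|spider|crawler|wget',
    re.IGNORECASE,
)

def is_spider(user_agent):
    """Return True if the User-Agent matches the spider blocklist."""
    return bool(ua_spiders.search(user_agent))

print(is_spider("Wget/1.8.2"))                           # True
print(is_spider("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(is_spider("KMail/1.6.2"))                          # False
```

Of course, this only stops clients that report an honest User-Agent
string, which is exactly the weakness discussed below.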

Now, the problem is that apparently a (professional?) wiki abuser
has been at work recently (222.183.69.243, from China), and that
another clueless one (195.229.241.181, from the United Arab Emirates)
has maybe created and certainly used a script that followed all the
revert links on Info pages. Or, perhaps, he simply ran a recursive
wget on the site, forging the user agent to work around forbidden
pages...

The best things that can be done, I think, ordered from least
disruptive to the wiki ideal to most disruptive (but still quite
acceptable):

* Complain to address@hidden for the professional abuser, and/or
  to address@hidden for the clueless user.
* Create a /robots.txt file, so that programs such as wget, which
  understand this file but not the robots meta tag, can be kept at
  bay even when the user alters the User-Agent string.
  (This issue should be reported to MoinMoin developers.)
* Restrict Revert usage to logged-in users.
* Restrict wiki editing to logged-in users.
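For the /robots.txt option, a sketch (deliberately broad, since
MoinMoin's action links are query strings, which the original
robots.txt syntax cannot match against) might be:

```
# Hypothetical /robots.txt: ask all compliant robots
# to stay away from the entire site.
User-agent: *
Disallow: /
```

A rule this broad also stops legitimate indexing, so whether the
trade-off is acceptable depends on how much the wiki values search
visibility.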

MoinMoin has Access Control Lists which can be used to implement the
latter two options. Read HelpOnAccessControlLists for details.
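As an illustration (the exact rights and placement should be checked
against HelpOnAccessControlLists), an ACL line at the top of a page
granting full rights to logged-in users (MoinMoin's built-in Known
group) while leaving anonymous visitors read-only could look like:

```
#acl Known:read,write,delete,revert All:read
```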

-- 
address@hidden - http://adrien.beau.free.fr/



