From: Sylvain Beucler
Subject: Re: [Savannah-hackers-public] robots.txt disallows all spiders for mailing lists
Date: Mon, 6 Jul 2009 00:14:47 +0200
User-agent: Mutt/1.5.18 (2008-05-17)

> How about we disallow all spiders, but add an exemption rule for Google?
> 
> Sylvain, Karl, thoughts?

Yes, we have enough experience fighting a monopoly (Microsoft) that
encouraging One Unique Search Engine sounds like a heresy.
Especially that one.

What kind of joke is that? :)
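
For reference, that proposal would boil down to something like the
following robots.txt (a sketch, not tested against our setup;
"Googlebot" is the user-agent string Google's crawler announces, and
an empty Disallow means "allow everything"):

  # Hypothetical exemption: let Google's crawler index everything
  User-agent: Googlebot
  Disallow:

  # Deny every other crawler
  User-agent: *
  Disallow: /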

> Can we remove this file and let [$SEARCH_ENGINE] spider the GNU and
> non-GNU lists?

Indeed, this problem isn't new:
Last-Modified: Fri, 24 Feb 2006 18:06:14 GMT
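
I haven't re-checked the file just now, but a blanket ban like ours
is usually the classic two-liner:

  # Presumed current lists.gnu.org robots.txt (assumption, not checked)
  User-agent: *
  Disallow: /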

Last time I asked, probably around that time, the sysadmins in charge
(a different team, incidentally) had less powerful hardware and were
taking every measure to reduce the load, including denying access to
search engines. I don't think that was the case before.

In practice, the problem is partly worked around by sites that mirror
lists.gnu.org anyway (mail archives, etc.), which are themselves
crawled.

IMHO this is a suboptimal solution, as our free software projects are
losing a decent advertising and helpdesk vector that way.  Since we
now have newer hardware for lists.gnu.org, it would make sense to open
it up again to search engines, possibly with a crawl-delay parameter
if there's a problem.
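
A sketch of what reopening could look like (Crawl-delay is a
non-standard directive: some crawlers such as Yahoo's Slurp or msnbot
honor it, Googlebot ignores it; the 10-second value below is only an
example):

  # Example only: allow all crawlers, ask compliant ones to slow down
  User-agent: *
  Crawl-delay: 10
  Disallow: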

I don't think the decision was political, especially since the FSF
doesn't have any hard stance against ASPs besides recommending the
AGPL, separately from the GPL (sadly :)).

As was mentioned, we Savannah hackers currently have only restricted
access to lists.gnu.org, so in particular robots.txt is managed by
address@hidden

-- 
Sylvain



