Re: robots.txt in git and online are not the same

From: Phil Holmes
Subject: Re: robots.txt in git and online are not the same
Date: Mon, 1 Jul 2013 09:52:21 +0100

----- Original Message -----
From: "Mark Polesky" <address@hidden>
To: "Phil Holmes" <address@hidden>; <address@hidden>
Sent: Sunday, June 30, 2013 10:45 PM
Subject: Re: robots.txt in git and online are not the same

Sorry, I'm having trouble with my email client.  My last
post got munged.  Trying again, hope it works, bear with me...

Phil Holmes wrote:
If robots.txt was getting updated properly, all of our
Google search bar problems would be solved. We could
then stop telling Google to restrict the search results
to a particular version from the search box itself. The
robots.txt file only allows the current stable docs to
be indexed.

No - it would (AFAICS) prevent indexing docs prior to
current stable.  It would still index current development,
which I believe remains correct.

I know I've been out of the loop, but when was it decided
that we should allow Google to index the development docs?

The CG indicates that the robots.txt file should disallow
the current devel docs with the line
"Disallow: /doc/v2.CURRENT-DEVELOPMENT/":

My interpretation is that this refers to the development version that was current at that moment in time, i.e. the last one. So when 2.18 is being prepped, 2.17 would be added to the disallow list. I'm certain that this is the right option: we need to allow people to search the current development documentation.
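
To make that interpretation concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents, the version numbers and the document paths are assumptions made up for illustration; they are not the contents of the real file in git or on the server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: superseded doc trees are disallowed, while
# current stable and current development are left open to crawlers.
# The version numbers here are illustrative assumptions only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /doc/v2.14/
Disallow: /doc/v2.15/
"""


def is_indexable(path, robots_txt=ROBOTS_TXT):
    """Return True if a crawler obeying robots_txt may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


if __name__ == "__main__":
    # Hypothetical paths: one superseded tree, one development tree.
    for path in ("/doc/v2.15/Documentation/", "/doc/v2.17/Documentation/"):
        print(path, is_indexable(path))
```

Under this reading, a tree such as /doc/v2.17/ would only gain a Disallow line once 2.18 is being prepped.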

By the way, fixing that would kill 3 items in the
tracker with one blow:

Issue 2909: Manual search returns results from wrong
Issue 3209: Searching stable release documentation
Issue 3367: Web/Docs: LilyPond version is not clear on

Again - I don't think it would fix this, because users
would still confuse current stable and current
development.  We had a lot of discussion about this
problem on -user, and I think this is still a positive

But current development docs should not appear on Google.  I
thought that was decided years ago:


That appears to refer to 2.9 when 2.12 was current stable, so it makes no reference to the current development release.

OK - I've checked the server, and you're quite right -
there appears to be no mechanism for
git/Documentation/web/server/robots.txt to update the root
of the web server.

That is a bug, and if no one has a solution ready, it needs
to be added to the tracker, either as a new issue or as an
addendum to #2909, #3209, or #3367.  I think all 3 could
profitably be merged into one.

I believe that make website copies it to
/website/robots.txt, which is essentially useless.  As I
see it, there are 3 options:

1) I could manually copy robots.txt.  This is not a
long-term solution, but would be a step forward right
now.  If Mark wants me to do this and no-one shouts,
I will.

2) We could have a Cron job on the server to do this.
This strikes me as less good than

3) we could update make website to do this.
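
Whichever option is chosen, the copy step itself is small. The following is a rough sketch in Python, not the actual make website rule: the function name sync_robots and both paths in the usage example are hypothetical, and the real build is driven by make rather than by a script like this.

```python
import filecmp
import shutil
from pathlib import Path


def sync_robots(source: Path, web_root: Path) -> bool:
    """Copy source to web_root/robots.txt if the two files differ.

    Returns True when a copy was made and False when the served file
    was already identical to the one in the checkout.
    """
    target = web_root / "robots.txt"
    # Compare file contents (not just size and mtime) before copying.
    if target.exists() and filecmp.cmp(source, target, shallow=False):
        return False
    shutil.copyfile(source, target)
    return True


if __name__ == "__main__":
    # Both paths are assumptions for illustration only.
    checkout = Path("lilypond-git/Documentation/web/server/robots.txt")
    server_root = Path("/var/www/example-root")
    if checkout.exists() and server_root.is_dir():
        print("copied" if sync_robots(checkout, server_root) else "up to date")
```

The same function could be called by hand (option 1), from a cron job (option 2), or from the build (option 3); only the trigger differs.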

Option no. 3!  I'm not opposed to option 1 right now, as
long as option 3 is recorded in the tracker.  Or if anyone
knows how to fix it, feel free to chime in!

- Mark

I'll copy robots.txt later today. http://code.google.com/p/lilypond/issues/detail?id=3430 covers the make website issue.

Phil Holmes

