Re: debbugs.gnu.org search
From: Maxim Nikulin
Subject: Re: debbugs.gnu.org search
Date: Tue, 31 Aug 2021 18:56:39 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0
On 30/08/2021 22:30, Glenn Morris wrote:
Maxim Nikulin wrote:
and links to raw mail messages, e.g. debbugs.gnu.org/db/17/17678.html
At the top of the page, and the bottom, in bold red text, is a link
"Click here to see this page with the latest information and nicer formatting."
Thank you, I had not noticed this link. I believed that links to raw
messages were indexed by mistake instead of the regular pages. My
expectations were based on what I saw in bug reports on bugs.debian.org.
My point is that it is still inconvenient for both humans (an intermediate
page) and search engines (their heuristics are not powerful enough to
recognize the valuable parts).
For a suggestion for further improvement, see
https://lists.gnu.org/r/help-debbugs/2020-12/msg00026.html
Though this branch of the "mail vs. web UI" discussion about communication
with users and contributors seems rather off-topic, the link you
provided might explain why some users prefer a web UI for
interaction. In that case technical details of communication are not
available to crawlers (e.g. forensic ones), although unfortunately they
remain fully available to site owners. An alternative form of your link:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=43073
"#43073 Trim/hide full email headers on debbugs"
For this particular query I expect to get
*#29645 Feature Request: Locale aware formatting*
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29645
It's the second result?!
Glad to see that the recipe works for someone. I am not so lucky. See
below for details.
https://debbugs.gnu.org/robots.txt
Disallow: /cgi/
Disallowed due to performance reasons.
The "static" pages are indexed, and contain prominent (though clearly not
prominent enough) links to the "dynamic" pages.
I suspected that indexing was broken intentionally. However, in my opinion
the static pages could still be better formatted (and have a footer with a
prominent last update time to avoid confusion about recent updates).
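To make the effect of that rule concrete, here is a minimal sketch using Python's standard urllib.robotparser. Only the "Disallow: /cgi/" line was quoted in this thread; the "User-agent: *" line and the helper name allowed() are my assumptions for illustration, not a copy of the real file.

```python
from urllib.robotparser import RobotFileParser

# Only the Disallow line comes from the thread; the User-agent line
# is an assumption needed to form a complete rule group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi/
"""


def allowed(url: str, agent: str = "*") -> bool:
    """Return True if the rules above let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# The "static" page stays crawlable, the "dynamic" one does not.
static_ok = allowed("https://debbugs.gnu.org/db/17/17678.html")
dynamic_ok = allowed("https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29645")
print(static_ok, dynamic_ok)
```

So under these assumed rules a crawler can only ever see the /db/ pages, whatever links they contain to the nicer /cgi/ ones.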
I have not noticed any special HTTP header sent by bugs.debian.org that
may alleviate server load during scans by search engines. I am unsure whether
"cache-control: public, max-age=600" plays a significant role.
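For reference, a rough sketch of what that header value means to a cache that honours it. The function name max_age_seconds() is hypothetical, and whether search engine crawlers take this header into account at all is exactly what I am unsure about.

```python
def max_age_seconds(cache_control: str):
    """Extract the max-age directive (in seconds) from a Cache-Control value."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


age = max_age_seconds("public, max-age=600")
# A cache obeying the directive may reuse a response for `age` seconds,
# so it would refetch one URL at most this many times per day.
refetches_per_day = 24 * 60 * 60 // age
print(age, refetches_per_day)
```

That is, a page may be reused for 10 minutes, which limits repeated fetches of the same URL but does nothing for a crawler walking many different URLs once.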
There is also a simple search on https://debbugs.gnu.org/
and a complex one (I agree the interface is weird) on
https://debbugs.gnu.org/cgi/search.cgi
I intentionally preserved the following in my previous message.
5. And debbugs.gnu.org search sucks. Or at least I suck
at trying to find anything using it.
My impression is that "general purpose" search engines are usually able
to provide more relevant results due to handling of common typos,
synonyms, etc. At least while there are no precise criteria for a filter...
So the problem is not really reproducible and thus harder to debug. Namely,
duckduckgo does not show #29645 for me on the first page at all. Maybe
it depends on the region from which a request originates.
There is another issue with "static" debbugs pages for indexing by
search engines: poor metadata. The HTML TITLE element contains just "GNU bug
report logs - #29645". Debbugs does not provide '<meta
name="description" content="">' and similar info.
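A small sketch of what a crawler can extract from the page head, using Python's standard html.parser. The two sample documents are hypothetical reductions: the first has only the TITLE quoted above, the second shows how a description could look if debbugs emitted one (it does not). The class and function names are mine.

```python
from html.parser import HTMLParser


class HeadMetadata(HTMLParser):
    """Collect the TITLE text and the meta description, if any."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def head_metadata(html: str):
    parser = HeadMetadata()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


# Hypothetical minimal pages, not real debbugs output.
CURRENT_PAGE = "<html><head><title>GNU bug report logs - #29645</title></head></html>"
IMPROVED_PAGE = (
    "<html><head><title>GNU bug report logs - #29645</title>"
    '<meta name="description" content="Feature Request: Locale aware formatting">'
    "</head></html>"
)

print(head_metadata(CURRENT_PAGE))
print(head_metadata(IMPROVED_PAGE))
```

With only the first form available, a search engine has to guess the title and summary from the page body, which is what the results below suggest.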
+ duckduckgo (unsure about the particular underlying engine in my case,
maybe bing):
- no #29645 in results
- title is not informative: "GNU bug report logs - #27544"
- summary is either an enumeration of headers or a part of the message
body. In the former case it is impossible to estimate the relevance
of a particular result.
https://html.duckduckgo.com/html?q=site%3Adebbugs.gnu.org+emacs+locale+number
+ google
- title often (but not always) is taken from H1 element,
so it is usually much better
"GNU bug report logs - #29645 Feature Request: Locale aware ..."
Unfortunately it is trimmed.
- summary: often useless raw headers
"... X-Spam-Status: No, score=0.8 required=5.0
tests=BAYES_50,FREEMAIL_FROM, ... Request: Locale aware formatting To:
bug-gnu-emacs@HIDDEN Content-Type: ..."
https://www.google.com/search?q=site%3Adebbugs.gnu.org+emacs+locale+number
+ yandex
- #29645 is sometimes present, sometimes not
- title is useless: "GNU bug report logs - #29645"
- summary: a snippet from the report is hardly noticeable since
the bug status is placed earlier
"Package: emacs; Severity: wishlist; Reported by: Gustaf
Waldemarson ... A while ago I started looking for some simple way
of writing numbers correctly formatted to the locale. Specifically,
I wanted the output to use the locale's..."
It may be completely useless though:
"Package: emacs; Reported by: Jan Synacek <jsynacek@HIDDEN>;
merged with #3219 ... Information forwarded to bug-gnu-emacs@HIDDEN:
bug#40007; Package emacs. Full text available. Merged 3219 4123 9589
13675 15555..."
https://yandex.ru/search/?text=site%3Adebbugs.gnu.org+emacs+locale+number
+ bing
- no #29645
- titles are useless: "GNU bug report logs - #5618"
- summary varies from message body snippets to just
"information forwarded to bug-gnu-emacs@HIDDEN: bug#3229; Package
emacs. Full text available"
https://www.bing.com/search?q=site%3Adebbugs.gnu.org+emacs+locale+number
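To repeat the same comparison with other search terms, the query URLs above can be generated as in the sketch below. The base URLs and parameter names are taken from the links listed above; the ENGINES table and search_url() are hypothetical helpers of mine, and the results will likely differ by region and over time.

```python
from urllib.parse import urlencode

# Base URL and query parameter name for each engine, as used in the links above.
ENGINES = {
    "duckduckgo": ("https://html.duckduckgo.com/html", "q"),
    "google": ("https://www.google.com/search", "q"),
    "yandex": ("https://yandex.ru/search/", "text"),
    "bing": ("https://www.bing.com/search", "q"),
}


def search_url(engine: str, terms: str, site: str = "debbugs.gnu.org") -> str:
    """Build a site-restricted search URL for one of the engines above."""
    base, param = ENGINES[engine]
    return base + "?" + urlencode({param: f"site:{site} {terms}"})


for name in ENGINES:
    print(search_url(name, "emacs locale number"))
```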
My conclusion is that debbugs.gnu.org is not friendly to search engines,
so relevant results are not guaranteed, and it is hard to estimate whether a
particular item may be useful by looking at its title and summary. It does
not matter whether DebBugs, GitLab, or SourceHut is used as a bug
tracker if the robots.txt file does not allow indexing of descriptions that
are friendly to users and to search engines.