Re: Yet another browser extension for capturing notes

emacs-orgmode



From:	Maxim Nikulin
Subject:	Re: Yet another browser extension for capturing notes - LinkRemark
Date:	Sat, 26 Dec 2020 18:49:19 +0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 25/12/2020, Ihor Radchenko wrote:


> Reading through the code, I can see that you are familiar with metadata
> conventions. Do you know good references about what og: metadata is
> commonly used? I looked through the official OpenGraph specification,
> but popular websites appear to ignore most of the conventions.


I just inspected pages on several sites using developer tools and added
code that handles the elements I noticed.

I have not tried to find any resources on metadata (OK, once I searched for LD+JSON; essentially the outcome was the link to schema.org that I had already seen in the data). Looking into page sources, I realized that almost nobody cares whether a site has metadata of appropriate quality. I think search engines are advanced enough to work without metadata and may even lower the page rank if something suspicious was added for SEO purposes. The only force driving sites to add some formal data is "share" buttons. Maybe some guides for web developers from social networks or search engines could be more useful than formal references, but I have not had a closer look.
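For reference, the two kinds of data mentioned here, OpenGraph-style `<meta property="og:...">` tags and `<script type="application/ld+json">` blocks, can be pulled out of raw HTML with nothing but the Python standard library. This is a minimal sketch, not LinkRemark's code (which runs in the browser on the live DOM); the class and function names are made up for illustration, and it assumes the first occurrence of a repeated key is the one worth keeping.

```python
import json
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect <meta> name/property -> content pairs and JSON-LD payloads."""

    def __init__(self):
        super().__init__()
        self.meta = {}       # e.g. {"og:title": "...", "description": "..."}
        self.json_ld = []    # parsed <script type="application/ld+json"> bodies
        self._in_ld = False
        self._ld_chunks = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            # OpenGraph uses property="og:...", many other tags use name="...".
            key = a.get("property") or a.get("name")
            if key and a.get("content") is not None:
                self.meta.setdefault(key, a["content"])  # first occurrence wins
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._in_ld = True
            self._ld_chunks = []

    def handle_data(self, data):
        if self._in_ld:
            self._ld_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self._in_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._ld_chunks)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is not rare; skip it


def extract_head_metadata(html):
    parser = HeadMetadataParser()
    parser.feed(html)
    parser.close()
    return parser.meta, parser.json_ld
```

Keeping only the first value is a simplification: keys such as `og:image` may legitimately repeat.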

> Also, org-capture-ref does not really force the user to put BiBTeX into
> the capture. Individual metadata fields are available using
> org-capture-ref-get-bibtex-field (which extracts data from internal
> alist structure). It's just that I mostly had BiBTeX in mind (with the
> distant goal of supporting export to LaTeX) for my use-cases.

I do not have a clear vision of how to use the collected data for queries. Certainly I want to have a more human-friendly representation than BibTeX entries (maybe in addition to machine-parsable data) adjacent to my notes.
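One possible shape for "human-friendly plus machine-parsable" is an Org heading whose title is a plain link, with the raw fields kept in a property drawer below it. A sketch under that assumption only; `org_entry` and the key normalisation (replacing `:` and spaces with `_`) are illustrative choices, not something LinkRemark or org-capture-ref does.

```python
def org_entry(meta, url, note=""):
    """Render captured metadata as an Org heading (a readable link for humans)
    followed by a property drawer holding the raw fields (machine-parsable)."""
    title = meta.get("og:title") or meta.get("title") or url
    # Brackets in the description can break Org link syntax; soften them.
    title = title.replace("[", "(").replace("]", ")")
    lines = [f"* [[{url}][{title}]]", ":PROPERTIES:"]
    for key in sorted(meta):
        # Illustrative normalisation: "og:title" -> "OG_TITLE".
        name = key.replace(":", "_").replace(" ", "_").upper()
        value = " ".join(str(meta[key]).split())  # keep each value on one line
        lines.append(f":{name}: {value}")
    lines.append(":END:")
    if note:
        lines.append(note)
    return "\n".join(lines)
```

The heading stays readable in a plain Org buffer, while the drawer can still be queried with Org's property functions.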

Personally, I would prefer to avoid HTTP queries from Emacs. Sometimes it is better to have the current DOM state rather than the page source; that is why I decided to gather data inside the browser, despite security fences that are placed quite strangely in some cases.

From my point of view, you should be happy with any of the projects you mentioned below. Do all of them have problems that are critical for you?

Technically it should be possible to push e.g. raw document.head.innerHTML to any external metadata parser using native messaging (to deal with sites requiring authorization). However, it could cause an alarm during review before publication of the extension in the browser catalogues.
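The native messaging framing itself is simple: each message is UTF-8 JSON preceded by a 32-bit length in native byte order, exchanged over the host program's stdin/stdout. Below is a minimal Python host sketch that could receive e.g. `document.head.innerHTML` from the extension; the request fields `url` and `head` and the `handle` function are hypothetical, the actual parsing step is left as a stub, and registering the host with the browser (the native messaging manifest) is not shown.

```python
import json
import struct
import sys


def read_message(stream):
    """Read one message: 4-byte length (native byte order) + UTF-8 JSON.
    Returns None at end of input."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack("=I", header)
    return json.loads(stream.read(length).decode("utf-8"))


def write_message(stream, obj):
    """Write one message in the same length-prefixed format."""
    data = json.dumps(obj).encode("utf-8")
    stream.write(struct.pack("=I", len(data)))
    stream.write(data)
    stream.flush()


def handle(request):
    # Hypothetical request shape: {"url": ..., "head": document.head.innerHTML}.
    head = request.get("head", "")
    # A real host would pass `head` to an external metadata parser here.
    return {"url": request.get("url"), "head_length": len(head)}


def main():
    stdin, stdout = sys.stdin.buffer, sys.stdout.buffer
    while (request := read_message(stdin)) is not None:
        write_message(stdout, handle(request))
```

A real host script would end by calling `main()`; the browser starts the host process and closes its stdin when the connection goes away, which ends the loop.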

> Finally, would you be interested in joining efforts on metadata parsing?

Could you please share a few more details about your ideas? There is some room for improvement, but I do not think that the quality of metadata for ordinary sites could be dramatically better. The case that is not handled at all is scientific publications; unfortunately, I currently have quite little interest in it. Results should definitely be stored in some structured format such as BibTeX. I have seen huge <head> elements describing even all references. Certainly such lists are not for general-purpose notes (at least without an explicit request from the user); they should be handled by some bibliography software to display citation graphs in the local library. On the other hand, it is not a problem to feed such data to some tool using the native messaging protocol. I have no idea whether various publishers provide such data in a uniform way; I just hope that pressure from citation indices and bibliography management software has a positive influence on standardization.
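As an example of such a structured format: many publishers expose Highwire Press-style `citation_*` meta tags (`citation_title`, `citation_author`, `citation_doi`, ...), which map fairly directly onto a BibTeX entry. A minimal sketch, assuming those tag names are present and that the entry type is always `@article`; it does not escape BibTeX special characters, and the function name and citation key are hypothetical.

```python
def citation_meta_to_bibtex(pairs, key):
    """Build a BibTeX @article entry from citation_* meta tags.

    `pairs` is a list of (name, content) tuples in document order, because
    citation_author may repeat. `key` is the BibTeX citation key to use."""
    authors, fields = [], {}
    for name, content in pairs:
        if name == "citation_author":
            authors.append(content)
        else:
            fields.setdefault(name, content)  # first occurrence wins

    mapping = [
        ("title", "citation_title"),
        ("journal", "citation_journal_title"),
        ("year", "citation_publication_date"),
        ("doi", "citation_doi"),
    ]
    entry = {}
    if authors:
        entry["author"] = " and ".join(authors)
    for bib_field, meta_name in mapping:
        if meta_name in fields:
            value = fields[meta_name]
            if bib_field == "year":
                value = value[:4]  # assumes a date such as "YYYY/MM/DD"
            entry[bib_field] = value

    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in entry.items())
    return f"@article{{{key},\n{body}\n}}"
```

Reference lists (`citation_reference`) are deliberately ignored here, in line with keeping them out of general-purpose notes.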

I am not going to bloat the code with recipes for particular sites. However, I realize that some special cases still have to be handled. I am not ready to adopt the user script model used by Greasemonkey/Violentmonkey/Tampermonkey. I believe it is better to create dedicated extension(s) that either add or overwrite existing meta elements, or allow querying the gathered data through the sendMessage WebExtensions interface. By the way, scripts for the above-mentioned extensions could be used as well. This should alleviate cases when some site with insane metadata is important for a particular user.
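The "add or overwrite existing meta elements" idea boils down to a priority merge: generically extracted fields first, then site-specific fixes on top. A Python sketch of just that merge, assuming fixes are keyed by hostname; `SITE_OVERRIDES`, `apply_overrides`, and the example.org rule are hypothetical, and in the design described above the fixes would come from a separate extension (e.g. via `runtime.sendMessage`) rather than a local table.

```python
from urllib.parse import urlsplit

# Hypothetical per-site fixes, keyed by hostname. Each rule receives the
# generically extracted metadata and returns the fields to add or overwrite.
SITE_OVERRIDES = {
    # e.g. a site whose og:title is useless but twitter:title is fine
    "example.org": lambda meta: {"og:title": meta.get("twitter:title")},
}


def apply_overrides(url, meta, overrides=SITE_OVERRIDES):
    """Return a copy of `meta` with site-specific fields layered on top."""
    merged = dict(meta)
    rule = overrides.get(urlsplit(url).hostname or "")
    if rule is not None:
        # Override values win, but a rule returning None leaves the field alone.
        merged.update({k: v for k, v in rule(meta).items() if v is not None})
    return merged
```

Keeping the core extractor generic and applying fixes as a separate layer means a broken site rule can only affect that one site.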

> P.S. Some links I collected myself when working on org-capture-ref. They
> might also be of interest for you:
>
> - https://github.com/ageitgey/node-unfluff
> - https://github.com/gabceb/node-metainspector
> - https://github.com/wikimedia/html-metadata
> - https://github.com/microlinkhq/metascraper
> - https://github.com/hboisgibault/unicontent

Thank you for the links. I should have a closer look at those projects. E.g. I considered itemprop="author" elements but postponed implementing such features. For some reason I did not even try to find existing projects for metadata extraction. Maybe I still hope that a rather simple implementation could handle most of the cases.
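Handling `itemprop="author"` is a bit more involved than `<meta>` tags because the value is usually the element's text, sometimes nested inside an `itemscope` Person. A simplified sketch with the standard-library HTML parser: it takes the `content` attribute when present and the normalised text content otherwise. The class and function names and the void-tag list are assumptions, and it ignores the full microdata value rules (e.g. `itemref`, or `href`/`src` values for links and images).

```python
from html.parser import HTMLParser

# Elements that never get an end tag (partial list, enough for this sketch).
VOID_TAGS = {"area", "base", "br", "hr", "img", "input", "link", "meta"}


class ItempropCollector(HTMLParser):
    """Collect values of elements whose itemprop includes `prop`."""

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.values = []
        # Open elements as (tag, text_buffer); buffer is None if not tracked.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        tracked = self.prop in (a.get("itemprop") or "").split()
        if tracked and a.get("content") is not None:
            self.values.append(a["content"])
            tracked = False
        if tag not in VOID_TAGS:
            self._stack.append((tag, [] if tracked else None))

    def handle_data(self, data):
        # Text counts towards every tracked element that is still open.
        for _, buf in self._stack:
            if buf is not None:
                buf.append(data)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag (tolerates sloppy HTML).
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                closed = self._stack[i:]
                del self._stack[i:]
                for _, buf in reversed(closed):
                    text = " ".join("".join(buf).split()) if buf is not None else ""
                    if text:
                        self.values.append(text)
                return


def extract_itemprop(html, prop):
    collector = ItempropCollector(prop)
    collector.feed(html)
    collector.close()
    return collector.values
```

For a nested `itemscope` author this returns the whole text of the Person element, which is often just the name but may include more.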


Current Thread


Yet another browser extension for capturing notes - LinkRemark, Maxim Nikulin, 2020/12/25
- Re: Yet another browser extension for capturing notes - LinkRemark, Ihor Radchenko, 2020/12/25
  - Re: Yet another browser extension for capturing notes - LinkRemark, Maxim Nikulin <=
    - Re: Yet another browser extension for capturing notes - LinkRemark, Ihor Radchenko, 2020/12/26
    - Re: Yet another browser extension for capturing notes - LinkRemark, Maxim Nikulin, 2020/12/27
- Re: Yet another browser extension for capturing notes - LinkRemark, Russell Adams, 2020/12/25
  - Re: Yet another browser extension for capturing notes - LinkRemark, Samuel Wales, 2020/12/25
    - Re: Yet another browser extension for capturing notes - LinkRemark, Maxim Nikulin, 2020/12/26
