bug-librejs

Re: LibreJS seems to ignore query strings


From: Yuchen Pei
Subject: Re: LibreJS seems to ignore query strings
Date: Fri, 18 Nov 2022 00:40:15 +1100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

On Fri 2022-11-11 15:27:41 -0600, Jacob K wrote:

> Hello, thanks for the explanation.
> On 11/9/22 19:44, Yuchen Pei wrote:
>> Hello,
>> 
>> Thanks for the detailed report.
>> On Tue 2022-11-08 17:07:18 -0600, Jacob K wrote:
> [...]
>> 
>> LibreJS removes the query part of a script url as a preprocessing step
>> in most (if not all) functions handling scripts.  This means if you
>> whitelist https://foo.com/bar.js, https://foo.com/bar.js?blah is also
>> let through.  OTOH without such whitelisting,
>> https://foo.com/bar.js?blah is blocked as usual if it is not labelled.
>> This is because the response processor checks the external script and
>> rewrites it to /* LibreJS: script blocked ... */.
>> 
>> I suspect the reason for discarding the query part is to avoid having to
>> whitelist all possible query strings which can be tedious.  Perhaps a
>> better approach is to refine the whitelisting facility to allow patterns
>> like globbing and regexes.
> Would it make sense to generally keep handling query strings the same,
> but make the link the user clicks on go to the version with the query
> string included (possibly with a warning that there is a query string
> and that whitelisting the script will whitelist all query strings)? That
> way clicking "Show" next to a script will always take the user to the
> currently blocked or running script.

Definitely.  Patches welcome; otherwise I'll work on it when I get time.
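A minimal Python sketch of the behaviour discussed above. LibreJS itself is JavaScript, and these are not its real functions: `whitelist_key`, `is_whitelisted` and `show_link` are hypothetical names, and the whitelist is assumed to be a plain set of query-free URLs. The lookup uses the URL with its query part removed, while the "Show" link keeps the exact URL and carries a warning when a query string is present.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def whitelist_key(url: str) -> str:
    """Drop the query part so that bar.js?blah maps to bar.js (hypothetical helper)."""
    return urlsplit(url)._replace(query="").geturl()


def is_whitelisted(url: str, whitelist: set[str]) -> bool:
    # Lookup on the query-free key: whitelisting bar.js lets bar.js?anything through.
    return whitelist_key(url) in whitelist


def show_link(url: str) -> tuple[str, str | None]:
    """Return the URL the "Show" link should open, plus an optional warning."""
    warning = None
    if urlsplit(url).query:
        warning = ("This script URL has a query string; whitelisting it "
                   "will whitelist every query string for " + whitelist_key(url))
    # Always link to the exact script that was blocked or run, query included.
    return url, warning
```

With `{"https://foo.com/bar.js"}` as the whitelist, `https://foo.com/bar.js?blah` is let through, but its "Show" link still opens `https://foo.com/bar.js?blah` and comes with a warning.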

>
>> 
>>>
>>> Ideally, I think LibreJS should store checksums of scripts, but it seems
>>> like it only does this for inline scripts currently?
>> 
>> LibreJS does use hashes of scripts, but only in the built-in whitelist
>> (see /utilities/hash_script/whitelist).
>> 
>> Best,
>> Yuchen
>> 
>
> Slightly off-topic, but is there a good system set up to add new scripts
> to the internal whitelist? I often see free libraries that are not
> recognized by LibreJS, and it seems like a group of motivated users
> might be better at labeling them than the library developers, at least
> when the library developers do not care about LibreJS.

There isn't one yet, but I've been thinking about how to improve the
script recognition. One idea is to set up a server program that
maintains a database of webpages and external scripts used in these
webpages. Users can submit a url containing only free js, and the server
will run the headless compliance check on the page, display the check
results to the user, and record the results (librejs version, webpage
url, script urls, script hash, status of each script - accepted or
rejected, reason for acceptance (what licenses) / rejection).
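One possible shape for a stored check result, as a Python sketch: the fields mirror the list above (LibreJS version, page URL, script URLs and hashes, per-script status and reason). The class names, the use of SHA-256, and treating inline scripts as having no URL are assumptions for illustration, not a settled design.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class ScriptResult:
    url: str | None   # assumption: None for inline scripts
    sha256: str       # assumption: SHA-256 of the script source
    status: Status
    reason: str       # licenses found, or why the script was rejected


@dataclass
class CheckRecord:
    librejs_version: str
    page_url: str
    scripts: list[ScriptResult] = field(default_factory=list)

    def fully_compliant(self) -> bool:
        # A page is listed as compliant only if every script was accepted.
        return all(s.status is Status.ACCEPTED for s in self.scripts)


def script_hash(source: str) -> str:
    """Hex digest used as the script's identity in the database."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()
```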

The server will provide API endpoints for listing fully compliant urls,
and statistics on scripts (e.g. counts, which indicate how well known /
popular the scripts are).  The former can be used by users for
discovery of nice websites, and the latter can be used by librejs users
to whitelist scripts by hashes / names, and librejs developers to decide
which recognition mechanisms to add (for example, if 99% of the
unrecognised scripts are annotated using spdx, then maybe it makes sense
to add a user option in librejs to enable spdx, despite the problems
with the lack of license headers in spdx annotations).  Librejs can also
simply download the database from the server, and provide user options
to auto whitelist scripts by hash (e.g. set a threshold for the counts).
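The two statistics mentioned above can be sketched as follows, assuming submissions arrive as `(page_url, [(script_hash, accepted), ...])` pairs; the function names and that input shape are hypothetical. Counting a hash at most once per page is one way to keep a single page from inflating a script's popularity, and `threshold` plays the role of the user-set count threshold.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical input shape: (page_url, [(script_sha256, accepted), ...])
Submission = Tuple[str, List[Tuple[str, bool]]]


def accepted_hash_counts(submissions: Iterable[Submission]) -> Counter:
    """Number of submitted pages on which each accepted script hash appears."""
    counts: Counter = Counter()
    for _page_url, scripts in submissions:
        # A set, so a hash counts at most once per page.
        counts.update({sha for sha, accepted in scripts if accepted})
    return counts


def fully_compliant_pages(submissions: Iterable[Submission]) -> List[str]:
    """Pages on which every script was accepted (for website discovery)."""
    return [url for url, scripts in submissions if all(ok for _, ok in scripts)]


def auto_whitelist(counts: Counter, threshold: int) -> Set[str]:
    """Hashes seen on at least `threshold` pages, to be whitelisted by hash."""
    return {sha for sha, n in counts.items() if n >= threshold}
```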

The tricky part is how to make sure the server only contains free
scripts.  The FSD (Free Software Directory) has a review process, but we
probably want something faster.

One problem is that the server is basically an SaaSS (Service as a
Software Substitute).  The server program will be free and easy to
self-host, but we'll probably want one central server with THE database.
The server runs the librejs headless compliance check, which is
computation the user could do on their own computer.  Alternatively the
server can simply take user input for compliance results, but then
users may make mistakes and this opens the door to more spam and
inaccuracies.

Best,
Yuchen

-- 
PGP Key: 47F9 D050 1E11 8879 9040  4941 2126 7E93 EF86 DFD0
          <https://ypei.org/assets/ypei-pubkey.txt>
