Re: [Bug-librejs] Non triviality criterion

>We consider modification of the document non-trivial. There shouldn't be
>much that JavaScript could do that we would consider trivial, for
>anything else a free software license would be required.

I see.

Can you give me a general idea of the preferred way to implement these rules?

>Using regexps to check for the existence of non-trivial actions (such as
>using ajax or calling eval) can filter many scripts quickly, so it is a
>good way to speed the test up. But it is not sufficient.

If you believe the system I came up with can serve as a quick first pass to filter out scripts, can you help me understand what is needed beyond what that first pass already guarantees?

On Fri, Sep 29, 2017 at 7:58 PM, Ruben Rodriguez <address@hidden> wrote:

On 28/09/17 19:11, Nathan Nichols wrote:
>
> Recently, I was told that there has been some discussion about a new set
> of rules for determining if a script is trivial or not. I'm not sure
> about the details of this, but it is something I have been thinking
> about as well.
>
> I'm writing this email to make it known how webExtensions LibreJS
> handles this problem because it does so differently than the
> documentation describes.
>
> For one, it doesn't consider defining a function to be nontrivial. I
> just didn't see that as necessary.
>
> My system takes in a list of identifiers and uses a regex to see if code
> contains them or not. It does not distinguish between methods and
> objects as the current LibreJS does nor does it use a library to parse
> JavaScript. It only deals with the names of "top-level" variables and
> does not parse names of methods or detect bracket suffix notation.
>
> For example, if the list of identifiers contains "testvar", the code
> "testvar.anything()" is nontrivial. So is "testvar[anything]",
> "testvar()", etc. On the other hand, "anything.testvar()" would be
> trivial. If we wanted to flag "eval()", which can also be accessed
> through "window.eval()", the flagged identifier list must contain
> "window" as well as "eval".
>
> If I am correct, this system is able to guarantee with certainty that
> code without a free license will never be able to get a reference to the
> "eval" object, or anything else we choose. The main idea behind it is
> that JavaScript ceases to be trivial when it is doing more than just
> modifying a document.
>
> This means that it has to leave the document object unflagged because
> there's not much JavaScript can do if it can't modify a document.

We consider modification of the document non-trivial. There shouldn't be
much that JavaScript could do that we would consider trivial, for
anything else a free software license would be required.

An example of the kind of scripts to preserve is library.mit.edu, where
the scripts do things like change the page location, set a cookie or set
focus on the form. Other actions in that site (google analytics or ajax
calls) would not be considered trivial.

> This leads us to the problem of scripts inserting a script tag into HTML
> to remotely load a script.
>
> Remote script tags with an src="" attribute get accepted/denied in
> the same way as everything else, so I don't see that as an issue. This
> leaves the possibility of JavaScript that was stored in the original
> script as a string which may be inserted into the DOM in a script tag.
>
> I believe there is a way to make scripts like these get denied by the
> browser by changing the content security policy (CSP) headers of
> documents. I haven't implemented this yet, though.
>
> It was necessary for me to come up with something simpler in order to
> progress with LibreJS. But, this new code is actually better in some
> ways. For one, I suspect this is a lot faster than the current system.

Using regexps to check for the existence of non-trivial actions (such as
using ajax or calling eval) can filter many scripts quickly, so it is a
good way to speed the test up. But it is not sufficient.
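
As a rough illustration of the kind of regexp first pass described in the quoted message above, here is a minimal sketch. The function names and the exact pattern are assumptions made for this example, not the actual LibreJS code. It flags a name only when it appears as a top-level identifier, so "testvar.anything()" matches while "anything.testvar()" does not. Because it does not parse the script, names inside strings or comments are flagged as well.

```javascript
// Hypothetical helper: escape regex metacharacters in an identifier.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical first pass: returns true if `source` uses any name in
// `flagged` as a top-level identifier, meaning it is not preceded by "."
// and is not part of a longer identifier.
function usesFlaggedIdentifier(source, flagged) {
  return flagged.some((name) => {
    const pattern = new RegExp(
      "(?<![\\w$.])" + escapeRegExp(name) + "(?![\\w$])"
    );
    return pattern.test(source);
  });
}
```

With this sketch, flagging only "eval" would let "window.eval()" through, which is why the quoted message says the list must contain "window" as well.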

The original criterion as described in
https://www.gnu.org/philosophy/javascript-trap.en.html is to be
considered, but it has been reworked after realizing that scripts should
be evaluated independently of each other to avoid breaking the
asynchronous loading model of JS.

> Also, these rules are probably a lot easier to understand than what we
> had before because there is only a list of objects that aren't allowed.
> The documentation would read "You may not access any of the following
> objects: ... or load external scripts" and that would be all.

The criterion as last discussed is as follows:

- External scripts are always non-trivial. For in-line scripts:

- For each function* definition:
    It must call only primitives. **
    The number of conditionals and loops must be at most 3.
    It does not declare an array more than 50 elements long.
    It must not call itself.

- For the rest of the script, outside of function definitions:
    It must call only primitives and functions defined above in the page.
    The number of conditionals and loops must be at most 3.
    It does not declare an array more than 50 elements long.

* "function" means anything executable that gets a name, including methods.

** safe primitives exclude
- eval()
- ajax
- calling methods with the square bracket notation
- altering the DOM
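
To make the "at most 3 conditionals and loops" rule above concrete, here is a minimal sketch of one way it could be approximated. The names and the choice of keywords are assumptions made for this example. It counts keywords with a regexp instead of parsing, so it also counts keywords inside strings and comments, does not count the ternary operator, and counts a do-while loop once through its "while". Whether the criterion intends those cases to be counted this way is not stated above.

```javascript
// Limit taken from the criterion listed above.
const MAX_CONTROL_STRUCTURES = 3;

// Hypothetical helper: rough count of conditional and loop keywords
// in a code fragment (a function body or the top-level code).
function countControlKeywords(source) {
  const matches = source.match(/\b(?:if|for|while|switch)\b/g);
  return matches === null ? 0 : matches.length;
}

// Hypothetical check: true if the fragment stays within the limit.
function withinControlLimit(source) {
  return countControlKeywords(source) <= MAX_CONTROL_STRUCTURES;
}
```

A real implementation would need to apply this per function definition and to the remaining top-level code separately, which requires at least enough parsing to find function boundaries.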

From:	Nathan Nichols
Subject:	Re: [Bug-librejs] Non triviality criterion
Date:	Sun, 1 Oct 2017 11:03:13 -0500