[Nano-devel] Nano and mmap/partial loading/lazy updates

From:	Devin Hussey
Subject:	[Nano-devel] Nano and mmap/partial loading/lazy updates
Date:	Thu, 16 May 2019 12:57:00 -0400

Ack, sorry about the HTML and the full digest quote. I keep forgetting
that the Gmail app sends HTML whether you like it or not. Why won't
they ever fix it? :(

Anyways, to expand on what I wrote: ne and le use mmap, as do vim,
emacs, and even Windows Notepad. With mmap, we can run functions like
memchr, regexec, etc. directly on the file contents.
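
To illustrate what "directly on the file contents" could look like,
here is a minimal sketch. The names map_file and count_newlines are
made up for this example and are not existing nano functions; it
assumes a POSIX system and a read-only mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * Returns NULL on failure and for empty files (mmap rejects length 0). */
const char *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Count newline bytes by running memchr straight over the buffer. */
size_t count_newlines(const char *data, size_t len)
{
    size_t n = 0;
    const char *p = data;
    const char *end = data + len;

    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        n++;
        p++;  /* continue just past the newline we found */
    }
    return n;
}
```

One caveat: POSIX regexec expects a NUL-terminated string, so running
it on a mapping would need something like the REG_STARTEND extension
where it is available.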

If we were to transition to mmap, we would probably do something like this:

1. First, we would convert direct accesses to the linestruct to inlines.
2. Then, we would rewrite the IO to use mmap and also do stuff like
lazy loading now that accessing stuff like next is abstracted.
3. Remove stuff we don't need from the linestruct.
4. Set up a vector of offsets into the memory map. Each offset points to
the start of a line, so accessing a line is O(1). Whenever a line
changes length, we only need to adjust the offsets after that line.
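
Step 4 could be sketched roughly like this. The struct line_index and
both function names are hypothetical, and the growth policy is an
arbitrary choice for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical line index: offsets[i] is the byte offset of line i. */
struct line_index {
    size_t *offsets;
    size_t count;
};

/* Scan the buffer once and record where every line starts.
 * Returns 0 on success, -1 on allocation failure. */
int line_index_build(struct line_index *idx, const char *data, size_t len)
{
    size_t cap = 16;

    idx->count = 0;
    idx->offsets = malloc(cap * sizeof *idx->offsets);
    if (idx->offsets == NULL)
        return -1;
    idx->offsets[idx->count++] = 0;  /* line 0 starts at offset 0 */

    const char *p = data;
    const char *end = data + len;
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        p++;  /* the next line starts right after the newline */
        if (p == end)
            break;  /* trailing newline: no further line starts */
        if (idx->count == cap) {
            size_t *grown = realloc(idx->offsets, 2 * cap * sizeof *grown);
            if (grown == NULL) {
                free(idx->offsets);
                idx->offsets = NULL;
                return -1;
            }
            idx->offsets = grown;
            cap *= 2;
        }
        idx->offsets[idx->count++] = (size_t)(p - data);
    }
    return 0;
}

/* After line `line` grows or shrinks by `delta` bytes, shift the
 * start offsets of all later lines; earlier ones are untouched. */
void line_index_shift(struct line_index *idx, size_t line, ptrdiff_t delta)
{
    for (size_t i = line + 1; i < idx->count; i++)
        idx->offsets[i] = (size_t)((ptrdiff_t)idx->offsets[i] + delta);
}
```

Looking up line n is then just data + idx.offsets[n]. The shift is
still linear in the number of following lines, which is one reason to
batch it into the lazy update instead of doing it per keystroke.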

When we first open nano, we read up to LINES lines from the file and
display them instantly. Then we lazy load more data.

I think it is best to have a ring buffer with LINES * 2 entries to
store the cached data.
This cached data would be something like this:

struct line_cache {
    bool modified; // whether we modified it
    size_t original_length; // length before last modification
    size_t line_number; // line number
    size_t length; // how long our text is
    size_t capacity; // overallocate to make local edits faster
    cchar_t *colored_text; // converted to wchar_t and color data
                           // added for ncursesw
    char text[]; // text buffer (flexible array member for cache's sake)
};
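
As a sketch of how such an entry might be allocated and edited: the
struct below is a trimmed copy (colored_text is left out so it builds
without ncursesw), and line_cache_new, line_cache_insert, and the
"2x + 16" slack are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed, hypothetical variant of struct line_cache. */
struct line_cache_sketch {
    bool modified;
    size_t original_length;  /* assumed: length when the line was loaded */
    size_t line_number;
    size_t length;
    size_t capacity;
    char text[];             /* flexible array member, must come last */
};

/* Allocate header and text in one block, with spare room for edits. */
struct line_cache_sketch *line_cache_new(size_t line_number,
                                         const char *src, size_t len)
{
    size_t capacity = len * 2 + 16;  /* arbitrary overallocation */
    struct line_cache_sketch *lc = malloc(sizeof *lc + capacity);

    if (lc == NULL)
        return NULL;
    lc->modified = false;
    lc->original_length = len;
    lc->line_number = line_number;
    lc->length = len;
    lc->capacity = capacity;
    memcpy(lc->text, src, len);
    return lc;
}

/* Insert one byte at `pos`. Returns false if `pos` is out of range or
 * the entry is full; a full entry is left for the caller to flush. */
bool line_cache_insert(struct line_cache_sketch *lc, size_t pos, char c)
{
    if (pos > lc->length || lc->length == lc->capacity)
        return false;
    memmove(lc->text + pos + 1, lc->text + pos, lc->length - pos);
    lc->text[pos] = c;
    lc->length++;
    lc->modified = true;
    return true;
}
```

Keeping the text in the same allocation as the header means a local
edit touches one block of memory and never the mapping itself.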

When one of the following events occurs:
 - User is idle for 1 second
 - Scrolling past the end of the ring buffer
 - Opening a menu
 - Multi-line change made
 - Autosave
 - New undo step
 - A line_cache fills up

we should lazily update things:

1. If modified:
   1. Remap the mmap if necessary
   2. Insert the data
   3. Recalculate the line offset vector
   4. Update hidden lines
   5. Write autosave data
   6. Check single line regexes
2. If our ring buffer is not full, try to load more data
3. If we are close to the end of the ring buffer, center it so we have
scroll data for both directions.
4. If we match a multiline start, search for the end. We also cache
the multiline starts before the cursor so we can search for them.
Example, where the comment opens above the buffer and closes below it:
/*
-----------------buffer-------------------
foo
-----------------buffer-------------------
*/
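
One possible shape for that multiline-start cache, assuming the starts
are kept as a sorted array of byte offsets (that representation and
the name last_start_before are assumptions for this sketch):

```c
#include <stddef.h>

/* Binary search over a sorted array of multiline-start offsets.
 * Returns the index of the last start strictly before `cursor`,
 * or -1 if no start lies before it. */
ptrdiff_t last_start_before(const size_t *starts, size_t count,
                            size_t cursor)
{
    size_t lo = 0;
    size_t hi = count;  /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (starts[mid] < cursor)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the number of starts before the cursor */
    return (ptrdiff_t)lo - 1;
}
```

From the returned start we would still have to search forward for the
matching end to know whether the buffer really begins inside it.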


Obviously, this would not be a trivial task, but at the very least we
should try to implement memory mapping and/or partial loading. That
should noticeably improve load times for large files, since nano
currently has to read the entire file first.
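
A minimal sketch of the loading side, using mmap where possible and
falling back to malloc+fread (as suggested in the quoted message
below). struct file_view and both functions are hypothetical names,
and the sketch assumes POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical view of a file's bytes, mapped or heap-allocated. */
struct file_view {
    char *data;
    size_t len;
    bool mapped;
};

/* Try mmap first; fall back to reading into a growing heap buffer
 * (pipes, empty files, or a failed mmap). Returns 0 or -1. */
int file_view_open(struct file_view *v, const char *path)
{
    v->data = NULL;
    v->len = 0;
    v->mapped = false;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE,
                       fd, 0);
        if (p != MAP_FAILED) {
            close(fd);
            v->data = p;
            v->len = (size_t)st.st_size;
            v->mapped = true;
            return 0;
        }
    }

    FILE *fp = fdopen(fd, "rb");  /* fp now owns fd */
    if (fp == NULL) {
        close(fd);
        return -1;
    }
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {  /* buffer full: double it */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                fclose(fp);
                return -1;
            }
            buf = grown;
            cap *= 2;
        }
    }
    bool failed = ferror(fp) != 0;
    fclose(fp);
    if (failed) {
        free(buf);
        return -1;
    }
    v->data = buf;
    v->len = len;
    return 0;
}

/* Release the view with the call that matches how it was obtained. */
void file_view_close(struct file_view *v)
{
    if (v->mapped)
        munmap(v->data, v->len);
    else
        free(v->data);
    v->data = NULL;
    v->len = 0;
}
```

The rest of the editor would only ever see data and len, so it should
not need to care which path was taken.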

> From: Devin Hussey <address@hidden>
> To: address@hidden
> Cc:
> Bcc:
> Date: Thu, 16 May 2019 10:11:55 -0400
> Subject: Re: [Nano-devel] patch #9772: Add color name definition to the 
> nanorc configuration (color schemes)
> >> Personally, I am a little surprised that centi-second differences in
> >> startup time are of concern to users for that change,
>
> >I'm not looking at the absolute numbers, I only look at percentages.
> >If startup time can be reduced by thirty or forty percent, then that is
> >worth it, depending on the amount of code it takes. Nano is dead-slow
> >compared to things like 'ne' and 'le' -- their snappiness is enviable.
>
> Well I believe that is mostly because of how they handle files.
>
> IIRC, ne, le, vim, and perhaps Emacs don't load the entire file.
>
> The bottlenecks in Nano I presume would be
> 1. Reading large files, as nano's current structure requires reading the 
> entire file.
> 2. Linked lists: Jumping around memory like that is terrible in a 
> cache-based world. Vectors, or at least hybrid lists, are usually better.
> 3. Rendering. I don't think that nano's coloring rules are very efficient; 
> ideally we would use ncurses's extended chars and use its native scrolling 
> feature. Old curses and slang can f*** themselves.
> 4. Regex: not a big fan of gnulib's regex parser, it is not very fast. 
> Ideally we should use pcre-jit or rure. Additionally, **every syntax 
> definition** requires regex. Even keywords.
>
> An ideal restructure of nano would try to use memory mapping if possible with 
> a malloc+fread fallback.
