[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams
From: |
Clément Pit--Claudel |
Subject: |
Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams |
Date: |
Wed, 14 Sep 2016 19:26:03 -0400 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 |
On 2016-09-14 11:05, Michael Heerdegen wrote:
> Clément Pit--Claudel <address@hidden> writes:
>> … If the file is represented as a stream of lines, then tail makes
>> sense. Doesn't it? Converting the file to a list of lines
>> beforehand sounds like a bad idea, memory wise.
>
> Indeed, in this case, it would be nonsense to convert into a list.
Ok, so we agree on this part :) Note that it may be nonsense to load the file
into a buffer, too; holding the full file in a list is more or less as bad as
holding the full file in a buffer. If the file is large, that's very
inefficient.
>> I don't see why; isn't it common to implement slyding-window-style
>> algorithms on data streams? 'tail' is just one such example.
>
> Do you have an example where this would be useful for streams in
> Emacs, different from the "last n lines" one (see my comment about
> that below).
Sure: if I have a stream of numbers, I may want to calculate the average of the
last n numbers. Or if I have a stream of non-overlapping buffer spans, I may
want to remove just the last one. To continue on the `last n lines' example, I
may be interested in only the last n lines of output of a running process
(think dmesg | tail).
>> Let me know what you think of the 'last n lines of a file'
>> example.
>
> I think this is actually a very good example why it is good to
> forbid negative indexes. If you are interested in the last n lines
> of a file, why would you dissect the complete file (or buffer) into
> lines and throw away nearly all of the result?
Because it's much more memory-efficient, as long as the file's lines are short
:) Note that I was careful to say file, not buffer: I don't need to load a
full file in memory before I start processing its lines. Same for the output
of a running process: if I just want the last n lines, then accumulating all of
the output before going to the end and looking backwards is extremely
inefficient, memory-wise. Dissecting the output (splitting it on newlines) and
using a ring buffer to keep only the last `n` ones is much better.
> In Emacs, if you dissect a buffer into its lines, this can take
> seconds if the buffer is a bit larger (no matter if you collect the
> lines in a data structure or just throw the result away).
I agree; this approach would not be optimal for a buffer. But I'm not talking
about buffers :)
> If you are interested in the last n lines, this is potentially
> unlimited inefficient.
Loading the contents of a file into a buffer costs time linear in the size of
the file and memory linear in that size too. If I have a large file on disk,
loading all of it into a buffer just to get the last few lines would be
horribly inefficient. On the other hand, the stream-based approach using
subseq with negative indices would take time linear in the size of the buffer,
but memory linear in just the number of lines that I want to keep.
> Instead, go to the end of buffer, go back N lines, and start building
> a stream from there.
I don't know of any implementation of the `tail` utility that works this way
(loading the entire file, then looking backwards for newlines). The approach
you outline is the obvious one if the file is already loaded in a buffer, but
that's not my use case (sorry for not making that clear!).
> So, in this case, in my opinion forbidding negative indexes would
> have saved you from the error of writing bad code. Negative indexes
> would be an invitation to write bad code.
Could it be that I didn't explain the problem well, and you concluded it was
bad code based on that? The stream-based tail implementation has significant
advantages in terms of memory usage. Forbidding negative indexes would have
forced me to either write memory-inefficient code, or more likely to just
reimplement the required subseq functionality myself using a circular (ring)
buffer.
> Maybe it would be better to provide that semantics in a separate
> function, so that the programmer is forced to think about what he is
> doing. WDYT?
I don't know :) By default, I'd assume programmers already think about what
they are doing.
signature.asc
Description: OpenPGP digital signature
- [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Michael Heerdegen, 2016/09/13
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Clément Pit--Claudel, 2016/09/13
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Michael Heerdegen, 2016/09/13
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Clément Pit--Claudel, 2016/09/13
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Michael Heerdegen, 2016/09/14
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams,
Clément Pit--Claudel <=
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, John Mastro, 2016/09/14
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Clément Pit--Claudel, 2016/09/14
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, John Mastro, 2016/09/15
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Michael Heerdegen, 2016/09/15
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Clément Pit--Claudel, 2016/09/15
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Michael Heerdegen, 2016/09/15
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Clément Pit--Claudel, 2016/09/15
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Michael Heerdegen, 2016/09/14
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Clément Pit--Claudel, 2016/09/14
- Re: [PATCH] Elpa: Pinpoint semantics of `seq-subseq' for streams, Nicolas Petton, 2016/09/15