[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Question about data structures
From: |
Taylan Kammer |
Subject: |
Re: Question about data structures |
Date: |
Mon, 23 Nov 2020 04:42:37 +0100 |
User-agent: |
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.5.0 |
On 22.11.2020 19:48, Zelphir Kaltstahl wrote:
Hello Guile Users!
I have a question about data structures.
[...]
How do you approach this problem? Is it a problem at all?
First of all, be cautious about premature optimization. In many cases
it's best to just write the code the most straightforward way possible
with the tools at hand, and not bother with optimization unless it
actually proves to be an issue. Are you going to be processing files
with millions of lines? Thousands of lines but on a very weak CPU?
Does it matter if your program takes 0.1 seconds or 2 seconds to run?
Now the actual answer, in case you need to optimize, or just want to
learn more:
All data structures that offer a sequential list of elements have to
make some trade-offs between the performance of various operations, as
well as the implementation complexity. Linked lists (i.e. "lists" in
Scheme) are very simple, and a few operations are cheap as well, but
they have the shortcomings you've described plus some more.
Since your main concern seems to be appending, you could simply use a
linked list where you keep a reference to the last cons pair (tail) of
the list, so appending is simply a matter of a 'set-cdr!' operation on
the tail.
Python lists, JDK's ArrayList, and .NET ArrayList, among probably many
other "list" or "array" data structures in popular languages nowadays
use a relatively straightforward data structure that is backed by an
actual array which can have empty slots (e.g. your Python list with 3
elements might be backed by an array of size 10), and is reallocated
whenever there's no space left. This means that appending an element at
the end is usually dirt cheap, until there's no space left, at which
point the append operation is much heavier for one call, then the
following calls are dirt cheap again, until it's full again...
Inserting an element at the beginning or middle is also relatively
expensive with those implementations, since all elements need to be
shifted forward to make space for the new element. (Although this might
be done with an operation like C's memcpy which is still actually very
fast.)
It's called a "dynamic array" by Wikipedia:
https://en.wikipedia.org/wiki/Dynamic_array
If you want to go on an adventure, you could implement a Scheme data
structure called DVector that implements this strategy, using plain
Scheme vectors for the backing array.
The VList has also been mentioned in this thread, but from what I can
tell it doesn't seem to offer a very efficient append operation.
- Taylan