guile-user
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Question about data structures


From: Taylan Kammer
Subject: Re: Question about data structures
Date: Mon, 23 Nov 2020 04:42:37 +0100
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.5.0

On 22.11.2020 19:48, Zelphir Kaltstahl wrote:
Hello Guile Users!

I have a question about data structures.

[...]

How do you approach this problem? Is it a problem at all?

First of all, be cautious about premature optimization. In many cases it's best to just write the code the most straightforward way possible with the tools at hand, and not bother with optimization unless it actually proves to be an issue. Are you going to be processing files with millions of lines? Thousands of lines but on a very weak CPU? Does it matter if your program takes 0.1 seconds or 2 seconds to run?


Now the actual answer, in case you need to optimize, or just want to learn more:


All data structures that offer a sequential list of elements have to make some trade-offs between the performance of various operations, as well as the implementation complexity. Linked lists (i.e. "lists" in Scheme) are very simple, and a few operations are cheap as well, but they have the shortcomings you've described plus some more.


Since your main concern seems to be appending, you could simply use a linked list where you keep a reference to the last cons pair (tail) of the list, so appending is simply a matter of a 'set-cdr!' operation on the tail.


Python lists, JDK's ArrayList, and .NET ArrayList, among probably many other "list" or "array" data structures in popular languages nowadays use a relatively straightforward data structure that is backed by an actual array which can have empty slots (e.g. your Python list with 3 elements might be backed by an array of size 10), and is reallocated whenever there's no space left. This means that appending an element at the end is usually dirt cheap, until there's no space left, at which point the append operation is much heavier for one call, then the following calls are dirt cheap again, until it's full again...

Inserting an element at the beginning or middle is also relatively expensive with those implementations, since all elements need to be shifted forward to make space for the new element. (Although this might be done with an operation like C's memcpy which is still actually very fast.)

It's called a "dynamic array" by Wikipedia:

    https://en.wikipedia.org/wiki/Dynamic_array

If you want to go on an adventure, you could implement a Scheme data structure called DVector that implements this strategy, using plain Scheme vectors for the backing array.


The VList has also been mentioned in this thread, but from what I can tell it doesn't seem to offer a very efficient append operation.


- Taylan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]