guile-user

[ANN] guile-wiredtiger 0.8.0


From: Amirouche
Subject: [ANN] guile-wiredtiger 0.8.0
Date: Thu, 16 May 2019 23:05:04 +0200
User-agent: Roundcube Webmail/1.3.8

I am pleased to announce the release of guile-wiredtiger 0.8.0.

You can find it at:

  https://framagit.org/a-guile-mind/guile-wiredtiger/

Or use my Guix channel:

  $ cat ~/.config/guix/channels.scm
  (cons (channel
          (name 'amz3)
          (url "https://git.sr.ht/~amz3/guix-amz3-channel"))
        %default-channels)
  $ guix pull
  $ guix package -i address@hidden

Here is the list of changes:

- add support for single bytes column
- fix bug in rollback
- rely on guile-r7rs
- remove null byte added in strings
- add some benchmarks...
- add session-reset
- rename cursor-search to cursor-search?
- improve cursor-search-near to return a symbol or #f
- rename cursor-next and cursor-prev to cursor-next? and cursor-prev?
- %key-not-found is no longer public, since it is not needed.

The main additions are:

- add (wiredtiger pack): lexicographic packing of Scheme objects
- add (wiredtiger okvs): SRFI-167 (Ordered Key Value Store)
- add (wiredtiger nstore): SRFI-168 (Generic Tuple Store Database)
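The idea behind lexicographic packing is that Scheme values are encoded into bytevectors so that comparing the raw bytes gives the same order as comparing the original tuples. Here is a minimal Guile sketch of that idea. The type tags, the big-endian unsigned integer encoding and the NUL-terminated strings are assumptions chosen for illustration; this is not the actual format of (wiredtiger pack), and `pack`, `pack-item` and `bytevector<?` are hypothetical names.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical type tags, NOT the (wiredtiger pack) format. Because
;; 1 < 2, every packed integer sorts before every packed string.
(define tag-integer 1)
(define tag-string 2)

;; Copy all bytes of SRC into DST starting at OFFSET.
(define (copy-into! src dst offset)
  (let loop ((i 0))
    (when (< i (bytevector-length src))
      (bytevector-u8-set! dst (+ offset i) (bytevector-u8-ref src i))
      (loop (+ i 1)))))

;; Unsigned integer below 2^64: tag byte + 8 big-endian bytes, so the
;; byte order matches the numeric order.
(define (pack-uint n)
  (let ((bv (make-bytevector 9 0)))
    (bytevector-u8-set! bv 0 tag-integer)
    (bytevector-u64-set! bv 1 n (endianness big))
    bv))

;; String: tag byte + UTF-8 bytes + a terminating 0 byte (left in place
;; by make-bytevector), so "ab" sorts before "abc". Assumes the string
;; contains no #\nul character.
(define (pack-string s)
  (let* ((utf8 (string->utf8 s))
         (bv (make-bytevector (+ (bytevector-length utf8) 2) 0)))
    (bytevector-u8-set! bv 0 tag-string)
    (copy-into! utf8 bv 1)
    bv))

(define (pack-item x)
  (cond ((and (exact-integer? x) (<= 0 x) (< x (expt 2 64))) (pack-uint x))
        ((string? x) (pack-string x))
        (else (error "pack-item: unsupported value" x))))

;; Pack a tuple by concatenating the encodings of its items.
(define (pack . items)
  (let* ((parts (map pack-item items))
         (out (make-bytevector (apply + (map bytevector-length parts)) 0)))
    (let loop ((parts parts) (offset 0))
      (if (null? parts)
          out
          (begin
            (copy-into! (car parts) out offset)
            (loop (cdr parts) (+ offset (bytevector-length (car parts))))))))))

;; Byte-wise lexicographic comparison, the order an ordered key-value
;; store typically uses for raw byte keys.
(define (bytevector<? a b)
  (let ((la (bytevector-length a))
        (lb (bytevector-length b)))
    (let loop ((i 0))
      (cond ((= i la) (< la lb))   ; a is a prefix of b, or equal to it
            ((= i lb) #f)          ; b is a proper prefix of a
            (else
             (let ((x (bytevector-u8-ref a i))
                   (y (bytevector-u8-ref b i)))
               (cond ((< x y) #t)
                     ((> x y) #f)
                     (else (loop (+ i 1))))))))))
```

With this encoding `(pack 1)` sorts before `(pack 256)`, whereas a little-endian encoding would put 256 first. Because each item's encoding is self-delimiting, packed tuples sort by their first element, then by their second, and so on.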

I finally figured out what went wrong. I faced two issues:

- WiredTiger raised WT_ROLLBACK even with a single application thread
  and a single session. This was because the cache was not big enough
  to hold the whole transaction in memory. It is solved by setting the
  okvs 'cache key to a "reasonable" value; for gotofish I set it to
  1GB. That does not mean a transaction can be 1GB big, it means that
  WiredTiger will use at most 1GB to execute a transaction.

- My program was leaking memory. I am not sure, but it is unlikely
  that the Guile part of the code leaks memory [...] AND I
  experimented with both Chez Scheme and Python: they both seem to
  leak memory. The latter takes more time, but in the end the result
  is the same. I don't have confirmation from MongoDB, but to my mind
  it is again a configuration problem. The default WiredTiger
  configuration uses one thread for cache eviction, that is, a single
  thread dedicated to fighting the growth of the cache using some
  Least Recently Used algorithm, IIRC. Anyway, setting the okvs
  'eviction-trigger key to 85% (i.e. eviction is triggered when 85%
  of the cache is filled) and using 4 threads for eviction allows
  gotofish.scm to complete its mission.
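For anyone who wants to reproduce this tuning, the two settings above correspond to WiredTiger's own connection configuration keys `cache_size`, `eviction_trigger` and `eviction=(threads_min=...,threads_max=...)`, which are passed in the config string of the C function `wiredtiger_open`. The sketch below only builds that string; the helper name `wiredtiger-config`, and the assumption that the okvs 'cache and 'eviction-trigger options end up as these keys, are illustrative and not guile-wiredtiger's API.

```scheme
;; Hypothetical helper: build a WiredTiger connection configuration
;; string matching the tuning described above. Only the key names come
;; from WiredTiger; the helper itself is illustrative.
(define (wiredtiger-config cache-size eviction-trigger eviction-threads)
  (let ((threads (number->string eviction-threads)))
    (string-append
     "create"
     ;; total cache size, e.g. "1GB"
     ",cache_size=" cache-size
     ;; start evicting when the cache is this percent full
     ",eviction_trigger=" (number->string eviction-trigger)
     ;; fixed-size pool of eviction worker threads
     ",eviction=(threads_min=" threads ",threads_max=" threads ")")))

(display (wiredtiger-config "1GB" 85 4))
(newline)
;; prints create,cache_size=1GB,eviction_trigger=85,eviction=(threads_min=4,threads_max=4)
```

Setting `threads_min` and `threads_max` to the same value keeps the eviction pool at a constant size instead of letting it scale between the two bounds.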

The key word is fine-tuning. That is what makes the database work.

So, if you read the above carefully, you figured out that gotofish can
index the Wikipedia vital articles level 3, which weigh 500MB, in two
hours. Let's try it:

  $ time guile -L . gotofish.scm search GNU

  ** 0.09737717752984928: data/wikipedia-vital-articles-level-3/Mathematics/Arithmetic/Division_%28mathematics%29
  ** 0.07194504699927694: data/wikipedia-vital-articles-level-3/Mathematics/Geometry/Trigonometry
  ** 0.06146528292562392: data/wikipedia-vital-articles-level-3/Mathematics/Other/Probability
  ** 0.03677014042867702: data/wikipedia-vital-articles-level-3/Society and social sciences/Language/Cyrillic_script
  ** 0.03422772617819057: data/wikipedia-vital-articles-level-3/Technology/Food and health/Medical_imaging
  ** 0.021683228730822873: data/wikipedia-vital-articles-level-3/Technology/Computing and information technology/Computer

  real  0m3.760s
  user  0m1.680s
  sys   0m2.090s

Under four seconds is not bad, since it includes the time needed to
open the database, and the database sits on a USB SSD. By the way, the
database behaves better on an SSD without encryption...

The gotofish code will probably end up in the guile-wiredtiger
repository as an example. In the meantime, it is available at:

  https://git.sr.ht/~amz3/guile-gotofish

Last but not least, there is still a non-deterministic error about a
locale that fails to be set; I don't know where it comes from.

# What the future will bring

Regarding guile-wiredtiger, I hope to keep the interface as is. What I plan
to do is:

- drop the use of guile-bytestructures, OR somehow add support for
  function pointers in C structs.

- Optimize for the single bytes column using a dedicated set of
  procedures.

- Improve support for Scheme objects in (wiredtiger pack).

By the way, I did some testing using guile-next from Guix and nothing
weird happened.

I also tried the SQLite3 LSM extension, but it (also!) leaks memory.
So I will not take that route right now. There is also the possibility
of using RocksDB or even PostgreSQL. Anyway, I prefer to continue
building datae [0], then redo the benchmarks and switch databases when
I see it must be done. SRFI 168 relies on SRFI 167, so it is easy to
switch backends; I tried it with FoundationDB. Once you have the okvs
interface tested, you just have to drop nstore.scm in your project and
re-run the tests for nstore.scm. That's the magic of abstractions :)
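The layering argument can be illustrated with a toy: a triple store written only against two ordered key-value operations, so that any backend providing them can be swapped in. Everything here is hypothetical and in-memory (`make-okvs`, `okvs-set!`, `okvs-prefix`, `nstore-add!`, `nstore-query`); it is neither the SRFI 167 nor the SRFI 168 API, and the three-index layout (spo, pos, osp) is one common way to answer every triple pattern with a prefix scan, not necessarily what nstore.scm does.

```scheme
(use-modules (srfi srfi-1))

;; --- Backend: toy ordered key-value store (hypothetical API) ---
;; Keys are lists of strings, ordered lexicographically.
(define (key<? a b)
  (cond ((null? a) (not (null? b)))
        ((null? b) #f)
        ((string<? (car a) (car b)) #t)
        ((string<? (car b) (car a)) #f)
        (else (key<? (cdr a) (cdr b)))))

;; The store is a one-slot vector holding a sorted association list.
(define (make-okvs) (vector '()))

(define (okvs-set! store key value)
  (let ((others (remove (lambda (entry) (equal? (car entry) key))
                        (vector-ref store 0))))
    (vector-set! store 0
                 (sort (cons (cons key value) others)
                       (lambda (x y) (key<? (car x) (car y)))))))

(define (prefix? p k)
  (or (null? p)
      (and (pair? k)
           (string=? (car p) (car k))
           (prefix? (cdr p) (cdr k)))))

;; All entries whose key starts with PREFIX, in key order. A real
;; backend would do a range scan instead of filtering everything.
(define (okvs-prefix store prefix)
  (filter (lambda (entry) (prefix? prefix (car entry)))
          (vector-ref store 0)))

;; --- Tuple store: uses only okvs-set! and okvs-prefix ---
;; Positions: 0 = subject, 1 = predicate, 2 = object.
(define indices '(("spo" 0 1 2) ("pos" 1 2 0) ("osp" 2 0 1)))

(define (permute triple perm)
  (map (lambda (i) (list-ref triple i)) perm))

(define (unpermute values perm)
  (let ((triple (make-vector 3 #f)))
    (for-each (lambda (v i) (vector-set! triple i v)) values perm)
    (vector->list triple)))

;; Store the triple once per index, under that index's key order.
(define (nstore-add! store triple)
  (for-each (lambda (index)
              (okvs-set! store
                         (cons (car index) (permute triple (cdr index)))
                         #t))
            indices))

;; An index is usable when the bound (non-#f) positions of PATTERN
;; come first in its permutation.
(define (usable? perm pattern)
  (let loop ((perm perm) (seen-variable? #f))
    (cond ((null? perm) #t)
          ((list-ref pattern (car perm))
           (and (not seen-variable?) (loop (cdr perm) #f)))
          (else (loop (cdr perm) #t)))))

;; PATTERN is a triple where #f marks a variable.
(define (nstore-query store pattern)
  (let* ((index (find (lambda (ix) (usable? (cdr ix) pattern)) indices))
         (perm (cdr index))
         (prefix (cons (car index)
                       (filter-map (lambda (i) (list-ref pattern i)) perm))))
    (map (lambda (entry) (unpermute (cdar entry) perm))
         (okvs-prefix store prefix))))

(define db (make-okvs))
(nstore-add! db '("guile" "is-a" "scheme"))
(nstore-add! db '("chez" "is-a" "scheme"))
(nstore-add! db '("guile" "made-by" "gnu"))
(write (nstore-query db '(#f "is-a" "scheme")))
(newline)
;; prints (("chez" "is-a" "scheme") ("guile" "is-a" "scheme"))
```

Swapping the backend here means reimplementing only `okvs-set!` and `okvs-prefix`; the tuple-store half stays untouched, which is the property the paragraph above relies on.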


Happy hacking!


[0] https://github.com/awesome-data-distribution/datae