Sat, 10 Feb 2018 08:34:13 +0100
I figured out a use case for an immutable / functional database that works
like git. I like the "streamable immutable database" name but not sure
This probably seems ambitious and pretentious; that said, I am certain I
can get it done. The only uncertainty is performance, but I also have ideas
The idea of building a git-like database is not new, but now I have a
picture of it.
The question you may want to ask is: why not re-implement git in guile and
use wiredtiger as the backing store? Well, that is a legitimate question.
What I am trying to achieve is something more general than git.
Feel free to point me to relevant documentation or argue that git in
guile is the
The main use case I want to handle is the ability to experiment with
versions of a given machine learning model / data / dataset that might
be bigger than RAM. That is, to easily and efficiently switch from one
version of the model to another without resorting to copying all the
files or the database.
That is, a versioned, branchable, forkable database.
Feel free to argue that data and code are different and that data MUST
be distributed out-of-band; I will be reading with great interest.
It MUST have the following features:
- It supports ACID transactions
- It's multi-threaded
- It's an association list database (like guile-wiredtiger's
keys are symbols and values are any scheme value. In other words,
it's a document
- It supports git-like features, i.e. tags, branches, push, pull, revert,
log, diff and of course commits and revisions. In particular, it's
to access the history of a given association.
- It's immutable in the sense that CRUD operations, instead of changing
values in place, create new entries in the database to reflect the
change. In terms of the wiredtiger API, there is no call to cursor-update;
it only uses cursor-insert calls.
- 'neon checkout REV' will bring into the working space a more efficient
representation of the data. That representation MUST be configurable. In other words,
if the user wants to version csv, a geo-temporal data, timeseries or
whatever it must
- It SHOULD allow mixing data with source files.
- It SHOULD also allow storing binaries efficiently.
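
To make the insert-only and history requirements above more concrete, here is a
minimal in-memory sketch. It is written in Python rather than guile for brevity,
and it does not use wiredtiger at all. All names (`VersionedStore`, `commit`,
`branch`, `get`, `history`) are hypothetical and only illustrate the idea: every
write inserts a new (key, revision) entry, and a read walks the revision
ancestry until it finds the key. Transactions, threads, push/pull and diff are
left out.

```python
import itertools


class VersionedStore:
    """Insert-only association store (illustrative sketch, not a real API)."""

    def __init__(self):
        self._entries = {}             # (key, revision) -> value, never overwritten
        self._parents = {0: None}      # revision -> parent revision; 0 is the empty root
        self._branches = {"master": 0}  # branch name -> head revision (a movable ref, as in git)
        self._counter = itertools.count(1)

    def commit(self, branch, changes):
        """Record `changes` as a new revision on top of the head of `branch`."""
        rev = next(self._counter)
        self._parents[rev] = self._branches[branch]
        for key, value in changes.items():
            self._entries[(key, rev)] = value  # insert only, older entries stay
        self._branches[branch] = rev
        return rev

    def branch(self, name, from_branch):
        """Create a branch that starts at the head of `from_branch`."""
        self._branches[name] = self._branches[from_branch]

    def _ancestry(self, rev):
        # Yield `rev`, then its parent, and so on down to the root.
        while rev is not None:
            yield rev
            rev = self._parents[rev]

    def get(self, key, rev):
        """Value of `key` as seen from revision `rev`."""
        for r in self._ancestry(rev):
            if (key, r) in self._entries:
                return self._entries[(key, r)]
        raise KeyError(key)

    def history(self, key, rev):
        """All (revision, value) pairs for `key` reachable from `rev`, newest first."""
        return [(r, self._entries[(key, r)])
                for r in self._ancestry(rev) if (key, r) in self._entries]


store = VersionedStore()
r1 = store.commit("master", {"model": "v1", "lr": 0.1})
store.branch("experiment", "master")
r2 = store.commit("experiment", {"lr": 0.01})  # diverges from master
r3 = store.commit("master", {"model": "v2"})

print(store.get("lr", r2))      # value on the experiment branch
print(store.get("lr", r3))      # master still sees the original value
print(store.history("lr", r2))  # history of one association
```

Switching between versions only changes which revision is used for lookups, so
nothing is copied. The cost is that a read may have to walk a long ancestry
chain, which is where the performance uncertainty would show up in a real
implementation.
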
- code the "bare database", i.e. the gist of the story, that is the
list that takes inspiration from git.
- create benchmarks
- Index conceptnet and wikidata and demo the git-like features over the
based named entity recognition.
- Functional database,