Somewhat relational database using wiredtiger (and else)

From: Amirouche Boubekki
Subject: Somewhat relational database using wiredtiger (and else)
Date: Sun, 26 Jun 2016 23:10:10 +0200
User-agent: Roundcube Webmail/1.1.2

Héllo all,

# Click bait

I've written an article that tries to make explicit a workflow similar
to the one used with an RDBMS. It's very natural to do this in wiredtiger
even if you lack the high-level abstractions of a SQL DSL. Through
a few procedures I explain that it's simple to:

- Define tables with simple indices
- Insert new rows in tables
- Resolve foreign keys via an index
- Paginate, which basically comes down to smart calls to list-tail and list-head

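As a rough illustration of that workflow, here is a minimal Python sketch. It does not use wiredtiger or guile-wiredtiger: a sorted Python list stands in for an ordered table, and the `Table` class, its methods and the `paginate` helper are hypothetical names chosen for this example. An index is modeled as a second ordered collection of (value, primary key) pairs, so resolving a foreign key is a range scan over that index, and pagination mirrors list-tail followed by list-head.

```python
from bisect import bisect_left, insort
from itertools import count


class Table:
    """Hypothetical in-memory stand-in for an ordered wiredtiger-style table.

    Rows are stored under an auto-incremented primary key. Each index is a
    sorted list of (indexed value, primary key) pairs, mimicking an index
    kept as a second ordered table.
    """

    def __init__(self, name, columns, indexed=()):
        self.name = name
        self.columns = columns
        self.rows = {}                                # primary key -> row tuple
        self.indices = {col: [] for col in indexed}   # column -> sorted pairs
        self._ids = count(1)

    def insert(self, **values):
        pk = next(self._ids)
        self.rows[pk] = tuple(values[c] for c in self.columns)
        for col, index in self.indices.items():
            insort(index, (values[col], pk))          # keep the index ordered
        return pk

    def lookup(self, column, value):
        """Yield primary keys whose `column` equals `value` (index range scan)."""
        index = self.indices[column]
        i = bisect_left(index, (value,))              # first pair with this value
        while i < len(index) and index[i][0] == value:
            yield index[i][1]
            i += 1


def paginate(items, page, size):
    """Skip `page * size` items, then take `size` (like list-tail + list-head)."""
    start = page * size
    return items[start:start + size]


users = Table("users", ["name"], indexed=["name"])
posts = Table("posts", ["user", "title"], indexed=["user"])

alice = users.insert(name="alice")
for n in range(5):
    posts.insert(user=alice, title=f"post {n}")

# Resolve the foreign key: every post whose `user` column points at alice.
alice_posts = [posts.rows[pk] for pk in posts.lookup("user", alice)]
print(paginate(alice_posts, page=1, size=2))  # [(1, 'post 2'), (1, 'post 3')]
```

Keeping the index as a separate ordered structure is what makes the foreign key lookup cheap: it never scans the whole `posts` table.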
I think this covers all the basic uses of an RDBMS except multithreading
and transactions. If you think I missed a basic use of databases,
please tell me!

Have a look at

## Transactions

wiredtiger supports transactions; have a look at the source of
wiredtigerz.scm for a quick introduction.
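For readers who have not used transactions with a key/value engine, the following toy Python sketch shows the all-or-nothing behaviour they provide. It is not wiredtiger: in the wiredtiger C API, transactions are driven through a session's `begin_transaction`, `commit_transaction` and `rollback_transaction` methods, whereas the hypothetical `Store` and `Transaction` classes here simply buffer writes and publish them on commit. There is no isolation between concurrent transactions in this sketch.

```python
from contextlib import contextmanager


class Transaction:
    """Buffers writes so they can be published, or dropped, all at once."""

    def __init__(self, store):
        self.store = store
        self.writes = {}

    def get(self, key, default=None):
        # Read our own uncommitted writes first, then the committed data.
        if key in self.writes:
            return self.writes[key]
        return self.store.data.get(key, default)

    def set(self, key, value):
        self.writes[key] = value


class Store:
    """Toy key/value store; no isolation between concurrent transactions."""

    def __init__(self):
        self.data = {}

    @contextmanager
    def transaction(self):
        txn = Transaction(self)
        yield txn
        # Only reached when the `with` block exits normally: commit.
        # If the block raises, the exception propagates from `yield`,
        # this line is skipped, and the buffered writes are discarded.
        self.data.update(txn.writes)


store = Store()
with store.transaction() as txn:
    txn.set("alice", 100)
    txn.set("bob", 0)

try:
    with store.transaction() as txn:
        txn.set("alice", txn.get("alice") - 30)
        raise RuntimeError("crash before crediting bob")
except RuntimeError:
    pass

print(store.data)  # {'alice': 100, 'bob': 0}
```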

## Multithreading and Multiprocessing

wiredtiger doesn't support multiprocessing, but it does support
multithreading.

If you require multiprocessing, I think a good start is to create
a database server similar to the UAV database server [0], which
relies on eval. If you have security concerns, you can instead create
a database server that does RPC via stored procedures. And since
`read' and `write' are not safe either, you might find the msgpack
Scheme port useful [1].
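A minimal sketch of the stored-procedure approach, in Python, assuming JSON from the standard library as a stand-in for msgpack: instead of eval'ing whatever the client sends, the server only runs procedures that were registered ahead of time. The `procedure` decorator, the `handle` function and the request shape (`proc`, `params`) are hypothetical.

```python
import json

PROCEDURES = {}


def procedure(fn):
    """Register `fn` as a stored procedure that clients may call by name."""
    PROCEDURES[fn.__name__] = fn
    return fn


# Hypothetical data; a real server would read from the database instead.
USERS = {1: {"name": "alice"}}


@procedure
def get_user(uid):
    return USERS.get(uid)


def handle(request_bytes):
    """Decode a request, run a registered procedure, encode the reply.

    Nothing sent by the client is evaluated as code: an unknown procedure
    name is rejected instead of being executed.
    """
    request = json.loads(request_bytes)
    name, params = request["proc"], request["params"]
    if name not in PROCEDURES:
        return json.dumps({"error": f"unknown procedure {name!r}"}).encode()
    return json.dumps({"result": PROCEDURES[name](*params)}).encode()


print(handle(b'{"proc": "get_user", "params": [1]}'))
# b'{"result": {"name": "alice"}}'
print(handle(b'{"proc": "system", "params": ["ls"]}'))
```

The trade-off against an eval-based server is flexibility: clients can only do what a registered procedure allows, which is exactly the point when security matters.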


# Where to go from here

Guile bindings of wiredtiger 2.6.1 can be fetched using
the following command:

git clone

wiredtiger itself is still available online via:


It's very simple to install, and the guix recipe is trivial ;)

Be careful: only 64-bit architectures are supported by wiredtiger.

## Reading code

I prototyped a few things with these bindings:

- I mocked up a hypergraph database called culture [2]
- A tuple space database with SPARQL-like querying (powered by minikanren)
  called UAV database [3]. UAV database is used to build nanoblog, a
  Twitter-like blogging web app [4]; nanoblog exists both as an Artanis
  version and as a plain Guile web version.
- A search engine based on UAV database called hyper [5]


## Reading frenglish

I wrote two other articles about these bindings on my blog:

- Getting started with guile-wiredtiger [6]
- Getting started with UAV database [7]


## Where Do *I* Go From Here

From my perspective, using guile-wiredtiger is the most convenient way
to create database-backed applications in Guile, but I know the API very
well.

I have also read good things about it. I also compared it against bsddb
and leveldb, which have fewer features and are slower.

It lacks a true multithreading story right now, but I think it's not
too complex to come up with a multithreaded server that uses msgpack
to transport the query with its params, and binds the query to the
params using eval on the server side. I think it's secure to do it
this way.
I don't have the motivation to code it right now because of the
lack of feedback. Remember, maybe I'm drinking my own kool-aid...
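A multithreaded server of that kind could be organised roughly as below. This Python sketch makes several assumptions: decoded requests are plain `(query, params)` tuples rather than msgpack messages, a simple check on the query name replaces the server-side eval, and `Session` is a hypothetical stand-in for a database session. It follows wiredtiger's guidance that a session should be used by only one thread at a time, by giving each worker thread its own session through `threading.local`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Session:
    """Hypothetical stand-in for a database session (one per thread)."""

    def __init__(self, data):
        self.data = data

    def get(self, key):
        return self.data.get(key)


DATA = {f"key{i}": i * i for i in range(10)}   # shared and only read here
_local = threading.local()


def session():
    # Lazily open one session per worker thread; sessions are never shared.
    if not hasattr(_local, "session"):
        _local.session = Session(DATA)
    return _local.session


def run_query(request):
    """Execute one decoded (query, params) request on this thread's session."""
    query, params = request
    if query == "get":          # a name check stands in for eval here
        return session().get(*params)
    raise ValueError(f"unknown query {query!r}")


requests = [("get", (f"key{i}",)) for i in range(10)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_query, requests))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

`pool.map` returns results in request order even though the queries run concurrently, which keeps replies easy to match with requests.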

Don't be fooled by the fact that it was *recently* acquired by mongodb.
It has only been the primary backend of mongodb since 3.2 (IIRC), and
there hasn't been much feedback on mongodb since then.

Also wiredtiger is *not* mongodb.

(Beware, there are a lot of buzzwords in what follows.)

Regarding hyper, the search engine: I've been thinking about moving
its database to an RDBMS style, which sounds like a more wiredtiger-native
solution. The problem with that solution is that I think the wiredtiger
API is more difficult to understand in the context of a search engine
than UAV database, which is a tuple database with a document-oriented
API. Similarly, UAV database is not as good as a graph database when it
comes to dealing with graphs. Again, maybe I drink too much of my own
kool-aid, but everything is a graph!
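To make the idea of a tuple database with a document-oriented API concrete, here is a toy Python sketch. It is not UAV database's actual implementation: the `TupleStore` class is hypothetical, and it assumes each document is split into (uid, key, value) tuples kept in one ordered collection, plus a (key, value, uid) index for equality lookups. Values stored under the same key should have mutually comparable types, because the index is kept sorted.

```python
from bisect import bisect_left, insort
from itertools import count


class TupleStore:
    """Hypothetical tuple store: each document becomes (uid, key, value) tuples.

    `tuples` is ordered by uid so a document is reloaded with a range scan;
    `index` is ordered by (key, value) to answer equality queries.
    """

    def __init__(self):
        self.tuples = []   # sorted (uid, key, value)
        self.index = []    # sorted (key, value, uid)
        self._uids = count(1)

    def add(self, doc):
        uid = next(self._uids)
        for key, value in doc.items():
            insort(self.tuples, (uid, key, value))
            insort(self.index, (key, value, uid))
        return uid

    def get(self, uid):
        """Rebuild the document dict from its tuples."""
        i = bisect_left(self.tuples, (uid,))
        doc = {}
        while i < len(self.tuples) and self.tuples[i][0] == uid:
            _, key, value = self.tuples[i]
            doc[key] = value
            i += 1
        return doc

    def where(self, key, value):
        """Yield the uids of documents where `key` equals `value`."""
        i = bisect_left(self.index, (key, value))
        while i < len(self.index) and self.index[i][:2] == (key, value):
            yield self.index[i][2]
            i += 1


db = TupleStore()
a = db.add({"title": "hello", "lang": "scheme"})
b = db.add({"title": "world", "lang": "scheme"})
print(db.get(a))  # {'lang': 'scheme', 'title': 'hello'}
print([db.get(uid)["title"] for uid in db.where("lang", "scheme")])  # ['hello', 'world']
```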

I want hyper to be the hackable search engine of the Culture [*]; as
such, it should be as simple as possible to manipulate, with *Scheme*,
the part of the Internet that is scraped (and enriched). Yes, I want to
interact with my data primarily using Scheme code. Doing simple search
queries is a solved problem. The big problem is to enrich the search
engine's semantics and make doing so as simple as possible, that is,
to make hacking on hyper as simple as possible.


That's what I've been looking for all along. So my current plan is to
experiment with a graph frontend [8] for wiredtiger. If the current
features of hyper look nicer with a graph API, I'll then think about
rebasing the graph frontend on top of the UAV layer. This is sort of a
graphdb implemented using an RDF store. This would be useful because I
*think* that sometimes the same data can be queried using SPARQL-like
queries and sometimes using a more graph-y approach like Gremlin
querying, but I'm not sure. (What about graph pattern matching?!)
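The idea of a graph layered on an RDF-like store can be sketched in a few lines of Python. The hypothetical `TripleGraph` class stores every edge as a (subject, predicate, object) triple and offers two ways to query the same data: a SPARQL-like `match` where `None` acts as a variable, and a Gremlin-like `traverse` that follows edge labels step by step. This illustrates the trade-off only; it is not the graph frontend [8] itself.

```python
from collections import defaultdict


class TripleGraph:
    """Hypothetical graph layer over (subject, predicate, object) triples.

    Edges are plain triples; `out_index` maps (subject, predicate) to the
    set of objects, so one traversal step is a dictionary lookup.
    """

    def __init__(self):
        self.triples = set()
        self.out_index = defaultdict(set)

    def add_edge(self, start, label, end):
        self.triples.add((start, label, end))
        self.out_index[(start, label)].add(end)

    def match(self, s=None, p=None, o=None):
        """SPARQL-like pattern: None acts as a variable (full scan here)."""
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

    def traverse(self, start, *labels):
        """Graph-y traversal: follow each edge label in turn from `start`."""
        frontier = {start}
        for label in labels:
            frontier = {end for v in frontier
                        for end in self.out_index.get((v, label), ())}
        return sorted(frontier)


g = TripleGraph()
g.add_edge("alice", "knows", "bob")
g.add_edge("bob", "knows", "carol")
g.add_edge("bob", "likes", "guile")
print(g.traverse("alice", "knows", "knows"))  # ['carol']
print(g.match(p="likes"))                     # [('bob', 'likes', 'guile')]
```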


Choosing wiredtiger and Guile is part of my reasoning to make hyper
a hackable search engine.

Guile provides a nice threading story (i.e. no GIL), and writing code
to be eval'ed is much nicer than in non-Lisp languages. OK, it sounds
like a bad idea, but bear with me.

Wiredtiger, on the other hand, is simply put the best database engine
out there. But why choose a low-level component like wiredtiger?

Part of my reasoning is that I still don't know which features I need
or want.

If I start with an RDBMS, I might end up with inefficient code for
spell checking or graph traversal. I will need to learn silly database
management commands instead of using cp. If I want to make some feature
usable, I'd need to choose another database, which would make the
overall solution more complex. If I use a graphdb, I'll inherit a giant
blob of Java (because all free graphdbs that are ACID are Javaesque),
which a) might not be as efficient as Guile for hacking, b) might be a
good solution for my professional career but not so much for my
interests, and c) will never give me full control over the database.
And if you wonder why I didn't choose REDIS as my primary data storage,
you definitely don't know REDIS well enough.

Simply put, I chose a database engine because I want my data storage
to be versatile, safe and accessible. I chose wiredtiger because it's
the best of its kind.

The questions that remain to be answered are:

1) which primary data model: table, tuple or graph oriented?

Actually, this is a feature-by-feature question. And guile-wiredtiger
is written in a way that makes it easy to compose database paradigms.

2) how to scale horizontally? how to do multiprocessing?

Now that I think about it again, I remember that one of the founding
grounds of this project is that there will be *no* horizontal scaling
(multiple machines hosting hyper's database): because of gravity cost
(cf. Culture & Empire [*]), it will not be required. This is kind of a
philosophical ground.

I am mostly thinking about single-host vertical scaling, and that is the
most important matter right now since the search engine's semantics are
very poor. I'd rather code features first, then optimize and scale them.

Still, multiprocessing could be interesting to demo the search engine
over a greedy subset of the GNU Guile Internet... Wait... This can wait!

Thanks for your interest.

Amirouche ~ amz3 ~
