guile-user

Re: neon: git for structured data [Was: Functional database]


From: amirouche
Subject: Re: neon: git for structured data [Was: Functional database]
Date: Wed, 21 Feb 2018 19:41:20 +0100

Hello Roel,

On Wed, 21 Feb 2018 at 17:02, Roel Janssen <address@hidden> wrote:
Dear Amirouche,

I'm not exactly sure if this fits in with your plans, but nevertheless
I'd like to share this code with you.

Thanks for the input.


I recently looked into using triple stores (actually quad stores)
and wrote an interface to Redland librdf for Guile.

Indeed, quad stores. Triple stores hold only:

 subject predicate object

whereas quad stores are:

 graph subject predicate object

I did not grasp the difference between triple stores and quad stores
until recently. See the W3C definition [0].

[0] https://www.w3.org/TR/rdf11-concepts/#section-rdf-graph
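To make the difference concrete, here is a minimal sketch in plain Guile.
It is not neon or librdf code: the helper names (make-triple, make-quad,
triples-in) and the sample graph names are made up for illustration. A quad
is modeled as a triple with a graph name in front, so one store can hold the
same triple in several named graphs.

```scheme
(use-modules (srfi srfi-1))  ; filter

;; A triple is (subject predicate object); a quad adds a graph name in front.
;; These helpers are hypothetical, for illustration only.
(define (make-triple s p o) (list s p o))
(define (make-quad g s p o) (cons g (make-triple s p o)))

(define quad-graph car)   ; the graph name of a quad
(define quad-triple cdr)  ; the (subject predicate object) part of a quad

;; Hypothetical sample data: the same triple asserted in two named graphs.
(define quads
  (list (make-quad "graph-a" "Test" "TestPredicate" "TestObject")
        (make-quad "graph-b" "Test" "TestPredicate" "TestObject")
        (make-quad "graph-b" "Test" "OtherPredicate" "OtherObject")))

;; Return the triples that belong to one named graph.
(define (triples-in graph quads)
  (map quad-triple
       (filter (lambda (q) (equal? (quad-graph q) graph)) quads)))
```

Calling (triples-in "graph-b" quads) yields the two triples asserted in
"graph-b", while a pure triple store could not tell the two graphs apart.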

I have looked at librdf a little before. In particular, this is interesting:

Storage for graphs in memory and persistently with Oracle Berkeley DB,
   MySQL 3-5, PostgreSQL, OpenLink Virtuoso, SQLite, files or URIs.

   http://librdf.org/

This is definitely a feature that should be baked into neon.
By the way, WiredTiger is the successor of Oracle Berkeley DB.
It was created by the same developers.

The differences between neon and librdf are the following:

- Quads can be versioned in branches without copying (implemented, but
 only for triples so far), making it effectively a quintuple store.

- You can pull and push graphs (called 'world' in librdf, I think),
 i.e. you can neon-clone part of a remote data repository, the
 equivalent of git-cloning a particular directory (not implemented yet).

- The use of IRIs (or URIs) as 'graph name', 'subject' or 'predicate' is
 not enforced; this doesn't break compatibility with existing systems.
 That said, for now I will implement 'object' as literals as the
 specification describes them [1], to allow compatibility with existing
 systems.

[1] https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal

Also, the API is, I think, simpler in neon:


I attached the source code of the interface.
With this interface, you can write something like this:

--8<---------------cut here---------------start------------->8---
(use-modules (redland rdf) ; The attached module.
             (system foreign))

(define world (rdf-world-new))
(rdf-world-open world)

(define store (rdf-storage-new
               world
               "hashes"
               "redland"
               "new=true,hash-type='bdb',dir='path/to/triplestore'"))

(define model (rdf-model-new world store %null-pointer))

(define local-uri (rdf-uri-new world "http://localhost:5000/Redland/"))
(define s (rdf-node-new-from-uri-local-name world local-uri "Test"))
(define p (rdf-node-new-from-uri-local-name world local-uri "TestPredicate"))
(define o (rdf-node-new-from-uri-local-name world local-uri "TestObject"))

(define statement (rdf-statement-new-from-nodes world s p o))
(rdf-model-add-statement model statement)

The equivalent of this in neon is basically:

  (add context "Test" "TestPredicate" "TestObject")

Here 'context' is the database context, somewhat equivalent to a 'cursor' in
PostgreSQL parlance.

Strings are mapped to 64-bit unsigned integers in the underlying storage
to save space and ease comparisons. Subjects and predicates are each stored
in their own table, whose hot parts stay in RAM, which makes the
string-to-integer resolution fast. Basically, for the time being, I rely on
the database layer to cache the integer value associated with each subject
and predicate.
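As a rough illustration of that string-to-integer mapping, here is a minimal
in-memory sketch in plain Guile. It is not neon's implementation: the names
(intern!, resolve, the two tables) are hypothetical, ordinary hash tables
stand in for the database tables, and Guile's arbitrary-precision integers
are used without enforcing the 64-bit range.

```scheme
;; In-memory stand-ins for the lookup tables (assumption: neon keeps the
;; equivalent data in database tables, not in Guile hash tables).
(define string->id-table (make-hash-table))
(define id->string-table (make-hash-table))
(define next-id 0)

(define (intern! str)
  "Return the integer for STR, allocating a fresh one the first time."
  ;; 0 is a true value in Scheme, so `or' only falls through on a miss (#f).
  (or (hash-ref string->id-table str)
      (let ((id next-id))
        (set! next-id (+ next-id 1))
        (hash-set! string->id-table str id)
        (hash-set! id->string-table id str)
        id)))

(define (resolve id)
  "Return the string for ID, or #f if ID was never allocated."
  (hash-ref id->string-table id))
```

With this in place, comparing two subjects becomes an integer comparison,
for example (= (intern! "Test") (intern! "Test")), and (resolve id) recovers
the original string when a result has to be shown to the user.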

Similarly, retrieving a triple can currently be done as follows:

  (ref context "Test" "TestPredicate")

It's a minor difference, and the librdf API has the advantage of letting
users do the caching themselves.

(rdf-statement-free statement)

(rdf-model-size model)
(rdf-storage-size store)

;; Example mime-type: application/rdf+xml
(define serializer (rdf-serializer-new world %null-pointer "text/turtle" %null-pointer))
(define serialized (rdf-serializer-serialize-model-to-string serializer local-uri model))
(format #t "Serialized: ~s~%" (pointer->string serialized))

There is no Turtle support in neon yet.


(rdf-uri-free local-uri)
(rdf-model-free model)
(rdf-storage-free store)
(rdf-world-free world)
--8<---------------cut here---------------end--------------->8---

Kind regards,
Roel Janssen

Thanks Roel!





