fenfire-dev

Re: [Fenfire-dev] PEG swamp_easier--benja: An easier API for Swamp


From: Tuomas Lukka
Subject: Re: [Fenfire-dev] PEG swamp_easier--benja: An easier API for Swamp
Date: Mon, 22 Sep 2003 13:10:38 +0300
User-agent: Mutt/1.5.4i

On Mon, Sep 22, 2003 at 12:54:32PM +0300, Benja Fallenstein wrote:
> Tuomas Lukka wrote:
> >On Mon, Sep 22, 2003 at 05:09:44AM +0300, Benja Fallenstein wrote:
> >>   for(Triples t = graph.get(_, RDF.type, _); t.loop();) {
> >>       System.out.println(t.sub+" is instance of "+t.ob);
> >>   }
> >
> >This is nice.
> >
> >ISSUE: Name for that call: get(...)? We have find() so far.
> 
> Hm. I've always advocated get ;-) ;-)
> 
> I've done some googling-- e.g. Aaron Swartz' Python API uses query(...) 
> (with similar semantics). The thing I don't like about find() and 
> query() is mostly psychological: they seem to indicate a little effort, 
> whereas get(...) sounds like something that's essentially free. But 
> that's only a mild objection to find(), not a strong one.
> 
> What do you think?

I feel better about find(), since it 

1) feels lighter than query
2) feels heavier than get, as it should - we don't *necessarily*
   have all indices ready.

And it's consistent with the code that's there already. If we do
rename it, change all the occurrences.
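Here is a minimal sketch of the wildcard query under the name find(), combined with the loop() idiom. It assumes a naive in-memory store that scans every triple. Triple, TripleIter, SimpleGraph and the ANY marker are hypothetical names, not Swamp's actual classes. ANY stands in for `_`, which is not a legal Java identifier from Java 9 on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: ANY plays the role of the "_" wildcard from the mail.
final class Triple {
    final Object subj, pred, obj;

    Triple(Object subj, Object pred, Object obj) {
        this.subj = subj;
        this.pred = pred;
        this.obj = obj;
    }
}

final class TripleIter {
    private final List<Triple> matches;
    private int pos = -1;       // -1 means "before the first match"
    Object subj, pred, obj;     // components of the current match

    TripleIter(List<Triple> matches) {
        this.matches = matches;
    }

    /** Moves to the next match and returns true, or returns false when done. */
    boolean loop() {
        if (pos + 1 >= matches.size()) return false;
        Triple t = matches.get(++pos);
        subj = t.subj;
        pred = t.pred;
        obj = t.obj;
        return true;
    }
}

final class SimpleGraph {
    static final Object ANY = new Object();   // wildcard marker, compared by identity
    private final List<Triple> triples = new ArrayList<>();

    void add(Object s, Object p, Object o) {
        triples.add(new Triple(s, p, o));
    }

    /** Linear scan; a real store would consult an index for the pattern. */
    TripleIter find(Object s, Object p, Object o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == ANY || s.equals(t.subj))
                    && (p == ANY || p.equals(t.pred))
                    && (o == ANY || o.equals(t.obj))) {
                out.add(t);
            }
        }
        return new TripleIter(out);
    }

    public static void main(String[] args) {
        SimpleGraph graph = new SimpleGraph();
        graph.add("x:foo", "rdf:type", "ex:Person");
        graph.add("x:foo", "ex:name", "Foo");
        for (TripleIter i = graph.find(ANY, "rdf:type", ANY); i.loop();) {
            System.out.println(i.subj + " is instance of " + i.obj);
        }
    }
}
```

Running main prints `x:foo is instance of ex:Person`. The ex:name triple does not match the rdf:type pattern.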

> >ISSUE: Name for the iterator-like thing that goes through triples.
> >"Triples" says it contains several triples while it has only one 
> >at a time. "TripleIterator", "TripleIter", ...?
> 
> I wanted it to be short, of course, but I guess you have a point. 
> ``TripleIter`` should be fine... ::
> 
>     for(TripleIter i = graph.get(_, RDF.type, _); i.loop();) {
>         System.out.println(i.subj+" instance of "+i.obj);
>     }
> 
> I still prefer ``Triples``, but I'm willing to settle for ``TripleIter``.

I'd prefer Iter, as it says what it is.

> >>However, to be fair, my code isn't how it would look
> >>when efficiency is at a premium. (Then again, when I print
> >>to the console inside the loop, efficiency isn't at a
> >>premium anyway... but whatever...) The *fast* version
> >>would look like this::
> >
> >Umm, you should note here that the efficiency difference is in the call,
> >not in the actual code, as get() can be just a set of if clauses
> 
> True.
> 
> >and actually I think that HotSpot might be able to handle it.
> 
> I earlier suggested that and you were suspicious of it ;-) I do agree-- 
> it's essentially three ``jnz``s per ``get()``, very cheap. I can say 
> this in the PEG.

Three jnz's and a method call.
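The "set of if clauses" point can be made concrete with a hypothetical sketch. A generic find() checks which arguments are wildcards and forwards the call to a specialized method. Compared with calling the specialized method directly, the only overhead is three identity comparisons and one extra call. Only two of the eight patterns are implemented, and results come back as lists for brevity. find_X1X and find_11X borrow this thread's naming pattern (X = free, 1 = given), but they are not Swamp methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dispatching a generic find() to specialized methods.
final class DispatchGraph {
    static final Object ANY = new Object();   // wildcard marker

    // Single index for this sketch: predicate -> {subject, object} pairs.
    private final Map<Object, List<Object[]>> byPred = new HashMap<>();

    void add(Object s, Object p, Object o) {
        byPred.computeIfAbsent(p, k -> new ArrayList<>()).add(new Object[] {s, o});
    }

    /** Generic entry point: three identity comparisons, then one call. */
    List<Object[]> find(Object s, Object p, Object o) {
        boolean sFree = s == ANY, pFree = p == ANY, oFree = o == ANY;
        if (sFree && !pFree && oFree) return find_X1X(p);
        if (!sFree && !pFree && oFree) return find_11X(s, p);
        throw new UnsupportedOperationException("pattern not covered by this sketch");
    }

    /** Predicate given; subject and object free. Returns the index list itself. */
    List<Object[]> find_X1X(Object p) {
        List<Object[]> hits = byPred.get(p);
        return hits == null ? Collections.<Object[]>emptyList() : hits;
    }

    /** Subject and predicate given; object free. */
    List<Object[]> find_11X(Object s, Object p) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] so : find_X1X(p)) {
            if (so[0].equals(s)) out.add(so);
        }
        return out;
    }

    public static void main(String[] args) {
        DispatchGraph g = new DispatchGraph();
        g.add("x:foo", "rdf:type", "ex:Person");
        g.add("x:bar", "rdf:type", "ex:Place");
        System.out.println(g.find(ANY, "rdf:type", ANY).size());
        System.out.println(g.find("x:foo", "rdf:type", ANY).size());
    }
}
```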

> >However, there's another performance difference with the Triples objects
> >which you haven't mentioned: *all* members need to be fetched each
> >time.
> 
> Not exactly true: Only the members which change need to be. E.g., if you 
> have ::
> 
>     get(_, RDF.type, _)
> 
> only the subject and object need to be loaded each time.
> 
> And most of the time if you do such a query you would want to use both 
> of them. So it would only cost extra if you do such a query, but do 
> *not* use both subject and object.

Issue: Should you be able to query just subjects, i.e. ignoring objects,
having them null in the triples and not getting duplicates?
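The subjects-only query could look like the hypothetical helper below. SubjectQuery.subjects is not part of Swamp. Instead of an iterator with null objects, it returns a set, which is a different design choice. Duplicates disappear because a subject with several objects is added to the same LinkedHashSet more than once. A real implementation would use an index instead of scanning every triple.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: subjects having a given predicate, without duplicates.
final class SubjectQuery {
    static final class Triple {
        final Object subj, pred, obj;

        Triple(Object s, Object p, Object o) {
            subj = s;
            pred = p;
            obj = o;
        }
    }

    /** Distinct subjects of triples with predicate p, in first-seen order. */
    static Set<Object> subjects(List<Triple> graph, Object p) {
        Set<Object> seen = new LinkedHashSet<>();
        for (Triple t : graph) {
            if (t.pred.equals(p)) seen.add(t.subj);   // a repeated subject is ignored
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Triple> g = new ArrayList<>();
        g.add(new Triple("x:foo", "ex:homeCountry", "y:bar"));
        g.add(new Triple("x:foo", "ex:homeCountry", "z:baz"));
        g.add(new Triple("x:qux", "ex:homeCountry", "y:bar"));
        System.out.println(subjects(g, "ex:homeCountry"));  // [x:foo, x:qux]
    }
}
```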

> Still, can note it in the PEG. -- Or maybe we *should* have::
> 
>     Object subj(); Object pred(); Object obj();
> 
> These are also nicer because they can give error messages when 
> ``next()`` hasn't been called yet. Opinions?

Hmm, could you test what the performance is then? How well
is HotSpot able to optimize away that if?
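The accessor variant might look like the hypothetical sketch below (CheckedTripleIter is not a Swamp class). Every subj()/pred()/obj() call goes through a check that next() has been called. Whether HotSpot inlines these calls and hoists the check out of a loop is the open question. The sketch shows only the shape of the API and says nothing about its speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical iterator with accessors instead of public fields, so reading
// a component before next() has been called fails loudly.
final class CheckedTripleIter {
    private final List<Object[]> triples;   // each entry is {subj, pred, obj}
    private int pos = -1;                   // -1: next() not called yet

    CheckedTripleIter(List<Object[]> triples) {
        this.triples = triples;
    }

    boolean hasNext() {
        return pos + 1 < triples.size();
    }

    void next() {
        if (!hasNext()) throw new NoSuchElementException();
        pos++;
    }

    // The extra check a JIT would have to deal with on every accessor call.
    private Object[] current() {
        if (pos < 0) throw new IllegalStateException("next() has not been called yet");
        return triples.get(pos);
    }

    Object subj() { return current()[0]; }
    Object pred() { return current()[1]; }
    Object obj()  { return current()[2]; }

    public static void main(String[] args) {
        List<Object[]> data = new ArrayList<>();
        data.add(new Object[] {"x:foo", "rdf:type", "ex:Person"});
        CheckedTripleIter it = new CheckedTripleIter(data);
        try {
            it.subj();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        while (it.hasNext()) {
            it.next();
            System.out.println(it.subj() + " " + it.pred() + " " + it.obj());
        }
    }
}
```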

> >>   for(Triples t = graph.get_A1A(RDF.type); t.loop()) {
> >>       System.out.println(t.sub+" is instance of "+t.ob);   
> >>   }
> >
> >Note: missing a semicolon.
> >
> >ISSUE: Naming. I'd think find_X1X_Triples would make more sense here.
> 
> find...Triples: Any particular reason?

We have find_..._Iter; it would be easiest to put the return type in
the name, and once we have used Swamp for several years and *know* the
best solution, we'll take that as the return type.

The point is that you can't overload just by return type.
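The restriction is easy to see in a small example with hypothetical method names. Two find_X1X variants that differ only in return type cannot share a name, so the return type moves into the suffix. `_Iter` mirrors the existing find_..._Iter naming. `_List` is made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Java rejects two methods that differ only in their return type:
//
//     Iterator<Object[]> find_X1X(Object p) { ... }
//     List<Object[]>     find_X1X(Object p) { ... }  // error: already defined
//
// so each (hypothetical) variant carries its return type in its name.
final class NamedVariants {
    private final List<Object[]> triples = new ArrayList<>();  // each {s, p, o}

    void add(Object s, Object p, Object o) {
        triples.add(new Object[] {s, p, o});
    }

    /** All {subject, object} pairs for predicate p, materialized. */
    List<Object[]> find_X1X_List(Object p) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] t : triples) {
            if (t[1].equals(p)) out.add(new Object[] {t[0], t[2]});
        }
        return out;
    }

    /** The same matches, exposed as an iterator. */
    Iterator<Object[]> find_X1X_Iter(Object p) {
        return find_X1X_List(p).iterator();
    }

    public static void main(String[] args) {
        NamedVariants g = new NamedVariants();
        g.add("x:foo", "rdf:type", "ex:Person");
        g.add("x:foo", "ex:name", "Foo");
        for (Iterator<Object[]> i = g.find_X1X_Iter("rdf:type"); i.hasNext();) {
            Object[] so = i.next();
            System.out.println(so[0] + " " + so[1]);
        }
    }
}
```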

> >>   Object getSubject(Object subject, Object predicate, Object object);
> >>
> >>   Object getSubject_A1A(Object predicate);
> >>   ...
> >
> >ISSUE: If there is more than one?
> 
> Clarified on IRC: The issue is what happens if there is more than one 
> matching triple.
> 
> The current way is to throw NotUniqueException.

The javadoc didn't say that.

> There's a problem with that: for basically any property with cardinality
> one, there can still be two object nodes for it in the graph, e.g.::
> 
>     x:foo   ex:homeCountry   y:bar .
>     x:foo   ex:homeCountry   z:baz .
> 
> because ``y:bar`` and ``z:baz`` may represent the same resource. (You
> cannot require global agreement on the one URI to be used for every 
> particular thing in the world.)
> 
> So signalling an error isn't necessarily correct.
> 
> Jena returns just an arbitrary one of the matching triples in a similar 
> situation; I'm leaning towards that.

I'd *really* hate that one -- I'd prefer swamp to have totally clear
semantics, with the only arbitrary thing being the order in which a set
is iterated through. 
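Here is a sketch of the strict semantics, using the homeCountry example. It assumes a hypothetical getObject(subject, predicate) lookup and a stand-in NotUniqueException class, because the real class's package and superclass are not given here. No match yields null. Several distinct matches throw instead of returning an arbitrary one, which is the Jena-style alternative.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the NotUniqueException mentioned in the thread (hypothetical).
class NotUniqueException extends RuntimeException {
    NotUniqueException(String msg) {
        super(msg);
    }
}

final class StrictLookup {
    private final List<Object[]> triples = new ArrayList<>();  // each {s, p, o}

    void add(Object s, Object p, Object o) {
        triples.add(new Object[] {s, p, o});
    }

    /**
     * Returns the object of the unique triple (s, p, *), or null if there is
     * none. Throws NotUniqueException if there are several distinct objects.
     */
    Object getObject(Object s, Object p) {
        Object found = null;
        for (Object[] t : triples) {
            if (!t[0].equals(s) || !t[1].equals(p)) continue;
            if (found != null && !found.equals(t[2])) {
                throw new NotUniqueException(s + " " + p + " has several values");
            }
            found = t[2];
        }
        return found;
    }

    public static void main(String[] args) {
        StrictLookup g = new StrictLookup();
        g.add("x:foo", "ex:homeCountry", "y:bar");
        System.out.println(g.getObject("x:foo", "ex:homeCountry"));  // y:bar
        g.add("x:foo", "ex:homeCountry", "z:baz");
        try {
            g.getObject("x:foo", "ex:homeCountry");
        } catch (NotUniqueException e) {
            System.out.println("not unique: " + e.getMessage());
        }
    }
}
```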


> >>The iterator-like object, ``Triples``, shall have
> >>the following API::
> >>
> >>   Object sub, pred, ob;
> >
> >Issue: Names. subj, pred, obj would be more consistent, i.e.
> >up to the *end* of the second consonant group.
> 
> Yes, but these are also impossible to pronounce... "SUB-djjjj"
> 
> Sub, pred, ob are the shortest abbreviations that have a chance to get 
> understood, so they're consistent in a sense, too. ;-) I.e., "su" or 
> "pre" would be misleading/not understood.

"s", "p", "o"?

I've seen subj used elsewhere as an abbreviation of subject, but never
sub; it's a prefix, as is ob.

> >>The purpose of ``loop()`` is to enable the common loop
> >>pattern, ::
> >>
> >>   for(Triples t = graph.get(...); t.loop();) {
> >>       // ...
> >>   }
> >>
> >>which would otherwise have to be written as::
> >>
> >>   Triples t;
> >>   for(t = graph.get(...); t.hasNext();) {
> >>       t.next();
> >>       // ...
> >>   }
> >>   t.free();
> >
> >This should go into the javadoc.
> 
> Sure, but for the PEG I found it easier to read in the body, and the 
> javadoc is in the PEG for clarification of the PEG, no?

Ok.
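This minimal sketch shows how loop() can fold hasNext(), next() and free() into the loop condition: it advances while there is a next triple and frees the iterator on the final call. LoopingIter and ListIter are hypothetical names. A real iterator would release its resources in free() instead of just setting a flag.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical base class: loop() = "advance if possible, otherwise free".
abstract class LoopingIter {
    abstract boolean hasNext();
    abstract void next();
    abstract void free();

    /** Advances and returns true, or frees the iterator and returns false. */
    final boolean loop() {
        if (hasNext()) {
            next();
            return true;
        }
        free();
        return false;
    }
}

// Tiny concrete subclass over a list of strings; records whether free() ran.
final class ListIter extends LoopingIter {
    private final Iterator<String> it;
    String current;
    boolean freed;

    ListIter(List<String> items) {
        it = items.iterator();
    }

    boolean hasNext() { return it.hasNext(); }
    void next() { current = it.next(); }
    void free() { freed = true; }

    public static void main(String[] args) {
        ListIter i = new ListIter(Arrays.asList("a", "b"));
        while (i.loop()) {
            System.out.println(i.current);
        }
        System.out.println("freed: " + i.freed);
    }
}
```

Running main prints `a`, `b` and then `freed: true`. Cleanup happens without the caller having to remember the separate free() call.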

> The examples should go into the *class*'s javadoc actually, I think.

Exactly what I meant.

        Tuomas





