fenfire-dev

Re: [Fenfire-dev] Repost: PEGs swamp_easier*--benja


From: Benja Fallenstein
Subject: Re: [Fenfire-dev] Repost: PEGs swamp_easier*--benja
Date: Sun, 28 Sep 2003 18:18:52 +0300
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030908 Debian/1.4-4


Hi,

(After replying to the individual points of your mail, I've come up with a potential compromise we should discuss; you may want to skip to the end of this mail to read it first.)

Matti Katila wrote:
On Sat, 27 Sep 2003, Benja Fallenstein wrote:

The (now) two related PEGs about the Swamp API, swamp_easier--benja and swamp_easier_iteration--benja: please say whether all of your comments have been addressed!

Umh, two pegs in one post is not good at all :/

Would having two related pegs in two different posts have been better?

I think we had a misunderstanding about the PEG split last time,
since I was saying that I have nothing against iterating through
triplets instead of nodes.

In other words, you would be fine with swamp_easier_iteration--benja, but not with swamp_easier--benja.

But the same PEG also says that the find_ [IXA] methods should be removed and replaced with findSubject etc.; that is about method naming, not about iterating triplets.

These are *not* in the swamp_easier--benja PEG, so what's the problem?

==========================================================================
An easier API for Swamp
==========================================================================
...
- Should we keep the current methods, and just add those
  proposed in this PEG? There is a lot of code using the
  current methods; we could just deprecate them for now.

  RESOLVED: No. The point is to *simplify* the API;
  adding more variants doesn't do that.

I don't agree. You are a verbal thinker, so you prefer subject, predicate and object. I think of this as 'the first item in the triplet ..hmm.. what was the name of it.' or 'the third item in the triplet.. hmm.. argh, I always forget what it was..'

Does having both variants really help?

I'd write code using subject, predicate, object, and you would write code using XAA, 1XA and 11X. You would not (easily) understand my code and I would not understand yours. Someone trying to get into Fenfire coding would be confronted with learning *two* systems rather than one.

First of all, we need a good way for iterating
through a set of triples. I propose the following
interface::

    for(TripleIter i = graph.find(_, RDF.type, _); i.loop();) {
        System.out.println(i.subj+" is instance of "+i.obj);
    }

After the first read this raised a question (more on this at the end of the reply): What's wrong with the java.util.Iterator interface? I don't like having a different interface which does the same thing.

It cannot efficiently iterate through a set of triples, unless we keep an object for every triple anyway.

In Jython, the loop would look like this::

    t = graph.find(_, RDF.type, _)

    while t.loop():
        print "<%s> is instance of <%s>" % (t.subj, t.obj)

A bit different than in Java, but still recognizable.

With the Iterator interface this would be something like:

iter = graph.find(_, RDF.type, _)

while iter.hasNext():
    # if mutable
    iter.next()
    print "<%s> is instance of <%s>" % (iter.s, iter.o)

This does not work: Iterator does not have ``s`` and ``o``. In fact, Iterator is an interface; it does not have *any* fields.

    # if immutable
    triple = iter.next()
    print "<%s> is instance of <%s>" % (triple.s, triple.o)

This is inefficient, as it requires an object for every triple.
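
To make the efficiency argument concrete, here is a minimal sketch (not the actual Swamp implementation) of a cursor-style iterator that reuses a single object for the whole iteration. The names ``TripleCursor`` and ``CursorDemo``, the flat-array storage and the ``freed`` flag are assumptions made for illustration; only ``loop()``, ``hasNext()``, ``next()``, ``free()`` and the ``subj``/``obj`` style fields come from the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor: one object is reused for every triple,
// so iterating allocates nothing per step.
class TripleCursor {
    // Assumed flat storage: {s0, p0, o0, s1, p1, o1, ...}
    private final Object[] triples;
    private int position = 0;
    boolean freed = false;

    // Overwritten in place on every step.
    Object subj, pred, obj;

    TripleCursor(Object[] flatTriples) {
        this.triples = flatTriples;
    }

    // Pure query: does not move the cursor.
    boolean hasNext() {
        return position + 3 <= triples.length;
    }

    void next() {
        subj = triples[position];
        pred = triples[position + 1];
        obj = triples[position + 2];
        position += 3;
    }

    // Placeholder for returning the cursor to a cache.
    void free() {
        freed = true;
    }

    // Advance if possible; otherwise release the cursor.
    boolean loop() {
        if (hasNext()) {
            next();
            return true;
        } else {
            free();
            return false;
        }
    }
}

class CursorDemo {
    static List<String> describe(Object[] flatTriples) {
        List<String> lines = new ArrayList<>();
        for (TripleCursor i = new TripleCursor(flatTriples); i.loop();) {
            lines.add(i.subj + " is instance of " + i.obj);
        }
        return lines;
    }
}
```

A ``java.util.Iterator`` over the same data would have to return a fresh triple object from every ``next()`` call, because the interface gives the caller nothing but the returned value to read from.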

We'll make it a convention that classes using the API
have this at the top::

    static final Object _ = null;

You don't have to have this, but it makes things easier to read.

Either we have it or we don't. Reading both versions drives you crazy. We should stick to one version and add a test that leaves no chance
to write it the other way.

Ok, we can 'strongly recommend' it.

Anyway, I don't think we have that much of swamp code to write with '_' way.

I don't understand this.

``ConstGraph``
--------------

The current methods for finding triples shall be removed
from ``ConstGraph`` and be replaced by the following API::

Currently there are no methods for finding triplets, only nodes.

True. s/triples/nodes/.

How about a contains method?

Should be added to this PEG.

    Object getSubject(Object subject, Object predicate, Object object)
        throws NotUniqueException;


I don't like having getSubject instead of find1_XII, but I would allow them to be there.

Hmm, these questions come to mind:
Why are you giving the subject which you are going to get?

Consider the following examples::

    getSubject(x, y);
    getObject(x, y);

What are ``x`` and ``y`` in each case? I find it hard to see which is which part of the triple. However, in this case::

    getSubject(_, x, y);
    getObject (x, y, _);

it's obvious spatially.

Alternative proposal: Remove ``getSubject`` and ``setSubject`` (we already don't have ``getPredicate`` and ``setPredicate`` as they're useless). Retain only::

    getObject(subject, predicate);
    setObject(subject, predicate, object);

which are the only ones that we expect to be used frequently. (They are, after all, only convenience functions for something that can be done with ``find()``.)

This would also mean that you wouldn't have to remember what subject, predicate and object are, just that ``getObject`` is the one function that you want. :-) We could even shorten these to ``get(s,p)`` and ``set(s,p,o)`` then... this would be similar to the Collections API.
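
As a rough sketch of what the shortened ``get(s,p)`` and ``set(s,p,o)`` could look like, assuming ``set`` replaces any existing object for the pair the way ``Map.put`` replaces a value (the PEG text quoted here does not spell this out): ``TinyGraph`` and its list-backed storage are hypothetical, and ``NotUniqueException`` is a local stand-in, made unchecked only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the exception named in the PEG.
class NotUniqueException extends RuntimeException {
    NotUniqueException(String message) {
        super(message);
    }
}

// Hypothetical list-backed graph, for illustration only.
class TinyGraph {
    private final List<Object[]> triples = new ArrayList<>();

    void add(Object s, Object p, Object o) {
        triples.add(new Object[] {s, p, o});
    }

    // Returns the single object of (s, p, *), or null if there is none.
    Object get(Object s, Object p) {
        Object found = null;
        boolean seen = false;
        for (Object[] t : triples) {
            if (t[0].equals(s) && t[1].equals(p)) {
                if (seen) {
                    throw new NotUniqueException("more than one object for " + s + " " + p);
                }
                found = t[2];
                seen = true;
            }
        }
        return found;
    }

    // Assumed semantics: drop every (s, p, *) triple, then add (s, p, o).
    void set(Object s, Object p, Object o) {
        triples.removeIf(t -> t[0].equals(s) && t[1].equals(p));
        add(s, p, o);
    }
}
```

With this shape the caller only needs to remember the argument order, much like ``Map.get(key)`` and ``Map.put(key, value)``.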

What's the pattern method?

I don't understand?

    Object getSubject_X1X(Object predicate) throws NotUniqueException;
    ...

Note: The reason for having ``subject`` as a parameter
for ``getSubject()`` is that it's easier to read. It will
almost always be "``_``" (i.e., ``null``). It shall work
consistently, though: If a subject is given, and there is
such a triple in the graph, return that subject; otherwise,
return ``null``.

There's no sense :)

I can explain the above here. It's what I mean by "easier to read" ;-)

Backing is generally used in the Collections API, and allows
for lighter implementations of the method. For example,
when using ``new TreeSet(graph.getSubjects(_, _, _))`` to get
a *sorted* set of all subjects in a graph, it would be quite
wasteful if ``getSubjects()`` created a ``HashSet`` only to have
it discarded after being used in the constructor of ``TreeSet``.

I don't see any good reasons to sort urns :)

That's because you have neither worked on serializing RDF, nor on Loom ;-)

- When serializing, we need to sort the triples (by URI), in order
  to minimize CVS diffs.
- In Loom, it's important that the order of nodes (on the wheel)
  is consistent. Having them re-order themselves when the internal
  hashtable is re-sorted just wouldn't do :-)

Also, you can use TreeSet for different orderings than by URN, e.g. sorting people by name or age. (You just have to write a special ``Comparator``.)
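
Both uses can be sketched with the standard Collections API alone. ``Person``, ``SortDemo`` and the sample data are made up for illustration; ``TreeSet`` and ``Comparator`` are the real ``java.util`` classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical value class for the "sort people by age" case.
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class SortDemo {
    // Natural String order gives a stable, repeatable order for URIs.
    static List<String> sortedUris(Collection<String> uris) {
        return new ArrayList<>(new TreeSet<>(uris));
    }

    // Custom order: by age, then by name. The tie-break matters because
    // a TreeSet treats elements that compare as equal as duplicates.
    static List<String> namesByAge(Collection<Person> people) {
        Comparator<Person> byAge = (a, b) -> {
            int c = Integer.compare(a.age, b.age);
            return c != 0 ? c : a.name.compareTo(b.name);
        };
        TreeSet<Person> sorted = new TreeSet<>(byAge);
        sorted.addAll(people);
        List<String> names = new ArrayList<>();
        for (Person p : sorted) {
            names.add(p.name);
        }
        return names;
    }
}
```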

I believe this API will be substantially simpler to use
than the one we have at the moment, and not lose
anything w.r.t. speed. In fact, it may speed things up
in the future, because we can cache the ``TripleIter`` objects.

OTOH, I believe that we should add the more verbal methods in graphs but
not remove the current names. Creating aliases for methods would be fast enough.

The problem with having both is that users of the API need to learn more. Given that the goal is to make learning & using the API simpler, this isn't good :-)

    boolean loop() {
        if(hasNext()) {
            next();
            return true;
        } else {
            free();
            return false;
        }
    }


'loop' doesn't tell you anything about what it does.
while t.loop() for (; t.loop();)

That's because it's munging unrelated things together in order to make a common pattern easier to use ;-)

How about  while triple.nextUntilFree():

Hm, "While next until free" is still nonsense :-), but ``nextUntilFree`` may be ok as a slightly more descriptive name.

And.. the java's Iterator is already broken because it's mutable

You mean "And TripleIter does not conform to Java's Iterator interface anyway"?

so having hasNext() to do what loop() does, wouldn't be that bad.

Yes it would: People who know ``Iterator`` would expect ``hasNext()`` to have the same effect as in ``Iterator``. Besides, ``hasXXX()`` suggests a function that *does* nothing, only returns true or false.

Hmm, public methods hasNext() and next() are not needed anymore if we have the ``loop()`` method.

Hm, I would like to be *able* to use ``TripleIters`` in other ways than the common loop pattern.

The purpose of ``loop()`` is to enable the common loop
pattern, ::

    for(TripleIter i = graph.find(...); i.loop();) {
        // ...
    }


Common loop is more like continuously, i.e.:
   while(true)   or  #define ever ;;   \n   for(ever) ..

Looping with 'for' means "iterate n".
That's easier with languages with operator overloading where you can say i[terator]++

I'm sorry, I have no idea what you're getting at here.

which would otherwise have to be written as::

    TripleIter i;
    for(i = graph.find(...); i.hasNext(); i.next()) {
        // ...
    }
    i.free();

This isn't just harder to read, it also scopes ``i``
wrongly.

=) I see it like this: that way I can definitely see what it does.

Yes, and I'm fine with allowing you to write it like this. But I also find "harder to read" and "scopes ``i`` wrongly" to be convincing reasons to do it the other way.

I definitely like the idea of iterating triples instead of nodes, but I would like the old [XIA] methods to remain in graph.

Ok, so now for the compromise.

You said that you mainly prefer [XIA] because you find it hard to remember what the subject, predicate and object are, and because [XIA] are positional rather than verbal.

The [XIA] methods being positional isn't actually the problem I have with them, I think. The problem is that when I look at ::

    findN_XA1(node);

then I cannot see immediately what the function does, but have to go and figure out "XA1". The alternative I propose is actually positional::

    find(_, _, node);

I think it's easier to read because I can immediately see that ``node`` is the object (or third-in-triple, if you prefer ;-)) and the other two can be anything.
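
A minimal sketch of how such a positional pattern could be matched, assuming ``null`` means "anything" as in the ``_`` convention. Current Java versions reserve ``_`` as a keyword, so the sketch uses a hypothetical ``ANY`` constant in its place; ``PatternGraph`` and the list return type are also stand-ins (the proposal returns a ``TripleIter``).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-backed graph, for illustration only.
class PatternGraph {
    // Wildcard marker; plays the role of `_` in the PEG.
    static final Object ANY = null;

    private final List<Object[]> triples = new ArrayList<>();

    void add(Object s, Object p, Object o) {
        triples.add(new Object[] {s, p, o});
    }

    // A null pattern component matches every value.
    private static boolean matches(Object pattern, Object value) {
        return pattern == ANY || pattern.equals(value);
    }

    // Each argument position corresponds to the same position in the triple.
    List<Object[]> find(Object s, Object p, Object o) {
        List<Object[]> result = new ArrayList<>();
        for (Object[] t : triples) {
            if (matches(s, t[0]) && matches(p, t[1]) && matches(o, t[2])) {
                result.add(t);
            }
        }
        return result;
    }
}
```

A call such as ``find(ANY, ANY, node)`` then reads the same way the triple is laid out: two open slots followed by the fixed object.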

Now, there are two things that introduce "verbality," i.e. referring to a component of a triple by name. These are a) the ``getXXX()`` family of methods::

    getSubject(), getObject()
    getSubjects(), getPredicates(), getObjects()
    setSubject(), setObject()

and secondly, the fields in ``TripleIter``::

    iter.subj
    iter.pred
    iter.obj

Now, as I said above, we can eliminate ``getSubject()`` and ``setSubject()``, and use only ``getObject()`` and ``setObject()``, or even simply ``get()`` and ``set()`` (requiring you only to remember that these are the functions you mostly need ;-)).

The set-returning functions, ``getSubjects()`` and so on, are not used as often. I think (hope) it may be ok with you to have to figure these out in the cases where they are used.

For the second, I prefer ``iter.subj`` etc., but we *could* use ::

    iter.xoo
    iter.oxo
    iter.oox

which would use the position.

For example, my types example would become::

    for(TripleIter i = graph.find(_, RDF.type, _); i.loop();) {
        System.out.println(i.xoo + " is instance of " + i.oox);
    }

What I do *not* want is having *both* ``oxo`` and ``pred``, because that would mean users have to understand both in order to be able to read our code.

What do people think of this compromise?

- Benja




