guile-user
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Custom streams 10x faster than srfi-41


From: Amirouche Boubekki
Subject: Re: Custom streams 10x faster than srfi-41
Date: Mon, 05 Sep 2016 22:48:59 +0200
User-agent: Roundcube Webmail/1.1.2

On 2016-09-05 21:14, Jeremy Steward wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hey Amirouche,

| (define (lazy-iota count) (let loop ((i 0)) (lambda () (unless (eq?
| i count) (cons i (loop (1+ i)))))))
|
| (define iota3 (lazy-iota 3))
|
| (pk (iota3))
|
| The first time `lazy-iota' is called it returns a procedure that is
| bound to `count'. Then you can call that procedure to get a pair
| with the first value and a procedure which will allow to retrieve
| the rest.

This seems very similar to SRFI 121 (generators) and SRFI 127 (Lazy
Sequences).

Thanks for the pointer.

Is there something particularly different about this implementation?

Values generated by traversi-map are chained which means what actually
travels in the stream is the history list of values with the first
value being the last created value. It allows to backtrack to previous
values. (This is not the case of lazy-iota)

I can't find it anymore in Tinkerpop documentation [0] so maybe it's
not a useful feature.

[0] http://tinkerpop.apache.org/docs/current/reference/#traversal

I'll explain how I use it nonetheless and the reasons I used to do so
in the context of graph traversal.

Usually what happens in graph traversal is that you start with one or
several vertex do some traversal/navigation or other map operation and
filter based on some heuristic and want to find out the original
vertex that lead to the current result you have, that's where
"backtracking" happens. The thing is that you could do the traversal
in reverse order to retrieve the original vertex that's why it might
not be as useful as I was thinking.

The thing is that my graph database library never fully load vertices
and edges into RAM when it traverse a graph. The current value in the
traversi is never more than a proper list. So it can be a string, an
integer or a list of.

For instance if one can use traversi to implement item-item
recommendations.

Draw a graph like the following:

u1 address@hidden> m1

u2 address@hidden> m1
u2 address@hidden> m2
u2 address@hidden> m3

u3 address@hidden> m1
u3 address@hidden> m4

u4 address@hidden> m1
u4 address@hidden> m5

What does the item-item recommendation do, is search for user that
rated high some movie and look what other movie they rated high.

For instance, we can do the following traversal starting with movie
m1:

m1 <address@hidden u1,u2,u3 address@hidden> m1,m2,m4

I won't explain how everything happens it's better explained
elsewhere [1] (in french here [2])

[1] https://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/
[2] http://hyperdev.fr/notes/recommendations-basees-sur-les-graphes.html

Basically the traversal starts at m1, ask for incomings edges that
are likes which have a score of 1. If edges were fully loaded into
memory I could simply use a filter proc like

(lambda (edge)
  (and (eqv? (assoc-ref edge 'label) 'likes)
       (eq? (assoc-ref edge 'score) 1)))

But instead I have a procedure which fetch a single field.

So the flow of the traversi program is:

- m1
- fetch incomings edges [*]
- fetch 'label for current value
- filter: if value == 'likes
- backtrack to [*]
- fetch 'score for current value
- filter: if value == 1
- backtrack to [*]

Like I said it's an optimisation trick, maybe it's not worthwhile,
maybe it's evil.

I've seen similar performance improvements in CHICKEN Scheme using
SRFI 121 and 127 over SRFI 41 (streams).

Cheers,
- -- Jeremy Steward



Thanks for giving me the opportunity to re-think it.

--
Amirouche ~ amz3 ~ http://www.hyperdev.fr




reply via email to

[Prev in Thread] Current Thread [Next in Thread]