guile-user

Re: Converting a part of byte vector to UTF-8 string


From: Mark H Weaver
Subject: Re: Converting a part of byte vector to UTF-8 string
Date: Wed, 15 Jan 2014 13:29:55 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Panicz Maciej Godek <address@hidden> writes:

> Your solution seems reasonable, but I have found another way, which
> led me to some new problems.
> I realised that since sockets are ports in guile, I could process them
> with the plain "read" (which is what I have been using them for
> anyway).
>
> However, this approach caused some new problems. The thing is that if
> I'm trying to read some message from port, and that message does not
> end with a delimiter (like a whitespace or a balancing, closing
> parenthesis), then the read would wait forever, possibly gluing its
> arguments.
>
> The solution I came up with is through soft ports. The idea is to have
> a port proxy, that -- if it would block -- would return an eof-object
> instead.

This is terribly inefficient, and also not robust.  Guile's native soft
ports do not support efficient reading, because everything is one
character at a time.  Also, Guile's 'char-ready?' currently does the job
of 'u8-ready?', i.e. it only checks if a _byte_ is available, not a
whole character, so the 'read-char' might still block.  Anyway, if this
is a socket, what if the data isn't available simply because of network
latency?  Then you'll generate a spurious EOF.
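
To illustrate the byte-versus-character distinction above, here is a
minimal sketch, assuming a Guile with the (rnrs bytevectors) module
available.  The name 'one-char' is purely illustrative.  It shows that a
single non-ASCII character can occupy more than one byte in UTF-8, which
is why a check that only one byte is ready does not guarantee that a
whole character can be read without blocking:

```scheme
(use-modules (rnrs bytevectors))

;; U+00E9 (e with acute accent), written with Guile's two-digit
;; hexadecimal string escape so the example does not depend on the
;; encoding of the source file.
(define one-char "\xe9")

(string-length one-char)                     ; 1 character
(bytevector-length (string->utf8 one-char))  ; 2 bytes in UTF-8
```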


To offer my own answer to your original question: R7RS-small provides an
API that does precisely what you asked for.  Its 'utf8->string'
procedure accepts optional 'start' and 'end' byte positions.  I
implemented this on the 'r7rs-wip' branch of Guile git as follows:

http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=module/scheme/base.scm;h=f110d4c2b241ec0941b4223cece05c309db5308a;hb=r7rs-wip#l327

  (import (rename (rnrs bytevectors)
                  (utf8->string      r6rs-utf8->string)
                  (string->utf8      r6rs-string->utf8)
                  (bytevector-copy   r6rs-bytevector-copy)
                  (bytevector-copy!  r6rs-bytevector-copy!)))

  [...]

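  ;; R7RS 'bytevector-copy': like the R6RS procedure, but with optional
  ;; 'start' (inclusive) and 'end' (exclusive) byte positions.
  (define bytevector-copy
    (case-lambda
      ((bv)
       (r6rs-bytevector-copy bv))
      ((bv start)
       ;; Copy from 'start' through the end of 'bv'.
       (let* ((len (- (bytevector-length bv) start))
              (result (make-bytevector len)))
         (r6rs-bytevector-copy! bv start result 0 len)
         result))
      ((bv start end)
       ;; Copy the bytes in the range [start, end).
       (let* ((len (- end start))
              (result (make-bytevector len)))
         (r6rs-bytevector-copy! bv start result 0 len)
         result))))

  ;; R7RS 'utf8->string': decode all of 'bv', or only the requested
  ;; byte range, by copying that range and decoding the copy.
  (define utf8->string
    (case-lambda
      ((bv) (r6rs-utf8->string bv))
      ((bv start)
       (r6rs-utf8->string (bytevector-copy bv start)))
      ((bv start end)
       (r6rs-utf8->string (bytevector-copy bv start end)))))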
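
As a usage sketch of the same idea on a Guile without the 'r7rs-wip'
branch, the three-argument case can be written as a standalone helper
on top of (rnrs bytevectors).  The names 'utf8-slice->string' and 'bv'
are hypothetical and chosen only for this illustration.  Note that
'start' and 'end' are byte positions, not character positions, so they
should fall on character boundaries:

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: decode the bytes of BV in the range
;; [START, END) as UTF-8 by copying them into a fresh bytevector.
(define (utf8-slice->string bv start end)
  (let* ((len (- end start))
         (slice (make-bytevector len)))
    (bytevector-copy! bv start slice 0 len)
    (utf8->string slice)))

;; "héllo wörld", written with two-digit hex escapes.  The accented
;; letters take two bytes each, so the bytevector has 13 bytes for
;; 11 characters.
(define bv (string->utf8 "h\xe9llo w\xf6rld"))

(utf8-slice->string bv 0 6)   ; the first word  (bytes 0..5)
(utf8-slice->string bv 7 13)  ; the second word (bytes 7..12)
```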


