Re: raw-byte and char-table
From: MON KEY
Subject: Re: raw-byte and char-table
Date: Thu, 26 Aug 2010 01:30:11 -0400
On Wed, Aug 25, 2010 at 11:34 PM, Kenichi Handa <address@hidden> wrote:
> In article <address@hidden>, MON KEY <address@hidden> writes:
>
>> > Number like #x3FFFA0 is so cryptic. The function name
>> > unibyte-char-to-multibyte is also not ideal, but I think
>> > it's better than #x3FFFA0.
>
>> Maybe I am misunderstanding, but I think the `#x' and `#o' syntax is
>> not cryptic at all in the context.
>
> I'm not arguing that the syntax is cryptic. What I want to
> say is that it is difficult for one who reads the code to
> understand what #x3FFFA0 means.
So the syntax isn't the problem; it's the semantic denotation.
This is the realm of Tarski and McDermott[1].
Regardless, right now it is all confusing (esp. for those of us less
inclined to keep the multibyte/unibyte distinction straight).
>
>> This signals an error:
>> (unibyte-char-to-multibyte
>> (unibyte-char-to-multibyte 160))
>
> Yes, but is it a problem?
I would urge that it is a problem wherever the numerical denotation
has no visible/nameable/printable counterpart.
Why should it be allowed to be a problem if it can be avoided?
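To make the complaint concrete, here is a minimal sketch of the round trip and of the double-conversion error quoted above. It assumes an Emacs where raw bytes 0x80..0xFF map to 0x3FFF80..0x3FFFFF, as this thread describes; `my-raw-byte-char' is a hypothetical helper written for illustration, not an existing Emacs function.

```elisp
;; Hypothetical helper (not part of Emacs): tolerate a value that
;; has already been converted instead of signalling an error.
(defun my-raw-byte-char (n)
  "Return the multibyte form of N.
N below 256 is converted with `unibyte-char-to-multibyte';
any larger value is assumed to be converted already and is
returned unchanged."
  (if (< n 256)
      (unibyte-char-to-multibyte n)
    n))

(unibyte-char-to-multibyte 160)           ; => 4194208, i.e. #x3FFFA0
(multibyte-char-to-unibyte #x3FFFA0)      ; => 160

;; The nested call fails because the inner result is >= 256.
(condition-case err
    (unibyte-char-to-multibyte (unibyte-char-to-multibyte 160))
  (error (car err)))                      ; => error

(my-raw-byte-char (my-raw-byte-char 160)) ; => 4194208
```

Nothing in the bare number 4194208 tells a reader that it stands for the byte 0xA0, which is the point being argued here.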
>
>> > We could provide a ?\NNN (or similar) notation for it. Similarly to
>> > what we do for those bytes in multibyte strings.
>
>> Howsabout just this one for all of them:
>
>> `#\'
>
> Do you mean that making #\240 to be read as #x3FFFA0?
Half-jokingly, yes.
(assuming the #\240 above is the code-point 0xA0)
Though I _also_ had these things in mind:
#\8-bit-240
or
#\byte-240
Which would allow referencing these chars by something other than a
numeric id.
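As a rough illustration of what such names could denote, the sketch below resolves a name string to the raw-byte character at run time. No such reader syntax exists; `my-byte-name-to-char' is a hypothetical function, and reading the digits as octal (so that 240 means 0xA0) is an assumption carried over from the #\240 example.

```elisp
;; Hypothetical sketch: resolve "byte-NNN" or "8-bit-NNN" to a
;; raw-byte character.  NNN is read as octal (an assumption).
(defun my-byte-name-to-char (name)
  "Return the raw-byte character named by NAME, e.g. \"byte-240\"."
  (if (string-match "\\`\\(?:8-bit\\|byte\\)-\\([0-7]+\\)\\'" name)
      (let ((byte (string-to-number (match-string 1 name) 8)))
        (if (and (>= byte #x80) (<= byte #xFF))
            (unibyte-char-to-multibyte byte)
          (error "Not a raw 8-bit byte: %s" name)))
    (error "Unrecognized byte name: %s" name)))

(my-byte-name-to-char "byte-240")   ; => 4194208, i.e. #x3FFFA0
(my-byte-name-to-char "8-bit-240")  ; => 4194208
```

A real read syntax would do this in the reader rather than in a function call, but the name-to-number mapping would be the same.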
E.g. in some other dialects of Lisp there is this type of behaviour:
CL-USER> #\ ;<-that's a #x9 after the \
;=> #\Tab
CL-USER> #\ ;<- that's a #xa after the \
;=>
; #\Newline
CL-USER> #\NO-BREAK_SPACE ;<-that's the char-name for #xa0
;=> #\NO-BREAK_SPACE ;<-return is as per `identity'
CL-USER> (identity #\NO-BREAK_SPACE)
;=> #\NO-BREAK_SPACE
CL-USER> (princ #\ )
;=>
; #\NO-BREAK_SPACE
CL-USER> (prin1 #\ )
;=> #\NO-BREAK_SPACE
; #\NO-BREAK_SPACE
CL-USER> #\ ;<- That's a #x20 after the \
;=> #\
CL-USER> (char-code #\ )
;=> 32
CL-USER> (describe #\ )
;=> #\
; [standard-char]
;
; :_Char-code: 32
; :_Char-name: Space
; _
The idea being that the chars in the above example don't have visibly
"printable" representations, yet the `#\' reader syntax _does_
recognize them, either by char-name or by a readable identity, e.g.:
CL-USER> (read-char)
;=> #\Ack
Of course, introducing this type of read syntax to Emacs Lisp
would (or at least should) imply extending it to all characters,
unibyte and multibyte...
Hence the ":)" smiley in my previous response to Stefan.
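For comparison, Emacs Lisp can look characters up by Unicode name as a function call, though not as `#\' read syntax. The sketch below assumes a later Emacs release than the one under discussion: `char-from-name' arrived in Emacs 26 as far as I know, so treat its availability as an assumption about your version.

```elisp
;; Name -> code point (assumes Emacs 26 or later).
(char-from-name "NO-BREAK SPACE")      ; => 160

;; Code point -> name, via the Unicode character property tables.
(get-char-code-property #xA0 'name)    ; => "NO-BREAK SPACE"

;; A raw byte such as #x3FFFA0 is not a Unicode character, so it
;; has no standard name to look up in either direction.
```

That gap for raw bytes is what a name like #\byte-240 would fill.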
[1] McDermott, Drew (1978). Tarskian semantics, or no notation without
denotation. Cognitive Science 2:277-82.
--
/s_P\