emacs-devel
Re: raw-byte and char-table


From: MON KEY
Subject: Re: raw-byte and char-table
Date: Thu, 26 Aug 2010 01:30:11 -0400

On Wed, Aug 25, 2010 at 11:34 PM, Kenichi Handa <address@hidden> wrote:
> In article <address@hidden>, MON KEY <address@hidden> writes:
>
>> > Number like #x3FFFA0 is so criptic.  The function name
>> > unibyte-char-to-multibyte is also not ideal, but I think
>> > it's better than #x3FFFA0.
>
>> Maybe I am misunderstanding, but I think the `#x' and `#o' syntax is
>> not cryptic at all in the context.
>
> I'm not arguing that the syntax is cryptic.  What I want to
> say is that it is difficult for one who reads the code to
> understand what #x3FFFA0 means.

So the syntax isn't the problem; it's the semantic denotation.
This is the realm of Tarski and McDermott[1].

Regardless, right now it is all confusing (esp. for those of us less
inclined to differentiate between multibyte and unibyte).
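
For readers in that camp, here is a minimal sketch of what #x3FFFA0
denotes, assuming the current representation in which raw bytes
#x80..#xFF occupy the character codes #x3FFF80..#x3FFFFF. The helper
`my-raw-byte-char' is a hypothetical name used only for illustration;
it is not part of Emacs.

```elisp
;; Hypothetical helper, for illustration only: give the raw-byte
;; conversion a readable name instead of writing #x3FFFA0 inline.
(defun my-raw-byte-char (byte)
  "Return the multibyte character standing for raw BYTE (#x80..#xFF)."
  (unless (and (integerp byte) (<= #x80 byte) (<= byte #xFF))
    (error "Not a raw 8-bit byte: %S" byte))
  ;; Assumption: raw bytes are offset by #x3FFF00 in multibyte text.
  (+ byte #x3FFF00))

(my-raw-byte-char #xA0)               ; => 4194208, i.e. #x3FFFA0
(multibyte-char-to-unibyte #x3FFFA0)  ; => 160, back to the byte
```

The point is only that the number is byte + offset; whether a name or a
read syntax should hide that arithmetic is the open question.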

>
>> This signals an error:
>>  (unibyte-char-to-multibyte
>>   (unibyte-char-to-multibyte 160))
>
> Yes, but is it a problem?

I would urge that it is a problem wherever the numerical denotation
has no visible/nameable/printable counterpart.

Why should it be allowed to be a problem if it can be avoided?
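
To make the failure concrete, a small sketch that traps the error
rather than letting it propagate. It assumes only what is stated above:
that the nested call signals. The wrapper `my-double-convert' is a
hypothetical name.

```elisp
;; Hypothetical wrapper, for illustration only.
(defun my-double-convert (byte)
  "Apply `unibyte-char-to-multibyte' twice to BYTE, trapping any error.
Return the converted character, or the symbol `signaled' on error."
  (condition-case nil
      (unibyte-char-to-multibyte
       ;; The inner call already yields a multibyte character, which is
       ;; no longer a valid unibyte argument for the outer call.
       (unibyte-char-to-multibyte byte))
    (error 'signaled)))

(my-double-convert 160)
```

For ASCII input the conversion is the identity, so the double
application is harmless there; it is only the 8-bit range that trips.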

>
>> > We could provide a ?\NNN (or similar) notation for it.  Similarly to
>> > what we do for those bytes in multibyte strings.
>
>> Howsabout just this one for all of them:
>
>>  `#\'
>
> Do you mean that making #\240 to be read as #x3FFFA0?
>

Half-jokingly, Yes.

(assuming the #\240 above is the code point 0xA0)

Though I _also_ had these things in mind:

#\8-bit-240

or

#\byte-240

Which would allow referencing these chars by something other than a
numeric id.
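
As a rough sketch of how such names could be resolved, here is a
function that maps a name to the raw-byte character. Everything in it
is an assumption for illustration: the function name
`my-byte-name-to-char' is hypothetical, the digits are taken to be
octal (so that 240 is 0xA0, as above), and the #x3FFF00 offset is the
current raw-byte representation. No such reader syntax exists in Emacs
Lisp.

```elisp
;; Hypothetical resolver for names like "byte-240" or "8-bit-240".
(defun my-byte-name-to-char (name)
  "Map NAME such as \"byte-240\" (octal digits) to a raw-byte character."
  (if (string-match "\\`\\(?:byte\\|8-bit\\)-\\([0-7]+\\)\\'" name)
      (let ((byte (string-to-number (match-string 1 name) 8)))
        (if (and (<= #x80 byte) (<= byte #xFF))
            ;; Assumption: raw bytes are offset by #x3FFF00.
            (+ byte #x3FFF00)
          (error "Not a raw 8-bit byte: %s" name)))
    (error "Unrecognized byte name: %s" name)))

(my-byte-name-to-char "byte-240")   ; => 4194208, i.e. #x3FFFA0
(my-byte-name-to-char "8-bit-240")  ; => 4194208, i.e. #x3FFFA0
```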

E.g. in some other dialects of Lisp there is this type of behaviour:

CL-USER> #\     ;<-that's a #x9 after the \
;=> #\Tab

CL-USER> #\ ;<- that's a #xa after the \
;=>
;  #\Newline

CL-USER> #\NO-BREAK_SPACE ;<-that's the char-name for #xa0
;=> #\NO-BREAK_SPACE      ;<-return is as per `identity'

CL-USER> (identity #\NO-BREAK_SPACE)
;=> #\NO-BREAK_SPACE

CL-USER> (princ #\ )
;=>
;  #\NO-BREAK_SPACE

CL-USER> (prin1 #\ )
;=> #\NO-BREAK_SPACE
;   #\NO-BREAK_SPACE

CL-USER> #\ ;<- That's a #x20 after the \
;=> #\

CL-USER> (char-code #\ )
32

CL-USER> (describe #\ )
;=> #\
;  [standard-char]
;
;  :_Char-code: 32
;  :_Char-name: Space
;  _

The idea being that the chars in the above examples have no visibly
"printable" representation, yet the `#\' reader syntax _does_
recognize them, either by char-name or by a readable identity, e.g.:

CL-USER> (read-char)

;=> #\Ack

Of course, introducing this type of read syntax to Emacs Lisp
would (or at least should) imply extending it to all characters,
unibyte and multibyte...

Hence the ":)" smiley in my previous response to Stefan.


[1] McDermott, Drew (1978). Tarskian semantics, or no notation without
    denotation. Cognitive Science 2:277-82.

--
/s_P\


