Re: bug#23750: 25.0.95; bug in url-retrieve or json.el

Eli Zaretskii <address@hidden> wrote on Wed, 28 Dec 2016 at 19:28:

> From: Philipp Stephani <address@hidden>
> Date: Wed, 28 Dec 2016 18:09:52 +0000
> Cc: address@hidden, address@hidden, address@hidden,
> address@hidden
>
> Eli Zaretskii <address@hidden> wrote on Wed, 30 Nov 2016 at 19:45:
>
> > From: Philipp Stephani <address@hidden>
> > Date: Wed, 30 Nov 2016 18:23:14 +0000
> > Cc: address@hidden, address@hidden, address@hidden
> >
> > > Yes, this is not a json.el problem at all. It does the correct thing,
> > > and shouldn't be changed.
> >
> > ??? Why should any code care whether a pure-ASCII string is marked as
> > unibyte or as multibyte? Both are "correct".
> >
> > I guess the problem is that process-send-string cares. If it didn't, we wouldn't have the problem.
>
> I don't think I follow. The error we are talking about is signaled
> from url-http-create-request, not from process-send-string.
>
> Yes, but url-http-create-request only cares about unibyte strings because the request it creates is passed to
> process-send-string, which special-cases unibyte strings.

How do you see that process-send-string special-cases unibyte strings?

The send_process function has two branches, one for unibyte, one for multibyte.
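To illustrate the distinction being discussed, here is a minimal sketch showing that the same pure-ASCII text can be carried by either representation. It only uses standard functions (`string-to-unibyte', `string-to-multibyte', `multibyte-string-p'); it does not reproduce what send_process does internally.

```emacs-lisp
;; Same ASCII text, two representations.
(let* ((uni   (string-to-unibyte "abc"))    ; unibyte string
       (multi (string-to-multibyte "abc"))) ; same characters, multibyte
  (list (multibyte-string-p uni)    ; representation flag of UNI
        (multibyte-string-p multi)  ; representation flag of MULTI
        (equal uni multi)           ; contents compared character by character
        (= (length uni) (length multi))))
```

Code that branches on `multibyte-string-p' will therefore treat these two strings differently even though their contents match.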

> > For URL, we'd need functions like
> > (byte-array-length s) = (length (string-to-unibyte s))
>
> Why do you need this? string-to-unibyte is well-defined only for
> unibyte or ASCII strings (if we forget the raw bytes for a moment), so
> length will do.
>
> We need it because we have to send the byte length in a header. We can't just use (length s) because it
> would silently give a wrong result.

We are miscommunicating. string-to-unibyte can only meaningfully be
called on a pure-ASCII string, and for pure-ASCII strings 'length'
will count bytes. So I see no need for 'byte-array-length' if its
implementation is as you indicated.

That depends on how you want to represent byte arrays/octet streams in
Emacs. If you want to represent them using unibyte strings, then you
indeed only need `length'. But some earlier messages sounded like you
wanted to represent byte arrays either using unibyte strings or
byte-only multibyte strings. In that case `string-to-unibyte' is
necessary.
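As a rough sketch of the helper proposed earlier in the thread (the name `my-byte-array-length' is hypothetical, not an existing Emacs function), assuming byte arrays may arrive either as unibyte strings or as multibyte strings containing only ASCII and raw bytes:

```emacs-lisp
(defun my-byte-array-length (s)
  "Return the number of bytes in S, treating S as a byte array.
Hypothetical helper.  S may be a unibyte string or a multibyte
string containing only ASCII characters and raw bytes.  For any
other multibyte string, `string-to-unibyte' signals an error
instead of silently returning a character count."
  (length (string-to-unibyte s)))

;; A unibyte string and a byte-only multibyte string both work.
(my-byte-array-length "abc")
(my-byte-array-length (string-to-multibyte "\300\301"))

;; A string with a real non-ASCII character is rejected.
(condition-case nil
    (my-byte-array-length "\u20ac")
  (error 'not-a-byte-array))
```

The point of going through `string-to-unibyte' here is the error: a plain `length' would return a number for any string, whether or not it is a byte array.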

> > (process-send-bytes s) = (process-send-string (string-to-unibyte s))
>
> Why is this needed? process-send-string already encodes its argument,
> which produces a unibyte string.
>
> We can't give a multibyte string to process-send-string, because we have to pass the length in bytes in a
> header first. Therefore we have to encode any string before passing it to process-send-string.

Once you encoded the string, why do you need anything except calling
process-send-string?

The byte size should be added as a Content-Length HTTP header. If
url-request-data is a unibyte string, that's not a problem (except for
the newline conversion behavior in send_string): you can just use
`length'. But if it's a multibyte string, you need to encode it first
to find the byte length.
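A minimal sketch of that computation, assuming a multibyte body is meant to go out as UTF-8 (the helper name `my-content-length' and the choice of UTF-8 are assumptions for illustration, not what url-http actually does):

```emacs-lisp
(defun my-content-length (data)
  "Return the number of bytes DATA would occupy on the wire.
Hypothetical helper.  Unibyte DATA is taken as already encoded;
multibyte DATA is assumed to be sent as UTF-8."
  (if (multibyte-string-p data)
      ;; `encode-coding-string' returns a unibyte string, so
      ;; `length' on the result counts bytes.
      (length (encode-coding-string data 'utf-8))
    (length data)))

;; For a body with a non-ASCII character, the character count and
;; the byte count differ.
(let ((body "caf\u00e9"))
  (list (length body) (my-content-length body)))
```

For this body, `length' counts 4 characters while the UTF-8 encoding is 5 bytes long, which is the mismatch a Content-Length header has to avoid.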

From:	Philipp Stephani
Subject:	Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
Date:	Wed, 28 Dec 2016 18:35:58 +0000