Re: Reimplemented URLEncoder/Decoder

From: Mark Wielaard
Subject: Re: Reimplemented URLEncoder/Decoder
Date: Mon, 8 Oct 2001 22:54:06 +0200
User-agent: Mutt/1.3.22i


On Sun, Oct 07, 2001 at 05:57:25PM -0600, Tom Tromey wrote:
> >>>>> "Mark" == Mark Wielaard <address@hidden> writes:
> Mark>   * where XX is the hexadecimal representation of that character.  Note
> Mark>   * that since unicode characters are 16 bits, and this method encodes only
> Mark>   * 8 bits of information, the lower 8 bits of the character are used.
> This part of the comment is now wrong.
Yes. I forgot to update the class javadoc. Fixed.

> Mark>             try
> Mark>               {
> Mark>                 bytes[index] = (byte)Integer.parseInt(sub, 16);
> Mark>                 index++;
> Mark>               }
> Mark>             catch (NumberFormatException nfe)
> Mark>               {
> Mark>                 // Ignore badly encoded char
> Mark>               }
> The 1.4 docs say that the implementation can either leave illegal
> characters alone or it can throw IllegalArgumentException, but this
> piece of code seems to do neither.  Maybe I'm misunderstanding the
> docs?  Or maybe we should throw IllegalArgumentException here?
I thought that "leave alone" might also mean "skip".
But you do have a point: we already allow 'unsafe' characters into the
decoded result string, so it makes more sense to also add malformed hex
encodings to the result rather than silently skip them.

I have moved the try/catch block so that it wraps the complete while
statement, which means a NumberFormatException now ends the while loop.
Then, after I have decoded all the chars, I explicitly check whether there
is still a % left; if so, the % char is added to the result and decoding
resumes at the char just after that % char.

The javadoc for decode now has the following paragraph added to explain this:

  * This implementation will decode the string even if it contains
  * unsafe characters (characters that should have been encoded) or if the
  * two characters following a % do not represent a hex encoded byte.
  * In those cases the unsafe character or the % character will be added
  * verbatim to the decoded result.
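
For illustration, here is a minimal sketch of the behaviour that paragraph
describes. It is not the actual patch: the class name LenientUrlDecoder is
made up, UTF-8 is assumed for the bytes behind %XX sequences, and the hex
digits are checked by hand instead of with Integer.parseInt, because
parseInt also accepts a leading sign, so a sequence like "%+1" would parse
as a valid byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; a sketch, not the Classpath implementation.
final class LenientUrlDecoder {

    // Returns the value of an ASCII hex digit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Decodes s; UTF-8 is an assumption made for this sketch.
    static String decode(String s) {
        int length = s.length();
        StringBuilder result = new StringBuilder(length);
        int i = 0;
        while (i < length) {
            char c = s.charAt(i);
            if (c == '+') {
                result.append(' ');
                i++;
            } else if (c != '%') {
                // Unsafe characters are copied to the result verbatim.
                result.append(c);
                i++;
            } else {
                // Collect a run of consecutive well-formed %XX sequences.
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                while (i + 2 < length && s.charAt(i) == '%') {
                    int hi = hexValue(s.charAt(i + 1));
                    int lo = hexValue(s.charAt(i + 2));
                    if (hi < 0 || lo < 0) {
                        break; // malformed: stop collecting bytes
                    }
                    bytes.write((hi << 4) | lo);
                    i += 3;
                }
                result.append(new String(bytes.toByteArray(), StandardCharsets.UTF_8));
                // A % still at position i was malformed or truncated:
                // keep it verbatim and resume right after it.
                if (i < length && s.charAt(i) == '%') {
                    result.append('%');
                    i++;
                }
            }
        }
        return result.toString();
    }
}
```

With this sketch, decode("100%") gives back "100%" and decode("%41%4")
gives "A%4": the well-formed prefix is decoded and the stray % is kept.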

Thanks for your review.


Stuff to read:
  What's Wrong with Copy Protection, by John Gilmore
