guile-user

Re: null terminated strings


From: Ken Anderson
Subject: Re: null terminated strings
Date: Mon, 19 Jan 2004 14:16:44 -0500

At 10:46 AM 1/19/2004 -0800, Per Bothner wrote:
>Ken Anderson wrote:
>
>>    In Java, which does copy-on-write
>
>Strings (including substrings) are immutable, so they cannot be written.
>The implementation of the StringBuffer class does do copy-on-write, but
>that doesn't affect substrings.
>
>>I often find myself carefully copying the substrings so they don't share
>>structure.
>
>Why?  The only reason I can think of is garbage collection:  A shared
>substring prevents the base from being collected.

Yes.  Say you do something like (this is JScheme):
> (define text "foo bar")
"foo bar"
> (define r (StringReader. text))
address@hidden
> (define b (BufferedReader. r))
address@hidden
> (define line (.readLine b))
"foo bar"
> (define a (.substring line 0 3))
"foo"
> (define b (.substring line 4))
"bar"
> (describe a)
foo
 is an instance of java.lang.String

  // from java.lang.String
  value: address@hidden
  offset: 0
  count: 3
  hash: 0
()
> (describe b)
bar
 is an instance of java.lang.String

  // from java.lang.String
  value: address@hidden
  offset: 4
  count: 3
  hash: 0
()
> (vector-length (.value$# a))
80

a and b share the same 80-element char[], even though together they use only
6 of those chars, which wastes a lot of space in this case. (80 is the default
size of the string buffer that BufferedReader's readLine allocates.)
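
The careful copying I mentioned looks roughly like the sketch below, written in plain Java rather than JScheme. It assumes a JDK whose substring shares the parent's char[], as in the session above. On such a JDK the String(String) constructor, as far as I know, copies only the characters in use when the backing array is larger than the string. The class name SubstringCopy and the helper detach are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class SubstringCopy {
    // Hypothetical helper: returns a new String equal to s.
    // Assumption: on a JDK where substrings share the parent's char[],
    // this constructor gives the copy a trimmed array of its own, so the
    // copy no longer keeps the larger parent array reachable.
    static String detach(String s) {
        return new String(s);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("foo bar"));
        String line = in.readLine();

        // Copy each piece instead of keeping the raw substrings around.
        String a = detach(line.substring(0, 3));
        String b = detach(line.substring(4));

        System.out.println(a);
        System.out.println(b);
    }
}
```

The cost is one extra allocation per substring, which is worth paying only when the pieces outlive the line they were cut from.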


>>This is because of things like:
>>- I don't know how long the underlying string (char array, actually) is.
>
>So?

So you don't know how much space your line is taking up.

>>Java only has one kind of string, which is fairly heavy weight.  For example, 
>>the string "" takes 36 bytes:
>>
>>>(describe "")
>> is an instance of java.lang.String
>>  // from java.lang.String
>>  value: address@hidden
>>  offset: 0
>>  count: 0
>>  hash: 0
>
>This depends on the implementation, and the version of the
>implementation.
>
>GCJ uses for "":
>  object header (4 bytes on 32-bit systems)
>  private Object data; /* points to itself in this case */
>  private int boffset; /* offset of first char within data */
>  int count; /* number of characters */
>  private int cachedHashCode;
>  /* chars follow if data==this */
>(The data and boffset fields are only accessed by native C++ code.)
>
>Total 20 bytes.

Much better. 




