On 11 Mar 2013, at 08:39, Luboš Doležel <address@hidden> wrote:
> Sure. This is what CFString's struct looks like now (I replaced
> internal types with what everyone understands):
>
> struct __CFString
> {
>     void* isa;          // for ObjC support
>     int16_t typeID;     // CF type ID - we need to make CFString's typeID
>                         // constant forever (at present it can change randomly)
>     struct
>     {
>         int16_t ro:1;       // always set to 1 for constant strings
>         int16_t reserved:7;
>         int16_t info:8;     // probably only "hasnull" (0x10) for
>                             // constant strings
>     };
>     void* data;
>     uint32_t length;
>     uintptr_t hashCode;
> };
>
> Is this layout something you'd accept as the next constant NSString?

I'd recommend that we have a couple of bits in the flags set for
encoding, to allow ASCII or UTF-8 (ASCII implies no multibyte
characters, so character-index lookups are easier). If we're going to
have to live with this for a long time, then it might be interesting
for variable-length encodings to have a map from character indexes to
byte indexes.
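
To make the index-map idea concrete, here is a rough sketch of one way it could work for UTF-8: record the byte offset of every Nth character, then scan forward from the nearest checkpoint. Everything here is an assumption for illustration (the stride, the function names, the separate map array), and it counts Unicode code points, whereas NSString indexes UTF-16 code units, so a real version would need to handle characters outside the BMP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stride: one checkpoint every 4 characters.
 * A real map would probably use a larger stride to save space. */
#define CHECKPOINT_STRIDE 4

/* A byte starts a UTF-8 sequence unless it has the form 10xxxxxx. */
static int is_utf8_lead(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Store the byte offset of character k * CHECKPOINT_STRIDE in map[k].
 * Returns the number of characters (code points) in the string. */
static size_t build_index_map(const char *utf8, size_t byte_len,
                              uint32_t *map, size_t map_cap)
{
    size_t chars = 0;
    for (size_t i = 0; i < byte_len; i++) {
        if (!is_utf8_lead((unsigned char)utf8[i]))
            continue;
        if (chars % CHECKPOINT_STRIDE == 0 &&
            chars / CHECKPOINT_STRIDE < map_cap)
            map[chars / CHECKPOINT_STRIDE] = (uint32_t)i;
        chars++;
    }
    return chars;
}

/* Byte offset of the character at char_index: jump to the nearest
 * checkpoint, then step over at most CHECKPOINT_STRIDE - 1 characters.
 * The caller must ensure char_index is within the string and that the
 * map holds the checkpoint for it. */
static size_t byte_offset_of(const char *utf8, size_t byte_len,
                             const uint32_t *map, size_t char_index)
{
    size_t i = map[char_index / CHECKPOINT_STRIDE];
    size_t remaining = char_index % CHECKPOINT_STRIDE;
    while (remaining > 0) {
        i++;
        while (i < byte_len && !is_utf8_lead((unsigned char)utf8[i]))
            i++;
        remaining--;
    }
    return i;
}
```

For a pure-ASCII string the map is unnecessary, since the character index equals the byte index, which is the reason for wanting an encoding bit in the flags.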
I'd rather not have a C bitfield in the definition, because its
layout is very ABI-specific: An int16_t with fixed bit definitions is
much easier to work with.
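
Roughly what I have in mind, as a sketch only: the bit positions, the macro names and the struct name below are all made up for illustration, and I've used uint16_t rather than int16_t to keep the masks free of sign surprises. Only the 0x10 "hasnull" value in the info byte comes from the layout quoted above.

```c
#include <stdint.h>

/* Hypothetical fixed bit assignments for a flat 16-bit flags word. */
#define CFSTR_FLAG_READONLY   ((uint16_t)0x0001)        /* constant string */
#define CFSTR_ENCODING_MASK   ((uint16_t)0x0006)        /* two encoding bits */
#define CFSTR_ENCODING_ASCII  ((uint16_t)0x0000)
#define CFSTR_ENCODING_UTF8   ((uint16_t)0x0002)
#define CFSTR_INFO_SHIFT      8                         /* info in high byte */
#define CFSTR_INFO_HASNULL    ((uint16_t)(0x10 << CFSTR_INFO_SHIFT))

/* Same fields as the quoted struct, with the bitfield replaced by one
 * integer whose layout does not depend on the compiler's bitfield ABI. */
struct const_string_sketch
{
    void *isa;
    int16_t typeID;
    uint16_t flags;
    const void *data;
    uint32_t length;
    uintptr_t hashCode;
};

static int sketch_is_readonly(const struct const_string_sketch *s)
{
    return (s->flags & CFSTR_FLAG_READONLY) != 0;
}

static uint16_t sketch_encoding(const struct const_string_sketch *s)
{
    return (uint16_t)(s->flags & CFSTR_ENCODING_MASK);
}
```

The compiler then only has to emit a plain 16-bit constant for each string, and both sides agree on what each bit means regardless of target.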
The other question is how we initialise the isa pointer. With recent
libobjc2 / clang, class pointers are exposed as public symbols, so we
could just make a weak reference to the class, but that would break if
GNUstep-base is compiled with gcc.
Alternatively, we could create the normal call to __objc_exec_class()
in the constructor, but make __objc_exec_class weak and only call it
if it is non-zero. This would impose a small startup cost on every
compilation unit containing CF strings.
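
The weak-call pattern would look something like the sketch below. To keep it self-contained I've used a stand-in symbol, hypothetical_exec_class, with a simplified void* argument rather than the real __objc_exec_class and its module structure; the constructor and the counter are likewise only illustrative. It relies on the GCC/clang weak and constructor attributes and on ELF-style linking, where an undefined weak function resolves to a null address.

```c
#include <stddef.h>

/* Stand-in for the runtime's registration entry point (hypothetical name
 * and signature). Being weak, it is null if no runtime defines it. */
extern void hypothetical_exec_class(void *module) __attribute__((weak));

/* Counts successful registrations, purely so the effect is observable. */
static int modules_registered = 0;

/* Call the runtime only if it is actually linked in. Returns 1 if the
 * module was handed to the runtime, 0 if the call was skipped. */
static int register_module_if_runtime_present(void *module)
{
    if (hypothetical_exec_class != NULL) {
        hypothetical_exec_class(module);
        modules_registered++;
        return 1;
    }
    return 0;
}

/* Placeholder for the per-compilation-unit module description. */
static char placeholder_module;

/* Runs before main(), once per compilation unit containing CF strings;
 * this is the small startup cost mentioned above. */
__attribute__((constructor))
static void load_constant_strings(void)
{
    register_module_if_runtime_present(&placeholder_module);
}
```

A plain C program using CF strings without an Objective-C runtime would then start up normally and simply skip the registration.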
Additionally, for short ASCII strings, I'd like to make clang emit
GSTinyString instances (strings hidden in pointers) on 64-bit
platforms. Do you think that this would be a problem for CF?
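
For anyone unfamiliar with the idea, here is a sketch of hiding a short ASCII string in a 64-bit pointer-sized word. The layout is an illustrative assumption (3 low tag bits, 5 length bits, then up to eight 7-bit characters) and the tag value and function names are invented; don't take it as the exact GSTinyString encoding.

```c
#include <stddef.h>
#include <stdint.h>

#define TINY_TAG       0x4u   /* hypothetical tag value in the low 3 bits */
#define TINY_TAG_MASK  0x7u
#define TINY_MAX_LEN   8      /* 8 chars * 7 bits = 56 bits of payload */

/* Pack up to 8 ASCII characters into one word. Returns 0 if the string
 * is too long or contains a non-ASCII byte, so the caller can fall back
 * to an ordinary constant string object. */
static uint64_t tiny_string_pack(const char *ascii, size_t len)
{
    if (len > TINY_MAX_LEN)
        return 0;
    uint64_t word = TINY_TAG | ((uint64_t)len << 3);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)ascii[i];
        if (c > 0x7F)
            return 0;
        word |= (uint64_t)c << (8 + 7 * i);
    }
    return word;
}

/* A word is a tiny string if its low bits carry the tag; real object
 * pointers are aligned, so their low 3 bits are zero. */
static int tiny_string_is_tiny(uint64_t word)
{
    return (word & TINY_TAG_MASK) == TINY_TAG;
}

static size_t tiny_string_length(uint64_t word)
{
    return (size_t)((word >> 3) & 0x1F);
}

static char tiny_string_char_at(uint64_t word, size_t i)
{
    return (char)((word >> (8 + 7 * i)) & 0x7F);
}
```

The relevant point for CF is that such a value has no isa, data or length fields to read: any code that dereferences a string pointer directly would have to check the tag bits first.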