bug-bash
Re: Storing NUL in variables


From: George
Subject: Re: Storing NUL in variables
Date: Sun, 11 Jun 2017 00:11:13 -0400

On Sat, 2017-06-10 at 19:33 +0300, Pierre Gaston wrote:
> > One option that might make a feature like this integrate into the shell 
> > better would be to store a captured byte stream as an integer array rather
> > than as an atomic variable. The back-end implementation in this case could 
> > be very efficient, and the stored data would be manipulable using
> > existing array syntax. The main limitation perhaps would be that one could 
> > not create an array of these arrays.
> > 
> Without too much thinking about it, I'd  propose something like this:
> 
> - extend readarray (or maybe provide another builtin)  to read bytes with an 
> interface like the one of dd (block size, offset, skip) and store the
> bytes in the array. eg:  readarray -b bs=1024 cs=100 byte_array 
> 
> - provide another builtin to write the array to a fd, eg "write bs=1024 
> cs=100 byte_array" (I don't really see a good way to extend printf or echo
> for this) 
> 
> -  setting an array directly would store the bytes, e.g. a[0]=0 would put a
> null byte at the first index
> 
> -  conversely ${a[0]} would expand to "0"
> 
> - I think the current array could even be used, but that would not be very 
> efficient, and there is the question of what to do with sparse arrays
> 
> - I think I would like some efficient way to copy range of bytes from one 
> array to the other, maybe this could be done reusing the above "write
> builtin" like: write offset=100 seek=50 -v dest_array -s source array 
> 
> 
> I think that could even be done with loadable builtins, making all the 
> byte arrays "special_variables"
> 
> My 2 cents....of course I probably should have checked what zsh does as I 
> think it supports null bytes in variables.

The talk of special variables brings me back to a previous feature request
of mine: the ability to associate a data payload with a dynamic variable,
and a callback to allow the dynamic variable implementation to clean up
that data when the variable is unset. (I am planning to write a patch to
provide this functionality - though I don't know if it will be accepted.)

The lack of that functionality stands in the way of implementing an
efficient byte stream buffer as a dynamic variable using a "loadable
builtin" module. "Dynamic var" implementations have callbacks that may be
used to get and set the variable's value, but nowhere to store state data
associated with the variable. This works fine for the dynamic vars that are
defined by the shell, because each variable has its own implementation, and
each implementation is used for just one variable. There's a 1:1
relationship between dynamic variables and dynamic variable
implementations, so dynamic variable implementations can store their state
data in global variables. Implementing something like a byte stream buffer
as a dynamic variable would require the ability to provide one backing
implementation that could be used for multiple variables. As it stands, the
only place you can store data in a variable is in its "value" field - which
must contain the textual representation of the variable's value.

(A compact byte-array type could be stored in a flat buffer, of course, so
a dynamic var implementation would require the "dynamic variable data
pointer", but not the "dynamic variable unset callback" if it's assumed
that unsetting a variable would also de-allocate the buffer indicated by
the data pointer. However, in more general applications of the "dynamic
variable" functionality it would be useful to be able to use other data
structures to store the state of the variable: things like linked data
structures. So for those applications an "unset callback" would be useful.
An "unset callback" would also allow things like using variable lifetimes
to control other resources, like file descriptors, temporary files, or
child processes.)

I had initially thought the implementation of a byte-stream array could be
similar to the existing implementation of an integer array. But (from what
I can tell) the implementation of an integer array is just the
implementation of a string array, but with rules added to constrain the
array elements to numeric strings (i.e. when assigning a value to such a
variable, first evaluate the new value as a numeric expression to get an
integer value for the variable, then convert the numeric value to a string
and store that. See make_variable_value() in variables.c, for instance).
So there would be no value in creating a separate "byte array" type unless
it either provided a significant performance advantage (by storing data
internally as a byte array, rather than a text-encoded byte array), or
significantly more favorable semantics (for instance, expressing the bytes
as hex rather than decimal).
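
The behaviour described above is visible from the shell side. What follows
is only a minimal sketch (the array name "nums" is made up for the
illustration, and it says nothing about the internals beyond what can be
observed): values assigned to an integer array are evaluated as arithmetic
expressions, and the results then behave like ordinary decimal strings.

```bash
#!/usr/bin/env bash
# Sketch: elements of an integer array are evaluated arithmetically on
# assignment, and the result is then handled as decimal text.
declare -ai nums

nums[0]=1+2        # arithmetic expression, evaluated on assignment
nums[1]=0x41       # hex literal, kept in decimal form afterwards
nums[2]="7 * 6"    # quoting does not prevent the evaluation

printf '%s\n' "${nums[@]}"

# String operations apply to the stored decimal text:
echo "length of nums[1] as text: ${#nums[1]}"

# declare -p shows the elements as quoted strings.
declare -p nums
```

So a byte stored this way costs at least one string per element, which is
the overhead a compact byte-array type would be trying to avoid.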

Unfortunately, that may torpedo the whole idea of implementing a type like
this as a dynamic array var in Bash: even with a "dynamic var"
implementation of a byte array, compactly storing the byte stream in a
buffer somewhere, as soon as anyone tries to treat the data structure as a
conventional array, the dynamic variable "get value" function is expected
to populate the whole variable array structure with a textual
representation of the byte values, negating the storage-saving benefit of
the compact byte-array representation. At best, you could read binary data
in and write it back out again without triggering the conversion. But as
soon as the conversion happens, the whole venture becomes somewhat
pointless. To get around that, one would need to change the dynamic var
implementation to allow for dynamic vars whose text value is generated on
demand and not stored (which probably complicates applying other logic
down the road - like the different forms of parameter expansion, for
instance, all of which presently can just operate on the variable's
"value" field and remain blissfully ignorant of the fact that it's a
dynamic variable). The only way to retain the benefits of the compact
representation (in the current version of Bash) would be to make the
dynamic variable a "black box", and use built-in commands to manipulate
the data or extract other representations. (And if it's a black box,
there's no point making it an "array".)

As for reading and writing binary data, I think at this point my
inclination would be to provide new commands to do it rather than further
overloading "read" and "printf". That would be necessary anyway if the
byte stream functionality were provided as a loadable "builtin" library.
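
For comparison, here is roughly what a pair of dedicated commands could
look like if approximated today with shell functions on top of an ordinary
array. This is a sketch under assumptions, not a proposal for the actual
interface: the names read_bytes and write_bytes are hypothetical (neither
is an existing Bash builtin), namerefs need Bash 4.3 or later, and the
data is held as decimal strings, so it has none of the storage benefits
discussed above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; neither is an existing Bash builtin.

# read_bytes NAME: read all of stdin into array NAME, one decimal byte
# value (0-255) per element.
read_bytes() {
    local -n _dest=$1            # nameref to the caller's array
    local IFS=$' \t\n'           # make sure od's output splits on blanks
    # od prints every byte as an unsigned decimal number; word splitting
    # turns that output into array elements.
    _dest=( $(od -An -v -tu1) )
}

# write_bytes NAME: write the byte values in array NAME to stdout.
write_bytes() {
    local -n _src=$1
    local fmt
    (( ${#_src[@]} )) || return 0
    # Build a format string made only of \xHH escapes, then let printf
    # expand it into raw bytes (including NUL).
    printf -v fmt '\\x%02x' "${_src[@]}"
    printf "$fmt"
}

# Usage: capture four bytes including a NUL, inspect, modify, write back.
read_bytes data < <(printf 'a\0b\377')
echo "${data[@]}"
data[1]=45                       # replace the NUL byte with '-'
write_bytes data | od -c
```

Process substitution is used for the input because a pipeline would run
read_bytes in a subshell and the array would be lost.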

