octave-maintainers

Built-in base2dec and dec2base


From: Daniel J Sebald
Subject: Built-in base2dec and dec2base
Date: Sun, 29 Jul 2012 12:46:03 -0500

Rik,

I've done a first pass of built-in versions of base2dec() and dec2base(), and I put a patch in the features portion of SourceForge here:

https://sourceforge.net/tracker/?func=detail&aid=3551454&group_id=2888&atid=352888

Although the code is fairly well organized and functional, I'm just becoming familiar with the classes and conventions for return values, and there will need to be another pass to get the variable types correct. A particular snag right now is accessing the octave values as long ints as opposed to doubles. This became evident from the smart hunk of test code:

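% test
% For each base b and exponent n, b^n must print as '1' followed by n zeros,
% and b^n+1 as '1', then n-1 zeros, then '1'.  The largest value tested,
% 16^13+1 = 2^52+1, is still exactly representable as a double.
s0 = "";
for n = 1:13
  for b = 2:16
    pp = dec2base (b^n+1, b);
    assert (dec2base (b^n, b), ['1',s0,'0']);
    assert (dec2base (b^n+1, b), ['1',s0,'1']);
  endfor
  s0 = [s0,'0'];   % one more zero for the next exponent
endfor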

which originally failed with a combination of n and b corresponding to 2^24, i.e., the limit of the single-precision float mantissa (24 bits). I understand the classes, but it is a case of finding the right member functions to do the trick.
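
To make the 2^24 snag concrete, here is a minimal C++ sketch. It is not code from the patch: dec2base_sketch, through_float and through_double are hypothetical names made up for illustration, and the 0-9/A-Z symbol set is an assumption. It shows the usual repeated-division conversion done on a 64-bit integer, plus two helpers that mimic what happens when the value is first pulled out as a float or as a double.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the patch): convert a non-negative integer
// to its text representation in the given base (2..36), digits 0-9 then A-Z.
std::string dec2base_sketch(std::uint64_t value, unsigned base) {
    static const char symbols[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if (base < 2 || base > 36) return std::string();  // invalid base: empty result
    std::string digits;
    do {
        digits.push_back(symbols[value % base]);  // least significant digit first
        value /= base;
    } while (value != 0);
    std::reverse(digits.begin(), digits.end());   // most significant digit first
    return digits;
}

// Mimic extracting the value as a single-precision float (24-bit mantissa).
std::uint64_t through_float(std::uint64_t value) {
    return static_cast<std::uint64_t>(static_cast<float>(value));
}

// Mimic extracting the value as a double (53-bit mantissa).
std::uint64_t through_double(std::uint64_t value) {
    return static_cast<std::uint64_t>(static_cast<double>(value));
}
```

With these, through_float(16777217) does not give back 16777217 (that is 2^24+1), so the trailing '1' digit is lost before any conversion happens, while through_double keeps every value the test above uses.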

Also, there is a slight amount of code duplication I'd like to reduce. I'd like to experiment with strings as well. Somehow it seems like there is a slight performance loss due to using the strings class, but I may find it not worth optimizing that portion of things.

However, I will need to set this aside for a couple of weeks. If you would like to tweak some things and put a new patch on SourceForge, feel free.

OK, so here are some performance results using a vector of size 10e6 (strings or numbers, depending on base2dec vs. dec2base). The numbers are CPU time in seconds.

COMMAND                      BUILT-IN        CURRENT SCRIPT
                             VERSION (s)     VERSION (s)
_______                      ___________     ______________

bin2dec(<char matrix>)       0.13398         0.55891
bin2dec(<cell vector>)       0.21897         1.7447
hex2dec(<char matrix>)       0.14298         0.55192
hex2dec(<cell vector>)       0.22097         1.7277
base2dec(<char mat>, '01')   0.19697         0.52792
base2dec(<cell vec>, '01')   0.28396         1.7387

dec2bin(<int vector>)        0.22697         1.1598
dec2bin(<cell vector>)       0.23996         3.5465
dec2hex(<int vector>)        0.11998         0.30195
dec2hex(<cell vector>)       0.13898         2.7226
dec2base(<int vec>, '01')    0.22497         1.1548
dec2base(<cell vec>, '01')   0.23996         3.5535

Furthermore, here are some related times:

cellstr(<char matrix>)   0.86387
num2cell(<int vector>)   0.062990
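
For anyone who wants the speed-up factors rather than the raw times, the ratios can be worked out from the first table with a few lines of C++. This is only a convenience sketch: the Timing struct and the function names are made up for illustration, and the numbers are copied from the table above.

```cpp
#include <algorithm>
#include <vector>

// One row of the timing table: CPU seconds for the built-in and script versions.
struct Timing {
    const char* command;
    double builtin_s;
    double script_s;
};

// Figures copied from the table in this message.
const std::vector<Timing>& timings() {
    static const std::vector<Timing> rows = {
        {"bin2dec(<char matrix>)",     0.13398, 0.55891},
        {"bin2dec(<cell vector>)",     0.21897, 1.7447},
        {"hex2dec(<char matrix>)",     0.14298, 0.55192},
        {"hex2dec(<cell vector>)",     0.22097, 1.7277},
        {"base2dec(<char mat>, '01')", 0.19697, 0.52792},
        {"base2dec(<cell vec>, '01')", 0.28396, 1.7387},
        {"dec2bin(<int vector>)",      0.22697, 1.1598},
        {"dec2bin(<cell vector>)",     0.23996, 3.5465},
        {"dec2hex(<int vector>)",      0.11998, 0.30195},
        {"dec2hex(<cell vector>)",     0.13898, 2.7226},
        {"dec2base(<int vec>, '01')",  0.22497, 1.1548},
        {"dec2base(<cell vec>, '01')", 0.23996, 3.5535},
    };
    return rows;
}

// Speed-up factor: script time divided by built-in time.
double speedup(const Timing& t) { return t.script_s / t.builtin_s; }

// Smallest speed-up over all rows.
double min_speedup() {
    double best = speedup(timings().front());
    for (const Timing& t : timings()) best = std::min(best, speedup(t));
    return best;
}

// Largest speed-up over all rows.
double max_speedup() {
    double best = speedup(timings().front());
    for (const Timing& t : timings()) best = std::max(best, speedup(t));
    return best;
}
```

The smallest factor comes from dec2hex(<int vector>) and the largest from dec2hex(<cell vector>).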

Some observations:

1) The built-in versions are at least roughly 2.5 to 3 times faster in every case. In the case of cells, where the gain is roughly 6 to 20 times, the built-in version is cooking with gas.

2) In theory the string-based version of base2dec should be fastest because there are no ASCII tests to deal with. But I think the string classes are a bit slower than the raw C strings. This could be optimized further, but it is to the point where base2dec is so fast that its cost is small compared to other string manipulations.

3) The built-in version brings the times down far enough that cellstr() stands out as the critical cost. It does no processing, yet at 0.86387 s it is roughly four to six times slower than the built-in base2dec() and bin2dec() on a char matrix, which do processing. I'm going to look into cellstr() at a later time. (I see now how "is_cellstr" checks the cache, as mentioned at OctConf 2012.)
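
Regarding observation 2, here is a rough C++ sketch of why an explicit symbol string needs no ASCII range tests: each character is mapped to its digit value through a 256-entry lookup table built once from the symbol string. This is an illustration under my own assumptions, not the code in the patch. The name base2dec_sketch is hypothetical, and returning -1 for an invalid character is a choice made for the sketch only. Overflow of very long inputs is not handled.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the patch): interpret `text` as a number whose
// digits are the characters of `symbols`; symbols[0] has value 0, symbols[1]
// has value 1, and so on.  Returns -1 if `text` contains an unknown character
// or if fewer than two symbols are given.
std::int64_t base2dec_sketch(const std::string& text, const std::string& symbols) {
    if (symbols.size() < 2) return -1;

    // Build the lookup table once: character code -> digit value, -1 if unused.
    std::array<int, 256> value_of;
    value_of.fill(-1);
    for (std::size_t i = 0; i < symbols.size(); ++i)
        value_of[static_cast<unsigned char>(symbols[i])] = static_cast<int>(i);

    const std::int64_t base = static_cast<std::int64_t>(symbols.size());
    std::int64_t result = 0;
    for (char c : text) {
        const int digit = value_of[static_cast<unsigned char>(c)];
        if (digit < 0) return -1;        // character is not one of the symbols
        result = result * base + digit;  // Horner-style accumulation
    }
    return result;
}
```

For example, base2dec_sketch("1010", "01") evaluates to 10. The per-character work is a single table lookup, so any remaining cost difference would come from how the strings themselves are stored and passed around.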

Questions (and I know the answer is "compatibility"):

1) In dec2base, if the input is a cell array, the output is a character matrix. I would think that a cell array of strings, without the leading zero padding, would be preferred... unless LEN is set.

2) In base2dec, if the input is a cell array row, the output is a cell array column. My first inclination was to make the dimensions match (until I tried the test case that checks this, at which point I changed the dimensions). To me, the logical thing to do with a cell array is to keep the output dimensions the same as the input dimensions. That is sort of the point of cells.

Dan


