octave-maintainers

Re: [Octave] Improving Octave for large files


From: Christian Brædstrup
Subject: Re: [Octave] Improving Octave for large files
Date: Wed, 11 Nov 2009 13:38:39 +0100

Okay, I didn't know the feature request was that old.
Since the code is so untested, perhaps the best first step is to create some large files and see how Octave handles them.

As far as I have been told, TPIE sorts all the data from the input file and afterwards only accesses the data it needs (to save memory). I suggested the library because I know it is actively developed at a university, and the group there uses it to handle very large sets of satellite data for 3D terrain mapping. But if using the library means rewriting a lot of good code, it would be foolish to use it.

On Wed, Nov 11, 2009 at 12:51 AM, David Bateman <address@hidden> wrote:
Christian Brædstrup wrote:
I have been looking in the PROJECTS file in the source and wanted to hear if anyone is working on the problem with large files that Juhana K. Kouhia talks about (I couldn't find any code in the src/load-save.cc file to indicate that)? I have a friend working on the TPIE library (http://www.madalgo.au.dk/Trac-tpie/) and thought it would fit nicely into the octave source. Does anyone have any concerns about including the TPIE library or any comments about how best to add the functionality.

 

That idea was proposed in 1994

http://old.nabble.com/Octave-question-to9226868.html#a9226868

and things have perhaps moved on a bit since. I'd say the large file issues are now twofold:

1) Data sets with more than 2^31 elements, which need 64-bit indexing. The ability to handle such data sets is in Octave but poorly tested. The loading and saving of files for such data sets, however, is not tested, though the HDF5 formats should be able to handle this.
2) Large data sets tend to go hand in hand with large computational problems, and the parallelisation and distribution of a database across many nodes could be improved.
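
To make the 2^31 limit in point 1 concrete, here is a small sketch in Python (not Octave code). The 50000 x 50000 matrix size and the helper name fits_in_int32 are assumptions chosen purely for illustration; the point is only that an element count above 2^31 - 1 cannot be stored in a signed 32-bit index.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def fits_in_int32(n_elements):
    """Return True if n_elements can be counted with a signed 32-bit index."""
    return n_elements <= INT32_MAX


# Hypothetical matrix size, chosen only to exceed the 32-bit limit.
rows, cols = 50_000, 50_000
n = rows * cols  # 2_500_000_000 elements

# ctypes does no overflow checking, so this shows the value a signed
# 32-bit counter would end up holding if it were given n.
wrapped = ctypes.c_int32(n).value

print(n, fits_in_int32(n), wrapped)
```

The wrapped value comes out negative, which is why code that still uses 32-bit indices somewhere along the load/save path can fail in confusing ways on such data sets.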

I'm sorry, I don't really know what TPIE has to offer, but I suspect it defers reading data from a file until it is needed. In that case, integrating TPIE probably means implementing user types from the ground up (right down to a reimplementation of the Array class). Is the benefit worth the cost?
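
For readers unfamiliar with the idea, "deferring reads until the data is needed" can be sketched roughly as below. This is plain Python using only the standard library; it is not TPIE's API and not Octave's Array class. The class name LazyDoubleFile, the flat little-endian float64 file layout, and the reads counter are all assumptions made for the example.

```python
import os
import struct
import tempfile


class LazyDoubleFile:
    """Read-only view of a flat file of little-endian float64 values.

    Nothing is loaded up front; each element is read from disk only
    when it is indexed.
    """

    ITEM = struct.Struct("<d")  # one 8-byte double per element

    def __init__(self, path):
        self._f = open(path, "rb")
        self._n = os.path.getsize(path) // self.ITEM.size
        self.reads = 0  # number of elements actually fetched from disk

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)
        self._f.seek(i * self.ITEM.size)  # jump straight to element i
        self.reads += 1
        return self.ITEM.unpack(self._f.read(self.ITEM.size))[0]

    def close(self):
        self._f.close()


# Write a small demo file holding 0.0, 0.5, 1.0, ... (1000 values).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for k in range(1000):
        tmp.write(LazyDoubleFile.ITEM.pack(k * 0.5))
    path = tmp.name

data = LazyDoubleFile(path)
value = data[10]  # only this one element is read from the file
data.close()
os.remove(path)
```

Even in this toy form, every element access goes through the wrapper rather than through an in-memory buffer, which hints at why retrofitting the same behaviour onto an existing array type could be invasive.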

D.

--

David Bateman                                address@hidden
35 rue Gambetta                              +33 1 46 04 02 18 (Home)
92100 Boulogne-Billancourt FRANCE            +33 6 72 01 06 33 (Mob)


