octave-maintainers

Re: [GSoC 2021] How should I do now with project Table datatype


From: Andrew Janke
Subject: Re: [GSoC 2021] How should I do now with project Table datatype
Date: Tue, 9 Mar 2021 03:53:08 -0500


On 3/8/21 11:13 PM, Kai Torben Ohlhus wrote:
> On 3/8/21 6:51 PM, 陈栋林 wrote:
> > I have seen that you are the potential mentors in the project Table datatype. How should I do now with this project for applying gsoc? How can I make my first contribution? Thank you
>
> Thank you for your interest in GSoC with Octave.  Yes, I am willing to mentor a project on creating a Matlab compatible table datatype [1].
>
> I think tomorrow the mentoring organizations will be announced by Google.  If Octave is chosen (of course you are always free to work on this project outside GSoC as well), you can familiarize yourself with the existing codes and improve them (a bit).  An "easy" potential starting point to show your Octave coding skills is creating some BIST [2] for "octave-tablicious" [3], for example.  That means create a fork of "octave-tablicious" or start a new Octave package [4] copying a subset of that project.
>
> Two more things if you prefer to communicate via email:
> 1. Please keep the Octave maintainers mailing-list in the CC and add a subject prefix "[GSoC]" or "[GSoC 2021]".
> 2. Please answer below the previous post ("bottom-posting").
>
> Kai
>
> [1] https://wiki.octave.org/Summer_of_Code_-_Getting_Started#Table_datatype
> [2] https://wiki.octave.org/Tests
> [3] https://github.com/apjanke/octave-tablicious/issues/30
> [4] https://github.com/gnu-octave/pkg-example


Hi, 陈栋林!

I'm Andrew Janke, the author of octave-tablicious. I would be happy to accept PRs for BISTs to Tablicious, and to help you get an adaptation of its Table code or similar into core Octave, to the extent that I have time. You are of course also welcome to just take its code and use it in a separate project.

Tablicious isn't an official GNU Octave project, and I don't have much free time, so I wouldn't be an official GSoC mentor. But I'll help out as time permits. Feel free to Cc me if you have questions about Tablicious and I'll try to answer them (probably in the form of adding documentation to the package).

For what it's worth, I think this is a good idea for a GSoC project. Tables (or "dataframes") are an important part of modern Matlab, Python, and R coding; it would be nice to see them readily available in Octave.

Please note that much of Tablicious's Table logic depends on a special trick called "proxy keys" that I came up with for doing efficient matching on multiple mixed-type columns (for use in operations such as joins and membership tests). I think it's a good idea, but I don't know if the core Octave maintainers agree; you might need to come up with alternative matching logic if they're not fans of it.
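To make the general idea concrete, here is a rough sketch in Python. It is not Tablicious's actual code, and the function names `proxy_keys` and `is_member_rows` are made up for illustration. It assumes that each column's values are sortable and hashable, and that column k of both tables holds the same type. Every column is replaced by integer codes, so a row of mixed-type values becomes a tuple of integers that can be compared cheaply.

```python
def proxy_keys(columns_a, columns_b):
    """Map the rows of two tables to tuples of integers ("proxy keys").

    Each argument is a list of columns (hypothetical layout for this sketch).
    """
    keys_a = [[] for _ in range(len(columns_a[0]))]
    keys_b = [[] for _ in range(len(columns_b[0]))]
    for col_a, col_b in zip(columns_a, columns_b):
        # Analogue of unique(): sorted distinct values across BOTH tables,
        # so equal values get equal codes and code order follows value order.
        distinct = sorted(set(col_a) | set(col_b))
        code = {value: i for i, value in enumerate(distinct)}
        for row, value in enumerate(col_a):
            keys_a[row].append(code[value])
        for row, value in enumerate(col_b):
            keys_b[row].append(code[value])
    return [tuple(k) for k in keys_a], [tuple(k) for k in keys_b]


def is_member_rows(columns_a, columns_b):
    """For each row of table A, report whether the same row occurs in B."""
    keys_a, keys_b = proxy_keys(columns_a, columns_b)
    lookup = set(keys_b)
    return [key in lookup for key in keys_a]


# Two small tables with a string column and an integer column.
table_a = [["ann", "bob", "cy"], [1, 2, 3]]
table_b = [["bob", "ann"], [2, 9]]
print(is_member_rows(table_a, table_b))
```

Once both tables are reduced to integer keys, a join is the same lookup with row indices kept instead of booleans.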

Sorry for not chiming in about this earlier when y'all were setting up the GSoC stuff. The Wiki page says the project goal is to "define an initial subset of table functions, which involve sorting, splitting, merging, and file I/O". I'd suggest that rather than working on one or two functions at a time, the project focus on choosing an overall underlying data model or API for the Table data structure (that is, deciding whether you want to use "proxy keys" or some other approach).

The reason is that almost all Table operations (besides I/O) are going to naturally be built on top of that data model, and almost all of them really boil down to variations of mixed-type multi-column equivalence or sorting. Mixed-type multi-column equivalence and order testing is not something that is supported by other existing Octave operations, so you need to decide how you're fundamentally going to deal with that.

I'd also suggest that whatever model you decide on, it should be formally defined in terms of M-code-level operations or functions, so that user-defined classes and new Octave types can be readily supported by the Table array type. For example, Tablicious's "proxy keys" model is defined in terms of the unique(), sort(), and eq() functions on the types in table columns. (With a special exception for eq() for cell columns. Yuck, cellstrs.)
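As a rough illustration of why defining the model in terms of a few type-level operations matters, here is a sketch in Python (not Octave, and not Tablicious's code). The `Version` class and the functions `column_codes` and `sort_order` are hypothetical. The point is that a user-defined type takes part in multi-column sorting as soon as it supplies equality and ordering, which play the role of eq() and sort() here.

```python
from functools import total_ordering


@total_ordering
class Version:
    """Hypothetical user-defined column type: only == and < are defined."""

    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)


def column_codes(column):
    """Integer codes for one column, using only sorting and equality."""
    distinct = []
    for value in sorted(column):      # relies on the type's "<"
        # Analogue of unique() on sorted data: drop adjacent duplicates.
        if not distinct or not (value == distinct[-1]):
            distinct.append(value)
    # A linear scan with "==" keeps the requirements on the type minimal
    # (no hashing needed), at the cost of speed.
    return [next(i for i, d in enumerate(distinct) if d == value)
            for value in column]


def sort_order(columns):
    """Row permutation that sorts a table by all columns, left to right."""
    codes = [column_codes(col) for col in columns]
    keys = list(zip(*codes))          # one tuple of integer codes per row
    return sorted(range(len(keys)), key=keys.__getitem__)


# A table with a user-defined column and a string column.
table = [[Version(1, 2), Version(1, 0), Version(1, 2)], ["b", "z", "a"]]
print(sort_order(table))
```

The table code never inspects `Version` directly; it only goes through the comparison operations, so new types need no changes to the table implementation.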

If you want to get some theoretical background on tables, I would highly recommend reading C.J. Date's book "Database in Depth" [1], which describes the Relational Model that is the major theoretical basis for table arrays and similar structures, in both SQL and in-memory representations. (Or, if you're feeling more ambitious, try Date's other book "An Introduction to Database Systems" [2], which is a college-textbook-style treatment of the same subject matter.)

Also, there's one reference missing from the Table section of the Octave GSoC wiki page: the Octave Forge Dataframe package [3] is another initial implementation of a table-like data structure. It does not follow the Matlab table array API, but is conceptually and functionally similar, and should probably be consulted for this project.

Cheers,
Andrew

[1] https://www.oreilly.com/library/view/database-in-depth/0596100124/
[2] https://www.pearson.com/us/higher-education/program/Date-An-Introduction-to-Database-Systems-8th-Edition/PGM274345.html
[3] https://wiki.octave.org/Dataframe_package


