emacs-orgmode
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: state of the art in org-mode tables e.g. join, etc


From: Derek Feichtinger
Subject: Re: state of the art in org-mode tables e.g. join, etc
Date: Mon, 22 Feb 2021 09:27:50 +0100
User-agent: mu4e 1.4.9; emacs 27.1.90

Hi John,

I invested time some years ago in preparing babel examples, and a lot of
the description went into using tables. The most detailed documents I
had for elisp and python.

In order to be productive, e.g. for producing all kinds of scientific
graphs, but also for doing the finances and planning for our scientific
computing section I ended up the same as you with mostly going to python
and leveraging Pandas. I think all of us end up using ":colnames no" as
the most convenient solution.

https://github.com/dfeich/org-babel-examples/blob/master/python3/python3-babel.org

(especially look at the Pandas section 10)

In that file I also tangle a python library "orgbabelhelper" that is
available in Conda and PyPi. I mainly use that to work with my tables.

Best regards
Derek

-- 
Paul Scherrer Institut
Dr. Derek Feichtinger                   Phone:   +41 56 310 47 33
Group Head HPC and Emerging Technologies  Email: derek.feichtinger@psi.ch
Building/Room No. OHSA/D17
Forschungsstrasse 111
CH-5232 Villigen PSI 

On Sun, Feb 21 2021, John Kitchin <jkitchin@andrew.cmu.edu> wrote:

> For fun, here is the sqlite equivalent of the Pandas example using the same
> tables as before
>
>
> ** aggregation example
>
> Examples from https://github.com/tbanel/orgaggregate
>
>
> #+NAME: original
> | Day       | Color | Level | Quantity |
> |-----------+-------+-------+----------|
> | Monday    | Red   |    30 |       11 |
> | Monday    | Blue  |    25 |        3 |
> | Tuesday   | Red   |    51 |       12 |
> | Tuesday   | Red   |    45 |       15 |
> | Tuesday   | Blue  |    33 |       18 |
> | Wednesday | Red   |    27 |       23 |
> | Wednesday | Blue  |    12 |       16 |
> | Wednesday | Blue  |    15 |       15 |
> | Thursday  | Red   |    39 |       24 |
> | Thursday  | Red   |    41 |       29 |
> | Thursday  | Red   |    49 |       30 |
> | Friday    | Blue  |     7 |        5 |
> | Friday    | Blue  |     6 |        8 |
> | Friday    | Blue  |    11 |        9 |
>
>
> #+begin_src sqlite :db ":memory:" :var orgtable=original :colnames yes
> drop table if exists testtable;
> create table testtable(Day str, Color str, Level int, Quantity int);
> .mode csv testtable
> .import $orgtable testtable
> select Color, count(*) from testtable group by Color;
> #+end_src
>
> #+RESULTS:
> | Color | count(*) |
> |-------+----------|
> | Blue  |        7 |
> | Red   |        7 |
>
> ** join example
>
> Example from https://github.com/tbanel/orgtbljoin
>
> #+name: nutrition
> | type     | Fiber | Sugar | Protein | Carb |
> |----------+-------+-------+---------+------|
> | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
> | egg      |     0 |  18.3 |    31.9 | 18.3 |
> | rice     |   0.2 |     0 |     1.5 | 16.0 |
> | bread    |   0.7 |   0.7 |     3.3 | 16.0 |
> | orange   |   3.1 |  11.9 |     1.3 | 17.6 |
> | banana   |   2.1 |   9.9 |     0.9 | 18.5 |
> | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
> | nut      |   2.6 |   1.3 |     4.9 |  7.2 |
> | corn     |   4.7 |   1.8 |     2.8 | 21.3 |
>
>
> #+name: recipe
> | type     | quty |
> |----------+------|
> | onion    |   70 |
> | tomatoe  |  120 |
> | eggplant |  300 |
> | tofu     |  100 |
>
>
> #+begin_src sqlite :db ":memory:" :var nut=nutrition rec=recipe :colnames
> yes
> drop table if exists nutrition;
> drop table if exists recipe;
> create table nutrition(type str, Fiber float, Sugar float, Protein float,
> Carb float);
> create table recipe(type str, quty int);
>
> .mode csv nutrition
> .import $nut nutrition
>
> .mode csv recipe
> .import $rec recipe
>
> select * from recipe, nutrition where recipe.type=nutrition.type;
> #+end_src
>
> #+RESULTS:
> | type     | quty | type     | Fiber | Sugar | Protein | Carb |
> |----------+------+----------+-------+-------+---------+------|
> | onion    |   70 | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
> | tomatoe  |  120 | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
> | eggplant |  300 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
> | tofu     |  100 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
>
>
> John
>
> -----------------------------------
> Professor John Kitchin
> Doherty Hall A207F
> Department of Chemical Engineering
> Carnegie Mellon University
> Pittsburgh, PA 15213
> 412-268-7803
> @johnkitchin
> http://kitchingroup.cheme.cmu.edu
>
>
>
> On Sun, Feb 21, 2021 at 10:03 AM John Kitchin <jkitchin@andrew.cmu.edu>
> wrote:
>
>> Thanks Tim and Greg. I had mostly come to the same conclusions that it is
>> probably best to outsource this. I worked out some examples from
>> the orgtbljoin and orgaggregate packages with Pandas below, in case anyone
>> is interested in seeing how it works. A key point is using the ":colnames
>> no" header args to get the column names for Pandas. It seems like a pretty
>> good approach.
>>
>> * org-mode tables with Pandas
>> ** Aggregating from a table
>>
>> Examples from https://github.com/tbanel/orgaggregate
>>
>>
>> #+NAME: original
>> | Day       | Color | Level | Quantity |
>> |-----------+-------+-------+----------|
>> | Monday    | Red   |    30 |       11 |
>> | Monday    | Blue  |    25 |        3 |
>> | Tuesday   | Red   |    51 |       12 |
>> | Tuesday   | Red   |    45 |       15 |
>> | Tuesday   | Blue  |    33 |       18 |
>> | Wednesday | Red   |    27 |       23 |
>> | Wednesday | Blue  |    12 |       16 |
>> | Wednesday | Blue  |    15 |       15 |
>> | Thursday  | Red   |    39 |       24 |
>> | Thursday  | Red   |    41 |       29 |
>> | Thursday  | Red   |    49 |       30 |
>> | Friday    | Blue  |     7 |        5 |
>> | Friday    | Blue  |     6 |        8 |
>> | Friday    | Blue  |    11 |        9 |
>>
>>
>> #+BEGIN_SRC ipython :var data=original :colnames no
>> import pandas as pd
>>
>> pd.DataFrame(data[1:], columns=data[0]).groupby('Color').size()
>> #+END_SRC
>>
>> #+RESULTS:
>> :results:
>> # Out [1]:
>> # text/plain
>> : Color
>> : Blue    7
>> : Red     7
>> : dtype: int64
>> :end:
>>
>> The categorical stuff here is just to get the days sorted the same way as
>> the example. It is otherwise not needed. I feel there should be a more
>> clever way to do this, but didn't think of it.
>>
>> #+BEGIN_SRC ipython :var data=original :colnames no
>> df = pd.DataFrame(data[1:], columns=data[0])
>> days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
>> 'Saturday', 'Sunday']
>> df['Day'] = pd.Categorical(df['Day'], categories=days, ordered=True)
>>
>> (df
>>  .groupby('Day')
>>  .agg({'Level': 'mean',
>>        'Quantity': 'sum'})
>>  .sort_values('Day'))
>> #+END_SRC
>>
>> #+RESULTS:
>> :results:
>> # Out [2]:
>> # text/plain
>> :            Level  Quantity
>> : Day
>> : Monday      27.5        14
>> : Tuesday     43.0        45
>> : Wednesday   18.0        54
>> : Thursday    43.0        83
>> : Friday       8.0        22
>> : Saturday     NaN         0
>> : Sunday       NaN         0
>>
>>
>> [[file:/var/folders/3q/ht_2mtk52hl7ydxrcr87z2gr0000gn/T/ob-ipython-htmlMnDA9a.html]]
>> :end:
>>
>> ** Joining tables
>>
>> Example from https://github.com/tbanel/orgtbljoin
>>
>> #+name: nutrition
>> | type     | Fiber | Sugar | Protein | Carb |
>> |----------+-------+-------+---------+------|
>> | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
>> | tomatoe  |   0.6 |   2.1 |     0.8 |  3.4 |
>> | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
>> | egg      |     0 |  18.3 |    31.9 | 18.3 |
>> | rice     |   0.2 |     0 |     1.5 | 16.0 |
>> | bread    |   0.7 |   0.7 |     3.3 | 16.0 |
>> | orange   |   3.1 |  11.9 |     1.3 | 17.6 |
>> | banana   |   2.1 |   9.9 |     0.9 | 18.5 |
>> | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
>> | nut      |   2.6 |   1.3 |     4.9 |  7.2 |
>> | corn     |   4.7 |   1.8 |     2.8 | 21.3 |
>>
>>
>> #+name: recipe
>> | type     | quty |
>> |----------+------|
>> | onion    |   70 |
>> | tomatoe  |  120 |
>> | eggplant |  300 |
>> | tofu     |  100 |
>>
>>
>> #+BEGIN_SRC ipython :var nut=nutrition recipe=recipe :colnames no
>> nutrition = pd.DataFrame(nut[1:], columns=nut[0])
>> rec = pd.DataFrame(recipe[1:], columns=recipe[0])
>>
>> pd.merge(rec, nutrition, on='type')
>> #+END_SRC
>>
>> #+RESULTS:
>> :results:
>> # Out [4]:
>> # text/plain
>> :        type  quty  Fiber  Sugar  Protein  Carb
>> : 0     onion    70    1.3    4.4      1.3   9.0
>> : 1   tomatoe   120    0.6    2.1      0.8   3.4
>> : 2  eggplant   300    2.5    3.2      0.8   8.6
>> : 3      tofu   100    0.7    0.5      6.6   1.4
>> :end:
>>
>>
>> John
>>
>> -----------------------------------
>> Professor John Kitchin
>> Doherty Hall A207F
>> Department of Chemical Engineering
>> Carnegie Mellon University
>> Pittsburgh, PA 15213
>> 412-268-7803
>> @johnkitchin
>> http://kitchingroup.cheme.cmu.edu
>>
>>
>>
>> On Sun, Feb 21, 2021 at 1:54 AM Tim Cross <theophilusx@gmail.com> wrote:
>>
>>>
>>> Greg Minshall <minshall@umich.edu> writes:
>>>
>>> > John,
>>> >
>>> >> Is there a state of the art in using org-tables as little databases
>>> >> with joins and stuff?
>>> >
>>> > i have to admit i do all that with an R code source block.  (the dplyr
>>> > package has the relevant joins, e.g. dplyr::inner_join().)  and, in R,
>>> > ":colnames yes" as a header argument gives you header lines on results.
>>> > (maybe that's ?now? for "all" languages?)
>>> >
>>>
>>> For really complex joins and ad hoc queries, I would do similar or put
>>> the data into sqlite. For more simple ones, I just define a table which
>>> uses table formulas to extract the values from the other tables - the
>>> downside being the tables need to have the same data ordering or the
>>> formulas need to be somewhat complex. Provided the tables have the same
>>> number of records in the same order, table formulas are usually fairly
>>> easy.
>>>
>>> I did think about writing some elisp functions to use in my table
>>> formulas to make things easier, but then decided I was just re-inventing
>>> and well defined database solution and figured when I need it, just use
>>> sqlite. However, it has been a while since I needed this level of
>>> complexity, so perhaps things have moved on and there are better ways
>>> now.
>>>
>>> --
>>> Tim Cross
>>>
>>>





reply via email to

[Prev in Thread] Current Thread [Next in Thread]