Packaging big generated data files?


From: Denis 'GNUtoo' Carikli
Subject: Packaging big generated data files?
Date: Wed, 7 Dec 2022 11:33:15 +0100

Hi,

Are there any policies or past decisions in the Guix project about
packaging big generated data files?

I've added packages for software like kiwix-tools and navit that both
work offline but that also need data files to be useful.

Navit is (car) navigation software that needs maps. The maps can be
generated from OpenStreetMap dumps with a tool available in the Navit
source code (maptool)[1], which is not packaged yet. Binary map files
can also be downloaded directly from various sources.

Right now the biggest file possible for such maps is about 47 GiB
(for the whole planet).

As for kiwix-tools, it can serve offline versions of websites like
Wikipedia, and it too needs data files to work. The biggest file seems
to be the complete version of English Wikipedia with scaled-down
pictures[2], which takes about 89 GiB. I haven't looked yet at how these
files are generated, but I guess that they can somehow be generated
from Wikipedia dumps.

Packaging the binary files (without generating them) can be useful
because it simplifies maintenance a lot: one can just bump the package
version and checksum to update them. It also keeps the information
(download URL, checksum, license) in one place and makes the files easy
to reuse from Guix services and/or configuration files.

If these files were generated in packages, it would also be possible to
tweak the data, for instance by adding height data to Navit maps. As
for kiwix-compatible files, it would probably make it possible to decide
when to take the snapshots, or to package additional wikis (like the
LibrePlanet wiki) or websites.

The issue here is probably the size of the generated files: they are
huge, so if they are packaged, they will most likely take significant
resources in the Guix infrastructure.

So what would be the way to go here? Would Guix accept patches to add
packages for these files in Guix proper?  

If so, does it need to be done like with the ZFS (kernel module)
package, where "#:substitutable? #f" is used to avoid redistributing
package builds? Or are other ways better for such use cases?

Note that so far I've only packaged kiwix-compatible files locally, for
various wikis, by just downloading already prepared files. I haven't
looked yet into Navit maps or into generating any of these files, so I
might be missing some details about generating them.

References:
-----------
[1]https://navit.readthedocs.io/en/latest/maps.html#processing-osm-maps-yourself
[2]https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim

Denis.
