guix-devel

Re: Packaging big generated data files?


From: Csepp
Subject: Re: Packaging big generated data files?
Date: Thu, 08 Dec 2022 14:46:51 +0100

Denis 'GNUtoo' Carikli <GNUtoo@cyberdimension.org> writes:

> Hi,
>
> Are there any policies or past decisions of the Guix project on
> packaging big generated data files?
>
> I've added packages for software like kiwix-tools and navit that both
> work offline but that also need data files to be useful.
>
> Navit is (car) navigation software that needs maps. The maps can be
> generated from OpenStreetMap dumps with a tool available in the Navit
> source code (maptool)[1], which is not packaged yet. Binary map files
> can also be downloaded directly from various sources.
>
> Right now the biggest file possible for such maps is about 47 GiB
> (for the whole planet).
>
> As for kiwix-tools, it can serve offline versions of websites like
> Wikipedia, and there too it needs files to work. The biggest file seems
> to be the complete version of English Wikipedia with scaled down
> pictures[2] and it takes about 89 GiB. I haven't yet looked at how
> these files were generated, but I guess that they can somehow be
> generated from Wikipedia dumps.
>
> Packaging the binary files (without generating them) can be useful as
> it greatly simplifies maintenance: one can just update the package
> version and checksum to update them. It also keeps the information
> (download URL, checksum, license) in one place and makes the files
> easy to reuse from Guix services and/or configuration files.
>
> If these files were generated in packages, it would also be possible
> to tweak the data, for instance by adding height data to Navit maps.
> As for Kiwix-compatible files, it would probably make it possible to
> decide when to make the snapshots, or to package additional wikis
> (like the Libreplanet Wiki) or websites.
>
> The issue here is probably the size of the generated files: they are
> huge, so if they are packaged, they will most likely take significant
> resources in the Guix infrastructure.
>
> So what would be the way to go here? Would Guix accept patches to add
> packages for these files in Guix proper?  
>
> If so, does it need to be done like with the ZFS (kernel module)
> package, where "#:substitutable? #f" is used to avoid redistributing
> package builds? Or are other ways better for such use cases?
>
> Note that so far I've only packaged Kiwix-compatible files locally,
> for various wikis, by just downloading already prepared files. I
> haven't yet looked into Navit maps or into generating all these
> files, so I might be missing some details about generating them.
>
> References:
> -----------
> [1]https://navit.readthedocs.io/en/latest/maps.html#processing-osm-maps-yourself
> [2]https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim
>
> Denis.

Could ZIM files be downloaded over BitTorrent as fixed-output
derivations?  They can be pretty huge.  Also, if the system started
seeding them as well, that would be pretty cool.
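
To illustrate why the transport should not matter here: a fixed-output
derivation is accepted only if its result matches a hash declared in
advance, so the bytes could in principle arrive over HTTP or BitTorrent
alike.  Below is a minimal Python sketch of that check only.  It is not
Guix code; the function name, the sample data and the use of a hex
SHA-256 digest are assumptions made for illustration.

```python
import hashlib
from typing import Iterable


def verify_fixed_output(chunks: Iterable[bytes], expected_sha256: str) -> bool:
    """Return True if the concatenated chunks hash to expected_sha256 (hex).

    The chunks stand in for a file reassembled in order after download;
    how they were fetched plays no part in the check.
    """
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest() == expected_sha256


# Hypothetical stand-in for a ZIM file fetched in pieces.
pieces = [b"zim-", b"file-", b"contents"]

# In a real package the expected hash is written down ahead of time;
# here it is computed from the sample data so the sketch is runnable.
expected = hashlib.sha256(b"zim-file-contents").hexdigest()

print(verify_fixed_output(pieces, expected))
print(verify_fixed_output([b"tampered"], expected))
```

Seeding would be a separate matter: the check above only covers what
gets accepted as the download result.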


