[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Packaging big generated data files?
From: |
Csepp |
Subject: |
Re: Packaging big generated data files? |
Date: |
Thu, 08 Dec 2022 14:46:51 +0100 |
Denis 'GNUtoo' Carikli <GNUtoo@cyberdimension.org> writes:
> [[PGP Signed Part:Undecided]]
> Hi,
>
> Is there any policies or past decisions of the Guix project on
> packaging big generated data files?
>
> I've added packages for software like kiwix-tools and navit that both
> work offline but that also need data files to be useful.
>
> Navit is a (car) navigation software that need maps. The maps can be
> generated from OpenStreetMap dumps with a tool available in Navit
> source code (maptool)[1] which is not packaged yet. Binary map files can
> also be downloaded directly from various sources.
>
> Right now the biggest file possible for such maps is about 47 GiB
> (for the whole planet).
>
> As for kiwix-tools, it can serve offline versions of websites like
> Wikipedia, and there too it needs files to work. The biggest file seems
> to be the complete version of English Wikipedia with scaled down
> pictures[2] and it takes about 89 GiB. I didn't look yet how these files
> were generated but I guess that they somehow can be generated from
> Wikipedia dumps.
>
> Packaging the binary files (without generating them) can be useful as
> it simplifies a lot the maintenance as one can just update the package
> version and checksum to update these. It also enables to keep the
> information (download URL, checksum, license) in one place and it
> enables easy reuse by Guix services and/or configuration files.
>
> If these files were generated in packages, it would also enable to
> tweak the data, for instance by adding height data in navit maps. As
> for kiwix compatible files, it would probably enable to decide when to
> make the snapshots or enable to package additional wikis
> (like the Libreplanet Wiki) or websites.
>
> The issue here is probably the size of the generated files: they are
> huge, so if they are packaged, they will most likely take significant
> resources in the Guix infrastructure.
>
> So what would be the way to go here? Would Guix accept patches to add
> packages for these files in Guix proper?
>
> If so, does it needs to be done like with the ZFS (kernel module)
> package where "#:substitutable? #f" is used to avoid redistributing
> package builds? Or are other ways better for such use cases?
>
> Note that so far I've only packaged locally only kiwix compatible files
> for various wikis by just downloading already prepared files, so I
> didn't look yet into navit maps or into generating all these files, so
> I might miss some details about generating them.
>
> References:
> -----------
> [1]https://navit.readthedocs.io/en/latest/maps.html#processing-osm-maps-yourself
> [2]https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim
>
> Denis.
>
> [[End of PGP Signed Part]]
Could ZIM files be downloaded over bittorrent as fixed output
derivations? They can be pretty huge. Also if the system started
seeding them as well, that would be pretty cool.