[directory-discuss] Debian/Ubuntu Database import


From: Andrew Engelbrecht
Subject: [directory-discuss] Debian/Ubuntu Database import
Date: Mon, 26 Mar 2012 00:04:40 -0400

Hello Michael and directory-discuss,

I'd like to introduce Michael Faille, whom I met at the LibrePlanet conference yesterday. He said that he would like to help out with the directory, including the planned database import from either Debian or Ubuntu. It's a big job, and I know that Joshua has been discussing with some people how it should be done, so I hope he'll weigh in on our strategy.

For those of you who don't know about the plan, the first long-term goal I see is to pull up-to-date project version info from a distro's repository data and put it onto the directory entry pages. Reading the values for each package shouldn't be the hard part. I think the biggest challenge will be matching directory.fsf.org project URLs and page names with a distribution's package names, since the two lists will not always agree. So there will have to be some manual matching and verification wherever an automatic script has trouble finding matches.
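To make the "reading values" step concrete, here is a rough sketch of how a script might read a Debian-style "Packages" file. It assumes the usual control-file layout: stanzas separated by blank lines, "Field: value" lines, and continuation lines that start with whitespace. The function name parse_packages and the sample data (placeholder versions and example.org URLs) are made up for illustration, and I haven't tried this on a real archive.

```python
def parse_packages(text):
    """Parse Debian control-file text into a list of dicts, one per stanza."""
    packages = []
    current = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                packages.append(current)
            current = {}
            last_field = None
        elif line[0] in " \t":
            # Continuation line: append it to the previous field's value.
            if last_field is not None:
                current[last_field] += "\n" + line.strip()
        else:
            field, _, value = line.partition(":")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        packages.append(current)
    return packages


# Placeholder sample data, not copied from a real archive.
SAMPLE = """\
Package: gimp
Version: 1.0-1
Homepage: http://gimp.example.org/
Description: image editor (placeholder text)
 A longer description would continue here.

Package: libarmadillo2
Version: 2.0-1
Homepage: http://armadillo.example.org/
Description: C++ library (placeholder text)
"""

if __name__ == "__main__":
    for pkg in parse_packages(SAMPLE):
        print(pkg["Package"], pkg["Version"], pkg.get("Homepage", ""))
```

A real script would read the decompressed "Packages" file from a mirror instead of SAMPLE, and would probably want to treat multi-line fields like Description more carefully.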

For instance, in the directory, the project name "GIMP" corresponds to "gimp" in the Debian repository. That's an easy match for a script to find. However, matching "Armadillo: C++ library" with "libarmadillo2" is a bit harder, and I think some will be more challenging than that. One strategy for this issue could be to auto-generate a list of possible matches for each project in the directory, based on similar names and project homepage URLs. We could then split the task of human selection and verification among many people.
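As a rough illustration of the candidate-list idea, here is a sketch that scores each package against a directory entry using name similarity (Python's difflib) and an exact homepage comparison. Everything here is an assumption on my part: the dict keys "name", "homepage", "Package" and "Homepage", the normalization rules (dropping a "lib" prefix and trailing digits), and the 0.6 cutoff are just starting points, not something we have agreed on.

```python
import difflib
import re
from urllib.parse import urlparse


def normalize_name(name):
    """Lowercase a directory page name and keep only the part before a colon."""
    name = name.lower().split(":")[0]
    return re.sub(r"[^a-z0-9]+", "", name)


def package_variants(pkg_name):
    """Return plausible base names for a package, e.g. libarmadillo2 -> armadillo."""
    cleaned = re.sub(r"[^a-z0-9]+", "", pkg_name.lower())
    no_lib = cleaned[3:] if cleaned.startswith("lib") else cleaned
    variants = {
        cleaned,
        no_lib,
        re.sub(r"\d+$", "", cleaned),
        re.sub(r"\d+$", "", no_lib),
    }
    return {v for v in variants if v}


def normalize_url(url):
    """Reduce a URL to host plus path; assumes the URL includes a scheme."""
    if not url:
        return ""
    parsed = urlparse(url.strip().lower())
    host = parsed.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parsed.path.rstrip("/")


def candidates(project, packages, limit=5, cutoff=0.6):
    """Return up to `limit` (score, package name) pairs, best first."""
    proj_name = normalize_name(project["name"])
    proj_url = normalize_url(project.get("homepage"))
    scored = []
    for pkg in packages:
        name_score = max(
            (difflib.SequenceMatcher(None, proj_name, v).ratio()
             for v in package_variants(pkg["Package"])),
            default=0.0,
        )
        same_url = bool(proj_url) and proj_url == normalize_url(pkg.get("Homepage"))
        score = 1.0 if same_url else name_score
        if score >= cutoff:
            scored.append((score, pkg["Package"]))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:limit]


if __name__ == "__main__":
    pkgs = [{"Package": "gimp"}, {"Package": "libarmadillo2"}]
    print(candidates({"name": "Armadillo: C++ library"}, pkgs))
    print(candidates({"name": "GIMP"}, pkgs))
```

The output of something like this would only be a list of suggestions. A person would still need to confirm each match, and entries with no candidate above the cutoff would go into a separate pile for manual lookup.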

And once that is done, we can write a simple Python script using the MediaWiki extension to auto-edit the "templates" on each project page so that they include an entry listing the distro's package name. Then it will be easy to broaden the scope of data to import, such as updated descriptions, since the groundwork will be laid. For instance, Joshua was telling me and Michael that there is another database that lists extra information beyond what's in the bare-bones Debian "Packages" files. This info is referenced by Debian or Ubuntu package name.
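Here is a sketch of just the text-editing half of that script: given the wikitext of a page, it adds a "|field=value" line to a named template if the field isn't there yet. The template name "Entry" and the field name "Debian package" are hypothetical, since I haven't checked what the directory's templates are actually called. Fetching and saving pages through MediaWiki is left out entirely.

```python
import re


def add_template_field(wikitext, template, field, value):
    """Add "|field=value" to the first {{template ...}} call, if missing.

    Returns the text unchanged when the template is absent, its braces are
    unbalanced, or the field is already set. Limitations: the template name
    is matched as a prefix, and triple-brace parameters are not handled.
    """
    start = wikitext.find("{{" + template)
    if start == -1:
        return wikitext
    # Walk forward to the "}}" that closes this template, tracking nesting.
    depth = 0
    i = start
    end = None
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                end = i - 2
                break
        else:
            i += 1
    if end is None:
        return wikitext
    body = wikitext[start:end]
    if re.search(r"\|\s*" + re.escape(field) + r"\s*=", body):
        return wikitext
    new_line = "\n|" + field + "=" + value + "\n"
    return wikitext[:end].rstrip("\n") + new_line + wikitext[end:]


if __name__ == "__main__":
    page = "{{Entry\n|Name=GIMP\n}}\nSome text."
    print(add_template_field(page, "Entry", "Debian package", "gimp"))
```

Keeping the edit as a plain string-to-string function means it can be checked against saved copies of a few pages before anything is written back to the wiki.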

So Michael, while I was a bit unclear in my original description to you at LP, I hope this gives you a better idea of what we have before us. If you need a better explanation from me, I can answer questions. If this is indeed something you wish to help us with, we would all love to hear your thoughts.

Thanks, and welcome aboard. :)


-Andrew


