salutations and web scraping

guile-user

From:	Catonano
Subject:	salutations and web scraping
Date:	Fri, 30 Dec 2011 23:58:47 +0100

Hello people,

Happy New Year.

I'm a beginner. I have never written a single line of Lisp or Scheme in my life, and I'm here to ask for directions and suggestions.

I'm mulling over a pet project. I would like to scrape the web site of a community radio station and grab the Flash-streamed content they publish. The material is published under a Creative Commons license, so what I'm planning is not illegal.

The reason they chose such an obtuse solution is that they are obtuse. They started the station in the 70s and now they don't get this new digital thing.

I read the web documentation. The client chapter suggests adopting an architecture similar to that of the server for parallel scrapers, and closes by floating the idea of threads and futures.

I don't see how I could use threads or futures (I'm not even sure what they are), and I'm bold enough to ask you to write an example code skeleton for me.

Also, I was thinking of writing a scraper in Guile Scheme. The scraper would parse the HTML source for the relevant bits and then delegate the Flash stuff to a Unix command, I think wget, curl or something similar. Is this reasonable? Is there any architectural glitch I'm missing here?

Don't worry, people: I know that the server setup and the internet connection are not so strong, and I don't want to be hostile to the server, so I guess a maximum of 2 parallel connections are gonna run.

Or, I was dreaming I could try to integrate the thing with the Gnome environment and make it available from the Gnome Shell JavaScript, so the people in the community could use it to grab the footage themselves. I don't know.

Thanks so much for ANY hint
Cato
