emacs-orgmode

From: Matthew Lundin
Subject: Re: [Feature] add a new org-attach dispatcher command to offline save web page
Date: Fri, 29 May 2020 10:33:37 -0500

Ihor Radchenko <yantar92@gmail.com> writes:

>> As I said, PATCH welcome, I admitted many times I don't have ability to
>> build a complex archive functionality on url.el or wget or curl.
>
> I have found the following solution [1] using wget:
>
> wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
>

I don't think --mirror is what we want in this context, since it will
initiate a recursive download of the entire site. (Ironically, my IP is
now banned from a personal blog that provides a how-to for using wget
after I tried to run the above command on it.) From the wget manual:

    -m
    --mirror
        Turn on options suitable for mirroring.  This option turns on
        recursion and time-stamping, sets infinite recursion depth and
        keeps FTP directory listings.  It is currently equivalent to
        -r -N -l inf --no-remove-listing.

AFAICT, org-board uses the following options, which limit the archiving
to a single page and all its resources:

wget -e robots=off --page-requisites --adjust-extension --convert-links [...]
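
For illustration, the single-page invocation above could be wrapped in a
small shell function along these lines. This is only a sketch under
stated assumptions: the names archive_page and DRY_RUN are hypothetical
(nothing in org-board or org-attach defines them), the flags are the ones
quoted above, and -P is borrowed from the earlier --mirror command to
choose the target directory.

```shell
#!/bin/sh
# Sketch only: archive_page and DRY_RUN are hypothetical names.
# Usage: archive_page URL [TARGET-DIR]
archive_page() {
  url=$1
  dir=${2:-.}
  # Single-page flags as quoted above; -P sets the download directory.
  set -- wget -e robots=off --page-requisites --adjust-extension \
         --convert-links -P "$dir" "$url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Dry run: print the command instead of touching the network.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Show what would be run for the placeholders used earlier in the thread.
DRY_RUN=1 archive_page "WEBSITE-URL" ./LOCAL-DIR
```

Without --mirror or -r there is no recursion, so only the page and the
resources it needs are requested.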

> This will not bundle the page into a single file, but it is better than
> nothing. org-attach does not have to attach exactly one file.

You can also create a WARC (web archive) file with wget, but then you
need a web archive replayer to view it, which is not exactly convenient.
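
A minimal sketch of the WARC variant, assuming wget's --warc-file option
(which by default writes a compressed NAME.warc.gz). The names
archive_warc and DRY_RUN are again hypothetical, and combining
--warc-file with --page-requisites is my own choice here, not something
taken from org-board.

```shell
#!/bin/sh
# Sketch only: archive_warc and DRY_RUN are hypothetical names.
# Usage: archive_warc URL NAME   (wget appends .warc.gz to NAME by default)
archive_warc() {
  url=$1
  name=$2
  # Record the page and its requisites into a single WARC file.
  set -- wget --page-requisites --warc-file="$name" "$url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Dry run: print the command instead of touching the network.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Show what would be run; "page" is a placeholder archive name.
DRY_RUN=1 archive_warc "WEBSITE-URL" page
```

This does yield one file to attach, at the cost of needing a replayer to
read it back.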

Best,

Matt



