bug-wget

[bug #59086] --page-requisites not always working when creating a warc file


From: Thomas Egense
Subject: [bug #59086] --page-requisites not always working when creating a warc file
Date: Wed, 9 Sep 2020 04:52:04 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0

URL:
  <https://savannah.gnu.org/bugs/?59086>

                 Summary: --page-requisites not always working when creating a warc file
                 Project: GNU Wget
            Submitted by: thomasegense
            Submitted on: Wed 09 Sep 2020 08:52:02 AM UTC
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
                 Release: None
         Discussion Lock: Any
        Operating System: GNU/Linux
         Reproducibility: None
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None

    _______________________________________________________

Details:

Url example: https://jyllands-posten.dk/

How to reproduce:
echo "https://jyllands-posten.dk/" >> url_list.txt

wget --level=1 --recursive --warc-cdx --page-requisites --warc-file=jp \
     --warc-max-size=1G -i url_list.txt

The source code for the page is downloaded into the WARC (last record), but none
of the images are downloaded and links are not followed either (despite the
--recursive parameter).

This is probably due to some HTTPS redirection, but since the
source code is downloaded correctly, it should still be possible to follow links
and download page requisites.
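
One way to check which resources actually ended up in the archive is to list the target URIs of the WARC records. The following is only a minimal sketch: it assumes the WARC files are gzip-compressed and that fetched resources carry a "WARC-Target-URI:" header line. list_warc_uris is a hypothetical helper name, not part of wget, and the jp*.warc.gz pattern in the comment is an assumption based on --warc-file=jp.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): print the unique target URIs
# found in one or more gzip-compressed WARC files.
list_warc_uris() {
    # Decompress all members, keep only the target-URI header lines,
    # drop carriage returns, strip the header name and optional <...>.
    gzip -dc -- "$@" |
        grep -a '^WARC-Target-URI:' |
        tr -d '\r' |
        sed -e 's/^WARC-Target-URI:[[:space:]]*//' -e 's/^<//' -e 's/>$//' |
        LC_ALL=C sort -u
}

# Example call (file names are an assumption):
#   list_warc_uris jp*.warc.gz
```

If the output shows only the start page and no image or stylesheet URLs, the page requisites were not fetched.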




    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59086>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



