bug-wget

Re: Working with index.html


From: Tim Rühsen
Subject: Re: Working with index.html
Date: Mon, 23 May 2022 20:31:07 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1

On 23.05.22 13:08, Great Zverre wrote:
Hi!

Thanks for your response!
First of all I have the following version of wget:
# wget --version
GNU Wget 1.20.3 built on linux-gnu.

To reproduce the issue could you please do the following commands (it will take 
a couple of minutes):
1. mkdir test
2. cd test
3. mkdir -p releases.hashicorp.com/consul/1.12.0
4. wget -w 10s -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/

I get the following output:
******
--2022-05-23 11:03:18--  https://releases.hashicorp.com/consul/
Resolving releases.hashicorp.com (releases.hashicorp.com)... 151.101.193.183, 151.101.129.183, 151.101.65.183, ...
Connecting to releases.hashicorp.com (releases.hashicorp.com)|151.101.193.183|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘releases.hashicorp.com/consul/index.html’

releases.hashicorp.com/consul/index.html            [ <=>                       ]  19.51K  --.-KB/s    in 0s

Last-modified header missing -- time-stamps turned off.
2022-05-23 11:03:18 (66.0 MB/s) - ‘releases.hashicorp.com/consul/index.html’ saved [19979]

Loading robots.txt; please ignore errors.
--2022-05-23 11:03:28--  https://releases.hashicorp.com/robots.txt
Reusing existing connection to releases.hashicorp.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 23 [text/plain]
Saving to: ‘releases.hashicorp.com/robots.txt’

releases.hashicorp.com/robots.txt               100%[=====================================================================================================>]      23  --.-KB/s    in 0s

2022-05-23 11:03:28 (1.53 MB/s) - ‘releases.hashicorp.com/robots.txt’ saved [23/23]

--2022-05-23 11:03:38--  https://releases.hashicorp.com/consul/1.12.0
Reusing existing connection to releases.hashicorp.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
releases.hashicorp.com/consul/1.12.0: Is a directory

Cannot write to ‘releases.hashicorp.com/consul/1.12.0’ (Success).
^C
*******
What is your output? Thank you!

I basically get the same output now. Sorry if I failed to follow your first email.

So this is basically the same as:
$ mkdir index.html; wget www.example.com
--2022-05-23 20:29:31--  http://www.example.com/
Resolving www.example.com (www.example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to www.example.com (www.example.com)|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
index.html: Is a directory

Cannot write to 'index.html' (Is a directory).
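
The failure in the minimal reproduction above does not depend on the network. A rough offline sketch of the same condition follows; `try_write` and the temporary directory are hypothetical names chosen for illustration and are not part of wget:

```shell
#!/bin/sh
# Offline sketch: a target path that is already a directory cannot be
# written as a regular file. try_write is a hypothetical helper, not wget code.
try_write() {
    target=$1
    if [ -d "$target" ]; then
        echo "$target: Is a directory" >&2
        return 1
    fi
    printf 'payload\n' > "$target"
}

workdir=$(mktemp -d)
mkdir "$workdir/index.html"        # a directory occupies the file name
if try_write "$workdir/index.html"; then
    echo "written"
else
    echo "cannot write: a directory occupies the target name"
fi
rm -rf "$workdir"
```

In this sketch the write is refused for the same reason as in the transcript: the name the downloader wants for its output file already belongs to a directory.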

Not sure what to do here.

Regards, Tim


On 21 May 2022, at 12:10, Tim Rühsen <tim.ruehsen@gmx.de> wrote:

Hi,

I cannot reproduce this issue with wget 1.21.3 or with current wget2.

Please make sure you use the latest version of wget.

Regards, Tim

On 16.05.22 18:39, Great Zverre wrote:
Hello guys!
I’m using wget to make a mirror of https://releases.hashicorp.com but I don’t 
want to make a full mirror, rather I’d like to have a mirror of certain 
“subfolders” of this site (e.g. terraform, consul etc.). So I do this using the 
following command:
wget -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/
The problem is that at first I get the following result:
******
$ wget -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/
--2022-05-16 16:28:18--  https://releases.hashicorp.com/consul/
Resolving releases.hashicorp.com (releases.hashicorp.com)... 151.101.193.183, 151.101.129.183, 151.101.65.183, ...
Connecting to releases.hashicorp.com (releases.hashicorp.com)|151.101.193.183|:443... connected.
HTTP request sent, awaiting response...
   HTTP/1.1 200 OK
   Connection: keep-alive
   Content-Type: text/html
   ETag: TvHhjlva/+c=
   X-Api-Version: 0.1.2
   X-Request-Id: 8a74122b-c155-88ff-511e-8d0d93155b2e
   X-Amz-Cf-Pop: AMS50-C1
   X-Amz-Cf-Id: Pdzhym0uq3XXjsZ_PxS8xvkntM0IsSCQtakE2EvgwC0v0tYMPJwCzQ==
   Age: 61398
   Access-Control-Allow-Origin: *
   Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
   X-XSS-Protection: 1; mode=block
   X-Content-Type-Options: nosniff
   X-Frame-Options: sameorigin
   Accept-Ranges: bytes
   Date: Mon, 16 May 2022 16:28:18 GMT
   Vary: Origin, Accept-Encoding
   transfer-encoding: chunked
Length: unspecified [text/html]
Saving to: ‘releases.hashicorp.com/consul/index.html’
releases.hashicorp.com/consul/index.html            [ <=>                       ]  19.51K  --.-KB/s    in 0s
Last-modified header missing -- time-stamps turned off.
2022-05-16 16:28:18 (45.4 MB/s) - ‘releases.hashicorp.com/consul/index.html’ saved [19979]
******
We can see that whatever is there at https://releases.hashicorp.com/consul/ 
gets saved to local releases.hashicorp.com/consul/index.html which is fine, 
exactly what I want. But when it comes to the first href from the 
releases.hashicorp.com/consul/index.html I get the following:
******
--2022-05-16 16:30:21--  https://releases.hashicorp.com/consul/1.12.0
Reusing existing connection to releases.hashicorp.com:443.
HTTP request sent, awaiting response...
   HTTP/1.1 200 OK
   Connection: keep-alive
   Content-Type: text/html
   X-Api-Version: 0.1.2
   X-Request-Id: ca8c47f5-2e54-b09a-adde-6e8cf5e92d45
   ETag: 8p+ndCqEoYc=
   X-Amz-Cf-Pop: AMS50-C1
   X-Amz-Cf-Id: qA5XZEv2hZReEYoZD29GRsD_M6u76VLv6g-usgKJAzTCQm_SyWVFRA==
   Age: 27384
   Access-Control-Allow-Origin: *
   Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
   X-XSS-Protection: 1; mode=block
   X-Content-Type-Options: nosniff
   X-Frame-Options: sameorigin
   Accept-Ranges: bytes
   Date: Mon, 16 May 2022 16:30:21 GMT
   Vary: Origin, Accept-Encoding
   transfer-encoding: chunked
Length: unspecified [text/html]
releases.hashicorp.com/consul/1.12.0: Is a directory
Cannot write to ‘releases.hashicorp.com/consul/1.12.0’ (Success).
******
We can see that it tries to save whatever is there at 
https://releases.hashicorp.com/consul/1.12.0 into 
releases.hashicorp.com/consul/1.12.0, not 
releases.hashicorp.com/consul/1.12.0/index.html as I would prefer.
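
The naming behaviour described above can be sketched as a small shell function. This is a simplified illustration that matches the transcripts in this thread, not wget's actual implementation; `url_to_local_path` is a hypothetical name, query strings are ignored, and the third call (a URL with a trailing slash) is an assumed case that does not appear in the logs:

```shell
#!/bin/sh
# Simplified illustration (NOT wget's code): map a URL to a local path as
# host + path, appending "index.html" only when the URL path ends in "/".
url_to_local_path() {
    url=$1
    rest=${url#*://}            # strip the scheme, e.g. "https://"
    case $rest in
        */*) ;;                 # already has a path component
        *) rest="$rest/" ;;     # bare host: treat as the root directory
    esac
    case $rest in
        */) printf '%s\n' "${rest}index.html" ;;   # directory-style URL
        *)  printf '%s\n' "$rest" ;;               # last segment is the file name
    esac
}

url_to_local_path 'https://releases.hashicorp.com/consul/'
url_to_local_path 'https://releases.hashicorp.com/consul/1.12.0'
url_to_local_path 'https://releases.hashicorp.com/consul/1.12.0/'
```

Under these assumptions, only the URL without a trailing slash maps to a plain file named 1.12.0, which is the name that collides with the existing 1.12.0 directory.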
The mind-blowing fact is that it worked well for me just a couple of weeks 
ago with the same invocation. It would produce index.html not only at the 
root but at the leaves as well. Something has definitely changed on the server, 
but how can I address the issue? As it works currently, I have no way to 
maintain my mirror properly, because without these index.html files I simply 
can’t offer my mirror to my users.


