bug-wget

Working with index.html


From: Great Zverre
Subject: Working with index.html
Date: Mon, 16 May 2022 19:39:24 +0300

Hello guys!

I’m using wget to make a mirror of https://releases.hashicorp.com but I don’t 
want to make a full mirror, rather I’d like to have a mirror of certain 
“subfolders” of this site (e.g. terraform, consul etc.). So I do this using the 
following command:

wget -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/

The problem is that at first I get the following result:

******
$ wget -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/
--2022-05-16 16:28:18--  https://releases.hashicorp.com/consul/
Resolving releases.hashicorp.com (releases.hashicorp.com)... 151.101.193.183, 151.101.129.183, 151.101.65.183, ...
Connecting to releases.hashicorp.com (releases.hashicorp.com)|151.101.193.183|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Connection: keep-alive
  Content-Type: text/html
  ETag: TvHhjlva/+c=
  X-Api-Version: 0.1.2
  X-Request-Id: 8a74122b-c155-88ff-511e-8d0d93155b2e
  X-Amz-Cf-Pop: AMS50-C1
  X-Amz-Cf-Id: Pdzhym0uq3XXjsZ_PxS8xvkntM0IsSCQtakE2EvgwC0v0tYMPJwCzQ==
  Age: 61398
  Access-Control-Allow-Origin: *
  Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
  X-XSS-Protection: 1; mode=block
  X-Content-Type-Options: nosniff
  X-Frame-Options: sameorigin
  Accept-Ranges: bytes
  Date: Mon, 16 May 2022 16:28:18 GMT
  Vary: Origin, Accept-Encoding
  transfer-encoding: chunked
Length: unspecified [text/html]
Saving to: ‘releases.hashicorp.com/consul/index.html’

releases.hashicorp.com/consul/index.html    [ <=>    ]  19.51K  --.-KB/s    in 0s

Last-modified header missing -- time-stamps turned off.
2022-05-16 16:28:18 (45.4 MB/s) - ‘releases.hashicorp.com/consul/index.html’ saved [19979]
******

We can see that whatever is served at https://releases.hashicorp.com/consul/ 
gets saved locally to releases.hashicorp.com/consul/index.html, which is fine 
and exactly what I want. But when it comes to the first href in 
releases.hashicorp.com/consul/index.html, I get the following:
******
--2022-05-16 16:30:21--  https://releases.hashicorp.com/consul/1.12.0
Reusing existing connection to releases.hashicorp.com:443.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Connection: keep-alive
  Content-Type: text/html
  X-Api-Version: 0.1.2
  X-Request-Id: ca8c47f5-2e54-b09a-adde-6e8cf5e92d45
  ETag: 8p+ndCqEoYc=
  X-Amz-Cf-Pop: AMS50-C1
  X-Amz-Cf-Id: qA5XZEv2hZReEYoZD29GRsD_M6u76VLv6g-usgKJAzTCQm_SyWVFRA==
  Age: 27384
  Access-Control-Allow-Origin: *
  Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
  X-XSS-Protection: 1; mode=block
  X-Content-Type-Options: nosniff
  X-Frame-Options: sameorigin
  Accept-Ranges: bytes
  Date: Mon, 16 May 2022 16:30:21 GMT
  Vary: Origin, Accept-Encoding
  transfer-encoding: chunked
Length: unspecified [text/html]
releases.hashicorp.com/consul/1.12.0: Is a directory

Cannot write to ‘releases.hashicorp.com/consul/1.12.0’ (Success).
******
We can see that it tries to save whatever is there at 
https://releases.hashicorp.com/consul/1.12.0 into 
releases.hashicorp.com/consul/1.12.0, not 
releases.hashicorp.com/consul/1.12.0/index.html as I would prefer.
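
To make the naming difference concrete, here is a minimal sketch of the URL-to-path mapping as it appears in the two log excerpts above. The helper local_path_for_url is hypothetical and is not part of wget; it only mimics what the logs show (a URL ending in a slash gets index.html appended, a URL without one is saved under its last path segment). The second half reproduces the "Is a directory" failure in a scratch directory, assuming the 1.12.0 directory already exists locally, as the error message suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): approximates the URL-to-path
# mapping visible in the two log excerpts above.
local_path_for_url() {
  path=${1#*://}                              # drop the scheme, e.g. "https://"
  case "$path" in
    */) printf '%sindex.html\n' "$path" ;;    # trailing slash: index.html is appended
    *)  printf '%s\n' "$path" ;;              # no trailing slash: last segment is the file name
  esac
}

local_path_for_url "https://releases.hashicorp.com/consul/"
local_path_for_url "https://releases.hashicorp.com/consul/1.12.0"

# Reproduce the collision in a scratch directory: once 1.12.0 exists as a
# directory, a regular file of the same name cannot be written.
scratch=$(mktemp -d)
mkdir -p "$scratch/releases.hashicorp.com/consul/1.12.0"
target=$(local_path_for_url "https://releases.hashicorp.com/consul/1.12.0")
if { echo "<html></html>" > "$scratch/$target"; } 2>/dev/null; then
  echo "written"
else
  echo "cannot write: $target is a directory"
fi
rm -rf "$scratch"
```

The first call yields a path ending in index.html, while the second yields a path that collides with the directory of the same name, which matches what I see in the log.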

The mind-blowing fact is that the same invocation worked well for me as 
recently as a couple of weeks ago. It would produce index.html not only at the 
root but at the leaves as well. Something has definitely changed on the server, 
but how can I address the issue? As it works currently, I cannot maintain my 
mirror properly, because without these index.html files I simply can’t offer 
the mirror to my users.

