bug-wget

Re: Working with index.html


From: Great Zverre
Subject: Re: Working with index.html
Date: Mon, 23 May 2022 14:08:23 +0300

Hi!

Thanks for your response!
First of all, I have the following version of wget:
# wget --version
GNU Wget 1.20.3 built on linux-gnu.

To reproduce the issue, could you please run the following commands (it will 
take a couple of minutes):
1. mkdir test
2. cd test
3. mkdir -p releases.hashicorp.com/consul/1.12.0
4. wget -w 10s -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/

I get the following output:
******
--2022-05-23 11:03:18--  https://releases.hashicorp.com/consul/
Resolving releases.hashicorp.com (releases.hashicorp.com)... 151.101.193.183, 151.101.129.183, 151.101.65.183, ...
Connecting to releases.hashicorp.com (releases.hashicorp.com)|151.101.193.183|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘releases.hashicorp.com/consul/index.html’

releases.hashicorp.com/consul/index.html            [ <=>                ]  19.51K  --.-KB/s    in 0s

Last-modified header missing -- time-stamps turned off.
2022-05-23 11:03:18 (66.0 MB/s) - ‘releases.hashicorp.com/consul/index.html’ saved [19979]

Loading robots.txt; please ignore errors.
--2022-05-23 11:03:28--  https://releases.hashicorp.com/robots.txt
Reusing existing connection to releases.hashicorp.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 23 [text/plain]
Saving to: ‘releases.hashicorp.com/robots.txt’

releases.hashicorp.com/robots.txt               100%[=====================================================================================================>]      23  --.-KB/s    in 0s

2022-05-23 11:03:28 (1.53 MB/s) - ‘releases.hashicorp.com/robots.txt’ saved [23/23]

--2022-05-23 11:03:38--  https://releases.hashicorp.com/consul/1.12.0
Reusing existing connection to releases.hashicorp.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
releases.hashicorp.com/consul/1.12.0: Is a directory

Cannot write to ‘releases.hashicorp.com/consul/1.12.0’ (Success).
^C
*******
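
In case it helps, here is a minimal sketch of how I understand the URL-to-path mapping. It does not call wget at all, and url_to_local_path is a hypothetical helper made up for illustration. It assumes that a URL ending in "/" is saved as index.html inside that directory, while a URL without the trailing slash is saved under its last path component:

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): approximate the local path that a
# recursive download would use for a URL. Assumption: query strings and
# special characters are ignored; only the trailing slash is considered.
url_to_local_path() (
  path=${1#*://}            # drop the scheme, keep host and path
  case "$path" in
    */) printf '%s\n' "${path}index.html" ;;  # directory-style URL
    *)  printf '%s\n' "$path" ;;              # treated as a plain file
  esac
)

url_to_local_path "https://releases.hashicorp.com/consul/"
url_to_local_path "https://releases.hashicorp.com/consul/1.12.0"
```

The second path is the one that shows up in the "Is a directory" message: a directory with that name already exists on my side, so it cannot also be written as a regular file.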
What is your output? Thank you!

> On 21 May 2022, at 12:10, Tim Rühsen <tim.ruehsen@gmx.de> wrote:
> 
> Hi,
> 
> I cannot reproduce this issue with wget 1.21.3 or with current wget2.
> 
> Please make sure you use the latest version of wget.
> 
> Regards, Tim
> 
> On 16.05.22 18:39, Great Zverre wrote:
>> Hello guys!
>> I’m using wget to make a mirror of https://releases.hashicorp.com but I 
>> don’t want to make a full mirror, rather I’d like to have a mirror of 
>> certain “subfolders” of this site (e.g. terraform, consul etc.). So I do 
>> this using the following command:
>> wget -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/
>> The problem is that at first I get the following result
>> ******
>> $ wget -N -r -l inf --no-parent  https://releases.hashicorp.com/consul/
>> --2022-05-16 16:28:18--  https://releases.hashicorp.com/consul/
>> Resolving releases.hashicorp.com (releases.hashicorp.com)... 151.101.193.183, 151.101.129.183, 151.101.65.183, ...
>> Connecting to releases.hashicorp.com (releases.hashicorp.com)|151.101.193.183|:443... connected.
>> HTTP request sent, awaiting response...
>>   HTTP/1.1 200 OK
>>   Connection: keep-alive
>>   Content-Type: text/html
>>   ETag: TvHhjlva/+c=
>>   X-Api-Version: 0.1.2
>>   X-Request-Id: 8a74122b-c155-88ff-511e-8d0d93155b2e
>>   X-Amz-Cf-Pop: AMS50-C1
>>   X-Amz-Cf-Id: Pdzhym0uq3XXjsZ_PxS8xvkntM0IsSCQtakE2EvgwC0v0tYMPJwCzQ==
>>   Age: 61398
>>   Access-Control-Allow-Origin: *
>>   Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
>>   X-XSS-Protection: 1; mode=block
>>   X-Content-Type-Options: nosniff
>>   X-Frame-Options: sameorigin
>>   Accept-Ranges: bytes
>>   Date: Mon, 16 May 2022 16:28:18 GMT
>>   Vary: Origin, Accept-Encoding
>>   transfer-encoding: chunked
>> Length: unspecified [text/html]
>> Saving to: ‘releases.hashicorp.com/consul/index.html’
>> releases.hashicorp.com/consul/index.html            [ <=>                ]  19.51K  --.-KB/s    in 0s
>> Last-modified header missing -- time-stamps turned off.
>> 2022-05-16 16:28:18 (45.4 MB/s) - ‘releases.hashicorp.com/consul/index.html’ saved [19979]
>> ******
>> We can see that whatever is there at https://releases.hashicorp.com/consul/ 
>> gets saved to the local releases.hashicorp.com/consul/index.html, which is 
>> fine and exactly what I want. But when it comes to the first href from 
>> releases.hashicorp.com/consul/index.html, I get the following:
>> ******
>> --2022-05-16 16:30:21--  https://releases.hashicorp.com/consul/1.12.0
>> Reusing existing connection to releases.hashicorp.com:443.
>> HTTP request sent, awaiting response...
>>   HTTP/1.1 200 OK
>>   Connection: keep-alive
>>   Content-Type: text/html
>>   X-Api-Version: 0.1.2
>>   X-Request-Id: ca8c47f5-2e54-b09a-adde-6e8cf5e92d45
>>   ETag: 8p+ndCqEoYc=
>>   X-Amz-Cf-Pop: AMS50-C1
>>   X-Amz-Cf-Id: qA5XZEv2hZReEYoZD29GRsD_M6u76VLv6g-usgKJAzTCQm_SyWVFRA==
>>   Age: 27384
>>   Access-Control-Allow-Origin: *
>>   Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
>>   X-XSS-Protection: 1; mode=block
>>   X-Content-Type-Options: nosniff
>>   X-Frame-Options: sameorigin
>>   Accept-Ranges: bytes
>>   Date: Mon, 16 May 2022 16:30:21 GMT
>>   Vary: Origin, Accept-Encoding
>>   transfer-encoding: chunked
>> Length: unspecified [text/html]
>> releases.hashicorp.com/consul/1.12.0: Is a directory
>> Cannot write to ‘releases.hashicorp.com/consul/1.12.0’ (Success).
>> ******
>> We can see that it tries to save whatever is there at 
>> https://releases.hashicorp.com/consul/1.12.0 into 
>> releases.hashicorp.com/consul/1.12.0, not 
>> releases.hashicorp.com/consul/1.12.0/index.html as I would prefer.
>> The mind-blowing fact is that the same invocation worked well for me just a 
>> couple of weeks ago. It produced index.html not only at the root but at the 
>> leaves as well. Something has definitely changed on the server, but how can 
>> I address the issue? As it works now, I cannot maintain my mirror properly, 
>> because without these index.html files I simply cannot offer the mirror to 
>> my users.



