bug-gawk

Re: use of TZ by mktime()/strftime()


From: Ed Morton
Subject: Re: use of TZ by mktime()/strftime()
Date: Wed, 10 Aug 2022 14:03:42 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.1.0

On 8/10/2022 12:48 PM, Neil R. Ormos wrote:
Ed Morton wrote:
Eli Zaretskii wrote:
[arnold@skeeve.com  wrote:]
Ed Morton wrote:
So in the above setting TZ to EST or UTC
worked and specifying IST at the end of the
timestamp worked, but setting TZ to IST
failed just like it does in gawk. Clearly I'm
missing something...
All of this depends on the underlying C
library.  As far as I know there aren't
standardized time zone names that work the
same everywhere.
Actually, there are, at least in most practical
cases.  But they are very few, and you cannot
rely on their DST rules to be up to date with
the current practices; they might on some
systems still reflect the DST rules of many
years ago, or even work according to the rules
of another country.
FWIW I found some information on "standard" time
   zones: [...]

Thanks for the feedback all, looks like gawk
behaves the same as date wrt TZ environment
values so there's no gawk issue.
As you've seen, date(1) is pretty good at recognizing dates[*], including time 
zones, in arguments supplied via the -d option.

I make an external call to date(1), instead of mktime(), when I can't be sure 
that the input is well-behaved.  I'm sure it's more expensive than mktime(), 
but the overhead seems a tolerable price to pay when compared to the 
alternatives of parsing the date string, trying to maintain a table of time 
zone offsets, or explicitly consulting the system's time zone database.

Something like this:

   returncode = ( ( "date -d  " datearg " +%s" ) | getline dateresult )

(Simplified to show the concept.  I have a wrapper function that escapes the 
arguments to date(1), checks the returncode, and calls close().)



[*] At least on systems that have the GNU core utilities date.

Thanks Neil, yeah, I've done the same at times for small input files, but it is an order of magnitude slower than using the builtin time functions, so when I don't NEED to do that I avoid it. In this case the input looks like:

   2020-12-03T12:23:34 UTC
   2020-12-03T12:23:34 Z
   2020-12-03T12:23:34 EST
   2020-12-03T12:23:34 EDT
   2020-12-03T12:23:34 BST
   2020-12-03T12:23:34 IST
   2020-12-03T12:23:34 +00:00
   2020-12-03T12:23:34 -0400
   2020-12-03T12:23:34 -0800
   2020-12-03T12:23:34 +06:00

where the numeric values at the end of the last 4 lines are UTC offsets (as `date` would interpret them) rather than time zone names, so I was hoping this script would be all I needed:

   gawk '{
        dt = gensub(/\s+\S+$/,"",1); gsub(/[-:T]/," ",dt)
        tz = $NF
        if ( match(tz,/^([-+]?)([0-9]{2}):?([0-9]{2})$/,a) ) {
            tz = (a[1] == "-" ? "+" : "-") a[2] ":" a[3]
        }
        ENVIRON["TZ"] = tz

        epochSecs = mktime(dt)

        ENVIRON["TZ"] = "UTC"
        printf "%-30s ->  %10s  ->  %s UTC\n", $0, epochSecs, strftime("%F %T",epochSecs)
   }' file
   2020-12-03T12:23:34 UTC        ->  1606998214  ->  2020-12-03 12:23:34 UTC
   2020-12-03T12:23:34 Z          ->  1606998214  ->  2020-12-03 12:23:34 UTC
   2020-12-03T12:23:34 EST        ->  1607016214  ->  2020-12-03 17:23:34 UTC
   2020-12-03T12:23:34 EDT        ->  1606998214  ->  2020-12-03 12:23:34 UTC
   2020-12-03T12:23:34 BST        ->  1606998214  ->  2020-12-03 12:23:34 UTC
   2020-12-03T12:23:34 IST        ->  1606998214  ->  2020-12-03 12:23:34 UTC
   2020-12-03T12:23:34 +00:00     ->  1606998214  ->  2020-12-03 12:23:34 UTC
   2020-12-03T12:23:34 -0400      ->  1607012614  ->  2020-12-03 16:23:34 UTC
   2020-12-03T12:23:34 -0800      ->  1607027014  ->  2020-12-03 20:23:34 UTC
   2020-12-03T12:23:34 +06:00     ->  1606976614  ->  2020-12-03 06:23:34 UTC

but as you can see from the output above, it doesn't recognize EDT (US Eastern Daylight), BST (British Summer), or IST (Indian Standard) and silently treats each of them as UTC, so I settled on this instead:

   gawk 'BEGIN {
        tzmap["EST"] = "US/Eastern"
        tzmap["EDT"] = "-04:00"
        tzmap["BST"] = "+01:00"
        tzmap["IST"] = "Asia/Calcutta"
   }
   {
        dt = gensub(/\s+\S+$/,"",1); gsub(/[-:T]/," ",dt)
        tz = ( $NF in tzmap ? tzmap[$NF] : $NF )
        if ( match(tz,/^([-+]?)([0-9]{2}):?([0-9]{2})$/,a) ) {
            tz = (a[1] == "-" ? "+" : "-") a[2] ":" a[3]
        }
        ENVIRON["TZ"] = tz

        epochSecs = mktime(dt)

        ENVIRON["TZ"] = "UTC"
        printf "%-30s ->  %10s  ->  %s UTC\n", $0, epochSecs, strftime("%F %T",epochSecs)
   }' file
   2020-12-03T12:23:34 UTC        ->  1606998214  ->  2020-12-03 12:23:34 UTC
   2020-12-03T12:23:34 Z          ->  1606998214  ->  2020-12-03 12:23:34 UTC
   2020-12-03T12:23:34 EST        ->  1607016214  ->  2020-12-03 17:23:34 UTC
   2020-12-03T12:23:34 EDT        ->  1607012614  ->  2020-12-03 16:23:34 UTC
   2020-12-03T12:23:34 BST        ->  1606994614  ->  2020-12-03 11:23:34 UTC
   2020-12-03T12:23:34 IST        ->  1606978414  ->  2020-12-03 06:53:34 UTC
   2020-12-03T12:23:34 +00:00     ->  1606998214  ->  2020-12-03 12:23:34 UTC
   2020-12-03T12:23:34 -0400      ->  1607012614  ->  2020-12-03 16:23:34 UTC
   2020-12-03T12:23:34 -0800      ->  1607027014  ->  2020-12-03 20:23:34 UTC
   2020-12-03T12:23:34 +06:00     ->  1606976614  ->  2020-12-03 06:23:34 UTC

which is fine for my purposes.

Thanks all who responded.

    Ed.

