bug-gnulib
Re: parse_duration()


From: Bruce Korb
Subject: Re: parse_duration()
Date: Tue, 04 Nov 2008 17:18:16 -0800
User-agent: Thunderbird 2.0.0.12 (X11/20071114)

Bruno Haible wrote:

For this context, the right, proper and easiest solution is to say that
months are 30 days and leave it at that.

But it should be documented.

In the new .h file.

It is much more forgiving than the ISO-8601 spec.  It doesn't worry
about counts being lower than the container size.  The result just
has to fit in a time_t value.  So there are lots of ways:

   PyyyymmddThhmmss
   P nnnn Y nn M nn D T nn H nn M nn S
   nn d HH:MM:SS
   SSSS
   MMM:SS

Nice!

Longer description (for those not reading the attachments):

  Readers and users of this function are referred to the ISO-8601
  specification, with particular attention to "Durations".

  At the time of writing, this worked:

  http://en.wikipedia.org/wiki/ISO_8601#Durations

  The string must start with a 'P', 'T' or a digit.

  ==== if it is a digit

  the string may contain:  NNN d NNN h NNN m NNN s
  This represents NNN days, NNN hours, NNN minutes and NNN seconds.
  The embedded white space is optional.
  These terms must appear in this order.
  The final "s" is optional.
  All of the terms ("NNN" plus designator) are optional.
  Minutes and seconds may optionally be represented as NNN:NNN.
  Also, hours, minutes and seconds may be represented as NNN:NNN:NNN.
  There is no limitation on the value of any of the terms, except
  that the final result must fit in a time_t value.

  ==== if it is a 'P' or 'T', please see ISO-8601 for a rigorous definition.

  The 'P' term may be followed by any of three formats:
    yyyymmdd
    yy-mm-dd
    yy Y mm M ww W dd D

  or it may be empty and followed by a 'T'.  The "yyyymmdd" must be eight
  digits long.  Note:  months are always 30 days and years are always 365
  days long.  5 years is always 1825, not 1826 or 1827 depending on leap
  year considerations.  3 months is always 90 days.  There is no consideration
  for how many days are in the current, next or previous months.

  For the final format:
  *  Embedded white space is allowed, but it is optional.
  *  All of the terms are optional.  Any or all-but-one may be omitted.
  *  The meanings are yy years, mm months, ww weeks and dd days.
  *  The terms must appear in this order.

  ==== The 'T' term may be followed by any of these formats:

    hhmmss
    hh:mm:ss
    hh H mm M ss S

  For the final format:
  *  Embedded white space is allowed, but it is optional.
  *  All of the terms are optional.  Any or all-but-one may be omitted.
  *  The terms must appear in this order.
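To make the fixed-length rule concrete, here is a minimal sketch of the arithmetic only. The names SECS_PER_* and period_to_seconds are hypothetical and are not part of the attached code; the sketch assumes small non-negative inputs and does no overflow checking.

```c
#include <assert.h>

/* Fixed unit sizes, in seconds.  Months are always 30 days and years
   always 365 days, as the description states. */
enum {
  SECS_PER_MIN   = 60,
  SECS_PER_HOUR  = 60 * SECS_PER_MIN,
  SECS_PER_DAY   = 24 * SECS_PER_HOUR,
  SECS_PER_WEEK  = 7 * SECS_PER_DAY,
  SECS_PER_MONTH = 30 * SECS_PER_DAY,
  SECS_PER_YEAR  = 365 * SECS_PER_DAY
};

/* Hypothetical helper: convert already-parsed period fields to seconds.
   No overflow checking; small non-negative inputs are assumed. */
static long
period_to_seconds (long years, long months, long weeks, long days)
{
  assert (years >= 0 && months >= 0 && weeks >= 0 && days >= 0);
  return years * SECS_PER_YEAR + months * SECS_PER_MONTH
         + weeks * SECS_PER_WEEK + days * SECS_PER_DAY;
}
```

With these constants, period_to_seconds (5, 0, 0, 0) is 1825 days' worth of seconds and period_to_seconds (0, 3, 0, 0) is 90 days' worth, whatever the calendar says.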

Do you really mean that the result is locale dependent? Namely, if the user
uses U+00A0 (NO-BREAK SPACE) as a separator, he will get a parse success in
ISO-8859-1 locales but a parse failure in UTF-8 locales. - The gnulib module
'c-ctype' contains functions that don't have this problem.

I'm not an internationalization expert.  I'll take a look when I have some time.
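As a rough illustration of the locale question, a locale-independent white-space test could look like the sketch below. ascii_isspace and skip_ascii_space are hypothetical stand-ins written for this example; gnulib's 'c-ctype' module provides c_isspace for the same purpose, and that would be the natural choice in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a locale-independent test such as gnulib's
   c_isspace: only the six ASCII white-space characters match, whatever
   the current locale is. */
static bool
ascii_isspace (char c)
{
  return c == ' ' || c == '\t' || c == '\n'
         || c == '\v' || c == '\f' || c == '\r';
}

/* Skip leading ASCII white space; returns a pointer into the same string. */
static char const *
skip_ascii_space (char const *s)
{
  assert (s != NULL);
  while (ascii_isspace (*s))
    s++;
  return s;
}
```

A byte such as 0xA0 is never treated as a separator here, so the parse result no longer depends on whether the locale is ISO-8859-1 or UTF-8.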

  if (! isdigit((int)*in_pz))  goto bad_time;

Arguments of <ctype.h> functions need to be cast from 'char' to
'unsigned char', not to 'int'. A cast from 'char' to 'int' is a no-op.

Where I currently work, you had to cast to 'int' because of some target platform issues.
(An old Solaris, if I remember correctly.)  So, it's habit now.  Changed.
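For reference, a small sketch of the cast being discussed. count_leading_digits is a hypothetical helper, not a function from the attachment; it only shows the 'unsigned char' idiom in isolation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the leading decimal digits of S.
   The cast to unsigned char matters: where plain char is signed, a byte
   such as 0xE9 is negative, and passing a negative value other than EOF
   to isdigit is undefined behavior.  A cast to int would keep the
   negative value, so it does not help. */
static size_t
count_leading_digits (char const *s)
{
  size_t n = 0;

  assert (s != NULL);
  while (isdigit ((unsigned char) s[n]))
    n++;
  return n;
}
```

The loop stops at the terminating NUL as well, since isdigit (0) is false.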

parse_YMD(char const * pz)
{
  time_t res = 0, val;
  char * ps = strchr(pz, 'Y');

'ps' should be declared as 'char const *', because 'pz' has const.

The input to the main function (parse_duration) is const.  The rest are
now non-const, not because the strings have magically become writable,
but because various functions return non-const pointers and littering
the code with casts is a big nuisance.
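To show where the const question does and does not bite, here is a hedged sketch with a hypothetical helper, offset_of_year_designator. For strchr alone no cast is needed, because its 'char *' result converts implicitly to 'char const *'. The casts become necessary with functions like strtoul, whose end-pointer argument is a 'char **'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the number of characters before the first 'Y' designator in PZ,
   or -1 when there is none.  strchr returns a plain char *, but that
   converts implicitly to char const *, so the scan pointer can stay
   const without a cast. */
static long
offset_of_year_designator (char const *pz)
{
  char const *ps;

  assert (pz != NULL);
  ps = strchr (pz, 'Y');
  return ps == NULL ? -1L : (long) (ps - pz);
}
```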

Cheers - Bruce
/* Parse a time duration and return a seconds count
   Copyright (C) 2008 Free Software Foundation, Inc.
   Written by Bruce Korb <address@hidden>, 2008.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

/*

  Readers and users of this function are referred to the ISO-8601
  specification, with particular attention to "Durations".

  At the time of writing, this worked:

  http://en.wikipedia.org/wiki/ISO_8601#Durations

  The string must start with a 'P', 'T' or a digit.

  ==== if it is a digit

  the string may contain:  NNN d NNN h NNN m NNN s
  This represents NNN days, NNN hours, NNN minutes and NNN seconds.
  The embedded white space is optional.
  These terms must appear in this order.
  The final "s" is optional.
  All of the terms ("NNN" plus designator) are optional.
  Minutes and seconds may optionally be represented as NNN:NNN.
  Also, hours, minutes and seconds may be represented as NNN:NNN:NNN.
  There is no limitation on the value of any of the terms, except
  that the final result must fit in a time_t value.

  ==== if it is a 'P' or 'T', please see ISO-8601 for a rigorous definition.

  The 'P' term may be followed by any of three formats:
    yyyymmdd
    yy-mm-dd
    yy Y mm M ww W dd D

  or it may be empty and followed by a 'T'.  The "yyyymmdd" must be eight
  digits long.  Note:  months are always 30 days and years are always 365
  days long.  5 years is always 1825, not 1826 or 1827 depending on leap
  year considerations.  3 months is always 90 days.  There is no consideration
  for how many days are in the current, next or previous months.

  For the final format:
  *  Embedded white space is allowed, but it is optional.
  *  All of the terms are optional.  Any or all-but-one may be omitted.
  *  The meanings are yy years, mm months, ww weeks and dd days.
  *  The terms must appear in this order.

  ==== The 'T' term may be followed by any of these formats:

    hhmmss
    hh:mm:ss
    hh H mm M ss S

  For the final format:
  *  Embedded white space is allowed, but it is optional.
  *  All of the terms are optional.  Any or all-but-one may be omitted.
  *  The terms must appear in this order.

 */
#ifndef GNULIB_PARSE_DURATION_H
#define GNULIB_PARSE_DURATION_H

#include <time.h>
extern time_t parse_duration(char const * in_pz);

#endif /* GNULIB_PARSE_DURATION_H */
/* Parse a time duration and return a seconds count
   Copyright (C) 2008 Free Software Foundation, Inc.
   Written by Bruce Korb <address@hidden>, 2008.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

#include <config.h>

#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#include "xalloc.h"     /* xstrdup */

#ifndef _
#define _(_s)  _s
#endif

#ifndef NUL
#define NUL '\0'
#endif

typedef enum {
  NOTHING_IS_DONE,
  DAY_IS_DONE,
  HOUR_IS_DONE,
  MINUTE_IS_DONE,
  SECOND_IS_DONE
} whats_done_t;

#define SEC_PER_MIN     60
#define SEC_PER_HR      (SEC_PER_MIN * 60)
#define SEC_PER_DAY     (SEC_PER_HR  * 24)
#define SEC_PER_WEEK    (SEC_PER_DAY * 7)
#define SEC_PER_MONTH   (SEC_PER_DAY * 30)
#define SEC_PER_YEAR    (SEC_PER_DAY * 365)

#define TIME_MAX        0x7FFFFFFF
#define BAD_TIME        ((time_t)~0)

static inline time_t
scale_n_add (time_t base, time_t val, int scale)
{
  if (base == BAD_TIME)
    {
      if (errno == 0)
        errno = EINVAL;
      return BAD_TIME;
    }

  if (val > TIME_MAX / scale)
    {
      errno = ERANGE;
      return BAD_TIME;
    }

  val *= scale;
  if (base > TIME_MAX - val)
    {
      errno = ERANGE;
      return BAD_TIME;
    }

  return base + val;
}

static time_t
parse_hr_min_sec (time_t start, char * pz)
{
  int lpct = 0;

  errno = 0;

  /* For as long as our scanner pointer points to a colon *AND*
     we've looped fewer than two times, keep looping.  This handles
     both MM:SS and HH:MM:SS.  (two iterations max) */
  while ((*pz == ':') && (lpct++ < 2))
    {
      unsigned long v = strtoul (pz+1, &pz, 10);

      if (errno != 0)
        return BAD_TIME;

      start = scale_n_add (v, start, 60);

      if (errno != 0)
        return BAD_TIME;
    }

  /* allow for trailing spaces */
  while (isspace ((unsigned char)*pz))   pz++;
  if (*pz != NUL)
    {
      errno = EINVAL;
      return BAD_TIME;
    }

  return start;
}

static time_t
parse_scaled_value (time_t base, char ** ppz, char * endp, int scale)
{
  char * pz = *ppz;
  time_t val;

  if (base == BAD_TIME)
    return base;

  errno = 0;
  val = strtoul (pz, &pz, 10);
  if (errno != 0)
    return BAD_TIME;
  while (isspace ((unsigned char)*pz))   pz++;
  if (pz != endp)
    {
      errno = EINVAL;
      return BAD_TIME;
    }

  *ppz =  pz;
  return scale_n_add (base, val, scale);
}

static time_t
parse_year_month_day (char * pz, char * ps)
{
  time_t res = 0, val;

  res = parse_scaled_value (0, &pz, ps, SEC_PER_YEAR);

  ps = strchr (++pz, '-');
  if (ps == NULL)
    {
      errno = EINVAL;
      return BAD_TIME;
    }
  res = parse_scaled_value (res, &pz, ps, SEC_PER_MONTH);

  pz++;
  ps = pz + strlen (pz);
  return parse_scaled_value (res, &pz, ps, SEC_PER_DAY);
}

static time_t
parse_yearmonthday (char * in_pz)
{
  time_t res = 0;
  char   buf[8];
  char * pz;

  if (strlen (in_pz) != 8)
    {
      errno = EINVAL;
      return BAD_TIME;
    }

  memcpy (buf, in_pz, 4);
  buf[4] = NUL;
  pz = buf;
  res = parse_scaled_value (0, &pz, buf + 4, SEC_PER_YEAR);

  memcpy (buf, in_pz + 4, 2);
  buf[2] = NUL;
  pz =   buf;
  /* Pass the running total along so the years are not discarded. */
  res = parse_scaled_value (res, &pz, buf + 2, SEC_PER_MONTH);

  memcpy (buf, in_pz + 6, 2);
  buf[2] = NUL;
  pz =   buf;
  return parse_scaled_value (res, &pz, buf + 2, SEC_PER_DAY);
}

static time_t
parse_YMWD (char * pz)
{
  time_t res = 0, val;
  char * ps = strchr (pz, 'Y');
  if (ps != NULL)
    {
      res = parse_scaled_value (0, &pz, ps, SEC_PER_YEAR);
      pz++;
    }

  ps = strchr (pz, 'M');
  if (ps != NULL)
    {
      res = parse_scaled_value (res, &pz, ps, SEC_PER_MONTH);
      pz++;
    }

  ps = strchr (pz, 'W');
  if (ps != NULL)
    {
      res = parse_scaled_value (res, &pz, ps, SEC_PER_WEEK);
      pz++;
    }

  ps = strchr (pz, 'D');
  if (ps != NULL)
    {
      res = parse_scaled_value (res, &pz, ps, SEC_PER_DAY);
      pz++;
    }

  while (isspace ((unsigned char)*pz))   pz++;
  if (*pz != NUL)
    {
      errno = EINVAL;
      return BAD_TIME;
    }

  return res;
}

static time_t
parse_hour_minute_second (char * pz, char * ps)
{
  time_t res = 0, val;

  res = parse_scaled_value (0, &pz, ps, SEC_PER_HR);

  ps = strchr (++pz, ':');
  if (ps == NULL)
    {
      errno = EINVAL;
      return BAD_TIME;
    }
  res = parse_scaled_value (res, &pz, ps, SEC_PER_MIN);

  pz++;
  ps = pz + strlen (pz);
  return parse_scaled_value (res, &pz, ps, 1);
}

static time_t
parse_hourminutesecond (char * in_pz)
{
  time_t res = 0;
  char   buf[4];
  char * pz;

  if (strlen (in_pz) != 6)
    {
      errno = EINVAL;
      return BAD_TIME;
    }

  memcpy (buf, in_pz, 2);
  buf[2] = NUL;
  pz = buf;
  res = parse_scaled_value (0, &pz, buf + 2, SEC_PER_HR);

  memcpy (buf, in_pz + 2, 2);
  buf[2] = NUL;
  pz =   buf;
  /* Pass the running total along so the hours are not discarded. */
  res = parse_scaled_value (res, &pz, buf + 2, SEC_PER_MIN);

  memcpy (buf, in_pz + 4, 2);
  buf[2] = NUL;
  pz =   buf;
  return parse_scaled_value (res, &pz, buf + 2, 1);
}

static time_t
parse_HMS (char * pz)
{
  time_t res = 0, val;
  char * ps = strchr (pz, 'H');
  if (ps != NULL)
    {
      res = parse_scaled_value (0, &pz, ps, SEC_PER_HR);
      pz++;
    }

  ps = strchr (pz, 'M');
  if (ps != NULL)
    {
      res = parse_scaled_value (res, &pz, ps, SEC_PER_MIN);
      pz++;
    }

  ps = strchr (pz, 'S');
  if (ps != NULL)
    {
      res = parse_scaled_value (res, &pz, ps, 1);
      pz++;
    }

  while (isspace ((unsigned char)*pz))   pz++;
  if (*pz != NUL)
    {
      errno = EINVAL;
      return BAD_TIME;
    }

  return res;
}

static time_t
parse_time (char * pz)
{
  char * ps;

  time_t res = 0;
  time_t val;

  /*
   *  Scan for a colon
   */
  ps = strchr (pz, ':');
  if (ps != NULL)
    {
      res = parse_hour_minute_second (pz, ps);
    }

  /*
   *  Try for a 'H', 'M' or 'S' suffix
   */
  else if (ps = strpbrk (pz, "HMS"),
           ps == NULL)
    {
      /* It's an hhmmss format: */
      res = parse_hourminutesecond (pz);
    }

  else
    res = parse_HMS (pz);

  return res;
}

/*
 *  Parse the year/months/days of a time period
 */
static time_t
parse_period (char * in_pz)
{
  char * pz  = xstrdup (in_pz);
  char * pT  = strchr (pz, 'T');
  char * ps;
  void * fptr = pz;

  time_t res = 0;
  time_t val;

  if (pT != NULL)
    *(pT++) = NUL;

  /*
   *  Scan for a hyphen
   */
  ps = strchr (pz, '-');
  if (ps != NULL)
    {
      res = parse_year_month_day (pz, ps);
    }

  /*
   *  Try for a 'Y', 'M', 'W' or 'D' suffix
   */
  else if (ps = strpbrk (pz, "YMWD"),
           ps == NULL)
    {
      /* It's a YYYYMMDD format: */
      res = parse_yearmonthday (pz);
    }

  else
    res = parse_YMWD (pz);

  if ((errno == 0) && (pT != NULL))
    {
      val = parse_time (pT);
      res = scale_n_add (res, val, 1);
    }

  free (fptr);
  return res;
}

static time_t
parse_non_iso8601(char * in_pz)
{
  whats_done_t whatd_we_do = NOTHING_IS_DONE;

  char * pz  = (char *)in_pz;
  time_t res = 0;

  do  {
    time_t val;

    errno = 0;
    val = strtol (pz, &pz, 10);
    if (errno != 0)
      goto bad_time;

    /*  IF we find a colon, then we're going to have a seconds value.
        We will not loop here any more.  We cannot already have parsed
        a minute value and if we've parsed an hour value, then the result
        value has to be less than an hour. */
    if (*pz == ':')
      {
        if (whatd_we_do >= MINUTE_IS_DONE)
          break;

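        /* Keep what was accumulated so far (e.g. days) and add the
           colon-separated part to it. */
        val = parse_hr_min_sec (val, pz);
        if (val == BAD_TIME)
          break;

        if ((whatd_we_do == HOUR_IS_DONE) && (val >= SEC_PER_HR))
          break;

        return scale_n_add (res, val, 1);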
      }

    {
      unsigned int mult;

      /*  Skip over white space following the number we just parsed. */
      while (isspace ((unsigned char)*pz))   pz++;

      switch (*pz)
        {
        default:  goto bad_time;
        case NUL:
          return scale_n_add (res, val, 1);

        case 'd':
          if (whatd_we_do >= DAY_IS_DONE)
            goto bad_time;
          mult = SEC_PER_DAY;
          whatd_we_do = DAY_IS_DONE;
          break;

        case 'h':
          if (whatd_we_do >= HOUR_IS_DONE)
            goto bad_time;
          mult = SEC_PER_HR;
          whatd_we_do = HOUR_IS_DONE;
          break;

        case 'm':
          if (whatd_we_do >= MINUTE_IS_DONE)
            goto bad_time;
          mult = SEC_PER_MIN;
          whatd_we_do = MINUTE_IS_DONE;
          break;

        case 's':
          mult = 1;
          whatd_we_do = SECOND_IS_DONE;
          break;
        }

      res = scale_n_add (res, val, mult);

      while (isspace ((unsigned char)*++pz))   ;
      if (*pz == NUL)
        return res;

      if (! isdigit ((unsigned char)*pz))
        break;
    }

  } while (whatd_we_do < SECOND_IS_DONE);

 bad_time:
  errno = EINVAL;
  return BAD_TIME;
}

time_t
parse_duration (char const * in_pz)
{
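  char * pz = (char *)in_pz;
  time_t res;

  /* Clear any stale errno so the checks below only see our own errors. */
  errno = 0;
  while (isspace ((unsigned char)*pz)) pz++;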

  do {
    if (*pz == 'P')
      {
        res = parse_period (pz + 1);
        if ((errno != 0) || (res == BAD_TIME))
          break;
        return res;
      }

    if (*pz == 'T')
      {
        res = parse_time (pz + 1);
        if ((errno != 0) || (res == BAD_TIME))
          break;
        return res;
      }

    if (! isdigit ((unsigned char)*pz))
      break;

    res = parse_non_iso8601 (pz);
    if ((errno == 0) && (res != BAD_TIME))
      return res;

  } while (0);

  fprintf (stderr, _("Invalid time duration:  %s\n"), pz);
  errno = EINVAL;
  return BAD_TIME;
}

/*
 * Local Variables:
 * mode: C
 * c-file-style: "gnu"
 * indent-tabs-mode: nil
 * End:
 * end of parse-duration.c */
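The overflow guard in scale_n_add can be tried in isolation with a sketch like the following. checked_scale_add is a hypothetical stand-alone variant that uses 'long' and LONG_MAX instead of time_t and TIME_MAX, and it assumes non-negative inputs and a positive scale. It is not a drop-in replacement for the function above.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-alone variant: return BASE + VAL * SCALE, or -1 with
   errno set to ERANGE when the result would exceed LONG_MAX.
   All arguments are assumed non-negative and SCALE positive. */
static long
checked_scale_add (long base, long val, long scale)
{
  assert (base >= 0 && val >= 0 && scale > 0);

  /* val * scale overflows exactly when val > LONG_MAX / scale. */
  if (val > LONG_MAX / scale)
    {
      errno = ERANGE;
      return -1;
    }
  val *= scale;

  /* base + val overflows exactly when base > LONG_MAX - val. */
  if (base > LONG_MAX - val)
    {
      errno = ERANGE;
      return -1;
    }
  return base + val;
}
```

Both tests are done before the arithmetic they protect, so no signed overflow ever actually happens.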
