\" $NetBSD: parsedate.3,v 1.27 2024/05/02 18:34:01 rillig Exp $
\"
\" Copyright (c) 2006 The NetBSD Foundation, Inc.
\" All rights reserved.
\"
\" This code is derived from software contributed to The NetBSD Foundation
\" by Christos Zoulas.
\"
\" Redistribution and use in source and binary forms, with or without
\" modification, are permitted provided that the following conditions
\" are met:
\" 1. Redistributions of source code must retain the above copyright
\" notice, this list of conditions and the following disclaimer.
\" 2. Redistributions in binary form must reproduce the above copyright
\" notice, this list of conditions and the following disclaimer in the
\" documentation and/or other materials provided with the distribution.
\"
\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
\" POSSIBILITY OF SUCH DAMAGE.
\"
Dd May 16, 2021
Dt PARSEDATE 3
Os
Sh NAME
Nm parsedate
Nd date parsing function
Sh LIBRARY
Lb libutil
Sh SYNOPSIS
In util.h
Ft time_t
Fn parsedate "const char *datestr" "const time_t *time" "const int *tzoff"
Sh DESCRIPTION
The
Fn parsedate
function parses a date and time from
Ar datestr
described in English relative to an optional
Ar time
point,
and an optional timezone offset (in minutes behind/west of UTC)
specified in
Ar tzoff .
If
Ar time
is
Dv NULL
then the current time is used.
If
Ar tzoff
is
Dv NULL ,
then the current time zone is used.
Pp
The
Ar datestr
is a sequence of white-space separated items.
The white-space is optional if the concatenated items are not ambiguous.
The string contains data which can specify a base time (used in
conjunction with the
Ar time
parameter, totally replacing that parameter's value if sufficient data
appears in
Ar datestr
to do so), and data specifying an offset from the base time.
Both of those are optional.
If no data specifies the base time, then
Nm
simply uses the value given by
Ar \&*time
Pq "or now" .
If there is no offset data then no offset is applied.
An empty
Ar datestr ,
or a
Ar datestr
containing nothing but whitespace,
is equivalent to midnight at the start of the day specified by
Ar \&*time
Pq "or today" .
Pp
The following words have the indicated numeric meanings:
Dv last =
\-1,
Dv this =
0,
Dv first , next ,
or
Dv one =
1
Pq but see the BUGS section below ,
Dv second
is unused so that it is not confused with
Dq seconds ,
Dv two =
2,
Dv third
or
Dv three =
3,
Dv fourth
or
Dv four =
4,
Dv fifth
or
Dv five =
5,
Dv sixth
or
Dv six =
6,
Dv seventh
or
Dv seven =
7,
Dv eighth
or
Dv eight =
8,
Dv ninth
or
Dv nine =
9,
Dv tenth
or
Dv ten =
10,
Dv eleventh
or
Dv eleven =
11,
Dv twelfth
or
Dv twelve =
12.
Pp
The following words are recognized in English only:
Dv AM ,
Dv PM ,
Dv a.m. ,
Dv p.m. ,
Dv midnight ,
Dv mn ,
Dv noon .
Pp
The months:
Dv january ,
Dv february ,
Dv march ,
Dv april ,
Dv may ,
Dv june ,
Dv july ,
Dv august ,
Dv september ,
Dv october ,
Dv november ,
Dv december ,
and common abbreviations for them.
When a month name (or its ordinal number) is given,
the number of some particular day of that month is required to accompany it.
This is generally true of any data that specifies a period
with a duration longer than a day, so simply specifying a year,
or a month, is invalid, as also is specifying a year and a month.
Pp
The days of the week:
Dv sunday ,
Dv monday ,
Dv tuesday ,
Dv wednesday ,
Dv thursday ,
Dv friday ,
Dv saturday ,
and common abbreviations for them.
Weekday names are typically ignored if any other data
is given to specify the date, even if the name given
is not the day on which the specified date occurred.
Pp
Time units:
Dv year ,
Dv month ,
Dv fortnight ,
Dv week ,
Dv day ,
Dv hour ,
Dv minute ,
Dv min ,
Dv second ,
Dv sec ,
Dv tomorrow ,
Dv yesterday .
Pp
Timezone names:
Dv gmt (+0000) ,
Dv ut (+0000) ,
Dv utc (+0000) ,
Dv wet (+0000) ,
Dv bst (+0100) ,
Dv wat (-0100) ,
Dv at (-0200) ,
Dv nft (-0330) ,
Dv nst (-0330) ,
Dv ndt (-0230) ,
Dv ast (-0400) ,
Dv adt (-0300) ,
Dv est (-0500) ,
Dv edt (-0400) ,
Dv cst (-0600) ,
Dv cdt (-0500) ,
Dv mst (-0700) ,
Dv mdt (-0600) ,
Dv pst (-0800) ,
Dv pdt (-0700) ,
Dv yst (-0900) ,
Dv ydt (-0800) ,
Dv hst (-1000) ,
Dv hdt (-0900) ,
Dv cat (-1000) ,
Dv ahst (-1000) ,
Dv nt (-1100) ,
Dv idlw (-1200) ,
Dv cet (+0100) ,
Dv met (+0100) ,
Dv mewt (+0100) ,
Dv mest (+0200) ,
Dv swt (+0100) ,
Dv sst (+0200) ,
Dv fwt (+0100) ,
Dv fst (+0200) ,
Dv eet (+0200) ,
Dv bt (+0300) ,
Dv it (+0330) ,
\".Dv zp4 (+0400) ,
\".Dv zp5 (+0500) ,
Dv ist (+0530) ,
\".Dv zp6 (+0600) ,
Dv ict (+0700) ,
Dv wast (+0800) ,
Dv wadt (+0900) ,
Dv awst (+0800) ,
Dv awdt (+0900) ,
Dv cct (+0800) ,
Dv sgt (+0800) ,
Dv hkt (+0800) ,
Dv jst (+0900) ,
Dv cast (+0930) ,
Dv cadt (+1030) ,
Dv acst (+0930) ,
Dv acdt (+1030) ,
Dv east (+1000) ,
Dv eadt (+1100) ,
Dv aest (+1000) ,
Dv aedt (+1100) ,
Dv gst (+1000) ,
Dv nzt (+1200) ,
Dv nzst (+1200) ,
Dv nzdt (+1300) ,
Dv idle (+1200) .
Pp
The timezone names simply specify an offset from
Coordinated Universal Time (UTC)
and do not imply validating the time/date to be reasonable in any zone
that happens to use the abbreviation specified.
Pp
A variety of unambiguous dates are recognized:
Bl -tag -compact -width "20 Jun 1994"
It 9/10/69
For years between 69-99 we assume 1900+ and for years between 0-68
we assume 2000+.
It 2006-11-17
An ISO-8601 date.
Note that when using the ISO-8601 format date and time with the
Sq T
designator to separate date and time-of-day,
this must appear at the start of the input string,
with no preceding whitespace.
Other modifiers may optionally follow.
It 67-09-10
The year in an ISO-8601 date is always taken literally,
so this is the year 67, not 2067.
It 10/1/2000
October 1, 2000; the common, but bizarre, US format.
It 20 Jun 1994
It 23jun2001
It 1-sep-06
Other common abbreviations.
It 1/11
The year can be omitted.
A missing year is taken from the
Ar \&*time
value, or
Dq now
if
Ar time
is NULL.
Again, this is the US month/day format (the 11th of January).
El
Pp
Standard e-mail (RFC822, RFC2822, etc)
formats and the output from
Xr date 1 ,
and
Xr asctime 3
are all supported as input,
as is cvs date format (where years < 100 are treated as
20th century).
Pp
Times can also be specified in common forms:
Bl -tag -compact -width 12:11:01.000012
It 10:01
It 10:12pm
It 12:11:01.000012
It 12:21-0500
El
Fractions of seconds (after a decimal point, or comma) are parsed, but ignored.
If no time is given, midnight on the specified date is assumed.
If a time is given without a date, that time on the day
specified by
Ar \&*time
Pq or now
is used.
Missing minutes, or seconds, are taken to be zero.
Pp
A variety of forms for relative items to specify
an offset from the base time are also supported:
Bl -tag -compact -width "this thursday"
It -1 month
It last friday
It one week ago
It this thursday
It next sunday
It +2 years
El
Pp
Note that, as a special case for
Dv midnight
with the name of a day only,
Dq "midnight tuesday"
implies 00:00 at the beginning of Tuesday,
Pq "the midnight before Tuesday"
whereas
Dq "Sat mn"
implies 00:00 at the end of Saturday
Pq "midnight after Saturday"
Pq "i.e. early Sunday morning" .
Pp
Seconds since epoch, UTC, (also known as UNIX time) are also supported
to specify the base time:
Bl -tag -compact -width "E.g.:\ @735275209\ \ \ \ "
It "E.g.: @735275209"
to specify: Tue Apr 20 03:06:49 UTC 1993
El
provided that the value given is within the range
that can be represented as a
Va "struct tm" .
Negative values
(times before the epoch)
are permitted, but no other significant data as part of
the base time \(en the value given specifies year, month,
day, hour, minute, and second, there is no more.
An offset from this base time may still be included.
Thus
Dq "@735275209 +2 months 5 hours 15 minutes"
produces a time_t which represents
Dq "Sun Jun 20 08:21:49 UTC 1993" .
Pp
Text in
Ar datestr
enclosed in parentheses
Ql \&(
and
Ql \&)
is treated as a comment, and ignored.
Parentheses nest (the comment ends when there have
been the same number of closing parentheses as there
were opening parentheses.)
There is no escape character in comments,
Ql \&)
always ends
(or decreases the nesting level of)
the comment.
Sh RETURN VALUES
Fn parsedate
returns the number of seconds passed since,
or before (if negative,)
the Epoch, or
Dv \-1
if the date could not be parsed properly.
A non-error result of
Dv \-1
can be distinguished from an error by setting
Va errno
to
Dv 0
before calling
Fn parsedate ,
and checking the value of
Va errno
afterwards.
Sh ENVIRONMENT
If the
Ar tzoff
parameter is given as
Dv NULL ,
then:
Bl -tag -width iTZ
It Ev TZ
The timezone to which the input is relative,
when no zone information is otherwise specified in the
Ar datestr
input.
El
Sh SEE ALSO
Xr date 1 ,
Xr touch 1 ,
Xr errno 2 ,
Xr ctime 3 ,
\" WTF ???? eeprom(8)!! Why? Just because it calls this function? Weird!
Xr eeprom 8
Sh HISTORY
The parser used in
Fn parsedate
was originally written by Steven M. Bellovin while at the University
of North Carolina at Chapel Hill.
It was later tweaked by a couple of people on Usenet.
Completely overhauled by Rich $alz and Jim Berets in August, 1990.
Further mangled during its residence with
Nx .
Pp
The
Fn parsedate
function first appeared in
Nx 4.0 .
Sh BUGS
Bl -tag -compact -width 1
It 1
The
Fn parsedate
function is not re-entrant or thread-safe.
It 2
The
Fn parsedate
function assumes years less than 0 mean \(mi
Fa year ,
and in non ISO formats,
that years less than 69 mean 2000 +
Fa year ,
otherwise
years less than 100 mean 1900 +
Fa year .
That is except in the CVS format, where years less than 100
mean 1900 +
Fa year .
It 3
The
Fn parsedate
function accepts
Dq "12 am"
where
Dq "12 midnight"
is correct, and similarly
Dq "12 pm"
for
Dq "12 noon" .
The correct forms are also accepted.
It 4
There are various weird cases that are hard to explain,
but are nevertheless considered correct.
It 5
It is very hard to specify years BC,
and in any case,
conversions of times before the
commencement of the modern Gregorian calendar
(when that occurred depends upon location,
but late 16th century is a rough guide)
are suspicious at best,
and depending upon context,
often just plain wrong.
It 6
Despite what is stated above,
Dq next
is actually 2.
The input
Dq "next January" ,
instead of producing a timestamp for January of the
following year, produces one for January 2nd, of the
current year.
Use caution with
Dq next
it rarely does what humans expect.
For example, on a Sunday
Dq "next sunday"
means the following Sunday (7 days hence)
whereas
Dq "next monday"
means the monday that follows that (8 days hence)
rather than
Dq tomorrow
or just
Dq Mon
Pq without the Dq next
which is the nearest subsequent Monday.
El