Olson database


The tz database, also called the zoneinfo database or IANA Time Zone Database, is a collaborative compilation of information about the world's time zones, primarily intended for use with computer programs and operating systems.[2] It is sometimes called the Olson database, referring to the founding contributor, Arthur David Olson.[3] Paul Eggert is currently its editor and maintainer.[4]

Its most recognizable feature is the uniform naming convention, designed by Paul Eggert, for time zones, such as America/New_York and Europe/Paris (see List of tz database time zones).[5] The database attempts to record historical time zones and all civil changes since 1970, the Unix time epoch.[6] It also includes transitions such as daylight saving time, and even records leap seconds.[7]

History

The project's origins go back to 1986 or earlier.[8] The database, as well as some reference source code, is in the public domain.[9] New editions of the database and code are published as changes warrant, usually several times per year.[10]

2011 lawsuit

On September 30, 2011, a lawsuit, Astrolabe, Inc. v. Olson et al.,[11][12] was filed concerning copyright in the database. As a result, on October 6, 2011, the database's maintenance (mailing list) and dissemination (FTP site) operations were shut down.[13] The case revolved around the database maintainers' use of The American Atlas, by Thomas G. Shanks, and The International Atlas, by Thomas G. Shanks and Rique Pottenger. It specifically complained of unauthorised reproduction of atlas data in the timezone mailing list archive and in some auxiliary link collections maintained with the database, but it did not point at the database itself. The complaint related only to the compilation of historical timezone data, and did not cover current tzdata world timezone tables.[12][14][15] The tz database clearly references its sources, including the atlas, in comments, allowing the extent of use of the data to be evaluated.[16][17]

This lawsuit was resolved on February 22, 2012, when Astrolabe voluntarily moved to dismiss the lawsuit without having ever served the defendants and agreed to a covenant not to sue in the future.[18]

Move to ICANN

ICANN took responsibility for the maintenance of the database on October 14, 2011.[19] The full database and a description of current and future plans for its maintenance are available online from IANA.[20]

Data structure

File formats

The tz database is published as a set of text files which list the rules and zone transitions in a human-readable format. For use, these text files are compiled into a set of platform-independent binary files—one per time zone. The reference source code includes such a compiler called zic (zone information compiler), as well as code to read those files and use them in standard APIs such as localtime() and mktime().
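As a rough illustration of the compiled format, the sketch below decodes the fixed-size header at the start of a compiled file. It assumes the layout given in the tzfile manual page (a 4-byte magic "TZif", one version byte, 15 reserved bytes, then six big-endian 32-bit counts). The function name read_tzif_header and the dictionary keys are choices made for this example and are not part of the reference code.

```python
import struct

# Header layout assumed from tzfile(5): magic, version, 15 reserved bytes,
# then six big-endian unsigned 32-bit counts.
_HEADER = struct.Struct(">4sc15x6L")
_COUNT_NAMES = ("isutcnt", "isstdcnt", "leapcnt", "timecnt", "typecnt", "charcnt")


def read_tzif_header(data: bytes) -> dict:
    """Decode the leading header of a compiled time zone file."""
    if len(data) < _HEADER.size:
        raise ValueError("too short to be a compiled tz file")
    magic, version, *counts = _HEADER.unpack_from(data)
    if magic != b"TZif":
        raise ValueError("missing TZif magic")
    header = dict(zip(_COUNT_NAMES, counts))
    # The version byte is NUL for version 1, otherwise an ASCII digit.
    header["version"] = 1 if version == b"\x00" else int(version.decode("ascii"))
    return header
```

On a system that ships compiled zone files, the function could be applied to the bytes of a file such as /usr/share/zoneinfo/America/New_York, assuming that file exists there.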

Definition of a time zone

Within the tz database, a time zone is any national region where local clocks have all agreed since 1970.[21] This definition concerns itself first with geographic areas which have had consistent local clocks. This is different from other definitions which concern themselves with consistent offsets from a prime meridian. Therefore, each of the time zones defined by the tz database may document multiple offsets from UTC, typically including both standard time and daylight saving time.

Each time zone has one or more "zone lines" in one of the time zone text files. The first zone line for a time zone gives the name of the time zone; any subsequent zone lines for that time zone leave the name blank, indicating that they apply to the same zone as the previous line. Each zone line specifies, for a range of dates and times, the offset to UTC for standard time, the name of the set of rules that govern daylight saving time (or a hyphen if standard time always applies), the format for time zone abbreviations, and, for all but the last zone line, the date and time at which the range governed by that line ends.
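The layout of zone lines can be sketched with a small parser. This is a simplified illustration rather than the logic used by zic: it assumes whitespace-separated fields, treats any indented line that follows a "Zone" line as a continuation, and keeps the end of the range as an unparsed string. The names ZoneLine and parse_zone_lines are invented for this example.

```python
from typing import NamedTuple, Optional


class ZoneLine(NamedTuple):
    name: str             # zone name, carried over on continuation lines
    gmtoff: str           # offset to UTC for standard time
    rules: str            # rule set name, or "-" if standard time always applies
    fmt: str              # format for time zone abbreviations
    until: Optional[str]  # end of the range, or None on the last line


def parse_zone_lines(text: str) -> list:
    """Parse zone lines, filling in the name on continuation lines."""
    result = []
    current_name = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        fields = line.split()
        if fields[0] == "Zone":
            current_name = fields[1]
            fields = fields[2:]
        elif current_name is None or not raw[0].isspace():
            current_name = None  # some other kind of line ends the zone
            continue
        gmtoff, rules, fmt, *until = fields
        result.append(ZoneLine(current_name, gmtoff, rules, fmt,
                               " ".join(until) or None))
    return result


SAMPLE = """\
Zone America/New_York   -4:56:02 -      LMT     1883 Nov 18 12:03:58
                        -5:00   US      E%sT    1920
                        -5:00   NYC     E%sT    1942
                        -5:00   US      E%sT    1946
                        -5:00   NYC     E%sT    1967
                        -5:00   US      E%sT
"""

for zone_line in parse_zone_lines(SAMPLE):
    print(zone_line)
```

Every record produced from the sample carries the name America/New_York, and only the last one has no end date.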

Daylight Saving Time (DST) rules

The rules for daylight saving time are specified in named rule sets. Each rule set has one or more rule lines in the time zone text files. A rule line contains, in order:

  • the name of the rule set to which it belongs;
  • the first year in which the rule applies;
  • the last year in which the rule applies (or "only" if it applies only in one year, or "max" if it is the rule currently in effect);
  • the type of year to which the rule applies ("-" if it applies to all years in the specified range, which is almost always the case, otherwise a name used as an argument to a script that indicates whether the year is of the specified type);
  • the month in which the rule takes effect;
  • the day on which the rule takes effect (which could either be a specific day or a specification such as "the last Sunday of the month");
  • the time of day at which the rule takes effect;
  • the amount of time to add to the offset to UTC when the rule is in effect;
  • the letter or letters to use in the time zone abbreviation (for example, "S" if the rule governs standard time and "D" if it governs daylight saving time).
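The fields of a rule line, and the way a day specification turns into a calendar date, can be sketched as follows. The code handles only three forms of the day field: a plain day number, "lastSun"-style names, and "Sun>=8"-style names (the first such weekday on or after the given day). Other forms are not covered, and the function and constant names are assumptions made for this example.

```python
import calendar
from datetime import date, timedelta

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # date.weekday() order
FIELDS = ("name", "from_year", "to_year", "type", "month", "on",
          "at", "save", "letters")


def parse_rule_line(line: str) -> dict:
    """Split one rule line into named fields, ignoring a trailing comment."""
    parts = line.split("#", 1)[0].split()
    if len(parts) != 10 or parts[0] != "Rule":
        raise ValueError("not a rule line: %r" % line)
    return dict(zip(FIELDS, parts[1:]))


def resolve_day(year: int, month: str, on: str) -> date:
    """Turn a day field such as "14", "lastSun" or "Sun>=8" into a date."""
    m = MONTHS.index(month) + 1
    if on.isdigit():
        return date(year, m, int(on))
    if on.startswith("last"):
        target = WEEKDAYS.index(on[4:])
        last_day = date(year, m, calendar.monthrange(year, m)[1])
        back = (last_day.weekday() - target) % 7
        return last_day - timedelta(days=back)
    name, first = on.split(">=")
    start = date(year, m, int(first))
    forward = (WEEKDAYS.index(name) - start.weekday()) % 7
    return start + timedelta(days=forward)


rule = parse_rule_line(
    "Rule    US      2007    max     -       Mar     Sun>=8  2:00    1:00    D")
print(rule["name"], resolve_day(2011, rule["month"], rule["on"]))
```

For the sample line, the day field resolves to 13 March in 2011, the second Sunday of that month.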

Names of time zones

The time zones have unique names in the form "Area/Location", e.g. "America/New_York", in an attempt to make them easier to understand by the layperson. A choice was also made to use English names or equivalents, and to omit punctuation and common suffixes. The underscore character is used in place of spaces. Hyphens are used where they appear in the name of a location.

Area

Area is the name of a continent, an ocean, or "Etc". The continents and oceans currently include: Africa, America, Antarctica, Arctic, Asia, Atlantic, Australia, Europe, Indian, and Pacific.

The special area of "Etc" is used for some administrative zones, particularly for "Etc/UTC", which represents Coordinated Universal Time. In order to conform with the POSIX style, those zone names beginning with "Etc/GMT" have their sign reversed from what most people expect. In this style, zones west of GMT have a positive sign and those east have a negative sign in their name (e.g. "Etc/GMT-14" is 14 hours ahead/east of GMT).
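The reversed sign is easy to get wrong, so a short sketch may help. The helper below, whose name etc_gmt_offset is an assumption of this example, converts an "Etc/GMT+N" or "Etc/GMT-N" name into an offset from UTC in the usual east-positive sense.

```python
import re
from datetime import timedelta

_ETC_GMT = re.compile(r"Etc/GMT([+-])(\d{1,2})")


def etc_gmt_offset(zone_name: str) -> timedelta:
    """Return the offset from UTC (east positive) for an Etc/GMT+N or -N name."""
    match = _ETC_GMT.fullmatch(zone_name)
    if match is None:
        raise ValueError("not an Etc/GMT offset zone: %r" % zone_name)
    sign, hours = match.groups()
    # POSIX style: "+" in the name means west of GMT, so flip the sign.
    east_hours = -int(hours) if sign == "+" else int(hours)
    return timedelta(hours=east_hours)


print(etc_gmt_offset("Etc/GMT-14"))  # 14 hours east of GMT
print(etc_gmt_offset("Etc/GMT+5"))   # 5 hours west of GMT
```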

Location

Location is the name of a specific location within the area – usually a city or small island.

Country names are not used in this scheme, primarily because they would not be robust due to frequent political and boundary changes. The names of large cities tend to be more permanent. However, the database maintainers attempt to include at least one zone for every ISO 3166-1 alpha-2 country code, and a number of user interfaces to the database take advantage of this. Additionally there is a desire to keep locations geographically compact so that any future time zone changes do not split locations into different time zones.

Usually the most populous city in a region is chosen to represent the entire time zone, although other cities may be selected if they are more widely known or result in a less ambiguous name. In the event that the name of a city changes, the convention is to create an alias in future editions so that both the old and new names refer to the same database entry.

In some cases the Location is itself represented as a compound name, for example the time zone "America/Indiana/Indianapolis". The only three-level names are currently those under "America/Argentina/...", "America/Kentucky/...", "America/Indiana/...", and "America/North_Dakota/...".

The location selected is representative for the entire area.

On 2010-05-01 Arthur David Olson mentioned a 14-character limit[22] to justify dropping "de" from the name of Bahia de Banderas and using only "Bahia_Banderas" in the identifier America/Bahia_Banderas.

Examples

  • America/Costa_Rica: name of country used, because the name of the largest city (and capital city), San José, is ambiguous
  • America/New_York: space replaced with underscore
  • Asia/Kolkata: name of city of Kolkata used, because it was the most populous city in the zone at the time the zone was set up, even though this is no longer true[23]
  • Asia/Sakhalin: name of island used, because the largest city, Yuzhno-Sakhalinsk, has more than 14 characters
  • America/Bahia_Banderas: name of largest city altered, "de" removed from Bahia de Banderas, because the correct name has more than 14 characters
  • Antarctica/DumontDUrville: the apostrophe is removed; the space is also removed rather than replaced with "_" as the rule requires, because with "_" the name would have 15 characters

Example zone and rule lines

These are rule lines for the standard United States daylight saving time rules, rule lines for the daylight saving time rules in effect in the US Eastern Time Zone (called "NYC" as New York City is the city representing that zone) in some years, and zone lines for the America/New_York time zone, as of release version tzdata2011n of the time zone database. The zone and rule lines reflect the history of DST in the United States.

# Rule  NAME    FROM    TO      TYPE    IN      ON      AT      SAVE    LETTER/S
Rule    US      1918    1919    -       Mar     lastSun 2:00    1:00    D
Rule    US      1918    1919    -       Oct     lastSun 2:00    0       S
Rule    US      1942    only    -       Feb     9       2:00    1:00    W # War
Rule    US      1945    only    -       Aug     14      23:00u  1:00    P # Peace
Rule    US      1945    only    -       Sep     30      2:00    0       S
Rule    US      1967    2006    -       Oct     lastSun 2:00    0       S
Rule    US      1967    1973    -       Apr     lastSun 2:00    1:00    D
Rule    US      1974    only    -       Jan     6       2:00    1:00    D
Rule    US      1975    only    -       Feb     23      2:00    1:00    D
Rule    US      1976    1986    -       Apr     lastSun 2:00    1:00    D
Rule    US      1987    2006    -       Apr     Sun>=1  2:00    1:00    D
Rule    US      2007    max     -       Mar     Sun>=8  2:00    1:00    D
Rule    US      2007    max     -       Nov     Sun>=1  2:00    0       S
....
# Rule  NAME    FROM    TO      TYPE    IN      ON      AT      SAVE    LETTER
Rule    NYC     1920    only    -       Mar     lastSun 2:00    1:00    D
Rule    NYC     1920    only    -       Oct     lastSun 2:00    0       S
Rule    NYC     1921    1966    -       Apr     lastSun 2:00    1:00    D
Rule    NYC     1921    1954    -       Sep     lastSun 2:00    0       S
Rule    NYC     1955    1966    -       Oct     lastSun 2:00    0       S
# Zone  NAME            GMTOFF  RULES   FORMAT  [UNTIL]
Zone America/New_York   -4:56:02 -      LMT     1883 Nov 18 12:03:58
                        -5:00   US      E%sT    1920
                        -5:00   NYC     E%sT    1942
                        -5:00   US      E%sT    1946
                        -5:00   NYC     E%sT    1967
                        -5:00   US      E%sT
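To show how the zone lines and rule lines above fit together, the sketch below picks the rule set that governs a given year and builds an abbreviation by substituting a rule's LETTER/S value for "%s" in the FORMAT field. It is deliberately coarse: it keeps only the year of each UNTIL value, so a year in which a zone line ends is assigned wholly to the following line. The names NEW_YORK, governing_line and abbreviation are invented for this example.

```python
# (rules, format, until_year) taken from the America/New_York zone lines above;
# None marks the final, open-ended line. Only the year of UNTIL is kept here.
NEW_YORK = [
    ("-",   "LMT",  1883),
    ("US",  "E%sT", 1920),
    ("NYC", "E%sT", 1942),
    ("US",  "E%sT", 1946),
    ("NYC", "E%sT", 1967),
    ("US",  "E%sT", None),
]


def governing_line(zone_lines, year):
    """Return (rules, format) of the first line whose UNTIL year is after year."""
    for rules, fmt, until in zone_lines:
        if until is None or year < until:
            return rules, fmt
    raise ValueError("no zone line covers year %d" % year)


def abbreviation(fmt, letters):
    """Substitute a rule's LETTER/S value into the zone's FORMAT field."""
    return fmt.replace("%s", letters)


rules, fmt = governing_line(NEW_YORK, 2011)
print(rules, abbreviation(fmt, "S"), abbreviation(fmt, "D"))
```

With the "US" rule set in force in 2011, the letters "S" and "D" give the abbreviations EST and EDT; the 1942 "W" rule would give EWT in the same way.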

Data stored for each zone

For each time zone that has multiple offsets (usually due to daylight saving time), the tz database records the exact moment of transition. The format can accommodate changes in the dates and times of transitions as well. Zones may have historical rule changes going back many decades (as shown in the example above).

Zone.tab

The file zone.tab is in the public domain and lists the zones. Columns and row sorting are described in the comments of the file, as follows:

# This file contains a table with the following columns:
# 1.  ISO 3166 2-character country code.  See the file `iso3166.tab'.
# 2.  Latitude and longitude of the zone's principal location
#     in ISO 6709 sign-degrees-minutes-seconds format,
#     either +-DDMM+-DDDMM or +-DDMMSS+-DDDMMSS,
#     first latitude (+ is north), then longitude (+ is east).
# 3.  Zone name used in value of TZ environment variable.
# 4.  Comments; present if and only if the country has multiple rows.
#
# Columns are separated by a single tab.
# The table is sorted first by country, then an order within the country that
# (1) makes some geographical sense, and
# (2) puts the most populous zones first, where that does not contradict (1).
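A row in this layout can be read with a few lines of code. The sketch below splits a row on tabs and converts the coordinate column into decimal degrees; the sample row is only an illustration of the format, and the function name and dictionary keys are assumptions of this example.

```python
import re

# Latitude is +-DDMM or +-DDMMSS, longitude is +-DDDMM or +-DDDMMSS.
_COORD = re.compile(r"([+-])(\d{2})(\d{2})(\d{2})?([+-])(\d{3})(\d{2})(\d{2})?")


def _to_degrees(sign, degrees, minutes, seconds):
    value = int(degrees) + int(minutes) / 60 + int(seconds or 0) / 3600
    return -value if sign == "-" else value


def parse_zone_tab_line(line: str) -> dict:
    """Parse one data row of zone.tab into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    code, coords, name = fields[:3]
    match = _COORD.fullmatch(coords)
    if match is None:
        raise ValueError("bad coordinates: %r" % coords)
    groups = match.groups()
    return {
        "country": code,
        "latitude": _to_degrees(*groups[:4]),    # + is north
        "longitude": _to_degrees(*groups[4:]),   # + is east
        "zone": name,
        "comment": fields[3] if len(fields) > 3 else None,
    }


row = parse_zone_tab_line("US\t+404251-0740023\tAmerica/New_York\tEastern Time")
print(row["zone"], round(row["latitude"], 4), round(row["longitude"], 4))
```

Comment lines beginning with "#" would need to be skipped before calling the function.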

Data before 1970

Data before 1970 aims to be correct for the city identifying the region, but is not necessarily correct for the entire region. This is because new regions are created only as required to distinguish clocks since 1970.

For example, between 1963-10-23 and 1963-12-09 in Brazil only the states of Minas Gerais, Espirito Santo, Rio de Janeiro, and São Paulo had summer time. However, a requested split from America/Sao_Paulo was rejected in 2010 with the reasoning that, since 1970, the clocks were the same in the whole region.[24]

Time in Germany, which is represented by Europe/Berlin, is not correct for the year 1945 when the Trizone used different daylight saving time rules than Berlin.

Coverage

Zones covering multiple post-1970 countries

There are two zones that cover an area that was covered by two countries after 1970. The database follows the definitions of countries as per ISO 3166-1, whose predecessor, ISO 3166, was first published in 1974.

Maintenance

The tz reference code and database are maintained by a group of volunteers. Arthur David Olson makes most of the changes to the code, and Paul Eggert to the database. Proposed changes are sent to the tz mailing list, which is gatewayed to the comp.time.tz Usenet newsgroup.

Unix-like systems

The standard path for the timezone database is /usr/share/zoneinfo/ on most Unix-like systems, including Linux distributions.

Usage and extensions

Boundaries of time zones

Geographical boundaries in the form of coordinate sets are not part of the tz database, but boundaries are published by Eric Muller[1] in the form of vector polygons. Using these vector polygons, one can determine, for each place on the globe, the tz database zone in which it is located.
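One common way to perform such a lookup is a ray-casting point-in-polygon test, sketched below. This is not taken from the published boundary data or its tooling: the two boxes are hypothetical and extremely coarse, coordinates are treated as a flat plane, and polygons that cross the 180° meridian are not handled.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon's vertex list?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a ray running east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_zone(lon, lat, boundaries):
    """Return the first zone whose polygon contains the point, else None."""
    for name, polygon in boundaries.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical, very coarse boxes; real boundary data has far more vertices.
BOUNDARIES = {
    "Europe/Paris": [(-5.0, 42.0), (8.0, 42.0), (8.0, 51.0), (-5.0, 51.0)],
    "America/New_York": [(-85.0, 25.0), (-67.0, 25.0), (-67.0, 47.0), (-85.0, 47.0)],
}

print(find_zone(2.35, 48.85, BOUNDARIES))
```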

Use in other standards

The Unicode Common Locale Data Repository (CLDR) uses UN/LOCODEs to identify regions.[25] This means that every identifier references a country, something that the creators of the tz database wanted to avoid.

Use in software systems

The tz database is used for time zone processing and conversions in many computer software systems.

The Olson timezone IDs are also used by the Unicode Common Locale Data Repository (CLDR) and International Components for Unicode (ICU). For example, the CLDR Windows–Tzid table maps Microsoft Windows time zone IDs to the standard Olson names.[32]

External links

General

  • ITU LEGAL TIME 2013
  • The tz database home page (deprecated, see Official IANA sources below)
  • The tz mailing list archive
  • "tz mailing list"; archives of these messages are available at ftp://elsie.nci.nih.gov/pub/tzarchive.gz.
  • tz mailing list at ICANN
  • Jon Udell

Official IANA sources

  • Home page
  • FTP
  • rsync://rsync.iana.org/tz

Man pages

  • Manual (gives the syntax of source files for the tz database)
  • Manual (gives the format of compiled tz database files)
