Skip to content

UN/LOCODE dataset, but with more and actually reliable coordinates

Notifications You must be signed in to change notification settings

cristan/improved-un-locodes

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

An enhanced UN/LOCODE dataset with significant improvements:

Significantly better coordinates

The main reason this project exists: coordinates in the original UN/LOCODE list have major problems:

1. Only 80% of locations have coordinates

This doesn't just include tiny villages, but world's most important cities like London (GBLON), Madrid (ESMAD), Luxembourg (LULUX) and Milano (ITMIL).

2. Many coordinates are just wrong

Quite a few coordinates have typos (ATWIS), but many are just flat out wrong (EGSCN)

This project aims to solve most of these cases by combining the data with data from OpenStreetMap's Nominatim API and Wikidata.

3. Multiple coordinate formats

Most UN/LOCODES coordinates look like USNYC: 4042N 07400W. However, entries in Bhutan like BTPDL have decimal coordinates: 26.8128N 89.1903E. This project solves this with 2 columns: the Coordinates column now has only the UN/LOCODE style degrees, while the CoordinatesDecimal column has a decimal representation.

CSV with improved locations

This is all solved with code-list-improved.csv. It has both corrected coordinates, as well as just way more of them (98.4%).

A defined hierarchy

Another issue is hierarchy. For example: CNSHZ (Shanghai Hongqiao International Apt), is in Shanghai (CNSGH), but how would you know these are essentially the same place? Ideally, you'd want to know the Airport is in Shanghai.

For this, parents.csv is created, which looks like this:

Unlocode,Parent
CNSHZ,CNSGH
CNPDG,CNSGH

With this, you can easily find out these are all related.

Actually working aliases

It's impossible to find out that both "Vienna" and "Wien" are in fact the same city with UN/LOCODE ATVIE. That is, if you use the offical dataset.

Not so much with aliases-improved.csv, which looks like this:

Unlocode,Alias
ATVIE,Wien
ATVIE,Vienna

This is much more usable than the aliases in the original. Not only because of the improved user-friendlyness, but mostly because of its sheer size. The official dataset has less than 100 aliases, this one has over 575.000.

About UN/LOCODES

The United Nations Code for Trade and Transport Locations is a code list mantained by UNECE (a United Nations agency) to facilitate trade. The list is comes from the UNECE page, released twice a year. However, this dataset is based on datasets/un-locode, which is already much better than the original (e.g. no more encoding problems).

License

UN/LOCODE data

All unlocode data is licensed under the ODC Public Domain Dedication and Licence (PDDL).

Nominatim data

ODbL 1.0. http://osm.org/copyright

Wikidata

CC-0 (No rights reserved)

All other contents in this repo

CC-0 (No rights reserved)

About

UN/LOCODE dataset, but with more and actually reliable coordinates

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • JavaScript 85.3%
  • Python 11.8%
  • Shell 2.6%
  • Makefile 0.3%