You are here

Extract locations from GeoNames database

This enhancer plugin extracts locations in GeoNames databases from documents, making this locations available to exploratory search, aggregated overviews and einteractive filters.

Data protection and privacy: No cloud or external webservice used

This plugin won't call an external webservice, so wether the analyzed texts nor the extracted locations will be sent to cloud or external web services.

Installation and config

Install GeoNames plugin

Download and install the package opensemanticsearch-enhancer-geonames

Download GeoNames database parts

Download the interesting data (f.e. all names of cities with more than 1000 people or the data of your country) from Geonames data dumps.

Extract the ZIP archive(s) to the directory /etc/opensemanticsearch/geonames/

Config plugin

Config the databases you want to use for the whole system in /etc/opensemanticsearch/etl or for a special connector (i.e. in /etc/opensemanticsearch/connector-files):


config['geonames'] = [
'/etc/opensemanticsearch/geonames/cities1000.txt'
]

You can configure more than one geonames database (which will need more ressources and cause more fase positives, because more Locations have the same name like an common word in your language):


config['geonames'] = [
'/etc/opensemanticsearch/geonames/DE.txt',
'/etc/opensemanticsearch/geonames/AT.txt',
'/etc/opensemanticsearch/geonames/CH.txt'
]

Enable plugin

Enable the enhancer plugin in the same connector config:

config['plugins'].append('enhance_geonames')

Hint: Check for as few locations as possible

Since there are many places in the world that have the same name like popular words of your language, you should check only for the interesting geoname data or parts of it.

For example often its enough to check only the most important town names of your country and adding smaller local places for you very interesting custom locations to the named entities manager. Or you create custom GeoNames data files by filtering very granular of geonames database parts, i.e. for all locations within a radius to your main location.