Import Wikidata dictionary (lexemes) to Solr synonyms config
Open Source tool to import Wikidata lexemes to Solr synonyms config
The free Open Source tool wikidata-lexemes-to-solr-synonyms imports lexemes (dictionary including different grammar forms/lexical forms for each lexical entry) from Wikidata to Apache Solr search engine synonyms config.
Grammar rules for search and information retrieval
Natural language processing of unstructured text is complicated, since the same words/meanings can occur in different grammar forms.
Stemming
To find other grammar forms of words, the search engine uses stemming heuristics, which for example cuts suffixes like -ing to find other grammar forms of the same words.
But such automatic heuristics can fail, especially for example on irregular verbs, where for example another grammatical form like "went" is not only "go" with a suffix (like f.e. -ing).
Human curated dictionary (Linked Data Knowledge Graph of Lexemes as Open Data from WikiData)
By considering lexemes by import of Wikidata Lexemes as Solr search engine synonyms you find documents including many such more complicated / irregular grammar forms, too.
Hint: You can have a look and browse the structured data of lexemes in the structured data base and linked open data knowledge graph WikiData by the web user interface Ordia.
Import Wikidata lexemes to Solr synonyms config
The lexemes import tool can be configured by following command line parameters and will be integrated to our web UI next.
Command line options
Usage: wikidata-lexemes-to-solr-synonyms [options]
Options:
-h, --help show this help message and exit
-s SOLR, --solr=SOLR Solr URI like http://localhost:8983/solr/
-c CORE, --core=CORE Solr core/index name
-r RESOURCE, --resource=RESOURCE
Solr managed synonyms resource where to store the
results
-l LANGUAGE, --language=LANGUAGE
Language (Wikidata entity)
Free Open Source Software (FOSS)
The Python & SPARQL based import tool is Free Software under the GPL license.
The Open Source code is available in the Github repository opensemanticsearch/lexemes.