Network analysis: Visualization of connections & relations between named entities (graph view)

Visualise relations between entities such as persons or organizations within and across documents (co-occurrences of named entities)

The graph/network analysis view shows the direct and indirect relations, connections and networks between named entities such as persons, organizations or main concepts that occur together (co-occurrences) in your content, data sources and documents.
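The difference between direct and indirect relations can be illustrated with a minimal sketch. It assumes each document has already been reduced to a list of entity names; the function names `build_graph` and `shortest_path` and the sample names are hypothetical and are not part of Open Semantic Search or Neo4j. Two entities are directly related if they share a document, and indirectly related if a chain of such co-occurrences links them.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(docs):
    """Build an undirected co-occurrence graph: one edge per entity pair sharing a document."""
    graph = defaultdict(set)
    for entities in docs:
        for a, b in combinations(set(entities), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of entities or None if unconnected."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        # .get() avoids creating empty entries for unknown entities
        for neighbor in graph.get(path[-1], ()):
            if neighbor in seen:
                continue
            if neighbor == goal:
                return path + [neighbor]
            seen.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Hypothetical sample data: entity names found in three documents
docs = [
    ["Alice Example", "Acme Corp"],
    ["Acme Corp", "Bob Sample"],
    ["Carol Test"],
]
graph = build_graph(docs)
print(shortest_path(graph, "Alice Example", "Bob Sample"))
print(shortest_path(graph, "Alice Example", "Carol Test"))
```

In this sample, the two persons never share a document, yet they are indirectly connected through the organization that appears with each of them. A graph database answers the same kind of question at scale.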

Integration with Neo4j graph database

To use the graph view, enable the open source ETL plugin for integration with the Neo4j graph database in the config file /etc/opensemanticsearch/etl.

Named entities like persons, organizations or places

Extracted named entities such as persons, organizations or locations (named entity extraction) are used for structured navigation, aggregated overviews and interactive filters (faceted search). They also provide leads for connections and networks, because you can analyze which persons, organizations or places occur together and in how many documents.
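Counting in how many documents two entities occur together can be sketched in a few lines. This is only an illustration under assumptions: the input format (a mapping from a document name to its extracted entity names), the function name `cooccurrence_counts` and the sample data are hypothetical, and the search engine itself may compute these numbers differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count in how many documents each pair of entities appears together."""
    counts = Counter()
    for entities in docs.values():
        # De-duplicate per document and sort so each pair has one canonical order
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample data: document name -> extracted entity names
docs = {
    "report.pdf": ["Alice Example", "Acme Corp", "Berlin"],
    "memo.txt": ["Alice Example", "Acme Corp"],
    "letter.doc": ["Bob Sample", "Berlin", "Alice Example"],
}
counts = cooccurrence_counts(docs)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with a high count are the strongest leads. Each count can also serve as an edge weight in the graph view.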

Automatic Named Entity Recognition by machine learning (ML) for automatic classification and annotation of text parts

In addition to known named entities from a thesaurus or imported ontologies, other data analysis plugins integrate Named Entity Recognition (NER) by spaCy and/or the Stanford Named Entity Recognizer (Stanford NER).

Named Entity Extraction of yet unknown entities or names

Machine learning analyses the structure of the text and classifies parts or words of the sentences into categories like person, location or organization. This way many previously unknown named entities can be extracted, even though they are not yet configured or listed in the thesaurus, a list of names or an ontology.

To do so, it uses models trained on existing annotations of a large text corpus. Such models can then "predict", or rather guess by probability, whether a part of a sentence is a name of a person, a name of an organization, a verb or a place.

Find more by combination with thesaurus and ontologies

Since no machine learning algorithm or model is perfect, the search engine combines this analysis with other methods and with data curated by human editors.

You can add important names, aliases and alternate labels to the thesaurus, so the search engine extracts them even if named entity recognition fails.
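How a curated thesaurus can catch names regardless of what a statistical model finds can be sketched as a simple dictionary match. Everything here is an assumption for illustration: the thesaurus structure (preferred label mapped to a list of aliases), the function name `extract_entities` and the sample entries are hypothetical, and the actual search engine matches thesaurus entries with its own mechanism.

```python
import re


def extract_entities(text, thesaurus):
    """Return the preferred labels of all thesaurus entries mentioned in the text."""
    # Map every lowercased label (preferred or alias) to its preferred label
    label_to_preferred = {}
    for preferred, aliases in thesaurus.items():
        for label in [preferred, *aliases]:
            label_to_preferred[label.lower()] = preferred
    if not label_to_preferred:
        return []
    # Longest labels first, so a full name wins over a shorter alias inside it
    labels = sorted(label_to_preferred, key=len, reverse=True)
    alternatives = "|".join(re.escape(label) for label in labels)
    # Lookarounds ensure matches are not embedded in a longer word
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
    found = {label_to_preferred[m.group(0).lower()] for m in pattern.finditer(text)}
    return sorted(found)


# Hypothetical thesaurus: preferred label -> aliases and alternate labels
thesaurus = {
    "Jane Q. Public": ["J. Public", "Jane Public"],
    "Acme Corporation": ["Acme Corp", "ACME"],
}
text = "A memo from J. Public mentions that acme will merge next year."
entities = extract_entities(text, thesaurus)
print(entities)
```

Because aliases map back to one preferred label, different spellings of the same name end up under a single entity in facets and in the graph.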

You don't have to add each name yourself:

With the ontologies manager you can import thousands of names from open data sources like Wikidata, which offers a universal structured database with names of people, for example lists of politicians and members of parliament.

Improve OCR results

Entities in the thesaurus are also added to the OCR dictionary, so the automatic OCR integration recognizes them more reliably in scanned documents, for example in images of scanned pages of legacy documents within PDF files.

Manual tagging and annotation

Since no automatic analysis, tagging or annotation is perfect, you can tag documents manually with the semantic tagger or annotate specific parts, words, names, paragraphs or sentences within documents with the Hypothesis annotator.