Named Entity Recognition (NER)

Automatic Named Entity Recognition uses machine learning (ML) to classify and annotate parts of a text automatically.

Models are trained on a large text corpus with existing annotations. Afterwards they can predict whether a part of a sentence is the name of a person, the name of an organization or a place.

Named Entity Recognition with Stanford Named Entity Recognizer (Stanford NER)

Although Stanford NER is integrated and configured out of the box, automatic named entity recognition needs many resources (mainly CPU time and RAM) for the analysis during document import or data enrichment. So you have to switch it on manually, or you can enrich only some important documents or texts with this automatic text analysis.

To set up automatic named entity recognition with Stanford Named Entity Recognizer (NER):

Setting language model

How to improve Named Entities Extraction

In many cases such Named Entity Recognition (NER) heuristics and machine learning work, but not in all.

To get the most analysis with the least manual effort for metadata management, you can combine all methods and tools: your own domain knowledge (by managing named entities manually), external knowledge or open data (by integrating, importing or enriching), and extraction of structured data with text patterns by regex and with machine learning.
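A minimal sketch of how results from several methods could be combined. It is not the implementation used by the software: the Entity type, the source names and the priority order (manually managed names before regex before ML) are assumptions made for illustration. When two results overlap, the more trusted source wins.

```python
from typing import NamedTuple

class Entity(NamedTuple):
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    label: str   # e.g. "PERSON" or "ORGANIZATION"
    source: str  # which method produced the result

# Assumed trust order for this sketch: lower number = more trusted
PRIORITY = {"manual": 0, "regex": 1, "ml": 2}

def merge_entities(*result_sets):
    """Merge entity lists; on overlapping spans keep the more trusted one."""
    candidates = sorted(
        (e for results in result_sets for e in results),
        key=lambda e: (PRIORITY[e.source], e.start),
    )
    accepted = []
    for cand in candidates:
        overlaps = any(cand.start < a.end and a.start < cand.end for a in accepted)
        if not overlaps:
            accepted.append(cand)
    return sorted(accepted, key=lambda e: e.start)

# Hypothetical results for the text "Jane Doe paid 20 EUR to Acme Corp."
manual = [Entity(24, 33, "ORGANIZATION", "manual")]   # "Acme Corp" from a managed list
regex = [Entity(14, 20, "MONEY", "regex")]            # "20 EUR" from a text pattern
ml = [Entity(0, 8, "PERSON", "ml"),                   # "Jane Doe" from the model
      Entity(24, 28, "PERSON", "ml")]                 # "Acme" mislabeled by the model

merged = merge_entities(manual, regex, ml)
for entity in merged:
    print(entity)
```

In this example the mislabeled ML result for "Acme" is dropped because it overlaps the manually managed organization name, while the ML result for "Jane Doe" is kept because no other method covers it.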

Train the machine learning model

You can train the machine learning model with additional structure or annotations to improve its results.
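As an illustration of what additional annotations can look like, the following sketch writes annotated tokens in the tab-separated layout (one token and its label per line, "O" for tokens outside any entity) that is commonly used as training input for the Stanford NER classifier. The function name, the sentences and the labels are made up for this example, and the training run itself is not shown.

```python
def to_training_tsv(sentences):
    """Render annotated sentences as token<TAB>label lines.

    Sentences are separated by one blank line.
    """
    blocks = []
    for tokens in sentences:
        blocks.append("\n".join(f"{token}\t{label}" for token, label in tokens))
    return "\n\n".join(blocks) + "\n"

# Hypothetical annotations; "O" marks tokens that are not part of an entity
annotated = [
    [("Jane", "PERSON"), ("Doe", "PERSON"), ("works", "O"),
     ("at", "O"), ("Acme", "ORGANIZATION"), (".", "O")],
    [("Berlin", "LOCATION"), ("is", "O"), ("large", "O"), (".", "O")],
]

tsv = to_training_tsv(annotated)
print(tsv)
```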

Combine recognized Named Entities with results of additional methods like thesaurus of domain knowledge, lists of names, open data and precise rules

But rules are often more precise, especially for new or not very popular names.

The problem with rules or the Named Entities manager is that you have to know and manage all these names.
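The trade-off can be seen in a small sketch of list-based matching. This is only an illustration with hypothetical names and helper functions, not the matching used by the Named Entities manager: listed names are found precisely, but a name that is missing from the list is not found at all.

```python
import re

def build_matcher(names):
    """Compile one case-insensitive pattern matching any listed name as whole words."""
    # Longest names first so that "Acme Corp" wins over "Acme"
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

def find_names(text, names_by_label):
    """Return (start, end, matched text, label) for every listed name in the text."""
    hits = []
    for label, names in names_by_label.items():
        if not names:
            continue  # an empty list would otherwise match empty strings
        for match in build_matcher(names).finditer(text):
            hits.append((match.start(), match.end(), match.group(0), label))
    return sorted(hits)

# Hypothetical managed lists of names
names_by_label = {
    "ORGANIZATION": ["Acme", "Acme Corp"],
    "PERSON": ["Jane Doe"],
}

hits = find_names("Jane Doe joined Acme Corp last year.", names_by_label)
missed = find_names("John Roe left.", names_by_label)  # "John Roe" is not in any list
print(hits)
print(missed)
```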

Importing a list or ontology

You can define precise text patterns by regular expressions, e.g. for a special currency or for scanned forms.
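A short sketch of such text patterns, written with Python's re module. The currency codes, the "Customer ID" form field and the function name are assumptions chosen for the example; real patterns depend on your documents.

```python
import re

# Assumed pattern: an amount followed by one of a few currency codes
AMOUNT = re.compile(r"\b(\d+(?:\.\d{1,2})?)\s?(EUR|USD|GBP)\b")

# Assumed form layout: a line of the shape "Customer ID: <value>"
CUSTOMER_ID = re.compile(r"^Customer ID:\s*(\w+)$", re.MULTILINE)

def extract_fields(text):
    """Extract amounts and the customer ID from the text of a (scanned) form."""
    id_match = CUSTOMER_ID.search(text)
    return {
        "amounts": AMOUNT.findall(text),  # list of (amount, currency) tuples
        "customer_id": id_match.group(1) if id_match else None,
    }

form_text = "Customer ID: A1234\nInvoice total: 1250.50 EUR, deposit 200 USD."
fields = extract_fields(form_text)
print(fields)
```

Patterns like these only match what they describe exactly, which is what makes them precise for well-defined formats such as amounts or form fields.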

You can add names. You can add rules by queries.