New features, history and changelog

Open Semantic Search Appliance 16.12.09

  • Upgrade of the VM image with the newest Open Semantic Search version
  • Since a VirtualBox shared folder is no longer needed for external storage of the index, the search appliance VM for teams can also run on virtualization solutions other than VirtualBox, such as QEMU, KVM or Xen.

Open Semantic ETL 16.10.21

Open Semantic Search 16.08.27 and Open Semantic Search Apps 16.08.27

Open Semantic Search 16.08.15 and Open Semantic ETL 16.08.15

  • Named Entity Recognition with the Stanford Named Entity Recognizer (NER) for automatic extraction of previously unknown entities such as persons, organizations and places that are not yet listed in a thesaurus or ontology of named entities

Open Semantic Search 16.07.20 and Open Semantic ETL 16.07.20

  • Documents that could not be analysed get the content type "Unknown", so you can easily see and filter in the search user interface how many and which documents could not be analysed. Just click on the value "Unknown" in the facet "Content type".
  • Excluding query parts with - now works for query parts with wildcards, too, even if stemming is on. Operators are now protected and rewritten more thoroughly when stemming and wildcards are combined in one query. In former versions, excluding wildcards did not work if stemming was on and no facet was explicitly defined, because for Solr facet:-value is not the same as -facet:value, which is the form such excluded query parts with wildcards are now rewritten to.
  • Richer error messages for indexing: If something goes wrong while posting data to the Solr index, the ETL tools print the full Solr error message instead of only the HTTP exception code.
  • Bugfix: Multiple values in the author metadata field of document files no longer cause problems. If the author field has multiple values, they are joined with a comma, since this Solr standard field is not multivalued in the Solr standard schema.
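
The author bugfix above can be illustrated with a minimal sketch. This is not the actual ETL code: the function name normalize_author and the sample metadata are hypothetical, and the separator ", " (comma plus space) is an assumption, since the changelog only says the values are joined with a comma.

```python
def normalize_author(value):
    """Return a single string suitable for a single-valued author field.

    Hypothetical helper: lists or tuples of authors are joined with a comma,
    any other value is passed through unchanged.
    """
    if isinstance(value, (list, tuple)):
        # Drop empty entries, then join the remaining names.
        names = [str(v).strip() for v in value if str(v).strip()]
        return ", ".join(names)
    return value


# Illustrative metadata, as a document parser might return it.
metadata = {"author": ["Alice Example", "Bob Example"]}
metadata["author"] = normalize_author(metadata["author"])
print(metadata["author"])
```

Joining on the client side keeps the document compatible with a schema in which the author field is single-valued, at the cost of losing the boundary between individual names.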

Open Semantic Search 16.06.23 and Open Semantic ETL 16.06.23

  • Interfaces for advanced search: Set search operators through easier, self-explanatory user interfaces instead of having to type text operators manually
  • Many times faster extraction of new documents in file directories and file shares through automatic parallel processing, without the manual setup of parallel tasks that previously required Linux admin knowledge and some work.
  • Much faster indexing of new documents through optimized commit intervals. Solr is now preconfigured to autocommit for central management of commits, so the ETL framework and importers (which now run tasks in parallel automatically) no longer have to manage commits and can run in parallel without knowing about other Extract, Transform, Load (ETL) processes
  • Operators for stemming: Stemming can easily be switched on or off in the advanced search options. With the new operators exact: and stemmed: you can switch stemming on or off even for parts of your search query.
  • You can use stemming and wildcards within the same search query.
  • Errors during extraction are now stored in the search index, too. So you can search or filter for them without additional tools for log file analysis, and users can see whether something went wrong in the automatic analysis of documents they are interested in, which parts failed, and why some analysis results are missing for those documents.
  • Upgraded to Apache Solr 6.1.0
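
The combination of parallel extraction and central autocommit described in this release can be sketched roughly as follows. This is only an illustration under assumptions: index_file is a hypothetical stand-in for the real extraction and posting step, and a thread pool is used here for brevity, which is not necessarily how the ETL framework schedules its tasks.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_files(directory):
    """Collect all file paths below a directory in a stable order."""
    paths = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)


def index_file(path):
    """Hypothetical stand-in for extracting one file and posting it to Solr.

    No commit is issued here: Solr's autocommit is assumed to be configured,
    so workers do not need to coordinate commits with each other.
    """
    return {"id": path, "size": os.path.getsize(path)}


def index_directory(directory, workers=4):
    """Process all files in parallel and return the resulting documents."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the file list.
        return list(pool.map(index_file, list_files(directory)))
```

Because no worker sends a commit, each task stays independent of the others, which is what allows them to run side by side without knowing about each other.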

Open Semantic Search 16.05.04 and Open Semantic ETL 16.05.04

  • Lightweight daemon for file system monitoring on remote servers & file shares
  • Possibility to blacklist content types/file endings or URIs/paths/filenames at plugin level. You can switch active plugins for special or additional analysis on or off more granularly by adding URIs, paths, filenames or content types to a blacklist that applies only to that plugin. For example, if you add the OpenDocument content type to the blacklist of the ZIP plugin, documents of this type won't be unpacked by the ZIP plugin to analyse each contained file separately (all of which together form a single document), even though technically they are ZIP files.
  • Upgrade to Apache Solr 6
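
The per-plugin blacklist idea from this release can be sketched as a simple lookup. Everything here is an assumption made for illustration: the dictionary layout, the plugin key "zip", the patterns and the function name plugin_enabled are hypothetical and do not reflect the real configuration file format.

```python
import re

# Hypothetical per-plugin blacklists; keys and patterns are illustrative only.
PLUGIN_BLACKLISTS = {
    "zip": {
        # Content type prefixes the plugin should skip.
        "content_types": ["application/vnd.oasis.opendocument."],
        # Regular expressions matched against the URI or path.
        "uri_patterns": [r"\.odt$", r"\.ods$"],
    },
}


def plugin_enabled(plugin, uri, content_type):
    """Return False if the document is blacklisted for this plugin only."""
    blacklist = PLUGIN_BLACKLISTS.get(plugin, {})
    for prefix in blacklist.get("content_types", []):
        if content_type.startswith(prefix):
            return False
    for pattern in blacklist.get("uri_patterns", []):
        if re.search(pattern, uri):
            return False
    return True


odt_type = "application/vnd.oasis.opendocument.text"
print(plugin_enabled("zip", "file:///docs/report.odt", odt_type))
print(plugin_enabled("zip", "file:///docs/archive.zip", "application/zip"))
```

Keeping one blacklist per plugin means an entry only affects that plugin; other plugins still process the same document.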

Open Semantic Search 16.04.21 and Open Semantic ETL 16.04.21

  • New Tika server daemon package and Tika-Server ETL plugin for faster text and metadata extraction from many files, using a Tika server daemon that runs all the time (so the Tika-App does not have to be loaded again for each file)
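
To show why a permanently running Tika server avoids the per-file startup cost, here is a minimal sketch of a client that sends file contents to a Tika server over HTTP. It assumes a Tika server listening on its default address http://localhost:9998 and uses its /tika endpoint for plain-text extraction. The function names are hypothetical and this is not the code of the Tika-Server ETL plugin.

```python
import urllib.request


def build_tika_request(data, server="http://localhost:9998"):
    """Build a PUT request that asks the Tika server for plain text."""
    return urllib.request.Request(
        server.rstrip("/") + "/tika",
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path, server="http://localhost:9998"):
    """Send one file to the (assumed running) Tika server, return its text."""
    with open(path, "rb") as handle:
        request = build_tika_request(handle.read(), server)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Each call to extract_text only costs one HTTP round trip, whereas starting the Tika-App for every file would start a new Java process each time.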