- Text analytics
The term text analytics describes a set of linguistic, lexical, pattern recognition, extraction, tagging/structuring, visualization, and predictive techniques. The term also describes processes that apply these techniques, whether independently or in conjunction with query and analysis of fielded, numerical data, to solve business problems. These techniques and processes discover and present knowledge – facts, business rules, and relationships – that is otherwise locked in textual form, impenetrable to automated processing.
A typical application is to scan a set of documents written in a natural language and either model the document set for predictive classification purposes or populate a database or search index with the information extracted. Current approaches to text analytics use natural language processing techniques that focus on specialized domains. Typical subtasks are:
*Named entity recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
*Coreference: identification of chains of noun phrases that refer to the same object. For example, anaphora is a type of coreference.
*Relationship extraction: extraction of named relationships between entities in text.
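As a rough illustration of two of the subtasks listed above, the following minimal sketch recognizes a few entity types and extracts one named relationship using hand-written regular expressions. It is a toy under stated assumptions, not a description of how any product mentioned in this article works: the pattern set, the `Entity` type, the function names, and the sample sentence are all hypothetical, and real systems rely on much richer linguistic and statistical models. Coreference is not attempted here.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    kind: str   # entity type label, e.g. "ORG", "DATE", "MONEY"
    text: str   # the matched surface string
    start: int  # character offset of the match in the source text


# Hypothetical hand-written patterns, chosen only for this example.
ORG = r"(?:[A-Z][a-z]+ )+(?:Inc|Corp|Ltd)"
PATTERNS = {
    "ORG": re.compile(r"\b" + ORG + r"\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),                       # temporal expression
    "MONEY": re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?"),   # numerical expression
}

# One named relationship between two entities: "<ORG> acquired <ORG>".
ACQUIRED = re.compile(r"(?P<buyer>" + ORG + r") acquired (?P<target>" + ORG + r")")


def find_entities(text):
    """Return every pattern match as an Entity, ordered by position."""
    entities = [
        Entity(kind, match.group(0), match.start())
        for kind, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(entities, key=lambda e: e.start)


def find_acquisitions(text):
    """Return (buyer, relation, target) triples for each acquisition match."""
    return [
        (m.group("buyer"), "acquired", m.group("target"))
        for m in ACQUIRED.finditer(text)
    ]


# Made-up sample sentence, not taken from any real document.
sample = "Acme Corp acquired Widget Works Ltd on 2009-03-15 for $12.5 million."
entities = find_entities(sample)
relations = find_acquisitions(sample)

for entity in entities:
    print(entity.kind, repr(entity.text))
for triple in relations:
    print(triple)
```

The extracted entities and triples are the kind of structured records that could then be loaded into a database or search index, as described in the typical application above.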
See also
*Noisy text analytics
*Information extraction
*Computational linguistics
*Natural language processing
*Named entity recognition
*Text mining
Software and Applications
Commercial Software and Applications
*AeroText - provides a suite of text mining applications for content analysis. Content used can be in multiple languages.
*Alethes OpenEyes [http://www.alethes.it] - provides a complete suite for text analytics in 8 different languages, including information extraction, entity recognition, taxonomy generation, clustering, categorization, summarization, and sentiment analysis.
*Anderson Analytics - provider of text analytics and content analysis especially as it relates to consumer behavior.
*Attensity provides hosted, integrated and stand-alone text analytics software.
*Carabao Language Kit - suite of components for text analytics, categorization, sense disambiguation, idiom extraction, and named entity recognition, with tools to add a new language or edit existing one(s).
*Clarabridge is a provider of end-to-end text analytics software and solutions for Voice of the Customer, Quality Assurance, Competitive Intelligence and other application areas.
*Clearforest [http://www.clearforest.com] is a provider of solutions and software to extract structured data from unstructured texts. It was acquired by Reuters, which subsequently merged with Thomson to form Thomson Reuters.
* [http://www.eaagle.com/index.php?go=FTM Eaagle Full Text Mapper] - a text mining software solution that uses knowledge discovery and data visualization as a basis for analyzing unstructured text.
*EpiAnalytics [http://www.EpiAnalytics.com] provides advanced operational analytics for routing, classification and business intelligence.
*IBM LanguageWare [http://www.alphaworks.ibm.com/tech/lrw] is the IBM suite for Text Analytics (Tools and Runtime).
* Ixreveal [http://www.ixreveal.com] is a commercial vendor of text mining software and patented OLAP (OnLine Analytical Processing) for text, specializing in solutions for structured and unstructured data that use advanced analytics algorithms and techniques. Its uReveal and uReka! [http://www.ureka.info] products have been adopted by major international companies and US local and federal government agencies in areas such as fraud and recovery, voice of the customer, and law enforcement.
*Infonic provides commercial sentiment analysis of financial news feeds for the Thomson Reuters RMDS trading information system. The "sentiment scores" that this software provides are used within algorithmic trading systems by several major trading banks. Infonic also develops unique document summarization and textual navigation technologies that aid in knowledge management.
*Island Data [http://www.islanddata.com] provides real-time text analysis for unstructured textual data sources. The text analytics engine is statistically based, which makes the algorithm equally effective for all languages. The company is managed by text mining experts including James Sanger (Chairman, Island Data Corp.), author of The Text Mining Handbook.
*Lexalytics [http://www.lexalytics.com] is a commercial provider of enterprise software solutions offering entity, theme, and quote extraction, as well as summarization and sentiment analysis of unstructured content including online news, blogs and corporate documents. The company recently merged with Infonic's Text Analytics Division.
*Leximancer is a commercial data mining tool that analyzes collections of textual documents and visually displays the extracted information. It is language independent and can be used for text analysis, coding open-ended surveys, media analysis and CRM notes. [http://www.Leximancer.com]
*Rapid-I is a provider of predictive analytics, data mining, and text mining software, solutions, and services.
*SPSS [http://www.spss.com] - provider of SPSS Text Analysis for Surveys, Text Mining for Clementine, LexiQuest Mine and LexiQuest Categorize, commercial text analytics software that can be used in conjunction with SPSS Predictive Analytics Solutions.
*Teezir Search Solutions designs, delivers and hosts knowledge management applications for professional services firms. Its flagship solution is Teezir Expert Finder, a search engine that identifies experts within an organization, based on all documents on the firm's networks.
* TEMIS [http://www.temis.com] - software publisher providing collaborative solutions for analyzing and discovering strategic information to serve the information intelligence needs of business corporations.
Open-Source Software and Applications
* RapidMiner - open-source software for data and text mining
* GATE - open-source toolbox for text engineering and natural language processing
External links
* Automatic Content Extraction, Linguistic Data Consortium: http://projects.ldc.upenn.edu/ace/
* Automatic Content Extraction, NIST: http://www.itl.nist.gov/iad/894.01/tests/ace/
* Message Understanding Conference: http://www.itl.nist.gov/iaui/894.02/related_projects/muc/
* Seth Grimes's Text Analytics expert channel at the Business Intelligence Network: http://www.b-eye-network.com/channels/index.php?filter_channel=1394
* Text Analytics Summit: http://www.textanalyticsnews.com/
* Text Analytics Wiki: http://textanalytics.wikidot.com/start
* Text Analytics Yahoo group: http://tech.groups.yahoo.com/group/TextAnalytics/
* Text Analytics Linkedin group: http://www.linkedin.com/e/gis/22313/3A5CAF691C78
Wikimedia Foundation. 2010.