Text mining or analysis of free text

In addition to structured (alphanumeric) data, Coheris SPAD can also process unstructured information: free text. The principle is to first compute a number of metrics on the text, such as the frequency of words and the distance between them. Classical statistical methods are then applied to these metrics.
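As an illustration only, the following Python sketch computes two metrics of this kind, relative word frequencies and the smallest distance between two words, using the standard library. The function names `tokenize`, `word_frequencies` and `min_word_distance` are hypothetical and are not part of Coheris SPAD. The tokenization rule and the sample sentence are simplifying assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (simplified rule)."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(tokens):
    """Return the relative frequency of each word in the token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


def min_word_distance(tokens, a, b):
    """Return the smallest gap, in tokens, between words a and b, or None."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    if not pos_a or not pos_b:
        return None  # at least one of the two words is absent
    return min(abs(i - j) for i in pos_a for j in pos_b)


# Hypothetical sample text, used only to exercise the functions
tokens = tokenize("The product arrived late. The product was damaged.")
freqs = word_frequencies(tokens)
distance = min_word_distance(tokens, "product", "damaged")
```

Once each text is reduced to numbers like these, it can be handled by the same statistical methods as any other numeric table.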

This approach to text is purely statistical: no semantic notion is considered. It supports applications such as the automatic classification of incoming mail (complaint letters, information requests, etc.) and the analysis of the content of web pages or of comments posted on blogs, forums, etc.
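To make the mail classification idea concrete, here is a minimal sketch under stated assumptions: each category is summarized by the word counts of a few example letters, and a new letter is assigned to the category whose counts it resembles most (cosine similarity). This is a generic textbook technique, not a description of the method Coheris SPAD actually uses. The function names and the example letters are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count the occurrences of each word in the text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def train(labelled_texts):
    """Sum the word counts of all example texts in each category."""
    profiles = {}
    for text, label in labelled_texts:
        profiles.setdefault(label, Counter()).update(bag_of_words(text))
    return profiles


def classify(text, profiles):
    """Return the category whose word profile is most similar to the text."""
    vec = bag_of_words(text)
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


# Hypothetical training letters, invented for this example
examples = [
    ("I am unhappy, the product arrived broken", "complaint"),
    ("The delivery was late and the box was damaged", "complaint"),
    ("Please send me information about your prices", "information request"),
    ("Could you tell me your opening hours", "information request"),
]
profiles = train(examples)
predicted = classify("My order arrived damaged and late", profiles)
```

Nothing in the sketch depends on the meaning of the words, only on their counts, which is why the same code runs unchanged on text in another language.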

 

In the first case, the system correctly classifies more than 80% of the mail, eliminating a large part of the manual sorting work. In the second, it provides observatories of the web and the blogosphere that show what is being said about a product, a company, etc.

 

Note that these techniques remain valid regardless of the language of the text. This is another benefit of the statistical approach to text mining.

 

 

Strengths

  • Text analysis: a complement to the analysis of numerical data
  • A statistical approach that delivers a high success rate
  • The language of the text is not an obstacle
Learn more

Download our documentation:

 

Coheris SPAD

Training catalogue