Text mining or analysis of free text
In addition to structured (alphanumeric) data, Coheris SPAD can also process unstructured information: free text. The principle is to first compute a number of metrics on the text, such as word frequencies and the distances between words, and then apply classical statistical methods to these metrics. Because this approach to text is purely statistical (no semantic notion is taken into account), it supports applications such as the automatic classification of incoming mail (complaint letters, information requests, etc.) and the analysis of the content of web pages or of comments posted on blogs, forums, etc.
In the first case, the system correctly classifies more than 80% of incoming mail, eliminating a large part of the manual sorting work. In the second, it makes it possible to set up observatories of the web and the blogosphere, to find out what is being said about a product, a company, etc.
Note that these techniques remain valid regardless of the language of the text, which is another advantage of the statistical approach to text mining.
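The principle described above (reduce each text to word-frequency metrics, then apply a classical statistical method) can be illustrated with a minimal sketch. This is not Coheris SPAD's own algorithm: the nearest-centroid rule with cosine similarity is a generic technique swapped in for illustration, and the function names and training sentences are hypothetical. Because the tokenizer only splits on non-word characters and uses no dictionary or semantics, the same code applies to any language written with word separators.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case and split on runs of non-word characters (Unicode-aware in
    # Python 3). No stemming, dictionary or semantics: purely statistical.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def term_frequencies(text):
    # The metric computed for each text: relative frequency of each word.
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {word: n / total for word, n in counts.items()}


def cosine(a, b):
    # Cosine similarity between two sparse frequency vectors stored as dicts.
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train_centroids(labelled_texts):
    # Average the frequency vectors of the training texts in each category.
    sums, sizes = {}, Counter()
    for text, label in labelled_texts:
        acc = sums.setdefault(label, Counter())
        for word, f in term_frequencies(text).items():
            acc[word] += f
        sizes[label] += 1
    return {label: {w: f / sizes[label] for w, f in acc.items()}
            for label, acc in sums.items()}


def classify(text, centroids):
    # Assign the category whose centroid is most similar to the text.
    vec = term_frequencies(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    # Hypothetical training mail, labelled with the two categories named above.
    training = [
        ("I want to complain: my order arrived damaged", "complaint"),
        ("This is a complaint about a late delivery and poor service", "complaint"),
        ("Could you send me information about your prices", "information request"),
        ("Please send information on opening hours", "information request"),
    ]
    centroids = train_centroids(training)
    print(classify("Complaint: the delivery was damaged", centroids))      # → complaint
    print(classify("Please send me information about prices", centroids))  # → information request
```

A real system would train on many labelled letters and would typically use richer metrics (for example word co-occurrences or distances) and a stronger statistical model; the nearest-centroid rule is chosen here only because it is short and transparent.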
Strengths
