Saturday, December 3, 2016

Symbiosis of Language Technology and AI at LT-Accelerate

AI and Natural Language Processing propel each other because most human knowledge and interaction is textual. And text, globally, is always multilingual. The LT-Accelerate Conference (Brussels, Nov 20-21) focuses on Text Analytics, AI, and related subjects. The speakers (CEOs of SMEs, project leads and data scientists from larger companies, and NLP/AI researchers) provided amazing insights into the progress the field has made in the last year.

Needless to say, the industry’s buzzwords (Deep, Neural, Machine Learning) were ubiquitous here too. Fortunately, innovators have become much better at explaining the concepts behind them and how to use them. Open Source Software also puts these powerful tools in the hands of smaller teams. Matthew Honnibal of spaCy summarized one use case nicely: “You shall know a word by the company it keeps”.
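The quote is J.R. Firth's distributional hypothesis, the idea behind word vectors. As a minimal sketch (assuming a tiny hypothetical corpus and plain co-occurrence counts, which is not how spaCy actually computes its vectors), words that keep similar company end up with similar vectors:

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "stocks fell on the market",
]

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vecs = context_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 2))     # → 0.92 (shared company)
print(round(cosine(vecs["cat"], vecs["stocks"]), 2))  # → 0.0 (no shared company)
```

"cat" and "dog" appear near the same words ("the", "drinks", "chases"), so their vectors point in nearly the same direction; "stocks" shares no context with "cat" in this corpus. Real systems learn dense vectors from far larger corpora, but the principle is the same.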

Michalis Michael, CEO of DigitalMR, set the bar high for state-of-the-art text analytics. Sentiment accuracy and topic matching have to exceed 80% while noise is significantly reduced. Only by supporting all languages can enterprises become omniscient. Human emotions are slightly more complex than Positive/Neutral/Negative; HeartBeat AI, for example, features a comprehensive emotion model. Text analytics needs to be meaningfully integrated into existing surveys and other data sources. Profiling allows customer segmentation by demographics or other derived variables.

These are demanding requirements, but when done right, text analytics correlates strongly with survey results, at a much lower cost. The industry is therefore bullish that its current share, a still small 3% (about $2B) of the $65B spent annually on market research, will grow dramatically.

Mike Hyde, Skype’s former Director of Data and Insights, explained why bots are the new apps. These bots need to understand language, they must be able to access and make sense of enterprise knowledge, and they have to be polyglot. That makes a rich playing field for language technology deployed on top of a Multilingual Knowledge System.

Many believe Machine Learning can work miracles. And ML does, as long as there are mountains of good data at hand. For example, Google claims to have outperformed humans at lip reading (automatic speech recognition transcribes videos at 95-98% accuracy, so plenty of labeled training data is available). Microsoft claims to match humans at describing pictures in a single sentence.

Often, however, humongous amounts of data are not available. And “>80%” accuracy obviously doesn’t cut it when applications deal with serious matters such as health, legal, or financial issues. The community agrees that for most use cases, Machine Learning needs to be grounded in human knowledge: taxonomies, ontologies, and terms.