Wednesday, April 5, 2017

Why Machine Learning Still Needs Humans for Language

Outperforming Humans

Machine Learning (ML) is beginning to outperform humans in many tasks that seemingly require intelligence. The hype about ML has even reached the mass media. ML can read lips, recognize faces, and transform speech to text. But when ML has to deal with the ambiguity, variety, and richness of language, when it has to understand text or extract knowledge, it still needs human experts.

Knowledge is Stored as Text

The Web is certainly our greatest knowledge source. However, the Web was designed to be consumed by humans, not by machines. The Web’s knowledge is mostly stored in text and spoken language, enriched with images and video. It is not a structured relational database storing numeric data in machine-processable form.

Text is Multilingual

The Web is also very multilingual. Recent statistics show that, surprisingly, only 27% of the Web’s content is in English and only 21% is in the next five most-used languages. That means more than half of its knowledge is expressed in a long tail of other languages.

Constraints of Machine Learning

ML faces some serious challenges. Even with today’s availability of hardware, the demand for computing power can become astronomical when input and desired output are rather fuzzy (see the great NYT article "The Great A.I. Awakening").

ML is great for 80/20 problems, but it is dangerous in contexts with high accuracy needs: “Digital assistants on personal smartphones can get away with mistakes, but for some business applications the tolerance for error is close to zero,” emphasizes Nikita Ivanov of Datalingvo, a Silicon Valley startup.

ML performs well on n-to-1 questions. For instance, in face recognition, “all these pixels show which person?” has only one correct answer. However, ML struggles with n-to-many or gradual questions: there are many ways to translate a text correctly or to express a certain piece of knowledge.

ML is only as good as its available relevant training material. For many tasks, mountains of data are needed. And the data had better be of supreme quality. For language-related tasks, these mountains of data are often required per language and per domain. Further, it is also hard to decide when the machine has learned enough.

Is Monolingual ML Good Enough?

Some suggest processing everything in English. ML also does an OK job at machine translation, as in Google Translate. So why not translate everything into English and then run our ML algorithms? This is a very dangerous approach because errors compound. If the output of an 80% accurate machine translation becomes the input to an 80% accurate sentiment analysis, the combined accuracy drops to roughly 64% (0.8 × 0.8, assuming the two stages fail independently). At that hit rate you are getting close to flipping a coin.
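The arithmetic behind the 64% figure can be sketched in a few lines of Python. This is only an illustration: it assumes the stages of the pipeline fail independently of each other, which is a simplification, and the function name `pipeline_accuracy` is made up for this example.

```python
from math import prod


def pipeline_accuracy(stage_accuracies):
    """Accuracy of a chain of stages, ASSUMING their errors are independent.

    Each stage only produces a correct result if every earlier stage did,
    so the individual accuracies are multiplied.
    """
    return prod(stage_accuracies)


# Machine translation (80%) feeding sentiment analysis (80%).
combined = pipeline_accuracy([0.80, 0.80])
print(f"combined accuracy:   {combined:.0%}")      # 64%
print(f"combined error rate: {1 - combined:.0%}")  # 36%
```

Every additional stage in the chain pulls the result down further, which is why stacking several "good enough" components is risky.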

Human Knowledge to Help

The world is innovating constantly. Every day new products and services are created. To talk about them we continuously craft new words: the bumpon, the ribbon, a plug-in hybrid, TTIP. Only with the innovative force of language can we communicate new things.

Struggle with Rare Words

By definition new words are rare. They first appear in one language and then may slowly propagate into other domains or languages. There is no knowledge without these rare words, the terms. Look at a typical product catalog description with the terms highlighted. Now imagine this description without the terms – it would be nothing but a meaningless scaffold of fill-words.


Knowledge Training Required

At university we acquire the specific language, the terminology, of the field we are studying. We become experts in that domain. But even so, later in our professional career when we change jobs we still have to acquire the lingo of the new company: names of products, modules, services, but also job roles and their titles, names for departments, processes, etc. We get familiar with a specific corporate language by attending training, by reading policies, specifications, and functional descriptions. Machines need to be trained in the very same way with that explicit knowledge and language.

Multilingual Knowledge Systems Boost ML with Knowledge

There is a remedy: terminology databases, enterprise vocabularies, word lists, glossaries. Organizations usually already own an inventory of “their” words. This invaluable data can be leveraged to boost ML with human knowledge by transforming these inventories into a Multilingual Knowledge System (MKS). An MKS not only captures all words in all registers in all languages, but also structures them into a knowledge graph (a 'convertible' IS-A 'car' IS-A 'vehicle'…, 'front fork' IS-PART-OF 'frame' IS-PART-OF 'bicycle').
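To make the idea of such a knowledge graph more concrete, here is a minimal Python sketch. It is not how any particular MKS product is implemented. The class names `Concept` and `KnowledgeSystem`, the concept identifiers, and the German labels are all assumptions chosen for illustration; only the IS-A and IS-PART-OF examples come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node of the graph: a concept with its terms per language."""
    concept_id: str
    labels: dict = field(default_factory=dict)   # language code -> list of terms
    is_a: list = field(default_factory=list)     # ids of broader concepts
    part_of: list = field(default_factory=list)  # ids of containing concepts


class KnowledgeSystem:
    """Hypothetical, heavily simplified multilingual knowledge system."""

    def __init__(self):
        self.concepts = {}
        self._term_index = {}  # (language, lowercased term) -> concept id

    def add(self, concept):
        self.concepts[concept.concept_id] = concept
        for lang, terms in concept.labels.items():
            for term in terms:
                self._term_index[(lang, term.lower())] = concept.concept_id

    def lookup(self, term, lang):
        """Map a term in a given language to its concept id (or None)."""
        return self._term_index.get((lang, term.lower()))

    def ancestors(self, concept_id, relation="is_a"):
        """Follow one relation ('is_a' or 'part_of') transitively."""
        result = []
        queue = list(getattr(self.concepts[concept_id], relation))
        while queue:
            parent = queue.pop(0)
            if parent not in result:
                result.append(parent)
                if parent in self.concepts:
                    queue.extend(getattr(self.concepts[parent], relation))
        return result


mks = KnowledgeSystem()
# Illustrative entries only; the labels are sample data, not a real termbase.
mks.add(Concept("vehicle", {"en": ["vehicle"], "de": ["Fahrzeug"]}))
mks.add(Concept("car", {"en": ["car"], "de": ["Auto"]}, is_a=["vehicle"]))
mks.add(Concept("convertible", {"en": ["convertible"], "de": ["Cabrio"]}, is_a=["car"]))
mks.add(Concept("bicycle", {"en": ["bicycle", "bike"], "de": ["Fahrrad"]}))
mks.add(Concept("frame", {"en": ["frame"], "de": ["Rahmen"]}, part_of=["bicycle"]))
mks.add(Concept("front_fork", {"en": ["front fork"], "de": ["Vorderradgabel"]}, part_of=["frame"]))

print(mks.lookup("Cabrio", "de"))              # convertible
print(mks.ancestors("convertible"))            # ['car', 'vehicle']
print(mks.ancestors("front_fork", "part_of"))  # ['frame', 'bicycle']
```

Because terms in every language point to the same concept, a fact learned about 'convertible' from English text is also available when the German term shows up, and the IS-A chain lets a system generalize from 'convertible' to 'car' or 'vehicle' without additional training data.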

It is the human-curated Multilingual Knowledge System that enables ML and Artificial Intelligence solutions to work for specific domains with only small amounts of textual data, and also for less-resourced languages.
