Monday, February 12, 2018

IoT Banks on Semantic Interoperability

The biggest challenge for widespread adoption of the Internet of Things is interoperability. A much-noticed McKinsey report states that achieving interoperability in IoT would unlock an additional 40% of value. This is not surprising since the IoT is in essence about connecting machines, devices, and sensors, ideally across organizations, industries, and even borders. But while technical and syntactic interoperability are pretty much solved, little has been available so far to make sure devices actually understand each other.

Focus Semantic Interoperability

Embedded Computing Design superbly describes the situation in a recent series of articles. Technical interoperability, the fundamental ability to exchange raw data (bits, frames, packets, messages), is well understood and standardized. Syntactic interoperability, the ability to exchange structured data, is supported by standard data formats such as XML and JSON. Core connectivity standards such as DDS or OPC-UA provide syntactic interoperability across industries by communicating through a proposed set of standardized gateways.

Semantic interoperability, though, requires that the meaning (context) of exchanged data is automatically and accurately interpreted. Several industry bodies have tried to implement semantic data models. However, these semantic data schemes have either been far too narrow for cross-industry use cases or had to stay too high-level. Without such schemes, data from IoT devices lacks the information to describe its own meaning. Therefore, a laborious and, worse, inflexible normalization effort is required before that data can really be used.

Luckily there is a solution: abstract metadata from devices by creating an IoT knowledge system.

Controlled Vocabulary and Ontologies

A controlled vocabulary is a collection of identifiers which ensure consistency of metadata terminology. These terms are used to label concepts (nodes) in a graph which provides a standardized classification for a particular domain. Such an ontology, incorporating characteristics of a taxonomy and a thesaurus, links concepts with their terms and attributes in semantic relationships. This way it provides metadata abstraction. It represents knowledge in machine-readable form and thus functions as a knowledge system for specific domains and their IoT applications.
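A minimal sketch of the idea in Python: a controlled vocabulary maps the different labels devices may use onto one concept identifier, and a broader-concept link adds the first bit of ontology on top. All identifiers and labels here are invented for illustration; they do not come from any real standard or from Coreon.

```python
# Hypothetical controlled vocabulary: each concept identifier owns the labels
# that devices might use for it. All names are illustrative assumptions.
VOCABULARY = {
    "concept:temperature": {"temperature", "temp", "Temperatur"},
    "concept:humidity": {"humidity", "relative humidity", "Luftfeuchtigkeit"},
}

# Hypothetical broader-concept links that turn the flat vocabulary into a
# very small ontology.
BROADER = {
    "concept:temperature": "concept:physical-quantity",
    "concept:humidity": "concept:physical-quantity",
}


def normalize(label):
    """Return the concept identifier for a device-specific label, or None."""
    wanted = label.strip().casefold()
    for concept_id, labels in VOCABULARY.items():
        if wanted in {known.casefold() for known in labels}:
            return concept_id
    return None


if __name__ == "__main__":
    # Two devices using different labels end up at the same concept.
    print(normalize("Temp"))
    print(normalize("Temperatur"))
```

Because devices agree on the concept identifier rather than on a label, a new label or a new language only needs an entry in the vocabulary, not a change in every device.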

IoT Knowledge Systems made Easy

A domain ontology can be maintained in a repository completely abstracted from any programming environment. It needs to be created and maintained by domain experts. With the explosive growth of IoT, new devices, applications, organizations, industries, and even countries are constantly added. Metadata abstraction parallels object-oriented programming, and unfortunately so do the tools used so far to maintain and extend ontologies.

But now our SaaS solution Coreon makes sure that IoT devices understand each other. Not only does Coreon function with its API as a semantic gateway in the IoT connectivity architecture, it also provides a modern, very easy-to-use application to maintain ontologies, featuring a user interface domain experts can actually work with. With Coreon they can deliver the knowledge necessary for semantic interoperability so that IoT applications can unlock their full value.

Coreon will be presented at the Bosch ConnectedWorld Internet of Things conference in February 2018 in Berlin. If you cannot come by our stand (S20), just flip through our presentation or drop us a mail with questions.

Monday, January 29, 2018

Language Service Providers Need to Look Ahead to Compete with Machines

By Rachel Wheeler, Morningside Translations

Language localization services have been big business, and estimates indicate that the market will grow at an annual rate of about 7%. Companies that focus solely on translation services will continue to find demand for several years to come. The global marketplace, however, also presents new opportunities for language service providers (LSPs) to elevate their services and expand their businesses beyond translation alone.

Other LSPs Are Not The Only Competition

Some of the key benefits that professional translation agencies provide are quality translation and local expertise. To date, machine language translation software has had its limitations: poor quality, faulty grammar and syntax, and lack of contextual understanding. LSPs have benefited from these flaws by being able to provide a superior alternative.

However, in 2017, Google introduced Google Neural Machine Translation (GNMT). What GNMT promises to provide is a new machine approach that will directly compete with human translators. Machine learning translation software has relied on an algorithmic approach to translation that was almost a word-for-word dictionary approach. Therein lies its major flaw: it can only learn through predictive behavior analysis.

Neural networks like GNMT, however, incorporate a more complex structure that mimics the way the human brain processes information. This approach replicates the idea of intuition in many ways, not simply hard definitions. In its first published iteration, Google is already claiming a 60% reduction in errors.

For LSPs, these neural networks mean more–and cheaper–competition in the future. The nature of work for translation agencies will need to change in order to remain relevant.

Marketing Remains the Realm of People

By far, the main edge LSPs will have over machine translation is experience and an understanding of local culture. For global businesses, marketing their goods and services is not just a matter of translating words. Successful marketing understands the emotional impact of how information is presented.

Subtle differences in words–“discover” versus “find”, for example–have a different impact in sales and marketing than they do in more formal written content. Factor in the additional layer of word choices made in translation, and the tone or intent of the words can shift dramatically away from the original purpose.

Marketing content does not automatically translate from one language to another. Even visual imagery can fall in the purview of the cross-cultural marketer. Lingerie, for instance, is promoted differently in conservative countries than in the West. LSPs are in the perfect position to expand their services into marketing, either as outside consultants or even agency-level providers.

Essentially, their ability to localize is a human translator’s greatest differentiator. Whether that’s leveraged for eLearning localization or creating images for a website specifically geared towards a regional audience, this is where an LSP can still shine.

Data Mining Works In Any Language

With today’s enormous output of information, data mining has become big business of its own. Data miners often refer to their work as “discovering insights.” As they review the clicks of a website, the comments on social media, and results of customer surveys, they inherently build a consumer profile with cultural bias built in.

LSPs with experts in particular languages and cultures can sift through this data in the original language and surface insights that a non-native speaker would miss in translation.

Plan Ahead for Competitive Advantage

The technology world makes no secret of its innovations. LSPs should keep an eye on the changes and trends and plan for the future. By anticipating the coming shift in global demand for translation services, language service providers can be ahead of their competitors instead of playing catch-up.

What a great follow-up to Coreon's last newsletter! We welcome contributions from partner companies and industry experts.
This guest post was written by Rachel Wheeler from Morningside Translations.

Monday, September 18, 2017

Multilingual Knowledge supporting AI, IoT, and Industry 4.0

A Review on Summer Events

We would like to share some impressions from recent events and conferences. The interesting common denominator was one theme: how can we leverage and deploy terminology assets in other business processes? How can we deploy the valuable knowledge in terminology assets to support AI, Machine Learning, Internet of Things, and Industry 4.0?

Coreon Innovation Seminar 

The Future of Human Expert Knowledge

Experts in machine learning and industry consultants gathered in Berlin to discuss and brainstorm about the opportunities Coreon provides for the diverse fields they work in. The Coreon use cases presented were: Cross-border e-commerce, AI expert know-how for knowledge heavy applications, and EU Institutions and interoperability. The event was by personal invitation only and was a huge success. We look forward to repeating it soon! Please click here if you would like to be invited next time.

ILKR 2017: Industry 4.0 meets Language and Knowledge Resources

The first trip brought us to Vienna to the Austrian Standards Institute. The ILKR 2017 took place just ahead of the ISO TC37 annual meeting. As its title suggests, ILKR tackles the question of how multilingual knowledge resources enable Industry 4.0. Thus many presentations explored the possibilities around multilingual knowledge management, knowledge transfer, and new business models.

No Industry 4.0 without Semantics

Our contribution illustrated why the Internet of Things and Industry 4.0 need semantics. When hardware devices speak to each other, they interoperate. This requires a mutual understanding of what they actually do, like “I measure temperature. What do you measure?” The answer is in the semantics of the devices’ metadata. We explained how Multilingual Knowledge Systems (MKS) resolve this challenge and how they facilitate interoperability, and how existing terminologies, taxonomies, and ontologies can be re-purposed to become an MKS.
Interoperability by Multilingual Knowledge System (MKS) semantic mapping

ILKR was followed by a pretty exciting workshop on eCl@ss and Multilingual Product Master Data Management. It had a particular focus on how e-procurement processes benefit from classifications and knowledge systems.


TSS 2017: Terminology Summer School

This year back in Cologne, the TSS is a five-day course that attracts participants from around the world who are looking for a kick-start in terminology and knowledge resource management. During the first three days, TSS usually covers the fundamentals of terminology management and its role in business processes. Then we were invited to give two presentations:
    Michael Wetzel, Coreon MD, about KOS and Semantic Web
  1. Terminologies and other Knowledge Organization Systems (KOS): What is a KOS, what are its benefits, typical examples, the role it plays in the Semantic Web? What is the difference between a classification, a taxonomy, a thesaurus, and an ontology?
  2. Knowledge meets Language: Multilingual Concept Maps: How Coreon is a fusion of terminology with taxonomy / ontology, what benefits organizations enjoy by deploying Multilingual Knowledge Systems
Coreon is proud to be a regular sponsor of TSS, and we look forward to next year, then again in Vienna (9-13 July 2018).


Terminology - Ontology Round Table

In mid-August we were invited to a one-day workshop on touch-points between terminology and ontology data and science. It took place at the HS Karlsruhe, sponsored by DIT, and organized by Petra Drewer, Francois Massion, and Donatella Pulitano. The workshop benefited from a valuable mix of participants: academic researchers from the terminology and ontology world, industry and institutional representatives (SAP, DIN, Deutsche Bahn …), and tool vendors. Its goal was to find commonalities and differences between the two disciplines. As a provider of a unified solution we contributed to the workshop by illustrating how Coreon customers benefit from a fusion of terminology with ontology. Experts confirmed our claim that humanly curated resources, i.e. MKS, are indispensable to make Machine Learning work for less-resourced domains and languages.

We recommend Petra’s and Francois’ presentation at the upcoming tekom conference on exactly that topic, Wed, 25 Oct, 11:15: Why Artificial Intelligence requires intelligent terminologies (and terminologists)!

See Coreon live this Autumn 2017

And of course, we’d be happy to meet you at upcoming events this autumn:
  • LT Industry Summit, 9-11 Oct, Brussels
    Meet Jochen Hummel, Coreon CEO and Chairman of the Board of LT Innovate, at the event. Do not miss the opening keynote by Mariya Gabriel, Commissioner for Digital Economy and Society, and Jochen's panel session "Artificial Intelligence: Hype or Reality?" on Oct 10, 9am.
  • tekom / tcworld, 24-26 Oct, Stuttgart
    Find us in the large hall C2, booth 2/G04 together with our partner company Semantix.
    We are proud to present recent highlights, such as brand new filtering capabilities and inline formatting! Learn how Multilingual Knowledge Systems boost AI and Machine Learning solutions and how they make the Internet of Things and Industry 4.0 work. Join us for a product demo Tuesday afternoon, 14:45 room C10.1.
Happy networking!

Tuesday, June 6, 2017

The IoT will Thrive on Semantics

In the Internet of Things (IoT) all devices are supposed to communicate among themselves, worldwide. Only, what are they saying to each other? Recently, former Siemens CTO Siegfried Russwurm got to the core of the issue: “Industry 4.0 needs first of all semantics. We can only get through interfaces and breaking points using unified semantics.” Apparently not only civil servants in cross-border projects or industry supply chain managers need semantic interoperability. The billions and billions of IoT devices need semantic interoperability as well.

The Must of Semantic Technologies

Sebastian Tramp, coordinator of the Linked Enterprise Data Services (LEDS) project, nicely explains why the vision of the IoT and Industry 4.0 cannot be realized without semantics. If the meaning of IoT devices is not clear, it’s hard for them to interact or even communicate. For this the devices and their relevant metadata must be clearly defined. If, for example, some value is supposed to be measured, the data stream needs to contain information on which sensor took the value, when, and where, but also on what this value is all about. The power of the IoT is based on combining data from different sources. To link this data in a meaningful way you need interfaces in the form of shared knowledge, i.e. ontologies. That’s what semantic technologies deliver.
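As a rough illustration in Python, a reading can carry its own metadata: which sensor took the value, when, where, and what the value is about. The `Reading` type, the concept identifier, and the unit names are assumptions made up for this sketch; they are not part of LEDS or any other named project.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """A measured value together with the metadata that explains it."""
    value: float
    concept: str        # what the value is about (hypothetical identifier)
    unit: str           # hypothetical unit names: "celsius" or "fahrenheit"
    sensor_id: str      # which sensor took the value
    taken_at: datetime  # when
    location: str       # where


def to_celsius(reading):
    """Convert a temperature reading to degrees Celsius."""
    if reading.unit == "celsius":
        return reading.value
    if reading.unit == "fahrenheit":
        return (reading.value - 32) * 5 / 9
    raise ValueError(f"unsupported unit: {reading.unit}")


def mean_temperature(readings, concept="concept:temperature"):
    """Average all readings that share the given concept, in Celsius."""
    values = [to_celsius(r) for r in readings if r.concept == concept]
    if not values:
        raise ValueError("no readings for this concept")
    return sum(values) / len(values)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    readings = [
        Reading(20.0, "concept:temperature", "celsius", "a-1", now, "Berlin"),
        Reading(68.0, "concept:temperature", "fahrenheit", "b-7", now, "Boston"),
        Reading(55.0, "concept:humidity", "percent", "a-2", now, "Berlin"),
    ]
    print(mean_temperature(readings))
```

The two temperature sources can be combined only because both declare the same concept; the humidity reading is left out for the same reason.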

Textual Metadata

Human language plays a surprisingly big role in the IoT. For example, a visual sensor’s image Exif information records under [Flash mode] the value “flash, red eye, no strobe return”. Another device processing this textual metadata needs to understand what “… red eye, no strobe …” actually means. And, very important, if it can’t provide specific processing for the strobe usage, it should conclude the more generic fact that a flash was active. To make things even more complex, depending on where the device was built it might say this in Chinese or German.
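The fallback to the more generic fact can be sketched as a walk up broader-narrower links. The concept identifiers and the hierarchy below are hypothetical, chosen only to mirror the flash example; they are not taken from the Exif specification.

```python
# Hypothetical mapping from textual metadata to concept identifiers.
LABELS = {
    "flash, red eye, no strobe return": "flash-red-eye-no-strobe-return",
}

# Hypothetical broader-concept links (narrower -> broader).
BROADER = {
    "flash-red-eye-no-strobe-return": "flash-red-eye",
    "flash-red-eye": "flash-fired",
}


def most_specific_supported(concept, supported):
    """Walk up the hierarchy until a concept the device can process is found.

    Returns the first supported concept, or None if nothing on the path to
    the root is supported.
    """
    seen = set()
    current = concept
    while current is not None and current not in seen:
        if current in supported:
            return current
        seen.add(current)  # guard against accidental cycles in the data
        current = BROADER.get(current)
    return None


if __name__ == "__main__":
    concept = LABELS["flash, red eye, no strobe return"]
    # This device has no special handling for strobe return or red eye,
    # so it falls back to the generic fact that a flash was active.
    print(most_specific_supported(concept, {"flash-fired"}))
```

A device with more specific processing would simply list more concepts as supported and would be matched earlier on the same path.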

Leverage Terminologies, Taxonomies, and Ontologies

Luckily Multilingual Knowledge Systems (MKS) like Coreon deliver the required semantic and linguistic intelligence for the communication of IoT devices. Companies can leverage existing resources such as word lists, multilingual termbases, and taxonomies to build their metadata concepts with corresponding labels in one or more languages. The metadata concepts need to be semantically structured at least in broader-narrower relations. Through auto-taxonomisation a provisional graph is suggested which is reviewed and finalised by subject matter experts. Knowledge resources often require coverage of several languages. Mono- and bilingual term extraction as well as text and translation memory harvesting algorithms reduce this effort significantly.

This way a knowledge graph is created with each node representing a metadata meaning expressed by one or more labels. When shared, this graph becomes the interface for IoT devices.

Semantics for the IoT

Without semantic interoperability IoT devices fail to communicate with each other. If human intervention is necessary, the Internet of Things with its billions of devices remains a buzzword for a great vision. Multilingual Knowledge Systems are a proven solution to make data repositories, systems, organizations, and even countries interoperable. They will provide the unified semantics for the Internet of Things, globally.

Learn more about Coreon or jump right in for a look and feel.

Wednesday, April 5, 2017

Why Machine Learning still Needs Humans for Language?

Outperforming Humans

Machine Learning (ML) is beginning to outperform humans in many tasks which seemingly require intelligence. The hype about ML even makes it into the mass media. ML can read lips, recognize faces, or transform speech to text. But when ML has to deal with the ambiguity, variety, and richness of language, when it has to understand text or extract knowledge, ML continues to need human experts.

Knowledge is Stored as Text

The Web is certainly our greatest knowledge source. However, the Web has been designed for being consumed by humans, not by machines. The Web’s knowledge is mostly stored in text and spoken language, enriched with images and video. It is not a structured relational database storing numeric data in machine processable form.

Text is Multilingual

The Web is also very multilingual. Recent statistics show that, surprisingly, only 27% of the Web’s content is in English and only 21% is in the next five most-used languages. That means more than half of its knowledge is expressed in a long tail of other languages.

Constraints of Machine Learning


ML faces some serious challenges. Even with today’s availability of hardware, the demand for computing power can become astronomical when input and desired output are rather fuzzy (see the great NYT article "The Great A.I. Awakening").

ML is great for 80/20 problems, but it is dangerous in contexts with high accuracy needs: “Digital assistants on personal smartphones can get away with mistakes, but for some business applications the tolerance for error is close to zero”, emphasizes Nikita Ivanov from Datalingvo, a Silicon Valley startup.

ML performs well on n-to-1 questions. For instance, in face recognition “which person do all these pixels show?” has only one correct answer. However, ML struggles in n-to-many or gradual circumstances: there are many ways to translate a text correctly or to express a certain piece of knowledge.

ML is only as good as its available relevant training material. For many tasks mountains of data are needed, and the data had better be of supreme quality. For language-related tasks these mountains of data are often required per language and per domain. Further, it is also hard to decide when the machine has learned enough.

Monolingual ML Good enough?


Some suggest simply processing everything in English. ML also does an OK job at Machine Translation, as in Google Translate. So why not translate everything into English and then run our ML algorithms? This is a very dangerous approach since errors compound. If the output of an 80% accurate Machine Translation becomes the input to an 80% accurate Sentiment Analysis, the combined accuracy drops to 64%. At that hit rate you are getting close to flipping a coin.
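The arithmetic behind the 64% figure can be sketched in a few lines of Python. The sketch assumes that the stages fail independently and that a wrong intermediate result never leads to a correct final result by chance; real pipelines may behave somewhat better or worse. The function name is made up for this example.

```python
import math


def pipeline_accuracy(stage_accuracies):
    """Accuracy of a chain of stages, assuming independent errors.

    The result is only correct if every stage is correct, so the individual
    accuracies are multiplied.
    """
    return math.prod(stage_accuracies)


if __name__ == "__main__":
    # Machine Translation at 80% feeding Sentiment Analysis at 80%.
    print(round(pipeline_accuracy([0.8, 0.8]), 2))
    # Each additional 80% stage lowers the overall hit rate further.
    print(round(pipeline_accuracy([0.8, 0.8, 0.8]), 3))
```

Under the same assumption, even two 90% accurate stages combine to only about 81%, which is why chaining ML components needs care.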


Human Knowledge to Help


The world is innovating constantly. Every day new products and services are created. To talk about them we continuously craft new words: the bumpon, the ribbon, a plug-in hybrid, TTIP ‒ only with the innovative force of language can we communicate new things.

Struggle with Rare Words

By definition new words are rare. They first appear in one language and then may slowly propagate into other domains or languages. There is no knowledge without these rare words, the terms. Look at a typical product catalog description with the terms highlighted. Now imagine this description without the terms – it would be nothing but a meaningless scaffold of fill-words.

Knowledge Training Required

At university we acquire the specific language, the terminology, of the field we are studying. We become experts in that domain. But even so, later in our professional career when we change jobs we still have to acquire the lingo of the new company: names of products, modules, services, but also job roles and their titles, names for departments, processes, etc. We get familiar with a specific corporate language by attending training, by reading policies, specifications, and functional descriptions. Machines need to be trained in the very same way with that explicit knowledge and language.

Multilingual Knowledge Systems Boost ML with Knowledge


There is a remedy: Terminology databases, enterprise vocabularies, word lists, glossaries – organizations usually already own an inventory of “their” words. This invaluable data can be leveraged to boost ML with human knowledge: by transforming these inventories into a Multilingual Knowledge System (MKS). An MKS captures not only all words in all registers in all languages, but structures them into a knowledge graph (a 'convertible' IS-A 'car' IS-A 'vehicle'…, 'front fork' IS-PART of 'frame' IS-PART of 'bicycle').
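A tiny sketch of such a knowledge graph in Python, using the relations named above. The dictionaries, the function names, and the German labels are illustrative assumptions; this is not Coreon's data model or API.

```python
# Hypothetical graph fragments: each concept points to its parent.
IS_A = {"convertible": "car", "car": "vehicle"}
IS_PART_OF = {"front fork": "frame", "frame": "bicycle"}

# Hypothetical multilingual labels per concept (language code -> terms).
LABELS = {
    "car": {"en": ["car", "automobile"], "de": ["Auto", "Wagen"]},
    "vehicle": {"en": ["vehicle"], "de": ["Fahrzeug"]},
}


def ancestors(concept, relation):
    """Follow one relation upwards and return all concepts on the way."""
    chain = []
    current = relation.get(concept)
    while current is not None and current not in chain:
        chain.append(current)
        current = relation.get(current)
    return chain


def is_a(concept, candidate):
    """True if `concept` is a (possibly indirect) kind of `candidate`."""
    return candidate in ancestors(concept, IS_A)


if __name__ == "__main__":
    print(ancestors("convertible", IS_A))
    print(ancestors("front fork", IS_PART_OF))
    print(is_a("convertible", "vehicle"))
```

With the graph in place, an application that only knows about 'vehicle' can still make sense of text that mentions a 'convertible', in any language for which labels exist.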

It is the humanly curated Multilingual Knowledge System that enables ML and Artificial Intelligence solutions to work for specific domains with only small amounts of textual data and also for less resourced languages.

Thursday, March 23, 2017

Excel with Enterprise Taxonomy

In multiple blog posts we have mentioned Multilingual Knowledge Systems (MKS) and how they are a core component in several applications, both monolingual and multilingual. An MKS is in fact a multilingual Enterprise Taxonomy.

We have explained what an MKS is and now we want to advise you how to build one.

People often fear the task of creating the basic infrastructure (Enterprise Taxonomy) for their operations in different countries. They think that it is too costly, needs special expertise, and is difficult to maintain, often because of expensive homegrown software that is cumbersome to use. What many do not understand is that they already have this data and have been paying for it for years in their translation contracts.

What you need to do is the following:

  • Collect your terminology data in all the languages you need from your translation provider and send it to us at
  • Assign a responsible knowledge carrier with a good overview of your operations. 

At Coreon we will manage your terminology data and in collaboration with you and your experts our team will structure, verify and QA the result.

A RESTful API makes connectivity straightforward. Your company can easily add a new product/service/operation on top of your Enterprise Taxonomy.

Deploy the power of your MKS in your applications. Contact us and we will get back to you with a proposal that will do more than make you happy: it will boost your career!

Saturday, December 3, 2016

Symbiosis of Language Technology and AI at LT-Accelerate

AI and Natural Language Processing propel each other, because most of human knowledge and interaction is textual. What is textual is globally always multilingual. The LT-Accelerate Conference (Brussels, Nov 20-21) focuses on Text Analytics, AI, and related subjects. The speakers, CEOs of SMEs, project leads and data scientists of larger companies, and NLP/AI researchers, provided amazing insights into the progress the field has made in the last year.

Needless to say, here too the industry’s buzzwords Deep, Neural, and Machine Learning are ubiquitous. Luckily innovators have become much better at explaining the concepts behind them and how to use them. Open Source Software also puts these powerful tools in the hands of smaller teams. Matthew Honnibal from spaCy summarized one use case nicely: “You shall know a word by the company it keeps”.

Michalis Michael, CEO of DigitalMR, sets the bar high for state-of-the-art text analytics. Sentiment accuracy and topic match have to be >80% while significantly reducing noise. Only by supporting all languages can enterprises become omniscient. Human emotions are slightly more complex than Positive/Neutral/Negative. HeartBeat AI, for example, features a comprehensive emotion model. Text analytics needs to be meaningfully integrated with existing surveys and other data sources. Profiling allows customer segmentation by demographics or other derived variables.

These are demanding requirements, but when done right text analytics strongly correlates with survey results, and it is much cheaper. Therefore the industry is bullish that its currently still small 3% share of the $65B spent annually on market research will grow dramatically.

Mike Hyde, Skype’s former Director of Data and Insights, explained why Bots are the new Apps. These bots need to understand language. They must have access to and make sense of enterprise knowledge. And the bots have to be polyglot. A rich playing field for language technology deployed on top of a Multilingual Knowledge System.

Many believe Machine Learning can do miracles. And ML does, as long as there are mountains of good data at hand. For example, Google claims to have outperformed humans in lip reading (automatic speech recognition of vids is at 95-98% accuracy, so lots of data). Microsoft claims that they do as well as humans in describing pics in one sentence. 

However, often there aren't humongous amounts of data available. Obviously “>80%” accuracy doesn’t cut it when applications deal with serious matters such as health, legal, or money. The community agrees that for most use cases Machine Learning needs to be based on human knowledge: on taxonomies, ontologies, and terms.