Monday, January 11, 2021
With a little help from your (AI) friend…
Imagine a chaotic pile of books (of course, the less-organized among us may not have to imagine this) being automatically sorted into shelves, branches, and sub-branches, together with an index to help quickly find a desired book. This describes what our semi‑automatic taxonomization method can do. An initial knowledge tree is produced by Machine Learning (ML), using language models stored in huge neural networks. Clustering algorithms on top of word embeddings automatically convert a haystack of concepts into a structured tree. The final curation of the taxonomy is still carried out by a human, but the most time-consuming and tedious aspects of the task have already been dealt with, and in a consistent way.
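To make the idea of "clustering on top of word embeddings" more concrete, here is a minimal sketch. It is not Coreon's actual pipeline: the six concepts, their three-dimensional vectors, and all function names are invented for illustration, and plain agglomerative clustering with cosine distance is only assumed as one plausible technique. Real embeddings would come from a language model and have hundreds of dimensions.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# Real word embeddings come from a language model and are much larger.
EMBEDDINGS = {
    "vaccine":           (0.9, 0.1, 0.0),
    "booster shot":      (0.8, 0.2, 0.1),
    "antiviral drug":    (0.1, 0.9, 0.1),
    "remdesivir":        (0.2, 0.8, 0.0),
    "social distancing": (0.0, 0.1, 0.9),
    "lockdown":          (0.1, 0.0, 0.8),
}


def cosine_distance(a, b):
    """1 - cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def leaves(tree):
    """All leaf concepts below a node (a leaf is a str, a subtree a tuple)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def centroid(tree):
    """Mean embedding of all leaf concepts below a node."""
    vectors = [EMBEDDINGS[name] for name in leaves(tree)]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def build_tree(concepts):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = list(concepts)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cosine_distance(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


def print_tree(tree, depth=0):
    """Print the tree with indentation; '+' marks an unnamed cluster node."""
    if isinstance(tree, str):
        print("  " * depth + tree)
    else:
        print("  " * depth + "+")
        for child in tree:
            print_tree(child, depth + 1)


tree = build_tree(sorted(EMBEDDINGS))
print_tree(tree)
```

With these toy vectors, the vaccine, drug, and public-health concepts each end up in their own small cluster. Naming the cluster nodes and correcting odd groupings is exactly the part that stays with the human curator.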
‘Cobot’ versus manual
In a study, we benchmarked this collaborative robot approach (ML auto‑taxonomization and human curation) against the manual job done by an expert linguist. Below are the data and task flows of the two approaches:
We aimed to taxonomize 424 concepts related to COVID-19. The traditional manual method was tedious and tiring for the human expert, who took a flat list of concepts and turned them into a systematic knowledge graph by working concept by concept to get everything in its right place. Wading through the list from scratch (including constantly switching contexts – from drugs, to vaccines, to social distancing, for example) made progress on the task difficult to measure. Having no perception of how many clusters of concepts still needed to be created was demotivating.
In contrast, our semi-automatic method started off with a tree of 55 suggested clusters of leaf concepts, each representing a specific context. Of course, ML doesn’t always produce the exact results a human expert would (we hear you, AI skeptics!), so some algorithm-suggested clusters were a bit off. However, the majority of the 55 were pretty accurate. They were ready to be worked on in Coreon’s visual UI, making the human curation task much faster and easier. This also enabled progress to be measured, as the job was done cluster by cluster.
From a business perspective the most important result was that the semi‑automatic method was five(!) times faster. The structured head-start enabled the human curator to work methodically through the concepts. The clustered nature of the ML‑suggested taxonomy would also allow the workload to be distributed – e.g., one expert could focus on one medicine, another on public health measures.
More difficult to measure (but nicely visible below) was the quality of the two resulting taxonomies. While our linguist did a sterling job working manually, the automatic approach produced a tidier taxonomy which is easier for humans to explore and can be effectively consumed by machines for classification, search, or text analytics. Significantly, as the original data was multilingual, the taxonomy can also be leveraged in all languages.
A barrier removed
So, can we auto-taxonomize a list of semantic concepts? The answer is yes, with some human help. The hybrid approach frees knowledge workers from the tedious work in the taxonomization process and offers immediate benefits – being able to navigate swiftly through data, and efficient conceptualization.
Most importantly, though, linking concepts in a knowledge graph enables machines to consume enterprise data. By dramatically lowering the effort, time, and money needed to create taxonomies, managing textual data will become much easier and AI applications will see a tremendous boost.
If you’d like to discover more about our technology and services on auto-taxonomization, feel free to get in touch with us here.
Wednesday, December 9, 2020
Current processes violate GDPR
A GDPR-compliant translation workflow
Anonymization is mandatory
Wednesday, December 12, 2018
NMT Crossing the Rubicon
Industry Getting it Wrong
Different Actors, Different Tools
The new workflow requires a product design that can support tens of millions of mostly occasional expert revisers. The revisers also need to be pointed to the sentences that need revision. This requires multilingual knowledge.
Disruption Powered by Coreon
Wednesday, April 4, 2018
"Ausgebucht - no further seats left!"
“Concept Maps Everywhere”
Back to the event ... as one participant tweeted, concept maps were the dominating topic throughout the days. First a workshop by Annette Weilandt (eccenca) on taxonomy, thesauri, and ontologies, followed by a presentation by Petra Drewer (University Karlsruhe). Petra unveiled a plethora of benefits:
- insight into the domain
- systematic presentation
- clear distinction between concepts
- identification of gaps
- equivalence checks across languages
- new opportunities in AI contexts
DTT 2018 Award for a Master Thesis on Coreon
And then the “i-Tüpfelchen” (cherry on the cake) on Friday afternoon: David Reininghaus received this year’s DTT award for his master’s thesis: “Applying concept maps onto terminology collections: implementation of WIPO terminology with Coreon”. In his work, David analyzed how a real graph-driven technology outperforms simple hyperlink-based approaches: no redundancies, more efficient, less error-prone. David further developed an XSL-based method to transform the MultiTerm / TBX hyperlink-based workarounds into a real graph, visualized in Coreon.
Deutsche Bahn: Terminology-Driven AI Applications
Tom Winter (Deutsche Bahn and President of the DTT) illustrated in his session how terminology boosts AI applications. Even simple synonym expansion already makes the intranet search engines more useful (a search for the unofficial Schaffner now also finds documents in which only the approved Zugbegleiter was used). Other applications are automatic pre-processing of incoming requests in a customer query-answering system or even improving Alexa-driven speech interaction at ticket vending machines … who says terminology is still a niche application?
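The synonym expansion mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions, not Deutsche Bahn's implementation: the terminology entry, the two sample documents, and the function names `expand_query` and `search` are all hypothetical, and matching is a naive case-insensitive substring test.

```python
# Hypothetical terminology: each entry is one concept with an approved term
# and its unofficial synonyms (the Schaffner / Zugbegleiter pair is from the text).
TERMINOLOGY = [
    {"approved": "Zugbegleiter", "synonyms": ["Schaffner"]},
]

# Hypothetical intranet documents, invented for this example.
DOCUMENTS = {
    "doc1": "Der Zugbegleiter kontrolliert die Fahrkarten.",
    "doc2": "Fahrplanänderung im Fernverkehr ab Dezember.",
}


def expand_query(term):
    """Return all terms of the concept the query term belongs to."""
    needle = term.lower()
    for entry in TERMINOLOGY:
        terms = [entry["approved"], *entry["synonyms"]]
        if needle in (t.lower() for t in terms):
            return terms
    return [term]  # unknown term: search for it unchanged


def search(query):
    """Naive substring search over all variants of the query term."""
    variants = [v.lower() for v in expand_query(query)]
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(v in text.lower() for v in variants)
    )


print(search("Schaffner"))  # prints ['doc1']
```

The document never contains the word Schaffner, yet it is found because the query is expanded to every term of the same concept before matching.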
From Language to Knowledge
I am excited about the evolution of the DTT in recent years. How many more participants will we see in spring 2020? I am convinced that the more the DTT community continues to leave the pure documentation niche, and the more the focus moves onto areas such as those that our customer Liebherr or Tom Winter have illustrated, the more the relevance and awareness level of the community will grow. So that the organisers can again proudly announce: Ausgebucht - no more seats left!
Monday, February 12, 2018
Focus Semantic Interoperability
Controlled Vocabulary and Ontologies
IoT Knowledge Systems made Easy
Monday, January 29, 2018
market will grow at an annual rate of about 7%. Companies that focus solely on translation services will continue to find demand for several years to come. The global marketplace, however, also presents new opportunities for language service providers (LSPs) to elevate their services and expand their businesses beyond translation alone.
Other LSPs Are Not The Only Competition
Some of the key benefits that professional translation agencies provide are quality translation and local expertise. To date, machine translation software has had its limitations: poor quality, faulty grammar and syntax, and lack of contextual understanding. LSPs have benefited from these flaws by being able to provide a superior alternative.
However, in 2016, Google introduced Google Neural Machine Translation (GNMT). What GNMT promises to provide is a new machine approach that will directly compete with human translators. Until then, machine translation software had relied on an algorithmic approach that was almost a word-for-word dictionary approach. Therein lies its major flaw: it can only learn through predictive behavior analysis.
Neural networks like GNMT, however, incorporate a more complex structure that mimics the way the human brain processes information. This approach replicates the idea of intuition in many ways, not simply hard definitions. In its first published iteration, Google is already claiming a 60% reduction in errors.
For LSPs, these neural networks mean more–and cheaper–competition in the future. The nature of work for translation agencies will need to change in order to remain relevant.
Marketing Remains the Realm of People
By far, the main edge LSPs will have over machine translation is experience and local culture understanding. For global businesses, marketing their goods and services is not just a matter of translating words. Successful marketing understands the emotional impact of how information is presented.
Subtle differences in words–“discover” versus “find”, for example–have a different impact in sales and marketing than they do in more formal written content. Factor in the additional layer of word choices made in translation, and the tone or intent of the words can drift far from the original purpose.
Marketing content does not automatically translate from one language to another. Even visual imagery can fall in the purview of the cross-cultural marketer. Lingerie, for instance, is promoted differently in conservative countries than in the West. LSPs are in the perfect position to expand their services into marketing, either as outside consultants or even agency-level providers.
Essentially, their ability to localize is a human translator’s greatest differentiator. Whether that’s leveraged for eLearning localization or creating images for a website specifically geared towards a regional audience, this is where an LSP can still shine.
Data Mining Works In Any Language
With today’s enormous output of information, data mining has become a big business of its own. Data miners often refer to their work as “discovering insights.” As they review the clicks of a website, the comments on social media, and results of customer surveys, they inherently build a consumer profile with cultural bias built in.
LSPs with experts in particular languages and cultures offer the opportunity to sift through these insights in the original language, catching nuances that a non-native speaker can miss in translation.
Plan Ahead for Competitive Advantage
The technology world makes no secret of its innovations. LSPs should keep an eye on the changes and trends and plan for the future. By anticipating the coming shift in global demand for translation services, language service providers can be ahead of their competitors instead of playing catch-up.
What a great follow-up to Coreon's last newsletter! We welcome contributions from partner companies and industry experts.
This guest post is written by Rachel Wheeler from Morningside Translations.
Monday, September 18, 2017
A Review on Summer Events
We would like to share some impressions from recent events and conferences. The interesting common denominator was a shared set of themes: how can we leverage and deploy terminology assets in other business processes? How can we deploy the valuable knowledge in terminology assets to support AI, Machine Learning, Internet of Things, and Industry 4.0?
Coreon Innovation Seminar
The Future of Human Expert Knowledge
ILKR 2017: Industry 4.0 meets Language and Knowledge Resources
The first trip brought us to Vienna to the Austrian Standards Institute. The ILKR 2017 took place just ahead of the ISO TC37 annual meeting. As its title suggests, ILKR tackles the question of how multilingual knowledge resources enable Industry 4.0. Thus many presentations explored the possibilities around multilingual knowledge management, knowledge transfer, and new business models.
No Industry 4.0 without Semantics
Our contribution illustrated why the Internet of Things and Industry 4.0 need semantics. When hardware devices speak to each other, they interoperate. This requires a mutual understanding of what they actually do, like “I measure temperature.
MKS) resolve this challenge and how they facilitate interoperability. And how existing terminologies, taxonomies, and ontologies can be re-purposed to become an MKS.
ILKR was followed by a pretty exciting workshop on eCl@ss and Multilingual Product Master Data Management. It had a particular focus on how e-procurement processes benefit from classifications and knowledge systems.
TSS 2017: Terminology Summer School
This year back in Cologne, the TSS is a five-day course that attracts participants worldwide who look for a kick-start in terminology and knowledge resource management. During the first 3 days, TSS usually hovers around the fundamentals of terminology management and its role in business processes. Then we were invited to give two presentations:
- Terminologies and other Knowledge Organization Systems (KOS): What is a KOS, what are its benefits, typical examples, the role it plays in the Semantic Web? What is the difference between a classification, a taxonomy, a thesaurus, and an ontology?
- Knowledge meets Language: Multilingual Concept Maps: How Coreon is a fusion of terminology with taxonomy / ontology, what benefits organizations enjoy by deploying Multilingual Knowledge Systems
Terminology - Ontology Round Table
In mid-August we were invited to a one-day workshop on touch-points between terminology and ontology data and science. It took place at the HS Karlsruhe, sponsored by DIT, and organized by Petra Drewer, Francois Massion, and Donatella Pulitano. The workshop benefited from a valuable mix of participants: academic researchers from the terminology and ontology world, industry and institutional representatives (SAP, DIN, Deutsche Bahn …), and tool vendors. Its goal was to find commonalities and differences between the two disciplines. As a provider of a unified solution we contributed to the workshop by illustrating how Coreon customers benefit from a fusion of terminology with ontology. Experts confirmed our claim that humanly curated resources, i.e. MKS, are indispensable to make Machine Learning work for less resourced domains and languages.
We recommend Petra’s and Francois’ presentation at the upcoming tekom conference on exactly that topic, Wed, 25 Oct, 11:15: Why Artificial Intelligence requires intelligent terminologies (and terminologists)!
See Coreon live this Autumn 2017
And of course, we’d be happy to meet you at upcoming events this autumn:
- LT Industry Summit, 9-11 Oct, Brussels
Meet Jochen Hummel, Coreon CEO and Chairman of the Board of LT Innovate, at the event. Do not miss the opening keynote by Mariya Gabriel, Commissioner for Digital Economy and Society, and Jochen's panel session "Artificial Intelligence: Hype or Reality?" on Oct 10, 9am.
- tekom / tcworld, 24-26 Oct, Stuttgart
Find us in the large hall C2, booth 2/G04 together with our partner company Semantix.
We are proud to present recent highlights, such as brand new filtering capabilities and inline formatting! Learn how Multilingual Knowledge Systems boost AI and Machine Learning solutions and how they make the Internet of Things and Industry 4.0 work. Join us for a product demo Tuesday afternoon, 14:45 room C10.1.