YiSi, a machine translation teacher who cracks down on errors in meaning

Machine Learning | Published February 12, 2019

Most people have seen bad machine translations based on word-for-word equivalents that can lead to confusion and can sometimes be quite funny. But how do you tell a machine it might not be getting 100 percent on its latest translation assignment, and what it needs to improve?

Meet YiSi, a new machine translation teacher developed by the National Research Council of Canada. YiSi is open-source software that examines sentences produced by machine translation and compares them against the original text or a human reference translation. It assigns each translated sentence an accuracy score from 0 to 100, pinpointing translation problems so that developers can improve the translation system.

Research officer Jackie Lo came up with the idea behind YiSi (using databases that map out the relationships between words to score machine translation) and developed YiSi’s prototype code in 2017. She then worked closely with software development specialist Darlene Stewart, who made sure that YiSi clearly tells users how to launch evaluation tasks, does not quit when they provide the wrong database for a particular task, and warns them gracefully when they make a mistake.

You might be wondering who checks YiSi’s work and how we know its scores are accurate. YiSi competes against other automatic translation teachers at international competitions, such as the Metrics Shared Task at the Third Conference on Machine Translation (known as WMT). In 2018, after grading over 400,000 translated sentences, which humans also graded for reference, YiSi came first in evaluating meaning for Turkish-to-English, English-to-Russian, and English-to-Turkish sentence outputs, and was also the best overall performer across all 14 language pairs.

What’s next for Jackie, Darlene, and YiSi? In 2019, the trio plans to apply YiSi to collaborative projects with NRC clients, integrate YiSi into machine translation system development toolkits to further promote its use, and enter the WMT 2019 metrics competition. In the meantime, machine translation developers interested in giving YiSi a try can contact the NRC’s Digital Technologies Research Centre.

Quick Facts

  • Named after the Cantonese word for meaning, YiSi is powered by a massive database of word embeddings: sequences of numbers that map out the ‘distance’ or relationships of meaning between words, taking into account the nature of words and how they work in sentences.
  • To generate the 400,000 sentences that YiSi scores during contests, several systems developed by other research teams translate about 3,000 original sentences between English and seven other languages, in both directions: Czech, German, Estonian, Finnish, Russian, Turkish and Chinese.
  • The National Research Council of Canada’s Digital Technologies Research Centre conducts research that makes sense of data and creates value from information. Its experts specialize in advanced analytics, computer vision, natural language processing, and artificial intelligence.
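
To make the word-embedding idea in the first Quick Fact more concrete, here is a minimal Python sketch. It is not YiSi's actual code or scoring formula. The three-number vectors, the `TOY_EMBEDDINGS` table, and the `toy_sentence_score` function are all invented for illustration; real embeddings are learned from large amounts of text and have hundreds of dimensions. The sketch only shows how "distance in meaning" between words can be measured with cosine similarity, and how such word-level similarities could, under these simplifying assumptions, be rolled up into a 0 to 100 sentence score.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# Real embeddings are learned from text and are much larger.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.4],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def word_similarity(a, b, embeddings=TOY_EMBEDDINGS):
    """Similarity in [0, 1]: exact match, else embedding similarity, else 0."""
    if a == b:
        return 1.0
    if a in embeddings and b in embeddings:
        # Clamp negative cosines to 0 so the score stays non-negative.
        return max(0.0, cosine_similarity(embeddings[a], embeddings[b]))
    return 0.0


def toy_sentence_score(candidate, reference, embeddings=TOY_EMBEDDINGS):
    """Toy 0-to-100 score (an assumption, not YiSi's formula).

    For each reference word, take its best match among the candidate
    words, then average those best matches and scale to 0..100.
    """
    if not candidate or not reference:
        return 0.0
    best = [
        max(word_similarity(ref, cand, embeddings) for cand in candidate)
        for ref in reference
    ]
    return 100.0 * sum(best) / len(best)


if __name__ == "__main__":
    reference = ["the", "cat"]
    print(toy_sentence_score(["the", "kitten"], reference))  # near-synonym
    print(toy_sentence_score(["the", "car"], reference))     # wrong meaning
```

In this toy setup, a translation that uses "kitten" where the reference says "cat" scores higher than one that uses "car", even though neither matches word for word. That is the intuition behind grading meaning rather than exact wording.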