Analysing English-Arabic Machine Translation
eBook - ePub

Google Translate, Microsoft Translator and Sakhr

Zakaryia Almahasees

  1. 184 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Machine Translation (MT) has become widely used throughout the world as a medium of communication between those who live in different countries and speak different languages. However, translation between distant languages constitutes a challenge for machines. Therefore, translation evaluation is poised to play a significant role in the process of designing and developing effective MT systems. This book evaluates three prominent MT systems, Google Translate, Microsoft Translator, and Sakhr, each of which provides translation between English and Arabic. In the book, Almahasees scrutinizes the capacity of the three systems in dealing with translation between English and Arabic in a large corpus taken from various domains, including the United Nations (UN), the World Health Organization (WHO), the Arab League, Petra News Agency reports, and two literary texts: The Old Man and the Sea and The Prophet. The evaluation covers holistic analysis to assess the output of the three systems in terms of Translation Automation User Society (TAUS) adequacy and fluency scales. The text also looks at error analysis to evaluate the systems' output in terms of orthography, lexis, grammar, and semantics at the entire-text level and in terms of lexis, grammar, and semantics at the collocation level.

The research findings contained within this volume provide important feedback about the capabilities of the three MT systems with respect to English-Arabic translation and pave the way for further research on such an important topic. This book will be of interest to scholars and students of translation studies and translation technology.

Information

Publisher
Routledge
Year
2021
ISBN
9781000472806

1 Introduction

DOI: 10.4324/9781003191018-1

1.1 Overview of MT

Translation is an important medium that bridges the gap between human communities and enables them to share knowledge and communicate with one another. The 20th century has been characterised as an era in which technology had a ubiquitous influence on human life. Advances in technology have increased interest in employing technology for human services. One of the most advanced inventions is the internet, which is used by a variety of entities, including individuals, enterprises, professionals, and governments around the world. People are now living in a globalised world, which is sometimes described as a ‘village’ due to the huge spread of the internet in all spheres of life. Importantly, provided they have access to the internet, people have access to a great number of databases all over the world. The combination of MT services with the internet enables people to read, in their own language, documents that were written in other languages. In this domain, there are several kinds of translation systems: some require paid subscriptions, others provide free access, and others again are desktop translation tools, which somewhat limits their popularity, capabilities, and use. The availability of online MT systems for free or as a low-cost service makes it necessary to evaluate the capacity of MT in order to provide end-users with constructive feedback about the strengths and limitations of MT systems.
MT is one of the applications of natural language processing (NLP) that involves the utilisation of computers to translate human languages automatically. It is the process of using computers to provide translation from one language to another via monolingual or bilingual dictionaries, corpus-based methods, neural networks, and other algorithm-based processes. It follows that MT is not simply a matter of substituting words for other words: it entails applying linguistic rules via rule-based machine translation (RBMT) approaches, and it also relies on statistical models via statistical-based machine translation (SBMT) approaches to produce translations based on existing, electronically available corpora. In RBMT approaches, MT relies on linguistic knowledge of syntax, morphology, and semantics in its processing from a source language (SL) into a target language (TL); in SBMT approaches, it relies on statistical models. It can translate words, sentences, and full documents and then present them to end-users. However, MT should not be confused with computer-aided translation (CAT) and its tools, even though both enhance the process of translation by using a computer. MT provides autonomous translation with no human involvement, whereas CAT aims to provide suitable tools to assist human translation, such as translation memories (TMs) and term bases (TBs).
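The point that MT is more than word substitution can be illustrated with a toy sketch. The lexicon, the adjective list, and the single reordering rule below are illustrative assumptions and are not taken from any of the systems discussed in this book; real RBMT systems also handle agreement, definiteness, and much more. The sketch contrasts a word-for-word lookup with a lookup that first applies one linguistic rule, namely that an Arabic adjective follows the noun it modifies.

```python
# Hypothetical four-word lexicon, for illustration only.
LEXICON = {"big": "كبير", "house": "بيت", "new": "جديد", "book": "كتاب"}
ADJECTIVES = {"big", "new"}


def word_for_word(phrase):
    """Substitute each English word with its Arabic entry, keeping English order."""
    return " ".join(LEXICON[word] for word in phrase.lower().split())


def rule_based(phrase):
    """Apply one reordering rule (adjective + noun -> noun + adjective), then look up."""
    words = phrase.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        is_adj_noun_pair = (
            i + 1 < len(words)
            and words[i] in ADJECTIVES
            and words[i + 1] not in ADJECTIVES
        )
        if is_adj_noun_pair:
            # Arabic places the adjective after the noun it modifies.
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    return " ".join(LEXICON[word] for word in reordered)
```

For the phrase "big house", `word_for_word` keeps the English order and so places the adjective before the noun, whereas `rule_based` returns the noun followed by the adjective, which is the order Arabic requires.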
A TM is a database that stores the translation units of previously translated texts to help human translators retrieve previous human translations and handle repeated phrases and sentences easily in certain projects. MemoQ (2019) indicated, “When you start to translate, the CAT tool will show in the translation results pane segments from these databases, which are similar to the segment being translated, these are called matches”. A TB is also an essential part of translation tools, giving translators the ability to develop their own bilingual glossaries in their subject areas. Well-known CAT tools include SDL Trados Studio and MemoQ. SDL Trados is a computer-aided translation tool that supports editing and reviewing translations, manages translation projects, organises terminology, and connects these to MT. It is the most trusted tool in the translation profession: “SDL Trados Studio, the CAT tool is used by over 250,000 translation professionals, provides a range of sophisticated features to help you complete projects more quickly and easily” (SDLTrados, 2019). MemoQ is another CAT system that allows users to edit, revise, and perform their translation tasks quickly. Unlike CAT tools, MT relies on a set of translation approaches to generate translations, either based on linguistic rules, as in RBMT, or data-driven, as in SBMT and neural machine translation (NMT).
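The idea of a translation memory returning "matches" for a new segment can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the way MemoQ or SDL Trados compute matches: the memory contents, the function name `find_matches`, the 0.7 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all choices made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: English source segment -> stored Arabic translation.
TRANSLATION_MEMORY = {
    "The meeting starts at nine.": "يبدأ الاجتماع في التاسعة.",
    "The report is attached.": "التقرير مرفق.",
}


def find_matches(segment, memory, threshold=0.7):
    """Return (score, source, target) tuples scoring at or above threshold, best first."""
    matches = []
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls towards 0.0 as they diverge.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            matches.append((score, source, target))
    return sorted(matches, key=lambda match: match[0], reverse=True)
```

A segment identical to a stored one scores 1.0 (an exact match), while a segment such as "The meeting starts at ten." scores lower but still clears the threshold, so the translator is shown the earlier translation as a starting point rather than translating from scratch.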

1.2 MT systems chosen for the study

The progress of MT can be traced back to the media revolution, the advancement of NLP research and applications, artificial intelligence development, and the need to ensure global communication among people speaking different languages. As a result, MT systems have advanced rapidly, and many MT systems have appeared. The present study has chosen three systems that are globally recognised as widely used: Google Translate, Microsoft Translator, and Sakhr. Each provides fully automatic translation between English and Arabic on a variety of platforms: desktop, mobile devices, online, speech translation, voice, and offline.

1.2.1 Google

Google Translate, which began as a statistical machine translation (SMT) system, is a multilingual translation portal that Google has provided since 2006. It is a free multilingual machine translation service that provides translations for 103 languages and offers “type, talk, snap (image translation), see, and write offline translations from one language into another” (Google Translate, 2018). It is estimated that the service has more than 500 million daily users. The most common translations executed using the service are between English and Spanish, Arabic, Russian, Portuguese and Indonesian (Sommerlad, 2018). Moreover, most users are from outside the United States: “Ninety-two percent of our translations come from outside of the United States, with Brazil topping the list” (Google Translate Blog, 2016). Google Translate has several translation features, such as pronouncing translated texts, highlighting the corresponding words in the source and target text, automatic language detection, suggestion of alternate translations, voice-recognition translation, and image translation. It is widely used within the Mozilla Firefox and Google Chrome web browsers. More importantly, Google Translate has an active translation community and a suggested-translation service, whereby volunteers can select up to five languages in which to help by suggesting translations, with the aim of improving accuracy. In 2016, Google Translate changed its approach from SBMT to neural machine translation (NMT) in order to provide adequate and accurate translations for the chosen languages, including Arabic. NMT is an artificial neural network that predicts sequences of words based on large parallel datasets, with the aim of mapping text from the source language to the target language. It has a large number of highly specialised elements to solve linguistic problems.
Moreover, NMT aims to imitate human translation by using neural networks, which make use of large parallel corpora to learn from both the corpora and all previously translated texts (United Language Group, 2019a). Wu et al. (2016, p. 1) indicate that Google Translate replaced the SBMT approach with an NMT upgrade to “bridge the gap between human and Machine Translation [and] address the issues that face MT, such as rare words”.

1.2.2 Microsoft Translator

Microsoft Translator is a multilingual translation portal provided by Microsoft Corporation to translate texts or entire websites; the service became active in the year 2000 (Menezes, 2019). Currently, it provides free translation for personal use and paid subscriptions for enterprise use, including Microsoft Translate Office, SharePoint, Microsoft Translate Edge, Microsoft Translate Lync, Yammer, Skype Translator, Visual Studio, Internet Explorer, and Microsoft Translator apps for Windows, Windows Phone, iPhone and Apple Watch, and Android phone and Android Wear. Microsoft began its translation service with an SBMT approach, but as of 2016 has replaced it with NMT. It used the automatic evaluation metric Bilingual Evaluation Understudy (BLEU) to verify the accuracy of its output. Recently, Microsoft Translator began to offer free translation services for 60 languages, and this number increases from time to time. It is used to translate words, texts, speech, and images (Microsoft Translator, 2018).
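To give a sense of what an automatic metric such as BLEU measures, the following is a simplified, sentence-level sketch. It is an assumption-laden illustration and not Microsoft's evaluation pipeline: it uses a single reference, whitespace tokenisation, and n-grams only up to bigrams (`max_n=2`), whereas standard BLEU is normally computed at corpus level with n-grams up to length four and often several references. The function names `ngrams` and `simple_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped n-gram precisions, scaled by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Penalise candidates that are shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 1.0, and a candidate sharing no words with it scores 0.0. Because the score depends only on surface overlap with a reference translation, it complements rather than replaces human judgements of adequacy and fluency.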

1.2.3 Sakhr

Sakhr is the first Arab company to have worked on natural language processing of Arabic. It was initially established as a division of Al Alamiah Electronics Co. LLC in 1982 with the far-reaching vision to bring Arabic language support to the new age of information technology. It started with the Sakhr Enterprise Translation system, which is considered a computer-aided translation (CAT) tool for translation teams. The company has provided several products for Arabic, such as Arabic to English Machine Translation, optical character recognition for Arabic-script based languages including Dari and Pashto, Arabic text-to-speech (TTS), and an Arabic search engine (Sakhr, 2019a). Sakhr uses “a hybrid engine that optimizes rules-based and statistical-based processes to achieve rapid, highest accuracy translation. The engine is a full-fledged integrated system embedding NLP processors, formal grammars, transfer lexicons, and enterprise-specific terminology” (Sakhr, 2019b). Zughoul and Abu-Alshaar (2005, p. 1032) show that Sakhr Translation Company has contributed greatly to providing Arabic with the first MT products: “Sakhr has done to Arabic the basic work that was done earlier to European languages and mainly to English … to make Natural Language Processing of Arabic feasible”.
It is worth stating that Sakhr provides only paid bidirectional translation services between English and Arabic, and indeed, at the price point of USD 350 per month, Sakhr’s services are not accessible to everyone. The researcher paid USD 700 for two months of access in order to test the study’s corpus twice, once in 2016 and again in 2017. The study therefore selected Sakhr’s translation service to verify whether Sakhr is a reliable source for rendering English-Arabic translation of different registers, and to provide feedback as to whether commercial paid translation services are better than free translation services.
Sakhr is the first system in the Arab world, while Microsoft Translator and Google Translate provide their service from the US to the rest of the world. Moreover, all three systems offer translation between English and Arabic. However, there are significant differences between the three systems. Google Translate provides translation for 103 languages, and is considered to be “the most popular, followed by Microsoft Translator” (G2 Grid® for Machine Translation, 2019) with 60 languages, and Sakhr, which is a bilingual MT system for English-Arabic only. Sakhr does not publicly release its total number of daily users, though it does list a number of institutions and client partners that use Sakhr as a translation service for their businesses. Both Google Translate and Microsoft Translator support talk, speech, snap, voice, and offline translation, while Sakhr provides desktop translation only. Furthermore, Google Translate and Microsoft Translator provide a free service, while Sakhr provides a paid translation service. Google Translate and Microsoft Translator have adopted NMT, while Sakhr still uses hybrid MT. These architectures will be studied in more depth in Chapter 2, section 2.3.

1.3 English and Arabic

Arabic and English are unrelated languages, with very different structures, semantics, and morphological aspects. English is an Indo-European language. With 379 million people speaking it as their first language (Statista, 2019), it is one of the most commonly spoken languages, after Mandarin and Spanish. It is one of the official languages of the United Nations (UN), the Commonwealth of Nations, Asia Pacific Nations, and other international organisations such as the European Union. In addition, English is by far the most commonly studied foreign language in the world. In the 20th century, English became the international language used globally, even in bilateral relations between countries where English is not spoken as a native language.
On the other hand, Arabic is a member of the Semitic language family. By number of native speakers, Arabic is one of the most commonly spoken languages after Mandarin, Spanish, English, and Hindi. Statista (2019) reports that Arabic is spoken as a first language by 319 million people in 22 countries. Arabic is one of the official languages of the United Nations and the official language of the League of Arab States and the Arab Gulf Union. Moreover, it is one of the official languages of the Organisation of Islamic Cooperation alongside English and French.
Arabic has been divided into three main types: Classical Arabic (CA), Colloquial Arabic, and Modern Standard Arabic (MSA). MSA is both an official and a literary language. Al-Johani (1982, p. 7) explains that MSA “conforms to the norms of Classical Arabic grammar”. Colloquial Arabic is the language of everyday speech and conversation. Written Arabic is similar in all Arabic-speaking countries because each uses MSA as the language of official communication. Al-Suwaiyan (2018, p. 230) states, “It appears that Arabic speaking countries have adopted MSA as their official language, yet dialect varieties of the language exist as well, causing the phenomena of diglossia in Arabic”. Moreover, Zbib and Soudi (2012, p. ix) state, “Arabic-speaking countries are characterized...
