{"id":9635,"date":"2023-10-04T13:00:40","date_gmt":"2023-10-04T13:00:40","guid":{"rendered":"https:\/\/shareperformanceinsight.com\/index.php\/2023\/10\/04\/arabic-ai-could-help-open-doors-for-other-languages\/"},"modified":"2023-10-04T13:00:40","modified_gmt":"2023-10-04T13:00:40","slug":"arabic-ai-could-help-open-doors-for-other-languages","status":"publish","type":"post","link":"https:\/\/shareperformanceinsight.com\/index.php\/2023\/10\/04\/arabic-ai-could-help-open-doors-for-other-languages\/","title":{"rendered":"Arabic AI could help open doors for other languages"},"content":{"rendered":"<p class=\"paragraph inline-placeholder\">      The emergence of ChatGPT and similar platforms has created a buzz around large language model AI \u2013 artificial intelligence trained on vast sets of data from the internet to respond to text commands.  <\/p>\n<p class=\"paragraph inline-placeholder\">      Despite growing interest in AI in the Middle East, Arabic-language models have lagged behind. But a team of academics, researchers and engineers in the United Arab Emirates (UAE) recently unveiled a powerful tool tailored to the world\u2019s Arabic speakers, which its creators say could pave the way for large language model (LLM) systems in other languages that are \u201cunderrepresented in mainstream AI.\u201d  <\/p>\n<p class=\"paragraph inline-placeholder\">      Named after the UAE\u2019s highest mountain, \u201cJais\u201d was created in collaboration between Abu Dhabi\u2019s Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Silicon Valley-based Cerebras Systems, and Inception, a subsidiary of UAE-based AI company G42.  
<\/p>\n<p class=\"paragraph inline-placeholder\">      Although ChatGPT, Meta\u2019s LLaMA and other LLMs have Arabic-language capabilities, they were mostly trained on English data from the internet, according to Timothy Baldwin, acting provost and professor of natural language processing at MBZUAI.  <\/p>\n<p class=\"paragraph inline-placeholder\">      Instead, Jais used English and Arabic datasets, with a focus on content from the Middle East, allowing it to go beyond \u201cwhat anyone else has been able to achieve for Arabic,\u201d Baldwin says.  <\/p>\n<p class=\"paragraph inline-placeholder\">      Languages that use the Latin alphabet dominate the internet, with English by far the most-used. That means datasets are largest in those languages, according to Mohammed Soliman, director of strategic technologies and the cyber security program at the Middle East Institute, in Washington DC.  <\/p>\n<p class=\"paragraph inline-placeholder\">      Typically, language models trained in English have Western-centric data sets. \u201c[These LLMs] lack awareness of other cultures, adversely affecting the user experience for people of diverse backgrounds,\u201d Soliman added.  <\/p>\n<p class=\"paragraph inline-placeholder\">      As a result of its training, Jais understands cultural nuances and dialects, according to MBZUAI, enabling it to be used more widely across different industries. In future releases, the team aims to have Jais work with images, graphs or tabular data instead of just text, broadening its uses and potentially enabling it to interpret medical scans, investment data or data from satellites.  
<\/p>\n<h2 class=\"subheader\">    Different dialects<\/h2>\n<p class=\"paragraph inline-placeholder\">      Arabic is the sixth most spoken language in the world and is rich with a \u201cconstellation\u201d of different dialects, which adds to the complexity of training a language model, Baldwin said. Modern Standard Arabic is typically used for official documents and formal writing, but local dialects are often used on blogs or social media. By training on a diverse set of data, Jais can usually switch between dialects, said Baldwin.  <\/p>\n<p class=\"paragraph inline-placeholder\">      \u201cThere\u2019s certainly room for improvement there, but the focus has been more on the robustness in terms of being able to understand if we do have more informal inputs to the model,\u201d Baldwin added.  <\/p>\n<p class=\"paragraph inline-placeholder\">      A recent update allows Google\u2019s Bard to also understand questions in over a dozen Arabic dialects, including Egyptian colloquial Arabic and Saudi colloquial Arabic; the responses are then returned using Modern Standard Arabic.  <\/p>\n<p class=\"paragraph inline-placeholder\">      Jais has 13 billion parameters, and a 30-billion-parameter update is in the works, Baldwin said. Parameters quantify the size of a language model, but not necessarily the accuracy. ChatGPT-3.5 has around 175 billion parameters, according to OpenAI.  <\/p>\n<p class=\"paragraph inline-placeholder\">      Jais, like other generative AI models, uses instruction tuning to prevent it from creating \u201ctoxic\u201d or \u201charmful\u201d answers, Baldwin said. It won\u2019t generate anything that could lead to self-harm, damage to others, or is suggestive of addiction. The responses it generates adhere to local rules and customs on topics such as homosexuality and drugs.  
<\/p>\n<p class=\"paragraph inline-placeholder\">      MBZUAI had \u201cvarious dialogues\u201d with the UAE government and other institutions around responsible AI, which were referenced when developing Jais, according to Baldwin.  <\/p>\n<h2 class=\"subheader\">    Regional developments<\/h2>\n<p class=\"paragraph inline-placeholder\">      There have been growing efforts in the UAE to develop generative AI systems. It was the first country in the world to appoint a minister of AI, in 2017, and the region\u2019s largest generative AI model, Falcon, was unveiled by Abu Dhabi\u2019s Advanced Technology Research Council and the Technology Innovation Institute (TII) in March, with a new iteration released in September.  <\/p>\n<p class=\"paragraph inline-placeholder\">      Although not currently available in Arabic, Falcon is more powerful than Jais in English, with 180 billion parameters, and outperforms competitors such as Meta\u2019s LLaMA 2 based on its ability to reason, code and complete knowledge tests, according to TII. Unlike Google\u2019s Bard and ChatGPT, Falcon and Jais are open-source, which means their code is available for anyone to use or change.  <\/p>\n<p class=\"paragraph inline-placeholder\">      A 2018 report by consulting firm PwC estimated that the Middle East could accrue up to $320 billion in benefits from AI by 2030. The region wants to make sure it has its \u201cown capabilities\u201d in terms of AI, says Ali Hosseini, PwC\u2019s Middle East chief digital officer.  <\/p>\n<p class=\"paragraph inline-placeholder\">      \u201cSome of the best open-source models are actually developed in our region,\u201d Hosseini added, referencing Falcon and Jais.  <\/p>\n<p class=\"paragraph inline-placeholder\">      Its makers hope that Jais will further the development of generative AI in the Middle East. \u201cThis is kind of step one of many future steps,\u201d Baldwin said. 
\u201cNot just for Arabic large language models, but elsewhere.\u201d  <\/p>\n\n<div>This post appeared first on cnn.com<\/div>","protected":false},"excerpt":{"rendered":"<p>The emergence of ChatGPT and similar platforms has created a buzz around large language model AI \u2013 artificial intelligence trained on vast sets of data from the internet to respond to text commands. Despite growing interest in AI in the Middle East, Arabic-language models have lagged behind. But a team of academics, researchers and engineers <\/p>\n","protected":false},"author":0,"featured_media":9636,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[23],"tags":[],"class_list":{"0":"post-9635","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-world"},"_links":{"self":[{"href":"https:\/\/shareperformanceinsight.com\/index.php\/wp-json\/wp\/v2\/posts\/9635","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/shareperformanceinsight.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/shareperformanceinsight.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/shareperformanceinsight.com\/index.php\/wp-json\/wp\/v2\/comments?post=9635"}],"version-history":[{"count":0,"href":"https:\/\/shareperformanceinsight.com\/index.php\/wp-json\/wp\/v2\/posts\/9635\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/shareperformanceinsight.com\/index.php\/wp-json\/wp\/v2\/media\/9636"}],"wp:attachment":[{"href":"https:\/\/shareperformanceinsight.com\/index.php\/wp-json\/wp\/v2\/media?parent=9635"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/shareperformanceinsight.com\/index.php\/wp-json\/wp\/v2\/categories?post=9635"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/shareperformanceinsight.com\/index.php\/wp-json\/wp\/v2\/tags?post=9
635"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}