AI Fails the Test: Which Languages Are Left Out?
Stanford researchers gave a popular artificial intelligence chatbot a language test.
They asked the bot in Vietnamese to write a traditional poem in the form known as "song thất lục bát," which follows a pattern of lines made up of seven, seven, six, then eight words. When the bot spit out an answer, it wrote a poem but didn't follow the format.
The team tried a different prompt, asking what the proper Vietnamese word was for a mother's younger brother, and it responded with the words for a father's younger and older siblings.
While the use of AI has exploded in the West, much of the rest of the world has been left out of the conversation since most of the technology is trained in English. AI experts worry that the language gap could exacerbate technological inequities and that it could leave many regions and cultures behind.
A delay of even a few years in access to good technology "can potentially lead to a few decades of economic delay," said Sang Truong, a doctoral candidate at the Stanford Artificial Intelligence Laboratory at Stanford University, who is on the team that built a Vietnamese language model and tested it against others.
The tests his team ran found that AI tools across the board could get facts and diction wrong when working with Vietnamese, likely because it is a "low-resource" language by industry standards, which means that there aren't sufficient data sets and content available online for the AI model to learn from.
Low-resource languages are spoken by tens and sometimes hundreds of millions of people around the world, but they yield less digital data because AI tech development and online engagement are centered in the United States and China.
An analysis of top websites by W3Techs, a tech survey company, found that English makes up more than 60% of the internet's language data. While English is widely spoken globally, native English speakers make up only about 5% of the world's population, according to Ethnologue, a research organization that collects language data. Mandarin and Spanish are other examples of languages with a significant online presence and reliable digital data sets.
"Large companies like Google, Apple, OpenAI, for example, have not necessarily trained their models for tools that serve these markets," Chinasa T. Okolo, a fellow at the Center for Technology Innovation at the Brookings Institution, said about communities with low-resource languages. "They don't provide enough market value for them to do so."