The overt-stereotype analysis closely followed the methodology of the covert-stereotype analysis, with the difference that instead of providing the language models with AAE and SAE texts, we provided them with overt descriptions of race (specifically, 'Black'/'black' and 'White'/'white'). This methodological difference is also reflected in a different set of prompts (Supplementary Information); an illustrative sketch contrasting the two probing conditions appears below. As a result, the experimental set-up is very similar to that of existing studies on overt racial bias in language models4,7.

In the Supplementary Information, we include examples of AAE and SAE texts for both settings (Supplementary Tables 1 and 2). Tweets are well suited for matched guise probing because they are a rich source of dialectal variation97,98,99, especially for AAE100,101,102, but matched guise probing can be applied to any type of text. Although we do not consider it here, matched guise probing can in principle also be applied to speech-based models, with the potential advantage that dialectal variation at the phonetic level could be captured more directly, which would make it possible to study dialect prejudice specific to regional variants of AAE23.

To evaluate the familiarity of the models with AAE, we measured their perplexity on the datasets used for the two evaluation settings83,87. Perplexity is defined as the exponentiated average negative log-likelihood of a sequence of tokens111, with lower values indicating higher familiarity. Computing perplexity requires the language models to assign probabilities to full sequences of tokens, which is the case only for GPT2 and GPT3.5. For RoBERTa and T5, we resorted to pseudo-perplexity112 as the measure of familiarity; both measures are illustrated in a sketch below. We excluded GPT4 from this analysis because perplexity cannot be computed through the OpenAI API.

Advances in artificial intelligence and computer graphics technologies have contributed to a relative increase in realism in virtual characters. Improvements in natural language technology and animation algorithms have, in particular, enhanced virtual characters' communicative realism. This paper focuses on culturally relevant paralinguistic cues in nonverbal communication: we model the effects of an English-speaking digital character with different accents on human interactants (i.e., users). Our cultural influence model proposes that paralinguistic realism, in the form of accented speech, is effective in promoting culturally congruent cognition only when it is self-relevant to users. For example, a Chinese or Middle Eastern English accent may be perceived as foreign by individuals who do not share the ethnic cultural background of members of those cultures. However, for individuals who are familiar and affiliate with those cultures (i.e., in-group members who are bicultural), accent not only serves as a marker of shared social identity, it also primes them to adopt culturally appropriate interpretive frames that influence their decision making.
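As a rough illustration of the two probing conditions, the sketch below builds matched prompt pairs: the covert condition embeds an AAE or SAE text, whereas the overt condition names race directly. The template wordings, the example sentences and the helper name build_prompts are hypothetical stand-ins; the prompts actually used in the study are listed in its Supplementary Information.

```python
# Minimal sketch of the covert vs. overt probing conditions.
# Template wordings and example sentences are illustrative only.

COVERT_TEMPLATE = 'A person says: "{text}" The person is'  # filled with AAE or SAE text
OVERT_TEMPLATE = "A person who is {race} is"               # filled with 'Black' or 'White'

def build_prompts(aae_text: str, sae_text: str) -> dict:
    """Return matched prompt pairs for the covert and overt conditions."""
    return {
        "covert": {
            "aae": COVERT_TEMPLATE.format(text=aae_text),
            "sae": COVERT_TEMPLATE.format(text=sae_text),
        },
        "overt": {
            "black": OVERT_TEMPLATE.format(race="Black"),
            "white": OVERT_TEMPLATE.format(race="White"),
        },
    }

pairs = build_prompts(
    aae_text="I been waitin on the bus a hour",         # hypothetical AAE example
    sae_text="I have been waiting on the bus an hour",  # hypothetical SAE example
)
print(pairs["covert"]["aae"])
print(pairs["overt"]["black"])
```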
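The following sketch shows how perplexity for a causal language model and pseudo-perplexity for a masked language model could be computed on a single text, assuming the Hugging Face transformers library and openly available checkpoints (gpt2, roberta-base). The checkpoints and the single-sentence usage are simplifications for illustration; the study applies these measures to whole datasets.

```python
import torch
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

def perplexity(text: str, name: str = "gpt2") -> float:
    """Exponentiated average negative log-likelihood under a causal LM."""
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name).eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL per predicted token
    return torch.exp(loss).item()

def pseudo_perplexity(text: str, name: str = "roberta-base") -> float:
    """Mask each token in turn and score it with a masked LM."""
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name).eval()
    ids = tok(text, return_tensors="pt").input_ids[0]
    nlls = []
    with torch.no_grad():
        for i in range(1, len(ids) - 1):  # skip the special tokens at both ends
            masked = ids.clone()
            masked[i] = tok.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            nlls.append(-torch.log_softmax(logits, dim=-1)[ids[i]])
    return torch.exp(torch.stack(nlls).mean()).item()

print(perplexity("I have been waiting on the bus for an hour."))
print(pseudo_perplexity("I have been waiting on the bus for an hour."))
```

In both cases, lower values indicate that the model is more familiar with the text.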
In 1998, the New Radicals sang the lyric "You only get what you give", and while they most probably were not referring to issues of language and accent recognition in voice technology, they hit the cause right on the nose. When building a voice recognition solution, you only get a system as good and well performing as the data you train it on. From accent rejection to potential racial bias, not only can training data have a huge impact on how the AI behaves, it can also alienate entire groups of people.

All other aspects of the analysis (such as computing adjective association scores) were identical to the analysis for covert stereotypes; an illustrative sketch of such association scores appears below. This also holds for GPT4, for which we again could not conduct the agreement analysis. Language models are pretrained on web-scraped corpora such as WebText46, C4 (ref. 48) and the Pile70, which encode raciolinguistic stereotypes about AAE. Crucially, a growing body of evidence indicates that language models pick up prejudices present in the pretraining corpus72,73,74,75, which would explain how they become prejudiced against speakers of AAE, and why they show varying levels of dialect prejudice as a function of the pretraining corpus. However, the web also abounds with overt racism against African Americans76,77, so we wondered why the language models exhibit much less overt than covert racial prejudice.

In particular, we discuss the most important challenges in dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, language varieties and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging and parsing of such languages and varieties. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages and dialects.

In a 2018 research study conducted in collaboration with the Washington Post, findings from 20 cities across the US showed that big-name smart speakers had a harder time understanding certain accents. For example, the study found that Google Home was 3% less likely to give an accurate response to people with Southern accents than to people with Western accents, and Alexa was 2% less likely to understand people with Midwestern accents than people from the East Coast.

To check for consistency, we also computed the average favourability of the top five adjectives without weighting, which yields similar results (Supplementary Fig. 6); both variants are sketched below. Current language technologies, which are typically trained on Standard American English (SAE), are fraught with performance issues when handling other English variants. "We've seen performance drops in question-answering for Singapore English, for example, of up to 19 percent," says Ziems. At this point, bias in AI and natural language processing (NLP) is such a well-documented and frequent issue in the news that when researchers and journalists point out yet another example of prejudice in language models, it hardly comes as a surprise.
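As a rough sketch of what computing an adjective association score with a causal language model could look like: for each adjective, compare its log-probability after prompts containing the AAE guise with its log-probability after the matched SAE prompts, averaged over prompts. The first-subword approximation, the gpt2 checkpoint and the function names are assumptions made for illustration; the exact scoring procedure is given in the paper's Methods.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def next_word_logprob(prompt: str, word: str) -> float:
    """Log-probability of `word` (approximated by its first subword) after `prompt`."""
    ids = tok(prompt, return_tensors="pt").input_ids
    word_id = tok(" " + word, add_special_tokens=False).input_ids[0]
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.log_softmax(logits, dim=-1)[word_id].item()

def association_score(adjective: str, aae_prompts, sae_prompts) -> float:
    """Mean log-probability difference of the adjective between the two guises."""
    aae = sum(next_word_logprob(p, adjective) for p in aae_prompts) / len(aae_prompts)
    sae = sum(next_word_logprob(p, adjective) for p in sae_prompts) / len(sae_prompts)
    return aae - sae  # > 0: adjective more strongly tied to the AAE guise
```

In the overt setting, the same score can be computed by swapping in the prompts that name race directly.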
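For the consistency check, a weighted and an unweighted mean of the top adjectives' favourability ratings can be contrasted as in the sketch below. The ratings input, the use of association scores as weights and the function name are illustrative assumptions; the exact weighting used is specified in the paper's Methods.

```python
def average_favourability(top_adjectives, favourability, weights=None):
    """Mean favourability of the top adjectives, optionally weighted.

    top_adjectives -- adjectives most strongly associated with the AAE guise
    favourability  -- dict: adjective -> human favourability rating
    weights        -- optional dict: adjective -> weight (e.g. association score);
                      None gives the plain unweighted mean used as the check
    """
    if weights is None:
        return sum(favourability[a] for a in top_adjectives) / len(top_adjectives)
    total = sum(weights[a] for a in top_adjectives)
    return sum(favourability[a] * weights[a] for a in top_adjectives) / total
```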