AI Models Decode Sound Symbolism in Human Language

In a groundbreaking study, researchers Jinhong Jeong, Sunghyun Lee, Jaeyoung Lee, Seonah Han, and Youngjae Yu have delved into the fascinating realm of sound symbolism, exploring how Multimodal Large Language Models (MLLMs) interpret auditory information in human languages. Sound symbolism, a linguistic concept that refers to non-arbitrary associations between phonetic forms and their meanings, has long been a subject of intrigue. The researchers suggest that this phenomenon can serve as a compelling probe into the inner workings of MLLMs, shedding light on how these models process and understand auditory information.

The study focuses on MLLMs’ performance on phonetic iconicity, the idea that certain sounds can evoke specific meanings or images. The researchers examined this across textual (orthographic and IPA) and auditory input forms, spanning up to 25 semantic dimensions, such as sharp versus round. To facilitate this investigation, the team introduced LEX-ICON, an extensive mimetic word dataset comprising 8,052 words from four natural languages (English, French, Japanese, and Korean) and 2,930 systematically constructed pseudo-words. Each word is annotated with semantic features that apply across both text and audio modalities.
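
The paper's exact data schema and prompts are not reproduced here, but a minimal sketch (with hypothetical field names and prompt wording, not the authors' actual format) illustrates how a mimetic-word entry annotated across modalities might be represented and probed:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MimeticWordEntry:
    """One LEX-ICON-style record; field names are illustrative, not the paper's schema."""
    orthography: str          # written form, e.g. "kirakira"
    ipa: str                  # IPA transcription, e.g. "kiɾakiɾa"
    audio_path: str           # path to a spoken rendition of the word
    language: str             # "en", "fr", "ja", "ko", or "pseudo"
    semantic_labels: List[str] = field(default_factory=list)  # e.g. ["bright", "sharp"]

def build_text_probe(entry: MimeticWordEntry, dimension: str = "sharp vs. round") -> str:
    """Construct a hypothetical forced-choice prompt for a text-only query to an MLLM."""
    return (
        f"The word '{entry.orthography}' (IPA: /{entry.ipa}/) is a mimetic word. "
        f"On the dimension '{dimension}', which meaning does its sound suggest? "
        "Answer with one word."
    )

# Example usage with a Japanese mimetic word
entry = MimeticWordEntry(
    orthography="kirakira",
    ipa="kiɾakiɾa",
    audio_path="audio/ja/kirakira.wav",
    language="ja",
    semantic_labels=["bright", "sharp"],
)
print(build_text_probe(entry))
```

The same entry could in principle be routed to the audio channel of a multimodal model via its audio file, allowing the text and audio conditions to be compared on identical semantic dimensions.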

One of the key findings of the study is that MLLMs exhibit phonetic intuitions that align with existing linguistic research across multiple semantic dimensions. This suggests that these models can capture and interpret the subtle associations between sounds and meanings, much like humans do. Additionally, the researchers observed phonosemantic attention patterns, which highlight the models’ focus on iconic phonemes—the specific sounds that are particularly evocative or meaningful.
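
The study's own attention analysis pipeline is not detailed here, but one common way to quantify such a pattern, sketched below with assumed token indexing and model internals rather than the authors' method, is to measure how much of a model's attention mass lands on the token positions that cover iconic phonemes:

```python
import numpy as np

def iconic_attention_share(attn: np.ndarray, iconic_token_idx: list) -> float:
    """
    attn: (num_heads, seq_len, seq_len) attention weights from one layer,
          where attn[h, i, j] is how much query position i attends to key position j.
    iconic_token_idx: positions of tokens covering iconic phonemes (assumed known from
          an alignment between the phoneme sequence and the tokenizer's segmentation).
    Returns the fraction of total attention mass directed at the iconic positions.
    """
    total = attn.sum()
    iconic = attn[:, :, iconic_token_idx].sum()
    return float(iconic / total)

# Toy example: 2 heads, 5 tokens, random attention normalized over key positions
rng = np.random.default_rng(0)
attn = rng.random((2, 5, 5))
attn /= attn.sum(axis=-1, keepdims=True)

# Suppose tokens 1 and 2 cover the phonemes of interest (e.g. /k/ and /i/)
print(iconic_attention_share(attn, [1, 2]))
```

A share well above the proportion expected by chance for those positions would indicate the kind of phonosemantic focus the study describes.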

The study’s results bridge the domains of artificial intelligence and cognitive linguistics, providing the first large-scale, quantitative analyses of phonetic iconicity from the standpoint of MLLM interpretability. This research not only advances our understanding of how language models process auditory information but also opens up new avenues for exploring the intersection of technology and human cognition. As we continue to develop more sophisticated AI models, studies like this one will be crucial in uncovering the intricate ways in which these models understand and interact with human language.