MCAD: AI Brings Soccer to Life for Visually Impaired Fans

Researchers have introduced MCAD, an end-to-end pipeline that generates audio descriptions (AD) for soccer games. The work aims to make sports broadcasts more accessible to visually impaired viewers. The team behind the project, Lipisha Chaudhary, Trisha Mittal, Subhadra Gopalakrishnan, Ifeoma Nwogu, and Jaclyn Pytlarz, tackled the challenge of automating AD generation without relying on human-annotated ground-truth AD.

AD generation has traditionally been limited to high-quality movie content and has relied heavily on meticulously annotated data. MCAD extends it to the dynamic, fast-paced domain of soccer. The system uses a Video Large Language Model (VideoLLM) fine-tuned on publicly available movie AD datasets, so that the model learns the narrative structure and conventions of AD. This fine-tuning lets MCAD generate contextually relevant descriptions for soccer games.

During inference, MCAD draws on several multimodal contextual cues: player identities, soccer events and actions, and live commentary from the game. These cues are combined with the input prompts to the fine-tuned VideoLLM, which then produces the AD text for each video segment. Grounding the prompt in this context is meant to make the generated descriptions both accurate and engaging.
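To make the idea of combining contextual cues with a prompt more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `SegmentContext` class, the `build_prompt` function, the field names, and the prompt wording are all hypothetical, and the call to the VideoLLM itself is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SegmentContext:
    """Contextual cues for one video segment (all field names are hypothetical)."""
    players: List[str] = field(default_factory=list)  # recognized player identities
    events: List[str] = field(default_factory=list)   # detected soccer events/actions
    commentary: str = ""                              # transcribed live commentary


def build_prompt(ctx: SegmentContext) -> str:
    """Assemble a text prompt from whichever cues are available for the segment."""
    # Assumed instruction wording; the article does not give the real prompt.
    parts = ["Describe this soccer clip for a visually impaired audience."]
    if ctx.players:
        parts.append("Players visible: " + ", ".join(ctx.players) + ".")
    if ctx.events:
        parts.append("Detected events: " + ", ".join(ctx.events) + ".")
    if ctx.commentary:
        parts.append('Live commentary: "' + ctx.commentary + '"')
    return "\n".join(parts)


if __name__ == "__main__":
    ctx = SegmentContext(
        players=["Player A", "Player B"],
        events=["corner kick"],
        commentary="What a delivery!",
    )
    print(build_prompt(ctx))
```

In a sketch like this, the resulting string would be passed to the model together with the video segment. Cues that are missing for a segment are simply omitted from the prompt.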

To evaluate the quality of the generated ADs, the researchers introduced a new metric called ARGE-AD. It assesses five key characteristics of an AD: the use of people’s names, the mention of actions and events, an appropriate description length, the absence of pronouns, and overlap with commentary or subtitles. ARGE-AD thus provides a quantitative measure of AD quality.
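The article does not say how ARGE-AD computes or scores these characteristics, so the following Python sketch is only a rough illustration of what checking the five properties could look like. The `check_ad` function, the pronoun list, the word-count thresholds, and the simple string matching are all assumptions made for illustration, not the published metric.

```python
import re
from typing import Dict, List

# Assumed pronoun list; the actual metric may define this differently.
PRONOUNS = {"he", "she", "they", "him", "her", "them", "his", "their", "it", "its"}


def tokens(text: str) -> List[str]:
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def check_ad(ad: str, names: List[str], actions: List[str], commentary: str,
             min_words: int = 5, max_words: int = 30) -> Dict[str, object]:
    """Report the five characteristics for one AD sentence (hypothetical heuristics)."""
    words = tokens(ad)
    lowered = ad.lower()
    commentary_words = set(tokens(commentary))
    shared = [w for w in words if w in commentary_words]
    return {
        "uses_names": any(n.lower() in lowered for n in names),
        "mentions_action": any(a.lower() in lowered for a in actions),
        "length_ok": min_words <= len(words) <= max_words,  # thresholds are assumed
        "no_pronouns": not any(w in PRONOUNS for w in words),
        # Fraction of AD words that also occur in the commentary; reported as a
        # raw value because the article does not say how overlap is scored.
        "commentary_overlap": len(shared) / len(words) if words else 0.0,
    }


if __name__ == "__main__":
    report = check_ad(
        "Player A curls a free kick over the wall.",
        names=["Player A"],
        actions=["free kick"],
        commentary="Brilliant free kick!",
    )
    print(report)
```

A real implementation would likely use more robust name and action detection than substring matching.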

The researchers validated their approach on both movie and soccer datasets. They also contributed audio descriptions for 100 soccer game clips, annotated by two AD experts, as a resource for future research and development in this field.

The research has practical implications. By automating the generation of audio descriptions for soccer games, MCAD could improve accessibility and may open up possibilities for real-time commentary and analysis. The technology could be integrated into broadcasting systems to make live sports more inclusive for visually impaired fans. ARGE-AD, in turn, offers a consistent way to evaluate AD quality in future work.

In conclusion, MCAD is a notable step toward accessible sports broadcasting. It combines multimodal contextual cues with a fine-tuned VideoLLM to generate AD for soccer without human-annotated ground-truth AD. As the technology evolves, it could help make the thrill of the game accessible to all.
