Posts by Collection

portfolio

publications

What does an Audio-Visual Speech Recognition Model know about Visemes?

Published in UK & Ireland Speech Workshop 2025 Book of Abstracts, 2025

Phonemes are the basic unit of speech, and numerous studies have explored the inner workings of end-to-end transformer-based speech models, but these have mainly focused on Audio Speech Recognition (ASR). They have shown that the encoder layers capture and encode a significant amount of phoneme information. The methodologies found in the literature include probing and the use of similarity measures, among others. Considerably less work has been done on the interpretability of Audio-Visual Speech Recognition (AVSR) models. In particular, no work has explored what AVSR models learn about visemes, the visual equivalent of phonemes. Our work therefore takes the concepts developed for ASR and applies them to AV-HuBERT, performing a thorough analysis to establish what the model learns about visemes.

Recommended citation: Papadopoulos, Aristeidis. (2025). "What does an Audio-Visual Speech Recognition Model know about Visemes?"
Download Paper

talks

teaching

Teaching Assistant for 4C5 - Digital Signal Processing

Undergraduate / Postgraduate course, Trinity College, Department of Electrical and Electronic Engineering, 2024

Teaching Assistant for 4C5 - Digital Signal Processing. Duties include lab sessions in MATLAB and grading of assignments. Course description can be found in link.

Teaching Assistant for 1E6 - Electrical Engineering

Undergraduate course, Trinity College, Department of Electrical and Electronic Engineering, 2025

Teaching Assistant for 1E6 - Electrical Engineering. Duties include tutorials, lab sessions with circuits, and grading of assignments. Course description can be found in link.