Authors: Dr. A. Syed Mustafa, Hanan Abdul Razack, Kifulla Khan, Anushka Bhakare, Ibaad Khan
Date of Publication: 7th January 2026
Abstract: LinguaLive is a real-time multilingual translation platform built by fine-tuning and adapting the SeamlessM4T model for Indian languages. By integrating Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS) within a unified deep learning framework, LinguaLive enhances the model’s performance for low-resource regional languages such as Tamil, Hindi, and Malayalam. The system fine-tunes SeamlessM4T on datasets such as jhu-clsp/seamless-align and pszemraj/t2t_re_pretrain-small, which together provide large-scale multilingual speech-text alignment and text-to-text pretraining data suited to improving generalization across languages. Built with a Python-based backend and a React interface, it delivers an efficient, interactive real-time translation experience. Experimental evaluation shows significant improvements over the baseline SeamlessM4T: BLEU scores increased by up to 19.4%, word error rate (WER) fell by 22.6%, and latency fell by 10.3%. These results validate the effectiveness of domain-specific fine-tuning in addressing linguistic diversity and speech variability across Indian languages. LinguaLive thus establishes a practical pathway for inclusive, context-aware, and high-fidelity multilingual communication in sectors such as education, healthcare, and cross-cultural collaboration.
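To make the reported WER figure concrete, the following is a minimal Python sketch of how word error rate and a percentage change against a baseline can be computed. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_change` are hypothetical, whitespace tokenization is an assumption, and the abstract does not state whether its percentages are relative or absolute, so the relative form shown here is only one possible reading.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace; no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                 # deletion of a reference word
                curr[j - 1] + 1,             # insertion of a hypothesis word
                prev[j - 1] + substitution,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_change(baseline: float, tuned: float) -> float:
    """Percentage change of `tuned` relative to `baseline` (negative = reduction)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (tuned - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Illustrative sentences only; these are not taken from the paper's data.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.3f}")
    print(f"Relative change: {relative_change(0.50, 0.40):.1f}%")
```

Under this reading, a negative relative change in WER or latency is an improvement, whereas a positive relative change in BLEU is an improvement.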
Reference: