I am a PhD student at the University of Surrey's People-Centred AI Institute, specializing in multimodal deep learning at the intersection of vision, language, and audio processing. My research focuses on using foundation models to learn audio-visual correspondence and address real-world challenges. I work in the Universal Perception (UP) Lab under the guidance of Dr. Xiatian Zhu and Dr. Diptesh Kanojia.
Before starting my PhD, I was a researcher at TCS-Research, Mumbai, under Dr. Sunil Kumar Kopparapu, where I developed methods for audio event detection, multimodal emotion recognition, and pathological speech processing. This work led to publications and patents in speech and audio signal processing.

   

Research Experience

Researcher
Speech and NLP team, TCS-Research,
Mumbai, India.
2019.08 ~ 2022.09
Audio and Speech Signal Processing, Few-shot Audio Event Detection, Audio Captioning.


Research Intern
Speech and NLP team, TCS-Research,
Mumbai, India.
2019.01 ~ 2019.06
End-to-End Spoken Language Understanding.