I’m a PhD student at Mila - Quebec AI Institute and the University of Montreal, working with Aishwarya Agrawal. I’m also a visiting researcher on the Multimodal Foundation Models Team at ServiceNow Research, working with Sai Rajeswar.
Prior to Mila, I received an MSc in CS from the University of Saskatchewan and a BSc in CS from Noakhali Science and Technology University.
Building AI systems that truly understand the physical world is an exciting frontier. My research focuses on (1) learning rich visual representations that capture the true structure of the world, and (2) developing controllable generative (diffusion) world models. Both directions connect naturally to my interest in alignment and economic utility.
See my Google Scholar for a full list of publications.
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
DL4C Workshop @ ICLR'25
CulturalVQA: Benchmarking Vision Language Models for Cultural Knowledge
EMNLP'24 (Oral)
My writings on how-to-cs-grad, AI research, and systemic issues in Bangladesh.