PhD Student

Carnegie Mellon University

Ayush Jain

I am a PhD student in the Robotics Institute at Carnegie Mellon University, working with Prof. Katerina Fragkiadaki. Previously, I received an M.S. in Robotics from Carnegie Mellon University, advised by Prof. Katerina Fragkiadaki, and a B.E. in Computer Science from BITS Pilani, Pilani Campus. During my undergraduate studies, I spent one year as a research assistant with Prof. Katerina Fragkiadaki in the Machine Learning Department at Carnegie Mellon University. I am supported in part by the CMU Robotics Vision Fellowship (AY 24-25).

I am interested in computer vision and its applications to robot learning. Lately, I have been exploring the relationship between visual modalities: how are 2D, 3D, and video modalities related, and how can one benefit another? In my free time, I love playing racquet sports (tennis, squash, and badminton), listening to audiobooks and podcasts, watching (and playing) cricket, learning music, and travelling.

Virtual MLC office hours (open to everyone): I set aside two hours every week to help newcomers to robotics, computer vision, and machine learning. If you're new to these areas and need help with research, grad school applications, or anything else, please feel free to book a slot. Here is a living Google Doc of resources addressing questions commonly asked in these office hours.

Interests

Computer Vision
Robotics

Education

PhD in Robotics, 2023-Present
Carnegie Mellon University

M.S. in Robotics, 2021-23 (Thesis PDF, Thesis Talk)
Carnegie Mellon University

B.E. in Computer Science, 2017-21
BITS Pilani, Pilani Campus

News

[May 2025] I’ll be spending my summer as a research intern at Meta with Fan Zhang and Adam Harley.

[April 2025] Awarded Meta AI Mentorship Fellowship (AY 25-26). I'll be a visiting researcher at Meta AI working with Roozbeh Mottaghi.

[Nov 2024] Awarded CMU Robotics Vision Fellowship (AY 24-25).

[June 2024] I’ll be presenting ODIN as a highlight paper at CVPR 2024. I’ll also be giving an oral talk at the Causal and Object-Centric Representations for Robotics workshop at CVPR 2024.

[May 2024] I’ll be spending my summer as a research intern at Meta with Franziska Meier and Sasha Sax.

[Apr 2024] Received the Outstanding Reviewer Award at CVPR 2024.

[Sep 2023] Received the Outstanding Reviewer Award at ICCV 2023.

[Apr 2023] I’ll be continuing in the Robotics Institute at CMU for my PhD.

[Oct 2022] I’ll be presenting BUTD-DETR at ECCV 2022.

[May 2022] I’ll be spending my summer as a research intern at Apple with Miguel and Navdeep.

[Dec 2021] Our team was selected for the Alexa Prize SimBot Challenge. We will be working on advancing teachable embodied household agents.

[Aug 2021] I’ll be starting my Masters in Robotics at Carnegie Mellon University. I will also continue as a Research Assistant with Prof. Katerina Fragkiadaki.

Recent Publications

Grounded Reinforcement Learning for Visual Reasoning

Grounding reasoning chains in visual observations via RL leads to better performance on visual reasoning tasks.

Gabriel Sarch, Snigdha Saha, Naitik Khandelwal, Ayush Jain, Michael Tarr, Aviral Kumar, Katerina Fragkiadaki

Under Review

PDF Code Project Twitter

Unifying 2D and 3D Vision-Language Understanding

UniVLG achieves state-of-the-art performance in 3D vision-language understanding by unifying 2D and 3D visual processing and leveraging 2D pre-training.

Ayush Jain*, Alexander Swerdlow*, Yuzhou Wang, Sergio Arnaud, Ada Martin, Alexander Sax, Franziska Meier, Katerina Fragkiadaki

ICML 2025

PDF Code Project Twitter

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Operating directly on standard sensor data, Locate 3D brings spatial intelligence to robotics and augmented reality applications in real-world settings.

Paul McVay, Sergio Arnaud, Ada Martin, Arjun Majumdar, Krishna Murthy Jatavallabhula, Phillip Thomas, Ruslan Partsey, Daniel Dugas, Abha Gejji, Alexander Sax, Vincent-Pierre Berges, Mikael Henaff, Ayush Jain, Ang Cao, Ishita Prasad, Mrinal Kalakrishnan, Michael Rabbat, Nicolas Ballas, Mido Assran, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

ICML 2025 (Spotlight)

PDF Code Project Twitter

LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding

Image-based supervision and differentiable rendering alone can train 3D vision-language grounding models, which allows us to distill 3D grounding models from a 2D model.

Ang Cao, Sergio Arnaud, Oleksandr Maksymets, Jianing Yang, Ayush Jain, Sriram Yenamandra, Ada Martin, Vincent-Pierre Berges, Paul McVay, Ruslan Partsey, Aravind Rajeswaran, Franziska Meier, Justin Johnson, Jeong Joon Park, Alexander Sax

ICML 2025

PDF Code Project

ODIN: A Single Model for 2D and 3D Segmentation

ODIN is a single model for 2D and 3D segmentation.

Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

CVPR 2024 (Highlight)

PDF Code Project Poster Twitter

Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following

Diffusion-ES is a flexible test-time trajectory optimization method for arbitrary reward functions that combines generative trajectory models and sampling-based search.

Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki

CVPR 2024

PDF Code Project

Energy-based Models as Zero-Shot Planners for Compositional Scene Rearrangement

We propose a model that maps spatial rearrangement instructions to goal scene configurations via gradient descent on a set of relational energy functions.

Nikolaos Gkanatsios*, Ayush Jain*, Zhou Xian, Yunchu Zhang, Christopher G. Atkeson, Katerina Fragkiadaki (* Equal Contribution)

RSS 2023

PDF Code Project

Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds

Most models tasked to ground referential utterances in 2D and 3D scenes learn to select the referred …

Ayush Jain*, Nikolaos Gkanatsios*, Ishita Mediratta, Katerina Fragkiadaki (* Equal Contribution)

ECCV 2022

PDF Code Project Blog Twitter

Move to See Better: Towards Self-Improving Embodied Object Detection

Humans learn to better understand the world by moving around their environment to get more …

Zhaoyuan Fang*, Ayush Jain*, Gabriel Sarch*, Adam Harley, Katerina Fragkiadaki (* Equal Contribution)

BMVC 2021

PDF Code Project

AI-enabled Object Detection in Unmanned Aerial Vehicles for Edge Computing Applications

Unmanned Aerial Vehicles (UAVs) are emerging as a powerful tool for various industrial and smart …

Ayush Jain*, Rohit Ramaprasad*, Pratik Narang, Murari Mandal, Vinay Chamola, F. Richard Yu, Mohsen Guizani (* Equal Contribution)

IEEE Network 2021

PDF Code Project

VisDrone-DET2020: The Vision Meets Drone Object Detection in Image Challenge Results

The Vision Meets Drone Object Detection in Image Challenge (VisDrone-DET 2020) is the third annual …

Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, Qinghua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ayush Jain, Pratik Narang, et al.

ECCV 2020 Workshop

PDF

NukeBERT: A Pre-trained language model for Low Resource Nuclear Domain

Significant advances have been made in recent years on Natural Language Processing with machines …

Ayush Jain*, N.M. Meenachi, B. Venkatraman

arXiv 2020

PDF Code Project Slides

Experience

Research Scientist Intern

Research Intern

Apple

May 2022 – Aug 2022 Cupertino, California, USA

Graduate Research Assistant

Carnegie Mellon University

Aug 2021 – Present Pittsburgh, Pennsylvania, USA

With Prof. Katerina Fragkiadaki, I am working on language grounding in static 2D and 3D scenes, robot manipulation that follows language instructions, and instruction following in indoor household environments.

Research Associate

Carnegie Mellon University

May 2020 – Jul 2021 Pittsburgh, Pennsylvania, USA

Under the supervision of Prof. Katerina Fragkiadaki, I worked on enabling an embodied agent to learn about objects without ground-truth supervision, simply by moving around an unseen 3D environment.

Research Assistant

BITS PILANI

Aug 2019 – Apr 2020 Rajasthan, India

As a research assistant, I worked under the guidance of Prof. Pratik Narang in computer vision. My team and I proposed a novel architectural design that improved the performance of previous methods by a substantial margin.


Academic Service and Teaching