PhD Student

Carnegie Mellon University

Ayush Jain

I am a PhD student in the Robotics Institute at Carnegie Mellon University, working with Prof. Katerina Fragkiadaki. Previously, I received an M.S. in Robotics from Carnegie Mellon University, advised by Prof. Katerina Fragkiadaki, and a B.E. in Computer Science from BITS Pilani, Pilani Campus, during which I spent one year working with Prof. Katerina Fragkiadaki as a research assistant in the Machine Learning Department at Carnegie Mellon University.

I am interested in Computer Vision and its application to robot learning. Lately, I have been interested in the relationship between various visual modalities: how are 2D, 3D, and video modalities related? How can one benefit the other? In my free time, I love listening to audiobooks and podcasts, playing badminton, watching (and playing) cricket, and travelling.

Virtual MLC office hours (open to everyone): I dedicate two hours every week to mentoring newcomers in the fields of robotics, computer vision, and machine learning. If you're new to these areas and need help with research, grad school applications, or anything else, please feel free to book a slot.

Interests

  • Computer Vision
  • Robotics

Education

  • PhD in Robotics, 2023-Present

    Carnegie Mellon University

  • Masters in Robotics, 2021-23 (Thesis PDF, Thesis Talk)

    Carnegie Mellon University

  • B.E. in Computer Science, 2017-21

    BITS PILANI, Pilani Campus

News

[June 2024] I’ll be presenting ODIN as a highlight paper at CVPR 2024. I’ll also be giving an oral talk at the Causal and Object-Centric Representations for Robotics workshop at CVPR 2024.

[May 2024] I’ll be spending my summer as a research intern at Meta with Franziska Meier and Sasha Sax.

[Apr 2024] Received the outstanding reviewer award at CVPR 2024.

[Sep 2023] Received the outstanding reviewer award at ICCV 2023.

[Apr 2023] I’ll be continuing in the Robotics Institute at CMU for my PhD.

[Oct 2022] I’ll be presenting BUTD-DETR at ECCV 2022.

[May 2022] I’ll be spending my summer as a research intern at Apple with Miguel and Navdeep.

[Dec 2021] Our team was selected for the Alexa Prize SimBot Challenge. We will be working on advancing teachable embodied household agents.

[Aug 2021] I’ll be starting my Masters in Robotics at Carnegie Mellon University. I will also continue as a Research Assistant with Prof. Katerina Fragkiadaki.

Recent Publications

ODIN: A Single Model for 2D and 3D Segmentation

ODIN is a single model for 2D and 3D segmentation.
CVPR 2024 (Highlight)

Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following

Diffusion-ES is a flexible test-time trajectory optimization method for arbitrary reward functions that combines generative trajectory models and sampling-based search.
CVPR 2024

Energy-based Models as Zero-Shot Planners for Compositional Scene Rearrangement

We propose a model that maps spatial rearrangement instructions to goal scene configurations via gradient descent on a set of relational energy functions.
RSS 2023

Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds

Most models tasked to ground referential utterances in 2D and 3D scenes learn to select the referred …
ECCV 2022.

Move to See Better: Towards Self-Improving Embodied Object Detection

Humans learn to better understand the world by moving around their environment to get more …
BMVC 2021.

AI-enabled Object Detection in Unmanned Aerial Vehicles for Edge Computing Applications

Unmanned Aerial Vehicles (UAVs) are emerging as a powerful tool for various industrial and smart …
IEEE Network. 2021.

VisDrone-DET2020: The Vision Meets Drone Object Detection in Image Challenge Results

The Vision Meets Drone Object Detection in Image Challenge (VisDrone-DET 2020) is the third annual …
ECCV 2020 Workshop.

NukeBERT: A Pre-trained language model for Low Resource Nuclear Domain

Significant advances have been made in recent years on Natural Language Processing with machines …
arXiv 2020.

Experience

Research Scientist Intern

Meta

May 2024 – Present Pittsburgh, Pennsylvania, USA

Research Intern

Apple

May 2022 – August 2022 Cupertino, California, USA

Graduate Research Assistant

Carnegie Mellon University

Aug 2021 – Present Pittsburgh, Pennsylvania, USA
With Prof. Katerina Fragkiadaki, I am working on language grounding in static 2D and 3D scenes, robot manipulation following language instructions, and instruction following in indoor household environments.

Research Associate

Carnegie Mellon University

May 2020 – Jul 2021 Pittsburgh, Pennsylvania, USA
Under the supervision of Prof. Katerina Fragkiadaki, I worked on enabling an embodied agent to learn about objects in an unseen 3D environment without ground-truth supervision, just by moving around.

Research Assistant

BITS PILANI

Aug 2019 – Apr 2020 Rajasthan, India
As a research assistant, I worked under the guidance of Prof. Pratik Narang in computer vision. My team and I proposed a novel architectural design that improved the performance of previous methods by a substantial margin.

Academic Service and Teaching

I reviewed for CVPR 2024 (outstanding reviewer award), ICCV 2023 (outstanding reviewer award), NeurIPS 2023-24, AAAI 2023, CVPR 2022-23, ECCV 2022-24, BMVC 2020-21, and TPAMI 2021-22.

I have served as a teaching assistant for the following courses:

  • Advanced Computer Vision, Fall 2024, CMU
  • Learning for 3D Vision, Spring 2024, CMU
  • CS F407 Artificial Intelligence, Fall 2020, BITS Pilani
  • CS F464 Machine Learning, Spring 2020, BITS Pilani
  • CS F111 Computer Programming, Spring 2019 and Fall 2019, BITS Pilani