Junhao CAI

Shanghai AI Lab
Shanghai
China

About Me

I am a Researcher at Shanghai AI Laboratory, working on Vision-Language-Action (VLA) models for robotic manipulation and embodied intelligence. I received my Ph.D. from the Robotics Institute, The Hong Kong University of Science and Technology, where I was advised by Prof. Qifeng Chen and Prof. Michael Yu Wang. Here is my CV.

My work spans robotic perception, object pose and physical property estimation, and multimodal representation learning, with the goal of enabling robots to understand and interact with the physical world. Feel free to contact me via email.

Education

M.Eng. Student | Sun Yat-sen University
Time: Sep 2017 - Jul 2019. Supervisor: Prof. Hui Cheng

Undergraduate Student | South China Normal University
Time: Sep 2013 - Jul 2017. Supervisor: Prof. Yun Xue

Experience

Researcher | Embodied Intelligence Center, Shanghai AI Lab
Time: Aug 2025 - Present.

Research Intern | Embodied Intelligence Center, Shanghai AI Lab
Time: Jan 2025 - Jun 2025. Mentor: Jia Zeng.

Research Intern | Applied Computer Vision Lab, Institute for Intelligent Computing, Alibaba
Time: Jun 2023 - Jun 2024. Mentors: Yisheng He, Weihao Yuan, and Zilong Dong.

Research Assistant | RAPID Lab, Sun Yat-sen University
Time: Jul 2019 - Aug 2020. Supervisor: Prof. Hui Cheng

Selected Publications [Google Scholar]

InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
Junhao Cai, Zetao Cai, Jiafei Cao, Yilun Chen, Zeyu He, Lei Jiang, Hang Li, Hengjie Li, Yang Li, Yufei Liu, Yanan Lu, Qi Lv, Haoxiang Ma, Jiangmiao Pang, Yu Qiao, Zherui Qiu, Yanqing Shen, Xu Shi, Yang Tian, Bolun Wang, Hanqing Wang, Jiaheng Wang, Tai Wang, Xueyuan Wei, Chao Wu, Yiman Xie, Boyang Xing, Yuqiang Yang, Yuyin Yang, Qiaojun Yu, Feng Yuan, Jia Zeng, Jingjing Zhang, Shenghan Zhang, Shi Zhang, Zhuoma Zhaxi, Bowen Zhou, Yuanzhen Zhou, Yunsong Zhou, Hongrui Zhu, Yangkun Zhu, Yuchen Zhu
Preprint, 2026
[Paper] [Project Page] [Github Page]

InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy
Yang Tian, Yuyin Yang, Yiman Xie, Zetao Cai, Xu Shi, Ning Gao, Hangxu Liu, Xuekun Jiang, Zherui Qiu, Feng Yuan, Yaping Li, Ping Wang, Junhao Cai, Jia Zeng, Hao Dong, Jiangmiao Pang
Preprint, 2025
[Paper] [Project Page]

Generative Artificial Intelligence in Robotic Manipulation: A Survey
Kun Zhang, Peng Yun, Jun Cen, Junhao Cai, Didi Zhu, Hangjie Yuan, Chao Zhao, Tao Feng, Michael Yu Wang, Qifeng Chen, Jia Pan, Wei Zhang, Bo Yang, and Hua Chen
Preprint, 2025
[Paper] [Github Page]

Gaussian-Informed Continuum for Physical Property Identification and Simulation
Junhao Cai*, Yuji Yang*, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, and Qifeng Chen
[NeurIPS 2024] Conference on Neural Information Processing Systems, 2024 (Oral | Percentage: 0.46%)
[Paper] [Project Page]

Open-Vocabulary Category-Level Object Pose and Size Estimation
Junhao Cai*, Yisheng He*, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, and Qifeng Chen
[RA-L 2024] IEEE Robotics and Automation Letters, 2024
[Paper] [Project Page]

Volumetric-based Contact Point Detection for 7-DoF Grasping
Junhao Cai, Jingcheng Su, Zida Zhou, Hui Cheng, Qifeng Chen, Michael Yu Wang
[CoRL 2022] Conference on Robot Learning, 2022
[Paper] [Code]

Real-Time Collision-Free Grasp Pose Detection With Geometry-Aware Refinement Using High-Resolution Volume
Junhao Cai, Jun Cen, Haokun Wang, Michael Yu Wang
[RA-L with ICRA 2022] IEEE Robotics and Automation Letters, 2022
[Paper] [Project Page]

CCAN: Constraint Co-Attention Network for Instance Grasping
Junhao Cai, Xuefeng Tao, Hui Cheng, and Zhanpeng Zhang
[ICRA 2020] IEEE International Conference on Robotics and Automation, 2020
[Paper]

MetaGrasp: Data Efficient Grasping by Affordance Interpreter Network
Junhao Cai, Hui Cheng, Zhanpeng Zhang, and Jingcheng Su
[ICRA 2019] IEEE International Conference on Robotics and Automation, 2019
[Paper] [Code]

Grasping Novel Objects by Semi-supervised Domain Adaptation
Junhao Cai, Zhanpeng Zhang, and Hui Cheng
[RCAR 2019] IEEE International Conference on Real-time Computing and Robotics, 2019
[Paper]

Fusing Object Context to Detect Functional Area for Cognitive Robots
Hui Cheng (Supervisor), Junhao Cai, Quande Liu, Zhanpeng Zhang, Kai Yang, Chen Change Loy, and Liang Lin
[ICRA 2018] IEEE International Conference on Robotics and Automation, 2018
[Paper]

Teaching

  • COMP 5214
  • COMP 4471
  • EESM 5900V

Services

  • Conference and Journal Reviewer:
     IEEE Robotics and Automation Letters (RA-L)
     IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)
     Conference on Neural Information Processing Systems (NeurIPS)
     International Conference on Learning Representations (ICLR)
     IEEE International Conference on Robotics and Automation (ICRA)
     IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Awards

  • Research Travel Grant, 2022, HKUST.
  • Postgraduate Studentship, 2020-2024, HKUST.
  • Second Prize Scholarship, 2017-2018, SYSU. (Top 10%)
  • First Prize Scholarship, 2015-2016, SCNU. (Top 2%)
  • National Scholarship, 2015-2016, Ministry of Education. (Top 0.5%)