Shanghai AI Lab
Shanghai
China
I am a Researcher at Shanghai AI Laboratory, working on Vision-Language-Action (VLA) models for robotic manipulation and embodied intelligence. I received my Ph.D. from the Robotics Institute, The Hong Kong University of Science and Technology, where I was advised by Prof. Qifeng Chen and Prof. Michael Yu Wang. Here is my CV.
My work spans robotic perception, object pose and physical property estimation, and multimodal representation learning, with the goal of enabling robots to understand and interact with the physical world. Feel free to contact me via email.
Ph.D. Student | Robotics Institute, The Hong Kong University of Science and Technology
Time: Sep 2020 - May 2025. Supervisors: Prof. Qifeng Chen and Prof. Michael Yu Wang
M.Eng. Student | Sun Yat-sen University
Time: Sep 2017 - Jul 2019. Supervisor: Prof. Hui Cheng
Undergraduate Student | South China Normal University
Time: Sep 2013 - Jul 2017. Supervisor: Prof. Yun Xue
Researcher | Embodied Intelligence Center, Shanghai AI Lab
Time: Aug 2025 - Now.
Research Intern | Embodied Intelligence Center, Shanghai AI Lab
Time: Jan 2025 - Jun 2025. Mentor: Jia Zeng.
Research Intern | Applied Computer Vision Lab, Institute for Intelligent Computing, Alibaba
Time: Jun 2023 - Jun 2024. Mentors: Yisheng He, Weihao Yuan, and Zilong Dong.
Research Assistant | RAPID Lab, Sun Yat-sen University
Time: Jul 2019 - Aug 2020. Supervisor: Prof. Hui Cheng
Research Intern | Robotic Group, SenseTime Group Limited
Time: Mar 2017 - Aug 2017. Mentors: Dr. Tao Zhou and Dr. Zhanpeng Zhang
InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
Junhao Cai, Zetao Cai, Jiafei Cao, Yilun Chen, Zeyu He, Lei Jiang, Hang Li, Hengjie Li, Yang Li, Yufei Liu, Yanan Lu, Qi Lv, Haoxiang Ma, Jiangmiao Pang, Yu Qiao, Zherui Qiu, Yanqing Shen, Xu Shi, Yang Tian, Bolun Wang, Hanqing Wang, Jiaheng Wang, Tai Wang, Xueyuan Wei, Chao Wu, Yiman Xie, Boyang Xing, Yuqiang Yang, Yuyin Yang, Qiaojun Yu, Feng Yuan, Jia Zeng, Jingjing Zhang, Shenghan Zhang, Shi Zhang, Zhuoma Zhaxi, Bowen Zhou, Yuanzhen Zhou, Yunsong Zhou, Hongrui Zhu, Yangkun Zhu, Yuchen Zhu
Preprint, 2026
[Paper]
[Project Page]
[GitHub Page]
InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy
Yang Tian, Yuyin Yang, Yiman Xie, Zetao Cai, Xu Shi, Ning Gao, Hangxu Liu, Xuekun Jiang, Zherui Qiu, Feng Yuan, Yaping Li, Ping Wang, Junhao Cai, Jia Zeng, Hao Dong, Jiangmiao Pang
Preprint, 2025
[Paper]
[Project Page]
Generative Artificial Intelligence in Robotic Manipulation: A Survey
Kun Zhang, Peng Yun, Jun Cen, Junhao Cai, Didi Zhu, Hangjie Yuan, Chao Zhao, Tao Feng, Michael Yu Wang, Qifeng Chen, Jia Pan, Wei Zhang, Bo Yang, and Hua Chen
Preprint, 2025
[Paper]
[GitHub Page]
Gaussian-Informed Continuum for Physical Property Identification and Simulation
Junhao Cai*, Yuji Yang*, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, and Qifeng Chen
[NeurIPS 2024] Conference on Neural Information Processing Systems, 2024 (Oral | Percentage: 0.46%)
[Paper]
[Project Page]
Open-Vocabulary Category-Level Object Pose and Size Estimation
Junhao Cai*, Yisheng He*, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, and Qifeng Chen
[RA-L 2024] IEEE Robotics and Automation Letters, 2024
[Paper]
[Project Page]
Real-Time Collision-Free Grasp Pose Detection With Geometry-Aware Refinement Using High-Resolution Volume
Junhao Cai, Jun Cen, Haokun Wang, Michael Yu Wang
[RA-L with ICRA 2022] IEEE Robotics and Automation Letters, 2022
[Paper]
[Project Page]
CCAN: Constraint Co-Attention Network for Instance Grasping
Junhao Cai, Xuefeng Tao, Hui Cheng, and Zhanpeng Zhang
[ICRA 2020] IEEE International Conference on Robotics and Automation, 2020
[Paper]
Grasping Novel Objects by Semi-supervised Domain Adaptation
Junhao Cai, Zhanpeng Zhang, and Hui Cheng
[RCAR 2019] IEEE International Conference on Real-time Computing and Robotics, 2019
[Paper]
Fusing Object Context to Detect Functional Area for Cognitive Robots
Hui Cheng (Supervisor), Junhao Cai, Quande Liu, Zhanpeng Zhang, Kai Yang, Chen Change Loy, and Liang Lin
[ICRA 2018] IEEE International Conference on Robotics and Automation, 2018
[Paper]