I am currently a Senior Research Scientist at Meta AI. I received my PhD in Computer Science from the University of Chicago, where I was fortunate to work with Prof. Yuxin Chen, Prof. Bo Li, and Prof. Michael Maire. Prior to that, I received my Master's degree in Robotics from Johns Hopkins University, under the supervision of Prof. Nassir Navab, and my Bachelor's degree with Honors in Electrical Engineering from the University of Illinois at Urbana-Champaign, where I worked with Prof. Seth Hutchinson.
My research focuses on developing novel algorithms to enhance the post-training effectiveness (e.g., improved reasoning, reduced hallucinations) and efficiency (e.g., lower data and compute requirements) of multimodal large language models (MLLMs), as well as multi-agent systems that leverage them.
StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding
In submission, 2025
DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data
Yuhang Zhou*, Jing Zhu*, Shengyi Qian, Zhuokai Zhao, Xiyao Wang, Xiaoyu Liu, Ming Li, Paiheng Xu, Wei Ai, and Furong Huang
Empirical Methods in Natural Language Processing (EMNLP), 2025
CircuitProbe: Dissecting Spatiotemporal Visual Semantics with Circuit Tracing
Yiming Zhang*, Chengzhang Yu*, Zhuokai Zhao*, Kun Wang, Qiankun Li, Zihan Chen, Yang Liu, Zenghui Ding, and Yining Sun
In submission, 2025
Boosting LLM Reasoning via Spontaneous Self-Correction
Xutong Zhao, Tengyu Xu, Xuewei Wang, Zhengxing Chen, Di Jin, Liang Tan, Yen-Ting Lin, Zishun Yu, Zhuokai Zhao, Yun He, Sinong Wang, Han Fang, Sarath Chandar, and Chen Zhu
Conference on Language Modeling (COLM), 2025
RankCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang*, Zhuokai Zhao*, Zhaorun Chen, Zhili Feng, Zenghui Ding, and Yining Sun
International Conference on Computer Vision (ICCV), 2025
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen, Zenghui Ding, Xianjun Yang, and Yining Sun
International Conference on Computer Vision (ICCV), 2025
S'MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning
Hanqing Zeng, Yinglong Xia, Zhuokai Zhao, Gilbert Jiang, Qiang Zhang, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, and Benyu Zhang
In submission, 2025
Transfer between Modalities with MetaQueries
Xichen Pan, Satya Narayan Shukla, Aashu Singh, Zhuokai Zhao, Shlok Kumar Mishra, Jialiang Wang, Zhiyang Xu, Jiuhai Chen, Kunpeng Li, Felix Juefei-Xu, Ji Hou, and Saining Xie
In submission, 2025
CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning
Hao Yu, Zhuokai Zhao, Shen Yan, Lukasz Korycki, Jianyu Wang, Baosheng He, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, and Hanchao Yu
ICCV Findings, 2025
HumanMM: Global Human Motion Recovery from Multi-shot Videos
Yuhong Zhang*, Guanlin Wu*, Ling-Hao Chen, Zhuokai Zhao, Jing Lin, Xiaoke Jiang, Jiamin Wu, Zhuoheng Li, Hao Frank Yang, Haoqian Wang, and Lei Zhang
The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), 2025
Autonomous Multimodal Reasoning via Implicit Chain-of-Vision
Yiqiao Huang*, He Qi*, Zhaorun Chen, Haopeng Zhang, Hanchao Yu, and Zhuokai Zhao
CVPR Workshop on Multimodal Algorithmic Reasoning (Oral Presentation), 2025
Quantifying Generalization Complexity for Large Language Models
International Conference on Learning Representations (ICLR), 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Chaoqi Wang*, Zhuokai Zhao*, Yibo Jiang*, Zhaorun Chen*, Chen Zhu, Yuxin Chen, Jiayi Liu, Lizhu Zhang, Hao Ma, and Sinong Wang
In submission, 2025
From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Selective Decoding
Yixiong Fang, Ziran Yang, Zhaorun Chen, Zhuokai Zhao†, and Jiawei Zhou†
In submission, 2024
Direct Acquisition Optimization for Low-Budget Active Learning
Zhuokai Zhao, Yibo Jiang, and Yuxin Chen
38th NeurIPS Workshop on Bayesian Decision-making and Uncertainty (Spotlight Talk), 2024
Evaluating Machine Learning Models with NERO: Non-Equivariance Revealed on Orbits
38th NeurIPS Workshop on Interpretable AI, 2024
EscIRL: Evolving Self-Contrastive IRL for Trajectory Prediction in Autonomous Driving
Siyue Wang*, Zhaorun Chen*, Zhuokai Zhao, Chaoli Mao, Yiyang Zhou, Jiayu He, and Albert Sibo Hu
8th Annual Conference on Robot Learning (CoRL), 2024
Preference Optimization with Multi-Sample Comparisons
In submission, 2024
Multimodal Guidance Network for Missing-Modality Inference in Content Moderation
Zhuokai Zhao, Harish Palani, Tianyi Liu, Lena Evans, and Ruth Toner
IEEE International Conference on Multimedia and Expo (ICME), 2024
MJ-BENCH: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Zhaorun Chen*, Yichao Du*, Zichen Wen*, Yiyang Zhou*, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, and Huaxiu Yao
41st ICML Workshop on Foundation Models in the Wild, 2024
PANDORA: Detailed LLM Jailbreaking via Collaborated Phishing Agents with Decomposed Reasoning
12th ICLR Workshop on Secure and Trustworthy Large Language Models, 2024
Safe Reinforcement Learning via Hierarchical Adaptive Chance-Constraint Safeguards
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
41st International Conference on Machine Learning (ICML), 2024
Preliminary version appeared in 12th ICLR Workshop on Reliable and Responsible Foundation Models, 2024
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Preliminary version appeared in ICLR Workshop on Reliable and Responsible Foundation Models, 2024
RELAX: Reinforcement Learning Enabled 2D-LiDAR Autonomous System for Parsimonious UAVs
Guanlin Wu, Zhuokai Zhao, and Yutao He
39th AAAI Workshop on Planning and Reinforcement Learning (PRL), 2023
Breaking the Curse of Quality Saturation with User-Centric Ranking
29th Conference on Knowledge Discovery and Data Mining (KDD), 2023
System and Method for Assisted Patient Positioning
U.S. Patent No. 10,783,655, 2020
Early Feasibility Studies of Augmented Reality Navigation for Lateral Skull Base Surgery
Otology & Neurotology 41(7):883-888, 2020
Enhanced Data Utilization for Efficient and Trustworthy Deep Learning
Zhuokai Zhao
Ph.D. in Computer Science, 2024
Utilizing Both Past and Future: Multi-Frame Memory Based Network in Solving Particle Image Velocimetry
Zhuokai Zhao
MS in Computer Science, 2021
OpenChemistry/Stempy: Stable Version
Zenodo, 2024
Trajectory Planning and Control for Nonholonomic Robot Among Obstacles
Nonlinear Control and Planning in Robotics, 2018
Head-Mounted Display Integration for Orthopedic Surgery
Advanced Computer-Integrated Surgery, 2017