I am currently with the Meituan LongCat Team, working on unified multimodal models.

My recent work focuses on multimodal representation learning and unified understanding-generation systems.

Multimodal Learning · Unified Multimodal Models · Cross-modal Generalization · Understanding & Generation

Publications

* indicates equal contribution.

NeurIPS 2023
CMG
ICCV 2025
URMMDG

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

Hai Huang, Yan Xia, Sashuai Zhou, Hanting Wang, Shulei Wang, Zhou Zhao

ICCV 2025
OSCMG

Open-set Cross Modal Generalization via Multimodal Unified Representation

Hai Huang, Yan Xia, Shulei Wang, Hanting Wang, Minghui Fang, Shengpeng Ji, Sashuai Zhou, Tao Jin, Zhou Zhao

ACL 2025 Findings
TOC

Enhancing Multimodal Unified Representations for Cross Modal Generalization

Hai Huang, Yan Xia, Shengpeng Ji, Shulei Wang, Hanting Wang, Minghui Fang, Jieming Zhu, Zhenhua Dong, Sashuai Zhou, Zhou Zhao

ICASSP 2025
Semantic Residual
NAACL 2025 Findings
RVS

Honors

Education

  • 2023.09 - 2026.03, M.S. in Artificial Intelligence, Zhejiang University
  • 2019.09 - 2023.06, B.S. in Computer Science and Technology, Northeastern University, China

Experience

  • 2025.01 - 2025.07, Huawei, pretraining a 6B-parameter image generation model based on DiT