DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation [PDF] [Code]
Wangbo Zhao*, Yizeng Han*, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You
arXiv Preprint, 2025.
We extend DyDiT to text-to-image generation (DyFLUX) and to video generation, and add support for LoRA fine-tuning.
RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer [PDF]
Wangbo Zhao, Yizeng Han, Zhiwei Tang, Jiasheng Tang, Pengfei Zhou, Kai Wang, Bohan Zhuang, Zhangyang Wang, Fan Wang, Yang You
arXiv Preprint, 2025.
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation [PDF] [Code]
Inferix Team: Tianyu Feng, Yizeng Han, Jiahao He, Yuanyu He, Xi Lin, Teng Liu, Hanfeng Lu, Jiasheng Tang, Wei Wang, Zhiyuan Wang, Jichao Wu, Mingyang Yang, Yinghao Yu, Zeyu Zhang, Bohan Zhuang
arXiv Preprint, 2025.
Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception [PDF] [Code]
Yulin Wang, Yang Yue, Huanqian Wang, Haojun Jiang, Yizeng Han, Zanlin Ni, Yifan Pu, Minglei Shi, Rui Lu, Qisen Yang, Andrew Zhao, Zhuofan Xia, Shiji Song, Gao Huang
Nature Machine Intelligence, 2025.
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion [PDF] [Project]
Akide Liu, Zeyu Zhang, Zhexin Li, Xuehai Bai, Yizeng Han, Jiasheng Tang, Yuanjie Xing, Jichao Wu, Mingyang Yang, Weihua Chen, Jiahao He, Yuanyu He, Fan Wang, Gholamreza Haffari, Bohan Zhuang
Conference on Neural Information Processing Systems (NeurIPS, Highlight), 2025.
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs [PDF] [Code]
Wangbo Zhao*, Yizeng Han*, Jiasheng Tang, Zhikai Li, Yibing Song, Kai Wang, Zhangyang Wang, Yang You
Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
We use a small VLM to guide visual token pruning in a large VLM; the small VLM can also perform dynamic early exiting to further improve inference efficiency.
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [PDF] [Code]
Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You
Conference on Neural Information Processing Systems (NeurIPS), 2024.
We adapt a static ViT into a dynamic ViT via parameter-efficient fine-tuning, avoiding full-parameter tuning.
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [PDF] [Code]
Yang Yue, Yulin Wang, Bingyi Kang, Yizeng Han, Shenzhi Wang, Shiji Song, Jiashi Feng, Gao Huang
Conference on Neural Information Processing Systems (NeurIPS), 2024.
We propose dynamic early exiting in multimodal LLMs (MLLMs) for efficient robot execution.