Yizeng Han (韩益增)

🧑‍🎓 Bio

I'm a Ph.D. candidate in the Department of Automation, Tsinghua University, advised by Prof. Gao Huang and Prof. Shiji Song. Download my C.V. here: English / Simplified Chinese.
🌟 My research focuses on deep learning and computer vision, in particular dynamic neural networks and efficient learning/inference of deep models in resource-constrained scenarios.
🧐 I'm also interested in fundamental machine learning problems, such as long-tailed learning, semi-supervised learning, and fine-grained learning.
🔥 Recently, I have become interested in efficient/dynamic multi-modal LLMs and generative AI.

📚 Education

  • Ph.D., Tsinghua University, 2018 - present.
  • B.E., Tsinghua University, 2014 - 2018.

💡 Research Experience

  • Research Intern, Megvii Technology (Foundation Model Group, advisor: Xiangyu Zhang), 04/2023 - 12/2023
  • Research Intern, Georgia Institute of Technology (advisor: Gregory D. Abowd), 06/2017 - 08/2017

🔥 News

  • 04/2024: 🎉 Our work (LAUDNet) has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)!
  • 02/2024: 🎉 Two works (GSVA and Mask Grounding) have been accepted by CVPR 2024.
  • 12/2023: 🎉 Our work (Learnable Semantic Data Augmentation) has been accepted by IEEE Transactions on Image Processing (TIP).
  • 10/2023: 🎉 Awarded the Comprehensive Excellence Scholarship, Tsinghua University, 2023.
  • 07/2023: 🎉 Three works have been accepted by ICCV 2023.
  • 10/2022: 🎉 Awarded the National Scholarship, Ministry of Education of China.

📄 Selected Papers (Full publication list on Google Scholar)

Recent Works and Representative Publications

Latency-aware Unified Dynamic Networks for Efficient Image Recognition [PDF] [Code]

Yizeng Han*, Zeyu Liu*, Zhihang Yuan*, Yifan Pu, Chaofei Wang, Shiji Song, Gao Huang

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, IF=24.314), 2024

We propose Latency-aware Unified Dynamic Networks (LAUDNet), a comprehensive framework that amalgamates three cornerstone dynamic paradigms—spatially-adaptive computation, dynamic layer skipping, and dynamic channel skipping—under a unified formulation.

GSVA: Generalized Segmentation via Multimodal Large Language Models [PDF] [Code]

Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

We propose Generalized Segmentation Vision Assistant (GSVA) to address the multi-object and empty-object issues in Generalized Referring Expression Segmentation (GRES).

Mask Grounding for Referring Image Segmentation [PDF]

Yong Xien Chng, Henry Zheng, Yizeng Han, Xuchong Qiu, Gao Huang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

We introduce a novel Mask Grounding auxiliary task that significantly improves visual grounding within language features, by explicitly teaching the model to learn fine-grained correspondence between masked textual tokens and their matching visual objects.

SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning [PDF] [Code]

Chaoqun Du*, Yizeng Han*, Gao Huang

arXiv Preprint, 2024

We focus on a realistic yet challenging task: addressing imbalances in labeled data while the class distribution of unlabeled data is unknown and mismatched. The proposed SimPro does not rely on any predefined assumptions about the distribution of unlabeled data.

Fine-grained Recognition with Learnable Semantic Data Augmentation [PDF] [Code]

Yifan Pu*, Yizeng Han*, Yulin Wang, Junlan Feng, Chao Deng, Gao Huang

IEEE Transactions on Image Processing (TIP), 2023

We propose diversifying the training data in the feature space to alleviate the discriminative region loss problem in fine-grained image recognition. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a sample-wise covariance prediction network.

Agent Attention: On the Integration of Softmax and Linear Attention [PDF] [Code]

Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang

arXiv Preprint, 2024

We propose Agent Attention, a linear attention mechanism for visual recognition and generation. Notably, agent attention has shown remarkable performance in high-resolution scenarios, owing to its linear attention nature.

Dynamic Perceiver for Efficient Visual Recognition [PDF] [Code]

Yizeng Han*, Dongchen Han*, Zeyu Liu, Yulin Wang, Xuran Pan, Yifan Pu, Chao Deng, Junlan Feng, Shiji Song, Gao Huang

IEEE/CVF International Conference on Computer Vision (ICCV), 2023

We propose Dynamic Perceiver (Dyn-Perceiver), a general framework that can be conveniently implemented on top of any visual backbone. It explicitly decouples feature extraction and early classification. We show that early classifiers can be constructed in the classification branch without harming the performance of the last classifier. Experiments demonstrate that Dyn-Perceiver significantly outperforms existing state-of-the-art methods in terms of the trade-off between accuracy and efficiency.

Adaptive Rotated Convolution for Rotated Object Detection [PDF] [Code]

Yifan Pu*, Yiru Wang*, Zhuofan Xia, Yizeng Han, Yulin Wang, Weihao Gan, Zidong Wang, Shiji Song, Gao Huang

IEEE/CVF International Conference on Computer Vision (ICCV), 2023

We propose adaptive rotated convolution (ARC) for rotated object detection. In the proposed approach, the convolution kernels rotate adaptively according to the orientations of the objects in each image. The ARC module can be plugged into any backbone network with convolutional layers. Our work achieves SOTA performance on the DOTA benchmark.

FLatten Transformer: Vision Transformer using Focused Linear Attention [PDF] [Code]

Dongchen Han*, Xuran Pan*, Yizeng Han, Shiji Song, Gao Huang

IEEE/CVF International Conference on Computer Vision (ICCV), 2023

We propose a novel focused linear attention module. By addressing the limitations of previous linear attention methods from the perspectives of focus ability and feature diversity, our module achieves an impressive combination of high efficiency and expressive capability.

Dynamic Neural Networks: A Survey [PDF] [BAAI Community] [Synced (机器之心) Online Talk] [Bilibili] [Slides]

Yizeng Han*, Gao Huang*, Shiji Song, Le Yang, Honghui Wang, Yulin Wang

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, IF=24.314), 2021

In this survey, we comprehensively review the rapidly developing area of dynamic neural networks. The important research problems, e.g., architecture design, decision-making schemes, and optimization techniques, are reviewed systematically. We also discuss the open problems in this field together with interesting future research directions.

Spatially Adaptive Feature Refinement for Efficient Inference [PDF]

Yizeng Han, Gao Huang, Shiji Song, Le Yang, Yitian Zhang, Haojun Jiang

IEEE Transactions on Image Processing (TIP, IF=11.041), 2021

We propose to perform efficient inference by adaptively fusing information from two branches: one conducts standard convolution on inputs at a lower resolution, and the other selectively refines a set of regions at the original resolution. Experiments on classification, object detection, and semantic segmentation validate that the proposed method (SAR) consistently improves network performance and efficiency.

Latency-aware Spatial-wise Dynamic Networks [PDF] [Code]

Yizeng Han*, Zhihang Yuan*, Yifan Pu*, Chenhao Xue, Shiji Song, Guangyu Sun, Gao Huang

Conference on Neural Information Processing Systems (NeurIPS), 2022

We use a latency predictor to guide both the algorithm design and the scheduling optimization of spatial-wise dynamic networks on various hardware platforms. We show that "coarse-grained" spatially adaptive computation effectively reduces the memory access cost and is more efficient than pixel-level dynamic operations.

Learning to Weight Samples for Dynamic Early-exiting Networks [PDF] [Code]

Yizeng Han*, Yifan Pu*, Zihang Lai, Chaofei Wang, Shiji Song, Junfen Cao, Wenhui Huang, Chao Deng, Gao Huang

European Conference on Computer Vision (ECCV), 2022

We propose to bridge the gap between training and testing of dynamic early-exiting networks by sample weighting. By bringing the adaptive behavior during inference into the training phase, we show that the proposed weighting mechanism consistently improves the trade-off between classification accuracy and inference efficiency.

Resolution Adaptive Networks for Efficient Inference [PDF] [Code]

Le Yang*, Yizeng Han*, Xi Chen*, Shiji Song, Jifeng Dai, Gao Huang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

We focus on the spatial redundancy of images, and propose a novel Resolution Adaptive Network (RANet), which is inspired by the intuition that low-resolution representations are sufficient for classifying “easy” inputs, while only some “hard” samples need spatially detailed information. Empirically, we demonstrate the effectiveness of the proposed RANet in both the anytime prediction setting and the budgeted batch classification setting.

🎖 Awards

  • Comprehensive Excellence Scholarship, Tsinghua University, 2023
  • National Scholarship, Ministry of Education of China, 2022
  • Comprehensive Excellence Scholarship, Tsinghua University, 2017
  • Comprehensive Excellence Scholarship, Tsinghua University, 2016
  • Academic Excellence Scholarship, Tsinghua University, 2015

📧 Contact

  • hanyz18 at mails dot tsinghua dot edu dot cn
  • yizeng38 at gmail dot com
  • 616 Centre Main Building, Tsinghua University, Beijing 100084, China