Yizeng Han

Ph.D. Candidate, advised by Prof. Gao Huang and Prof. Shiji Song.
Department of Automation, Tsinghua University.


  • Ph.D., Tsinghua University, 2018 - present.
  • B.E., Tsinghua University, 2014 - 2018.

  • Research Experience

  • Intern, Georgia Institute of Technology, 06/2017 - 08/2017

  • Research Interest

    My research focuses on machine learning and computer vision, in particular deep learning, efficient inference and dynamic neural networks.


  • 09/2022: Our work (latency-aware spatial-wise dynamic networks) was accepted to NeurIPS 2022.

  • Recent Publications & Preprints (Google Scholar)

    Dynamic Neural Networks: A Survey. [BAAI Community][Synced online talk][Bilibili][slides]
    Yizeng Han*, Gao Huang*, Shiji Song, Le Yang, Honghui Wang, Yulin Wang.
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, IF=24.314), 2021
    Dynamic neural networks are an emerging research topic in deep learning. Compared to static models, which have fixed computational graphs and parameters at the inference stage, dynamic networks can adapt their structures or parameters to different inputs, leading to notable advantages in terms of accuracy, computational efficiency, adaptiveness, etc. In this survey, we comprehensively review this rapidly developing area. The important research problems of dynamic networks, e.g., architecture design, decision-making schemes, and optimization techniques, are reviewed systematically. Finally, we discuss the open problems in this field together with interesting future research directions.
    Learning to Weight Samples for Dynamic Early-exiting Networks. [code][slides][poster][video]
    Yizeng Han*, Yifan Pu*, Zihang Lai, Chaofei Wang, Shiji Song, Junfen Cao, Wenhui Huang, Chao Deng, Gao Huang.
    European Conference on Computer Vision (ECCV), 2022
    In this paper, we propose to bridge the gap between training and testing of dynamic early-exiting networks by sample weighting. Intuitively, easy samples, which generally exit early in the network during inference, should contribute more to training early classifiers. The training of hard samples (which mostly exit from deeper layers), however, should be emphasized by the late classifiers. Our work proposes to adopt a weight prediction network to weight the loss of different training samples at each exit. This weight prediction network and the backbone model are jointly optimized under a meta-learning framework with a novel optimization objective. By bringing the adaptive behavior during inference into the training phase, we show that the proposed weighting mechanism consistently improves the trade-off between classification accuracy and inference efficiency.
    Spatially Adaptive Feature Refinement for Efficient Inference.
    Yizeng Han, Gao Huang, Shiji Song, Le Yang, Yitian Zhang, Haojun Jiang.
    IEEE Transactions on Image Processing (TIP, IF=11.041), 2021
    We propose a novel Spatially Adaptive feature Refinement (SAR) approach to perform efficient inference by adaptively fusing information from two branches: one conducts standard convolution on input features at a lower spatial resolution, and the other selectively refines a set of regions at the original resolution. The two branches complement each other in feature learning, and both of them incur much less computation than standard convolution. Experiments on CIFAR and ImageNet classification, COCO object detection and PASCAL VOC semantic segmentation tasks validate that SAR can consistently improve the network performance and efficiency.
    Resolution Adaptive Networks for Efficient Inference. [code]
    Le Yang*, Yizeng Han*, Xi Chen*, Shiji Song, Jifeng Dai, Gao Huang.
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020.
    We focus on the spatial redundancy of images, and propose a novel Resolution Adaptive Network (RANet), which is inspired by the intuition that low-resolution representations are sufficient for classifying “easy” inputs, while only some “hard” samples need spatially detailed information. Empirically, we demonstrate the effectiveness of the proposed RANet on the CIFAR-10, CIFAR-100 and ImageNet datasets in both the anytime prediction setting and the budgeted batch classification setting.
    Adaptive Focus for Efficient Video Recognition. [code]
    Yulin Wang, Zhaoxi Chen, Haojun Jiang, Shiji Song, Yizeng Han, and Gao Huang.
    IEEE/CVF International Conference on Computer Vision (ICCV Oral) 2021.
    In this paper, we explore the spatial redundancy in video recognition with the aim of improving computational efficiency. Extensive experiments on five benchmark datasets, i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2, demonstrate that our method is significantly more efficient than the competitive baselines.
    Towards Learning Spatially Discriminative Feature Representations.
    Chaofei Wang*, Jiayu Xiao*, Yizeng Han, Qisen Yang, Shiji Song, Gao Huang.
    IEEE/CVF International Conference on Computer Vision (ICCV) 2021.
    We propose CAM-loss to constrain the embedded feature maps with the class activation maps (CAMs), which indicate the spatially discriminative regions of an image for particular categories. Experimental results show that CAM-loss is applicable to a variety of network structures and can be combined with mainstream regularization methods to improve the performance of image classification. The strong generalization ability of CAM-loss is validated in transfer learning and few-shot learning tasks.
    * Equal Contribution.


    • Comprehensive Merit Scholarship, 2016 and 2017, at Tsinghua University.
    • Academic Excellence Scholarship, 2015 at Tsinghua University.


    • hanyz18 at mails dot tsinghua dot edu dot cn.
    • 616 Centre Main Building, Tsinghua University, Beijing 100084, China.