Zilong Huang / 黄子龙

I am currently a research scientist at TikTok. I received my Ph.D. and B.E. degrees from Huazhong University of Science and Technology (HUST) in 2020 and 2015, respectively, advised by Prof. Wenyu Liu and Prof. Xinggang Wang. I was a visiting student (2018-2019) in the IFP group at the University of Illinois at Urbana-Champaign (UIUC), advised by Prof. Thomas S. Huang, Prof. Yunchao Wei, and Prof. Humphrey Shi.

I work on computer vision problems, with a special focus on multi-modal learning, image/video understanding and generation, and efficient network design.

CV  /  Email  /  GitHub  /  Google Scholar  /  LinkedIn

Publications

My selected publications are listed here. The complete list can be found on my Google Scholar page.

^ denotes students mentored by me. * denotes equal contribution.

Harnessing Diffusion Models for Visual Perception with Meta Prompts
Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang.
arXiv, 2024
code / pdf

This work presents Meta Prompts, a simple yet effective scheme to harness a diffusion model for visual perception tasks.

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
CVPR, 2024
code / pdf

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation by training on a combination of 1.5M labeled images and 62M+ unlabeled images.

Disentangled Pre-training for Image Matting
Yanda Li^, Zilong Huang, Gang Yu, Ling Chen, Yunchao Wei, Jianbo Jiao
WACV, 2024
code / pdf

We propose the first self-supervised large-scale pre-training approach for image matting.

Executing your Commands via Motion Diffusion in Latent Space
Xin Chen*, Biao Jiang*, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, Gang Yu
CVPR, 2023
code / pdf

We propose a Motion Latent-based Diffusion model (MLD) that produces vivid motion sequences conforming to the given conditional inputs.

SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation
Qiang Wan^, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang
ICLR, 2023
code / pdf

We design a generic attention block for mobile vision, characterized by the formulation of squeeze Axial attention and detail enhancement.

Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D Representations
Fukun Yin*, Wen Liu*, Zilong Huang, Pei Cheng, Tao Chen, Gang Yu
NeurIPS, 2022
code / pdf

CoCo-INR is a novel framework for implicit neural 3D representations, which builds a connection between each coordinate and the prior information.

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang^*, Zilong Huang*, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen
CVPR, 2022
code / pdf

TopFormer is the first work to make transformers run in real time on mobile devices for segmentation tasks.

Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
Zilong Huang, Youcheng Ben, Guozhong Luo, Pei Cheng, Gang Yu, Bin Fu
arXiv, 2021
code / pdf

We revisit spatial shuffle as an efficient way to build connections among windows in window-based self-attention.

AlignSeg: Feature-Aligned Segmentation Networks
Zilong Huang, Yunchao Wei, Xinggang Wang, Wenyu Liu, Thomas S. Huang, Humphrey Shi
TPAMI, 2021
code / pdf

We address the feature misalignment issue in popular feature aggregation architectures for semantic segmentation.

Human De-occlusion: Invisible Perception and Recovery for Humans
Qiang Zhou, Shiyin Wang, Yitong Wang, Zilong Huang, Xinggang Wang
CVPR, 2021
dataset / pdf

We tackle the problem of human de-occlusion, which reasons about the occluded segmentation masks and invisible appearance content of humans.

High-Resolution Deep Image Matting
Haichao Yu, Ning Xu, Zilong Huang, Yuqian Zhou, Humphrey Shi.
AAAI, 2021
pdf

We propose HDMatt, the first deep-learning-based image matting approach for high-resolution inputs.

Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis
Mang Tik Chiu*, Xingqian Xu*, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Hrant Khachatrian, Hovnatan Karapetyan, Ivan Dozier, Greg Rose, David Wilson, Adrian Tudor, Naira Hovakimyan, Thomas S Huang, Honghui Shi
CVPR, 2020
dataset / pdf / video

We present Agriculture-Vision, a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns.

CCNet: Criss-Cross Attention for Semantic Segmentation
Zilong Huang, Xinggang Wang, Yunchao Wei, Lichao Huang, Humphrey Shi, Wenyu Liu, Thomas S. Huang
ICCV, 2019 | TPAMI, 2020
code / pdf

More than 1,700 citations; ranked 5th among PaperDigest's Most Influential ICCV 2019 Papers. Applications of CCNet include AlphaFold2.

We propose a Criss-Cross Network (CCNet) for obtaining full-image contextual information in a highly effective and efficient way.

SPGNet: Semantic Prediction Guidance for Scene Parsing
Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, Jinjun Xiong, Thomas Huang, Wen-Mei Hwu, Humphrey Shi.
ICCV, 2019
pdf

We propose a Semantic Prediction Guidance (SPG) module that learns to re-weight local features under the guidance of pixel-wise semantic prediction.

Semantic Image Segmentation by Scale-Adaptive Networks
Zilong Huang, Chunyu Wang, Xinggang Wang, Wenyu Liu, Jingdong Wang
TIP, 2019
code / pdf

We propose a Scale-Adaptive Network (SAN) consisting of multiple branches, each taking charge of segmenting objects within a certain range of scales.

Devil in the Details: Towards Accurate Single and Multiple Human Parsing
Tao Ruan*, Ting Liu*, Zilong Huang, Yunchao Wei, Shikui Wei, Yao Zhao, Thomas Huang
AAAI, 2019
code / pdf

We identify several useful properties, including feature resolution, global context information, and edge details, and perform rigorous analyses to reveal how to leverage them to benefit the human parsing task.

Weakly-supervised semantic segmentation network with deep seeded region growing
Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu, Jingdong Wang
CVPR, 2018
code / pdf

We propose to train a semantic segmentation network starting from discriminative regions, progressively increasing the pixel-level supervision via seeded region growing.


Last updated on March 09, 2022. Thanks to Jon Barron for this minimalist website template.