Publications
My selected publications are listed here. The complete list can be found on my Google Scholar page.
^ denotes a student mentored by me. * denotes equal contribution.
|
|
Classification Done Right for Vision-Language Pre-Training
Zilong Huang, Qinghao Ye, Bingyi Kang, Jiashi Feng, Haoqi Fan
NeurIPS, 2024
code /
pdf
We introduce SuperClass, a super simple classification method for vision-language pre-training on image-text data.
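A minimal sketch of the idea, assuming a generic vision backbone and tokenizer (the class names, loss choice, and handling of padding below are my simplifications, not the released code): the caption's token ids are turned into a multi-hot target over the text vocabulary, and the image encoder is trained with a plain classification loss rather than contrastive image-text alignment.

# Hedged sketch of classification-style vision-language pre-training;
# backbone/tokenizer choices here are placeholders, not the paper's code.
import torch
import torch.nn as nn

class ClassificationPretrainer(nn.Module):
    def __init__(self, vision_encoder: nn.Module, feat_dim: int, vocab_size: int):
        super().__init__()
        self.encoder = vision_encoder                    # any image backbone -> (B, feat_dim)
        self.classifier = nn.Linear(feat_dim, vocab_size)

    def forward(self, images, caption_token_ids):
        # caption_token_ids: (B, L) LongTensor of tokenized captions
        # (padding ids should be excluded in practice; omitted for brevity).
        logits = self.classifier(self.encoder(images))   # (B, vocab_size)
        targets = torch.zeros_like(logits)
        targets.scatter_(1, caption_token_ids, 1.0)      # caption tokens -> multi-hot labels
        # Treat the caption tokens as classification labels (simplified loss choice).
        return nn.functional.binary_cross_entropy_with_logits(logits, targets)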
|
|
Depth Anything V2
Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
NeurIPS, 2024
code /
pdf
Depth Anything V2 is trained on 595K synthetic labeled images and 62M+ real unlabeled images.
|
|
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang
arXiv, 2024
code /
pdf
This work presents Diffusion GLA (DiG), the first exploration of a diffusion backbone built on a linear-attention transformer.
|
|
Harnessing Diffusion Models for Visual Perception with Meta Prompts
Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang.
arXiv, 2024
code /
pdf
This work presents Meta Prompts, a simple yet effective scheme to harness a diffusion model for visual perception tasks.
|
|
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
CVPR, 2024
code /
pdf
This work presents Depth Anything, a highly practical solution for robust monocular depth estimation by training on a combination of 1.5M labeled images and 62M+ unlabeled images.
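A rough sketch of the semi-supervised recipe behind this, with hypothetical model and dataloader names (the paper additionally perturbs the unlabeled images and adds an auxiliary semantic objective, which this sketch omits): a teacher trained on the labeled set pseudo-labels the unlabeled images, and a student is trained on both.

# Hedged sketch of the labeled + pseudo-labeled training loop; names are
# placeholders, not the released Depth Anything code.
import torch

def train_student(teacher, student, labeled_loader, unlabeled_loader,
                  optimizer, loss_fn, device="cuda"):
    teacher.eval()
    student.train()
    for (img_l, depth_l), img_u in zip(labeled_loader, unlabeled_loader):
        img_l, depth_l, img_u = img_l.to(device), depth_l.to(device), img_u.to(device)
        with torch.no_grad():
            pseudo_depth = teacher(img_u)      # teacher supplies pseudo ground truth
        pred_l = student(img_l)
        pred_u = student(img_u)                # the paper perturbs img_u here; omitted
        loss = loss_fn(pred_l, depth_l) + loss_fn(pred_u, pseudo_depth)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()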
|
|
Disentangled Pre-training for Image Matting
Yanda Li^, Zilong Huang, Gang Yu, Ling Chen, Yunchao Wei, Jianbo Jiao
WACV, 2024
code /
pdf
We propose the first self-supervised large-scale pre-training approach for image matting.
|
|
Executing your Commands via Motion Diffusion in Latent Space
Xin Chen*, Biao Jiang*, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, Gang Yu
CVPR, 2023
code /
pdf
We propose a Motion Latent-based Diffusion model (MLD) that can produce vivid motion sequences conforming to the given conditional inputs.
|
|
SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation
Qiang Wan^, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang
ICLR, 2023
code /
pdf
We design a generic attention block for mobile vision, characterized by the formulation of squeeze Axial attention and detail enhancement.
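A simplified sketch of the squeeze-axial part (my own reduction of the block, not the paper's exact formulation; the real block differs in its multi-head form, positional terms, and detail-enhancement branch): the feature map is squeezed into one horizontal and one vertical sequence, self-attention runs on each short sequence, and the results are broadcast back onto the 2-D map.

# Hedged, simplified sketch of squeeze-axial attention.
import torch
import torch.nn as nn

class SqueezeAxialAttention(nn.Module):
    def __init__(self, channels, heads=4):            # channels must be divisible by heads
        super().__init__()
        self.row_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                              # x: (B, C, H, W)
        col_seq = x.mean(dim=3).permute(0, 2, 1)       # squeeze width  -> (B, H, C)
        row_seq = x.mean(dim=2).permute(0, 2, 1)       # squeeze height -> (B, W, C)
        col_out, _ = self.col_attn(col_seq, col_seq, col_seq)   # attention along H
        row_out, _ = self.row_attn(row_seq, row_seq, row_seq)   # attention along W
        # Broadcast the two 1-D results back to the 2-D map and fuse by addition.
        col_out = col_out.permute(0, 2, 1).unsqueeze(3)          # (B, C, H, 1)
        row_out = row_out.permute(0, 2, 1).unsqueeze(2)          # (B, C, 1, W)
        return x + col_out + row_out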
|
|
Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D Representations
Fukun Yin*, Wen Liu*, Zilong Huang, Pei Cheng, Tao Chen, Gang Yu
NeurIPS, 2022
code /
pdf
CoCo-INR is a novel framework for implicit neural 3D representations that builds a connection between each coordinate and the codebook prior information.
|
|
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang^*, Zilong Huang*, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen
CVPR, 2022
code /
pdf
TopFormer is the first work that makes transformers run in real time on mobile devices for segmentation tasks.
|
|
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
Zilong Huang, Youcheng Ben, Guozhong Luo, Pei Cheng, Gang Yu, Bin Fu
arXiv, 2021
code /
pdf
We revisit the spatial shuffle as an efficient way to build connections among windows in window-based self-attention.
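The shuffle itself is just a reshape and permute over the window grid; the sketch below is my own illustration of one such permutation (the paper's exact layout may differ): tokens that share the same offset inside their window are regrouped, so a subsequent window attention mixes information across the original windows.

# Hedged sketch of a spatial shuffle across windows; the exact permutation used
# by Shuffle Transformer may differ, this only shows the cross-window regrouping.
import torch

def spatial_shuffle(x, window_size):
    """x: (B, H, W, C) with H and W divisible by window_size."""
    b, h, w, c = x.shape
    ws = window_size
    # View the map as a grid of windows: (B, H//ws, ws, W//ws, ws, C).
    x = x.view(b, h // ws, ws, w // ws, ws, c)
    # Swap the window-grid and within-window axes, so tokens from different
    # original windows end up inside the same new window.
    x = x.permute(0, 2, 1, 4, 3, 5).contiguous()
    return x.view(b, h, w, c)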
|
|
AlignSeg: Feature-Aligned Segmentation Networks
Zilong Huang, Yunchao Wei, Xinggang Wang, Wenyu Liu, Thomas S. Huang, Humphrey Shi
TPAMI, 2021
code /
pdf
We focus on the feature misalignment issue in popular feature aggregation architectures for semantic segmentation.
|
|
Human De-occlusion: Invisible Perception and Recovery for Humans
Qiang Zhou, Shiyin Wang, Yitong Wang, Zilong Huang, Xinggang Wang
CVPR, 2021
dataset /
pdf
We tackle the problem of human de-occlusion, which reasons about the occluded segmentation masks and invisible appearance content of humans.
|
|
High-Resolution Deep Image Matting
Haichao Yu, Ning Xu, Zilong Huang, Yuqian Zhou, Humphrey Shi.
AAAI, 2021
pdf
We propose HDMatt, the first deep-learning-based image matting approach for high-resolution inputs.
|
|
Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis
Mang Tik Chiu*, Xingqian Xu*, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Hrant Khachatrian, Hovnatan Karapetyan, Ivan Dozier, Greg Rose, David Wilson, Adrian Tudor, Naira Hovakimyan, Thomas S Huang, Honghui Shi
CVPR, 2020
dataset /
pdf /
video
We present Agriculture-Vision, a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns.
|
|
CCNet: Criss-Cross Attention for Semantic Segmentation
Zilong Huang, Xinggang Wang, Yunchao Wei, Lichao Huang, Humphrey Shi, Wenyu Liu, Thomas S. Huang
ICCV, 2019 | TPAMI, 2020
code /
pdf
More than 3000 citations; ranked 5th among PaperDigest's Most Influential ICCV 2019 Papers.
Applications of CCNet include AlphaFold2.
We propose a Criss-Cross Network (CCNet) for obtaining full-image contextual information in a very effective and efficient way.
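A compact sketch of the criss-cross idea, simplified relative to the paper (single head, no masking of the duplicated center pixel, and the recurrence is left to the caller): each pixel attends only to the pixels in its own row and column, and applying the module twice propagates information across the full image.

# Hedged sketch of criss-cross attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):                                    # x: (B, C, H, W)
        q = self.query(x).permute(0, 2, 3, 1)                # (B, H, W, C')
        k = self.key(x).permute(0, 2, 3, 1)
        v = self.value(x).permute(0, 2, 3, 1)                # (B, H, W, C)
        # Affinity of every pixel with its own row and its own column.
        e_row = torch.einsum('bijc,bikc->bijk', q, k)        # (B, H, W, W)
        e_col = torch.einsum('bijc,bkjc->bijk', q, k)        # (B, H, W, H)
        attn = F.softmax(torch.cat([e_row, e_col], dim=-1), dim=-1)
        a_row, a_col = attn.split([e_row.size(-1), e_col.size(-1)], dim=-1)
        out = torch.einsum('bijk,bikc->bijc', a_row, v) + \
              torch.einsum('bijk,bkjc->bijc', a_col, v)      # aggregate row + column
        return self.gamma * out.permute(0, 3, 1, 2) + x      # residual; apply twice for full context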
|
|
SPGNet: Semantic Prediction Guidance for Scene Parsing
Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, Jinjun Xiong, Thomas Huang, Wen-Mei Hwu, Humphrey Shi.
ICCV, 2019
pdf
We propose a Semantic Prediction Guidance (SPG) module that learns to re-weight local features under the guidance of pixel-wise semantic prediction.
|
|
Semantic Image Segmentation by Scale-Adaptive Networks
Zilong Huang, Chunyu Wang, Xinggang Wang, Wenyu Liu, Jingdong Wang
TIP, 2019
code /
pdf
We propose a Scale-Adaptive Network (SAN) that consists of multiple branches, each taking charge of the segmentation of objects within a certain range of scales.
|
|
Devil in the Details: Towards Accurate Single and Multiple Human Parsing
Tao Ruan*, Ting Liu*, Zilong Huang, Yunchao Wei, Shikui Wei, Yao Zhao, Thomas Huang
AAAI, 2019
code /
pdf
We identify several useful properties, including feature resolution, global context information, and edge details, and perform rigorous analyses to reveal how to leverage them to benefit the human parsing task.
|
|
Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing
Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu, Jingdong Wang
CVPR, 2018
code /
pdf
We propose to train a semantic segmentation network starting from discriminative regions and progressively increasing the pixel-level supervision via seeded region growing.
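A toy sketch of the region-growing step on a single image (the function, thresholds, and 4-connectivity below are illustrative choices, not the paper's implementation): starting from high-confidence CAM seeds, labels are expanded to neighboring pixels whose predicted probability for the seed's class is high enough, and the grown mask then supervises the network.

# Hedged toy version of seeded region growing on one image.
import numpy as np
from collections import deque

def grow_seeds(seed_labels, class_probs, threshold=0.85, ignore=255):
    """seed_labels: (H, W) int map, `ignore` where unlabeled.
    class_probs: (K, H, W) per-class probabilities from the current network."""
    h, w = seed_labels.shape
    grown = seed_labels.copy()
    queue = deque(zip(*np.nonzero(grown != ignore)))
    while queue:
        y, x = queue.popleft()
        cls = grown[y, x]
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):  # 4-neighbours
            if 0 <= ny < h and 0 <= nx < w and grown[ny, nx] == ignore \
                    and class_probs[cls, ny, nx] > threshold:
                grown[ny, nx] = cls          # absorb the pixel into the region
                queue.append((ny, nx))
    return grown                             # used as pseudo supervision for training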
|
Last updated on March 09, 2022. Thanks to Jon Barron for this minimalist website template.
|
|