Overview

2020-04-01 2026-07-15

This is an overview of papers with code for CV / AIGC / LLM / VLM.
https://github.com/Gojay001/paper-with-code-skills.
[Updating…]

AIGC (AI Generated Content)
- GAN
- VAE
- Diffusion
- Video Generation
- Applications
  - Face Editing
  - Face Swapping
Agent
- Agentic Image Generation
LLM / VLM (Large Language Model / Vision-Language Model)
- Transformer
- ViT
- PLM
- LLM
- VLM
CV (Computer Vision)
- Backbone
- Optimization
- Detection
- Segmentation
- Tracking
  - MOT
  - VOT
- FSS
- FSL
- 3D-Face
- Others
  - Detection-3D
  - RGBD-SOT
- Survey

Generative Adversarial Network

Title	Paper	Conf	Code
GAN	Generative Adversarial Networks	arXiv(2014)	[code]
pix2pix	Image-to-Image Translation with Conditional Adversarial Networks	arXiv(2016) / CVPR(2017)	PyTorch
CycleGAN	Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks	ICCV(2017)	PyTorch
pix2pixHD	High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs	arXiv(2017) / CVPR(2018)	PyTorch
StyleGAN	A Style-Based Generator Architecture for Generative Adversarial Networks	arXiv(2018) / CVPR(2019)	TensorFlow
StyleGAN2	Analyzing and Improving the Image Quality of StyleGAN	arXiv(2019) / CVPR(2020)	TensorFlow
StyleGAN2-ADA	Training Generative Adversarial Networks with Limited Data	NeurIPS(2020)	PyTorch
StyleCLIP	StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery	ICCV(2021)	PyTorch
MobileStyleGAN	MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis	arXiv(2021)	PyTorch
StyleGAN3	Alias-Free Generative Adversarial Networks	NeurIPS(2021)	PyTorch

More implementations of GANs can be found in PyTorch-GAN.

Variational Auto-Encoder

Title	Paper	Conf	Code
VAE	Auto-Encoding Variational Bayes	arXiv(2013) / ICLR(2014)	PyTorch
VQ-VAE	Neural Discrete Representation Learning	arXiv(2017) / NeurIPS(2017)	TensorFlow
VQ-VAE-2	Generating Diverse High-Fidelity Images with VQ-VAE-2	arXiv(2019) / NeurIPS(2019)	TensorFlow
VQGAN	Taming Transformers for High-Resolution Image Synthesis	arXiv(2020) / CVPR(2021)	PyTorch

More implementations of VAEs can be found in PyTorch-VAE.

Diffusion Model

Title	Paper	Conf	Code
DDPM	Denoising Diffusion Probabilistic Models	arXiv(2020) / NeurIPS(2020)	PyTorch
DDIM	Denoising Diffusion Implicit Models	arXiv(2020) / ICLR(2021)	PyTorch
DALL-E	Zero-Shot Text-to-Image Generation	arXiv(2021) / ICML(2021)	PyTorch
SD 1.x	High-Resolution Image Synthesis with Latent Diffusion Models	arXiv(2021) / CVPR(2022)	PyTorch
SD 2	High-Resolution Image Synthesis with Latent Diffusion Models	arXiv(2021) / CVPR(2022)	PyTorch
DALL-E 2	Hierarchical Text-Conditional Image Generation with CLIP Latents	arXiv(2022)	[code]
FM	Flow Matching for Generative Modeling	arXiv(2022) / ICLR(2023)	PyTorch
DiT	Scalable Diffusion Models with Transformers	arXiv(2022) / ICCV(2023)	PyTorch
ControlNet	Adding Conditional Control to Text-to-Image Diffusion Models	arXiv(2023) / ICCV(2023)	PyTorch
SDXL	SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis	arXiv(2023) / ICLR(2024)	PyTorch
PixArt-α	PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis	arXiv(2023) / ICLR(2024)	PyTorch
DALL-E 3	Improving Image Generation with Better Captions	OpenAI(2023)	[code]
PixArt-δ	PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models	arXiv(2024)	PyTorch
SD 3	Scaling Rectified Flow Transformers for High-Resolution Image Synthesis	arXiv(2024) / ICML(2024)	PyTorch
Qihoo-T2X	Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task	arXiv(2024) / ICLR(2025)	PyTorch
RelaCtrl	RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers	arXiv(2025) / AAAI(2026)	PyTorch
U-StyDiT	U-StyDiT: Ultra-high Quality Artistic Style Transfer Using Diffusion Transformers	arXiv(2025) / ICCV(2025)	[code]
GPT-Image-1	Introducing our latest image generation model in the API	OpenAI(2025)	[code]
FLUX.1	FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space	arXiv(2025)	PyTorch
Qwen-Image	Qwen-Image Technical Report	arXiv(2025)	PyTorch
Nano Banana	Introducing Gemini 2.5 Flash Image, our state-of-the-art image model	Google(2025)	[code]
JiT	Back to Basics: Let Denoising Generative Models Denoise	arXiv(2025)	PyTorch
PixelDiT	PixelDiT: Pixel Diffusion Transformers for Image Generation	arXiv(2025) / CVPR(2026)	PyTorch
Z-Image	Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer	arXiv(2025)	PyTorch
FLUX.2	FLUX.2: Frontier Visual Intelligence	BFL(2025)	PyTorch
GPT-Image-1.5	The new ChatGPT Images is here	OpenAI(2025)	[code]
Nano Banana Pro	Introducing Nano Banana Pro	Google(2025)	[code]
ChordEdit	ChordEdit: One-Step Low-Energy Transport for Image Editing	arXiv(2026) / CVPR(2026)	PyTorch
InnoAds-Composer	InnoAds-Composer: Efficient Condition Composition for E-Commerce Poster Generation	arXiv(2026) / CVPR(2026)	[code]
Nano Banana 2	Nano Banana 2: Google’s latest AI image generation model	Google(2026)	[code]
Qwen-Image-2.0	Qwen-Image-2.0 Technical Report	arXiv(2026)	PyTorch
GPT-Image-2	Introducing ChatGPT Images 2.0	OpenAI(2026)	[code]
Ideogram 4.0	Ideogram 4.0 Technical Details: Open model at the forefront of design	Ideogram(2026)	PyTorch
i1	i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models	arXiv(2026)	PyTorch
MiniT2I	MiniT2I: A Minimalist Baseline for Text-to-Image Generation	blog(2026)	PyTorch

Leaderboards for AIGC models can be found at Artificial-Analysis.

Video Generation

Title	Paper	Conf	Code
CogVideoX	CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer	arXiv(2024) / ICLR(2025)	PyTorch
FancyVideo	FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance	arXiv(2024) / IJCAI(2025)	PyTorch
HunyuanVideo	HunyuanVideo: A Systematic Framework For Large Video Generative Models	arXiv(2024)	PyTorch
WISA	WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation	arXiv(2025) / NeurIPS(2025)	PyTorch
Open-Sora 2.0	Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k	arXiv(2025)	PyTorch
Wan2.1	Wan: Open and Advanced Large-Scale Video Generative Models	arXiv(2025)	PyTorch
Wan2.2	Wan: Open and Advanced Large-Scale Video Generative Models	arXiv(2025)	PyTorch
Lay2Story	Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation	arXiv(2025) / ICCV(2025)	[code]
Wan2.2-S2V-14B	Wan-S2V: Audio-Driven Cinematic Video Generation	arXiv(2025)	PyTorch
LongCat-Video	LongCat-Video Technical Report	arXiv(2025)	PyTorch
HunyuanVideo 1.5	HunyuanVideo 1.5 Technical Report	arXiv(2025)	PyTorch
MoFu	MoFu: Scale-Aware Modulation and Fourier Fusion for Multi-Subject Video Generation	arXiv(2025) / AAAI(2026)	[code]
LongCat-Video-Avatar	LongCat-Video-Avatar: Super-Realistic Lip-Synchronized Long Video Generation	blog(2025)	PyTorch
LTX-2	LTX-2: Efficient Joint Audio-Visual Foundation Model	arXiv(2026)	PyTorch
LTX-2.3	LTX-2: Efficient Joint Audio-Visual Foundation Model	blog(2026)	PyTorch
Wan2.6	Wan2.6: Native Multimodal Video Generation with Multi-Shot Narrative	Alibaba(2026)	[code]
MOVA	MOVA: Towards Scalable and Synchronized Video–Audio Generation	arXiv(2026)	PyTorch

AIGC-Applications

Face Editing

Title	Paper	Conf	Code
BeautyGAN	BeautyGAN: Instance-level Facial Makeup Transfer with Deep Generative Adversarial Network	ACM MM(2018)	TensorFlow
GFPGAN	Towards Real-World Blind Face Restoration with Generative Facial Prior	CVPR(2021)	PyTorch
HairCLIP	HairCLIP: Design Your Hair by Text and Reference Image	CVPR(2022)	PyTorch
HairMapper	HairMapper: Removing Hair from Portraits Using GANs	CVPR(2022)	PyTorch
LEDITS	LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance	arXiv(2023)	[code]
LEDITS++	LEDITS++: Limitless Image Editing using Text-to-Image Models	arXiv(2023) / CVPR(2024)	PyTorch

Face Swapping

Title	Paper	Conf	Code
FaceShifter	FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping	arXiv(2019)	[code]
DeepFaceLab	DeepFaceLab: Integrated, flexible and extensible face-swapping framework	arXiv(2020)	TensorFlow
SimSwap	SimSwap: An Efficient Framework For High Fidelity Face Swapping	ACM MM(2020)	PyTorch
FaceController	FaceController: Controllable Attribute Editing for Face in the Wild	AAAI(2021)	[code]
HifiFace	HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping	IJCAI(2021)	PyTorch
GHOST	GHOST—A New Face Swap Approach for Image and Video Domains	IEEE Access(2022)	PyTorch
MobileFaceSwap	MobileFaceSwap: A Lightweight Framework for Video Face Swapping	AAAI(2022)	PaddlePaddle
E4S	Fine-Grained Face Swapping via Regional GAN Inversion	arXiv(2022) / CVPR(2023)	PyTorch
SimSwap++	SimSwap++: Towards Faster and High-Quality Identity Swapping	TPAMI(2024)	GitHub
DiffFace	DiffFace: Diffusion-based Face Swapping with Facial Guidance	arXiv(2022) / PR(2025)	PyTorch
DiffSwap	DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion	CVPR(2023)	PyTorch
DreamID	DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning	SIGGRAPH Asia(2025)	GitHub

Agentic Image Generation

Title	Paper	Conf	Code
Qwen-Image-Agent	Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation	arXiv(2026)	[code]

Attention or Transformer

Title	Paper	Conf	Code
CAM	Learning Deep Features for Discriminative Localization	arXiv(2015) / CVPR(2016)	Caffe
Transformer	Attention Is All You Need	NIPS(2017)	TensorFlow
SENet	Squeeze-and-Excitation Networks	arXiv(2017) / CVPR(2018)	Caffe
GAT	Graph Attention Networks	arXiv(2017) / ICLR(2018)	TensorFlow
Non-local	Non-local Neural Networks	arXiv(2017) / CVPR(2018)	Caffe

Vision Transformer

Title	Paper	Conf	Code
Image Transformer	Image Transformer	ICML(2018)	[code]
ViT	An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale	arXiv(2020) / ICLR(2021)	PyTorch
Swin Transformer	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows	ICCV(2021)	PyTorch
DINO	Emerging Properties in Self-Supervised Vision Transformers	ICCV(2021)	PyTorch
ResT	ResT: An Efficient Transformer for Visual Recognition	NeurIPS(2021)	PyTorch
HAT-Net	Vision Transformers with Hierarchical Attention	arXiv(2021)	PyTorch
Shuffle-T	Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer	arXiv(2021)	PyTorch
Swinv2	Swin Transformer V2: Scaling Up Capacity and Resolution	arXiv(2021) / CVPR(2022)	PyTorch
DINOv2	DINOv2: Learning Robust Visual Features without Supervision	arXiv(2023)	PyTorch
DINOv3	DINOv3	arXiv(2025)	PyTorch
LAST-ViT	Vision Transformers Need More Than Registers	arXiv(2026)	PyTorch

More implementations of ViTs can be found in vit-pytorch.

Pre-trained Language Model

Title	Paper	Conf	Code
BERT	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	arXiv(2018) / NAACL(2019)	TensorFlow
GPT	Improving Language Understanding by Generative Pre-Training	OpenAI(2018)	TensorFlow
T5	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	arXiv(2019) / JMLR(2020)	TensorFlow
GPT-2	Language Models are Unsupervised Multitask Learners	OpenAI(2019)	PyTorch
GLM	GLM: General Language Model Pretraining with Autoregressive Blank Infilling	arXiv(2021) / ACL(2022)	PyTorch
GLM-130B	GLM-130B: An Open Bilingual Pre-trained Model	arXiv(2022) / ICLR(2023)	PyTorch

Large Language Model

Title	Paper	Conf	Code
GPT-3	Language Models are Few-Shot Learners	arXiv(2020) / NeurIPS(2020)	[code]
LLaMA	LLaMA: Open and Efficient Foundation Language Models	arXiv(2023)	PyTorch
GPT-4	GPT-4 Technical Report	arXiv(2023)	[code]
LLaMA 2	Llama 2: Open Foundation and Fine-Tuned Chat Models	arXiv(2023)	PyTorch
Qwen	Qwen Technical Report	arXiv(2023)	PyTorch
Gemini	Gemini: A Family of Highly Capable Multimodal Models	arXiv(2023)	[code]
DeepSeek LLM	DeepSeek LLM: Scaling Open-Source Language Models with Longtermism	arXiv(2024)	PyTorch
Gemma	Gemma: Open Models Based on Gemini Research and Technology	arXiv(2024)	PyTorch
DeepSeek-V2	DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model	arXiv(2024)	PyTorch
GLM-4	ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools	arXiv(2024)	PyTorch
Qwen2	Qwen2 Technical Report	arXiv(2024)	PyTorch
LLaMA 3	The Llama 3 Herd of Models	arXiv(2024)	PyTorch
Claude 3	The Claude 3 Model Family: Opus, Sonnet, Haiku	Anthropic(2024)	[code]
Qwen2.5	Qwen2.5 Technical Report	arXiv(2024)	PyTorch
GPT-o1	OpenAI o1 System Card	arXiv(2024)	[code]
DeepSeek-V3	DeepSeek-V3 Technical Report	arXiv(2024)	PyTorch
DeepSeek-R1	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning	arXiv(2025)	PyTorch
LLaMA 4	The Llama 4 Herd: The Beginning of a New Era of Natively Multimodal AI Innovation	Meta(2025)	PyTorch
Qwen3	Qwen3 Technical Report	arXiv(2025)	PyTorch
Claude 4	System Card: Claude Opus 4 & Claude Sonnet 4	Anthropic(2025)	[code]
Gemini 2.5	Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities	arXiv(2025)	[code]
Kimi K2	Kimi K2: Open Agentic Intelligence	arXiv(2025)	PyTorch
GLM-4.5	GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models	arXiv(2025)	PyTorch
Gemini 3	Gemini 3 Pro Model Card	Google DeepMind(2025)	[code]
GPT-5	GPT-5 System Card	arXiv(2025)	[code]
GLM-4.7	GLM-4.7: Advancing Coding Capability	Z.ai(2025)	PyTorch
GLM-5	GLM-5: from Vibe Coding to Agentic Engineering	arXiv(2026)	PyTorch
Qwen3.5	Qwen3.5: Towards Native Multimodal Agents	Qwen(2026)	PyTorch
DeepSeek-V4	DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence	DeepSeek(2026)	Hugging Face

Vision Language Model

Title	Paper	Conf	Code
CLIP	Learning Transferable Visual Models From Natural Language Supervision	arXiv(2021) / ICML(2021)	PyTorch
BLIP	BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation	arXiv(2022) / ICML(2022)	PyTorch
SigLIP	Sigmoid Loss for Language Image Pre-Training	arXiv(2023) / ICCV(2023)	JAX
Qwen-VL	Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond	arXiv(2023)	PyTorch
Qwen2-VL	Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution	arXiv(2024)	PyTorch
Qwen2.5-VL	Qwen2.5-VL Technical Report	arXiv(2025)	PyTorch
Qwen3-VL	Qwen3-VL Technical Report	arXiv(2025)	PyTorch
Vision Banana	Image Generators are Generalist Vision Learners	arXiv(2026)	[code]

Backbone

Title	Paper	Conf	Code
LeNet-5	Gradient-based learning applied to document recognition	IEEE(1998)	[code]
AlexNet	ImageNet Classification with Deep Convolutional Neural Networks	NIPS(2012)	[code]
NIN	Network In Network	arXiv(2013)	PyTorch
VGG	Very Deep Convolutional Networks for Large-Scale Image Recognition	ICLR(2015)	[code]
GoogLeNet	Going deeper with convolutions	CVPR(2015)	PyTorch
ResNet	Deep Residual Learning for Image Recognition	CVPR(2016)	PyTorch
Inception-v4	Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning	AAAI(2017)	[code]
DenseNet	Densely Connected Convolutional Networks	CVPR(2017)	[code]
DLA	Deep Layer Aggregation	CVPR(2018)	PyTorch
ShuffleNet	ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices	CVPR(2018)	[code]
MobileNetV3	Searching for MobileNetV3	ICCV(2019)	[code]

More information can be found in Awesome - Image Classification.

Object Detection

Title	Paper	Conf	Code
R-CNN	Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation	CVPR(2014)	[code]
SPP	Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition	TPAMI(2015)	[code]
Fast R-CNN	Fast R-CNN	ICCV(2015)	[code]
Faster R-CNN	Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks	NIPS(2015)	PyTorch
SSD	SSD: Single Shot MultiBox Detector	ECCV(2016)	Caffe
YOLO	You Only Look Once: Unified, Real-Time Object Detection	CVPR(2016)	[code]
YOLOv2	YOLO9000: Better, Faster, Stronger	CVPR(2017)	[code]
FPN	Feature Pyramid Networks for Object Detection	CVPR(2017)	[code]
RetinaNet	Focal Loss for Dense Object Detection	ICCV(2017)	[code]
YOLOv3	YOLOv3: An Incremental Improvement	arXiv(2018)	Official
CornerNet	CornerNet: Detecting Objects as Paired Keypoints	ECCV(2018)	PyTorch
CenterNet	Objects as Points	arXiv(2019)	PyTorch
YOLOv4	YOLOv4: Optimal Speed and Accuracy of Object Detection	arXiv(2020)	Official
YOLOF	You Only Look One-level Feature	CVPR(2021)	PyTorch

More information can be found in awesome-object-detection.

Object Segmentation

Title	Paper	Conf	Code
FCN	Fully convolutional networks for semantic segmentation	CVPR(2015)	PyTorch
U-Net	U-Net: Convolutional Networks for Biomedical Image Segmentation	MICCAI(2015)	PyTorch
Seg-Net	SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling	arXiv(2015)	PyTorch
DeepLab V1	Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs	arXiv(2014) / ICLR(2015)	PyTorch
PSPNet	Pyramid Scene Parsing Network	CVPR(2017)	PyTorch
DeepLab V2	DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs	arXiv(2016) / TPAMI(2017)	PyTorch
Mask R-CNN	Mask R-CNN	ICCV / TPAMI(2017)	PyTorch
DeepLab V3	Rethinking Atrous Convolution for Semantic Image Segmentation	arXiv(2017)	PyTorch
PointNet	PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation	CVPR(2017)	PyTorch
PointNet++	PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space	NIPS(2017)	PyTorch
DeepLab V3+	Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation	ECCV(2018)	PyTorch
DGCNet	Dual Graph Convolutional Network for Semantic Segmentation	BMVC(2019)	PyTorch
SETR	Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers	CVPR(2021)	PyTorch
Segmenter	Segmenter: Transformer for Semantic Segmentation	arXiv(2021)	PyTorch
SegFormer	SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers	arXiv(2021)	PyTorch
FTN	Fully Transformer Networks for Semantic Image Segmentation	arXiv(2021)	[code]

Object Tracking

Multiple Object Tracking

Title	Paper	Conf	Code
SORT	Simple Online and Realtime Tracking	ICIP(2016)	PyTorch
DeepSORT	Simple Online and Realtime Tracking with a Deep Association Metric	ICIP(2017)	PyTorch
Tracktor	Tracking without bells and whistles	ICCV(2019)	PyTorch
FFT	Multiple Object Tracking by Flowing and Fusing	arXiv(2020)	[code]
JRMOT	JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset	arXiv(2020)	[code]
Tracklet	Multi-object Tracking via End-to-end Tracklet Searching and Ranking	arXiv(2020)	[code]
DMCT	Real-time 3D Deep Multi-Camera Tracking	arXiv(2020)	[code]
FairMOT	A Simple Baseline for Multi-Object Tracking	arXiv(2020)	PyTorch
CenterPoint	Center-based 3D Object Detection and Tracking	CVPR(2021)	PyTorch

Visual Object Tracking

Title	Paper	Conf	Code
DepthTrack	Real-time depth-based tracking using a binocular camera	WCICA(2016)	[code]
BinocularTrack	Research on Target Tracking Algorithm Based on Parallel Binocular Camera	ITAIC(2019)	[code]
SiamFC	Fully-Convolutional Siamese Networks for Object Tracking	ECCV(2016)	PyTorch
SiamRPN	High Performance Visual Tracking with Siamese Region Proposal Network	CVPR(2018)	PyTorch
SiamRPN++	SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks	CVPR(2019)	PyTorch
SiamMask	Fast Online Object Tracking and Segmentation: A Unifying Approach	CVPR(2019)	PyTorch
GlobalTrack	GlobalTrack: A Simple and Strong Baseline for Long-term Tracking	AAAI(2020)	PyTorch
SiamCAR	SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking	CVPR(2020)	PyTorch
SiamBAN	Siamese Box Adaptive Network for Visual Tracking	CVPR(2020)	PyTorch
SiamAttn	Deformable Siamese Attention Networks for Visual Object Tracking	CVPR(2020)	PyTorch
PAMCC-AOT	Pose-Assisted Multi-Camera Collaboration for Active Object Tracking	AAAI(2020)	[code]
TSDM	TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator	arXiv(2020)	PyTorch
SiamGAT	Graph Attention Tracking	CVPR(2021)	PyTorch
RE-SiamNets	Rotation Equivariant Siamese Networks for Tracking	CVPR(2021)	PyTorch

Few-Shot Segmentation

Title	Paper	Conf	Code
OSLSM	One-Shot Learning for Semantic Segmentation	BMVC(2017)	Caffe
co-FCN	Conditional Networks for Few-Shot Semantic Segmentation	ICLR(2018)	[code]
AMP	AMP: Adaptive Masked Proxies for Few-Shot Segmentation	ICCV(2019)	PyTorch
SG-One	SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation	arXiv(2018) / TCYB(2020)	PyTorch
CENet	Learning Combinatorial Embedding Networks for Deep Graph Matching	ICCV(2019)	PyTorch
PANet	PANet: Few-Shot Image Semantic Segmentation with Prototype Alignment	ICCV(2019)	PyTorch
CANet	CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning	CVPR(2019)	PyTorch
PGNet	Pyramid Graph Networks with Connection Attentions for Region-Based One-Shot Semantic Segmentation	ICCV(2019)	[code]
CRNet	CRNet: Cross-Reference Networks for Few-Shot Segmentation	CVPR(2020)	[code]
FGN	FGN: Fully Guided Network for Few-Shot Instance Segmentation	CVPR(2020)	[code]
OTB	On the Texture Bias for Few-Shot CNN Segmentation	arXiv(2020)	TensorFlow
LTM	A New Local Transformation Module for Few-Shot Segmentation	MMM(2020)	[code]
SimPropNet	SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation	IJCAI(2020)	[code]
PPNet	Part-aware Prototype Network for Few-shot Semantic Segmentation	ECCV(2020)	PyTorch
PFENet	PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation	TPAMI(2020)	PyTorch
PMMs	Prototype Mixture Models for Few-shot Semantic Segmentation	ECCV(2020)	PyTorch
GFS-Seg	Generalized Few-Shot Semantic Segmentation	arXiv(2020)	[code]
SCL	Self-Guided and Cross-Guided Learning for Few-Shot Segmentation	CVPR(2021)	PyTorch
ASGNet	Adaptive Prototype Learning and Allocation for Few-Shot Segmentation	CVPR(2021)	PyTorch
HSNet	Hypercorrelation Squeeze for Few-Shot Segmentation	ICCV(2021)	PyTorch
BAM	Learning What Not to Segment: A New Perspective on Few-Shot Segmentation	CVPR(2022)	PyTorch

More information can be found in Few-Shot-Semantic-Segmentation-Papers.

Few-Shot Learning

Title	Paper	Conf	Code
RN	Learning to Compare: Relation Network for Few-Shot Learning	CVPR(2018)	PyTorch
SimSiam	Exploring Simple Siamese Representation Learning	CVPR(2021)	PyTorch

3D Face Reconstruction and Facial Animation

Title	Paper	Conf	Code
3DMM	A Morphable Model For The Synthesis Of 3D Faces	SIGGRAPH(1999)	[code]
CameraCalibration	A Flexible New Technique for Camera Calibration	TPAMI(2000)	[code]
Bilinear	Bilinear Models for 3-D Face and Facial Expression Recognition	TIFS(2008)	[code]
DDE	Displaced Dynamic Expression Regression for Real-time Facial Tracking and Animation	TOG(2014)	[code]
FaceWarehouse	FaceWarehouse: a 3D Facial Expression Database for Visual Computing	TVCG(2014)	[code]
Face2Face	Face2Face: Real-Time Face Capture and Reenactment of RGB Videos	CVPR(2016)	[code]
DynamicAvatars	Real-time Facial Animation with Image-based Dynamic Avatars	TOG(2016)	[code]
FLAME	Learning a model of facial shape and expression from 4D scans	TOG(2017)	TensorFlow / PyTorch
Nonlinear	Nonlinear 3D Face Morphable Model	CVPR(2018)	TensorFlow
DynamicRigidityPrior	Stabilized real-time face tracking via a learned dynamic rigidity prior	TOG(2018)	[code]
Deep3D	Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set	CVPR(2019)	TensorFlow / PyTorch
SimpleAnimation	Face It!: A Pipeline for Real-Time Performance-Driven Facial Animation	ICIP(2019)	[code]
RingNet	Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision	CVPR(2019)	TensorFlow
FOCUS	To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision	arXiv(2021)	PyTorch
MICA	Towards Metrical Reconstruction of Human Faces	ECCV(2022)	PyTorch
HRN	A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images	CVPR(2023)	PyTorch

Salient Object Detection

Title	Paper	Conf	Code
UC-Net	UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders	CVPR(2020)	PyTorch
JL-DCF	JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection	CVPR(2020)	PyTorch
SA-Gate	Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation	ECCV(2020)	PyTorch
BiANet	Bilateral Attention Network for RGB-D Salient Object Detection	TIP(2021)	[code]
DSA^2F	Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion	CVPR(2021)	[code]

3D Object Detection

Title	Paper	Conf	Code
PV-RCNN	PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection	CVPR(2020)	PyTorch

Optimization

Title	Paper	Conf	Code
ReLU	Deep Sparse Rectifier Neural Networks	JMLR(2011)	[code]
Momentum	On the importance of initialization and momentum in deep learning	ICML(2013)	[code]
Dropout	Dropout: a simple way to prevent neural networks from overfitting	JMLR(2014)	[code]
Adam	Adam: A Method for Stochastic Optimization	ICLR(2015)	[code]
BN	Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift	ICML(2015)	[code]
GDoptimization	An overview of gradient descent optimization algorithms	arXiv(2016)	[code]
StableCNN	Single-frame regularization for temporally stable cnns	CVPR(2019)	[code]

Survey

Title	Paper	Conf
3D-Detection-Survey-2019	A Survey on 3D Object Detection Methods for Autonomous Driving Applications	ITS(2019)
FSL-Survey-2019	Generalizing from a Few Examples: A Survey on Few-Shot Learning	CSUR(2019)
MOT-Survey-2020	Deep Learning in Video Multi-Object Tracking: A Survey	Neurocomputing(2020)
Transformer-Survey-2021	A Survey of Transformers	arXiv(2021)

Title: Overview
Author:
Link: https://gojay.top/2020/04/01/Overview/
Date: 2020-04-01
Copyright: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.
