An overview of papers with code for CV / AIGC / LLM / VLM.
https://github.com/Gojay001/DeepLearning-Paper-with-Code
[Updating…]
Contents
- AIGC (AI Generated Content)
- LLM / VLM (Large Language Model / Vision-Language Model)
- CV (Computer Vision)
Generative Adversarial Network
More implementations of GANs can be found in PyTorch-GAN.
Variational Auto-Encoder
| Title | Paper | Conf | Code |
|---|---|---|---|
| VAE | Auto-Encoding Variational Bayes | arXiv(2013) / ICLR(2014) | PyTorch |
More implementations of VAEs can be found in PyTorch-VAE.
Diffusion Model
| Title | Paper | Conf | Code |
|---|---|---|---|
| DDPM | Denoising Diffusion Probabilistic Models | arXiv(2020) / NeurIPS(2020) | PyTorch |
| SD | High-Resolution Image Synthesis with Latent Diffusion Models | arXiv(2021) / CVPR(2022) | PyTorch |
| DiT | Scalable Diffusion Models with Transformers | arXiv(2022) / ICCV(2023) | PyTorch |
| JiT | Back to Basics: Let Denoising Generative Models Denoise | arXiv(2025) | PyTorch |
| PixelDiT | PixelDiT: Pixel Diffusion Transformers for Image Generation | arXiv(2025) / CVPR(2026) | PyTorch |
More implementations of Diffusion Models can be found in Awesome-Diffusion-Models.
AIGC-Applications
Face Editing
| Title | Paper | Conf | Code |
|---|---|---|---|
| BeautyGAN | BeautyGAN: Instance-level Facial Makeup Transfer with Deep Generative Adversarial Network | ACM MM(2018) | TensorFlow |
| GFPGAN | Towards Real-World Blind Face Restoration with Generative Facial Prior | CVPR(2021) | PyTorch |
| HairCLIP | HairCLIP: Design Your Hair by Text and Reference Image | CVPR(2022) | PyTorch |
| HairMapper | HairMapper: Removing Hair from Portraits Using GANs | CVPR(2022) | PyTorch |
| LEDITS | LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance | arXiv(2023) | [code] |
| LEDITS++ | LEDITS++: Limitless Image Editing using Text-to-Image Models | arXiv(2023) / CVPR(2024) | PyTorch |
Face Swapping
Attention or Transformer
| Title | Paper | Conf | Code |
|---|---|---|---|
| CAM | Learning Deep Features for Discriminative Localization | arXiv(2015) / CVPR(2016) | Caffe |
| Transformer | Attention Is All You Need | NIPS(2017) | TensorFlow |
| SENet | Squeeze-and-Excitation Networks | arXiv(2017) / CVPR(2018) | Caffe |
| GAT | Graph Attention Networks | arXiv(2017) / ICLR(2018) | TensorFlow |
| Non-local | Non-local Neural Networks | arXiv(2017) / CVPR(2018) | Caffe |
Vision Transformer
| Title | Paper | Conf | Code |
|---|---|---|---|
| Image Transformer | Image Transformer | ICML(2018) | [code] |
| ViT | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | arXiv(2020) / ICLR(2021) | PyTorch |
| Swin Transformer | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ICCV(2021) | PyTorch |
| DINO | Emerging Properties in Self-Supervised Vision Transformers | ICCV(2021) | PyTorch |
| ResT | ResT: An Efficient Transformer for Visual Recognition | NeurIPS(2021) | PyTorch |
| HAT-Net | Vision Transformers with Hierarchical Attention | arXiv(2021) | PyTorch |
| Shuffle-T | Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer | arXiv(2021) | PyTorch |
| Swinv2 | Swin Transformer V2: Scaling Up Capacity and Resolution | arXiv(2021) / CVPR(2022) | PyTorch |
| DINOv2 | DINOv2: Learning Robust Visual Features without Supervision | arXiv(2023) | PyTorch |
| DINOv3 | DINOv3 | arXiv(2025) | PyTorch |
| LAST-ViT | Vision Transformers Need More Than Registers | arXiv(2026) | PyTorch |
More implementations of ViTs can be found in vit-pytorch.
Backbone
| Title | Paper | Conf | Code |
|---|---|---|---|
| LeNet-5 | Gradient-based learning applied to document recognition | IEEE(1998) | [code] |
| AlexNet | ImageNet Classification with Deep Convolutional Neural Networks | NIPS(2012) | [code] |
| NIN | Network In Network | arXiv(2013) | PyTorch |
| VGG | Very Deep Convolutional Networks for Large-Scale Image Recognition | ICLR(2015) | [code] |
| GoogLeNet | Going deeper with convolutions | CVPR(2015) | PyTorch |
| ResNet | Deep Residual Learning for Image Recognition | CVPR(2016) | PyTorch |
| Inception-v4 | Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | AAAI(2017) | [code] |
| DenseNet | Densely Connected Convolutional Networks | CVPR(2017) | [code] |
| DLA | Deep Layer Aggregation | CVPR(2018) | PyTorch |
| ShuffleNet | ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices | CVPR(2018) | [code] |
| MobileNetV3 | Searching for MobileNetV3 | ICCV(2019) | [code] |
More information can be found in Awesome - Image Classification.
Object Detection
More information can be found in awesome-object-detection.
Object Segmentation
Object Tracking
Multiple Object Tracking
| Title | Paper | Conf | Code |
|---|---|---|---|
| SORT | Simple Online and Realtime Tracking | ICIP(2016) | PyTorch |
| DeepSORT | Simple Online and Realtime Tracking with a Deep Association Metric | ICIP(2017) | PyTorch |
| Tracktor | Tracking without bells and whistles | ICCV(2019) | PyTorch |
| FFT | Multiple Object Tracking by Flowing and Fusing | arXiv(2020) | [code] |
| JRMOT | JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset | arXiv(2020) | [code] |
| Tracklet | Multi-object Tracking via End-to-end Tracklet Searching and Ranking | arXiv(2020) | [code] |
| DMCT | Real-time 3D Deep Multi-Camera Tracking | arXiv(2020) | [code] |
| FairMOT | A Simple Baseline for Multi-Object Tracking | arXiv(2020) | PyTorch |
| CenterPoint | Center-based 3D Object Detection and Tracking | CVPR(2021) | PyTorch |
Visual Object Tracking
Few-Shot Segmentation
More information can be found in Few-Shot-Semantic-Segmentation-Papers.
Few-Shot Learning
| Title | Paper | Conf | Code |
|---|---|---|---|
| RN | Learning to Compare: Relation Network for Few-Shot Learning | CVPR(2018) | PyTorch |
| SimSiam | Exploring Simple Siamese Representation Learning | CVPR(2021) | PyTorch |
3D Face Reconstruction and Facial Animation
Salient Object Detection
| Title | Paper | Conf | Code |
|---|---|---|---|
| UC-Net | UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders | CVPR(2020) | PyTorch |
| JL-DCF | JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection | CVPR(2020) | PyTorch |
| SA-Gate | Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation | ECCV(2020) | PyTorch |
| BiANet | Bilateral Attention Network for RGB-D Salient Object Detection | TIP(2021) | [code] |
| DSA^2F | Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion | CVPR(2021) | [code] |
3D Object Detection
| Title | Paper | Conf | Code |
|---|---|---|---|
| PV-RCNN | PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection | CVPR(2020) | PyTorch |
Optimization
| Title | Paper | Conf | Code |
|---|---|---|---|
| ReLU | Deep Sparse Rectifier Neural Networks | JMLR(2011) | [code] |
| Momentum | On the importance of initialization and momentum in deep learning | ICML(2013) | [code] |
| Dropout | Dropout: a simple way to prevent neural networks from overfitting | JMLR(2014) | [code] |
| Adam | Adam: A Method for Stochastic Optimization | ICLR(2015) | [code] |
| BN | Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift | ICML(2015) | [code] |
| GDoptimization | An overview of gradient descent optimization algorithms | arXiv(2016) | [code] |
| StableCNN | Single-frame regularization for temporally stable CNNs | CVPR(2019) | [code] |
Survey
| Title | Paper | Conf |
|---|---|---|
| 3D-Detection-Survey-2019 | A Survey on 3D Object Detection Methods for Autonomous Driving Applications | ITS(2019) |
| FSL-Survey-2019 | Generalizing from a Few Examples: A Survey on Few-Shot Learning | CSUR(2019) |
| MOT-Survey-2020 | Deep Learning in Video Multi-Object Tracking: A Survey | Neurocomputing(2020) |
| Transformer-Survey-2021 | A Survey of Transformers | arXiv(2021) |