Found 87 repositories (showing 30)
aigc-apps
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Vchitect
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
alibaba
[CVPR'25] Tora: Trajectory-oriented Diffusion Transformer for Video Generation
wzk1015
[ACM MM 2021 Best Paper Award] Video Background Music Generation with Controllable Music Transformer
TencentARC
[CVPR 2025] Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation"
songweige
Official PyTorch implementation of TATS: A Long Video Generation Framework with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)
AMAAI-Lab
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
AdaCache-DiT
Code for our ICCV 2025 paper "Adaptive Caching for Faster Video Generation with Diffusion Transformers"
thu-nics
[ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
JIA-Lab-research
[ICCV 2025] MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
DSaurus
This repository is the official implementation of Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer.
YuxiaoYang23
Official implementation of EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
lucidrains
Implementation of Transframer, Deepmind's U-net + Transformer architecture for up to 30 seconds video generation, in Pytorch
klingfoley
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
Tencent
KsanaDiT: High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation
Yaziwel
[MICCAI 2025] FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation
HCPLab-SYSU
Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation (TIP 2024, ACM MM 2023)
wlfeng0509
(ICML-2025) Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers
explainingai-code
This repo implements a video generation model using Latent Diffusion Transformers (Latte) in PyTorch and provides training and inference code for the Moving MNIST and UCF101 datasets.
Ceaglex
Code and weights for LoVA, a novel model for long-form video-to-audio generation. Based on the Diffusion Transformer (DiT) architecture, LoVA is more effective at generating long-form audio than existing autoregressive models and UNet-based diffusion models.
LTX-2-desktop
LTX-2.3 is an open-source generative video architecture based on the Diffusion Transformer (DiT). The model delivers commercial-grade generation quality (on par with Google Veo 3) without the strict limitations of closed ecosystems, and operates without censorship or content restrictions.
Video Summary Generation
A tokenized graph transformer-based video scene graph generation model that considers the temporal consistency of the video.
Thehunk1206
A complete video generation pipeline (training and inference) using MeanFlow on the DiT (Diffusion Transformer) architecture.
Souradeep1101
HunyuanVideo-T2V is a PyTorch implementation of the HunyuanVideo research paper. The project focuses on building a scalable pipeline for Text-to-Video (T2V) generation using diffusion models, transformers, and multimodal language models.
Nilanshrajput
A new Transformer-based GAN for video generation.
yy1lab
Semantic Frame Aggregation-based Transformer for Live Video Comment Generation
gitzyong812
Official implementation of the TMM'23 paper “End-to-End Video Scene Graph Generation with Temporal Propagation Transformer”
AdaneNT
Automated Video News Clip Generation via Robust Video Summarization using Deep Generative Models and Transformers
SadakhyaNarnur
A deep-learning NMT Transformer approach for text-to-gloss translation, with a GAN implementation for realistic video generation.