Found 87 repositories (showing 30)
aigc-apps
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Vchitect
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
alibaba
[CVPR'25] Tora: Trajectory-oriented Diffusion Transformer for Video Generation
wzk1015
[ACM MM 2021 Best Paper Award] Video Background Music Generation with Controllable Music Transformer
TencentARC
[CVPR 2025] Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation"
songweige
Official PyTorch implementation of TATS: A Long Video Generation Framework with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)
AMAAI-Lab
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
AdaCache-DiT
Code for our ICCV 2025 paper "Adaptive Caching for Faster Video Generation with Diffusion Transformers"
thu-nics
[ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
JIA-Lab-research
[ICCV 2025] MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
DSaurus
This repository is the official implementation of Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer.
YuxiaoYang23
Official implementation of EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
lucidrains
Implementation of Transframer, Deepmind's U-net + Transformer architecture for up to 30 seconds video generation, in Pytorch
klingfoley
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
Tencent
KsanaDiT: High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation
Yaziwel
[MICCAI 2025] FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation
HCPLab-SYSU
Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation (TIP 2024, ACM MM 2023)
wlfeng0509
(ICML-2025) Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers
explainingai-code
This repo implements a video generation model using Latent Diffusion Transformers (Latte) in PyTorch and provides training and inference code for the Moving MNIST and UCF101 datasets.
Ceaglex
Code and weights for LoVA, a novel model for long-form video-to-audio generation. Based on the Diffusion Transformer (DiT) architecture, LoVA is more effective at generating long-form audio than existing autoregressive models and UNet-based diffusion models.
LTX-2-desktop
LTX-2.3 is an open-source generative video architecture based on the Diffusion Transformer (DiT). The model delivers commercial-grade generation quality (on par with Google Veo 3) without the strict limitations of closed ecosystems, and operates without censorship or content restrictions.
Video Summary Generation
A tokenized graph transformer-based video scene graph generation model that considers the temporal consistency of the video.
Thehunk1206
A complete video generation pipeline (training and inference) using MeanFlow on the DiT (Diffusion Transformer) architecture.
Souradeep1101
HunyuanVideo-T2V is a PyTorch implementation of the HunyuanVideo research paper. The project focuses on building a scalable pipeline for Text-to-Video (T2V) generation using diffusion models, transformers, and multimodal language models.
Nilanshrajput
A new Transformer-based GAN for video generation.
yy1lab
Semantic Frame Aggregation-based Transformer for Live Video Comment Generation
gitzyong812
Official implementation of the TMM'23 paper “End-to-End Video Scene Graph Generation with Temporal Propagation Transformer”
AdaneNT
Automated Video News Clip Generation via Robust Video Summarization using Deep Generative Models and Transformers
SadakhyaNarnur
A deep-learning NMT Transformer approach for text-to-gloss translation, with a GAN implementation for realistic video generation.