Search Results

Found 43 repositories (showing 30)

Vision-R1

Osilly

🧡61

[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning capability.

1.2k

Python

Updated 14 hours ago

Griffon

jefferyZhan

🧡60

Official repo of the Griffon series, including v1 (ECCV 2024), v2 (ICCV 2025), G, and R, and also the RL tool Vision-R1.

250

Apache-2.0

Python

Updated 1 week ago

GUI-R1

ritzz-ai

🧡65

Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents

241

Apache-2.0

Python

Updated 2 days ago

deep-reinforcement-learning, grpo, gui-agent, +6

VLAA-Thinking

UCSC-VLAA

🧡55

[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

148

Apache-2.0

Python

Updated 1 week ago

multimodal, reasoning, vision-language-model, +1

Med-R1

Yuxiang-Lai117

🧡60

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models

118

Python

Updated 7 hours ago

RealSR-R1

Junboooo

🧡55

Code for paper "RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought"

104

Python

Updated 3 weeks ago

time-r1

xiaomi-research

🧡60

[NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding

Apache-2.0

Python

Updated 1 day ago

MobileVLA-R1

AIGeeksGroup

💛70

MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots

Apache-2.0

Python

Updated 1 day ago

IAD-R1

Yanhui-Lee

🧡55

We propose IAD-R1, a universal post-training framework that enhances Vision-Language Models for industrial anomaly detection through a two-stage training strategy.

Python

Updated 18 hours ago

R1-Vision

yuyq96

❤️35

R1-Vision: Let's first take a look at the image

MIT

Python

Updated 8 months ago

Visionary-R1

maifoundations

❤️45

Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning

Apache-2.0

Python

Updated 1 month ago

ReWatch-R1

alibaba

🧡60

[ICLR 2026] ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis

Apache-2.0

Python

Updated 19 hours ago

VTC-R1

w-yibo

🧡65

VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning.

Apache-2.0

Python

Updated 6 days ago

ETP-R1

Cepillar

🧡60

Official implementation of the paper "ETP-R1: Evolving Topological Planning with Reinforcement Fine-tuning for Vision-Language Navigation in Continuous Environments"

Python

Updated 18 hours ago

PathReasoner-R1

cyclexfy

❤️45

Official implementation of "PathReasoner-R1: Instilling Structured Reasoning into Pathology Vision-Language Model via Knowledge-Guided Policy Optimization"

Updated 1 week ago

deepseek-r1-vision

sungatetop

❤️45

A method to make VLMs think like R1

Python

Updated 2 months ago

R1V-Free

Exgc

❤️35

R1V, trained with AI feedback, answers open-ended visual questions.

Python

Updated 10 months ago

open-r1, r1v, video-r1, +2

vision-r1

JacksonCakes

❤️20

No description available

Python

Updated 9 months ago

open-r1-vision

alexxchen

❤️40

One-click start reproduction of multi-modal DeepSeek R1-Zero

Apache-2.0

Python

Updated 11 months ago

NTI_R1

MohamadHKamal

❤️25

This repository includes 5 projects for NTI R1 advanced computer vision

Apache-2.0

Updated 4 months ago

genseg-r1

ai-and-lab

🧡55

Official code for the paper "GenSeg-R1: RL-Driven Vision–Language Grounding for Fine-Grained Referring Segmentation"

Apache-2.0

Updated 2 weeks ago

DocScope-R1

PRITHIVSAKTHIUR

🧡60

DocScope-R1 is an experimental, advanced document vision suite designed for high-performance Optical Character Recognition (OCR) and complex visual reasoning.

Apache-2.0

Python

Updated 2 weeks ago

document-parsing, gradio, huggingface-spaces, +11

Multimodal-Retrieval-Augmented-Generation-MuRAG-

Lyf-of-sakthi

❤️35

A Multimodal Retrieval-Augmented Generation (RAG) system that extracts text and images from PDFs, retrieves relevant information, and generates responses. It uses the DeepSeek-R1-Distill-Qwen-1.5B model for text generation, the BLIP vision model for image captioning, and the Flet framework for the GUI.

Python

Updated 6 months ago

-TheDress

rlaferso

❤️40

This repository contains data published in Lafer-Sousa, R., & Conway, B. R. (2017). #thedress: Categorical perception of an ambiguous color image. Journal of Vision; and Lafer-Sousa, R., Hermann, K. L., & Conway, B. R. (2015). Striking Individual Differences in Color Perception Uncovered by 'the Dress' Photograph. Current Biology 25, R1–R2

MIT

Matlab

Updated 4 years ago

Vision-R1

taivu1998

❤️45

No description available

Python

Updated 1 week ago

Tiny-R1-Vision

richardoo-707

❤️45

A GRPO training framework based on the Tiny-R1 architecture, designed for scenarios with extremely limited GPU memory

Python

Updated 1 month ago

Saliency_R1

peterant330

🧡65

[CVPR'26] Saliency-R1: Enforcing Interpretable and Faithful Vision-language Reasoning via Saliency-map Alignment Reward

Python

Updated 13 hours ago

contract-bot

arishtanemi3007

🧡65

A privacy-first, fully local Agentic AI legal assistant. Combines multimodal RAG (DeepSeek-R1 + Llama 3.2 Vision) via Telegram for zero-data-leakage contract analysis.

Python

Updated 8 hours ago

agentic-ai, deepseek-r1, legal-tech, +11

DeepSeek-R1

Rafa49451

❤️35

DeepSeek-R1 is a cutting-edge deep learning model designed for advanced image recognition tasks, boasting high accuracy and efficiency in detecting complex patterns within visual data. Leveraging state-of-the-art neural network architecture, DeepSeek-R1 sets a new benchmark in the field of computer vision by pushing the boundaries of what is possib

Updated 7 months ago

chat-api, chatbot, chatgpt-api, +3

IIT-Ropar-End-SEM-Project

AHSharan

🧡50

Multi-modal AI video editing pipeline. Whisper ASR + Qwen2-VL vision + DeepSeek-R1 reasoning → FCPXML timelines for DaVinci Resolve. Features 33:1 metadata compression, semantic vector search, and A-roll/B-roll classification. IIT Ropar Module E project.

MIT

Python

Updated 2 months ago
