Found 43 repositories (showing 30)
Osilly
[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning capability.
jefferyZhan
Official repo of the Griffon series, including v1 (ECCV 2024), v2 (ICCV 2025), G, and R, and also the RL tool Vision-R1.
ritzz-ai
Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents
UCSC-VLAA
[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Yuxiang-Lai117
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
Junboooo
Code for paper "RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought"
xiaomi-research
[NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
AIGeeksGroup
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
Yanhui-Lee
We propose IAD-R1, a universal post-training framework that enhances Vision-Language Models for industrial anomaly detection through a two-stage training strategy.
yuyq96
R1-Vision: Let's first take a look at the image
maifoundations
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
alibaba
[ICLR 2026] ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
w-yibo
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning.
Cepillar
Official implementation of the paper "ETP-R1: Evolving Topological Planning with Reinforcement Fine-tuning for Vision-Language Navigation in Continuous Environments"
cyclexfy
Official implementation of "PathReasoner-R1: Instilling Structured Reasoning into Pathology Vision-Language Model via Knowledge-Guided Policy Optimization"
sungatetop
A method to make VLMs think like R1
Exgc
R1V, trained with AI feedback, answers open-ended visual questions.
JacksonCakes
No description available
alexxchen
One-click start reproduction of multi-modal DeepSeek R1-Zero
MohamadHKamal
This repository includes 5 projects for NTI R1 advanced computer vision
ai-and-lab
Official code for the paper "GenSeg-R1: RL-Driven Vision–Language Grounding for Fine-Grained Referring Segmentation"
PRITHIVSAKTHIUR
DocScope-R1 is an experimental, advanced document vision suite designed for high-performance Optical Character Recognition (OCR) and complex visual reasoning.
Lyf-of-sakthi
A multimodal Retrieval-Augmented Generation (RAG) system that extracts text and images from PDFs, retrieves relevant information, and generates responses, using the DeepSeek-R1-Distill-Qwen-1.5B model for text generation, the BLIP vision model for image captioning, and the Flet framework for the GUI.
rlaferso
This repository contains data published in Lafer-Sousa, R., & Conway, B. R. (2017). #thedress: Categorical perception of an ambiguous color image. Journal of Vision; and Lafer-Sousa, R., Hermann, K. L., & Conway, B. R. (2015). Striking Individual Differences in Color Perception Uncovered by 'the Dress' Photograph. Current Biology 25, R1–R2
taivu1998
No description available
richardoo-707
A GRPO training framework based on the Tiny-R1 architecture, designed for scenarios with extremely limited GPU memory
peterant330
[CVPR'26] Saliency-R1: Enforcing Interpretable and Faithful Vision-language Reasoning via Saliency-map Alignment Reward
arishtanemi3007
A privacy-first, fully local Agentic AI legal assistant. Combines multimodal RAG (DeepSeek-R1 + Llama 3.2 Vision) via Telegram for zero-data-leakage contract analysis.
Rafa49451
DeepSeek-R1 is a cutting-edge deep learning model designed for advanced image recognition tasks, boasting high accuracy and efficiency in detecting complex patterns within visual data. Leveraging state-of-the-art neural network architecture, DeepSeek-R1 sets a new benchmark in the field of computer vision by pushing the boundaries of what is possib
AHSharan
Multi-modal AI video editing pipeline. Whisper ASR + Qwen2-VL vision + DeepSeek-R1 reasoning → FCPXML timelines for DaVinci Resolve. Features 33:1 metadata compression, semantic vector search, and A-roll/B-roll classification. IIT Ropar Module E project.