Found 97 repositories (showing 30)
riedlerm
Implementation and evaluation of multimodal RAG with text and image inputs for industrial applications
CornelliusYW
This repository contains a Multimodal Retrieval-Augmented Generation (RAG) Pipeline that integrates images, audio, and text for advanced multimodal querying and response generation.
microsoft
Enterprise-ready solution leveraging multimodal Generative AI (Gen AI) to enhance existing or new applications beyond text—implementing RAG, image classification, video analysis, and advanced image embeddings.
HyeonjeongHa
Official PyTorch implementation of "MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks"
utkartist
Multimodal Retrieval-Augmented Generation (RAG) is an advanced technique that combines text and image data to enhance the capabilities of large language models (LLMs) like GPT-4. This tutorial will guide you through the process of implementing a multimodal RAG system using GPT-4 and Llama Index.
ranasaurus9
This is a sample code implementation of Multimodal RAG using Google Gemini & MongoDB Atlas Vector Search.
Thaman-N
Advanced Contract Analysis System: A comprehensive legal contract analysis system using generative AI. The project implements various NLP techniques, prompt engineering approaches (CoT, TroT, GoT), Retrieval-Augmented Generation (RAG), multimodal inputs, QLoRA fine-tuning, and evaluation frameworks.
TeenLucifer
An implementation of a multimodal RAG system that supports images, tables, and formulas.
kirollos2001
A Python-based Retrieval-Augmented Generation (RAG) system designed to handle multimodal inputs and outputs. This project implements an advanced RAG architecture capable of processing and retrieving information across multiple modalities (text, images, etc.).
Benedictusy
In this project, I implemented a multimodal RAG (Retrieval-Augmented Generation) video question answering system that can understand both visual and textual information in videos to provide accurate answers.
This project implements a Multimodal Retrieval-Augmented Generation (RAG) pipeline using AWS Bedrock's Nova and Titan models. The system ingests PDFs, extracts text, tables, embedded images, and full-page images, then performs similarity search using FAISS to generate grounded answers with Amazon Nova using both text and visual context.
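Several of the pipelines above share a final "grounded answer" step: retrieved text chunks (and captions for retrieved images) are packed into a prompt so the model answers only from the supplied context. A minimal sketch of that step, with a hypothetical helper name (this is a generic illustration, not code from any listed repository):

```python
# Sketch of the grounded-prompt assembly step common to these RAG pipelines.
# build_grounded_prompt is a hypothetical helper, not from any listed repo.

def build_grounded_prompt(question, text_chunks, image_captions=()):
    """Assemble a context-grounded prompt from retrieved multimodal pieces."""
    parts = ["Answer using only the context below."]
    for i, chunk in enumerate(text_chunks, 1):
        parts.append(f"[Text {i}] {chunk}")
    for i, cap in enumerate(image_captions, 1):
        parts.append(f"[Image {i}] {cap}")
    parts.append(f"Question: {question}")
    return "\n".join(parts)
```

The resulting string would be sent to the generation model (e.g. a Bedrock or Gemini chat endpoint) as the user message.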
Dat-Bois
Multimodal RAG implementation for Recipe1M dataset
steve601
Implementing a multimodal RAG system.
mmm-megahed
Multimodal RAG implementation for Moodle with evaluation experiments
MansoobeZahra
Study assistant to support studying; involves a multimodal RAG implementation and a multi-agent system.
hash2004
This repository features a Multimodal Agentic RAG system combining advanced techniques such as Agentic Ingestion and RAG Fusion.
Multimodal (text, images, tables) RAG pipeline implementation using Llama 3.1, Google Gemini 1.5 Flash, and Chroma DB.
jemayz
ATLAST is a multimodal chatbot implementing RAG across three domains: Medical, Islamic, and Insurance.
This project highlights an Agentic RAG implementation using ApertureDB, a graph-based multimodal data store. Hugging Face SmolAgents is employed to implement a multi-agent LLM workflow.
Naveed05
Advanced GenAI projects implementing Retrieval-Augmented Generation (RAG) across text, audio, and multimodal pipelines using vector databases and foundation models.
Gauravmangate27
NovaSearch – Multimodal RAG Engine (Python, LLMs, LangChain, FastAPI). Developed a multimodal RAG system enabling semantic search across text and image data using OpenAI embeddings and CLIP. Implemented real-time ingestion and hybrid retrieval pipelines with Kafka, Spark Streaming, FAISS, and Elasticsearch k-NN, improving retrieval.
nandanavijesh
End-to-end RAG implementation using Jina Embeddings v2 and FAISS for vector search, with Groq llama-3.2-vision for grounded, multimodal response generation.
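The embed-index-search pattern this entry (and many others in this list) describes can be sketched in a few lines. Below is a toy, dependency-free stand-in: the hash-based `embed()` replaces a real embedding model (e.g. Jina Embeddings v2), and brute-force cosine search replaces a FAISS index; it illustrates the shape of the retrieval step only, not any repository's actual code:

```python
# Toy sketch of dense retrieval: embed documents and query, rank by cosine
# similarity. embed() is a hypothetical stand-in for a real embedding model;
# a FAISS index would replace the brute-force search in a real system.

import math

def embed(text, dim=8):
    """Toy deterministic embedding: character codes folded into dim buckets, L2-normalized."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Dot product of two unit vectors equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

The retrieved passages would then be passed, together with any images, to a vision-capable generation model for the grounded answer.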
FormalIngenieroniel
This project implements a Multimodal Retrieval-Augmented Generation (RAG) system designed to identify, retrieve, and describe specific train wagons based on visual and textual data.
fllin1
Modern search engine techniques implemented end-to-end: keyword (BM25), semantic (embeddings), hybrid fusion (weighted, RRF), multimodal (image→text), and LLM-enhanced retrieval (RAG with Gemini).
This project implements a Multimodal Retrieval-Augmented Generation (RAG) pipeline that combines text and visual understanding using a Vision-Language Model (VLM). It enables querying across both documents and images, retrieving relevant multimodal context, and generating grounded responses.
pparitoshh
A collection of Generative AI implementations focused on real-world applications like Retrieval-Augmented Generation (RAG), chatbots, and multimodal systems. Includes production-ready code, tutorials, and experiments using LangChain, OpenAI, and open-source models (Llama, Mistral). Contributions welcome!
rafamartinezquiles
This project implements a multimodal pipeline capable of ingesting text, extracting knowledge, and enabling intelligent search using Retrieval-Augmented Generation (RAG). It uses cutting-edge tools like LangChain, OpenAI, and Neo4j to build a searchable knowledge graph from unstructured documents like employee handbooks.
This Streamlit application implements a Multimodal Retrieval-Augmented Generation (RAG) system. It processes various types of documents including text files, PDFs, PowerPoint presentations, and images. The app leverages Large Language Models and Vision Language Models to extract and index information from these documents.
kratipandya
This project implements a multimodal Retrieval-Augmented Generation (RAG) search engine focused on scientific content from arXiv. It allows users to search through research papers using text queries, image uploads, or audio inputs, and provides AI-generated answers based on relevant content.
karimtawfikk
This project implements a multimodal RAG system for designing creative flower arrangements. Flower images are stored in ChromaDB with OpenCLIP embeddings, enabling natural language queries like “What flowers would look elegant for a wedding bouquet?” The model then generates personalized bouquet suggestions grounded in retrieved visuals.