Found 43 repositories (showing 30)
DataFog
Python SDK for PII detection and redaction in text and images, combining regex + NLP pipelines for production privacy workflows.
jcatama
Rule-based PII and secret redaction for Markdown documents — audit log, risk-level filtering, LLM pipeline ready
chandika
Fast, layered PII redaction for LLM pipelines. Anonymize before sending to any provider, rehydrate on the way back.
darkmatter2222
Python pipeline for synthetic data generation with a custom Llama sentence generator. It creates field values, prompts & validated sentences (stored in JSON) and includes a training template focused on PII redaction, data sensitivity & compliance.
AnaPaula04
Lightweight PII redaction pipeline in Python using Hugging Face NER + regex; reported accuracy of 96.5%.
venuvankaraghuvardhan-arch
Enterprise-grade RAG pipeline with **PII auto-redaction** (Presidio), **input/output safety guardrails**, **retrieval confidence scoring**, graceful fallback, and a full analytics dashboard.
Michael-A-Kuykendall
FeedMe: A hungry, memory-safe streaming data pipeline in Rust. Efficient streaming ETL with ownership transfer, bounded resources, PII redaction, validation, dead-letter queues, and Prometheus metrics. Production-ready with comprehensive testing.
AbhinavGhatak
ResuNavigator is an industry-grade AI resume anonymization platform designed for high-volume hiring workflows. It performs secure, local, batch-level PII redaction across resumes (PDF, DOCX, TXT) using a state-of-the-art NLP pipeline, with real-time progress tracking and an in-browser side-by-side review system.
miniarjabri
No description available
shivkhurana
Automated data processing pipeline designed to detect and redact PII (Personally Identifiable Information) from server logs using NLP (spaCy) and regex. Containerized with Docker and integrated into a GitHub Actions CI/CD workflow for automated compliance testing.
dtiern55
No description available
sujan22359
No description available
A context-aware Named Entity Recognition (NER) system for detecting and sanitizing Personally Identifiable Information (PII) in unstructured text logs using fine-tuned DistilBERT.
raghavsyal
No description available
miniarjabri
No description available
mappy92
Redaction Project
ProfBisca29
The tool performs SHA-256 integrity verification on input and output documents, detects 10 PII entity types including SSNs, credit cards, and passport numbers, assigns risk severity ratings (Critical, High, Medium, Low) to each finding, and generates structured JSON compliance reports with audit logging.
Built an event-driven PII redaction pipeline using Apache Kafka and Spark Structured Streaming. The system detects and masks sensitive data in real time, with Dockerized microservices for scalability and seamless data flow across ingestion, processing, and storage layers.
danielmaddaleno
Pluggable guardrails pipeline for LLM apps – PII redaction, prompt injection, toxicity & token budget
amafjarkasi
Context hygiene & risk adjudication for LLM pipelines: secrets, PII, prompt-injection, policy redaction & tokenization.
simpli-support
PII detection, data redaction, and privacy risk scanning for AI-safe support data pipelines
bnthameur
A privacy-preserving PII redaction proxy for enterprise AI pipelines. Markdown-first & On-premise ready.
ShravanTalabhaktula
A security-focused PII detection and redaction pipeline that sanitizes application logs before persistence, with pluggable Azure AI integration.
YIHAO0225
A multimodal PII redaction system that detects and removes sensitive information from video, audio, and text. Uses AWS Textract, Transcribe, Comprehend, and Rekognition. Features OCR, face detection, speech-to-text, audio PII detection, and automated redaction pipelines.
Handwritten OCR + PII Extraction pipeline using OpenCV, Tesseract and EasyOCR. Includes image preprocessing, tilt correction, text extraction, PII detection and optional redaction for medical-style handwritten documents.
Shrivea
An on-device PII detection and redaction system that scans documents before they enter company knowledge bases or RAG pipelines.
DevanshuNEU
Serverless data ingestion pipeline on GCP - Cloud Run, Pub/Sub, Firestore. Handles 1000+ RPM with multi-tenant isolation and automatic PII redaction.
Mmm11222
Advanced Data Analysis Pipeline using Python to process 32M+ rows of Instacart sales data. Features: Data Wrangling, PII Redaction, and Customer Profiling.
ahlemtr
A secure data pipeline in Python that automates PII redaction and implements AES-based symmetric encryption to protect sensitive financial data in transit.
Automated PDF redaction pipeline using Python, AI (Gemini), and OCR. Permanently removes sensitive data (PII) from text, images, and metadata instead of just hiding it.