Document (PDF, Word, PPTX, ...) extraction and parsing API using state-of-the-art OCR engines and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.
Stars: 3.1k
Forks: 268
Watchers: 3.1k
Open issues: 47
Commits by contributor: 129, 107, 5, 2, 1, 1, 1, 1, 1, 1
Commits:
- 8a977e5 Merge pull request #123 from CatchTheTornado/122-bug-docker-based-make-install-fails
- 8f8a4ab Merge pull request #127 from majcheradam/fix/debian-libgl1
- 92132d5 fix(docker): replace deprecated libgl1-mesa-glx with libgl1 for Debian trixie and Ubuntu images
- 994f3c4 Fix Docker installation issues: remove invalid 'maker' dependency, correct license format, and reorganize Dockerfile for improved build efficiency
- 1383579 Merge pull request #121 from CatchTheTornado/119-security-path-traversal-leads-to-arbitrary-file-readwrite-in-localfilesystemstoragestrategy
- 9a6413f Implement path traversal protection in LocalFilesystemStorageStrategy and add comprehensive tests
- 70af9f1 Merge pull request #113 from CatchTheTornado/feature/54-add-docling-support
- 341c8eb Merge remote-tracking branch 'origin/feature/54-add-docling-support' into feature/54-add-docling-support