End-to-end speaker diarization and transcription system combining Faster-Whisper (ASR), Pyannote (VAD/segmentation), NVIDIA NeMo (speaker embeddings + clustering), and a CTC forced aligner for precise word-level timestamps. Includes a React + Vite frontend and a Flask + FastAPI backend for audio upload, diarized transcripts, and summaries.
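To illustrate how word-level timestamps and speaker segments can be combined into a diarized transcript, here is a minimal sketch. It is not taken from this repository: the `Word` and `Turn` types and the `assign_speakers` function are hypothetical names, and the maximum-overlap rule is one common heuristic, assumed here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A transcribed word with aligned timestamps in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """A speaker segment produced by diarization (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Return the duration in seconds shared by two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all are labeled "unknown".
    """
    labeled = []
    for word in words:
        best_speaker = "unknown"
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_overlap = shared
                best_speaker = turn.speaker
        labeled.append((best_speaker, word.text))
    return labeled


# Example input: three aligned words and two speaker turns (made-up values).
words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.1, 1.3)]
turns = [Turn("A", 0.0, 1.0), Turn("B", 1.0, 2.0)]
result = assign_speakers(words, turns)
```

With these made-up inputs, the first two words fall inside speaker A's turn and the last inside speaker B's. A real pipeline would also need to handle overlapping speech and words that straddle a turn boundary.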
Stars: 5
Forks: 3
Watchers: 5
Open Issues: 2
1172376 Add end-to-end summary for diarization with known speaker matching in README
3a3d0e1 Refactor sequence diagram in README to clarify VAD and segmentation roles
3d0add9 Add speaker enrollment pipeline diagram and explanation to README
518498c Add support for known speaker enrollment and identification in the diarization pipeline
4f5d8f3 Add new speaker "Amber" with embeddings and update diarization threshold
1077497 Change the proxy on the personal computer to keep up with the container
4339aac Remove the heading for the diarization and the transcriptions
7b37a9b Update the banner to reflect that the transcription and the diarization are shown separately, and update the banner in real time so that the user knows diarization is happening
38e2932 Update README to clarify processing modes for diarization and transcription
4706de0 Refactor code structure to remove the code causing the error in waveform.jsx
5916bd5 Remove general interaction and doctor-patient interaction; also remove the summarization
67e41d5 Add social media links to the top right corner of the app