Found 2 repositories (showing 2)
danielrosehill
A compilation of resources (model profiles, benchmarks, docs) for multimodal AI models with audio understanding, focused especially on ASR and transcription use cases
bhanudeergasi
Integrating Text, Audio, and Vision: a next-generation multimodal AI system that mimics human-like understanding by integrating Natural Language Processing (NLP), Speech Recognition, and Computer Vision into one intelligent platform.