A method that directly addresses the modality gap by aligning speech tokens with their corresponding text transcription during the tokenization stage.
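As a rough illustration of what aligning speech with text at tokenization time could look like, the sketch below pools speech frames into one vector per text token using dot-product attention. This is only a minimal sketch under stated assumptions, not the repository's implementation: the function name `align_speech_to_text`, the embedding shapes, and the choice of attention as the alignment mechanism are all hypothetical.

```python
import numpy as np


def align_speech_to_text(text_queries, speech_frames):
    """Pool speech frames into one vector per text token (hypothetical helper).

    text_queries:  (num_text_tokens, dim) assumed text-token embeddings
    speech_frames: (num_frames, dim) assumed speech-encoder outputs
    """
    dim = text_queries.shape[-1]
    # Similarity of every text token to every speech frame: (tokens, frames).
    scores = text_queries @ speech_frames.T / np.sqrt(dim)
    # Numerically stable softmax over the frame axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted average of frames: one speech vector per text token.
    return weights @ speech_frames


# Random stand-ins for real embeddings (assumption: 4 text tokens, 50 frames).
rng = np.random.default_rng(0)
text_queries = rng.normal(size=(4, 8))
speech_frames = rng.normal(size=(50, 8))
aligned = align_speech_to_text(text_queries, speech_frames)
print(aligned.shape)  # (4, 8)
```

The output has exactly as many rows as there are text tokens, which is the property the description points at: the speech representation ends up with the same length as the transcription rather than the much longer frame sequence.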
Stars: 115
Forks: 13
Watchers: 115
Open Issues: 3
5e9dc94 speed up audio processing by expliciting sequence of audio into numpy format
c001615 Merge branch 'main' of https://github.com/mtkresearch/TASTE-SpokenLM
dc00192 Merge pull request #2 from mtkresearch/feature/training-by-hf