Found 8,634 repositories(showing 30)
modelscope
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
KoljaB
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
pyannote
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
smacke
Automagically synchronize subtitles with video.
QwenLM
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
FluidInference
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
k2-fsa
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.
nyrahealth
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
ina-foss
CNN-based audio segmentation toolkit. Allows to detect speech, music, noise and speaker gender. Has been designed for large scale gender equality studies based on speech time per gender.
t-davidson
Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
evancohen
:speech_balloon: /so.nus/ STT (speech to text) for Node with offline hotword detection
richmondu
libfaceid is a research framework for prototyping of face recognition solutions. It seamlessly integrates multiple detection, recognition and liveness models w/ speech synthesis and speech recognition.
gkonovalov
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
echogarden-project
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice isolation, language detection and more.
SuperKogito
A collection of datasets for the purpose of emotion recognition/detection in speech.
filippogiruzzi
Voice Activity Detection based on Deep Learning & TensorFlow
HKAllThingsLink
A lightweight client-server system for real-time audio processing with voice activity detection (VAD) and automatic speech recognition (ASR).
gtreshchev
Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on Whisper OpenAI technology, whisper.cpp.
media-sec-lab
Research progress on speech deepfake detection: Relevant datasets aggregated from the review literature and publicly available codes
This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection).
RealCorebb
A hands-free mini walkie-talkie powered by embedded AI, featuring automatic voice detection, keyword-triggered animations, and real-time speech-to-text display.
harshalbenake
(1) Name :- ActionBarSearchView Description :- Action bar search view. (2) Name :- Adsfree Description :- Admob integration. (3) Name :- AndroidDayDreamDemo Description :- Day dream demo. (4) Name :- android query demo live Description :- Google play live app details parsing. (5) Name :- Arc GIS map Description :- Arc gis map integration without hash key. (6) Name :- aviarySdk Description :- Aviary integration for image operations. (7) Name :- BetterGestureDetector Description :- Gesture accrate detection. (8) Name :- BlinkText Description :- Blinking text. (9) Name :- BuzzBoxSDKHelloWorld Description :- Buzz box integration cron scheduler. (10) Name :- CircularProgressBar Description :- Circular progress bar. (11) Name :- ContactNumbersDemo Description :- Get contact details from device. (12) Name :- ControlViewheight Description :- Manage height of specific view. (13) Name :- ControlViewHeightSeekbar Description :- Two listview manage appropriate hieght. (14) Name :- DownloadManagerAndroid Description :- Download specific file online. (15) Name :- Facebook Integration Description :- Facebook integration. (16) Name :- Graphview Description :- Graphview demo. (17) Name :- HB 1337 Description :- Virus and antivirus. (18) Name :- HomeButtonEvent Description :- Block home button press. (19) Name :- HomeLauncher Description :- Home launcher demo. (20) Name :- InAppPurchaseTut Description :- InAppPurchase demo. (21) Name :- KeyboardCustom Description :- Creating Custom keyboard demo. (22) Name :- MapDemoGeofencing Description :- Location map for geo fencing. (23) Name :- MapDemoV2Final Description :- Map demo for google version 2. (24) Name :- OpenGLESSquare Description :- Opengl moving square. (25) Name :- pagination numbering 2 Description :- Pagination type 2. (26) Name :- Pagination numbering Description :- Pagination type 1. (27) Name :- PhoneGapCordova Description :- Phone gap simple cordova demo. (28) Name :- PhoneGapCordovaCamera Description :- Phone gap for camera. (29) Name :- PhoneGapCordovaParsing Description :- Phone gap for parsing. (30) Name :- PhoneGapCordovaSMS Description :- Phone gap for sending sms. (31) Name :- RotatingWheel Description :- Rotating wheel by user interaction. (32) Name :- RotatingWheelSocialsites Description :- Rotating wheel by user interaction for socialsites. (33) Name :- RunningBackgroundServices Description :- Get Running services in background for package name/class name. (34) Name :- SearchList Description :- Searching from a specific list. (35) Name :- SearchViewContacts Description :- Search from contacts details. (36) Name :- SlidingDrawer Description :- Sliding drawer from bottom over another activity. (37) Name :- SpeechToTextDemo Description :- Convert speech to text. (38) Name :- TextToSpeak Description :- Convert text to speech. (39) Name :- TouchCordinates Description :- Get coordinate of user touch intergration. (40) Name :- TreeViewListDemo Description :- Tree view integration demo. (41) Name :- UninstallDeleteapp Description :- Uninstall another app from my app after removing admin permission. (42) Name :- ViewPagerCustomWidthFragment Description :- Fragment in viewpager. (43) Name :- WearableNotification Description :- Wearable notification. (44) Name :- WearablePages Description :- Wearable pages. (45) Name :- WidgetDemo Description :- Widget demo. (46) Name :- CameraIntentAll Description :- Camera demo for picture as well as video demo. (47) Name :- CameraOverlay Description :- Camera overlay image as aim shooting game. (48) Name :- DrmIntegration Description :- Drm Integration library for authorize users apk file. (49) Name :- SwipeRefreshLayout Description :- SwipeRefreshLayout Pulltorefresh like google. (50) Name :- TwitterIntegration Description :- Twitter Integration. (51) Name :- CameraADev Description :- Custom Camera for picture as well as video capture from android developer. (52) Name :- DataBaseSQLiteCRUD Description :- Simple SQLite CRUD funtions for contact database. (53) Name :- DataBaseSQLiteDBUtility Description :- Simple SQLite DBUtility all files and basic operations. (54) Name :- CustomDropdownMenu Description :- Custom Dropdown/Poup Menu. (55) Name :- CalenderSimpleView Description :- Simple calender view as well as timestamp using calender class. (56) Name :- CalendarProviderADevIntent Description :- Calender provider Intent from android developer. (57) Name :- AnimationTextViewAnimateLayoutChanges Description :- Animation of adding view inside another view using animatelayoutchanges. (58) Name :- DragnDropLowVersion Description :- Drag n drop funtionality for low version. (59) Name :- GoogleWalletAdev Description :- Google Wallet Integration from android developer. (60) Name :- AndroidShootingGame Description :- Android Shooting Game without opengl. (61) Name :- ViewPagerAnimation Description :- ViewPager page transformation for pages like alpha,scaling,rotation. (62) Name :- GoogleCloudWirelessPrintingIntent Description :- Google cloud wireless printing integration from google developer. (63) Name :- Barcode_or_QRCode_Scanner_openurl Description :- Barcord/QR code scanner from google play and open result url in browser. (64) Name :- MSServerListSyncSample Description :- List Sync Sample using MS Server. (65) Name :- SlidingMenuAPI Description :- Sliding Menu jeremyfeinstein library like facebook,gmail,etc. (66) Name :- GCMIntegration Description :- Google cloud messageing integration for notification. (67) Name :- NoiseAlert Description :- Detect noise or blow sound. (68) Name :- GregorianCalendar Description :- Basic Gregorian Calendar information. (69) Name :- getVariableName Description :- Get name of the variable not its value. (70) Name :- GoogleAnalyticsV4Adev Description :- Google analytics integration V4. (71) Name :- FlipboardAnimationAdev Description :- Animation like Flipboard. (72) Name :- Html5Camera Description :- Camera in Html5 without phonegap. (73) Name :- CopyPasteClipboard Description :- Copy & Paste Clipboard textual data. (74) Name :- AndroidPhpMysql Description :- Php and Mysql data parsing in android. (75) Name :- SpellChecker Description :- Check spelling and give appropriate suggestion for enter text. (76) Name :- PdfReader Description :- Read pdf file.Barcode/QR code scanner. (77) Name :- BarcodeQRcodeIntegration Description :- Barcode/QR code scanner using ZbarScanner lib and also Zxing lib without intent. (78) Name :- InstagramIntegrationApi Description :- Instagram Integration using sample demo. (79) Name :- Logger Description :- Read logger/logcat using api. (80) Name :- SmsControl Description :- Control device via sms codes. (81) Name :- EncryptDecryptString Description :- Encrypt string and Decrypt the same string. (82) Name :- FloatingActionButton Description :- Floating Action Button. (83) Name :- DownloadAndUnzipFile Description :- Download And Unzip File. (84) Name :- MoPubAd Description :- MoPub Ad Banner integration . (85) Name :- ListViewParsingDB_AndroidStudio Description :- ListView Parsing in android studio. (86) Name :- CustomCamera_AS Description :- Custom Camera using surfaceview. (87) Name :- ResizeableBox_AS Description :- Resizeable Box like crop. (88) Name :- AudioRecorder_AS Description :- Audio Recorder. (89) Name :- DateAndTimePicker_AS Description :- Date And Time Picker. (90) Name :- CustomActionBar_AS Description :- Simple Custom ActionBar. (91) Name :- CustomSpinner_AS Description :- Custom Spinner with default text item. (92) Name :- SendEmail_AS Description :- Send email in background after authentication. (93) Name :- GoogleAnalytics_AS Description :- GoogleAnalytics integration demo for crash and screen. (94) Name :- BroadcastReciever_AS Description :- Broadcast Reciever for sms ,call and boot receiver. (95) Name :- Azure Description :- Azure storage gsi credentials zip dowload. (96) Name :- InAppPurchase_AS Description :- In App Purchase simple demo. (97) Name :- iOS_Listview Description :- Simple Listview in ios. (98) Name :- iOS_Database Description :- Sqlite Database in ios. (99) Name :- MessangerList_AS Description :- Messanger Listview UI send and recieve. (100) Name :- FindingFriend_AS Geofencing for enter and exit another pin.
abusufyanvu
MIT Introduction to Deep Learning (6.S191) Instructors: Alexander Amini and Ava Soleimany Course Information Summary Prerequisites Schedule Lectures Labs, Final Projects, Grading, and Prizes Software labs Gather.Town lab + Office Hour sessions Final project Paper Review Project Proposal Presentation Project Proposal Grading Rubric Past Project Proposal Ideas Awards + Categories Important Links and Emails Course Information Summary MIT's introductory course on deep learning methods with applications to computer vision, natural language processing, biology, and more! Students will gain foundational knowledge of deep learning algorithms and get practical experience in building neural networks in TensorFlow. Course concludes with a project proposal competition with feedback from staff and a panel of industry sponsors. Prerequisites We expect basic knowledge of calculus (e.g., taking derivatives), linear algebra (e.g., matrix multiplication), and probability (e.g., Bayes theorem) -- we'll try to explain everything else along the way! Experience in Python is helpful but not necessary. This class is taught during MIT's IAP term by current MIT PhD researchers. Listeners are welcome! Schedule Monday Jan 18, 2021 Lecture: Introduction to Deep Learning and NNs Lab: Lab 1A Tensorflow and building NNs from scratch Tuesday Jan 19, 2021 Lecture: Deep Sequence Modelling Lab: Lab 1B Music Generation using RNNs Wednesday Jan 20, 2021 Lecture: Deep Computer Vision Lab: Lab 2A Image classification and detection Thursday Jan 21, 2021 Lecture: Deep Generative Modelling Lab: Lab 2B Debiasing facial recognition systems Friday Jan 22, 2021 Lecture: Deep Reinforcement Learning Lab: Lab 3 pixel-to-control planning Monday Jan 25, 2021 Lecture: Limitations and New Frontiers Lab: Lab 3 continued Tuesday Jan 26, 2021 Lecture (part 1): Evidential Deep Learning Lecture (part 2): Bias and Fairness Lab: Work on final assignments Lab competition entries due at 11:59pm ET on Canvas! Lab 1, Lab 2, and Lab 3 Wednesday Jan 27, 2021 Lecture (part 1): Nigel Duffy, Ernst & Young Lecture (part 2): Kate Saenko, Boston University and MIT-IBM Watson AI Lab Lab: Work on final assignments Assignments due: Sign up for Final Project Competition Thursday Jan 28, 2021 Lecture (part 1): Sanja Fidler, U. Toronto, Vector Institute, and NVIDIA Lecture (part 2): Katherine Chou, Google Lab: Work on final assignments Assignments due: 1 page paper review (if applicable) Friday Jan 29, 2021 Lecture: Student project pitch competition Lab: Awards ceremony and prize giveaway Assignments due: Project proposals (if applicable) Lectures Lectures will be held starting at 1:00pm ET from Jan 18 - Jan 29 2021, Monday through Friday, virtually through Zoom. Current MIT students, faculty, postdocs, researchers, staff, etc. will be able to access the lectures during this two week period, synchronously or asynchronously, via the MIT Canvas course webpage (MIT internal only). Lecture recordings will be uploaded to the Canvas as soon as possible; students are not required to attend any lectures synchronously. Please see the Canvas for details on Zoom links. The public edition of the course will only be made available after completion of the MIT course. Labs, Final Projects, Grading, and Prizes Course will be graded during MIT IAP for 6 units under P/D/F grading. Receiving a passing grade requires completion of each software lab project (through honor code, with submission required to enter lab competitions), a final project proposal/presentation or written review of a deep learning paper (submission required), and attendance/lecture viewing (through honor code). Submission of a written report or presentation of a project proposal will ensure a passing grade. MIT students will be eligible for prizes and awards as part of the class competitions. There will be two parts to the competitions: (1) software labs and (2) final projects. More information is provided below. Winners will be announced on the last day of class, with thousands of dollars of prizes being given away! Software labs There are three TensorFlow software lab exercises for the course, designed as iPython notebooks hosted in Google Colab. Software labs can be found on GitHub: https://github.com/aamini/introtodeeplearning. These are self-paced exercises and are designed to help you gain practical experience implementing neural networks in TensorFlow. For registered MIT students, submission of lab materials is not necessary to get credit for the course or to pass the course. At the end of each software lab there will be task-associated materials to submit (along with instructions) for entry into the competitions, open to MIT students and affiliates during the IAP offering. This includes MIT students/affiliates who are taking the class as listeners -- you are eligible! These instructions are provided at the end of each of the labs. Completing these tasks and submitting your materials to Canvas will enter you into a per-lab competition. MIT students and affiliates will be eligible for prizes during the IAP offering; at the end of the course, prize-winners will be awarded with their prizes. All competition submissions are due on January 26 at 11:59pm ET to Canvas. For the software lab competitions, submissions will be judged on the basis of the following criteria: Strength and quality of final results (lab dependent) Soundness of implementation and approach Thoroughness and quality of provided descriptions and figures Gather.Town lab + Office Hour sessions After each day’s lecture, there will be open Office Hours in the class GatherTown, up until 3pm ET. An MIT email is required to log in and join the GatherTown. During these sessions, there will not be a walk through or dictation of the labs; the labs are designed to be self-paced and to be worked on on your own time. The GatherTown sessions will be hosted by course staff and are held so you can: Ask questions on course lectures, labs, logistics, project, or anything else; Work on the labs in the presence of classmates/TAs/instructors; Meet classmates to find groups for the final project; Group work time for the final project; Bring the class community together. Final project To satisfy the final project requirement for this course, students will have two options: (1) write a 1 page paper review (single-spaced) on a recent deep learning paper of your choice or (2) participate and present in the project proposal pitch competition. The 1 page paper review option is straightforward, we propose some papers within this document to help you get started, and you can satisfy a passing grade with this option -- you will not be eligible for the grand prizes. On the other hand, participation in the project proposal pitch competition will equivalently satisfy your course requirements but additionally make you eligible for the grand prizes. See the section below for more details and requirements for each of these options. Paper Review Students may satisfy the final project requirement by reading and reviewing a recent deep learning paper of their choosing. In the written review, students should provide both: 1) a description of the problem, technical approach, and results of the paper; 2) critical analysis and exposition of the limitations of the work and opportunities for future work. Reviews should be submitted on Canvas by Thursday Jan 28, 2021, 11:59:59pm Eastern Time (ET). Just a few paper options to consider... https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf https://papers.nips.cc/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf https://science.sciencemag.org/content/362/6419/1140 https://papers.nips.cc/paper/2018/file/0e64a7b00c83e3d22ce6b3acf2c582b6-Paper.pdf https://arxiv.org/pdf/1906.11829.pdf https://www.nature.com/articles/s42256-020-00237-3 https://pubmed.ncbi.nlm.nih.gov/32084340/ Project Proposal Presentation Keyword: proposal This is a 2 week course so we do not require results or working implementations! However, to win the top prizes, nice, clear results and implementations will demonstrate feasibility of your proposal which is something we look for! Logistics -- please read! You must sign up to present before 11:59:59pm Eastern Time (ET) on Wednesday Jan 27, 2021 Slides must be in a Google Slide before 11:59:59pm Eastern Time (ET) on Thursday Jan 28, 2021 Project groups can be between 1 and 5 people Listeners welcome To be eligible for a prize you must have at least 1 registered MIT student in your group Each participant will only be allowed to be in one group and present one project pitch Synchronous attendance on 1/29/21 is required to make the project pitch! 3 min presentation on your idea (we will be very strict with the time limits) Prizes! (see below) Sign up to Present here: by 11:59pm ET on Wednesday Jan 27 Once you sign up, make your slide in the following Google Slides; submit by midnight on Thursday Jan 28. Please specify the project group # on your slides!!! Things to Consider This doesn’t have to be a new deep learning method. It can just be an interesting application that you apply some existing deep learning method to. What problem are you solving? Are there use cases/applications? Why do you think deep learning methods might be suited to this task? How have people done it before? Is it a new task? If so, what are similar tasks that people have worked on? In what aspects have they succeeded or failed? What is your method of solving this problem? What type of model + architecture would you use? Why? What is the data for this task? Do you need to make a dataset or is there one publicly available? What are the characteristics of the data? Is it sparse, messy, imbalanced? How would you deal with that? Project Proposal Grading Rubric Project proposals will be evaluated by a panel of judges on the basis of the following three criteria: 1) novelty and impact; 2) technical soundness, feasibility, and organization, including quality of any presented results; 3) clarity and presentation. Each judge will award a score from 1 (lowest) to 5 (highest) for each of the criteria; the average score from each judge across these criteria will then be averaged with that of the other judges to provide the final score. The proposals with the highest final scores will be selected for prizes. Here are the guidelines for the criteria: Novelty and impact: encompasses the potential impact of the project idea, its novelty with respect to existing approaches. Why does the proposed work matter? What problem(s) does it solve? Why are these problems important? Technical soundness, feasibility, and organization: encompasses all technical aspects of the proposal. Do the proposed methodology and architecture make sense? Is the architecture the best suited for the proposed problem? Is deep learning the best approach for the problem? How realistic is it to implement the idea? Was there any implementation of the method? If results and data are presented, we will evaluate the strength of the results/data. Clarity and presentation: encompasses the delivery and quality of the presentation itself. Is the talk well organized? Are the slides aesthetically compelling? Is there a clear, well-delivered narrative? Are the problem and proposed method clearly presented? Past Project Proposal Ideas Recipe Generation with RNNs Can we compress videos with CNN + RNN? Music Generation with RNNs Style Transfer Applied to X GAN’s on a new modality Summarizing text/news articles Combining news articles about similar events Code or spec generation Multimodal speech → handwriting Generate handwriting based on keywords (i.e. cursive, slanted, neat) Predicting stock market trends Show language learners articles or videos at their level Transfer of writing style Chemical Synthesis with Recurrent Neural networks Transfer learning to learn something in a domain for which it’s hard or risky to gather data or do training RNNs to model some type of time series data Computer vision to coach sports players Computer vision system for safety brakes or warnings Use IBM Watson API to get the sentiment of your Facebook newsfeed Deep learning webcam to give wifi-access to friends or improve video chat in some way Domain-specific chatbot to help you perform a specific task Detect whether a signature is fraudulent Awards + Categories Final Project Awards: 1x NVIDIA RTX 3080 4x Google Home Max 3x Display Monitors Software Lab Awards: Bose headphones (Lab 1) Display monitor (Lab 2) Bebop drone (Lab 3) Important Links and Emails Course website: http://introtodeeplearning.com Course staff: introtodeeplearning-staff@mit.edu Piazza forum (MIT only): https://piazza.com/mit/spring2021/6s191 Canvas (MIT only): https://canvas.mit.edu/courses/8291 Software lab repository: https://github.com/aamini/introtodeeplearning Lab/office hour sessions (MIT only): https://gather.town/app/56toTnlBrsKCyFgj/MITDeepLearning
castorini
Wake word detection modeling toolkit for Firefox Voice, supporting open datasets like Speech Commands and Common Voice.
Hironsan
Hate Speech Detection Library for Python.
Aastha2104
Introduction Parkinson’s Disease is the second most prevalent neurodegenerative disorder after Alzheimer’s, affecting more than 10 million people worldwide. Parkinson’s is characterized primarily by the deterioration of motor and cognitive ability. There is no single test which can be administered for diagnosis. Instead, doctors must perform a careful clinical analysis of the patient’s medical history. Unfortunately, this method of diagnosis is highly inaccurate. A study from the National Institute of Neurological Disorders finds that early diagnosis (having symptoms for 5 years or less) is only 53% accurate. This is not much better than random guessing, but an early diagnosis is critical to effective treatment. Because of these difficulties, I investigate a machine learning approach to accurately diagnose Parkinson’s, using a dataset of various speech features (a non-invasive yet characteristic tool) from the University of Oxford. Why speech features? Speech is very predictive and characteristic of Parkinson’s disease; almost every Parkinson’s patient experiences severe vocal degradation (inability to produce sustained phonations, tremor, hoarseness), so it makes sense to use voice to diagnose the disease. Voice analysis gives the added benefit of being non-invasive, inexpensive, and very easy to extract clinically. Background Parkinson's Disease Parkinson’s is a progressive neurodegenerative condition resulting from the death of the dopamine containing cells of the substantia nigra (which plays an important role in movement). Symptoms include: “frozen” facial features, bradykinesia (slowness of movement), akinesia (impairment of voluntary movement), tremor, and voice impairment. Typically, by the time the disease is diagnosed, 60% of nigrostriatal neurons have degenerated, and 80% of striatal dopamine have been depleted. Performance Metrics TP = true positive, FP = false positive, TN = true negative, FN = false negative Accuracy: (TP+TN)/(P+N) Matthews Correlation Coefficient: 1=perfect, 0=random, -1=completely inaccurate Algorithms Employed Logistic Regression (LR): Uses the sigmoid logistic equation with weights (coefficient values) and biases (constants) to model the probability of a certain class for binary classification. An output of 1 represents one class, and an output of 0 represents the other. Training the model will learn the optimal weights and biases. Linear Discriminant Analysis (LDA): Assumes that the data is Gaussian and each feature has the same variance. LDA estimates the mean and variance for each class from the training data, and then uses properties of statistics (Bayes theorem , Gaussian distribution, etc) to compute the probability of a particular instance belonging to a given class. The class with the largest probability is the prediction. k Nearest Neighbors (KNN): Makes predictions about the validation set using the entire training set. KNN makes a prediction about a new instance by searching through the entire set to find the k “closest” instances. “Closeness” is determined using a proximity measurement (Euclidean) across all features. The class that the majority of the k closest instances belong to is the class that the model predicts the new instance to be. Decision Tree (DT): Represented by a binary tree, where each root node represents an input variable and a split point, and each leaf node contains an output used to make a prediction. Neural Network (NN): Models the way the human brain makes decisions. Each neuron takes in 1+ inputs, and then uses an activation function to process the input with weights and biases to produce an output. Neurons can be arranged into layers, and multiple layers can form a network to model complex decisions. Training the network involves using the training instances to optimize the weights and biases. Naive Bayes (NB): Simplifies the calculation of probabilities by assuming that all features are independent of one another (a strong but effective assumption). Employs Bayes Theorem to calculate the probabilities that the instance to be predicted is in each class, then finds the class with the highest probability. Gradient Boost (GB): Generally used when seeking a model with very high predictive performance. Used to reduce bias and variance (“error”) by combining multiple “weak learners” (not very good models) to create a “strong learner” (high performance model). Involves 3 elements: a loss function (error function) to be optimized, a weak learner (decision tree) to make predictions, and an additive model to add trees to minimize the loss function. Gradient descent is used to minimize error after adding each tree (one by one). Engineering Goal Produce a machine learning model to diagnose Parkinson’s disease given various features of a patient’s speech with at least 90% accuracy and/or a Matthews Correlation Coefficient of at least 0.9. Compare various algorithms and parameters to determine the best model for predicting Parkinson’s. Dataset Description Source: the University of Oxford 195 instances (147 subjects with Parkinson’s, 48 without Parkinson’s) 22 features (elements that are possibly characteristic of Parkinson’s, such as frequency, pitch, amplitude / period of the sound wave) 1 label (1 for Parkinson’s, 0 for no Parkinson’s) Project Pipeline pipeline Summary of Procedure Split the Oxford Parkinson’s Dataset into two parts: one for training, one for validation (evaluate how well the model performs) Train each of the following algorithms with the training set: Logistic Regression, Linear Discriminant Analysis, k Nearest Neighbors, Decision Tree, Neural Network, Naive Bayes, Gradient Boost Evaluate results using the validation set Repeat for the following training set to validation set splits: 80% training / 20% validation, 75% / 25%, and 70% / 30% Repeat for a rescaled version of the dataset (scale all the numbers in the dataset to a range from 0 to 1: this helps to reduce the effect of outliers) Conduct 5 trials and average the results Data a_o a_r m_o m_r Data Analysis In general, the models tended to perform the best (both in terms of accuracy and Matthews Correlation Coefficient) on the rescaled dataset with a 75-25 train-test split. The two highest performing algorithms, k Nearest Neighbors and the Neural Network, both achieved an accuracy of 98%. The NN achieved a MCC of 0.96, while KNN achieved a MCC of 0.94. These figures outperform most existing literature and significantly outperform current methods of diagnosis. Conclusion and Significance These robust results suggest that a machine learning approach can indeed be implemented to significantly improve diagnosis methods of Parkinson’s disease. Given the necessity of early diagnosis for effective treatment, my machine learning models provide a very promising alternative to the current, rather ineffective method of diagnosis. Current methods of early diagnosis are only 53% accurate, while my machine learning model produces 98% accuracy. This 45% increase is critical because an accurate, early diagnosis is needed to effectively treat the disease. Typically, by the time the disease is diagnosed, 60% of nigrostriatal neurons have degenerated, and 80% of striatal dopamine have been depleted. With an earlier diagnosis, much of this degradation could have been slowed or treated. My results are very significant because Parkinson’s affects over 10 million people worldwide who could benefit greatly from an early, accurate diagnosis. Not only is my machine learning approach more accurate in terms of diagnostic accuracy, it is also more scalable, less expensive, and therefore more accessible to people who might not have access to established medical facilities and professionals. The diagnosis is also much simpler, requiring only a 10-15 second voice recording and producing an immediate diagnosis. Future Research Given more time and resources, I would investigate the following: Create a mobile application which would allow the user to record his/her voice, extract the necessary vocal features, and feed it into my machine learning model to diagnose Parkinson’s. Use larger datasets in conjunction with the University of Oxford dataset. Tune and improve my models even further to achieve even better results. Investigate different structures and types of neural networks. Construct a novel algorithm specifically suited for the prediction of Parkinson’s. Generalize my findings and algorithms for all types of dementia disorders, such as Alzheimer’s. References Bind, Shubham. "A Survey of Machine Learning Based Approaches for Parkinson Disease Prediction." International Journal of Computer Science and Information Technologies 6 (2015): n. pag. International Journal of Computer Science and Information Technologies. 2015. Web. 8 Mar. 2017. Brooks, Megan. "Diagnosing Parkinson's Disease Still Challenging." Medscape Medical News. National Institute of Neurological Disorders, 31 July 2014. Web. 20 Mar. 2017. Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection', Little MA, McSharry PE, Roberts SJ, Costello DAE, Moroz IM. BioMedical Engineering OnLine 2007, 6:23 (26 June 2007) Hashmi, Sumaiya F. "A Machine Learning Approach to Diagnosis of Parkinson’s Disease."Claremont Colleges Scholarship. Claremont College, 2013. Web. 10 Mar. 2017. Karplus, Abraham. "Machine Learning Algorithms for Cancer Diagnosis." Machine Learning Algorithms for Cancer Diagnosis (n.d.): n. pag. Mar. 2012. Web. 20 Mar. 2017. Little, Max. "Parkinsons Data Set." UCI Machine Learning Repository. University of Oxford, 26 June 2008. Web. 20 Feb. 2017. Ozcift, Akin, and Arif Gulten. "Classifier Ensemble Construction with Rotation Forest to Improve Medical Diagnosis Performance of Machine Learning Algorithms." Computer Methods and Programs in Biomedicine 104.3 (2011): 443-51. Semantic Scholar. 2011. Web. 15 Mar. 2017. "Parkinson’s Disease Dementia." UCI MIND. N.p., 19 Oct. 2015. Web. 17 Feb. 2017. Salvatore, C., A. Cerasa, I. Castiglioni, F. Gallivanone, A. Augimeri, M. Lopez, G. Arabia, M. Morelli, M.c. Gilardi, and A. Quattrone. "Machine Learning on Brain MRI Data for Differential Diagnosis of Parkinson's Disease and Progressive Supranuclear Palsy."Journal of Neuroscience Methods 222 (2014): 230-37. 2014. Web. 18 Mar. 2017. Shahbakhi, Mohammad, Danial Taheri Far, and Ehsan Tahami. "Speech Analysis for Diagnosis of Parkinson’s Disease Using Genetic Algorithm and Support Vector Machine."Journal of Biomedical Science and Engineering 07.04 (2014): 147-56. Scientific Research. July 2014. Web. 2 Mar. 2017. "Speech and Communication." Speech and Communication. Parkinson's Disease Foundation, n.d. Web. 22 Mar. 2017. Sriram, Tarigoppula V. S., M. Venkateswara Rao, G. V. Satya Narayana, and D. S. V. G. K. Kaladhar. "Diagnosis of Parkinson Disease Using Machine Learning and Data Mining Systems from Voice Dataset." SpringerLink. Springer, Cham, 01 Jan. 1970. Web. 17 Mar. 2017.
liamadsr
Native macOS speech-to-text app with active speech detection and amazing intuitive UX. Help us make this the official open-source alternative to Wispr Flow!
Samarth-Tripathi
Multi-modal Emotion detection from IEMOCAP on Speech, Text, Motion-Capture Data using Neural Nets.
tympanix
Synchronize your subtitles using machine learning
brucewlee
[BEA @ ACL 2023] General-purpose tool for linguistic features extraction; Tested on readability assessment, essay scoring, fake news detection, hate speech detection, etc.