Found 557 repositories (showing 30)
seth814
Code for YouTube series: Deep Learning for Audio Classification
jaron
Deep Learning experiments for audio classification
List of articles related to deep learning applied to music
In oceanic remote sensing operations, underwater acoustic target recognition is a difficult and extremely important task for sonar systems, especially under complex sound-wave propagation conditions. The expense of learning a recognition model for big-data analysis is typically an obstacle for most traditional machine learning (ML) algorithms, whereas a convolutional neural network (CNN), a type of deep neural network, can automatically extract features for accurate classification. In this study, we propose an approach using a dense CNN model for underwater target recognition. The network architecture re-uses all earlier feature maps to optimize the classification rate under various impaired conditions while keeping computational cost low. In addition, instead of using time-frequency spectrogram images, the proposed scheme directly uses the original time-domain audio signal as the network input. Evaluated on a real-world passive-sonar dataset, our classification model achieves an overall accuracy of 98.85% at 0 dB signal-to-noise ratio (SNR) and outperforms traditional ML techniques as well as other state-of-the-art CNN models.
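The core idea of the abstract above, dense connectivity over a raw time-domain waveform, can be sketched in a few lines. This is a minimal NumPy illustration of the concept, not the paper's actual architecture; layer sizes, the growth rate, and the random weights are all made up for the example:

```python
import numpy as np

def conv1d(x, kernels):
    """'Same'-padded 1-D convolution of a (channels, time) signal with
    kernels of shape (out_ch, in_ch, k), followed by ReLU."""
    out_ch, in_ch, k = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    out = np.zeros((out_ch, t))
    for o in range(out_ch):
        for i in range(in_ch):
            for j in range(k):
                out[o] += kernels[o, i, j] * xp[i, j:j + t]
    return np.maximum(out, 0.0)

def dense_block(x, num_layers=3, growth=4, k=3, rng=None):
    """Densely connected 1-D block: every layer sees the concatenation
    of the raw input and all previous layers' feature maps."""
    rng = rng or np.random.default_rng(0)
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)  # re-use all earlier maps
        w = rng.standard_normal((growth, inp.shape[0], k)) * 0.1
        features.append(conv1d(inp, w))
    return np.concatenate(features, axis=0)

# Raw time-domain waveform as input (1 channel, 1000 samples)
wave = np.random.default_rng(1).standard_normal((1, 1000))
out = dense_block(wave)
print(out.shape)  # (1 + 3*4, 1000)
```

Note how the channel count grows by `growth` per layer because every layer's output is carried forward, which is what lets later layers re-use earlier feature maps cheaply.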
vishalshar
Multi-class audio classification using Deep Learning (MLP, CNN): The objective of this project is to build a multi-class classifier to identify the sound of a bee, a cricket, or noise.
cetinsamet
Music genre classification from audio spectrograms using deep learning
abishek-as
In this repository we look into audio categorization using deep learning models such as Artificial Neural Networks (ANN), 1D Convolutional Neural Networks (CNN1D), and CNN2D. We perform basic data preprocessing and feature extraction on the audio sources before building the models, then compare each model's accuracy, training time, and prediction time. Finally, the models are deployed so that users can run each successfully deployed model on a sound of their choice, as addressed in more depth later.
Urban Sound Classification with the UrbanSound8K Dataset is a deep learning project that classifies urban sound events from the UrbanSound8K dataset, demonstrating how audio signal processing and neural networks can power sound recognition systems for smart cities and surveillance.
bapalto
Birdsong classification in noisy environments with Convolutional Neural Networks, implemented in the Keras deep learning library for the BirdCLEF 2016 competition. Can be fine-tuned for arbitrary audio classification tasks.
Learning discriminative and robust time-frequency representations for environmental sound classification: Convolutional neural networks (CNN) are among the best-performing neural network architectures for environmental sound classification (ESC). Recently, attention mechanisms have been used in CNNs to capture the useful information in the audio signal for sound classification, especially for weakly labelled data where timing information about the acoustic events is not available in the training data, only the sound class labels. In these methods, however, the inherent time-frequency characteristics and variations are not explicitly exploited when obtaining the deep features. In this paper, we propose a new method, the time-frequency enhancement block (TFBlock), in which temporal attention and frequency attention are employed to enhance the features from relevant frames and frequency bands. Unlike other attention mechanisms, our method constructs parallel branches that attend to temporal and frequency features separately, mitigating interference from sections of the acoustic environment where no sound events occurred. Experiments on three benchmark ESC datasets show that our method improves classification performance and also exhibits robustness to noise.
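The parallel-branch idea described above can be illustrated with a toy NumPy sketch: one branch re-weights a (freq, time) feature map along the frequency axis, the other along the time axis, and the two branches are merged. This is a simplified stand-in, not the paper's TFBlock (which learns its attention weights inside a CNN); here the weights are just softmaxed energies:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def tf_attention(spec):
    """Parallel temporal and frequency attention over a (freq, time)
    feature map: each branch re-weights the map along one axis, and
    the two branches are summed."""
    f_energy = spec.mean(axis=1)        # per-frequency-band energy
    t_energy = spec.mean(axis=0)        # per-frame energy
    f_att = softmax(f_energy)[:, None]  # (freq, 1) weights
    t_att = softmax(t_energy)[None, :]  # (1, time) weights
    freq_branch = spec * f_att          # emphasize informative bands
    time_branch = spec * t_att          # emphasize informative frames
    return freq_branch + time_branch    # merge the parallel branches

spec = np.abs(np.random.default_rng(0).standard_normal((64, 128)))
out = tf_attention(spec)
print(out.shape)  # (64, 128)
```

Because the branches operate on the two axes independently, frames with no acoustic activity are down-weighted by the temporal branch without disturbing the frequency branch, which is the interference-mitigation argument made in the abstract.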
swainshashwat
Classifying 10 different categories of Sound using Deep Learning.
aliivaezii
WeldFusionNet: Multimodal deep learning for automated weld defect classification using sensor, audio, and video data (F1: 0.9567)
hasnainnaeem
Audio classification deep learning model using TensorFlow 2.0 to detect gunshots. 97.5% test-set accuracy and 99% training-set accuracy were achieved on Binary-Urban8K. This work was done during my summer internship at the TUKL-NUST lab.
silveranon323
Audio Monk is an advanced music genre classification system that leverages deep learning techniques and the Spotify Web API to provide intelligent music discovery and recommendation services.
dioptx
A deep learning approach for respiratory audio discovery and classification.
gsmafra
Unsupervised feature learning for audio classification using convolutional deep belief networks
Labbeti
Deep Semi-Supervised Learning with Holistic methods for audio classification.
gkotti4
Guitar Audio Transcriber is a deep-learning research project aimed at converting raw guitar audio into musical note representations (and eventually tablature) using neural networks and signal processing. The current implementation focuses on single-note classification using CNN and MLP models trained on time-frequency audio features.
Leveraged wavelet denoising and deep learning techniques for the classification of respiratory sounds. Implemented signal processing and wavelet denoising for audio data cleanup and feature extraction, and developed and trained deep learning models (Conv1D, Bi-LSTM, CNN, RNN) for phase identification.
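The wavelet-denoising step mentioned above follows a standard recipe: transform, soft-threshold the detail coefficients, inverse transform. Below is a minimal pure-NumPy sketch using a one-level Haar transform; the threshold value and test signal are illustrative, and real projects would typically use a library such as PyWavelets with a multi-level decomposition instead:

```python
import numpy as np

def haar_denoise(x, threshold=0.5):
    """One-level Haar wavelet denoising: transform, soft-threshold the
    detail coefficients, inverse transform."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                       # pad to even length
        x = np.append(x, x[-1])
    s = np.sqrt(2.0)
    approx = (x[0::2] + x[1::2]) / s     # low-pass (smooth) part
    detail = (x[0::2] - x[1::2]) / s     # high-pass (noise-prone) part
    # Soft thresholding: shrink small detail coefficients to zero
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    y = np.empty_like(x)
    y[0::2] = (approx + detail) / s      # inverse Haar transform
    y[1::2] = (approx - detail) / s
    return y

t = np.linspace(0, 1, 1024)
clean = np.sin(2 * np.pi * 5 * t)                                  # smooth signal
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(1024)
denoised = haar_denoise(noisy, threshold=0.4)
```

The smooth signal survives in the approximation coefficients while much of the noise lands in the detail coefficients, so thresholding the details reduces the error relative to the clean signal.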
rawbeen248
This project focuses on the classification of animal sounds using deep learning. The core idea is to utilize audio processing techniques and a fine-tuned version of the hubert-base-ls960 model to accurately classify different animal sounds. This application could serve various purposes, from ecological monitoring to educational software.
The Covid-19 virus is a fast-spreading disease that threatens billions of human beings globally, so early and precise prediction is indispensable for containing its spread and death rate; it remains difficult, however. Covid-19 is a respiratory disease usually produced by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), and detecting positive cases early supports both containment and prompt treatment of affected patients. Recent investigations have evaluated human respiratory sounds, such as recorded voice, cough, and breathing, from hospital-confirmed Covid-19 patients, which differ from the sounds of healthy persons. In this paper, an effective Covid-19 detection model using a devised Jaya Honey Badger Optimization-based Deep Neuro Fuzzy Network (JHBO-based DNFN) is developed, taking an audio signal as input. The steps of the proposed diagnosis model are pre-processing, feature extraction, and classification. The input audio sample is acquired from the Coswara dataset, and a Gaussian filter is applied to remove noise; the filter effectively reduces salt-and-pepper noise with minimal processing time. Substantial features, namely spectral roll-off, spectral bandwidth, Mel-frequency cepstral coefficients (MFCC), spectral flatness, zero crossing rate, spectral centroid, root mean square energy, and spectral contrast, are then extracted for further processing. Finally, the DNFN classifies the feature vector as Covid-19 or non-Covid-19; deep learning approaches are effective for disease detection and classification in the medical field. The DNFN is trained by the devised JHBO algorithm, newly designed by combining the Honey Badger optimization Algorithm (HBA) with the Jaya algorithm; the Jaya algorithm is incorporated into HBA to obtain improved performance with better convergence speed. The performance of the developed model is evaluated using three metrics: testing accuracy, sensitivity, and specificity. The developed JHBO-based DNFN outperforms other existing methods, achieving testing accuracy, sensitivity, and specificity of 0.9176, 0.9218, and 0.9219.
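Three of the spectral features listed in the abstract above (zero crossing rate, root-mean-square energy, and spectral centroid) are simple enough to compute directly. The sketch below is a pure-NumPy illustration for a single audio frame; libraries such as librosa provide full framed versions of these and the remaining features (MFCC, roll-off, flatness, bandwidth, contrast):

```python
import numpy as np

def frame_features(signal, sr=16000):
    """Zero crossing rate, RMS energy, and spectral centroid of one frame."""
    signal = np.asarray(signal, dtype=float)
    # Zero crossing rate: fraction of adjacent samples with a sign change
    zcr = np.mean(np.abs(np.diff(np.signbit(signal).astype(int))))
    # Root-mean-square energy of the frame
    rms = np.sqrt(np.mean(signal ** 2))
    # Spectral centroid: magnitude-weighted mean frequency
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * mag) / np.sum(mag)
    return zcr, rms, centroid

sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)   # 440 Hz test tone
zcr, rms, centroid = frame_features(tone, sr)
```

For a pure 440 Hz tone the centroid lands near 440 Hz, the RMS near 0.5/√2, and the zero crossing rate near 2·440/sr, which makes the definitions easy to sanity-check.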
CanakkaleDevelopers
A research tool that lets anybody build, train, test, and analyze deep learning models on audio data for the purpose of emotion classification.
AFLucas-UOM
ARI2201 - IAPT · Comparative analysis of machine learning and deep learning models for automated classification of lung respiratory sounds using audio feature extraction to support pulmonary disease detection.
HBansiwal
Designed for various deep learning projects, such as text and image classification and audio analysis. It contains CNN, ANN, and NLP-based projects.
ThanAid
Chord Recognition Framework: A deep learning system using CNNs and biLSTM for accurate audio chord recognition. Features advanced engineering, transfer learning, and Fourier transforms (STFT, CQT) to enhance music analysis and chord classification.
nam-htran
A web-based application built with FastAPI to detect deepfake audio using state-of-the-art deep learning models. This tool provides a user-friendly interface to upload an audio file and get real-time classification results from multiple models simultaneously.
Abstract: With the advancement of Deep Neural Networks (DNN), the accuracy of sound classification tasks such as Urban Sound Classification and Environmental Sound Classification has significantly improved. In this project, we propose a model that uses Convolutional Neural Networks (CNN) to identify sounds from the spectrograms of collected sound samples. The model can be used for detection of deforestation, of shooting in urban areas, and of strange street noises at odd hours (air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, street music, etc.). Challenges: Environmental sound work faces two major obstacles. First, there is a lack of labelled audio data; previous work focused on audio from carefully produced films or TV tracks, from particular environments such as elevators or office spaces, or from commercial or proprietary datasets. Second, the field lacks a fundamental vocabulary, so the grouping of sounds into semantic classes may vary from study to study, making results difficult to compare. The goal of this notebook is to address these two challenges. Dataset: The dataset, UrbanSound8K, contains 8732 labelled sound excerpts (<=4s) of urban sounds from 10 classes: Air Conditioner, Car Horn, Children Playing, Dog Bark, Drilling, Engine Idling, Gun Shot, Jackhammer, Siren, and Street Music. The attributes of the data are ID (unique ID of the sound excerpt) and Class (type of sound). Problem statement: This notebook shows how to apply deep learning techniques to environmental sound recognition, focusing specifically on recognizing unique environmental sounds. Given an audio sample of a few seconds in a computer-readable format (such as a .wav file), we want to determine whether it contains one of the target environmental sounds, with a corresponding classification accuracy score. Note: loading and pre-processing audio files takes some time with a large dataset. To avoid reloading every time the kernel is reset or work resumes the next day, all loaded audio data is serialized into an object file, so subsequent runs only need to load the serialized file. Optional GPU configuration initialization.
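The serialize-once caching workflow described in the note above can be sketched with the standard library's pickle module. The file name and the `load_one` placeholder are hypothetical; real code would read and pre-process an actual .wav file in its place:

```python
import os
import pickle

def load_audio_dataset(paths, cache_file="audio_cache.pkl"):
    """Cache pre-processed audio so the expensive loading step runs once."""
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)        # later runs: just deserialize
    def load_one(path):
        # Placeholder: real code would read and pre-process the file here
        return [0.0, 1.0, 0.0]
    data = {path: load_one(path) for path in paths}
    with open(cache_file, "wb") as f:    # first run: serialize to disk
        pickle.dump(data, f)
    return data

dataset = load_audio_dataset(["dog_bark.wav", "siren.wav"], "demo_cache.pkl")
```

On the first call the dictionary of loaded audio is written to disk; every later call (including after a kernel restart) deserializes it instead of re-running the loading step.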
No description available
mostafa-kermaninia
A comprehensive machine learning pipeline for robust Speaker Identification and Gender Classification using advanced audio feature extraction and machine learning models (SVM, XGBoost, MLP).
wasifbiswas
Deep learning–based speech analysis system for detecting stress, anxiety, and depression from voice patterns using advanced audio preprocessing, feature extraction, and emotion classification models.