Found 32 repositories (showing 30)
alpa-projects
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
Constructed an attention-based deep multiple-instance learning model in PyTorch and trained it on 624 whole-slide images of digitized H&E-stained prostate biopsies using AWS SageMaker's data parallelism toolkit.
Wodlfvllf
QuintNet is a research-oriented PyTorch framework designed to explore and implement multi-dimensional parallelism strategies for distributed deep learning.
DreamingRaven
Generalised and highly customisable hybrid-parallelism, database-based deep learning framework.
Who doesn't dream of a new FPGA family that provides embedded hard neurons in its silicon fabric instead of the conventional DSP and multiplier blocks? An optimized hard-neuron design would let software and hardware designers create and test deep learning network architectures, especially convolutional neural networks (CNNs), more easily and faster than with any FPGA family currently on the market. The revolutionary idea behind this project is to open the gate of creativity for a precisely tailored new generation of FPGA families that avoid the wasted logic resources and oversized bus widths of today's conventional DSP blocks. The project focuses on the anchor point of any deep learning architecture: designing an optimized, high-speed neuron block to replace the conventional DSP blocks and avoid the drawbacks designers face when fitting a CNN architecture onto them. The proposed neuron takes parallel operation as its primary keystone, alongside minimizing the logic elements used to construct the neuron cell. The targeted resource usage is no more than 500 ALMs per neuron, with an expected maximum operating frequency of 834.03 MHz. In this project, ultra-fast, adaptive, parallel modules such as parallel multiplier-accumulators (MACs) and a ReLU activation function are designed as soft blocks in VHDL, opening a new horizon for FPGA designers to build their own CNNs. We cannot stop imagining Intel Altera leading the market by adopting the proposed CNN block into a new FPGA architecture fabric as a separate logic family soon.
Users of the proposed CNN blocks will be amazed by the operations per second available to them while designing their own CNN architectures. For instance, according to the first coding trial, a single MAC unit can reach 3.5 giga-operations per second (GOPS) and can multiply up to 4 different inputs by a common weight value. Because the blocks can also operate in parallel, the aggregate throughput of the proposed design can grow to about 16 tera-operations per second (TOPS), a step change in FPGA capability for the era of deep learning algorithms. Finally, we believe this proposed FPGA CNN block is only the first step toward leaving no room for competition from conventional CPUs and GPUs, given the massive speed it provides and the flexible scalability achievable through the parallel operation of such FPGA-based CNN blocks.
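The throughput figures quoted above can be sanity-checked with back-of-envelope arithmetic. The sketch below is a hypothetical check based only on the numbers in the description (834.03 MHz, 4 parallel inputs, 3.5 GOPS per MAC, 16 TOPS aggregate); it is not part of the project's code.

```python
# Back-of-envelope check of the quoted throughput figures.
# Assumption (hypothetical): each MAC runs at ~834 MHz and processes
# 4 inputs per cycle, counting each multiply-accumulate as one operation.

freq_hz = 834.03e6          # claimed max operating frequency per neuron
inputs_per_cycle = 4        # parallel inputs sharing one common weight

mac_gops = freq_hz * inputs_per_cycle / 1e9
print(f"single MAC: {mac_gops:.2f} GOPS")        # ≈ 3.34, close to the claimed 3.5

target_tops = 16.0
macs_needed = target_tops * 1e12 / 3.5e9         # using the claimed 3.5 GOPS/MAC
print(f"MACs for {target_tops} TOPS: {macs_needed:.0f}")  # ≈ 4571 MACs in parallel
```

Under these assumptions the per-MAC claim is roughly consistent, and the 16 TOPS aggregate implies on the order of several thousand MACs operating in parallel.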
OSU-Nowlab
Distributed deep learning parallelism framework written in PyTorch
ekzhang
Experiments in multi-architecture parallelism for deep learning with JAX
NiuHuangxiaozi
This repository outlines a comprehensive guide for training a distributed deep learning model.
RavenbornJB
A low-level deep learning framework that leverages both CPU and GPU parallelism.
dbgannon
This is the notebook to accompany "Accelerating Deep Learning Inference with Hardware and Software Parallelism".
AstroDnerd
A spatiotemporal deep learning framework for forecasting high-dimensional chaotic systems. Efficiently processes multi-terabyte 3D volumetric data using Distributed Data Parallelism (DDP) and Custom HDF5 Data Loaders. Cleaned and partially forked from nbisht_core_analysis
geraldzakwan
This repository is for my final project in COMS 6998: Practical Deep Learning System Performance, which I took at Columbia (https://www.cs.columbia.edu/education/ms/fall-2020-topics-courses/#e6998010). In this project, my teammate and I investigated parallelism in NLP: we experimented with how parallelism (e.g. using multi-head attention instead of recurrent connections, and splitting input for inference) affects model performance in both accuracy and speed. More details: http://bit.ly/pract-dl-final-report.
RadhaGulhane13
No description available
NVIDIA course that covers training deep learning models on multiple GPUs using PyTorch’s DistributedDataParallel (DDP), focusing on data parallelism concepts, multi-GPU setup, scalable model implementation, and optimization techniques for efficient large-scale training.
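The core idea the course teaches, data parallelism, can be illustrated without GPUs: each worker computes gradients on its own shard of the batch, the gradients are averaged (what DDP's all-reduce does), and every replica applies the same update. A minimal pure-Python sketch under those assumptions, with all names hypothetical:

```python
# Data parallelism in miniature: shard the batch across "workers",
# compute per-shard gradients, average them (the all-reduce step),
# and apply one identical update, as PyTorch's DDP does across GPUs.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_step(w, batch_x, batch_y, n_workers=2, lr=0.1):
    shard = len(batch_x) // n_workers
    grads = []
    for k in range(n_workers):            # each "worker" sees one shard
        xs = batch_x[k * shard:(k + 1) * shard]
        ys = batch_y[k * shard:(k + 1) * shard]
        grads.append(grad_mse(w, xs, ys))
    g = sum(grads) / n_workers            # all-reduce: average the gradients
    return w - lr * g                     # same update on every replica

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]                 # data generated by the true w = 2
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, xs, ys)
print(round(w, 3))                        # → 2.0
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, which is why data-parallel training converges to the same solution as single-device training.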
No description available
b0tShaman
A data-parallel, zero-allocation deep learning framework in Go.
BrandonXue
Deep learning with convolutional neural networks, from scratch, using parallelism (CUDA).
MALAY-21
Cataract is a common eye condition characterized by clouding of the lens; timely detection and intervention are crucial for effective management. In this project, we propose a novel approach to cataract detection that leverages deep learning methods and data-parallelism techniques.
shawnsihyunlee
Apple Silicon acceleration and data parallelism for Needle, a home-brewed deep learning framework
Ava4wonder
My solution for ASPLOS-2025-Contest #1: Intra-Operator Parallelism for Distributed Deep Learning
ti2-group
Source code for the Contest on Intra-Operator Parallelism for Distributed Deep Learning (IOPDDL).
subashreevs
Course content from "Data Parallelism: How to Train Deep Learning Models on Multiple GPUs" by NVIDIA.
asifrahaman13
My learning journey into deeper Python concurrency. Contains code for multiprocessing, multithreading, async operations, and a few other concurrency and parallelism concepts.
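As a taste of the topics this kind of repository covers, the sketch below contrasts thread-based and async concurrency in Python. It is a hypothetical illustration, not code from the repository:

```python
import asyncio
import threading
import time

# Thread-based concurrency: two blocking "I/O waits" overlap,
# so total wall time is ~0.2 s rather than ~0.4 s.
def blocking_io(results, i):
    time.sleep(0.2)               # stands in for a network or disk call
    results[i] = i * i

results = {}
threads = [threading.Thread(target=blocking_io, args=(results, i)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)                    # → {1: 1, 2: 4}

# Async concurrency: the same overlap on a single thread via an event loop.
async def async_io(i):
    await asyncio.sleep(0.2)      # non-blocking wait yields to other tasks
    return i * i

async def main():
    return await asyncio.gather(async_io(1), async_io(2))

print(asyncio.run(main()))        # → [1, 4]
```

Threads suit blocking I/O, `asyncio` suits many concurrent waits on one thread, and CPU-bound work needs `multiprocessing` for true parallelism because of the GIL.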