Found 61 repositories (showing 30)
bytedance
A high performance and generic framework for distributed DNN training
vdutts7
Distributed training of DNNs • C++/MPI Proxies (GPT-2, GPT-3, CosmoFlow, DLRM)
hipersys-team
[NSDI 2023] TopoOpt: Optimizing the Network Topology for Distributed DNN Training
RCL-NUS
Auto-Multilift is a novel learning framework for cooperative load transportation with quadrotors. It can automatically tune various MPC hyperparameters, which are modeled by DNNs and difficult to tune manually, via reinforcement learning in a distributed and closed-loop manner.
gbxu
[NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training
Official PyTorch implementation of "DBS: Dynamic Batch Size for Distributed Deep Neural Network Training"
jaywonchung
(ICPP '20) ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference
A hybrid DDoS detection system comparing Deep Neural Networks (DNN) and Graph Neural Networks (GNN). GNNs achieve 97.56% accuracy on stealth attacks, outperforming traditional methods on subtle, distributed threats.
hmofrad
Distributed Sparse DNN Inference
qub-blesson
A benchmarking tool for distributing Deep Neural Networks (DNN) in an efficient manner using transfer layers to reduce the data transferred between the distributed DNN
zhiqi-0
HPCA'2024 & TPDS'2024 Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search
lprez
Implementation of the algorithm described in "Fully Distributed Deep Learning Inference on Resource-Constrained Edge Devices"
dscpesu
A decentralized and distributed framework for training DNNs
fengyoung
Distributed DNN trainer based on woo and C++
acomze
A distributed DNN inference system
Writing low-level TensorFlow programs. Learned how TensorFlow Python API works by building a graph, running a graph, and feeding values into a graph. Calculated area of a triangle using TensorFlow. Implementing a Machine Learning model in TensorFlow using Estimator API. Implemented a simple machine learning model using tf.learn. Read csv data into a Pandas dataframe. Implemented a linear regression model in TensorFlow. Trained and evaluated the model. Predicted with the model. Repeated with a Deep Neural Network (DNN) model in TensorFlow. Scaling up TensorFlow ingest using batching. Loaded large dataset progressively using tf.data.Dataset. Broke the one-to-one relationship between inputs and features. Creating a distributed training TensorFlow model with Estimator API. Learned the importance of watching your validation metrics while training is in progress. Used the estimator.train_and_evaluate function. Monitored training using TensorBoard. Scaling TensorFlow with Cloud Machine Learning Engine. Packaged up TensorFlow model. Ran training locally. Ran training on cloud. Deployed model to cloud. Invoked model to carry out predictions.
psrisank
No description available
EmekaGdswill
This code considers network clock synchronization for wireless networks via pulse-coupled PLLs at the nodes
filrg
Optimization Parallelism Efficiency Controller with Distributed DNN Controller Protocol
Superjomn
A distributed training framework for DNNs that supports both parameter-server (PS) and collective modes.
mchang6137
Trace-driven simulator to evaluate network accelerators and communication patterns for distributed DNN training
hpdps-group
Artifacts of SC'24 paper "A High-Performance Data Loading Framework for Distributed DNN Training with Remote Storage".
VegetableChook
Distributed DNN training
bocway
Modeling the Training Iteration Time for Heterogeneous Distributed Deep Learning Systems in PS+BSP
Foley-ops
No description available
AIoT-MLSys-Lab
[SenSys 2021] "Mercury: Efficient On-Device Distributed DNN Training via Stochastic Importance Sampling" by Xiao Zeng, Ming Yan, Mi Zhang
ctuning
Reproducibility report and the Collective Knowledge workflow for the SysML'19 paper "Priority-based Parameter Propagation for Distributed DNN Training"
EnoxSoftware
Combines OpenCV for Unity with PaddleOCR for end-to-end text detection, classification, and recognition in a MultiSource2MatHelper sample scene. Uses the OpenCV DNN module to perform inference with PaddleOCR 3.0 models. The models are converted from PaddleOCR distributed models to ONNX format files for use, and can handle multiple languages.
gyom
Distributed DNN parameter server and job dispatcher
wyaa1801
Distributed artificial intelligence protocol implementation based on libp2p. Peers share CPU and GPU resources with each other to form a Distributed Neural Network (DNN).