Found 6,708 repositories (showing 30)
taskflow
A General-purpose Task-parallel Programming System in C++
Netflix
Build, Manage and Deploy AI/ML Systems
KhronosGroup
MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
High-performance TensorFlow library for quantitative finance.
ProjectPhysX
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
parallel101
High-Performance Parallel Programming and Optimization - Course Slides
alpa-projects
Training and serving large-scale neural networks with auto parallelization.
bshoshany
BS::thread_pool: a fast, lightweight, modern, and easy-to-use C++17 / C++20 / C++23 thread pool library
merrymercy
A list of awesome compiler projects and papers for tensor computation and deep learning.
flame
BLAS-like Library Instantiation Software Framework
kokkos
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
jofpin
Build applications, scripts, and automations powered by high-performance multicore computing using Node.js
BOINC
Open-source software for volunteer computing and grid computing.
mfem
Lightweight, general, scalable C++ library for finite element methods
tracel-ai
Multi-platform high-performance compute language extension for Rust.
zero-equals-false
A curated list of awesome programming books (Algorithms and data structures, Artificial intelligence, Software Architecture, Human-computer interaction, Operating Systems, Database Systems, IT Security, Concurrency, Interpreters and Compilers, High-Performance Computing, Distributed Systems, Game Development, Mathematical optimization)
chapel-lang
A Productive Parallel Programming Language
hermit-os
Hermit for Rust.
AdaptiveCpp
Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
Maratyszcza
Acceleration package for neural networks on multi-core CPUs
DLTcollab
A translator from Intel SSE intrinsics to Arm/AArch64 NEON implementations
hermit-os
A Rust-based, lightweight unikernel.
mratsim
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
trilinos
Primary repository for the Trilinos Project
ropensci
An R-focused pipeline toolkit for reproducibility and high-performance computing
sail-sg
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
diwi
A Processing/Java library for high performance GPU-Computing (GLSL). Fluid Simulation + SoftBody Dynamics + Optical Flow + Rendering + Image Processing + Particle Systems + Physics +...
Liu-xiandong
This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
trevor-vincent
A curated list of awesome high performance computing resources
uncomplicate
Fast Clojure Matrix Library