Found 65 repositories (showing 30)
awai54st
Python on Zynq FPGA for Convolutional Neural Networks
quanzaihh
A Convolutional Neural Network accelerator implemented on a Xilinx FPGA (xczu7ev-ffvc1156-2-i); YOLOv8 inference takes 60 ms.
hunterlew
CNN acceleration on virtex-7 FPGA with verilog HDL
cornell-zhang
Binarized Convolutional Neural Networks on Software-Programmable FPGAs (FPGA'17)
sumanth-kalluri
A fully parameterized Verilog implementation of computation kernels for accelerating the inference of Convolutional Neural Networks on FPGAs.
ilaydayaman
A trained Convolutional Neural Network implemented on ZedBoard Zynq-7000 FPGA.
Alioth2000
Built a convolutional neural network on the PYNQ-Z2 platform and accelerated traffic sign recognition using FPGA.
mertz1999
Implementation of a convolutional neural network on FPGA using a VHDL design.
A project on hardware design for a convolutional neural network. The network has 2 layers, with 400 inputs to the first layer, which reads from a memory. A MATLAB script was created to take the floating-point inputs, as well as the weights of both layers, and convert them to 7-bit signed binary values. A sigmoid case statement was also implemented in Verilog to compute the sigmoid of the intermediate outputs of a layer. The design was simulated and synthesized at 50 MHz on Quartus Prime 17.0, targeting the Cyclone V FPGA family. It used 724 logic elements and 121,856 memory bits (only 50% of available memory).
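The preprocessing and activation scheme described above can be sketched in Python. Assumptions not stated in the description: a Q1.6 fixed-point format (1 sign bit, 6 fractional bits) for the 7-bit signed values, and a sigmoid table covering the full 7-bit input range, analogous to a Verilog case statement.

```python
import math

FRAC_BITS = 6          # assumed: 6 fractional bits, so one step = 1/64
LO, HI = -64, 63       # 7-bit signed two's-complement range

def to_q7(x: float) -> int:
    """Quantize a float to a 7-bit signed fixed-point integer (saturating)."""
    q = round(x * (1 << FRAC_BITS))
    return max(LO, min(HI, q))

def from_q7(q: int) -> float:
    """Convert a 7-bit signed fixed-point integer back to a float."""
    return q / (1 << FRAC_BITS)

# Sigmoid lookup table keyed by the quantized pre-activation value,
# playing the role of the Verilog case statement.
SIGMOID_LUT = {q: to_q7(1.0 / (1.0 + math.exp(-from_q7(q))))
               for q in range(LO, HI + 1)}

print(to_q7(0.5))               # 32  (0.5 * 64)
print(from_q7(SIGMOID_LUT[0]))  # 0.5 (sigmoid of 0)
```

With only 6 fractional bits the sigmoid saturates quickly, which is exactly why a small case-statement table suffices in hardware.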
Who doesn’t dream of a new FPGA family that provides embedded hard neurons in its silicon fabric instead of the conventional DSP and multiplier blocks? The optimized hard neuron design will allow software and hardware designers to create and test different deep learning network architectures, especially convolutional neural networks (CNNs), more easily and faster than with any previous FPGA family on the market today. The revolutionary idea of this project is to open the gate of creativity for a precisely tailored new generation of FPGA families that can solve the problems of wasted logic resources and/or unneeded bus widths found in today’s conventional DSP blocks. The project focuses on the anchor point of any deep learning architecture: designing an optimized high-speed neuron block to replace the conventional DSP blocks and avoid the drawbacks designers face when trying to fit a CNN architecture onto them. The proposed neuron design takes parallel operation as its primary keystone, alongside minimizing the logic elements needed to construct the neuron cell. The target is for each neuron to use no more than 500 ALMs, with an expected maximum operating frequency of 834.03 MHz. In this project, ultra-fast, adaptive, and parallel modules such as parallel multiplier-accumulators (MACs) and a ReLU activation function are designed as soft blocks in VHDL, opening a new horizon for FPGA designers to build their own Convolutional Neural Networks (CNNs). We cannot stop imagining Intel/Altera soon leading the market by adopting the proposed CNN block into their new FPGA architecture fabrics as a separate new logic family.
Users of the proposed CNN blocks will be amazed by the number of high-speed operations per second available to them while designing their own CNN architectures. For instance, according to the first coding trial, a single MAC unit can reach 3.5 Giga Operations per Second (GOPS) and can multiply up to 4 different inputs by a common weight value, which will lead to a revolution in FPGA capabilities for the era of deep learning algorithms, especially considering that the blocks can also operate in parallel, raising the throughput of the proposed design to about 16 Tera Operations per Second (TOPS). Finally, we believe this proposed CNN block for FPGA is just a first step that will leave no room for competition from conventional CPUs and GPUs, thanks to the massive speed it provides and the flexible scalability achievable through the parallel operation of such FPGA-based CNN blocks.
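A back-of-the-envelope check of the throughput claims above: how many parallel MAC units the stated per-unit rate would require to reach the stated aggregate. The block count is our arithmetic, not a figure from the project, and it assumes "operations" are counted the same way in both numbers.

```python
mac_gops = 3.5e9      # claimed rate of a single MAC unit, in operations/s
target_tops = 16e12   # claimed aggregate throughput, in operations/s

# Number of MAC units needed in parallel to hit the aggregate figure.
blocks_needed = target_tops / mac_gops
print(f"{blocks_needed:.0f} MAC units running in parallel")  # ≈ 4571
```

Several thousand MAC instances is plausible only on a large device, which is consistent with the project's pitch for a dedicated hard-neuron logic family rather than soft logic alone.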
CNN on Artix-7 FPGA to perform pattern detection from a pool of objects
Feed-forward neural networks can be trained with gradient-descent-based backpropagation, but these algorithms require long computation times. Extreme Learning Machines (ELMs) are time-efficient and less complicated than conventional gradient-based algorithms. In previous years, an SRAM-based convolutional neural network using a receptive-field approach was proposed; it served as an encoder for the ELM algorithm and was implemented on FPGA. However, that network used an inaccurate 3-stage pipelined parallel adder and therefore generated imprecise stimuli for the hidden-layer neurons. This paper presents a hardware implementation of a precise convolutional neural network for encoding in the ELM algorithm based on the receptive-field approach. In the third stage of the pipelined parallel adder, instead of approximating the output with one 2-input 15-bit adder, one 4-input 14-bit adder is used. An additional weighted-pixel array block is also used, which improves the accuracy of generating the 128 weighted pixels. The network was simulated with ModelSim-Altera 10.1d, synthesized with Quartus II 13.0 SP1, implemented on a Cyclone V FPGA, and used for pattern recognition applications. Although this design consumes slightly more hardware resources, it is more accurate than previously existing encoders.
FPGA Implementation of Image Processing for MNIST Dataset Based on Convolutional Neural Network Algorithm (CNN)
ielecer
Building a convolutional neural network (CNN) accelerator on FPGA.
ikwzm
This repository provides VHDL code for performing quantized convolution for deep neural networks on FPGA/ASIC.
Increasing the accuracy of Convolutional Neural Networks (CNNs) has become a recent research focus in computer vision applications. Smaller CNN architectures like SqueezeNet and MobileNet can demonstrate accelerated performance on FPGAs and GPUs thanks to their smaller model size and fewer network parameters. Implementing CNNs on accelerators has two important benefits: GPUs provide thread-level parallelism for higher throughput, while FPGAs offer a customizable application-specific datapath. These two reasons make both platforms well suited to convolution-like operations, which involve large volumes of data. This project implements one such CNN architecture, MobileNet, on an image dataset in OpenCL, comparing kernel execution time and memory bandwidth usage on FPGA and GPU.
tboser
Convolutional Neural Networks on FPGA.
Developed and implemented a high-performance accelerator for Convolutional Neural Networks (CNNs) on the PYNQ-Z2 FPGA, focusing on optimizing computational efficiency and resource utilization. Conducted performance comparisons between FPGA-based and CPU-based CNN acceleration.
HemantaIngle
In this project, our main aim is to accelerate image recognition with a CNN (Convolutional Neural Network) on a platform deployable on FPGA. CNNs are used for image classification, speech recognition, and video analysis. CNNs are commonly accelerated with a GPU (Graphics Processing Unit), which is relatively slow and consumes a large amount of power, since a CNN requires about 20 GFLOPS per image. CPU acceleration, while cheaper because CPUs are readily available on most x86 machines, scales poorly in power terms. Modern Application-Specific Integrated Circuits (ASICs) and Field-Programmable Gate Arrays (FPGAs) offer better power efficiency and faster computation than GPUs. With the FPGA's reconfigurability and parallel architecture in mind, we decided to target CNN acceleration on an FPGA using PipeCNN, a design synthesized via High-Level Synthesis (HLS) tools such as Intel's Quartus and the OpenCL toolkit. Modern large-scale FPGAs like Stratix 10 and Arria 10 have shown about 10 percent lower power consumption than GPUs, with the added advantages of a pipelined parallel architecture and dedicated DSPs for faster, more efficient computation. The main goal of the project is to design an OpenCL accelerator that is a generic yet powerful means of improving throughput in inference computations.
nitheeshkm
A Convolution Neural Network on FPGA.
KT220u
A simple binary convolutional neural network implemented on FPGA.
arsalz1999
Implemented a basic convolutional neural network for performing real-time doodle classification on a Xilinx PYNQ P1 FPGA Board.
jongeunl
Stochastic computing based convolutional neural networks implemented on FPGA
JieFangD
FPGA-based Instant Image Recognition on Convolutional Neural Network
wigwagwent
Accelerating Privacy-Preserving Convolutional Neural Networks on FPGAs Using Fully Homomorphic Encryption
nurbano
Comparison of Vitis-AI and FINN for implementing convolutional neural networks on FPGA (KV260)
SorrasitBunluehan
Master's thesis on creating an FPGA-based hardware Convolutional Neural Network accelerator.
youngyang00
This project implements a hardware-accelerated FSRCNN (Fast Super-Resolution Convolutional Neural Network) on FPGA, designed to upscale low-resolution images from 320×180 to 1280×720 (×4) and achieve real-time processing at 60 frames per second.
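The real-time budget implied by the resolutions above can be checked with a quick calculation (our arithmetic, not the repository's): the pixel rates the design must sustain at 60 frames per second.

```python
in_w, in_h = 320, 180      # input resolution stated in the description
out_w, out_h = 1280, 720   # output resolution stated in the description
fps = 60                   # target frame rate

in_px_per_s = in_w * in_h * fps    # input pixels consumed per second
out_px_per_s = out_w * out_h * fps # output pixels produced per second

print(in_px_per_s)                   # 3456000
print(out_px_per_s)                  # 55296000
print(out_px_per_s // in_px_per_s)   # 16: ×4 per axis means 16× the pixels
```

Producing roughly 55 million output pixels per second is well within reach of a pipelined FPGA datapath running at typical fabric clock rates, which is what makes real-time ×4 super-resolution feasible here.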
CalmCelestialSJ
The project involves the design and implementation of a trained Convolutional Neural Network (CNN) on an FPGA platform for image classification tasks.
tinaba96
Quantization of a Convolutional Neural Network (CNN) for motion estimation with the goal of implementing it on a Field-Programmable Gate Array (FPGA)