Found 31 repositories (showing 30)
molyswu
Real-time hand detection using Neural Networks (SSD) on Tensorflow.

This repo documents steps and scripts used to train a hand detector using Tensorflow (Object Detection API). As with any DNN based task, the most expensive (and riskiest) part of the process is finding or creating the right (annotated) dataset. I was interested mainly in detecting hands on a table (egocentric view point). I experimented first with the [Oxford Hands Dataset](http://www.robots.ox.ac.uk/~vgg/data/hands/) (the results were not good). I then tried the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/), which was a much better fit for my requirements. The goal of this repo/post is to demonstrate how neural networks can be applied to the (hard) problem of tracking hands (egocentric and other views) and, better still, to provide code that can be adapted to other use cases. If you use this tutorial or models in your research or project, please cite [this](#citing-this-tutorial).

Here is the detector in action.

<img src="images/hand1.gif" width="33.3%"><img src="images/hand2.gif" width="33.3%"><img src="images/hand3.gif" width="33.3%">

Realtime detection on a video stream from a webcam.

<img src="images/chess1.gif" width="33.3%"><img src="images/chess2.gif" width="33.3%"><img src="images/chess3.gif" width="33.3%">

Detection on a Youtube video.

Both examples above were run on a Macbook Pro **CPU** (i7, 2.5GHz, 16GB). Some FPS numbers are:

| FPS | Image Size | Device | Comments |
| ------------- | ------------- | ------------- | ------------- |
| 21 | 320 * 240 | Macbook Pro (i7, 2.5GHz, 16GB) | Run without visualizing results |
| 16 | 320 * 240 | Macbook Pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) |
| 11 | 640 * 480 | Macbook Pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) |

> Note: The code in this repo is written and tested with Tensorflow `1.4.0-rc0`. Using a different version may result in [some errors](https://github.com/tensorflow/models/issues/1581). You may need to [generate your own frozen model](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/) graph using the [model checkpoints](model-checkpoint) in the repo to fit your TF version.

**Content of this document**

- Motivation - Why Track/Detect hands with Neural Networks?
- Data preparation and network training in Tensorflow (Dataset, Import, Training)
- Training the hand detection Model
- Using the Detector to Detect/Track hands
- Thoughts on Optimization

> P.S. If you are using or have used the models provided here, feel free to reach out on twitter ([@vykthur](https://twitter.com/vykthur)) and share your work!

## Motivation - Why Track/Detect hands with Neural Networks?

There are several existing approaches to tracking hands in the computer vision domain. Many of these approaches are rule based (e.g., extracting the background based on texture and boundary features, distinguishing between hands and background using color histograms and HOG classifiers), which makes them not very robust.
For example, these algorithms might get confused if the background is unusual, if sharp changes in lighting conditions cause sharp changes in skin color, or if the tracked object becomes occluded (see [here](https://www.cse.unr.edu/~bebis/handposerev.pdf) for a review paper on hand pose estimation from the HCI perspective).

With sufficiently large datasets, neural networks provide the opportunity to train models that perform well and address the challenges of existing object tracking/detection algorithms - varied/poor lighting, noisy environments, diverse viewpoints and even occlusion. The main drawbacks for real-time tracking/detection are that neural networks can be complex, are relatively slow compared to tracking-only algorithms, and it can be quite expensive to assemble a good dataset. But things are changing with advances in fast neural networks.

Furthermore, this entire area of work has been made more approachable by deep learning frameworks (such as the tensorflow object detection api) that simplify the process of training a model for custom object detection. More importantly, the advent of fast neural network models like ssd, faster r-cnn, rfcn (see [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models)) etc. makes neural networks an attractive candidate for real-time detection (and tracking) applications. Hopefully, this repo demonstrates this.

> If you are not interested in the process of training the detector, you can skip straight to applying the [pretrained model I provide in detecting hands](#detecting-hands).

Training a model is a multi-stage process (assembling a dataset, cleaning it, splitting it into training/test partitions and generating an inference graph). While I lightly touch on the details of these parts, there are a few other tutorials that cover training a custom object detector using the tensorflow object detection api in more detail [see [here](https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/) and [here](https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9)]. I recommend you walk through those if you are interested in training a custom object detector from scratch.

## Data preparation and network training in Tensorflow (Dataset, Import, Training)

**The Egohands Dataset**

The hand detector model is built using data from the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/). This dataset works well for several reasons. It contains high quality, pixel level annotations (>15000 ground truth labels) where hands are located across 4800 images. All images are captured from an egocentric view (Google Glass) across 48 different environments (indoor, outdoor) and activities (playing cards, chess, jenga, solving puzzles etc).

<img src="images/egohandstrain.jpg" width="100%">

If you will be using the Egohands dataset, you can cite them as follows:

> Bambach, Sven, et al. "Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions." Proceedings of the IEEE International Conference on Computer Vision. 2015.

The Egohands dataset (zip file with labelled data) contains 48 folders of locations where video data was collected (100 images per folder).

```
-- LOCATION_X
  -- frame_1.jpg
  -- frame_2.jpg
  ...
  -- frame_100.jpg
  -- polygons.mat  // contains annotations for all 100 images in current folder
-- LOCATION_Y
  -- frame_1.jpg
  -- frame_2.jpg
  ...
  -- frame_100.jpg
  -- polygons.mat  // contains annotations for all 100 images in current folder
```

**Converting data to Tensorflow Format**

Some initial work needs to be done to transform the Egohands dataset into the format (`tfrecord`) which Tensorflow needs to train a model. This repo contains `egohands_dataset_clean.py`, a script that will help you generate these csv files. It:

- Downloads the egohands dataset
- Renames all files to include their directory names to ensure each filename is unique
- Splits the dataset into train (80%), test (10%) and eval (10%) folders
- Reads in `polygons.mat` for each folder, generates bounding boxes and visualizes them to ensure correctness (see image above)

Once the script is done running, you should have an images folder containing three folders - train, test and eval. Each of these folders should also contain a csv label file (`train_labels.csv`, `test_labels.csv`) that can be used to generate `tfrecords`.

Note: While the egohands dataset provides four separate labels for hands (own left, own right, other left, and other right), for my purpose I am only interested in the general `hand` class, so I label all training data as `hand`. You can modify the data prep script to generate `tfrecords` that support 4 labels.

Next, convert your dataset + csv files to tfrecords. A helpful guide on this can be found [here](https://pythonprogramming.net/creating-tfrecord-files-tensorflow-object-detection-api-tutorial/). For each folder, you should be able to generate the `train.record` and `test.record` files required in the training process.
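For reference, the csv to tfrecord conversion described above amounts to serializing each image and its bounding boxes into a `tf.train.Example`. Below is a minimal sketch of that step written against the TF 1.x API used in this repo; the csv column names and the assumption that `hand` maps to label id 1 follow the linked guide and are not code from this repo:

```python
# Minimal sketch: one group of csv rows (all boxes for one image) becomes one
# tf.train.Example. Column names (filename, width, height, xmin/ymin/xmax/ymax)
# and the label id for `hand` are assumptions based on the linked tutorial.
import os
import pandas as pd
import tensorflow as tf

def create_tf_example(group, image_dir):
    with tf.gfile.GFile(os.path.join(image_dir, group.filename.iloc[0]), 'rb') as f:
        encoded_jpg = f.read()
    width = int(group.width.iloc[0])
    height = int(group.height.iloc[0])
    # Box coordinates are normalized to [0, 1], as the object detection api expects.
    feature = {
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpg])),
        'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpg'])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=list(group.xmin / width))),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=list(group.xmax / width))),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=list(group.ymin / height))),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=list(group.ymax / height))),
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'hand'] * len(group))),
        'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1] * len(group))),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

labels = pd.read_csv('images/train/train_labels.csv')
writer = tf.python_io.TFRecordWriter('train.record')
for _, group in labels.groupby('filename'):
    writer.write(create_tf_example(group, 'images/train').SerializeToString())
writer.close()
```

Grouping rows by filename ensures all boxes for an image end up in a single record, which is what the object detection api input readers expect.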
## Training the hand detection Model

Now that the dataset (and your tfrecords) has been assembled, the next task is to train a model on it. With neural networks, it is possible to use a process called [transfer learning](https://www.tensorflow.org/tutorials/image_retraining) to shorten the amount of time needed to train the entire model. This means we can take an existing model (that has been trained well on a related domain, here image classification) and retrain its final layer(s) to detect hands for us. Sweet! Given that neural networks sometimes have thousands or millions of parameters that can take weeks or months to train, transfer learning helps shorten training time to possibly hours.

Tensorflow does offer a few models (in the tensorflow [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models)) and I chose to use the `ssd_mobilenet_v1_coco` model as my start point given it is currently (one of) the fastest models (read the SSD research [paper here](https://arxiv.org/pdf/1512.02325.pdf)). The training process can be done locally on your CPU machine, which may take a while, or better on a (cloud) GPU machine (which is what I did). For reference, training on my Macbook Pro (tensorflow compiled from source to take advantage of the mac's cpu architecture), the best speed I got was 5 seconds per step, as opposed to the ~0.5 seconds per step I got with a GPU. It would take about 12 days to run 200k steps on my mac (i7, 2.5GHz, 16GB) compared to ~5hrs on a GPU.

> **Training on your own images**: Please use the [guide provided by Harrison from pythonprogramming](https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/) on how to generate tfrecords given your label csv files and your images. The guide also covers how to start the training process if training locally. If training in the cloud using a service like GCP, see the [guide here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_cloud.md).

As the training process progresses, the expectation is that total loss (error) is reduced to its possible minimum (about a value of 1 or thereabouts). By observing the tensorboard graphs for total loss (see image below), it should be possible to get an idea of when the training process is complete (total loss does not decrease with further iterations/steps). I ran my training job for 200k steps (which took about 5 hours) and stopped at a total loss value of 2.575. (In retrospect, I could have stopped the training at about 50k steps and gotten a similar total loss value.) With tensorflow, you can also run an evaluation concurrently that assesses your model to see how well it performs on the test data. A commonly used metric for performance is mean average precision (mAP), a single number used to summarize the area under the precision-recall curve. mAP is a measure of how well the model generates a bounding box that has at least a 50% overlap with the ground truth bounding box in our test dataset. For the hand detector trained here, the mAP value was **0.9686@0.5IOU**. mAP values range from 0 to 1; the higher the better.

<img src="images/accuracy.jpg" width="100%">

Once training is completed, the trained inference graph (`frozen_inference_graph.pb`) is exported (see the earlier referenced guides for how to do this) and saved in the `hand_inference_graph` folder. Now it's time to do some interesting detection.

## Using the Detector to Detect/Track hands

If you have not done this yet, please follow the guide on installing [Tensorflow and the Tensorflow object detection api](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md). This will walk you through setting up the tensorflow framework and cloning the tensorflow github repo. The main steps are:

- Load the `frozen_inference_graph.pb` trained on the hands dataset as well as the corresponding label map. In this repo, this is done in the `utils/detector_utils.py` script by the `load_inference_graph` method.

```python
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
    sess = tf.Session(graph=detection_graph)
print("> ====== Hand Inference graph loaded.")
```

- Detect hands. In this repo, this is done in the `utils/detector_utils.py` script by the `detect_objects` method.

```python
(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: image_np_expanded})
```

- Visualize the detected bounding boxes. In this repo, this is done in the `utils/detector_utils.py` script by the `draw_box_on_image` method.
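Putting the three steps above together, a minimal standalone sketch might look like the following. This is a rough illustration, not a verbatim excerpt from this repo; the tensor names (`image_tensor:0`, `detection_boxes:0`, etc.) are the standard names in graphs exported by the tensorflow object detection api, and the score threshold is an assumption:

```python
# Rough end-to-end sketch: load the frozen graph, run detection on one image.
import cv2
import numpy as np
import tensorflow as tf

PATH_TO_CKPT = 'hand_inference_graph/frozen_inference_graph.pb'

# Load the frozen inference graph into its own tf.Graph.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')
sess = tf.Session(graph=detection_graph)

# Read an image, convert BGR -> RGB, add a batch dimension.
image_np = cv2.cvtColor(cv2.imread('test.jpg'), cv2.COLOR_BGR2RGB)
image_np_expanded = np.expand_dims(image_np, axis=0)

image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')

(boxes, scores) = sess.run([detection_boxes, detection_scores],
                           feed_dict={image_tensor: image_np_expanded})

# Boxes are normalized [ymin, xmin, ymax, xmax]; keep confident detections.
for box, score in zip(boxes[0], scores[0]):
    if score > 0.3:
        print('hand at', box, 'score', score)
```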
This repo contains two scripts that tie all these steps together.

- `detect_multi_threaded.py`: A threaded implementation for reading camera video input and detecting hands. Takes a set of command line flags to set parameters such as `--display` (visualize detections), the image parameters `--width` and `--height`, and the video `--source` (0 for camera) etc.
- `detect_single_threaded.py`: Same as above, but single threaded. This script also works for video files by setting the video `--source` parameter (path to a video file).

```cmd
# load and run detection on video at path "videos/chess.mov"
python detect_single_threaded.py --source videos/chess.mov
```

> Update: If you do have errors loading the frozen inference graph in this repo, feel free to generate a new graph that fits your TF version from the model-checkpoint in this repo. Use the [export_inference_graph.py](https://github.com/tensorflow/models/blob/master/research/object_detection/export_inference_graph.py) script provided in the tensorflow object detection api repo. More guidance on this [here](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/).

## Thoughts on Optimization

A few things that led to noticeable performance increases:

- Threading: It turns out that reading images from a webcam is a heavy I/O operation and, if run on the main application thread, can slow down the program. I implemented some good ideas from [Adrian Rosebrock](https://www.pyimagesearch.com/2017/02/06/faster-video-file-fps-with-cv2-videocapture-and-opencv/) on parallelizing image capture across multiple worker threads. This mostly led to an FPS increase of about 5 points (a short sketch of this pattern appears at the end of this section).
- For those new to Opencv, images from the `cv2.read()` method are returned in [BGR format](https://www.learnopencv.com/why-does-opencv-use-bgr-color-format/). Ensure you convert to RGB before detection (accuracy will be much reduced if you don't).

```python
cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
```

- Keeping your input image small will increase FPS without any significant accuracy drop (I used about 320 x 240 compared to the 1280 x 720 which my webcam provides).
- Model quantization: Moving from the current 32 bit to 8 bit can achieve up to a 4x reduction in the memory required to load and store models. One way to further speed up this model is to explore the use of [8-bit fixed point quantization](https://heartbeat.fritz.ai/8-bit-quantization-and-tensorflow-lite-speeding-up-mobile-inference-with-low-precision-a882dfcafbbd).

Performance can also be increased by a clever combination of tracking algorithms with the already decent detection, and this is something I am still experimenting with. If you have ideas for optimizing this further, please share!
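Here is a minimal sketch of the threaded-capture pattern referenced in the first optimization point above, loosely based on the pyimagesearch approach; the class and attribute names are illustrative, not the ones used in this repo's scripts:

```python
# Minimal threaded webcam reader: the capture loop runs on a worker thread so
# the (slow) camera I/O never blocks the main detection loop.
import threading
import cv2

class WebcamVideoStream:
    def __init__(self, src=0):
        self.stream = cv2.VideoCapture(src)
        self.grabbed, self.frame = self.stream.read()
        self.stopped = False

    def start(self):
        threading.Thread(target=self.update, daemon=True).start()
        return self

    def update(self):
        # Keep overwriting self.frame with the most recent frame until stopped.
        while not self.stopped:
            self.grabbed, self.frame = self.stream.read()

    def read(self):
        return self.frame

    def stop(self):
        self.stopped = True
        self.stream.release()

# Usage: the main loop always gets the latest frame without waiting on I/O.
vs = WebcamVideoStream(src=0).start()
while True:
    frame = vs.read()
    # ... run hand detection on `frame` here ...
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
vs.stop()
cv2.destroyAllWindows()
```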
<img src="images/general.jpg" width="100%">

> Note: The detector does reflect some limitations associated with the training set. These include non-egocentric viewpoints, very noisy backgrounds (e.g. in a sea of hands) and sometimes skin tone. There is an opportunity to improve these with additional data.

## Integrating Multiple DNNs

One way to make things more interesting is to integrate our new knowledge of where "hands" are with other detectors trained to recognize other objects. Unfortunately, while our hand detector can in fact detect hands, it cannot detect other objects (a factor of how it was trained). Creating a detector that classifies multiple different objects would mean a long, involved process of assembling datasets for each class and a lengthy training process.

> Given the above, a potential strategy is to explore structures that allow us to **efficiently** interleave output from multiple pretrained models for various object classes and have them detect multiple objects in a single image.

An example of this is my primary use case, where I am interested in understanding the position of objects on a table with respect to hands on the same table. I am currently doing some work on a threaded application that loads multiple detectors and outputs bounding boxes on a single image. More on this soon.
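As a rough illustration of the interleaving idea (not the threaded application described above, which is still in progress), two frozen graphs can be loaded into separate `tf.Graph` instances and run back to back on the same frame. Everything in the hypothetical sketch below — the second graph path, the labels, and the threshold — is an assumption:

```python
# Hypothetical sketch: run two independent single-class frozen detectors on
# one image and merge their outputs into a single list of (label, box) pairs.
import numpy as np
import tensorflow as tf

def load_detector(path):
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(path, 'rb') as fid:
            graph_def.ParseFromString(fid.read())
            tf.import_graph_def(graph_def, name='')
    return graph, tf.Session(graph=graph)

def detect(graph, sess, image_np):
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    boxes = graph.get_tensor_by_name('detection_boxes:0')
    scores = graph.get_tensor_by_name('detection_scores:0')
    return sess.run([boxes, scores],
                    feed_dict={image_tensor: np.expand_dims(image_np, 0)})

# Each (label, graph) pair is an independent pretrained detector.
detectors = {
    'hand': load_detector('hand_inference_graph/frozen_inference_graph.pb'),
    'object': load_detector('other_inference_graph/frozen_inference_graph.pb'),
}

def detect_all(image_np, threshold=0.3):
    results = []
    for label, (graph, sess) in detectors.items():
        boxes, scores = detect(graph, sess, image_np)
        results += [(label, b) for b, s in zip(boxes[0], scores[0]) if s > threshold]
    return results  # one merged list of (label, box) for a single image
```

Running each graph in its own session keeps the models independent; the obvious cost is one forward pass per model per frame, which is where threading can help.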
This repository presents a couple of approaches to the problem of multi-view image classification. I faced this challenge during a hackathon in which I participated and decided to share my code here. I've also written a Medium article to provide further details and explanations. Feel free to check it out!
ferjad
Code for CVPR23 Highlight "I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification" and NeurIPS2022 "I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification"
Penn000
GMMDP: hyperspectral image classification, multi-view learning
XZheng0427
Cross-fusion Mamba for multi-view medical image classification
ahmed1996said
Source code for GARDNet: Robust Multi-View Network for Glaucoma Classification in Color Fundus Images
HongliLiu1
1st place solution for PBVS 2025 Multi-modal Aerial View Image Challenge - C (SAR Classification)
yufeixue-ai
HKUST ELCE6910 2024Fall HW2
A Multi-View-CNN Framework for Deep Representation Learning in Image Classification
libingcheng3979
A lightweight street view images classification project with GradCAM visualization, supporting multi-stage training, evaluation, and explainable AI techniques.
isliao613
This project trains a model on a new chest X-ray dataset named "ChestX-ray8". The dataset contains 108,948 frontal-view X-ray images from 32,717 individual patients, labelled with eight different diseases. These diseases can be detected using supervised multi-label image classification.
Multi-view fish image classification using PyTorch. Three ResNet18 models extract features from side, back, and belly views, fused via an attention mechanism for final classification. Grad-CAM visualizes model focus areas.
ChiungYunChang
Using a pre-trained model, build a deep network that predicts the category and attributes of an item simultaneously. Image from: https://drive.google.com/file/d/1alC3j-4yaHfZe4bYkr6M7Snxeru4GzYN/view • Category (multi-class classification): 10 • Attribute (multi-label classification): 15
ANALYZING ROAD SAFETY & TRAFFIC DEMOGRAPHICS IN THE UK (Multi-class Classification)

SUMMARY
Here, I aim to analyze the Road Safety and Traffic Demographics dataset (UK), containing accidents reported by the police between the years 2004 - 2017.

PROJECT GOALS:
Identify factors responsible for most of the reported accidents.
Build a machine learning model that is capable of accurately predicting the severity of an accident.
Provide recommendations to the Department of Transport (UK Government) to improve road safety policies and prevent recurrences of severe accidents where possible.

PACKAGES USED: Scikit-learn, numpy, pandas, imblearn (imbalanced-learn), seaborn, Matplotlib

MOTIVATION
The World Health Organization (WHO) reported that more than 1.25 million people die each year, while 50 million are injured, as a result of road accidents worldwide. Road accidents are the 10th leading cause of death globally. On current trends, road traffic accidents are set to become the 7th leading cause of death by 2030, making them a major public health concern. Between the years 2005 and 2016, there were roughly 2 million road accidents reported in the United Kingdom (UK) alone, of which 16,000 were fatal. As a big data project, I wanted to explore the traffic demographics data in greater detail using machine learning!

CONTEXT
The UK government amassed traffic data from 2004 to 2017, recording over 2 million accidents in the process and making this one of the most comprehensive traffic datasets out there. It's a huge picture of a country undergoing change. Note that all the contained accident data comes from police reports, so this data does not include minor incidents. For steps undertaken to pre-process and clean the data, please view the "Data Cleansing & Descriptive Analysis_UK Traffic Demographics.ipynb" file.

DESCRIPTIVE ANALYTICS (EDA)
Tools used include Python, Tableau, MS PowerBI.

Percent (%) distribution of target classes: as seen in the accident-severity distribution plot, the data is highly imbalanced. For detailed steps undertaken to deal with the imbalanced data, please view the "Modelling_Predictive Analytics_UK Traffic Demographics.ipynb" file. One article provides some great tips on utilizing the correct performance metrics when analyzing the performance of a model trained on an imbalanced dataset. Another describes several strategies that can help combat the case of a severely imbalanced dataset (see the short sketch after the findings below). Methods include:
Resampling strategies (under-sampling - Tomek Links, Cluster Centroids; over-sampling - SMOTE)
Using Decision Tree based models
Using Cost-Sensitive training (penalize algorithms)

Number of accidents by Year and Accident Severity: the trend seems to be increasing over the years. In addition, the spike between 2008 - 2009 was because of an enhancement to the reporting system introduced in the UK in 2009, where all accidents, including minor ones, needed to be reported by the police so as to match the counts represented by hospitals, insurance claims etc.

Accident density by location (geomap): most accidents took place in major cities - Birmingham, London, Leeds, Newcastle.

Accidents by Gender and Age.

Accidents by Day of the Week and Year: most accidents take place on a Friday.

Vehicle manoeuvre at time of accident: most accidents take place as a result of overtaking.

For more findings, please go to the "Images" folder.
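As referenced in the resampling list above, here is a short sketch of over-sampling minority classes with SMOTE from imbalanced-learn. The synthetic data stands in for the accident-severity features and target, which live in the notebooks; the class weights here are illustrative:

```python
# Short sketch: over-sample minority accident-severity classes with SMOTE.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the accident data: 3 classes, heavily imbalanced,
# mimicking a Slight/Serious/Fatal style skew.
X, y = make_classification(n_samples=10000, n_classes=3, n_informative=6,
                           weights=[0.85, 0.12, 0.03], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Resample only the training split so the test set stays representative.
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X_train, y_train)

print('before:', Counter(y_train))
print('after: ', Counter(y_resampled))
```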
For steps undertaken to carry out some predictive modeling and hyper-parameter tuning, please view the "Modelling_Predictive Analytics_UK Traffic Demographics.ipynb" file.

RECOMMENDATIONS TO THE DEPARTMENT OF TRANSPORT (UK)
Decrease emergency response times during afternoon rush hours (15-19), especially on Fridays.
Allocate resources to investigate high-density traffic points and identify new infrastructure needs to divert traffic from dual carriageways.
Explore conditions of vehicles and casualties, such as vehicle type, age of registered vehicles, pedestrian movements, etc., for policy makers.
Adopt comprehensive distracted-driving laws that increase penalties for drivers who commit traffic violations like aggressive overtaking.

ACKNOWLEDGEMENTS
The license for this dataset is the Open Government Licence used by all data on data.gov.uk. The raw datasets are available from the UK Department of Transport website. I had a lot of fun working on this dataset and learned a lot in the process. I plan to further my research in the area of predictive modeling using imbalanced data and how to effectively build a highly robust model for future projects.
Aryia-Behroziuan
In the late 1960s, computer vision began at universities which were pioneering artificial intelligence. It was meant to mimic the human visual system, as a stepping stone to endowing robots with intelligent behavior.[11] In 1966, it was believed that this could be achieved through a summer project, by attaching a camera to a computer and having it "describe what it saw".[12][13] What distinguished computer vision from the prevalent field of digital image processing at that time was a desire to extract three-dimensional structure from images with the goal of achieving full scene understanding.

Studies in the 1970s formed the early foundations for many of the computer vision algorithms that exist today, including extraction of edges from images, labeling of lines, non-polyhedral and polyhedral modeling, representation of objects as interconnections of smaller structures, optical flow, and motion estimation.[11]

The next decade saw studies based on more rigorous mathematical analysis and quantitative aspects of computer vision. These include the concept of scale-space, the inference of shape from various cues such as shading, texture and focus, and contour models known as snakes. Researchers also realized that many of these mathematical concepts could be treated within the same optimization framework as regularization and Markov random fields.[14]

By the 1990s, some of the previous research topics became more active than the others. Research in projective 3-D reconstructions led to better understanding of camera calibration. With the advent of optimization methods for camera calibration, it was realized that a lot of the ideas were already explored in bundle adjustment theory from the field of photogrammetry. This led to methods for sparse 3-D reconstructions of scenes from multiple images. Progress was made on the dense stereo correspondence problem and further multi-view stereo techniques. At the same time, variations of graph cut were used to solve image segmentation. This decade also marked the first time statistical learning techniques were used in practice to recognize faces in images (see Eigenface).

Toward the end of the 1990s, a significant change came about with the increased interaction between the fields of computer graphics and computer vision. This included image-based rendering, image morphing, view interpolation, panoramic image stitching and early light-field rendering.[11]

Recent work has seen the resurgence of feature-based methods, used in conjunction with machine learning techniques and complex optimization frameworks.[15][16] The advancement of deep learning techniques has brought further life to the field of computer vision. The accuracy of deep learning algorithms on several benchmark computer vision data sets, for tasks ranging from classification to segmentation and optical flow, has surpassed prior methods.
qianyuwang
code for Multi-View Analysis Dictionary Learning for Image Classification
CCCXJ-CXJ
All the prediction results of comparison models in "Multi-view Graph Convolutional Network with Spectral Component Decompose for Remote Sensing Images Classification"
nse1994
Mini Projects for EEL 6935 Deep Learning Course. Covering Cost functions, Neural Networks for multi-class classification, Regularizers, Image Classification using CNN on the CIFAR-10 dataset, View filters, Data Augmentation, Denoising Autoencoders, Recommender systems, DNN, CNN, GANs, LSTM-RNN
genissrruiz
This repository features a multi-modal neural network to classify 28 business types from street view images. Combining visual features from ResNet and textual data from OCR, the model enhances classification accuracy using the Con-Text dataset.
MohamedRamadan-prog
Abstract
- Classification plays an effective role in self-driving cars. Classifying whether the object shown is a car, a traffic sign, or a building is needed, but how can it be implemented? This takes many phases.
- Traffic sign recognition is a multi-class classification problem. Traffic signs can show a wide range of variation between classes in terms of color, shape, and the presence of pictograms or text. However, there exist subsets of classes (e.g., speed limit signs) that are very similar to each other.
- The classifier has to cope with large variations in visual appearance due to illumination changes, partial occlusions, rotations, weather conditions, etc.
- Humans are capable of recognizing the large variety of existing road signs with close to 100% correctness. This applies not only to real-world driving, which provides both context and multiple views of a single traffic sign, but also to recognition from single images.
- In this project we focus on the mechanism of recognizing traffic signs and differentiating between them. In order to achieve this, several steps were taken.
charton-MGH-HARVARD
No description available
Songlm1995
The code of "Multi-View and Multi-Modal Transformer for Bacterial Pneumonia Classification in CT Images"
hawk-sudo
This repository is the implementation for EyeMVNet: Uncertainty-Aware Multi-View Network for Retinal Image Classification.
Lyc1022
Code for the paper 'DMVMLC-VT: Deep Incomplete Multi-View Image Classification with View Translation and Pseudo-Label Enhancement' (under submission)
lizitao2005
A Multi-View Image Dataset for Deep Learning-Based Classification of 25 Quarantine Fruit Fly Species (Diptera: Tephritidae)
ankitaag1
Image-based classification of plants under normal and stress (moisture-deficit) conditions using multi-view features and machine learning models.
LeoPag
This repo contains the projects for the Computer Vision course at ETH Zurich. These projects assess a broad range of Computer Vision tasks: Image Classification, Image Segmentation, Camera Calibration, Structure From Motion, Multi-View-Stereo, Tracking.
- Combined multi-view (front and side) image inputs and processed over 40,000 images, categorizing BMI into clinical grades (Underweight, Normal, Overweight).
- Developed a ResNet50-based model achieving 97.66% accuracy in gender classification and 85.95% accuracy in BMI categorization.
eskutcheon
An efficient, interactive GUI framework for reviewing and categorizing image annotations and more, supporting single-label and multi-label classification with a results (slideshow) viewer for quick inspection and optional generation of more advanced views
DevAdy
End-to-end AI-powered cattle & buffalo breed recognition for India, integrating multi-view image classification, metadata fusion, blockchain-backed registry, REST APIs, and a mobile-ready frontend for seamless BPA integration and field deployment.