Found 1,770 repositories (showing 30)
wendy7756
Transcribe and summarize videos and podcasts using AI. Open-source, multi-platform, and supports multiple languages.
molyswu
Using Neural Networks (SSD) on Tensorflow. This repo documents steps and scripts used to train a hand detector using Tensorflow (Object Detection API). As with any DNN-based task, the most expensive (and riskiest) part of the process has to do with finding or creating the right (annotated) dataset. I was interested mainly in detecting hands on a table (egocentric viewpoint). I experimented first with the [Oxford Hands Dataset](http://www.robots.ox.ac.uk/~vgg/data/hands/) (the results were not good). I then tried the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/), which was a much better fit for my requirements.

The goal of this repo/post is to demonstrate how neural networks can be applied to the (hard) problem of tracking hands (egocentric and other views) and, better still, to provide code that can be adapted to other use cases. If you use this tutorial or models in your research or project, please cite [this](#citing-this-tutorial).

Here is the detector in action.

<img src="images/hand1.gif" width="33.3%"><img src="images/hand2.gif" width="33.3%"><img src="images/hand3.gif" width="33.3%">

Realtime detection on a video stream from a webcam.

<img src="images/chess1.gif" width="33.3%"><img src="images/chess2.gif" width="33.3%"><img src="images/chess3.gif" width="33.3%">

Detection on a Youtube video.

Both examples above were run on a macbook pro **CPU** (i7, 2.5GHz, 16GB). Some fps numbers are:

| FPS | Image Size | Device | Comments |
| --- | --- | --- | --- |
| 21 | 320 * 240 | Macbook pro (i7, 2.5GHz, 16GB) | Run without visualizing results |
| 16 | 320 * 240 | Macbook pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) |
| 11 | 640 * 480 | Macbook pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) |

> Note: The code in this repo is written and tested with Tensorflow `1.4.0-rc0`. Using a different version may result in [some errors](https://github.com/tensorflow/models/issues/1581). You may need to [generate your own frozen model](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/) graph using the [model checkpoints](model-checkpoint) in the repo to fit your TF version.

**Content of this document**

- Motivation - Why Track/Detect hands with Neural Networks
- Data preparation and network training in Tensorflow (Dataset, Import, Training)
- Training the hand detection Model
- Using the Detector to Detect/Track hands
- Thoughts on Optimizations

> P.S. If you are using or have used the models provided here, feel free to reach out on twitter ([@vykthur](https://twitter.com/vykthur)) and share your work!

## Motivation - Why Track/Detect hands with Neural Networks?

There are several existing approaches to tracking hands in the computer vision domain. Incidentally, many of these approaches are rule based (e.g. extracting background based on texture and boundary features, distinguishing between hands and background using color histograms and HOG classifiers), making them not very robust.
For example, these algorithms might get confused if the background is unusual, if sharp changes in lighting conditions cause sharp changes in skin color, or if the tracked object becomes occluded (see [this review paper](https://www.cse.unr.edu/~bebis/handposerev.pdf) on hand pose estimation from the HCI perspective). With sufficiently large datasets, neural networks provide the opportunity to train models that perform well and address the challenges of existing object tracking/detection algorithms: varied/poor lighting, noisy environments, diverse viewpoints and even occlusion. The main drawbacks for real-time tracking/detection are that neural networks can be complex, are relatively slow compared to tracking-only algorithms, and it can be quite expensive to assemble a good dataset. But things are changing with advances in fast neural networks.

Furthermore, this entire area of work has been made more approachable by deep learning frameworks (such as the tensorflow object detection api) that simplify the process of training a model for custom object detection. More importantly, the advent of fast neural network models like ssd, faster r-cnn, rfcn (see [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models)) etc. makes neural networks an attractive candidate for real-time detection (and tracking) applications. Hopefully, this repo demonstrates this.

> If you are not interested in the process of training the detector, you can skip straight to applying the [pretrained model I provide in detecting hands](#detecting-hands).

Training a model is a multi-stage process (assembling the dataset, cleaning, splitting into training/test partitions and generating an inference graph). While I lightly touch on the details of these parts, there are a few other tutorials that cover training a custom object detector using the tensorflow object detection api in more detail [see [here](https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/) and [here](https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9)]. I recommend you walk through those if you are interested in training a custom object detector from scratch.

## Data preparation and network training in Tensorflow (Dataset, Import, Training)

**The Egohands Dataset**

The hand detector model is built using data from the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/). This dataset works well for several reasons. It contains high quality, pixel level annotations (>15000 ground truth labels) where hands are located across 4800 images. All images are captured from an egocentric view (Google glass) across 48 different environments (indoor, outdoor) and activities (playing cards, chess, jenga, solving puzzles etc).

<img src="images/egohandstrain.jpg" width="100%">

If you will be using the Egohands dataset, you can cite them as follows:

> Bambach, Sven, et al. "Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions." Proceedings of the IEEE International Conference on Computer Vision. 2015.

The Egohands dataset (zip file with labelled data) contains 48 folders of locations where video data was collected (100 images per folder).

```
-- LOCATION_X
  -- frame_1.jpg
  -- frame_2.jpg
  ...
  -- frame_100.jpg
  -- polygons.mat  // contains annotations for all 100 images in current folder
-- LOCATION_Y
  -- frame_1.jpg
  -- frame_2.jpg
  ...
  -- frame_100.jpg
  -- polygons.mat  // contains annotations for all 100 images in current folder
```
**Converting data to Tensorflow Format**

Some initial work needs to be done on the Egohands dataset to transform it into the format (`tfrecord`) which Tensorflow needs to train a model. This repo contains `egohands_dataset_clean.py`, a script that will help you generate these csv files. The script:

- Downloads the egohands dataset
- Renames all files to include their directory names, to ensure each filename is unique
- Splits the dataset into train (80%), test (10%) and eval (10%) folders
- Reads in `polygons.mat` for each folder, generates bounding boxes and visualizes them to ensure correctness (see image above)

Once the script is done running, you should have an images folder containing three folders: train, test and eval. Each of these folders should also contain a csv label document (`train_labels.csv`, `test_labels.csv`) that can be used to generate `tfrecords`.

Note: While the egohands dataset provides four separate labels for hands (own left, own right, other left, and other right), for my purpose I am only interested in the general `hand` class, so I label all training data as `hand`. You can modify the data prep script to generate `tfrecords` that support 4 labels.

Next: convert your dataset + csv files to tfrecords. A helpful guide on this can be found [here](https://pythonprogramming.net/creating-tfrecord-files-tensorflow-object-detection-api-tutorial/). For each folder, you should be able to generate the `train.record` and `test.record` files required in the training process; a sketch of this step follows.
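For orientation, here is a minimal sketch of what that conversion step can look like, written against the TF1-era object detection API conventions this repo uses. The function and its arguments are illustrative, not the repo's actual script:

```python
import tensorflow as tf  # assumes a 1.x version, matching this repo

def create_tf_example(filename, width, height, xmins, ymins, xmaxs, ymaxs):
    """Pack one labelled image (boxes in pixels) into a tf.train.Example."""
    with tf.gfile.GFile(filename, 'rb') as fid:
        encoded_jpg = fid.read()
    n = len(xmins)  # number of hand boxes in this image
    feature = {
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpg])),
        'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpeg'])),
        # Box coordinates are stored normalized to [0, 1].
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=[x / width for x in xmins])),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=[x / width for x in xmaxs])),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=[y / height for y in ymins])),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=[y / height for y in ymaxs])),
        # A single 'hand' class, matching the label map used here.
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'hand'] * n)),
        'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1] * n)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

writer = tf.python_io.TFRecordWriter('train.record')
# ... loop over the rows of train_labels.csv, grouping boxes per image, then:
# writer.write(create_tf_example(...).SerializeToString())
writer.close()
```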
## Training the hand detection Model

Now that the dataset has been assembled (and your tfrecords generated), the next task is to train a model based on it. With neural networks, it is possible to use a process called [transfer learning](https://www.tensorflow.org/tutorials/image_retraining) to shorten the amount of time needed to train the entire model. This means we can take an existing model (that has been trained well on a related domain, here image classification) and retrain its final layer(s) to detect hands for us. Sweet! Given that neural networks sometimes have thousands or millions of parameters that can take weeks or months to train, transfer learning helps shorten training time to possibly hours. Tensorflow does offer a few models (in the tensorflow [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models)), and I chose to use the `ssd_mobilenet_v1_coco` model as my starting point given it is currently (one of) the fastest models (read the SSD research [paper here](https://arxiv.org/pdf/1512.02325.pdf)). The training process can be done locally on your CPU machine, which may take a while, or better on a (cloud) GPU machine (which is what I did). For reference, training on my macbook pro (tensorflow compiled from source to take advantage of the mac's cpu architecture), the maximum speed I got was 5 seconds per step, as opposed to the ~0.5 seconds per step I got with a GPU. At that rate it would take about 12 days to run 200k steps on my mac (i7, 2.5GHz, 16GB) compared to ~5hrs on a GPU.

> **Training on your own images**: Please use the [guide provided by Harrison from pythonprogramming](https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/) on how to generate tfrecords given your label csv files and your images. The guide also covers how to start the training process if training locally. If training in the cloud using a service like GCP, see the [guide here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_cloud.md).

As the training process progresses, the expectation is that total loss (error) is reduced to its possible minimum (a value of about 1 or thereabouts). By observing the tensorboard graphs for total loss (see image below), it should be possible to get an idea of when the training process is complete (total loss does not decrease with further iterations/steps). I ran my training job for 200k steps (which took about 5 hours) and stopped at a total loss value of 2.575. (In retrospect, I could have stopped the training at about 50k steps and gotten a similar total loss value.) With tensorflow, you can also run an evaluation concurrently that assesses your model to see how well it performs on the test data. A commonly used metric for performance is mean average precision (mAP), a single number used to summarize the area under the precision-recall curve. mAP is a measure of how well the model generates a bounding box that has at least a 50% overlap with the ground truth bounding box in our test dataset. For the hand detector trained here, the mAP value was **0.9686@0.5IOU**. mAP values range from 0 to 1; the higher the better.

<img src="images/accuracy.jpg" width="100%">
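As a side note, the 0.5 IOU criterion above boils down to a simple overlap ratio. A plain-Python illustration (not code from this repo):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A predicted box counts as correct at 0.5 IOU when iou(predicted, ground_truth) >= 0.5.
```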
Once training is completed, the trained inference graph (`frozen_inference_graph.pb`) is then exported (see the earlier referenced guides for how to do this) and saved in the `hand_inference_graph` folder. Now it's time to do some interesting detection.

## Using the Detector to Detect/Track hands

If you have not done this yet, please follow the guide on installing [Tensorflow and the Tensorflow object detection api](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md). This will walk you through setting up the tensorflow framework and cloning the tensorflow github repo. The detection steps are:

- Load the `frozen_inference_graph.pb` trained on the hands dataset as well as the corresponding label map. In this repo, this is done in the `utils/detector_utils.py` script by the `load_inference_graph` method.

```python
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
    sess = tf.Session(graph=detection_graph)
print("> ====== Hand Inference graph loaded.")
```

- Detect hands. In this repo, this is done in the `utils/detector_utils.py` script by the `detect_objects` method.

```python
(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: image_np_expanded})
```

- Visualize the detected bounding boxes. In this repo, this is done in the `utils/detector_utils.py` script by the `draw_box_on_image` method.

This repo contains two scripts that tie all these steps together.

- `detect_multi_threaded.py`: A threaded implementation for reading camera video input and running detection. Takes a set of command line flags to set parameters such as `--display` (visualize detections), image parameters `--width` and `--height`, and video `--source` (0 for camera) etc.
- `detect_single_threaded.py`: Same as above, but single threaded. This script also works for video files, by setting the video `--source` parameter to the path of a video file.

```cmd
# load and run detection on video at path "videos/chess.mov"
python detect_single_threaded.py --source videos/chess.mov
```

> Update: If you do have errors loading the frozen inference graph in this repo, feel free to generate a new graph that fits your TF version from the model-checkpoint in this repo. Use the [export_inference_graph.py](https://github.com/tensorflow/models/blob/master/research/object_detection/export_inference_graph.py) script provided in the tensorflow object detection api repo. More guidance on this [here](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/).

## Thoughts on Optimization

A few things led to noticeable performance increases.

- Threading: It turns out that reading images from a webcam is a heavy I/O operation and, if run on the main application thread, can slow down the program. I implemented some good ideas from [Adrian Rosebrock](https://www.pyimagesearch.com/2017/02/06/faster-video-file-fps-with-cv2-videocapture-and-opencv/) on parallelizing image capture across multiple worker threads (a sketch follows at the end of this section). This mostly led to an FPS increase of about 5 points.
- For those new to Opencv: images from the `cv2.read()` method are returned in [BGR format](https://www.learnopencv.com/why-does-opencv-use-bgr-color-format/). Ensure you convert to RGB before detection (accuracy will be much reduced if you don't).

```python
cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
```

- Keeping your input image small will increase fps without any significant accuracy drop. (I used about 320 x 240 compared to the 1280 x 720 which my webcam provides.)
- Model Quantization. Moving from the current 32-bit to 8-bit can achieve up to a 4x reduction in the memory required to load and store models. One way to further speed up this model is to explore the use of [8-bit fixed point quantization](https://heartbeat.fritz.ai/8-bit-quantization-and-tensorflow-lite-speeding-up-mobile-inference-with-low-precision-a882dfcafbbd).

Performance can also be increased by a clever combination of tracking algorithms with the already decent detection, and this is something I am still experimenting with. Have ideas for optimizing better? Please share!
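Here is what that threaded capture can look like -- a minimal sketch in the spirit of Rosebrock's approach, not the repo's actual implementation (which lives in `detect_multi_threaded.py`):

```python
import threading
import cv2

class WebcamStream:
    """Reads frames on a worker thread so the detection loop never blocks on camera I/O."""

    def __init__(self, src=0):
        self.cap = cv2.VideoCapture(src)
        self.grabbed, self.frame = self.cap.read()
        self.stopped = False
        threading.Thread(target=self._update, daemon=True).start()

    def _update(self):
        # Continuously overwrite self.frame with the newest camera frame.
        while not self.stopped:
            self.grabbed, self.frame = self.cap.read()

    def read(self):
        # Always returns the most recent frame without waiting on the camera.
        return self.frame

    def stop(self):
        self.stopped = True
        self.cap.release()

# Usage: stream = WebcamStream(0); then call stream.read() inside the detection loop.
```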
<img src="images/general.jpg" width="100%">

Note: The detector does reflect some limitations associated with the training set. These include non-egocentric viewpoints, very noisy backgrounds (e.g. in a sea of hands) and sometimes skin tone. There is opportunity to improve these with additional data.

## Integrating Multiple DNNs

One way to make things more interesting is to integrate our new knowledge of where "hands" are with other detectors trained to recognize other objects. Unfortunately, while our hand detector can in fact detect hands, it cannot detect other objects (a factor of how it was trained). To create a detector that classifies multiple different objects would mean a long, involved process of assembling datasets for each class and a lengthy training process.

> Given the above, a potential strategy is to explore structures that allow us to **efficiently** interleave output from multiple pretrained models for various object classes and have them detect multiple objects on a single image.

An example of this is my primary use case, where I am interested in understanding the position of objects on a table with respect to hands on the same table. I am currently doing some work on a threaded application that loads multiple detectors and outputs bounding boxes on a single image (a rough sketch of the idea follows). More on this soon.
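To illustrate the direction (purely a sketch: the second graph path is hypothetical, and only the standard object detection api tensor names are assumed), interleaving two pretrained detectors on one frame might look like:

```python
import tensorflow as tf  # assumes a 1.x version, matching this repo

def load_detector(path):
    """Load one frozen inference graph and open a session for it."""
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(path, 'rb') as fid:
            graph_def.ParseFromString(fid.read())
        tf.import_graph_def(graph_def, name='')
    return graph, tf.Session(graph=graph)

detectors = [
    load_detector('hand_inference_graph/frozen_inference_graph.pb'),
    load_detector('object_inference_graph/frozen_inference_graph.pb'),  # hypothetical second model
]

def detect_all(image_np_expanded):
    """Run every detector on the same frame; each returns boxes and scores to draw on one image."""
    results = []
    for graph, sess in detectors:
        boxes, scores = sess.run(
            [graph.get_tensor_by_name('detection_boxes:0'),
             graph.get_tensor_by_name('detection_scores:0')],
            feed_dict={graph.get_tensor_by_name('image_tensor:0'): image_np_expanded})
        results.append((boxes, scores))
    return results
```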
abusufyanvu
MIT Introduction to Deep Learning (6.S191)

Instructors: Alexander Amini and Ava Soleimany

**Contents**

- Course Information Summary
- Prerequisites
- Schedule
- Lectures
- Labs, Final Projects, Grading, and Prizes
  - Software labs
  - Gather.Town lab + Office Hour sessions
  - Final project
    - Paper Review
    - Project Proposal Presentation
    - Project Proposal Grading Rubric
    - Past Project Proposal Ideas
- Awards + Categories
- Important Links and Emails

**Course Information Summary**

MIT's introductory course on deep learning methods with applications to computer vision, natural language processing, biology, and more! Students will gain foundational knowledge of deep learning algorithms and get practical experience in building neural networks in TensorFlow. The course concludes with a project proposal competition with feedback from staff and a panel of industry sponsors.

**Prerequisites**

We expect basic knowledge of calculus (e.g., taking derivatives), linear algebra (e.g., matrix multiplication), and probability (e.g., Bayes theorem) -- we'll try to explain everything else along the way! Experience in Python is helpful but not necessary. This class is taught during MIT's IAP term by current MIT PhD researchers. Listeners are welcome!

**Schedule**

- Monday Jan 18, 2021 -- Lecture: Introduction to Deep Learning and NNs. Lab: Lab 1A Tensorflow and building NNs from scratch
- Tuesday Jan 19, 2021 -- Lecture: Deep Sequence Modelling. Lab: Lab 1B Music Generation using RNNs
- Wednesday Jan 20, 2021 -- Lecture: Deep Computer Vision. Lab: Lab 2A Image classification and detection
- Thursday Jan 21, 2021 -- Lecture: Deep Generative Modelling. Lab: Lab 2B Debiasing facial recognition systems
- Friday Jan 22, 2021 -- Lecture: Deep Reinforcement Learning. Lab: Lab 3 pixel-to-control planning
- Monday Jan 25, 2021 -- Lecture: Limitations and New Frontiers. Lab: Lab 3 continued
- Tuesday Jan 26, 2021 -- Lecture (part 1): Evidential Deep Learning. Lecture (part 2): Bias and Fairness. Lab: Work on final assignments. Lab competition entries (Lab 1, Lab 2, and Lab 3) due at 11:59pm ET on Canvas!
- Wednesday Jan 27, 2021 -- Lecture (part 1): Nigel Duffy, Ernst & Young. Lecture (part 2): Kate Saenko, Boston University and MIT-IBM Watson AI Lab. Lab: Work on final assignments. Due: sign up for Final Project Competition
- Thursday Jan 28, 2021 -- Lecture (part 1): Sanja Fidler, U. Toronto, Vector Institute, and NVIDIA. Lecture (part 2): Katherine Chou, Google. Lab: Work on final assignments. Due: 1 page paper review (if applicable)
- Friday Jan 29, 2021 -- Lecture: Student project pitch competition. Lab: Awards ceremony and prize giveaway. Due: project proposals (if applicable)

**Lectures**

Lectures will be held starting at 1:00pm ET from Jan 18 - Jan 29 2021, Monday through Friday, virtually through Zoom. Current MIT students, faculty, postdocs, researchers, staff, etc. will be able to access the lectures during this two week period, synchronously or asynchronously, via the MIT Canvas course webpage (MIT internal only). Lecture recordings will be uploaded to Canvas as soon as possible; students are not required to attend any lectures synchronously. Please see the Canvas for details on Zoom links. The public edition of the course will only be made available after completion of the MIT course.

**Labs, Final Projects, Grading, and Prizes**

The course will be graded during MIT IAP for 6 units under P/D/F grading.
Receiving a passing grade requires completion of each software lab project (through honor code, with submission required to enter lab competitions), a final project proposal/presentation or written review of a deep learning paper (submission required), and attendance/lecture viewing (through honor code). Submission of a written report or presentation of a project proposal will ensure a passing grade.

MIT students will be eligible for prizes and awards as part of the class competitions. There will be two parts to the competitions: (1) software labs and (2) final projects. More information is provided below. Winners will be announced on the last day of class, with thousands of dollars of prizes being given away!

**Software labs**

There are three TensorFlow software lab exercises for the course, designed as iPython notebooks hosted in Google Colab. Software labs can be found on GitHub: https://github.com/aamini/introtodeeplearning. These are self-paced exercises designed to help you gain practical experience implementing neural networks in TensorFlow. For registered MIT students, submission of lab materials is not necessary to get credit for the course or to pass the course.

At the end of each software lab there will be task-associated materials to submit (along with instructions) for entry into the competitions, open to MIT students and affiliates during the IAP offering. This includes MIT students/affiliates who are taking the class as listeners -- you are eligible! These instructions are provided at the end of each of the labs. Completing these tasks and submitting your materials to Canvas will enter you into a per-lab competition. MIT students and affiliates will be eligible for prizes during the IAP offering; at the end of the course, prize-winners will be awarded their prizes. All competition submissions are due on January 26 at 11:59pm ET to Canvas.

For the software lab competitions, submissions will be judged on the basis of the following criteria:

- Strength and quality of final results (lab dependent)
- Soundness of implementation and approach
- Thoroughness and quality of provided descriptions and figures

**Gather.Town lab + Office Hour sessions**

After each day's lecture, there will be open Office Hours in the class GatherTown, up until 3pm ET. An MIT email is required to log in and join the GatherTown. During these sessions, there will not be a walkthrough or dictation of the labs; the labs are designed to be self-paced and worked on in your own time. The GatherTown sessions will be hosted by course staff and are held so you can:

- Ask questions on course lectures, labs, logistics, project, or anything else
- Work on the labs in the presence of classmates/TAs/instructors
- Meet classmates to find groups for the final project
- Have group work time for the final project
- Bring the class community together

**Final project**

To satisfy the final project requirement for this course, students have two options: (1) write a 1 page paper review (single-spaced) on a recent deep learning paper of your choice, or (2) participate and present in the project proposal pitch competition. The 1 page paper review option is straightforward; we propose some papers within this document to help you get started, and you can satisfy a passing grade with this option, but you will not be eligible for the grand prizes. On the other hand, participation in the project proposal pitch competition will equivalently satisfy your course requirements but additionally make you eligible for the grand prizes.
See the sections below for more details and requirements for each of these options.

**Paper Review**

Students may satisfy the final project requirement by reading and reviewing a recent deep learning paper of their choosing. In the written review, students should provide both: 1) a description of the problem, technical approach, and results of the paper; 2) critical analysis and exposition of the limitations of the work and opportunities for future work. Reviews should be submitted on Canvas by Thursday Jan 28, 2021, 11:59:59pm Eastern Time (ET).

Just a few paper options to consider...

- https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
- https://papers.nips.cc/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf
- https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
- https://science.sciencemag.org/content/362/6419/1140
- https://papers.nips.cc/paper/2018/file/0e64a7b00c83e3d22ce6b3acf2c582b6-Paper.pdf
- https://arxiv.org/pdf/1906.11829.pdf
- https://www.nature.com/articles/s42256-020-00237-3
- https://pubmed.ncbi.nlm.nih.gov/32084340/

**Project Proposal Presentation**

Keyword: proposal! This is a 2 week course, so we do not require results or working implementations. However, nice, clear results and implementations that demonstrate the feasibility of your proposal are something we look for when awarding the top prizes.

Logistics -- please read!

- You must sign up to present before 11:59:59pm Eastern Time (ET) on Wednesday Jan 27, 2021
- Slides must be in a Google Slide before 11:59:59pm Eastern Time (ET) on Thursday Jan 28, 2021
- Project groups can be between 1 and 5 people
- Listeners welcome
- To be eligible for a prize you must have at least 1 registered MIT student in your group
- Each participant will only be allowed to be in one group and present one project pitch
- Synchronous attendance on 1/29/21 is required to make the project pitch!
- 3 min presentation on your idea (we will be very strict with the time limits)
- Prizes! (see below)

Sign up to present by 11:59pm ET on Wednesday Jan 27. Once you sign up, make your slide in the shared Google Slides and submit by midnight on Thursday Jan 28. Please specify your project group # on your slides!

Things to consider:

- This doesn't have to be a new deep learning method. It can just be an interesting application that you apply some existing deep learning method to.
- What problem are you solving? Are there use cases/applications? Why do you think deep learning methods might be suited to this task?
- How have people done it before? Is it a new task? If so, what are similar tasks that people have worked on? In what aspects have they succeeded or failed?
- What is your method of solving this problem? What type of model + architecture would you use? Why?
- What is the data for this task? Do you need to make a dataset or is there one publicly available? What are the characteristics of the data? Is it sparse, messy, imbalanced? How would you deal with that?

**Project Proposal Grading Rubric**

Project proposals will be evaluated by a panel of judges on the basis of the following three criteria: 1) novelty and impact; 2) technical soundness, feasibility, and organization, including quality of any presented results; 3) clarity and presentation. Each judge will award a score from 1 (lowest) to 5 (highest) for each of the criteria; the average score from each judge across these criteria will then be averaged with that of the other judges to produce the final score. The proposals with the highest final scores will be selected for prizes.
Here are the guidelines for the criteria:

- Novelty and impact: encompasses the potential impact of the project idea and its novelty with respect to existing approaches. Why does the proposed work matter? What problem(s) does it solve? Why are these problems important?
- Technical soundness, feasibility, and organization: encompasses all technical aspects of the proposal. Do the proposed methodology and architecture make sense? Is the architecture the best suited for the proposed problem? Is deep learning the best approach for the problem? How realistic is it to implement the idea? Was there any implementation of the method? If results and data are presented, we will evaluate the strength of the results/data.
- Clarity and presentation: encompasses the delivery and quality of the presentation itself. Is the talk well organized? Are the slides aesthetically compelling? Is there a clear, well-delivered narrative? Are the problem and proposed method clearly presented?

**Past Project Proposal Ideas**

- Recipe Generation with RNNs
- Can we compress videos with CNN + RNN?
- Music Generation with RNNs
- Style Transfer Applied to X
- GANs on a new modality
- Summarizing text/news articles
- Combining news articles about similar events
- Code or spec generation
- Multimodal speech → handwriting
- Generate handwriting based on keywords (i.e. cursive, slanted, neat)
- Predicting stock market trends
- Show language learners articles or videos at their level
- Transfer of writing style
- Chemical Synthesis with Recurrent Neural Networks
- Transfer learning to learn something in a domain for which it's hard or risky to gather data or do training
- RNNs to model some type of time series data
- Computer vision to coach sports players
- Computer vision system for safety brakes or warnings
- Use IBM Watson API to get the sentiment of your Facebook newsfeed
- Deep learning webcam to give wifi-access to friends or improve video chat in some way
- Domain-specific chatbot to help you perform a specific task
- Detect whether a signature is fraudulent

**Awards + Categories**

Final Project Awards:

- 1x NVIDIA RTX 3080
- 4x Google Home Max
- 3x Display Monitors

Software Lab Awards:

- Bose headphones (Lab 1)
- Display monitor (Lab 2)
- Bebop drone (Lab 3)

**Important Links and Emails**

- Course website: http://introtodeeplearning.com
- Course staff: introtodeeplearning-staff@mit.edu
- Piazza forum (MIT only): https://piazza.com/mit/spring2021/6s191
- Canvas (MIT only): https://canvas.mit.edu/courses/8291
- Software lab repository: https://github.com/aamini/introtodeeplearning
- Lab/office hour sessions (MIT only): https://gather.town/app/56toTnlBrsKCyFgj/MITDeepLearning
Milkshiift
⚡ A lightweight, self-hosted YouTube video summarizer with Gemini AI
tpai
An AI-powered text summarization Telegram bot that generates concise summaries of text, URLs, PDFs, and YouTube videos.
DevRico003
A modern Next.js-based tool for AI-powered YouTube video summarization. Features smart chapter detection with clickable timestamps, multi-language support (EN, DE, FR, ES, IT), visual chapter timelines, and full transcript access with markdown export.
talosross
Summarize YouTube-Videos, Articles, Images and Documents with AI
Long-louis
Real-time content creator update monitor with AI-powered video summarization and Feishu bot integration.
Anil-matcha
Chat with any Youtube video. Easily input the video url you'd like to chat with. Instant answers. Ask questions, extract information, and summarize documents with AI. Sources included.
DataAnts-AI
Process OBS recordings (or Any video files) with AI-based transcription and summarization.
siddharthsky
Summarize Youtube Videos and Generate Timestamps Efficiently using LLM [Google Gemini Pro, OpenAI GPT]
Amirrezahmi
Engage in conversation with your virtual self using AI techniques like NLP, voice cloning, and computer vision. Get accurate answers with generated keywords, phrases, images, videos. Summarization included. Synthesize voice, create talking-head video. Seek advice, career guidance, satisfy curiosity. Endless possibilities.
jonaskahn
AskTube - An AI-powered YouTube video summarizer and QA assistant powered by Retrieval Augmented Generation (RAG) 🤖. Run it entirely on your local machine with Ollama, or cloud-based models like Claude, OpenAI, Gemini, Mistral, and more.
destinyfrancis
Open CLAW Knowledge Distiller · 龍蝦知識蒸餾器 — Turn YouTube/Bilibili videos into structured knowledge articles. Local Qwen3-ASR MLX + AI summarization. MCP server for Claude Code / Open CLAW agents.
sidedwards
An AI-powered tool for transcribing, summarizing, and creating smart clips from video and audio content.
HHousen
Convert lecture videos to notes using AI & machine learning. Code for the research titled "Lecture2Notes: Summarizing Lecture Videos by Classifying Slides and Analyzing Text using Machine Learning."
oztrkoguz
An AI-powered tool for summarizing YouTube videos by generating scene descriptions, translating them, and creating subtitled videos with text-to-speech narration
yizhiyanhua-ai
Claude Code Skill: Browse, summarize and capture AI-related YouTube videos
recally-io
Recally ✨ - AI-powered memory assistant for organizing digital content. Capture web articles, Twitter threads, videos, and more via browser extensions or Telegram. Features AI summarization, smart tagging, semantic search, and self-hosting. Open source (AGPLv3) alternative to Pocket/Readwise with enhanced AI capabilities and privacy control.
walksoda
Crawl4AI MCP Server: Extract content from web pages, PDFs, Office docs, YouTube videos with AI-powered summarization. 17 tools, token reduction, production-ready.
neuron-core
AI Agents summarizing YouTube videos - built with Neuron PHP AI framework.
HazWahb8080
summarize and chat with any youtube video with AI
SomSingh23
Virtual patient-doctor connections via video calls, with additional chatbot and AI doctor interaction options. Patients can conveniently upload reports for OCR-based summarization.
psycho-baller
A browser extension that allows you to highlight, tag, annotate, and export the best parts of your favorite YouTube videos and summarize these snips with the power of AI. Winner of Boost Hacks
parsakhaz
A powerful video summarization tool that utilizes Moondream alongside multiple AI models to provide comprehensive video understanding through audio transcription, intelligent frame selection, visual description, and content summarization.
generative-ai-on-aws
SummarizeMe is an AI-powered app for transcribing meeting recordings, generating summarized meeting notes, extracting key points and action items, and creating video summaries.
Harshit-shrivastav
A telegram bot that can Summarize youtube video using AI
TanayGhanshyam
We have come a long way since I was a child in the 1960s, when all I wanted for Christmas was a slinky and some Rock'Em – Sock'Em Robots. Now imagine we have traveled ten years into the future, and it is Christmas 2031. Alexa has replaced kids' parents and Santa Claus. Every toy is connected to the Internet and looks like a robot version of the animal it represents. Clean thermonuclear Christmas trees will be providing us with radiant, gamma-ray energy for all our holiday needs. Pogo sticks have also made a comeback, but they are solar-powered and can leap entire city blocks.

And while I am busy pretending to be the Ghost of Christmas Future, I thought it would also be fun to ask the Office of the CTO team about their predictions for futuristic, technical toys. So, I posed these two questions:

1. What cool TECHNICAL toy or gadget would you like Santa to bring you this year in 2021?
2. As a participating member of the Office of the CTO, what cool TECHNICAL toy or gadget (that has not yet been invented) would you like Santa to bring you ten years from now, in 2031?

You know what? We just might see a sneak preview of some of these magical tech toys of the future in just a few weeks at the CES 2022 conference. In the meantime, take a look at the wish list from all of our Extreme technical gurus:

**Marcus Burton – Wireless and Cloud Architect**

Christmas Wish 2021: Is a Tesla Cybertruck an option? I'll even take a prototype. That will scratch several technology itches at the same time. Think about it... EV, autonomous driving, AI, 5G probably, cloud-connected, mobile-first, and all the best in materials sciences and mechanical engineering applied to trucks. What more could an outdoorsy tech guy want?

Christmas Wish 2031: I'm kinda thinking that while everyone else has their brain slurped out in the metaverse (with VR!), I will prefer to go to the actual mountains. But you know, I have a wife and kids, so I have to think about safety. So here's my wish: a smart personal device that has a full week of battery life (using ultra-thin silicon wafers) with rapid solar charging, LEO satellite connectivity (for sending "eat your heart out" 3D pics to my friends from the "there's no 6G here" wilderness), and ultra-HD terrain feature maps for modern navigation.

**Carla Guzzetti – VP, Experience, Messaging & Enablement**

Christmas Wish 2021: I want this: Meeting Owl Pro – 360-Degree, 1080p HD Smart Video Conference Camera, Microphone, and Speaker.

Christmas Wish 2031: I want a gadget where we can have virtual meetings without the need for a wearable! Who wants to wear heavy goggles all day?

**Doug McDonald – Director of Product Management**

Christmas Wish 2021: As a technologist often looking for a balance between screen time and health and fitness, I hope Santa brings me the Aura Strap. The Aura Strap adds additional IoT sensory capabilities to complement your Apple smartwatch. Bioelectrical impedance analysis is the cutting-edge science behind the AURA Strap. This innovation provides a way to truly see how your body changes over the course of a day. Their body composition analysis includes fat, muscle mass, minerals, and hydration, providing personalized insights that improve the results of your workouts, diet, and your lifestyle as a whole.

Christmas Wish 2031: Hopefully, this innovation will be here sooner. Still, in the spirit of my first wish from Santa, I also hope to have a "service engine" warning light for myself. The concept is utilizing advancements in biomedical sensory devices to pinpoint potential changes in your physical metrics that may help in seeking medical attention sooner rather than later if variances in health data occur. I spoke about this concept in the Digital Diagnosis episode of the Inflection Points podcast from the Office of the CTO.

**Ed Koehler – Principal Engineer**

Christmas Wish 2021: My answers are short and sweet. I want a nice drone with high-resolution pan, tilt, and zoom (PTZ) cameras.

Christmas Wish 2031: In ten years, I want a drone that I can sit inside and fly away!

**Puneet Sehgal – Business Initiatives Program Manager**

Christmas Wish 2021: I have always wanted to enjoy the world from a bird's eye view. Therefore, my wish is for Santa to bring me a good-quality drone camera this year. It is amazing how quickly drones have evolved from commercial/military use to becoming a personal gadget.

Christmas Wish 2031: In 2031, I wish Santa could get me a virtual reality (VR) trainer to help me internalize physical motion by looking at a simulation video while sending electrical impulses to mimic it. It would open endless possibilities, and I could become an ice skater, a karate expert, or a pianist, all in one. Maybe similar research is already being done, but we are far away from something like this maturing for practical use. So, who knows -- it's Santa after all, and we are talking 2031!

**Tim Harrison – Director of Product Marketing, Service Provider**

Christmas Wish 2021: This year, I would love to extend my audio recording setup and move from a digital 24-channel mixer to a control surface that integrates with my DAW (digital audio workstation) and allows me to use my outboard microphone pre-amps. I've been looking at an ICON QCon Pro G2 plus one QCon EX G2 extender to give me direct control over 16 channels at once (I use 16 channels just for my drum kit).

Christmas Wish 2031: Ten years from now, I sincerely hope to receive an anti-gravity platform. First, I'll be old, and climbing stairs will have become more challenging for these creaky old bones. Secondly, who hasn't hoped for a REAL hoverboard? Once we know what gravity is "made of," we can start making it easier to manipulate objects on earth and make space more habitable for human physiology. Either that or a puppy.

**Divya Balu Pazhayannur – Director of Business Initiatives**

Christmas Wish 2021: I'm upgrading parts of my house over the holidays and browsing online for kitchen and laundry appliances. If you had told me that I would be spending three hours reading blogs on choosing the right cooktop, I would not have believed you. Does it have the right power, is it reliable, is it Wi-Fi enabled, can you talk to it -- I'm kidding on that last one. Having said that, I'd love to get the Bosch Benchmark Gas Stovetop. Although I can't speak to my appliance, its minimalist look has me writing it down on my wish list for Santa. I'll even offer him some crispy dosas in exchange.

Christmas Wish 2031: Apart from flying cars and personal robot assistants, I'd love to get the gift of better connectivity. I miss my family and friends in India, and it would be amazing to engage with them through holographic technology. I imagine it would allow for a much higher level of communication than today's 'talking head' approach. Although, do I want my family sitting with me in my living room? Still, I'd like to think a holograph would be just fantastic.

**Yury Ostrovsky – Sr. Technology Manager**
Christmas Wish 2021: I believe 2022 will be the year of VR toys. Virtual Reality is already popular, but I believe more applications will be developed in this area. We might be able to see radio waves coming from different sources (Wi-Fi, LTE, 5G, BT, etc.) and visualize their propagation in real time.

Christmas Wish 2031: "Prediction is very difficult, especially if it's about the future" – Niels Bohr

**Kurt Semba – Principal Architect**

Christmas Wish 2021: The Crown from Neurosity. It helps you get into and stay in deep focus to improve your work and gaming results.

Christmas Wish 2031: A non-invasive health device that can quickly look deep into your body and cells and explain why you are not feeling well today.

**Jon Filson – Senior Producer, Content**

Christmas Wish 2021: I want a large rollable TV by LG, in part because I watch a lot of football. And while I have a Smart TV, I still can't get it to connect to my Bluetooth speaker... so while I love it, I want it to work better, and isn't that so often the way with tech? But more than that, I don't like and have never liked that rooms have to be designed around TVs. They are big, which is fine, but they are often in the way, which is less so. They should disappear when not in use. It's $100,000, so I don't expect it any time soon. But it's an idea whose time has come.

Christmas Wish 2031: I cheated on this one and asked my 12-year-old son Jack what he would want. It's the portal gun from Rick and Morty, a show in which a crazed scientist named Rick takes his grandson Morty on wacky adventures in a multi-verse. That last part is important to me. Kids today are already well into multi-verses, while we adults are just struggling to make one decent Metaverse. The next generation is already way ahead of us, digitally speaking, it's clear.

**Alexey Reznik – Senior UX Designer**

Christmas Wish 2021: This awesome toy: the DJI Mavic 2 Pro drone quadcopter UAV with Hasselblad camera, 3-axis gimbal, HDR 4K video, adjustable aperture, 20MP 1″ CMOS sensor, up to 48mph, in gray.

Christmas Wish 2031: Something along these lines: the BMW Motorrad VISION NEXT 100 motorcycle.

**Michael Rash – Distinguished Engineer – Security**

Christmas Wish 2021: A Satechi USB-C Multiport MX Adapter with dual 4K HDMI.

Christmas Wish 2031: A virtual reality headset that actually works.

**Alena Amir – Senior Content and Communications Manager**

Christmas Wish 2021: With conversations around VR/AR and the metaverse taking the world by storm, Santa could help out with an Oculus Quest. Purely for research purposes, of course!

Christmas Wish 2031: The 1985 movie Back to the Future was a family favorite, and sure, we didn't get it all exactly right by 2015, but hey, it's almost 2022! About time we get those hoverboards!

**David Coleman – Director of Wireless**

Christmas Wish 2021: Well, it looks like drones are the #1 wish item for 2021, and I am no exception. My wife and I just bought a home in the mountains of Blue Ridge, Georgia, where there is an abundance of wildlife. I want a state-of-the-art drone for bear surveillance.

Christmas Wish 2031: In ten years, I will be 71 years old, and I hope to be at least semi-retired and savoring the fruits of my long tech career. Even though we are looking to the future, I want a time machine to revisit the past. I would travel back to July 16th, 1969, and watch Apollo 11 lift off from Cape Kennedy to the moon. I actually did that as a nine-year-old kid. Oh, and I would also travel back to 1966 and play with my Rock'Em – Sock'Em Robots.

To summarize, our peeps in the Office of the CTO all envision a Christmas 2031 where the way we interact as a society will have progressed. In 2021, we already have unlimited access to information, so future tech toys might depend less on magical new technologies and more on the kinds of experiences these new technologies can create. And when those experiences can be shared across the globe in real time, the world gains an opportunity to learn from each other and grow together in ways that would never have been possible.
dimhunter125
An autonomous AI secretary based on DeepSeek-R1 and Faster-Whisper for transcribing and analyzing videos.
AIAnytime
This is the official repo of Text Summarizer Streamlit App video from AI Anytime YouTube channel.