Found 277 repositories(showing 30)
molyswu
using Neural Networks (SSD) on Tensorflow. This repo documents steps and scripts used to train a hand detector using Tensorflow (Object Detection API). As with any DNN based task, the most expensive (and riskiest) part of the process has to do with finding or creating the right (annotated) dataset. I was interested mainly in detecting hands on a table (egocentric view point). I experimented first with the [Oxford Hands Dataset](http://www.robots.ox.ac.uk/~vgg/data/hands/) (the results were not good). I then tried the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/) which was a much better fit to my requirements. The goal of this repo/post is to demonstrate how neural networks can be applied to the (hard) problem of tracking hands (egocentric and other views). Better still, provide code that can be adapted to other uses cases. If you use this tutorial or models in your research or project, please cite [this](#citing-this-tutorial). Here is the detector in action. <img src="images/hand1.gif" width="33.3%"><img src="images/hand2.gif" width="33.3%"><img src="images/hand3.gif" width="33.3%"> Realtime detection on video stream from a webcam . <img src="images/chess1.gif" width="33.3%"><img src="images/chess2.gif" width="33.3%"><img src="images/chess3.gif" width="33.3%"> Detection on a Youtube video. Both examples above were run on a macbook pro **CPU** (i7, 2.5GHz, 16GB). Some fps numbers are: | FPS | Image Size | Device| Comments| | ------------- | ------------- | ------------- | ------------- | | 21 | 320 * 240 | Macbook pro (i7, 2.5GHz, 16GB) | Run without visualizing results| | 16 | 320 * 240 | Macbook pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) | | 11 | 640 * 480 | Macbook pro (i7, 2.5GHz, 16GB) | Run while visualizing results (image above) | > Note: The code in this repo is written and tested with Tensorflow `1.4.0-rc0`. Using a different version may result in [some errors](https://github.com/tensorflow/models/issues/1581). You may need to [generate your own frozen model](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/) graph using the [model checkpoints](model-checkpoint) in the repo to fit your TF version. **Content of this document** - Motivation - Why Track/Detect hands with Neural Networks - Data preparation and network training in Tensorflow (Dataset, Import, Training) - Training the hand detection Model - Using the Detector to Detect/Track hands - Thoughts on Optimizations. > P.S if you are using or have used the models provided here, feel free to reach out on twitter ([@vykthur](https://twitter.com/vykthur)) and share your work! ## Motivation - Why Track/Detect hands with Neural Networks? There are several existing approaches to tracking hands in the computer vision domain. Incidentally, many of these approaches are rule based (e.g extracting background based on texture and boundary features, distinguishing between hands and background using color histograms and HOG classifiers,) making them not very robust. For example, these algorithms might get confused if the background is unusual or in situations where sharp changes in lighting conditions cause sharp changes in skin color or the tracked object becomes occluded.(see [here for a review](https://www.cse.unr.edu/~bebis/handposerev.pdf) paper on hand pose estimation from the HCI perspective) With sufficiently large datasets, neural networks provide opportunity to train models that perform well and address challenges of existing object tracking/detection algorithms - varied/poor lighting, noisy environments, diverse viewpoints and even occlusion. The main drawbacks to usage for real-time tracking/detection is that they can be complex, are relatively slow compared to tracking-only algorithms and it can be quite expensive to assemble a good dataset. But things are changing with advances in fast neural networks. Furthermore, this entire area of work has been made more approachable by deep learning frameworks (such as the tensorflow object detection api) that simplify the process of training a model for custom object detection. More importantly, the advent of fast neural network models like ssd, faster r-cnn, rfcn (see [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models) ) etc make neural networks an attractive candidate for real-time detection (and tracking) applications. Hopefully, this repo demonstrates this. > If you are not interested in the process of training the detector, you can skip straight to applying the [pretrained model I provide in detecting hands](#detecting-hands). Training a model is a multi-stage process (assembling dataset, cleaning, splitting into training/test partitions and generating an inference graph). While I lightly touch on the details of these parts, there are a few other tutorials cover training a custom object detector using the tensorflow object detection api in more detail[ see [here](https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/) and [here](https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9) ]. I recommend you walk through those if interested in training a custom object detector from scratch. ## Data preparation and network training in Tensorflow (Dataset, Import, Training) **The Egohands Dataset** The hand detector model is built using data from the [Egohands Dataset](http://vision.soic.indiana.edu/projects/egohands/) dataset. This dataset works well for several reasons. It contains high quality, pixel level annotations (>15000 ground truth labels) where hands are located across 4800 images. All images are captured from an egocentric view (Google glass) across 48 different environments (indoor, outdoor) and activities (playing cards, chess, jenga, solving puzzles etc). <img src="images/egohandstrain.jpg" width="100%"> If you will be using the Egohands dataset, you can cite them as follows: > Bambach, Sven, et al. "Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions." Proceedings of the IEEE International Conference on Computer Vision. 2015. The Egohands dataset (zip file with labelled data) contains 48 folders of locations where video data was collected (100 images per folder). ``` -- LOCATION_X -- frame_1.jpg -- frame_2.jpg ... -- frame_100.jpg -- polygons.mat // contains annotations for all 100 images in current folder -- LOCATION_Y -- frame_1.jpg -- frame_2.jpg ... -- frame_100.jpg -- polygons.mat // contains annotations for all 100 images in current folder ``` **Converting data to Tensorflow Format** Some initial work needs to be done to the Egohands dataset to transform it into the format (`tfrecord`) which Tensorflow needs to train a model. This repo contains `egohands_dataset_clean.py` a script that will help you generate these csv files. - Downloads the egohands datasets - Renames all files to include their directory names to ensure each filename is unique - Splits the dataset into train (80%), test (10%) and eval (10%) folders. - Reads in `polygons.mat` for each folder, generates bounding boxes and visualizes them to ensure correctness (see image above). - Once the script is done running, you should have an images folder containing three folders - train, test and eval. Each of these folders should also contain a csv label document each - `train_labels.csv`, `test_labels.csv` that can be used to generate `tfrecords` Note: While the egohands dataset provides four separate labels for hands (own left, own right, other left, and other right), for my purpose, I am only interested in the general `hand` class and label all training data as `hand`. You can modify the data prep script to generate `tfrecords` that support 4 labels. Next: convert your dataset + csv files to tfrecords. A helpful guide on this can be found [here](https://pythonprogramming.net/creating-tfrecord-files-tensorflow-object-detection-api-tutorial/).For each folder, you should be able to generate `train.record`, `test.record` required in the training process. ## Training the hand detection Model Now that the dataset has been assembled (and your tfrecords), the next task is to train a model based on this. With neural networks, it is possible to use a process called [transfer learning](https://www.tensorflow.org/tutorials/image_retraining) to shorten the amount of time needed to train the entire model. This means we can take an existing model (that has been trained well on a related domain (here image classification) and retrain its final layer(s) to detect hands for us. Sweet!. Given that neural networks sometimes have thousands or millions of parameters that can take weeks or months to train, transfer learning helps shorten training time to possibly hours. Tensorflow does offer a few models (in the tensorflow [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models)) and I chose to use the `ssd_mobilenet_v1_coco` model as my start point given it is currently (one of) the fastest models (read the SSD research [paper here](https://arxiv.org/pdf/1512.02325.pdf)). The training process can be done locally on your CPU machine which may take a while or better on a (cloud) GPU machine (which is what I did). For reference, training on my macbook pro (tensorflow compiled from source to take advantage of the mac's cpu architecture) the maximum speed I got was 5 seconds per step as opposed to the ~0.5 seconds per step I got with a GPU. For reference it would take about 12 days to run 200k steps on my mac (i7, 2.5GHz, 16GB) compared to ~5hrs on a GPU. > **Training on your own images**: Please use the [guide provided by Harrison from pythonprogramming](https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/) on how to generate tfrecords given your label csv files and your images. The guide also covers how to start the training process if training locally. [see [here] (https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/)]. If training in the cloud using a service like GCP, see the [guide here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_cloud.md). As the training process progresses, the expectation is that total loss (errors) gets reduced to its possible minimum (about a value of 1 or thereabout). By observing the tensorboard graphs for total loss(see image below), it should be possible to get an idea of when the training process is complete (total loss does not decrease with further iterations/steps). I ran my training job for 200k steps (took about 5 hours) and stopped at a total Loss (errors) value of 2.575.(In retrospect, I could have stopped the training at about 50k steps and gotten a similar total loss value). With tensorflow, you can also run an evaluation concurrently that assesses your model to see how well it performs on the test data. A commonly used metric for performance is mean average precision (mAP) which is single number used to summarize the area under the precision-recall curve. mAP is a measure of how well the model generates a bounding box that has at least a 50% overlap with the ground truth bounding box in our test dataset. For the hand detector trained here, the mAP value was **0.9686@0.5IOU**. mAP values range from 0-1, the higher the better. <img src="images/accuracy.jpg" width="100%"> Once training is completed, the trained inference graph (`frozen_inference_graph.pb`) is then exported (see the earlier referenced guides for how to do this) and saved in the `hand_inference_graph` folder. Now its time to do some interesting detection. ## Using the Detector to Detect/Track hands If you have not done this yet, please following the guide on installing [Tensorflow and the Tensorflow object detection api](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md). This will walk you through setting up the tensorflow framework, cloning the tensorflow github repo and a guide on - Load the `frozen_inference_graph.pb` trained on the hands dataset as well as the corresponding label map. In this repo, this is done in the `utils/detector_utils.py` script by the `load_inference_graph` method. ```python detection_graph = tf.Graph() with detection_graph.as_default(): od_graph_def = tf.GraphDef() with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid: serialized_graph = fid.read() od_graph_def.ParseFromString(serialized_graph) tf.import_graph_def(od_graph_def, name='') sess = tf.Session(graph=detection_graph) print("> ====== Hand Inference graph loaded.") ``` - Detect hands. In this repo, this is done in the `utils/detector_utils.py` script by the `detect_objects` method. ```python (boxes, scores, classes, num) = sess.run( [detection_boxes, detection_scores, detection_classes, num_detections], feed_dict={image_tensor: image_np_expanded}) ``` - Visualize detected bounding detection_boxes. In this repo, this is done in the `utils/detector_utils.py` script by the `draw_box_on_image` method. This repo contains two scripts that tie all these steps together. - detect_multi_threaded.py : A threaded implementation for reading camera video input detection and detecting. Takes a set of command line flags to set parameters such as `--display` (visualize detections), image parameters `--width` and `--height`, videe `--source` (0 for camera) etc. - detect_single_threaded.py : Same as above, but single threaded. This script works for video files by setting the video source parameter videe `--source` (path to a video file). ```cmd # load and run detection on video at path "videos/chess.mov" python detect_single_threaded.py --source videos/chess.mov ``` > Update: If you do have errors loading the frozen inference graph in this repo, feel free to generate a new graph that fits your TF version from the model-checkpoint in this repo. Use the [export_inference_graph.py](https://github.com/tensorflow/models/blob/master/research/object_detection/export_inference_graph.py) script provided in the tensorflow object detection api repo. More guidance on this [here](https://pythonprogramming.net/testing-custom-object-detector-tensorflow-object-detection-api-tutorial/?completed=/training-custom-objects-tensorflow-object-detection-api-tutorial/). ## Thoughts on Optimization. A few things that led to noticeable performance increases. - Threading: Turns out that reading images from a webcam is a heavy I/O event and if run on the main application thread can slow down the program. I implemented some good ideas from [Adrian Rosebuck](https://www.pyimagesearch.com/2017/02/06/faster-video-file-fps-with-cv2-videocapture-and-opencv/) on parrallelizing image capture across multiple worker threads. This mostly led to an FPS increase of about 5 points. - For those new to Opencv, images from the `cv2.read()` method return images in [BGR format](https://www.learnopencv.com/why-does-opencv-use-bgr-color-format/). Ensure you convert to RGB before detection (accuracy will be much reduced if you dont). ```python cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB) ``` - Keeping your input image small will increase fps without any significant accuracy drop.(I used about 320 x 240 compared to the 1280 x 720 which my webcam provides). - Model Quantization. Moving from the current 32 bit to 8 bit can achieve up to 4x reduction in memory required to load and store models. One way to further speed up this model is to explore the use of [8-bit fixed point quantization](https://heartbeat.fritz.ai/8-bit-quantization-and-tensorflow-lite-speeding-up-mobile-inference-with-low-precision-a882dfcafbbd). Performance can also be increased by a clever combination of tracking algorithms with the already decent detection and this is something I am still experimenting with. Have ideas for optimizing better, please share! <img src="images/general.jpg" width="100%"> Note: The detector does reflect some limitations associated with the training set. This includes non-egocentric viewpoints, very noisy backgrounds (e.g in a sea of hands) and sometimes skin tone. There is opportunity to improve these with additional data. ## Integrating Multiple DNNs. One way to make things more interesting is to integrate our new knowledge of where "hands" are with other detectors trained to recognize other objects. Unfortunately, while our hand detector can in fact detect hands, it cannot detect other objects (a factor or how it is trained). To create a detector that classifies multiple different objects would mean a long involved process of assembling datasets for each class and a lengthy training process. > Given the above, a potential strategy is to explore structures that allow us **efficiently** interleave output form multiple pretrained models for various object classes and have them detect multiple objects on a single image. An example of this is with my primary use case where I am interested in understanding the position of objects on a table with respect to hands on same table. I am currently doing some work on a threaded application that loads multiple detectors and outputs bounding boxes on a single image. More on this soon.
abusufyanvu
MIT Introduction to Deep Learning (6.S191) Instructors: Alexander Amini and Ava Soleimany Course Information Summary Prerequisites Schedule Lectures Labs, Final Projects, Grading, and Prizes Software labs Gather.Town lab + Office Hour sessions Final project Paper Review Project Proposal Presentation Project Proposal Grading Rubric Past Project Proposal Ideas Awards + Categories Important Links and Emails Course Information Summary MIT's introductory course on deep learning methods with applications to computer vision, natural language processing, biology, and more! Students will gain foundational knowledge of deep learning algorithms and get practical experience in building neural networks in TensorFlow. Course concludes with a project proposal competition with feedback from staff and a panel of industry sponsors. Prerequisites We expect basic knowledge of calculus (e.g., taking derivatives), linear algebra (e.g., matrix multiplication), and probability (e.g., Bayes theorem) -- we'll try to explain everything else along the way! Experience in Python is helpful but not necessary. This class is taught during MIT's IAP term by current MIT PhD researchers. Listeners are welcome! Schedule Monday Jan 18, 2021 Lecture: Introduction to Deep Learning and NNs Lab: Lab 1A Tensorflow and building NNs from scratch Tuesday Jan 19, 2021 Lecture: Deep Sequence Modelling Lab: Lab 1B Music Generation using RNNs Wednesday Jan 20, 2021 Lecture: Deep Computer Vision Lab: Lab 2A Image classification and detection Thursday Jan 21, 2021 Lecture: Deep Generative Modelling Lab: Lab 2B Debiasing facial recognition systems Friday Jan 22, 2021 Lecture: Deep Reinforcement Learning Lab: Lab 3 pixel-to-control planning Monday Jan 25, 2021 Lecture: Limitations and New Frontiers Lab: Lab 3 continued Tuesday Jan 26, 2021 Lecture (part 1): Evidential Deep Learning Lecture (part 2): Bias and Fairness Lab: Work on final assignments Lab competition entries due at 11:59pm ET on Canvas! Lab 1, Lab 2, and Lab 3 Wednesday Jan 27, 2021 Lecture (part 1): Nigel Duffy, Ernst & Young Lecture (part 2): Kate Saenko, Boston University and MIT-IBM Watson AI Lab Lab: Work on final assignments Assignments due: Sign up for Final Project Competition Thursday Jan 28, 2021 Lecture (part 1): Sanja Fidler, U. Toronto, Vector Institute, and NVIDIA Lecture (part 2): Katherine Chou, Google Lab: Work on final assignments Assignments due: 1 page paper review (if applicable) Friday Jan 29, 2021 Lecture: Student project pitch competition Lab: Awards ceremony and prize giveaway Assignments due: Project proposals (if applicable) Lectures Lectures will be held starting at 1:00pm ET from Jan 18 - Jan 29 2021, Monday through Friday, virtually through Zoom. Current MIT students, faculty, postdocs, researchers, staff, etc. will be able to access the lectures during this two week period, synchronously or asynchronously, via the MIT Canvas course webpage (MIT internal only). Lecture recordings will be uploaded to the Canvas as soon as possible; students are not required to attend any lectures synchronously. Please see the Canvas for details on Zoom links. The public edition of the course will only be made available after completion of the MIT course. Labs, Final Projects, Grading, and Prizes Course will be graded during MIT IAP for 6 units under P/D/F grading. Receiving a passing grade requires completion of each software lab project (through honor code, with submission required to enter lab competitions), a final project proposal/presentation or written review of a deep learning paper (submission required), and attendance/lecture viewing (through honor code). Submission of a written report or presentation of a project proposal will ensure a passing grade. MIT students will be eligible for prizes and awards as part of the class competitions. There will be two parts to the competitions: (1) software labs and (2) final projects. More information is provided below. Winners will be announced on the last day of class, with thousands of dollars of prizes being given away! Software labs There are three TensorFlow software lab exercises for the course, designed as iPython notebooks hosted in Google Colab. Software labs can be found on GitHub: https://github.com/aamini/introtodeeplearning. These are self-paced exercises and are designed to help you gain practical experience implementing neural networks in TensorFlow. For registered MIT students, submission of lab materials is not necessary to get credit for the course or to pass the course. At the end of each software lab there will be task-associated materials to submit (along with instructions) for entry into the competitions, open to MIT students and affiliates during the IAP offering. This includes MIT students/affiliates who are taking the class as listeners -- you are eligible! These instructions are provided at the end of each of the labs. Completing these tasks and submitting your materials to Canvas will enter you into a per-lab competition. MIT students and affiliates will be eligible for prizes during the IAP offering; at the end of the course, prize-winners will be awarded with their prizes. All competition submissions are due on January 26 at 11:59pm ET to Canvas. For the software lab competitions, submissions will be judged on the basis of the following criteria: Strength and quality of final results (lab dependent) Soundness of implementation and approach Thoroughness and quality of provided descriptions and figures Gather.Town lab + Office Hour sessions After each day’s lecture, there will be open Office Hours in the class GatherTown, up until 3pm ET. An MIT email is required to log in and join the GatherTown. During these sessions, there will not be a walk through or dictation of the labs; the labs are designed to be self-paced and to be worked on on your own time. The GatherTown sessions will be hosted by course staff and are held so you can: Ask questions on course lectures, labs, logistics, project, or anything else; Work on the labs in the presence of classmates/TAs/instructors; Meet classmates to find groups for the final project; Group work time for the final project; Bring the class community together. Final project To satisfy the final project requirement for this course, students will have two options: (1) write a 1 page paper review (single-spaced) on a recent deep learning paper of your choice or (2) participate and present in the project proposal pitch competition. The 1 page paper review option is straightforward, we propose some papers within this document to help you get started, and you can satisfy a passing grade with this option -- you will not be eligible for the grand prizes. On the other hand, participation in the project proposal pitch competition will equivalently satisfy your course requirements but additionally make you eligible for the grand prizes. See the section below for more details and requirements for each of these options. Paper Review Students may satisfy the final project requirement by reading and reviewing a recent deep learning paper of their choosing. In the written review, students should provide both: 1) a description of the problem, technical approach, and results of the paper; 2) critical analysis and exposition of the limitations of the work and opportunities for future work. Reviews should be submitted on Canvas by Thursday Jan 28, 2021, 11:59:59pm Eastern Time (ET). Just a few paper options to consider... https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf https://papers.nips.cc/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf https://papers.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf https://science.sciencemag.org/content/362/6419/1140 https://papers.nips.cc/paper/2018/file/0e64a7b00c83e3d22ce6b3acf2c582b6-Paper.pdf https://arxiv.org/pdf/1906.11829.pdf https://www.nature.com/articles/s42256-020-00237-3 https://pubmed.ncbi.nlm.nih.gov/32084340/ Project Proposal Presentation Keyword: proposal This is a 2 week course so we do not require results or working implementations! However, to win the top prizes, nice, clear results and implementations will demonstrate feasibility of your proposal which is something we look for! Logistics -- please read! You must sign up to present before 11:59:59pm Eastern Time (ET) on Wednesday Jan 27, 2021 Slides must be in a Google Slide before 11:59:59pm Eastern Time (ET) on Thursday Jan 28, 2021 Project groups can be between 1 and 5 people Listeners welcome To be eligible for a prize you must have at least 1 registered MIT student in your group Each participant will only be allowed to be in one group and present one project pitch Synchronous attendance on 1/29/21 is required to make the project pitch! 3 min presentation on your idea (we will be very strict with the time limits) Prizes! (see below) Sign up to Present here: by 11:59pm ET on Wednesday Jan 27 Once you sign up, make your slide in the following Google Slides; submit by midnight on Thursday Jan 28. Please specify the project group # on your slides!!! Things to Consider This doesn’t have to be a new deep learning method. It can just be an interesting application that you apply some existing deep learning method to. What problem are you solving? Are there use cases/applications? Why do you think deep learning methods might be suited to this task? How have people done it before? Is it a new task? If so, what are similar tasks that people have worked on? In what aspects have they succeeded or failed? What is your method of solving this problem? What type of model + architecture would you use? Why? What is the data for this task? Do you need to make a dataset or is there one publicly available? What are the characteristics of the data? Is it sparse, messy, imbalanced? How would you deal with that? Project Proposal Grading Rubric Project proposals will be evaluated by a panel of judges on the basis of the following three criteria: 1) novelty and impact; 2) technical soundness, feasibility, and organization, including quality of any presented results; 3) clarity and presentation. Each judge will award a score from 1 (lowest) to 5 (highest) for each of the criteria; the average score from each judge across these criteria will then be averaged with that of the other judges to provide the final score. The proposals with the highest final scores will be selected for prizes. Here are the guidelines for the criteria: Novelty and impact: encompasses the potential impact of the project idea, its novelty with respect to existing approaches. Why does the proposed work matter? What problem(s) does it solve? Why are these problems important? Technical soundness, feasibility, and organization: encompasses all technical aspects of the proposal. Do the proposed methodology and architecture make sense? Is the architecture the best suited for the proposed problem? Is deep learning the best approach for the problem? How realistic is it to implement the idea? Was there any implementation of the method? If results and data are presented, we will evaluate the strength of the results/data. Clarity and presentation: encompasses the delivery and quality of the presentation itself. Is the talk well organized? Are the slides aesthetically compelling? Is there a clear, well-delivered narrative? Are the problem and proposed method clearly presented? Past Project Proposal Ideas Recipe Generation with RNNs Can we compress videos with CNN + RNN? Music Generation with RNNs Style Transfer Applied to X GAN’s on a new modality Summarizing text/news articles Combining news articles about similar events Code or spec generation Multimodal speech → handwriting Generate handwriting based on keywords (i.e. cursive, slanted, neat) Predicting stock market trends Show language learners articles or videos at their level Transfer of writing style Chemical Synthesis with Recurrent Neural networks Transfer learning to learn something in a domain for which it’s hard or risky to gather data or do training RNNs to model some type of time series data Computer vision to coach sports players Computer vision system for safety brakes or warnings Use IBM Watson API to get the sentiment of your Facebook newsfeed Deep learning webcam to give wifi-access to friends or improve video chat in some way Domain-specific chatbot to help you perform a specific task Detect whether a signature is fraudulent Awards + Categories Final Project Awards: 1x NVIDIA RTX 3080 4x Google Home Max 3x Display Monitors Software Lab Awards: Bose headphones (Lab 1) Display monitor (Lab 2) Bebop drone (Lab 3) Important Links and Emails Course website: http://introtodeeplearning.com Course staff: introtodeeplearning-staff@mit.edu Piazza forum (MIT only): https://piazza.com/mit/spring2021/6s191 Canvas (MIT only): https://canvas.mit.edu/courses/8291 Software lab repository: https://github.com/aamini/introtodeeplearning Lab/office hour sessions (MIT only): https://gather.town/app/56toTnlBrsKCyFgj/MITDeepLearning
rustyneuron01
Multi-modal AI-generated content detection: image, video, and audio. Benchmarks, training code (DINOv2, DINOv3, ReStraV, BreathNet), and evaluation pipeline for real vs. synthetic classification with calibration-aware metrics.
zhouying20
COLING'24 Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack
Aryia-Behroziuan
Poole, Mackworth & Goebel 1998, p. 1. Russell & Norvig 2003, p. 55. Definition of AI as the study of intelligent agents: Poole, Mackworth & Goebel (1998), which provides the version that is used in this article. These authors use the term "computational intelligence" as a synonym for artificial intelligence.[1] Russell & Norvig (2003) (who prefer the term "rational agent") and write "The whole-agent view is now widely accepted in the field".[2] Nilsson 1998 Legg & Hutter 2007 Russell & Norvig 2009, p. 2. McCorduck 2004, p. 204 Maloof, Mark. "Artificial Intelligence: An Introduction, p. 37" (PDF). georgetown.edu. Archived (PDF) from the original on 25 August 2018. "How AI Is Getting Groundbreaking Changes In Talent Management And HR Tech". Hackernoon. Archived from the original on 11 September 2019. Retrieved 14 February 2020. Schank, Roger C. (1991). "Where's the AI". AI magazine. Vol. 12 no. 4. p. 38. Russell & Norvig 2009. "AlphaGo – Google DeepMind". Archived from the original on 10 March 2016. Allen, Gregory (April 2020). "Department of Defense Joint AI Center - Understanding AI Technology" (PDF). AI.mil - The official site of the Department of Defense Joint Artificial Intelligence Center. Archived (PDF) from the original on 21 April 2020. Retrieved 25 April 2020. Optimism of early AI: * Herbert Simon quote: Simon 1965, p. 96 quoted in Crevier 1993, p. 109. * Marvin Minsky quote: Minsky 1967, p. 2 quoted in Crevier 1993, p. 109. Boom of the 1980s: rise of expert systems, Fifth Generation Project, Alvey, MCC, SCI: * McCorduck 2004, pp. 426–441 * Crevier 1993, pp. 161–162,197–203, 211, 240 * Russell & Norvig 2003, p. 24 * NRC 1999, pp. 210–211 * Newquist 1994, pp. 235–248 First AI Winter, Mansfield Amendment, Lighthill report * Crevier 1993, pp. 115–117 * Russell & Norvig 2003, p. 22 * NRC 1999, pp. 212–213 * Howe 1994 * Newquist 1994, pp. 189–201 Second AI winter: * McCorduck 2004, pp. 430–435 * Crevier 1993, pp. 209–210 * NRC 1999, pp. 214–216 * Newquist 1994, pp. 301–318 AI becomes hugely successful in the early 21st century * Clark 2015 Pamela McCorduck (2004, p. 424) writes of "the rough shattering of AI in subfields—vision, natural language, decision theory, genetic algorithms, robotics ... and these with own sub-subfield—that would hardly have anything to say to each other." This list of intelligent traits is based on the topics covered by the major AI textbooks, including: * Russell & Norvig 2003 * Luger & Stubblefield 2004 * Poole, Mackworth & Goebel 1998 * Nilsson 1998 Kolata 1982. Maker 2006. Biological intelligence vs. intelligence in general: Russell & Norvig 2003, pp. 2–3, who make the analogy with aeronautical engineering. McCorduck 2004, pp. 100–101, who writes that there are "two major branches of artificial intelligence: one aimed at producing intelligent behavior regardless of how it was accomplished, and the other aimed at modeling intelligent processes found in nature, particularly human ones." Kolata 1982, a paper in Science, which describes McCarthy's indifference to biological models. Kolata quotes McCarthy as writing: "This is AI, so we don't care if it's psychologically real".[19] McCarthy recently reiterated his position at the AI@50 conference where he said "Artificial intelligence is not, by definition, simulation of human intelligence".[20]. Neats vs. scruffies: * McCorduck 2004, pp. 421–424, 486–489 * Crevier 1993, p. 168 * Nilsson 1983, pp. 10–11 Symbolic vs. sub-symbolic AI: * Nilsson (1998, p. 7), who uses the term "sub-symbolic". General intelligence (strong AI) is discussed in popular introductions to AI: * Kurzweil 1999 and Kurzweil 2005 See the Dartmouth proposal, under Philosophy, below. McCorduck 2004, p. 34. McCorduck 2004, p. xviii. McCorduck 2004, p. 3. McCorduck 2004, pp. 340–400. This is a central idea of Pamela McCorduck's Machines Who Think. She writes: "I like to think of artificial intelligence as the scientific apotheosis of a venerable cultural tradition."[26] "Artificial intelligence in one form or another is an idea that has pervaded Western intellectual history, a dream in urgent need of being realized."[27] "Our history is full of attempts—nutty, eerie, comical, earnest, legendary and real—to make artificial intelligences, to reproduce what is the essential us—bypassing the ordinary means. Back and forth between myth and reality, our imaginations supplying what our workshops couldn't, we have engaged for a long time in this odd form of self-reproduction."[28] She traces the desire back to its Hellenistic roots and calls it the urge to "forge the Gods."[29] "Stephen Hawking believes AI could be mankind's last accomplishment". BetaNews. 21 October 2016. Archived from the original on 28 August 2017. Lombardo P, Boehm I, Nairz K (2020). "RadioComics – Santa Claus and the future of radiology". Eur J Radiol. 122 (1): 108771. doi:10.1016/j.ejrad.2019.108771. PMID 31835078. Ford, Martin; Colvin, Geoff (6 September 2015). "Will robots create more jobs than they destroy?". The Guardian. Archived from the original on 16 June 2018. Retrieved 13 January 2018. AI applications widely used behind the scenes: * Russell & Norvig 2003, p. 28 * Kurzweil 2005, p. 265 * NRC 1999, pp. 216–222 * Newquist 1994, pp. 189–201 AI in myth: * McCorduck 2004, pp. 4–5 * Russell & Norvig 2003, p. 939 AI in early science fiction. * McCorduck 2004, pp. 17–25 Formal reasoning: * Berlinski, David (2000). The Advent of the Algorithm. Harcourt Books. ISBN 978-0-15-601391-8. OCLC 46890682. Archived from the original on 26 July 2020. Retrieved 22 August 2020. Turing, Alan (1948), "Machine Intelligence", in Copeland, B. Jack (ed.), The Essential Turing: The ideas that gave birth to the computer age, Oxford: Oxford University Press, p. 412, ISBN 978-0-19-825080-7 Russell & Norvig 2009, p. 16. Dartmouth conference: * McCorduck 2004, pp. 111–136 * Crevier 1993, pp. 47–49, who writes "the conference is generally recognized as the official birthdate of the new science." * Russell & Norvig 2003, p. 17, who call the conference "the birth of artificial intelligence." * NRC 1999, pp. 200–201 McCarthy, John (1988). "Review of The Question of Artificial Intelligence". Annals of the History of Computing. 10 (3): 224–229., collected in McCarthy, John (1996). "10. Review of The Question of Artificial Intelligence". Defending AI Research: A Collection of Essays and Reviews. CSLI., p. 73, "[O]ne of the reasons for inventing the term "artificial intelligence" was to escape association with "cybernetics". Its concentration on analog feedback seemed misguided, and I wished to avoid having either to accept Norbert (not Robert) Wiener as a guru or having to argue with him." Hegemony of the Dartmouth conference attendees: * Russell & Norvig 2003, p. 17, who write "for the next 20 years the field would be dominated by these people and their students." * McCorduck 2004, pp. 129–130 Russell & Norvig 2003, p. 18. Schaeffer J. (2009) Didn't Samuel Solve That Game?. In: One Jump Ahead. Springer, Boston, MA Samuel, A. L. (July 1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development. 3 (3): 210–229. CiteSeerX 10.1.1.368.2254. doi:10.1147/rd.33.0210. "Golden years" of AI (successful symbolic reasoning programs 1956–1973): * McCorduck 2004, pp. 243–252 * Crevier 1993, pp. 52–107 * Moravec 1988, p. 9 * Russell & Norvig 2003, pp. 18–21 The programs described are Arthur Samuel's checkers program for the IBM 701, Daniel Bobrow's STUDENT, Newell and Simon's Logic Theorist and Terry Winograd's SHRDLU. DARPA pours money into undirected pure research into AI during the 1960s: * McCorduck 2004, p. 131 * Crevier 1993, pp. 51, 64–65 * NRC 1999, pp. 204–205 AI in England: * Howe 1994 Lighthill 1973. Expert systems: * ACM 1998, I.2.1 * Russell & Norvig 2003, pp. 22–24 * Luger & Stubblefield 2004, pp. 227–331 * Nilsson 1998, chpt. 17.4 * McCorduck 2004, pp. 327–335, 434–435 * Crevier 1993, pp. 145–62, 197–203 * Newquist 1994, pp. 155–183 Mead, Carver A.; Ismail, Mohammed (8 May 1989). Analog VLSI Implementation of Neural Systems (PDF). The Kluwer International Series in Engineering and Computer Science. 80. Norwell, MA: Kluwer Academic Publishers. doi:10.1007/978-1-4613-1639-8. ISBN 978-1-4613-1639-8. Archived from the original (PDF) on 6 November 2019. Retrieved 24 January 2020. Formal methods are now preferred ("Victory of the neats"): * Russell & Norvig 2003, pp. 25–26 * McCorduck 2004, pp. 486–487 McCorduck 2004, pp. 480–483. Markoff 2011. "Ask the AI experts: What's driving today's progress in AI?". McKinsey & Company. Archived from the original on 13 April 2018. Retrieved 13 April 2018. Administrator. "Kinect's AI breakthrough explained". i-programmer.info. Archived from the original on 1 February 2016. Rowinski, Dan (15 January 2013). "Virtual Personal Assistants & The Future Of Your Smartphone [Infographic]". ReadWrite. Archived from the original on 22 December 2015. "Artificial intelligence: Google's AlphaGo beats Go master Lee Se-dol". BBC News. 12 March 2016. Archived from the original on 26 August 2016. Retrieved 1 October 2016. Metz, Cade (27 May 2017). "After Win in China, AlphaGo's Designers Explore New AI". Wired. Archived from the original on 2 June 2017. "World's Go Player Ratings". May 2017. Archived from the original on 1 April 2017. "柯洁迎19岁生日 雄踞人类世界排名第一已两年" (in Chinese). May 2017. Archived from the original on 11 August 2017. Clark, Jack (8 December 2015). "Why 2015 Was a Breakthrough Year in Artificial Intelligence". Bloomberg News. Archived from the original on 23 November 2016. Retrieved 23 November 2016. After a half-decade of quiet breakthroughs in artificial intelligence, 2015 has been a landmark year. Computers are smarter and learning faster than ever. "Reshaping Business With Artificial Intelligence". MIT Sloan Management Review. Archived from the original on 19 May 2018. Retrieved 2 May 2018. Lorica, Ben (18 December 2017). "The state of AI adoption". O'Reilly Media. Archived from the original on 2 May 2018. Retrieved 2 May 2018. Allen, Gregory (6 February 2019). "Understanding China's AI Strategy". Center for a New American Security. Archived from the original on 17 March 2019. "Review | How two AI superpowers – the U.S. and China – battle for supremacy in the field". Washington Post. 2 November 2018. Archived from the original on 4 November 2018. Retrieved 4 November 2018. at 10:11, Alistair Dabbs 22 Feb 2019. "Artificial Intelligence: You know it isn't real, yeah?". www.theregister.co.uk. Archived from the original on 21 May 2020. Retrieved 22 August 2020. "Stop Calling it Artificial Intelligence". Archived from the original on 2 December 2019. Retrieved 1 December 2019. "AI isn't taking over the world – it doesn't exist yet". GBG Global website. Archived from the original on 11 August 2020. Retrieved 22 August 2020. Kaplan, Andreas; Haenlein, Michael (1 January 2019). "Siri, Siri, in my hand: Who's the fairest in the land? On the interpretations, illustrations, and implications of artificial intelligence". Business Horizons. 62 (1): 15–25. doi:10.1016/j.bushor.2018.08.004. Domingos 2015, Chapter 5. Domingos 2015, Chapter 7. Lindenbaum, M., Markovitch, S., & Rusakov, D. (2004). Selective sampling for nearest neighbor classifiers. Machine learning, 54(2), 125–152. Domingos 2015, Chapter 1. Intractability and efficiency and the combinatorial explosion: * Russell & Norvig 2003, pp. 9, 21–22 Domingos 2015, Chapter 2, Chapter 3. Hart, P. E.; Nilsson, N. J.; Raphael, B. (1972). "Correction to "A Formal Basis for the Heuristic Determination of Minimum Cost Paths"". SIGART Newsletter (37): 28–29. doi:10.1145/1056777.1056779. S2CID 6386648. Domingos 2015, Chapter 2, Chapter 4, Chapter 6. "Can neural network computers learn from experience, and if so, could they ever become what we would call 'smart'?". Scientific American. 2018. Archived from the original on 25 March 2018. Retrieved 24 March 2018. Domingos 2015, Chapter 6, Chapter 7. Domingos 2015, p. 286. "Single pixel change fools AI programs". BBC News. 3 November 2017. Archived from the original on 22 March 2018. Retrieved 12 March 2018. "AI Has a Hallucination Problem That's Proving Tough to Fix". WIRED. 2018. Archived from the original on 12 March 2018. Retrieved 12 March 2018. Matti, D.; Ekenel, H. K.; Thiran, J. P. (2017). Combining LiDAR space clustering and convolutional neural networks for pedestrian detection. 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). pp. 1–6. arXiv:1710.06160. doi:10.1109/AVSS.2017.8078512. ISBN 978-1-5386-2939-0. S2CID 2401976. Ferguson, Sarah; Luders, Brandon; Grande, Robert C.; How, Jonathan P. (2015). Real-Time Predictive Modeling and Robust Avoidance of Pedestrians with Uncertain, Changing Intentions. Algorithmic Foundations of Robotics XI. Springer Tracts in Advanced Robotics. 107. Springer, Cham. pp. 161–177. arXiv:1405.5581. doi:10.1007/978-3-319-16595-0_10. ISBN 978-3-319-16594-3. S2CID 8681101. "Cultivating Common Sense | DiscoverMagazine.com". Discover Magazine. 2017. Archived from the original on 25 March 2018. Retrieved 24 March 2018. Davis, Ernest; Marcus, Gary (24 August 2015). "Commonsense reasoning and commonsense knowledge in artificial intelligence". Communications of the ACM. 58 (9): 92–103. doi:10.1145/2701413. S2CID 13583137. Archived from the original on 22 August 2020. Retrieved 6 April 2020. Winograd, Terry (January 1972). "Understanding natural language". Cognitive Psychology. 3 (1): 1–191. doi:10.1016/0010-0285(72)90002-3. "Don't worry: Autonomous cars aren't coming tomorrow (or next year)". Autoweek. 2016. Archived from the original on 25 March 2018. Retrieved 24 March 2018. Knight, Will (2017). "Boston may be famous for bad drivers, but it's the testing ground for a smarter self-driving car". MIT Technology Review. Archived from the original on 22 August 2020. Retrieved 27 March 2018. Prakken, Henry (31 August 2017). "On the problem of making autonomous vehicles conform to traffic law". Artificial Intelligence and Law. 25 (3): 341–363. doi:10.1007/s10506-017-9210-0. Lieto, Antonio (May 2018). "The knowledge level in cognitive architectures: Current limitations and possible developments". Cognitive Systems Research. 48: 39–55. doi:10.1016/j.cogsys.2017.05.001. hdl:2318/1665207. S2CID 206868967. Problem solving, puzzle solving, game playing and deduction: * Russell & Norvig 2003, chpt. 3–9, * Poole, Mackworth & Goebel 1998, chpt. 2,3,7,9, * Luger & Stubblefield 2004, chpt. 3,4,6,8, * Nilsson 1998, chpt. 7–12 Uncertain reasoning: * Russell & Norvig 2003, pp. 452–644, * Poole, Mackworth & Goebel 1998, pp. 345–395, * Luger & Stubblefield 2004, pp. 333–381, * Nilsson 1998, chpt. 19 Psychological evidence of sub-symbolic reasoning: * Wason & Shapiro (1966) showed that people do poorly on completely abstract problems, but if the problem is restated to allow the use of intuitive social intelligence, performance dramatically improves. (See Wason selection task) * Kahneman, Slovic & Tversky (1982) have shown that people are terrible at elementary problems that involve uncertain reasoning. (See list of cognitive biases for several examples). * Lakoff & Núñez (2000) have controversially argued that even our skills at mathematics depend on knowledge and skills that come from "the body", i.e. sensorimotor and perceptual skills. (See Where Mathematics Comes From) Knowledge representation: * ACM 1998, I.2.4, * Russell & Norvig 2003, pp. 320–363, * Poole, Mackworth & Goebel 1998, pp. 23–46, 69–81, 169–196, 235–277, 281–298, 319–345, * Luger & Stubblefield 2004, pp. 227–243, * Nilsson 1998, chpt. 18 Knowledge engineering: * Russell & Norvig 2003, pp. 260–266, * Poole, Mackworth & Goebel 1998, pp. 199–233, * Nilsson 1998, chpt. ≈17.1–17.4 Representing categories and relations: Semantic networks, description logics, inheritance (including frames and scripts): * Russell & Norvig 2003, pp. 349–354, * Poole, Mackworth & Goebel 1998, pp. 174–177, * Luger & Stubblefield 2004, pp. 248–258, * Nilsson 1998, chpt. 18.3 Representing events and time:Situation calculus, event calculus, fluent calculus (including solving the frame problem): * Russell & Norvig 2003, pp. 328–341, * Poole, Mackworth & Goebel 1998, pp. 281–298, * Nilsson 1998, chpt. 18.2 Causal calculus: * Poole, Mackworth & Goebel 1998, pp. 335–337 Representing knowledge about knowledge: Belief calculus, modal logics: * Russell & Norvig 2003, pp. 341–344, * Poole, Mackworth & Goebel 1998, pp. 275–277 Sikos, Leslie F. (June 2017). Description Logics in Multimedia Reasoning. Cham: Springer. doi:10.1007/978-3-319-54066-5. ISBN 978-3-319-54066-5. S2CID 3180114. Archived from the original on 29 August 2017. Ontology: * Russell & Norvig 2003, pp. 320–328 Smoliar, Stephen W.; Zhang, HongJiang (1994). "Content based video indexing and retrieval". IEEE Multimedia. 1 (2): 62–72. doi:10.1109/93.311653. S2CID 32710913. Neumann, Bernd; Möller, Ralf (January 2008). "On scene interpretation with description logics". Image and Vision Computing. 26 (1): 82–101. doi:10.1016/j.imavis.2007.08.013. Kuperman, G. J.; Reichley, R. M.; Bailey, T. C. (1 July 2006). "Using Commercial Knowledge Bases for Clinical Decision Support: Opportunities, Hurdles, and Recommendations". Journal of the American Medical Informatics Association. 13 (4): 369–371. doi:10.1197/jamia.M2055. PMC 1513681. PMID 16622160. MCGARRY, KEN (1 December 2005). "A survey of interestingness measures for knowledge discovery". The Knowledge Engineering Review. 20 (1): 39–61. doi:10.1017/S0269888905000408. S2CID 14987656. Bertini, M; Del Bimbo, A; Torniai, C (2006). "Automatic annotation and semantic retrieval of video sequences using multimedia ontologies". MM '06 Proceedings of the 14th ACM international conference on Multimedia. 14th ACM international conference on Multimedia. Santa Barbara: ACM. pp. 679–682. Qualification problem: * McCarthy & Hayes 1969 * Russell & Norvig 2003[page needed] While McCarthy was primarily concerned with issues in the logical representation of actions, Russell & Norvig 2003 apply the term to the more general issue of default reasoning in the vast network of assumptions underlying all our commonsense knowledge. Default reasoning and default logic, non-monotonic logics, circumscription, closed world assumption, abduction (Poole et al. places abduction under "default reasoning". Luger et al. places this under "uncertain reasoning"): * Russell & Norvig 2003, pp. 354–360, * Poole, Mackworth & Goebel 1998, pp. 248–256, 323–335, * Luger & Stubblefield 2004, pp. 335–363, * Nilsson 1998, ~18.3.3 Breadth of commonsense knowledge: * Russell & Norvig 2003, p. 21, * Crevier 1993, pp. 113–114, * Moravec 1988, p. 13, * Lenat & Guha 1989 (Introduction) Dreyfus & Dreyfus 1986. Gladwell 2005. Expert knowledge as embodied intuition: * Dreyfus & Dreyfus 1986 (Hubert Dreyfus is a philosopher and critic of AI who was among the first to argue that most useful human knowledge was encoded sub-symbolically. See Dreyfus' critique of AI) * Gladwell 2005 (Gladwell's Blink is a popular introduction to sub-symbolic reasoning and knowledge.) * Hawkins & Blakeslee 2005 (Hawkins argues that sub-symbolic knowledge should be the primary focus of AI research.) Planning: * ACM 1998, ~I.2.8, * Russell & Norvig 2003, pp. 375–459, * Poole, Mackworth & Goebel 1998, pp. 281–316, * Luger & Stubblefield 2004, pp. 314–329, * Nilsson 1998, chpt. 10.1–2, 22 Information value theory: * Russell & Norvig 2003, pp. 600–604 Classical planning: * Russell & Norvig 2003, pp. 375–430, * Poole, Mackworth & Goebel 1998, pp. 281–315, * Luger & Stubblefield 2004, pp. 314–329, * Nilsson 1998, chpt. 10.1–2, 22 Planning and acting in non-deterministic domains: conditional planning, execution monitoring, replanning and continuous planning: * Russell & Norvig 2003, pp. 430–449 Multi-agent planning and emergent behavior: * Russell & Norvig 2003, pp. 449–455 Turing 1950. Solomonoff 1956. Alan Turing discussed the centrality of learning as early as 1950, in his classic paper "Computing Machinery and Intelligence".[120] In 1956, at the original Dartmouth AI summer conference, Ray Solomonoff wrote a report on unsupervised probabilistic machine learning: "An Inductive Inference Machine".[121] This is a form of Tom Mitchell's widely quoted definition of machine learning: "A computer program is set to learn from an experience E with respect to some task T and some performance measure P if its performance on T as measured by P improves with experience E." Learning: * ACM 1998, I.2.6, * Russell & Norvig 2003, pp. 649–788, * Poole, Mackworth & Goebel 1998, pp. 397–438, * Luger & Stubblefield 2004, pp. 385–542, * Nilsson 1998, chpt. 3.3, 10.3, 17.5, 20 Jordan, M. I.; Mitchell, T. M. (16 July 2015). "Machine learning: Trends, perspectives, and prospects". Science. 349 (6245): 255–260. Bibcode:2015Sci...349..255J. doi:10.1126/science.aaa8415. PMID 26185243. S2CID 677218. Reinforcement learning: * Russell & Norvig 2003, pp. 763–788 * Luger & Stubblefield 2004, pp. 442–449 Natural language processing: * ACM 1998, I.2.7 * Russell & Norvig 2003, pp. 790–831 * Poole, Mackworth & Goebel 1998, pp. 91–104 * Luger & Stubblefield 2004, pp. 591–632 "Versatile question answering systems: seeing in synthesis" Archived 1 February 2016 at the Wayback Machine, Mittal et al., IJIIDS, 5(2), 119–142, 2011 Applications of natural language processing, including information retrieval (i.e. text mining) and machine translation: * Russell & Norvig 2003, pp. 840–857, * Luger & Stubblefield 2004, pp. 623–630 Cambria, Erik; White, Bebo (May 2014). "Jumping NLP Curves: A Review of Natural Language Processing Research [Review Article]". IEEE Computational Intelligence Magazine. 9 (2): 48–57. doi:10.1109/MCI.2014.2307227. S2CID 206451986. Vincent, James (7 November 2019). "OpenAI has published the text-generating AI it said was too dangerous to share". The Verge. Archived from the original on 11 June 2020. Retrieved 11 June 2020. Machine perception: * Russell & Norvig 2003, pp. 537–581, 863–898 * Nilsson 1998, ~chpt. 6 Speech recognition: * ACM 1998, ~I.2.7 * Russell & Norvig 2003, pp. 568–578 Object recognition: * Russell & Norvig 2003, pp. 885–892 Computer vision: * ACM 1998, I.2.10 * Russell & Norvig 2003, pp. 863–898 * Nilsson 1998, chpt. 6 Robotics: * ACM 1998, I.2.9, * Russell & Norvig 2003, pp. 901–942, * Poole, Mackworth & Goebel 1998, pp. 443–460 Moving and configuration space: * Russell & Norvig 2003, pp. 916–932 Tecuci 2012. Robotic mapping (localization, etc): * Russell & Norvig 2003, pp. 908–915 Cadena, Cesar; Carlone, Luca; Carrillo, Henry; Latif, Yasir; Scaramuzza, Davide; Neira, Jose; Reid, Ian; Leonard, John J. (December 2016). "Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age". IEEE Transactions on Robotics. 32 (6): 1309–1332. arXiv:1606.05830. Bibcode:2016arXiv160605830C. doi:10.1109/TRO.2016.2624754. S2CID 2596787. Moravec, Hans (1988). Mind Children. Harvard University Press. p. 15. Chan, Szu Ping (15 November 2015). "This is what will happen when robots take over the world". Archived from the original on 24 April 2018. Retrieved 23 April 2018. "IKEA furniture and the limits of AI". The Economist. 2018. Archived from the original on 24 April 2018. Retrieved 24 April 2018. Kismet. Thompson, Derek (2018). "What Jobs Will the Robots Take?". The Atlantic. Archived from the original on 24 April 2018. Retrieved 24 April 2018. Scassellati, Brian (2002). "Theory of mind for a humanoid robot". Autonomous Robots. 12 (1): 13–24. doi:10.1023/A:1013298507114. S2CID 1979315. Cao, Yongcan; Yu, Wenwu; Ren, Wei; Chen, Guanrong (February 2013). "An Overview of Recent Progress in the Study of Distributed Multi-Agent Coordination". IEEE Transactions on Industrial Informatics. 9 (1): 427–438. arXiv:1207.3231. doi:10.1109/TII.2012.2219061. S2CID 9588126. Thro 1993. Edelson 1991. Tao & Tan 2005. Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. Emotion and affective computing: * Minsky 2006 Waddell, Kaveh (2018). "Chatbots Have Entered the Uncanny Valley". The Atlantic. Archived from the original on 24 April 2018. Retrieved 24 April 2018. Pennachin, C.; Goertzel, B. (2007). Contemporary Approaches to Artificial General Intelligence. Artificial General Intelligence. Cognitive Technologies. Cognitive Technologies. Berlin, Heidelberg: Springer. doi:10.1007/978-3-540-68677-4_1. ISBN 978-3-540-23733-4. Roberts, Jacob (2016). "Thinking Machines: The Search for Artificial Intelligence". Distillations. Vol. 2 no. 2. pp. 14–23. Archived from the original on 19 August 2018. Retrieved 20 March 2018. "The superhero of artificial intelligence: can this genius keep it in check?". the Guardian. 16 February 2016. Archived from the original on 23 April 2018. Retrieved 26 April 2018. Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis (26 February 2015). "Human-level control through deep reinforcement learning". Nature. 518 (7540): 529–533. Bibcode:2015Natur.518..529M. doi:10.1038/nature14236. PMID 25719670. S2CID 205242740. Sample, Ian (14 March 2017). "Google's DeepMind makes AI program that can learn like a human". the Guardian. Archived from the original on 26 April 2018. Retrieved 26 April 2018. "From not working to neural networking". The Economist. 2016. Archived from the original on 31 December 2016. Retrieved 26 April 2018. Domingos 2015. Artificial brain arguments: AI requires a simulation of the operation of the human brain * Russell & Norvig 2003, p. 957 * Crevier 1993, pp. 271 and 279 A few of the people who make some form of the argument: * Moravec 1988 * Kurzweil 2005, p. 262 * Hawkins & Blakeslee 2005 The most extreme form of this argument (the brain replacement scenario) was put forward by Clark Glymour in the mid-1970s and was touched on by Zenon Pylyshyn and John Searle in 1980. Goertzel, Ben; Lian, Ruiting; Arel, Itamar; de Garis, Hugo; Chen, Shuo (December 2010). "A world survey of artificial brain projects, Part II: Biologically inspired cognitive architectures". Neurocomputing. 74 (1–3): 30–49. doi:10.1016/j.neucom.2010.08.012. Nilsson 1983, p. 10. Nils Nilsson writes: "Simply put, there is wide disagreement in the field about what AI is all about."[163] AI's immediate precursors: * McCorduck 2004, pp. 51–107 * Crevier 1993, pp. 27–32 * Russell & Norvig 2003, pp. 15, 940 * Moravec 1988, p. 3 Haugeland 1985, pp. 112–117 The most dramatic case of sub-symbolic AI being pushed into the background was the devastating critique of perceptrons by Marvin Minsky and Seymour Papert in 1969. See History of AI, AI winter, or Frank Rosenblatt. Cognitive simulation, Newell and Simon, AI at CMU (then called Carnegie Tech): * McCorduck 2004, pp. 139–179, 245–250, 322–323 (EPAM) * Crevier 1993, pp. 145–149 Soar (history): * McCorduck 2004, pp. 450–451 * Crevier 1993, pp. 258–263 McCarthy and AI research at SAIL and SRI International: * McCorduck 2004, pp. 251–259 * Crevier 1993 AI research at Edinburgh and in France, birth of Prolog: * Crevier 1993, pp. 193–196 * Howe 1994 AI at MIT under Marvin Minsky in the 1960s : * McCorduck 2004, pp. 259–305 * Crevier 1993, pp. 83–102, 163–176 * Russell & Norvig 2003, p. 19 Cyc: * McCorduck 2004, p. 489, who calls it "a determinedly scruffy enterprise" * Crevier 1993, pp. 239–243 * Russell & Norvig 2003, p. 363−365 * Lenat & Guha 1989 Knowledge revolution: * McCorduck 2004, pp. 266–276, 298–300, 314, 421 * Russell & Norvig 2003, pp. 22–23 Frederick, Hayes-Roth; William, Murray; Leonard, Adelman. "Expert systems". AccessScience. doi:10.1036/1097-8542.248550. Embodied approaches to AI: * McCorduck 2004, pp. 454–462 * Brooks 1990 * Moravec 1988 Weng et al. 2001. Lungarella et al. 2003. Asada et al. 2009. Oudeyer 2010. Revival of connectionism: * Crevier 1993, pp. 214–215 * Russell & Norvig 2003, p. 25 Computational intelligence * IEEE Computational Intelligence Society Archived 9 May 2008 at the Wayback Machine Hutson, Matthew (16 February 2018). "Artificial intelligence faces reproducibility crisis". Science. pp. 725–726. Bibcode:2018Sci...359..725H. doi:10.1126/science.359.6377.725. Archived from the original on 29 April 2018. Retrieved 28 April 2018. Norvig 2012. Langley 2011. Katz 2012. The intelligent agent paradigm: * Russell & Norvig 2003, pp. 27, 32–58, 968–972 * Poole, Mackworth & Goebel 1998, pp. 7–21 * Luger & Stubblefield 2004, pp. 235–240 * Hutter 2005, pp. 125–126 The definition used in this article, in terms of goals, actions, perception and environment, is due to Russell & Norvig (2003). Other definitions also include knowledge and learning as additional criteria. Agent architectures, hybrid intelligent systems: * Russell & Norvig (2003, pp. 27, 932, 970–972) * Nilsson (1998, chpt. 25) Hierarchical control system: * Albus 2002 Lieto, Antonio; Lebiere, Christian; Oltramari, Alessandro (May 2018). "The knowledge level in cognitive architectures: Current limitations and possibile developments". Cognitive Systems Research. 48: 39–55. doi:10.1016/j.cogsys.2017.05.001. hdl:2318/1665207. S2CID 206868967. Lieto, Antonio; Bhatt, Mehul; Oltramari, Alessandro; Vernon, David (May 2018). "The role of cognitive architectures in general artificial intelligence". Cognitive Systems Research. 48: 1–3. doi:10.1016/j.cogsys.2017.08.003. hdl:2318/1665249. S2CID 36189683. Russell & Norvig 2009, p. 1. White Paper: On Artificial Intelligence - A European approach to excellence and trust (PDF). Brussels: European Commission. 2020. p. 1. Archived (PDF) from the original on 20 February 2020. Retrieved 20 February 2020. CNN 2006. Using AI to predict flight delays Archived 20 November 2018 at the Wayback Machine, Ishti.org. N. Aletras; D. Tsarapatsanis; D. Preotiuc-Pietro; V. Lampos (2016). "Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective". PeerJ Computer Science. 2: e93. doi:10.7717/peerj-cs.93. "The Economist Explains: Why firms are piling into artificial intelligence". The Economist. 31 March 2016. Archived from the original on 8 May 2016. Retrieved 19 May 2016. Lohr, Steve (28 February 2016). "The Promise of Artificial Intelligence Unfolds in Small Steps". The New York Times. Archived from the original on 29 February 2016. Retrieved 29 February 2016. Frangoul, Anmar (14 June 2019). "A Californian business is using A.I. to change the way we think about energy storage". CNBC. Archived from the original on 25 July 2020. Retrieved 5 November 2019. Wakefield, Jane (15 June 2016). "Social media 'outstrips TV' as news source for young people". BBC News. Archived from the original on 24 June 2016. Smith, Mark (22 July 2016). "So you think you chose to read this article?". BBC News. Archived from the original on 25 July 2016. Brown, Eileen. "Half of Americans do not believe deepfake news could target them online". ZDNet. Archived from the original on 6 November 2019. Retrieved 3 December 2019. The Turing test: Turing's original publication: * Turing 1950 Historical influence and philosophical implications: * Haugeland 1985, pp. 6–9 * Crevier 1993, p. 24 * McCorduck 2004, pp. 70–71 * Russell & Norvig 2003, pp. 2–3 and 948 Dartmouth proposal: * McCarthy et al. 1955 (the original proposal) * Crevier 1993, p. 49 (historical significance) The physical symbol systems hypothesis: * Newell & Simon 1976, p. 116 * McCorduck 2004, p. 153 * Russell & Norvig 2003, p. 18 Dreyfus 1992, p. 156. Dreyfus criticized the necessary condition of the physical symbol system hypothesis, which he called the "psychological assumption": "The mind can be viewed as a device operating on bits of information according to formal rules."[206] Dreyfus' critique of artificial intelligence: * Dreyfus 1972, Dreyfus & Dreyfus 1986 * Crevier 1993, pp. 120–132 * McCorduck 2004, pp. 211–239 * Russell & Norvig 2003, pp. 950–952, Gödel 1951: in this lecture, Kurt Gödel uses the incompleteness theorem to arrive at the following disjunction: (a) the human mind is not a consistent finite machine, or (b) there exist Diophantine equations for which it cannot decide whether solutions exist. Gödel finds (b) implausible, and thus seems to have believed the human mind was not equivalent to a finite machine, i.e., its power exceeded that of any finite machine. He recognized that this was only a conjecture, since one could never disprove (b). Yet he considered the disjunctive conclusion to be a "certain fact". The Mathematical Objection: * Russell & Norvig 2003, p. 949 * McCorduck 2004, pp. 448–449 Making the Mathematical Objection: * Lucas 1961 * Penrose 1989 Refuting Mathematical Objection: * Turing 1950 under "(2) The Mathematical Objection" * Hofstadter 1979 Background: * Gödel 1931, Church 1936, Kleene 1935, Turing 1937 Graham Oppy (20 January 2015). "Gödel's Incompleteness Theorems". Stanford Encyclopedia of Philosophy. Archived from the original on 22 April 2016. Retrieved 27 April 2016. These Gödelian anti-mechanist arguments are, however, problematic, and there is wide consensus that they fail. Stuart J. Russell; Peter Norvig (2010). "26.1.2: Philosophical Foundations/Weak AI: Can Machines Act Intelligently?/The mathematical objection". Artificial Intelligence: A Modern Approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall. ISBN 978-0-13-604259-4. even if we grant that computers have limitations on what they can prove, there is no evidence that humans are immune from those limitations. Mark Colyvan. An introduction to the philosophy of mathematics. Cambridge University Press, 2012. From 2.2.2, 'Philosophical significance of Gödel's incompleteness results': "The accepted wisdom (with which I concur) is that the Lucas-Penrose arguments fail." Iphofen, Ron; Kritikos, Mihalis (3 January 2019). "Regulating artificial intelligence and robotics: ethics by design in a digital society". Contemporary Social Science: 1–15. doi:10.1080/21582041.2018.1563803. ISSN 2158-2041. "Ethical AI Learns Human Rights Framework". Voice of America. Archived from the original on 11 November 2019. Retrieved 10 November 2019. Crevier 1993, pp. 132–144. In the early 1970s, Kenneth Colby presented a version of Weizenbaum's ELIZA known as DOCTOR which he promoted as a serious therapeutic tool.[216] Joseph Weizenbaum's critique of AI: * Weizenbaum 1976 * Crevier 1993, pp. 132–144 * McCorduck 2004, pp. 356–373 * Russell & Norvig 2003, p. 961 Weizenbaum (the AI researcher who developed the first chatterbot program, ELIZA) argued in 1976 that the misuse of artificial intelligence has the potential to devalue human life. Wendell Wallach (2010). Moral Machines, Oxford University Press. Wallach, pp 37–54. Wallach, pp 55–73. Wallach, Introduction chapter. Michael Anderson and Susan Leigh Anderson (2011), Machine Ethics, Cambridge University Press. "Machine Ethics". aaai.org. Archived from the original on 29 November 2014. Rubin, Charles (Spring 2003). "Artificial Intelligence and Human Nature". The New Atlantis. 1: 88–100. Archived from the original on 11 June 2012. Brooks, Rodney (10 November 2014). "artificial intelligence is a tool, not a threat". Archived from the original on 12 November 2014. "Stephen Hawking, Elon Musk, and Bill Gates Warn About Artificial Intelligence". Observer. 19 August 2015. Archived from the original on 30 October 2015. Retrieved 30 October 2015. Chalmers, David (1995). "Facing up to the problem of consciousness". Journal of Consciousness Studies. 2 (3): 200–219. Archived from the original on 8 March 2005. Retrieved 11 October 2018. See also this link Archived 8 April 2011 at the Wayback Machine Horst, Steven, (2005) "The Computational Theory of Mind" Archived 11 September 2018 at the Wayback Machine in The Stanford Encyclopedia of Philosophy Searle 1980, p. 1. This version is from Searle (1999), and is also quoted in Dennett 1991, p. 435. Searle's original formulation was "The appropriately programmed computer really is a mind, in the sense that computers given the right programs can be literally said to understand and have other cognitive states." [230] Strong AI is defined similarly by Russell & Norvig (2003, p. 947): "The assertion that machines could possibly act intelligently
Text2Go
A powerful Model Context Protocol (MCP) server that helps refine AI-generated content to sound more natural and human-like. Built with advanced AI detection and text enhancement capabilities.
ZAYUVALYA
ZAYUVALYA - AI Humanizer is a developing tool designed to rephrase AI-generated text to make it more natural and reduce detection by AI content detectors. Currently in its early stages, the project focuses on context-aware paraphrasing and aims to improve with further optimizations.
Duffy12150557
Security Research Toolkit — Video and image analysis tool for neural inpainting and AI-generated content detection with SORA signature extraction, temporal consistency analysis, CNN artifact detection, CPU/CUDA device selection, multi-format support, and colorama-styled terminal interface
hbchen121
AI-Generated Visual Content (AIGVC) Detection
meshackbahati
Plagiarism and AI-generated content detection system with text and image comparison,.
baoguangsheng
Code base for "Decoupling Content and Expression: Two-Dimensional Detection of AI-Generated Text"
ZhihaoZhang97
[WWW'25] Official repo for paper: RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
vardhin
AI text humanization tool with detection capabilities. Transform AI-generated content into natural, human-like writing using advanced NLP models (T5, BART, Pegasus). Features real-time AI detection, multi-model paraphrasing, and a modern Svelte frontend.
jesvijonathan
This Machine Learning Project detects AI/LLM generated content & provides in depth analysis. The trained model can detect text generated by GPT-4, 3.5, 2, Zero, Bard & other LLMs with high accuracy by using newer efficient detection algorithms.
TristoneJiang
The code for EMNLP 2025 paper SenDetEX: Sentence-Level AI-Generated Text Detection for Human-AI Hybrid Content via Style and Context Fusion
progressionnetwork
CheckGPT is a neural network to detect content generated using big LMs (like ChatGPT, GPT3, GPT2). For AI generated content detection, we are using statistical and heuristical methods, perplexity, entropy, coherence and consistency and other. Accuracy is up to 98%. Use our web app and RestAPI at checkgpt.app.
PhDN
NoCheat is a web application and machine learning algorithm that allows for the detection of AI-generated text content.
theShuai-t
[TIFS 2025] The official code of "Towards Extensible Detection of AI-Generated Images via Content-Agnostic Adapter-Based Category-Aware Incremental Learning".
AdityaKK0407
A project for AI generated content detection.
devrupt-io
Self-hosted AI text detection — analyze written work for AI-generated content using local or cloud LLMs.
Ab-Romia
Transform AI-generated text to match your unique writing voice. VoicePrint learns from your writing samples and applies your personal style to any AI-generated content while bypassing detection.
chongzixuan-ai
A powerful Model Context Protocol (MCP) server that helps refine AI-generated content to sound more natural and human-like. Built with advanced AI detection and text enhancement capabilities.
midlaj-muhammed
✨ WriteGenuine is a comprehensive text analysis platform that helps users 🔍 detect AI-generated content, 🚫 check for plagiarism, and 🧙♂️ transform artificial text into natural human writing that bypasses AI detection tools! 🤖➡️👤
AI Generated Content(Text, Images & Code) Detection using Python
This DeepFake Detection project analyzes images and videos to determine whether the content is real or artificially manipulated using deep learning techniques. It helps identify synthetic media generated by AI, ensuring authenticity and preventing misinformation.
tony-42069
An AI text detection tool designed for educational use, helping teachers identify potentially AI-generated content in student submissions.
lxgicstudios
CLI tool to detect AI-generated text patterns and improve writing to avoid detection. Score, analyze, and humanize content.
tettehosabutey49
To solidify Python skills through a real-world application that addresses the growing need for AI-generated content detection in our digital landscape.
HxHippy
De-Slop is a content filtering extension that detects and removes AI-generated "slop" content from web pages using local pattern matching. Users can customize detection sensitivity and choose which types of content to filter.
Naman511
AI-powered interview analysis tool that transcribes audio recordings, identifies speakers, and evaluates responses using multiple LLMs. Features speaker diarization, AI-generated content detection, timing analysis, and automated scoring with visual reports.