Found 10 repositories (showing 10)
CarterLoftus
This repository contains all code necessary to reproduce the results from “Loftus et al. 2022. Ecological and social pressures interfere with homeostatic sleep regulation in the wild. eLife. https://doi.org/10.7554/eLife.73695”. To reproduce the analysis, run the scripts in the order in which they are numbered. Please email me with any questions at jcloftus@ucdavis.edu.

Main analysis:

The script “00_processing_raw_acc.ipynb” takes as input the raw GPS and accelerometry data from 2012, downloaded from Movebank with “all sensor types” (i.e. including both GPS and accelerometry data): “Collective movement in wild baboons.csv”. The script then removes columns that are not needed, and downsamples and interpolates the daytime accelerometry data to match the nighttime accelerometry bursts. It outputs the file “all_burst_acc.csv”.

The script “01_acc_to_vedba.R” takes “all_burst_acc.csv” as an input, calculates the average VeDBA and log VeDBA for each accelerometry burst, and outputs the file “full_night_and_day_data.csv”.

The script “02_acc_1min_vedba_2012.m” takes the raw data as an input (“Collective movement in wild baboons.csv”, downloaded from Movebank), calculates the average VeDBA for each minute of the day using “02a_calc_vedba.m” and “02b_calc_stat_only_vedba.m”, and produces “vedba_mean_2012.csv” as an output.

The script “03_baboon_sleep_analysis.R” performs most of the analysis associated with the manuscript. It requires the following input files (with the locations of the inputs in parentheses): “full_night_and_day_data.csv” (produced by the script above), “sleep trees.cpg/.dbf/.prj/.shp/.shx” (Dryad), “env_data.csv-6150899038464587825.csv” (Dryad), and “vedba_mean_2012.csv” (produced by the script above and also available on Dryad). The script runs the sleep classification algorithm using the log VeDBA data, adds all data that is used for predictor variables in the models (e.g. temperature, moon phase), runs most of the models used in the analysis, and plots several of the results from these models.

The script “04_social_sleep_analysis.R” takes “full_night_and_day_data.csv” (produced by “01_acc_to_vedba.R”) and “final_sleep.csv” (produced by “03_baboon_sleep_analysis.R”) as inputs. The script performs permutations to test the sentinel hypothesis (whether at least one group member is awake more often than expected by chance) and whether the group exhibits synchronization in their sleep-wake patterns during the night. The script then runs a model to test whether baboons are more likely to synchronize their sleep-wake patterns when they are sleeping in the same tree.

The script “05_arousal_threshold_analysis.R” takes “sleep_analysis_pub_code_pre_mods.RData” (produced by “03_baboon_sleep_analysis.R”) as an input. The script further prepares the data for analysis and runs a model to test whether baboons are less likely to wake in response to the waking activity of their neighbors (i.e. whether they have a higher arousal threshold) following nights of poor sleep.

The raw data is available for download from the project called “Collective movement in wild baboons” on Movebank.

Validation study:

The script “00_processing_raw_acc_2019.ipynb” takes as input the raw GPS and accelerometry data from 2019, downloaded from Movebank with “all sensor types” (i.e. including both GPS and accelerometry data): “Papio Anubis Mpala 2019.csv”. The script then removes columns that are not needed, as well as the data from individuals whose collars were programmed to sample on a different schedule than in 2012. With a different sampling schedule, we could not apply the same sleep algorithm to these individuals. The script then produces the output “2019_Papio_anubis_acc_Loftus_et_al_Dryad.csv”, which is published on Dryad (note: the full 2019 dataset is not yet publicly available on Movebank), and this file becomes the input for the rest of the script. The rest of the script interpolates both the daytime and nighttime accelerometry data to match the sampling rate of the 2012 accelerometry bursts. It outputs the file “validation_burst_acc.csv”.

The script “01_acc_to_vedba_2019.R” takes “validation_burst_acc.csv” as an input, trims the daytime accelerometry to match the nighttime accelerometry bursts, calculates the average VeDBA and log VeDBA for each burst, and outputs this information in the dataframe “2019_full_night_and_day_data.csv”.

The script “02_sleep_validation.R” performs the actual validation of the sleep algorithm. It takes as inputs “2019_full_night_and_day_data.csv” and the files within the folder “loopy_focal_follows_2021_09_17”, which contain the behavioral observations from the thermal imagery. It also requires the input file “tag_metadata.csv”, and .txt files associated with the thermal videos that are saved within the archive of the MPI-AB EAS department storage (these raw inputs are not needed if downloading the data from Dryad). The script first runs the sleep classification algorithm on the log VeDBA data and saves the sleep classification. The script then aggregates the behavioral observation data into one dataframe, adds the real (absolute) timestamps based on the timestamps of the frames of the videos, trims the dataframe to only the relevant observations, and produces the file “2019_Papio_anubis_behavioral_scoring_Loftus_et_al_Dryad.csv”, which is available for download on Dryad (the raw behavioral observations are not available for download). The rest of the script takes as inputs the sleep classification produced earlier in the script and the behavioral scoring csv that is available on Dryad. It applies a time correction to the behavioral scoring so that it matches GPS time, then compares the sleep classification from the algorithm to the sleep classification from behavioral observations to determine the accuracy of, and produce a confusion matrix for, the accelerometry-based sleep classification algorithm.
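The burst-level VeDBA step described for “01_acc_to_vedba.R” can be illustrated with a minimal Python sketch. This is not the repository's R code. It assumes that the static (gravitational) component of each axis is approximated by that axis's mean over the burst, and that "log VeDBA" means the natural log of the burst's mean VeDBA; both are assumptions, and the function name `burst_vedba` is hypothetical.

```python
import math


def burst_vedba(burst):
    """Return (mean VeDBA, log of mean VeDBA) for one burst of (x, y, z) samples.

    Assumption: the static acceleration of each axis is its mean over the
    burst; the repository may use a different smoothing window.
    """
    n = len(burst)
    # Per-axis static component, approximated by the burst mean.
    static = [sum(sample[axis] for sample in burst) / n for axis in range(3)]
    # VeDBA of a sample = Euclidean norm of its dynamic (raw - static) acceleration.
    vedba = [
        math.sqrt(sum((sample[axis] - static[axis]) ** 2 for axis in range(3)))
        for sample in burst
    ]
    mean_vedba = sum(vedba) / n
    # A perfectly motionless burst has VeDBA 0, whose log is undefined.
    log_vedba = math.log(mean_vedba) if mean_vedba > 0 else float("-inf")
    return mean_vedba, log_vedba
```

Calling `burst_vedba` once per accelerometry burst would give one VeDBA and one log VeDBA value per burst, which is the shape of data the sleep classification step is described as consuming.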
Seth22
Data analysis of ACC (stock) using R
oluwatumininuvaughan
No description available
No description available
kartikparlikar
No description available
JonasFortes12
This application was developed in Streamlit to explore and analyze data from traffic accident reports (Boletins de Acidente de Trânsito), made available through the open data portal (Dados Abertos) of the Polícia Rodoviária Federal (PRF).
No description available
A project carried out with the Korea Highway Corporation by students of the Data Youth Campus, hosted by the Korea Data Industry Promotion Agency, in July 2020.
hisham2alsuiss
The run_analysis.R script performs the data preparation, followed by the 5 steps required by the course project’s definition.

Download the dataset
The dataset is downloaded and extracted under the folder called UCI HAR Dataset.

Assign each data file to a variable
features <- features.txt : 561 rows, 2 columns
    The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ.
activities <- activity_labels.txt : 6 rows, 2 columns
    List of the activities performed when the corresponding measurements were taken, and their codes (labels).
subject_test <- test/subject_test.txt : 2947 rows, 1 column
    Contains test data for the 9 of 30 volunteer subjects assigned to the test set.
x_test <- test/X_test.txt : 2947 rows, 561 columns
    Contains the recorded features for the test data.
y_test <- test/y_test.txt : 2947 rows, 1 column
    Contains the activity code labels for the test data.
subject_train <- train/subject_train.txt : 7352 rows, 1 column
    Contains training data for the 21 of 30 volunteer subjects assigned to the training set.
x_train <- train/X_train.txt : 7352 rows, 561 columns
    Contains the recorded features for the training data.
y_train <- train/y_train.txt : 7352 rows, 1 column
    Contains the activity code labels for the training data.

1. Merges the training and the test sets to create one data set
X (10299 rows, 561 columns) is created by merging x_train and x_test using the rbind() function.
Y (10299 rows, 1 column) is created by merging y_train and y_test using the rbind() function.
Subject (10299 rows, 1 column) is created by merging subject_train and subject_test using the rbind() function.
Merged_Data (10299 rows, 563 columns) is created by merging Subject, Y and X using the cbind() function.

2. Extracts only the measurements on the mean and standard deviation for each measurement
TidyData (10299 rows, 88 columns) is created by subsetting Merged_Data, selecting only the columns subject and code plus the measurements on the mean and standard deviation (std) for each measurement.

3. Uses descriptive activity names to name the activities in the data set
All numbers in the code column of TidyData are replaced with the corresponding activity name taken from the second column of the activities variable.

4. Appropriately labels the data set with descriptive variable names
The code column in TidyData is renamed to activities.
Every Acc in a column name is replaced by Accelerometer.
Every Gyro in a column name is replaced by Gyroscope.
Every BodyBody in a column name is replaced by Body.
Every Mag in a column name is replaced by Magnitude.
A leading f in a column name is replaced by Frequency.
A leading t in a column name is replaced by Time.

5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject
FinalData (180 rows, 88 columns) is created by summarizing TidyData, taking the mean of each variable for each activity and each subject after grouping by subject and activity.
FinalData is exported to the file FinalData.txt.
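The column-renaming rules and the final per-subject, per-activity averaging described above can be sketched in Python for illustration. This is not the repository's run_analysis.R. The names `RENAME_RULES`, `descriptive_name` and `mean_by_subject_activity` are hypothetical, and the sketch assumes the rules are applied in the order in which the description lists them.

```python
import re
from collections import defaultdict

# Renaming rules from the description, applied in the listed order (an assumption).
RENAME_RULES = [
    (r"Acc", "Accelerometer"),
    (r"Gyro", "Gyroscope"),
    (r"BodyBody", "Body"),
    (r"Mag", "Magnitude"),
    (r"^f", "Frequency"),  # leading f only
    (r"^t", "Time"),       # leading t only
]


def descriptive_name(column):
    """Expand the abbreviations in one feature column name."""
    for pattern, replacement in RENAME_RULES:
        column = re.sub(pattern, replacement, column)
    return column


def mean_by_subject_activity(rows):
    """Average one variable per (subject, activity) pair.

    rows: iterable of (subject, activity, value) tuples.
    Returns a dict mapping (subject, activity) to the mean value.
    """
    groups = defaultdict(list)
    for subject, activity, value in rows:
        groups[(subject, activity)].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}
```

With 30 subjects and 6 activities, grouping this way yields at most 30 × 6 = 180 groups, which matches the 180 rows reported for FinalData.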
emmtizy
Predict activity quality from activity monitors

Synopsis

Using devices such as Jawbone Up, Nike FuelBand, and Fitbit it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement: a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, your goal will be to use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways. The goal of this project is to predict the manner in which they did the exercise. This is the classe variable in the training set.

Data description

The outcome variable is classe, a factor variable with 5 levels. For this data set, participants were asked to perform one set of 10 repetitions of the Unilateral Dumbbell Biceps Curl in 5 different fashions:

exactly according to the specification (Class A)
throwing the elbows to the front (Class B)
lifting the dumbbell only halfway (Class C)
lowering the dumbbell only halfway (Class D)
throwing the hips to the front (Class E)

Initial configuration

The initial configuration consists of loading some required packages and initializing some variables.
    # Data variables
    training.file   <- './data/pml-training.csv'
    test.cases.file <- './data/pml-testing.csv'
    training.url    <- 'http://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv'
    test.cases.url  <- 'http://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv'

    # Directories
    if (!file.exists("data")) {
      dir.create("data")
    }
    if (!file.exists("data/submission")) {
      dir.create("data/submission")
    }

    # R packages: load each one, installing it first if it is missing
    IscaretInstalled <- require("caret")
    ## Loading required package: caret
    ## Loading required package: lattice
    ## Loading required package: ggplot2
    if (!IscaretInstalled) {
      install.packages("caret")
      library("caret")
    }

    IsrandomForestInstalled <- require("randomForest")
    ## Loading required package: randomForest
    ## randomForest 4.6-10
    ## Type rfNews() to see new features/changes/bug fixes.
    if (!IsrandomForestInstalled) {
      install.packages("randomForest")
      library("randomForest")
    }

    IsRpartInstalled <- require("rpart")
    ## Loading required package: rpart
    if (!IsRpartInstalled) {
      install.packages("rpart")
      library("rpart")
    }

    IsRpartPlotInstalled <- require("rpart.plot")
    ## Loading required package: rpart.plot
    if (!IsRpartPlotInstalled) {
      install.packages("rpart.plot")
      library("rpart.plot")
    }

    # Set seed for reproducibility
    set.seed(9999)

Data processing

In this section the data is downloaded and processed. Some basic transformations and cleanup will be performed, so that columns containing NA values are omitted. Irrelevant columns such as user_name, raw_timestamp_part_1, raw_timestamp_part_2, cvtd_timestamp, new_window, and num_window (columns 1 to 7) will be removed in the subset. The pml-training.csv data is used to devise training and testing sets. The pml-testing.csv data is used to predict and answer the 20 questions based on the trained model.
    # Download data
    download.file(training.url, training.file)
    download.file(test.cases.url, test.cases.file)

    # Clean data: treat "NA", "#DIV/0!" and "" as missing, then drop every column that contains a missing value
    training <- read.csv(training.file, na.strings=c("NA", "#DIV/0!", ""))
    testing  <- read.csv(test.cases.file, na.strings=c("NA", "#DIV/0!", ""))
    training <- training[, colSums(is.na(training)) == 0]
    testing  <- testing[, colSums(is.na(testing)) == 0]

    # Subset data: drop the first seven (identifier and timestamp) columns
    training <- training[, -c(1:7)]
    testing  <- testing[, -c(1:7)]

Cross-validation

In this section cross-validation will be performed by splitting the training data into training (75%) and testing (25%) data.

    subSamples  <- createDataPartition(y=training$classe, p=0.75, list=FALSE)
    subTraining <- training[subSamples, ]
    subTesting  <- training[-subSamples, ]

Expected out-of-sample error

The expected out-of-sample error will correspond to the quantity 1 - accuracy in the cross-validation data. Accuracy is the proportion of correctly classified observations over the total sample in the subTesting data set. Expected accuracy is the expected accuracy in the out-of-sample data set (i.e. the original testing data set). Thus, the expected value of the out-of-sample error will correspond to the expected number of misclassified observations divided by the total observations in the Test data set, which is the quantity 1 - accuracy found from the cross-validation data set.

Exploratory analysis

The variable classe contains 5 levels. The plot of the outcome variable shows the frequency of each level in the subTraining data.

    plot(subTraining$classe, col="orange", main="Levels of the variable classe", xlab="classe levels", ylab="Frequency")

The plot above shows that Level A is the most frequent classe. D appears to be the least frequent one.

Prediction models

In this section a decision tree and random forest will be applied to the data.
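The 75/25 split and the 1 - accuracy error estimate described in the cross-validation section can be illustrated with a small Python sketch. It is not caret's createDataPartition algorithm, only a plain stratified random split that keeps roughly the same class proportions in both parts. The function names `stratified_split` and `out_of_sample_error` are hypothetical, and the rounding of the per-class sample size is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(labels, p=0.75, seed=9999):
    """Return (train_indices, test_indices), sampling a fraction p within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train = []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Assumption: the per-class training size is p * n rounded to an integer.
        train.extend(members[: round(p * len(members))])
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test


def out_of_sample_error(predicted, actual):
    """Estimate the out-of-sample error as 1 - accuracy on held-out data."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 1 - correct / len(actual)
```

The error estimate is only meaningful when `predicted` comes from a model that never saw the held-out rows during fitting, which is the role subTesting plays in the report.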
Decision tree

    # Fit model
    modFitDT <- rpart(classe ~ ., data=subTraining, method="class")

    # Perform prediction
    predictDT <- predict(modFitDT, subTesting, type = "class")

    # Plot result
    rpart.plot(modFitDT, main="Classification Tree", extra=102, under=TRUE, faclen=0)

The following confusion matrix shows the errors of the prediction algorithm.

    confusionMatrix(predictDT, subTesting$classe)
    ## Confusion Matrix and Statistics
    ##
    ##           Reference
    ## Prediction    A    B    C    D    E
    ##          A 1266  208   25   91   29
    ##          B   33  535   71   30   67
    ##          C   28   90  676  130   94
    ##          D   45   72   59  501   43
    ##          E   23   44   24   52  668
    ##
    ## Overall Statistics
    ##
    ##                Accuracy : 0.7435
    ##                  95% CI : (0.731, 0.7557)
    ##     No Information Rate : 0.2845
    ##     P-Value [Acc > NIR] : < 2.2e-16
    ##
    ##                   Kappa : 0.6738
    ##  Mcnemar's Test P-Value : < 2.2e-16
    ##
    ## Statistics by Class:
    ##
    ##                      Class: A Class: B Class: C Class: D Class: E
    ## Sensitivity            0.9075   0.5638   0.7906   0.6231   0.7414
    ## Specificity            0.8994   0.9492   0.9155   0.9466   0.9643
    ## Pos Pred Value         0.7820   0.7269   0.6640   0.6958   0.8237
    ## Neg Pred Value         0.9607   0.9007   0.9539   0.9276   0.9431
    ## Prevalence             0.2845   0.1935   0.1743   0.1639   0.1837
    ## Detection Rate         0.2582   0.1091   0.1378   0.1022   0.1362
    ## Detection Prevalence   0.3301   0.1501   0.2076   0.1468   0.1654
    ## Balanced Accuracy      0.9035   0.7565   0.8531   0.7849   0.8528

Random forest

    # Fit model
    modFitRF <- randomForest(classe ~ ., data=subTraining, method="class")

    # Perform prediction
    predictRF <- predict(modFitRF, subTesting, type = "class")

The following confusion matrix shows the errors of the prediction algorithm.
    confusionMatrix(predictRF, subTesting$classe)
    ## Confusion Matrix and Statistics
    ##
    ##           Reference
    ## Prediction    A    B    C    D    E
    ##          A 1394    2    0    0    0
    ##          B    1  946    8    0    0
    ##          C    0    1  846    6    0
    ##          D    0    0    1  796    1
    ##          E    0    0    0    2  900
    ##
    ## Overall Statistics
    ##
    ##                Accuracy : 0.9955
    ##                  95% CI : (0.9932, 0.9972)
    ##     No Information Rate : 0.2845
    ##     P-Value [Acc > NIR] : < 2.2e-16
    ##
    ##                   Kappa : 0.9943
    ##  Mcnemar's Test P-Value : NA
    ##
    ## Statistics by Class:
    ##
    ##                      Class: A Class: B Class: C Class: D Class: E
    ## Sensitivity            0.9993   0.9968   0.9895   0.9900   0.9989
    ## Specificity            0.9994   0.9977   0.9983   0.9995   0.9995
    ## Pos Pred Value         0.9986   0.9906   0.9918   0.9975   0.9978
    ## Neg Pred Value         0.9997   0.9992   0.9978   0.9981   0.9998
    ## Prevalence             0.2845   0.1935   0.1743   0.1639   0.1837
    ## Detection Rate         0.2843   0.1929   0.1725   0.1623   0.1835
    ## Detection Prevalence   0.2847   0.1947   0.1739   0.1627   0.1839
    ## Balanced Accuracy      0.9994   0.9973   0.9939   0.9948   0.9992

Conclusion

Result

The confusion matrices show that the Random Forest algorithm performs better than the decision tree. The accuracy for the Random Forest model was 0.9955 (95% CI: (0.9932, 0.9972)) compared to 0.7435 (95% CI: (0.731, 0.7557)) for the Decision Tree model. The Random Forest model is chosen.

Expected out-of-sample error

The expected out-of-sample error is estimated at 0.005, or 0.5%. The expected out-of-sample error is calculated as 1 - accuracy for predictions made against the cross-validation set. Our Test data set comprises 20 cases. With an accuracy above 99% on our cross-validation data, we can expect that very few, or none, of the test samples will be misclassified.

Submission

In this section the files for the project submission are generated using the random forest algorithm on the testing data.
    # Perform prediction
    predictSubmission <- predict(modFitRF, testing, type="class")
    predictSubmission
    ##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    ##  B  A  B  A  A  E  D  B  A  A  B  C  B  A  E  E  A  B  B  B
    ## Levels: A B C D E

    # Write files for submission: one text file per test case
    pml_write_files = function(x){
      n = length(x)
      for(i in 1:n){
        filename = paste0("./data/submission/problem_id_",i,".txt")
        write.table(x[i],file=filename,quote=FALSE,row.names=FALSE,col.names=FALSE)
      }
    }
    pml_write_files(predictSubmission)
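The accuracy and sensitivity figures in the confusion matrix output above can be recomputed directly from the counts. The Python sketch below does this for the two matrices printed in the report (rows are predictions, columns are reference classes). It is an illustration, not part of the original R analysis, and the names `DT_MATRIX`, `RF_MATRIX`, `accuracy` and `sensitivity` are hypothetical.

```python
# Counts copied from the report; rows = predicted class, columns = reference class (A..E).
DT_MATRIX = [
    [1266, 208, 25, 91, 29],
    [33, 535, 71, 30, 67],
    [28, 90, 676, 130, 94],
    [45, 72, 59, 501, 43],
    [23, 44, 24, 52, 668],
]
RF_MATRIX = [
    [1394, 2, 0, 0, 0],
    [1, 946, 8, 0, 0],
    [0, 1, 846, 6, 0],
    [0, 0, 1, 796, 1],
    [0, 0, 0, 2, 900],
]


def accuracy(matrix):
    """Share of observations on the diagonal (predicted class == reference class)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def sensitivity(matrix, k):
    """Share of reference class k (column k) that was predicted as class k."""
    reference_total = sum(row[k] for row in matrix)
    return matrix[k][k] / reference_total


print(round(accuracy(DT_MATRIX), 4))      # decision tree accuracy
print(round(accuracy(RF_MATRIX), 4))      # random forest accuracy
print(round(1 - accuracy(RF_MATRIX), 4))  # estimated out-of-sample error
```

Both matrices sum to 4904 held-out observations, so the two accuracies are directly comparable.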