Found 782 repositories (showing 30)
gitabcworld
This repo provides PyTorch code that replicates the results of the Matching Networks for One Shot Learning paper on the Omniglot and MiniImageNet datasets
roboflow
Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets
whoschek
bzfs is a CLI built for highly reliable and scalable near real-time ZFS snapshot replication with minimal operational complexity. It reliably replicates ZFS datasets in parallel using zfs send/receive and ssh, and can operate at sub-second intervals across large fleets of hosts for safe DR/HA.
SpaceinvaderOne
No description available
ExpediaGroup
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
karolpiczak
ESC: Dataset for Environmental Sound Classification - paper replication data
YARlabs
[H] HyperspaceDB is a high-performance, hyperbolic vector database written in Rust. It features 1-bit quantization, async replication, and native support for hierarchical datasets (Poincaré ball model).
tnwei
Modern replication of WaterNet from "An Underwater Image Enhancement Benchmark Dataset and Beyond", IEEE TIP 2019
varungohil
This repository contains code to replicate the experiments given in NeurIPS 2019 paper "One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers"
This repository contains code to replicate the no-longer publicly available Toronto BookCorpus dataset
alexcaselli
Thanks to the proliferation of smart devices, such as smartphones and wearables, which are equipped with computation, communication and sensing capabilities, a plethora of new location-based services and applications are available to users anytime and anywhere. Understanding human mobility has gained importance for offering better services that provide valuable products to the user whenever required. The ability to predict when and where individuals will go next enables smart recommendation systems and better organization of resources such as public transport vehicles or taxis. Network providers can predict future activities of individuals and groups to optimize network handovers, while transport systems can provide more vehicles or lines where required, reducing waiting time and discomfort for their clients. Representations of the movements of individuals or groups of mobile entities are called human mobility models. Such models replicate real human mobility characteristics, making it possible to simulate the movements of different individuals and infer their future whereabouts. Developing these models requires collecting the users' location information in a centralized location, such as a server. Such data is sensitive, and collecting it threatens the privacy of the users involved. The recent introduction of federated learning, a privacy-preserving approach to building machine and deep learning models, is a promising technique for solving the privacy issue. Federated learning allows mobile devices to contribute their private data to model creation without sharing it with a centralized server. In this thesis, we investigate the application of the federated learning paradigm to the field of human mobility modelling.
Using three different mobility datasets, we first designed and developed a robust human mobility model by investigating different classes of neural networks and the influence of demographic data on the models' performance. Second, we applied federated learning to create a deep-learning-based human mobility model that does not require collecting users' mobility traces, achieving promising results on two different datasets. Users' data thus remains distributed across the large number of devices that generated it, while the model is shared and trained between the server and the devices. Furthermore, the developed federated model has been the subject of several analyses, including: the effects of sparse client availability; the communication costs required by federated settings; the application of transfer-learning techniques and model refinement through federated learning; and, lastly, the influence of differential privacy on the model's prediction performance, also called utility.
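As a rough illustration of the federated setup this abstract describes, the sketch below runs federated averaging on a toy linear model. It is not the thesis's model or code: the function names (`local_update`, `federated_average`, `run_rounds`), the linear model `y = w*x + b`, the learning rate, and the toy data are all assumptions chosen to keep the example small. The point it shows is that only model weights travel between clients and server, while each client's raw data stays local.

```python
from typing import List, Tuple

Weights = List[float]            # [w, b] for the toy model y = w*x + b
Dataset = List[Tuple[float, float]]


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Train on one client's private data and return the updated weights."""
    w, b = weights
    for _ in range(epochs):
        # Gradients of the mean squared error over this client's data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Server step: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def run_rounds(clients: List[Dataset], rounds: int = 50) -> Weights:
    """Alternate local training and server-side averaging."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Each client starts from the shared model; raw data never leaves it.
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


# Two hypothetical clients whose private points lie on y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (-1.0, -1.0), (0.5, 2.0)],
]
w, b = run_rounds(clients)
print(f"w={w:.3f}, b={b:.3f}")
```

Weighting the average by local dataset size keeps clients with more data from being under-represented; a real deployment would also have to handle the sparse client availability and communication costs that the abstract mentions.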
DolbyUUU
Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.
christianjauregui
Python package designed to construct and replicate datasets from Ken French's online library by accessing WRDS remotely through its cloud server "wrds-cloud".
prio-data
Replication scripts that were used to generate the PRIO-GRID v3.0 dataset using R. For version 2.0 replication see legacy branch.
2019ChenGong
[S&P 2024] Replication Package for "Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets".
hiyouga
Code and dataset for our paper "Replicate, Walk, and Stop on Syntax: an Effective Neural Network Model for Aspect-Level Sentiment Classification", AAAI2020
abraithwaite
Create an example replicated + distributed dataset using ClickHouse
MadryLab
No description available
yuxdux
An open-source replication and extension of Meta AI's LLaMA dataset
danilofreire
Manuscript, data, and replication materials for the paper "Deaths and Disappearances in the Pinochet Regime: A New Dataset" (2019)
SAP-samples
This repository complements our paper by offering the training dataset, the best-performing models utilized in our real-world experiment, the list of identified malicious packages, and the scripts necessary to replicate and verify our results.
bkelly-lab
Python codebase to create the global dataset of factor returns, stock returns, and firm characteristics from “Is there a Replication Crisis in Finance?” by Jensen, Kelly, and Pedersen (2023)
This repository contains a Python code with five implementations of topology optimization approaches suitable for 2D and 3D problems, all considering bi-directional evolutionary structural optimization. The approaches implemented include both discrete and continuous methods, namely: - Optimality Criteria, for continuous or discrete variables; - Method of Moving Asymptotes; - Sequential Least Squares Programming (from SciPy module); - Trust-region (from SciPy module). The implementation of the Optimality Criteria method is suitable for compliance minimization problems with one mass or volume constraint. The implementation of the remaining methods is suitable for stress constrained compliance minimization and stress minimization problems, both with one mass or volume constraint. The code uses the commercial software ABAQUS to execute Finite Element Analysis (FEA) and automatically access most of the necessary information for the optimization process, such as initial design, material properties, and loading conditions from a model database file (.cae) while providing a simple graphic user interface. Although the code has been developed mainly for educational purposes, its modularity allows for easy editing and extension to other topology optimization problems, making it interesting for more experienced researchers. This code has been used in the article "Python code for 2D and 3D stress constrained topology optimization in ABAQUS: theory, implementation, and case studies" [1]. The folders included in this dataset contain the results obtained, as well as the information necessary to replicate them. In particular, the folder 'Validation' contains the data used to validate the functioning of the code provided. Notes: - Stress-dependent problems are only compatible with the following ABAQUS element types: CPE4, CPS4, 3DQ8, and S4. - The authorship of the functions 'mmasub' and 'subsolv' used in the Method of Moving Asymptotes are credited to Arjen Deetman. 
Source: https://github.com/arjendeetman/GCMMA-MMA-Python - Despite the validations performed, this program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
haolunc
This repository contains the replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Zhu.
Archeries
The lack of benchmarking datasets for pedestrian stride length estimation makes it hard to pinpoint the differences between published methods. Existing datasets either lack the ground truth of each stride or are limited to small spaces with a single scene or motion pattern. To fully evaluate the performance of the proposed ASLE algorithm, we built a benchmark dataset for natural pedestrian dead reckoning using smartphone sensors and an FM-INS module. We leveraged the FM-INS module to provide the ground truth of each stride, with motion distance errors within 0.3% of the entire travel distance. The datasets were obtained from a group of healthy adults with natural motion patterns (fast walking, normal walking, slow walking, running, jumping). The datasets contain more than 22 km and 10,000 strides of gait measurements. They cover both indoor and outdoor cases, including stairs, escalators, elevators, office environments, shopping malls, streets and metro stations. To make it easier for readers to replicate the experiment, we share the sampling software.
Replication of Jay Sinha's Efficient Deep CNN-BiLSTM model for network intrusion detection using NSL-KDD and UNSW-NB15 datasets to understand the research process, model training, accuracy evaluation, and research paper writing.
eleonorapoeta
This repository contains the official implementation of "A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data" (under review). You can use this codebase to replicate our experiments about benchmarking KAN networks on some of the most used real-world tabular datasets.
ajaybhatiya1234
Read the technical deep dive: https://www.dessa.com/post/deepfake-detection-that-actually-works

# Visual DeepFake Detection

In our recent [article](https://www.dessa.com/post/deepfake-detection-that-actually-works), we make the following contributions:

* We show that the model proposed in the current state of the art in video manipulation (FaceForensics++) does not generalize to real-life videos randomly collected from Youtube.
* We show the need for the detector to be constantly updated with real-world data, and propose an initial solution in hopes of solving deepfake video detection.

Our Pytorch implementation conducts extensive experiments to demonstrate that the datasets produced by Google and detailed in the FaceForensics++ paper are not sufficient for making neural networks generalize to detect real-life face manipulation techniques. It also provides a current solution for such behavior, which relies on adding more data.

Our Pytorch model is based on a ResNet18 pre-trained on Imagenet, which we finetune to solve the deepfake detection problem. We also conduct large-scale experiments using Dessa's open source scheduler + experiment manager [Atlas](https://github.com/dessa-research/atlas).

## Setup

## Prerequisites

To run the code, your system should meet the following requirements: RAM >= 32GB, GPUs >= 1

## Steps

0. Install [nvidia-docker](https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0))
00. Install [ffmpeg](https://www.ffmpeg.org/download.html) or `sudo apt install ffmpeg`
1. Git clone this repository.
2. If you haven't already, install [Atlas](https://github.com/dessa-research/atlas).
3. Once you've installed Atlas, activate your environment if you haven't already, and navigate to your project folder.

That's it, you're ready to go!

## Datasets

Half of the dataset used in this project is from the [FaceForensics](https://github.com/ondyari/FaceForensics/tree/master/dataset) deepfake detection dataset. To download this data, please make sure to fill out the [google form](https://github.com/ondyari/FaceForensics/#access) to request access to the data.

The dataset that we collected from Youtube is accessible on [S3](https://deepfake-detection.s3.amazonaws.com/augment_deepfake.tar.gz) for download.

To automatically download and restructure both datasets, please execute:

```
bash restructure_data.sh faceforensics_download.py
```

Note: You need to have received the download script from the FaceForensics++ people before executing the restructure script.

Note2: We created `restructure_data.sh` to do a split that replicates our exact experiments available in the UI above; please feel free to change the splits as you wish.

## Walkthrough

Before starting to train/evaluate models, we should first create the docker image that we will be running our experiments with. To do so, we already prepared a dockerfile inside `custom_docker_image`. To create the docker image, execute the following commands in a terminal:

```
cd custom_docker_image
nvidia-docker build . -t atlas_ff
```

Note: if you change the image name, please make sure you also modify line 16 of `job.config.yaml` to match the docker image name.

Inside `job.config.yaml`, please modify the data path on host from `/media/biggie2/FaceForensics/datasets/` to the absolute path of your `datasets` folder.

The folder containing your datasets should have the following structure:

```
datasets
├── augment_deepfake        (2)
│   ├── fake
│   │   └── frames
│   ├── real
│   │   └── frames
│   └── val
│       ├── fake
│       └── real
├── base_deepfake           (1)
│   ├── fake
│   │   └── frames
│   ├── real
│   │   └── frames
│   └── val
│       ├── fake
│       └── real
├── both_deepfake           (3)
│   ├── fake
│   │   └── frames
│   ├── real
│   │   └── frames
│   └── val
│       ├── fake
│       └── real
├── precomputed             (4)
└── T_deepfake              (0)
    ├── manipulated_sequences
    │   ├── DeepFakeDetection
    │   ├── Deepfakes
    │   ├── Face2Face
    │   ├── FaceSwap
    │   └── NeuralTextures
    └── original_sequences
        ├── actors
        └── youtube
```

Notes:

* (0) is the dataset downloaded using the FaceForensics repo scripts.
* (1) is a reshaped version of the FaceForensics data that matches the structure expected by the codebase. Subfolders called `frames` contain frames collected using `ffmpeg`.
* (2) is the augmented dataset, collected from Youtube, available on S3.
* (3) is the combination of both base and augmented datasets.
* (4) `precomputed` will be automatically created during training. It holds cached cropped frames.

Then, to run all the experiments we will show in the article to come, you can launch the script `hparams_search.py` using:

```bash
python hparams_search.py
```

## Results

In the following pictures, the title for each subplot is in the form `real_prob, fake_prob | prediction | label`.

#### Model trained on FaceForensics++ dataset

For models trained on the paper dataset alone, we notice that the model only learns to detect the manipulation techniques mentioned in the paper and misses all the manipulations in real world data (from data).

#### Model trained on Youtube dataset

Models trained on the Youtube data alone learn to detect real world deepfakes, and also learn to detect easy deepfakes in the paper dataset. These models, however, fail to detect any other type of manipulation (such as NeuralTextures).

#### Model trained on Paper + Youtube dataset

Finally, models trained on the combination of both datasets learn to detect both real world manipulation techniques and the other methods mentioned in the FaceForensics++ paper.

For a more in-depth explanation of these results, please refer to the [article](https://www.dessa.com/post/deepfake-detection-that-actually-works) we published. More results can be seen in the [interactive UI](http://deepfake-detection.dessa.com/projects).

## Help improve this technology

Please feel free to fork this work and keep pushing on it.

If you also want to help improve the deepfake detection datasets, please share your real/forged samples at foundations@dessa.com.

## LICENSE

© 2020 Square, Inc. ATLAS, DESSA, the Dessa Logo, and others are trademarks of Square, Inc. All third party names and trademarks are properties of their respective owners and are used for identification purposes only.
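The folder layout that the DeepFake detection README above expects can be checked before training with a short script. This is a minimal sketch and not part of that repository: the names `expected_dirs` and `missing_dirs` are hypothetical, and the choice of which folders to require is an assumption read off the README's directory tree. `precomputed` is left out because the README says it is created automatically during training.

```python
from pathlib import Path
from typing import List

# Sub-folders the README's tree shows under each of the three training sets.
SPLIT_SUBDIRS = ["fake/frames", "real/frames", "val/fake", "val/real"]
DATASET_NAMES = ["base_deepfake", "augment_deepfake", "both_deepfake"]
# Raw FaceForensics download, shown under T_deepfake in the README.
RAW_SUBDIRS = [
    "T_deepfake/manipulated_sequences",
    "T_deepfake/original_sequences",
]


def expected_dirs() -> List[str]:
    """Relative paths that should exist under the `datasets` root."""
    dirs = [f"{name}/{sub}" for name in DATASET_NAMES for sub in SPLIT_SUBDIRS]
    return dirs + RAW_SUBDIRS


def missing_dirs(root: Path) -> List[str]:
    """Return the expected relative paths that are not directories under root."""
    return [rel for rel in expected_dirs() if not (root / rel).is_dir()]
```

Calling `missing_dirs(Path("/absolute/path/to/datasets"))` returns an empty list when every expected folder is present; otherwise it lists the relative paths still to be created or downloaded.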
JYongSmile
HAASD: A dataset of Household Appliances Abnormal Sound Detection - paper replication data
daetz-coder
VectorNet code replication. Contains mini datasets that can be run directly. The visualization content will be updated in the future.