Found 185 repositories (showing 30)
WenjieDu
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially observed time series, including SOTA neural-network models for scientific analysis tasks: imputation, classification, clustering, forecasting, anomaly detection, and cleaning of incomplete industrial (irregularly sampled) multivariate time series with NaN missing values.
fipelle
A Julia implementation of basic tools for time series analysis compatible with incomplete data.
Haoyu-ha
Towards Robust Multimodal Sentiment Analysis with Incomplete Data
CmosZhang
Seismic data reconstruction is an important research direction in the field of seismic signal analysis. Complete seismic data can be used to estimate interior images of the Earth, which aids the exploration for resources and research into the shallow structure of the crust for geological and environmental purposes. However, due to severely corrupted seismic traces and slices, harsh detection conditions, and even financial constraints, seismic data usually contain many missing entries and noise. It is therefore necessary to investigate the robust recovery of seismic data from incomplete and noisy observations.
FT-ZHOU-ZZZ
Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer
hawksilent
Code for P-RMF (ACL 2025): Proxy-Driven Robust Multimodal Sentiment Analysis with Incomplete Data.
A dense summary of data analysis techniques (including incomplete mnemonic R code) from the "14.310x - Data Analysis for Social Scientists" MOOC offered by the Massachusetts Institute of Technology (MIT).
xumaomao94
code for "Tensor Train Factorization under Noisy and Incomplete Data with Automatic Rank Estimation" and "Overfitting Avoidance in Tensor Train Factorization and Completion: Prior Analysis and Inference"
Introduction

This project looks at the mergers and acquisitions of 30 publicly traded companies and attempts to determine the stock price at closing. M&As are incredibly difficult to assess, and while a company's intrinsic value and fundamentals play a significant role in predicting whether a merger will be "successful", public sentiment from Wall Street investors is another commonly referenced factor. Brainstorming for this project prompted two notable observations: data on M&As are often incomplete and highly inconsistent given the confidentiality behind these deals, and determining an appropriate dependent variable y for analysis presents a significant challenge (it would most likely require an additional project of its own). The success of a merger could be measured in various ways, but often the unpredictability of management makes it all the more challenging. Culture, reorganization, and leadership shake-ups all play an important role in the success of an M&A but are difficult to quantify. Although I do build and run a model in this project, the complexity of the subject urged me to focus primarily on data gathering and manipulation. Since one would most likely need to compose a dataframe with the attributes necessary to run a useful analysis on mergers and acquisitions, I believe this is a valuable first step. For a more balanced notebook between EDA, data manipulation, and models, see my project on COVID-19's impact on post-secondary education, titled "COVID19 Effects on Post-Secondary Education": https://github.com/clozgil

The Process

My objective was to build a dataframe with useful attributes from scratch. I found that three reports per company would have sufficient information to get started:

* Acquisition data (any and all information on the company's M&A)
* Financial ratios (data to determine the company's fundamentals)
* Stock information (data to gain insight into Wall Street sentiment)

Since downloading, importing, and cleaning each of those files for each of the 30 companies would be cumbersome, I looped over all the data files using the OS module, simultaneously cleaning and merging each one. However, for the purposes of this presentation, I will feature each of my data cleaning techniques for one company: Apple.

Data Sources

For reference only. All necessary data for this project can be found in the data dictionary.

* Acquisition data: https://www.capitaliq.com/CIQDotNet/my/dashboard.aspx (*)
* Financial ratios: https://www-mergentonline-com.pitt.idm.oclc.org/companyfinancials.php?pagetype=ratios&compnumber=46247&period=Quarters&range=50&Submit=Refresh&csrf_token_mol=3680683535 (*)
* Stock info: https://www-mergentonline-com.pitt.idm.oclc.org/equitypricing.php?pagetype=report&compnumber=46247 (*)

(*) = Account required. University of Pittsburgh account used for access.
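The looped cleaning-and-merging step described above can be sketched as follows. This is a minimal stand-in, not the project's actual code: the directory layout, tickers, and column names are hypothetical, and a temporary directory is created so the snippet is self-contained.

```python
import csv
import os
import tempfile

# Hypothetical setup: one ratios file per company, named <TICKER>_ratios.csv.
data_dir = tempfile.mkdtemp()
for ticker in ("AAPL", "MSFT"):
    with open(os.path.join(data_dir, f"{ticker}_ratios.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["quarter", "pe_ratio"])
        writer.writerow(["2020Q1", "24.1"])

# Loop over every data file with the os module, cleaning and merging as we go.
merged = []
for name in sorted(os.listdir(data_dir)):
    if not name.endswith("_ratios.csv"):
        continue  # skip anything that is not a ratios report
    ticker = name.split("_")[0]
    with open(os.path.join(data_dir, name), newline="") as f:
        for row in csv.DictReader(f):
            row["ticker"] = ticker  # tag each row with its company
            merged.append(row)
```

The same pattern extends to the acquisition and stock files: one loop per report type, each tagging rows with the company identifier so the three sources can later be joined on it.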
No description available
ErikBoesen
Bravo is a pit dashboard for FRC which shows data from The Blue Alliance about upcoming matches, along with analysis and many other useful features. INCOMPLETE AND ABANDONED.
Road traffic accident (RTA) data collected are huge, multi-dimensional, and heterogeneous. Moreover, the data may be incomplete and contain erroneous values, which makes analysis a daunting task. The target data for this study were collected by the Department for Transport, GB. Several data mining techniques, such as handling an imbalanced dataset and factor reduction, together with prediction algorithms such as Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, and Support Vector Machines (SVM), were applied to perform an effective data analysis that could potentially support the transport department in devising better precautionary measures to minimize road accident occurrences in Great Britain. Moreover, the idea of chaining two different algorithms was attempted by identifying the significant attributes through the Random Forest technique and feeding them as input to other ML algorithms. In addition, the key factors that influence these road collisions were identified and presented.
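The chaining idea in that description (Random Forest importances selecting attributes for a second algorithm) can be sketched as below. This is a hedged illustration on synthetic data, since the real Department for Transport dataset is not reproduced here; the feature counts and model settings are arbitrary choices, not the study's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the RTA dataset.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: fit a Random Forest and rank attributes by importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
top = np.argsort(rf.feature_importances_)[-5:]  # keep the 5 strongest

# Step 2: feed only those attributes into a second algorithm.
lr = LogisticRegression(max_iter=1000).fit(X_tr[:, top], y_tr)
accuracy = lr.score(X_te[:, top], y_te)
```

Any of the other listed algorithms (Naïve Bayes, SVM, a decision tree) could stand in for the logistic regression in step 2; the chain's structure stays the same.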
klyshko
Datasets, aggregated from different sources, can have missing or incomplete values which impose difficulties on data analysis and research. The proposed project aims to build the pipeline for the recovery of various data: categorical, numerical and textual – collected from multiple resources. In the project, we focus on incomplete data, such as geographical locations (cities, places, highways, coordinates), information sources (news websites, TV channels, articles) and measured features of celestial objects (meteorites’ mass and type).
Chris7462
This is a modified version of the rainbow package in R. We propose using the conditional-expectation approach to functional principal component analysis (FPCA), which can be applied to the functional bagplot and the functional highest density region (HDR) boxplot, making outlier detection possible for incomplete functional data.
jwhite1987
The goal of this project is to take the dataset, an employee database, and create a table schema for each of the six files (located in the data folder). After importing each file into its corresponding table, an analysis is performed on the dataset: the "incomplete" data files are linked and joined together to build a more comprehensive database, coded in a fashion that makes it easy to use and far more detailed. The final result is a database that is much easier to use and understand.
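The link-and-join step that description outlines can be sketched with SQLite. The table and column names here are hypothetical stand-ins for two of the six files, not the project's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Two per-file tables, each "incomplete" on its own.
con.executescript("""
    CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE salaries  (emp_no INTEGER, salary INTEGER,
                            FOREIGN KEY (emp_no) REFERENCES employees(emp_no));
""")
con.executemany("INSERT INTO employees VALUES (?, ?)",
                [(1, "Smith"), (2, "Jones")])
con.executemany("INSERT INTO salaries VALUES (?, ?)",
                [(1, 60000), (2, 72000)])

# Join the per-file tables into one comprehensive view.
rows = con.execute("""
    SELECT e.last_name, s.salary
    FROM employees AS e
    JOIN salaries  AS s ON s.emp_no = e.emp_no
    ORDER BY e.emp_no
""").fetchall()
```

With six files the same pattern repeats: each CSV gets a table with a foreign key back to the employee number, and the comprehensive view joins them all on that key.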
grouptheory
This software implements a new method for obtaining network properties from incomplete data sets. Problems associated with missing data represent well-known stumbling blocks in Social Network Analysis. The method of “estimating connectivity from spanning tree completions” (ECSTC) is specifically designed to address situations where only spanning tree(s) of a network are known, such as those obtained through respondent driven sampling (RDS). Using repeated random completions derived from degree information, this method forgoes the usual step of trying to obtain final edge or vertex rosters, and instead aims to estimate network-centric properties of vertices probabilistically from the spanning trees themselves. In this paper, we discuss the problem of missing data and describe the protocols of our completion method, and finally the results of an experiment where ECSTC was used to estimate graph dependent vertex properties from spanning trees sampled from a graph whose characteristics were known ahead of time. The results show that ECSTC methods hold more promise for obtaining network-centric properties of individuals from a limited set of data than researchers may have previously assumed. Such an approach represents a break with past strategies of working with missing data which have mainly sought means to complete the graph, rather than ECSTC's approach, which is to estimate network properties themselves without deciding on the final edge set.
Arturo-Esquivel
No description available
MaeveLi
R markdown files for IDA
uscensusbureau
Analysis of Incomplete Multivariate Data under a Normal Model
changgee
Supplementary Material for Multiple Imputation for Analysis of Incomplete Data in Distributed Health Data Networks
JKP1575540259
Data for: Extended singular spectrum analysis for processing incomplete and heterogeneous time series
JinchengZ
R code and data for the manuscript "A Bayesian Hierarchical CACE Model Accounting for Incomplete Noncompliance Data in Meta-analysis"
cran
:exclamation: This is a read-only mirror of the CRAN R package repository. norm2 — Analysis of Incomplete Multivariate Data under a Normal Model
futureomics
Exploratory data analysis (EDA) of cancer gene expression with Skrub, a powerful and modern approach to analyzing tabular data, particularly when that data is messy, incomplete, or contains categorical columns.
spathak01
Data cleaning in Excel involves the process of identifying and correcting inaccurate, incomplete, or irrelevant data in a spreadsheet. This is an important step in preparing data for analysis or reporting.
aman040499
This contains all the work and incomplete projects from my data analysis learning path, including dashboards (built with Power BI and IBM Cognos) and Python visualization projects.
Asemota-otasowie
A data analysis project focused on evaluating outreach campaign effectiveness by tracking application progress across countries. Due to incomplete applicant follow-ups, records were categorized into "Completed" and "Not Completed" applications to enable structured analysis and reporting.
pradumansalunkhe
Data cleaning is the process of fixing or removing incorrect, corrupted, duplicate, or incomplete data within a dataset. Messy data leads to unreliable outcomes. Cleaning data is an essential part of data analysis, and demonstrating your data cleaning skills is key to landing a job. Here are some projects to test out your data cleaning skills.
Ankur-Halder
Auto-CSV-Clener is a Python script that automates data cleaning for CSV files. It drops unnecessary or incomplete columns, handles missing values, encodes categorical data, standardizes numerical data, and exports a clean version—ready for analysis or ML models. Perfect for quick, consistent preprocessing.
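The cleaning steps that description lists (dropping incomplete columns, filling missing values, encoding categoricals) can be sketched as below. This is a minimal illustration, not the actual Auto-CSV-Clener script; the column names, the 50% missingness threshold, and the zero-fill imputation are all assumptions made for the example.

```python
import csv
import io

# Toy CSV with a mostly-empty column and scattered missing values.
raw = """age,city,notes
34,Leeds,
,York,
41,,
"""
rows = list(csv.DictReader(io.StringIO(raw)))

# Drop columns where more than half the values are missing.
cols = [c for c in rows[0]
        if sum(1 for r in rows if r[c]) >= len(rows) / 2]

# Fill remaining gaps and label-encode the categorical column.
city_codes = {}
clean = []
for r in rows:
    age = int(r["age"]) if r["age"] else 0   # naive zero-fill imputation
    city = r["city"] or "unknown"
    code = city_codes.setdefault(city, len(city_codes))
    clean.append({"age": age, "city": code})
```

A real pipeline would also standardize numeric columns and write the result back to disk, but the drop/fill/encode skeleton above is the core of such a cleaner.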
kamalinikongara-gif
Real estate decisions are often influenced by personal experience, market perception, or incomplete information. This project aims to demonstrate how data analysis can be used to better understand property pricing patterns and support more informed decision-making.