Found 185 repositories (showing 30)
WenjieDu
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially observed time series, including SOTA neural-network models for scientific analysis tasks: imputation, classification, clustering, forecasting, anomaly detection, and cleaning of incomplete industrial (irregularly sampled) multivariate time series with NaN missing values.
fipelle
A Julia implementation of basic tools for time series analysis compatible with incomplete data.
Haoyu-ha
Towards Robust Multimodal Sentiment Analysis with Incomplete Data
CmosZhang
Seismic data reconstruction is an important research direction in the field of seismic signal analysis. Complete seismic data can be used to estimate interior images of the Earth, which aids the exploration for resources and research into the shallow structure of the crust for geological and environmental purposes. However, due to severely corrupted seismic traces and slices, harsh detection conditions, and even financial constraints, seismic data usually contain many missing entries and noise. It is therefore necessary to investigate the robust recovery of seismic data from incomplete and noisy observations.
FT-ZHOU-ZZZ
Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer
hawksilent
Code for P-RMF (ACL 2025): Proxy-Driven Robust Multimodal Sentiment Analysis with Incomplete Data.
A dense summary of data analysis techniques (including incomplete mnemonic R code) from the "14.310x - Data Analysis for Social Scientists" MOOC offered by the Massachusetts Institute of Technology (MIT).
xumaomao94
code for "Tensor Train Factorization under Noisy and Incomplete Data with Automatic Rank Estimation" and "Overfitting Avoidance in Tensor Train Factorization and Completion: Prior Analysis and Inference"
Introduction

This project looks at the mergers and acquisitions of 30 publicly traded companies and attempts to determine the stock price at closing. M&As are incredibly difficult to assess, and while a company's intrinsic value and fundamentals play a significant role in predicting whether a merger will be "successful", public sentiment from Wall Street investors is another commonly referenced factor. Brainstorming for this project prompted two notable observations: data on M&As are often incomplete and highly inconsistent given the confidentiality behind these deals, and determining an appropriate dependent variable y for analysis presents a significant challenge (it would most likely require an additional project of its own). The success of a merger could be measured in various ways, but often the unpredictability of management makes it all the more challenging. Culture, reorganization, and leadership shake-ups all play an important role in the success of an M&A but are difficult to quantify. Although I do build and run a model in this project, the complexity of the subject urged me to focus primarily on data gathering and manipulation. Since one would most likely need to compose a dataframe with the attributes necessary to run a useful analysis on mergers and acquisitions, I believe this is a valuable first step. For a more balanced notebook between EDA, data manipulation, and models, see my project on COVID-19's impact on post-secondary education, titled "COVID19 Effects on Post-Secondary Education": https://github.com/clozgil

The Process

My objective was to build a dataframe with useful attributes from scratch. I found that three reports per company would have sufficient information to get started:

* Acquisition data (any and all information on the company's M&A)
* Financial ratios (data to determine the company's fundamentals)
* Stock information (data to gain insight into Wall Street sentiment)

Since downloading, importing, and cleaning each of those files for each of the 30 companies would be cumbersome, I looped over all the data files using the OS module, simultaneously cleaning and merging each one. However, for the purposes of this presentation, I will feature each of my data cleaning techniques for one company: Apple.

Data Sources

For reference only. All necessary data for this project can be found in the data dictionary.

* Acquisition data: https://www.capitaliq.com/CIQDotNet/my/dashboard.aspx (*)
* Financial ratios: https://www-mergentonline-com.pitt.idm.oclc.org/companyfinancials.php?pagetype=ratios&compnumber=46247&period=Quarters&range=50&Submit=Refresh&csrf_token_mol=3680683535 (*)
* Stock info: https://www-mergentonline-com.pitt.idm.oclc.org/equitypricing.php?pagetype=report&compnumber=46247 (*)

(*) = Account required. University of Pittsburgh account used for access.
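The looped cleaning-and-merging step described above can be sketched as follows. This is a minimal stand-in, not the project's actual code: the directory layout, tickers, and column names are hypothetical, and a temporary directory is created so the snippet is self-contained.

```python
import csv
import os
import tempfile

# Hypothetical setup: one ratios file per company, named <TICKER>_ratios.csv.
data_dir = tempfile.mkdtemp()
for ticker in ("AAPL", "MSFT"):
    with open(os.path.join(data_dir, f"{ticker}_ratios.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["quarter", "pe_ratio"])
        writer.writerow(["2020Q1", "24.1"])

# Loop over every data file with the os module, cleaning and merging as we go.
merged = []
for name in sorted(os.listdir(data_dir)):
    if not name.endswith("_ratios.csv"):
        continue  # skip anything that is not a ratios report
    ticker = name.split("_")[0]
    with open(os.path.join(data_dir, name), newline="") as f:
        for row in csv.DictReader(f):
            row["ticker"] = ticker  # tag each row with its company
            merged.append(row)
```

The same pattern extends to the acquisition and stock files: one loop per report type, each tagging rows with the company identifier so the three sources can later be joined on it.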
No description available
ErikBoesen
Bravo is a pit dashboard for FRC which shows data from The Blue Alliance about upcoming matches, along with analysis and many other useful features. INCOMPLETE AND ABANDONED.
Road traffic accident (RTA) data collected are huge, multi-dimensional, and heterogeneous. Moreover, the data may be incomplete and contain erroneous values, which makes analysis a daunting task. The target data for this study were collected by the Department for Transport, GB. Several data mining techniques, such as handling an imbalanced dataset and factor reduction, together with prediction algorithms such as Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, and Support Vector Machines (SVM), were applied to perform an effective data analysis that could potentially support the transport department in devising better precautionary measures to minimize road accident occurrences in Great Britain. Moreover, the idea of chaining two different algorithms was attempted by identifying the significant attributes through the Random Forest technique and feeding them as input to other ML algorithms. In addition, the key factors that influence these road collisions were identified and presented.
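The chaining idea in that description (Random Forest importances selecting attributes for a second algorithm) can be sketched as below. This is a hedged illustration on synthetic data, since the real Department for Transport dataset is not reproduced here; the feature counts and model settings are arbitrary choices, not the study's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the RTA dataset.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: fit a Random Forest and rank attributes by importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
top = np.argsort(rf.feature_importances_)[-5:]  # keep the 5 strongest

# Step 2: feed only those attributes into a second algorithm.
lr = LogisticRegression(max_iter=1000).fit(X_tr[:, top], y_tr)
accuracy = lr.score(X_te[:, top], y_te)
```

Any of the other listed algorithms (Naïve Bayes, SVM, a decision tree) could stand in for the logistic regression in step 2; the chain's structure stays the same.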
klyshko
Datasets, aggregated from different sources, can have missing or incomplete values which impose difficulties on data analysis and research. The proposed project aims to build the pipeline for the recovery of various data: categorical, numerical and textual – collected from multiple resources. In the project, we focus on incomplete data, such as geographical locations (cities, places, highways, coordinates), information sources (news websites, TV channels, articles) and measured features of celestial objects (meteorites’ mass and type).
Chris7462
This is a modified version of the rainbow package in R. We propose using the conditional-expectation approach to functional principal component analysis (FPCA), which can be applied to the functional bagplot and the functional highest density region (HDR) boxplot, making outlier detection possible for incomplete functional data.
jwhite1987
The goal of this project is to take the dataset, an employee database, and create a table schema for each of the six files (located in the data folder). After importing each file into its corresponding table, an analysis is performed on the dataset: the "incomplete" data files are linked and joined together to build a more comprehensive database, coded in a fashion that makes it easy to use and far more detailed. The final result is a database that is much easier to use and understand.
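The link-and-join step that description outlines can be sketched with SQLite. The table and column names here are hypothetical stand-ins for two of the six files, not the project's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Two per-file tables, each "incomplete" on its own.
con.executescript("""
    CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE salaries  (emp_no INTEGER, salary INTEGER,
                            FOREIGN KEY (emp_no) REFERENCES employees(emp_no));
""")
con.executemany("INSERT INTO employees VALUES (?, ?)",
                [(1, "Smith"), (2, "Jones")])
con.executemany("INSERT INTO salaries VALUES (?, ?)",
                [(1, 60000), (2, 72000)])

# Join the per-file tables into one comprehensive view.
rows = con.execute("""
    SELECT e.last_name, s.salary
    FROM employees AS e
    JOIN salaries  AS s ON s.emp_no = e.emp_no
    ORDER BY e.emp_no
""").fetchall()
```

With six files the same pattern repeats: each CSV gets a table with a foreign key back to the employee number, and the comprehensive view joins them all on that key.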
grouptheory
This software implements a new method for obtaining network properties from incomplete data sets. Problems associated with missing data represent well-known stumbling blocks in Social Network Analysis. The method of “estimating connectivity from spanning tree completions” (ECSTC) is specifically designed to address situations where only spanning tree(s) of a network are known, such as those obtained through respondent driven sampling (RDS). Using repeated random completions derived from degree information, this method forgoes the usual step of trying to obtain final edge or vertex rosters, and instead aims to estimate network-centric properties of vertices probabilistically from the spanning trees themselves. In this paper, we discuss the problem of missing data and describe the protocols of our completion method, and finally the results of an experiment where ECSTC was used to estimate graph dependent vertex properties from spanning trees sampled from a graph whose characteristics were known ahead of time. The results show that ECSTC methods hold more promise for obtaining network-centric properties of individuals from a limited set of data than researchers may have previously assumed. Such an approach represents a break with past strategies of working with missing data which have mainly sought means to complete the graph, rather than ECSTC's approach, which is to estimate network properties themselves without deciding on the final edge set.
Arturo-Esquivel
No description available
MaeveLi
R markdown files for IDA
uscensusbureau
Analysis of Incomplete Multivariate Data under a Normal Model
changgee
Supplementary Material for Multiple Imputation for Analysis of Incomplete Data in Distributed Health Data Networks
JKP1575540259
Data for: Extended singular spectrum analysis for processing incomplete and heterogeneous time series
JinchengZ
R code and data for the manuscript "A Bayesian Hierarchical CACE Model Accounting for Incomplete Noncompliance Data in Meta-analysis"
cran
:exclamation: This is a read-only mirror of the CRAN R package repository. norm2 — Analysis of Incomplete Multivariate Data under a Normal Model
futureomics
Exploratory data analysis (EDA) of cancer gene expression with Skrub, a powerful and modern approach to analyzing tabular data, particularly when that data is messy, incomplete, or contains categorical columns.
spathak01
Data cleaning in Excel involves the process of identifying and correcting inaccurate, incomplete, or irrelevant data in a spreadsheet. This is an important step in preparing data for analysis or reporting.
aman040499
This contains all the work and incomplete projects from my data analysis learning path, including dashboards (built with Power BI and IBM Cognos) and Python visualization projects.
Asemota-otasowie
A data analysis project focused on evaluating outreach campaign effectiveness by tracking application progress across countries. Due to incomplete applicant follow-ups, records were categorized into "Completed" and "Not Completed" applications to enable structured analysis and reporting.
pradumansalunkhe
Data cleaning is the process of fixing or removing incorrect, corrupted, duplicate, or incomplete data within a dataset. Messy data leads to unreliable outcomes. Cleaning data is an essential part of data analysis, and demonstrating your data cleaning skills is key to landing a job. Here are some projects to test out your data cleaning skills.
Ankur-Halder
Auto-CSV-Clener is a Python script that automates data cleaning for CSV files. It drops unnecessary or incomplete columns, handles missing values, encodes categorical data, standardizes numerical data, and exports a clean version—ready for analysis or ML models. Perfect for quick, consistent preprocessing.
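The cleaning steps that description lists (dropping incomplete columns, filling missing values, encoding categoricals) can be sketched as below. This is a minimal illustration, not the actual Auto-CSV-Clener script; the column names, the 50% missingness threshold, and the zero-fill imputation are all assumptions made for the example.

```python
import csv
import io

# Toy CSV with a mostly-empty column and scattered missing values.
raw = """age,city,notes
34,Leeds,
,York,
41,,
"""
rows = list(csv.DictReader(io.StringIO(raw)))

# Drop columns where more than half the values are missing.
cols = [c for c in rows[0]
        if sum(1 for r in rows if r[c]) >= len(rows) / 2]

# Fill remaining gaps and label-encode the categorical column.
city_codes = {}
clean = []
for r in rows:
    age = int(r["age"]) if r["age"] else 0   # naive zero-fill imputation
    city = r["city"] or "unknown"
    code = city_codes.setdefault(city, len(city_codes))
    clean.append({"age": age, "city": code})
```

A real pipeline would also standardize numeric columns and write the result back to disk, but the drop/fill/encode skeleton above is the core of such a cleaner.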
kamalinikongara-gif
Real estate decisions are often influenced by personal experience, market perception, or incomplete information. This project aims to demonstrate how data analysis can be used to better understand property pricing patterns and support more informed decision-making.