# Questions tagged [imputation]

Missing data imputation is the process of replacing missing data with substituted, 'best guess', values. Because missing data can create problems for analyzing data and can lead to missing-data bias, imputation is seen as a way to avoid the problems associated with listwise deletion (ignoring all observations with any missing values).
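A minimal pandas sketch of the contrast described above, using made-up data. Listwise deletion discards every row with a missing value, while single mean imputation keeps all rows. Mean imputation is only the simplest "best guess" and understates variability; it is shown for illustration, not as a recommendation.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): two numeric columns with scattered missing values.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0, np.nan],
    "income": [50.0, 62.0, np.nan, 58.0, 47.0],
})

# Listwise deletion: drop every row that has any missing value.
listwise = df.dropna()

# Single mean imputation: replace each NaN with its column mean.
mean_imputed = df.fillna(df.mean())

print(len(df), len(listwise))  # -> 5 2
print(mean_imputed)
```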

**0** votes · **2** answers · 25 views

### Imputation of Missing Values by Categorical Mean?

I have a dataset with several columns, one of which is missing chunks of data that are needed.
The column with missing data, df$Variable, is always attributed to a specific person, df$Name. Is there a ...
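The question is in R, but the group-mean idea carries over directly. Here is a pandas sketch that mirrors the question's column names (Name, Variable); the values themselves are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each row belongs to a person (Name); Variable has gaps.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Ann", "Bob", "Bob"],
    "Variable": [1.0, np.nan, 3.0, 10.0, np.nan],
})

# transform("mean") returns each row's group mean (NaNs are skipped),
# aligned to the original index, so it can be passed straight to fillna.
group_mean = df.groupby("Name")["Variable"].transform("mean")
df["Variable"] = df["Variable"].fillna(group_mean)
print(df)
```

Swapping `"mean"` for `"median"` gives the group-median variant.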

**0** votes · **0** answers · 11 views

### knn imputation of all missing values

I have a large dataset with a lot of missing values; some variables are up to 30% missing. Deletion is not an option. What would be the best way to impute?
For KNN, when I run
df_KNN = pd.DataFrame(data=KNN(k=...
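The question uses fancyimpute's `KNN`. As a swapped-in alternative (not what the asker used), scikit-learn 0.22 and later ships `sklearn.impute.KNNImputer`. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [10.0, np.nan, 30.0, 40.0],
    "c": [5.0, 6.0, 7.0, np.nan],
})

# Each missing entry becomes the mean of that feature over the k nearest rows;
# distance is nan_euclidean, which ignores coordinates missing in either row.
imputer = KNNImputer(n_neighbors=2)
df_knn = pd.DataFrame(imputer.fit_transform(df), columns=df.columns, index=df.index)
print(df_knn)
```

KNN distances are scale-sensitive, so standardizing features first is usually sensible when columns have very different ranges.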

**-3** votes · **1** answer · 42 views

### Interpolation C++ library [closed]

Could you please recommend a high-performance and reliable C++ interpolation library?
Ideally it would provide linear, spline, and monotonic spline methods.
Also, do you know any C++ imputation ...

**0** votes · **0** answers · 19 views

### Using Pipeline to Avoid Data Leakage both for X and y

I followed the example at this link very closely: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html
but used a different dataset (found here: https://...
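A minimal sketch of the leakage-safe pattern for X, on synthetic data: putting the imputer inside a scikit-learn pipeline means its statistics are learned from the training split only and then reused on the test split.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
X[rng.random(X.shape) < 0.1] = np.nan  # knock out ~10% of feature values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Imputer and scaler statistics are learned inside fit() from X_train only;
# predict()/score() on X_test reuse those statistics, so nothing leaks.
model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(model.score(X_test, y_test))
```

Pipeline steps only transform X. For y, scikit-learn offers `TransformedTargetRegressor`, and rows with a missing target are commonly dropped before fitting rather than imputed.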

**1** vote · **1** answer · 22 views

### Multiple imputation in R (mice) - How do I test imputation runs?

I work with a data set of 171 observations of 55 variables with 35 variables having NA's that I want to impute with the mice function:
imp_Data <- mice(Data,m=5,maxit=50,meth='pmm',seed=500)
...

**0** votes · **0** answers · 29 views

### Import Imputed Data from SAS to R

I am working on a project where I get the imputed data from a colleague who uses SAS and I want to analyze it in R. The problem is that I import it into R as a dataframe using:
final<-read....

**2** votes · **1** answer · 35 views

### How to write a function that imputes missing numeric and character values?

I have the following sample data:
ID GLUC TGL HDL LDL HRT MAMM SMOKE
A 88 NA 32 99 Y NA never
B NA 150 60 NA NA no never
C 110 NA NA 120 N NA NA
D NA 200 65 165 ...
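The sample looks like R output, but the logic is language-neutral. A pandas sketch of one simple choice: fill numeric columns with their median and other columns with their most frequent value. Only rows A to C are used, because row D is cut off; `impute_mixed` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def impute_mixed(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with their median and other columns with their mode."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode(dropna=True)
            if not modes.empty:  # an all-missing column has no mode
                # With ties, this simply takes the first mode pandas returns.
                out[col] = out[col].fillna(modes.iloc[0])
    return out

# Rows A-C from the sample above, with NA written as NaN.
df = pd.DataFrame({
    "ID": ["A", "B", "C"],
    "GLUC": [88, np.nan, 110],
    "TGL": [np.nan, 150, np.nan],
    "HDL": [32, 60, np.nan],
    "LDL": [99, np.nan, 120],
    "HRT": ["Y", np.nan, "N"],
    "MAMM": [np.nan, "no", np.nan],
    "SMOKE": ["never", "never", np.nan],
})
print(impute_mixed(df))
```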

**0** votes · **1** answer · 45 views

### Fill missing value with mean of another variable based on categories in R [duplicate]

I want to replace NA values in val2 in each row with the mean of val corresponding to that ID column. Any easy (tidyverse) way to do this?
Also, I want to know how to replace it by mean(na.rm=TRUE) ...

**1** vote · **2** answers · 41 views

### Generate larger synthetic dataset based on a smaller dataset in Python

I have a dataset with 21000 rows (data samples) and 102 columns (features). I would like to have a larger synthetic dataset generated based on the current dataset, say with 100000 rows, so I can use ...

**0** votes · **0** answers · 34 views

### Replace Impute Resample values in a dataframe column(s) on a condition in Python

I have time series sensor data. Each column (SENSOR1, SENSOR2, …) contains sensor readings and also status codes: 'Not in Service' as 'Serv', 'Fail', and 'Config'. These need to be replaced with something meaningful.
...
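One common approach, sketched on hypothetical data: coerce the status codes to NaN, then interpolate over the gaps. Whether interpolation is "meaningful" for a failed sensor is a modelling decision; keeping an indicator column before coercion preserves that information.

```python
import pandas as pd

# Hypothetical daily readings: numbers mixed with status codes.
df = pd.DataFrame(
    {
        "SENSOR1": [1.0, "Serv", 3.0, "Fail", 5.0],
        "SENSOR2": [10.0, 11.0, "Config", 13.0, 14.0],
    },
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
)

# errors="coerce" turns every non-numeric entry (the status codes) into NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Time-weighted linear interpolation across the gaps left by the status codes.
filled = numeric.interpolate(method="time")
print(filled)
```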

**0** votes · **1** answer · 22 views

### per group assign to every start time the latest end time and transport mode that belongs to the highest ID in R

I have a data manipulation problem in which I can solve each imputation individually but not both simultaneously. I have a dataset of tracks grouped by ID (different persons); each track has ...

**0** votes · **1** answer · 41 views

### Imputing missing observation

I am analysing a dataset with over 450k rows; about 100k rows in one of the columns I am looking at (pa1min_) have NA values due to non-responses and other random factors. This column deals with ...

**0** votes · **1** answer · 12 views

### R language Amelia specify prefix of output files

This R statement uses the Amelia package to create output data files containing imputed data:
ds.im <- amelia(ds, m=5, p2s=2)
The names of the 5 output files are: output1.csv to output5.csv
In ...

**0** votes · **1** answer · 47 views

### Pandas, replace NaNs with values from MultiIndex DataFrame

Problem
I have a dataframe with some NaNs that I am trying to fill intelligently based off values from another dataframe. I have not found an efficient way to do this but I suspect there is a way ...
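The question is cut off, so this sketch assumes the lookup frame is keyed by the same columns that identify rows in the target frame (here a hypothetical (site, year) pair). `DataFrame.fillna` with a DataFrame argument aligns on index and columns, so setting the key columns as the index does the matching.

```python
import numpy as np
import pandas as pd

# Hypothetical target frame with gaps, keyed by (site, year).
df = pd.DataFrame({
    "site": ["a", "a", "b", "b"],
    "year": [2019, 2020, 2019, 2020],
    "value": [1.0, np.nan, np.nan, 4.0],
})

# Hypothetical lookup frame with a (site, year) MultiIndex.
lookup = pd.DataFrame(
    {"value": [9.0, 7.0, 8.0]},
    index=pd.MultiIndex.from_tuples([("a", 2020), ("b", 2019), ("b", 2020)], names=["site", "year"]),
)

# Only NaNs whose (site, year, column) label exists in `lookup` get filled;
# values that are already present are left untouched.
filled = df.set_index(["site", "year"]).fillna(lookup).reset_index()
print(filled)
```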

**1** vote · **2** answers · 48 views

### Impute missing values in timeseries via bsts

I work with a minutely timeseries with about 20% missing data (in varying lengths).
As far as I know, Bayesian methods can handle missing data elegantly, and I would like to try to fit a Bayesian time series model ...

**0** votes · **0** answers · 47 views

### NA not permitted in predictors. missForest

I am using missForest in order to impute missing data. I have the data as a data frame and when I put it into the missForest function I get the error:
Error in randomForest.default(x = obsX, y = ...

**0** votes · **0** answers · 28 views

### SAS proc mianalyze EDF

We have a cluster randomized trial with a small number of clusters. The primary endpoint is measured at follow-up, and we have missing data. We proposed to conduct a linear mixed model including ...

**3** votes · **2** answers · 50 views

### Forward fill column with an index-based limit

I want to forward fill a column with a limit, but I want the limit to be based on the index, not on a simple number of rows as the limit argument allows.
For example, say I have the dataframe ...
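A minimal sketch of one way to do this with pandas, assuming a sorted numeric or datetime index. `ffill_index_limit` and `max_gap` are hypothetical names. The idea: record the index value of the last valid observation at every row, forward fill as usual, then blank out rows whose index distance from that observation exceeds the limit.

```python
import numpy as np
import pandas as pd

def ffill_index_limit(s: pd.Series, max_gap) -> pd.Series:
    """Forward-fill `s` only where the index distance to the last valid value
    is at most `max_gap` (use a pd.Timedelta for a DatetimeIndex)."""
    idx = pd.Series(s.index, index=s.index)
    # Index value of the most recent non-missing observation at each row.
    last_valid_idx = idx.where(s.notna()).ffill()
    filled = s.ffill()
    too_far = (idx - last_valid_idx) > max_gap
    return filled.mask(too_far)

s = pd.Series([1.0, np.nan, np.nan, 2.0, np.nan], index=[0, 1, 5, 6, 7])
out = ffill_index_limit(s, max_gap=2)
print(out)
```

Here index 1 is 1 unit after the last value and gets filled, while index 5 is 5 units away and stays NaN.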

**-1** votes · **1** answer · 32 views

### Fine and Gray model in R with imputed datasets

I have a long (vertically stacked) dataset containing 10 imputations (variable "imputation" identifies imputation number). The imputation was done in SAS but I would like to calculate some c-...

**0** votes · **1** answer · 58 views

### Multiple imputation in r using “missForest” on categorical variables

I have a survey dataset with NAs in several columns. Therefore, I decided to perform multiple imputation using the "missForest" package to impute the missing values. This was not a problem, however I ...

**-1** votes · **1** answer · 28 views

### Using imputation models created from amelia or mice in R for new data

Suppose I run one of the missing variable imputation R packages, amelia or mice (or similar), on a large data frame -- let's say 100000 rows and 50 columns -- to get imputations for one particular ...

**1** vote · **0** answers · 101 views

### Is there an R function that performs LASSO regression on multiple imputed datasets and pools results together?

I have a dataset with 283 observations of 60 variables. My outcome variable (Diagnosis) is dichotomous and can be either of two diseases. I am comparing two types of diseases that often show much ...

**2** votes · **0** answers · 287 views

### How to fit and combine submodels into a single stanfit object?

I would appreciate any help to do this:
1. For each P, fit the model for each column of weights.
2. Do step 1 for all observations in the dataset to get p * w submodels, where p is the number of ...

**0** votes · **0** answers · 15 views

### How to solve "aregImpute error: 'column_name' is constant"

I would like to delete some of the entries in my dataframe and impute them by using the remaining information by means of aregImpute function. However, when I randomly delete 25% of the data in some ...

**1** vote · **1** answer · 44 views

### How to impute missing values with KNN

I'm trying to impute missing values from my data frames and for this I use fancyimpute library.
from fancyimpute import KNN
X_filled_knn = KNN(k=3).complete(df_OppLine[['family']])
I've got this ...

**1** vote · **3** answers · 71 views

### When I convert my NumPy array to a DataFrame, it changes the values to NaN

import impyute.imputation.cs as imp
print(Data)
Data = pd.DataFrame(data = imp.em(Data),columns = columns)
print(Data)
When I run the above code, all my values get converted to NaN, as below. Can ...
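The output of `imp.em` isn't shown, so this is an assumption about the cause, but it is a common one: when `data` is already a DataFrame, the `columns` argument of the DataFrame constructor selects existing columns by label instead of renaming them, and labels that don't exist come back as all-NaN columns. A small demonstration:

```python
import numpy as np
import pandas as pd

arr = np.array([[1.0, 2.0], [3.0, 4.0]])
columns = ["x", "y"]

# From a NumPy array, `columns` simply names the columns.
from_array = pd.DataFrame(data=arr, columns=columns)

# From an existing DataFrame (default labels 0 and 1), `columns` selects by
# label; "x" and "y" don't exist, so both resulting columns are all-NaN.
existing = pd.DataFrame(arr)
from_frame = pd.DataFrame(data=existing, columns=columns)

# Renaming instead of selecting keeps the values.
renamed = existing.set_axis(columns, axis=1)
print(from_array, from_frame, renamed, sep="\n\n")
```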

**0** votes · **1** answer · 45 views

### Multiple Imputed datasets - pooling results

I have a dataset containing missing values. I have imputed this dataset, as follows:
library(mice)
id <- c(1,2,3,4,5,6,7,8,9,10)
group <- c(0,1,1,0,1,1,0,1,0,1)
measure_1 <- c(60,80,90,54,...
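In mice, `pool()` combines per-imputation results using Rubin's rules. As a language-neutral illustration with made-up numbers, here is that arithmetic for a single scalar parameter: the pooled estimate is the mean of the estimates, and the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool one scalar parameter across m imputed datasets (Rubin's rules).

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()            # pooled point estimate
    w = u.mean()                # within-imputation variance
    b = q.var(ddof=1)           # between-imputation variance
    t = w + (1 + 1 / m) * b     # total variance
    return q_bar, np.sqrt(t)    # estimate and pooled standard error

est, se = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(est, se)
```

Degrees of freedom for confidence intervals need an additional formula, which this sketch omits.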

**2** votes · **1** answer · 698 views

### Differences between sklearn's SimpleImputer and Imputer

In Python's sklearn library there are two classes that do approximately the same thing:
sklearn.preprocessing.Imputer and sklearn.impute.SimpleImputer
The only difference that I found is ...
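For context, `sklearn.preprocessing.Imputer` was deprecated in scikit-learn 0.20 and removed in 0.22 in favor of `sklearn.impute.SimpleImputer`. One visible difference is that the old class had an `axis` parameter, whereas `SimpleImputer` always imputes column by column. Minimal `SimpleImputer` usage on a toy matrix:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])

# Column-wise mean imputation; statistics_ holds the per-column fill values.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imp.fit_transform(X)
print(imp.statistics_)  # [2. 6.]
print(X_filled)
```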

**12** votes · **3** answers · 160 views

### how to impute the distance to a value

I'd like to fill missing values with a "row distance" to the nearest non-NA value. In other words, how would I convert column x in this sample dataframe into column y?
# x y
#1 0 0
#2 NA 1
#3 ...
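Only the first rows of the target column y are visible, so this sketch assumes "nearest" means in either direction. For each row, it computes the absolute row distance to every observed position and keeps the minimum, which is 0 where the value itself is observed.

```python
import numpy as np

def distance_to_observed(x):
    """Rows to the nearest non-missing value (0 where x itself is observed).
    Assumes at least one observed value."""
    x = np.asarray(x, dtype=float)
    observed = np.flatnonzero(~np.isnan(x))   # positions of non-missing values
    positions = np.arange(len(x))
    # |i - j| for every row i and observed position j, minimized over j.
    return np.abs(positions[:, None] - observed[None, :]).min(axis=1)

print(distance_to_observed([0, np.nan, np.nan, np.nan, 5, np.nan]))  # [0 1 2 1 0 1]
```

The pairwise matrix uses memory proportional to rows times observed values; for very long series, a forward pass plus a backward pass that track the last seen observed position avoids that.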

**-1** votes · **1** answer · 59 views

### Pandas: Missing value imputation based on date

I have a pandas data-frame which is as follows:
df_first = pd.DataFrame({"id": [102, 102, 102, 102, 103, 103], "val1": [np.nan, 4, np.nan, np.nan, 1, np.nan], "val2": [5, np.nan, np.nan, np.nan, np....

**1** vote · **1** answer · 20 views

### How to use cross validation after imputing on a training and validation set?

So I've gotten myself a little confused.
At the moment, I've got a dataset of about 800 instances. I've split it into a training and validation set because there were missing values so I used ...
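One way to avoid juggling separately imputed splits is to impute inside cross-validation. In this sketch, the 800 rows mirror the question's size but the data are synthetic. Wrapping the imputer and model in a pipeline refits the imputer on the training folds of every split, so the held-out fold never influences the fill values.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
X = rng.normal(size=(800, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.15] = np.nan   # ~15% missing, made up for the demo

# The imputer is refit on the training folds of each split.
pipe = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```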

**-1** votes · **1** answer · 44 views

### R program dealing with missing values (Similar to apply function in Python)

I am new to R and currently want to deal with missing values.
Basically, I have a dataset with a few columns and there are missing values in the 'Purchase' column.
I want to impute the ...

**0** votes · **0** answers · 42 views

### Matlab out of memory. Error using pdistmex

I have a matrix M of size 262322x4. When running KNN imputation:
M=csvread("C:\Users\Hello\Desktop\DATA\B.csv",1,0);
B = transpose(M);
A = knnimpute(B,1);
C=transpose(A);
I get the following error:
&...

**1** vote · **0** answers · 27 views

### How to convert multiple imputation data to mids in r?

I used another program to impute the missing values in my data.
.imp is coded as X_mult_ in the CSV file.
After converting the X_mult_ column into .imp and making an .id column, I tried the as.mids() function, but R says "...

**0** votes · **0** answers · 70 views

### Unable to impute missing values in my PCA

I have a character matrix for some different plant species, in which most species are missing data for at least a few characters. I want to do a principal components analysis, so I tried to impute the ...

**0** votes · **0** answers · 35 views

### Reduced number of rows using MICE package for imputation

I have a multivariate time series. I'm using the mice package to fill NAs. This results in a reduced number of rows, which I can't afford because it's time series data.
Unfortunately, I'm unable to ...

**0** votes · **1** answer · 36 views

### How to FIND missing observations within a time series and fill with NAs

I have a 10 year long time series containing daily observations. I've discovered that some of the rows (whole rows, not just observations) from this series are missing, which is problematic for my use ...
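A pandas sketch of the usual approach, on a hypothetical daily series: build the complete daily calendar from the first to the last date, list the dates that are absent, and reindex so that each absent date becomes a NaN row.

```python
import pandas as pd

# Hypothetical daily series with two whole days (Jan 3 and Jan 4) missing.
s = pd.Series(
    [1.0, 2.0, 5.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-05"]),
)

# Build the complete daily calendar and reindex; absent dates become NaN rows.
full_range = pd.date_range(s.index.min(), s.index.max(), freq="D")
missing_dates = full_range.difference(s.index)
s_full = s.reindex(full_range)
print(missing_dates)
print(s_full)
```

For a sorted index with unique dates, `s.asfreq("D")` produces the same filled-out series.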

**1** vote · **1** answer · 25 views

### How to do forward filling for each group in pandas

I have a dataframe similar to the one below
id A B C D E
1 2 3 4 5 5
1 NaN 4 NaN 6 7
2 3 4 5 6 6
2 NaN NaN 5 4 1
I want to do a null value imputation for columns A, B, C in a ...
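The question is cut off, so this sketch assumes the goal named in the title: forward fill A, B, and C within each id, using the sample frame shown above. Grouping first ensures values never carry over from one id to the next.

```python
import numpy as np
import pandas as pd

# The sample frame from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "A": [2, np.nan, 3, np.nan],
    "B": [3, 4, 4, np.nan],
    "C": [4, np.nan, 5, 5],
    "D": [5, 6, 6, 4],
    "E": [5, 7, 6, 1],
})

# Forward-fill A, B, C within each id; the result keeps the original row index.
cols = ["A", "B", "C"]
df[cols] = df.groupby("id")[cols].ffill()
print(df)
```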

**0** votes · **1** answer · 186 views

### change values into missing in KNIME

I have a dataset in which each missing value is written as N/A. How can I change it into an actual missing value inside the column itself?
I've been trying with the Rule Engine node but it just doesn't ...

**0** votes · **0** answers · 19 views

### Is it possible to set up a minimum limit for imputed values in the mice function?

I am using the mice function to impute values for plant height, but I am getting negative values. Is it possible to set up a minimum (and maybe also a maximum) limit for values that are produced?
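In mice itself, the documented route is the `post` argument combined with `squeeze()`. For comparison only, as a swapped-in Python analogue rather than a mice feature, scikit-learn's `IterativeImputer` takes `min_value` (and `max_value`) arguments that clip imputed entries. The data below are hypothetical.

```python
import numpy as np
# IterativeImputer is still experimental and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
height = rng.uniform(0.1, 2.0, size=200)          # hypothetical plant heights
leaf = 3 * height + rng.normal(scale=0.5, size=200)
X = np.column_stack([height, leaf])
X[rng.random(200) < 0.2, 0] = np.nan              # drop ~20% of heights

# min_value clips imputed entries from below; observed values are untouched.
imp = IterativeImputer(min_value=0.0, random_state=0)
X_filled = imp.fit_transform(X)
print(X_filled[:, 0].min())
```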

**-1** votes · **2** answers · 47 views

### Replace NA values with median by group

I have used the below tapply function to get the median of Age based on Pclass.
Now how can I impute those median values to NA values based on Pclass?
tapply(titan_train$Age, titan_train$Pclass, ...

**0** votes · **2** answers · 42 views

### How to permanently remove all NAs?

I am imputing missing variables. The function seems to work at first:
# Replace NA with "None"
vars_to_none = c("Alley", "BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1", "BsmtFinSF1", "...

**0** votes · **0** answers · 26 views

### How to combine results of survIDINR using data imputed by Amelia II?

I am using the "Amelia" and "SurvIDINRI" packages to impute missing data and then run the IDI and NRI analyses.
I could make the 5 results of imputed analyses as below (m=5), but I ...

**1** vote · **0** answers · 44 views

### How to do multiple imputation on Julia?

I've found the package Impute.jl but it's only able to use these simple methods:
drop: remove missing.
locf: last observation carried forward
nocb: next observation carried backward
interp: linear ...
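For readers unfamiliar with these names, the simple methods listed above behave like the following pandas calls. This is a Python illustration, not Impute.jl code. Each one is single imputation, which is why none of them answers the multiple-imputation question.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

dropped = s.dropna()      # drop: remove missing entries
locf = s.ffill()          # last observation carried forward
nocb = s.bfill()          # next observation carried backward
interp = s.interpolate()  # linear interpolation between neighbours

print(pd.DataFrame({"raw": s, "locf": locf, "nocb": nocb, "interp": interp}))
```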

**0** votes · **1** answer · 74 views

### Imputation model for time series missing data in R

Time series data consists of:
Product (categorical); ProductGroup (categorical); Country (categorical); YearSinceProductLaunch (numeric); SalesAtLaunchYear (numeric)
Only "SalesAtLaunchYear" data ...

**0** votes · **0** answers · 45 views

### Memory usage of imputation with mice in R

I am currently working on the imputation of 10 large datasets (by first creating a prediction matrix with correlation of 0.3, dfpred03) with mice in R and I am having a lot of issues like the ...

**1** vote · **1** answer · 32 views

### Combining svyimputationLists

I am working with the Survey of Consumer Finances dataset and am looking to do analysis across years. My initial thought is to combine them into the same svyimputationList, but my attempts don't seem ...

**0** votes · **2** answers · 400 views

### Simple way to do a weighted hot deck imputation in Stata?

I'd like to do a simple weighted hot deck imputation in Stata. In SAS the equivalent command would be the following (and note that this is a newer SAS feature, beginning with SAS/STAT 14.1 in 2015 or ...
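The Stata and SAS commands aren't shown in full, so here is a language-neutral Python sketch of one simple weighted hot-deck variant. Each missing value is filled with a donor drawn, with replacement, from the observed values, with selection probability proportional to the donors' weights. `weighted_hot_deck` is a hypothetical helper. It ignores imputation classes (cells), which real hot-deck setups usually define.

```python
import numpy as np
import pandas as pd

def weighted_hot_deck(values: pd.Series, weights: pd.Series, seed: int = 0) -> pd.Series:
    """Fill each missing value with an observed donor value drawn with
    probability proportional to the donors' weights (with replacement)."""
    rng = np.random.default_rng(seed)
    observed = values.notna()
    donors = values[observed].to_numpy()
    p = weights[observed].to_numpy(dtype=float)
    p = p / p.sum()                                   # normalise to probabilities
    out = values.copy()
    out[~observed] = rng.choice(donors, size=int((~observed).sum()), p=p)
    return out

values = pd.Series([10.0, np.nan, 30.0, np.nan])
weights = pd.Series([1.0, 5.0, 3.0, 2.0])
filled = weighted_hot_deck(values, weights)
print(filled)
```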

**1** vote · **1** answer · 308 views

### Imputation on the test set with fancyimpute

The Python package fancyimpute provides several methods for imputing missing values. The documentation provides examples such as:
# X is the complete data matrix
# X_incomplete has ...
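One swapped-in alternative that separates fitting from applying, offered in place of fancyimpute rather than as its API, is scikit-learn's `IterativeImputer`. Its `fit`/`transform` split lets imputation models learned on training rows be reused on a test set. The data are synthetic.

```python
import numpy as np
# IterativeImputer is still experimental and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
X[:, 2] = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)
X[rng.random(X.shape) < 0.1] = np.nan   # synthetic ~10% missingness

X_train, X_test = train_test_split(X, random_state=0)

# Fit the imputation models on the training rows only...
imputer = IterativeImputer(random_state=0).fit(X_train)
# ...then apply the same fitted models to the unseen test rows.
X_test_filled = imputer.transform(X_test)
```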

**2** votes · **1** answer · 49 views

### Order of preprocessing step in mlr package in R

Working with already implemented preprocessing Wrappers as well as own Wrappers in mlr, I am wondering in which order the preprocessing steps are computed for the following example?
classif.lrn.net = ...