# Questions tagged [pca]

Principal component analysis (PCA) is a statistical technique for dimension reduction, often used in clustering or factor analysis. Given any number of explanatory or causal variables, PCA constructs uncorrelated linear combinations of them (the principal components) and ranks these by the share of variation in the data that each explains. It is this property that allows PCA to be used for dimension reduction, i.e. to identify the most important directions of variation, and the variables that drive them, from amongst a large set of possible influences.
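
As a rough illustration of the idea (not part of the tag description itself), the sketch below runs PCA on a made-up two-variable dataset using only the Python standard library. The data values and the helper name `pca_2d` are hypothetical. For two variables, the eigenvalues of the 2x2 covariance matrix have a closed form, which keeps the example short.

```python
import math

def pca_2d(xs, ys):
    """Return the explained-variance ratios of the two principal components."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] of the centered data.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + radius, half_trace - radius)
    # Each eigenvalue is the variance along one principal component.
    total = sxx + syy
    return tuple(ev / total for ev in eigenvalues)

# Hypothetical data: y is roughly 2 * x plus a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ratios = pca_2d(xs, ys)
print(ratios)
```

Because the two made-up variables are almost perfectly correlated, nearly all of the variance falls on the first component, which is the situation in which dropping the second component loses little information.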

1,775 questions

**1**

vote

**0**answers

27 views

### How to ensure that a loading belongs to a variable in PCA

I have the following piece of code which I'm using to perform PCA on my data:
import pandas as pd
from sklearn.decomposition import PCA
# drop the 'memberid' column
X = df[list(df.columns.difference([...

**0**

votes

**0**answers

25 views

### How to improve Pairwise Euclidean Distance for Similarity Measure

I am trying to identify the most similar stations between two DataFrames like below:
stations feature_1 feature_2 feature_3 ------ feature_10
----------------------------------------------...

**0**

votes

**0**answers

25 views

### Cross validated PCA in R

I am analysing a chemometric dataset of roughly 300 variables and 24 samples. I want to use PCA for variable selection. Previously, I have used the prcomp function and selected the components which ...

**1**

vote

**0**answers

27 views

### sci-kit (sklearn) PCA decomposition transform

To manually perform a transformation, I have to matrix multiply the original data (de-meaned) by PCA().fit(data).components_.T (the transpose of the components) to get the results to match PCA()....
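
The projection described in this excerpt can be sketched with NumPy alone. This is an illustration under assumptions, not the asker's code: the random `data` array is made up, and the component rows come from an SVD of the centered data, standing in for what scikit-learn exposes as `components_`.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(20, 3))  # made-up data: 20 samples, 3 features

mean = data.mean(axis=0)
centered = data - mean  # de-mean each column

# Rows of vt are unit-length principal directions (one per row).
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Manual transform: centered data times the transposed components.
scores = centered @ vt.T

# Mapping back: scores times components, plus the mean, recovers the
# original data when all components are kept.
restored = scores @ vt + mean
```

Keeping only the first k rows of `vt` in both steps gives a k-dimensional projection and an approximate reconstruction instead of an exact one.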

**-1**

votes

**0**answers

7 views

### How do you do PCA with correlation matrix, without any package?

I need a PCA with correlation matrix but without any R package
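
One way to approach this without any package is sketched below, in Python rather than R and using only the standard library. Everything here is an assumption made for illustration: the `rows` data is made up, the helper names `correlation_matrix` and `leading_component` are hypothetical, and power iteration is just one simple method, which finds only the first principal component.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]
    # Standardize each column, then take cross-products of the z-scores.
    z = [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def leading_component(matrix, iterations=500):
    """Power iteration: leading eigenvalue and unit eigenvector of a symmetric matrix.

    Assumes the uniform start vector is not orthogonal to the leading eigenvector.
    """
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' M v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[a] * sum(matrix[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue, v

# Made-up data: 5 observations of 3 variables.
rows = [[2.0, 1.0, 4.0], [3.0, 2.5, 3.0], [5.0, 4.0, 2.5],
        [6.0, 6.5, 1.0], [8.0, 7.0, 0.5]]
corr = correlation_matrix(rows)
eigenvalue, loadings = leading_component(corr)
print(eigenvalue / len(corr))  # share of total variance on the first component
```

Since a correlation matrix has ones on its diagonal, its eigenvalues sum to the number of variables, so dividing an eigenvalue by that number gives the proportion of variance explained.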

**2**

votes

**1**answer

39 views

### Sklearn PCA, how to restore mean in lower dimension?

This question concerns how to de-center and "restore" the data in a lower dimension after performing PCA.
I'm doing a simple principal component analysis with sklearn. As I understand it, the ...

**3**

votes

**0**answers

74 views

### How can I implement this L1 norm Robust PCA equation in a more efficient way?

I recently learned in class that the Principal Component Analysis method aims to approximate a matrix X by the product of two matrices Z*W. If X is an n x d matrix, Z is an n x k matrix and W is a k x d ...

**1**

vote

**1**answer

39 views

### Why does having too many principal components for handwritten digits classification result in less accuracy

I'm currently using PCA to do handwritten digit recognition on the MNIST database (each digit has about 1000 observations and 784 features). One thing I have found confusing is that the accuracy is the ...

**1**

vote

**2**answers

38 views

### PCA with ellipsis without colour in R

I'm trying to make a PCA plot for publication. That means without colours. However, all packages I have tried colour the plot the moment you tell them to group the categories of the data.
I have ...

**0**

votes

**0**answers

8 views

### How to reconstruct original data after doing MCA on a training data set

I did MCA reduction on my dataset (into 2 dimensions) and did K-means clustering (4 clusters).
Now I have the 4 center coordinates of these clusters. How do I reconstruct the original values of these center ...

**2**

votes

**1**answer

27 views

### Normalization before PCA on different data types

Prior to running principal component analysis you should normalize the data so as not to have the results skewed. Under normal situations, this is a fairly simple task. I am curious how I should go about ...

**0**

votes

**1**answer

16 views

### r: dimensionality reduction with PCA in raster brick

Based on the examples here:
[https://stats.stackexchange.com/questions/57467/how-to-perform-dimensionality-reduction-with-pca-in-r/57478#57478][1]
and
[https://stats.stackexchange.com/questions/...

**-4**

votes

**0**answers

25 views

### Eigenvectors get the wrong sign

import numpy as np
A=np.array([ [1,0.36651584,0.48540429,-0.56789174 ],
[0.36651584,1,0.39551461,-0.62873728 ],
[0.48540429,0.39551461,1,-0.32232919 ],
[-0....

**0**

votes

**1**answer

17 views

### Plotting PCA, autoplot() doesn't separate colors by group variable

I am using ggplot2 package and ggfortify to plot PCA results. The last column of my data matrix is a column of four different factors. Name of the column is 'group'.
It is like:
group
a
b
a
c
d
The ...

**0**

votes

**1**answer

24 views

### How to plot PCA with caret in R

I am getting PCA components using preProcess() from caret in R, and getting quantitative results.
dataPCA <- preProcess(data[1:ncol(data)-1], method ="pca", thresh = 0.95)
print(dataPCA)
print(...

**0**

votes

**1**answer

15 views

### How to save a truncated SVD model in Python

I am working on a machine learning project. I have applied truncated SVD on my data for feature reduction and then trained the neural networks on that data. I have saved the neural network model using ...

**1**

vote

**1**answer

48 views

### Interpreting OLS Weights after PCA (in Python)

I want to interpret the regression model weights in a model where the input data has been pre-processed with PCA. In reality, I have 100s of input dimensions which are highly correlated, so I know ...

**-1**

votes

**1**answer

39 views

### list of string to list of colorlover

I have a list of 100 strings and I would like to convert them to a list of colorlover colors to be able to use them in plot.py. I have tried multiple options but it always gives me a different color ...

**-3**

votes

**0**answers

31 views

### Clustering with a huge dataset [closed]

I am trying to create a customer segmentation model using a dataset containing 6.5 Million customers and about 120 mixed variables (numerical and categorical).
I converted all the categorical ...

**0**

votes

**1**answer

29 views

### PCA vs TSNE vs MDS (review cluster)

I have a well-known dataset of reviews from MovieLens and I wish to cluster the users by movie taste.
I'm starting from a dataset like this:
idUser iDmovies review
1 2 1
1 10 2
5 ...

**1**

vote

**1**answer

35 views

### Eigen-values of covariance matrix using QR factorization

Given the matrix X of dimension D x N, I am interested in computing the eigen-values of C = np.dot(X, X.T)/N using QR factorization. Based on following:
we expect the eigen-values of C to be np.diag(r....

**0**

votes

**1**answer

11 views

### How can I preprocess BMP images before using PCA in MATLAB?

I have BMP images. I want to preprocess them in a way that is compatible with using PCA for feature extraction.

**1**

vote

**1**answer

31 views

### How to get BIC/AIC plot for selecting number of Principal Components in Python or R

I want to get a plot like this one for selecting number of components in a PCA:
I am however stuck trying to manually code the BIC/AIC. Are there any packages in either R or Python that can help me ...

**0**

votes

**1**answer

44 views

### How can I import and work with correlation matrix as the only data source in PCA and PCF in R

I am new to R, and am working on a problem of importing and working with a correlation matrix as the only data source in PCA and PCF in R.
I have referred to Stack Overflow answer banks and even books, ...

**0**

votes

**0**answers

13 views

### How do I use PCA and tSNE for sampled weighted data?

I am exploring massive data and later applying PCA and tSNE on full data to extract patterns/clusters. e.g. samples into features = (1,000,000 * 1,000) then 50 PCs (1,000,000 * 50) and tSNE embedding ...

**2**

votes

**1**answer

43 views

### Can I standardize my PCA applied count vector?

I have applied CountVectorizer() on my X_train and it returned a sparse matrix.
Usually if we want to standardize a sparse matrix we pass in the with_mean=False param.
scaler = StandardScaler(with_mean=...

**0**

votes

**0**answers

54 views

### Do I have to fit PCA separately for train and test data

I am considering using PCA (TruncatedSVD) to reduce the number of dimensions of my sparse matrix.
I split my data into train and test sets.
X_train , X_test, y_train, y_test = train_test_split(X, ...

**1**

vote

**1**answer

45 views

### How many principal components should I choose for PCA?

I have a dataframe with a few categorical and numerical features. To that I've concatenated my BoW (CountVectorizer) of a text column, which resulted in more than 56,000 features. So I'm considering doing ...

**0**

votes

**0**answers

11 views

### PCA gives equal values in explained_variance after scaling for 2D dataset

I have a 2D dataset and want to apply PCA to reduce it to 1D. This is what it looks like:
age thalach
0 63 150
1 37 187
2 41 172
3 56 178
4 57 163
If I don't apply scaling, I get the following ...

**0**

votes

**1**answer

26 views

### Sklearn PCA: Correct Dimensionality of PCs

I have a dataframe, df, which contains a column called 'event' wherein there is a 24x24x40 numpy array. I want to:
extract this numpy array;
flatten it into a 1x23040 vector;
add this entry as a ...

**0**

votes

**0**answers

30 views

### ModuleNotFoundError: No module named 'sklearn.utils._joblib'

I'm using Python 3.6 on the Anaconda Jupyter notebooks platform. My PC runs Windows 8.1.
I was trying to import PCA from sklearn using the following lines:
import sklearn
from sklearn import ...

**0**

votes

**1**answer

26 views

### How to avoid the max. amount of input fields for JPMML

I have problems using PMML models in JPMML (scala) with many input fields. Find a minimal example below: Load an image with 300x150 pixel and use this as an input for a PCA (python):
img = PIL.Image....

**0**

votes

**1**answer

49 views

### How can I remove arrows from ggbiplot of PCA in R that are not significant?

So, I am attempting to create a ggbiplot of a PCA of prey order in the diet of diurnal and nocturnal raptors, but the problem is that the ggbiplot function automatically creates arrows for each order. ...

**0**

votes

**0**answers

31 views

### PCA plot reduction dimensionality

I'm trying to cluster using the PCA technique.
In my case I have reviews made by users of n movies.
I create a user x movie table in this way:
User Movie
0 1 2 3 4
0 2 0 5 0 0
1 0 1 ...

**1**

vote

**1**answer

40 views

### R: training random forest using PCA data

I have a data set called Data, with 30 scaled and centered features and 1 outcome with column name OUTCOME, referred to 700k records, stored in data.table format. I computed its PCA, and observed that ...

**0**

votes

**0**answers

9 views

### PCA and tSNE in higher dimensions

The spectral components (eigenvalues) of covariance matrix for very high dimensional data are distorted, i.e. larger eigenvalues become too large and relatively smaller eigenvalues are compressed. Due ...

**0**

votes

**0**answers

7 views

### Can we use manifold learning methods for image compression like PCA?

I'm trying to understand manifold learning for dimensionality reduction. Most examples I see use Isomap or LLE on image data only to project it to 2D where the relationship between different data ...

**1**

vote

**0**answers

23 views

### prcomp( .. ,retx=TRUE), do I get the new data to train over?

I am having some issues in interpreting the results from prcomp().
Say I have a centered and scaled data.table called dat, with N columns and M rows. Indeed every column represents a feature and ...

**1**

vote

**1**answer

47 views

### While using R, PCA and Plotting Cumulative Variance

I am working with R using a scaled dataset and principal component analysis (princomp). Everything works fine but I would like to graph the cumulative % variances of principal components to the whole. ...

**0**

votes

**1**answer

24 views

### PCA: What does it mean that the number of necessary PCs for a given explanation percentage changes?

Say one has a program that performs PCA.
The program calculates the number of PCs necessary in order to cover a given share of total variation in the data, e.g. 95 %.
Say the number of PCs necessary ...

**0**

votes

**1**answer

25 views

### Spark MLlib: PCA on 9570 columns takes too long

1) I am doing a PCA on 9570 columns, giving it 12288 MB of RAM in local mode (which means driver only), and it takes from 1.5 up to 2 hours. This is the code (very simple):
System.out.println("level1\n");
...

**0**

votes

**0**answers

20 views

### The shape_index feature from sklearn not able to apply PCA, due to a NaN error

The image process:
for i in ...
img = Image.open(img_path).convert('L') #get gray img
shape_img = shape_index(img, sigma = 0.1)
images[i] = shape_img
...

**0**

votes

**0**answers

7 views

### PCA analysis of implied volatility surface and use components for future changes in the implied volatility

I need to perform dimension reduction on implied volatility surface returns, i.e. I want to estimate the return in the implied volatility surface over time by using as few datapoints in the implied ...

**2**

votes

**0**answers

49 views

### Compute first few principal components of a large data set, quickly [closed]

I'm working with large data sets (matrices of dimension 6000 x 3072), and using the prcomp() function to do my principal component calculation. However, the function is extremely slow. Even using the ...

**-1**

votes

**2**answers

24 views

### R-Hiding/Deleting Vectors in my biplot with PCA

So I just want to be able to clearly see the points and get rid of the vectors, because I am not interpreting those. Here is my code:
FrogPCA <- prcomp(FrogData[,3:12], center=TRUE, scale=TRUE)
...

**1**

vote

**0**answers

28 views

### Add legend cluster text document

I want to add legend to my plot. I have text documents, I have processed them with PCA in order to be able to plot a 2d graph but I want to have a legend explaining the label of each color for the ...

**1**

vote

**1**answer

51 views

### Python : Merging/joining Dataframe after PCA transform results in NAN

import pickle
import numpy as np
import pandas as pd
from sklearn.externals import joblib
from sklearn.decomposition import PCA
PCA = joblib.load('pcawithstandard.pkl')
with open('collist.pickle', '...

**1**

vote

**0**answers

16 views

### Replicating results of SPSS PCA with Equamax rotation in R

Having used SPSS for many years I now want to switch to R. For consistency in findings I want to replicate previous SPSS results in R. I am not a statistician and I am quite new in learning R. I ...

**1**

vote

**1**answer

73 views

### TypeError: PCA() got an unexpected keyword argument 'n_components'

Hi, I was trying to implement PCA(), but I'm getting an error:
TypeError: PCA() got an unexpected keyword argument 'n_components'.
from sklearn.decomposition import PCA
#Principal component ...

**0**

votes

**1**answer

41 views

### How to select records depending on PCs to reduce dimensionality in RapidMiner?

I am new to RapidMiner. I have a huge dataset and I use principal component analysis to reduce dimensionality. The problem is that when I get the PCs, I do not know how to select the records depending on ...