Questions tagged [pca]

Principal component analysis (PCA) is a statistical technique for dimension reduction, often used in clustering or factor analysis. Given any number of explanatory or causal variables, PCA constructs uncorrelated linear combinations of them (the principal components) and ranks these components by how much of the variation in the data each one explains. It is this property that allows PCA to be used for dimension reduction, i.e. to summarize a large set of possible influences with a few components that capture most of the variation.
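A minimal NumPy sketch of the idea, assuming a numeric data matrix with one sample per row. What gets ranked are the principal components (weighted combinations of all the variables), each with its share of the total variance. The function name `pca_svd` and the random data are illustrative only.

```python
import numpy as np

def pca_svd(X, k):
    """Return the top-k PC scores, directions and explained-variance ratios of X."""
    Xc = X - X.mean(axis=0)                       # center each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (X.shape[0] - 1)                 # variance captured by each component
    ratio = var / var.sum()                       # already sorted, largest first
    return Xc @ Vt[:k].T, Vt[:k], ratio[:k]

rng = np.random.default_rng(0)
# Illustrative data: 200 samples of 5 variables driven by 2 hidden factors plus noise
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
scores, components, ratio = pca_svd(X, k=2)
print(scores.shape, ratio.round(3))
```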

1
vote
0 answers
27 views

How to ensure that a loading belongs to a variable in PCA

I have the following piece of code which I'm using to perform PCA on my data: import pandas as pd from sklearn.decomposition import PCA # drop the 'memberid' column X = df[list(df.columns.difference([...
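In scikit-learn, `components_` has shape (n_components, n_features), and its columns follow the column order of the array passed to `fit`. Note that pandas' `Index.difference` sorts its result by default, so the columns of X (and therefore the loadings) may be in sorted order rather than df's original order; pairing loadings with `X.columns` rather than `df.columns` avoids mismatches. A NumPy sketch with hypothetical feature names:

```python
import numpy as np

# Hypothetical column names, listed in the same order as the columns of X
feature_names = ["age", "income", "visits"]
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * np.array([1.0, 5.0, 0.2])   # income has by far the largest spread

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows = PCs, columns = features (like components_)

# Vt[i, j] is the loading of feature j on component i, so zip each row with the names
loadings = {f"PC{i + 1}": dict(zip(feature_names, Vt[i])) for i in range(Vt.shape[0])}
pc1 = loadings["PC1"]
print(max(pc1, key=lambda name: abs(pc1[name])))   # income
```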
0
votes
0 answers
25 views

How to improve Pairwise Euclidean Distance for Similarity Measure

I am trying to identify the most similar stations between two DataFrames like below: stations feature_1 feature_2 feature_3 ------ feature_10 ----------------------------------------------...
0
votes
0 answers
25 views

Cross validated PCA in R

I am analysing a chemometric dataset of roughly 300 variables and 24 samples. I want to use PCA for variable selection. Previously, I have used the prcomp function and selected the components which ...
1
vote
0 answers
27 views

scikit-learn (sklearn) PCA decomposition transform

To manually perform a transformation, I have to matrix multiply the original data (de-meaned) by PCA().fit(data).components_.T (the transpose of the components) to get the results to match PCA()....
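A quick check of that identity, assuming scikit-learn's default `whiten=False`. `transform()` subtracts the mean learned during `fit` (`mean_`) and projects onto `components_`, so the manual version must use that same training mean, not the mean of whatever data is being transformed. The random data is illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))

pca = PCA(n_components=2).fit(data)
# De-mean with the fitted mean_, then multiply by the transposed components
manual = (data - pca.mean_) @ pca.components_.T
print(np.allclose(manual, pca.transform(data)))   # True (whiten=False, the default)
```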
-1
votes
0 answers
7 views

How do you do PCA with correlation matrix, without any package?

I need to run PCA on the correlation matrix without using any R package.
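A package-free sketch, written in Python/NumPy with illustrative data: standardize, eigendecompose the correlation matrix, and project. The same steps map onto R's built-in `cor()`, `eigen()` and `%*%`. Each score column's variance equals its eigenvalue, and the eigenvalues sum to the number of variables.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix using only NumPy linear algebra."""
    R = np.corrcoef(X, rowvar=False)                  # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)              # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized data
    return eigvals, eigvecs, Z @ eigvecs              # scores = projections onto eigenvectors

rng = np.random.default_rng(2)
# Illustrative data on very different scales, with two correlated columns
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])
X[:, 1] += 5 * X[:, 0]
eigvals, eigvecs, scores = pca_correlation(X)
print(eigvals.round(3), eigvals.sum())
```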
2
votes
1 answer
39 views

Sklearn PCA, how to restore mean in lower dimension?

This question concerns how to de-center and "restore" the data in a lower dimension after performing PCA. I'm doing a simple principal component analysis with sklearn. As I understand it, the ...
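A NumPy sketch of the round trip, assuming plain (non-whitened) PCA and illustrative data. The scores live in a centered coordinate system, so the mean is added back after mapping to the original space; with scikit-learn's default `whiten=False`, `PCA.inverse_transform` computes the same `Z @ components_ + mean_`. If uncentered coordinates in the lower dimension are wanted instead, projecting the raw data gives the scores shifted by the projected mean.

```python
import numpy as np

rng = np.random.default_rng(3)
# Illustrative 3-D data that really lies on a 2-D plane, offset from the origin
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0, 0.5],
                                          [0.0, 1.0, 2.0]]) + np.array([10.0, -5.0, 2.0])

mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
W = Vt[:2]                        # top-2 principal directions, one per row

Z = (X - mean) @ W.T              # 2-D scores: centered, so their mean is ~0
X_restored = Z @ W + mean         # map back to 3-D and add the mean back
Z_uncentered = X @ W.T            # alternative: 2-D coordinates that keep the offset
print(np.abs(X - X_restored).max())
```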
3
votes
0 answers
74 views

How can I implement this L1 norm Robust PCA equation in a more efficient way?

I recently learned in class that the Principal Component Analysis method aims to approximate a matrix X by the product of two matrices Z*W. If X is an n x d matrix, Z is an n x k matrix and W is a k x d ...
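The exact equation is cut off in the excerpt, so this sketch assumes the common entrywise L1 objective: minimize sum(|X - Z W|) over Z (n x k) and W (k x d). It uses iteratively reweighted least squares (IRLS), a standard technique swapped in here for illustration and not necessarily the method taught in the class. The speed-up comes from solving all the small k x k systems in one batched `np.linalg.solve` call instead of looping in Python. `l1_factorize` and the test data are hypothetical.

```python
import numpy as np

def l1_factorize(X, k, n_iter=50, eps=1e-6):
    """Approximately minimize sum(|X - Z @ W|) with IRLS (hypothetical helper).

    X is n x d, Z is n x k, W is k x d.
    """
    # Start from the ordinary (L2) PCA-style factorization given by a truncated SVD
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z, W = U[:, :k] * s[:k], Vt[:k]
    for _ in range(n_iter):
        # Weights 1/|residual| turn the L1 loss into a weighted least-squares problem
        wts = 1.0 / np.maximum(np.abs(X - Z @ W), eps)
        # Update every row of Z at once: n stacked k x k normal equations
        A = np.einsum("kd,nd,ld->nkl", W, wts, W)
        b = np.einsum("kd,nd,nd->nk", W, wts, X)
        Z = np.linalg.solve(A, b[..., None])[..., 0]
        wts = 1.0 / np.maximum(np.abs(X - Z @ W), eps)
        # Update every column of W at once: d stacked k x k normal equations
        A = np.einsum("nk,nd,nl->dkl", Z, wts, Z)
        b = np.einsum("nk,nd,nd->dk", Z, wts, X)
        W = np.linalg.solve(A, b[..., None])[..., 0].T
    return Z, W

rng = np.random.default_rng(4)
clean = 3 * rng.normal(size=(60, 2)) @ rng.normal(size=(2, 15))   # rank-2 signal
X = clean.copy()
outliers = rng.random(X.shape) < 0.05                              # ~5% gross errors
X[outliers] += rng.choice([-10.0, 10.0], size=outliers.sum())
Z, W = l1_factorize(X, k=2)
print(np.abs(X - Z @ W).sum())
```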
1
vote
1 answer
39 views

Why does having too many principal components for handwritten digits classification result in less accuracy

I'm currently using PCA to do handwritten digit recognition on the MNIST database (each digit has about 1000 observations and 784 features). One thing I have found confusing is that the accuracy is the ...
1
vote
2 answers
38 views

PCA with ellipses but without colour in R

I'm trying to make a PCA plot for publication, which means without colours. However, all the packages I have tried colour the plot as soon as you tell them to group the categories of the data. I have ...
0
votes
0 answers
8 views

How to reconstruct original data after doing MCA on a training data set

I did MCA reduction on my dataset (into 2 dimensions) and then K-means clustering (4 clusters). Now I have 4 center coordinates for these clusters. How can I reconstruct the original values of these center ...
2
votes
1 answer
27 views

Normalization before PCA on different data types

Prior to running principal component analysis, you should normalize the data so as not to skew the results. Under normal situations, this is a fairly simple task. I am curious how I should go about ...
0
votes
1 answer
16 views

r: dimensionality reduction with PCA in raster brick

Based on the examples here: https://stats.stackexchange.com/questions/57467/how-to-perform-dimensionality-reduction-with-pca-in-r/57478#57478 and https://stats.stackexchange.com/questions/...
-4
votes
0 answers
25 views

Eigenvectors get the wrong sign

import numpy as np A=np.array([ [1,0.36651584,0.48540429,-0.56789174 ], [0.36651584,1,0.39551461,-0.62873728 ], [0.48540429,0.39551461,1,-0.32232919 ], [-0....
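Eigenvectors are only determined up to sign: if A v = λ v, then A(-v) = λ(-v) as well, so NumPy, R or a textbook can legitimately return opposite signs. A minimal sketch of one common fix, assuming the convention "largest-magnitude entry positive" (any fixed convention works). The 3 x 3 matrix is illustrative, since the matrix in the excerpt is truncated.

```python
import numpy as np

def fix_signs(eigvecs):
    """Flip each eigenvector (column) so that its largest-magnitude entry is positive."""
    cols = np.arange(eigvecs.shape[1])
    rows = np.argmax(np.abs(eigvecs), axis=0)
    return eigvecs * np.sign(eigvecs[rows, cols])

# Illustrative symmetric, correlation-like matrix
A = np.array([[1.0, 0.4, -0.5],
              [0.4, 1.0, -0.6],
              [-0.5, -0.6, 1.0]])
eigvals, eigvecs = np.linalg.eigh(A)
v, lam = eigvecs[:, -1], eigvals[-1]
# Both signs satisfy the eigen-equation, so neither is "wrong"
print(np.allclose(A @ v, lam * v), np.allclose(A @ -v, lam * -v))   # True True
eigvecs = fix_signs(eigvecs)
```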
0
votes
1 answer
17 views

Plotting PCA, autoplot() doesn't separate colors by group variable

I am using the ggplot2 and ggfortify packages to plot PCA results. The last column of my data matrix is a column of four different factors. The column is named 'group'. It is like: group a b a c d The ...
0
votes
1 answer
24 views

How to plot PCA with caret in R

I am getting PCA components using preProcess() from caret in R, and getting quantitative results. dataPCA <- preProcess(data[1:ncol(data)-1], method ="pca", thresh = 0.95) print(dataPCA) print(...
0
votes
1 answer
15 views

How to save a truncated SVD model in Python

I am working on a machine learning project. I have applied truncated svd on my data for feature reduction and then trained the neural networks on that data. I have saved the neural network model using ...
1
vote
1 answer
48 views

Interpreting OLS Weights after PCA (in Python)

I want to interpret the regression model weights in a model where the input data has been pre-processed with PCA. In reality, I have 100s of input dimensions which are highly correlated, so I know ...
-1
votes
1 answer
39 views

List of strings to a list of colorlover colors

I have a list of 100 strings and I would like to convert them to a list of colorlover colors to be able to use them in plot.py. I have tried multiple options but it always gives me a different color ...
-3
votes
0 answers
31 views

Clustering with a huge dataset [closed]

I am trying to create a customer segmentation model using a dataset containing 6.5 Million customers and about 120 mixed variables (numerical and categorical). I converted all the categorical ...
0
votes
1 answer
29 views

PCA vs t-SNE vs MDS (review clustering)

I have the well-known MovieLens dataset of reviews, and I wish to cluster users by movie taste. I'm starting from a dataset like this: idUser iDmovies review 1 2 1 1 10 2 5 ...
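The excerpt's data is in "long" format (one row per user, movie and review). A NumPy sketch of turning such triples into the user x movie matrix that PCA, t-SNE or MDS expect; the ids and ratings below are made up, and filling unrated movies with 0 is an assumption that strongly affects the result. With pandas, `DataFrame.pivot_table(index=..., columns=..., values=..., fill_value=0)` does the same job.

```python
import numpy as np

# Illustrative (user, movie, rating) triples in long format
users   = np.array([1, 1, 2, 3, 3, 3])
movies  = np.array([2, 10, 2, 5, 10, 7])
ratings = np.array([1.0, 2.0, 5.0, 4.0, 3.0, 2.0])

# Map arbitrary ids to consecutive row/column positions (ids come back sorted)
user_ids, row = np.unique(users, return_inverse=True)
movie_ids, col = np.unique(movies, return_inverse=True)

M = np.zeros((user_ids.size, movie_ids.size))   # 0 means "not rated" (an assumption)
M[row, col] = ratings                           # one row per user, one column per movie
print(movie_ids)
print(M)
```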
1
vote
1 answer
35 views

Eigenvalues of a covariance matrix using QR factorization

Given the matrix X of dimension D x N, I am interested in computing the eigenvalues of C = np.dot(X, X.T)/N using QR factorization. Based on the following: we expect the eigenvalues of C to be np.diag(r....
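A single QR factorization does not, in general, put the eigenvalues on the diagonal of R. The classical way to get eigenvalues from QR is the iterative QR algorithm, sketched below under simple assumptions: symmetric C, well-separated eigenvalues, and no shifts or deflation, so convergence can be slow in harder cases. The row scaling of X is illustrative, chosen so the eigenvalues are far apart.

```python
import numpy as np

def qr_eigenvalues(C, n_iter=500):
    """Unshifted QR algorithm: factor A_k = Q_k R_k, then set A_{k+1} = R_k Q_k.

    R Q = Q^T A Q is a similarity transform, so eigenvalues are preserved; for a
    symmetric C with well-separated eigenvalues, A_k tends to a diagonal matrix.
    """
    A = np.array(C, dtype=float)
    for _ in range(n_iter):
        Q, R = np.linalg.qr(A)
        A = R @ Q
    return np.sort(np.diag(A))[::-1]

rng = np.random.default_rng(5)
D, N = 4, 1000
X = rng.normal(size=(D, N)) * np.array([[4.0], [3.0], [2.0], [1.0]])   # D x N, rows = variables
C = X @ X.T / N
print(qr_eigenvalues(C))
print(np.linalg.eigvalsh(C)[::-1])   # reference values from LAPACK
```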
0
votes
1 answer
11 views

How should I preprocess BMP images before applying PCA in MATLAB?

I have BMP images. I want to preprocess them in a way that is compatible with using PCA for feature extraction.
1
vote
1 answer
31 views

How to get BIC/AIC plot for selecting number of Principal Components in Python or R

I want to get a plot like this one for selecting number of components in a PCA: I am however stuck trying to manually code the BIC/AIC. Are there any packages in either R or Python that can help me ...
0
votes
1 answer
44 views

How can I import and work with a correlation matrix as the only data source for PCA and PCF in R?

I am new to R, and am working on the problem of importing and working with a correlation matrix as the only data source for PCA and PCF in R. I have referred to Stack Overflow answers and even books, ...
0
votes
0 answers
13 views

How do I use PCA and tSNE for sampled weighted data?

I am exploring massive data and later applying PCA and tSNE on full data to extract patterns/clusters. e.g. samples into features = (1,000,000 * 1,000) then 50 PCs (1,000,000 * 50) and tSNE embedding ...
2
votes
1 answer
43 views

Can I standardize my PCA applied count vector?

I have applied CountVectorizer() on my X_train and it returned a sparse matrix. Usually if we want to Standardize sparse matrix we pass in with_mean=False param. scaler = StandardScaler(with_mean=...
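This is why `with_mean=False` matters for sparse input: scaling each column by its standard deviation keeps every zero a zero, while subtracting the column mean turns most zeros into nonzeros and destroys sparsity. A small dense NumPy illustration, assuming the population standard deviation (as `StandardScaler` uses) and a made-up count matrix.

```python
import numpy as np

# Tiny stand-in for a bag-of-words count matrix (mostly zeros)
counts = np.array([[0, 3, 0, 1],
                   [2, 0, 0, 0],
                   [0, 1, 4, 0],
                   [0, 0, 0, 5]], dtype=float)

std = counts.std(axis=0)                    # population std (ddof=0)
std[std == 0] = 1.0                         # leave constant columns unscaled
scaled = counts / std                       # scaling only: zeros stay zeros
centered = counts - counts.mean(axis=0)     # centering: zeros become nonzero

print(np.count_nonzero(counts), np.count_nonzero(scaled), np.count_nonzero(centered))  # 6 6 15
```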
0
votes
0 answers
54 views

Do I have to fit PCA separately for train and test data?

I am considering using PCA (TruncatedSVD) to reduce the number of dimensions of my sparse matrix. I split my data into train and test sets. X_train , X_test, y_train, y_test = train_test_split(X, ...
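The usual pattern is to fit once on the training split and reuse the fitted mean and components on the test split. In scikit-learn this is `fit_transform` on `X_train`, then `transform` on `X_test`, for `PCA` and `TruncatedSVD` alike (`TruncatedSVD` skips the centering step). A minimal NumPy sketch of that pattern, with hypothetical helper names and random data.

```python
import numpy as np

def fit_pca(X_train, k):
    """Learn the centering vector and top-k directions from training data only."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:k]

def transform(X, mean, components):
    """Apply the *training* mean and components to any data set."""
    return (X - mean) @ components.T

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 6))
X_train, X_test = X[:100], X[100:]

mean, components = fit_pca(X_train, k=3)
Z_train = transform(X_train, mean, components)
Z_test = transform(X_test, mean, components)   # no refitting on the test split
```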
1
vote
1 answer
45 views

How many principal components should I choose for PCA?

I have a dataframe with a few categorical and numerical features. To that I've concatenated my BoW (CountVectorizer) of a text column, which resulted in more than 56,000 features. So I'm considering doing ...
0
votes
0 answers
11 views

PCA gives equal values in explained_variance after scaling for 2D dataset

I have a 2D dataset and want to apply PCA to reduce it to 1D. This is what it looks like: age thalach 0 63 150 1 37 187 2 41 172 3 56 178 4 57 163 If I don't apply scaling, I get the following ...
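After standardizing two variables, their covariance matrix is the correlation matrix [[1, r], [r, 1]], whose eigenvalues are 1 + |r| and 1 - |r|. The two explained variances therefore always sum to 2 and are equal only when r = 0. A NumPy check using just the five rows shown in the excerpt (the full data set is larger, so its r will differ):

```python
import numpy as np

# The five rows shown in the question (age, thalach)
age = np.array([63, 37, 41, 56, 57], dtype=float)
thalach = np.array([150, 187, 172, 178, 163], dtype=float)
X = np.column_stack([age, thalach])

Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each column
eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False, ddof=0))[::-1]
r = np.corrcoef(age, thalach)[0, 1]
print(eigvals, 1 + abs(r), 1 - abs(r))
print(eigvals / eigvals.sum())                 # explained-variance ratios (1 ± |r|) / 2
```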
0
votes
1 answer
26 views

Sklearn PCA: Correct Dimensionality of PCs

I have a dataframe, df, which contains a column called 'event' wherein there is a 24x24x40 numpy array. I want to: extract this numpy array; flatten it into a 1x23040 vector; add this entry as a ...
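A NumPy sketch of the flatten-and-stack step, assuming the `event` column holds equally shaped 24 x 24 x 40 arrays (random stand-ins here; with the real DataFrame, `np.stack([e.ravel() for e in df['event']])` builds the same matrix). Each principal component then has 24 * 24 * 40 = 23040 entries, matching scikit-learn's `components_` shape of (n_components, n_features), and can be reshaped back onto the original grid.

```python
import numpy as np

rng = np.random.default_rng(7)
# Stand-in for df['event']: 30 arrays of shape 24 x 24 x 40 (random values)
events = [rng.normal(size=(24, 24, 40)) for _ in range(30)]

# One row per event, one column per grid cell: shape (30, 23040)
X = np.stack([e.ravel() for e in events])

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:5]                               # each PC has 23040 entries
pc1_volume = components[0].reshape(24, 24, 40)    # view PC1 on the original grid
print(X.shape, components.shape, pc1_volume.shape)
```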
0
votes
0 answers
30 views

ModuleNotFoundError: No module named 'sklearn.utils._joblib'

I'm using Python 3.6 on the Anaconda Jupyter Notebook platform. My PC runs Windows 8.1. I was trying to import PCA from sklearn using the following lines: import sklearn from sklearn import ...
0
votes
1 answer
26 views

How to avoid the max. amount of input fields for JPMML

I have problems using PMML models in JPMML (Scala) with many input fields. Find a minimal example below: Load an image with 300x150 pixels and use this as an input for a PCA (Python): img = PIL.Image....
0
votes
1 answer
49 views

How can I remove arrows from ggbiplot of PCA in R that are not significant?

So, I am attempting to create a ggbiplot of a PCA of prey order in the diet of diurnal and nocturnal raptors, but the problem is that the ggbiplot function automatically creates arrows for each order. ...
0
votes
0 answers
31 views

PCA plot for dimensionality reduction

I'm trying to cluster using the PCA technique. In my case I have reviews made by users of n movies. I created a user x movie table this way: User Movie 0 1 2 3 4 0 2 0 5 0 0 1 0 1 ...
1
vote
1 answer
40 views

R: training random forest using PCA data

I have a data set called Data, with 30 scaled and centered features and 1 outcome with column name OUTCOME, referred to 700k records, stored in data.table format. I computed its PCA, and observed that ...
0
votes
0 answers
9 views

PCA and tSNE in higher dimensions

The spectral components (eigenvalues) of covariance matrix for very high dimensional data are distorted, i.e. larger eigenvalues become too large and relatively smaller eigenvalues are compressed. Due ...
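A small NumPy illustration of that distortion, assuming pure Gaussian noise with identity covariance: every true eigenvalue is 1, yet with γ = p/n the sample eigenvalues spread over roughly the Marchenko-Pastur interval [(1 - √γ)², (1 + √γ)²]. The sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 400, 200                       # samples and dimensions, so gamma = p / n = 0.5
X = rng.normal(size=(n, p))           # true covariance = identity: every eigenvalue is 1

sample_eigs = np.linalg.eigvalsh(X.T @ X / n)   # known zero mean, so no centering
gamma = p / n
mp_low, mp_high = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
print(f"sample eigenvalues: {sample_eigs.min():.2f} to {sample_eigs.max():.2f}")
print(f"Marchenko-Pastur edges: {mp_low:.2f} to {mp_high:.2f}")
```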
0
votes
0 answers
7 views

Can we use manifold learning methods for image compression like PCA?

I'm trying to understand manifold learning for dimensionality reduction. Most examples I see use Isomap or LLE on image data only to project it to 2D, where the relationship between different data ...
1
vote
0 answers
23 views

prcomp( .. ,retx=TRUE), do I get the new data to train over?

I am having some issues in interpreting the results from prcomp(). Say I have a centered and scaled data.table called dat, with N columns and M rows. Indeed every column represents a feature and ...
1
vote
1 answer
47 views

While using R, PCA and Plotting Cumulative Variance

I am working with R using a scaled dataset and principal component analysis (princomp). Everything works fine, but I would like to graph the cumulative % variance of the principal components relative to the whole. ...
0
votes
1 answer
24 views

PCA: What does it mean that the number of necessary PCs for a given explanation percentage changes?

Say one has a program that performs PCA. The program calculates the number of PCs necessary in order to cover a given share of total variation in the data, e.g. 95 %. Say the number of PCs necessary ...
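The count is read off the cumulative explained-variance curve, so anything that changes how variance is spread across components (scaling, adding or removing variables, different samples) can move it. A minimal sketch of the computation, with made-up eigenvalues and a hypothetical helper name.

```python
import numpy as np

def n_components_for(variances, threshold=0.95):
    """Smallest number of leading PCs whose cumulative variance share reaches threshold."""
    ratio = np.asarray(variances, dtype=float)
    ratio = ratio / ratio.sum()                 # explained-variance ratio per PC
    cumulative = np.cumsum(ratio)
    # First index where cumulative >= threshold; clipping guards against rounding at 1.0
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, ratio.size)
    return k, cumulative

# Illustrative PC variances (eigenvalues), largest first
k, cumulative = n_components_for([5.0, 2.5, 1.5, 0.6, 0.3, 0.1], threshold=0.95)
print(k)   # 4
```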
0
votes
1 answer
25 views

Spark MLlib: PCA on 9570 columns takes too long

1) I am doing a PCA on 9570 columns, giving it 12288 MB of RAM in local mode (which means driver only), and it takes from 1.5 up to 2 hours. This is the code (very simple): System.out.println("level1\n"); ...
0
votes
0 answers
20 views

The shape_index feature from sklearn not able to apply PCA, due to a NaN error

The image process: for i in ... img = Image.open(img_path).convert('L') #get gray img shape_img = shape_index(img, sigma = 0.1) images[i] = shape_img ...
0
votes
0 answers
7 views

PCA analysis of implied volatility surface and use components for future changes in the implied volatility

I need to perform dimension reduction on implied volatility surface returns, i.e. I want to estimate the return in the implied volatility surface over time by using as few datapoints in the implied ...
2
votes
0 answers
49 views

Compute first few principal components of a large data set, quickly [closed]

I'm working with large data sets (matrices of dimension 6000 x 3072), and using the prcomp() function to do my principal component calculation. However, the function is extremely slow. Even using the ...
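A common way to get only the first few components quickly is a randomized SVD. Below is a minimal NumPy sketch of the randomized range-finder idea (Halko, Martinsson and Tropp), with illustrative sizes and parameter defaults. scikit-learn's `PCA(svd_solver="randomized")` is built on the same idea, and in R the `irlba` package offers a fast partial SVD.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_iter=2, seed=0):
    """Approximate top-k principal directions of X with a randomized SVD (sketch)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                                  # PCA needs centered data
    # Sample the column space of Xc with a random Gaussian test matrix
    Y = Xc @ rng.normal(size=(Xc.shape[1], k + oversample))
    for _ in range(n_iter):                                  # power iterations sharpen the spectrum
        Y, _ = np.linalg.qr(Y)
        Y = Xc @ (Xc.T @ Y)
    Q, _ = np.linalg.qr(Y)                                   # n x (k + oversample) orthonormal basis
    B = Q.T @ Xc                                             # small (k + oversample) x d matrix
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return s[:k], Vt[:k]                                     # singular values, directions (rows)

rng = np.random.default_rng(9)
# Illustrative 300 x 200 matrix: rank-5 structure plus a little noise
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 200)) + 0.01 * rng.normal(size=(300, 200))
s, Vt = randomized_pca(X, k=5)
```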
-1
votes
2 answers
24 views

R: Hiding/deleting vectors in my PCA biplot

So I just want to be able to see the points clearly and get rid of the vectors, because I am not interpreting those. Here is my code: FrogPCA <- prcomp(FrogData[,3:12], center=TRUE, scale=TRUE) ...
1
vote
0 answers
28 views

Add a legend to a cluster plot of text documents

I want to add a legend to my plot. I have text documents that I have processed with PCA in order to plot a 2D graph, but I want a legend explaining the label of each color for the ...
1
vote
1 answer
51 views

Python : Merging/joining Dataframe after PCA transform results in NAN

import pickle import numpy as np import pandas as pd from sklearn.externals import joblib from sklearn.decomposition import PCA PCA = joblib.load('pcawithstandard.pkl') with open('collist.pickle', '...
1
vote
0 answers
16 views

Replicating results of SPSS PCA with Equamax rotation in R

Having used SPSS for many years, I now want to switch to R. For consistency in findings I want to replicate previous SPSS results in R. I am not a statistician and I am quite new to R. I ...
1
vote
1 answer
73 views

TypeError: PCA() got an unexpected keyword argument 'n_components'

Hi I was trying to implement the PCA(), but I'm getting an error, ' TypeError: PCA() got an unexpected keyword argument 'n_components'. from sklearn.decomposition import PCA #Principal component ...
0
votes
1 answer
41 views

How to select records based on PCs to reduce dimensionality in RapidMiner?

I am new to RapidMiner. I have a huge dataset and I use principal component analysis to reduce dimensionality; the problem is that when I get the PCs, I do not know how to select the records depending on ...
