1

I have a dataframe with 3 columns: variable1, variable2, value. value is measured from all possible combinations of variable1 and variable2 (all-against-all). Also, variable1 and variable2 have the same names. When reshaping my dataframe using the built-in pivot function, this is not done in a certain order.

This is how my original dataframe looks like:

var1 var2   value
A   A   0,00016
A   B   0,02848
A   C   0,00028
A   D   0,0028
A   E   0,00012
A   F   0,00092
A   G   0,08612
A   H   0,00704
B   B   0,00364
B   C   2,27228
B   D   0,00244
B   E   0,00136
B   F   0,00024
B   G   0,00504
B   H   1,08716
C   C   0,00032
C   D   4,0033
C   E   0,00024
C   F   0,00012
C   G   0
C   H   0,00592
D   D   0,01288
D   E   0,00268
D   F   0,00644
D   G   0,00012
D   H   5,57488
E   E   0,00048
E   F   0,00012
E   G   0,0886
E   H   0,01948
F   F   0,00016
F   G   0,00188
F   H   0,0212
G   G   0,0014
G   H   0,00244
H   H   0,00092

I tried reordering my reshaped table after generating it, but this does not solve the problem because it is generated in the wrong order.

This is how I generate my reshaped dataframe:

df = pd.read_csv("results.csv", sep =";")
# sort names case-insensitive
columns = sorted(df['var1'].unique(), key=lambda s: s.casefold())
pivot = pd.pivot_table(df, values ="value", index = ["var1"], columns ="var2")
pivot = pivot.reindex_axis(columns, axis = 1)
pivot = pivot.reindex_axis(columns, axis = 0)

This generates something like this:

var1    A   B   C   D   E   F   G   H
A   0,00016 0,02848 0,00028 0,0028  0,00012 0,00092 0,08612 0,00704
B       0,00364     0,00244     0,00024     1,08716 
C       2,27228 0,00032 4,00E-05    0,00024 0,00012 0   0,00592
D               0,01288     0,00644     5,57488 
E       0,00136     0,00268 0,00048 0,00012 0,0886  0,01948
F                       0,00016     0,0212  
G       0,00504     0,00012     0,00188 0,0014  0,00244
H                               0,00092 

This is how I tried reordering it after generating it, but it does not solve the problem because the reshaped dataframe is generated in the wrong order. Hence, this is useless because the alphabetical order is not respected.

pivot['Total'] = pivot.count(axis = 1)
pivot = pivot.sort_values(by="Total" , ascending = False)
pivot = pivot.drop(columns = ["Total"])
pivot.loc['Total']= pivot.count()
pivot = pivot.sort_values(by ="Total", axis = 1, ascending = False)
pivot = pivot.drop(index = ["Total"])

However, I would like to have my dataframe symmetrical like this:

var1    A   B   C   D   E   F   G   H
A   0,00016 0,02848 0,00028 0,0028  0,00012 0,00092 0,08612 0,00704
B       0,00364 2,27228 0,00244 0,00136 0,00024 0,00504 1,08716 
C           0,00032 4,0033  0,00024 0,00012 0   0,00592
D               0,01288 0,00268 0,00644 0,00012 5,57488 
E                   0,00048 0,00012 0,0886  0,01948
F                       0,00016 0,00188 0,0212  
G                           0,0014  0,00244
H                               0,00092 

(Formatting looked good in the editor but it is not displayed correctly. I hope you get the point.)

  • "I hope you get the point" Unfortunately I don't. Your last two dataframes are unreadable. Please reformat it correctly. – Erfan Apr 15 at 23:37
0

If I understand you correctly.

result = pd.pivot_table(df, values='value', index='var1',
                        columns='var2', aggfunc=lambda x: x)

result.fillna('')

var2        A        B        C        D        E        F        G        H
var1                                                                        
A     0,00016  0,02848  0,00028   0,0028  0,00012  0,00092  0,08612  0,00704
B              0,00364  2,27228  0,00244  0,00136  0,00024  0,00504  1,08716
C                       0,00032   4,0033  0,00024  0,00012        0  0,00592
D                                0,01288  0,00268  0,00644  0,00012  5,57488
E                                         0,00048  0,00012   0,0886  0,01948
F                                                  0,00016  0,00188   0,0212
G                                                            0,0014  0,00244
H                                                                     0,0009

Use pd.pivot_table with custom aggfunc.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged or ask your own question.