
Below is the structure of my data frame. I need to group by Id, Country, and State and aggregate Vector_1 and Vector_2 element-wise. Could someone suggest how to add the vectors across multiple columns?

Id  Country  State  Vector_1                Vector_2
1   US       IL     [1.0,2.0,3.0,4.0,5.0]   [5.0,5.0,5.0,5.0,5.0]
1   US       IL     [5.0,3.0,3.0,2.0,1.0]   [5.0,5.0,5.0,5.0,5.0]
2   US       TX     [6.0,7.0,8.0,9.0,1.0]   [1.0,1.0,1.0,1.0,1.0]

The output should look like this:

Id  Country  State  Vector_1                Vector_2
1   US       IL     [6.0,5.0,6.0,6.0,6.0]   [10.0,10.0,10.0,10.0,10.0]
2   US       TX     [6.0,7.0,8.0,9.0,1.0]   [1.0,1.0,1.0,1.0,1.0]
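
For reproducibility, here is one way to build the sample frame, assuming the vectors are stored as plain Python lists:

import pandas as pd

df = pd.DataFrame({
    'Id': [1, 1, 2],
    'Country': ['US', 'US', 'US'],
    'State': ['IL', 'IL', 'TX'],
    'Vector_1': [[1.0, 2.0, 3.0, 4.0, 5.0],
                 [5.0, 3.0, 3.0, 2.0, 1.0],
                 [6.0, 7.0, 8.0, 9.0, 1.0]],
    'Vector_2': [[5.0, 5.0, 5.0, 5.0, 5.0],
                 [5.0, 5.0, 5.0, 5.0, 5.0],
                 [1.0, 1.0, 1.0, 1.0, 1.0]],
})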
  • What have you tried so far? – HS-nebula Apr 15 at 21:40
  • I'm trying with the aggregate function.. – prabuster Apr 15 at 21:51
  • Something like this.. but got stuck up df1.groupby('Id','Country','State').agg({Vector_1:sum}) – prabuster Apr 15 at 21:51
  • Try .agg({Vector_1: 'sum'}, axis = 1) – HS-nebula Apr 15 at 21:58

If Vector_1 and Vector_2 are not already numpy arrays, convert them first:

import numpy as np

cols = ['Vector_1', 'Vector_2']
df[cols] = df[cols].applymap(np.array)  # turn each list cell into an np.array
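
As a quick check (using the two IL rows from the sample data), adding two converted cells now sums the vectors element-wise, which is what the group-wise sum below relies on:

df['Vector_1'].iloc[0] + df['Vector_1'].iloc[1]
# array([6., 5., 6., 6., 6.])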

Then use groupby with apply to sum each group:

result = (df.groupby(['Id', 'Country', 'State'])[cols]
            .apply(lambda x: x.sum())  # element-wise sum of the arrays in each group
            .reset_index())
result

   Id Country State                   Vector_1                        Vector_2
0   1      US    IL  [6.0, 5.0, 6.0, 6.0, 6.0]  [10.0, 10.0, 10.0, 10.0, 10.0]
1   2      US    TX  [6.0, 7.0, 8.0, 9.0, 1.0]       [1.0, 1.0, 1.0, 1.0, 1.0]
  • Thanks..It worked. – prabuster Apr 16 at 14:14
  • Accepted the answer. Thanks – prabuster Apr 16 at 18:49
  • How to implement the same logic in pyspark? I tried 2-3 different logic, but nothing helps. Any advise? – prabuster Apr 22 at 17:24
  • @prabuster I'm not sure how to do this in pyspark. Maybe you can ask a new question and tag pyspark. – ResidentSleeper Apr 22 at 18:58
