Transpose a DataFrame in PySpark

How can I transpose the following DataFrame in PySpark?

The idea is to achieve the result shown below.

import pandas as pd

# Input: one row per (id, attribute) pair
d = {'id' : pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']),
     'place' : pd.Series(['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']),
     'value' : pd.Series([10, 30, 20, 10, 30, 20, 10, 30, 20], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']),
     'attribute' : pd.Series(['size', 'height', 'weigth', 'size', 'height', 'weigth', 'size', 'height', 'weigth'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'])}

df = pd.DataFrame(d)
print(df)

   id place  value attribute
a   1     A     10      size
b   1     A     30    height
c   1     A     20    weigth
d   2     A     10      size
e   2     A     30    height
f   2     A     20    weigth
g   3     A     10      size
h   3     A     30    height
i   3     A     20    weigth

# Desired output: one row per id, one column per attribute
d = {'id' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'place' : pd.Series(['A', 'A', 'A'], index=['a', 'b', 'c']),
     'size' : pd.Series([10, 30, 20], index=['a', 'b', 'c']),
     'height' : pd.Series([10, 30, 20], index=['a', 'b', 'c']),
     'weigth' : pd.Series([10, 30, 20], index=['a', 'b', 'c'])}

df = pd.DataFrame(d)
print(df)

   id place  size  height  weigth
a   1     A    10      10      10
b   2     A    30      30      30
c   3     A    20      20      20

Any help is welcome. Thank you very much in advance.

asked Nov 23 '18 at 21:28 by lolo
edited Nov 25 '18 at 9:40 by user10465355

Possible duplicate of How to pivot DataFrame? – user10465355 Nov 24 '18 at 10:55

apache-spark pyspark apache-spark-sql

2 Answers

First of all, I don't think your sample output is correct. Your input data has size set to 10, height set to 30 and weigth set to 20 for every id, but the desired output sets everything to 10 for id 1. If this is really what you want, please explain it a bit more. If this was a mistake, then you want to use the pivot function. Example:

from pyspark.sql import SparkSession
from pyspark.sql.functions import first

# In the pyspark shell `spark` already exists; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

l = [(1, 'A', 10, 'size'),
     (1, 'A', 30, 'height'),
     (1, 'A', 20, 'weigth'),
     (2, 'A', 10, 'size'),
     (2, 'A', 30, 'height'),
     (2, 'A', 20, 'weigth'),
     (3, 'A', 10, 'size'),
     (3, 'A', 30, 'height'),
     (3, 'A', 20, 'weigth')]

df = spark.createDataFrame(l, ['id', 'place', 'value', 'attribute'])

# One output row per (id, place); each distinct attribute becomes a column.
df.groupBy(df.id, df.place).pivot('attribute').agg(first("value")).show()

+---+-----+------+----+------+
| id|place|height|size|weigth|
+---+-----+------+----+------+
|  2|    A|    30|  10|    20|
|  3|    A|    30|  10|    20|
|  1|    A|    30|  10|    20|
+---+-----+------+----+------+
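To make the semantics of the groupBy/pivot/agg chain concrete, here is a minimal pure-Python sketch of the same reshaping, with no Spark involved. The helper name pivot_first is hypothetical and only mimics what first does on this small in-memory list; it is not how Spark implements pivot.

```python
# Same rows as in the answer: (id, place, value, attribute)
rows = [
    (1, 'A', 10, 'size'), (1, 'A', 30, 'height'), (1, 'A', 20, 'weigth'),
    (2, 'A', 10, 'size'), (2, 'A', 30, 'height'), (2, 'A', 20, 'weigth'),
    (3, 'A', 10, 'size'), (3, 'A', 30, 'height'), (3, 'A', 20, 'weigth'),
]


def pivot_first(rows):
    """Hypothetical helper: group by (id, place), turn each distinct
    attribute into a column, and keep the first value seen per cell."""
    # The new column names are the distinct attribute values, sorted.
    columns = sorted({attribute for _, _, _, attribute in rows})
    grouped = {}
    for id_, place, value, attribute in rows:
        cells = grouped.setdefault((id_, place), {})
        # setdefault keeps an existing value, so only the first one is stored.
        cells.setdefault(attribute, value)
    return columns, grouped


columns, grouped = pivot_first(rows)
print(['id', 'place'] + columns)
for (id_, place), cells in sorted(grouped.items()):
    print([id_, place] + [cells.get(c) for c in columns])
```

Each (id, place) pair ends up as one row, and a cell stays empty (None here) when a group has no value for an attribute.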

answered Nov 24 '18 at 1:27 by cronoik

Thank you! That is what I was looking for! – lolo Nov 26 '18 at 0:36

Refer to the documentation. Pivoting is always done in the context of an aggregation, and I have chosen sum here. So if there are multiple values for the same combination of id, place and attribute, their sum is taken. You could use min, max or mean as well, depending on what you need.

df = df.groupBy(["id","place"]).pivot("attribute").sum("value")

This link also addresses the same question.
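The choice of aggregation only matters when a group holds more than one value for the same attribute. The plain-Python sketch below (no Spark required) compares sum, min and max on such a case. The helper pivot_agg is hypothetical, and the duplicate rows are made up for illustration; they are not part of the question's data.

```python
from collections import defaultdict

# Assumed sample data: id 1 has two 'size' values, unlike the question's input.
rows = [
    (1, 'A', 10, 'size'),
    (1, 'A', 15, 'size'),
    (1, 'A', 30, 'height'),
    (2, 'A', 10, 'size'),
]


def pivot_agg(rows, agg):
    """Hypothetical helper: collect all values per (id, place, attribute)
    and reduce each list with the given aggregation function."""
    values = defaultdict(list)
    for id_, place, value, attribute in rows:
        values[(id_, place, attribute)].append(value)
    result = defaultdict(dict)
    for (id_, place, attribute), vals in values.items():
        result[(id_, place)][attribute] = agg(vals)
    return dict(result)


by_sum = pivot_agg(rows, sum)
by_min = pivot_agg(rows, min)
by_max = pivot_agg(rows, max)

print(by_sum[(1, 'A')])
print(by_min[(1, 'A')])
print(by_max[(1, 'A')])
```

With the question's original data every cell has exactly one value, so sum, min, max and first all give the same result.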

answered Nov 25 '18 at 10:10 by cph_sto
