Reverse summary to expand comma separated strings in dataframe [duplicate]
This question already has an answer here:
Split comma-separated strings in a column into separate rows
4 answers
I have the following dataframe:
group = c("cat", "dog", "horse")
value = c("1", "2", "3")
list = c("siamese,burmese,balinese", "corgi,sheltie,collie", "arabian,friesian,andalusian")
df = data.frame(group, value, list)
df
  group value                        list
1   cat     1    siamese,burmese,balinese
2   dog     2        corgi,sheltie,collie
3 horse     3 arabian,friesian,andalusian
and am trying to achieve this:
  group value       list
1   cat     1    siamese
2   cat     1    burmese
3   cat     1   balinese
4   dog     2      corgi
5   dog     2    sheltie
6   dog     2     collie
7 horse     3    arabian
8 horse     3   friesian
9 horse     3 andalusian
I know how to summarize a dataframe, but I now realize that I don't know how to "unsummarize" one with comma separated strings.
r
asked Nov 25 '18 at 23:51 by Helen K.
marked as duplicate by Henrik
Nov 26 '18 at 0:29
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
Welcome to SO! Great first question! 👍🏼 Try: tidyr::separate_rows(df, list, sep=",")
– hrbrmstr Nov 25 '18 at 23:57
I always forget about separate_rows and start reaching for unnest and strsplit. Thanks for the reminder.
– neilfws Nov 25 '18 at 23:59
@neilfws I regularly have to reference the help on both gather and separate as, for some reason, I can't keep the parameters in active memory, and I tend to start from the pkg help index page, so I see separate_rows a lot :-) Perhaps disunite (or something like that) might be a better name or alias for it.
– hrbrmstr Nov 26 '18 at 0:03
2 Answers
data.frame(
  group = c("cat", "dog", "horse"),
  value = c("1", "2", "3"),
  list = c("siamese,burmese,balinese", "corgi,sheltie,collie", "arabian,friesian,andalusian"),
  stringsAsFactors = FALSE
) -> xdf
tidyverse:
tidyr::separate_rows(xdf, list, sep=",")
##   group value       list
## 1   cat     1    siamese
## 2   cat     1    burmese
## 3   cat     1   balinese
## 4   dog     2      corgi
## 5   dog     2    sheltie
## 6   dog     2     collie
## 7 horse     3    arabian
## 8 horse     3   friesian
## 9 horse     3 andalusian
Base R:
# Build one small data frame per input row, then row-bind them all.
do.call(
  rbind.data.frame,
  lapply(1:nrow(xdf), function(idx) {
    data.frame(
      group = xdf[idx, "group"],
      value = xdf[idx, "value"],
      list = strsplit(xdf[idx, "list"], ",")[[1]],
      stringsAsFactors = FALSE
    )
  })
)
##   group value       list
## 1   cat     1    siamese
## 2   cat     1    burmese
## 3   cat     1   balinese
## 4   dog     2      corgi
## 5   dog     2    sheltie
## 6   dog     2     collie
## 7 horse     3    arabian
## 8 horse     3   friesian
## 9 horse     3 andalusian
The shootout:
library(magrittr) # provides %>% for the unnest variant
microbenchmark::microbenchmark(
  unnest = transform(xdf, list = strsplit(list, ",")) %>%
    tidyr::unnest(list),
  separate_rows = tidyr::separate_rows(xdf, list, sep=","),
  base = do.call(
    rbind.data.frame,
    lapply(1:nrow(xdf), function(idx) {
      data.frame(
        group = xdf[idx, "group"],
        value = xdf[idx, "value"],
        list = strsplit(xdf[idx, "list"], ",")[[1]],
        stringsAsFactors = FALSE
      )
    })
  )
)
## Unit: microseconds
##          expr      min        lq     mean   median        uq       max neval
##        unnest 3689.890 4280.7045 6326.231 4881.160  6428.508 16670.715   100
## separate_rows 5093.618 5602.2510 8479.712 6289.193 10352.847 24447.528   100
##          base  872.343  975.1615 1589.915 1099.391  1660.324  6663.132   100
I'm constantly surprised at the horrible performance of tidyr ops.
edited Nov 26 '18 at 0:19
answered Nov 26 '18 at 0:01 by hrbrmstr
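Not part of either answer: the base R approach above builds one small data frame per input row and row-binds them. A vectorized variant is sketched below, assuming the same xdf frame (rebuilt here so the block runs on its own). The names parts and long are illustrative only.

```r
# Sketch: vectorized base R alternative (an illustration, not from the answers).
xdf <- data.frame(
  group = c("cat", "dog", "horse"),
  value = c("1", "2", "3"),
  list = c("siamese,burmese,balinese", "corgi,sheltie,collie", "arabian,friesian,andalusian"),
  stringsAsFactors = FALSE
)

# Split every string once; `parts` is a list with one character vector per row.
parts <- strsplit(xdf$list, ",", fixed = TRUE)

# Repeat each row index once per piece, keeping only the columns that are not split.
long <- xdf[rep(seq_len(nrow(xdf)), lengths(parts)), c("group", "value")]

# Attach the flattened pieces and reset the row names to 1..n.
long$list <- unlist(parts)
rownames(long) <- NULL
long
```

Because strsplit() and rep() each run once over the whole column, this avoids creating a data frame per row; whether that matters depends on the size of the input.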
If I understand correctly, tidyr's unnest does this in R:
library(dplyr)
library(tidyr)
# group, value, and list are the vectors defined in the question
df = data.frame(group, value, list, stringsAsFactors = FALSE)
df %>%
  transform(list = strsplit(list, ",")) %>%
  unnest(list)
  group value       list
1   cat     1    siamese
2   cat     1    burmese
3   cat     1   balinese
4   dog     2      corgi
5   dog     2    sheltie
6   dog     2     collie
7 horse     3    arabian
8 horse     3   friesian
9 horse     3 andalusian
answered Nov 26 '18 at 0:10 by Wen-Ben
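Both answers pass stringsAsFactors = FALSE, while the question's data.frame(group, value, list) call does not. Before R 4.0.0 that call stored strings as factors by default, and strsplit() stops with a "non-character argument" error when handed a factor. A minimal sketch of the coercion step follows; the factor setup is forced here purely as an assumption for illustration.

```r
# Assumed setup: `list` stored as a factor, as data.frame() did by default before R 4.0.0.
df <- data.frame(
  group = c("cat", "dog", "horse"),
  value = c("1", "2", "3"),
  list = c("siamese,burmese,balinese", "corgi,sheltie,collie", "arabian,friesian,andalusian"),
  stringsAsFactors = TRUE
)

# strsplit() only accepts character input, so convert the factor column first.
if (is.factor(df$list)) {
  df$list <- as.character(df$list)
}

# The column can now be split as in the answers above.
parts <- strsplit(df$list, ",", fixed = TRUE)
parts[[2]]
```

After the conversion, any of the approaches above (separate_rows, unnest, or base R) should apply unchanged.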