Tabulate multiple levels for each column in dataframe
> head(Gene)
Key Func.ensGene Func.genericGene Func.refGene
1 1 intergenic intergenic intergenic
2 2 intergenic intergenic intergenic
3 3 intergenic intergenic intronic
4 4 exonic exonic exonic
5 5 intergenic intergenic intronic
6 6 intergenic intergenic intronic
Required Output:
Type Func.ensGene Func.genericGene Func.refGene
exonic 1 1 1
intergenic 5 5 2
intronic 0 0 3
The solution I tried is working on only one column:
unique(Gene["Func.ensGene"])
Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?
r
add a comment |
> head(Gene)
Key Func.ensGene Func.genericGene Func.refGene
1 1 intergenic intergenic intergenic
2 2 intergenic intergenic intergenic
3 3 intergenic intergenic intronic
4 4 exonic exonic exonic
5 5 intergenic intergenic intronic
6 6 intergenic intergenic intronic
Required Output:
Type Func.ensGene Func.genericGene Func.refGene
exonic 1 1 1
intergenic 5 5 2
intronic 0 0 3
The solution I tried is working on only one column:
unique(Gene["Func.ensGene"])
Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?
r
dput(head(Gene))
would have been better than justhead(Gene)
– snoram
Nov 22 '18 at 13:43
add a comment |
> head(Gene)
Key Func.ensGene Func.genericGene Func.refGene
1 1 intergenic intergenic intergenic
2 2 intergenic intergenic intergenic
3 3 intergenic intergenic intronic
4 4 exonic exonic exonic
5 5 intergenic intergenic intronic
6 6 intergenic intergenic intronic
Required Output:
Type Func.ensGene Func.genericGene Func.refGene
exonic 1 1 1
intergenic 5 5 2
intronic 0 0 3
The solution I tried is working on only one column:
unique(Gene["Func.ensGene"])
Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?
r
> head(Gene)
Key Func.ensGene Func.genericGene Func.refGene
1 1 intergenic intergenic intergenic
2 2 intergenic intergenic intergenic
3 3 intergenic intergenic intronic
4 4 exonic exonic exonic
5 5 intergenic intergenic intronic
6 6 intergenic intergenic intronic
Required Output:
Type Func.ensGene Func.genericGene Func.refGene
exonic 1 1 1
intergenic 5 5 2
intronic 0 0 3
The solution I tried is working on only one column:
unique(Gene["Func.ensGene"])
Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?
r
r
edited Nov 22 '18 at 14:39
chas
asked Nov 22 '18 at 13:21
chaschas
46631527
46631527
dput(head(Gene))
would have been better than justhead(Gene)
– snoram
Nov 22 '18 at 13:43
add a comment |
dput(head(Gene))
would have been better than justhead(Gene)
– snoram
Nov 22 '18 at 13:43
dput(head(Gene))
would have been better than just head(Gene)
– snoram
Nov 22 '18 at 13:43
dput(head(Gene))
would have been better than just head(Gene)
– snoram
Nov 22 '18 at 13:43
add a comment |
3 Answers
3
active
oldest
votes
Simply use ?xtabs
along with ?stack
:
xtabs( ~ values + ind , stack(df1[,-1]))
or even shorter as @nicola suggests:
table(stack(df1[,-1]))
for both you get:
# ind
#values Func.ensGene Func.genericGene Func.refGene
# exonic 1 1 1
# intergenic 5 5 2
# intronic 0 0 3
do you prefer further working on a data.frame?
as.data.frame.matrix(
xtabs( ~ values + ind , stack(df1[,-1])) # or again only table(stack(df1[,-1]))
)
add a comment |
We can get all unique
levels from the dataframe and for every column calculate the count of each level by first converting it to factor
with unique levels calculated before.
unique_names <- unique(unlist(df[-1]))
sapply(df[-1], function(x) table(factor(x, levels = unique_names)))
# Func.ensGene Func.genericGene Func.refGene
#intergenic 5 5 2
#exonic 1 1 1
#intronic 0 0 3
add a comment |
Love the base R solutions but using data.table
and some magrittr
for readability you could get a data.frame directly (instead of a table):
library(magrittr)
library(data.table)
setDT(df)
df %>%
melt(id.vars = "Key") %>%
.[, .N, .(variable, value)] %>%
dcast(value ~ variable, value.var = "N", fill = 0)
value Func.ensGene Func.genericGene Func.refGene
1: exonic 1 1 1
2: intergenic 5 5 2
3: intronic 0 0 3
Or much more concisely(as suggested by Henrik):
dcast(melt(df, "Key"), value ~ variable)
If you prefer tidyverse functions:
library(tidyr)
df %>%
gather(key = Key) %>%
group_by(Key, value) %>%
count() %>%
spread(Key, n, fill = 0)
# A tibble: 3 x 4
# Groups: value [3]
value Func.ensGene Func.genericGene Func.refGene
<chr> <dbl> <dbl> <dbl>
1 exonic 1 1 1
2 intergenic 5 5 2
3 intronic 0 0 3
Data:
df <- data.frame(
Key = 1:6,
Func.ensGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
Func.refGene = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
stringsAsFactors = FALSE
)
For those who prefer poorer readability:dcast(melt(df, "Key"), value ~ variable)
– Henrik
Nov 22 '18 at 15:13
Thanks @Henrik, I forgot about the summarising capabilities ofdcast()
.
– snoram
Nov 22 '18 at 15:15
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53431943%2ftabulate-multiple-levels-for-each-column-in-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Simply use ?xtabs
along with ?stack
:
xtabs( ~ values + ind , stack(df1[,-1]))
or even shorter as @nicola suggests:
table(stack(df1[,-1]))
for both you get:
# ind
#values Func.ensGene Func.genericGene Func.refGene
# exonic 1 1 1
# intergenic 5 5 2
# intronic 0 0 3
do you prefer further working on a data.frame?
as.data.frame.matrix(
xtabs( ~ values + ind , stack(df1[,-1])) # or again only table(stack(df1[,-1]))
)
add a comment |
Simply use ?xtabs
along with ?stack
:
xtabs( ~ values + ind , stack(df1[,-1]))
or even shorter as @nicola suggests:
table(stack(df1[,-1]))
for both you get:
# ind
#values Func.ensGene Func.genericGene Func.refGene
# exonic 1 1 1
# intergenic 5 5 2
# intronic 0 0 3
do you prefer further working on a data.frame?
as.data.frame.matrix(
xtabs( ~ values + ind , stack(df1[,-1])) # or again only table(stack(df1[,-1]))
)
add a comment |
Simply use ?xtabs
along with ?stack
:
xtabs( ~ values + ind , stack(df1[,-1]))
or even shorter as @nicola suggests:
table(stack(df1[,-1]))
for both you get:
# ind
#values Func.ensGene Func.genericGene Func.refGene
# exonic 1 1 1
# intergenic 5 5 2
# intronic 0 0 3
do you prefer further working on a data.frame?
as.data.frame.matrix(
xtabs( ~ values + ind , stack(df1[,-1])) # or again only table(stack(df1[,-1]))
)
Simply use ?xtabs
along with ?stack
:
xtabs( ~ values + ind , stack(df1[,-1]))
or even shorter as @nicola suggests:
table(stack(df1[,-1]))
for both you get:
# ind
#values Func.ensGene Func.genericGene Func.refGene
# exonic 1 1 1
# intergenic 5 5 2
# intronic 0 0 3
do you prefer further working on a data.frame?
as.data.frame.matrix(
xtabs( ~ values + ind , stack(df1[,-1])) # or again only table(stack(df1[,-1]))
)
edited Nov 22 '18 at 13:57
answered Nov 22 '18 at 13:46
Andre ElricoAndre Elrico
5,70911029
5,70911029
add a comment |
add a comment |
We can get all unique
levels from the dataframe and for every column calculate the count of each level by first converting it to factor
with unique levels calculated before.
unique_names <- unique(unlist(df[-1]))
sapply(df[-1], function(x) table(factor(x, levels = unique_names)))
# Func.ensGene Func.genericGene Func.refGene
#intergenic 5 5 2
#exonic 1 1 1
#intronic 0 0 3
add a comment |
We can get all unique
levels from the dataframe and for every column calculate the count of each level by first converting it to factor
with unique levels calculated before.
unique_names <- unique(unlist(df[-1]))
sapply(df[-1], function(x) table(factor(x, levels = unique_names)))
# Func.ensGene Func.genericGene Func.refGene
#intergenic 5 5 2
#exonic 1 1 1
#intronic 0 0 3
add a comment |
We can get all unique
levels from the dataframe and for every column calculate the count of each level by first converting it to factor
with unique levels calculated before.
unique_names <- unique(unlist(df[-1]))
sapply(df[-1], function(x) table(factor(x, levels = unique_names)))
# Func.ensGene Func.genericGene Func.refGene
#intergenic 5 5 2
#exonic 1 1 1
#intronic 0 0 3
We can get all unique
levels from the dataframe and for every column calculate the count of each level by first converting it to factor
with unique levels calculated before.
unique_names <- unique(unlist(df[-1]))
sapply(df[-1], function(x) table(factor(x, levels = unique_names)))
# Func.ensGene Func.genericGene Func.refGene
#intergenic 5 5 2
#exonic 1 1 1
#intronic 0 0 3
answered Nov 22 '18 at 13:27
Ronak ShahRonak Shah
36.7k104161
36.7k104161
add a comment |
add a comment |
Love the base R solutions but using data.table
and some magrittr
for readability you could get a data.frame directly (instead of a table):
library(magrittr)
library(data.table)
setDT(df)
df %>%
melt(id.vars = "Key") %>%
.[, .N, .(variable, value)] %>%
dcast(value ~ variable, value.var = "N", fill = 0)
value Func.ensGene Func.genericGene Func.refGene
1: exonic 1 1 1
2: intergenic 5 5 2
3: intronic 0 0 3
Or much more concisely(as suggested by Henrik):
dcast(melt(df, "Key"), value ~ variable)
If you prefer tidyverse functions:
library(tidyr)
df %>%
gather(key = Key) %>%
group_by(Key, value) %>%
count() %>%
spread(Key, n, fill = 0)
# A tibble: 3 x 4
# Groups: value [3]
value Func.ensGene Func.genericGene Func.refGene
<chr> <dbl> <dbl> <dbl>
1 exonic 1 1 1
2 intergenic 5 5 2
3 intronic 0 0 3
Data:
df <- data.frame(
Key = 1:6,
Func.ensGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
Func.refGene = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
stringsAsFactors = FALSE
)
For those who prefer poorer readability:dcast(melt(df, "Key"), value ~ variable)
– Henrik
Nov 22 '18 at 15:13
Thanks @Henrik, I forgot about the summarising capabilities ofdcast()
.
– snoram
Nov 22 '18 at 15:15
add a comment |
Love the base R solutions but using data.table
and some magrittr
for readability you could get a data.frame directly (instead of a table):
library(magrittr)
library(data.table)
setDT(df)
df %>%
melt(id.vars = "Key") %>%
.[, .N, .(variable, value)] %>%
dcast(value ~ variable, value.var = "N", fill = 0)
value Func.ensGene Func.genericGene Func.refGene
1: exonic 1 1 1
2: intergenic 5 5 2
3: intronic 0 0 3
Or much more concisely(as suggested by Henrik):
dcast(melt(df, "Key"), value ~ variable)
If you prefer tidyverse functions:
library(tidyr)
df %>%
gather(key = Key) %>%
group_by(Key, value) %>%
count() %>%
spread(Key, n, fill = 0)
# A tibble: 3 x 4
# Groups: value [3]
value Func.ensGene Func.genericGene Func.refGene
<chr> <dbl> <dbl> <dbl>
1 exonic 1 1 1
2 intergenic 5 5 2
3 intronic 0 0 3
Data:
df <- data.frame(
Key = 1:6,
Func.ensGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
Func.refGene = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
stringsAsFactors = FALSE
)
For those who prefer poorer readability:dcast(melt(df, "Key"), value ~ variable)
– Henrik
Nov 22 '18 at 15:13
Thanks @Henrik, I forgot about the summarising capabilities ofdcast()
.
– snoram
Nov 22 '18 at 15:15
add a comment |
Love the base R solutions but using data.table
and some magrittr
for readability you could get a data.frame directly (instead of a table):
library(magrittr)
library(data.table)
setDT(df)
df %>%
melt(id.vars = "Key") %>%
.[, .N, .(variable, value)] %>%
dcast(value ~ variable, value.var = "N", fill = 0)
value Func.ensGene Func.genericGene Func.refGene
1: exonic 1 1 1
2: intergenic 5 5 2
3: intronic 0 0 3
Or much more concisely(as suggested by Henrik):
dcast(melt(df, "Key"), value ~ variable)
If you prefer tidyverse functions:
library(tidyr)
df %>%
gather(key = Key) %>%
group_by(Key, value) %>%
count() %>%
spread(Key, n, fill = 0)
# A tibble: 3 x 4
# Groups: value [3]
value Func.ensGene Func.genericGene Func.refGene
<chr> <dbl> <dbl> <dbl>
1 exonic 1 1 1
2 intergenic 5 5 2
3 intronic 0 0 3
Data:
df <- data.frame(
Key = 1:6,
Func.ensGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
Func.refGene = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
stringsAsFactors = FALSE
)
Love the base R solutions but using data.table
and some magrittr
for readability you could get a data.frame directly (instead of a table):
library(magrittr)
library(data.table)
setDT(df)
df %>%
melt(id.vars = "Key") %>%
.[, .N, .(variable, value)] %>%
dcast(value ~ variable, value.var = "N", fill = 0)
value Func.ensGene Func.genericGene Func.refGene
1: exonic 1 1 1
2: intergenic 5 5 2
3: intronic 0 0 3
Or much more concisely(as suggested by Henrik):
dcast(melt(df, "Key"), value ~ variable)
If you prefer tidyverse functions:
library(tidyr)
df %>%
gather(key = Key) %>%
group_by(Key, value) %>%
count() %>%
spread(Key, n, fill = 0)
# A tibble: 3 x 4
# Groups: value [3]
value Func.ensGene Func.genericGene Func.refGene
<chr> <dbl> <dbl> <dbl>
1 exonic 1 1 1
2 intergenic 5 5 2
3 intronic 0 0 3
Data:
df <- data.frame(
Key = 1:6,
Func.ensGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
Func.refGene = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
stringsAsFactors = FALSE
)
edited Nov 22 '18 at 15:16
answered Nov 22 '18 at 13:53
snoramsnoram
7,101832
7,101832
For those who prefer poorer readability:dcast(melt(df, "Key"), value ~ variable)
– Henrik
Nov 22 '18 at 15:13
Thanks @Henrik, I forgot about the summarising capabilities ofdcast()
.
– snoram
Nov 22 '18 at 15:15
add a comment |
For those who prefer poorer readability:dcast(melt(df, "Key"), value ~ variable)
– Henrik
Nov 22 '18 at 15:13
Thanks @Henrik, I forgot about the summarising capabilities ofdcast()
.
– snoram
Nov 22 '18 at 15:15
For those who prefer poorer readability:
dcast(melt(df, "Key"), value ~ variable)
– Henrik
Nov 22 '18 at 15:13
For those who prefer poorer readability:
dcast(melt(df, "Key"), value ~ variable)
– Henrik
Nov 22 '18 at 15:13
Thanks @Henrik, I forgot about the summarising capabilities of
dcast()
.– snoram
Nov 22 '18 at 15:15
Thanks @Henrik, I forgot about the summarising capabilities of
dcast()
.– snoram
Nov 22 '18 at 15:15
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53431943%2ftabulate-multiple-levels-for-each-column-in-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
dput(head(Gene))
would have been better than justhead(Gene)
– snoram
Nov 22 '18 at 13:43