Tabulate multiple levels for each column in dataframe

-1

> head(Gene)

  Key Func.ensGene Func.genericGene Func.refGene

1   1   intergenic       intergenic   intergenic

2   2   intergenic       intergenic   intergenic

3   3   intergenic       intergenic     intronic

4   4       exonic           exonic       exonic

5   5   intergenic       intergenic     intronic

6   6   intergenic       intergenic     intronic

Required Output:

Type          Func.ensGene Func.genericGene Func.refGene

exonic             1              1                1

intergenic         5              5                2

intronic           0              0                3

The solution I tried is working on only one column:

unique(Gene["Func.ensGene"])

Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?

edited Nov 22 '18 at 14:39

asked Nov 22 '18 at 13:21

chas

46631527

dput(head(Gene)) would have been better than just head(Gene)

– snoram
Nov 22 '18 at 13:43

add a comment |

-1

> head(Gene)

  Key Func.ensGene Func.genericGene Func.refGene

1   1   intergenic       intergenic   intergenic

2   2   intergenic       intergenic   intergenic

3   3   intergenic       intergenic     intronic

4   4       exonic           exonic       exonic

5   5   intergenic       intergenic     intronic

6   6   intergenic       intergenic     intronic

Required Output:

Type          Func.ensGene Func.genericGene Func.refGene

exonic             1              1                1

intergenic         5              5                2

intronic           0              0                3

The solution I tried is working on only one column:

unique(Gene["Func.ensGene"])

Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?

edited Nov 22 '18 at 14:39

asked Nov 22 '18 at 13:21

chas

46631527

dput(head(Gene)) would have been better than just head(Gene)

– snoram
Nov 22 '18 at 13:43

add a comment |

-1

> head(Gene)

  Key Func.ensGene Func.genericGene Func.refGene

1   1   intergenic       intergenic   intergenic

2   2   intergenic       intergenic   intergenic

3   3   intergenic       intergenic     intronic

4   4       exonic           exonic       exonic

5   5   intergenic       intergenic     intronic

6   6   intergenic       intergenic     intronic

Required Output:

Type          Func.ensGene Func.genericGene Func.refGene

exonic             1              1                1

intergenic         5              5                2

intronic           0              0                3

The solution I tried is working on only one column:

unique(Gene["Func.ensGene"])

Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?

edited Nov 22 '18 at 14:39

asked Nov 22 '18 at 13:21

chas

46631527

> head(Gene)

  Key Func.ensGene Func.genericGene Func.refGene

1   1   intergenic       intergenic   intergenic

2   2   intergenic       intergenic   intergenic

3   3   intergenic       intergenic     intronic

4   4       exonic           exonic       exonic

5   5   intergenic       intergenic     intronic

6   6   intergenic       intergenic     intronic

Required Output:

Type          Func.ensGene Func.genericGene Func.refGene

exonic             1              1                1

intergenic         5              5                2

intronic           0              0                3

The solution I tried is working on only one column:

unique(Gene["Func.ensGene"])

Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?

edited Nov 22 '18 at 14:39

asked Nov 22 '18 at 13:21

chas

46631527

edited Nov 22 '18 at 14:39

asked Nov 22 '18 at 13:21

chas

46631527

edited Nov 22 '18 at 14:39

asked Nov 22 '18 at 13:21

chas

46631527

asked Nov 22 '18 at 13:21

chas

46631527

asked Nov 22 '18 at 13:21

chas

46631527

dput(head(Gene)) would have been better than just head(Gene)

– snoram
Nov 22 '18 at 13:43

add a comment |

dput(head(Gene)) would have been better than just head(Gene)

– snoram
Nov 22 '18 at 13:43

dput(head(Gene)) would have been better than just head(Gene)

– snoram
Nov 22 '18 at 13:43

add a comment |

3 Answers
3

active

oldest

votes

Simply use ?xtabs along with ?stack:

xtabs( ~ values + ind , stack(df1[,-1]))

or even shorter as @nicola suggests:

table(stack(df1[,-1]))

for both you get:

#            ind

#values       Func.ensGene Func.genericGene Func.refGene

#  exonic                1                1            1

#  intergenic            5                5            2

#  intronic              0                0            3

do you prefer further working on a data.frame?

as.data.frame.matrix(

    xtabs( ~ values + ind , stack(df1[,-1]))  # or again only table(stack(df1[,-1]))

)

edited Nov 22 '18 at 13:57

answered Nov 22 '18 at 13:46

Andre Elrico

5,70911029

add a comment |

We can get all unique levels from the dataframe and for every column calculate the count of each level by first converting it to factor with unique levels calculated before.

unique_names <- unique(unlist(df[-1]))

sapply(df[-1], function(x) table(factor(x, levels = unique_names)))



#           Func.ensGene Func.genericGene Func.refGene

#intergenic            5                5            2

#exonic                1                1            1

#intronic              0                0            3

answered Nov 22 '18 at 13:27

Ronak Shah

36.7k104161

add a comment |

Love the base R solutions but using data.table and some magrittr for readability you could get a data.frame directly (instead of a table):

library(magrittr)

library(data.table)

setDT(df)

df %>%

  melt(id.vars = "Key") %>%

  .[, .N, .(variable, value)] %>% 

  dcast(value ~ variable, value.var = "N", fill = 0)



        value Func.ensGene Func.genericGene Func.refGene

1:     exonic            1                1            1

2: intergenic            5                5            2

3:   intronic            0                0            3

Or much more concisely(as suggested by Henrik):

dcast(melt(df, "Key"), value ~ variable)

If you prefer tidyverse functions:

library(tidyr)

df %>%

  gather(key = Key) %>%

  group_by(Key, value) %>%

  count() %>%

  spread(Key, n, fill = 0)



# A tibble: 3 x 4

# Groups:   value [3]

  value      Func.ensGene Func.genericGene Func.refGene

  <chr>             <dbl>            <dbl>        <dbl>

1 exonic                1                1            1

2 intergenic            5                5            2

3 intronic              0                0            3

Data:

df <- data.frame(

  Key              = 1:6, 

  Func.ensGene     = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 

  Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 

  Func.refGene     = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),

  stringsAsFactors = FALSE

)

edited Nov 22 '18 at 15:16

answered Nov 22 '18 at 13:53

snoram

7,101832

For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

– Henrik
Nov 22 '18 at 15:13

Thanks @Henrik, I forgot about the summarising capabilities of dcast().

– snoram
Nov 22 '18 at 15:15

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53431943%2ftabulate-multiple-levels-for-each-column-in-dataframe%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Simply use ?xtabs along with ?stack:

xtabs( ~ values + ind , stack(df1[,-1]))

or even shorter as @nicola suggests:

table(stack(df1[,-1]))

for both you get:

#            ind

#values       Func.ensGene Func.genericGene Func.refGene

#  exonic                1                1            1

#  intergenic            5                5            2

#  intronic              0                0            3

do you prefer further working on a data.frame?

as.data.frame.matrix(

    xtabs( ~ values + ind , stack(df1[,-1]))  # or again only table(stack(df1[,-1]))

)

edited Nov 22 '18 at 13:57

answered Nov 22 '18 at 13:46

Andre Elrico

5,70911029

add a comment |

Simply use ?xtabs along with ?stack:

xtabs( ~ values + ind , stack(df1[,-1]))

or even shorter as @nicola suggests:

table(stack(df1[,-1]))

for both you get:

#            ind

#values       Func.ensGene Func.genericGene Func.refGene

#  exonic                1                1            1

#  intergenic            5                5            2

#  intronic              0                0            3

do you prefer further working on a data.frame?

as.data.frame.matrix(

    xtabs( ~ values + ind , stack(df1[,-1]))  # or again only table(stack(df1[,-1]))

)

edited Nov 22 '18 at 13:57

answered Nov 22 '18 at 13:46

Andre Elrico

5,70911029

add a comment |

Simply use ?xtabs along with ?stack:

xtabs( ~ values + ind , stack(df1[,-1]))

or even shorter as @nicola suggests:

table(stack(df1[,-1]))

for both you get:

#            ind

#values       Func.ensGene Func.genericGene Func.refGene

#  exonic                1                1            1

#  intergenic            5                5            2

#  intronic              0                0            3

do you prefer further working on a data.frame?

as.data.frame.matrix(

    xtabs( ~ values + ind , stack(df1[,-1]))  # or again only table(stack(df1[,-1]))

)

edited Nov 22 '18 at 13:57

answered Nov 22 '18 at 13:46

Andre Elrico

5,70911029

Simply use ?xtabs along with ?stack:

xtabs( ~ values + ind , stack(df1[,-1]))

or even shorter as @nicola suggests:

table(stack(df1[,-1]))

for both you get:

#            ind

#values       Func.ensGene Func.genericGene Func.refGene

#  exonic                1                1            1

#  intergenic            5                5            2

#  intronic              0                0            3

do you prefer further working on a data.frame?

as.data.frame.matrix(

    xtabs( ~ values + ind , stack(df1[,-1]))  # or again only table(stack(df1[,-1]))

)

edited Nov 22 '18 at 13:57

answered Nov 22 '18 at 13:46

Andre Elrico

5,70911029

edited Nov 22 '18 at 13:57

answered Nov 22 '18 at 13:46

Andre Elrico

5,70911029

answered Nov 22 '18 at 13:46

Andre Elrico

5,70911029

answered Nov 22 '18 at 13:46

Andre Elrico

5,70911029

add a comment |

We can get all unique levels from the dataframe and for every column calculate the count of each level by first converting it to factor with unique levels calculated before.

unique_names <- unique(unlist(df[-1]))

sapply(df[-1], function(x) table(factor(x, levels = unique_names)))



#           Func.ensGene Func.genericGene Func.refGene

#intergenic            5                5            2

#exonic                1                1            1

#intronic              0                0            3

answered Nov 22 '18 at 13:27

Ronak Shah

36.7k104161

add a comment |

We can get all unique levels from the dataframe and for every column calculate the count of each level by first converting it to factor with unique levels calculated before.

unique_names <- unique(unlist(df[-1]))

sapply(df[-1], function(x) table(factor(x, levels = unique_names)))



#           Func.ensGene Func.genericGene Func.refGene

#intergenic            5                5            2

#exonic                1                1            1

#intronic              0                0            3

answered Nov 22 '18 at 13:27

Ronak Shah

36.7k104161

add a comment |

We can get all unique levels from the dataframe and for every column calculate the count of each level by first converting it to factor with unique levels calculated before.

unique_names <- unique(unlist(df[-1]))

sapply(df[-1], function(x) table(factor(x, levels = unique_names)))



#           Func.ensGene Func.genericGene Func.refGene

#intergenic            5                5            2

#exonic                1                1            1

#intronic              0                0            3

answered Nov 22 '18 at 13:27

Ronak Shah

36.7k104161

We can get all unique levels from the dataframe and for every column calculate the count of each level by first converting it to factor with unique levels calculated before.

unique_names <- unique(unlist(df[-1]))

sapply(df[-1], function(x) table(factor(x, levels = unique_names)))



#           Func.ensGene Func.genericGene Func.refGene

#intergenic            5                5            2

#exonic                1                1            1

#intronic              0                0            3

answered Nov 22 '18 at 13:27

Ronak Shah

36.7k104161

answered Nov 22 '18 at 13:27

Ronak Shah

36.7k104161

answered Nov 22 '18 at 13:27

Ronak Shah

36.7k104161

answered Nov 22 '18 at 13:27

Ronak Shah

36.7k104161

add a comment |

Love the base R solutions but using data.table and some magrittr for readability you could get a data.frame directly (instead of a table):

library(magrittr)

library(data.table)

setDT(df)

df %>%

  melt(id.vars = "Key") %>%

  .[, .N, .(variable, value)] %>% 

  dcast(value ~ variable, value.var = "N", fill = 0)



        value Func.ensGene Func.genericGene Func.refGene

1:     exonic            1                1            1

2: intergenic            5                5            2

3:   intronic            0                0            3

Or much more concisely(as suggested by Henrik):

dcast(melt(df, "Key"), value ~ variable)

If you prefer tidyverse functions:

library(tidyr)

df %>%

  gather(key = Key) %>%

  group_by(Key, value) %>%

  count() %>%

  spread(Key, n, fill = 0)



# A tibble: 3 x 4

# Groups:   value [3]

  value      Func.ensGene Func.genericGene Func.refGene

  <chr>             <dbl>            <dbl>        <dbl>

1 exonic                1                1            1

2 intergenic            5                5            2

3 intronic              0                0            3

Data:

df <- data.frame(

  Key              = 1:6, 

  Func.ensGene     = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 

  Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 

  Func.refGene     = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),

  stringsAsFactors = FALSE

)

edited Nov 22 '18 at 15:16

answered Nov 22 '18 at 13:53

snoram

7,101832

For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

– Henrik
Nov 22 '18 at 15:13

Thanks @Henrik, I forgot about the summarising capabilities of dcast().

– snoram
Nov 22 '18 at 15:15

add a comment |

Love the base R solutions but using data.table and some magrittr for readability you could get a data.frame directly (instead of a table):

library(magrittr)

library(data.table)

setDT(df)

df %>%

  melt(id.vars = "Key") %>%

  .[, .N, .(variable, value)] %>% 

  dcast(value ~ variable, value.var = "N", fill = 0)



        value Func.ensGene Func.genericGene Func.refGene

1:     exonic            1                1            1

2: intergenic            5                5            2

3:   intronic            0                0            3

Or much more concisely(as suggested by Henrik):

dcast(melt(df, "Key"), value ~ variable)

If you prefer tidyverse functions:

library(tidyr)

df %>%

  gather(key = Key) %>%

  group_by(Key, value) %>%

  count() %>%

  spread(Key, n, fill = 0)



# A tibble: 3 x 4

# Groups:   value [3]

  value      Func.ensGene Func.genericGene Func.refGene

  <chr>             <dbl>            <dbl>        <dbl>

1 exonic                1                1            1

2 intergenic            5                5            2

3 intronic              0                0            3

Data:

df <- data.frame(

  Key              = 1:6, 

  Func.ensGene     = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 

  Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 

  Func.refGene     = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),

  stringsAsFactors = FALSE

)

edited Nov 22 '18 at 15:16

answered Nov 22 '18 at 13:53

snoram

7,101832

For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

– Henrik
Nov 22 '18 at 15:13

Thanks @Henrik, I forgot about the summarising capabilities of dcast().

– snoram
Nov 22 '18 at 15:15

add a comment |

Love the base R solutions but using data.table and some magrittr for readability you could get a data.frame directly (instead of a table):

library(magrittr)

library(data.table)

setDT(df)

df %>%

  melt(id.vars = "Key") %>%

  .[, .N, .(variable, value)] %>% 

  dcast(value ~ variable, value.var = "N", fill = 0)



        value Func.ensGene Func.genericGene Func.refGene

1:     exonic            1                1            1

2: intergenic            5                5            2

3:   intronic            0                0            3

Or much more concisely(as suggested by Henrik):

dcast(melt(df, "Key"), value ~ variable)

If you prefer tidyverse functions:

library(tidyr)

df %>%

  gather(key = Key) %>%

  group_by(Key, value) %>%

  count() %>%

  spread(Key, n, fill = 0)



# A tibble: 3 x 4

# Groups:   value [3]

  value      Func.ensGene Func.genericGene Func.refGene

  <chr>             <dbl>            <dbl>        <dbl>

1 exonic                1                1            1

2 intergenic            5                5            2

3 intronic              0                0            3

Data:

df <- data.frame(

  Key              = 1:6, 

  Func.ensGene     = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 

  Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 

  Func.refGene     = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),

  stringsAsFactors = FALSE

)

edited Nov 22 '18 at 15:16

answered Nov 22 '18 at 13:53

snoram

7,101832

Love the base R solutions but using data.table and some magrittr for readability you could get a data.frame directly (instead of a table):

library(magrittr)

library(data.table)

setDT(df)

df %>%

  melt(id.vars = "Key") %>%

  .[, .N, .(variable, value)] %>% 

  dcast(value ~ variable, value.var = "N", fill = 0)



        value Func.ensGene Func.genericGene Func.refGene

1:     exonic            1                1            1

2: intergenic            5                5            2

3:   intronic            0                0            3

Or much more concisely(as suggested by Henrik):

dcast(melt(df, "Key"), value ~ variable)

If you prefer tidyverse functions:

library(tidyr)

df %>%

  gather(key = Key) %>%

  group_by(Key, value) %>%

  count() %>%

  spread(Key, n, fill = 0)



# A tibble: 3 x 4

# Groups:   value [3]

  value      Func.ensGene Func.genericGene Func.refGene

  <chr>             <dbl>            <dbl>        <dbl>

1 exonic                1                1            1

2 intergenic            5                5            2

3 intronic              0                0            3

Data:

df <- data.frame(

  Key              = 1:6, 

  Func.ensGene     = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 

  Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 

  Func.refGene     = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),

  stringsAsFactors = FALSE

)

edited Nov 22 '18 at 15:16

answered Nov 22 '18 at 13:53

snoram

7,101832

edited Nov 22 '18 at 15:16

answered Nov 22 '18 at 13:53

snoram

7,101832

answered Nov 22 '18 at 13:53

snoram

7,101832

answered Nov 22 '18 at 13:53

snoram

7,101832

For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

– Henrik
Nov 22 '18 at 15:13

Thanks @Henrik, I forgot about the summarising capabilities of dcast().

– snoram
Nov 22 '18 at 15:15

add a comment |

For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

– Henrik
Nov 22 '18 at 15:13

Thanks @Henrik, I forgot about the summarising capabilities of dcast().

– snoram
Nov 22 '18 at 15:15

For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

– Henrik
Nov 22 '18 at 15:13

Thanks @Henrik, I forgot about the summarising capabilities of dcast().

– snoram
Nov 22 '18 at 15:15

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Nsryjdtyk