Tabulate multiple levels for each column in dataframe












-1















> head(Gene)
Key Func.ensGene Func.genericGene Func.refGene
1 1 intergenic intergenic intergenic
2 2 intergenic intergenic intergenic
3 3 intergenic intergenic intronic
4 4 exonic exonic exonic
5 5 intergenic intergenic intronic
6 6 intergenic intergenic intronic


Required Output:



Type          Func.ensGene Func.genericGene Func.refGene
exonic 1 1 1
intergenic 5 5 2
intronic 0 0 3


The solution I tried is working on only one column:



unique(Gene["Func.ensGene"])


Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?










share|improve this question

























  • dput(head(Gene)) would have been better than just head(Gene)

    – snoram
    Nov 22 '18 at 13:43
















-1















> head(Gene)
Key Func.ensGene Func.genericGene Func.refGene
1 1 intergenic intergenic intergenic
2 2 intergenic intergenic intergenic
3 3 intergenic intergenic intronic
4 4 exonic exonic exonic
5 5 intergenic intergenic intronic
6 6 intergenic intergenic intronic


Required Output:



Type          Func.ensGene Func.genericGene Func.refGene
exonic 1 1 1
intergenic 5 5 2
intronic 0 0 3


The solution I tried is working on only one column:



unique(Gene["Func.ensGene"])


Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?










share|improve this question

























  • dput(head(Gene)) would have been better than just head(Gene)

    – snoram
    Nov 22 '18 at 13:43














-1












-1








-1








> head(Gene)
Key Func.ensGene Func.genericGene Func.refGene
1 1 intergenic intergenic intergenic
2 2 intergenic intergenic intergenic
3 3 intergenic intergenic intronic
4 4 exonic exonic exonic
5 5 intergenic intergenic intronic
6 6 intergenic intergenic intronic


Required Output:



Type          Func.ensGene Func.genericGene Func.refGene
exonic 1 1 1
intergenic 5 5 2
intronic 0 0 3


The solution I tried is working on only one column:



unique(Gene["Func.ensGene"])


Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?










share|improve this question
















> head(Gene)
Key Func.ensGene Func.genericGene Func.refGene
1 1 intergenic intergenic intergenic
2 2 intergenic intergenic intergenic
3 3 intergenic intergenic intronic
4 4 exonic exonic exonic
5 5 intergenic intergenic intronic
6 6 intergenic intergenic intronic


Required Output:



Type          Func.ensGene Func.genericGene Func.refGene
exonic 1 1 1
intergenic 5 5 2
intronic 0 0 3


The solution I tried is working on only one column:



unique(Gene["Func.ensGene"])


Could I get the output table as shown above and get a barplot
where X-axis has the 'Type' and the bar represents counts from each column?







r






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 22 '18 at 14:39







chas

















asked Nov 22 '18 at 13:21









chaschas

46631527




46631527













  • dput(head(Gene)) would have been better than just head(Gene)

    – snoram
    Nov 22 '18 at 13:43



















  • dput(head(Gene)) would have been better than just head(Gene)

    – snoram
    Nov 22 '18 at 13:43

















dput(head(Gene)) would have been better than just head(Gene)

– snoram
Nov 22 '18 at 13:43





dput(head(Gene)) would have been better than just head(Gene)

– snoram
Nov 22 '18 at 13:43












3 Answers
3






active

oldest

votes


















1














Simply use ?xtabs along with ?stack:



xtabs( ~ values + ind , stack(df1[,-1]))


or even shorter as @nicola suggests:



table(stack(df1[,-1]))


for both you get:



#            ind
#values Func.ensGene Func.genericGene Func.refGene
# exonic 1 1 1
# intergenic 5 5 2
# intronic 0 0 3


do you prefer further working on a data.frame?



as.data.frame.matrix(
xtabs( ~ values + ind , stack(df1[,-1])) # or again only table(stack(df1[,-1]))
)





share|improve this answer

































    1














    We can get all unique levels from the dataframe and for every column calculate the count of each level by first converting it to factor with unique levels calculated before.



    unique_names <- unique(unlist(df[-1]))
    sapply(df[-1], function(x) table(factor(x, levels = unique_names)))

    # Func.ensGene Func.genericGene Func.refGene
    #intergenic 5 5 2
    #exonic 1 1 1
    #intronic 0 0 3





    share|improve this answer































      0














      Love the base R solutions but using data.table and some magrittr for readability you could get a data.frame directly (instead of a table):



      library(magrittr)
      library(data.table)
      setDT(df)
      df %>%
      melt(id.vars = "Key") %>%
      .[, .N, .(variable, value)] %>%
      dcast(value ~ variable, value.var = "N", fill = 0)

      value Func.ensGene Func.genericGene Func.refGene
      1: exonic 1 1 1
      2: intergenic 5 5 2
      3: intronic 0 0 3


      Or much more concisely(as suggested by Henrik):



      dcast(melt(df, "Key"), value ~ variable)


      If you prefer tidyverse functions:



      library(tidyr)
      df %>%
      gather(key = Key) %>%
      group_by(Key, value) %>%
      count() %>%
      spread(Key, n, fill = 0)

      # A tibble: 3 x 4
      # Groups: value [3]
      value Func.ensGene Func.genericGene Func.refGene
      <chr> <dbl> <dbl> <dbl>
      1 exonic 1 1 1
      2 intergenic 5 5 2
      3 intronic 0 0 3


      Data:



      df <- data.frame(
      Key = 1:6,
      Func.ensGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
      Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
      Func.refGene = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
      stringsAsFactors = FALSE
      )





      share|improve this answer


























      • For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

        – Henrik
        Nov 22 '18 at 15:13













      • Thanks @Henrik, I forgot about the summarising capabilities of dcast().

        – snoram
        Nov 22 '18 at 15:15











      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53431943%2ftabulate-multiple-levels-for-each-column-in-dataframe%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      1














      Simply use ?xtabs along with ?stack:



      xtabs( ~ values + ind , stack(df1[,-1]))


      or even shorter as @nicola suggests:



      table(stack(df1[,-1]))


      for both you get:



      #            ind
      #values Func.ensGene Func.genericGene Func.refGene
      # exonic 1 1 1
      # intergenic 5 5 2
      # intronic 0 0 3


      do you prefer further working on a data.frame?



      as.data.frame.matrix(
      xtabs( ~ values + ind , stack(df1[,-1])) # or again only table(stack(df1[,-1]))
      )





      share|improve this answer






























        1














        Simply use ?xtabs along with ?stack:



        xtabs( ~ values + ind , stack(df1[,-1]))


        or even shorter as @nicola suggests:



        table(stack(df1[,-1]))


        for both you get:



        #            ind
        #values Func.ensGene Func.genericGene Func.refGene
        # exonic 1 1 1
        # intergenic 5 5 2
        # intronic 0 0 3


        do you prefer further working on a data.frame?



        as.data.frame.matrix(
        xtabs( ~ values + ind , stack(df1[,-1])) # or again only table(stack(df1[,-1]))
        )





        share|improve this answer




























          1












          1








          1







          Simply use ?xtabs along with ?stack:



          xtabs( ~ values + ind , stack(df1[,-1]))


          or even shorter as @nicola suggests:



          table(stack(df1[,-1]))


          for both you get:



          #            ind
          #values Func.ensGene Func.genericGene Func.refGene
          # exonic 1 1 1
          # intergenic 5 5 2
          # intronic 0 0 3


          do you prefer further working on a data.frame?



          as.data.frame.matrix(
          xtabs( ~ values + ind , stack(df1[,-1])) # or again only table(stack(df1[,-1]))
          )





          share|improve this answer















          Simply use ?xtabs along with ?stack:



          xtabs( ~ values + ind , stack(df1[,-1]))


          or even shorter as @nicola suggests:



          table(stack(df1[,-1]))


          for both you get:



          #            ind
          #values Func.ensGene Func.genericGene Func.refGene
          # exonic 1 1 1
          # intergenic 5 5 2
          # intronic 0 0 3


          do you prefer further working on a data.frame?



          as.data.frame.matrix(
          xtabs( ~ values + ind , stack(df1[,-1])) # or again only table(stack(df1[,-1]))
          )






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 22 '18 at 13:57

























          answered Nov 22 '18 at 13:46









          Andre ElricoAndre Elrico

          5,70911029




          5,70911029

























              1














              We can get all unique levels from the dataframe and for every column calculate the count of each level by first converting it to factor with unique levels calculated before.



              unique_names <- unique(unlist(df[-1]))
              sapply(df[-1], function(x) table(factor(x, levels = unique_names)))

              # Func.ensGene Func.genericGene Func.refGene
              #intergenic 5 5 2
              #exonic 1 1 1
              #intronic 0 0 3





              share|improve this answer




























                1














                We can get all unique levels from the dataframe and for every column calculate the count of each level by first converting it to factor with unique levels calculated before.



                unique_names <- unique(unlist(df[-1]))
                sapply(df[-1], function(x) table(factor(x, levels = unique_names)))

                # Func.ensGene Func.genericGene Func.refGene
                #intergenic 5 5 2
                #exonic 1 1 1
                #intronic 0 0 3





                share|improve this answer


























                  1












                  1








                  1







                  We can get all unique levels from the dataframe and for every column calculate the count of each level by first converting it to factor with unique levels calculated before.



                  unique_names <- unique(unlist(df[-1]))
                  sapply(df[-1], function(x) table(factor(x, levels = unique_names)))

                  # Func.ensGene Func.genericGene Func.refGene
                  #intergenic 5 5 2
                  #exonic 1 1 1
                  #intronic 0 0 3





                  share|improve this answer













                  We can get all unique levels from the dataframe and for every column calculate the count of each level by first converting it to factor with unique levels calculated before.



                  unique_names <- unique(unlist(df[-1]))
                  sapply(df[-1], function(x) table(factor(x, levels = unique_names)))

                  # Func.ensGene Func.genericGene Func.refGene
                  #intergenic 5 5 2
                  #exonic 1 1 1
                  #intronic 0 0 3






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 22 '18 at 13:27









                  Ronak ShahRonak Shah

                  36.7k104161




                  36.7k104161























                      0














                      Love the base R solutions but using data.table and some magrittr for readability you could get a data.frame directly (instead of a table):



                      library(magrittr)
                      library(data.table)
                      setDT(df)
                      df %>%
                      melt(id.vars = "Key") %>%
                      .[, .N, .(variable, value)] %>%
                      dcast(value ~ variable, value.var = "N", fill = 0)

                      value Func.ensGene Func.genericGene Func.refGene
                      1: exonic 1 1 1
                      2: intergenic 5 5 2
                      3: intronic 0 0 3


                      Or much more concisely(as suggested by Henrik):



                      dcast(melt(df, "Key"), value ~ variable)


                      If you prefer tidyverse functions:



                      library(tidyr)
                      df %>%
                      gather(key = Key) %>%
                      group_by(Key, value) %>%
                      count() %>%
                      spread(Key, n, fill = 0)

                      # A tibble: 3 x 4
                      # Groups: value [3]
                      value Func.ensGene Func.genericGene Func.refGene
                      <chr> <dbl> <dbl> <dbl>
                      1 exonic 1 1 1
                      2 intergenic 5 5 2
                      3 intronic 0 0 3


                      Data:



                      df <- data.frame(
                      Key = 1:6,
                      Func.ensGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
                      Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
                      Func.refGene = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
                      stringsAsFactors = FALSE
                      )





                      share|improve this answer


























                      • For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

                        – Henrik
                        Nov 22 '18 at 15:13













                      • Thanks @Henrik, I forgot about the summarising capabilities of dcast().

                        – snoram
                        Nov 22 '18 at 15:15
















                      0














                      Love the base R solutions but using data.table and some magrittr for readability you could get a data.frame directly (instead of a table):



                      library(magrittr)
                      library(data.table)
                      setDT(df)
                      df %>%
                      melt(id.vars = "Key") %>%
                      .[, .N, .(variable, value)] %>%
                      dcast(value ~ variable, value.var = "N", fill = 0)

                      value Func.ensGene Func.genericGene Func.refGene
                      1: exonic 1 1 1
                      2: intergenic 5 5 2
                      3: intronic 0 0 3


                      Or much more concisely(as suggested by Henrik):



                      dcast(melt(df, "Key"), value ~ variable)


                      If you prefer tidyverse functions:



                      library(tidyr)
                      df %>%
                      gather(key = Key) %>%
                      group_by(Key, value) %>%
                      count() %>%
                      spread(Key, n, fill = 0)

                      # A tibble: 3 x 4
                      # Groups: value [3]
                      value Func.ensGene Func.genericGene Func.refGene
                      <chr> <dbl> <dbl> <dbl>
                      1 exonic 1 1 1
                      2 intergenic 5 5 2
                      3 intronic 0 0 3


                      Data:



                      df <- data.frame(
                      Key = 1:6,
                      Func.ensGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
                      Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
                      Func.refGene = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
                      stringsAsFactors = FALSE
                      )





                      share|improve this answer


























                      • For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

                        – Henrik
                        Nov 22 '18 at 15:13













                      • Thanks @Henrik, I forgot about the summarising capabilities of dcast().

                        – snoram
                        Nov 22 '18 at 15:15














                      0












                      0








                      0







                      Love the base R solutions but using data.table and some magrittr for readability you could get a data.frame directly (instead of a table):



                      library(magrittr)
                      library(data.table)
                      setDT(df)
                      df %>%
                      melt(id.vars = "Key") %>%
                      .[, .N, .(variable, value)] %>%
                      dcast(value ~ variable, value.var = "N", fill = 0)

                      value Func.ensGene Func.genericGene Func.refGene
                      1: exonic 1 1 1
                      2: intergenic 5 5 2
                      3: intronic 0 0 3


                      Or much more concisely(as suggested by Henrik):



                      dcast(melt(df, "Key"), value ~ variable)


                      If you prefer tidyverse functions:



                      library(tidyr)
                      df %>%
                      gather(key = Key) %>%
                      group_by(Key, value) %>%
                      count() %>%
                      spread(Key, n, fill = 0)

                      # A tibble: 3 x 4
                      # Groups: value [3]
                      value Func.ensGene Func.genericGene Func.refGene
                      <chr> <dbl> <dbl> <dbl>
                      1 exonic 1 1 1
                      2 intergenic 5 5 2
                      3 intronic 0 0 3


                      Data:



                      df <- data.frame(
                      Key = 1:6,
                      Func.ensGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
                      Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
                      Func.refGene = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
                      stringsAsFactors = FALSE
                      )





                      share|improve this answer















                      Love the base R solutions but using data.table and some magrittr for readability you could get a data.frame directly (instead of a table):



                      library(magrittr)
                      library(data.table)
                      setDT(df)
                      df %>%
                      melt(id.vars = "Key") %>%
                      .[, .N, .(variable, value)] %>%
                      dcast(value ~ variable, value.var = "N", fill = 0)

                      value Func.ensGene Func.genericGene Func.refGene
                      1: exonic 1 1 1
                      2: intergenic 5 5 2
                      3: intronic 0 0 3


                      Or much more concisely(as suggested by Henrik):



                      dcast(melt(df, "Key"), value ~ variable)


                      If you prefer tidyverse functions:



                      library(tidyr)
                      df %>%
                      gather(key = Key) %>%
                      group_by(Key, value) %>%
                      count() %>%
                      spread(Key, n, fill = 0)

                      # A tibble: 3 x 4
                      # Groups: value [3]
                      value Func.ensGene Func.genericGene Func.refGene
                      <chr> <dbl> <dbl> <dbl>
                      1 exonic 1 1 1
                      2 intergenic 5 5 2
                      3 intronic 0 0 3


                      Data:



                      df <- data.frame(
                      Key = 1:6,
                      Func.ensGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
                      Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
                      Func.refGene = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
                      stringsAsFactors = FALSE
                      )






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Nov 22 '18 at 15:16

























                      answered Nov 22 '18 at 13:53









                      snoramsnoram

                      7,101832




                      7,101832













                      • For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

                        – Henrik
                        Nov 22 '18 at 15:13













                      • Thanks @Henrik, I forgot about the summarising capabilities of dcast().

                        – snoram
                        Nov 22 '18 at 15:15



















                      • For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

                        – Henrik
                        Nov 22 '18 at 15:13













                      • Thanks @Henrik, I forgot about the summarising capabilities of dcast().

                        – snoram
                        Nov 22 '18 at 15:15

















                      For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

                      – Henrik
                      Nov 22 '18 at 15:13







                      For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)

                      – Henrik
                      Nov 22 '18 at 15:13















                      Thanks @Henrik, I forgot about the summarising capabilities of dcast().

                      – snoram
                      Nov 22 '18 at 15:15





                      Thanks @Henrik, I forgot about the summarising capabilities of dcast().

                      – snoram
                      Nov 22 '18 at 15:15


















                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53431943%2ftabulate-multiple-levels-for-each-column-in-dataframe%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Create new schema in PostgreSQL using DBeaver

                      Deepest pit of an array with Javascript: test on Codility

                      Fotorealismo