Word cloud in R with multiple words and special characters























I want to create a word cloud in R that visualizes the occurrence of variable names. A name may consist of more than one word and may contain special characters and numbers; for example, one variable name is "S & P 500 dividend yield".



The variable names are stored in a text file, one name per line, with no other separators.



I tried the following code, but it splits the variable names into separate pieces:



library(tm)
library(SnowballC)
library(wordcloud)
library(RColorBrewer)


# load the text:
text <- readLines("./Overview_used_series.txt")
docs <- Corpus(VectorSource(text))
inspect(docs)

# build a term-document matrix:
tdm <- TermDocumentMatrix(docs)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 10)


# generate the wordcloud:
pdf("Word cloud.pdf")
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
dev.off()


How can I process the variable names so that the word cloud shows them exactly as they appear in the text file?

















      r word-cloud














      edited Nov 19 at 14:29

























      asked Nov 19 at 14:08









      user155417

























          1 Answer













          If your file has one variable name per line, as you describe, there is no need to use tm. You can build your own word frequency table and use it as input. tm splits the text into words at spaces, so it does not keep multi-word variable names intact.
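To make the splitting problem concrete, here is a minimal base-R sketch. It only mimics splitting on whitespace with `strsplit`; it is an illustration under that assumption, not tm's actual tokenizer, and the variable names `name` and `pieces` are made up for the example.

```r
# Base-R illustration only: this mimics whitespace splitting, it is not tm's tokenizer.
name <- "S & P 500 dividend yield"

# Splitting on runs of whitespace breaks the name into separate pieces.
pieces <- strsplit(name, "[[:space:]]+")[[1]]
print(pieces)

# Treating the whole line as a single term keeps the name intact.
print(name)
```

Each piece would be counted as its own term, which is why the original name never shows up in the cloud.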



          Once the text is loaded, create a data.frame in which every name has a frequency of 1, then aggregate by name to get the counts. The words and freq arguments of wordcloud take the columns of this data.frame directly. Note that I reduced the scale a bit, because long variable names might otherwise not fit on the plot; wordcloud issues a warning when that happens.



          I'm not inserting the resulting picture.



          library(wordcloud)
          library(RColorBrewer)

          # In practice, read the names from the file:
          # text <- readLines("./Overview_used_series.txt")
          # Example data with one variable name per element:
          text <- c("S & P 500 dividend yield", "S & P 500 dividend yield", "S & P 500 dividend yield",
                    "visualize ", "occurrence ", "variable names", "visualize ", "occurrence ",
                    "variable names")

          # freq = 1 adds a column of 1s, one per name.
          my_data <- data.frame(text = text, freq = 1, stringsAsFactors = FALSE)

          # Sum the 1s per distinct name to get its frequency.
          my_agr <- aggregate(freq ~ ., data = my_data, sum)

          # scale is reduced so that long names still fit on the plot.
          wordcloud(words = my_agr$text, freq = my_agr$freq, min.freq = 1,
                    max.words = 200, random.order = FALSE, rot.per = 0.35,
                    colors = brewer.pal(8, "Dark2"), scale = c(2, .5))
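The same frequency table can probably also be built with base R's `table()` instead of `aggregate()`. The sketch below is an alternative, not the method used in the answer; `counts` and `freq_df` are illustrative names, and `trimws()` is added on the assumption that stray trailing spaces should not create separate entries.

```r
# Example input; in practice this would come from readLines() as in the question.
text <- c("S & P 500 dividend yield", "S & P 500 dividend yield",
          "variable names", "visualize ", "visualize")

# trimws() removes leading/trailing spaces so "visualize " and "visualize" match.
counts <- table(trimws(text))

# Convert to a data.frame with one row per distinct name.
freq_df <- data.frame(text = names(counts),
                      freq = as.integer(counts),
                      stringsAsFactors = FALSE)

# Most frequent names first.
freq_df <- freq_df[order(-freq_df$freq), ]
print(freq_df)
```

The columns `freq_df$text` and `freq_df$freq` can then be passed to the `words` and `freq` arguments of `wordcloud` in the same way as `my_agr` above.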


























          • Thank you, this works very well. However, my text file is huge and contains a great many variable names, so typing them into R by hand is impossible. Alternatively, I have the variable names in an xlsx file. Is there a way to do this automatically, reading in the variables and counting their occurrences? In my example this is done by the term-document matrix, unfortunately with the problem described above.
            – user155417
            Nov 20 at 8:14










          • You don't have to type anything by hand. I only used the hard-coded text vector as an example. Just read in the data with readLines as you did before (the commented-out first line of my code) and then follow the rest of my code. That should do it.
            – phiver
            Nov 20 at 13:11










          • Ah I see. It works! Perfect! Thank you very much!
            – user155417
            Nov 20 at 16:02
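The suggestion in the comments, reading the names with `readLines` and then aggregating, can be sketched as a small helper. `count_names` is a hypothetical function written for this example, not part of any package, and the temporary file only stands in for the real input file. For an xlsx file the same counting should apply once the relevant column has been read into a character vector, for instance with the readxl package; that is an assumption and not something used in the thread.

```r
# Hypothetical helper (not part of any package): count how often each line occurs.
count_names <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]  # drop empty lines
  aggregate(freq ~ text,
            data = data.frame(text = lines, freq = 1, stringsAsFactors = FALSE),
            FUN = sum)
}

# Demo with a temporary file standing in for "./Overview_used_series.txt".
tmp <- tempfile(fileext = ".txt")
writeLines(c("S & P 500 dividend yield", "variable names",
             "S & P 500 dividend yield", ""), tmp)
name_counts <- count_names(tmp)
print(name_counts)
unlink(tmp)
```

The resulting `text` and `freq` columns can be handed to `wordcloud` exactly as in the answer's code.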














































          answered Nov 19 at 17:12









          phiver












