How to find the 10 most frequent words in a file in Unix/Linux











How can I find the 10 most frequent words in a file in Unix/Linux?

I tried this command in Unix:

$ sort file.txt | uniq -c | sort -nr | head -10

However, I am not sure whether it is correct, and whether it really shows the 10 most frequent words in a large file.
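One quick way to check the pipeline is to run it on a small file with known counts (the sample words below are made up):

```shell
# Build a tiny test file, one word per line
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > words.txt

# Collapse duplicate lines with counts, sort counts in descending order, keep the top 10
sort words.txt | uniq -c | sort -nr | head -10
# apple (3) is listed first, then banana (2), then cherry (1)
```

If the ranking matches the counts you expect, the pipeline is doing the right thing for one-word-per-line input.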










  • Is file.txt just one word per line? Or are there multiple words per line?
    – dawg
    Nov 19 at 15:35

  • yes it has one word per line
    – rex7991
    Nov 19 at 15:37

  • awk '{cnt[$1]++} END{for (e in cnt) printf "%s\t%s\n", cnt[e], e}' file.txt | sort -nr | head -n 10
    – dawg
    Nov 19 at 15:39

  • Please add example input and desired output.
    – dawg
    Nov 19 at 15:53

  • Why do you think your result is wrong?
    – kvantour
    Nov 19 at 16:06
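The awk one-liner from the comments can be tried the same way; note that it counts only the first field of each line, so it assumes one word per line (the sample file is made up):

```shell
# Sample input, one word per line
printf 'dog\ncat\ndog\ndog\nbird\n' > sample.txt

# Tally words in an awk array, emit "count<TAB>word", then rank numerically
awk '{cnt[$1]++} END{for (e in cnt) printf "%s\t%s\n", cnt[e], e}' sample.txt |
  sort -nr | head -n 10
# dog (3) is ranked first
```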















linux unix






asked Nov 19 at 15:30 by rex7991









1 Answer






























Here is a shell script for your problem; it works even if the file has more than one word per line.



wordcount.sh



#!/bin/bash

# filename: wordcount.sh
# usage: ./wordcount.sh filename

# handle positional arguments
if [ $# -ne 1 ]
then
    echo "Usage: $0 filename"
    exit 1
fi

# print a header, then the word counts
printf "%-14s%s\n" "Word" "Count"

cat "$1" | tr 'A-Z' 'a-z' |
egrep -o "\b[[:alpha:]]+\b" |
awk '{ count[$0]++ }
END{
    for (ind in count)
        printf("%-14s%d\n", ind, count[ind])
}' | sort -k2 -n -r | head -n 10


Just run ./wordcount.sh filename.txt



Explanation

The tr command converts all uppercase letters to lowercase, then egrep extracts every word in the text and prints them one per line. Finally, awk uses an associative array to count the occurrences, and the counts are sorted in descending order so the 10 most frequent words come out on top.
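As a sanity check, the core pipeline from the script can be run directly on a small made-up file:

```shell
# Two lines of sample text with mixed case
printf 'The cat saw the Cat.\nA dog and THE dog.\n' > demo.txt

# Lowercase, extract alphabetic words, count them, sort by count, take the top 10
tr 'A-Z' 'a-z' < demo.txt |
  egrep -o '\b[[:alpha:]]+\b' |
  awk '{ count[$0]++ } END { for (w in count) printf "%-14s%d\n", w, count[w] }' |
  sort -k2 -n -r | head -n 10
# "the" is ranked first with a count of 3
```

Note that case-folding first means "The", "THE", and "the" are all counted as one word, which the plain sort | uniq -c pipeline in the question would treat as three different words.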





























edited Nov 19 at 15:59
answered Nov 19 at 15:53 by HbnKing





























