Find most occurring words in text file

I have a log file that records the category and subcategory names that failed, together with an error message. My goal is to find the most frequently occurring categories.

Example log:

Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073' 

Mon, 26 Nov 2018 07:51:08 +0100 | 278: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '020'

Now I want to identify the top 10 categories that failed.

Using sed:

sed -e 's/\s/\n/g' < file.log | grep ERROR | sort | uniq -c | sort -nr | head -10

I am getting 1636 [ERROR

What I am looking for is a list of categories sorted by number of occurrences, e.g.:

139 category1

23 category2

...

edited Nov 27 '18 at 14:29

asked Nov 26 '18 at 7:25

merlin

Please add more explanatory samples of the input and the expected output to your post; as it stands, it is not clear.

– RavinderSingh13
Nov 26 '18 at 7:28

Agree with @RavinderSingh13: there is no category1 in your example, yet you want it in the output. Also fix the question title; it seems you are looking for a count of something, not the word itself.

– Drako
Nov 26 '18 at 7:30

unix command-line text-processing


5 Answers

You say you want to do the counting with sed, but you actually have an entire pipeline of sed, grep, sort, uniq and head. Generally, when that happens, the problem is screaming for awk:

awk 'BEGIN{FS="\047"; PROCINFO["sorted_in"]="@val_num_desc"}
     /\[ERROR /{c[$2]++}
     END{for(i in c) { print c[i],i; if(++j == 10) exit } }' file

The above is a GNU awk solution, as it relies on non-POSIX features such as controlling the order of array traversal (PROCINFO["sorted_in"]). The field separator is set to the single quote ('), which has octal value 047, on the assumption that the category name sits between single quotes.
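To see what that field separator does, here is a small sketch, assuming the log format shown in the question; the variable names line and fields are only illustrative. Splitting on the single quote should put the main category in $2, the subcategory in $4 and the ref in $6:

```shell
# One sample line taken from the question.
line="Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'"

# "\047" is the octal escape for a single quote, so the quoted names
# end up in the even-numbered fields.
fields=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\047" } { print $2; print $4; print $6 }')

printf '%s\n' "$fields"
```

This prints mcat-name1, subcat-name1 and 073 on separate lines, which is why the answer counts on $2.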

If you are not using GNU awk, you could use sort and head or do the sorting yourself. One way is:

awk 'BEGIN{FS="\047"; n=10 }
     /\[ERROR /{ c[$2]++ }
     END {
       for (l in c) {
         for (i=1;i<=n;++i) {
           if (c[l] > c[s[i]]) {
             for(j=n;j>i;--j) s[j]=s[j-1];
             s[i]=l
             break
           }
         }
       }
       for (i=1;i<=n;++i) {
         if (s[i]=="") break
         print c[s[i]], s[i]
       }
     }' file

or just do:

awk 'BEGIN{FS="\047"}
     /\[ERROR /{c[$2]++}
     END{for(i in c) print c[i],i }' file | sort -nr | head -10
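As a self-contained check of the portable variant, here is a sketch run against a few made-up log lines. The sample data, including the INFO line, and the names log and top are assumptions for illustration, not the asker's real file. It counts in awk and leaves the ordering to sort:

```shell
# Hypothetical sample data, modelled on the log lines in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'
Mon, 26 Nov 2018 07:51:08 +0100 | 278: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '020'
Mon, 26 Nov 2018 07:51:09 +0100 | 300: [INFO ***] Category 'ignored' imported
Mon, 26 Nov 2018 07:51:10 +0100 | 301: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name3' ref: '074'
Mon, 26 Nov 2018 07:51:11 +0100 | 302: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '075'
Mon, 26 Nov 2018 07:51:12 +0100 | 303: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '021'
Mon, 26 Nov 2018 07:51:21 +0100 | 1232: [ERROR ***] Category ID not found for 'make' 'model' ref: '228239'
EOF

# Count the first quoted name on every ERROR line, then sort by count
# (highest first) and keep at most ten entries.
top=$(awk 'BEGIN { FS = "\047" }
     /\[ERROR / { c[$2]++ }
     END { for (i in c) print c[i], i }' "$log" | sort -nr | head -10)

printf '%s\n' "$top"
rm -f "$log"
```

Because the sorting happens outside awk, this should behave the same in GNU awk, mawk and the BSD awk mentioned in the comments.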

edited Nov 27 '18 at 14:33

answered Nov 27 '18 at 9:52

kvantour

awk seems to be the best solution. The result lists each category with its number of occurrences, but it is not sorted by most occurrences.

– merlin
Nov 27 '18 at 10:29

@merlin do you have an example of the input file?

– kvantour
Nov 27 '18 at 10:32

Here is another line, which has more text after the ref number. make and model are the ones I am trying to count in order to identify the categories with the most errors: Mon, 26 Nov 2018 07:51:21 +0100 | 1232: [ERROR ***] Category ID not found for 'make' 'model' ref: '228239'

– merlin
Nov 27 '18 at 10:42

@merlin, could you please provide us with more than one more line. We need a sample of your input.

– kvantour
Nov 27 '18 at 10:48

How can I add a file to stackoverflow? awk version 20070501

– merlin
Nov 27 '18 at 10:51

You got 1636 [ERROR because you first change every space character into a newline, then grep for the word ERROR, and only then count.

This:

sed -e 's/\s/\n/g' < file.log | grep ERROR

gives you this:

[ERROR

[ERROR

[ERROR

[ERROR

[ERROR

[ERROR

... (1630 more)

You need to grep first and then sed (pretty sure you can do better with sed, but I'm just talking about the logic behind the commands):

grep ERROR file.log | sed -e 's/\s/\n/g' | sort | uniq -c | sort -nr | head -10

This may not be the best solution, as it also counts the word ERROR and other useless words, but you didn't give us a lot of information on the input file.
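The effect of the ordering can be reproduced on a single line. This is only a sketch: it uses tr as a portable stand-in for the sed substitution, and the variable names are illustrative.

```shell
# One sample line taken from the question.
line="Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'"

# Split into words first, then grep: only the word containing ERROR survives.
split_then_grep=$(printf '%s\n' "$line" | tr ' ' '\n' | grep ERROR)

# Grep first, then split: every word of the matching line is kept.
grep_then_split=$(printf '%s\n' "$line" | grep ERROR | tr ' ' '\n')

printf 'split then grep:\n%s\n\n' "$split_then_grep"
printf 'grep then split:\n%s\n' "$grep_then_split"
```

The first variant leaves nothing but [ERROR to count, while the second still contains the quoted category names.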

edited Nov 26 '18 at 9:18

answered Nov 26 '18 at 8:54

Corentin Limier

Of course, sed is perfectly able to perform the work of grep, too. See useless use of grep

– tripleee
Nov 26 '18 at 9:12

Assuming 'Bulgari' is an example of a category you want to extract, try

sed -n "s/.*ERROR.*] Category '\([^']*\)'.*/\1/p" file.log |
sort | uniq -c | sort -rn | head -n 10

The sed command finds lines which match a fairly complex regular expression and captures part of the line, then replaces the match with the captured substring, and prints it (the -n option disables the default print action, so we only print the extracted lines). The rest is basically identical to what you already had.

In the regex, we look for (beginning of line followed by) anything (except a newline) followed by ERROR and later on followed by ] Category ' and then a string which doesn't contain a single quote, then the closing single quote followed by anything. The many "anything (except newline)" parts are required in order to replace the entire line with just the captured string from inside the single quotes. The backslashed parentheses are what capture an expression; search for "backref" for the full scoop.

Your original attempt would only extract the actual ERROR strings, because you replaced all the surrounding spaces with newlines (assuming vaguely that your sed accepts the Perl \s shorthand, which isn't standard in sed, and that \n gets interpreted as a literal newline in the replacement, which also isn't entirely standard or portable).
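For the log format that was later added to the question, the literal text between ] and the first quote is different. A possible adaptation, offered as a sketch under the assumption that every relevant line reads "Category ID not found for '...'" (the sample data and the names log and top are made up for illustration):

```shell
# Hypothetical sample data, modelled on the log lines in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'
Mon, 26 Nov 2018 07:51:08 +0100 | 278: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '020'
Mon, 26 Nov 2018 07:51:09 +0100 | 300: [INFO ***] Import finished
Mon, 26 Nov 2018 07:51:10 +0100 | 301: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name3' ref: '074'
EOF

# -n plus the p flag prints only lines where the substitution succeeded;
# \1 is the text captured between the first pair of single quotes.
top=$(sed -n "s/.*ERROR.*] Category ID not found for '\([^']*\)'.*/\1/p" "$log" |
  sort | uniq -c | sort -rn | head -n 10)

printf '%s\n' "$top"
rm -f "$log"
```

With this sample, mcat-name1 is listed with a count of 2 and mcat-name2 with a count of 1; the exact padding in front of the counts depends on the uniq implementation.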

edited Nov 26 '18 at 9:27

answered Nov 26 '18 at 9:17

tripleee

The way to go is to select the failed categories and, using sed, replace the whole line with only the category name.

Give this a try:

sed -e "s/^.* \[ERROR .*\] Category '\([^']*\)' .*$/\1/g" file.log | sort  | uniq -c | sort -nr | head -16

^ is the start of the line.

\( ... \) : the character sequence enclosed in these escaped parentheses can be referred to with \1 for the first pair appearing in the regex, \2 for the second pair, etc.

$ is the end of the line.

The sed command selects a line that contains [ERROR followed by some characters up to a ], followed by the word Category and a space. The character sequence between the following pair of single quotes is captured by the escaped parentheses, and the rest of the line is matched up to its end. When such a line is found, it is replaced with the captured sequence after Category.
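The numbering of the captured groups can be tried on one line of the sample log. This is a sketch only; the pattern is adapted to the "for '...' '...' ref" layout from the question, and the names line and pair are illustrative.

```shell
# One sample line taken from the question.
line="Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'"

# The first escaped pair of parentheses captures the main category (\1),
# the second pair captures the subcategory (\2).
pair=$(printf '%s\n' "$line" | sed -n "s/^.*for '\([^']*\)' '\([^']*\)' ref.*$/\1 \2/p")

printf '%s\n' "$pair"
```

This prints "mcat-name1 subcat-name1", so the same idea could be used to count category and subcategory pairs instead of the main category alone.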

edited Nov 27 '18 at 7:19

answered Nov 26 '18 at 9:40

Jay jargot

Using Perl

> cat merlin.txt

Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'

Mon, 26 Nov 2018 07:51:08 +0100 | 278: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '020'

Mon, 26 Nov 2018 07:51:21 +0100 | 1232: [ERROR ***] Category ID not found for 'make' 'model' ref: '228239'

> perl -ne ' { s/(.*)Category.*for(.+)ref.*/\2/g and s/(\047\S+\047)/$kv{$1}++/ge if /ERROR/}  END { foreach (sort keys %kv) { print "$_ $kv{$_}\n" } } ' merlin.txt | sort -nr

'subcat-name2' 1

'subcat-name1' 1

'model' 1

'mcat-name2' 1

'mcat-name1' 1

'make' 1

>

answered Nov 27 '18 at 14:57

stack0114106
