Evaluating association strengths of items
$begingroup$
Say I have a list of animals with their counts:
import numpy as np
import pandas as pd
from random import randint
table = np.zeros((5,1), dtype=int)
for i in range(5):
table[i]=randint(10, 20)
df1 = pd.DataFrame(columns=['Animal', 'Count'])
df1['Animal'] = animal_list
df1['Count'] = table
df1
And I have a matrix of how many times they appear together:
table = np.zeros((5,5), dtype=int)
animal_list = ['Monkey', 'Tiger', 'Cat', 'Dog', 'Lion']
for i in range(5):
for j in range(5):
table[i][j]=randint(0, 9)
df2 = pd.DataFrame(table, columns=animal_list, index=animal_list)
df2
I want to find the animals' association strength, which is defined like so - if Lion
and Cat
appear together 5 times, and Lion
's count is 10 and Cat
's count is 15, then Lion -> Cat
association strength is 5/10=0.5
, and Cat -> Lion
association strength is 5/15=0.33
.
I do it like so:
assoc_df = pd.DataFrame(columns=['Animal 1', 'Animal 2', 'Association Strength'])
for row_word in df2:
for col_word in df2:
if row_word!=col_word:
assoc_df = assoc_df.append({'Animal 1': row_word, 'Animal 2': col_word,
'Association Strength': df2[col_word][row_word]/df1[df1.Animal==row_word]['Count'].values[0]}, ignore_index=True)
assoc_df
The problem is, since there are 2 for loops, the complexity is O(n^2)
. This is a problem when (in my real dataset) I have ~1000 animals to loop on, which takes hours to finish computing the association strength table.
So, how to do I better optimize the creation/generation process of this association table?
P.S.: In most practical use cases, df2
is a symmetric matrix, as "X
appears together with Y
" generally also means the same as "Y
appears together with X
. So, I am ok with solutions that assume that df2
is symmetric, and cut down the running time by half. In the above example, df2
is not a symmetric matrix, which is applicable for situations where we want to express meanings such as "X
appears after Y
" and "Y
appears after X
", which are not the same.
python performance python-3.x community-challenge data-mining
$endgroup$
add a comment |
$begingroup$
Say I have a list of animals with their counts:
import numpy as np
import pandas as pd
from random import randint
table = np.zeros((5,1), dtype=int)
for i in range(5):
table[i]=randint(10, 20)
df1 = pd.DataFrame(columns=['Animal', 'Count'])
df1['Animal'] = animal_list
df1['Count'] = table
df1
And I have a matrix of how many times they appear together:
table = np.zeros((5,5), dtype=int)
animal_list = ['Monkey', 'Tiger', 'Cat', 'Dog', 'Lion']
for i in range(5):
for j in range(5):
table[i][j]=randint(0, 9)
df2 = pd.DataFrame(table, columns=animal_list, index=animal_list)
df2
I want to find the animals' association strength, which is defined like so - if Lion
and Cat
appear together 5 times, and Lion
's count is 10 and Cat
's count is 15, then Lion -> Cat
association strength is 5/10=0.5
, and Cat -> Lion
association strength is 5/15=0.33
.
I do it like so:
assoc_df = pd.DataFrame(columns=['Animal 1', 'Animal 2', 'Association Strength'])
for row_word in df2:
for col_word in df2:
if row_word!=col_word:
assoc_df = assoc_df.append({'Animal 1': row_word, 'Animal 2': col_word,
'Association Strength': df2[col_word][row_word]/df1[df1.Animal==row_word]['Count'].values[0]}, ignore_index=True)
assoc_df
The problem is, since there are 2 for loops, the complexity is O(n^2)
. This is a problem when (in my real dataset) I have ~1000 animals to loop on, which takes hours to finish computing the association strength table.
So, how to do I better optimize the creation/generation process of this association table?
P.S.: In most practical use cases, df2
is a symmetric matrix, as "X
appears together with Y
" generally also means the same as "Y
appears together with X
. So, I am ok with solutions that assume that df2
is symmetric, and cut down the running time by half. In the above example, df2
is not a symmetric matrix, which is applicable for situations where we want to express meanings such as "X
appears after Y
" and "Y
appears after X
", which are not the same.
python performance python-3.x community-challenge data-mining
$endgroup$
add a comment |
$begingroup$
Say I have a list of animals with their counts:
import numpy as np
import pandas as pd
from random import randint
table = np.zeros((5,1), dtype=int)
for i in range(5):
table[i]=randint(10, 20)
df1 = pd.DataFrame(columns=['Animal', 'Count'])
df1['Animal'] = animal_list
df1['Count'] = table
df1
And I have a matrix of how many times they appear together:
table = np.zeros((5,5), dtype=int)
animal_list = ['Monkey', 'Tiger', 'Cat', 'Dog', 'Lion']
for i in range(5):
for j in range(5):
table[i][j]=randint(0, 9)
df2 = pd.DataFrame(table, columns=animal_list, index=animal_list)
df2
I want to find the animals' association strength, which is defined like so - if Lion
and Cat
appear together 5 times, and Lion
's count is 10 and Cat
's count is 15, then Lion -> Cat
association strength is 5/10=0.5
, and Cat -> Lion
association strength is 5/15=0.33
.
I do it like so:
assoc_df = pd.DataFrame(columns=['Animal 1', 'Animal 2', 'Association Strength'])
for row_word in df2:
for col_word in df2:
if row_word!=col_word:
assoc_df = assoc_df.append({'Animal 1': row_word, 'Animal 2': col_word,
'Association Strength': df2[col_word][row_word]/df1[df1.Animal==row_word]['Count'].values[0]}, ignore_index=True)
assoc_df
The problem is, since there are 2 for loops, the complexity is O(n^2)
. This is a problem when (in my real dataset) I have ~1000 animals to loop on, which takes hours to finish computing the association strength table.
So, how to do I better optimize the creation/generation process of this association table?
P.S.: In most practical use cases, df2
is a symmetric matrix, as "X
appears together with Y
" generally also means the same as "Y
appears together with X
. So, I am ok with solutions that assume that df2
is symmetric, and cut down the running time by half. In the above example, df2
is not a symmetric matrix, which is applicable for situations where we want to express meanings such as "X
appears after Y
" and "Y
appears after X
", which are not the same.
python performance python-3.x community-challenge data-mining
$endgroup$
Say I have a list of animals with their counts:
import numpy as np
import pandas as pd
from random import randint
table = np.zeros((5,1), dtype=int)
for i in range(5):
table[i]=randint(10, 20)
df1 = pd.DataFrame(columns=['Animal', 'Count'])
df1['Animal'] = animal_list
df1['Count'] = table
df1
And I have a matrix of how many times they appear together:
table = np.zeros((5,5), dtype=int)
animal_list = ['Monkey', 'Tiger', 'Cat', 'Dog', 'Lion']
for i in range(5):
for j in range(5):
table[i][j]=randint(0, 9)
df2 = pd.DataFrame(table, columns=animal_list, index=animal_list)
df2
I want to find the animals' association strength, which is defined like so - if Lion
and Cat
appear together 5 times, and Lion
's count is 10 and Cat
's count is 15, then Lion -> Cat
association strength is 5/10=0.5
, and Cat -> Lion
association strength is 5/15=0.33
.
I do it like so:
assoc_df = pd.DataFrame(columns=['Animal 1', 'Animal 2', 'Association Strength'])
for row_word in df2:
for col_word in df2:
if row_word!=col_word:
assoc_df = assoc_df.append({'Animal 1': row_word, 'Animal 2': col_word,
'Association Strength': df2[col_word][row_word]/df1[df1.Animal==row_word]['Count'].values[0]}, ignore_index=True)
assoc_df
The problem is, since there are 2 for loops, the complexity is O(n^2)
. This is a problem when (in my real dataset) I have ~1000 animals to loop on, which takes hours to finish computing the association strength table.
So, how to do I better optimize the creation/generation process of this association table?
P.S.: In most practical use cases, df2
is a symmetric matrix, as "X
appears together with Y
" generally also means the same as "Y
appears together with X
. So, I am ok with solutions that assume that df2
is symmetric, and cut down the running time by half. In the above example, df2
is not a symmetric matrix, which is applicable for situations where we want to express meanings such as "X
appears after Y
" and "Y
appears after X
", which are not the same.
python performance python-3.x community-challenge data-mining
python performance python-3.x community-challenge data-mining
edited 13 mins ago
Kristada673
asked 19 mins ago
Kristada673Kristada673
1463
1463
add a comment |
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f214381%2fevaluating-association-strengths-of-items%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f214381%2fevaluating-association-strengths-of-items%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown