Quadratic n term equation using multiindex



























I have two DFs which I would like to use to calculate the following:



w(ti,ti)*a(ti)^2 + w(tj,tj)*b(tj)^2 + 2*w(ti,tj)*a(ti)*b(tj)


The above uses two terms (a, b).
w is the weight DataFrame; its index and columns (the i and j above) are the Tn labels of a and b.
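For an arbitrary number of terms, the pattern in the examples below can be written compactly. This is a sketch assuming W is symmetric (as the set-up below guarantees) and that V(t_i) is the value of term t_i within one (I, Q) group, with absent terms contributing nothing:

```latex
\sum_{i}\sum_{j} W(t_i,t_j)\,V(t_i)\,V(t_j)
  = \sum_{i} W(t_i,t_i)\,V(t_i)^2 + 2\sum_{i<j} W(t_i,t_j)\,V(t_i)\,V(t_j)
  = v^\top W v
```

With two terms, a = V(t_i) and b = V(t_j), this reduces to the formula above.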



Set up (edit: W is now generated dynamically)



import pandas as pd
import numpy as np

I = ['i'+ str(i) for i in range(4)]
Q = ['q' + str(i) for i in range(5)]
T = ['t' + str(i) for i in range(3)]
n = 100

df1 = pd.DataFrame({'I': [I[np.random.randint(len(I))] for i in range(n)],
                    'Q': [Q[np.random.randint(len(Q))] for i in range(n)],
                    'Tn': [T[np.random.randint(len(T))] for i in range(n)],
                    'V': np.random.rand(n)}).groupby(['I','Q','Tn']).sum()

df1.head(5)
                V
I  Q  Tn
i0 q0 t0  1.626799
      t2  1.725374
   q1 t0  2.155340
      t1  0.479741
      t2  1.039178

w = np.random.randn(len(T),len(T))
w = (w*w.T)/2
np.fill_diagonal(w,1)
W = pd.DataFrame(w, columns = T, index = T)
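Because (w * w.T) is an element-wise product, W(ti,tj) equals W(tj,ti), which is what allows each cross term to be written once with a factor 2. A quick sketch checking this, using a seeded generator as a stand-in for np.random.randn:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)           # seeded stand-in for np.random.randn
T = ['t' + str(i) for i in range(3)]
w = rng.standard_normal((len(T), len(T)))
w = (w * w.T) / 2                        # element-wise: w[i, j] * w[j, i] is symmetric in i, j
np.fill_diagonal(w, 1)                   # unit weights on the diagonal
W = pd.DataFrame(w, columns=T, index=T)
print(W)
```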

W
          t0        t1        t2
t0  1.000000  0.029174 -0.045754
t1  0.029174  1.000000  0.233330
t2 -0.045754  0.233330  1.000000


In effect, I would like to use the Tn level of df1's index to apply the above equation to every (I, Q) pair.



The end result for df1.loc['i0','q0'] in the example above should be:



  W(t0,t0) * V(t0)^2 
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t2) * V(t0) * V(t2)
=
1.0 * 1.626799**2
+ 1.0 * 1.725374**2
+ 2 * (-0.045754) * 1.626799 * 1.725374
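As a check of the arithmetic, here is a small sketch that plugs in the printed (rounded) values from the question. It computes the term-by-term expansion, including the factor 2 on the cross term, and compares it with a quadratic form on the sub-matrix of W. That sub-matrix is selected with W.loc using only the Tn labels present for i0/q0:

```python
import numpy as np
import pandas as pd

# W and the i0/q0 values as printed in the question (rounded to 6 decimals)
T = ['t0', 't1', 't2']
W = pd.DataFrame([[1.000000, 0.029174, -0.045754],
                  [0.029174, 1.000000, 0.233330],
                  [-0.045754, 0.233330, 1.000000]], index=T, columns=T)
v = pd.Series({'t0': 1.626799, 't2': 1.725374})  # only t0 and t2 are present

# Expanded form, written out term by term (note the factor 2 on the cross term)
expanded = (W.loc['t0', 't0'] * v['t0'] ** 2
            + W.loc['t2', 't2'] * v['t2'] ** 2
            + 2 * W.loc['t0', 't2'] * v['t0'] * v['t2'])

# Same value as a quadratic form on the sub-matrix of W for the present terms
sub = W.loc[v.index, v.index].to_numpy()
quadratic = v.to_numpy() @ sub @ v.to_numpy()
print(expanded, quadratic)
```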


The end result for df1.loc['i0','q1'] in the example above should be:



  W(t0,t0) * V(t0)^2 
+ W(t1,t1) * V(t1)^2
+ W(t2,t2) * V(t2)^2
+ 2 * W(t0,t1) * V(t0) * V(t1)
+ 2 * W(t0,t2) * V(t0) * V(t2)
+ 2 * W(t2,t1) * V(t1) * V(t2)
=
1.0 * 2.155340**2
+ 1.0 * 0.479741**2
+ 1.0 * 1.039178**2
+ 2 * 0.029174 * 2.155340 * 0.479741
+ 2 * (-0.045754) * 2.155340 * 1.039178
+ 2 * 0.233330 * 0.479741 * 1.039178


The number of terms depends on how many Tn values each (I, Q) pair has, so the solution should handle as many Tn terms as needed (the example uses 3, but there could be 100 or more).
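Because the pattern is a quadratic form, it does not need to be written out separately for each number of terms. A sketch on an arbitrary 100-term example (random data, not from the question) confirming that the explicit expansion equals v @ W @ v:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_terms = 100              # arbitrary size, standing in for "100 or more" Tn terms
w = rng.standard_normal((n_terms, n_terms))
W = (w * w.T) / 2          # symmetric, built the same way as in the set-up
np.fill_diagonal(W, 1)
v = rng.random(n_terms)    # one (I, Q) group's values, one per Tn

# Explicit expansion: squared terms plus twice each distinct cross term
diag = sum(W[i, i] * v[i] ** 2 for i in range(n_terms))
cross = sum(2 * W[i, j] * v[i] * v[j]
            for i, j in itertools.combinations(range(n_terms), 2))
explicit = diag + cross

# The same number as a single quadratic form, v^T W v
quadratic = v @ W @ v
```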



Each result should then be saved in a new DataFrame with index [I, Q].
The solution should also not be slower than Excel as n grows.



Thanks in advance






























  • 2





    Your equation implies the value 'w' is the same for all three terms, but it is not. Maybe you should rename them and describe how they relate to, or are derived from, the df1 indices. Make it easier for your readers.

    – wwii
    Nov 22 '18 at 21:52








  • 2





    df1.loc['i0','q0'] has three Tn's. How does it work?

    – wwii
    Nov 22 '18 at 22:03






  • 1





    Is W not supposed to be symmetric? If not, how do I know which factor to use between W.loc['t3','t4'] and W.loc['t4','t3'] in the example you give? You use the first one, but why?

    – Ben.T
    Nov 22 '18 at 22:09






  • 1





    I have changed the question to correspond with the comments

    – RealRageDontQuit
    Nov 22 '18 at 22:42
















python numpy dataframe multi-index quadratic














edited Nov 25 '18 at 8:36







RealRageDontQuit

















asked Nov 22 '18 at 21:33





















1 Answer














One way is to first reindex df1 with all possible combinations of the lists I, Q and T using pd.MultiIndex.from_product, filling the missing values in column 'V' with 0. The column then has len(I)*len(Q)*len(T) elements, and you can reshape the values so that each row corresponds to one (I, Q) combination:



ar = (df1.reindex(pd.MultiIndex.from_product([I, Q, T], names=['I', 'Q', 'Tn']), fill_value=0)
         .values.reshape(-1, len(T)))


To show the relation between my input df1 (a different random draw from the one in the question) and ar, here are some corresponding rows:



print (df1.head(6))
                V
I  Q  Tn
i0 q0 t1  1.123666
   q1 t0  0.538610
      t1  2.943206
   q2 t0  0.570990
      t1  0.617524
      t2  1.413926
print (ar[:3])
[[0.         1.1236656  0.        ]
 [0.53861027 2.94320574 0.        ]
 [0.57099049 0.61752408 1.4139263 ]]
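To make the reindex/reshape step concrete, here is a sketch on a tiny hand-made df1 (hypothetical data). The missing (I, Q, Tn) combinations become 0, and each row of ar holds one (I, Q) pair with its columns in the order of T. .to_numpy() is used in place of .values; both return the underlying array:

```python
import numpy as np
import pandas as pd

I, Q, T = ['i0', 'i1'], ['q0'], ['t0', 't1', 't2']
# A tiny df1 in which some (I, Q, Tn) combinations are missing
df1 = pd.DataFrame({'I': ['i0', 'i0', 'i1'],
                    'Q': ['q0', 'q0', 'q0'],
                    'Tn': ['t0', 't2', 't1'],
                    'V': [1.5, 2.0, 3.0]}).set_index(['I', 'Q', 'Tn'])

# Every combination, in the order of I, Q and T; absent ones are filled with 0
full = pd.MultiIndex.from_product([I, Q, T], names=['I', 'Q', 'Tn'])
ar = df1.reindex(full, fill_value=0).to_numpy().reshape(-1, len(T))
print(ar)  # one row per (I, Q) pair, one column per Tn
```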


Now, to multiply by the elements of W, one way is to take the row-wise outer product of ar with itself, which gives a len(T) x len(T) matrix for each row. For example, the second row:



[0.53861027 2.94320574 0.        ]


becomes



[[0.29010102, 1.58524083, 0.        ],  # 0.29010102 = 0.53861027**2, 1.58524083 = 0.53861027*2.94320574 ...
 [1.58524083, 8.66246003, 0.        ],
 [0.        , 0.        , 0.        ]]


Several methods are possible, such as broadcasting with ar[:,:,None]*ar[:,None,:] or np.einsum with the right subscripts: np.einsum('ij,ik->ijk',ar,ar). Both give the same result.
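A short sketch on random stand-in data, confirming that both expressions build the same stack of per-row outer products, each equal to np.outer of that row with itself:

```python
import numpy as np

rng = np.random.default_rng(1)
ar = rng.random((4, 3))  # stand-in for the reshaped values: 4 (I, Q) rows, 3 Tn columns

outer_broadcast = ar[:, :, None] * ar[:, None, :]   # shape (4, 3, 3)
outer_einsum = np.einsum('ij,ik->ijk', ar, ar)      # same shape and values
```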



The next step can be done with np.tensordot by specifying the right axes. With ar and W as inputs:



print (np.tensordot(np.einsum('ij,ik->ijk',ar,ar),W.values,axes=([1,2],[0,1])))
array([ 1.26262437, 15.29352438, 15.94605435, ...
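The 3-D intermediate has len(T)**2 entries per (I, Q) row, which could become large with 100+ Tn terms. Below is a sketch of two equivalent contractions that skip that intermediate. They are alternatives, not part of the original answer, and are checked here against the tensordot version on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
ar = rng.random((20, 5))                 # 20 (I, Q) rows, 5 Tn terms (random stand-in)
w = rng.standard_normal((5, 5))
W = (w * w.T) / 2
np.fill_diagonal(W, 1)

# Answer's approach: build the (rows, T, T) outer products, then contract with W
via_tensordot = np.tensordot(np.einsum('ij,ik->ijk', ar, ar), W, axes=([1, 2], [0, 1]))

# Same contraction in a single einsum, without materialising the 3-D array
via_einsum = np.einsum('ij,jk,ik->i', ar, W, ar)

# Or with a matrix product followed by a row-wise sum
via_matmul = ((ar @ W) * ar).sum(axis=1)
```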


To check the second value: 1*0.29010102 + 1*8.66246003 + 2*2*1.58524083 == 15.29352438, where 1 is W(t0,t0) = W(t1,t1) and the second 2 is W(t0,t1) in the W used here; the first 2 comes from the pair W(t0,t1) and W(t1,t0) of the symmetric W.
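The same hand check as a few lines of Python, using the printed (rounded) values. Here w01 = 2.0 is the W(t0, t1) that the check implies for the W used in the answer, not the question's W:

```python
import numpy as np

x0, x1 = 0.53861027, 2.94320574     # second row of ar (t0 and t1; t2 is 0)
w00 = w11 = 1.0                      # diagonal of W
w01 = 2.0                            # W(t0, t1) implied by the check above
total = w00 * x0 ** 2 + w11 * x1 ** 2 + 2 * w01 * x0 * x1
print(total)
```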



Finally, to build the expected DataFrame, use pd.MultiIndex.from_product again:



new_df = pd.DataFrame({'col1': np.tensordot(np.einsum('ij,ik->ijk', ar, ar),
                                            W.values, axes=([1, 2], [0, 1]))},
                      index=pd.MultiIndex.from_product([I, Q], names=['I', 'Q']))

print (new_df.head(3))
            col1
I  Q
i0 q0   1.262624
   q1  15.293524
   q2  15.946054
...
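A self-contained sketch of the whole pipeline on seeded random data (so its numbers differ from the printouts), cross-checked against a brute-force double loop. brute_force is a hypothetical helper written only for this check:

```python
import numpy as np
import pandas as pd

# Rebuild the question's set-up with a seeded generator
rng = np.random.default_rng(7)
I = ['i' + str(i) for i in range(4)]
Q = ['q' + str(i) for i in range(5)]
T = ['t' + str(i) for i in range(3)]
n = 100
df1 = (pd.DataFrame({'I': rng.choice(I, n), 'Q': rng.choice(Q, n),
                     'Tn': rng.choice(T, n), 'V': rng.random(n)})
       .groupby(['I', 'Q', 'Tn']).sum())
w = rng.standard_normal((len(T), len(T)))
w = (w * w.T) / 2
np.fill_diagonal(w, 1)
W = pd.DataFrame(w, columns=T, index=T)

# The answer's pipeline: dense (len(I)*len(Q), len(T)) array, then one quadratic form per row
full = pd.MultiIndex.from_product([I, Q, T], names=['I', 'Q', 'Tn'])
ar = df1.reindex(full, fill_value=0).to_numpy().reshape(-1, len(T))
new_df = pd.DataFrame({'col1': np.tensordot(np.einsum('ij,ik->ijk', ar, ar),
                                            W.to_numpy(), axes=([1, 2], [0, 1]))},
                      index=pd.MultiIndex.from_product([I, Q], names=['I', 'Q']))

# Brute-force reference: double loop over the Tn labels present for (i, q)
present_pairs = set(df1.index.droplevel('Tn'))

def brute_force(i, q):
    if (i, q) not in present_pairs:
        return 0.0  # pair never occurs in df1, so every V is treated as 0
    v = df1.xs((i, q), level=['I', 'Q'])['V']
    return sum(W.loc[a, b] * v[a] * v[b] for a in v.index for b in v.index)

reference = np.array([brute_force(i, q) for i, q in new_df.index])
```

(I, Q) pairs that never occur in df1 get a row of zeros in ar, and therefore a result of 0, rather than being dropped.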


Note: if you are sure that each element of T appears at least once in the last level of df1 (and that every (I, Q) pair is present), ar can be obtained with unstack: ar = df1.unstack(fill_value=0).values. However, I would suggest using the reindex method above to prevent errors.
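One more reason to prefer reindex, sketched with hypothetical labels. groupby sorts the Tn labels as strings, so with 10 or more terms unstack returns the columns as 't0', 't1', 't10', and so on. That order no longer lines up with W, which is built in the order of T. reindex with from_product, by contrast, preserves the order of T:

```python
import numpy as np
import pandas as pd

T = ['t' + str(i) for i in range(12)]           # 12 terms, so 't10' and 't11' exist
df1 = pd.DataFrame({'I': 'i0', 'Q': 'q0', 'Tn': T,
                    'V': np.arange(len(T), dtype=float)}
                   ).groupby(['I', 'Q', 'Tn']).sum()

# unstack takes its column order from the sorted index level: 't0', 't1', 't10', ...
unstacked = df1['V'].unstack(fill_value=0)

# reindex + reshape keeps the order of T, which is also the order of W's rows/columns
full = pd.MultiIndex.from_product([['i0'], ['q0'], T], names=['I', 'Q', 'Tn'])
ar = df1.reindex(full, fill_value=0).to_numpy().reshape(-1, len(T))
```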






























  • This seems to work. However, I found an edge case in my problem which would make this answer not correct. Otherwise you have taught me something new! Thank you

    – RealRageDontQuit
    Nov 24 '18 at 11:16













  • @RealRageDontQuit what you call an edge case is actually a different problem. You change the dataframe structure by adding an index level, change the formula by multiplying with another matrix s, and sum over this new index (at least from what I understood). I think my answer can easily be adapted to this problem, but a general method would be more complicated.

    – Ben.T
    Nov 24 '18 at 12:14






  • 1





    You are correct that this answer solves the initial problem. I have changed the question to reflect the initial problem, ticked your answer as accepted (thanks), and will create a new question for this edge case.

    – RealRageDontQuit
    Nov 25 '18 at 8:37





















answered Nov 23 '18 at 19:40









Ben.T
