Diagonal of the Hessian with Tensorflow

I'm doing some machine learning and I have to deal with a custom loss function. The derivatives and the Hessian of the loss function are difficult to derive by hand, so I've resorted to computing them automatically using TensorFlow.



Here is an example.



import numpy as np
import tensorflow as tf

y_true = np.array([
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1]
], dtype=float)

y_pred = np.array([
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1]
], dtype=float)

weights = np.array([1, 1, 1, 1, 1], dtype=float)

with tf.Session():

    # We first convert the numpy arrays to TensorFlow tensors
    y_true = tf.convert_to_tensor(y_true)
    y_pred = tf.convert_to_tensor(y_pred)
    weights = tf.convert_to_tensor(weights)

    # The following code block is a custom loss
    ys = tf.reduce_sum(y_true, axis=0)
    y_true = y_true / ys
    ln_p = tf.nn.log_softmax(y_pred)
    wll = tf.reduce_sum(y_true * ln_p, axis=0)
    loss = -tf.tensordot(weights, wll, axes=1)

    grad = tf.gradients(loss, y_pred)[0]

    hess = tf.hessians(loss, y_pred)[0]
    hess = tf.diag_part(hess)

    print(hess.eval())


which prints out



[[0.24090069 0.12669198 0.12669198 0.12669198 0.12669198]
[0.12669198 0.24090069 0.12669198 0.12669198 0.12669198]
[0.12669198 0.12669198 0.12669198 0.24090069 0.12669198]
[0.12669198 0.12669198 0.24090069 0.12669198 0.12669198]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]]


I'm happy with this because it works; the problem is that it doesn't scale. For my use case I only need the diagonal of the Hessian matrix. I've managed to extract it using hess = tf.diag_part(hess), but this still computes the full Hessian, which is unnecessary. The overhead is so bad that I can't use it for moderately sized datasets (~100k rows).
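To make the goal concrete, here is a rough sketch of the kind of cheaper computation I have in mind (untested at scale; it reuses grad and y_pred from the snippet above, inside the same session): build the diagonal one class column at a time from the gradient, so that only one tf.gradients call per class is needed rather than a full Hessian. This should be exact here because log_softmax acts row-wise, which makes the Hessian block-diagonal across rows, but I may well be missing something.

# Sketch: one tf.gradients call per class instead of the full Hessian.
# tf.gradients implicitly sums grad[:, j] over rows; because each row of
# y_pred only enters the loss through its own log_softmax row, the
# cross-row second derivatives vanish, so column j of the result is
# exactly column j of the Hessian diagonal.
n_classes = int(y_pred.shape[1])
hess_diag = tf.stack(
    [tf.gradients(grad[:, j], y_pred)[0][:, j] for j in range(n_classes)],
    axis=1
)
print(hess_diag.eval())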



My question is thus: is there a better way to extract the diagonal of the Hessian? I'm well aware of this post and this one, but I don't find the answers good enough.
python tensorflow hessian

asked Nov 19 at 15:29 by Max Halford, edited Nov 19 at 15:44

  • Possible duplicate of Tensorflow: Compute Hessian matrix (only diagonal part) with respect to a high rank tensor
    – b-fg
    Nov 19 at 15:33










  • I'm aware of the link you provided, but it's really unclear, plus the TensorFlow API has changed since then.
    – Max Halford
    Nov 20 at 11:01










  • A possible duplicate is not inherently bad. It just means that the question is closely tied to another one, and this way we provide a permanent record of the link between both questions.
    – b-fg
    Nov 20 at 11:10