Diagonal of the Hessian with Tensorflow
I'm doing some machine learning and have to deal with a custom loss function. Its derivatives and Hessian are difficult to derive by hand, so I've resorted to computing them automatically with TensorFlow.
Here is an example.
import numpy as np
import tensorflow as tf

y_true = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1]
], dtype=float)

y_pred = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1]
], dtype=float)

weights = np.array([1, 1, 1, 1, 1], dtype=float)

with tf.Session():

    # We first convert the numpy arrays to TensorFlow tensors
    y_true = tf.convert_to_tensor(y_true)
    y_pred = tf.convert_to_tensor(y_pred)
    weights = tf.convert_to_tensor(weights)

    # The following block is the custom loss
    ys = tf.reduce_sum(y_true, axis=0)
    y_true = y_true / ys
    ln_p = tf.nn.log_softmax(y_pred)
    wll = tf.reduce_sum(y_true * ln_p, axis=0)
    loss = -tf.tensordot(weights, wll, axes=1)

    grad = tf.gradients(loss, y_pred)[0]
    hess = tf.hessians(loss, y_pred)[0]
    hess = tf.diag_part(hess)

    print(hess.eval())
which prints out
[[0.24090069 0.12669198 0.12669198 0.12669198 0.12669198]
[0.12669198 0.24090069 0.12669198 0.12669198 0.12669198]
[0.12669198 0.12669198 0.12669198 0.24090069 0.12669198]
[0.12669198 0.12669198 0.24090069 0.12669198 0.12669198]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]]
I'm happy with this because it works, but the problem is that it doesn't scale. For my use case I only need the diagonal of the Hessian matrix. I've managed to extract it using hess = tf.diag_part(hess), but this still computes the full Hessian, which is unnecessary. The overhead is so bad that I can't use it for moderately sized datasets (~100k rows).
My question is thus: is there a better way to extract the diagonal of the Hessian? I'm well aware of this post and this one, but I don't find the answers good enough.
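As an aside, for this exact loss (and only assuming it stays as written above) the Hessian diagonal has a closed form, so tf.hessians can be skipped entirely: with p = softmax(y_pred) taken row-wise and c_i = sum_j weights[j] * y_true_norm[i, j], the diagonal is c_i * p[i, j] * (1 - p[i, j]). A minimal NumPy sketch of this idea, not from the original question (the variable names are mine):

```python
import numpy as np

y_true = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
], dtype=float)
y_pred = y_true.copy()  # same values as in the example above
weights = np.ones(5)

# Normalise the labels column-wise, exactly as the loss does
y_norm = y_true / y_true.sum(axis=0)

# Row-wise softmax of the predictions (shifted for numerical stability)
e = np.exp(y_pred - y_pred.max(axis=1, keepdims=True))
p = e / e.sum(axis=1, keepdims=True)

# c_i = sum_j weights[j] * y_norm[i, j]; Hessian diagonal = c_i * p * (1 - p)
c = y_norm @ weights
hess_diag = c[:, None] * p * (1 - p)
print(hess_diag)  # matches the values printed by the TensorFlow snippet above
```

This is O(n) in the number of rows instead of materialising an (n*5) x (n*5) Hessian, but of course it only applies to this particular loss, not to the general problem of extracting a Hessian diagonal automatically.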
python tensorflow hessian
Possible duplicate of Tensorflow: Compute Hessian matrix (only diagonal part) with respect to a high rank tensor
– b-fg
Nov 19 at 15:33
I'm aware of the link you provided, but it's really unclear, plus the TensorFlow API has changed since then.
– Max Halford
Nov 20 at 11:01
Possible duplicate is not inherently bad. It just means that the question is closely tied to another one. This way we provide a permanent record of the link between both questions.
– b-fg
Nov 20 at 11:10
edited Nov 19 at 15:44 · asked Nov 19 at 15:29 · Max Halford (62)