Diagonal of the Hessian with Tensorflow

I'm doing some machine learning and I have to deal with a custom loss function. The derivatives and the Hessian of the loss function are difficult to derive by hand, so I've resorted to computing them automatically using TensorFlow.



Here is an example.



import numpy as np
import tensorflow as tf

y_true = np.array([
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1]
], dtype=float)

y_pred = np.array([
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1]
], dtype=float)

weights = np.array([1, 1, 1, 1, 1], dtype=float)

with tf.Session():

    # We first convert the numpy arrays to TensorFlow tensors
    y_true = tf.convert_to_tensor(y_true)
    y_pred = tf.convert_to_tensor(y_pred)
    weights = tf.convert_to_tensor(weights)

    # The following code block is a custom loss
    ys = tf.reduce_sum(y_true, axis=0)
    y_true = y_true / ys
    ln_p = tf.nn.log_softmax(y_pred)
    wll = tf.reduce_sum(y_true * ln_p, axis=0)
    loss = -tf.tensordot(weights, wll, axes=1)

    grad = tf.gradients(loss, y_pred)[0]

    hess = tf.hessians(loss, y_pred)[0]
    hess = tf.diag_part(hess)

    print(hess.eval())


which prints out



[[0.24090069 0.12669198 0.12669198 0.12669198 0.12669198]
[0.12669198 0.24090069 0.12669198 0.12669198 0.12669198]
[0.12669198 0.12669198 0.12669198 0.24090069 0.12669198]
[0.12669198 0.12669198 0.24090069 0.12669198 0.12669198]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]]


I'm happy with this because it works; the problem is that it doesn't scale. For my use case I only need the diagonal of the Hessian matrix. I've managed to extract it using hess = tf.diag_part(hess), but this still computes the full Hessian, which is unnecessary. The overhead is so bad that I can't use it for moderately sized datasets (~100k rows).
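To make the goal concrete, here is a rough sketch of the kind of cheaper computation I have in mind (untested at scale; it reuses grad and y_pred from the snippet above, inside the same session): build the diagonal one class column at a time from the gradient, so that only one tf.gradients call per class is needed rather than a full Hessian. This should be exact here because log_softmax acts row-wise, which makes the Hessian block-diagonal across rows, but I may well be missing something.

# Sketch: one tf.gradients call per class instead of the full Hessian.
# tf.gradients implicitly sums grad[:, j] over rows; because each row of
# y_pred only enters the loss through its own log_softmax row, the
# cross-row second derivatives vanish, so column j of the result is
# exactly column j of the Hessian diagonal.
n_classes = int(y_pred.shape[1])
hess_diag = tf.stack(
    [tf.gradients(grad[:, j], y_pred)[0][:, j] for j in range(n_classes)],
    axis=1
)
print(hess_diag.eval())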



My question is thus: is there a better way to extract the diagonal of the Hessian? I'm well aware of this post and this one, but I don't find the answers good enough.
python tensorflow hessian

asked Nov 19 at 15:29 by Max Halford, edited Nov 19 at 15:44

  • Possible duplicate of Tensorflow: Compute Hessian matrix (only diagonal part) with respect to a high rank tensor
    – b-fg
    Nov 19 at 15:33










  • I'm aware of the link you provided, but it's really unclear, plus the TensorFlow API has changed since then.
    – Max Halford
    Nov 20 at 11:01










  • A possible duplicate is not inherently bad. It just means that the question is closely tied to another one, and this way we provide a permanent record of the link between both questions.
    – b-fg
    Nov 20 at 11:10