What is the difference between Model.train_on_batch from Keras and Session.run([train_optimizer]) from Tensorflow?

In the following Keras and TensorFlow implementations of the training of a neural network, how is model.train_on_batch([x], [y]) in the Keras implementation different from sess.run([train_optimizer, cross_entropy], feed_dict=feed_dict) in the TensorFlow implementation? In particular, how can those two lines lead to different computations during training?



keras_version.py



    # input_shape, num_classes, lr, batch_size, epochs, x_train and y_train
    # are assumed to be defined earlier in the script
    from keras.layers import Input, Dense
    from keras.models import Model
    from keras.optimizers import Adam

    input_x = Input(shape=input_shape, name="x")
    c = Dense(num_classes, activation="softmax")(input_x)

    model = Model([input_x], [c])
    opt = Adam(lr)
    # metrics=['accuracy'] is needed so that train_on_batch returns
    # [loss, accuracy] and the unpacking below works
    model.compile(loss=['categorical_crossentropy'], optimizer=opt,
                  metrics=['accuracy'])

    nb_batchs = int(len(x_train) / batch_size)

    for epoch in range(epochs):
        loss = 0.0
        for batch in range(nb_batchs):
            x = x_train[batch*batch_size:(batch+1)*batch_size]
            y = y_train[batch*batch_size:(batch+1)*batch_size]

            loss_batch, acc_batch = model.train_on_batch([x], [y])

            loss += loss_batch
        print(epoch, loss / nb_batchs)


tensorflow_version.py



    # the same hyperparameters and data are assumed to be defined as above
    import tensorflow as tf
    from keras.layers import Input, Dense

    input_x = Input(shape=input_shape, name="x")
    c = Dense(num_classes)(input_x)  # no softmax here: c holds raw logits

    input_y = tf.placeholder(tf.float32, shape=[None, num_classes], name="label")
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=input_y, logits=c,
                                                   name="xentropy"),
        name="xentropy_mean"
    )
    train_optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(cross_entropy)

    nb_batchs = int(len(x_train) / batch_size)

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(epochs):
            loss = 0.0
            acc = 0.0

            for batch in range(nb_batchs):
                x = x_train[batch*batch_size:(batch+1)*batch_size]
                y = y_train[batch*batch_size:(batch+1)*batch_size]

                feed_dict = {input_x: x,
                             input_y: y}
                _, loss_batch = sess.run([train_optimizer, cross_entropy],
                                         feed_dict=feed_dict)

                loss += loss_batch
            print(epoch, loss / nb_batchs)


Note: This question follows Same (?) model converges in Keras but not in Tensorflow, which was considered too broad, but in which I show exactly why I think those two statements are somehow different and lead to different computations.

python tensorflow machine-learning keras

asked Nov 20 at 15:19 by LucG, edited Nov 24 at 9:10


1 Answer

Yes, the results can be different. The results shouldn't be surprising if you know the following things in advance:

1. The implementation of cross-entropy in TensorFlow and Keras is different. TensorFlow expects the input to tf.nn.softmax_cross_entropy_with_logits_v2 to be raw, unnormalized logits, while Keras' categorical_crossentropy accepts inputs as probabilities. (See the sketch after this list.)

2. The implementations of the optimizers in Keras and TensorFlow are different.

3. It might be the case that you are shuffling the data, so the batches aren't passed in the same order. It doesn't matter if you run the model for a long time, but the initial few epochs can be entirely different. Make sure the same batch is passed to both, and then compare the results.
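
For point 1, a minimal sketch (an editorial addition, not part of the original answer; it assumes the same TF 1.x API the question uses) showing that the two loss formulations agree when each receives what it expects, and disagree when softmax outputs are fed where logits are expected:

    import tensorflow as tf

    logits = tf.constant([[2.0, 1.0, 0.1]])
    labels = tf.constant([[1.0, 0.0, 0.0]])
    probs = tf.nn.softmax(logits)

    # what the TensorFlow version computes: softmax is applied internally
    loss_from_logits = tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=labels, logits=logits)
    # what Keras' categorical_crossentropy computes from probabilities
    # (up to the epsilon clipping Keras adds for numerical stability)
    loss_from_probs = -tf.reduce_sum(labels * tf.log(probs), axis=-1)
    # the mismatch: feeding probabilities where logits are expected
    # applies softmax twice and computes a different quantity
    loss_double_softmax = tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=labels, logits=probs)

    with tf.Session() as sess:
        a, b, c = sess.run([loss_from_logits, loss_from_probs, loss_double_softmax])
        print(a, b, c)  # a and b agree (~0.417); c does not (~0.802)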

answered Nov 24 at 11:17 by mlRocks

• Can you elaborate on how the implementations of the optimizers are different? I have tried to compute and apply the gradients myself in the TensorFlow version, which has not brought better results; I was still using the optimizer class, though. Items 1 and 3 are not satisfying answers in that case, because (1) I feed the TF optimizer with the output of a softmax operation, which I don't with the Keras one, and (3) the TF model never converges, whereas the Keras one always does.
  – LucG, Nov 24 at 14:03

• Compare the source code for them. Plus, 1) and 3) are totally relevant. I don't know what made you think they are irrelevant.
  – mlRocks, Nov 24 at 14:36

• Yes, they are relevant. I meant they are not in my specific case, because (1) I feed the logits into the TF loss computation while I feed probabilities into the Keras loss, and (3) that holds for one run, but in my case the Keras code always converges while the TF one never does. Thank you for the "compare the source code" advice, though; the whole question is about comparing source code. That's the point: I am not capable enough to understand the differences yet.
  – LucG, Nov 25 at 6:21
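
Regarding point 3 and the convergence discussion in these comments, here is a self-contained sketch (an editorial addition, not from the original thread; the shapes, batch, and learning rate are made up for illustration, TF 1.x / Keras 2 era) that removes initialization and data order from the comparison by giving both versions identical initial weights and one fixed batch. The per-batch losses should then match before the first update, so any later divergence points at the loss or optimizer implementations:

    import numpy as np
    import tensorflow as tf
    from keras.layers import Input, Dense
    from keras.models import Model
    from keras.optimizers import Adam

    np.random.seed(0)
    x = np.random.rand(8, 4).astype("float32")                   # one fixed batch
    y = np.eye(3)[np.random.randint(0, 3, 8)].astype("float32")  # one-hot labels

    # Keras version: softmax inside the model, probability-based loss
    inp = Input(shape=(4,), name="x")
    out = Dense(3, activation="softmax")(inp)
    model = Model([inp], [out])
    model.compile(loss=['categorical_crossentropy'], optimizer=Adam(1e-3))

    # TensorFlow version: reuse the *same* initial weights, logits-based loss
    w0, b0 = model.get_weights()
    tx = tf.placeholder(tf.float32, shape=[None, 4])
    ty = tf.placeholder(tf.float32, shape=[None, 3])
    W = tf.Variable(w0)
    b = tf.Variable(b0)
    logits = tf.matmul(tx, W) + b
    loss_op = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=ty, logits=logits))
    train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss_op)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        keras_loss = model.train_on_batch([x], [y])
        _, tf_loss = sess.run([train_op, loss_op], feed_dict={tx: x, ty: y})
        print(keras_loss, tf_loss)  # should agree up to Keras' epsilon clipping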