What is the difference between Model.train_on_batch from Keras and Session.run([train_optimizer]) from TensorFlow?
In the following Keras and TensorFlow implementations of training a neural network, how is model.train_on_batch([x], [y])
in the Keras implementation different from sess.run([train_optimizer, cross_entropy, accuracy_op], feed_dict=feed_dict)
in the TensorFlow implementation? In particular, how can those two lines lead to different computation during training?
keras_version.py
# Assumed imports for this snippet (standalone Keras 2.x API).
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam

input_x = Input(shape=input_shape, name="x")
c = Dense(num_classes, activation="softmax")(input_x)  # softmax output: probabilities

model = Model([input_x], [c])
opt = Adam(lr)
# metrics=['accuracy'] is needed so that train_on_batch returns (loss, accuracy);
# without it, train_on_batch returns a single scalar and the unpacking below fails.
model.compile(loss=['categorical_crossentropy'], optimizer=opt, metrics=['accuracy'])

nb_batchs = int(len(x_train) / batch_size)
for epoch in range(epochs):
    loss = 0.0
    for batch in range(nb_batchs):
        x = x_train[batch * batch_size:(batch + 1) * batch_size]
        y = y_train[batch * batch_size:(batch + 1) * batch_size]
        loss_batch, acc_batch = model.train_on_batch([x], [y])
        loss += loss_batch
    print(epoch, loss / nb_batchs)
tensorflow_version.py
# Assumed imports for this snippet (TensorFlow 1.x, with Keras layers used to build the graph).
import tensorflow as tf
from keras.layers import Input, Dense

input_x = Input(shape=input_shape, name="x")
c = Dense(num_classes)(input_x)  # no activation: raw logits go into the loss below

input_y = tf.placeholder(tf.float32, shape=[None, num_classes], name="label")

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=input_y, logits=c, name="xentropy"),
    name="xentropy_mean"
)
train_optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(cross_entropy)

nb_batchs = int(len(x_train) / batch_size)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        loss = 0.0
        acc = 0.0  # accumulator kept from the original; unused because no accuracy op is fetched below
        for batch in range(nb_batchs):
            x = x_train[batch * batch_size:(batch + 1) * batch_size]
            y = y_train[batch * batch_size:(batch + 1) * batch_size]
            feed_dict = {input_x: x, input_y: y}
            _, loss_batch = sess.run([train_optimizer, cross_entropy], feed_dict=feed_dict)
            loss += loss_batch
        print(epoch, loss / nb_batchs)
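Note that the sess.run call quoted in the question also lists an accuracy_op, which the snippet above never defines. A minimal sketch of such an op (an assumption for illustration, not part of the original code) could be:

# Hypothetical accuracy op matching the fetch list mentioned in the question text.
correct_prediction = tf.equal(tf.argmax(c, axis=1), tf.argmax(input_y, axis=1))
accuracy_op = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")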
Note: This question follows Same (?) model converges in Keras but not in Tensorflow, which was considered too broad, but in which I show exactly why I think those two statements are somehow different and lead to different computation.
python tensorflow machine-learning keras
asked Nov 20 at 15:19 by LucG (edited Nov 24 at 9:10)
1 Answer
Yes, the results can be different. The results shouldn't be surprising if you know the following things in advance:
- The implementation of cross-entropy in TensorFlow and Keras is different. TensorFlow assumes the input to tf.nn.softmax_cross_entropy_with_logits_v2 to be the raw unnormalized logits, while Keras accepts its input as probabilities (see the sketch after this answer).
- The implementations of the optimizers in Keras and TensorFlow are different.
- It might be that you are shuffling the data, so the batches aren't passed in the same order. This doesn't matter if you train for long, but the first few epochs can be entirely different. Make sure the same batches are passed to both versions and then compare the results.
answered Nov 24 at 11:17 by mlRocks
Can you elaborate on how the implementations of the optimizers are different? I have tried to compute and apply the gradients myself in the tensorflow version, which has not brought better results; I was still using the optimizer class, though. Items 1 and 3 are not satisfying answers in that case, because (1) I feed the tf optimizer with the output of a softmax operation, which I don't do with the keras one, and (3) the tf model never converges when the keras one always does.
– LucG
Nov 24 at 14:03
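For reference, a minimal sketch (TensorFlow 1.x, reusing cross_entropy and lr from the question's tensorflow_version.py) of computing and applying the gradients by hand instead of calling minimize, as mentioned in the comment above:

optimizer = tf.train.AdamOptimizer(learning_rate=lr)
# minimize(cross_entropy) is shorthand for the two explicit steps below.
grads_and_vars = optimizer.compute_gradients(cross_entropy)   # list of (gradient, variable) pairs
train_optimizer = optimizer.apply_gradients(grads_and_vars)   # op that applies one Adam update
# The gradients themselves can also be inspected by fetching them in sess.run:
# sess.run([g for g, v in grads_and_vars if g is not None], feed_dict=feed_dict)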
Compare the source code for it. Plus, 1) and 3) are totally relevant; I don't know what made you think they are irrelevant.
– mlRocks
Nov 24 at 14:36
Yes, they are relevant. I meant they do not apply in my specific case, because (1) I feed the logits to the tf loss computation while I feed probabilities to the keras loss, and (3) that would account for a single run, but in my case the keras code always converges while the tf one never does. Thank you for the "compare the source code" advice; the whole question is about comparing source code, that's the point: I am not yet capable of understanding the differences.
– LucG
Nov 25 at 6:21
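One way to narrow the comparison down, sketched here under the assumption that the layers from the question are reused: build the TensorFlow version with the same softmax output and Keras's own loss function, so that the only remaining difference between the two scripts is the optimizer implementation.

import tensorflow as tf
from keras.layers import Input, Dense
from keras.losses import categorical_crossentropy

input_x = Input(shape=input_shape, name="x")
c_prob = Dense(num_classes, activation="softmax")(input_x)   # probabilities, as in keras_version.py
input_y = tf.placeholder(tf.float32, shape=[None, num_classes], name="label")

# Use Keras's loss on probabilities instead of TF's logits-based loss.
cross_entropy = tf.reduce_mean(categorical_crossentropy(input_y, c_prob), name="xentropy_mean")
train_optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(cross_entropy)
# The training loop from tensorflow_version.py can then be reused unchanged.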