What is the difference between Model.train_on_batch from Keras and Session.run([train_optimizer]) from Tensorflow?

In the following Keras and TensorFlow implementations of the training of a neural network, how is model.train_on_batch([x], [y]) in the Keras implementation different from sess.run([train_optimizer, cross_entropy], feed_dict=feed_dict) in the TensorFlow implementation? In particular, how can those two lines lead to different computations during training?



keras_version.py



    # input_shape, num_classes, lr, batch_size, epochs, x_train and y_train
    # are assumed to be defined earlier in the script
    from keras.layers import Input, Dense
    from keras.models import Model
    from keras.optimizers import Adam

    input_x = Input(shape=input_shape, name="x")
    c = Dense(num_classes, activation="softmax")(input_x)

    model = Model([input_x], [c])
    opt = Adam(lr)
    # metrics=['accuracy'] is needed so that train_on_batch returns
    # [loss, accuracy] and the unpacking below works
    model.compile(loss=['categorical_crossentropy'], optimizer=opt,
                  metrics=['accuracy'])

    nb_batchs = int(len(x_train) / batch_size)

    for epoch in range(epochs):
        loss = 0.0
        for batch in range(nb_batchs):
            x = x_train[batch*batch_size:(batch+1)*batch_size]
            y = y_train[batch*batch_size:(batch+1)*batch_size]

            loss_batch, acc_batch = model.train_on_batch([x], [y])

            loss += loss_batch
        print(epoch, loss / nb_batchs)


tensorflow_version.py



    # the same hyperparameters and data are assumed to be defined as above
    import tensorflow as tf
    from keras.layers import Input, Dense

    input_x = Input(shape=input_shape, name="x")
    c = Dense(num_classes)(input_x)  # no softmax here: c holds raw logits

    input_y = tf.placeholder(tf.float32, shape=[None, num_classes], name="label")
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=input_y, logits=c,
                                                   name="xentropy"),
        name="xentropy_mean"
    )
    train_optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(cross_entropy)

    nb_batchs = int(len(x_train) / batch_size)

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(epochs):
            loss = 0.0
            acc = 0.0

            for batch in range(nb_batchs):
                x = x_train[batch*batch_size:(batch+1)*batch_size]
                y = y_train[batch*batch_size:(batch+1)*batch_size]

                feed_dict = {input_x: x,
                             input_y: y}
                _, loss_batch = sess.run([train_optimizer, cross_entropy],
                                         feed_dict=feed_dict)

                loss += loss_batch
            print(epoch, loss / nb_batchs)


Note: This question follows Same (?) model converges in Keras but not in Tensorflow, which was considered too broad, but in which I show exactly why I think those two statements are somehow different and lead to different computations.

python tensorflow machine-learning keras

asked Nov 20 at 15:19 by LucG, edited Nov 24 at 9:10


1 Answer

Yes, the results can be different. The results shouldn't be surprising if you know the following things in advance:

1. The implementation of cross-entropy in TensorFlow and Keras is different. TensorFlow expects the input to tf.nn.softmax_cross_entropy_with_logits_v2 to be raw, unnormalized logits, while Keras' categorical_crossentropy accepts inputs as probabilities. (See the sketch after this list.)

2. The implementations of the optimizers in Keras and TensorFlow are different.

3. It might be the case that you are shuffling the data, so the batches aren't passed in the same order. It doesn't matter if you run the model for a long time, but the initial few epochs can be entirely different. Make sure the same batch is passed to both, and then compare the results.
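
For point 1, a minimal sketch (an editorial addition, not part of the original answer; it assumes the same TF 1.x API the question uses) showing that the two loss formulations agree when each receives what it expects, and disagree when softmax outputs are fed where logits are expected:

    import tensorflow as tf

    logits = tf.constant([[2.0, 1.0, 0.1]])
    labels = tf.constant([[1.0, 0.0, 0.0]])
    probs = tf.nn.softmax(logits)

    # what the TensorFlow version computes: softmax is applied internally
    loss_from_logits = tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=labels, logits=logits)
    # what Keras' categorical_crossentropy computes from probabilities
    # (up to the epsilon clipping Keras adds for numerical stability)
    loss_from_probs = -tf.reduce_sum(labels * tf.log(probs), axis=-1)
    # the mismatch: feeding probabilities where logits are expected
    # applies softmax twice and computes a different quantity
    loss_double_softmax = tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=labels, logits=probs)

    with tf.Session() as sess:
        a, b, c = sess.run([loss_from_logits, loss_from_probs, loss_double_softmax])
        print(a, b, c)  # a and b agree (~0.417); c does not (~0.802)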

answered Nov 24 at 11:17 by mlRocks

• Can you elaborate on how the implementations of the optimizers are different? I have tried to compute and apply the gradients myself in the TensorFlow version, which has not brought better results; I was still using the optimizer class, though. Items 1 and 3 are not satisfying answers in that case, because (1) I feed the TF optimizer with the output of a softmax operation, which I don't with the Keras one, and (3) the TF model never converges, whereas the Keras one always does.
  – LucG, Nov 24 at 14:03

• Compare the source code for them. Plus, 1) and 3) are totally relevant. I don't know what made you think they are irrelevant.
  – mlRocks, Nov 24 at 14:36

• Yes, they are relevant. I meant they are not in my specific case, because (1) I feed the logits into the TF loss computation while I feed probabilities into the Keras loss, and (3) that holds for one run, but in my case the Keras code always converges while the TF one never does. Thank you for the "compare the source code" advice, though; the whole question is about comparing source code. That's the point: I am not capable enough to understand the differences yet.
  – LucG, Nov 25 at 6:21
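
Regarding point 3 and the convergence discussion in these comments, here is a self-contained sketch (an editorial addition, not from the original thread; the shapes, batch, and learning rate are made up for illustration, TF 1.x / Keras 2 era) that removes initialization and data order from the comparison by giving both versions identical initial weights and one fixed batch. The per-batch losses should then match before the first update, so any later divergence points at the loss or optimizer implementations:

    import numpy as np
    import tensorflow as tf
    from keras.layers import Input, Dense
    from keras.models import Model
    from keras.optimizers import Adam

    np.random.seed(0)
    x = np.random.rand(8, 4).astype("float32")                   # one fixed batch
    y = np.eye(3)[np.random.randint(0, 3, 8)].astype("float32")  # one-hot labels

    # Keras version: softmax inside the model, probability-based loss
    inp = Input(shape=(4,), name="x")
    out = Dense(3, activation="softmax")(inp)
    model = Model([inp], [out])
    model.compile(loss=['categorical_crossentropy'], optimizer=Adam(1e-3))

    # TensorFlow version: reuse the *same* initial weights, logits-based loss
    w0, b0 = model.get_weights()
    tx = tf.placeholder(tf.float32, shape=[None, 4])
    ty = tf.placeholder(tf.float32, shape=[None, 3])
    W = tf.Variable(w0)
    b = tf.Variable(b0)
    logits = tf.matmul(tx, W) + b
    loss_op = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=ty, logits=logits))
    train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss_op)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        keras_loss = model.train_on_batch([x], [y])
        _, tf_loss = sess.run([train_op, loss_op], feed_dict={tx: x, ty: y})
        print(keras_loss, tf_loss)  # should agree up to Keras' epsilon clipping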