Understanding Keras LSTM NN input & output for binary classification



























I am trying to create a simple LSTM network that produces an output based on the last 16 time frames. Let's say I have a dataset with 112000 rows (measurements) and 7 columns (6 features + class). My understanding is that I have to "pack" the dataset into sequences of 16 rows each. With 112000 rows that gives 112000 / 16 = 7000 sequences, i.e. a 3D NumPy array of shape (7000, 16, 7). After separating the class column from the features and splitting this array into train and test data, I get these shapes:



xtrain.shape == (5000, 16, 6)
ytrain.shape == (5000, 16)
xtest.shape == (2000, 16, 6)
ytest.shape == (2000, 16)
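The packing described above can be sketched in NumPy as follows. This is a minimal sketch, assuming the rows are already in time order with the class in the last column; `raw` is a hypothetical stand-in filled with random values, used only to show the shapes. It makes visible where the (5000, 16) target comes from: after this packing the class column still holds one label per time step.

```python
import numpy as np

# Hypothetical stand-in for the real dataset: 112000 rows,
# 6 feature columns followed by 1 binary class column.
rng = np.random.default_rng(0)
raw = rng.random((112000, 7))
raw[:, 6] = rng.integers(0, 2, size=112000)

timesteps = 16
# Non-overlapping packing: rows 0-15 form sequence 0, rows 16-31 sequence 1, ...
packed = raw.reshape(-1, timesteps, raw.shape[1])  # (7000, 16, 7)

x = packed[:, :, :6]  # features: (7000, 16, 6)
y = packed[:, :, 6]   # labels:   (7000, 16), one per time step, not per sequence

xtrain, xtest = x[:5000], x[5000:]
ytrain, ytest = y[:5000], y[5000:]
print(xtrain.shape, ytrain.shape, xtest.shape, ytest.shape)
# (5000, 16, 6) (5000, 16) (2000, 16, 6) (2000, 16)
```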


My model looks like this:



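from tensorflow import keras

model = keras.Sequential()
# Stateful LSTM: fixed batches of 16 sequences, each 16 time steps x 6 features
model.add(keras.layers.LSTM(8, input_shape=(16, 6), stateful=True, batch_size=16, name="input"))
model.add(keras.layers.Dense(5, activation="relu", name="hidden1"))
# A single sigmoid unit: one probability per sequence, so the output shape is (batch, 1)
model.add(keras.layers.Dense(1, activation="sigmoid", name="output"))
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(xtrain, ytrain, batch_size=16, epochs=10)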


However, when I try to fit the model, I get this error:




ValueError: Error when checking target: expected output to have shape (1,) but got array with shape (16,)




My guess is that the model expects a single output per sequence (so ytrain should have shape (5000,)) instead of 16 outputs, one for every time step in a sequence (shape (5000, 16)).



If that is the case, should I, instead of packing the data like this, create a separate 16-row sequence for every output? That would give:



xtrain.shape == (80000, 16, 6)
ytrain.shape == (80000,)
xtest.shape == (32000, 16, 6)
ytest.shape == (32000,)
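A sketch of this one-window-per-output layout, assuming the label of a window is the class of its last row. `make_windows` is a hypothetical helper written for illustration, not a Keras or NumPy function; the tiny arrays stand in for real data.

```python
import numpy as np

def make_windows(features, labels, window):
    """Hypothetical helper: build one overlapping window per label.

    Window i covers rows i .. i+window-1 and is labelled with the class of
    its last row (i+window-1). For n rows this yields n - window + 1 windows.
    """
    n = features.shape[0]
    starts = np.arange(n - window + 1)[:, None]  # (n-window+1, 1)
    offsets = np.arange(window)[None, :]         # (1, window)
    x = features[starts + offsets]               # (n-window+1, window, f), a copy
    y = labels[window - 1:]                      # (n-window+1,)
    return x, y

# Small demo: 20 rows, 6 features, window of 16 -> 5 windows.
feats = np.arange(20 * 6, dtype=float).reshape(20, 6)
labs = np.arange(20) % 2
xw, yw = make_windows(feats, labs, 16)
print(xw.shape, yw.shape)  # (5, 16, 6) (5,)
```

Applied to all 112000 rows with a window of 16 this gives 111985 windows, about 86 MB as float64 with 6 features, because each row is copied into up to 16 windows. If you split train and test by window index, windows on either side of the split share rows; leaving a gap of window - 1 windows between the two sets avoids that overlap.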









      python tensorflow keras lstm
















      asked Nov 22 '18 at 23:31









      SEnergy
























          1 Answer
































          You are close with the last part of your question. Since it's a binary classification problem, you should have one output per input sequence, so you need to get rid of the 16 in your ys and replace it with a 1.
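With a single sigmoid unit as the last layer, the model produces one value per sample, so Keras expects a target of shape (n,) or (n, 1). A minimal sketch of turning the question's (5000, 16) labels into one label per sequence, assuming (hypothetically) that the class of a sequence is the class of its last time step; the random labels stand in for real data.

```python
import numpy as np

# Hypothetical per-time-step labels with the question's shape (5000, 16).
rng = np.random.default_rng(1)
ytrain_steps = rng.integers(0, 2, size=(5000, 16))

# Assumption: a sequence's label is the class of its last time step.
ytrain_last = ytrain_steps[:, -1]   # shape (5000,)
ytrain_col = ytrain_steps[:, -1:]   # shape (5000, 1): slicing keeps the axis

print(ytrain_last.shape, ytrain_col.shape)  # (5000,) (5000, 1)
```

Slicing with `-1:` keeps the trailing axis and gives the (5000, 1) shape used in this answer. If you instead want a prediction for every time step, Keras's `LSTM(..., return_sequences=True)` is the usual route, but that is a different model from the one in the question.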



          Besides, because the model is stateful, the number of training samples must be divisible by the batch size (16), so you can use 5008 samples, for example.
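The divisibility requirement comes from the fixed batch size of the stateful LSTM. Besides growing the training set to 5008 samples, you can drop the remainder instead. A minimal sketch with a hypothetical helper `trim_to_batch_multiple`; zero arrays stand in for the real data.

```python
import numpy as np

def trim_to_batch_multiple(x, y, batch_size):
    """Hypothetical helper: drop trailing samples so len(x) % batch_size == 0.

    A stateful model processes full batches of exactly batch_size samples.
    """
    n_keep = (len(x) // batch_size) * batch_size
    return x[:n_keep], y[:n_keep]

xtrain = np.zeros((5000, 16, 6))
ytrain = np.zeros((5000, 1))
xt, yt = trim_to_batch_multiple(xtrain, ytrain, 16)
print(xt.shape, yt.shape)    # (4992, 16, 6) (4992, 1)
print(5008 % 16, 5000 % 16)  # 0 8
```

Trimming loses 8 samples; the test set of 2000 samples is already a multiple of 16 (125 batches).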



          In fact, with:



          ytrain.shape == (5000, 1)


          the error you mention goes away, but a new one is raised:




          ValueError: In a stateful network, you should only pass inputs with a number of samples that can be divided by the batch size. Found: 5000 samples




          This one is fixed by ensuring that:



          xtrain.shape == (5008, 16, 6)
          ytrain.shape == (5008, 1)
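Both conditions can be checked before calling `fit`. The function below is a hypothetical helper, not part of Keras, that reports the same two problems as the ValueErrors above (plus a length mismatch); zero arrays stand in for real data.

```python
import numpy as np

def check_stateful_inputs(x, y, timesteps, n_features, batch_size):
    """Hypothetical pre-fit check for the question's stateful model.

    Returns a list of human-readable problems; an empty list means the
    shapes satisfy the conditions discussed in the answer.
    """
    problems = []
    if x.shape[1:] != (timesteps, n_features):
        problems.append(f"x samples have shape {x.shape[1:]}, expected {(timesteps, n_features)}")
    # One target value per sample: shape (n,) or (n, 1) for Dense(1, sigmoid).
    if y.shape[1:] not in [(), (1,)]:
        problems.append(f"y samples have shape {y.shape[1:]}, expected () or (1,)")
    if len(x) != len(y):
        problems.append(f"{len(x)} inputs but {len(y)} targets")
    # A stateful model uses a fixed batch size, so every batch must be full.
    if len(x) % batch_size != 0:
        problems.append(f"{len(x)} samples is not a multiple of batch size {batch_size}")
    return problems

print(check_stateful_inputs(np.zeros((5000, 16, 6)), np.zeros((5000, 16)), 16, 6, 16))
print(check_stateful_inputs(np.zeros((5000, 16, 6)), np.zeros((5000, 1)), 16, 6, 16))
print(check_stateful_inputs(np.zeros((5008, 16, 6)), np.zeros((5008, 1)), 16, 6, 16))  # []
```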





          answered Nov 22 '18 at 23:56









          Julian Peller













          • So, considering I have 112000 rows of data and want to train the LSTM on as many rows as possible, should I create 111985 "packs" of 16 rows to feed into the LSTM, i.e. an input of shape (111985, 16, 6) and an output of shape (111985, 1)? I want the LSTM to predict the class of every row, but that class depends on the last 16 time frames: for the 16th row (the first one with a full history) I need rows 0-15, for the 17th row I need rows 1-16, and so on, which gives 112000 - 16 + 1 = 111985 packs of 16 rows?

            – SEnergy
            Nov 23 '18 at 16:32











          • I'm not sure I'm following you. You have n training samples. For your stateful model n must be divisible by the batch size, but otherwise you simply fit all n samples and Keras takes care of splitting them into batches. Each training sample is a sequence: each element of the sequence is a time step, described by f features. And for each training sample (a whole sequence of feature vectors) you have exactly one y. This is the usual setup for binary classification of a sequence.

            – Julian Peller
            Nov 23 '18 at 20:04











          • Assume I have n training samples (rows), where every sample has 3 features f (columns). Features f_n0 and f_n1 are inputs and f_n2 is the output. This f_n2 output, however, should be based only on the last 16 rows (time frames), nothing earlier. Assume 1 time frame = 1 second: the NN tries to predict the output from what happened in the last 16 seconds. Given 112000 training samples (n = 112000) with 7 features (f = 7), and since an LSTM works with a 3D array, would the resulting array have shape (n-16, 16, f) or (n/16, 16, f)?

            – SEnergy
            Nov 23 '18 at 20:41
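Both layouts from the last comment can be compared directly in NumPy. This is a sketch assuming NumPy 1.20 or later (for `sliding_window_view`) and a hypothetical `data` array holding only the 6 feature columns in time order (the class column would supply the labels). Non-overlapping blocks give (n/16, 16, f); overlapping windows, one ending at every row from the 16th onward, give (n - 16 + 1, 16, f).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

n, f, window = 112000, 6, 16
# Hypothetical placeholder for the 6 feature columns, in time order.
data = np.arange(n * f, dtype=np.float32).reshape(n, f)

# Option 1: non-overlapping blocks of 16 rows -> (n / 16, 16, f)
blocks = data.reshape(n // window, window, f)

# Option 2: overlapping windows, shifted by one row each -> (n - 16 + 1, 16, f).
# sliding_window_view appends the window axis last, so move it to position 1.
windows = sliding_window_view(data, window_shape=window, axis=0).transpose(0, 2, 1)

print(blocks.shape)   # (7000, 16, 6)
print(windows.shape)  # (111985, 16, 6)
```

Only the overlapping version gives one sample per row that has a full 16-row history, which matches the goal described in the comments; the label for window i would then be the class of row i + 15. `sliding_window_view` returns a read-only view, so nothing is copied until you materialize it (for example with `np.ascontiguousarray`). Because each window already contains the whole 16-row history, `stateful=True` may not be needed; with a stateless LSTM, Keras also no longer requires the sample count to be a multiple of the batch size.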


















