Same ImageDataGenerator but different class_indices - How to remap the classes within a generator?












Background



I am training a model that takes two images as input. Because the data is too large to fit into my machine's RAM, I am using flow_from_dataframe to create the generators for training and validation: two training generators, each providing one of the two images (front view and back view, as selected by the x_col parameter), and two corresponding generators for validation.



Like this:



datagen = ImageDataGenerator(rescale=1. / 255)

X1_train_generator = datagen.flow_from_dataframe(dataframe=train, directory=data_dir, x_col="front", y_col=target, has_ext=True, class_mode="categorical", target_size=(224, 224), batch_size=batch_size, seed=1)
X2_train_generator = datagen.flow_from_dataframe(dataframe=train, directory=data_dir, x_col="back", y_col=target, has_ext=True, class_mode="categorical", target_size=(224, 224), batch_size=batch_size, seed=1)

X1_validation_generator = datagen.flow_from_dataframe(dataframe=test, directory=data_dir, x_col="front", y_col=target, has_ext=True, class_mode="categorical", target_size=(224, 224), batch_size=batch_size, seed=1)
X2_validation_generator = datagen.flow_from_dataframe(dataframe=test, directory=data_dir, x_col="back", y_col=target, has_ext=True, class_mode="categorical", target_size=(224, 224), batch_size=batch_size, seed=1)


To combine the front and back generators into a single two-input generator, once for training and once for validation, I am using:



def format_gen_outputs(gen1, gen2):
    # Each argument is one (images, labels) batch from a generator.
    x1 = gen1[0]
    x2 = gen2[0]
    y1 = gen1[1]
    return [x1, x2], y1

train_combo_gen = map(format_gen_outputs, X1_train_generator, X2_train_generator)
validation_combo_gen = map(format_gen_outputs, X1_validation_generator, X2_validation_generator)


I then train the model with fit_generator, passing train_combo_gen as the training generator and validation_combo_gen as the validation_data argument.
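For readers reproducing this setup: the built-in map, given two iterators, pulls one item from each per step, which is what pairs the front and back batches. Below is a minimal sketch with stand-in generators. Note that fake_batches is a hypothetical helper, not a Keras API, and the strings stand in for image arrays.

```python
from itertools import islice


def fake_batches(view, labels):
    """Hypothetical stand-in for a Keras iterator: yields (x_batch, y_batch) forever."""
    step = 0
    while True:
        yield [f"{view}_{step}"], labels
        step += 1


def format_gen_outputs(gen1, gen2):
    # Same logic as in the question: keep both inputs, take labels from the first.
    x1 = gen1[0]
    x2 = gen2[0]
    y1 = gen1[1]
    return [x1, x2], y1


# map() advances both generators in lockstep, one batch from each per step.
combo = map(format_gen_outputs,
            fake_batches("front", [1, 0]),
            fake_batches("back", [1, 0]))

first_two = list(islice(combo, 2))
print(first_two)
```

The pairing only stays meaningful if both underlying generators visit the rows in the same order, which is presumably why the question passes the same seed to each call.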



Problem



However, I noticed that X1_train_generator and X2_train_generator report a different .class_indices mapping than the two validation generators, X1_validation_generator and X2_validation_generator.



Like this (notice how cat & dog are assigned to different classes):



X1_train_generator.class_indices
>> {'cat': 0, 'dog': 1, 'car': 2, 'bike': 3}

X1_validation_generator.class_indices
>> {'dog': 0, 'cat': 1, 'car': 2, 'bike': 3}


Question



Because of this, I don't trust the val_loss and val_acc reported during training. Is there a way to fix this, i.e. to remap the classes within the generators?










Tags: python, tensorflow, keras, generator






asked Nov 25 '18 at 23:39 by AaronDT, edited Nov 27 '18 at 12:16 by today
























1 Answer














When you don't set the classes explicitly via the classes argument, flow_from_dataframe internally calls the pandas Series.unique method on the y_col column to find the classes:



if not classes:
    classes = []
    if class_mode not in ["other", "input", None]:
        classes = list(self.df[y_col].unique())


The unique method returns the unique values in order of appearance in the column. Since the labels first appear in a different order in your train and test dataframes, the two sets of generators get different class indices.
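To see the effect in isolation, here is a small sketch that mimics order-of-appearance deduplication in plain Python. dict.fromkeys stands in for Series.unique, and the label lists are made-up examples rather than the asker's data.

```python
def classes_in_order_of_appearance(labels):
    # dicts preserve insertion order, so this keeps the first occurrence of each label.
    return list(dict.fromkeys(labels))


def class_indices(classes):
    # Position in the class list becomes the class index.
    return {name: index for index, name in enumerate(classes)}


# Hypothetical label columns: same classes, different order of first appearance.
train_labels = ["cat", "dog", "cat", "car", "bike"]
test_labels = ["dog", "cat", "car", "dog", "bike"]

train_indices = class_indices(classes_in_order_of_appearance(train_labels))
test_indices = class_indices(classes_in_order_of_appearance(test_labels))

print(train_indices)  # {'cat': 0, 'dog': 1, 'car': 2, 'bike': 3}
print(test_indices)   # {'dog': 0, 'cat': 1, 'car': 2, 'bike': 3}
```

The two mappings disagree on cat and dog purely because of row order, which matches the symptom shown in the question.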



One workaround is to set the classes argument explicitly in all the flow_from_dataframe calls, which guarantees the same class-index mapping in the train and validation generators:



X1_train_generator = datagen.flow_from_dataframe(dataframe=train,
                                                 classes=['cat', 'dog', 'car', 'bike'], ...)

# do the same for `X2_train_generator`, `X1_validation_generator` and `X2_validation_generator`
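If you would rather not hard-code the list, one option is to derive it once from the labels of both dataframes and pass that same list to every call. This is a sketch that assumes the labels are plain strings. shared_classes is a hypothetical helper, and the label lists stand in for train[target] and test[target].

```python
def shared_classes(*label_columns):
    """Return one sorted class list covering every label column passed in."""
    seen = set()
    for column in label_columns:
        seen.update(column)
    # Sorting makes the order independent of how rows happen to be arranged.
    return sorted(seen)


# Hypothetical stand-ins for train[target] and test[target].
train_labels = ["cat", "dog", "cat", "car", "bike"]
test_labels = ["dog", "cat", "car", "dog", "bike"]

classes = shared_classes(train_labels, test_labels)
print(classes)  # ['bike', 'car', 'cat', 'dog']

# The same list would then be passed to every generator, e.g.
# datagen.flow_from_dataframe(dataframe=train, classes=classes, ...)
```

Once the generators are built, comparing X1_train_generator.class_indices with X1_validation_generator.class_indices is a cheap way to confirm the mappings agree before training.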





answered Nov 27 '18 at 12:09 by today













• Thank you so much! This is exactly what I was looking for. Totally missed the fact that there is a classes parameter! – AaronDT, Nov 28 '18 at 13:04


















