Same ImageDataGenerator but different class_indices - How to remap the classes within a generator?
Background
I am training a model that takes two images as inputs. As the data is too large to fit into my machine's RAM, I am using flow_from_dataframe to create the generators for training and validation: two training generators, each providing one of the two images (front view and back view, as selected by the x_col parameter), and two corresponding generators for validation.
Like this:
datagen = ImageDataGenerator(rescale=1. / 255)
X1_train_generator = datagen.flow_from_dataframe(dataframe=train, directory=data_dir, x_col="front", y_col=target, has_ext=True, class_mode="categorical", target_size=(224, 224), batch_size=batch_size, seed=1)
X2_train_generator = datagen.flow_from_dataframe(dataframe=train, directory=data_dir, x_col="back", y_col=target, has_ext=True, class_mode="categorical", target_size=(224, 224), batch_size=batch_size, seed=1)
X1_validation_generator = datagen.flow_from_dataframe(dataframe=test, directory=data_dir, x_col="front", y_col=target, has_ext=True, class_mode="categorical", target_size=(224, 224), batch_size=batch_size, seed=1)
X2_validation_generator = datagen.flow_from_dataframe(dataframe=test, directory=data_dir, x_col="back", y_col=target, has_ext=True, class_mode="categorical", target_size=(224, 224), batch_size=batch_size, seed=1)
To combine the front and back generators into a single two-input generator (one for training and one for validation), I am using:
def format_gen_outputs(gen1, gen2):
    x1 = gen1[0]
    x2 = gen2[0]
    y1 = gen1[1]
    return [x1, x2], y1

train_combo_gen = map(format_gen_outputs, X1_train_generator, X2_train_generator)
validation_combo_gen = map(format_gen_outputs, X1_validation_generator, X2_validation_generator)
I then train the model with fit_generator, passing train_combo_gen as the training generator and validation_combo_gen as the validation_data argument.
Problem
However, I noticed that X1_train_generator and X2_train_generator report a different .class_indices mapping than the two validation generators, X1_validation_generator and X2_validation_generator.
Like this (notice how cat & dog are assigned to different classes):
X1_train_generator.class_indices
>> {'cat': 0, 'dog': 1, 'car': 2, 'bike': 3}
X1_validation_generator.class_indices
>> {'dog': 0, 'cat': 1, 'car': 2, 'bike': 3}
Question
As a result, I don't trust my val_loss and val_acc during training. Is there any way to fix this, i.e. remap the classes within the generators?
python tensorflow keras generator
edited Nov 27 '18 at 12:16 by today
asked Nov 25 '18 at 23:39 by AaronDT
1 Answer
When you don't explicitly set the classes via the classes argument, flow_from_dataframe internally uses the pandas Series.unique method on the y_col column to find the classes:
if not classes:
    classes = []
    if class_mode not in ["other", "input", None]:
        classes = list(self.df[y_col].unique())
The unique method returns the unique values in order of appearance in the column. Since the labels appear in a different order in your train and test dataframes, you get different class indices for the two.
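The effect can be reproduced without Keras. The sketch below is a pure-Python stand-in, not the library code: `unique_in_order` is a hypothetical helper that mimics the order-of-appearance behaviour of pandas `Series.unique`, `build_class_indices` is assumed to mirror how the class list is turned into an index mapping, and the label lists are made up for illustration.

```python
def unique_in_order(labels):
    """Return distinct labels in order of first appearance (stand-in for Series.unique)."""
    return list(dict.fromkeys(labels))


def build_class_indices(labels):
    """Map each class name to its position in the order-of-appearance list."""
    return {name: idx for idx, name in enumerate(unique_in_order(labels))}


# Hypothetical label columns: same classes, different order of first appearance.
train_labels = ["cat", "dog", "car", "cat", "bike"]
test_labels = ["dog", "cat", "car", "bike", "dog"]

train_indices = build_class_indices(train_labels)
test_indices = build_class_indices(test_labels)

print(train_indices)  # {'cat': 0, 'dog': 1, 'car': 2, 'bike': 3}
print(test_indices)   # {'dog': 0, 'cat': 1, 'car': 2, 'bike': 3}
```

Both mappings cover the same four classes, but 'cat' and 'dog' swap indices purely because of which label happens to come first in each column.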
One workaround is to explicitly set the classes argument for all the flow_from_dataframe calls to guarantee the same class indices mapping in train and validation generators:
X1_train_generator = datagen.flow_from_dataframe(dataframe=train,
    classes=['cat', 'dog', 'car', 'bike'], ...)
# do the same for `X2_train_generator`, `X1_validation_generator` and `X2_validation_generator`
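Rather than typing the class list by hand, it could be derived once from both dataframes and reused for every generator. This is a minimal sketch, assuming the label column can be read as a plain iterable of strings (for example `train[target]`); `shared_classes` is a hypothetical helper, and the label lists stand in for the real columns.

```python
def shared_classes(*label_columns):
    """Return a sorted list of every label seen in any of the given columns."""
    seen = set()
    for column in label_columns:
        seen.update(column)
    # Sorting makes the order independent of how the rows happen to be arranged.
    return sorted(seen)


# Hypothetical stand-ins for train[target] and test[target].
train_labels = ["cat", "dog", "car", "cat", "bike"]
test_labels = ["dog", "cat", "car", "bike", "dog"]

classes = shared_classes(train_labels, test_labels)
print(classes)  # ['bike', 'car', 'cat', 'dog']
```

The resulting list would then be passed as `classes=classes` to all four flow_from_dataframe calls, after which comparing `X1_train_generator.class_indices` with `X1_validation_generator.class_indices` is a quick way to confirm that the mappings agree.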
Thank you so much! This is exactly what I was looking for. Totally missed the fact that there is a classes parameter!
– AaronDT
Nov 28 '18 at 13:04
answered Nov 27 '18 at 12:09 by today