Split data directory into training and test directory with sub directory structure preserved

I am interested in using ImageDataGenerator in Keras for data augmentation, but it requires that the training and validation directories, each with subdirectories for the classes, be fed in separately as below (this is from the Keras documentation). I have a single directory with 2 subdirectories for 2 classes (Data/Class1 and Data/Class2). How do I randomly split this into training and validation directories?

    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

    test_datagen = ImageDataGenerator(rescale=1./255)

    train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

    validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

    model.fit_generator(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)

I am interested in re-running my algorithm multiple times with random training and validation data splits.

edited May 18 '18 at 11:39

Marcin Możejko

21.6k54878

asked Oct 12 '17 at 19:47

Sharanya Arcot Desai

234311

python machine-learning neural-network keras deep-learning

6 Answers

Thank you guys! I was able to write my own function to create training and test data sets. Here's the code for anyone who's looking.

    import os
    import shutil

    import numpy as np

    source1 = "/source_dir"
    dest11 = "/dest_dir"

    # Move roughly 20% of the files from the source directory to the destination.
    for f in os.listdir(source1):
        if np.random.rand() < 0.2:
            shutil.move(os.path.join(source1, f), os.path.join(dest11, f))
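The snippet above handles one flat directory and moves files, so each run changes the source. For repeated random splits that keep the class subdirectories, one possible variation is sketched here. It is only a sketch under assumptions: the function name `split_dataset`, its parameters, and the `train`/`validation` output folder names are hypothetical, and it copies files instead of moving them so the original `Data/Class1`, `Data/Class2` layout stays untouched. It uses only the standard library.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source_dir, dest_dir, val_fraction=0.2, seed=None):
    """Copy each class subdirectory of source_dir into
    dest_dir/train/<class> and dest_dir/validation/<class>.

    Returns {class_name: (n_train, n_validation)}.
    """
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    counts = {}
    for class_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(files)
        # Split each class separately so class proportions are preserved.
        n_val = int(round(len(files) * val_fraction))
        subsets = {"validation": files[:n_val], "train": files[n_val:]}
        for subset, subset_files in subsets.items():
            target = dest_dir / subset / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in subset_files:
                shutil.copy2(f, target / f.name)
        counts[class_dir.name] = (len(files) - n_val, n_val)
    return counts
```

Calling it with a different `seed` and a fresh `dest_dir` for each run gives a new random split, and the two resulting folders can be passed to `flow_from_directory` as in the question. Copying doubles the disk usage, which may matter for large datasets.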

answered Oct 20 '17 at 21:54

Sharanya Arcot Desai

234311


Unfortunately, this is not possible with the current implementation of keras.preprocessing.image.ImageDataGenerator (as of October 14th, 2017), but since it is a frequently requested feature, I expect it to be added in the near future.

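But you could do this using standard Python os operations. Depending on the size of your dataset, you could also first load all images into RAM and then use the classical fit method, which can hold out part of your data for validation via validation_split. Shuffle the data yourself beforehand, because validation_split takes the last fraction of the samples rather than a random one.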
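If the images are loaded into memory, note that, as far as the Keras documentation describes it, `validation_split` in `fit` takes the last fraction of the samples before any shuffling. A random split therefore needs the data shuffled first, with samples and labels kept aligned. A minimal sketch of that step, using plain Python lists and a hypothetical helper name `shuffle_in_unison`:

```python
import random


def shuffle_in_unison(samples, labels, seed=None):
    """Return shuffled copies of samples and labels with pairs kept aligned."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # one permutation applied to both lists
    return [samples[i] for i in order], [labels[i] for i in order]
```

The shuffled lists would then be converted to arrays and passed to `fit` together with `validation_split`. Changing `seed` between runs gives a different random validation set each time.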

answered Oct 14 '17 at 14:39

Marcin Możejko

21.6k54878

Thank you. I was able to write a function to create these directories.

– Sharanya Arcot Desai
Oct 19 '17 at 22:50


You will need to either manually copy out some of your training data and paste it into a validation directory, or create a program to randomly move data from your training directory to your validation directory. With either of these options, you will need to pass in the validation directory to your validation ImageDataGenerator().flow_from_directory() as the path.

Details for organizing your data in the directory structure are covered in this video.

edited Oct 14 '17 at 16:19

answered Oct 12 '17 at 20:11

blackHoleDetector

1,024410

Thanks for your answer. But I did not see validation_split as a parameter in fit_generator, and fit_generator is what I want to use. It's a parameter in the fit function.

– Sharanya Arcot Desai
Oct 13 '17 at 19:33

Ah, you're right. I was thinking it was a parameter in both fit() and fit_generator(), but it is only for fit(). I've updated my answer. You will have to either manually or programmatically create your directory structure for both valid and train sets, and then point to these separate directories with your ImageDataGenerators for each of these sets.

– blackHoleDetector
Oct 14 '17 at 16:21


https://stackoverflow.com/a/52372042/10111155 provided the easiest way: ImageDataGenerator now supports splitting into train/test from a single directory with subdirectories directly.

This is copied directly from that answer with no changes. I take no credit. I tried it and it worked perfectly.

Note that train_data_dir is the same in the train_generator and validation_generator. If you want a three-way split (train/test/valid) using ImageDataGenerator, the source code will need to be modified --- there are nice instructions here.

    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        validation_split=0.2)  # set validation split

    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='binary',
        subset='training')  # set as training data

    validation_generator = train_datagen.flow_from_directory(
        train_data_dir,  # same directory as training data
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='binary',
        subset='validation')  # set as validation data

    model.fit_generator(
        train_generator,
        steps_per_epoch=train_generator.samples // batch_size,
        validation_data=validation_generator,
        validation_steps=validation_generator.samples // batch_size,
        epochs=nb_epochs)
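One detail in the step counts above: `samples // batch_size` rounds down, so when the sample count is not a multiple of the batch size, the leftover samples are not covered within a single epoch. A small sketch of the arithmetic, assuming the generator can also yield a final, smaller batch (the helper name `steps_for` is hypothetical):

```python
import math


def steps_for(samples, batch_size, keep_partial_batch=True):
    """Number of batches needed to cover `samples` items."""
    if keep_partial_batch:
        # Round up so the final, smaller batch is included.
        return math.ceil(samples / batch_size)
    # Round down: the leftover samples are skipped.
    return samples // batch_size


# 1000 samples with a batch size of 32:
full_batches = steps_for(1000, 32, keep_partial_batch=False)  # 31 steps, 992 samples
all_batches = steps_for(1000, 32)                             # 32 steps, all 1000 samples
```

Either choice works for training. Rounding up matters more for validation, where skipping samples changes the reported metric.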

answered Nov 21 '18 at 18:06

Beau Hilton

7714


Here's my approach:

    import os
    import random
    import shutil
    from tempfile import TemporaryDirectory

    # train_image_folder, train_label_folder and train_val_split are defined elsewhere.
    # Create temporary validation set. Both folders are deleted when the block exits.
    with TemporaryDirectory(dir=train_image_folder) as valid_image_folder, \
            TemporaryDirectory(dir=train_label_folder) as valid_label_folder:
        train_images = os.listdir(train_image_folder)
        train_labels = os.listdir(train_label_folder)

        for img_name in train_images:
            single_name, ext = os.path.splitext(img_name)
            label_name = single_name + '.png'
            # Skip images that have no matching label file.
            if label_name not in train_labels:
                continue
            if random.uniform(0, 1) <= train_val_split:
                # Move the image and its label together.
                shutil.move(os.path.join(train_image_folder, img_name), os.path.join(valid_image_folder, img_name))
                shutil.move(os.path.join(train_label_folder, label_name), os.path.join(valid_label_folder, label_name))

Don't forget to move everything back.
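Moving everything back is easy to forget, and a `TemporaryDirectory` deletes whatever is still inside it on exit. One way to make the restore automatic is a context manager, sketched below. This is not the code from the answer above: it assumes the class-subdirectory layout from the question (not paired image and label folders), and the name `temporary_validation_split` and its parameters are hypothetical.

```python
import random
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_validation_split(train_dir, val_fraction=0.2, seed=None):
    """Move a random fraction of each class into a temporary validation
    directory, yield that directory, then move every file back."""
    train_dir = Path(train_dir)
    rng = random.Random(seed)
    # Create the temporary directory next to train_dir so moves stay on one filesystem.
    val_dir = Path(tempfile.mkdtemp(prefix="validation_", dir=train_dir.parent))
    moved = []  # (original_path, temporary_path) pairs
    try:
        for class_dir in sorted(p for p in train_dir.iterdir() if p.is_dir()):
            target = val_dir / class_dir.name
            target.mkdir()
            for f in sorted(p for p in class_dir.iterdir() if p.is_file()):
                if rng.random() < val_fraction:
                    dest = target / f.name
                    shutil.move(str(f), str(dest))
                    moved.append((f, dest))
        yield val_dir
    finally:
        # Restore the training directory even if training raised an error.
        for original, temporary in moved:
            if temporary.exists():
                shutil.move(str(temporary), str(original))
        shutil.rmtree(val_dir, ignore_errors=True)
```

Training would run inside the `with temporary_validation_split(...) as val_dir:` block, with `val_dir` passed to the validation generator. Each run with a different `seed` gives a new split, and the training directory is complete again afterwards.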

answered Mar 14 '18 at 13:55

Richard

313


Your solution worked, thanks.

    import os
    import shutil

    import numpy as np

    # base_dir is assumed to be defined elsewhere and to contain train/ and val/.
    sourceN = os.path.join(base_dir, "train", "NORMAL")
    destN = os.path.join(base_dir, "val", "NORMAL")
    sourceP = os.path.join(base_dir, "train", "PNEUMONIA")
    destP = os.path.join(base_dir, "val", "PNEUMONIA")

    filesN = os.listdir(sourceN)
    filesP = os.listdir(sourceP)

    # Move roughly 20% of each class from train to val.
    for f in filesN:
        if np.random.rand() < 0.2:
            shutil.move(os.path.join(sourceN, f), os.path.join(destN, f))

    for i in filesP:
        if np.random.rand() < 0.2:
            shutil.move(os.path.join(sourceP, i), os.path.join(destP, i))

    # Report how many files ended up in each directory.
    print(len(os.listdir(sourceN)))
    print(len(os.listdir(sourceP)))
    print(len(os.listdir(destN)))
    print(len(os.listdir(destP)))

answered May 23 '18 at 18:07

Jordy

Consider adding an explanation.

– Grant Miller
May 23 '18 at 18:38



