Generate custom dataset

Generate a custom dataset with two classes

In this tutorial we will generate a custom dataset with two classes: Pooh Bear and Tiger. The Dataset generator page already describes how to generate a custom dataset with specific cards. The steps here are very similar to that article, but this one is part of a full walkthrough of training YOLO on custom classes. An interactive demo is available on Google Colaboratory: https://drive.google.com/file/d/1OMq52WWisgWU3zzx9HdJ3dBQ6d80s8zT/view?usp=sharing.

Let's start generating the dataset!

  • Clone the git repository: https://github.com/szaza/dataset-generator;
  • Build the software by running the ./gradlew clean build command;
  • The dataset/ directory contains two folders, cartoon/ and cartoon-backgrounds/, which hold the source pictures;
Winnie-the-Pooh - source for dataset generation

Tiger - source for dataset generation

  • Pay attention to the names of the images; they should follow the convention <name of the class>-{number of the image}.<extension>, e.g. tiger-1.png;
  • Modify the configuration file:
    • set SOURCE_DIR to the cartoon/ folder;
    • set BACKGROUND_DIR to the cartoon-backgrounds/ folder;
    • set the ROWS, COLS and DATA_SET_SIZE variables;
    • enumerate your class labels in the CLASS_LABELS array; check that the images in SOURCE_DIR are named according to the convention <name of the class>-{number of the image}.<extension>;
    • set the DEBUG flag if you would like to see the bounding boxes around the objects on the generated images;
  • Run the ./gradlew run command from the root directory;
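
Before running the generator, it can help to check the file names against the naming convention described above. The following is a minimal Python sketch of such a check. It is not part of dataset-generator; the helper names (parse_image_name, find_naming_problems, check_directory) are hypothetical, and the regular expression is only one possible reading of <name of the class>-{number of the image}.<extension>.

```python
import re
from pathlib import Path

# Assumed reading of "<name of the class>-{number of the image}.<extension>":
# the class name may itself contain hyphens, the number is one or more digits.
NAME_PATTERN = re.compile(r"^(?P<label>.+)-(?P<number>\d+)\.(?P<ext>[A-Za-z0-9]+)$")


def parse_image_name(file_name):
    """Return (label, number) if the name follows the convention, else None."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        return None
    return match.group("label"), int(match.group("number"))


def find_naming_problems(file_names, class_labels):
    """List (file name, reason) pairs for names that would not fit the convention."""
    problems = []
    for name in sorted(file_names):
        parsed = parse_image_name(name)
        if parsed is None:
            problems.append((name, "does not match <class>-<number>.<extension>"))
        elif parsed[0] not in class_labels:
            problems.append((name, f"class '{parsed[0]}' is not in the label list"))
    return problems


def check_directory(source_dir, class_labels):
    """Run the check on every regular file directly inside source_dir."""
    names = [p.name for p in Path(source_dir).iterdir() if p.is_file()]
    return find_naming_problems(names, class_labels)
```

For example, check_directory("dataset/cartoon", {"pooh", "tiger"}) would return an empty list when every file is named correctly. The label names used here are assumptions; use the same values you listed in CLASS_LABELS.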

Your generated data should look like the following:

Generated data for Winnie-the-Pooh dataset