Sample Data Generation

ChatGPT for Data Generation


Disclaimer: The content of this blog reflects my own views and understanding of the topic. I do not intend to demean anything or anyone. I am only sharing my views so that you can look at the topic from a different angle.

Introduction

ChatGPT is a text generator, so why not use it to generate sample data that is meaningful and satisfies the required constraints? I was working on some unit test cases and needed sample data that would cover all the possible scenarios. We have heard the term API-first approach in microservices architecture; similarly, we should adopt a ChatGPT-first approach for anything we do. Whenever you are working on something, first see whether you could leverage ChatGPT for it. Dedicated sample data generation tools do help, but ChatGPT can draw on its knowledge of real products and so tends to generate more realistic samples.

Use Case

Consider a use case in which we generate sample data as JSON for a list of products. We will use mobile devices as the example. Let us explore sample data generation around this use case.

{
    "products": [
        {
            "id": 123,
            "name": "IPhone",
            "price": "$999",
            "type": "Mobile",
            "manufacturer": "Apple Inc."
        }
    ]
}
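
As a minimal sketch, the shape of one product can be written down as a small Python dataclass, which makes the fields and their types explicit before we ask ChatGPT for more data. The names `Product`, `price_in_dollars` and `load_products` are hypothetical and introduced here only for illustration; the sketch also assumes that prices are whole-dollar strings prefixed with "$", as in the sample.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Product:
    """One entry of the "products" list (hypothetical helper type)."""
    id: int
    name: str
    price: str  # kept as a string such as "$999", exactly as in the sample
    type: str
    manufacturer: str

    def price_in_dollars(self) -> int:
        # Assumption: whole-dollar prices prefixed with "$".
        return int(self.price.lstrip("$"))


SAMPLE_JSON = """
{
    "products": [
        {
            "id": 123,
            "name": "IPhone",
            "price": "$999",
            "type": "Mobile",
            "manufacturer": "Apple Inc."
        }
    ]
}
"""


def load_products(text: str) -> List[Product]:
    # Each JSON object must carry exactly the five fields of Product,
    # otherwise the constructor raises a TypeError.
    return [Product(**item) for item in json.loads(text)["products"]]


products = load_products(SAMPLE_JSON)
print(products[0].name, products[0].price_in_dollars())  # prints: IPhone 999
```

Loading the data this way fails fast if a generated record has a missing or extra field, which is handy when the data is meant for unit tests.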

ChatGPT Prompting

Below is the prompt that helps us generate the required sample data.

Generating Sample Data

Please generate sample data as JSON referring the given Sample Input JSON. Follow below guidelines while generating sample data :

Field level restrictions

Sample Input : 

{
    "products": [
        {
            "id": 123,
            "name": "IPhone",
            "price": "$999",
            "type": "Mobile",
            "manufacturer": "Apple Inc."
        }
    ]
}
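
If the same prompt is needed for several sample inputs, it can be assembled programmatically. The following is only a sketch: the function name `build_prompt` is hypothetical, and the two guidelines passed to it are placeholders made up for illustration, not the field level restrictions used for this blog. The resulting text would then be pasted into ChatGPT by hand.

```python
import json
from typing import List


def build_prompt(sample: dict, guidelines: List[str]) -> str:
    """Assemble the prompt text from a sample JSON object and a list of guidelines."""
    lines = [
        "Please generate sample data as JSON referring the given Sample Input JSON. "
        "Follow below guidelines while generating sample data :",
        "",
    ]
    # One bullet per guideline, in the order given.
    lines += [f"- {guideline}" for guideline in guidelines]
    # The sample input is embedded as pretty-printed JSON.
    lines += ["", "Sample Input :", "", json.dumps(sample, indent=4)]
    return "\n".join(lines)


sample = {
    "products": [
        {
            "id": 123,
            "name": "IPhone",
            "price": "$999",
            "type": "Mobile",
            "manufacturer": "Apple Inc.",
        }
    ]
}

# Placeholder guidelines, assumed for illustration only.
guidelines = [
    "Generate 5 products",
    "Every id must be a unique integer",
]

prompt = build_prompt(sample, guidelines)
print(prompt)
```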

Output Generated by ChatGPT

Note: The sample data may not reflect current prices, since ChatGPT's data extends only to its knowledge cut-off date of September 2021.

{
  "products": [
    {
      "id": 1,
      "name": "MacBook Pro",
      "price": "$1999",
      "type": "Laptop",
      "manufacturer": "Apple Inc."
    },
    {
      "id": 2,
      "name": "iPad Pro",
      "price": "$799",
      "type": "Tablet",
      "manufacturer": "Apple Inc."
    },
    {
      "id": 3,
      "name": "iPhone 12",
      "price": "$799",
      "type": "Mobile",
      "manufacturer": "Apple Inc."
    },
    {
      "id": 4,
      "name": "iMac",
      "price": "$1299",
      "type": "Laptop",
      "manufacturer": "Apple Inc."
    },
    {
      "id": 5,
      "name": "iPad Air",
      "price": "$599",
      "type": "Tablet",
      "manufacturer": "Apple Inc."
    }
  ]
}
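
Before using generated data in unit tests, it is worth checking it mechanically instead of trusting it by eye. The sketch below is one possible check, not a complete validator: the function name `find_problems` is hypothetical, and the rules it enforces (same five keys as the sample input, unique ids, prices written as "$" followed by digits) are assumptions inferred from the sample.

```python
import json
import re
from typing import List

EXPECTED_KEYS = {"id", "name", "price", "type", "manufacturer"}
PRICE_PATTERN = re.compile(r"^\$\d+$")  # assumption: "$" followed by digits only


def find_problems(text: str) -> List[str]:
    """Return a list of human-readable problems found in the generated JSON."""
    problems = []
    seen_ids = set()
    for index, product in enumerate(json.loads(text).get("products", [])):
        if set(product) != EXPECTED_KEYS:
            problems.append(f"product {index}: unexpected keys {sorted(product)}")
            continue
        if product["id"] in seen_ids:
            problems.append(f"product {index}: duplicate id {product['id']}")
        seen_ids.add(product["id"])
        if not PRICE_PATTERN.match(str(product["price"])):
            problems.append(f"product {index}: bad price {product['price']!r}")
    return problems


# Two records taken from the output above.
generated = """
{"products": [
  {"id": 1, "name": "MacBook Pro", "price": "$1999", "type": "Laptop", "manufacturer": "Apple Inc."},
  {"id": 2, "name": "iPad Pro", "price": "$799", "type": "Tablet", "manufacturer": "Apple Inc."}
]}
"""

# A deliberately broken variant: repeated id and a price without the "$" prefix.
broken = """
{"products": [
  {"id": 1, "name": "MacBook Pro", "price": "$1999", "type": "Laptop", "manufacturer": "Apple Inc."},
  {"id": 1, "name": "iPad Pro", "price": "799", "type": "Tablet", "manufacturer": "Apple Inc."}
]}
"""

print(find_problems(generated))  # prints: []
for problem in find_problems(broken):
    print(problem)
```

A check like this catches structural mistakes only. Whether a value is realistic, for example whether a product is listed under a sensible type, still needs a human look.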

Analysis of the Prompt

The output generated by ChatGPT is shown above. The prompt is self-explanatory. Still, below are some key points about this prompt.

Summary

We have seen how ChatGPT generated the required sample data by using its knowledge effectively and staying within the defined constraints. Creating this data by hand is time-consuming and monotonous. An AI system is therefore very effective for this task, and it leaves us free to focus on more complex work that AI cannot do as of now. That does not mean AI cannot be used for something more complex; we are only trying to reduce our effort and increase productivity in every possible way by leveraging ChatGPT. I generally chat with ChatGPT to find a solution to a use case, just like talking with an expert pair programmer. I will share more on that in later blogs. Till then ...

Keep Thinking !!!