Sample Data Generation
ChatGPT for Data Generation
Introduction
ChatGPT is a text generator, so why not use it to generate sample data that is meaningful and satisfies the required constraints? I was working on some unit test cases and needed sample data that would cover all the possible scenarios. We have all heard of the API-first approach in microservices architecture; similarly, we could adopt a ChatGPT-first approach for whatever we do. Whenever you are working on something, first see whether you can leverage ChatGPT for it. Dedicated sample data generation tools do help, but ChatGPT can draw on its knowledge of real products and so can generate more realistic samples.
Use Case
Consider the use case of generating sample data as JSON for a list of products, with mobile devices as the example. Let us explore sample data generation around this use case.
{
"products": [
{
"id": 123,
"name": "IPhone",
"price": "$999",
"type": "Mobile",
"manufacturer": "Apple Inc."
}
]
}
ChatGPT Prompting
Below is the prompt used to generate the required sample data.
Generating Sample Data
Please generate sample data as JSON referring the given Sample Input JSON. Follow below guidelines while generating sample data :
Generate data with 5 products
Field level restrictions
Id : Sequential integer value starting with 1
Name : Use sample real names of Apple Products
Price : Use known prices of them in USD
Type : Use devices types as Mobile, Tablet and Laptop
Manufacturer : Keep this value constant
Sample Input :
{
"products": [
{
"id": 123,
"name": "IPhone",
"price": "$999",
"type": "Mobile",
"manufacturer": "Apple Inc."
}
]
}
Output Generated by ChatGPT
Note: The prices in the sample data may not be current, since ChatGPT's data extends only to its knowledge cut-off date of September 2021.
{
"products": [
{
"id": 1,
"name": "MacBook Pro",
"price": "$1999",
"type": "Laptop",
"manufacturer": "Apple Inc."
},
{
"id": 2,
"name": "iPad Pro",
"price": "$799",
"type": "Tablet",
"manufacturer": "Apple Inc."
},
{
"id": 3,
"name": "iPhone 12",
"price": "$799",
"type": "Mobile",
"manufacturer": "Apple Inc."
},
{
"id": 4,
"name": "iMac",
"price": "$1299",
"type": "Laptop",
"manufacturer": "Apple Inc."
},
{
"id": 5,
"name": "iPad Air",
"price": "$599",
"type": "Tablet",
"manufacturer": "Apple Inc."
}
]
}
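Since generated data can drift from the guidelines, it may be worth checking the output against the field level restrictions before using it in tests. The following is a minimal sketch, not part of the original workflow: the function name `validate_products` and the price format check (a dollar sign followed by digits, as in the "$999" sample) are assumptions made for illustration. The rule about real product names and known prices cannot be checked mechanically and still needs a manual review.

```python
import json

# Device types allowed by the prompt's "Type" restriction
ALLOWED_TYPES = {"Mobile", "Tablet", "Laptop"}


def validate_products(payload: str, expected_count: int = 5) -> list[str]:
    """Return a list of constraint violations; an empty list means none were found."""
    products = json.loads(payload)["products"]
    problems = []

    if len(products) != expected_count:
        problems.append(f"expected {expected_count} products, got {len(products)}")

    for position, product in enumerate(products, start=1):
        # Id: sequential integer value starting with 1
        if product["id"] != position:
            problems.append(f"product {position}: id is {product['id']}")
        # Price: assumed format is "$" followed by digits, as in the sample input
        price = product["price"]
        if not (price.startswith("$") and price[1:].isdigit()):
            problems.append(f"product {position}: unexpected price {price!r}")
        # Type: one of Mobile, Tablet and Laptop
        if product["type"] not in ALLOWED_TYPES:
            problems.append(f"product {position}: unexpected type {product['type']!r}")

    # Manufacturer: the same value on every product
    if len({product["manufacturer"] for product in products}) > 1:
        problems.append("manufacturer is not constant")

    return problems
```

Passing the generated JSON as a string to `validate_products` returns the list of problems found, so a unit test can simply assert that the list is empty.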
Analysis of the Prompt
The output generated by ChatGPT is shown above. The prompt is largely self-explanatory, but here are some key points about it.
We defined a title (i.e. Generating Sample Data), which acts as the goal we are setting.
We listed the field restrictions as separate points, like an unordered list, instead of describing them in a paragraph. This helps both us and ChatGPT see the key points to focus on.
The Sample Input gives a reference for generating the data. It is what we would do when asking a person to work on a task: it is better to explain theory with an example.
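The three points above (a title, one restriction per line, and a sample input) can also be assembled in code when the same prompt structure is reused for other data sets. This is only a sketch under assumptions: `build_prompt` and its parameters are hypothetical names, and the code only builds the prompt text without sending it to ChatGPT.

```python
import json


def build_prompt(title: str, count: int, field_rules: dict[str, str], sample: dict) -> str:
    """Assemble a prompt from a title, one restriction per line, and a sample input."""
    lines = [
        title,
        "Please generate sample data as JSON referring the given Sample Input JSON. "
        "Follow below guidelines while generating sample data :",
        f"Generate data with {count} products",
        "Field level restrictions",
    ]
    # One line per field keeps each restriction easy to spot
    lines += [f"{field} : {rule}" for field, rule in field_rules.items()]
    # The sample input gives the model a concrete reference for the structure
    lines += ["Sample Input :", json.dumps(sample, indent=2)]
    return "\n".join(lines)


prompt = build_prompt(
    title="Generating Sample Data",
    count=5,
    field_rules={
        "Id": "Sequential integer value starting with 1",
        "Name": "Use sample real names of Apple Products",
        "Price": "Use known prices of them in USD",
        "Type": "Use devices types as Mobile, Tablet and Laptop",
        "Manufacturer": "Keep this value constant",
    },
    sample={
        "products": [
            {
                "id": 123,
                "name": "IPhone",
                "price": "$999",
                "type": "Mobile",
                "manufacturer": "Apple Inc.",
            }
        ]
    },
)
print(prompt)
```

Keeping the restrictions in a dictionary means a new field or a changed rule is a one-line edit, and the resulting text can be pasted into ChatGPT as-is.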
Summary
We have seen how ChatGPT generated the required sample data by using its knowledge effectively and staying within the defined constraints. Creating this data on our own is time-consuming and monotonous. An AI system is therefore very effective for this task, and it frees us to focus on more complex work that AI cannot do as of now. That does not mean AI cannot be used for something more complex. We are only trying to reduce our effort and increase productivity in every possible way by leveraging ChatGPT. I generally chat with ChatGPT to find a solution to a use case, just like talking with an expert pair programmer. I will share more on that in later blog posts. Until then ...