Many GenAI tools can perform data analysis and write code.
Here are a few prompts and ideas to get you started.
!! Do not provide confidential research data to any AI tool; at many institutions this counts as academic misconduct. Instead, use exemplar (made-up) data to generate the analysis code, which you can then download, tweak and run offline !!
Many GenAI models can turn numerical data into graphical outputs, from a simple bar graph to visualisations of complex data sets. Most do this by writing Python code, which can then be exported.
GPT-4 is required for advanced data analysis.
Claude.ai can also create graphs using Python code.
Try this simple prompt:
"Draw a bar chart and present as an image where (Group 1) has a mean of 67 and a standard deviation of 5 and (Group 2) has a mean of 54 and a standard deviation of 7."
Many students struggle to choose the correct statistical model for their work. The prompt below will walk you through choosing one.
"Your role in this conversation is to act as a guide, helping a researcher to choose which statistical test to use when analysing their data. Your conversation will be based on working through a statistical decision tree. You will ask the researcher questions to guide them to the most appropriate statistical test. In your responses, give an explanation of the terms used below the main text. Use examples to help them understand. Your first question will be about the number of groups used in the study and ask for some background information.
Are you ready?"
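The conversation this prompt sets up works through a standard statistical decision tree. To give a feel for the logic, here is a much-simplified sketch of one branch of such a tree (comparisons of group means only); the branch points are illustrative assumptions, not a complete taxonomy:

```python
def choose_test(n_groups: int, paired: bool, parametric: bool) -> str:
    """Very simplified decision tree for comparing group means.

    Covers only a handful of common tests; real choices also depend on
    data type, sample size and study design.
    """
    if n_groups == 2:
        if parametric:
            return "paired t-test" if paired else "independent t-test"
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    if n_groups > 2:
        if parametric:
            return "repeated-measures ANOVA" if paired else "one-way ANOVA"
        return "Friedman test" if paired else "Kruskal-Wallis test"
    # One group compared against a known value
    return "one-sample t-test" if parametric else "Wilcoxon signed-rank test"

print(choose_test(2, paired=False, parametric=True))
```

The AI-guided version asks you these same questions in plain language, with explanations of the terms as it goes.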
Principal Component Analysis (PCA) is a technique used to reduce the dimensionality of data. It has several applications, including exploratory data analysis, visualisation, and data preprocessing. With PCA, the data is transformed onto a new coordinate system that identifies the directions (principal components) capturing the largest variation in the data. This makes the data easier to analyse and interpret.
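In essence, PCA centres the data and rotates it onto the directions of greatest variance. A minimal NumPy sketch, using made-up correlated data for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: the second column is a noisy copy of the first
x = rng.normal(size=(200, 1))
data = np.hstack([x, x + 0.1 * rng.normal(size=(200, 1))])

# Centre the data, then use SVD to find the principal components
centred = data - data.mean(axis=0)
_, singular_values, components = np.linalg.svd(centred, full_matrices=False)

# Variance captured along each principal component (largest first)
explained_variance = singular_values**2 / (len(data) - 1)
scores = centred @ components.T  # data projected onto the components

print(explained_variance)
```

Because the two columns are strongly correlated, almost all the variance ends up in the first principal component.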
For this example, we’ll use the Breast Cancer Wisconsin data set from the UCI Machine Learning Repository as our data. Go ahead and load it for yourself if you want to follow along: wbdc.csv
! Advanced: You will need an AI model like GPT-4 that has data analysis features. Starting prompts:
"Show the first five rows of the data set."
"Please can you perform a PCA analysis on the dataset."
Some AI models can take the descriptions or insights generated by an image analysis model and use them to provide detailed explanations, generate creative content, or even answer questions about the image. In this example, we use GPT-4 to count the number of objects in an image. The final prompt is below.
START Prompt:
"The attached image is of a fluorescent cell culture experiment. The yellow represents stress granules as small dots. Please count the number of stress granules."
END Prompt
Objective: To count the number of yellow stress granules in a given image of a fluorescent cell culture experiment.
Steps:
Load and Display the Image
Import necessary libraries: matplotlib.pyplot and cv2.
Load the image using cv2.imread().
Convert the image from BGR to RGB using cv2.cvtColor().
Display the image using matplotlib.
Pre-process the Image
Convert the image to the HSV color space using cv2.cvtColor().
Define the lower and upper bounds for the yellow color in the HSV space.
Create a mask for the yellow color using cv2.inRange().
Use morphological operations (cv2.morphologyEx()) to remove small noise and enhance the granules. A kernel (e.g., elliptical) can be defined using cv2.getStructuringElement().
Segmentation and Counting
Find contours in the processed mask using cv2.findContours().
Filter out very small contours that might be noise based on a predefined minimum contour area.
Draw the filtered contours on the original image using cv2.drawContours().
Display the image with the detected stress granules using matplotlib.
Count the number of detected granules.
Results
Display the number of detected stress granules.