This lesson simulates a language model. The training data is a set of sentence strips and the predefined relationships between the words in each sentence. Just as a large language model (LLM) generates text based on the relationships between tokens (words and parts of words) and the probability of the next token, our language model will use the relationships between our tokens (the words in the containers) and probability to “generate” sentences based on the data we add to the model.
Prior to Lesson: If students are not familiar with the book, read or watch Green Eggs and Ham
Warm Up Activity: Complete the sentence
Whole Class Activity: Train an unplugged AI language model using sentences from Green Eggs and Ham
Green Eggs and Ham is used because of the repetitive nature of its text. Much of what the character who is not Sam says begins with the word “I,” and those are the sentences we will use to build our language model.
Each student will:
Read the sentence to be added to the model
Cut the sentence strip into 5 parts
Add the word(s) of the sentence strip to containers 1 through 4
If part 5 of the sentence strip is not blank
Add the word(s) of the sentence strip to container 5
Each student will:
Use the fill-in-the-blank sentence strips to write their own sentence in the style of Green Eggs and Ham
Read their sentence
Cut the sentence strip into 5 parts
Add the word(s) of the sentence strip to containers 1 through 4
If part 5 of the sentence strip is not blank
Add the word(s) of the sentence strip to container 5
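For teachers who want to see the unplugged activity as a program, the training steps above can be sketched in code: each container is a list, and “training” simply adds each part of a cut sentence strip to the container for its position. This is an illustrative sketch, not part of the lesson; the example strips are drawn from the book.

```python
# Five containers, one per sentence part: 1-circle, 2-heart,
# 3-triangle, 4-square, 5-star.
containers = {1: [], 2: [], 3: [], 4: [], 5: []}

def train(parts):
    """Add each part of a cut sentence strip to the container for
    its position. A blank part 5 is skipped, as in the lesson."""
    for position, part in enumerate(parts, start=1):
        if part:
            containers[position].append(part)

# Two example strips, already cut into parts (a part may hold more
# than one word; the second strip has a blank fifth part):
train(["I", "do not", "like", "them,", "Sam-I-Am"])
train(["I", "would not", "eat", "them", ""])
```

After these two strips are “trained,” container 1 holds two copies of “I” and container 5 holds only “Sam-I-Am,” because blank fifth parts are never added.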
Small Group Activity: Generate & illustrate sentences using model
Divide students into pairs or small groups of 4-6. Students could also work individually.
Each student/pair/group will:
Generate a sentence using the language model the class built.
After sentences are generated, each student will document the sentence and “illustrate” it in some way:
Write or record their sentence and draw a picture of their sentence (on paper or on a platform like Seesaw).
Code their sentence in ScratchJr.
Record themselves reading the sentence and/or type the sentence into the label for a scene.
Use ScratchJr characters to act out the sentence.
Act out their sentence and, optionally, have other students guess what their sentence is.
Return the words of the sentence to the proper containers.
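The generation step can be sketched the same way: draw one part at random from each of containers 1 through 4, then join the parts into a sentence. Because frequent words appear in a container more often, they are drawn more often, which is the probability at work. The container data below is illustrative, and including container 5 about half the time is an assumption, not the lesson's exact rule (see the Algorithms handout).

```python
import random

# Containers already "trained" with a few parts from the book
# (illustrative data, not the full class model):
containers = {
    1: ["I"],
    2: ["do not", "would not"],
    3: ["like", "eat"],
    4: ["them,", "them"],
    5: ["Sam-I-Am"],
}

def generate():
    """Draw one part at random from containers 1-4 and join them.
    The optional fifth part is included about half the time
    (an assumed rule for this sketch)."""
    parts = [random.choice(containers[i]) for i in range(1, 5)]
    if containers[5] and random.random() < 0.5:
        parts.append(random.choice(containers[5]))
    return " ".join(parts)

sentence = generate()
```

Every generated sentence starts with “I,” just as in the book, because that is the only word ever added to container 1.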
Teacher Materials
5 containers (buckets, bowls, boxes, etc.) labeled with the shape and number for each of the five parts of the sentence: 1-circle, 2-heart, 3-triangle, 4-square, 5-star.
Scissors
Cardstock (for printing)
Teacher device and printer
Optional:
Student Materials
Sentence strips from Green Eggs and Ham, preferably printed on cardstock, cut into individual sentence strips (1 per student)
There are 35 sentences in total. If you have fewer than 35 students, do one of the following:
Give some students more than one sentence
Use fewer sentences
Fill-in-the-blank sentence strips, preferably printed on cardstock, cut into individual sentence strips (1 per student)
Pages 1 & 2 of the fill-in-the-blank document contain 10 fill-in-the-blank sentences appropriate for all grade levels, K-2. Print as many copies as needed to provide each student with one sentence strip to customize.
Pages 3 & 4 of the fill-in-the-blank document are most appropriate for grade 2 or above. Students must fill in the missing word with a plural pronoun or plural noun for proper verb agreement.
If you are using this activity with upper elementary students or beyond, page 5 of the fill-in-the-blank document may be appropriate. It is important that students follow the structure of the sentences from Green Eggs and Ham, where:
Print out of Algorithms To Train Our Language Models (1 per table group for 2nd grade & above)
Paper, blank sentence strips, or personal whiteboards for recording generated sentences
Pencils (1 per student)
Once this lesson has been completed, the trained language model can make a good station for students to visit independently to generate one or more sentences and illustrate them.
Background Information
Large language models (LLMs) are trained on huge amounts of text using a neural network architecture called a transformer (the T in GPT: Generative Pre-trained Transformer) to make sense of the data. LLMs are very good at finding patterns in how words and phrases relate to each other and at predicting which words should come next. Unlike autocorrect, which looks at only one word or short phrase at a time, an LLM chooses each new word in the context of everything it has generated so far, so it can seem as if the AI is having a conversation with you. Because of this, people often describe the output of an LLM as thoughtful and creative. BUT … LLMs do not really “know” anything and they are not creative; they are just very good, and very fast, at figuring out which word is likely to follow another.
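The “predict the next word” idea can be shown with a tiny next-word model that only counts which word follows which (a bigram model). This is a deliberate simplification for teacher background, not how the lesson or a real LLM works: real LLMs use transformers over long contexts, not single-word lookups.

```python
import random
from collections import defaultdict

def build_model(sentences):
    """Record, for each word, every word that ever followed it."""
    following = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            following[current].append(nxt)
    return following

def next_word(model, word):
    """Pick a likely next word: each recorded follower is equally
    likely, so frequent followers are chosen more often."""
    return random.choice(model[word])

model = build_model([
    "I do not like them",
    "I do not like green eggs and ham",
    "I would not eat them",
])
```

With this tiny training set, “do” is always followed by “not,” while “I” can be followed by either “do” or “would,” with “do” twice as likely because it appeared twice.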
Additional Resources:
1A-DA-07 Identify and describe patterns in data visualizations, such as charts or graphs, to make predictions.
1B-DA-07 Use data to highlight or propose cause-and-effect relationships, predict outcomes, or communicate an idea.
Students will be looking for patterns in the data added to the model to predict words that are likely to occur.
1A-AP-08 Model daily processes by creating and following algorithms (sets of step-by-step instructions) to complete tasks.
Students will be following an algorithm to build and use the model.
K-2.4-A-i Demonstrate knowledge of the structure of language through tasks such as (a) generating plausible and implausible novel words, or (b) reordering the words in a scrambled sentence so that it makes sense.