Section 7.10

Analyze Compression Techniques

Learning Goals

DAT-1.D: Students will compare and contrast lossy vs lossless compression.
DAT-1.D: Students will discuss the tradeoffs associated with lossy vs lossless compression.
Students will identify reasons for file compression.
Students will practice compressing text using run-length encoding, keyword encoding and Huffman encoding techniques.
DAT-1.D: Compare data compression algorithms to determine which is best in a particular context.
DAT-1.D.1: Data compression can reduce the size (number of bits) of transmitted or stored data.
DAT-1.D.2: Fewer bits does not necessarily mean less information.
DAT-1.D.3: The amount of size reduction from compression depends on both the amount of redundancy in the original data representation and the compression algorithm applied.
DAT-1.D.4: Lossless data compression algorithms can usually reduce the number of bits stored or transmitted while guaranteeing complete reconstruction of the original data.
DAT-1.D.5: Lossy data compression algorithms can significantly reduce the number of bits stored or transmitted but only allow reconstruction of an approximation of the original data.
DAT-1.D.6: Lossy data compression algorithms can usually reduce the number of bits stored or transmitted more than lossless compression algorithms.
DAT-1.D.7: In situations where quality or ability to reconstruct the original is maximally important, lossless compression algorithms are typically chosen.
DAT-1.D.8: In situations where minimizing data size or transmission time is maximally important, lossy compression algorithms are typically chosen.

Objectives and General Description

Now that students understand how data is stored and used to represent numbers, characters, images and audio, they need to examine how much data these representations require and the problems that large file sizes can cause. Students will learn the concept of file compression, explore the tradeoffs between lossy and lossless compression, and dig further into text compression algorithms. The section opens with a class activity in which students determine the best solution for a scenario involving a photography business. This leads into an exploration of lossy versus lossless algorithms and file types; students complete a guided notes handout during this activity. In the second activity, students learn to compress text with keyword encoding, run-length encoding and Huffman encoding.

Activities

Activity 7.10.1 (budget 30 minutes)

Pose the following scenario:

Deja and Wyatt are starting a photography business and are exploring several business models for storing and transmitting their images. They have the following options. Which would you choose for their business?

Purchase high quality, expensive computers with 2 TB of storage. Images are stored locally and transmitted as .png files.
Purchase slower, less expensive computers. Images are stored locally and transmitted as .jpg files.
Purchase cloud storage for .png images. Customers log in and download images upon payment.
Purchase cloud storage for .jpg images. Email images to the customers upon payment.

Give the students a few minutes to research the options. Then have students go to the “four corners” of the room based on their selection. Each group develops at least three justifications for its choice. Divide the whiteboard into four segments, one for each group, and have the teacher or a student record each group's responses in its segment.
Circle any commonalities among the justifications.
Discussion questions:
1. What is the difference between .jpg and .png?
2. How does file type affect storage requirements?
3. How does file type affect the quality of transmission?
4. Which is most important: product quality, storage cost or transmission speed?
During or after the discussion, emphasize that there isn’t a “right” or “wrong” option. There are tradeoffs involving storage capacity, privacy of information, speed of data transmission, cost of equipment, etc. Decisions like these are made every day, and we need to understand their tradeoffs and implications.
Compressing files reduces storage needs and speeds up transmission. There are two types of compression techniques: lossy and lossless. An example of a lossless image format is .png: the file is compressed, but no data is lost, so the original image can be reconstructed exactly. An example of a lossy image format is .jpg: it is usually compressed much more, so it requires less storage, but it can NOT be reconstructed back to its original state. Fine detail that the eye is less sensitive to has been discarded and is only approximated when the image is displayed. At typical quality settings, many people can’t tell the difference between a .jpg and a .png of the same image; at high compression levels, however, the loss of quality in a .jpg (blurring, blocky artifacts) becomes noticeable.
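To make the lossless/lossy distinction concrete for the discussion, here is a small Python sketch. The lossless half uses the standard-library zlib module, which implements DEFLATE (the same family of algorithm PNG uses internally). The lossy half is a deliberately simplified, hypothetical quantization step (quantize and dequantize are illustrative names, not a real library); actual JPEG compression is far more involved, but it likewise throws away information that cannot be recovered.

```python
import zlib

def lossless_round_trip(data: bytes) -> bool:
    """Compress with zlib (DEFLATE), decompress, and check for an exact match."""
    return zlib.decompress(zlib.compress(data)) == data

def quantize(samples: list[int], step: int) -> list[int]:
    """Hypothetical toy lossy encoder: keep only which bucket of width `step`
    each value falls in. For 8-bit values (0-255) and step=10 the codes
    range 0-25, so they fit in 5 bits instead of 8."""
    return [s // step for s in samples]

def dequantize(codes: list[int], step: int) -> list[int]:
    """Toy lossy decoder: only an approximation of each value comes back."""
    return [c * step for c in codes]

# Redundant text compresses well, and the original comes back exactly.
text = b"the cat sat on the mat " * 50
compressed = zlib.compress(text)
print(len(text), "bytes ->", len(compressed), "bytes")
print("exact reconstruction:", lossless_round_trip(text))  # True

# Toy lossy example: nearby values collapse together and cannot be separated again.
pixels = [12, 13, 200, 201, 202, 90]
restored = dequantize(quantize(pixels, 10), 10)
print(restored)  # [10, 10, 200, 200, 200, 90]
print("exact reconstruction:", restored == pixels)  # False
```

The repeated phrase also illustrates DAT-1.D.3: how much zlib can shrink the input depends on how much redundancy the input contains.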
Ask: What are examples of lossy and lossless file types for audio? Should a text file be compressed with a lossy or a lossless algorithm, and why?
Have students complete the Lossy vs Lossless Guided Notes during/after this discussion while the information is fresh in their minds.
Standards to emphasize during the discussion:
- DAT-1.D.1: Data compression can reduce the size (number of bits) of transmitted or stored data.
- DAT-1.D.2: Fewer bits does not necessarily mean less information.
- DAT-1.D.3: The amount of size reduction from compression depends on both the amount of redundancy in the original data representation and the compression algorithm applied.
- DAT-1.D.4: Lossless data compression algorithms can usually reduce the number of bits stored or transmitted while guaranteeing complete reconstruction of the original data.
- DAT-1.D.5: Lossy data compression algorithms can significantly reduce the number of bits stored or transmitted but only allow reconstruction of an approximation of the original data.
- DAT-1.D.6: Lossy data compression algorithms can usually reduce the number of bits stored or transmitted more than lossless compression algorithms.
- DAT-1.D.7: In situations where quality or ability to reconstruct the original is maximally important, lossless compression algorithms are typically chosen.
- DAT-1.D.8: In situations where minimizing data size or transmission time is maximally important, lossy compression algorithms are typically chosen.
Optional extension: Have student groups develop additional scenarios and challenge the rest of the class to select the appropriate solution.

Activity 7.10.2 (budget 55 minutes)

Text compression techniques

At the end of the previous activity, students were asked whether a text file should be compressed with a lossy or a lossless algorithm. Text files are typically compressed with a lossless algorithm so that no data is lost. This activity is a deep dive into three text compression algorithms: keyword encoding, run-length encoding and Huffman encoding.
Introduce the concept of the compression ratio first: ratio = compressed size / original size. I count the characters in each example phrase before and after applying a compression technique, and we do the math together. The closer the ratio is to 0, the more the phrase has been compressed; a ratio of 1 means no reduction at all.
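The ratio calculation can be sketched in a few lines of Python. Like the class exercise, this version counts characters; real files are measured in bits or bytes, but the formula is the same. The function name compression_ratio and the example strings are illustrative assumptions (the compressed string is a run-length encoding that uses `*` as the flag character).

```python
def compression_ratio(original: str, compressed: str) -> float:
    """Compression ratio = compressed size / original size, counted in characters."""
    if not original:
        raise ValueError("original text must not be empty")
    return len(compressed) / len(original)

# Example pair: a run-length encoded string using '*' as the flag character.
original = "AAAAAAABBBCCCCC"  # 15 characters
compressed = "*A7BBB*C5"      # 9 characters
print(compression_ratio(original, compressed))  # 0.6
```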

1. Keyword encoding: Replace commonly used words or phrases with a single character that does not otherwise appear in the text, so the substitution can be reversed.
2. Run-length encoding: Replace a run of a repeated character with a flag character, the repeated character and the number of repetitions.
3. Huffman encoding: A type of encoding that assigns shorter binary codes to frequently used characters and longer binary codes to less frequently used characters. Because no code is the beginning (prefix) of another code, the encoded bits can be decoded without separators. Video example by eStudy on YouTube.
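Keyword encoding (item 1) can be sketched in Python as a lookup table applied word by word. The table below is hypothetical; in class, students would build their own from the most common words in the passage. This sketch assumes words are separated by single spaces, so a word with punctuation attached (for example "the,") is left unencoded.

```python
# Hypothetical keyword table: each common word maps to a symbol not used in the text.
KEYWORDS = {"the": "~", "and": "+", "that": "^"}

def keyword_encode(text: str, table: dict[str, str] = KEYWORDS) -> str:
    # The substitute symbols must not already appear in the text,
    # otherwise decoding would be ambiguous.
    for symbol in table.values():
        if symbol in text:
            raise ValueError(f"symbol {symbol!r} already appears in the text")
    return " ".join(table.get(word, word) for word in text.split(" "))

def keyword_decode(encoded: str, table: dict[str, str] = KEYWORDS) -> str:
    # Invert the table and swap each symbol back for its word.
    reverse = {symbol: word for word, symbol in table.items()}
    return " ".join(reverse.get(token, token) for token in encoded.split(" "))

msg = "the cat and the dog"
enc = keyword_encode(msg)
print(enc)                         # ~ cat + ~ dog
print(len(msg), len(enc))          # 19 13
print(keyword_decode(enc) == msg)  # True
```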

Students work through several examples as we try out the different types of compression.
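For run-length encoding (item 2), a minimal Python sketch follows. Several choices here are assumptions rather than the only correct form: `*` is the flag character and may not appear in the input; only runs of 4 or more are encoded, since an encoded run costs 3 characters and shorter runs would not save space; and counts are capped at 9 so the count is always a single digit, which keeps decoding unambiguous even when the text contains digits. Textbook or worksheet versions may use different rules.

```python
FLAG = "*"  # assumed flag character; must not appear in the input text

def rle_encode(text: str, min_run: int = 4) -> str:
    if FLAG in text:
        raise ValueError("input contains the flag character")
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        # Count the run, capped at 9 so the count is always one digit.
        while i + run < len(text) and text[i + run] == ch and run < 9:
            run += 1
        if run >= min_run:
            out.append(f"{FLAG}{ch}{run}")  # flag, repeated character, count
        else:
            out.append(ch * run)            # short runs are copied as-is
        i += run
    return "".join(out)

def rle_decode(encoded: str) -> str:
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == FLAG:
            # A flag is always followed by the character and a one-digit count.
            ch, count = encoded[i + 1], int(encoded[i + 2])
            out.append(ch * count)
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)

msg = "AAAAAAABBBCCCCC"
enc = rle_encode(msg)
print(enc)                     # *A7BBB*C5
print(rle_decode(enc) == msg)  # True
```

A run longer than 9 is simply split into several encoded pieces, so "A" repeated 12 times becomes `*A9AAA`.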
Student worksheet for text compression.
Here is the Huffman Encoding Template Activity.
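Huffman encoding (item 3) can be sketched in Python with the standard-library heapq and collections.Counter: repeatedly merge the two least frequent nodes into a tree, then read each character's code off its path (0 for left, 1 for right). This is one common way to build the tree, not necessarily the exact procedure in the template activity. When frequencies tie, different merge orders produce different codes than a hand-built tree, but the total number of encoded bits is the same.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text: str) -> dict[str, str]:
    if not text:
        return {}
    freq = Counter(text)
    tiebreak = count()  # unique numbers so the heap never compares tree nodes
    # Heap entries: (frequency, tiebreak, node). A leaf is a character;
    # an internal node is a (left, right) tuple.
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # only one distinct character: give it a 1-bit code
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # the two least frequent nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, str):
            codes[node] = prefix
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")

    walk(heap[0][2], "")
    return codes

def huffman_encode(text: str, codes: dict[str, str]) -> str:
    return "".join(codes[ch] for ch in text)

def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free, so the first match is correct
            out.append(reverse[current])
            current = ""
    return "".join(out)

msg = "ABRACADABRA"
codes = huffman_codes(msg)
bits = huffman_encode(msg, codes)
print(codes)
# Comparison assumes 8 bits per character when stored uncompressed.
print(len(bits), "bits vs", 8 * len(msg), "bits")  # 23 bits vs 88 bits
print(huffman_decode(bits, codes) == msg)           # True
```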
Students are not required to know specific compression techniques for the exam.

Resources
