3_3_5 Character encoding

You should be able to:

  • Understand what a character set is and be able to describe the following character encoding methods:
      • 7-bit ASCII
      • Unicode.

  • Understand that character codes are commonly grouped and run in sequence within encoding tables.
  • Describe the purpose of Unicode and the advantages of Unicode over ASCII.
  • Know that Unicode uses the same codes as ASCII up to 127.

REVISE:

What is ASCII?

ASCII is a 7-bit character set.

Each character in ASCII is represented by a 7-bit code.

The maximum number of characters that can be represented in ASCII is 128.

This is because 7 bits give 2^7 = 128 different bit patterns. The highest number that can be made with 7 bits is 127 in decimal, or 1111111 in binary.

The codes start at 0 (0000000 in binary), not at 1, so counting from 0 up to 127 gives 128 codes in total.
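
The counting above can be checked with a few lines of Python. This is only a quick sketch; the variable names (BITS, number_of_codes, highest_code) are made up for this example.

```python
# A quick check of how many codes 7 bits can give.
# The variable names here are made up for this example.
BITS = 7

number_of_codes = 2 ** BITS          # every possible 7-bit pattern
highest_code = number_of_codes - 1   # codes start at 0, so the last one is 127

print(number_of_codes)               # 128
print(highest_code)                  # 127
print(format(highest_code, "07b"))   # 1111111
print(format(0, "07b"))              # 0000000
```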

You can read a little more about this here: 7_2_11

What is Unicode?

Unicode is also a character set, but it allows more bits per character than ASCII, meaning that it can represent many more characters. UTF-16, for example, uses 2 bytes for most common characters. There are three common ways of encoding Unicode:

  • UTF-8
  • UTF-16
  • UTF-32

UTF stands for Unicode Transformation Format.

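It is easy to assume that the number in each name is a fixed bit depth that sets the maximum number of characters, but this isn't how Unicode works. The number is the size, in bits, of the smallest unit the encoding uses. UTF-8 and UTF-16 can use more than one unit to store a single character.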
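
One way to see that the numbers in the names are not a fixed bit depth is to count how many bytes Python uses for the same character in each format. This is a rough sketch: the helper name bytes_needed and the sample characters are chosen just for this example, and the "-be" versions of the codecs are used so that no extra byte order mark is counted.

```python
# Count the bytes needed for one character in each Unicode format.
# bytes_needed is a made-up helper name for this example.
def bytes_needed(ch):
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),  # "-be" avoids counting a byte order mark
        len(ch.encode("utf-32-be")),
    )

# A, e acute, the euro sign and an emoji, written as escapes
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8, utf16, utf32 = bytes_needed(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {utf8}  UTF-16: {utf16}  UTF-32: {utf32}")
```

UTF-8 and UTF-16 give different byte counts for different characters, while UTF-32 always uses 4 bytes.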

Unicode works in CODE POINTS. Each code point is a number, and this number is usually associated with a character.

However, code points can be joined together. For example, E acute (see image): the letter E and the accent above it are two separate code points that are joined together to form a single displayed character.
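
The E acute example can be tried in Python. This is a small sketch using the standard unicodedata module; the variable names are made up. It assumes the accent is the combining acute accent (U+0301), and it compares the joined pair with the single ready-made code point for É (U+00C9) that Unicode also provides.

```python
import unicodedata

# The letter E followed by the combining acute accent: two code points
joined = "E" + "\u0301"

# Unicode also has a single ready-made code point for the same character
single = "\u00c9"

print(len(joined))        # 2 code points
print(len(single))        # 1 code point
print(joined == single)   # False: the stored code points are different

# Normalising to NFC form merges the pair into the single code point
print(unicodedata.normalize("NFC", joined) == single)   # True
```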

You can find out about the different code points at the Unicode website: http://www.unicode.org/

The first 128 characters of Unicode match the 128 in ASCII.

In ASCII, the character code for A is 65 (decimal).

In Unicode, the code point for A is 0041 (hex), usually written as U+0041.

0041 in hex is 65 in decimal. They are the same.
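
You can confirm the match for the letter A in Python. This is just a quick check using the built-in ord, chr, format and int functions; the variable names are made up for this example.

```python
# Check that A has the same code in decimal (65) and hex (0041).
code_point = ord("A")                 # the code for A as a decimal number
as_hex = format(code_point, "04X")    # the same number as 4 hex digits

print(code_point)         # 65
print(as_hex)             # 0041
print(int("0041", 16))    # 65
print(chr(0x41))          # A

# The stored byte is the same in ASCII and in UTF-8
print("A".encode("ascii") == "A".encode("utf-8"))   # True
```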

You can see this on Page 2 of this document. http://www.unicode.org/charts/PDF/U0000.pdf

What is an encoding table and how does it work?

To help developers understand what each character code represents, an encoding table is created.

For ASCII it looks like this:

Source: Wikimedia Commons

Each character is given a code, which is commonly written in hex or decimal.
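
The codes in the table are grouped and run in sequence, so the capital letters, the lower case letters and the digits each sit in their own unbroken run. The short sketch below shows this; next_character is a made-up helper name for this example.

```python
# Character codes run in sequence within each group.
# next_character is a made-up helper name for this example.
def next_character(ch):
    return chr(ord(ch) + 1)

# Print the first five capital letters with their decimal codes
for code in range(ord("A"), ord("A") + 5):
    print(code, chr(code))

print(next_character("A"))     # B
print(next_character("a"))     # b
print(ord("7") - ord("0"))     # 7: the digits are in order too
print(ord("a") - ord("A"))     # 32: gap between the two letter groups
```

Because the codes run in order, knowing the code for A is enough to work out the code for any other capital letter.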

Find the decimal code for the + symbol.

Try this in Python to see if you got it right:

code = 0  # <<< replace 0 with the decimal number for the + sign
print(chr(code))  # chr() turns a character code back into its character

Unicode has many different encoding tables (code charts), grouped by script, such as Latin, Greek or Cyrillic. You can see the Greek one here: http://www.unicode.org/charts/PDF/U0370.pdf

All of the encoding tables for Unicode can be found here: http://www.unicode.org/charts/

Unicode Vs ASCII

Unicode...

  • allows for more characters to be used
  • can be used with more languages
  • aids global communication and collaboration
  • usually takes up more space to store each character

ASCII...

  • allows only 128 characters
  • mainly supports English (basic Latin characters)
  • takes up less space for each character compared to Unicode
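
The trade-off in the two lists above can be seen in Python. This is a rough sketch: fits_in_ascii is a made-up helper name, and the Greek capital letter gamma is used only as an example of a character outside ASCII.

```python
# Compare what ASCII and Unicode can store, and how much space they use.
# fits_in_ascii is a made-up helper name for this example.
def fits_in_ascii(text):
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character is not one of the 128 ASCII characters
        return False

print(fits_in_ascii("Hello"))     # True
print(fits_in_ascii("\u0393"))    # False: Greek capital gamma is not in ASCII

# Space used to store the same five letters
print(len("Hello".encode("ascii")))       # 5 bytes
print(len("Hello".encode("utf-16-be")))   # 10 bytes
print(len("Hello".encode("utf-32-be")))   # 20 bytes
```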

TEST:

  1. Download and print the test paper here: https://drive.google.com/open?id=0B5fLtQ0Xgr2PVV82N2ItcnpmMDA
  2. Try the mock test yourself.
  3. Use the 3.3.5 Walking Talking Mock below to guide you through answering the questions.

SOURCE RECOGNITION - PLEASE NOTE: The examination examples used in these walking talking mocks are samples from AQA from their non-confidential section of the public site. They also contain questions designed by TeachIT for AQA as part of the publicly available lesson materials.