In modern computers, text is represented using Unicode. In particular, the UTF-8 encoding is popular and, for English text, is essentially identical to the old ASCII encoding that you are required to know about for this course. For a quick explanation, watch the Unicode Miracle video.
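To make the link between UTF-8 and ASCII concrete, here is a small sketch using Python's built-in str.encode. The sample strings are only illustrations and are not part of the course material.

```python
# Compare the ASCII and UTF-8 encodings of the same English text.
english = "Hello, World!"

ascii_bytes = english.encode("ascii")
utf8_bytes = english.encode("utf-8")

# For plain English text the two byte sequences are identical.
print(ascii_bytes == utf8_bytes)    # True
print(list(utf8_bytes[:5]))         # [72, 101, 108, 108, 111]

# A character outside ASCII needs more than one byte in UTF-8.
accented = "\u00e9"                 # e with an acute accent
print(len(accented.encode("utf-8")))  # 2
```

Each English character still takes exactly one byte, which is why old ASCII files are also valid UTF-8 files.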
ASCII stands for American Standard Code for Information Interchange and it was standardised in the 1960s. The original ASCII uses 7 bits to give 128 codes: the English alphabet, numerals and punctuation, plus a set of non-printing control codes. This was extended to 8-bit encodings (256 codes) with "modern" microcomputers in the 70s and 80s.
Origins of ASCII, Unicode & UTF8
Older Encodings were used, e.g.
You can generate your own table in Python with code like:
# Print the decimal code, the hex code and the character for codes 32 to 127
for i in range(32, 128):
    print(i, "=", hex(i)[2:].upper(), " -> ", chr(i))
You should know that the digits 0-9 (codes 48-57), capitals A-Z (codes 65-90) and lowercase a-z (codes 97-122) each form a consecutive sequence of codes. Punctuation is scattered through the gaps between these sequences.
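If you want to check the sequences for yourself, a short sketch like the one below will do it. The helper name is_consecutive is made up for this example; ord and the string module are standard Python.

```python
import string

def is_consecutive(chars):
    """True if the character codes of `chars` go up by exactly 1 each time."""
    codes = [ord(c) for c in chars]
    return codes == list(range(codes[0], codes[0] + len(codes)))

groups = [("digits", string.digits),
          ("capitals", string.ascii_uppercase),
          ("lowercase", string.ascii_lowercase)]

for name, chars in groups:
    # Show the first and last code of each group and whether it is one unbroken run
    print(name, ord(chars[0]), "to", ord(chars[-1]),
          "consecutive:", is_consecutive(chars))
```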
You see ASCII character codes written in hex when a non-ASCII or reserved symbol is used in a URL. E.g., in example.com/products%20and%20services.html each %20 stands for a space, which has the hex code 20 (decimal 32). See URL Percent Encoding for more details.
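Python's standard library can do this encoding for you. The sketch below uses quote and unquote from urllib.parse on the file name from the example URL; it is only meant to show where the %20 comes from.

```python
from urllib.parse import quote, unquote

path = "products and services.html"

encoded = quote(path)        # replace unsafe characters with % followed by their hex code
print(encoded)               # products%20and%20services.html

print(hex(ord(" ")))         # 0x20, the code for a space, hence %20

print(unquote(encoded))      # products and services.html
```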
Character arithmetic: Because the alphabet is stored in sequence, you can add a number to a character code, or subtract one from it, to move around the alphabet. This is quite common in exam questions. E.g.,
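A few typical calculations, sketched in Python with ord (character to code) and chr (code to character). The function shift_letter is a made-up helper for this illustration and assumes an uppercase letter as input.

```python
def shift_letter(letter, offset):
    """Move an uppercase letter `offset` places along the alphabet, wrapping after Z."""
    position = ord(letter) - ord("A")               # A -> 0, B -> 1, ..., Z -> 25
    return chr(ord("A") + (position + offset) % 26)

print(chr(ord("A") + 3))       # D  (three places after A)
print(shift_letter("Y", 3))    # B  (wraps round past Z)
print(chr(ord("g") - 32))      # G  (lowercase and capital codes are 32 apart)
print(ord("7") - ord("0"))     # 7  (digit character to its numeric value)
```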
Text (including source code, vector graphics, etc.) must be compressed using lossless compression; otherwise it would be garbled when decompressed. The type of compression that you need to know for the IGCSE is:
You need to be able to describe how such compression schemes work in exam-style questions.
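To see what "lossless" means in practice, here is a small sketch using Python's zlib module. zlib is only a stand-in chosen because it is built in; it is not necessarily the scheme you need to describe for the exam. The sample text is made up.

```python
import zlib

# Repetitive text compresses well
text = "AAAAAABBBBBBCCCCCC" * 20
original = text.encode("utf-8")

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print(restored == original)   # True: every byte comes back exactly
```

The final comparison is the point: with lossless compression, decompressing always gives back exactly the data you started with.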
Extension: Huffman coding is a way of generating a variable-length encoding for text (and other data) in which frequent characters get shorter codes, giving lossless compression. This is in the equivalent AQA course to ours! See the CS Field Guide link above for more detail.
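For the curious, the idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample message are made up, and the exact bit patterns depend on how ties between equal frequencies are broken.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each character of `text` to a string of 0s and 1s."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:                        # only one distinct character
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, tie-breaker, {character: code so far})
    heap = [(freq, i, {ch: ""}) for i, (ch, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)       # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(text, codes):
    """Replace each character with its code."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits until they match a code; no code is the start of another."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)

message = "this is an example of huffman coding"
codes = huffman_codes(message)
bits = encode(message, codes)

print(len(message) * 8, "bits at 8 bits per character")
print(len(bits), "bits with the Huffman code")
print(decode(bits, codes) == message)   # True: the compression is lossless
```

Common characters such as the space end up with short codes and rare ones with longer codes, which is where the saving comes from.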
BBC Bitesize: Character Sets & ASCII