You will need to convert 16-bit numbers between denary and binary, and binary numbers to and from hex.
Methods
Binary to Denary: just add up the powers of 2.
Denary to Binary: Repeated subtraction of highest power of 2 OR repeated division by 2 to get remainders
Binary to Hex: groups of four starting from the least significant bit
You can easily move between hexadecimal and binary, then use the previous techniques to convert to and from denary. E.g.,
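The methods above can be sketched in Python as follows. This is only an illustration: the function names and the example value 1000 are my own choices, and in practice you would use the built-ins bin(), hex() and int(text, 2).

```python
def denary_to_binary(n, width=16):
    """Repeated division by 2, collecting the remainders."""
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))
    # Remainders come out least significant bit first, so reverse them,
    # then pad with leading zeros to the requested width.
    return "".join(reversed(bits)).rjust(width, "0")


def binary_to_denary(bits):
    """Add up the powers of 2 wherever there is a 1."""
    return sum(2 ** power
               for power, bit in enumerate(reversed(bits))
               if bit == "1")


def binary_to_hex(bits):
    """Group into fours from the least significant bit; one hex digit per group."""
    padded = bits.rjust(-(-len(bits) // 4) * 4, "0")  # round length up to a multiple of 4
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join("0123456789ABCDEF"[binary_to_denary(g)] for g in groups)


example_bits = denary_to_binary(1000)          # "0000001111101000"
example_hex = binary_to_hex(example_bits)      # "03E8"
example_back = binary_to_denary(example_bits)  # 1000
```

Going from hex back to binary is the same grouping in reverse: write each hex digit as its own group of four bits.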
Resources
Worksheets handed out in class
BBC Bitesize: Introduction to Binary
Binary Game (Cisco / code.org)
Crash Course CS Ep 4 - Representing Numbers and Letters with Binary
ASCII, Unicode & UTF-8
In modern computers, text is represented using Unicode. In particular, the UTF-8 encoding is popular and, for English text, is essentially the same as the old ASCII encoding that used to be standard. For a quick explanation, watch the Unicode Miracle video.
ASCII stands for American Standard Code for Information Interchange and it was standardised in the 1960s. The original ASCII uses 7 bits to give 128 characters that include the English alphabet, numerals and punctuation. This was extended to an 8 bit (one byte) encoding with "modern" microcomputers in the 70s and 80s.
Extension: Older Encodings
Older codes include EBCDIC and Morse Code.
You can generate your own table in Python with code like:
for i in range(32, 128): print(i, "=", hex(i)[2:].upper(), " -> ", chr(i))
You should know that the digits 0-9, capitals A-Z and lowercase a-z are in sequences. Punctuation is relatively randomly scattered in the gaps.
00NNNNN are control characters (from the old teletypes). My favourite is 07 (hex) = 0000111 (binary)...
01NNNNN are the first set of printable characters
0110000 = 0, 0110001 = 1, ... 0111001 = 9.
10NNNNN are the capital letters
1000001 = A, 1000010 = B, ... 1011010 = Z.
11NNNNN are the lowercase letters
1100001 = a, 1100010 = b, ... 1111010 = z.
Just flip the second bit from the left (add/subtract 32) to switch between uppercase and lowercase!
You see character codes represented in hex when a non-ASCII or a protected symbol is used in a URL. E.g., example.com/products%20and%20services.html. See URL Percent Encoding for more details.
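As a quick check of the %20 in that URL, here is a small sketch using Python's standard urllib.parse module. The variable names are just for illustration.

```python
from urllib.parse import quote, unquote

path = "products and services.html"

# quote() replaces characters that are not allowed in a URL path
# with % followed by the hex value of each byte.
encoded = quote(path)

# unquote() reverses the process.
decoded = unquote(encoded)

# A space is character code 32, which is 20 in hex.
space_hex = format(ord(" "), "02X")

print(encoded)    # products%20and%20services.html
print(space_hex)  # 20
```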
Character arithmetic: Using the fact that the alphabet is stored in sequence, you can add or subtract numbers to the character codes to move around the alphabet. This is used in simple ciphers (TODO: Add link) and could also be an exam question. E.g.,
Q: Given that the ASCII character code for 'A' is 65, what is the 7-bit binary representation of the character 'H'?
Ans: A = 65 = 1000001. H is the 8th letter, so it is 7 places after A: H = 65 + 7 = 72 = 1001000
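You can check this kind of answer in Python with ord() and chr(). The helper names below are made up for this sketch.

```python
def shift_letter(letter, places):
    """Move along the alphabet by adding to the character code."""
    return chr(ord(letter) + places)


def seven_bit(character):
    """Character code written as a 7-bit binary string."""
    return format(ord(character), "07b")


def flip_case(letter):
    """Toggle the bit worth 32 to swap between upper and lower case."""
    return chr(ord(letter) ^ 32)


print(shift_letter("A", 7))  # H
print(seven_bit("A"))        # 1000001
print(seven_bit("H"))        # 1001000
print(flip_case("h"))        # H
```

Note that flip_case only makes sense for the letters A-Z and a-z; applied to digits or punctuation it gives unrelated characters.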
Unicode replaced the extended 8-bit ASCII as the international standard. It is managed by the Unicode Consortium. Unicode is basically just a list of symbols and glyphs that are each assigned a number (code point) in the Unicode table. The first 128 code points are identical to 7-bit ASCII. After that it includes symbols and accented characters from all modern and extinct languages, as well as symbols for drawing, linguistics, maths, science and emoji.
Note that not every character/glyph corresponds to a single Unicode code point. E.g., combining characters, and emoji sequences joined by an invisible zero-width joiner: Bear + Snowflake = Polar bear
The fixed-width UTF-32 encoding uses 32 bits for every code point, even though the largest code point (U+10FFFF) only needs 21 bits. This is inefficient, so the standard is to use a variable-length byte representation called UTF-8.
UTF-8 for different numbers of Bytes:
1 Byte 0xxxxxxx - just 7 bit ASCII
2 Bytes 110xxxxx 10xxxxxx
3 Bytes 1110xxxx 10xxxxxx 10xxxxxx
4 Bytes 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
The header information is removed and the remaining bits (marked x above) are joined to give a binary number corresponding to the Unicode symbol that you want.
>>> word = "Café"
>>> word.encode(encoding='utf-8')
b'Caf\xc3\xa9'
C3 A9 = 1100 0011 1010 1001
Remove the UTF-8 byte header information (110 from the first byte, 10 from the second) to get the remaining bits: 00011 and 101001, which join to give 00011101001. We can convert this number to a character to double check:
>>> int("00011101001", 2)
233
>>> chr(233)
'é'
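The same header-stripping can be done with bit operations. This is a minimal sketch for the 2-byte case only; the function name is my own, and real code should just use bytes.decode("utf-8").

```python
def decode_two_byte_utf8(first, second):
    """Return the code point stored in a 2-byte UTF-8 sequence."""
    # Check the headers: 110xxxxx for the first byte, 10xxxxxx for the second.
    if first >> 5 != 0b110 or second >> 6 != 0b10:
        raise ValueError("not a 2-byte UTF-8 sequence")
    # Keep the low 5 bits of the first byte and the low 6 bits of the second,
    # then join them into a single 11-bit number.
    return ((first & 0b00011111) << 6) | (second & 0b00111111)


code_point = decode_two_byte_utf8(0xC3, 0xA9)
print(code_point)       # 233
print(chr(code_point))  # é
```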
Vector graphics are defined from combinations of mathematical shapes and curves. This allows them to be scaled up and down without loss of quality. They are good for simple graphics & logos, but inefficient for photo-style graphics. Modern fonts also use vector graphics. They can be compressed using a lossless LZ-like compression.
Common file formats are: Scalable Vector Graphics (.svg), a modern web standard that is an application of XML (like HTML); Adobe Illustrator (.ai); PostScript (.ps), an Adobe format originally used for communicating with printers/plotters and the ancestor of Portable Document Format (.pdf).
Bitmapped images are stored as a two-dimensional grid of pixels. "True color" images use 24 bits per pixel (aka colour depth), 8 bits per RGB colour channel. This means that there is a simple calculation for the file size of an uncompressed image - it's basically just the volume of a box...
size in bits = (# of pixels) * (bits per pixel) = height * width * color depth
For example a 400*200 true color image:
size in kB = (400 * 200 px) * 24 bits/px * 1B/8bits * 1kB/1024B = 400*200*3/1024 kB = 234.4 kB
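The calculation is easy to script. A minimal sketch, with function names chosen for this example and 1 kB taken as 1024 bytes as above:

```python
def bitmap_size_bits(width, height, colour_depth=24):
    """Uncompressed size in bits = width * height * bits per pixel."""
    return width * height * colour_depth


def bits_to_kb(bits):
    """Convert bits to kB using 8 bits per byte and 1024 bytes per kB."""
    return bits / 8 / 1024


true_colour_kb = bits_to_kb(bitmap_size_bits(400, 200))      # 24 bits per pixel
eight_bit_kb = bits_to_kb(bitmap_size_bits(400, 200, 8))     # 8 bits per pixel

print(true_colour_kb)  # 234.375
print(eight_bit_kb)    # 78.125
```

Real image files are slightly larger than this because they also store a header (dimensions, colour depth and so on).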
.bmp and .raw files are uncompressed bitmap images.
Lossless compression: Bitmap graphics can be compressed using an LZ-like compression, but it does not compress things like photos very well. Portable Network Graphics (.png) files are usually a lossless format.
Lossy compression:
Reduce resolution - a simple way to shrink the file. It reduces image quality, but gives a good trade-off between quality and compression ratio.
Reduce color space - GIFs (Graphics Interchange Format) work by using an 8 bit color map instead of 24 bit RGB for each pixel. This is lossy for most images. The GIF is then compressed using LZW to get further lossless compression.
JPEG (Joint Photographic Experts Group) type compression - works by discarding information that the eye does not see very well. This is based on a colour space transformation and reduction, followed by a Discrete Cosine Transform and quantization of the resultant frequency space, and finally a lossless Huffman-type encoding. JPEGs get a compression ratio of about 15:1, depending on image content and compression settings.
"True color" in computers is where each pixel has 24 bits of color depth divided amongst the 3 colour channels Red, Green, and Blue - the three additive primary colours. That is, there is 1 Byte of information for each of RGB for each pixel. This is often written as 6 hex digits: #RRGGBB. For example. #FF0000 is bright red, #FFFF00 is bright yellow, #FF00FF is bright pink/magenta, #A8A8A8 is a light grey.
To approximately figure out what colour is represented by a RGB mix, place them on the rainbow / colour wheel. (Note that the concept of Blue in the 7 ROYGBIV colours due to Newton has shifted over the years)
Magenta R O Y G B I V Magenta
R G B
R+G = Orange to Yellow to Lime Green depending on the ratio
G+B = Cyan / Aqua
B+R = Hot Pink / Magenta
Note that the additive primary colours RGB used for mixing light are the opposite of the subtractive primary colours used for mixing paint and inks. Printers use CMYK, which stands for Cyan, Magenta, Yellow, Key/blacK.
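To practise reading #RRGGBB codes, it can help to split them into their three channels. The sketch below uses helper names invented for this example.

```python
def hex_to_rgb(colour):
    """Split '#RRGGBB' into three numbers from 0 to 255."""
    digits = colour.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(red, green, blue):
    """Join three 0-255 channel values into '#RRGGBB'."""
    return "#{:02X}{:02X}{:02X}".format(red, green, blue)


print(hex_to_rgb("#FFFF00"))     # (255, 255, 0)   red + green = yellow
print(hex_to_rgb("#A8A8A8"))     # (168, 168, 168) equal channels = grey
print(rgb_to_hex(255, 0, 255))   # #FF00FF         blue + red = magenta
```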
Resources
BBC Bitesize: Encoding Images
There is no Pink Light (Minute Physics), There is no White Light (The Science Asylum)
Hex Color Game - Really Good practice!
Colours & Maths Understanding the formulas of colour conversion
Create a simple vector graphic (SVG) using the following link https://editor.method.ac/
Look at the source code for the SVG (under the view menu)
Save as a SVG file to your computer and then use an online conversion tool to convert to BMP and JPG/PNG to compare size and quality of the graphics - zoom in to see the pixelisation and the compression artifacts.
Extension: Use Photoshop to open a photo and then File-"Save for Web" to explore how reducing the resolution and colorspace affects the image.
Reading below this point may have unintended educational effects...
MIDI (Musical Instrument Digital Interface) was originally designed as a way to connect digital instruments, but it can also be used to store the sequences played/produced by a digital instrument. MIDI is analogous to vector graphics and can be compressed using lossless compression.
Sound is oscillations of pressure in the air. These waves can be captured by microphones and turned into digital data using an Analogue to Digital Converter.
Size of raw audio files (.wav, .aiff files)
The sample rate is how many samples are taken per second. It is often measured in Hertz (Hz = 1/sec). CD quality sound uses 44.1kHz.
The sample depth (or resolution or bit depth) is how many bits are used for each sample (4 in the above image, 16 or 24 bits in CDs & DVDs)
Bit Rate = (Sample Rate) * (Sample Depth) is the number of bits required per second.
File size in bits = bit rate * length of recording
For example, 2 minutes of music sampled at 16000Hz with sample depth 8 bits.
Size = (120) sec * (16000 samples/sec) * (8 bits/sample) = 120*16000*8 bits = 120*16000 B = 1875 kB
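The same calculation as a Python sketch. The function names are my own, and the channels parameter is an assumption that is not in the formula above: it defaults to 1 (mono), and stereo recordings would use 2.

```python
def audio_size_bits(seconds, sample_rate, sample_depth, channels=1):
    """File size in bits = bit rate * length of recording."""
    # channels is an added assumption: 1 = mono, 2 = stereo.
    bit_rate = sample_rate * sample_depth * channels
    return bit_rate * seconds


def bits_to_kb(bits):
    """Convert bits to kB using 8 bits per byte and 1024 bytes per kB."""
    return bits / 8 / 1024


size_kb = bits_to_kb(audio_size_bits(120, 16000, 8))
print(size_kb)  # 1875.0
```

Real .wav files are slightly larger than this because of the file header.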
Lossless Compression
FLAC (free lossless audio codec) and other lossless compression formats use things like Run-Length-Encoding, Linear Prediction, LZ Compression etc. These achieve about a 2:1 compression ratio on music.
Lossy Compression
Codecs such as mp3 (MPEG Audio Layer III), aac (advanced audio codec), wma (windows media audio), and ogg use psychoacoustics & perceptual noise shaping to reduce the quality of sounds in ways that human listeners will not perceive. If two sounds play at the same time, often the softer one can be mostly removed. These codecs get about a 10:1 compression ratio.
Get a MIDI file from BitMidi (e.g., Super Mario Bros)
Look at its structure using Mid2Txt
Convert it to a WAV file & check its file size matches expectations (DO THE CALCULATION)
Convert it to an MP3 file and calculate the compression ratio compared with the WAV file.