Run-length encoding (RLE) is a very simple form of data compression in which consecutive sequences of the same value are stored as a single value along with its count.
For example, instead of storing the character string "AAAAABBBAAAABBCCCCC", we store "A5B3A4B2C5".
Below, we implement the algorithm in Python to compress a string of non-numeric characters:
def comprime(texto):
comp=""
i=0
while(i<=len(texto)-1):
car=texto[i]
if (i==len(texto)-1):
comp=comp+car+"1"
break
carsig=texto[i+1]
comp+=car
j=1
while(car==carsig):
j+=1
i+=1
car=texto[i]
if (i==len(texto)-1):
break
else:
carsig=texto[i+1]
comp+=str(j)
i+=1
return comp
Similarly, the algorithm to decompress a string encoded using the previous method is implemented as follows:
def descomprime(texto):
letra=str(texto[0])
descomp=""
i=0
while(i<len(texto)-1):
letra=texto[i]
num=""
i+=1
while(texto[i].isnumeric()):
num+=texto[i]
if (i<len(texto)-1):
i+=1
else:
break
for j in range(0,int(num)):
descomp+=letra
return descomp
We can convert a black-and-white image into a matrix of 1s and 0s, transform it into a numeric string, and then replace 1s with "A" and 0s with "x" since the compression algorithm works on non-numeric strings.
from PIL import Image
import numpy as np
def imagen_a_cadena(imagen):
im = Image.open(imagen)
p = np.array(im)
filas=p.shape[0]
cols=p.shape[1]
texto=""
for i in range(0,filas):
for j in range(0,cols):
if (p[i,j]==0):
texto+="x"
else:
texto+="A"
return texto
Small example: a 2x2 matrix given by [[1,1,1],[1,0,0]] with only two values.
We convert it into a text string, obtaining 111100.
To make it a non-numeric string, we replace 1 with A and 0 with x, resulting in AAAAxx.
Applying RLE compression, we get A4x1.
Originally, it occupied 6 characters; after RLE compression, it occupies 4 characters.
Now, let's see an example with a 50x50 black-and-white image.
We convert it into a text string (containing 1 for black pixels and 0 for white pixels).
A white pixel is represented by 0, which we replace with x.
A black pixel is represented by 1, which we replace with A.
We then apply RLE compression.
50x50 pixeles image -> 50x50 matrix
We display the original size of the string derived from the image on the screen, followed by the size of the "compressed" string.
cadena=imagen_a_cadena("cara.png")
im_comprimida = comprime(cadena)
im_descomprimida = descomprime(im_comprimida)
print("tamaño de la imagen original",len(cadena))
print("tamaño de la imagen comprimida",len(im_comprimida))
print(cadena)
print("cadena comprimida",im_comprimida)
print(im_descomprimida)
Initialsize: 2500 characters.
After the RLE compression; 562 characters.