Alphabets

From Wikipedia:


ASCII (/ˈæskiː/ ASS-kee),[1]:6 abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Most modern character-encoding schemes are based on ASCII, although they support many additional characters.

ASCII is the traditional name for the encoding system; the Internet Assigned Numbers Authority (IANA) prefers the updated name US-ASCII, which clarifies that this system was developed in the US and based on the typographical symbols predominantly in use there.[2]
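For illustration, here is a minimal Python sketch (Python and its built-in ord/encode are assumptions of this example, not part of the quoted standard) showing that ASCII is simply an agreed-upon numbering of characters:

    # Each ASCII character corresponds to one number in the range 0-127.
    for ch in "ASCII!":
        print(ch, ord(ch))          # 'A' -> 65, 'S' -> 83, ..., '!' -> 33

    # Encoding a string as ASCII therefore produces one byte per character.
    data = "Hello".encode("ascii")
    print(list(data))               # [72, 101, 108, 108, 111]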


Extended Binary Coded Decimal Interchange Code[1] (EBCDIC;[1] /ˈɛbsɪdɪk/) is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding six-bit binary-coded decimal code used with most of IBM's computer peripherals of the late 1950s and early 1960s.[2] It is supported by various non-IBM platforms, such as Fujitsu-Siemens' BS2000/OSD, OS-IV, MSP, and MSP-EX, the SDS Sigma series, Unisys VS/9, Burroughs MCP and ICL VME.
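As a hedged sketch of the difference between the two encodings (it assumes Python, whose standard library happens to ship the EBCDIC code page cp037), the same letter is assigned different byte values by ASCII and EBCDIC:

    # 'A' is 0x41 (65) in ASCII but 0xC1 (193) in EBCDIC code page 037.
    print("A".encode("ascii"))   # b'A'     -> byte value 0x41
    print("A".encode("cp037"))   # b'\xc1'  -> byte value 0xC1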


Unicode: Since processors can handle only binary data, how can they deal with text? They assign a number to each character in the alphabet, which raises an important but tricky question: How many letters are there in the alphabet? The question is tricky because you first need to ask: Which alphabet? Until the 1990s (or later), most computer systems were built on an English-language assumption that the alphabet has only 26 letters (plus uppercase letters, numbers, and punctuation). The key to numbering the alphabet is that all hardware and software vendors have to agree to use the same numbering system, and it is hard to get everyone to agree on a system that supports only English characters. Consequently, most computers today support the Unicode system. Unicode is an international standard that defines character sets for every modern language and even most dead languages.
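As a small sketch of this numbering idea (Python is assumed; its ord function returns the Unicode code point, i.e. the number agreed upon for a character):

    # Unicode assigns one number (a "code point") to every character,
    # whatever the language or script.
    for ch in ["A", "é", "Ж", "中", "😀"]:
        print(ch, ord(ch), hex(ord(ch)))
    # A    65      0x41
    # é    233     0xe9
    # Ж    1046    0x416
    # 中   20013   0x4e2d
    # 😀   128512  0x1f600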


A key feature of Unicode is that it can handle Asian languages such as Chinese and Japanese, which are written with ideograms rather than a small alphabet. The challenge with ideograms is that each of these languages contains tens of thousands of them. Alphabetic languages like English were traditionally handled by storing one character number in one byte, but one byte can hold only 256 (2^8) different numbers, which is not enough for ideograms. Consequently, Unicode was originally designed around two bytes per character, enough for 65,536 (2^16) code points, and it has since been extended so that encodings such as UTF-8 and UTF-16 can use up to four bytes per character when even more characters or ideograms are needed.
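A short sketch of how many bytes this actually takes in practice (Python is assumed; the byte counts depend on the encoding chosen, such as UTF-8 or UTF-16, rather than on Unicode itself):

    # Basic Latin letters fit in one byte, but ideograms and emoji need more:
    # UTF-8 uses 1-4 bytes per character, UTF-16 uses 2 or 4.
    for ch in ["A", "中", "😀"]:
        print(ch,
              len(ch.encode("utf-8")),      # 1, 3, 4 bytes
              len(ch.encode("utf-16-le")))  # 2, 2, 4 bytes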