Internationalization

This section covers what GRUB needs in order to display translated text, known as internationalization or i18n. Supporting it requires attention to the following points.

Charset

GRUB uses UTF-8 encoding internally by default.

Filesystem

  • NTFS, JFS, UDF, HFS+, exFAT, long filenames in FAT and the Joliet part of ISO9660 are treated as UTF-16, as per specification.
  • AFS and BFS are read as UTF-8, again according to specification.
  • BtrFS, cpio, tar, squash4, minix, minix2, minix3, ROMFS, ReiserFS, XFS, ext2, ext3, ext4, FAT (short names), the RockRidge part of ISO9660, nilfs2, UFS1, UFS2 and ZFS are assumed to be UTF-8.

This assumption may be false on systems configured with a legacy charset, but as long as the charset used is a superset of ASCII you should be able to access ASCII-named files. It is recommended to configure your system to use UTF-8 to access the filesystem; convmv may help with migration.
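
Before migrating, it can help to see which names would be affected. The sketch below lists directory entries whose names contain bytes outside printable ASCII. The function name list_non_ascii_names and the mount point in the comment are illustrative assumptions, not part of GRUB or convmv.

```shell
# List entries under a directory whose names contain non-ASCII bytes.
# LC_ALL=C makes find compare names byte by byte, so the bracket
# expression [! -~] means "any byte outside the printable ASCII range".
list_non_ascii_names() {
    LC_ALL=C find "$1" -name '*[! -~]*' -print
}

# Example (hypothetical mount point):
#   list_non_ascii_names /mnt/legacy
```

Names reported here are the candidates for conversion. convmv only prints what it would rename when given -f and -t, and performs the renames once --notest is added.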


  • Plain ISO9660 filenames are specified as being either ASCII or described by unspecified escape sequences.
  • GRUB assumes that ISO9660 names are UTF-8 (any ASCII string is valid UTF-8).
  • Some old CD-ROMs use CP437 in a non-compliant way.
  • You can still access files whose names contain only ASCII characters on such filesystems.
  • You can also access any file if the filesystem contains valid Joliet (UTF-16) or RockRidge (UTF-8) data.
  • AFFS, SFS and HFS never use Unicode; GRUB assumes them to be in Latin1, Latin1 and MacRoman respectively.
  • GRUB handles filesystem case-insensitivity, but makes no attempt at case conversion of international characters, so e.g. a file named with a lowercase Greek alpha is treated as different from one named with an uppercase alpha.
  • The filesystems in question are NTFS (except POSIX namespace), HFS+ (configurable at mkfs time, default insensitive), SFS (configurable at mkfs time, default insensitive), JFS (configurable at mkfs time, default sensitive), HFS, AFFS, FAT, exFAT and ZFS (configurable on a per-subvolume basis by the property “casesensitivity”, default sensitive).
  • On ZFS subvolumes marked as case-insensitive, files containing lowercase international characters are inaccessible.
  • Like all supported filesystems except HFS+ and ZFS (configurable on a per-subvolume basis by the property “normalization”, default none), GRUB makes no attempt to check canonical equivalence, so a file named u-diaeresis is treated as distinct from one named u + combining diaeresis.
  • This however means that in order to access a file on HFS+, its name must be specified in normalisation form D. On normalized ZFS subvolumes, filenames that are out of normalization are inaccessible.
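
The two limits above (case folding restricted to ASCII, and no canonical-equivalence check) can be illustrated at the byte level. This is a sketch of the observable behaviour, not GRUB's own code: ascii_casefold is a hypothetical helper, and the octal escapes are the UTF-8 encodings of the characters named in the comments.

```shell
# Fold only ASCII letters; every other byte passes through untouched.
ascii_casefold() {
    printf '%s' "$1" | LC_ALL=C tr 'A-Z' 'a-z'
}

upper_alpha=$(printf '\316\221')   # U+0391 GREEK CAPITAL LETTER ALPHA
lower_alpha=$(printf '\316\261')   # U+03B1 GREEK SMALL LETTER ALPHA
nfc_u=$(printf '\303\274')         # U+00FC, precomposed u-diaeresis
nfd_u=$(printf 'u\314\210')        # U+0075 U+0308, u + combining diaeresis

# ASCII names compare equal after folding.
[ "$(ascii_casefold 'GRUB.CFG')" = "$(ascii_casefold 'grub.cfg')" ] \
    && echo "GRUB.CFG and grub.cfg: same name"

# The Greek letters differ even after folding.
[ "$(ascii_casefold "$upper_alpha")" = "$lower_alpha" ] \
    || echo "upper and lower alpha: different names"

# Canonically equivalent spellings are different byte strings.
[ "$nfc_u" = "$nfd_u" ] \
    || echo "precomposed and decomposed u-diaeresis: different names"
```

The last comparison is why a name typed in precomposed form fails to match an HFS+ entry stored in normalisation form D.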

Output Terminal

  • The firmware output console “console” on ARC and IEEE1275 is limited to ASCII.
  • The BIOS firmware console and VGA text are limited to ASCII and some pseudographics.
  • None of the above is appropriate for displaying international text. Any unsupported character is replaced with a question mark, except pseudographics, which GRUB attempts to approximate with ASCII.
  • The EFI console, on the other hand, nominally supports UTF-16, but actual language coverage depends on the firmware and may be very limited.
  • The encoding used on serial can be chosen with terminfo as either ASCII, UTF-8 or “visual UTF-8”. The last one is against the specification but results in correct rendering of right-to-left text on some readers which don’t have their own bidi implementation.
  • On emu, GRUB checks whether the charset is UTF-8 and uses it if so; otherwise it uses ASCII.

gfxterm fonts

  • When using gfxterm or gfxmenu, GRUB itself is responsible for rendering the text. In this case GRUB is limited by the loaded fonts. If the fonts contain all required characters, then bidirectional text, cursive variants and combining marks other than enclosing, half (e.g. left half tilde or combining overline) and double ones are supported.
  • Ligatures aren’t supported though. This should cover European, Middle Eastern (if you don’t mind the lack of the lam-alif ligature in Arabic) and East Asian scripts.
  • Notable unsupported scripts are the Brahmic family and its derivatives, as well as Mongolian, Tifinagh, Korean Jamo (precomposed characters have no problem) and tonal writing (U+02E5 to U+02E9).
  • GRUB also ignores deprecated (as specified in Unicode) characters (e.g. tags).
  • GRUB also doesn’t handle so-called “annotation characters”.
  • If you can complete either of the two lists or, better, propose a patch to improve rendering, please contact the developer team.

Input Terminal

  • The firmware console on BIOS, IEEE1275 and ARC doesn’t allow you to enter non-ASCII characters.
  • The EFI specification allows for it, but the author is unaware of any actual implementations.
  • Serial input is currently limited to Latin1 (unlikely to change).
  • GRUB’s own keyboard implementations (at_keyboard and usb_keyboard) support any key but work on a one-char-per-keystroke basis.

So there are no dead keys and no advanced input methods. There is also no hotkey for changing the keymap. In practice this makes it difficult to enter any text using a non-Latin alphabet. Moreover, all current input consumers are limited to ASCII.

gettext

GRUB supports being translated. For this you need to have the language *.mo files in $prefix/locale, load the gettext module and set the “lang” variable.
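
As a minimal sketch of the last two steps, the helper below prints the lines one might add to grub.cfg for a given language. The function name grub_gettext_fragment is hypothetical, the language code de is only an example, and the sketch assumes a matching .mo file is already present in $prefix/locale.

```shell
# Print a grub.cfg fragment that loads gettext and selects a language.
# Assumption: the translation file for that language already exists
# under $prefix/locale on the boot device.
grub_gettext_fragment() {
    lang_code=$1
    cat <<EOF
insmod gettext
set lang=${lang_code}
EOF
}

grub_gettext_fragment de
```

The output can be appended to a custom configuration file. How that file is included depends on the distribution's GRUB setup.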

Regexp

Regexps work on Unicode characters; however, no attempt at checking canonical equivalence has been made. Moreover, classes like [:alpha:] match only the ASCII subset.
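
The ASCII-only behaviour of character classes can be reproduced outside GRUB with grep in the C locale, where [[:alpha:]] also covers only A-Z and a-z. This is an analogy rather than GRUB's regexp module, and is_ascii_alpha is a hypothetical helper.

```shell
# Succeeds only if the argument consists entirely of ASCII letters.
# In the C locale, [[:alpha:]] does not match bytes above 0x7F.
is_ascii_alpha() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[[:alpha:]]+$'
}

# "gr\303\274b" is "grub" spelled with u-diaeresis (U+00FC) in UTF-8.
for word in grub "gr$(printf '\303\274')b"; do
    if is_ascii_alpha "$word"; then
        echo "$word: matches [[:alpha:]]+"
    else
        echo "$word: does not match [[:alpha:]]+"
    fi
done
```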

MISC

  • GRUB always uses the YEAR-MONTH-DAY HOUR:MINUTE:SECOND [WEEKDAY] 24-hour datetime format, but weekdays are translated.
  • GRUB always uses the decimal number format with [0-9] as digits, . as the decimal separator and no group separator.
  • IEEE1275 aliases are matched case-insensitively, except for non-ASCII characters, which are matched as binary. OSBundleRequired is matched in a similar way. Since IEEE1275 aliases and OSBundleRequired don’t contain any non-ASCII characters, this should never be a problem in practice.
  • Case-sensitive identifiers are matched as raw strings; no canonical equivalence check is performed. Case-insensitive identifiers are also matched as raw strings, but additionally [a-z] is equivalent to [A-Z].
  • GRUB-defined identifiers use only ASCII and so should user-defined ones. Identifiers containing non-ASCII characters may work but aren’t supported.
  • Only the ASCII space characters (space U+0020, tab U+0009, CR U+000d and LF U+000a) are recognised. Other Unicode space characters aren’t valid field separators.
  • The test command’s <, >, <=, >=, -pgt and -plt tests compare strings in the lexicographical order of Unicode code points, replicating the behavior of test from coreutils. Environment variables and commands are listed in the same order.
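
Code-point ordering can be previewed from a shell: for valid UTF-8, byte order and code-point order coincide, so a byte-wise sort gives the same sequence. The helper name sort_by_codepoint and the sample entries are illustrative assumptions.

```shell
# Sort lines in Unicode code-point order.
# LC_ALL=C disables locale collation, so sort compares raw bytes, which
# for UTF-8 input is equivalent to comparing code points.
sort_by_codepoint() {
    LC_ALL=C sort
}

# "\303\244rch" is "arch" spelled with a-diaeresis (U+00E4) in UTF-8.
printf '%s\n' linux Windows "$(printf '\303\244rch')" BSD | sort_by_codepoint
```

In this ordering every uppercase ASCII letter sorts before every lowercase one, and any non-ASCII character sorts after all ASCII characters, which differs from the dictionary-style order many locales use.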

Examples

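From the poly-light repository (https://github.com/shvchk/poly-light/blob/master/theme.txt). The theme keeps each translation as a commented-out line in theme.txt, and install.sh rewrites the file to activate the text for the selected language: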

# Global properties
title-text: ""
desktop-image: "background.png"
desktop-color: "#000000"
terminal-font: "Unifont Regular 18"
terminal-box: "terminal_box_*.png"
terminal-left: "0"
terminal-top: "0"
terminal-width: "100%"
terminal-height: "100%"
terminal-border: "0"

# Boot menu
+ boot_menu {
        left = 15%
        top = 20%
        width = 70%
        height = 60%
        item_font = "Unifont Regular 18"
        item_color = "#777777"
        selected_item_color = "#444444"
        item_height = 40
        item_spacing = 4
        item_pixmap_style = "item_*.png"
        selected_item_pixmap_style = "selected_item_*.png"
}

# Countdown message
+ label {
        left = 0
        top = 100%-48
        width = 100%
        align = "center"
        id = "__timeout__"
        # DE
        # text = "Start in %d Sekunden."
        # EN
        text = "Booting in %d seconds"
        # ES
        # text = "Arranque en %d segundos"
        # FR
        # text = "Démarrage automatique dans %d secondes"
        # IT
        # text = "Avvio in %d secondi"
        # NO
        # text = "Starter om %d sekunder"
        # PT
        # text = "Arranque automático dentro de %d segundos"
        # RU
        # text = "Загрузка выбранного пункта через %d сек."
        # UA
        # text = "Автоматичне завантаження розпочнеться через %d сек."
        # zh_CN
        # text = "在 %d 内启动"
        color = "#777777"
        font = "Unifont Regular 18"
}

# Navigation keys hint 
+ label {
        left = 0
        top = 100%-24
        width = 100%
        align = "center"
        # DE
        # text = "System mit ↑ und ↓ auswählen und mit Enter bestätigen."
        # EN
        text = "Use ↑ and ↓ keys to change selection, Enter to confirm"
        # ES
        # text = "Use las teclas ↑ y ↓ para cambiar la selección, Enter para confirmar"
        # FR
        # text = "Choisissez le système avec les flèches du clavier (↑ et ↓), puis validez avec la touche Enter (↲)"
        # IT
        # text = "Usa i tasti ↑ e ↓ per cambiare la selezione, premi Invio ↲ per confermare"
        # NO
        # text = "Bruk ↑ og ↓ for å endre menyvalg, velg med Enter"
        # PT
        # text = "Use as teclas ↑ e ↓ para mudar a seleção, e ENTER para confirmar"
        # RU
        # text = "Используйте клавиши ↑ и ↓ для изменения выбора, Enter для подтверждения"
        # UA
        # text = "Використовуйте ↑ та ↓ для вибору, Enter для підтвердження"
        # zh_CN
        # text = "使用 ↑ 和 ↓ 键移动选择条,Enter 键确认"
        color = "#777777"
        font = "Unifont Regular 18"
}


install.sh script:

...

declare -A LANGS=(
        [Chinese]=zh_CN
        [English]=EN
        [French]=FR
        [German]=DE
        [Italian]=IT
        [Norwegian]=NO
        [Portuguese]=PT
        [Russian]=RU
        [Spanish]=ES
        [Ukrainian]=UA
)
LANG_NAMES=($(echo ${!LANGS[*]} | tr ' ' '\n' | sort -n))

...

if [[ "$LANG" != "English" ]]; then
        echo "Changing language to ${LANG}"
        sed -i -r \
                -e '/^\s+# EN$/{n;s/^(\s*)/\1# /}' \
                -e '/^\s+# '"${LANGS[$LANG]}"'$/{n;s/^(\s*)#\s*/\1/}' \
                ${THEME}-master/theme.txt
fi

...
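
The sed call works on pairs of lines: a language marker comment followed by the text line it labels. The sketch below is a portable adaptation, not the repository's code: it uses -E and [[:space:]] in place of the GNU-specific -r and \s, reads from standard input instead of editing a file in place, and switch_label_lang is a hypothetical name.

```shell
# Comment out the line after "# EN" and uncomment the line after "# <code>".
# Reads theme text on stdin and writes the result to stdout.
switch_label_lang() {
    code=$1
    sed -E \
        -e '/^[[:space:]]+# EN$/{n;s/^([[:space:]]*)/\1# /;}' \
        -e '/^[[:space:]]+# '"$code"'$/{n;s/^([[:space:]]*)#[[:space:]]*/\1/;}'
}

# Demo on a fragment of the countdown label.
switch_label_lang DE <<'EOF'
        # DE
        # text = "Start in %d Sekunden."
        # EN
        text = "Booting in %d seconds"
EOF
```

In each expression the address matches the marker line, n prints it and loads the following line, and the substitution then adds or removes the leading "# " while preserving indentation.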