Name Shortener

There are occasions where you need to develop a name shortener that trims a long name into a shorter one. For starters, one will usually put the name into a list (an array or equivalent) and then chop it off at the required length.

This works only for ASCII characters, such as A-Z and a-z, where each character is exactly one byte. However, names across the world come in many varieties. Take a look at our king in Ace Combat 7:

Mihaly Dumitru Margareta Corneliu Leopold Blanca Karol Aeon Ignatius Raphael Maria Niketas A. Shilage

Or a shrimp sandwich in Swedish:

Räksmörgåsmacka

Characters such as ä, ö, and å are not in the ASCII table. They need Unicode, usually encoded as UTF-8, where such characters take more than one byte each. Here's an example in Go code:

package main

import (
  "fmt"
)

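// trimLength keeps at most n runes (Unicode code points) of s.
func trimLength(s string, n int) string {
  rs := []rune(s)
  if n > len(rs) {
    return s // already short enough; slicing past the end would panic
  }
  return string(rs[:n])
}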

func main() {
  s := "Räksmörgåsmacka" // Swedish for shrimp sandwich
  fmt.Println(s[:11]) // byte slicing cuts "å" in half: invalid byte at the end
  fmt.Println(trimLength(s, 11))
}
// Output:
// Räksmörg�
// Räksmörgåsm

Hence, we need to tread carefully and not underestimate the problem!
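
To make the failure visible without relying on how a terminal renders a broken byte, here is a minimal sketch that checks whether each cut is still well-formed UTF-8. The helper names cutBytes, cutRunes, and isValid are made up for this illustration, and n is assumed to be non-negative.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// cutBytes slices by byte index, so it may split a multi-byte character.
func cutBytes(s string, n int) string {
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

// cutRunes slices by rune (code point) index, so characters stay whole.
func cutRunes(s string, n int) string {
	rs := []rune(s)
	if n > len(rs) {
		n = len(rs)
	}
	return string(rs[:n])
}

// isValid reports whether s is well-formed UTF-8.
func isValid(s string) bool {
	return utf8.ValidString(s)
}

func main() {
	s := "Räksmörgåsmacka"
	fmt.Println(isValid(cutBytes(s, 11))) // false: the cut lands inside "å"
	fmt.Println(isValid(cutRunes(s, 11))) // true
}
```

A check like this could be used in a unit test to catch byte-based truncation before it reaches users.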

Recommended Approach

One recommended approach is to go back to Unicode and build the list from there. In some languages, such as C/C++, you are required to import a Unicode or locale library first. The approach is:

1. Treat the name as Unicode text rather than raw bytes.
2. Chop the name into a list of characters (e.g. an array, or a slice of runes in Go).
3. Slice the list to the required length.
3.1. Length is measured in "runes", since some characters take multiple bytes each.
4. Encode the list back to the name data type.
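
Step 3.1 is the part that is easy to miss, so here is a small sketch comparing the two ways of measuring length in Go. The helper names byteLen and runeLen are only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLen returns the number of bytes in s.
func byteLen(s string) int {
	return len(s)
}

// runeLen returns the number of runes (Unicode code points) in s.
func runeLen(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"Mihaly", "Räksmörgåsmacka"} {
		fmt.Printf("%s: %d bytes, %d runes\n", s, byteLen(s), runeLen(s))
	}
}
// Output:
// Mihaly: 6 bytes, 6 runes
// Räksmörgåsmacka: 18 bytes, 15 runes
```

The two counts agree only for the pure ASCII name, which is why a limit of "20 characters" has to be counted in runes.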

Example Implementation

Here is an example implementation for the above algorithm:

// Contributed by: Chew, Kean Ho (Holloway), Johan Dahl
//
// This program trims names to at most 20 characters (runes).
package main

import (
    "fmt"
    "unicode/utf8"
)

const (
    maxLength = 20
)

type Name struct {
    first string
    last string
}

func (n *Name) Set(firstName string, lastName string) {
    var rs []rune

    n.first = firstName
    if utf8.RuneCountInString(firstName) > maxLength {
        rs = []rune(firstName)
        n.first = string(rs[:maxLength])
    }

    n.last = lastName
    if utf8.RuneCountInString(lastName) > maxLength {
        rs = []rune(lastName)
        n.last = string(rs[:maxLength])
    }
}

func (n *Name) GetWithFirst(withFirstName bool) string {
    if withFirstName {
        return n.last + ", " + n.first
    }
    return n.last
}

func main() {
    l := &Name{}
    l.Set("Dumitru Margareta Ἄγγελος Leopold Blanca Karol Aeon Ignatius Raphael Maria Niketas A. Shilage", "Mihaly")
    s := l.GetWithFirst(true)
    fmt.Printf("My name is: %s\n", s)
}
// Output:
// My name is: Mihaly, Dumitru Margareta Ἄγ

Optimize For Performance

Thinkeridea Qi Yin later approached the team with benchmark findings on the current example. It turns out that the current example takes quite a performance hit in comparison:

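package main

import (
  "testing"
  "unicode/utf8"
  "unsafe"
)

// SubStrA converts the whole string to []rune first, as in the example above.
// (Assumed: the start of the original listing is missing.)
func SubStrA(s string, length int) string {
  rs := []rune(s)
  if len(rs) <= length {
    return s
  }
  return string(rs[:length])
}

// SubStrB decodes only the first `length` runes and slices the original string.
func SubStrB(s string, length int) string {
  var size, n int
  for i := 0; i < length && n < len(s); i++ {
    _, size = utf8.DecodeRuneInString(s[n:])
    n += size
  }

  return s[:n]
}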

func SubStrC(s string, length int) string {
  var size, n int
  for i := 0; i < length && n < len(s); i++ {
    _, size = utf8.DecodeRuneInString(s[n:])
    n += size
  }

  b := make([]byte, n)
  copy(b, s[:n])
  return *(*string)(unsafe.Pointer(&b))
}

var s = "Go语言是Google开发的一种静态强类型、编译型、并发型,并具有垃圾回收功能的编程语言。为了方便搜索和识别,有时会将其称为Golang。"

func BenchmarkSubStrA(b *testing.B) {
  for i := 0; i < b.N; i++ {
    SubStrA(s, 20)
  }
}

func BenchmarkSubStrB(b *testing.B) {
  for i := 0; i < b.N; i++ {
    SubStrB(s, 20)
  }
}

func BenchmarkSubStrC(b *testing.B) {
  for i := 0; i < b.N; i++ {
    SubStrC(s, 20)
  }
}
// Benchmark:
// goos: darwin
// goarch: amd64
// BenchmarkSubStrA-8 745708 1624 ns/op 336 B/op 2 allocs/op
// BenchmarkSubStrB-8 9568920 122 ns/op 0 B/op 0 allocs/op
// BenchmarkSubStrC-8 7274718 157 ns/op 48 B/op 1 allocs/op
// PASS ok command-line-arguments 4.782s

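It appears that the truncation is best done by decoding only as many runes as needed in a for loop and slicing the original string, rather than converting the whole string into a list of runes first. Hence, by merging Qi Yin's findings into the existing solution, we get the final masterpiece: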

// Contributed by: Chew, Kean Ho (Holloway), Johan Dahl, Qi Yin
//
// This program trims names to at most 20 characters (runes).
package main

import (
        "fmt"
        "unicode/utf8"
)

const (
        maxLength = 20
)

type Name struct {
        first string
        last  string
}

func (n *Name) trim(s string, length int) string {
        var size, x int

        for i := 0; i < length && x < len(s); i++ {
                _, size = utf8.DecodeRuneInString(s[x:])
                x += size
        }

        return s[:x]
}

func (n *Name) Set(firstName string, lastName string) {
        n.first = n.trim(firstName, maxLength)
        n.last = n.trim(lastName, maxLength)
}

func (n *Name) GetWithFirst(withFirstName bool) string {
        if withFirstName {
                return n.last + ", " + n.first
        }

        return n.last
}

func main() {
        l := &Name{}
        l.Set("Dumitru Margareta Ἄγγελος Leopold Blanca Karol Aeon Ignatius Raphael Maria Niketas A. Shilage", "Mihaly")
        s := l.GetWithFirst(true)
        fmt.Printf("My name is: %s\n", s)
}

// Output:
// My name is: Mihaly, Dumitru Margareta Ἄγ

Thus, we're now performing up to speed.
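
As a side note, the same cut point can also be found with Go's range loop over a string, which yields the byte offset where each rune starts. This is a variation for illustration only, not part of the contributed solution and not benchmarked here. The name trimByRange is made up for this sketch.

```go
package main

import "fmt"

// trimByRange keeps at most length runes of s.
// Ranging over a string yields the byte offset at which each rune starts,
// so the offset of the rune just past the limit is the cut point.
func trimByRange(s string, length int) string {
	if length <= 0 {
		return ""
	}

	count := 0
	for i := range s {
		if count == length {
			return s[:i] // slices the original string, no copy
		}
		count++
	}

	return s // s has length runes or fewer
}

func main() {
	fmt.Println(trimByRange("Räksmörgåsmacka", 11))
	fmt.Println(trimByRange("Mihaly", 20))
}
// Output:
// Räksmörgåsm
// Mihaly
```

Like the trim method above, it returns a slice of the original string, so no new memory should be needed for the result.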

That's all about trimming a name to a shorter length. Remember, the world is big. One should always be open to everyone!