tsc_String

The SC API uses strings extensively, and a string class is included to simplify their use.  One of the primary uses of tsc_String is to enable strings to be returned to the plugin from Survey Core while avoiding difficult memory management requirements.  tsc_String is a lightweight, efficient, immutable class which uses reference counting to prevent the unnecessary copying of strings.
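
The reference-counting idea can be illustrated without the SC API. The following is a minimal sketch only, not the tsc_String implementation: SharedText is a hypothetical stand-in, and std::shared_ptr is assumed here purely to show how copies can share one immutable buffer.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an immutable, reference-counted string.
// This is NOT tsc_String; it only illustrates the sharing idea.
class SharedText
{
public:
    explicit SharedText (const char* s)
        : m_buffer (std::make_shared<const std::string> (s ? s : "")) {}

    // Copying shares the buffer; only the reference count changes.
    SharedText (const SharedText&) = default;
    SharedText& operator= (const SharedText&) = default;

    // A "modifying" operation builds a new buffer and leaves the old one alone.
    SharedText Append (const char* tail) const
    {
        assert (tail != nullptr);
        return SharedText ((*m_buffer + tail).c_str ());
    }

    const char* Characters () const { return m_buffer->c_str (); }
    long        UseCount   () const { return m_buffer.use_count (); }

private:
    std::shared_ptr<const std::string> m_buffer;
};
```

Copying a SharedText copies a pointer rather than the characters, which is the property the paragraph above describes for tsc_String.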

tsc_String is almost identical to the C# String class.

The API also has a class for dealing with a list of strings, tsc_StringList.

Note that any method that alters the string value will return a new value, and not modify the value of the referenced string - with the exception of the Swap method.

Use of const char* parameters

Many API functions take const char* parameters rather than tsc_String so that string constants are passed efficiently: only a pointer is passed, instead of instantiating a string, which involves allocating memory, copying the string, and constructing two objects.  Because tsc_String has an implicit cast to const char*, a tsc_String can also be passed directly to a const char* parameter.

Take care, however: variadic (varargs) parameters, such as those of sprintf, expect a char* but the compiler does not apply the implicit cast to them, so the wrong address is passed to the function. If unsure, call the .Characters() function to explicitly force the correct string address to be passed.
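
The difference between a typed parameter and a varargs parameter can be sketched as follows. DemoString is a hypothetical stand-in that only mimics the two accessors described above (the implicit cast and Characters()); it is not the real class.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in with an implicit cast to const char* and an
// explicit Characters() method. This is NOT tsc_String.
class DemoString
{
public:
    explicit DemoString (const char* s) : m_text (s ? s : "") {}
    operator const char* () const { return m_text.c_str (); }
    const char* Characters () const { return m_text.c_str (); }
private:
    std::string m_text;
};

// A typed parameter: the compiler applies the implicit cast here.
inline std::size_t TypedLength (const char* s)
{
    assert (s != nullptr);
    return std::char_traits<char>::length (s);
}

// A varargs call: nothing tells the compiler that %s wants a const char*,
// so the address is passed explicitly.
inline std::string Describe (const DemoString& name)
{
    char buffer[64];
    // WRONG (do not do this): std::snprintf (buffer, sizeof buffer, "name=%s", name);
    std::snprintf (buffer, sizeof buffer, "name=%s", name.Characters ());
    return buffer;
}
```

TypedLength (DemoString ("abc")) compiles and works because the parameter type drives the conversion; in Describe the explicit call is what makes the varargs case safe.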

UTF-8, Unicode, and MBCS character sets and encoding

tsc_String construction and outputs are all UTF-8 encoded.  Internally strings are stored as UTF-8 since this is the format used within Survey Core.  UTF-8 is directly compatible with ASCII for the first 128 values.  For all other characters, the Unicode value is encoded into 2 or more bytes.
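
As an illustration of the encoding rule, the sketch below encodes a single Unicode code point as UTF-8 using only the standard library. EncodeUtf8 is a hypothetical helper written for this page and is not part of the API; it does not reject surrogate values.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point (up to U+10FFFF) as UTF-8 bytes.
inline std::string EncodeUtf8 (char32_t cp)
{
    assert (cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {                        // ASCII: one byte, unchanged
        out += static_cast<char> (cp);
    } else if (cp < 0x800) {                // two bytes
        out += static_cast<char> (0xC0 | (cp >> 6));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {              // three bytes
        out += static_cast<char> (0xE0 | (cp >> 12));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    } else {                                // four bytes
        out += static_cast<char> (0xF0 | (cp >> 18));
        out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, 'A' stays as one byte, ß (U+00DF) becomes the two bytes C3 9F, and € (U+20AC) becomes the three bytes E2 82 AC.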

The Windows API previously used the UCS-2 encoding, but this has now changed to UTF-16. The difference is fairly minor, and up until 2021 Trimble Access and Scapi continued to use UCS-2. From the 2022 versions onwards, UTF-16 is fully supported. Also see the tsc_File class for information on reading files with various encodings.

The Android API has always used UTF-32, and before the 2022 versions Trimble Access simply treated these as UCS-2, which is not fully compliant. From 2022 onwards, the tsc_UniString class supports UTF-16. Conversions to and from UTF-32 are performed internally.

To obtain a UTF-16 (wchar_t) string, use the tsc_UniString class to do the conversion.  The tsc_String class also handles strings identically regardless of the hardware platform and the O/S.
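
The "fairly minor" difference between UCS-2 and UTF-16 is the surrogate pair. The sketch below shows it with a hypothetical helper, EncodeUtf16, built on the standard library only; it is not how tsc_UniString is implemented.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as UTF-16 code units.
inline std::u16string EncodeUtf16 (char32_t cp)
{
    assert (cp <= 0x10FFFF);
    std::u16string units;
    if (cp < 0x10000) {
        // One unit, identical to UCS-2.
        units += static_cast<char16_t> (cp);
    } else {
        // Two units (a surrogate pair); UCS-2 cannot represent these.
        const char32_t v = cp - 0x10000;
        units += static_cast<char16_t> (0xD800 + (v >> 10));
        units += static_cast<char16_t> (0xDC00 + (v & 0x3FF));
    }
    return units;
}
```

Code points up to U+FFFF produce the same single unit in both encodings; only code points above that range need the second unit.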

Some Windows API strings (such as file names) use a multi-byte character set (MBCS). MBCS characters also occur when non-ASCII characters (such as ³, €, or ß) are typed directly into Visual Studio C++ string constants. While MBCS characters are C++ type-compatible with tsc_String, their values are not UTF-8 and will therefore display incorrectly. To fix this, use the tsc_String::FromMbcs() method to convert them to a UTF-8 string.
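
To show why a conversion is needed, the sketch below converts bytes from one specific single-byte code page, ISO-8859-1, to UTF-8. This is an assumption made for illustration: FromMbcs() uses the current O/S codepage, whereas Latin1ToUtf8 is a hypothetical helper that handles only ISO-8859-1.

```cpp
#include <cassert>
#include <string>

// Converts ISO-8859-1 bytes to UTF-8. In that code page each byte value is
// also its Unicode code point, so values of 0x80 and above become two bytes.
inline std::string Latin1ToUtf8 (const char* latin1)
{
    assert (latin1 != nullptr);
    std::string out;
    for (const char* p = latin1; *p != '\0'; ++p) {
        const unsigned char b = static_cast<unsigned char> (*p);
        if (b < 0x80) {
            out += static_cast<char> (b);
        } else {
            out += static_cast<char> (0xC0 | (b >> 6));
            out += static_cast<char> (0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

The single byte 0xDF (ß in ISO-8859-1) is not a valid UTF-8 sequence on its own, which is why an unconverted string displays incorrectly.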

Case sensitivity

Many tsc_String functions have a case-insensitive option; the default is always case-sensitive but this can be changed by adding the optional Case_Insensitive parameter.  See the bottom of this page for the enums tsc_CaseOptions and tsc_ReplaceOptions.

Output formatting

The Format() function uses identical formatting strings to printf, but it has some important differences.  The output is returned in a tsc_String to avoid the need for an (insecure) fixed-length buffer.  Format() will truncate the output if it exceeds 64 KB.  Format() will also translate x_Code values, and it accepts any subclass of tsc_Object (including tsc_String) which has implemented the ToString() method.
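
The benefit of returning a string instead of filling a fixed-length buffer can be sketched with the standard library. FormatToString below is a hypothetical helper that uses the common two-pass vsnprintf technique; it is an assumption for illustration, and unlike Format() it takes C varargs rather than tsc_Variant parameters.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// printf-style formatting that returns a string sized to fit the output.
inline std::string FormatToString (const char* format, ...)
{
    assert (format != nullptr);
    va_list args;
    va_start (args, format);

    // Pass 1: measure the output without writing anything.
    va_list copy;
    va_copy (copy, args);
    const int needed = std::vsnprintf (nullptr, 0, format, copy);
    va_end (copy);

    // Pass 2: format into a buffer of exactly the required size.
    std::string result;
    if (needed > 0) {
        std::vector<char> buffer (static_cast<std::size_t> (needed) + 1);
        std::vsnprintf (buffer.data (), buffer.size (), format, args);
        result.assign (buffer.data (), static_cast<std::size_t> (needed));
    }
    va_end (args);
    return result;
}
```

The caller never chooses a buffer size, so there is no buffer to overrun.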

When formatting double values, the value double_Null will be displayed like a NaN (1.#QNAN or similar).  The text produced for a NaN is a little crazy, but it is well described on Stack Overflow.

Examples

A full range of examples of use can be found on the tsc_String examples page.

Most other tsc_String functions are self-explanatory and closely resemble the .NET String functions.  They are listed below with an occasional clarifying note:

Public methods

tsc_String ();
Constructs an empty string.

tsc_String (const tsc_String &s);
Constructs a reference to s.

tsc_String (const char* a, const char* b);
Constructs a concatenation of a and b.

tsc_String (const char* s);
Constructs from a char* string.

tsc_String (char c, int count=1);
Constructs a string containing count copies of c.

static tsc_String FromMbcs (const char* mbcs);
Returns a UTF-8 string converted from the supplied null-terminated MBCS. The current codepage of the O/S is used to interpret the MBCS characters.
For example:

tsc_MessageBox::Show("Hey", tsc_String::FromMbcs("Robert lässt grüßen"));

int Length () const;
Returns the number of bytes.  Because UTF-8 characters are used, this count is not necessarily the number of characters, due to additional bytes used to encode non-ASCII (Unicode) character codes.
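
The difference between bytes and characters can be sketched as follows. CountCodePoints is a hypothetical helper, not an API function; it assumes the input is valid, null-terminated UTF-8.

```cpp
#include <cassert>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
inline int CountCodePoints (const char* utf8)
{
    assert (utf8 != nullptr);
    int count = 0;
    for (const char* p = utf8; *p != '\0'; ++p) {
        if ((static_cast<unsigned char> (*p) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the UTF-8 text "grüßen" this returns 6, while the byte count is 8 because ü and ß each occupy two bytes.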

bool IsEmpty () const;
Returns true if the string length is zero.

int Compare (const char* s, tsc_CaseOptions options=Case_Sensitive) const;
Compares the two strings byte-for-byte and returns -1, 0, +1 for less, equal, greater, respectively.
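
A byte-for-byte comparison with an optional case-insensitive mode can be sketched like this. DemoCompare is a hypothetical helper, and the assumption that case folding applies to ASCII letters only belongs to this sketch, not to the documented behaviour of Compare.

```cpp
#include <cassert>
#include <cctype>

// Returns -1, 0 or +1 for less, equal, greater.
inline int DemoCompare (const char* a, const char* b, bool ignoreCase)
{
    assert (a != nullptr && b != nullptr);
    for (;; ++a, ++b) {
        unsigned char x = static_cast<unsigned char> (*a);
        unsigned char y = static_cast<unsigned char> (*b);
        if (ignoreCase) {
            x = static_cast<unsigned char> (std::tolower (x));
            y = static_cast<unsigned char> (std::tolower (y));
        }
        if (x != y)
            return x < y ? -1 : +1;
        if (x == '\0')          // both strings ended together
            return 0;
    }
}
```

Note that in a case-sensitive byte comparison "abc" sorts after "ABC", because lower-case letters have higher byte values.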

tsc_String ToLower () const;
Returns the string converted to lower case.

tsc_String ToUpper () const;
Returns the string converted to upper case.

bool Contains (char c, tsc_CaseOptions options=Case_Sensitive) const;
Returns true if the string contains the character c.

bool Contains (const char* c, tsc_CaseOptions options=Case_Sensitive) const;
Returns true if the string contains the string c.

int IndexOf (char c) const;
Returns the index of the first c in the string, or -1 if not found.

int IndexOf (char c, int start, tsc_CaseOptions options=Case_Sensitive) const;
Returns the zero-based index of the first c in the string starting at start, or -1 if not found.

int IndexOf (const char* c) const;
Returns the zero-based index of c in the string, or -1 if not found.

int IndexOf (const char* c, int start, tsc_CaseOptions options=Case_Sensitive) const;
Returns the zero-based index of the first c in the string starting at start, or -1 if not found.

int LastIndexOf (char c) const;
Returns the zero-based index of the last c in the string, or -1 if not found.

bool StartsWith (char c) const;
Returns true if the string starts with c.

bool EndsWith (char c) const;
Returns true if the string ends with c.

bool StartsWith (const char* c, tsc_CaseOptions options=Case_Sensitive) const;
Returns true if the string starts with c.

bool EndsWith (const char* c, tsc_CaseOptions options=Case_Sensitive) const;
Returns true if the string ends with c.

tsc_String Replace (const char* find, const char* replace, tsc_ReplaceOptions options=Replace_None) const;
Returns a string with the first instance of find replaced by replace. The options allow case-insensitivity, and the replacement of all matches rather than just the first.  If more than one option is required, OR the values together (also see below).

tsc_String Replace (const char* find, const char* replace, int options) const;
Returns a string with the first instance of find replaced by replace. The options allow case-insensitivity, and the replacement of all matches.  If more than one option is required, OR the values together (ORing the values returns an int, hence this overload of the Replace method).
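
How ORed option flags can drive a replace operation is sketched below using std::string. The names DemoReplaceOptions and DemoReplace are hypothetical; the flag values simply mirror the tsc_ReplaceOptions enum listed on this page, and the algorithm is an assumption, not the library's implementation.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical mirror of the option flags.
enum DemoReplaceOptions
{
    Demo_None            = 0x00,
    Demo_CaseInsensitive = 0x01,
    Demo_AllOccurrences  = 0x02
};

// Takes int because ORing two unscoped enum values yields an int.
inline std::string DemoReplace (const std::string& text, const std::string& find,
                                const std::string& with, int options)
{
    assert (!find.empty ());
    const bool anyCase = (options & Demo_CaseInsensitive) != 0;
    const bool all     = (options & Demo_AllOccurrences) != 0;

    auto fold = [anyCase] (char c) -> char {
        return anyCase ? static_cast<char> (std::tolower (static_cast<unsigned char> (c))) : c;
    };
    auto matchesAt = [&] (std::size_t pos) -> bool {
        for (std::size_t i = 0; i < find.size (); ++i)
            if (fold (text[pos + i]) != fold (find[i]))
                return false;
        return true;
    };

    std::string out;
    std::size_t pos = 0;
    bool done = false;                       // set after the first replacement
    while (pos < text.size ()) {
        if (!done && pos + find.size () <= text.size () && matchesAt (pos)) {
            out += with;
            pos += find.size ();
            done = !all;
        } else {
            out += text[pos++];
        }
    }
    return out;
}
```

Calling DemoReplace ("a-A-a", "a", "x", Demo_CaseInsensitive | Demo_AllOccurrences) combines both behaviours in one call, which is the purpose of the int overload described above.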

tsc_String Substring (int start) const;
Returns the rest of the string, beginning at start.

tsc_String Substring (int start, int length) const;
Returns a substring beginning at start, for length bytes.

tsc_String Insert (int startIndex, const char* value) const;
Returns a new string with value inserted at startIndex.

tsc_String Remove (int startIndex) const;
Removes all characters from startIndex to the end of the string.

tsc_String Remove (int startIndex, int length) const;
Removes length characters beginning at startIndex.

tsc_String Trim () const;
Removes leading and trailing whitespace. Whitespace includes space, tab, CR, LF, VT, FF (0x09-0x0D and 0x20).

tsc_String TrimRight () const;
Removes trailing whitespace. Whitespace includes space, tab, CR, LF, VT, FF (0x09-0x0D and 0x20).

tsc_String TrimLeft () const;
Removes leading whitespace.
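
The whitespace definition used by the trim functions (0x09-0x0D and 0x20) can be sketched with std::string. IsDemoWhite and DemoTrim are hypothetical helpers for illustration only.

```cpp
#include <cassert>
#include <string>

// True for space (0x20) and for tab, LF, VT, FF, CR (0x09-0x0D).
inline bool IsDemoWhite (char c)
{
    return c == ' ' || (c >= '\t' && c <= '\r');
}

// Returns a copy of s without leading and trailing whitespace.
inline std::string DemoTrim (const std::string& s)
{
    std::size_t first = 0;
    std::size_t last  = s.size ();
    while (first < last && IsDemoWhite (s[first]))
        ++first;
    while (last > first && IsDemoWhite (s[last - 1]))
        --last;
    return s.substr (first, last - first);
}
```

Whitespace inside the string is left alone; only the two ends are trimmed.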

tsc_String Trim (char c) const;
Removes all leading and trailing occurrences of c.

tsc_String TrimRight (char c) const;
Removes all trailing occurrences of c.

tsc_String TrimLeft (char c) const;
Removes all leading occurrences of c.

tsc_StringList Split (char splitAt, tsc_SplitOptions options = Split_None);
Breaks the string into one or more substrings by splitting the string at each occurrence of the splitAt character.  Also see the tsc_SplitOptions enumeration below for more information.

tsc_StringList Split (const char* splitAtAny, tsc_SplitOptions options = Split_None);
Breaks the string into one or more substrings by splitting the string at each occurrence of any character in splitAtAny.  Also see the tsc_SplitOptions enumeration below for more information.

tsc_StringList Split (const tsc_StringList& splitAtAny, tsc_SplitOptions options = Split_None);
Breaks the string into one or more substrings by splitting it at each occurrence of any string from splitAtAny.  Also see the tsc_SplitOptions enumeration below for more information.

void Swap (tsc_String &with);
Swaps the two strings. This is quick; it just swaps buffer pointers. The buffers themselves are unchanged, which preserves the immutability of shared strings; however, because with is passed by reference, the caller's original string variable is changed.  This method is intended for use in string sorting algorithms.

virtual tsc_String ToString () const;
ToString override - returns itself.

tsc_String& operator= (const tsc_String &s);
Assignment from another tsc_String.

tsc_String& operator= (char c);
Assignment from a character.

tsc_String& operator= (const char* s);
Assignment from a char*.

bool operator== (const tsc_String &s) const;
Case-sensitive equality comparison.

bool operator== (const char* b) const;
Case-sensitive equality comparison with a char*.

bool operator!= (const tsc_String &s) const;
Case-sensitive inequality comparison.

bool operator!= (const char* b) const;
Case-sensitive inequality comparison with a char*.

tsc_String& operator+= (const tsc_String &b);
Concatenate a second string. Note that the string is not altered; a new string is created and returned.

tsc_String& operator+= (const char* b);
Concatenate a char*. Note that the string is not altered; a new string is created and returned.

char operator[] (int index) const;
[] operator for indexing into the string (read only). This returns a single byte, which is only part of a character for any Unicode code point greater than 0x7F, since those code points are encoded as multiple bytes.

operator const char* () const;
Implicit cast to const char*. This method allows a tsc_String to be used as a parameter to any function that takes a const char*.

const char* Characters () const;
Const pointer to the internal UTF-8 buffer. Never modify the buffer.

Static Public Method

static tsc_String Format (const char* format,
                          tsc_Variant p1, tsc_Variant p2, tsc_Variant p3,
                          tsc_Variant p4, tsc_Variant p5, tsc_Variant p6,
                          tsc_Variant p7, tsc_Variant p8, tsc_Variant p9, tsc_Variant p10);

This method provides sprintf-style formatting, with extended parameter functionality for one to ten parameters.  Internally, a C Runtime sprintf variant is used to perform the formatting.

The format argument should conform to the standard C runtime printf() specification. See: printf() on MSDN

p1 to p10, the value arguments, may be any C++ simple type (see tsc_Variant constructors), and any number of arguments between one and ten may be supplied since there is an overload for every number of parameters.

The value arguments may also be tsc_String or any other subclass of tsc_Object which has overridden ToString().  x_Code parameters are also accepted and return a string translated into the current language. Arguments of type tsc_Object, tsc_String, and x_Code must be formatted using %s.

Parameters:

format - The format string with the same rules as printf().

p1..p10 - Up to ten optional parameters. Each must match a compatible format specification in the format string.

Character functions

The following functions, which are not part of the tsc_String class, wrap the C runtime character tests:

bool    IsAlphaNumeric (char c);
bool    IsAlpha        (char c);
bool    IsNumeric      (char c);
bool    IsLower        (char c);
bool    IsUpper        (char c);
bool    IsWhite        (char c); // Whitespace includes space, tab, CR, LF, VT, FF (0x09-0x0D and 0x20).
char    ToLower        (char c);
char    ToUpper        (char c); 
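
Wrapping the C runtime character tests safely for char arguments can be sketched as follows. DemoIsAlpha and DemoToUpper are hypothetical names, and the cast through unsigned char is an assumption about good practice rather than a statement of how the API functions are written.

```cpp
#include <cassert>
#include <cctype>

// The <cctype> functions take an int that must be representable as an
// unsigned char (or be EOF). A UTF-8 byte of 0x80 or above is negative
// when char is signed, so it is cast through unsigned char first.
inline bool DemoIsAlpha (char c)
{
    return std::isalpha (static_cast<unsigned char> (c)) != 0;
}

inline char DemoToUpper (char c)
{
    return static_cast<char> (std::toupper (static_cast<unsigned char> (c)));
}
```

Because these tests work on single bytes, they are only meaningful for ASCII characters within a UTF-8 string.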

Enumerations

This enum controls the operation of Replace.  For multiple options, OR the values together:
enum tsc_ReplaceOptions
{
    Replace_None            = 0x00,
    Replace_CaseInsensitive = 0x01,      // Makes the searching case-insensitive.
    Replace_AllOccurrences  = 0x02        // Replace all occurrences. By default only the first is replaced.
};


This enum controls case sensitivity for a number of string functions:
enum tsc_CaseOptions
{
    Case_Sensitive         = 0x00,
    Case_Insensitive       = 0x01,
};


The following enum controls the behaviour of the Split methods in tsc_String.
enum tsc_SplitOptions
{
    Split_None               = 0x00,  // No options specified.
    Split_RemoveEmptyEntries = 0x01,  // Remove empty strings from the returned list.
    Split_RespectQuotes      = 0x02   // Ignore delimiters which are inside single or double quotes.
};

The Split_RespectQuotes option is useful for applications such as decoding CSV files.  If either quote character (single or double) is found at the start or anywhere within a field, scanning for the delimiter characters or strings is suspended until another matching quote is found.  For example:

tsc_StringList fields = tsc_String("one,'two,three',four").Split(',', Split_RespectQuotes);

Note that the quotes are not removed in the returned string list.

Output:
    fields.Count() = 3
    fields[0] = one
    fields[1] = 'two,three'
    fields[2] = four
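
The quote-respecting scan described above can be sketched with the standard library. SplitRespectingQuotes is a hypothetical helper that follows the rules as stated on this page (either quote character suspends delimiter scanning until the matching quote, and quotes are kept); it is not the library's implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits text at splitAt, ignoring delimiters inside single or double quotes.
inline std::vector<std::string> SplitRespectingQuotes (const std::string& text, char splitAt)
{
    std::vector<std::string> fields;
    std::string current;
    char openQuote = '\0';                  // '\0' means "not inside quotes"

    for (char c : text) {
        if (openQuote != '\0') {
            if (c == openQuote)
                openQuote = '\0';           // the matching quote closes the span
            current += c;
        } else if (c == '\'' || c == '"') {
            openQuote = c;                  // suspend delimiter scanning
            current += c;                   // quotes are kept in the output
        } else if (c == splitAt) {
            fields.push_back (current);
            current.clear ();
        } else {
            current += c;
        }
    }
    fields.push_back (current);             // the final field
    return fields;
}
```

Applied to "one,'two,three',four" with ',' as the delimiter, this sketch yields the same three fields as the output listed above.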