Appendix
Regular Expression Syntax
Below is a quick reference for the most common regular expression constructs supported by IxoraRMS. The full supported syntax is that of the java.util.regex.Pattern class, as described in the Java documentation.
Characters
- x The character x
- \\ The backslash character
- \0n The character with octal value 0n (0 <= n <= 7)
- \0nn The character with octal value 0nn (0 <= n <= 7)
- \0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
- \xhh The character with hexadecimal value 0xhh
- \uhhhh The character with hexadecimal value 0xhhhh
- \t The tab character ('\u0009')
- \n The newline (line feed) character ('\u000A')
- \r The carriage-return character ('\u000D')
- \f The form-feed character ('\u000C')
- \a The alert (bell) character ('\u0007')
- \e The escape character ('\u001B')
- \cx The control character corresponding to x
Character classes
- [abc] a, b, or c (simple class)
- [^abc] Any character except a, b, or c (negation)
- [a-zA-Z] a through z or A through Z, inclusive (range)
- [a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
- [a-z&&[def]] d, e, or f (intersection)
- [a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
- [a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
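The union, intersection, and subtraction forms above can be tried directly against java.util.regex.Pattern. The following is a minimal sketch; the class name CharClassDemo and the helper matchesAll are hypothetical names used only for illustration, and are not part of IxoraRMS.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns true when every character of the input belongs to the given class.
    static boolean matchesAll(String charClass, String input) {
        return Pattern.matches(charClass + "+", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("[a-d[m-p]]", "damp"));    // union: true
        System.out.println(matchesAll("[a-z&&[def]]", "fed"));   // intersection: true
        System.out.println(matchesAll("[a-z&&[^bc]]", "cab"));   // subtraction: false, c and b are excluded
        System.out.println(matchesAll("[a-z&&[^m-p]]", "quiz")); // subtraction: true
    }
}
```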
Predefined character classes
- . Any character (may or may not match line terminators)
- \d A digit: [0-9]
- \D A non-digit: [^0-9]
- \s A whitespace character: [ \t\n\x0B\f\r]
- \S A non-whitespace character: [^\s]
- \w A word character: [a-zA-Z_0-9]
- \W A non-word character: [^\w]
POSIX character classes (US-ASCII only)
- \p{Lower} A lower-case alphabetic character: [a-z]
- \p{Upper} An upper-case alphabetic character: [A-Z]
- \p{ASCII} All ASCII: [\x00-\x7F]
- \p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
- \p{Digit} A decimal digit: [0-9]
- \p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
- \p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
- \p{Graph} A visible character: [\p{Alnum}\p{Punct}]
- \p{Print} A printable character: [\p{Graph}\x20]
- \p{Blank} A space or a tab: [ \t]
- \p{Cntrl} A control character: [\x00-\x1F\x7F]
- \p{XDigit} A hexadecimal digit: [0-9a-fA-F]
- \p{Space} A whitespace character: [ \t\n\x0B\f\r]
Classes for Unicode blocks and categories
- \p{InGreek} A character in the Greek block (simple block)
- \p{Lu} An uppercase letter (simple category)
- \p{Sc} A currency symbol
- \P{InGreek} Any character except one in the Greek block (negation)
- [\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
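As an illustration of the block and category classes above, the sketch below checks single characters against each construct using java.util.regex.Pattern. The class name UnicodeClassDemo and its helper are hypothetical; non-ASCII test characters are written as \u escapes so the source stays ASCII-only.

```java
import java.util.regex.Pattern;

class UnicodeClassDemo {
    // Returns true when the entire input matches the expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{InGreek}", "\u03B1"));      // Greek small alpha: true
        System.out.println(matches("\\P{InGreek}", "a"));           // Latin a is outside the block: true
        System.out.println(matches("\\p{Lu}", "A"));                // uppercase letter: true
        System.out.println(matches("\\p{Sc}", "\u20AC"));           // euro sign is a currency symbol: true
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "A"));   // uppercase letters are subtracted: false
    }
}
```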
Boundary matchers
- ^ The beginning of a line
- $ The end of a line
- \b A word boundary
- \B A non-word boundary
- \A The beginning of the input
- \G The end of the previous match
- \Z The end of the input but for the final terminator, if any
- \z The end of the input
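The boundary matchers above can be explored with a small counting helper. This is a sketch in plain Java, not IxoraRMS code; BoundaryDemo and count are hypothetical names. Note that in java.util.regex, ^ and $ match only at the start and end of the whole input unless the MULTILINE flag is set; whether IxoraRMS sets that flag is not stated here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times the pattern is found in the input.
    static int count(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String lines = "one\ntwo\nthree";
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(count(Pattern.compile("^\\w+"), lines));                    // 1
        // With MULTILINE, ^ matches at the start of every line.
        System.out.println(count(Pattern.compile("^\\w+", Pattern.MULTILINE), lines)); // 3
        // \b keeps "cat" inside "concat" from matching.
        System.out.println(count(Pattern.compile("\\bcat\\b"), "cat concat cat."));    // 2
        // \Z ignores a final line terminator, \z does not.
        System.out.println(count(Pattern.compile("line\\Z"), "line\n"));               // 1
        System.out.println(count(Pattern.compile("line\\z"), "line\n"));               // 0
    }
}
```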
Greedy quantifiers
- X? X, once or not at all
- X* X, zero or more times
- X+ X, one or more times
- X{n} X, exactly n times
- X{n,} X, at least n times
- X{n,m} X, at least n but not more than m times
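To show what "greedy" means in practice, the sketch below returns the first match of an expression: a greedy quantifier takes as many repetitions as it can while still allowing the overall match to succeed. QuantifierDemo and firstMatch are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by the expression, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("colou?r", "my colour")); // colour
        System.out.println(firstMatch("a.*b", "aXbXb"));        // aXbXb (runs to the last b)
        System.out.println(firstMatch("\\d{2,3}", "12345"));    // 123 (takes the maximum allowed)
        System.out.println(firstMatch("x+", "abc"));            // null (at least one x is required)
    }
}
```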
Capturing Groups
Capturing groups are created by enclosing parts of the regular expression in parentheses (). The string matched by a capturing group can be referenced later with $n tags, where $1 .. $n refer to capturing groups 1 to n.
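The same $n notation is used by Java's own replacement methods, so the behaviour can be sketched outside IxoraRMS. The example below is an assumption-laden illustration: GroupDemo, reorderDate, and group are hypothetical names, and how IxoraRMS itself substitutes $n tags is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Rearranges a yyyy-MM-dd date into dd/MM/yyyy using groups $1, $2 and $3.
    static String reorderDate(String isoDate) {
        return isoDate.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    // Returns the text captured by group n of the first match, or null if nothing matches.
    static String group(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        System.out.println(reorderDate("2024-01-31"));               // 31/01/2024
        System.out.println(group("(\\w+)@(\\w+)", "user@host", 1));  // user
        System.out.println(group("(\\w+)@(\\w+)", "user@host", 2));  // host
    }
}
```

Groups are numbered by the position of their opening parenthesis, counting from left to right.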
Formatting Syntax
The <format> attributes in IxoraRMS accept the standard Java syntax for number and date formatting (DecimalFormat and SimpleDateFormat). For full details, refer to the Java documentation.
Formatting tokens for numbers
Each entry lists the symbol, the part of the pattern where it applies (in parentheses), and its meaning.
- 0 (number) Digit
- # (number) Digit, zero shows as absent
- . (number) Decimal separator or monetary decimal separator
- - (number) Minus sign
- , (number) Grouping separator
- E (number) Separates mantissa and exponent in scientific notation. Need not be quoted in prefix or suffix.
- ; (subpattern boundary) Separates positive and negative subpatterns
- % (prefix or suffix) Multiply by 100 and show as percentage
- \u2030 (prefix or suffix) Multiply by 1000 and show as per mille
- \u00A4 (prefix or suffix) Currency sign, replaced by currency symbol. If doubled, replaced by international currency symbol. If present in a pattern, the monetary decimal separator is used instead of the decimal separator.
- ' (prefix or suffix) Used to quote special characters in a prefix or suffix, for example, "'#'#" formats 123 to "#123". To create a single quote itself, use two in a row: "# o''clock".
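A few of the number tokens above, applied through java.text.DecimalFormat, are sketched below. The class and method names are hypothetical, and the symbols are pinned to Locale.US so the separators are predictable; the locale IxoraRMS uses when it evaluates a <format> attribute is an open assumption.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class NumberFormatDemo {
    // Formats a value with the given pattern, using US separators for repeatable output.
    static String format(String pattern, double value) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.US);
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format("#,##0.00", 1234567.891));          // 1,234,567.89
        System.out.println(format("000", 7));                         // 007
        System.out.println(format("0.0%", 0.256));                    // 25.6%
        System.out.println(format("#,##0.00;(#,##0.00)", -1234.5));   // (1,234.50)
        System.out.println(format("'#'#", 123));                      // #123
    }
}
```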
Formatting tokens for dates
- G Era designator
- y Year
- M Month in year
- w Week in year
- W Week in month
- D Day in year
- d Day in month
- F Day of week in month
- E Day in week
- a Am/pm marker
- H Hour in day (0-23)
- k Hour in day (1-24)
- K Hour in am/pm (0-11)
- h Hour in am/pm (1-12)
- m Minute in hour
- s Second in minute
- S Millisecond
- z Time zone (General)
- Z Time zone (RFC 822)
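The date tokens above combine into patterns as in java.text.SimpleDateFormat. The sketch below formats the epoch instant (midnight, 1 January 1970, UTC) with several patterns; DateFormatDemo and format are hypothetical names, and the fixed UTC time zone and Locale.US are assumptions made so the output is repeatable.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatDemo {
    // Formats an instant (milliseconds since the epoch) in UTC with the given pattern.
    static String format(String pattern, long epochMillis) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, Locale.US);
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatter.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format("yyyy-MM-dd HH:mm:ss", 0L)); // 1970-01-01 00:00:00
        System.out.println(format("EEE, d MMM yyyy", 0L));     // Thu, 1 Jan 1970
        System.out.println(format("HH", 0L));                  // 00 (hour in day, 0-23)
        System.out.println(format("kk", 0L));                  // 24 (hour in day, 1-24)
        System.out.println(format("Z", 0L));                   // +0000
    }
}
```

Repeating a letter changes the width or style of the field: for example, MM gives a two-digit month while MMM gives the abbreviated month name.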