|
| \d | any digit |
| \s | any whitespace |
| \w | any letter, digit, or underscore |
| \D | anything not a digit |
| \S | anything not whitespace |
| \W | anything not a letter, digit, or underscore |
These match one, and only one, character of the sort described. For example, \d by itself will match exactly one digit, not a whole number such as 42.
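The table above can be tried out with a minimal sketch in Python, whose re module stands in here for the client's regex engine. That substitution is an assumption: the shorthand types behave the same way for plain ASCII text when the re.ASCII flag is set, but Python is not what zMUD or MUSHclient run. The sample string is made up.

```python
import re

sample = "hp 42_x"  # hypothetical sample text

# Each shorthand type matches exactly one character per match,
# so findall returns one list entry per character.
digits = re.findall(r"\d", sample, re.ASCII)
spaces = re.findall(r"\s", sample, re.ASCII)
word_chars = re.findall(r"\w", sample, re.ASCII)
non_word = re.findall(r"\W", sample, re.ASCII)

print(digits)      # the two digits, as separate matches
print(spaces)      # the single space
print(word_chars)  # letters, digits, and the underscore
print(non_word)    # everything \w rejects (here only the space)
```

Note that 42 comes back as two separate matches, which is the "one, and only one, character" rule in action.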
Basic Assertions
Assertions differ from the other parts of a pattern in one important way: they don't consume characters. In other words, an assertion specifies a condition that must hold at a given position instead of actually matching anything. A list of basic assertions follows.
| ^ | beginning of a line |
| $ | end of a line |
| \b | a word boundary |
| \B | any position that is not a word boundary |
The most common assertions are the circumflex and dollar. They are used to specify the beginning and end of a line. Starting a trigger with a circumflex and ending it with a dollar is often called 'anchoring' the trigger. This prevents it from firing when the text appears in the middle of a line, for example when someone repeats it over a tell or a channel. If we wanted to anchor the trigger we created in section three, it would look like this:
^Trevize greets you\.$
That trigger will only fire if there is nothing before Trevize and nothing after the period. A word boundary is a position where the character on one side matches \w and the character on the other side does not (the beginning or end of the line counts as not matching). So why not use \W instead? Because \b is an assertion, which means it doesn't actually match anything, but just requires the boundary to be there. \W, by contrast, has to match an actual character, so it would fail at the beginning or end of a line. The pattern Trevize\b would match Trevize but not Trevizes or Trevizer.
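A minimal sketch of anchoring and word boundaries, again using Python's re module as an assumed stand-in for the client's engine. The tell line is invented for illustration.

```python
import re

# The anchored trigger from the text.
anchored = re.compile(r"^Trevize greets you\.$")

# Fires on the bare line, but not when the same text is quoted in a tell.
fires_on_line = bool(anchored.search("Trevize greets you."))
fires_in_tell = bool(anchored.search('Bob tells you, "Trevize greets you."'))

# \b only checks the position; \W must consume a real character.
boundary = re.compile(r"Trevize\b")
non_word = re.compile(r"Trevize\W")

boundary_at_end = bool(boundary.search("Trevize"))     # end of line is a boundary
boundary_in_word = bool(boundary.search("Trevizes"))   # "e" then "s": no boundary
non_word_at_end = bool(non_word.search("Trevize"))     # nothing left for \W to match

print(fires_on_line, fires_in_tell)
print(boundary_at_end, boundary_in_word, non_word_at_end)
```

The last line shows the difference described above: at the end of the line \b succeeds while \W fails.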
Character Classes
A character class matches any single character defined by the class. It begins with an opening square bracket and ends with a closing square bracket. Everything between those brackets is a potential character the class can match. In the class, you can include normal characters and escaped metacharacters. If a circumflex is the first character in the class, it means the class will match anything except what is in the class. A dash serves to specify a range. Here are a few examples:
| [abcde] | matches a, b, c, d, or e |
| [^abcde] | matches anything but a, b, c, d, or e |
| [a-e] | same as [abcde] |
| [0-9] | same as \d |
| [a-zA-Z0-9_] | same as \w |
The power of character classes lies in being as specific or as broad as you want. For example, we used m.n earlier to match man and men. But it would also match min or mmn, which we do not want. Using m\wn is no better, since \w matches i and m as well. Instead, we could use a character class here to specify just a or e. The pattern m[ae]n would do this perfectly.
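The difference between the period and a character class can be checked with a short sketch. Python's re module is assumed as a stand-in for the client's engine, and fullmatch is used so that each word must match in its entirety.

```python
import re

words = ["man", "men", "min", "mmn"]

# The period accepts any middle character.
dot_hits = [w for w in words if re.fullmatch(r"m.n", w)]

# The class accepts only a or e in the middle.
class_hits = [w for w in words if re.fullmatch(r"m[ae]n", w)]

# A leading circumflex inverts the class.
negated_hits = [w for w in words if re.fullmatch(r"m[^ae]n", w)]

print(dot_hits)
print(class_hits)
print(negated_hits)
```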
Subpatterns and the Vertical Bar
Subpatterns are just that, a small pattern within a larger one. A subpattern is anything surrounded by parentheses. They have many uses. Most importantly, plain parentheses are 'capturing' subpatterns. In zMUD and MUSHclient, that means they pass whatever they matched back to the script as %1 through %9. For example, if you wanted to capture a dice roll, you could use \d inside a subpattern (one \d per die, since it matches a single digit). The pattern:
^You roll the dice, and get (\d) and (\d)\.$
would send the two numbers to the script as %1 and %2. If you want to give several options for a subpattern, you can separate them with a vertical bar. The pattern:
^You are (tall|short)\.$
would match either tall or short. It would also send whichever matched to %1, which we may not want! If you make a question mark then a colon the first two things in a subpattern, it will group but it will not capture. So we could fix that pattern like this:
^You are (?:tall|short)\.$
And ta-da! Perfection. You can have as many vertical bars as you want, and you can have subpatterns inside subpatterns. You can also make a vertical bar the last thing in the subpattern, like this:
^You are (?:very |not |)short\.$
And it would match very short, not short, and just plain short.
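The subpattern examples above can be sketched as follows. Python's re module is assumed as a stand-in for the client's engine, and its groups() result plays the role of %1, %2, and so on; that mapping is an analogy, not how zMUD or MUSHclient expose captures. The sample lines are invented.

```python
import re

# Capturing subpatterns: each (\d) becomes one captured value.
roll = re.search(r"^You roll the dice, and get (\d) and (\d)\.$",
                 "You roll the dice, and get 3 and 5.")
roll_captures = roll.groups()

# Plain parentheses capture the alternative that matched...
capturing = re.search(r"^You are (tall|short)\.$", "You are tall.")
with_capture = capturing.groups()

# ...while (?: ) groups the alternatives without capturing anything.
grouping = re.search(r"^You are (?:tall|short)\.$", "You are tall.")
without_capture = grouping.groups()

# A trailing vertical bar adds an empty alternative, making the word optional.
optional = re.compile(r"^You are (?:very |not |)short\.$")
lines = ["You are very short.", "You are not short.",
         "You are short.", "You are quite short."]
matched = [line for line in lines if optional.search(line)]

print(roll_captures, with_capture, without_capture)
print(matched)
```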
Quantifiers
Finally, one of the most useful aspects of regex is repetition. Repetition is specified by quantifiers. A quantifier may follow any single character, a period, an escaped character type, a character class, or a subpattern (but not an assertion). A quantifier is normally written in curly braces and specifies either an exact count or a minimum and a maximum. There are also three shorthand quantifiers: +, *, and ?. Examples:
| {3} | matches exactly three times |
| {1,2} | matches between one and two times |
| {3,} | matches at least three times, with no maximum |
| + | same as {1,} (one or more times) |
| * | same as {0,} (zero or more times) |
| ? | same as {0,1} (zero or one time) |
What does this mean? Well, z{10} would match zzzzzzzzzz, and z{2,4} would match zz, zzz, and zzzz. The pattern \d+ would match any run of digits, as long as there is at least one. You could use [abc]? to match a, b, c, or nothing.
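A short sketch of the quantifiers just described, using Python's re module as an assumed stand-in for the client's engine. fullmatch is used where the whole string must fit the pattern; the sample sentence is made up.

```python
import re

# Exact count: ten z characters and nothing else.
exact_ok = bool(re.fullmatch(r"z{10}", "z" * 10))

# Range: two to four z characters.
candidates = ["z", "zz", "zzz", "zzzz", "zzzzz"]
range_hits = [c for c in candidates if re.fullmatch(r"z{2,4}", c)]

# \d+ grabs each whole run of digits instead of one digit at a time.
numbers = re.findall(r"\d+", "You gain 150 xp and 7 gold.")

# [abc]? matches a, b, c, or the empty string.
optional_hits = [c for c in ["", "a", "b", "c", "d", "ab"]
                 if re.fullmatch(r"[abc]?", c)]

print(exact_ok, range_hits)
print(numbers)
print(optional_hits)
```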
One common usage is an? to match a or an. In the last example of the subpatterns section above, you could instead use a quantifier to make the word optional, move the space outside the subpattern, and make that space optional too. Like so:
^You are (?:very|not)? ?short\.$
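The sketch below checks that pattern in Python, whose re module is assumed as a stand-in for the client's engine. It also shows that making the word and the space optional independently lets a line such as "You are veryshort." through. The stricter variant, which nests the space inside an optional group, is an alternative offered here for comparison and does not come from the original text.

```python
import re

# an? matches "a" or "an", but not "ann".
article_hits = [w for w in ["a", "an", "ann"] if re.fullmatch(r"an?", w)]

lines = ["You are very short.", "You are not short.",
         "You are short.", "You are veryshort."]

# The pattern from the text: word and space are each optional on their own.
loose = re.compile(r"^You are (?:very|not)? ?short\.$")
loose_hits = [line for line in lines if loose.search(line)]

# Hypothetical stricter variant: the word and its space are optional together.
strict = re.compile(r"^You are (?:(?:very|not) )?short\.$")
strict_hits = [line for line in lines if strict.search(line)]

print(article_hits)
print(loose_hits)
print(strict_hits)
```

Whether the looser behavior matters depends on what the MUD can actually send; for a fixed message it is usually harmless.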
References
http://www.gammon.com.au/mushclient/regexp.htm
http://mushclient.com/pcre/pcrepattern.html