Regular Expressions (RegEx)
Regular expressions are not part of the syllabus, but they help extend your understanding of BNF (16.2) and other principles. They are powerful and can make some challenging text-processing tasks relatively straightforward.
This is a mini page to help you get started with regular expressions.
Regular Expressions
What Are Regular Expressions
A regular expression is a pattern that describes a set of strings. The name comes from the mathematical theory of regular languages on which regular expressions are based. The term is often abbreviated to regex or regexp (plural: regexes). Formally, a regular expression is a notation for defining all the valid strings of a formal language; in everyday programming, it is a special text string that describes a search pattern.
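To make the "set of strings" idea concrete, here is a minimal Python sketch using the standard `re` module (the pattern `colou?r` and the candidate words are our own illustration). The `?` makes the preceding `u` optional, so the pattern describes exactly two strings.

```python
import re

# "colou?r": the ? makes the preceding "u" optional, so the pattern
# describes exactly two strings: "color" and "colour".
pattern = re.compile(r"colou?r")

candidates = ["color", "colour", "colouur", "colr"]

# fullmatch() succeeds only if the WHOLE string is in the set the pattern describes.
in_language = [word for word in candidates if pattern.fullmatch(word)]
print(in_language)  # ['color', 'colour']
```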
Regular expressions provide a powerful, flexible, and efficient method for processing text. The extensive pattern-matching notation of regular expressions enables you to quickly parse large amounts of text to find specific character patterns; to validate text to ensure that it matches a predefined pattern (such as an e-mail address); to extract, edit, replace, or delete text substrings; and to add the extracted strings to a collection in order to generate a report. For many applications that deal with strings or that parse large blocks of text, regular expressions are an indispensable tool.
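The sketch below illustrates two of those uses in Python: validating a string against a pattern and extracting every matching substring into a list. The e-mail pattern is a deliberately simplified assumption for teaching purposes; real e-mail address syntax is far more permissive, so this is not a production-quality validator.

```python
import re

# Deliberately simplified e-mail pattern (an illustrative assumption only):
# word characters, dots or hyphens, an @, then a domain containing at least
# one dot. Real address rules are much more complex.
EMAIL = re.compile(r"[\w.-]+@[\w-]+(?:\.[\w-]+)+")

def is_valid_email(text: str) -> bool:
    """Validation: the whole string must match the pattern."""
    return EMAIL.fullmatch(text) is not None

notes = "Contact ann@example.com or bob.smith@school.ac.uk; not bob@localhost."

# Extraction: findall() collects every non-overlapping match into a list.
found = EMAIL.findall(notes)

print(is_valid_email("ann@example.com"))  # True
print(is_valid_email("not an address"))   # False
print(found)  # ['ann@example.com', 'bob.smith@school.ac.uk']
```

The group is written as `(?:...)` (non-capturing) so that `findall()` returns whole matches rather than just the last captured group.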
Regular Expressions & VB.NET
The MSDN article provides much more information on this. The sample below demonstrates substring replacement. There are also some videos below, including one in Python.
Assume that a mailing list contains names that sometimes include a title (Mr., Mrs., Miss, or Ms.) along with a first and last name. If you do not want to include the titles when you generate envelope labels from the list, you can use a regular expression to remove the titles, as the following example illustrates.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        ' Match a title (with an optional full stop) followed by a space.
        Dim pattern As String = "(Mr\.? |Mrs\.? |Miss |Ms\.? )"
        Dim names() As String = { "Mr. Henry Hunt", "Ms. Sara Samuels", _
                                  "Abraham Adams", "Ms. Nicole Norris" }
        For Each name As String In names
            ' Replace each match with an empty string, i.e. delete the title.
            Console.WriteLine(Regex.Replace(name, pattern, String.Empty))
        Next
    End Sub
End Module
' The example displays the following output:
'    Henry Hunt
'    Sara Samuels
'    Abraham Adams
'    Nicole Norris
The regular expression pattern (Mr\.? |Mrs\.? |Miss |Ms\.? ) matches any occurrence of "Mr ", "Mr. ", "Mrs ", "Mrs. ", "Miss ", "Ms " or "Ms. ". The call to the Regex.Replace method replaces the matched string with String.Empty; in other words, it removes it from the original string.
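For comparison, here is a minimal Python sketch of the same substring replacement, using `re.sub()` from the standard `re` module in place of `Regex.Replace`:

```python
import re

# Same pattern as the VB.NET example: a title, an optional full stop, then a space.
pattern = r"(Mr\.? |Mrs\.? |Miss |Ms\.? )"
names = ["Mr. Henry Hunt", "Ms. Sara Samuels", "Abraham Adams", "Ms. Nicole Norris"]

# re.sub() plays the role of Regex.Replace: every match is replaced with "".
cleaned = [re.sub(pattern, "", name) for name in names]

for name in cleaned:
    print(name)
```

As in the VB.NET version, the pattern is not anchored, so a title appearing in the middle of a string would also be removed; putting `^` at the start of the pattern would restrict matching to the beginning of each name.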
Click here for a regex language quick reference. Unfortunately, although .NET languages implement a strong, feature-rich set of metacharacters, the official help documentation is poor.
The regex classes are located in the namespace System.Text.RegularExpressions. To make them available, place Imports System.Text.RegularExpressions at the start of your source code.
The Regex class is the one you use to compile a regular expression. For efficiency, regular expressions are compiled into an internal format. If you plan to use the same regular expression repeatedly, construct a Regex object as follows: Dim RegexObj As Regex = New Regex("regularexpression"). You can then call RegexObj.IsMatch("subject") to check whether the regular expression matches the subject string. The Regex constructor accepts an optional second parameter of type RegexOptions. For example, specifying RegexOptions.IgnoreCase makes the regex case-insensitive. Other options include RegexOptions.Singleline, which makes the dot match newlines, and RegexOptions.Multiline, which makes the caret and dollar match at embedded newlines in the subject string.
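Python's `re` module offers close analogues, sketched below. The mapping is our own illustration rather than part of the .NET API: `re.compile()` plays the role of `New Regex(...)`, a non-`None` result from `search()` stands in for `IsMatch`, and the flags `re.IGNORECASE`, `re.DOTALL` and `re.MULTILINE` correspond to `RegexOptions.IgnoreCase`, `RegexOptions.Singleline` and `RegexOptions.Multiline`.

```python
import re

subject = "First line\nsecond LINE"

# Compile once, reuse many times (like constructing a Regex object).
plain = re.compile(r"line")
ignore_case = re.compile(r"line", re.IGNORECASE)    # ~ RegexOptions.IgnoreCase
dot_all = re.compile(r"line.second", re.DOTALL)     # ~ RegexOptions.Singleline
multi_line = re.compile(r"^second", re.MULTILINE)   # ~ RegexOptions.Multiline

# search() returns a Match object or None, so bool() gives an IsMatch-style answer.
print(bool(plain.search("no match here")))        # False
print(len(ignore_case.findall(subject)))          # 2: "line" and "LINE"
print(bool(dot_all.search(subject)))              # True: "." matched the newline
print(bool(re.search(r"line.second", subject)))   # False without DOTALL
print(bool(multi_line.search(subject)))           # True: ^ matched after "\n"
print(bool(re.search(r"^second", subject)))       # False without MULTILINE
```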
There are a number of useful websites listed below, including one that offers a gentle(ish) introduction to using regex with VB.NET.
Call RegexObj.Replace("subject", "replacement") to perform a search-and-replace using the regex on the subject string, replacing all matches with the replacement string. In the replacement string, you can use $& to insert the entire regex match, and $1, $2, $3 and so on to insert the text matched by the corresponding capturing parentheses. Use $$ to insert a single dollar sign into the replacement text.
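Python's `re.sub()` supports the same kinds of substitution with a different syntax: `\g<0>` inserts the whole match, `\1`, `\2` and so on insert capturing groups, and a `$` in the replacement is simply literal. The date-reordering pattern below is our own illustrative example.

```python
import re

text = "Due 2024-03-15, paid 2024-04-01"

# Three capturing groups: year, month, day.
date = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# \3/\2/\1 reorders the groups (compare $3/$2/$1 in .NET).
reordered = date.sub(r"\3/\2/\1", text)

# \g<0> inserts the entire match (compare $& in .NET).
bracketed = date.sub(r"[\g<0>]", text)

# A dollar sign needs no escaping in a Python replacement (unlike $$ in .NET).
priced = re.sub(r"USD ", "$", "Price: USD 5")

print(reordered)  # Due 15/03/2024, paid 01/04/2024
print(bracketed)  # Due [2024-03-15], paid [2024-04-01]
print(priced)     # Price: $5
```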
RegexObj.Split("subject") splits the subject string at each regex match and returns an array of strings containing the text between the matches. If the regex contains capturing parentheses, the text they match is also included in the array. If you want the entire regex matches to be included in the array, simply place round brackets around the whole regular expression when instantiating RegexObj.
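Python's `re.split()` treats capturing parentheses in the same way, which makes it a convenient place to experiment with this behaviour. A minimal sketch (the subject string is our own example):

```python
import re

subject = "apples, pears;bananas"

# Split on a comma or semicolon followed by optional whitespace.
parts = re.split(r"[,;]\s*", subject)

# With capturing parentheses round the whole pattern, the separators are kept too.
parts_with_separators = re.split(r"([,;]\s*)", subject)

print(parts)                  # ['apples', 'pears', 'bananas']
print(parts_with_separators)  # ['apples', ', ', 'pears', ';', 'bananas']
```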
The Regex class also contains several static (Shared) methods that let you use regular expressions without instantiating a Regex object. This reduces the amount of code you have to write and is appropriate when a regular expression is used only once or reused rarely. Note that member overloading is used heavily in the Regex class: each static method has the same name as a corresponding instance method, but a different parameter list.
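Python draws a similar distinction: module-level functions such as `re.findall(pattern, string)` correspond roughly to the static Regex methods, while methods on a compiled pattern object correspond to the instance methods. The module-level functions also cache recently compiled patterns internally, so for occasional use the choice is mainly one of convenience. A brief sketch:

```python
import re

text = "cat bat rat cat"

# One-off use: the module-level function takes the pattern as an argument
# (compare the static Regex methods).
one_off = re.findall(r"[cr]at", text)

# Repeated use: compile once, then call methods on the pattern object
# (compare the instance methods of a Regex object).
cat_or_rat = re.compile(r"[cr]at")
reused = cat_or_rat.findall(text)

print(one_off == reused)  # True
print(reused)             # ['cat', 'rat', 'cat']
```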
Here is another VB.NET example which performs some basic string matching.
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' \w+ matches one or more consecutive word characters, i.e. each word.
        Dim myRegex As New Regex("\w+")
        Dim t As String = "Introduction to Regular Expressions in Visual Basic"
        Dim myMatches As MatchCollection = myRegex.Matches(t)
        ' Print every match found in the string, one per line.
        For Each successfulMatch As Match In myMatches
            Console.WriteLine(successfulMatch.Value)
        Next
        ' Wait for a key press so the console window stays open.
        Console.ReadKey()
    End Sub
End Module
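Here is the same example in Python:
import re

# Raw string (r"...") so the backslash reaches the regex engine unchanged.
my_regex = r"\w+"
text = "Introduction to Regular Expressions in Python"
# findall() returns a list of every run of word characters, i.e. each word.
my_matches = re.findall(my_regex, text)
for successful_match in my_matches:
    print(successful_match)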
As you can see, using Visual Basic to manipulate regular expressions is not difficult. However, if you do not practise, expect this to be a topic you never master!
The About.com website includes a good article regarding regular expressions in VB.NET.
Terminology
Click here for a comprehensive glossary of the key terms relating to regular expressions.
Metacharacters & Examples
The table below lists only a small subset of the available metacharacters. Unfortunately, there are several different implementations of regular expressions, so not every metacharacter is supported by every version.
Below are a number of examples using some of the above metacharacters.
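As a small supplement, the Python sketch below exercises a handful of common metacharacters (`.`, `*`, `+`, `?`, `[...]`, `\d`, `|`, `^` and `$`) using the standard `re` module. The patterns and test strings are our own; other regex implementations may differ in details.

```python
import re

# Each entry: (pattern, test string, whether fullmatch should succeed).
examples = [
    (r"h.t", "hot", True),          # .     any single character (except newline)
    (r"ab*c", "ac", True),          # *     zero or more of the previous item
    (r"ab+c", "ac", False),         # +     one or more of the previous item
    (r"colou?r", "color", True),    # ?     zero or one of the previous item
    (r"[A-Z]\d\d", "B52", True),    # [...] one character from a set; \d a digit
    (r"cat|dog", "dog", True),      # |     alternation: either side
]

for pattern, text, expected in examples:
    result = re.fullmatch(pattern, text) is not None
    print(pattern, text, "->", result)
    assert result == expected

# ^ and $ anchor a search to the start and end of the string.
print(bool(re.search(r"^The", "The end")))   # True
print(bool(re.search(r"end$", "The end.")))  # False: "." follows "end"
```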
Useful Websites/Resources
Regular Expression Websites
Regular Expressions - User Guide - Look here first
Visual Regex debugger - Excellent resource
Regular-Expressions - A Gentle Introduction
Using Regex within .NET framework - provides better regex help than MSDN
Videos
You can follow the links if you want to see the other parts of this series.