Regular Expressions (RegEx)

Regular expressions are not part of the syllabus, but they help to extend understanding of BNF (16.2) and other principles. They are powerful and can make challenging programming tasks relatively straightforward.

This is a mini page to help with using regular expressions.

What Are Regular Expressions

A regular expression is a pattern that describes a set of text strings. The name comes from the mathematical theory on which regular expressions are based. The term is often abbreviated to regex or regexp (plural regexes). Formally, it is a notation for defining all the valid strings of a formal language; informally, it is a special text string that describes a search pattern.

Regular expressions provide a powerful, flexible, and efficient method for processing text. The extensive pattern-matching notation of regular expressions enables you to quickly parse large amounts of text to find specific character patterns; to validate text to ensure that it matches a predefined pattern (such as an e-mail address); to extract, edit, replace, or delete text substrings; and to add the extracted strings to a collection in order to generate a report. For many applications that deal with strings or that parse large blocks of text, regular expressions are an indispensable tool.

Regular Expressions & VB.NET

The MSDN article provides much more information on this. The sample below demonstrates substring replacement. There are some videos below, including one in Python.

Assume that a mailing list contains names that sometimes include a title (Mr., Mrs., Miss, or Ms.) along with a first and last name. If you do not want to include the titles when you generate envelope labels from the list, you can use a regular expression to remove the titles, as the following example illustrates.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(Mr\.? |Mrs\.? |Miss |Ms\.? )"
      Dim names() As String = { "Mr. Henry Hunt", "Ms. Sara Samuels", _
                                "Abraham Adams", "Ms. Nicole Norris" }
      For Each name As String In names
         Console.WriteLine(Regex.Replace(name, pattern, String.Empty))
      Next
   End Sub
End Module
' The example displays the following output:
'    Henry Hunt
'    Sara Samuels
'    Abraham Adams
'    Nicole Norris

The regular expression pattern (Mr\.? |Mrs\.? |Miss |Ms\.? ) matches any occurrence of "Mr ", "Mr. ", "Mrs ", "Mrs. ", "Miss ", "Ms " or "Ms. ". The call to the Regex.Replace method replaces the matched string with String.Empty; in other words, it removes it from the original string.
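For comparison, here is a minimal sketch of the same title removal in Python. It assumes Python's built-in re module, where re.sub plays the role of Regex.Replace; the names TITLE_PATTERN and stripped are illustrative only.

```python
import re

# Same alternation as the VB.NET pattern: an optional full stop after
# Mr, Mrs and Ms, and a trailing space after every title.
TITLE_PATTERN = r"(Mr\.? |Mrs\.? |Miss |Ms\.? )"

names = ["Mr. Henry Hunt", "Ms. Sara Samuels",
         "Abraham Adams", "Ms. Nicole Norris"]

# re.sub replaces every match with the empty string, which removes the title.
stripped = [re.sub(TITLE_PATTERN, "", name) for name in names]

for name in stripped:
    print(name)
```

Names without a title, such as "Abraham Adams", pass through unchanged because the pattern finds nothing to replace.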

Click here for a regex language quick reference. Unfortunately, while .NET languages implement a strong, feature-rich set of metacharacters, the official documentation is poor.

The regex classes are located in the namespace System.Text.RegularExpressions. To make them available, place Imports System.Text.RegularExpressions at the start of your source code.

The Regex class is the one you use to compile a regular expression. For efficiency, regular expressions are compiled into an internal format. If you plan to use the same regular expression repeatedly, construct a Regex object as follows: Dim RegexObj As Regex = New Regex("regularexpression"). You can then call RegexObj.IsMatch("subject") to check whether the regular expression matches the subject string. The Regex constructor accepts an optional second parameter of type RegexOptions. You could specify RegexOptions.IgnoreCase as the final parameter to make the regex case insensitive. Other options are RegexOptions.Singleline, which causes the dot to match newlines, and RegexOptions.Multiline, which causes the caret and dollar to match at embedded newlines in the subject string.

There are a number of useful websites below, including this site, which offers a gentle(ish) introduction to using regex with VB.NET.
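The same ideas can be tried out in Python. The sketch below assumes Python's re module rather than the .NET Regex class: re.compile stands in for constructing a Regex object, and the flags re.IGNORECASE, re.DOTALL and re.MULTILINE are the rough counterparts of RegexOptions.IgnoreCase, Singleline and Multiline. The variable names are illustrative.

```python
import re

# Compile once and reuse (comparable to constructing a Regex object).
hello_pattern = re.compile(r"hello", re.IGNORECASE)

# search looks for a match anywhere in the subject string.
is_match = hello_pattern.search("Say HELLO to regex") is not None

# By default the dot does not match a newline; re.DOTALL changes that.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"a.b", "a\nb", re.DOTALL) is not None

# re.MULTILINE lets ^ and $ match at embedded line breaks.
line_starts = re.findall(r"^\w+", "first line\nsecond line", re.MULTILINE)

print(is_match, dot_default, dot_all, line_starts)
```

Without re.MULTILINE, the last call would find only the word at the very start of the string.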

Call RegexObj.Replace("subject", "replacement") to perform a search-and-replace using the regex on the subject string, replacing all matches with the replacement string. In the replacement string, you can use $& to insert the entire regex match into the replacement text. You can use $1, $2, $3, etc. to insert the text matched between capturing parentheses into the replacement text. Use $$ to insert a single dollar sign into the replacement text.
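As a rough illustration of replacement text that reuses parts of the match, here is a sketch in Python. Note that Python's re module uses a different replacement syntax from .NET: \1, \2, \3 instead of $1, $2, $3, and \g<0> instead of $&. The date pattern and sample strings are made up for the example.

```python
import re

date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# \3, \2 and \1 insert the text captured by each pair of parentheses
# (compare $3, $2 and $1 in .NET).
reordered = date_pattern.sub(r"\3/\2/\1", "Due on 2024-03-15.")

# \g<0> inserts the entire match (compare $& in .NET).
bracketed = date_pattern.sub(r"[\g<0>]", "Due on 2024-03-15.")

# A dollar sign is not special in a Python replacement string,
# so no doubling is needed here.
priced = re.sub(r"(\d+) dollars", r"$\1", "It costs 20 dollars")

print(reordered)
print(bracketed)
print(priced)
```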

RegexObj.Split("subject") splits the subject string along regex matches, returning an array of strings. The array contains the text between the regex matches. If the regex contains capturing parentheses, the text matched by them is also included in the array. If you want the entire regex matches to be included in the array, simply place round brackets around the entire regular expression when instantiating RegexObj.

The Regex class also contains several static methods that allow you to use regular expressions without instantiating a Regex object. This reduces the amount of code you have to write, and is appropriate if the same regular expression is used only once or reused only rarely. Note that member overloading is used a lot in the Regex class. All the static methods have the same names (but different parameter lists) as other non-static methods.
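Python's re.split treats capturing parentheses in the same way, so the effect is easy to see in a short sketch. This assumes Python's re module, not the .NET Regex class, and the sample text is invented for the example.

```python
import re

text = "apples, pears;plums"

# Without capturing parentheses, only the text between matches is returned.
parts = re.split(r"[,;]\s*", text)

# With round brackets around the whole pattern, the matched separators
# are included in the result as well.
parts_with_separators = re.split(r"([,;]\s*)", text)

print(parts)
print(parts_with_separators)
```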

Here is another VB.NET example which performs some basic string matching.

Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        Dim myMatches As MatchCollection
        Dim myRegex As New Regex("\w+")
        Dim t As String = "Introduction to Regular Expressions in Visual Basic"
        myMatches = myRegex.Matches(t)

        Dim successfulMatch As Match
        For Each successfulMatch In myMatches
            Console.WriteLine(successfulMatch.Value)
        Next
        Console.ReadKey()
    End Sub

End Module

Here is the same example in Python.

import re

# A raw string passes \w to the regex engine unchanged
myRegex = r"\w+"
text = "Introduction to Regular Expressions in Python"

# findall returns every non-overlapping match as a list of strings
myMatches = re.findall(myRegex, text)

for successfulMatch in myMatches:
    print(successfulMatch)

As you can see, using Visual Basic to manipulate regular expressions is not difficult. If you do not practise, however, then expect this to be a topic that you will never master!

The About.com website includes a good article regarding regular expressions in VB.NET.

Terminology

Click here for a comprehensive glossary of the key terms relating to regular expressions.

Metacharacters & Examples

The table below gives only a small set of the different metacharacters available. Unfortunately, there are different implementations of regular expressions so not all characters are compatible with all versions.

Below are a number of examples using some of the above metacharacters.
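As a small illustration, the sketch below tries out a handful of widely supported metacharacters using Python's re module. The choice of metacharacters and the sample strings are assumptions made for this example, not a reproduction of this page's own table, and behaviour can differ slightly in other regex implementations.

```python
import re

# \d matches a digit; + means "one or more of the preceding item".
numbers = re.findall(r"\d+", "Room 12, floor 3")

# . matches any single character except a newline.
three_letter = re.findall(r"c.t", "cat cot cut ct")

# ? makes the preceding item optional.
spellings = re.findall(r"colou?r", "color and colour")

# ^ and $ anchor the pattern to the start and end of the string;
# {2} means "exactly two of the preceding item".
is_code = re.search(r"^[A-Z]{2}\d$", "AB1") is not None

# [...] matches any one character from the set; | separates alternatives.
vowels = re.findall(r"[aeiou]", "regex")
animals = re.findall(r"cat|dog", "cat, dog, cow")

print(numbers, three_letter, spellings, is_code, vowels, animals)
```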
