Using Regex Class in ASP.NET

The Regex class represents a regular expression. It is immutable (read-only: it can't be changed after an instance is created, so a Regex instance is completely defined in the class constructor) and thread safe. The Regex class is located in the System.Text.RegularExpressions namespace.

 

The Regex class is used in two basic ways:
1. Create an instance of the Regex class and then call its instance properties and methods, or
2. Call static Regex methods directly, without creating an instance.

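For better performance, avoid constructing the same Regex over and over. Static methods keep recently used patterns in an internal cache, so repeated calls with the same pattern do not parse it again. If you prefer instances, create the Regex once and reuse it, because constructing a new instance on every call repeats the parsing work each time. The Regex class has these members: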
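The two ways of using the class can be sketched side by side. This is only an illustration: the class name RegexUsageSketch and the five-digit pattern are assumptions made up for the example, not part of the Regex API.

```csharp
using System.Text.RegularExpressions;

public static class RegexUsageSketch
{
    // Hypothetical pattern used only for this example: exactly five digits
    private const string FiveDigits = @"^\d{5}$";

    // Way 1: create the instance once and reuse it for every call
    private static readonly Regex FiveDigitsRegex = new Regex(FiveDigits);

    public static bool IsFiveDigitsInstance(string input)
    {
        return FiveDigitsRegex.IsMatch(input);
    }

    // Way 2: call the static method, no instance is created by our code
    public static bool IsFiveDigitsStatic(string input)
    {
        return Regex.IsMatch(input, FiveDigits);
    }
}
```

Both methods give the same answer for the same input; the difference is only in who keeps the parsed pattern alive, your own field or the static cache.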

CacheSize static property

CacheSize is a static property that represents the maximum number of compiled regular expressions kept in the static cache. The default value is 15. You can increase this value if needed, but be aware that a larger cache requires more memory and may slow down your application. CacheSize affects only calls to the static methods of the Regex class. Turning the cache off completely (CacheSize = 0) is not recommended, especially if you use the same regular expression many times.
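If an application uses more distinct patterns through static methods than the cache can hold, the limit can be raised once at startup. A minimal sketch, assuming a hypothetical helper named RaiseCacheSize; the right value depends on your application:

```csharp
using System.Text.RegularExpressions;

public static class RegexCacheSketch
{
    // Raises the static cache limit if it is lower than the wanted value
    // and returns the limit that is in effect afterwards
    public static int RaiseCacheSize(int wanted)
    {
        if (wanted > Regex.CacheSize)
        {
            Regex.CacheSize = wanted;
        }
        return Regex.CacheSize;
    }
}
```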

Options read-only property

Options is a read-only property. It returns the RegexOptions enumeration values that were passed to the Regex constructor. The complete list and meaning of each option can be found in the .Net Regular Expressions Syntax article.

RightToLeft read-only property

RightToLeft is a read-only property. It returns true if the Regex searches from right to left, which is useful for languages that are read from right to left. RightToLeft returns true if RegexOptions.RightToLeft was used in the Regex constructor.
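Here is a short sketch of how both properties reflect the options given to the constructor. The class and method names are hypothetical. A side effect of RegexOptions.RightToLeft worth knowing is that Match starts at the end of the text, so the first result is the last occurrence:

```csharp
using System.Text.RegularExpressions;

public static class RegexOptionsSketch
{
    // Returns the last run of digits in the text, or an empty string
    public static string LastNumber(string text)
    {
        // The search starts at the end of the text and moves left
        Regex r = new Regex(@"\d+", RegexOptions.RightToLeft);
        Match m = r.Match(text);
        return m.Success ? m.Value : string.Empty;
    }

    // Checks the Options property for the IgnoreCase flag
    public static bool IsCaseInsensitive(Regex r)
    {
        return (r.Options & RegexOptions.IgnoreCase) == RegexOptions.IgnoreCase;
    }
}
```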

CompileToAssembly method

CompileToAssembly is a static method that compiles regular expressions to an assembly on disk. You can then use this assembly like any other .Net assembly: add a reference to your application and call its methods from code. A compiled regular expression executes faster (commonly between 30% and 300% faster, depending on the expression and the amount of text; the improvement could be even 6x if you analyze gigabytes of text). Also, there is no initial compilation at run time as there is when RegexOptions.Compiled is used, so expressions compiled to an assembly start faster and execute faster too. Be aware that expressions compiled into an assembly can't be changed at run time.

Escape method

Escape is a static method that converts the input string to a string with escaped metacharacters. Practically, the Escape method just adds a backslash ( \ ) character before the metacharacters \, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space. In this way the escaped metacharacters are interpreted as literals. This method is useful when you work with strings entered dynamically by the user, when you don't know in advance what characters the regular expression could contain. If the user enters a metacharacter, the Escape method converts it to a literal.
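A minimal sketch of that scenario: searching a text for exactly what the user typed. The ContainsLiteral helper is a hypothetical name chosen for this example.

```csharp
using System.Text.RegularExpressions;

public static class RegexEscapeSketch
{
    // Returns true if the text contains the user input as literal text
    public static bool ContainsLiteral(string text, string userInput)
    {
        // "1+1" becomes "1\+1", so + is no longer a quantifier
        string pattern = Regex.Escape(userInput);
        return Regex.IsMatch(text, pattern);
    }
}
```

Without the call to Escape, the input "a.b" would also match "axb", because the dot would be treated as "any character".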

GetGroupNames method

The GetGroupNames method returns a string array that contains the names of all capturing groups. An unnamed group gets its number as a name, like 1, 2, 3, etc. The array also contains "0", which stands for the whole match.

GroupNameFromNumber method

The GroupNameFromNumber method returns the group's name (as a string) for a given group number.

GroupNumberFromName method

The GroupNumberFromName method returns the group's number for a given group name.
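The three group methods can be tried together on one expression. In this sketch the pattern (an unnamed group followed by a group named "month") and the class name are assumptions chosen for illustration. For a name or number that does not exist, GroupNumberFromName returns -1 and GroupNameFromNumber returns an empty string.

```csharp
using System.Text.RegularExpressions;

public static class RegexGroupNamesSketch
{
    // One unnamed group (number 1) and one named group "month" (number 2)
    private static readonly Regex DateRegex =
        new Regex(@"(\d{4})-(?<month>\d{2})");

    // Joins all group names into one string for easy display
    public static string DescribeGroups()
    {
        string[] names = DateRegex.GetGroupNames();
        return string.Join(",", names);
    }

    public static int NumberOf(string groupName)
    {
        return DateRegex.GroupNumberFromName(groupName);
    }

    public static string NameOf(int groupNumber)
    {
        return DateRegex.GroupNameFromNumber(groupNumber);
    }
}
```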

IsMatch method

The IsMatch method returns true or false, depending on whether the expression found the pattern in the given text. This method is useful when we don't want to capture data, but only want to know if the pattern exists in the text. A common use of the IsMatch method is data validation.

Match method

The Match method searches the text and returns only the first result as an instance of the Match class (System.Text.RegularExpressions.Match). Don't be confused that the Regex.Match() method and the Match class have the same name; just remember that the Regex.Match method returns an instance of the Match class. The Match class has a Success property (boolean) that tells whether the RegEx engine found a result in the text. Use the Match.Value property to get the actual result as a string. Since the Match method returns only the first result, use the Matches method, which returns all results as a MatchCollection object, if you want to get all of them.

Matches method

The Matches method searches the complete text and returns all results as a MatchCollection object, which is a collection of Match objects. MatchCollection is read-only (immutable) and has no public constructor. To access the individual Match objects, iterate through the collection with a loop, like foreach [ C# ] or For Each [ VB.NET ].
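The difference between Match and Matches is easy to see on a small input. A minimal sketch with hypothetical helper names, using the pattern \d+ (one or more digits) as an assumed example:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexMatchSketch
{
    // Returns the first number in the text, or null if there is none
    public static string FirstNumber(string text)
    {
        Match m = Regex.Match(text, @"\d+");
        // Always check Success before trusting Value
        return m.Success ? m.Value : null;
    }

    // Returns every number in the text, in order of appearance
    public static List<string> AllNumbers(string text)
    {
        List<string> numbers = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\d+"))
        {
            numbers.Add(m.Value);
        }
        return numbers;
    }
}
```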

Replace method

The Replace method is used in search-and-replace scenarios. It requires two patterns. The first is a regular expression that finds which strings in the text should be replaced (search), and the second is a replacement pattern that builds the replacement strings (replace). The connection between the two is made with references to captured groups, such as $1.

Split method

The Split method splits text into strings and returns a string array. The positions (delimiters) for splitting are defined by the regular expression, so unlike other methods, Split returns the strings that are not matched by the regular expression. If the count parameter is used, it sets the maximum number of strings in the returned array. The startat parameter specifies the position in the text where the search for delimiters begins. If the delimiter is an empty string, the method splits the text into single characters. If a delimiter appears at the start or at the end of the text, an empty string is added at the start or end of the resulting array.
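A small sketch of the count parameter and of the empty string produced by a leading delimiter. The comma delimiter and the helper names are assumptions for the example. When count is reached, the rest of the text is returned unsplit as the last element.

```csharp
using System.Text.RegularExpressions;

public static class RegexSplitSketch
{
    private static readonly Regex Comma = new Regex(",");

    // Splits at every comma
    public static string[] SplitAll(string text)
    {
        return Comma.Split(text);
    }

    // Splits into at most 'count' strings
    public static string[] SplitAtMost(string text, int count)
    {
        return Comma.Split(text, count);
    }
}
```

For example, SplitAtMost("a,b,c,d", 2) gives "a" and "b,c,d", while SplitAll(",a,b") starts with an empty string because the text begins with a delimiter.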

ToString method

The ToString method returns the regular expression specified in the Regex constructor. Since the Regex class is read-only, the regular expression given in the constructor cannot be changed later.

Unescape method

As the opposite of the Escape method, the Unescape method converts any escaped character in the text back to its unescaped form (for example, it replaces \* with *, and the two-character sequence \n with a newline character). Notice that the Escape method escapes only metacharacters, while the Unescape method unescapes metacharacters and any other escaped character too.
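A quick way to see the relation between the two methods is a round trip: escaping a string and unescaping the result should give back the original. A minimal sketch with a hypothetical helper name:

```csharp
using System.Text.RegularExpressions;

public static class RegexUnescapeSketch
{
    // Returns true if Unescape reverses what Escape did to the text
    public static bool RoundTrips(string text)
    {
        string escaped = Regex.Escape(text);
        string restored = Regex.Unescape(escaped);
        return restored == text;
    }

    // Unescapes a pattern fragment, e.g. \* becomes *
    public static string Plain(string escapedText)
    {
        return Regex.Unescape(escapedText);
    }
}
```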

Common uses of Regex class

The Regex class is commonly used for these tasks:
- Using Regex to extract data from text
- Data validation with Regex class
- Search and replace with Regex
- Split string into string array using Regex class

Online examples of these four tasks are available, so you can Test .Net Regular Expressions on your own data. Let's see how each of these tasks can be done.

Using Regex to extract data from text

Regular expressions are a great tool for extracting valuable information from large amounts of text. Of course, it is possible to get this data using classic string manipulation, but any more complex problem will demand a long chain of System.String methods, like IndexOf, Substring, Trim etc. A much simpler and cleaner solution, and one that is faster to implement, is to use a regular expression and extract the data with the Regex.Match or Regex.Matches method.

The Regex.Match method returns only one instance of the Match class, which represents the first result that the RegEx engine found in the text. The Regex.Matches method returns a MatchCollection object, a collection of Match objects with one entry for every result found in the text.

Let's see how it works on an example. We'll try to find all URLs in a given HTML (which is useful if you are building a web spider):

[ C# ]

/// <summary>
/// Finds all URLs in given HTML code
/// </summary>
/// <param name="HTMLCode">HTML for search</param>
private void getAllUrlsFromHtml(string HTMLCode)
{
    // regular expression for URL
    string regURL = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";
    // Creates an instance of Regex class
    Regex r = new Regex(regURL);
    // Finds and stores results in MatchCollection object
    MatchCollection results = r.Matches(HTMLCode);

    // now do something with found URLs, like store it to database,
    // XML file etc.
    foreach (Match m in results)
    {
        // in this example we'll just print URLs on form
        Response.Write(m.Value + "<br />");
    }
}

[ VB.NET ]

''' <summary>
''' Finds all URLs in given HTML code
''' </summary>
''' <param name="HTMLCode">HTML for search</param>
Private Sub getAllUrlsFromHtml(ByVal HTMLCode As String)
    
    ' regular expression for URL
    Dim regURL As String = "http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?"
    ' Creates an instance of Regex class
    Dim r As Regex = New Regex(regURL)
    ' Finds and stores results in MatchCollection object
    Dim results As MatchCollection = r.Matches(HTMLCode)
    
    ' now do something with found URLs, like store it to database,
    ' XML file etc.
    For Each m As Match In results
    
        ' in this example we'll just print URLs on form
        Response.Write(m.Value & "<br />")
    Next
End Sub

Don't forget to import the System.Text.RegularExpressions namespace. You can try the Extract Data online application to test if your regular expression extracts data correctly.

Extracting data using regular expressions groups

Sometimes we don't need the complete result. For example, if we need to get the text between <title> and </title> tags, that is easily achievable with the expression <title>.*?</title>, but the result will contain the tags "<title>" and "</title>" too. Although this could be solved using look-around groups, a much simpler solution is to use the GroupCollection returned by the Match.Groups property. Match.Groups can get the value of a single group using its name or index. Let's see how Groups work on a simple example. Here is the code that reads the string inside the title tag of HTML using a named group "t":

[ C# ]

private void findTextBetweenTitleTags(string HTMLCode)
{
    // Regular expression to find HTML title tag
    string regTitle = @"<title>(?<t>.*?)</title>";
    // Creates an instance of Regex class
    Regex r = new Regex(regTitle);
    // RegEx engine returns a match that contains the title tags too
    Match m = r.Match(HTMLCode);

    // Filter result to get only value of group "t", a value
    // without <title> and </title> tags
    string HtmlTitle = m.Groups["t"].Value;

    // Write found result on page
    Response.Write("Title is: " + HtmlTitle + "<br />");
}

[ VB.NET ]

Private Sub findTextBetweenTitleTags(ByVal HTMLCode As String)
    
    ' Regular expression to find HTML title tag
    Dim regTitle As String = "<title>(?<t>.*?)</title>"
    ' Creates an instance of Regex class
    Dim r As Regex = New Regex(regTitle)
    ' RegEx engine returns a match that contains the title tags too
    Dim m As Match = r.Match(HTMLCode)
    
    ' Filter result to get only value of group "t", a value
    ' without <title> and </title> tags
    Dim HtmlTitle As String = m.Groups("t").Value
    
    ' Write found result on page
    Response.Write("Title is: " & HtmlTitle & "<br />")
End Sub
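Real pages often write the tag in upper case or put the title on its own line, and some pages have no title at all. A possible variation of the example, given as a sketch and not as the only way to do it: the helper name GetTitle, the choice of options, and returning null for "not found" are all assumptions.

```csharp
using System.Text.RegularExpressions;

public static class RegexTitleSketch
{
    // Returns the text between the title tags, or null if there is no title
    public static string GetTitle(string html)
    {
        // IgnoreCase accepts <TITLE>, Singleline lets . match line breaks
        Match m = Regex.Match(html, @"<title>(?<t>.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        if (!m.Success)
        {
            return null;
        }
        // Only the value of group "t", without surrounding white space
        return m.Groups["t"].Value.Trim();
    }
}
```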

Data validation with Regex class

A very common use of regular expressions is data validation. ASP.NET provides the RegularExpressionValidator control, which is useful for validation of user input on web forms. For other tasks we can use the Regex.IsMatch method. The IsMatch method doesn't provide match values like the Match or Matches methods. It just returns true or false, which tells us if the regular expression matched the given string. A .Net data validation code example that checks if the entered string is a valid e-mail could look like this:

[ C# ]

/// <summary>
/// Checks if given e-mail is valid
/// </summary>
/// <param name="Email">E-mail to validate</param>
/// <returns>Returns true if e-mail is valid and false if not</returns>
private bool validateEmail(string Email)
{
    // Regular expression to validate e-mail
    string regEmail = @"^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$";
    // Creates an instance of Regex class
    Regex r = new Regex(regEmail);
    
    // Checks if given email is valid and return value
    return r.IsMatch(Email);
}

[ VB.NET ]

''' <summary>
''' Checks if given e-mail is valid
''' </summary>
''' <param name="Email">E-mail to validate</param>
''' <returns>Returns true if e-mail is valid and false if not</returns>
Private Function validateEmail(ByVal Email As String) As Boolean
    
    ' Regular expression to validate e-mail
    Dim regEmail As String = "^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$"
    ' Creates an instance of Regex class
    Dim r As Regex = New Regex(regEmail)
    
    ' Checks if given email is valid and return value
    Return r.IsMatch(Email)
End Function

There is a Data Validation online application where you can test your regular expression to see if data validation works well.

String replace with Regex

In a search-and-replace scenario, we need two patterns. The first is a regular expression that finds what should be replaced, and the second is a replacement pattern that builds the replacement strings. To relate groups in the regular expression to the replacement pattern we'll use backreferences such as $1. Here is an example function that looks for valid URLs in the input text and converts them to clickable <a> tags, which is useful for forums, customer support applications etc.:

[ C# ]

private string getTextWithLinks(string InputText)
{
    // Regular expression to find valid URLs
    string URLExpression = @"(http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?)";
    // Replacement pattern to create <a> tags
    string ATagExpression = @"<a href=""$1"">$1</a>";
    // Calling Regex class to make links
    return Regex.Replace(InputText, URLExpression, ATagExpression,
        RegexOptions.IgnoreCase);
}

[ VB.NET ]

Private Function getTextWithLinks(ByVal InputText As String) As String
     ' Regular expression to find valid URLs
     Dim URLExpression As String = "(http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?)"
     ' Replacement pattern to create <a> tags
     Dim ATagExpression As String = "<a href=""$1"">$1</a>"
     ' Calling Regex class to make links
     Return Regex.Replace(InputText, URLExpression, ATagExpression, _
        RegexOptions.IgnoreCase)
End Function

As you can see in the second string (ATagExpression), a backreference can be used more than once. In this case the $1 backreference is used two times to create a clickable link. Here is the Search and Replace online application to test expressions in your scenario.
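Numbered references like $1 become hard to read when an expression has many groups. The replacement pattern also accepts named references in the form ${name}. Here is a minimal sketch, assuming a deliberately simplified URL expression (https?://\S+) and a hypothetical group name "url":

```csharp
using System.Text.RegularExpressions;

public static class RegexReplaceSketch
{
    // Wraps every URL-like string in an <a> tag
    public static string Linkify(string text)
    {
        // Simplified pattern for illustration: scheme plus non-space characters
        string urlExpression = @"(?<url>https?://\S+)";
        // ${url} refers to the named group, and is used twice
        string aTagPattern = "<a href=\"${url}\">${url}</a>";

        return Regex.Replace(text, urlExpression, aTagPattern);
    }
}
```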

Using Regex to split string

You can use the Regex.Split method to split a given string into a string array. In this case, the regular expression is used to define the delimiter, so the RegEx engine returns the parts of the text that the regular expression didn't match.

Theoretically, Regex.Split is in some way like an inverted Regex.Matches method. Practically, it depends on what is easier to define: the wanted or the unwanted data in the text. Use Regex.Match or Regex.Matches if it is easier to write a regular expression that matches the wanted data; otherwise use Regex.Split if it is simpler to write an expression that matches the unwanted data.

For example, let's say you want to split text into single words. The delimiter could be a single space or multiple spaces, but also a comma, semicolon, new line, etc., or even all of that together. Regular expressions offer a pretty short and simple solution for this problem:

[ C# ]

private string[] getWordsFromText(string PlainText)
{
    // Regular expression to define delimiter: one or more non-word
    // characters (greedy, so a run like ", " counts as one delimiter)
    string regDelimiter = @"\W+";
    // Creates an instance of Regex class
    Regex r = new Regex(regDelimiter);
    
    // Finds words and return results
    return r.Split(PlainText);
}

[ VB.NET ]

Private Function getWordsFromText(ByVal PlainText As String) As String()
    
    ' Regular expression to define delimiter: one or more non-word
    ' characters (greedy, so a run like ", " counts as one delimiter)
    Dim regDelimiter As String = "\W+"
    ' Creates an instance of Regex class
    Dim r As Regex = New Regex(regDelimiter)
    
    ' Finds words and return results
    Return r.Split(PlainText)
End Function

Notice that we could use data extraction with the Regex.Matches method to get the same results. In that case, the regular expression would define the wanted strings (whole words) instead of the unwanted strings (the delimiters) as in the Regex.Split method. Because of that, the regular expression would be different too: instead of a pattern for non-word characters such as "\W+", the expression for Regex.Matches would be "\w+". The quantifier must be greedy here, because a lazy "\w+?" would match one character at a time instead of a whole word. You can try the Split String web form to test if your regular expression defines the delimiter correctly.
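The Regex.Matches alternative could look like the sketch below. It assumes the greedy pattern \w+ so that each match is a whole word, and the helper name is hypothetical. One practical difference from Regex.Split is that text starting or ending with a delimiter produces no empty strings here.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexWordsSketch
{
    // Returns all words from the text, in order of appearance
    public static string[] GetWordsWithMatches(string plainText)
    {
        List<string> words = new List<string>();
        // \w+ matches one run of word characters, that is, one word
        foreach (Match m in Regex.Matches(plainText, @"\w+"))
        {
            words.Add(m.Value);
        }
        return words.ToArray();
    }
}
```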

Conclusion

Regular expressions are not a solution for every problem. In some cases, for example if you need to get data from an XML file, .Net Framework offers specialized classes in the System.Xml namespace. Although you can use regular expressions to extract data from an XML file, that will probably demand more effort than writing a simple XPath query. Also, if the problem is very simple and you can extract data from text with one or two String.Substring calls, do that instead.

On the .Net Regular Expressions Syntax page you can find a summary of the rules used by the regular expression language in .Net Framework.

Happy coding!

