Regular Expressions

How to use regular expressions in Notepad++ (tutorial)

If you have the TextFX plugin installed, press Ctrl+R or choose TextFX -> TextFX Quick -> Find/Replace to open a more sophisticated dialog that includes a drop-down for regular expressions and multi-line search/replace.

Notepad++ regex syntax

In a regular expression (shortened to regex throughout), the following characters are interpreted specially:

.

Matches any character except the new line character

(

This marks the start of a region for tagging a match; whatever is matched inside ( ) can be reused in the "Replace with" field as \1, \2 etc.

)

This marks the end of a tagged region.

\n

Where n is 1 through 9, this refers to the first through ninth tagged region when replacing.

  • For example, if the search string was Fred([1-9])XXX and the replace string was Sam\1YYY , when applied to Fred2XXX this would generate Sam2YYY .
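The same tagged-region replacement can be tried outside Notepad++. The following is a minimal sketch using Python's re module, which happens to use the same ( ) and \1 notation; the variable names are illustrative only.

```python
import re

# Search string and replace string from the example above.
search = r"Fred([1-9])XXX"
replace = r"Sam\1YYY"  # \1 refers to the first tagged region

result = re.sub(search, replace, "Fred2XXX")
print(result)  # Sam2YYY
```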

\<

This matches the start of a word using Scintilla's definitions of words.

\>

This matches the end of a word using Scintilla's definition of words.
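To experiment with word boundaries outside Notepad++, here is a small sketch in Python. Note the assumption: Python's re module does not support \< and \>, so the sketch substitutes \b, which matches at either edge of a word. Placed before the text it behaves roughly like \<, placed after it roughly like \>. Python's definition of a word character may also differ from Scintilla's.

```python
import re

text = "cat concat catalog"

# \b before "cat": only words that START with "cat" (roughly \<cat)
starts = re.findall(r"\bcat\w*", text)

# \b after "cat": only words that END with "cat" (roughly cat\>)
ends = re.findall(r"\w*cat\b", text)

print(starts)  # ['cat', 'catalog']
print(ends)    # ['cat', 'concat']
```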

\x

This allows you to use a character x that would otherwise have a special meaning. For example, \[ would be interpreted as [ and not as the start of a character set.

[...]

This indicates a set of characters, for example, [abc] means any of the characters a, b or c. You can also use ranges, for example [a-z] for any lower case letter.

[^...]

The complement of the characters in the set. For example, [^A-Za-z] means any character except an alphabetic character.
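Character sets and their complements behave the same way in most regex engines, so they can be explored quickly in Python. This is only an illustrative sketch; the sample text is made up.

```python
import re

text = "cab-42 Zed!"

in_set = re.findall(r"[abc]", text)         # any of a, b or c
lower = re.findall(r"[a-z]", text)          # any lower case letter
non_alpha = re.findall(r"[^A-Za-z]", text)  # anything that is not a letter

print(in_set)     # ['c', 'a', 'b']
print(lower)      # ['c', 'a', 'b', 'e', 'd']
print(non_alpha)  # ['-', '4', '2', ' ', '!']
```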

^

This matches the start of a line (unless used inside a set, see above).

$

This matches the end of a line.

*

This matches 0 or more times. For example, Sa*m matches Sm , Sam , Saam , Saaam and so on.

+

This matches 1 or more times. For example, Sa+m matches Sam , Saam , Saaam and so on.
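The difference between * and + is easiest to see side by side. A minimal Python sketch, using the sample words from the descriptions above (re.fullmatch is used so that the whole word has to match):

```python
import re

candidates = ["Sm", "Sam", "Saam", "Saaam"]

# * allows zero occurrences of "a", so "Sm" matches too.
star = [s for s in candidates if re.fullmatch(r"Sa*m", s)]

# + requires at least one "a", so "Sm" is rejected.
plus = [s for s in candidates if re.fullmatch(r"Sa+m", s)]

print(star)  # ['Sm', 'Sam', 'Saam', 'Saaam']
print(plus)  # ['Sam', 'Saam', 'Saaam']
```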

The source of this information is the Scintilla edit component help, adapted to Notepad++ behavior.

Character Classes:

Character classes in regular expressions match a selection of characters at once. For example, "\d" will match any digit from 0 to 9 inclusive. "\w" will match letters, digits and the underscore, and "\W" will match everything else. A pattern to identify letters, numbers or white-space could be

\d will match any digit from 0 to 9 inclusive.

\w will match letters, digits and the underscore

\W will match everything except letters, digits and the underscore

$ matches the end of a line (strictly an anchor rather than a character class)
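As a quick way to see what each class picks up, here is a short Python sketch. It assumes Python's re module, where \w also matches the underscore; the sample text is invented for illustration.

```python
import re

text = "Room 12b, floor_3!"

digits = re.findall(r"\d", text)     # single digits
words = re.findall(r"\w+", text)     # runs of word characters
others = re.findall(r"\W+", text)    # runs of everything else

print(digits)  # ['1', '2', '3']
print(words)   # ['Room', '12b', 'floor_3']
print(others)  # [' ', ', ', '!']
```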

Examples:

IMPORTANT

  • You have to check the "Regular expression" box in the Search & Replace dialog.

  • When copying the strings from here, make sure there are no additional spaces in front of them, otherwise the regex will not work!

Example 1

You use a MediaWiki site (e.g. Wikipedia, Wikitravel) and want to move all headings one level higher, so that an H2 becomes an H1 etc.

    • Search ^=(=)

    • Replace with \1

    • Click "Replace all"

    • This finds all headings of level 2 to 9 (at least two equal signs are required) that begin at the start of a line (^) and replaces the two equal signs with only the second one, eliminating one and keeping one.

    • Search =(=)$

    • Replace with \1

    • Click "Replace all"

    • This finds all headings of level 2 to 9 (at least two equal signs are required) that end at the end of a line ($) and replaces the two equal signs with only the second one, eliminating one and keeping one.

== title == became = title =, you're done :-)
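The two replacement passes of Example 1 can be reproduced in Python as a rough check. This sketch assumes Python's re module with the re.MULTILINE flag, so that ^ and $ anchor at every line as they do in Notepad++; the sample wiki text is made up.

```python
import re

wiki = "== title ==\n=== sub title ===\nplain text"

# Pass 1: drop one "=" at the start of each heading line.
step1 = re.sub(r"^=(=)", r"\1", wiki, flags=re.MULTILINE)

# Pass 2: drop one "=" at the end of each heading line.
step2 = re.sub(r"=(=)$", r"\1", step1, flags=re.MULTILINE)

print(step2)
# = title =
# == sub title ==
# plain text
```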

Example 2

You have a document with a lot of dates in German date format (dd.mm.yy) and you'd like to transform them into a sortable format (yy-mm-dd). Don't be put off by the length of the search term. It is long, but it consists of short and fairly simple parts.

Do the following:

  • Search ([^0-9])([0123][0-9])\.([01][0-9])\.([0-9][0-9])([^0-9])

  • Replace with \1\4-\3-\2\5

  • Click "Replace all"

You do this to fetch

  • the day, whose first number can only be 0, 1, 2 or 3

  • the month, whose first number can only be 0 or 1

  • but only if the separator is a literal . and not just any character ( \. versus . )

  • but only if no digits surround the date, since it might then be part of an IP address rather than a date

and to write all of this in the opposite order, except for the surroundings. Pay attention: Whatever SEARCH matches will be deleted and only replaced by the stuff in the REPLACE field, thus it is mandatory to have the surroundings in the REPLACE field as well!

Outcome:

  • 31.12.97 became 97-12-31

  • 14.08.05 became 05-08-14

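  • the IP address 14.13.14.14 is not left alone by this search string: the . after its third number counts as a non-digit, so the address becomes 14-13-14.14. To skip such addresses, use [^0-9.] instead of [^0-9] at both ends of the search string.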

You're done :-)
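For readers who want to test the date pattern before running it on a real document, here is a minimal Python sketch. It reuses the search and replace strings from Example 2 unchanged; the sample sentences are invented. The second call illustrates a side effect of requiring a surrounding character: a date at the very start or end of the text has no neighbour on one side and is therefore not matched.

```python
import re

# Search and replace strings from Example 2.
search = r"([^0-9])([0123][0-9])\.([01][0-9])\.([0-9][0-9])([^0-9])"
replace = r"\1\4-\3-\2\5"

text = "Paid on 31.12.97 and again on 14.08.05, as agreed."
result = re.sub(search, replace, text)
print(result)  # Paid on 97-12-31 and again on 05-08-14, as agreed.

# A date with nothing before or after it is left untouched.
lonely = re.sub(search, replace, "31.12.97")
print(lonely)  # 31.12.97
```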

Example 3

In Windows, you have written a file list to the file filelist.txt using dir /b/s >filelist.txt and want to turn the entries into local URLs.

  1. Open filelist.txt with Notepad++

    • Search \\

    • Replace with /

    • Click "Replace all" to change the Windows path separator char \ into the URL path separator char /

    • Search ^(.*)$

    • Replace with file:///\1

    • Click "Replace all" to add file:/// at the beginning of all lines

Depending on your requirements, proceed to escape some characters, such as space to %20 etc. C:\!\aktuell.csv became file:///C:/!/aktuell.csv, you're done :-)
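The steps of Example 3 can be sketched in Python as follows. This is an illustration only: the second path (C:\docs\my file.txt) is a hypothetical entry added to show the space escaping, and re.MULTILINE is assumed so that ^ and $ apply to each line.

```python
import re

# Two lines as they might appear in filelist.txt (second one is hypothetical).
listing = "C:\\!\\aktuell.csv\nC:\\docs\\my file.txt"

# Step 1: Windows path separator \ becomes URL path separator /
step1 = re.sub(r"\\", "/", listing)

# Step 2: put file:/// in front of every line.
step2 = re.sub(r"^(.*)$", r"file:///\1", step1, flags=re.MULTILINE)

# Optional step 3: escape spaces as %20.
step3 = step2.replace(" ", "%20")

print(step3)
# file:///C:/!/aktuell.csv
# file:///C:/docs/my%20file.txt
```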

Example 4

Another Search Replace Example

[Data]

AS AF AFG 004 Afghanistan
EU AX ALA 248 Åland Islands
EU AL ALB 008 Albania, People's Socialist Republic of
AF DZ DZA 012 Algeria, People's Democratic Republic of
OC AS ASM 016 American Samoa
EU AD AND 020 Andorra, Principality of
AF AO AGO 024 Angola, Republic of
NA AI AIA 660 Anguilla
AN AQ ATA 010 Antarctica (the territory South of 60 deg S)
NA AG ATG 028 Antigua and Barbuda
SA AR ARG 032 Argentina, Argentine Republic
AS AM ARM 051 Armenia
NA AW ABW 533 Aruba
OC AU AUS 036 Australia, Commonwealth of

  • Search for: ([A-Z]+) ([A-Z]+) ([A-Z]+) ([0-9]+) (.*)

  • Replace with: \1,\2,\3,\4,\5

  • Hit "Replace All"

Final Data:

AS,AF,AFG,004,Afghanistan
EU,AX,ALA,248,Åland Islands
EU,AL,ALB,008,Albania, People's Socialist Republic of
AF,DZ,DZA,012,Algeria, People's Democratic Republic of
OC,AS,ASM,016,American Samoa
EU,AD,AND,020,Andorra, Principality of
AF,AO,AGO,024,Angola, Republic of
NA,AI,AIA,660,Anguilla
AN,AQ,ATA,010,Antarctica (the territory South of 60 deg S)
NA,AG,ATG,028,Antigua and Barbuda
SA,AR,ARG,032,Argentina, Argentine Republic
AS,AM,ARM,051,Armenia
NA,AW,ABW,533,Aruba
OC,AU,AUS,036,Australia, Commonwealth of
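The same conversion can be sketched in Python with the search and replace strings from Example 4. Only three of the data rows are used here to keep the sketch short, and it assumes one record per line. Because . does not match a line break, each record is handled separately.

```python
import re

data = (
    "AS AF AFG 004 Afghanistan\n"
    "EU AL ALB 008 Albania, People's Socialist Republic of\n"
    "OC AS ASM 016 American Samoa"
)

# Search and replace strings from Example 4.
search = r"([A-Z]+) ([A-Z]+) ([A-Z]+) ([0-9]+) (.*)"
replace = r"\1,\2,\3,\4,\5"

result = re.sub(search, replace, data)
print(result)
# AS,AF,AFG,004,Afghanistan
# EU,AL,ALB,008,Albania, People's Socialist Republic of
# OC,AS,ASM,016,American Samoa
```

Only the first four spaces of each record are turned into commas; spaces inside the country name are preserved. Names that already contain a comma, such as the Albania entry, would need quoting before the result is read as strict CSV.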