My Stuff

My Collection of Useful Stuff


Remembering Regex Patterns

27 September 2007

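It’s often useful to remember patterns that have been matched so that they can be used again. Anything matched inside parentheses is remembered in the variables $1,…,$9 (and $10 onwards if there are more than nine pairs of parentheses). The same strings can be used later in the same regular expression through the special RE codes \1,…,\9. The codes also work in the replacement part of a substitution, although Perl prefers $1 there and warns about \1 when warnings are switched on. For example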

$_ = "Lord Whopper of Fibbing";
s/([A-Z])/:\1:/g;
print "$_\n";

will replace each upper case letter by that letter surrounded by colons. It will print :L:ord :W:hopper of :F:ibbing. The variables $1,…,$9 are read-only variables; you cannot alter them yourself.
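To see the $1 side of this in a script, here is a minimal sketch. The variable names $text, $title, $name and $marked are made up for illustration, and the substitution is the one above written with $1 in the replacement.

```perl
use strict;
use warnings;

# Sample text reused from the example above.
my $text = "Lord Whopper of Fibbing";

my ($title, $name);
if ($text =~ /^(\w+) (\w+)/) {
    # $1 and $2 hold what the first and second parentheses matched.
    # Copy them straight away: the next successful match overwrites them.
    ($title, $name) = ($1, $2);
}

# Work on a copy so that $text itself is left untouched.
(my $marked = $text) =~ s/([A-Z])/:$1:/g;

print "$title / $name\n";
print "$marked\n";
```

Copying $1 and $2 into named variables is a common habit, since they only stay valid until the next successful match.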

As another example, the test

if (/(\b.+\b) \1/)
{
print "Found $1 repeated\n";
}

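will identify repeated text. Each \b represents a word boundary and the .+ matches any non-empty string, so \b.+\b matches anything that starts and ends at a word boundary, which may be a single word or a run of several words. This is then remembered by the parentheses and stored as \1 for the rest of the regular expression and as $1 for the rest of the program. Because nothing follows the \1, the test also fires when the second copy is only the start of a longer word, as in the theory.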
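If only whole doubled words are wanted, one possible tightening is sketched below. The helper name first_repeated_word is hypothetical, and using \w+ with a trailing \b is a variation on the pattern above rather than the pattern itself.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first doubled word, or undef if there is none.
sub first_repeated_word {
    my ($line) = @_;
    # \b(\w+) captures one whole word; \s+ allows any whitespace between
    # the copies; the final \b stops "the theory" from counting as "the the".
    return $line =~ /\b(\w+)\s+\1\b/ ? $1 : undef;
}

print first_repeated_word("Paris in the the spring"), "\n";   # prints "the"
```

With this version a line such as "the theory" no longer counts as a repeat.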

The following swaps the first and last characters of a line in the $_ variable:

s/^(.)(.*)(.)$/\3\2\1/

The ^ and $ match the beginning and end of the line. The first pair of parentheses (\1) stores the first character; the second (\2) stores everything else up to the last character, which is stored by the third (\3). Then the whole line is replaced with \1 and \3 swapped round.
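A small sketch for trying the swap on a few strings. The wrapper name swap_ends is made up for illustration, and the replacement is written with $3$2$1, which behaves the same as \3\2\1 here.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution so it can be applied to any string.
sub swap_ends {
    my ($line) = @_;
    # Group 1: first character, group 2: the middle, group 3: last character.
    $line =~ s/^(.)(.*)(.)$/$3$2$1/;
    return $line;
}

print swap_ends("hello"), "\n";   # prints "oellh"
print swap_ends("ab"), "\n";      # prints "ba"
print swap_ends("x"), "\n";       # prints "x"
```

A one-character string is left alone because the pattern needs at least two characters to match.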

After a match, you can use the special read-only variables $` and $& and $' to find the text before the match, the matched text itself, and the text after the match. So after

$_ = "Lord Whopper of Fibbing";
/pp/;

all of the following are true. (Remember that eq is the string-equality test.)

$` eq "Lord Who";
$& eq "pp";
$' eq "er of Fibbing";
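The three variables can be checked with a short script. This is only a sketch, and the names $text, $before, $matched and $after are illustrative.

```perl
use strict;
use warnings;

my $text = "Lord Whopper of Fibbing";

my ($before, $matched, $after);
if ($text =~ /pp/) {
    # $` is the text before the match, $& the match itself, $' the text after it.
    ($before, $matched, $after) = ($`, $&, $');
}

print "[$before][$matched][$after]\n";   # prints "[Lord Who][pp][er of Fibbing]"
```

Joining the three pieces back together always gives the original string.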

Finally on the subject of remembering patterns it’s worth knowing that inside of the slashes of a match or a substitution variables are interpolated. So

$search = "the";
s/$search/xxx/g;

will replace every occurrence of the with xxx. If you want to replace every occurrence of there then you cannot do s/$searchre/xxx/ because this will be interpolated as the variable $searchre. Instead you should put the variable name in curly braces so that the code becomes

$search = "the";
s/${search}re/xxx/;
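The difference between the two forms can be seen side by side in a minimal sketch. The sample sentence and the names $text, $all and $there are made up for illustration.

```perl
use strict;
use warnings;

my $search = "the";
my $text   = "the cat sat there, not here";

# Plain interpolation: every "the" is replaced, including the one inside "there".
(my $all = $text) =~ s/$search/xxx/g;

# Braces end the variable name, so the pattern searched for is "there".
(my $there = $text) =~ s/${search}re/xxx/g;

print "$all\n";     # prints "xxx cat sat xxxre, not here"
print "$there\n";   # prints "the cat sat xxx, not here"
```

Under use strict the mistaken s/$searchre/xxx/ does not even compile, because $searchre was never declared.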

Posted in Unix Shell, REGEX, Scripting
