Perl Pattern Matching
A pattern is a sequence of characters to be searched for in a character string. Perl patterns are normally enclosed in slash characters, for example /def/.
There are two pattern-matching operators, =~ and !~.
Pattern Matching Operators |
|
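Pattern Match (=~) | pattern matched: non-zero (true) is returned; no pattern matched: false (an empty string, 0 in numeric context) is returned |
No Pattern Match (!~) | no pattern matched: non-zero (true) is returned; pattern matched: false (an empty string, 0 in numeric context) is returned |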
Pattern Matching Operators Examples |
|
Pattern Match | $result = $var =~ /abc/; # true if $var has abc in the string
if ( $question =~ /right/ ) { .... } # true if $question has right in the string, false if it does not |
No Pattern Match | $result = $var !~ /abc/; # true if $var does not have abc in the string
if ( $question !~ /right/ ) { .... } # true if $question does not have right, false if right does appear in $question |
Pattern match operators have an order of precedence, which is listed in the Perl Cheat Sheet.
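The two operators can be exercised with a short script. This is a minimal sketch; the sample strings and the variable names $matched and $not_matched are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $var      = "xyzabc123";
my $question = "turn left at the lights";

# =~ binds more tightly than =, so this parses as $matched = ($var =~ /abc/).
my $matched     = $var =~ /abc/;   # true, because abc occurs in $var
my $not_matched = $var !~ /abc/;   # false, because !~ negates the result of =~

print "abc found in \$var\n"     if $matched;
print "abc not found in \$var\n" if $not_matched;

# !~ is true here because right does not occur in $question.
if ( $question !~ /right/ ) {
    print "the question does not mention right\n";
}
```

Because the binding operators have higher precedence than assignment, no parentheses are needed around the match on the right-hand side.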
Special Characters
A number of special characters can be used inside patterns. They let a single pattern match any of a number of character strings, and they are what make patterns useful.
Special Characters |
|
. character | matches any character except the newline character; the special combination .* tries to match as much as possible |
+ character | matches one or more occurrences of the preceding character |
[ ] character | enables you to define patterns that match one of a group of alternatives; you can also use ranges such as [0-9] or [a-zA-Z] (a comma inside the brackets would be matched literally) |
* character | matches zero or more occurrences of the preceding character |
? character | matches zero or one occurrence of the preceding character |
Pattern anchor | there are a number of pattern anchors: match at the beginning of a string (^ or \A), match at the end of a string ($ or \Z), match on a word boundary (\b) and match inside a word (\B, the opposite of \b) |
Escape sequence | if you want to include a character that is normally treated as a special character, you must precede the character with a backslash; you can use \Q to tell Perl to treat everything after it as normal characters until it sees \E |
Excluding | you can exclude characters by placing ^ as the first character inside square brackets, for example [^eE] |
Character-Range escape sequences | there are special character-range escape sequences such as any digit (\d) and anything other than a digit (\D); for the full list see the Perl Cheat Sheet |
Specified number of occurrences | you can define how many occurrences you want to match using {<minimum>,<maximum>}, or an exact count using {<count>} |
specify choice | the special character | (pipe) enables you to specify two or more alternatives to choose from when matching a pattern |
Portion reuse | sometimes you want to store what has been matched; you can do this by using (). The text matched by the first set is stored in \1 (used inside the pattern) or $1 (used after the match, for example when assigning to variables), the second set in \2 or $2, and so on |
Different delimiter | you can specify a different delimiter by writing m in front of the pattern, for example m!...! |
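The quantifiers, character classes and anchors described above can be compared side by side by running them over one word list. This is only a sketch: the helper matching_words and the sample words are hypothetical, and each pattern is anchored with ^ and $ so that the whole word has to match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words that match a compiled pattern.
sub matching_words {
    my ( $pattern, @words ) = @_;
    return grep { $_ =~ $pattern } @words;
}

# Hypothetical sample words.
my @words = qw(df def deef deeef dEf d9f);

my @one_or_more  = matching_words( qr/^de+f$/,     @words );  # + : one or more e
my @zero_or_more = matching_words( qr/^de*f$/,     @words );  # * : zero or more e
my @zero_or_one  = matching_words( qr/^de?f$/,     @words );  # ? : zero or one e
my @two_to_three = matching_words( qr/^de{2,3}f$/, @words );  # {2,3} : two or three e
my @either_case  = matching_words( qr/^d[eE]f$/,   @words );  # [eE] : e or E
my @not_e        = matching_words( qr/^d[^eE]f$/,  @words );  # [^eE] : anything but e or E
my @digit        = matching_words( qr/^d\df$/,     @words );  # \d : a single digit

print "e+     : @one_or_more\n";
print "e*     : @zero_or_more\n";
print "e?     : @zero_or_one\n";
print "e{2,3} : @two_to_three\n";
print "[eE]   : @either_case\n";
print "[^eE]  : @not_e\n";
print "\\d     : @digit\n";

# \b matches at a word boundary, \B only inside a word.
my $inside_word   = "abcdef"  =~ /\Bdef/;   # true: def is preceded by c
my $start_of_word = "abc def" =~ /\bdef/;   # true: def starts a word
my $no_boundary   = "abcdef"  =~ /\bdef/;   # false: no boundary before def
```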
Special Characters Examples |
|
. character | /d.f/ # could match words like def, dif, duf
/d.*f/ # could match words like deaf, deef, def, dzzf, etc |
+ character | /de+f/ # could match words like def, deef, deeef, deeeef, etc
/ +/ # matches a run of one or more spaces, e.g. the gap between words separated by multiple spaces |
[ ] character | /d[eE]f/ # match words def or dEf
/d[a-z]f/ # match words like daf, def, dzf, dsf, etc |
* character | /de*f/ # match words like df, def, deef, deeef, etc |
? character | /de?f/ # match only the words df and def (not deef, because ? allows at most one occurrence) |
Pattern anchors | /^hello/ # match only if the string starts with hello
/\Bdef/ # matches the def in abcdef, but not def at the start of a word (opposite of \b) |
Escape sequence | /\+salary/ # will match the word +salary, the + (plus) is treated as a normal character because of the \
/\Q**++\E/ # will match **++ |
Excluding | /d[^eE]f/ # 1st character is d, 2nd character is anything other than e or E, last character is f |
Character-Range escape sequences | /\d/ # match any digit
/\d+/ # match one or more digits |
Specified number of occurrences | /de{3}f/ # match only deeef, the {3} means exactly three of the preceding e
/de{1,3}f/ # match only def, deef and deeef (minimum = 1, maximum = 3 occurrences) |
specify choice | /def|ghi/ # match either def or ghi |
Portion reuse | /(def)(ghi)/ # the first matched pattern will be stored in \1 or $1, the second in \2 or $2
$result = $1; # assign the first matched pattern above to $result
$result2 = $2; # assign the second matched pattern above to $result2 |
Different delimiter | m!/usr/sbin! # match /usr/sbin, here we are using the ! (bang) character as a delimiter; the leading m is required for any delimiter other than / |
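Escaping, alternation, captured portions and alternative delimiters can be tried out in one short script. This is a minimal sketch; the sample strings and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# \Q ... \E quotes the metacharacters, so the literal text **++ is matched.
my $literal_ok = "rating: **++" =~ /\Q**++\E/;

# Alternation: match either def or ghi.
my $choice_ok = "xxghixx" =~ /def|ghi/;

# Capture groups: after a successful match, $1 and $2 hold the captured text.
my ( $first, $second );
if ( "abcdefghi" =~ /(def)(ghi)/ ) {
    ( $first, $second ) = ( $1, $2 );
}

# \1 inside the pattern refers back to whatever the first group matched,
# so this pattern finds a doubled word such as "the the".
# In list context the match returns the captured text.
my ($doubled) = "paris in the the spring" =~ /\b(\w+) \1\b/;

# A delimiter other than / needs the explicit m prefix;
# it avoids having to escape the slashes in the path.
my $path_ok = "/usr/sbin/cron" =~ m!/usr/sbin!;

print "first=$first second=$second doubled=$doubled\n";
```

Copying $1 and $2 into named variables straight after the match is a common precaution, since the next successful match overwrites them.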
Pattern-Matching Options
When you specify a pattern, you can also supply options that control how the pattern is to be matched. For the full list, see the Perl Cheat Sheet.
Pattern-Matching Options Examples |
|
Match all possible patterns (global) | @matches = "balata" =~ /.a/g; # list context: @matches holds ba, la and ta
$matches = "balata" =~ /.a/g; # scalar context: only the first occurrence (ba) is matched and $matches holds true, not the matched text |
Ignore case | /de/i # matches de, De, dE or DE |
Treat string as multiple lines | /^The/m # ^ matches at the start of every line in the string, not only at the start of the string |
Only evaluate once | /def/o # compile the pattern only once; this matters only when the pattern contains variables, e.g. /$pattern/o |
Treat string as single line | /a.*bc/s # . also matches the newline character, so the match can span multiple lines |
Ignore white space in pattern | /\d{2} \d{2} /x # ignore the spaces in the pattern so the code is interpreted as /\d{2}\d{2}/; spaces make the code easier to read |
substitution | s/abc/def/ # substitute the first occurrence of abc with def
Note: you can use any other pattern-matching option with substitution, e.g. s/abc/def/g to replace every occurrence |
Translation | tr/abc/def/ # translate all a into d, all b into e, all c into f
Note: you can use y/ instead of tr/; translation has several options of its own |
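The effect of each option is easiest to see by running the same match with and without it. The script below is a sketch under assumed sample data; every variable name in it is hypothetical.

```perl
use strict;
use warnings;

my $word = "balata";

# /g in list context returns every match.
my @all = $word =~ /.a/g;

# /g in scalar context finds one match per evaluation,
# continuing from where the previous match ended.
my @one_by_one;
while ( $word =~ /(.a)/g ) {
    push @one_by_one, $1;
}

# /i ignores case.
my $ignore_case = "DE" =~ /de/i;

my $text = "First line\nThe second line";

# /m lets ^ match at the start of each line, not just the start of the string.
my $without_m = $text =~ /^The/;    # false: the string starts with First
my $with_m    = $text =~ /^The/m;   # true: the second line starts with The

# /s lets . match a newline, so .* can cross line boundaries.
my $without_s = $text =~ /First.*second/;    # false: . stops at the newline
my $with_s    = $text =~ /First.*second/s;   # true

# /x ignores whitespace in the pattern, so it can be laid out readably.
my $with_x = "1234" =~ / \d{2} \d{2} /x;

# s/// replaces the first occurrence; adding g replaces every occurrence.
( my $first_only = "abc abc" ) =~ s/abc/def/;
( my $every      = "abc abc" ) =~ s/abc/def/g;

# tr/// maps characters one for one: a to d, b to e, c to f.
( my $translated = "aabbcc" ) =~ tr/abc/def/;

print "all: @all\n";
print "one by one: @one_by_one\n";
print "$first_only | $every | $translated\n";
```

The ( my $copy = ... ) =~ s/// form copies the string first and then modifies the copy, which leaves the original value untouched.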