Regular Expressions in Perl
Regular Expressions I
What are regexes good for ?
- Decide if a string is part of a larger string
- Validate the format of some value (string) (e.g. is it a decimal number?, is it a hex?)
- Find if there are repetitions in a string
- Analyze a string and fetch parts of if given some loose description
Examples
Is the input given by the user a number?
(BTW which one is a number: 23, 2.3, 2.3.4, 2.4e3, abc ?)
Is there a word in the file that is repeated 3 or more times?
Replaces all occurrences of Perl or perl by Java ...
... but avoid replacing Perla.
Given a text message fetch all the phone numbers:
Fetch numbers that look like 09-1234567
then also fetch +972-2-1234567
and maybe also 09-123-4567
Check if in a given text passing your network there are credit card numbers....
Given a text find if the word "password" is in it and fetch the surrounding text.
Given a log file like this:
[Tue Jun 12 00:01:00 2006] - (3423) - INFO - ERROR log restarted
[Tue Jun 12 09:08:17 2006] - (3423) - INFO - System starts to work
[Tue Jun 13 08:07:16 2006] - (3423) - ERROR - Something is wrong
provide statistics on how many of the different levels of log messages
were seen. Separate the log messages into files.
Where can I use it ?
- grep, egrep
- Unix tools such as sed, awk, procmail
- vi, emacs, other editors
- text editors such as Multi-Edit
- .NET languages: C#, C++, VB.NET
- Java
- Perl
- PHP, Python, Ruby, Tcl
- Word, Open Office ...
- PCRE
Introduction to Regexes
- =~
- !~
- //
my $str = "Some string here";
if ($str =~ /m/) {
print "There is a match\n";
}
if ($str !~ /a/) {
print "No match\n";
}
# which is the same as
if (not $str =~ /a/) {
print "No match\n";
}
grep
The Unix(ish) grep command was named after the g/re/p
command of the UNIX ed editor.
$ grep REGEX file(s)
Tools
Linux
- kregexpeditor
- regexxer
Windows
Commercial on Windows
- Regex Buddy teaching Regex
- Power Grep grep with gui
Both Linux and Windows
Find a string in a file
#!/usr/bin/perl
use strict;
use warnings;
my $filename = shift or die "$0 FILENAME\n";
open my $fh, '<', $filename or die "Could not open '$filename'\n";
while (my $line = <$fh>) {
if ($line =~ /REGEX/) {
print $line;
}
}
Regex Examples: single character
Any line that has an 'x' in it.
Regex: /x/
Input: "abcde"
Input: "abxcde"
Input: "xabcde"
Input: "xabcxde"
Any line that starts with an 'x'.
Regex: /^x/
Input: "abcde"
Input: "abxcde"
Input: "xabcde"
Input: " xabcde"
Input: "^xabcde"
^ at the beginning of the regular expression means, match at the beginning of the string.
Regex Examples: more characters
Any line that has an 'abc' in it.
Regex: /abc/
Input: "abc xy"
Input: "abcde"
Input: "abxcxbdabce"
Input: "xabcde"
Input: "xa b c de"
Any line that starts with an 'abc'.
Regex: /^abc/
Input: "abcde"
Input: "abxcxbde"
Input: "xabcde"
Regex Examples dot .
Any line that has any of the xax, xbx, ..., that is any character between two x-es.
Regex: /x.x/
Input: "abcde"
Input: "abxcxbde"
Input: "xabcde"
Input: "xabxcxde"
Input: "adax xabcde"
Input: "adax.xabcde"
Input: "x.x"
Input: "xx"
Only lines that have x.x (A real . between two x-es.)
Regex: /x\.x/
The special characters are: . * + ? ^ $ \ ( ) [ ] | { } and the delimiter: /
Some of the characters change their special meaning based on position.
Regex Examples (character class)
Any line that has any of the #a#, #b#, #c#, #d#, #e#, #f#, #@# or #.#
Regex: /#[abcdef@.]#/
Input: "ab #q# "
Input: "ab #z#a# "
Input: "ab #.# "
Input: "ab ## "
Input: "#a#"
Input: "##"
Input: "###"
Regex: /#[a-f@.]#/
Regex Examples (^ in character class)
^ as the first character in a character class means "a character that is not listed in this character class"
Regex: /#[^abc]#/
Input: "abc #a# z"
Input: "abc #z# z"
Input: "#z#"
Input: "##"
Optional character
- ?
Match the word color or the word colour.
Regex: /colou?r/
Input: color
Input: colour
Input: colouur
Regex Examples quantifiers
- quantifiers
- ?
-
-
Any line with two - -es with anything in between.
Regex: /-.*-/
Input: "ab"
Input: "ab - cde"
Input: "ab - qqqrq -"
Input: "ab -- cde"
Input: "--"
Quantifiers
Quantifiers apply to the thing in front of them
/ax*a/ # aa, axa, axxa, axxxa, ...
/ax+a/ # axa, axxa, axxxa, ...
/ax?a/ # aa, axa
/ax{2,4}a/ # axxa, axxxa, axxxxa
/ax{3,}a/ # axxxa, axxxxa, ...
/ax{17}a/ # axxxxxxxxxxxxxxxxxa
* 0-
+ 1-
? 0-1
{n,m} n-m
{n,} n-
{n} n
Quantifiers on character classes
Regex: /-[abc]-/
Input: "-a-"
Input: "-b-"
Input: "-x-"
Input: "-ab-"
Regex: /-[abc]+-/
Input: "-a-"
Input: "-b-"
Input: "-aa-"
Input: "-ab-"
Input: "-x-"
Input: "--"
Exercises: Regular expressions
Pick up a file with some text in it. (e.g. examples/regex-perl/text.txt ). Write a script (one for each item) that prints out every line from the file that matches the requirement. You can use the script at the end of the page as a starting point but you will have to change it!
- has a 'q'
- starts with a 'q'
- has 'th'
- has an 'q' or a 'Q'
- has a '*' in it
- starts with an 'q' or an 'Q'
- has both 'a' and 'e' in it
- has an 'a' and somewhere later an 'e'
- does not have an 'a'
- does not have an 'a' nor 'e'
- has an 'a' but not 'e'
- has at least 2 consecutive vowels (a,e,i,o,u) like in the word "bear"
- has at least 3 vowels
- has at least 6 characters
- has at exactly 6 characters
- all the words with either 'Bar' or 'Baz' in them
- all the rows with either 'apple pie' or 'banana pie' in them
- for each row print if it was apple or banana pie?
- Bonus: Print if the same word appears twice in the same line
- Bonus: has a double character (e.g. 'oo')
#!/usr/bin/perl
use strict;
use warnings;
my $filename = shift or die "$0 FILENAME\n";
open my $fh, '<', $filename or die "Could not open '$filename'\n";
while (my $line = <$fh>) {
if ($line =~ /REGEX1/) {
print "has an a: $line";
}
if ($line =~ /REGEX2/) {
print "starts with an a: $line";
}
}
Solutions: Regular expressions
- has a 'q'
/q/
- starts with a 'q'
/^q/
- has 'th'
/th/
- has an 'q' or a 'Q'
/[qQ]/
- has a
*
in it/\*/
- another solution:
/[*]/
- starts with an 'q' or an 'Q'
/^[qQ]/
- has both 'a' and 'e' in it
$str =~ /a/ and $str =~ /e/
- has an 'a' and somewhere later an 'e'
/a.*e/
- does not have an 'a'
$str !~ /a/
Not good:/[^a]/
- does not have an 'a' nor 'e'
$str !~ /[ae]/
- has an 'a' but not 'e' $str =~ /a/ and $str !~ /e/
- has at least 2 consecutive vowels (a,e,i,o,u) like in the word "bear" /[aeiou]{2}/
- has at least 3 vowels /[aeiou].[aeiou].[aeiou]/
- has at least 6 characters /....../
- another solution: /.{6}/
- yet another solution: length($str) >= 6
- has at exactly 6 characters: length($str) == 6
- all the words with either 'Bar' or 'Baz' in them /Ba[rz]/
- all the rows with either 'apple pie' or 'banana pie' in them
if ($row =~ /apple pie/ or $row =~ /banana pie/){ }
- for each row print if it was apple or banana pie?
my $ok;
if ($row =~ /apple pie/) {
print "apple\n";
$ok = 1;
} elsif ($row =~ /banana pie/) {
print "banana\n";
$ok = 1;
}
Regular Expressions II
Grouping, alternatives
all the rows with either 'apple pie' or 'banana pie' in them
if ($row =~ /apple pie/ or $row =~ /banana pie/) {
}
Alternatives
if ($row =~ /apple pie|banana pie/) {
}
Move the common part in one place and limit the alternation to the part within the parentheses.
if ($row =~ /(apple|banana) pie/) {
}
Capturing
- \1
- $1
- ()
- capturing
- grouping
if ($row =~ /(apple|banana) pie/) {
print $1;
}
Has a double character (e.g. 'oo' in loop)
# Input: "my loop"
if ($line =~ /(.)\1/) {
print $1;
}
Print if the same word appears twice in the same line
my $str = " in this line there is this word twice ";
if ($str =~ / ([a-z]+) .* \1 /) {
print "$1\n";
}
/(.+).*\1/ # lines with anything more than once
/((.+).*\1)+/ # Syntax error ! Why ?
if ($line =~ /(.*)=(.*)/) {
print "left: $1\n";
print "right: $2\n";
}
Anchors
- anchors
- ^
- $
- \b
^ # at the beginning of the pattern means beginning of the string
$ # at the end of the pattern means the end of the string
/the/ # matches anywhere in the string:
# "atheneum", "thermostat", "the", "mathe"
/^the/ # matches only if the string starts with the
# "thermostat", "the"
/the$/ # matches only if the string ends with the
# "the", "mathe"
/^the$/ # matches only if the string
# "the"
/^The.*finished$/
# starts with "The" ends with "finished" and anything in between
\b # Word delimiter
/\bstruct\b/ # match every place the word "struct" but not
# "structure" or "construct"
Character classes
A list of optional characters within square brackets []
/a[bc]a/ # aba, aca
/a[2#=x?.]a/ # a2a, a#a, a=a, axa, a?a, a.a
# inside the character class most of the spec characters lose their
# special meaning BUT there are some new special characters
/a[2-8]a/ # is the same as /a[2345678]a/
/a[2-]a/ # a2a, a-a - has no special meaning at the ends
/a[-8]a/ # a8a, a-a
/a[6-C]a/ # a6a, a7a ... aCa
# characters from the ASCII table: 6789:;<=>?@ABC
/a[C-6]a/ # syntax error
/a[^xa]a/ # "aba", "aca" but not "aaa", "axa" what about "aa" ?
# ^ as the first character in a character class means
# a character that is not in the list
/a[a^x]a/ # aaa, a^a, axa
Special character classes
- \w
- \d
- \s
- ^
- $
| Expression | Meaning | Usage | | \w | Word characters: [a-zA-Z0-9_] | \w+ or [\w#-]+ | | \d | Digits: [0-9] | | | \s | [\f\t\n\r ] form-feed, tab, newline, carriage return and SPACE | | | \W | [^\w] | | | \D | [^\d] | | | \S | [^\s] | | | [:class:] | POSIX character classes (alpha, alnum...) | [:alpha:]+ or [[:alpha:]#-]+ | | \p{...} | Unicode definitions (IsAlpha, IsLower, IsHebrew, ...) | \p{IsHebrew}+ or [\p{IsHebrew}#!-] | | \P{...} | Negation of Unicode character class | |
See also perldoc perlre
and perldoc perluniintro
.
Exercise: Hex/Oct/Bin
Write functions that return true if the given value is a
- Hexadecimal number
- Octal number
- Binary number
Exercise: Number
Write a function that given a string it return true if the string is a number. As there might be several definitions of what is the number create several solutions one for each definition:
- Non negative integer.
- Integer. (Will you also allow + in front of the number or only - ?
- Real number. (Do you allow .3 ? What about 2. ?
- In scientific notation. (something like this: 2.123e4 )
Exercise: Roman numbers
Write functions that return true if the given value is a Roman Number. If you can do that maybe write another function to return the decimal value of the given number.
I, II, III, IV, V, VI, VII,....
I = 1
V = 5
X = 10
L = 50
C = 100
D = 500
M = 1000
Solution: Number
#!/usr/bin/perl
use strict;
use warnings;
while (my $number = <STDIN>) {
if (is_non_negative($number)) {
print "non negative integer without +- sign\n"; # 0, 3, 7
}
if (is_integer($number)) {
print "integer with optional +- sign\n"; # -1, +3
}
if (is_real($number)) {
print "real number with decimal point\n"; # 3.1, 0.0, .3, 2., -.7
}
if (is_exp($number)) {
print "exponential format\n"; # .1e
}
if (is_exp2($number)) {
print "exponential format (x)\n"; # .1e
}
}
sub is_non_negative { $_[0] =~ /^\d+$/ }
sub is_integer { $_[0] =~ /^[+-]?\d+$/ }
sub is_real { $_[0] =~ /\d/ and $_[0] =~ /^[+-]?\d*\.?\d*$/}
sub is_exp { $_[0] =~ /\d/ and $_[0] =~ /^[+-]?\d*\.?\d*(e[+-]?\d+)?$/}
sub is_exp2 { $_[0] =~ /\d/ and $_[0] =~ /^
[+-]? # optional + or - sign
\d* # 0 or more digits before the decimal point
\.? # optional decimal point
\d* # 0 or more digits after the decimal point
(e[+-]?\d+)? # optional "e" followed by an integer number
$/x}
Solution: Hex/Oct/Bin
#!/usr/bin/perl
use strict;
use warnings;
while (my $number = <STDIN>) {
if (is_hex($number)) {
print "Hexadecimal number\n"; # 0xAD37F
}
if (is_octal($number)) {
print "Octal number\n"; # 02432471
}
if (is_binary($number)) {
print "Binary number\n"; # 0b01110
}
}
sub is_hex { $_[0] =~ /^0x[\da-fA-F]+$/ }
sub is_octal { $_[0] =~ /^0[0-7]+$/ }
sub is_binary { $_[0] =~ /^0b[01]+$/ }
Solution: Roman numbers
#!/usr/bin/perl
use strict;
use warnings;
while (my $number = <STDIN>) {
# This solution only check is the string consists of characters
# used in as Roman numbers but does not check if the number is
# actually a valid number. (e.g. IVI is not valid)
# I yet to see a definition on how to validate a Roman number.
if (is_roman($number)) {
print "Roman number\n";
}
}
sub is_roman { $_[0] =~ /^[IVXLCDM]+$/ }
sub is_roman2 { $_[0] =~ /^
(M{0,4})
(CM|CD|D?C{0,3})
(XL|XC|L?X{0,3})
(IV|IX|V?I{0,3})
$/x }
Regular Expressions
Regexp::Common
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Common 'RE_ALL';
my $file = 'regexp_common.txt';
if (@ARGV) {
$file = shift;
}
open(my $data, '<', $file) or die "Could not open $file\n";
while (my $line = <$data>) {
chomp $line;
print "LINE: '$line'";
if ($line =~ RE_balanced(-parens=>'()')) {
print " ** balanced parentheses";
}
if ($line =~ RE_num_real()) {
print " ** a real number";
}
if ($line =~ RE_num_int()) {
print " ** an integer";
}
print "\n";
}
one
(two)
(three))
((three)
)four(
3.4
42
Options and modifiers
// is actually the same as m//
When using the m sign you can change the delimiters:
Let's say you would like to match lines with
/usr/bin/perl
if ($line =~ /\/usr\/bin\/test-perl/) {
}
if ($line =~ m{/usr/bin/perl}) {
}
/i Case sensitivity
$line = "Apple";
/apple/ # does not match
/apple/i # case insensitive will match
Matching quotes
In <this text> there are <some in marks>,
but there are more lines in this file and there
are more <such symbols>. How can someone match sentences?
A sentence - in our case, starts with a capital letter
and ends with a dot.
#!/usr/bin/perl
use strict;
use warnings;
my $filename = 'sentences.txt';
my $content = slurp($filename);
if ($content =~ /<.*>/) {
print "$&\n";
print "-----------\n";
}
# If we want the smallest:
if ($content =~ /<[^>]*>/) {
print "$&\n";
print "-----------\n";
}
# If we want the biggest
if ($content =~ /<.*>/s) {
print "$&\n";
print "-----------\n";
}
sub slurp {
my $file = shift;
open my $fh, '<', $file or die "Could not open '$file' $!";
local $/ = undef;
my $all = <$fh>;
close $fh;
return $all;
}
/s single line
. will match any character (including newline)
/m multiple lines
^ will match beginning of line
$ will match end of line
\A still matches beginning of string
\z
\Z
#!/usr/bin/perl
use strict;
use warnings;
#../regex/examples/text/american-english
my $filename = shift or die;
my $data;
{
open my $fh, '<', $filename or die;
local $/ = undef;
$data = <$fh>
}
if ($data =~ /(^a.*\nb.*\n)/mi) {
print $1;
}
/x enable white spaces and comments
/(X\d+).*\1/
/
(X\d+) # product number
.* # any character
\1 # the same product number
/x
Substitute
- s/PATTERN/REPLACEMENT/
- s{PATTERN}{REPLACEMENT}
$line = "abc123def";
$line =~ s/\d+/ /; # "abc def"
$line =~ s{
([a-z]*)
(\d*)
([a-z]*)
}
{$3$2$1}x; # "def123abc"
Global substitute
$line = "abc123def";
$line =~ s/.../x/; # "x123def";
$line =~ s/.../x/g; # "xxx";
$line =~ s/(.)(.)/$2$1/; # "bac123def"
$line =~ s/(.)(.)/$2$1/g; # "ba1c32edf"
Greedy quantifiers
use strict;
use warnings;
'xaaab' =~ /xa*/;
print "$&\n";
'xabxaab' =~ /xa*/;
print "$&\n";
'xabxaab' =~ /a*/;
print "$&\n";
/xa*/ on xaaab xaaa because it is greedy
/xa*/ on xabxaab xa at the beginning even though the other one is longer
/a*/ on xabxaab the empty string at the beginning of the string
minimal match
/a.*b/ axbzb
/a.*?b/ axbzb
/a.*b/
/a.*?b/
axy121413413bq
axyb121413413q
Replace spaces
s/^\s+// leading
s/\s+$// trailing
both ends:
s/^\s*(.*)\s*$/$1/ " abc " => "abc " because of the greediness
s/^\s*(.*?)\s*$/$1/ " abc " => "abc" minimal match
Replace string in assembly code
mv A, R3
mv R2, B
mv R1, R3
mv B, R4
add A, R1
add B, R1
add R1, R2
add R3, R3
add R21, X
add R12, Y
mv X, R2
#!/usr/bin/perl
use strict;
use warnings;
# assuming there are no R4 values then 4 substitutions will do
s/R1/R4/g
s/R3/R1/g
s/R2/R3/g
s/R4/R2/g
# or without any assumption and in one substitution:
my %map = (
R1 => 'R2',
R2 => 'R3',
R3 => 'R1',
);
s/\b(R[123])\b/$map{$1}/g;
s/\b(R1|R2|R3|R12)\b/$map{$1}/g;
my $regex = join "|", keys %map;
s/\b($regex)\b/$map{$1}/g;
Full example of previous
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp qw(read_file write_file);
my %conversion = (
R1 => 'R2',
R2 => 'R3',
R3 => 'R1',
R12 => 'R21',
R21 => 'R12',
);
replace(\%conversion, \@ARGV);
sub replace {
my ($map, $files) = @_;
my $regex = join "|", keys %$map;
my $ts = time;
foreach my $file (@$files) {
my $data = read_file($file);
$data =~ s/\b($regex)\b/$map->{$1}/g;
rename $file, "$file.$ts"; # backup with current timestamp
write_file( $file, $data);
}
}
split with regular expression
LIST = split REGEX, STRING;
fname = Foo
lname = Bar
email=foo@bar.com
#!/usr/bin/perl
use strict;
use warnings;
# data: field_value_pairs.txt
my $filename = shift or die "Usage: $0 filename\n";
open(my $fh, "<", $filename) or die "Could not open '$filename'\n";
while (my $line = <$fh>) {
chomp $line;
my ($field, $value) = split /\s*=\s*/, $line;
print "$value=$field\n";
}
Fixing dates
In the input we get dates like this 2010-7-5 but we would like to make sure we have two digits for both days and months: 2010-07-05
#!/usr/bin/perl
use strict;
use warnings;
sub fix_date {
my $str = shift;
$str =~ s/-(\d)\b/-0$1/g;
return $str;
}
my %dates = (
'2010-7-5' => '2010-07-05',
'2010-07-5' => '2010-07-05',
'2010-07-05' => '2010-07-05',
'2010-7-15' => '2010-07-15',
);
use Test::More;
plan tests => scalar keys %dates;
foreach my $in (sort keys %dates) {
my $result = fix_date($in);
is $result, $dates{$in}, $in;
# print " old: $in\n";
# print " new: $result\n";
# print " expected: $dates{$in}\n\n";
}
old: 2010-07-05
new: 2010-07-05
expected: 2010-07-05
old: 2010-07-5
new: 2010-07-05
expected: 2010-07-05
old: 2010-7-15
new: 2010-07-15
expected: 2010-07-15
old: 2010-7-5
new: 2010-07-05
expected: 2010-07-05
1..4
ok 1 - 2010-07-05
ok 2 - 2010-07-5
ok 3 - 2010-7-15
ok 4 - 2010-7-5
date.pl .. ok
All tests successful.
Files=1, Tests=4, 0 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU)
Result: PASS
Exercise: split CGI
Given a string that looks like this:
my $str = 'fname=Foo&lname=Bar&email=foo@bar.com';
Create a hash where the keys are fname, lname, email or if the string looks like this
my $str = 'title=Stargates&year=2005&chapter=03&bitrate=128';
then create a hash where the keys are title, year, chapter, bitrate Use a single statement (with split) to achieve this.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @input = (
'fname=Foo&lname=Bar&email=foo@bar.com',
'ip=127.0.0.1&machine=foobar',
);
foreach my $str (@input) {
process($str);
}
sub process {
my $str = shift;
my @pairs = split /&/, $str;
my %data;
foreach my $p (@pairs) {
my ($k, $v) = split /=/, $p;
$data{$k} = $v;
}
print Dumper \%data;
}
$VAR1 = {
'email' => 'foo@bar.com',
'lname' => 'Bar',
'fname' => 'Foo'
};
$VAR1 = {
'ip' => '127.0.0.1',
'machine' => 'foobar'
};
Exercise: basename/dirname
Create two functions basename()
and dirname()
Given a path such as /home/foo/.mozilla/cache/data.txt
the basename()
function should return the filename ( data.txt in the example).
The dirname()
function should return the full-path directory name
( /home/foo/.mozilla/cache
in the example.)
Exercise: Sort SNMP numbers
Given a file with SNMP numbers (one number on every line) print them in sorted order comparing the first number of each SNMP number first. If they are equal then comparing the second number, etc...
input:
1.2.7.6
4.5.7.23
1.2.7
1.12.23
2.3.5.7.10.8.9
output:
1.2.7
1.2.7.6
1.12.23
2.3.5.7.10.8.9
4.5.7.23
Exercise: parse hours log file and give report
The log file looks like this
{% embed include file="src/examples/regex-perl/timelog.log)
the report should look something like this:
09:20-11:00 Introduction
11:00-11:15 Exercises
11:15-11:35 Break
...
Solutions 95 minutes 9%
Break 65 minutes 6%
...
Exercise: Parse ini file
An ini file has sections starting by the name of the section in square brackets and within each section there are key = value pairs with optional spaces around the "=" sign. The keys can only contain letters, numbers, underscore or dash. In addition there can be empty lines and lines starting with # which are comments.
Given a filename, generate a 2 dimensional hash and then print it out with Data::Dumper. Example ini file:
{% embed include file="src/examples/regex-perl/inifile.ini)
Result
$VAR1 = {
'earth' => {
'base' => 'London',
'ship' => 'x-wing'
},
'alpha' => {
'base' => 'moon',
'ship' => 'alpha 3'
}
};
Exercise: parse perl file
Parse your perl files and print out the names of your variables. In the first version print out the scalar variables only. In the second version show all variables.
The user gives the name of the file on the command line.
Input file:
#!/usr/bin/perl
use strict;
use warnings;
my $filename = shift or die "Usage: $0 filename\n";
open(my $fh, "<", $filename) or die "Could not open '$filename'\n";
my @snmps = <$fh>;
chomp @snmps;
print join "\n", @snmps;
print "\n------------------\n";
my @sorted_snmps = sort by_snmp_number @snmps;
print join "\n", @sorted_snmps;
sub by_snmp_number {
return 0 if $a eq $b;
my @a = split /\./, $a;
my @b = split /\./, $b;
foreach my $i (0..@a-1) {
return 1 if $i >= @b;
next if $a[$i] == $b[$i];
return $a[$i] <=> $b[$i];
}
return -1;
}
print "\n------------------\n";
Output:
$filename
$0
$fh
@snmps
@sorted_snmps
$a
$b
@a
@b
$i
Solution: Split CGI
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @input = (
'fname=Foo&lname=Bar&email=foo@bar.com',
'ip=127.0.0.1&machine=foobar',
);
foreach my $str (@input) {
process($str);
}
sub process {
my $str = shift;
my %data = split /[=&]/, $str;
print Dumper \%data;
}
Solution: filename/dirname
#!/usr/bin/perl
use strict;
use warnings;
my $path = "/home/foo/.mozilla/cache/data.txt";
my $filename = ($path =~ m{([^/]*)$} ? $1 : "");
my $dirname = ($path =~ m{^(.*)/} ? $1 : "");
my ($file_name) = $path =~ m{([^/]*)$};
my ($dir_name) = $path =~ m{^(.*)/};
my ($dir, $file) = $path =~ m{^(.*)/(.*)$};
print "$path\n";
print "---------\n";
print "$filename\n";
print "$dirname\n";
print "---------\n";
print "$file_name\n";
print "$dir_name\n";
print "---------\n";
print "$file\n";
print "$dir\n";
print "---------\n";
use File::Basename;
print basename($path) . "\n";
print dirname($path) . "\n";
Solution: Sort SNMP numbers
#!/usr/bin/perl
use strict;
use warnings;
my $filename = shift or die "Usage: $0 filename\n";
open(my $fh, "<", $filename) or die "Could not open '$filename'\n";
my @snmps = <$fh>;
chomp @snmps;
print join "\n", @snmps;
print "\n------------------\n";
my @sorted_snmps = sort { cmp_snmp($a, $b) } @snmps;
print join "\n", @sorted_snmps;
sub cmp_snmp {
my ($first, $second) = @_;
if ($first eq $second) {
return 0;
}
my @F = split /\./, $first;
my @S = split /\./, $second;
foreach my $i (0..@F-1) {
if ($i >= @S) {
return 1;
}
if ($F[$i] < $S[$i]) {
return -1;
}
if ($F[$i] > $S[$i]) {
return 1;
}
}
return -1;
}
print "\n------------------\n";
Solution: parse hours log file and give report
#!/usr/bin/perl
use strict;
use warnings;
my $filename = shift or die "Usage: $0 filename\n";
my @entries;
my %stat;
open(my $fh, "<", $filename) or die "Could not open '$filename' $!";
while (my $line = <$fh>) {
chomp $line;
next if $line =~ /^#/;
if ($line =~ /\S/) {
push @entries, $line;
next;
}
process_day();
@entries = ();
}
process_day(); # in case there is no empty line after the last line
foreach my $title (keys %stat) {
printf "%-25s %4s minutes %3s%%\n",
$title, $stat{$title}, int(100 * $stat{$title} / $stat{Total});
}
sub process_day {
my @day;
foreach my $e (@entries) {
my ($time, $title) = split / /, $e, 2;
if (@day) {
$day[-1]{end} = $time;
my ($start_hour, $start_min) = split /:/, $day[-1]{start};
my ($end_hour, $end_min) = split /:/, $day[-1]{end};
$day[-1]{total} = $end_hour*60+$end_min - ($start_hour*60+$start_min);
if ($day[-1]{title} =~ /Break|Exercises|Solutions/) {
$stat{$day[-1]{title}} += $day[-1]{total};
} else {
$stat{Lectures} += $day[-1]{total};
}
$stat{Total} += $day[-1]{total};
print "$day[-1]{start}-$day[-1]{end} $day[-1]{title}\n";
}
if ($title ne "End") {
push @day, {
start => $time,
title => $title,
};
}
}
print "\n";
return;
}
Solution: Parse ini file
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my $filename = shift or die "Usage: $0 filename\n";
open my $fh, '<', $filename or die "Could not open '$filename' $!";
my $section;
my %data;
while (my $line = <$fh>) {
if ($line =~ /^\s*#/) {
next; # skip comments
}
if ($line =~ /^\s*$/) {
next; # skip empty lines
}
if ($line =~ /^\[(.*)\]\s*$/) {
$section = $1;
next;
}
if ($line =~ /^([^=]+?)\s*=\s*(.*?)\s*$/) {
my ($field, $value) = ($1, $2);
if (not defined $section) {
warn "Line outside of seciton '$line'\n";
next;
}
$data{$section}{$field} = $value;
}
}
print Dumper \%data;
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);
use Config::Tiny;
my $filename = shift or die "Usage: $0 filename\n";
open my $fh, '<', $filename or die "Could not open '$filename' $!";
my $data = Config::Tiny->read( $filename );
print Dumper $data;
Solution: parse perl file
#!/usr/bin/perl
use strict;
use warnings;
# scalars only but finds only the first variable on every line
#while (<>) {
# if (/(\$\w+)\b/) {
# if (not defined $h{$ARGV}{$1}) {
# $h{$ARGV}{$1}=1;
# print "$ARGV: $1\n";
# }
# }
#}
# scalars $ or arrays @ or hashes %
# including all variables on every line
my %h;
while (my $line = <>) {
if (my @vars = $line =~/[\$@%]\w+\b/g) {
foreach my $v (@vars) {
if (not defined $h{$ARGV}{$v}) {
$h{$ARGV}{$v}=1;
print "$ARGV: $v\n";
}
}
}
}
Regular Expressions Cheat sheet
| Expression | Meaning | | a | Just an 'a' character | | . | any character except new-line | | [bgh.] | one of the chars listed in the character class b,g,h or . | | [b-h] | The same as [bcdefgh] | | [a-z] | Lower case letters | | [b-] | The letter b or - | | [^bx] | Anything except b or x | | \w | Word characters: [a-zA-Z0-9_] | | \d | Digits: [0-9] | | \s | [\f\t\n\r ] form-feed, tab, newline, carriage return and SPACE | | \W | [^\w] | | \D | [^\d] | | \S | [^\s] | | [:class:] | POSIX character classes (alpha, alnum...) | | \p{...} | Unicode definitions (IsAlpha, IsLower, IsHebrew, ...) | | a* | 0-infinite 'a' characters | | a+ | 1-infinite 'a' characters | | a? | 0-1 'a' characters | | a{n,m} | n-m 'a' characters | | ( ) | Grouping and capturing | | | | Alternation | | \1, \2 | Capture buffers | | $1, $2 | Capture variables | | ^ $ | Beginning and end of string anchors |
See also perldoc perlre