Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Regular Expressions in Perl

Regular Expressions I

What are regexes good for ?

  • Decide if a string is part of a larger string
  • Validate the format of some value (string) (e.g. is it a decimal number?, is it a hex?)
  • Find if there are repetitions in a string
  • Analyze a string and fetch parts of if given some loose description

Examples

Is the input given by the user a number?

(BTW which one is a number:  23, 2.3,  2.3.4, 2.4e3, abc ?)

Is there a word in the file that is repeated 3 or more times?

Replaces all occurrences of Perl or perl by Java ...
... but avoid replacing Perla.


Given a text message fetch all the phone numbers:
Fetch numbers that look like 09-1234567
then also fetch +972-2-1234567
and maybe also 09-123-4567

Check if in a given text passing your network there are credit card numbers....

Given a text find if the word "password" is in it and fetch the surrounding text.

Given a log file like this:

[Tue Jun 12 00:01:00 2006] - (3423) - INFO - ERROR log restarted
[Tue Jun 12 09:08:17 2006] - (3423) - INFO - System starts to work
[Tue Jun 13 08:07:16 2006] - (3423) - ERROR - Something is wrong

provide statistics on how many of the different levels of log messages
were seen. Separate the log messages into files.

Where can I use it ?

  • grep, egrep
  • Unix tools such as sed, awk, procmail
  • vi, emacs, other editors
  • text editors such as Multi-Edit
  • .NET languages: C#, C++, VB.NET
  • Java
  • Perl
  • PHP, Python, Ruby, Tcl
  • Word, Open Office ...
  • PCRE

Introduction to Regexes

  • =~
  • !~
  • //
my $str = "Some string here";
if ($str =~ /m/) {
    print "There is a match\n";
}

if ($str !~ /a/) {
    print "No match\n";
}

# which is the same as

if (not $str =~ /a/) {
    print "No match\n";
}

grep

The Unix(ish) grep command was named after the g/re/p command of the UNIX ed editor.

$ grep REGEX file(s)

Tools

Linux

  • kregexpeditor
  • regexxer

Windows

Commercial on Windows

Both Linux and Windows

Find a string in a file

#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift or die "$0 FILENAME\n";
open my $fh, '<', $filename or die "Could not open '$filename'\n";

while (my $line = <$fh>) {
    if ($line =~ /REGEX/) {
        print $line;
    }
}

Regex Examples: single character

Any line that has an 'x' in it.

Regex: /x/
Input: "abcde"
Input: "abxcde"
Input: "xabcde"
Input: "xabcxde"

Any line that starts with an 'x'.

Regex: /^x/
Input: "abcde"
Input: "abxcde"
Input: "xabcde"
Input: " xabcde"
Input: "^xabcde"

^ at the beginning of the regular expression means, match at the beginning of the string.

Regex Examples: more characters

Any line that has an 'abc' in it.

Regex: /abc/
Input: "abc xy"
Input: "abcde"
Input: "abxcxbdabce"
Input: "xabcde"
Input: "xa b c de"

Any line that starts with an 'abc'.

Regex: /^abc/
Input: "abcde"
Input: "abxcxbde"
Input: "xabcde"

Regex Examples dot .

Any line that has any of the xax, xbx, ..., that is any character between two x-es.

Regex: /x.x/
Input: "abcde"
Input: "abxcxbde"
Input: "xabcde"
Input: "xabxcxde"
Input: "adax xabcde"
Input: "adax.xabcde"
Input: "x.x"
Input: "xx"

Only lines that have x.x (A real . between two x-es.)

Regex: /x\.x/
The special characters are: . * + ? ^ $ \ ( ) [ ] | { }    and the delimiter: /

Some of the characters change their special meaning based on position.

Regex Examples (character class)

Any line that has any of the #a#, #b#, #c#, #d#, #e#, #f#, #@# or #.#

Regex: /#[abcdef@.]#/
Input: "ab #q# "
Input: "ab #z#a# "
Input: "ab #.# "
Input: "ab ## "
Input: "#a#"
Input: "##"
Input: "###"
Regex: /#[a-f@.]#/

Regex Examples (^ in character class)

^ as the first character in a character class means "a character that is not listed in this character class"

Regex: /#[^abc]#/
Input: "abc #a# z"
Input: "abc #z# z"
Input: "#z#"
Input: "##"

Optional character

  • ?

Match the word color or the word colour.

Regex: /colou?r/
Input: color
Input: colour
Input: colouur

Regex Examples quantifiers

  • quantifiers
  • ?

Any line with two - -es with anything in between.

Regex: /-.*-/
Input: "ab"
Input: "ab - cde"
Input: "ab - qqqrq -"
Input: "ab -- cde"
Input: "--"

Quantifiers

Quantifiers apply to the thing in front of them

/ax*a/      # aa, axa, axxa, axxxa, ...
/ax+a/      #     axa, axxa, axxxa, ...
/ax?a/      # aa, axa
/ax{2,4}a/  #          axxa, axxxa, axxxxa
/ax{3,}a/   #                axxxa, axxxxa, ...
/ax{17}a/   #                                 axxxxxxxxxxxxxxxxxa
*      0-
+      1-
?      0-1
{n,m}  n-m
{n,}   n-
{n}    n

Quantifiers on character classes

Regex: /-[abc]-/
Input: "-a-"
Input: "-b-"
Input: "-x-"
Input: "-ab-"
Regex: /-[abc]+-/
Input: "-a-"
Input: "-b-"
Input: "-aa-"
Input: "-ab-"
Input: "-x-"
Input: "--"

Exercises: Regular expressions

Pick up a file with some text in it. (e.g. examples/regex-perl/text.txt ). Write a script (one for each item) that prints out every line from the file that matches the requirement. You can use the script at the end of the page as a starting point but you will have to change it!

  • has a 'q'
  • starts with a 'q'
  • has 'th'
  • has an 'q' or a 'Q'
  • has a '*' in it
  • starts with an 'q' or an 'Q'
  • has both 'a' and 'e' in it
  • has an 'a' and somewhere later an 'e'
  • does not have an 'a'
  • does not have an 'a' nor 'e'
  • has an 'a' but not 'e'
  • has at least 2 consecutive vowels (a,e,i,o,u) like in the word "bear"
  • has at least 3 vowels
  • has at least 6 characters
  • has at exactly 6 characters
  • all the words with either 'Bar' or 'Baz' in them
  • all the rows with either 'apple pie' or 'banana pie' in them
  • for each row print if it was apple or banana pie?
  • Bonus: Print if the same word appears twice in the same line
  • Bonus: has a double character (e.g. 'oo')
#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift or die "$0 FILENAME\n";
open my $fh, '<', $filename or die "Could not open '$filename'\n";

while (my $line = <$fh>) {
    if ($line =~ /REGEX1/) {
        print "has an a: $line";
    }

    if ($line =~ /REGEX2/) {
        print "starts with an a: $line";
    }
}

Solutions: Regular expressions

  • has a 'q' /q/
  • starts with a 'q' /^q/
  • has 'th' /th/
  • has an 'q' or a 'Q' /[qQ]/
  • has a * in it /\*/
  • another solution: /[*]/
  • starts with an 'q' or an 'Q' /^[qQ]/
  • has both 'a' and 'e' in it $str =~ /a/ and $str =~ /e/
  • has an 'a' and somewhere later an 'e' /a.*e/
  • does not have an 'a' $str !~ /a/ Not good: /[^a]/
  • does not have an 'a' nor 'e' $str !~ /[ae]/
  • has an 'a' but not 'e' $str =~ /a/ and $str !~ /e/
  • has at least 2 consecutive vowels (a,e,i,o,u) like in the word "bear" /[aeiou]{2}/
  • has at least 3 vowels /[aeiou].[aeiou].[aeiou]/
  • has at least 6 characters /....../
  • another solution: /.{6}/
  • yet another solution: length($str) >= 6
  • has at exactly 6 characters: length($str) == 6
  • all the words with either 'Bar' or 'Baz' in them /Ba[rz]/
  • all the rows with either 'apple pie' or 'banana pie' in them if ($row =~ /apple pie/ or $row =~ /banana pie/){ }
  • for each row print if it was apple or banana pie?
my $ok;
if ($row =~ /apple pie/) {
     print "apple\n";
     $ok = 1;
} elsif ($row =~ /banana pie/) {
     print "banana\n";
     $ok = 1;
}

Regular Expressions II

Grouping, alternatives

all the rows with either 'apple pie' or 'banana pie' in them

if ($row =~ /apple pie/ or $row =~ /banana pie/) {
}

Alternatives

if ($row =~ /apple pie|banana pie/) {
}

Move the common part in one place and limit the alternation to the part within the parentheses.

if ($row =~ /(apple|banana) pie/) {
}

Capturing

  • \1
  • $1
  • ()
  • capturing
  • grouping
if ($row =~ /(apple|banana) pie/) {
    print $1;
}
Has a double character (e.g. 'oo' in loop)
# Input:   "my loop"

if ($line =~ /(.)\1/) {
    print $1;
}
Print if the same word appears twice in the same line
my $str = " in this line there is this word twice ";
if ($str =~ / ([a-z]+) .* \1 /) {
    print "$1\n";
}
/(.+).*\1/        # lines with anything more than once
/((.+).*\1)+/     # Syntax error ! Why ?

if ($line =~ /(.*)=(.*)/) {
    print "left:  $1\n";
    print "right: $2\n";
}

Anchors

  • anchors
  • ^
  • $
  • \b
^            # at the beginning of the pattern means beginning of the string
$            # at the end of the pattern means the end of the string

/the/        # matches anywhere in the string:
             # "atheneum", "thermostat", "the", "mathe"

/^the/       # matches only if the string starts with the
             # "thermostat", "the"

/the$/       # matches only if the string ends with the
             # "the", "mathe"

/^the$/      # matches only if the string
             # "the"

/^The.*finished$/
             # starts with  "The" ends with "finished" and anything in between


\b           # Word delimiter
/\bstruct\b/ # match every place the word "struct" but not
             # "structure" or "construct"

Character classes

A list of optional characters within square brackets []

/a[bc]a/      # aba, aca
/a[2#=x?.]a/  # a2a, a#a, a=a, axa, a?a, a.a
              # inside the character class most of the spec characters lose their
              # special meaning  BUT there are some new special characters
/a[2-8]a/     # is the same as /a[2345678]a/
/a[2-]a/      # a2a, a-a        - has no special meaning at the ends
/a[-8]a/      # a8a, a-a
/a[6-C]a/     # a6a, a7a ... aCa
              #      characters from the ASCII table: 6789:;&lt;=&gt;?@ABC
/a[C-6]a/     # syntax error

/a[^xa]a/     # "aba", "aca"  but not "aaa", "axa"    what about "aa" ?
              # ^ as the first character in a character class means
              # a character that is not in the list
/a[a^x]a/     # aaa, a^a, axa

Special character classes

  • \w
  • \d
  • \s
  • ^
  • $

| Expression | Meaning | Usage | | \w | Word characters: [a-zA-Z0-9_] | \w+ or [\w#-]+ | | \d | Digits: [0-9] | | | \s | [\f\t\n\r ] form-feed, tab, newline, carriage return and SPACE | | | \W | [^\w] | | | \D | [^\d] | | | \S | [^\s] | | | [:class:] | POSIX character classes (alpha, alnum...) | [:alpha:]+ or [[:alpha:]#-]+ | | \p{...} | Unicode definitions (IsAlpha, IsLower, IsHebrew, ...) | \p{IsHebrew}+ or [\p{IsHebrew}#!-] | | \P{...} | Negation of Unicode character class | |

See also perldoc perlre and perldoc perluniintro.

Exercise: Hex/Oct/Bin

Write functions that return true if the given value is a

  • Hexadecimal number
  • Octal number
  • Binary number

Exercise: Number

Write a function that given a string it return true if the string is a number. As there might be several definitions of what is the number create several solutions one for each definition:

  • Non negative integer.
  • Integer. (Will you also allow + in front of the number or only - ?
  • Real number. (Do you allow .3 ? What about 2. ?
  • In scientific notation. (something like this: 2.123e4 )

Exercise: Roman numbers

Write functions that return true if the given value is a Roman Number. If you can do that maybe write another function to return the decimal value of the given number.

I, II, III, IV, V, VI, VII,....
   I = 1
   V = 5
   X = 10
   L = 50
   C = 100
   D = 500
   M = 1000

Solution: Number

#!/usr/bin/perl
use strict;
use warnings;

while (my $number = <STDIN>) {
    if (is_non_negative($number)) {
        print "non negative integer without +- sign\n";   # 0, 3, 7
    }
    if (is_integer($number)) {
        print "integer with optional +- sign\n";    # -1, +3
    }
    if (is_real($number)) {
        print "real number with decimal point\n";   # 3.1, 0.0,  .3,  2., -.7
    }
    if (is_exp($number)) {
        print "exponential format\n";               # .1e
    }
    if (is_exp2($number)) {
        print "exponential format (x)\n";           # .1e
    }
}
sub is_non_negative { $_[0] =~ /^\d+$/ }
sub is_integer      { $_[0] =~ /^[+-]?\d+$/ }
sub is_real         { $_[0] =~ /\d/ and $_[0] =~ /^[+-]?\d*\.?\d*$/}
sub is_exp          { $_[0] =~ /\d/ and $_[0] =~ /^[+-]?\d*\.?\d*(e[+-]?\d+)?$/}
sub is_exp2         { $_[0] =~ /\d/ and $_[0] =~ /^
            [+-]?            # optional + or - sign 
            \d*              # 0 or more digits before the decimal point
            \.?              # optional decimal point
            \d*              # 0 or more digits after the decimal point
            (e[+-]?\d+)?     # optional "e" followed by an integer number
            $/x}

Solution: Hex/Oct/Bin

#!/usr/bin/perl
use strict;
use warnings;

while (my $number = <STDIN>) {
    if (is_hex($number)) {
        print "Hexadecimal number\n";    # 0xAD37F
    }
    if (is_octal($number)) {
        print "Octal number\n";          # 02432471
    }
    if (is_binary($number)) {
        print "Binary number\n";         # 0b01110
    }
}

sub is_hex    { $_[0] =~ /^0x[\da-fA-F]+$/ }
sub is_octal  { $_[0] =~ /^0[0-7]+$/       } 
sub is_binary { $_[0] =~ /^0b[01]+$/       }

Solution: Roman numbers

#!/usr/bin/perl
use strict;
use warnings;

while (my $number = <STDIN>) {

    # This solution only check is the string consists of characters
    # used in as Roman numbers but does not check if the number is
    # actually a valid number. (e.g. IVI is not valid)
    # I yet to see a definition on how to validate a Roman number.
    if (is_roman($number)) {
        print "Roman number\n";
    }
}

sub is_roman    { $_[0] =~ /^[IVXLCDM]+$/    }

sub is_roman2   { $_[0] =~ /^
    (M{0,4})
    (CM|CD|D?C{0,3})
    (XL|XC|L?X{0,3})
    (IV|IX|V?I{0,3})
    $/x  }

Regular Expressions

Regexp::Common

#!/usr/bin/perl
use strict;
use warnings;

use Regexp::Common 'RE_ALL';

my $file = 'regexp_common.txt';
if (@ARGV) {
    $file = shift;
}
open(my $data, '<', $file) or die "Could not open $file\n";

while (my $line = <$data>) {
    chomp $line;
    print "LINE: '$line'";
    if ($line =~ RE_balanced(-parens=>'()')) {
        print "  ** balanced parentheses";
    }
    if ($line =~ RE_num_real()) {
        print "  ** a real number";
    }
    if ($line =~ RE_num_int()) {
        print "  ** an integer";
    }
    print "\n";
}



one
(two)
(three))
((three)
)four(
3.4
42

Options and modifiers

//    is actually the same as    m//
When using the m sign you can change the delimiters:
Let's say you would like to match lines with

/usr/bin/perl
if ($line =~ /\/usr\/bin\/test-perl/) {
}

if ($line =~ m{/usr/bin/perl}) {
}

/i Case sensitivity

$line = "Apple";
/apple/      # does not match
/apple/i     # case insensitive will match

Matching quotes

In <this text> there are <some in marks>,
but there are more lines in this file and there
are more <such symbols>. How can someone match sentences?
A sentence - in our case, starts with a capital letter
and ends with a dot.
#!/usr/bin/perl
use strict;
use warnings;

my $filename = 'sentences.txt';

my $content = slurp($filename);

if ($content =~ /<.*>/) {
    print "$&\n";
    print "-----------\n";
}

# If we want the smallest:
if ($content =~ /<[^>]*>/) {
    print "$&\n";
    print "-----------\n";
}

# If we want the biggest
if ($content =~ /<.*>/s) {
    print "$&\n";
    print "-----------\n";
}


sub slurp {
    my $file = shift;
    open my $fh, '<', $file or die "Could not open '$file' $!";
    local $/ = undef;
    my $all = <$fh>;
    close $fh;
    return $all;
}

/s single line

. will match any character (including newline)

/m multiple lines

^ will match beginning of line
$ will match end of line

\A still matches beginning of string
\z
\Z
#!/usr/bin/perl
use strict;
use warnings;

#../regex/examples/text/american-english
my $filename = shift or die;

my $data;
{
    open my $fh, '<', $filename or die;
    local $/ = undef;
    $data = <$fh>
}
if ($data =~ /(^a.*\nb.*\n)/mi) {
    print $1;
}


/x enable white spaces and comments

/(X\d+).*\1/

/
 (X\d+)         # product number
 .*             # any character
 \1             # the same product number
/x

Substitute

  • s/PATTERN/REPLACEMENT/
  • s{PATTERN}{REPLACEMENT}
$line = "abc123def";

$line =~ s/\d+/ /;       # "abc def"


$line =~ s{
      ([a-z]*)
      (\d*)
      ([a-z]*)
      }
      {$3$2$1}x;        # "def123abc"

Global substitute

$line = "abc123def";

$line =~ s/.../x/;           # "x123def";

$line =~ s/.../x/g;          # "xxx";


$line =~ s/(.)(.)/$2$1/;     # "bac123def"

$line =~ s/(.)(.)/$2$1/g;    # "ba1c32edf"

Greedy quantifiers

use strict;
use warnings;

'xaaab' =~ /xa*/;
print "$&\n";

'xabxaab' =~ /xa*/;
print "$&\n";

'xabxaab' =~ /a*/;
print "$&\n";

/xa*/ on xaaab      xaaa  because it is greedy
/xa*/ on xabxaab    xa at the beginning even though the other  one is longer
/a*/  on xabxaab    the empty string at the beginning of the string

minimal match

/a.*b/   axbzb
/a.*?b/  axbzb

/a.*b/
/a.*?b/
   axy121413413bq
   axyb121413413q

Replace spaces

s/^\s+//        leading
s/\s+$//        trailing

both ends:

s/^\s*(.*)\s*$/$1/   " abc " =>  "abc "  because of the greediness
s/^\s*(.*?)\s*$/$1/   " abc " =>  "abc"  minimal match

Replace string in assembly code

mv A, R3
mv R2, B
mv R1, R3
mv B, R4
add A, R1
add B, R1
add R1, R2
add R3, R3
add R21, X
add R12, Y
mv X, R2
#!/usr/bin/perl
use strict;
use warnings;

# assuming there are no R4 values then 4 substitutions will do
s/R1/R4/g
s/R3/R1/g
s/R2/R3/g
s/R4/R2/g



# or without any assumption and in one substitution:
my %map = (
    R1  => 'R2',
    R2  => 'R3',
    R3  => 'R1',
);

s/\b(R[123])\b/$map{$1}/g;


s/\b(R1|R2|R3|R12)\b/$map{$1}/g;

my $regex = join "|", keys %map;
s/\b($regex)\b/$map{$1}/g;


Full example of previous

#!/usr/bin/perl
use strict;
use warnings;

use File::Slurp qw(read_file write_file);


my %conversion = (
    R1  => 'R2',
    R2  => 'R3',
    R3  => 'R1',
    R12 => 'R21',
    R21 => 'R12',
);

replace(\%conversion, \@ARGV);

sub replace {
    my ($map, $files) = @_;

    my $regex = join "|", keys %$map;

    my $ts = time;

    foreach my $file (@$files) {
        my $data = read_file($file);
        $data =~ s/\b($regex)\b/$map->{$1}/g;
        rename $file, "$file.$ts";       # backup with current timestamp
        write_file( $file, $data);
    }
}


split with regular expression

LIST = split REGEX, STRING;
fname    =    Foo
lname    = Bar
email=foo@bar.com
#!/usr/bin/perl
use strict;
use warnings;

# data: field_value_pairs.txt
my $filename = shift or die "Usage: $0 filename\n";

open(my $fh, "<", $filename) or die "Could not open '$filename'\n";
while (my $line = <$fh>) {
    chomp $line;
    my ($field, $value) = split /\s*=\s*/, $line;
    print "$value=$field\n";
}

Fixing dates

In the input we get dates like this 2010-7-5 but we would like to make sure we have two digits for both days and months: 2010-07-05

#!/usr/bin/perl
use strict;
use warnings;

sub fix_date {
    my $str = shift;
    $str =~ s/-(\d)\b/-0$1/g;
    return $str;
}


my %dates = (
    '2010-7-5'   => '2010-07-05',
    '2010-07-5'  => '2010-07-05',
    '2010-07-05' => '2010-07-05',
    '2010-7-15'  => '2010-07-15',
);

use Test::More;
plan tests => scalar keys %dates;

foreach my $in (sort keys %dates) {
    my $result = fix_date($in);

    is $result, $dates{$in}, $in;

#    print  "      old: $in\n";
#    print  "      new: $result\n";
#    print  " expected: $dates{$in}\n\n";
}

      old: 2010-07-05  
      new: 2010-07-05
 expected: 2010-07-05

      old: 2010-07-5   
      new: 2010-07-05
 expected: 2010-07-05

      old: 2010-7-15   
      new: 2010-07-15
 expected: 2010-07-15

      old: 2010-7-5    
      new: 2010-07-05
 expected: 2010-07-05

1..4
ok 1 - 2010-07-05
ok 2 - 2010-07-5
ok 3 - 2010-7-15
ok 4 - 2010-7-5
date.pl .. ok
All tests successful.
Files=1, Tests=4,  0 wallclock secs ( 0.05 usr +  0.00 sys =  0.05 CPU)
Result: PASS

Exercise: split CGI

Given a string that looks like this:

my $str = 'fname=Foo&amp;lname=Bar&amp;email=foo@bar.com';

Create a hash where the keys are fname, lname, email or if the string looks like this

my $str = 'title=Stargates&amp;year=2005&amp;chapter=03&amp;bitrate=128';

then create a hash where the keys are title, year, chapter, bitrate Use a single statement (with split) to achieve this.

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;

my @input = (
    'fname=Foo&lname=Bar&email=foo@bar.com',
    'ip=127.0.0.1&machine=foobar',
);

foreach my $str (@input) {
    process($str);
}

sub process {
    my $str = shift;

    my @pairs = split /&/, $str;
    my %data;
    foreach my $p (@pairs) {
        my ($k, $v) = split /=/, $p;
        $data{$k} = $v;
    }
    print Dumper \%data;
}

$VAR1 = {
          'email' => 'foo@bar.com',
          'lname' => 'Bar',
          'fname' => 'Foo'
        };
$VAR1 = {
          'ip' => '127.0.0.1',
          'machine' => 'foobar'
        };

Exercise: basename/dirname

Create two functions basename() and dirname() Given a path such as /home/foo/.mozilla/cache/data.txt the basename() function should return the filename ( data.txt in the example). The dirname() function should return the full-path directory name ( /home/foo/.mozilla/cache in the example.)

Exercise: Sort SNMP numbers

Given a file with SNMP numbers (one number on every line) print them in sorted order comparing the first number of each SNMP number first. If they are equal then comparing the second number, etc...

input:

1.2.7.6
4.5.7.23
1.2.7
1.12.23
2.3.5.7.10.8.9

output:

1.2.7
1.2.7.6
1.12.23
2.3.5.7.10.8.9
4.5.7.23

Exercise: parse hours log file and give report

The log file looks like this

{% embed include file="src/examples/regex-perl/timelog.log)

the report should look something like this:

09:20-11:00 Introduction
11:00-11:15 Exercises
11:15-11:35 Break
...


Solutions                   95 minutes   9%
Break                       65 minutes   6%
...

Exercise: Parse ini file

An ini file has sections starting by the name of the section in square brackets and within each section there are key = value pairs with optional spaces around the "=" sign. The keys can only contain letters, numbers, underscore or dash. In addition there can be empty lines and lines starting with # which are comments.

Given a filename, generate a 2 dimensional hash and then print it out with Data::Dumper. Example ini file:

{% embed include file="src/examples/regex-perl/inifile.ini)

Result

$VAR1 = {
          'earth' => {
                       'base' => 'London',
                       'ship' => 'x-wing'
                     },
          'alpha' => {
                       'base' => 'moon',
                       'ship' => 'alpha 3'
                     }
        };

Exercise: parse perl file

Parse your perl files and print out the names of your variables. In the first version print out the scalar variables only. In the second version show all variables.

The user gives the name of the file on the command line.

Input file:

#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift or die "Usage: $0 filename\n";
open(my $fh, "<", $filename) or die "Could not open '$filename'\n";

my @snmps = <$fh>;
chomp @snmps;

print join "\n", @snmps;
print "\n------------------\n";
my @sorted_snmps = sort by_snmp_number @snmps;
print join "\n", @sorted_snmps;

sub by_snmp_number {
    return 0 if $a eq $b;
    my @a = split /\./, $a;
    my @b = split /\./, $b;
    foreach my $i (0..@a-1) { 
        return 1 if $i >= @b;
        next if $a[$i] == $b[$i];
        return $a[$i] <=> $b[$i];
    }
    return -1;
}

print "\n------------------\n";

Output:

$filename
$0
$fh
@snmps
@sorted_snmps
$a
$b
@a
@b
$i

Solution: Split CGI

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;

my @input = (
    'fname=Foo&lname=Bar&email=foo@bar.com',
    'ip=127.0.0.1&machine=foobar',
);

foreach my $str (@input) {
    process($str);
}

sub process {
    my $str = shift;

    my %data = split /[=&]/, $str;

    print Dumper \%data;
}

Solution: filename/dirname

#!/usr/bin/perl
use strict;
use warnings;

my $path = "/home/foo/.mozilla/cache/data.txt";

my $filename = ($path =~ m{([^/]*)$} ? $1 : "");
my $dirname  = ($path =~ m{^(.*)/}   ? $1 : "");

my ($file_name) = $path =~ m{([^/]*)$};
my ($dir_name)  = $path =~ m{^(.*)/};

my ($dir, $file) = $path =~ m{^(.*)/(.*)$};

print "$path\n";
print "---------\n";

print "$filename\n";
print "$dirname\n";
print "---------\n";

print "$file_name\n";
print "$dir_name\n";
print "---------\n";

print "$file\n";
print "$dir\n";
print "---------\n";



use File::Basename;
print basename($path) . "\n";
print dirname($path) . "\n";



Solution: Sort SNMP numbers

#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift or die "Usage: $0 filename\n";
open(my $fh, "<", $filename) or die "Could not open '$filename'\n";

my @snmps = <$fh>;
chomp @snmps;

print join "\n", @snmps;
print "\n------------------\n";
my @sorted_snmps = sort { cmp_snmp($a, $b) } @snmps;
print join "\n", @sorted_snmps;

sub cmp_snmp {
    my ($first, $second) = @_;

    if ($first eq $second) {
        return 0;
    }

    my @F = split /\./, $first;
    my @S = split /\./, $second;
    
    foreach my $i (0..@F-1) { 
        if ($i >= @S) {
            return 1;
        }
        if ($F[$i] < $S[$i]) {
            return -1;
        }
        if ($F[$i] > $S[$i]) {
            return 1;
        }
    }
    return -1;
}

print "\n------------------\n";

Solution: parse hours log file and give report

#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift or die "Usage: $0 filename\n";

my @entries;
my %stat;

open(my $fh, "<", $filename) or die "Could not open '$filename' $!";

while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^#/;
    if ($line =~ /\S/) {
        push @entries, $line;
        next;
    }
    process_day();
    @entries = ();
}

process_day(); # in case there is no empty line after the last line

foreach my $title (keys %stat) {
    printf "%-25s %4s minutes %3s%%\n", 
        $title, $stat{$title}, int(100 * $stat{$title} / $stat{Total});
}



sub process_day {
    my @day;
    foreach my $e (@entries) {
        my ($time, $title) = split / /, $e, 2;
        if (@day) {
            $day[-1]{end}    = $time;
            my ($start_hour, $start_min) = split /:/, $day[-1]{start};
            my ($end_hour, $end_min)     = split /:/, $day[-1]{end};
            $day[-1]{total} = $end_hour*60+$end_min - ($start_hour*60+$start_min);

            if ($day[-1]{title} =~ /Break|Exercises|Solutions/) {
                $stat{$day[-1]{title}} += $day[-1]{total};
            } else {
                $stat{Lectures} += $day[-1]{total};
            }
            $stat{Total} += $day[-1]{total};

            print "$day[-1]{start}-$day[-1]{end} $day[-1]{title}\n";
        }
        if ($title ne "End") {
            push @day, {
                start => $time,
                title => $title,
                };
        }
    }
    print "\n";
    return;
}

Solution: Parse ini file

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my $filename = shift or die "Usage: $0 filename\n";
open my $fh, '<', $filename or die "Could not open '$filename' $!";

my $section;

my %data;

while (my $line = <$fh>) {
    if ($line =~ /^\s*#/) {
        next;        # skip comments
    }
    if ($line =~ /^\s*$/) {
        next;    # skip empty lines
    }

    if ($line =~ /^\[(.*)\]\s*$/) {
        $section = $1;
        next;
    }

    if ($line =~ /^([^=]+?)\s*=\s*(.*?)\s*$/) {
        my ($field, $value) = ($1, $2);
        if (not defined $section) {
            warn "Line outside of seciton '$line'\n";
            next;
        }
        $data{$section}{$field} = $value;
    }
}

print Dumper \%data;
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

use Config::Tiny;

my $filename = shift or die "Usage: $0 filename\n";
open my $fh, '<', $filename or die "Could not open '$filename' $!";

my $data = Config::Tiny->read( $filename );

print Dumper $data;

Solution: parse perl file

#!/usr/bin/perl
use strict;
use warnings;

# scalars only but finds only the first variable on every line
#while (<>) {
#   if (/(\$\w+)\b/) {
#       if (not defined $h{$ARGV}{$1}) {
#           $h{$ARGV}{$1}=1;
#           print "$ARGV: $1\n";
#       }
#    }
#}


# scalars $ or arrays @ or hashes %
# including all variables on every line
my %h;
while (my $line = <>) {
    if (my @vars = $line =~/[\$@%]\w+\b/g) {   
        foreach my $v (@vars) {
            if (not defined $h{$ARGV}{$v}) {
                $h{$ARGV}{$v}=1;
                print "$ARGV: $v\n";
            }
        }
    }
}

Regular Expressions Cheat sheet

| Expression | Meaning | | a | Just an 'a' character | | . | any character except new-line | | [bgh.] | one of the chars listed in the character class b,g,h or . | | [b-h] | The same as [bcdefgh] | | [a-z] | Lower case letters | | [b-] | The letter b or - | | [^bx] | Anything except b or x | | \w | Word characters: [a-zA-Z0-9_] | | \d | Digits: [0-9] | | \s | [\f\t\n\r ] form-feed, tab, newline, carriage return and SPACE | | \W | [^\w] | | \D | [^\d] | | \S | [^\s] | | [:class:] | POSIX character classes (alpha, alnum...) | | \p{...} | Unicode definitions (IsAlpha, IsLower, IsHebrew, ...) | | a* | 0-infinite 'a' characters | | a+ | 1-infinite 'a' characters | | a? | 0-1 'a' characters | | a{n,m} | n-m 'a' characters | | ( ) | Grouping and capturing | | | | Alternation | | \1, \2 | Capture buffers | | $1, $2 | Capture variables | | ^ $ | Beginning and end of string anchors |

See also perldoc perlre

Matching numbers using Perl regex