Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Advanced Perl

Perl References

Uses cases for references

  • Creating multi-dimensional and other deep data structures (arrays of arrays, hashes of arrays, etc.)
  • Passing complex data structures to functions (e.g. arrays, hashes)

References

\$x    # reference to scalar
\@y    # reference to array
\%z    # reference to hash
\&f    # reference to function

Array References

  • @array
  • ARRAY

To solve the problem we will use references. Prefixing the array with a back-slash \ creates a reference to it.

my $names_ref  = \@names;
print $names_ref;      # ARRAY(0x703dcf2)
@$names_ref

but it will be probably more readable to write

@{$names_ref}

or even

@{ $names_ref }

Once we know all this we can pass a reference to a function, within the function we can dereference the array and we get back the original array.

#!/usr/bin/env perl
use strict;
use warnings;

my @names = qw(Foo Bar Baz);

my $names_ref  = \@names;
print "$names_ref\n";         # ARRAY(0x703dcf2)

print "@$names_ref\n";        # Foo Bar Baz
print "@{ $names_ref }\n";    # Foo Bar Baz


print "$names[0]\n";          # Foo
print "$names_ref->[0]\n";    # Foo

Hash References

  • %hash
  • HASH

Similarly to the Array references one can create references to Hashes as well.

my $phones_ref = \%phones;
Hash            Hash Reference
%phones         %{ $phones_ref }
$phones{Foo}    ${ $phones_ref }{Foo}
                $phones_ref->{Foo}
keys %phones    keys %{ $phones_ref }

Using the print function to print out the content of an Array was quite OK. Printing a Hash was a disaster. The same happens once you dereference a reference to an array or hash.

#!/usr/bin/env perl
use strict;
use warnings;

my %phones = (
    Barney  => 123,
    Fred    => 456,
    Wilma   => 789,
);
my $phones_ref = \%phones;

print $phones_ref, "\n";             # HASH(0x703fed1)
my @names = sort keys %$phones_ref};
print "@names\n";                    # Barney Fred Wilma

print "$phones{Fred}\n";             # 456
print "$phones_ref->{Fred}\n";       # 456

Scalar references

  • scalar references

Similar to array references and hash references one can also create references to scalars.

#!/usr/bin/env perl
use strict;
use warnings;

my $name = "Foo";
my $name_ref = \$name;

print "$name_ref\n";  # SCALAR(0x562516e5f140)
print "$name\n";      # Foo
print "$$name_ref\n"; # Foo

Subroutine references

  • subroutine references
  • function references
#!/usr/bin/env perl
use strict;
use warnings;

sub add {
    my ($x, $y) = @_;
    return $x+$y;
}

my $add_ref = \&add;

print "$add_ref\n";              # CODE(0x564e85424148)
print &{ $add_ref }(2, 3), "\n"; # 5
print $add_ref->(2, 3), "\n";    # 5

Passing values to a function

Let's see a simple example of passing values to a function.

#!/usr/bin/bin perl
use strict;
use warnings;

sub add {
    my ($x, $y) = @_;
    return $x+$y;
}

print add(2, 3), "\n";

Add two (or more) arrays

  • @_
Let's extend it so it will be able to take two vectors (arrays) and add
them pair-wise.  (2, 3) + (7, 8, 5) =  (9, 11, 5)
#!/usr/bin/env perl
use strict;
use warnings;

my @left  = (2, 3);
my @right = (7, 8, 5);
add(@left, @right);

sub add {
    my ($first, $second) = @_;
    print "$first\n";
    print "$second\n";
}

# 2
# 3

addx(@left, @right);
sub addx {
    my (@first, @second) = @_;
    print "First: @first\n";
    print "Second: @second\n";
}

# First: 2 3 7 8 5
# Second:

The problem is, @_ in the add function will get (2, 3, 7, 8, 5); We cannot know where does the first array end and where the second begins.

(I am sure one can come up with a complex way of prefixing the real data by meta information that describes the data, but why work so hard if Perl already has the tools you need?)

Add two arrays

#!/usr/bin/env perl
use strict;
use warnings;

my @first  = (2, 3);
my @second = (7, 8, 5);
my @res    = add(\@first, \@second);  # passing two references
print "@res\n";

sub add {
    my ($one_ref, $two_ref) = @_;
    my @one = @{ $one_ref };       # dereferencing and copying each array
    my @two = @{ $two_ref };

    my @result;
    foreach my $i (0..@one-1) {
        if (defined $two[$i]) {
            push @result, $one[$i] + $two[$i];
        }
        else {
            push @result, $one[$i];
        }
    }
    foreach my $i (@one..@two-1) {
        push @result, $two[$i];
    }
    return @result;
}

Passing two arrays to a function

Array References

Of course if the arrays are big, copying them is waste of time and memory. Let's see how can we access the individual elements of an array when using a reference to that array. Then we won't have to copy them within the function.

my $names_ref = \@names;
Array      Array Reference
@names     @{ $names_ref }
$names[0]  ${ $names_ref }[0]
           $names_ref->[0]
$#names    $#$names_ref

You really don't want to use this $#$names_ref among people...

Process arrays without copying

This is similar to the previous solution but without copying the arrays

#!/usr/bin/env perl
use strict;
use warnings;

my @first  = (2, 3);
my @second = (7, 8, 5);
my @res    = add(\@first, \@second);
print "@res\n";

sub add {
    my ($one_ref, $two_ref) = @_;

    my @result;
    foreach my $i (0..@{ $one_ref }-1) {
        if (defined $two_ref->[$i]) {
            push @result, $one_ref->[$i] + $two_ref->[$i];
        }
        else {
            push @result, $one_ref->[$i];
        }
    }
    foreach my $i (@{ $one_ref }..@{ $two_ref }-1) {
        push @result, $two_ref->[$i];
    }
    return @result;
}

Scope of variables

  • scope

Let's see an example where we create an array and a reference to it in some arbitrary {} scope. The array is defined within the scope while the variable holding the reference is defined outside the scope.

#!/usr/bin/env perl
use strict;
use warnings;

my $names_ref;

{
    my @names = qw(Foo Bar);
    $names_ref = \@names;
}
print "$names_ref->[0]\n"; # Foo
# print "@names\n";  # Global symbol "@names" requires explicit package name

#!/usr/bin/env perl
use strict;
use warnings;
use Devel::Refcount qw(refcount);

my $names_ref;

{
    my @names = qw(Foo Bar);
    $names_ref = \@names;
}
print "$names_ref->[0]\n"; # Foo
print(refcount($names_ref), "\n"); # 1

After the closing } @names went out of scope already but $names_ref still lets us access the values.

As $names_ref still holds the reference to the location of the original array in the memory perl keeps the content of the array intact.

Reference counting

  • reference counting}

As you already know perl automatically allocates memory when you add a new element to an array or hash. It also frees up memory (although only internally) when you remove the element from the array or hash.

Similarly when you create an array or a hash, perl sets up a counter (called reference counter) and sets it to 1. When the array or hash goes out of scope the counter is reduced by 1. If the counter reaches 0 perl frees the memory allocated to that array. Without references this is always the case.

When you create a reference to an array and store it in a variable perl increases the reference counter of the original array by 1 (to 2). Now if the array goes out of scope and the ref count is reduced by 1 it still does not reach 0 so perl does not free up the location of the array.

Only if the reference stops referring to this memory location will the counter reach 0 thereby freeing up the memory allocated to the array.

More Reference Counting

my @names     = qw(Foo Bar Baz);   # cnt = 1
my $names_ref = \@names;           # cnt = 2
my $other_ref = \@names;           # cnt = 3
my $x_ref     = $names_ref;        # cnt = 4

$other_ref = undef;            # cnt = 3

Process arrays without copying even the return values

In the previous solution we passed the references to the function but returned a full array, thereby copying all the values. If we care about memory and speed we might eliminate this copying by returning the reference to the resulting array.

The cost is a small (?) inconvenience as now we have to dereference the resulting array reference in the calling code.

#!/usr/bin/env perl
use strict;
use warnings;

my @first   = (2, 3);
my @second  = (7, 8, 5);
my $res_ref = add(\@first, \@second);
print "@{ $res_ref }\n";

sub add {
    my ($one_ref, $two_ref) = @_;

    my @result;
    foreach my $i (0..@{ $one_ref }-1) {
        if (defined $two_ref->[$i]) {
            push @result, $one_ref->[$i] + $two_ref->[$i];
        }
        else {
            push @result, $one_ref->[$i];
        }
    }
    foreach my $i (@{ $one_ref }..@{ $two_ref }-1) {
        push @result, $two_ref->[$i];
    }
    my $ret = \@result;
    return $ret;
}

Debugging (pretty printing)

  • Data::Dumper

But once we have those references we have a better tool to print out their content.

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper qw(Dumper);

my @names     = qw(Foo Bar Baz);
my $names_ref = \@names;

my %phones = (
    Foo  => 123,
    Bar  => 456,
    Baz  => 789,
);
my $phones_ref = \%phones;

print Dumper $names_ref, $phones_ref;
$VAR1 = [
          'Foo',
          'Bar',
          'Baz'
        ];
$VAR2 = {
          'Bar' => 456,
          'Baz' => 789,
          'Foo' => 123
        };

Actually you can use the Dumper on the references themself without putting them in scalar variables.

print Dumper \@names, \%phones;

Change values in a reference

If you create a copy of an array then the two arrays are separated. Change to any of the arrays is not reflected in the other array.

#!/usr/bin/env perl
use strict;
use warnings;

my @names = qw(Foo Bar);
my @copy_names = @names;
$copy_names[0] = 'Baz';

print "$names[0]\n";       # Foo
print "$copy_names[0]\n";  # Baz

When you create a reference to an array, then the referenced array has the same memory location, hence change in either one of them is a change in both of them.

#!/usr/bin/env perl
use strict;
use warnings;

my @names = qw(Foo Bar);
my $names_ref = \@names;

$names_ref->[0] = 'Baz';
print "$names[0]\n";        # Baz
print "$names_ref->[0]\n";  # Baz

That means you can pass to a function an array reference, then from within the function it is easy to change the content of the original array.

Exercise: double numbers

Create a function that gets an array reference and multiplies each value in it by 2;

my @numbers = (2, 4, 7);
multiply_by_two(\@numbers);
print "@numbers\n";   # 4 8 14

Exercise: Add many arrays

Pick up the examples/references/add_arrays.pl script that can add two arrays and change it to accept any number of array references.

Extra exercise: add parameters that will control to stop the addition at the shortest array or the longest array.

my @a = (2, 3);
my @b = (4, 5, 6);
add('shortest', \@a, \@b); # returns (6, 8)
add('longest', \@a, \@b);  # returns (6, 8, 6)

Exercise: Function to compare two hashes

Create a function that given two hashes, returns a report showing missing keys or keys with different values.

#!/usr/bin/env perl
use strict;
use warnings;

my %x = (
    foo    => 1,
    bar    => 2,
    baz    => 3,
    zoo    => 6,
    foobar => undef,
    moose  => undef,
);
my %y = (
    foo    => 1,
    bar    => 4,
    moo    => 5,
    zoo    => undef,
    foobar => 9,
    moose  => undef,
);

my @report = compare_hashes(\%x, \%y);
print join "\n", @report;
print "\n";
The value of 'bar' differs: '2' and '4'
The key 'baz' does not exist in second hash
The value of 'zoo' is '6' in the fist has and undef in the second hash
The value of 'foobar' is '9' in the second has and undef in the first hash
The key 'moo' does not exist in first hash

Solution: Double numbers

#!/usr/bin/env perl
use strict;
use warnings;

my @numbers = (2, 4, 7);
multiply_by_two(\@numbers);
print "@numbers\n";   # 4 8 14

sub multiply_by_two {
    my ($ref) = @_;
    foreach my $number (@$ref) {
        $number *= 2;
    }
}

Solution: Add many arrays

#!/usr/bin/env perl
use strict;
use warnings;

my @first  = (2, 3);
my @second = (7, 8, 5);
my @third  = (9, 10, 2, 4);
my $res    = add(\@first, \@second, \@third);
print "@$res\n"; # 18 21 7 4

sub add {
    my @args = @_;
    return [] if @args == 0;

    my $first = shift @args;
    if (@args) {
        my $rest = add(@args);
        return add2($first, $rest);
    } else {
        return [@$first];
    }
}


# is the same as was the add() in add_arrays_nocopy_return.pl
sub add2 {
    my ($one_ref, $two_ref) = @_;

    my @result;
    foreach my $i (0..@{ $one_ref }-1) {
        if (defined $two_ref->[$i]) {
            push @result, $one_ref->[$i] + $two_ref->[$i];
        }
        else {
            push @result, $one_ref->[$i];
        }
    }
    foreach my $i (@{ $one_ref }..@{ $two_ref }-1) {
        push @result, $two_ref->[$i];
    }
    return \@result;
}

Solution: Add many arrays

#!/usr/bin/perl
use strict;
use warnings;

my @first  = (2, 3);
my @second = (7, 8, 5);
my @third  = (9, 10, 2, 4);
my $res    = add(\@first, \@second, \@third);
print "@$res\n"; # 18 21 7 4

sub add {
    my @args = @_;
    return [] if @args == 0;

    my $longest = 0;
    foreach my $r (@args) {
        if ($longest < @$r) {
            $longest = @$r;
        }
    }

    my @result;
    foreach my $i (0..$longest-1) {
        foreach my $r (@args) {
            $result[$i] += (defined $r->[$i] ? $r->[$i] : 0);
        }
    }
    return \@result;
}

Solution: Function to compare two hashes

#!/usr/bin/perl
use strict;
use warnings;


my %x = (
    foo    => 1,
    bar    => 2,
    baz    => 3,
    zoo    => 6,
    foobar => undef,
    moose  => undef,
);
my %y = (
    foo    => 1,
    bar    => 4,
    moo    => 5,
    zoo    => undef,
    foobar => 9,
    moose  => undef,
);

my @report = compare_hashes(\%x, \%y);
print join "\n", @report;
print "\n";


sub compare_hashes {
    my ($first, $second) = @_;
    my @report;
    foreach my $k (keys %{ $first }) {
        if (not exists $second->{$k}) {
            push @report, "The key '$k' does not exist in second hash";
        } elsif (not defined $first->{$k} and not defined $second->{$k}) {
            # ok, neither is defined
        } elsif (defined $first->{$k} and not defined $second->{$k}) {
            push @report,
                "The value of '$k' is '$first->{$k}' in the first hash"
                . " and undef in the second hash";
        } elsif (not defined $first->{$k} and defined $second->{$k}) {
            push @report,
                "The value of '$k' is '$second->{$k}' in the second hash"
                . " and undef in the first hash";
        } elsif ($first->{$k} ne $second->{$k}) {
            push @report,
                "The value of '$k' differs: '$first->{$k}' and '$second->{$k}'";
        }
    }
    foreach my $k (keys %{ $second }) {
        if (not exists $first->{$k}) {
            push @report, "The key '$k' does not exist in first hash";
        }
    }
    return @report;
}


Anonymous Arrays

  • anonymous array
  • []

Occasionally we are not interested in the array @names, just in a reference to it. We can use the scoping trick we saw earlier to force the array out of scope immediately once it was used.

my $names_ref;
{
    my @names     = ("Foo", "Bar", "Baz");
    $names_ref = \@names;
}
# @names is not in scope here
# $names_ref is still is

We can reach the same result without the temporary array and the whole scoping issue by creating what we call an "anonymous array". That is an array that does not have a real name, (@name) just a reference to it.

my $other_names_ref = ["Foo", "Bar", "Baz"];

my $yet_another_names_ref = [ qw(Foo Bar Baz) ];

We got used to seeing [] as an indicator of an array element. The same visual clue will help us remember that the above creates an array reference.

Array of Arrays

  • AoA
#!/usr/bin/perl
use strict;
use warnings;

my $first_names_ref  = [ qw(Foo Bar Baz) ];
my $family_names_ref = [ qw(Moo Zorg) ];

my @names = ($first_names_ref, $family_names_ref);

print "$first_names_ref\n";       # ARRAY(0x703dcf2)
print "@{ $first_names_ref }\n";  # Foo Bar Baz

print "$names[0]\n";              # ARRAY(0x703dcf2)
print "@{ $names[0] }\n";         # Foo Bar Baz
print "@{ $names[1] }\n";         # Moo Zorg

print "$first_names_ref->[0]\n";  # Foo
print "$names[0]->[0]\n";         # Foo

print "$names[0][0]\n";           # Foo

That already looks like a two dimensional array. It is not, but it can be used like one.

Array of Arrays (AoA)

Instead of naming the internal array references we can use them within the creation of the larger array.

#!/usr/bin/env perl
use strict;
use warnings;

my @names = (
    [ qw(Foo Bar Baz) ],
    [ qw(Moo Zorg) ],
);

print "$names[0]\n";              # ARRAY(0x703dcf2)
print "@{ $names[0] }\n";         # Foo Bar Baz
print "@{ $names[1] }\n";         # Moo Zorg

print "$names[0]->[0]\n";         # Foo
print "$names[0][0]\n";           # Foo

Many dimensional arrays

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;

my @dim;
$dim[0] = "zero";
$dim[1][0] = "one-zero";
$dim[1][1][1] = "one-one-one";
$dim[2][0][0] = "two-zero-zero";

print Dumper \@dim;
$VAR1 = [
          'zero',
          [
            'one-zero',
            [
              undef,
              'one-one-one'
            ]
          ],
          [
            [
              'two-zero-zero'
            ]
          ]
        ];

Anonymous hashes

  • anonymous hash
  • {}}
  • HoH}
#!/usr/bin/perl
use strict;
use warnings;

my $phones_ref = {
    Foo   => 123,
    Bar   => 456,
    Moo   => 789,
};

print "$phones_ref->{Foo}\n";   # 123

Hash of Hashes (HoH)

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;

my %distance;
$distance{'New York'}{'London'} = 5_027;
$distance{'New York'}{'Bejin'}  = 10_100;
$distance{'Paris'}{'London'}    = 350;

print Dumper \%distance;
$VAR1 = {
          'Paris' => {
                       'London' => 350
                     },
          'New York' => {
                          'Bejin' => 10100,
                          'London' => 5027
                        }
        };

More complex data structures

$grade{Name}{Subject}[index] = Exam-Grade;
$grade{Name}{Subject} = Final-Grade;
#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;

my %grades;
$grades{"Foo Bar"}{Mathematics}   = 97;
$grades{"Foo Bar"}{Literature}    = 67;
$grades{"Peti Bar"}{Literature}   = 88;
$grades{"Peti Bar"}{Mathematics}  = 82;
$grades{"Peti Bar"}{Art}          = 99;
$grades{"Foo Bar"}{Chemistry}[0]  = 30;
$grades{"Foo Bar"}{Chemistry}[1]  = 48;
$grades{"Foo Bar"}{Chemistry}[2]  = 72;
$grades{"Foo Bar"}{Chemistry}[3]  = 80;

print Dumper \%grades;

$VAR1 = {
          'Peti Bar' => {
                          'Art' => 99,
                          'Literature' => 88,
                          'Mathematics' => 82
                        },
          'Foo Bar' => {
                         'Chemistry' => [
                                          30,
                                          48,
                                          72,
                                          80
                                        ],
                         'Literature' => 67,
                         'Mathematics' => 97
                       }
        };

Memory leak with cross references

#!/usr/bin/perl 
use strict;
use warnings;

use Data::Dumper;

# memory leak
# try running with 5,000,000

make_match('Foo', 'Bara', 'dump');

my $n = $ARGV[0] || 0;

for (1..$n) {
    make_match('Foo', 'Bara');
}
if (@ARGV) {
    print "Please press ENTER to finish...";
    <STDIN>;
}

sub make_match {
    my ($man_name, $woman_name, $dump) = @_;


    my $man = {
        name => $man_name,
    };
    my $wife = {
        name => $woman_name,
    };
    
    $man->{wife} = $wife;
    $wife->{man} = $man;
    
    if ($dump) {
        print Dumper $man, $wife;
    }
}


$VAR1 = {
          'wife' => {
                      'man' => $VAR1,
                      'name' => 'Bara'
                    },
          'name' => 'Foo'
        };
$VAR2 = $VAR1->{'wife'};

Run the script and when it displays the prompt, check the memory usage. Passing 100,000 on the command line made it use 39 Mb memory.

Memory leak with cross references - weaken

  • weaken
#!/usr/bin/perl 
use strict;
use warnings;

use Data::Dumper;
use Scalar::Util qw(weaken);

# memory leak
# try running with 5,000,000

make_match('Foo', 'Bara', 'dump');

my $n = $ARGV[0] || 0;

for (1..$n) {
    make_match('Foo', 'Bara');
}
if (@ARGV) {
    print "Please press ENTER to finish...";
    <STDIN>;
}

sub make_match {
    my ($man_name, $woman_name, $dump) = @_;


    my $man = {
        name => $man_name,
    };
    my $wife = {
        name => $woman_name,
    };
    
    $man->{wife} = $wife;
    $wife->{man} = $man;
    weaken $wife->{man};

    if ($dump) {
        print Dumper $man, $wife;
    }
}


Read CSV file

  • CSV

{% embed include file="src/examples/references/data.csv)

We would like to read in that file and be able to access the fname of row 5
as  $data[3]{fname}

# the fname on line 5 is in index 3 because: 
# line 1 is the header
# line 2 is element 0 in the array
# ...
# line 5 is element 3 in the array
#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;

my $filename = shift 
    or die "Usage: $0 FILENAME ( examples/references/data.csv )\n";

my @data;
open my $fh, '<', $filename or die;

my $header = <$fh>;
chomp $header;
my @header = split /,/, $header;

while (my $line = <$fh>) {
    chomp $line;
    my %row;

    my @values = split /,/, $line;
    foreach my $i (0..@header-1) {
        my $field = $header[$i];
        $row{$field} = $values[$i];
    }
 
    # using hash slices:
    # @row{@header} = split /,/, $line;

    push @data, \%row;
}

print Dumper \@data;

$VAR1 = [
          {
            'lname' => 'Bar',
            'fname' => 'Boo',
            'phone' => '123'
          },
          {
            'lname' => 'Baz',
            'fname' => 'Foo',
            'phone' => '456'
          },
          {
            'lname' => 'Zorg',
            'fname' => 'Moo',
            'phone' => '789'
          }
        ];

Exercise: read in an ini file

Read in an ini file and create a two dimensional hash.
First dimension - section name. Second dimension - key name.
print $data{Colors}{blue};   # dark

{% embed include file="src/examples/references/data.ini)

Exercise: improve the csv reader

Instead of an array create a hash in which the key is
the fname and the value is the hash reference.

To get the phone of 'Foo' we could write:

$data{Foo}{phone}
  • First you can assume fname values are unique.
  • Then improve it by checking if the fname is indeed unique and warn if not.
  • Further improve it by removing the fname from the internal hashes.
  • Further improve by passing the name of the key field (currently fname) as a parameter.
  • Bonus: try to deal with the case when there are duplicate values for the key field without just giving an error message.

Solution: Read ini file

Read in an ini-file and create a two dimensional hash.

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;

my $filename = shift 
    or die "Usage: $0 FILENAME ( examples/references/data.ini )\n";

my %ini;
my $section = '';
open my $fh, '<', $filename or die;
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^\s*(#.*)?$/;
    if ($line =~ /^\[(.*)\]/) {
        $section = $1;
        next;
    }
    if ($line =~ /^(\w+)\s*=\s*(.*)$/) {
        $ini{$section}{$1} = $2;
        next;
    }
}

print Dumper \%ini;

Solution: improve the csv reader

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;

my $filename = shift 
    or die "Usage: $0 FILENAME ( examples/references/data.csv )\n";

my %data;
open my $fh, '<', $filename or die;

my $header = <$fh>;
chomp $header;
my @header = split /,/, $header;

while (my $line = <$fh>) {
    chomp $line;
    my %row;
    @row{@header} = split /,/, $line;
    
    my $key = $row{fname};
    $data{$key} = \%row;
}

print Dumper \%data;

autovivification

  • autovivification
#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;

my $data;
$data->{Foo}{Age} = 23;

print Dumper $data;

if ($data->{Bar}{Age} > 18) {
    print "Your are too old\n";
}

print Dumper $data;


delete $data->{Moo}{Age};

print Dumper $data;

Output:

$VAR1 = {
          'Foo' => {
                     'Age' => 23
                   }
        };
$VAR1 = {
          'Bar' => {},
          'Foo' => {
                     'Age' => 23
                   }
        };
$VAR1 = {
          'Bar' => {},
          'Moo' => {},
          'Foo' => {
                     'Age' => 23
                   }
        };

What is autovivification?

Scalar references in Getopt::Long

  • Getopt::Long

It is used less often than the other references but one of the prominent uses is the GetOptions function of Getopt::Long.

use Getopt::Long;
my $file = "default.txt";
my $debug;
GetOptions(
        "file=s" => \$file,
        "debug"  => \$debug,
) or die;

Anonymous subroutines

#!/usr/bin/perl
use strict;
use warnings;

my $add_ref = sub {
    my ($x, $y) = @_;
    return $x+$y;
};

print &{ $add_ref }(2, 3), "\n";
print $add_ref->(2, 3), "\n";

Uses of Subroutine references

The find() function of File::Find.

Exercise: DNA Sequence Analyzer

Create a function for researching DNA sequences. Below, you can see the "source" file. There is all kind of extra information, that is currently not interesting to us. Implement the read_file function that goes over the given file extracting and returning the list of all the DNA sequences. The user of your function can now, create a foreach-loop going over the list of DNA sequences and call her own function that will analyze the sequence and print the results. See the input file, the skeleton of the solution in which I already added the analyze function the end user is going to write. The expected output can be also found. (Of course this read_file should work regardless of what the analyze function does.)

Some header text
-----

name: Name of the source
date: 1999.01.23
DNA: GAGATTC

name: Name of the source
date: 1999.01.24
DNA: GAGATCCTGC

name: Name of the source
date: 2007.01.24
DNA: CGTGAGAATCTGC

name: Name of the source
date: 2008.01.24
DNA: CGTGATCTGC
#!/usr/bin/perl
use strict;
use warnings;

my $file = 'dna.txt';

my @dna_sequences = read_file($file);
foreach my $seq (@dna_sequences) {
    analyze_dna($seq);
}
sub analyze_dna {
    my ($dna) = @_;
    if ($dna =~ /(.)\1/) {
        print "$dna has double $1\n";
    }
}


sub read_file {
   ...
}
GAGATTC has double T
GAGATCCTGC has double C
CGTGAGAATCTGC has double A

Solution: DNA Sequence Analyzer

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'dna.txt';

my @dna_sequences = read_file($file);
foreach my $seq (@dna_sequences) {
    analyze_dna($seq);
}
sub analyze_dna {
    my ($dna) = @_;
    if ($dna =~ /(.)\1/) {
        print "$dna has double $1\n";
    }
}


sub read_file {
    my ($filename) = @_;
    open my $fh, '<', $filename or die;

    my @results;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^DNA:\s*([CGTA]+)/) {
            push @results, $1;
        }
    }
    return @results;
}

Exercise: DNA Sequence Analyzer with callback

The above can work, but if the file is huge, we might not be able to hold all the list in memory. Change the read_file function that to allow the user to supply the analyze function (or rather the reference to the analyze function) as a parameter. See the skeleton below. The output should be the same as above.

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'dna.txt';

my @dna_sequences = read_file(\&analyze_dna, $file);
sub analyze_dna {
    my ($dna) = @_;
    if ($dna =~ /(.)\1/) {
        print "$dna has double $1\n";
    }
}

sub read_file {
    ...
    return;
}

Solution: DNA Sequence Analyzer with callback

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'dna.txt';

my @dna_sequences = read_file(\&analyze_dna, $file);
sub analyze_dna {
    my ($dna) = @_;
    if ($dna =~ /(.)\1/) {
        print "$dna has double $1\n";
    }
}

sub read_file {
    my ($sub, $filename) = @_;
    open my $fh, '<', $filename or die;

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^DNA:\s*([CGTA]+)/) {
            $sub->($1);
        }
    }
    return;
}

Exercise: DNA Sequence Analyzer with shortcut

What if you would like to provide the user the ability to stop the processing any time she wants even if the file has not finished yet? For example when she found the first match. This can be controlled by the return value of the analyze function. If it is true, stop processing. See the skeleton and the expected output:

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'dna.txt';

my @dna_sequences = read_file(\&analyze_dna, $file);
sub analyze_dna {
    my ($dna) = @_;
    if ($dna =~ /(.)\1/) {
        print "$dna has double $1\n";
        return 1
    }

    return 0;
}

sub read_file {
    ...
    return;
}
GAGATTC has double T

Solution: DNA Sequence Analyzer with shortcut

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'dna.txt';

my @dna_sequences = read_file(\&analyze_dna, $file);
sub analyze_dna {
    my ($dna) = @_;
    if ($dna =~ /(.)\1/) {
        print "$dna has double $1\n";
        return 1
    }

    return 0;
}

sub read_file {
    my ($sub, $filename) = @_;
    open my $fh, '<', $filename or die;

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^DNA:\s*([CGTA]+)/) {
            last if $sub->($1);
        }
    }
    return;
}

Dispatch Table

#!/usr/bin/perl
use strict;
use warnings;


sub add {
    my ($x, $y) = @_;
    return $x+$y;
}

sub multiply {
    my ($x, $y) = @_;
    return $x*$y;
}

my %dispatch = (
    '+'  => \&add,
    '*'  => \&multiply,
);

my $op = '+';
print $dispatch{$op}->(2, 3), "\n";

Dispatch Table using symbolic references

#!/usr/bin/perl
use strict;
use warnings;


sub add {
    my ($x, $y) = @_;
    return $x+$y;
}

sub multiply {
    my ($x, $y) = @_;
    return $x*$y;
}

my %dispatch = (
    '+'  => 'add',
    '*'  => 'multiply',
);

my $op = '+';

# Generally not recommended to use symbolic references!!
# unless you really know what are you doing
{
    no strict 'refs';
    print $dispatch{$op}->(2, 3), "\n";
}

The ref() function

  • ref
my $name = 'Foo';
ref($name);           # undef
ref(\$name);          # SCALAR
my $array_ref = [];
ref($array_ref);      # ARRAY
my $hash_ref = {};
ref($hash_ref);       # HASH
my $sub_ref = sub {};
ref($sub_ref);        # CODE

Copy a data structure

Copy an array or hash referred to by a reference.
#!/usr/bin/perl
use strict;
use warnings;

my $names_ref = [qw(Foo Zorg)];

# copy the ARRAY reference, the content is the same location
my $names_other_ref  = $names_ref;
$names_other_ref->[0] = 'Bar';
print "$names_other_ref->[0]\n";    # Bar
print "$names_ref->[0]\n";          # Bar
print "$names_ref\n";
print "$names_other_ref\n";

# copy the content of the ARRAY reference
my $names_yet_other_ref = [ @{$names_ref} ];
$names_yet_other_ref->[0] = 'Moo';
print "$names_yet_other_ref->[0]\n";    # Moo
print "$names_ref->[0]\n";              # Bar 
print "$names_other_ref->[0]\n";        # Bar
print "$names_yet_other_ref\n";

Copy HASH reference
$other_phones_ref = { %{$phones_ref} };

Deep copy

  • deep copy
#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper qw(Dumper);

my $ini = {
    'name' => {
            Foo => 123,
            Bar => 456,
    }
};

my $other_ini = { %{ $ini } };
print "$ini->{name}{Foo}\n";          # 123
print "$other_ini->{name}{Foo}\n";    # 123

$ini->{phone}{Baz} = 678;
print "$ini->{phone}{Baz}\n";         # 678
print "$other_ini->{phone}{Baz}\n";   # undef as expected
print Dumper $ini, $other_ini;

$ini->{name}{Foo} = 999;
print "$ini->{name}{Foo}\n";          # 999
print "$other_ini->{name}{Foo}\n";    # 999 !!!!
print Dumper $ini, $other_ini;

123
123
678

$VAR1 = {
          'name' => {
                      'Bar' => 456,
                      'Foo' => 123
                    },
          'phone' => {
                       'Baz' => 678
                     }
        };
$VAR2 = {
          'name' => $VAR1->{'name'},
          'phone' => {}
        };
999
999
$VAR1 = {
          'name' => {
                      'Bar' => 456,
                      'Foo' => 999
                    },
          'phone' => {
                       'Baz' => 678
                     }
        };
$VAR2 = {
          'name' => $VAR1->{'name'},
          'phone' => {}
        };

Deep copy - Storable dclone

  • dclone
#!/usr/bin/perl
use strict;
use warnings;

use Storable qw(dclone);

my $ini = {
    'name' => {
            Foo => 123,
            Bar => 456,
    }
};

my $other_ini = dclone($ini);
print "$ini->{name}{Foo}\n";
print "$other_ini->{name}{Foo}\n";

$ini->{phone}{Baz} = 678;
print "$ini->{phone}{Baz}\n";
print "$other_ini->{phone}{Baz}\n"; #undef

$ini->{name}{Foo} = 999;
print "$ini->{name}{Foo}\n";          # 999
print "$other_ini->{name}{Foo}\n";    # 123

Serialization

Data::Dumper

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper ();

my $data_structure = {
    phones => {
        Foo     => 123, 
        Bar     => 345,
        Baz     => 678,
    }
};

open my $fh, '>', 'dumped.data' or die;
print $fh Data::Dumper->Dump([$data_structure], ['data_structure']);
close $fh;

$data_structure = {
                    'phones' => {
                                  'Bar' => 345,
                                  'Baz' => 678,
                                  'Foo' => 123
                                }
                  };
#!/usr/bin/perl
use strict;
use warnings;

my $data_structure;
my $filename = 'dumped.data';
if (open my $fh, '<', $filename) {
    local $/ = undef;
    my $dump = <$fh>;
    eval $dump;
} else {
    die "Could not open '$filename' $!";
}

print "Foo: $data_structure->{phones}{Foo}\n"; # Foo: 123
print "Bar: $data_structure->{phones}{Bar}\n"; # Bar: 345
print "Baz: $data_structure->{phones}{Baz}\n"; # Baz: 678

Data::Dumper both dump and restore

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper ();

my $data_structure;
my $filename = 'dumped.data';

if (-e $filename and open my $fh, '<', $filename) {
    local $/ = undef;
    my $dump = <$fh>;
    eval $dump;
} else {
    # initialize
    $data_structure = {
        phones => {
            Foo     => 0, 
            Bar     => 0,
            Baz     => 0,
        }
    }
};

# update data
$data_structure->{phones}{Foo} += 1 * int rand 5;
$data_structure->{phones}{Bar} += 3 * int rand 5;
$data_structure->{phones}{Baz} += 9 * int rand 5;


print "Foo: $data_structure->{phones}{Foo}\n";
print "Bar: $data_structure->{phones}{Bar}\n";
print "Baz: $data_structure->{phones}{Baz}\n";


open my $fh, '>', $filename or die;
print $fh Data::Dumper->Dump([$data_structure], ['data_structure']);
close $fh;

Storable

#!/usr/bin/perl
use strict;
use warnings;

use Storable qw(store);

my $data_structure = {
    phones => {
        Foo     => 123, 
        Bar     => 345,
        Baz     => 678,
    }
};

store $data_structure, 'frozen.data' or die;

creates a binary file

#!/usr/bin/perl
use strict;
use warnings;

use Storable qw(retrieve);

my $data = retrieve 'frozen.data' or die;
print "Foo: $data->{phones}{Foo}\n"; # Foo: 123
print "Bar: $data->{phones}{Bar}\n"; # Bar: 345
print "Baz: $data->{phones}{Baz}\n"; # Baz: 678

Storable in memory freezing

#!/usr/bin/perl
use strict;
use warnings;

use Storable qw(freeze thaw);

my $data_structure = {
    phones => {
        Foo     => 123, 
        Bar     => 345,
        Baz     => 678,
    }
};

my $frozen = freeze $data_structure;

# here send it over the network 
# or save it in a database

my $new_data_structure = thaw $frozen;

print "$new_data_structure->{phones}{Foo}\n"; # 123
print "$new_data_structure->{phones}{Bar}\n"; # 345

print "$data_structure\n";      # HASH(0x4e8144)
print "$new_data_structure\n";  # HASH(0x27b4a3c)

print "$data_structure->{phones}\n";     # HASH(0x4e7fe4)
print "$new_data_structure->{phones}\n"; # HASH(0x1ffe294)

YAML

---
people:
  Bar:
    phones:
      - 345
      - 678
    title: CEO
  Baz: NA
  Foo:
    phone: 123
#!/usr/bin/perl
use strict;
use warnings;

use YAML qw(DumpFile);

my $data_structure = {
    people => {
        Foo     => {
                phone => '123',
        },
        Bar     => {
            phones => [
                    '345',
                    '678',
            ],
            title => 'CEO',
        },
        Baz     => 'NA',
    }
};

DumpFile 'data.yml', $data_structure;

#!/usr/bin/perl
use strict;
use warnings;

use YAML qw(LoadFile);
my $data = LoadFile('data.yml');
print "$data->{people}->{Foo}->{phone}\n";          # 123
print "$data->{people}->{Bar}->{phones}->[0]\n";    # 345
print "$data->{people}->{Baz}\n";                   # NA

YAML in one file

#!/usr/bin/perl
use strict;
use warnings;

use YAML qw(DumpFile LoadFile);

my $data;
my $yml_file = 'data.yml';
if (-e $yml_file) {
    $data = LoadFile($yml_file);
} else {
    $data = {
        people => {
            Foo     => {
                    phone => '123',
            },
            Bar     => {
                phones => [
                        '345',
                        '678',
                ],
                title => 'CEO',
            },
            Baz     => 'NA',
        }
    };
}


print "Foo: $data->{people}{Foo}{phone}\n";       # Foo: 123
print "Bar: $data->{people}{Bar}{phones}->[0]\n"; # Bar: 345
print "Baz: $data->{people}{Baz}\n";              # Baz: NA

DumpFile $yml_file, $data;

JSON in one file

{
   "phones" : {
      "Bar" : "21",
      "Baz" : "72",
      "Foo" : "12"
   }
}
#!/usr/bin/perl
use strict;
use warnings;

use JSON qw(from_json to_json);

my $data;
my $json_file = 'data.json';
if (-e $json_file) {
    open my $fh, '<', $json_file or die;
    local $/ = undef;
    my $json = <$fh>;
    $data = from_json($json);
} else {
    # initialize
    $data = {
        phones => {
            Foo     => 0,
            Bar     => 0,
            Baz     => 0,
        }
    };
}


# update data
$data->{phones}{Foo} += 1 * int rand 5;
$data->{phones}{Bar} += 3 * int rand 5;
$data->{phones}{Baz} += 9 * int rand 5;

print "Foo: $data->{phones}{Foo}\n";
print "Bar: $data->{phones}{Bar}\n";
print "Baz: $data->{phones}{Baz}\n";

open my $out, '>', $json_file or die;
print $out to_json($data, { pretty => 1, utf8 => 1, });

Main uses of references

  • Creating complex data structure (2 or more dimensional arrays, hashes and more complex structures)
  • Passing arrays, hashes and more complex data structure to and from functions

Exercise: save ini and csv as YAML

Take the ini-file reader and the csv reader and save the resulting data structures in YAML format.

Exercise: Create a cache for NetSlow

You are using a module call NetSlow.pm and its sole function compute();
This function can get two numbers and after a remote procedure call
(that takes a lot of time) returns a single value. Create a local cache
of the given input values and results so you won't need to access the
remote machine for identical calls. (Make sure your cache is persistent
between execution of your script.

examples/references/NetSlow.pm

Here is how we can use the module: 
#!/usr/bin/perl
use strict;
use warnings;

use lib 'examples/references';
use NetSlow qw(compute);

die "Need 2 numbers\n" if @ARGV != 2;

print compute(@ARGV), "\n";
Later on you find out that the results are changing over time.
You don't want to drop the caching and you decide you can live with
certain lack of accuracy. You decide you fetch the value again if 
the last call to that specific set of input values was computed more 
than 5 seconds ago. That is you expire the cache after 5 seconds.

Exercise: create a function that generates numbers multipliers

Create a function that given a number returns a subroutine reference. Calling that subroutine with an array reference will then multiply each value by the originally given number.

my $duplicate = multiplier_generator(2);

my @numbers = (2, 4, 7);
$duplicate->(\@numbers);
print "@numbers\n";   # 4 8 14

Solution: save ini and csv as YAML

Just add the following code:

use YAML qw(DumFile);
DumpFile 'ini.yml', \@data; # in read_csv_file_hash.pl

DumpFile 'ini.yml', \%ini; # in read_ini_file.pl

DumpFile 'csv.yml', \%data; # in read_csv_file_hash.pl

Solution: Create a cache for NetSlow

#!/usr/bin/perl
use strict;
use warnings;

use lib 'examples/references';
use NetSlow qw(compute);
use YAML qw(DumpFile LoadFile);

die "Need 2 numbers\n" if @ARGV != 2;

my $cache_file = "netslow_cache.yml";
my $cache;

if (-e $cache_file) {
    $cache = LoadFile($cache_file);
}
my ($x, $y) = @ARGV;
my $result;

if (not defined $cache->{$x}{$y}) {
    $cache->{$x}{$y} = compute($x, $y);
}
DumpFile($cache_file, $cache);
print $cache->{$x}{$y}, "\n";

Solution: NetSlow cache with timeout

#!/usr/bin/perl
use strict;
use warnings;

use lib 'examples/references';
use NetSlow qw(compute);
use YAML qw(DumpFile LoadFile);

die "Need 2 numbers\n" if @ARGV != 2;

my $cache_file = "netslow_cache_timeout.yml";
my $cache;

if (-e $cache_file) {
    $cache = LoadFile($cache_file);
}
my ($x, $y) = @ARGV;
my $result;

my $TIMEOUT = 5;

if (not defined $cache->{$x}{$y} or
    $cache->{$x}{$y}{timestamp} < time() - $TIMEOUT ) {
    $cache->{$x}{$y}{value} = compute($x, $y);
    $cache->{$x}{$y}{timestamp} = time;
}
DumpFile($cache_file, $cache);
print $cache->{$x}{$y}{value}, "\n";

Resources

Perl documentation:

  • perlreftut - Mark Jason Dominus's very short tutorial about references
  • perllol - Arrays of Arrays
  • perldsc - Data Structure Cookbook
  • perlref - Perl references and nested data structures
  • perldata - Perl data types

Books:

  • Perl Objects, References & Modules (Intermediate Perl)
  • Advanced Perl Programming 1st edition !
  • Object Oriented Perl

Processes and signals

Signals and the kill function

  • $SIG
  • %SIG
  • kill

List of signals on Linux: man -S 7 signal

To send a signal use kill SIG, LIST (process IDs)

For example: kill 9, $pid;

use strict;
use warnings;

my ($sig, @process) = @ARGV;

die "Usage: $0 SIGNAL PROCESS-ID\n" if not @process;
kill $sig, @process;

Catch signals

#!/usr/bin/env perl
use strict;
use warnings;

$SIG{KILL} = sub {
   print "KILL received. kill -9  We cannot catch it.\n";
};

$SIG{INT} = sub {
    print "INT received. kill -2  or Ctrl-C\n";
};

$SIG{TERM} = sub {
    print "TERM received. kill -15\n";
};

$SIG{TSTP} = sub {
    print "TSTP received. kill -20 or Ctrl-Z\n";
};

print "Please press Ctrl-C or Ctrl-Z\n";
print "(or type   kill -15 $$   on the command line)\n";

for my $i (1..100) {
    sleep 1;
    print "$i\n";
}
print "done\n";


$SIG{$name} = sub {};     # do something when signal received
$SIG{$name} = 'IGNORE';   # ignore it
$SIG{$name} = 'DEFAULT';  # reset to the default behavior
#!/usr/bin/env perl
use strict;
use warnings;

my $ctrl_c = 0;

$SIG{INT} = sub {
    if ($ctrl_c) {
        print "\nCtrl-C received twice\n";
        exit();
    }
    $ctrl_c++;
    print "\nPlease, press Ctr-C again if you really mean it\n";
};

print "Please press Ctrl-C to terminate this program:\n";
my $i = 0;
while (1) {
    $i++;
    print "$i\n";
    sleep 1;
}
print "done\n";

Exercise: Catch ctrl-c and ask continue or terminate?

Take the examples/signals/catch_ctr_c.pl and change it so when the user presses Ctrl-C the counting stops and the user is asked if she really wants to terminate the program. (y/n).

If yes is given then quit. If no is given continue.

If Ctrl-c is pressed again later then ask again.

Make sure you do as little as possible in the actual signal handle.

Solution: Catch ctrl-c and ask

#!/usr/bin/env perl
use strict;
use warnings;

my $ctrl_c;

$SIG{INT} = sub {
        $ctrl_c = 1;
};

print "Please press Ctrl-C to terminate this program:\n";
for my $i (1..100) {
    confirm_exit();
    sleep 1;
    print "$i\n";
}
print "done\n";

sub confirm_exit {
    return if not $ctrl_c;

    local $SIG{INT} = 'IGNORE';

    $ctrl_c = 0;
    print "Do you really want to terminate the application? (y/n) [n]";
    chomp(my $input = <STDIN>);
    exit if $input eq 'y' or $input eq 'Y';
}

See what happens if you don't set the SIG handler to IGNORE?

See what happens if you remove the word local?

Graceful termination of process

This process will catch both the INT and the TERM signals and in both cases it will flip a flag and based on that flag it will stop he program. INT is received when the user presses Ctrl-C or if using a another terminal the user sends the kill -2 signal. TERM is kill -15.

When we run the program it will print out its process ID and instructions how to stop it.

kill -15 810344 or press Ctrl-C to stop

So Ctrl-C instead of just killing the process in mid-operation it will tell the process to "please stop" and the loop will finish whatever it was doing and exit the program.

#!/usr/bin/env perl
use strict;
use warnings;

print "kill -15 $$    or press Ctrl-C    to stop\n";
my $continue = 1;
$SIG{TERM} = sub {
    print "TERM received\n";
    $continue = 0;
};

$SIG{INT} = sub {
    print "INT received\n";
    $continue = 0;
};

while ($continue) {
    if (open my $fh, '>>', 'process.log') {
        print $fh scalar localtime();
        print $fh "\n";
    }
    sleep 1;
}

Advanced slides

Always use strict and warnings

  • strict
  • warnings

Always start your Perl code with:

use strict;
use warnings;

Example code that generates warning:

Add code everywhere to avoid warnings:

#!/usr/bin/perl 
use strict;
use warnings;

my $total;

add();

print "$total\n";


sub add {
    $total = $total + rand();
}

# triggers:
# Use of uninitialized value in addition (+)
#   at examples/advanced/code_with_warnings.pl line 14.
# or in 5.10
# Use of uninitialized value $total in addition (+)
#   at examples/advanced/code_with_warnings.pl line 14.

Avoid warnings

#!/usr/bin/perl 
use strict;
use warnings;

my $total;

add();

print "$total\n";


sub add {
    if (not defined $total) {
        $total = 0;
    }
    $total = $total + rand();
}

A lot of work. Cannot be done easily to an existing application.

Turn off warnings selectively and in a small scope

  • no warnings
#!/usr/bin/perl 
use strict;
use warnings;

my $total;

add();

print "$total\n";


sub add {
    no warnings 'uninitialized';
    $total = $total + rand();
}
See
perldoc warnings       for the use of warnings
perldoc perllexwarn    for the warning categories
perldoc perldiag       for a list of warnings and errors

Catch and log warnings

  • $SIG
  • WARN
#!/usr/bin/perl
use strict;
use warnings;

local $SIG{__WARN__} = sub {
    my $message = shift;
    logger('warning', $message);
};

my $total;
add();
print "$total\n";

sub add {
    $total = $total + rand();
}

# Use of uninitialized value in addition (+)
#    at examples/advanced/code_with_warnings.pl line 14.

sub logger {
    my ($level, $msg) = @_;
    if (open my $out, '>>', 'log.txt') {
        chomp $msg;
        print $out "$level - $msg\n";
    }
}

%SIG holds all the signals perl can deal with with the two special signal handles WARN and DIE sub {} is an anonymous subroutine we will discuss later

#!/usr/bin/perl
use strict;
use warnings;

my %WARNS;
local $SIG{__WARN__} = sub {
    my $message = shift;
    return if $WARNS{$message}++;
    logger('warning', $message);
}

my $counter;
count();
print "$counter\n";
$counter = undef;
count();

sub count {
    $counter = $counter + 42;
}

# Use of uninitialized value in addition (+)
#    at examples/advanced/code_with_warnings.pl line 14.

sub logger {
    my ($level, $msg) = @_;
    if (open my $out, '>>', 'log.txt') {
        chomp $msg;
        print $out "$level - $msg\n";
    }
}

splain and use diagnostics

  • splain
  • diagnostics
perl examples/advanced-perl/code_with_warnings.pl 2> err.txt
splain err.txt

The output looks like this:

Use of uninitialized value $total in addition (+) at
	examples/advanced-perl/use_diagnostics.pl line 15 (#1)
    (W uninitialized) An undefined value was used as if it were already
    defined.  It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.

    To help you figure out what was undefined, perl will try to tell you the
    name of the variable (if any) that was undefined. In some cases it cannot
    do this, so it also tells you what operation you used the undefined value
    in.  Note, however, that perl optimizes your program and the operation
    displayed in the warning may not necessarily appear literally in your
    program.  For example, "that $foo" is usually optimized into "that "
    . $foo, and the warning will refer to the concatenation (.) operator,
    even though there is no . in your program.

Alternatively you could also insert the following in your code: use diagnostics; to get the explanations for every warning. See also perldoc perldiag for a detailed explanation of each warning and error.

Fatal warnings

  • FATAL
use warnings FATAL => 'all';

Logging Exceptions

  • DIE
$SIG{__DIE__} = sub { logger('error', $_[0]) };
#!/usr/bin/perl 
use strict;
use warnings;

$SIG{__DIE__}  = sub { logger('error', $_[0]); };


print "Before\n";
die 'Something bad';
print "After\n";



sub logger {
    my ($level, $msg) = @_;
    if (open my $out, '>>', 'log.txt') {
        chomp $msg;
        print $out "$level - $msg\n";
    }
}

Always open files in the new way

  • open

Old way:

#!/usr/bin/perl
use strict;
use warnings;

my $filename = "data.txt";

open(FH, ">$filename") or die;
print FH "data";
close FH;

open(FH, $filename) or die;
my $line = <FH>;
close FH;

Recommended way:

#!/usr/bin/perl
use strict;
use warnings;

my $filename = "data.txt";

open(my $wfh, '>', $filename) or die;
print $wfh "data";
close $wfh;

open(my $rfh, '<', $filename) or die;
my $line = <$rfh>;
close $rfh;


Security problems.

Being global, difficult to pass as parameter to functions.

Array slices

#!/usr/bin/perl 
use strict;
use warnings;


my @letters = qw(a b c d e f g h);
print "@letters\n";              # a b c d e f g h
print "$letters[3]\n";           # d   the element with index 3
print "@letters[3]\n";           # d   (a one element array !)
   # generates a warning:
   # Scalar value @letters[3] better written as $letters[3]
   # at examples/advanced/array_slices.pl line 9.

my @data1 = ($letters[3], $letters[2]);        # d c
print "@data1\n";

my @data2 = @letters[3, 2];                    # d c
print "@data2\n";        # d c

print "@letters[3..5]\n";        # d e f
print "@letters[3,3,2]\n";       # d d c

A few more examples

my @i = (3, 5, 7);
print "@i\n";                           # 3 5 7
print "@letters[@i]\n";                 # d f h
print "@letters[split ' ', '3 5 7']\n"; # d f h

Array slices on the fly

#!/usr/bin/perl 
use strict;
use warnings;

my $line = "named:x:44:66:Nameserver Daemon:/var/named:/bin/bash";

my @fields = split ":", $line;         # fetch 3 values like this
my $uid  = $fields[2];
my $gid  = $fields[3];
my $home = $fields[5];

($uid, $gid, $home) = @fields[2,3,5];

($uid, $gid, $home) = (split ":", $line)[2,3,5];  # or like this
print "uid:  $uid\n";
print "gid:  $gid\n";
print "home: $home\n";

Hash slices

#!/usr/bin/perl 
use strict;
use warnings;


my %phone_of = (
    Foo    => '123-foo',
    Bar    => '123-bar',
    Baz    => '123-baz',
    Moo    => '123-moo',
);

print "$phone_of{Foo}\n";

my @phones = ($phone_of{Foo}, $phone_of{Bar});

@phones = @phone_of{'Bar', 'Baz'};

@phones = @phone_of{ qw(Bar Baz) };

my @selected_people = qw(Bar Baz); 
@phones = @phone_of{ @selected_people };

print "@phones\n";

Hash slices in assignment

#!/usr/bin/perl 
use strict;
use warnings;


my @fields = qw(fname lname phone);

my @values = qw(Peti Bar 12345);

my %h;
@h{@fields} = @values;


print "$h{fname}\n";   # Peti
print "$h{lname}\n";   # Bar
print "$h{phone}\n";   # 12345

splice

#!/usr/bin/perl
use strict;
use warnings;

my @names = qw(Foo Bar Baz Moo Qux Barney Hoppy Bammbamm Dino);
my @more_names = qw(Fred Wilma);

my @sublist = splice(@names, 2, 3);
print "@sublist\n";              # Baz Moo Qux
print "@names\n";                # Foo Bar Barney Hoppy Bammbamm Dino

my @zlist = splice(@names, 2, 3, @more_names);
print "@zlist\n";                # Barney Hoppy Bammbamm
print "@names\n";                # Foo Bar Fred Wilma Dino

Splice to slice and dice arrays in Perl

LIST and SCALAR context

  • LIST
  • SCALAR
  • context

See the behavior of localtime().

#!/usr/bin/perl
use strict;
use warnings;

my $t = time;

my $lt = localtime($t);
print "$lt\n";       # Fri May 20 11:26:23 2011

my @time = localtime($t);    
print "@time\n";     # 23 26 11 20 4 111 5 139 1

How could we implement something similar?

wantarray

When called within a function it will return undef, true or false depending on how was the function called.

undef  if it was called in a void context like f();
true   if it was called in a list context like @x = f(); or print f();
false  if it was called in scalar context like $x = f(); or if($f()) {...}
#!/usr/bin/perl
use strict;
use warnings;

sub f {
    if (not defined wantarray()) {
        print "called in VOID context\n";
    } elsif (wantarray()) {
        print "called in LIST context\n";
    } else {
        print "called in SCALAR context\n";
    }
}

f();          # called in VOID context
my @y = f();  # called in LIST context
my $x = f();  # called in SCALAR context

wantarray example

#!/usr/bin/perl
use strict;
use warnings;

sub count {
    if (not defined wantarray) {
        print "ERROR - function called in void context\n";
        return;
    }

    my $count = @_;
    if (not wantarray) {
        #print "SCALAR\n";
        return $count;
    }

    my $sum=0;
    while (my $v = shift @_) {
        $sum += $v;
    }

    #print "ARRAY\n";
    return ($count, $sum);
}

count();                    # call in void context, ERROR message

my (@x) = count(2,3,5);     # LIST context
print "@x\n";               # 3 10

my $z = count(2,3,5);       # SCALAR context
print "$z\n";               # 3

See also Want and Contextual::Return for even more options.

Slow sorting

The problem: bad performance

#!/usr/bin/perl
use strict;
use warnings;


my @files = glob "*.xml";

my @sorted_files = sort { -s $a <=> -s $b } @files;

print "@sorted_files\n";

Speed up sorting

Reduce the number of slow calls from n**2 to n.

#!/usr/bin/perl
use strict;
use warnings;


my @files = glob "*.xml";

my @unsorted_pairs = map  { [$_, -s $_] } @files;
my @sorted_pairs   = sort { $a->[1] <=> $b->[1] } @unsorted_pairs;
my @quickly_sorted_files = map  { $_->[0] } @sorted_pairs;

print "@quickly_sorted_files\n";

Schwartzian transformation

#!/usr/bin/perl
use strict;
use warnings;

my @files = glob "*.xml";

my @quickly_sorted_files =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map  { [$_, -s $_] }
    @files;

print "@quickly_sorted_files\n";

Compilation phases: BEGIN, CHECK, INIT, END

  • BEGIN
  • CHECK
  • INIT
  • END
BEGIN {
    # do something here
}
* You can have more than one of each one

BEGIN  - execute as soon as possible (compiled)
CHECK  - execute after the whole script was compiled
INIT   - execute before running starts
END    - execute as late as possible (after exit() or die())

When running perl -c script.pl   both the BEGIN and CHECK blocks are executed.

perldoc perlmod
#!/usr/bin/perl
use strict;
use warnings;

END {
    print "END\n";
}

print "BODY\n";
my $x = 0;
print 1/$x;          # error the program stops working

print "AFTER ERROR\n";

BEGIN {
    print "BEGIN\n";
}

print "After BEGIN block\n";

BEGIN
BODY
Illegal division by zero at examples/other/begin_end.pl line 11.
END
#!/usr/bin/perl
use strict;
use warnings;

BEGIN {
    print "BEGIN\n";
}

CHECK {
    print "CHECK\n";
}


INIT {
    print "INIT\n";
}

END {
    print "END\n";
}

print "BODY\n";

BEGIN
CHECK
INIT
BODY
END

AUTOLOAD

  • AUTOLOAD
  • $AUTOLOAD
#!/usr/bin/perl
use strict;
use warnings;

f("hello", "world");

Output:

Undefined subroutine &main::f called

If you call a function that does not exist when the call is made, Perl will raise an Undefined subroutine called exception. If the exception is not handled it will stop your program.

In the exception it will also give you the full name of the function that was missing. In the above case it will be &main::f.

With all the other magic, Perl also provides you a tool that will help you deal with such situations. You can include a block called AUTOLOAD in your code and if AUTOLOAD {} is defined then it will be executed instead of every missing function. From that point your imagination is the only limiting factor on how to handle the situation.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

f("hello", "world");

AUTOLOAD {
    our $AUTOLOAD;
    print "$AUTOLOAD\n";
    print Dumper \@_;
}

Output:

main::f
$VAR1 = [
          'hello',
          'world'
        ];

Static variable

#!/usr/bin/perl
use strict;
use warnings;

{
    my $counter = 0;
    sub inc {
        $counter++;
        return $counter;
    }
}

print inc(), "\n";
print inc(), "\n";
print inc(), "\n";
print inc(), "\n";

Exercise: create counter

Create a script with a function called count() which is capable of maintaining several counters.

#!/usr/bin/perl 
use strict;
use warnings;

print counter('a'), "\n";  # 1
print counter('b'), "\n";  # 1
print counter('a'), "\n";  # 2
print counter('c'), "\n";  # 1
print counter('a'), "\n";  # 3
print counter('a'), "\n";  # 4
print counter('b'), "\n";  # 2

Solution: create counter

#!/usr/bin/perl 
use strict;
use warnings;

print counter('a'), "\n";  # 1
print counter('b'), "\n";  # 1
print counter('a'), "\n";  # 2
print counter('c'), "\n";  # 1
print counter('a'), "\n";  # 3
print counter('a'), "\n";  # 4
print counter('b'), "\n";  # 2

{
    my %counter;
    sub counter {
       my $id = shift;
       return ++$counter{$id};
    }
}

Saved variable: local

  • local

Slurp mode used local $/ = undef;

#!/usr/bin/perl
use strict;
use warnings;

our $global = 100;
our $my = 200;
our $local = 300;
print_variables();
              # $global   100
              # $my       200
              # $local    300
set();
              # $global   1
              # $my       2
              # $local    3
# inside print_variables():
              # $global   1
              # $my       200
              # $local    3

print_variables();
              # $global   1
              # $my       200
              # $local    300

sub set {
    $global = 1;
    my $my = 2;
    local $local = 3;

    print "$global\n";
    print "$my\n";
    print "$local\n";
    print "---\n";

    print_variables();
}

sub print_variables {
    print "$global\n";
    print "$my\n";
    print "$local\n";
    print "---\n";
}
100
200
300
---
1
2
3
---
1
200
3
---
1
200
300
---
use strict;
use warnings;

use Data::Dumper;

my %data = (
    foo => 1,
    bar => 1,
    moo => 1,
);

_dump();
_local_sorted();
_dump();
_sorted();
_dump();



sub _local_sorted {
    local $Data::Dumper::Sortkeys = 1;
    _dump();
}

sub _sorted {
    $Data::Dumper::Sortkeys = 1;
    _dump();
}

sub _dump {
    print Dumper \%data;
}

$VAR1 = {
          'bar' => 1,
          'moo' => 1,
          'foo' => 1
        };
$VAR1 = {
          'bar' => 1,
          'foo' => 1,
          'moo' => 1
        };
$VAR1 = {
          'bar' => 1,
          'moo' => 1,
          'foo' => 1
        };
$VAR1 = {
          'bar' => 1,
          'foo' => 1,
          'moo' => 1
        };
$VAR1 = {
          'bar' => 1,
          'foo' => 1,
          'moo' => 1
        };

autodie

  • autodie
#!/usr/bin/perl
use strict;
use warnings;

use autodie;

open my $fh, '<', 'data.txt';

print "ok\n";

Can't open 'data.txt' for reading: 'No such file or directory' at examples/advanced-perl/autodie.pl line 7

Modern::Perl

  • Modern::Perl
use Modern::Perl;

Loads strict, warnings, the features of 5.10 and the c3 Method Resolution Order

Check out the Modern Perl book

Perl::Critic

  • Perl::Critic
  • perlcritic

Originally based on the Perl Best Practices by Damian Conway. Written by Jeffrey Thalhammer. Try Perl::Critic online.

perlcritic -h
perlcritic -5 file_name.pl

Perl::Tidy

  • Perl::Tidy
  • perltidy
perltidy

caller

  • caller
#!/usr/bin/perl
use strict;
use warnings;

fib(4);

sub fib {
    my $n = shift;
    my ($package, $filename, $line) = caller(0);
    print "$package  $filename  $line\n";
    if ($n == 1 or $n == 2) {
        return 1;
    }
    return fib($n-1) + fib($n-2);
}

Log::Dispatch

  • Log::Dispatch
#!/usr/bin/perl
use strict;
use warnings;

use Log::Dispatch;
use Log::Dispatch::Screen;

my $logger = Log::Dispatch->new;
$logger->add( Log::Dispatch::Screen->new(
    name => 'screen',
    #min_level => 'warning',
    min_level => 'debug',
));

fib(4);
fib(0);

sub fib {
    my $n = shift;
    $logger->debug("fib($n)");

    if ($n < 1) {
        $logger->error("fib($n) is invalid");
        return;
    }

    if ($n == 1 or $n == 2) {
        return 1;
    }
    return fib($n-1) + fib($n-2);
}

Log::Log4perl easy

  • Log::Log4perl
use strict;
use warnings;
use 5.010;

use My::EasyApp;

use Log::Log4perl qw(:easy);
Log::Log4perl->easy_init($WARN);
FATAL "This is", " fatal";
ERROR "This is error";
WARN  "This is warn";
INFO  "This is info";
DEBUG "This is debug";
TRACE "This is trace";

my $app = My::EasyApp->new;
$app->run;

package My::EasyApp;
use strict;
use warnings;

use Log::Log4perl; #qw(get_logger);

sub new {
    bless {}, shift;
}
sub run {
    my $logger = Log::Log4perl->get_logger();
    $logger->fatal("FATAL from EasyApp");
    $logger->debug("DEBUG from EasyApp");
}

1;

2014/08/24 08:02:52 This is fatal
2014/08/24 08:02:52 This is error
2014/08/24 08:02:52 This is warn
2014/08/24 08:02:52 FATAL from EasyApp

Exercise: Log::Dispatch

Take the examples/advanced-perl/log_dispatch.pl and change it so only warnings are printed to the screen. debug messages are printed to a log file. The name of the file should be in the format YYYY-MM-DD.log The log line should include the name file name and the line number where it was called.

Check out the strftime of the POSIX module and see how Log::Dispatch allows you to provide a callback function.

Solution Log::Dispatch

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

use Log::Dispatch;
use Log::Dispatch::Screen;
use Log::Dispatch::File;
use POSIX ();

my $logger = Log::Dispatch->new;
$logger->add( Log::Dispatch::Screen->new(
    name => 'screen',
    min_level => 'warning',
));
$logger->add( Log::Dispatch::File->new(
    name      => 'file',
    min_level => 'debug',
    filename  => POSIX::strftime("%Y-%m-%d", localtime) . '.log',
    callbacks => \&callback,
));

fib(4);
fib(0);

sub fib {
    my $n = shift;
    $logger->debug("fib($n)");

    if ($n < 1) {
        $logger->error("fib($n) is invalid");
        $logger->log(level => 'error', message => 'another');
        return;
    }

    if ($n == 1 or $n == 2) {
        return 1;
    }
    return fib($n-1) + fib($n-2);
}

sub callback {
    my %arg = @_;
    my ($package, $filename, $line) = caller(6);
    return "$arg{message}  in $filename line $line\n";
}

Functional Programming in Perl

Programming Paradigms

Functional programming

  • Immutability (variables don't change)
  • Separation of data and functions.
  • Pure functions (no side-effects)
  • First-class functions (you can assign function to another name and you can pass function to other functions and return them as well. We can also manipulate functions)
  • Higher order functions: a functions that either takes a function as a parameter or returns a function as a parameter.

grep

  • grep
  • filter

The grep keyword in Perl is a generalization of the Unix/Linux grep tool. Given a condition and a list of values it will return a, usually shorter, list of elements that will return true if used in the expression. In other language the similar tool is called filter.

In this example we have an array of numbers and an expression comparing $_ which holds the current value as grep iterates over the elements of the array. If the current value is greater or equal than 5 then it will be passed to the left hand side, if it is less than 5 then it will be filtered out.

Note, there is no comma after the curly braces.

ARRAY = grep BLOCK LIST
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = qw(8 2 5 3 1 7);
my @big_numbers = grep {$_ >= 5} @numbers;
print "@big_numbers\n";      # (8, 5, 7)

grep to filter files based on modification date

  • grep
#!/usr/bin/perl
use strict;
use warnings;

my @files = glob "*.xml";
my @old_files = grep { -M $_ > 365 } @files;
print join "\n", @old_files;

Imitating the Unix/Linux grep command in Perl

  • grep
#!/usr/bin/perl
use strict;
use warnings;

my $regex = shift;
print grep { $_ =~ /$regex/ } <>;

map

  • map

map will transform the content of a list.

Given a list of values that can come from an array or from calling the keys function on a hash or in any other way, we can apply a transformation on each element and then collect the transformed values on the left hand side of the assignment. e.g. in an array.

On each iteration the current element is placed in the $_ variable, the code in the block is executed, and the result is passed to the left-hand-side that collects all the responses.

ARRAY = map BLOCK LIST
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (1, 4, 7, 3);
my @doubles = map {$_ * 2} @numbers;
print "@doubles\n";  # 2 8 14 6

Use map to filter values

I am not sure why would you do this instead of using grep, but you can do this and this can bring use to another, more usefule example.

#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (1, 4, 17, 3, 28, 4);
my @big_numbers = map {$_ >= 10 ? $_ : ()} @numbers;
print "@big_numbers\n";  # 17 28

Map to add more elements

#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (1, 4, 17);
my @big_numbers = map {($_, $_)} @numbers;
print "@big_numbers\n";  # 1 1 4 4 17 17

Use map to filter and enrich

#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (1..5);

my @riches = map { $_ > 3 ? ($_+1, $_+2) : () } @numbers;
print "@riches\n"; # 5 6 6 7

Create a hash from an array using map

  • map

  • We have a list of values. We would like to build a fast look-up table to check for existence.

The time it takes to check if a value can be found in an array is proportional to the length of the array. The complexity is O(n).

If you need to do it a lot of times you might be better off building a hash where the keys are the items coming from the array. The values don't matter as we will check the existance of a key. (Alternatively you can set the values of the hash to be 1 and then you can check if the the value is there.) The time it takes to look up a key in a hash does not depend on the size of the hash. It is O(1). So once we have the hash the look-up will be much faster. Building the hash is proportional to the number of items in the array.

So if we need to look up a very small number of elements or if the original array is small then probably it is better to just use the array.

If we need a lot of look-ups and there are many elements in the original array then building a temporary look-up hash might be a good idea.

We use more memory but we can gain speed.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my @words = qw(Foo Bar Baz);

my %copy_paste = (Foo => 1, Bar => 1, Baz => 1);

my %lookup = map {$_ => 1} @words;

print Dumper \%copy_paste;
print Dumper \%lookup;

# $VAR1 = {
#           'Bar' => 1,
#           'Baz' => 1,
#           'Foo' => 1
#         };

Unique values

  • unique

You have a list of values with duplications, how can you create a unique list of the values?

#!/usr/bin/env perl
use strict;
use warnings;

my @data = qw(Earth Mars Earth Venus Earth Mars);
my @unique;
my %seen;

foreach my $value (@data) {
    if (! $seen{$value}) {
        push @unique, $value;
        $seen{$value}++;
    }
}

print "@unique\n"; # Earth Mars Venus

Unique values - improved

But actually we don't need to do it in two steps:

#!/usr/bin/env perl
use strict;
use warnings;

my @data = qw(Earth Mars Earth Venus Earth Mars);
my @unique;
my %seen;

foreach my $value (@data) {
    if (! $seen{$value}++ ) {
        push @unique, $value;
    }
}


print "@unique\n"; # Earth Mars Venus

Unique values using grep

Of course there is an even shorter way to write it:

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

my @data = qw(Earth Mars Earth Venus Earth Mars);

#my %seen;
#my @unique = grep { !$seen{$_}++ } @data;

my @unique = do { my %seen; grep { !$seen{$_}++ } @data };

print "@unique\n"; # Earth Mars Venus

In this version you can even assign the values back to the original array writing:

@data = grep { !$seen{$_}++} @data;

Uniq with List::MoreUtils

  • uniq
  • distinct
  • List::MoreUtils

There are several ways to implement this without using an external module, but why would we want to reinvent the wheel when there is already a good solution in the List::MoreUtils module?

The only problem, but we see it all over the programming world is that this function called "uniq" would return a list of distinct elements instead of the ones that were unique in the original list.

#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);

my @data = qw(Earth Mars Earth Venus Earth Mars);
my @unique = uniq @data;

print "@unique\n"; # Earth Mars Venus

Closures

A subroutine that generates subroutines based on some input values.
#!/usr/bin/perl
use strict;
use warnings;

sub incrementer {
    my $old = $_[0];
    $_[0] += $_[1];
    return $old;
}

my $x = 23;
my $old = incrementer($x, 19);
print "$old  $x\n"; # 23  42

sub incrementer_23 {
    my $inc = 23;
    my $old = $_[0];
    $_[0] += $inc;
    return $old;
}


#!/usr/bin/perl
use strict;
use warnings;

sub incrementer_generator {
    my ($inc) = @_;

    return sub {
        my $old = $_[0];
        $_[0] += $inc;
        return $old;
    }
}

my $inc19 = incrementer_generator(19);

my $x = 23;
my $old_value = $inc19->($x);
print "$x\n";
print "old after inc19: $old_value\n";


my $inc5 = incrementer_generator(5);
print "old after inc5: " . $inc5->($x) . "\n";
print "$x\n";

my $prev_value = $inc19->($x);
print "old after inc19: $prev_value\n";
print "$x\n";

Output:

42
old after inc19: 23
old after inc5: 42
47
old after inc19: 47
66
package Transformers;
use strict;
use warnings;
use Time::HiRes qw(time);

use Exporter qw(import);
our @EXPORT_OK = qw(show_elapsed_time);

sub show_elapsed_time {
    my (@subs) = @_;
    my $caller = caller();
    for my $sub (@subs) {
        my $name = "${caller}::$sub"; # fully qualified name

        no strict 'refs';
        my $subref = \&$name;
        my $new = sub {
            my $start = time;
            my (@results, $result);
            if (wantarray) {
                @results = $subref->(@_);
            } elsif (defined wantarray) {
                $result = $subref->(@_);
            } else {
                $subref->(@_);
            }
            my $end = time;
            my $elapsed = $end - $start;
            print "Elapsed time of $name: $elapsed\n";
            return wantarray ? @results : $result;
        };

        no warnings 'redefine';
        *{$name} = $new;
    }
    return;
}


1;
use strict;
use warnings;

sub count {
    my ($limit) = @_;
    print "count till $limit\n";
    my $counter = 0;
    $counter++ while $counter < $limit;
}

sub show_elapsed {
    my ($sub) = @_;

    use Time::HiRes qw(time);
    return sub {
        my $start = time;
        $sub->(@_);
        my $end = time;
        my $elapsed = $end - $start;
        print "Elapsed time: $elapsed\n";
    }
}

my $newcount = show_elapsed(\&count);
$newcount->(10000000);
#*count = $newcount;

count(10000000);

use strict;
use warnings;
use lib '.';
use Transformers qw(show_elapsed_time);

show_elapsed_time('count', 'add', 'calc');

sub count {
    my ($limit) = @_;
    print "count till $limit\n";
    my $counter = 0;
    $counter++ while $counter < $limit;
}

sub add {
    my ($x, $y) = @_;
    print "Add $x + $y\n";
    return $x+$y;
}

sub calc {
    my ($x, $y) = @_;
    print "Calc $x $y\n";
    return $x+$y, $x*$y;
}



count(10000000);
my $res = add(2, 3);
print "Res: $res\n";

my @res = calc(2, 3);
print "Res: @res\n";

Perl Libraries and Modules

Lack of code reuse

  • common functions
  • copy-paste
  • Perl4 solution: create libraries

Perl library (perl4 style)


$base = 10;

sub add {
    validate_parameters(@_);

    my $total = 0;
    $total += $_ for (@_);
    return $total;
}

sub multiply {
}

sub validate_parameters {
    die 'Not all of them are numbers'
        if  grep {/\D/} @_;
    return 1;
}

1;
#!/usr/bin/perl

require "examples/modules/library.pl";

print add(19, 23);
print "\n";
print "$base\n";

The 1; at the end of the library is needed in order to make sure the compilation of library.pl returns a true value.

Otherwise one could get an error such as this one:

examples/modules/library.pl did not return a true value

Problems with Libraries

  • All or nothing
  • Potential collision (redefine)
  • Global variables

While libraries can help in code reuse they still lack many features.

Prefix everything with unique name


$calc_base = 10;

sub calc_add {
    calc_validate_parameters(@_);

    my $total = 0;
    $total += $_ for (@_);
    return $total;
}

sub calc_multiply {
}

sub calc_validate_parameters {
    die 'Not all of them are numbers'
        if  grep {/\D/} @_;
    return 1;
}

1;

But we always have to use this prefix. Even within the library.

Namespaces

Perl has a solution to this problem using namespaces, also called packages.

#!/usr/bin/perl
use strict;
use warnings;

package Calc;
use strict;
use warnings;

sub add {
    my $total = 0;
    $total += $_ for (@_);
    return $total;
}


package main;

print Calc::add(3, 4), "\n";

# add(3, 4);
# would die with:
# Undefined subroutine &main::add called
#   at examples\modules\namespaces.pl line 20.

Solution with namespace

#!/usr/bin/perl
use strict;
use warnings;

require "examples/modules/namespace_lib.pl";

print Calc::add(3, 4), "\n";

package Calc;
use strict;
use warnings;

my $base = 10;

sub add {
    validate_parameters(@_);

    my $total = 0;
    $total += $_ for (@_);
    return $total;
}

sub multiply {
}

sub validate_parameters {
    die 'Not all of them are numbers'
        if  grep {/\D/} @_;
    return 1;
}

1;

Here, within the Calc namespace you don't have to use the full name, only when you are using it outside.

Modules

We could have placed the package keyword and the code in the main script or we can put several packages in the same external file but the best approach is to put every package in a separate file having the same name as the package itself (case sensitive) and .pm as file extension.

Then we call it a Perl Module.

#!/usr/bin/perl
use strict;
use warnings;

use lib 'examples/modules';

require Calc;

print Calc::add(3, 4), "\n";
package Calc;
use strict;
use warnings;

my $base = 10;

sub add {
    validate_parameters(@_);

    my $total = 0;
    $total += $_ for (@_);
    return $total;
}

sub multiply {
}

sub validate_parameters {
    die 'Not all of them are numbers'
        if  grep {/\D/} @_;
    return 1;
}



1;

  • How did perl find the file Calc.pm ?
  • How could we use add() without the Calc:: ?
  • Why did we write "require" instead of "use"?

Packages, @INC and Namespace hierarchy

perl -V

@INC:
  C:/strawberry/test-perl/lib
  C:/strawberry/test-perl/site/lib
  C:\strawberry\perl\vendor\lib
  .

require Calc; - Calc.pm somewhere in @INC

require Math::Calc; - Math/Calc.pm somewhere in @INC

require Math::Calc::Clever; - Math/Calc/Clever.pm somewhere in @INC

use, require and import

require Math::Calc;
use Math::Calc qw(add);
BEGIN {
    require Math::Calc;
    Math::Calc->import( qw(add) );
}
  • use is executed at compile time, just as a BEGIN block.
  • require is executed at run time so if we don't enclose it in a BEGIN` block it will happen later.
if ($holiday) {
    use Vaction::Mode;
}

The above does not make much sense as the use will load the module at compile time regardless of day.

if ($holiday) {
    require Vacation::Mode;
    Vacation::Mode->import;
}

And we don't even need to call import() if we don't care about that.

Export - Import

In order to eliminate the need to use the full name of a subroutine e.g. Calc::add() we can export it from the module and the user can import it into the namespace of her own code.

(we call it A::Calc as we already had a Calc.pm in the previous slide)

#!/usr/bin/perl
use strict;
use warnings;

use lib 'examples/modules';

use A::Calc;

print add(2, 3), "\n";
package A::Calc;
use strict;
use warnings;

use Exporter qw(import);

our @EXPORT = qw(add multiply);

my $base = 10;

sub add {
    validate_parameters(@_);

    my $total = 0;
    $total += $_ for (@_);
    return $total;
}

sub multiply {
}

sub validate_parameters {
    die 'Not all of them are numbers'
        if  grep {/\D/} @_;
    return 1;
}

1;

Exporter is a standard module that provides the 'import' function called by 'use Module' automatically.

This imports automatically every function (and variable) mentioned in the @EXPORT array. This is nice as the user (the script writer) does not have to think much.

On the other hand it might be bad as the user gets many functions she does not need, just because the module author thinks these functions are useful.

Restrict the import

The user (the script writer) can restrict the list of imported functions but the unsuspecting script write will still get all the functions.

#!/usr/bin/perl
use strict;
use warnings;

use lib 'examples/modules';

use A::Calc qw(add);

print add(2, 3), "\n";

On demand Import

package B::Calc;
use strict;
use warnings;

use Exporter qw(import);

our @EXPORT = qw(add);
our @EXPORT_OK = qw(multiply);

my $base = 10;

sub add {
    validate_parameters(@_);

    my $total = 0;
    $total += $_ for (@_);
    return $total;
}

sub multiply {
}

sub validate_parameters {
    die 'Not all of them are numbers'
        if  grep {/\D/} @_;
    return 1;
}


1;

#!/usr/bin/perl
use strict;
use warnings;

use lib 'examples/modules';

use B::Calc qw(add);

print add(2, 3), "\n";

Importing

use Module;

Every function (and variable) listed in the @EXPORT array is imported automatically.

use Module ();

Nothing is imported.

use Module qw(foo bar);

Functions foo() and bar() are imported, nothing else. Any function from the @EXPORT and @EXPORT_OK arrays can be requested to be imported. There is also an %EXPORT_TAGS that can be used to define groups of functions to be imported.

Modules - behind the scene

  • The functionality of Cwd is implemented in a file called Cwd.pm
  • @INC the list of libraries
  • perl -V
  • perldoc Cwd documentation
  • perldoc -l Cwd location /usr/lib/perl5/5.8.6/i386-linux-thread-multi/Cwd.pm
  • perldoc -m Cwd module content

Examples of modules that export functions. Look at their source code

  • Cwd
  • File::Basename
  • File::Spec
  • File::Spec::Functions

Tools for packaging and distribution

Packaging modules

The semi-standard directory structure of CPAN modules can be also very useful for any Perl application:

  dir/

     Makefile.PL
     Build.PL

     dist.ini


     README
     CHANGES
     MANIFEST
     MANIFEST.SKIP
     META.yml
     META.json

     lib/
        Application/Name.pm
        Application/Name/...
     script/
        application.pl

     t/
     xt/

     sample/

     share/
     templates/
     views/

Makefile.PL of ExtUtils::MakeMaker

use strict;
use warnings;

use 5.008;

use ExtUtils::MakeMaker;
WriteMakefile
(
    NAME         => 'App',
    VERSION_FROM => 'lib/App.pm',
    PREREQ_PM    => {
          'File::Basename' => '0',
          'Moose'          => '0.24',
    },
    BUILD_REQUIRES => {
          'Test::More' => '0.47'
    },
    EXE_FILES' => [
           'script/app.pl'
    ],
);

Makefile.PL of Module::Install

use strict;
use warnings;
use 5.008;
use inc::Module::Install 1.00;


name 'App';
license 'perl';
author 'Gabor Szabo';
all_from 'lib/App.pm';
requires 'perl' => '5.008';

requires 'File::Basename';                     # no version number
requires 'Win32' => '0.31' if $^O =~ /win32/i; # conditional requirement

test_requires 'Test::More' => '0.47';

homepage 'http://padre.perlide.org/';
bugtracker 'http://padre.perlide.org/trac/';
repository 'http://svn.perlide.org/padre/trunk/Padre/';

install_script 'script/app.pl';

# install_share;   # will install the share/ directory from the distribution
                   # to be found by File::ShareDir::dist_dir('App') late on

WriteAll;

Build.PL of Module::Build

use strict;
use warnings;
use Module::Build;

use 5.008;

my $builder = Module::Build->new(
    module_name         => 'App',
    license             => 'perl',
    dist_author         => 'Gabor Szabo <gabor@szabgab.com>',
    dist_abstract       => 'Framework to do something',
    create_makefile_pl  => 0,     #'traditional',
    script_files        => 'script/app.pl',
    create_readme       => 0,
    requires            => {
        'File::Basename'      => '0',
        'Moose'               => '0.24',
    },
    build_requires      => {
        'Test::More'          => '0.47',
    },
);

$builder->create_build_script();

Changes and README


v0.02 2007.11.23

    Added feature for doing something
    Fixed strange bug causing trouble (#7)


v0.01 2007.10.12

    Releasing first version of Application
Application for bla

See the documentation of the module 
by typing now perldoc lib/App/Module.pm

or after installation typing

perldoc App::Module

MANIFEST and MANIFEST.SKIP

Build.PL
CHANGES
Makefile.PL
MANIFEST
META.yml
README

script/app.pl
lib/App.pm
t/01-load.t
# Avoid version control files.
\bRCS\b
\bCVS\b
,v$
\B\.svn\b
\B\.cvsignore$

# Avoid Makemaker generated and utility files.
\bMakefile$
\bblib
\bMakeMaker-\d
\bpm_to_blib$
\bblibdirs$
^MANIFEST\.SKIP$

# Avoid Module::Build generated and utility files.
\bBuild$
\bBuild.bat$
\b_build

# Avoid Devel::Cover generated files
\bcover_db

# Avoid temp and backup files.
~$
\.tmp$
\.old$
\.bak$
\#$
\.#
\.rej$

# Avoid OS-specific files/dirs
#   Mac OSX metadata
\B\.DS_Store
#   Mac OSX SMB mount metadata files
\B\._
# Avoid archives of this distribution
\bApp-[\d\.\_]+
^MYMETA.yml$

A script

#!/usr/bin/perl
use strict;
use warnings;

use App qw(run);

run();

# If there is a script in the application
# or if the whole application is one or more scripts
# then this is an example of such a script.

A module

package App;
use strict;
use warnings;
use 5.008;

our $VERSION = '0.01';

sub add {
    my ($x, $y) = @_;
    return $x+$y;
}


=head1 NAME

App - application

=head1 SYNOPSIS

 A quick example for the really inpatient.

=head1 DESCRIPTION

=head2 Methods

=over 4

=item method_a

=item method_b

=back

=head1 BUGS

Probably plenty but nothing I know of. Please report them to the author.

=head1 Development

Instructions to those who wish to participate in the development efforts.
E.g. where is the version control system, where is the development mailing
list or forum (if you have one).

=head1 Thanks

Potential thanks to people who helped you.

=head1 AUTHOR

Gabor Szabo <gabor@szabgab.com>

=head1 COPYRIGHT

Copyright 2006 by Gabor Szabo <gabor@szabgab.com>.

This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html

=cut

1;


Packaging with Makefile.PL

perl Makefile.PL
make
make test
make manifest
make dist

When installing you'll have to type make install instead of the make manifest and the make dist.

Packaging with Build.PL

perl Build.PL
perl Build
perl Build test
perl Build manifest
perl Build dist

When installing you'll have to type perl Build install instead of the perl Build manifest and the perl Build dist.

A test file

use strict;
use warnings;

use Test::More;

plan tests => 3;

use_ok('App');

ok(App::add(1, 1) == 2, "1+1 = 2");

is(App::add(2, 3),  5, "2+3 = 5");



Writing Tests

ok($condition, $name);

is($actual, $expected, $name);

like($actual, qr/$expected/, $name);

use_ok('Module::Name');

TODO: {
   local $TODO = "Some excuse";
   is(add(1,1,1), 3, "adding 3 numbers");
}

diag "Some message";

diag explain @data_structure;
Test::Most
Test::NoWarnings  or Test::FailWarnings
Test::Exception  or the newer  Test::Fatal
Test::Warn
Test::Deep

Devel::Cover

Exercises: Multiply numbers

  • Write a script with a function to multiply numbers sub multiply.
  • Once it is ready move the multiply subroutine to the library.pl file that already has the add subroutine and write a script that can use it.
  • Create a package called Exercise::Calc that will export the multiply function. Write a script that will use this module and the given functions.
  • Write a few test cases for both the add and multiply functions. Including test cases where you try to add 3 or more numbers. As the tests are failing, mark them as TODO.
  • Fix the add() function so it passes its new tests with 3 or more parameters.
  • Add short user documentation that can be read by perldoc.
  • Create the files necessary for distributions and make sure you can generate the tar.gz file.

Solution: Multiply numbers

sub sum {
    my $sum = 0;
    $sum += $_ for (@_);
    return $sum;
}

sub multiply {
    return 0 if not @_;
    my $res = shift;
    $res *= $_ for (@_);
    return $res;
}

1;

Overview

  • Code reuse
  • Perl4 style was putting functions in library files and "require"-ing the files
  • Danger of redefining other functions
  • Perl Modules are namespaces
  • Main name space is called main, separation is done using :: so variable $x in the main script are actually called $main::x
  • Within a namespace you can use any function
  • This does not redefine any function in the other namespaces

How to create a Perl Module for code reuse?

Error Handling

Returning error status or throwing exception?

foreach my $filename (@names) {
    my $fh = open_file($filename);
    my $data = read_file($fh);
}

What if open fails and the $fh is undef? Who will notice this problem? Only in read_file we will notice it.

  • Developer must work in order to ignore the exception
  • Propagating exception upwards is easy: don't do anything

eval block

  • eval
#!/usr/bin/perl
use strict;
use warnings;

use English qw( -no_match_vars ) ;

my $mode = shift or die "Usage: $0 good|bad|ugly\n";


my $answer;
eval {
    $answer = code($mode);
    print "Answer received\n";
};
if ($EVAL_ERROR) {    # $@
    chomp $EVAL_ERROR;
    if ($EVAL_ERROR eq 'bad code') {
        warn "exception '$EVAL_ERROR' received\n";
    } else {
        warn "Unexpected exception '$EVAL_ERROR' received\n";
        die $EVAL_ERROR;
    }
} else {
    print "The answer is $answer\n";
}



sub code {
    my $mode = shift;
    print "code: $mode\n";
    if ($mode eq "bad") {
        die "bad code\n";
    } elsif ($mode eq 'ugly') {
        die 'Some other error';
    } else {
        return 42;
    }
}

  • Simple
  • built in the language
  • but not powerful
  • cannot pass context of the error, just a string

Exception handling with Try::Tiny

  • Try::Tiny
  • try
  • catch
#!/usr/bin/perl
use strict;
use warnings;

use Try::Tiny;

my $mode = shift or die "Usage: $0 good|bad|ugly\n";


my $answer;
try {
    $answer = code($mode);
    print "Answer received\n";
} catch {
    chomp $_;   # not $@
    if ($_ eq 'bad code') {
        warn "exception '$_' received\n";
    } else {
        warn "Unexpected exception '$_' received\n";
        die $_;
    }
};
if (defined $answer) {
    print "The answer is $answer\n";
}



sub code {
    my $mode = shift;
    print "code: $mode\n";
    if ($mode eq 'bad') {
        die "bad code\n";
    } elsif ($mode eq 'ugly') {
        die 'Some other error';
    } else {
        return 42;
    }
}

Carp

  • Carp
  • carp
  • croak
It is better to use croak and carp instead of die and warn.
#!/usr/bin/perl
use strict;
use warnings;

use Module;

Module::f("x");  # Parameter needs to be a number at Module.pm line 10.

Module::g("x");  # Parameter needs to be a number at carp.pl line 8

Module::h("x");
# Parameter needs to be a number at Module.pm line 24
#        Module::h('x') called at carp.pl line 9
package Module;
use strict;
use warnings;

use Carp qw(carp cluck);

sub f {
    my ($x) = @_;
 
    warn "Parameter needs to be a number" if $x =~ /\D/;
    return $x;
}

sub g {
    my ($x) = @_;
 
    carp "Parameter needs to be a number" if $x =~ /\D/;
    return $x;
}

sub h {
    my ($x) = @_;
 
    cluck "Parameter needs to be a number" if $x =~ /\D/;
    return $x;
}


1;

Carp::Clan

  • Carp::Clan
#!/usr/bin/perl
use strict;
use warnings;

use App::Module;

App::Module::f("x");
# Parameter needs to be a number at App/Code.pm line 11.

App::Module::g("x");
# Parameter needs to be a number at App/Module.pm line 15

App::Module::h("x");
# App::Module::h(): Parameter needs to be a number at carp_clan.pl line 11
package App::Code;
use strict;
use warnings;

use Carp::Clan qw(^App::);
use Carp ();

sub f {
    my ($x) = @_;
 
    warn "Parameter needs to be a number" if $x =~ /\D/;
    return $x;
}

sub g {
    my ($x) = @_;
 
    Carp::carp "Parameter needs to be a number" if $x =~ /\D/;
    return $x;
}

sub h {
    my ($x) = @_;
 
    carp "Parameter needs to be a number" if $x =~ /\D/;
    return $x;
}


1;
package App::Module;
use strict;
use warnings;

use Carp qw(carp cluck);
use App::Code;

sub f {
    my ($x) = @_;
    App::Code::f($x); 
}

sub g {
    my ($x) = @_;
    App::Code::g($x); 
}

sub h {
    my ($x) = @_;
    App::Code::h($x); 
}


1;

The problem with die and croak

In both cases the error string defines the type of the error.
If we want to change the text, then in every place where we catch those
errors we have to update some regular expression.

Cannot easily propagate the context of the error as it is just a string.

Exception::Class

  • Exception::Class
  • throw
#!/usr/bin/perl
use strict;
use warnings;

use English qw( -no_match_vars ) ;
use Carp qw(croak carp);
use Lottery qw(add_lottery_numbers);

# Catching exceptions
eval {
    add_lottery_numbers(@ARGV);
};
if ($EVAL_ERROR) {
    if (Number::Missing->caught) {
        print "$EVAL_ERROR\n";
        print "Usage: $0 NUMBERs  (at least 2 numbers are needed)\n";
        exit;
    }
    if (Number::Small->caught() or Number::Big->caught) {
        print "$EVAL_ERROR\n";
        print "Numbers must be between 1-90\n";
        exit;
    }

    # error that I don't know how to handle:
    croak $EVAL_ERROR;
}
print "Number was ok\n";


package Lottery;
use strict;
use warnings;

use Lottery::Exceptions;
use Exporter qw(import);
our @EXPORT = ('add_lottery_numbers');

# Throwing exceptions
sub check_number {
    my ($num) = @_;

    if ($num < 1) {
        Number::Small->throw(number => $num);
    }
    if ($num > 90) {
        Number::Big->throw(number => $num);
    }
}

sub add_lottery_numbers {
    my (@numbers) = @_;

    if (@numbers < 2) {
        Number::Missing->throw();
    }

    foreach my $n (@numbers) {
       check_number($n);
    }
    submit_numbers(@numbers);
}

sub submit_numbers {
}

1;
use Exception::Class (
    'Number::Small' => {
        fields => [ 'number' ],
    },
    'Number::Big' => {
        fields => [ 'number' ],
    },
    'Number::Missing' => {},
);

sub Number::Small::full_message {
    my ($self) = @_;
    return "number " . $self->number . " is too small";
}

sub Number::Big::full_message {
    my ($self) = @_;
    return "number " . $self->number . " is too big";
}
sub Number::Missing::full_message {
    my ($self) = @_;
    return "a number is missing";
}

1;

Processing exceptions

Make sure the exceptions are checked in growing order of generality. First check for the most specific exceptions then for the more generic ones.

Exercise

Take the examples/error/exception.pl file and add an exception class that checks if the given value was a number or not.

Add an exception class called Number::Bad and make Number::Small and Number::Big subclasses of that class.

Solution

#!/usr/bin/perl
use strict;
use warnings;

use English qw( -no_match_vars ) ;
use Carp qw(croak carp);
use Exception::Class (
    'Number::Small' => {
        fields => [ 'number' ],
    },
    'Number::Big' => {
        fields => [ 'number' ],
    },
    'Number::Missing' => {},
    'Number::Invalid' => {
        fields => [ 'input' ],
    },
);

sub Number::Invalid::full_message {
    my ($self) = @_;
    return
        sprintf("We received '%s' while the input must be a number.",
        $self->input);
}

sub Number::Small::full_message {
    my ($self) = @_;
    return "number " . $self->number . " is too small";
}

sub Number::Big::full_message {
    my ($self) = @_;
    return "number " . $self->number . " is too big";
}
sub Number::Missing::full_message {
    my ($self) = @_;
    return "number is missing";
}



# Throwing exceptions
sub check_number {
    my ($num) = @_;
    if (not defined $num) {
        Number::Missing->throw();
    }
    if ($num !~ /^\d+$/) {
        Number::Invalid->throw(input => $num);
    }

    if ($num < 0) {
        Number::Small->throw(number => $num);
    }
    if ($num > 100) {
        Number::Big->throw(number => $num);
    }
}


# Catching exceptions
eval {
    check_number(@ARGV);
};
if ($EVAL_ERROR) {
    if (Number::Missing->caught) {
        print "$EVAL_ERROR\n";
        print "Usage: $0 NUMBER\n";
        exit;
    }
    if (Number::Small->caught() or Number::Big->caught) {
        print "$EVAL_ERROR\n";
        print "Number must be between 0-100\n";
        exit;
    }

    # error that I don't know how to handle:
    croak $EVAL_ERROR;
}
print "Number was ok\n";


Solution

#!/usr/bin/perl
use strict;
use warnings;

use English qw( -no_match_vars ) ;
use Carp qw(croak carp);
use Exception::Class (
    'Number::Bad'  => {
        fields => [ 'number' ],
    }
    'Number::Small' => {
        isa => 'Number::Bad',
    },
    'Number::Big' => {
        isa => 'Number::Bad',
    },
    'Number::Missing' => {},
    'Number::Invalid' => {
        fields => [ 'input' ],
    },
);

sub Number::Invalid::full_message {
    my ($self) = @_;
    return
        sprintf("We received '%s' while the input must be a number.",
        $self->input);
}

sub Number::Bad::full_message {
    my ($self) = @_;
    return "number " . $self->number . " is not correct";
}

sub Number::Small::full_message {
    my ($self) = @_;
    return "number " . $self->number . " is too small";
}

sub Number::Big::full_message {
    my ($self) = @_;
    return "number " . $self->number . " is too big";
}
sub Number::Missing::full_message {
    my ($self) = @_;
    return "number is missing";
}



# Throwing exceptions
sub check_number {
    my ($num) = @_;
    if (not defined $num) {
        Number::Missing->throw();
    }
    if ($num !~ /^\d+$/) {
        Number::Invalid->throw(input => $num);
    }

    if ($num < 0) {
        Number::Small->throw(number => $num);
    }
    if ($num > 100) {
        Number::Big->throw(number => $num);
    }
}


# Catching exceptions
eval {
    check_number(@ARGV);
};
if ($EVAL_ERROR) {
    if (Number::Missing->caught) {
        print "$EVAL_ERROR\n";
        print "Usage: $0 NUMBER\n";
        exit;
    }
    if (Number::Small->caught() or Number::Big->caught) {
        print "$EVAL_ERROR\n";
        print "Number must be between 0-100\n";
        exit;
    }

    # error that I don't know how to handle:
    croak $EVAL_ERROR;
}
print "Number was ok\n";


Introduction to Moose

Moose Constructor

use strict;
use warnings;
use v5.10;

use Person;

my $teacher = Person->new;

package Person;
use Moose;

1;

Test Moose Constructor

use strict;
use warnings;
use v5.10;

use Test::More tests => 1;

use Person;

my $p = Person->new;
isa_ok($p, 'Person');

Moose Attribute

  • attribute
use strict;
use warnings;
use v5.10;

use Person;

my $teacher = Person->new;

$teacher->name('Foo');

say $teacher->name;

package Person;
use Moose;

has 'name' => (is => 'rw');

1;

use strict;
use warnings;
use v5.10;

use Person;

my $teacher = Person->new( name => 'Foo' );

say $teacher->name;

Test Moose Attribute

use strict;
use warnings;
use v5.10;

use Test::More tests => 5;

use Person;

my $p = Person->new;
isa_ok($p, 'Person');

is($p->name('Foo'), 'Foo', 'setter');
is($p->name, 'Foo', 'getter');

my $o = Person->new( name => 'Bar' );
isa_ok($o, 'Person');
is($o->name, 'Bar');

Moose Attribute Type

  • type
use strict;
use warnings;
use v5.10;

use Person;

my $student = Person->new( name => 'Foo' );

$student->year(1988);

say $student->year;

$student->year('23 years ago');
package Person;
use Moose;

has 'name' => (is => 'rw');
has 'year' => (isa => 'Int', is => 'rw');

1;

Attribute (year) does not pass the type constraint because:
   Validation failed for 'Int' with value "23 years ago"
       at accessor Person::year (defined at lib/Person.pm line 5) line 4
   Person::year('Person=HASH(0x19a4120)', '23 years ago')
       called at script/person.pl line 13

Test Moose Attribute Type

use strict;
use warnings;
use v5.10;

use Test::More tests => 6;
use Test::Exception;

use Person;

my $p = Person->new;
isa_ok($p, 'Person');

is($p->name('Foo'), 'Foo', 'setter');
is($p->name, 'Foo', 'getter');

is($p->year(2000), 2000);
is($p->year, 2000);

my $def_err  = qr{Attribute \(year\) does not pass the type constraint because:};
my $home_err = qr{Validation failed for 'Int' with value 23 years ago};

throws_ok { $p->year('23 years ago') } qr{$def_err $home_err}, 'exception';

Moose Attribute Type class

  • type constraint
use strict;
use warnings;
use v5.10;

use Person;
use DateTime;

my $student = Person->new( name => 'Foo' );

$student->birthday( DateTime->new( year => 1988, month => 4, day => 17) );

say $student->birthday;

$student->birthday(1988);
package Person;
use Moose;

has 'name'     => (is => 'rw');
has 'birthday' => (isa => 'DateTime', is => 'rw');

1;

Attribute (birthday) does not pass the type constraint because: 
    Validation failed for 'DateTime' with value 1988
       at accessor Person::birthday (defined at lib/Person.pm line 5) line 4
    Person::birthday('Person=HASH(0x2143928)', 1988)
       called at script/person.pl line 14

Test Moose Attribute Type class

use strict;
use warnings;
use v5.10;

use DateTime;

use Test::More tests => 7;
use Test::Exception;

use Person;

my $p = Person->new;
isa_ok($p, 'Person');

is($p->name('Foo'), 'Foo', 'setter');
is($p->name, 'Foo', 'getter');

isa_ok(
  $p->birthday( DateTime->new( year => 1988, month => 4, day => 17) ),
  'DateTime');

my $d = $p->birthday;
isa_ok($d, 'DateTime');
is($d->year, 1988, 'year is correct');

my $default_err =
  qr{Attribute \(birthday\) does not pass the type constraint because:};
my $homemade_err =
  qr{Validation failed for 'DateTime' with value 1988};

throws_ok { $p->birthday( 1988 ) }
   qr{$default_err $homemade_err}, 'Year as birthday';

Moose Attribute - create your own subtype

  • subtype
use strict;
use warnings;
use v5.10;

use Person;
use DateTime;

my $student = Person->new( name => 'Foo' );

$student->sex('m');
say $student->sex;

$student->sex('male');
package Person;
use Moose;
use Moose::Util::TypeConstraints;

subtype 'Person::Type::Sex'
  => as 'Str'
  => where { $_ eq 'f' or $_ eq 'm' }
  => message { "($_) is not a valid sex. Valid values are 'f' and 'm'." };

has 'name'     => (is => 'rw');
has 'birthday' => (isa => 'DateTime', is => 'rw');
has 'sex'      => (isa => 'Person::Type::Sex', is => 'rw');

1;

Attribute (sex) does not pass the type constraint because:
  (male) is not a valid sex.
  Valid values are 'f' and 'm'. at script\person.pl line 13

Test Moose Attribute - create your own subtype

use strict;
use warnings;
use v5.10;

use Test::More tests => 8;
use Test::Exception;

use Person;

my $p = Person->new;
isa_ok($p, 'Person');

is($p->name('Foo'), 'Foo', 'setter');
is($p->name, 'Foo', 'getter');

is($p->sex('m'), 'm', 'set m');
is($p->sex('f'), 'f', 'set f');

throws_ok { $p->sex('male') }
    qr{Attribute \(sex\) does not pass the type constraint because:};
throws_ok { $p->sex('M') }
    qr{Attribute \(sex\) does not pass the type constraint because:};
throws_ok { $p->sex('other') }
    qr{Attribute \(sex\) does not pass the type constraint because:};

Moose Attribute - coercion

  • coercion
use strict;
use warnings;
use v5.10;

use Person;
use DateTime;

my $student = Person->new( name => 'Foo' );

$student->sex('m');        # should be accepted as 'm'
say $student->sex;

$student->sex('female');   # should be accepted as 'f'
say $student->sex;

$student->sex('other');    # should not be accepted

package Person;
use Moose;
use Moose::Util::TypeConstraints;

subtype 'Person::Type::Sex'
    => as 'Str'
    => where { $_ eq 'f' or $_ eq 'm' }
    => message { "The sex you provided ($_) is not valid. " .
        "Valid values are 'f' and 'm'." };

coerce 'Person::Type::Sex'
    => from 'Str'
    => via { lc substr($_, 0, 1) };

has 'name'     => (is => 'rw');
has 'birthday' => (isa => 'DateTime', is => 'rw');
has 'sex'      => (isa => 'Person::Type::Sex', is => 'rw', coerce => 1);

1;

Attribute (sex) does not pass the type constraint because: 
   The sex you provided (o) is not valid. Valid values are 'f' and 'm'. 
   at accessor Person::sex (defined at lib/Person.pm line 16) line 8
     Person::sex('Person=HASH(0x24ff918)', 'other')
       called at script/person.pl line 16

Test Moose Attribute - coercion

use strict;
use warnings;
use v5.10;

use Test::More tests => 12;
use Test::Exception;

use Person;

my $p = Person->new;
isa_ok($p, 'Person');

is($p->name('Foo'), 'Foo', 'setter');
is($p->name, 'Foo', 'getter');

is($p->sex('m'), 'm', 'set m');
is($p->sex, 'm',      'get m');
is($p->sex('f'), 'f', 'set f');
is($p->sex, 'f',      'get f');

is($p->sex('M'), 'm', 'set M');
is($p->sex, 'm',      'get m');

is($p->sex('male'), 'm', 'set male');
is($p->sex, 'm',      'get m');
throws_ok { $p->sex('other') }
    qr{Attribute \(sex\) does not pass the type constraint because:};

Moose Enumeration

package Person;
use Moose;
use Moose::Util::TypeConstraints;

enum 'Person::Type::Sex' => [ qw(f m) ];

coerce 'Person::Type::Sex'
    => from 'Str'
    => via { lc substr($_, 0, 1) };

has 'name'     => (is => 'rw');
has 'birthday' => (isa => 'DateTime', is => 'rw');
has 'sex'      => (isa => 'Person::Type::Sex', is => 'rw', coerce => 1);

1;

Moose Attributes Overview

# accessors, mutators, getters, setters
has 'x' => (is => 'rw');
has 'x' => (is => 'ro');
has 'x' => (is => 'bare');

# types
has 'x' => (is => 'rw', isa => 'Int');
has 'y' => (is => 'rw', isa => 'Str');

# required
has 'z' => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);

# default
has 'x' => (
    is => 'rw',
    isa => 'Int',
    default => 42,
);
has 'names' => (
    is => 'rw',
    isa => 'HashRef',
    default => sub { {} },
);

has 'names' => (
    is => 'rw',
    isa => 'HashRef',
    builder => '_build_name',
);
sub _build_name {
    my $self = shift;
	# ...
	return {};
}

Inheritance

Inheritance is declared in Moose using the "extends" keyword. Multiple inheritance is allowed.

use strict;
use warnings;
use v5.10;

use Person;
use Employee;

my $student = Person->new( name => 'Foo' );
say $student->name;        # Foo

my $programmer = Employee->new( name => 'Bar', employer => 'Perl Corp' );
say $programmer->name;     # Bar
say $programmer->employer; # Perl Corp

my $unemployed = Person->new( name => 'Foo', employer => 'Java Corp' );
say $unemployed->name;     # Foo
say $unemployed->employer;
# Exception:
# Can't locate object method "employer" via package "Person"
package Employee;
use Moose;

extends 'Person';

has 'employer'   => (is => 'rw');

1;

Testing Inheritance

use strict;
use warnings;
use v5.10;

use Test::More tests => 12+2+2;
use Test::Exception;

use Employee;

my $e = Employee->new;
isa_ok($e, 'Employee');

is($e->name('Foo'), 'Foo', 'setter');
is($e->name, 'Foo', 'getter');

is($e->sex('m'), 'm', 'set m');
is($e->sex, 'm',      'get m');
is($e->sex('f'), 'f', 'set f');
is($e->sex, 'f',      'get f');

is($e->sex('M'), 'm', 'set M');
is($e->sex, 'm',      'get m');

is($e->sex('male'), 'm', 'set male');
is($e->sex, 'm',      'get m');
throws_ok { $e->sex('other') }
    qr{Attribute \(sex\) does not pass the type constraint because:},
   'sex is f or m';


is($e->employer('Acme Corporation'), 'Acme Corporation', 'set employer');
is($e->employer, 'Acme Corporation', 'get employer');


use Person;
my $p = Person->new(employee => 'Acme Corp');
isa_ok($p, 'Person');
throws_ok { $p->employee }
    qr{Can't locate object method "employee" via package "Person"},
    'no employee method';

MooseX::StrictConstructor

use strict;
use warnings;
use v5.10;

use Person;
use Employee;

my $student = Person->new( name => 'Foo' );
say $student->name;        # Foo

my $programmer = Employee->new( name => 'Bar', employer => 'Perl Corp' );
say $programmer->name;     # Bar
say $programmer->employer; # Perl Corp

my $unemployed = Person->new( name => 'Foo', employer => 'Java Corp' );
# Exception:
# Found unknown attribute(s) init_arg passed to the constructor: employer ...

package Person;
use Moose;
use Moose::Util::TypeConstraints;
use MooseX::StrictConstructor;

enum 'Person::Type::Sex' => [ qw(f m) ];

coerce 'Person::Type::Sex'
    => from 'Str'
    => via { lc substr($_, 0, 1) };

has 'name'     => (is => 'rw');
has 'birthday' => (isa => 'DateTime', is => 'rw');
has 'sex'      => (isa => 'Person::Type::Sex', is => 'rw', coerce => 1);

1;

Testing Strict attributes

use strict;
use warnings;
use v5.10;

use Test::More tests => 12+2+2;
use Test::Exception;

use Employee;

my $e = Employee->new;
isa_ok($e, 'Employee');

is($e->name('Foo'), 'Foo', 'setter');
is($e->name, 'Foo', 'getter');

is($e->sex('m'), 'm', 'set m');
is($e->sex, 'm',      'get m');
is($e->sex('f'), 'f', 'set f');
is($e->sex, 'f',      'get f');

is($e->sex('M'), 'm', 'set M');
is($e->sex, 'm',      'get m');

is($e->sex('male'), 'm', 'set male');
is($e->sex, 'm',      'get m');
throws_ok { $e->sex('other') }
    qr{Attribute \(sex\) does not pass the type constraint because:},
    'sex is f or m';


is($e->employer('Acme Corporation'), 'Acme Corporation', 'set employer');
is($e->employer, 'Acme Corporation', 'get employer');


use Person;
throws_ok { Person->new(employee => 'Acme Corp') }
    qr{Found unknown attribute},
    'MooseX::StrictConstructor';
#isa_ok($p, 'Person');
#throws_ok { $p->employee }
#    qr{Can't locate object method "employee" via package "Person"},
#    'no employee method';

throws_ok { Employee->new(employee => 'Acme Corp') }
    qr{Found unknown attribute},
    'MooseX::StrictConstructor';

Encapsulation

There is no real enforced encapsulation in Moose either. The user still receives a hash reference and nothing stops her from poking around the details. Nothing except her good will.

Class data

Variables declared within the class using my will remain within the class. They won't be associated with any of the objects. Methods of the class will be able to access them.

Special actions during object construction

# called after the constructor has been called
sub BUILD {
	my $self = shift;
};

Singleton in Moose

MooseX::Singleton;

A game - the main class or the board is a singleton

Configuration

Database access

use strict;
use warnings;

use Test::More;
plan tests => 2;

use Games::Spacefight;

my $game = Games::Spacefight->new;
isa_ok($game, 'Games::Spacefight');

my $game2 = Games::Spacefight->new;
is $game2, $game, 'Singleton';


package Games::Spacefight;
use Moose;
use MooseX::Singleton;

1;

Destructor in Moose

Usually you don't need to implement a destructor in Perl. but in case you do DESTROY is the name in standard perl and DEMOLISH in Moose.

Object Oriented Perl using Moose

Running subprocess

Possibilities

  • system
  • qx, backticks
  • open "...|" and open "|...."
  • IPC::Open2
  • IPC::Open3
  • IPC::Run
  • IPC::Run3
  • Expect

See the documentation of IPC::Run3 for a comparison

Prima

Prima Hello World

use strict;
use warnings;

use Prima qw(Application Buttons);

Prima::MainWindow->new(
        text     => 'Hello world!',
        size     => [ 200, 200],
)-> insert( Button =>
        centered => 1,
        text     => 'Hello world!',
        onClick  => sub { $::application-> close },
);

run Prima;

wxPerl - wxWidgets for Perl

wxPerl - Hello World

{% embed include file="src/examples/wxperl/hello_world.pl)

See more examples in https://github.com/PadreIDE/Padre/tree/master/share/examples/wx

Microsoft Windows

Win32 examples

use 5.010;
use Modern::Perl;
use Win32::EventLog;

my $h = Win32::EventLog->new('System', $ENV{ComputerName}) or die;
my $recs;
my $base;
$h->GetNumber($recs);
say $recs;

$h->GetOldest($base);
say $base;
use 5.010;
use Modern::Perl;
use Win32::GUI;


use 5.010;
use Modern::Perl;
use Win32::IPConfig;

#my $host = '127.0.0.1';
my $host = 'dwarf';
my $ipconfig = Win32::IPConfig->new($host) or die;

say 'hostname=', $ipconfig->get_hostname;
say 'domain=',   $ipconfig->get_domain;

use 5.010;
use Modern::Perl;

use Win32;


# Win32::MsgBox('hello');

my $resp = Win32::MsgBox('hello', 3, 'Some title');
Win32::MsgBox($resp);
use 5.010;
use Modern::Perl;
use Email::Send::SMTP::Gmail;

my $mail=Email::Send::SMTP::Gmail->new( 
      -smtp   =>'gmail.com',
      -login  =>'szabgab@gmail.com',
      -pass   =>'',
#      -debug  => 0,
);

$mail->send(-to=>'szabgab@gmail.com',
               -subject  =>'Hello! 2',
               -verbose  => 0,
               -body     =>'Just testing it',
           #    -attachments=>'full_path_to_file',
           );

$mail->bye;
say 'DONE';
use 5.010;
use Modern::Perl;
use File::Find::Rule;

my $rule = File::Find::Rule->file()
                           ->name( '*.log' )
                           ->maxdepth(1)
                           ->size( '>7K' )
                           ->start( 'C:\Windows' );
                           
while (my $file = $rule->match) {
	say $file, '  ', -s $file;
}
say 'DONE';


# list all the files ordered by date
# ordered by the number of 'Error' strings in each file


PSGI

PSGI Cookbook

Parallel processing

Types of problems

  • CPU intensive application - use more of the cores to reduce the wallclock time.
  • IO intensive applications - don't waste the CPU and wallclock time while waiting for the IO process.
  • Interactive applications - make sure they are responsive during long operations.

Types of solutions

  • Number of processes (forking on Unix or spawning)
  • Number of threads (Single threaded vs Multi-threaded)
  • Asynchronous, non-blocking or synchronous vs blocking (aka "normal") Cooperative Multitasking

Tasks

  • count.pl - count numbers CPU intensive
  • process_csv.pl - mostly CPU intensive but also reading files
  • httpget.pl - Download web pages and extract title - mostly IO intensive

Measurements on 32 core

On the 32 core CPU-Optimized server on Digital Ocean

perl count.pl 0 30 10000000
    Elapsed time 7.92
perl count.pl 30 30 10000000
    Elapsed time 0.45

perl process_csv.pl 0 30
    Elapsed time 38.01
perl process_csv.pl 30 30
    Elapsed time 3.02

perl httpget.pl wikipedia.txt 0 80
    Elapsed time 12.93
perl httpget.pl wikipedia.txt 20 80
    Elapsed time 4.06
perl httpget.pl wikipedia.txt 80 80
    Elapsed time 1.11

Measurements on 4 core

perl count.pl 0 12 40000000
    Elapsed time 13.57
perl count.pl -1 12 40000000
    Elapsed time 5.78
perl count.pl 2 12 40000000
    Elapsed time 6.85
perl count.pl 3 12 40000000
    Elapsed time 6.42
perl count.pl 12 12 40000000
    Elapsed time 5.79
perl process_csv.pl 0 6
    Elapsed time 14.91
perl process_csv.pl 2 6
    Elapsed time 10.25
perl process_csv.pl 3 6
    Elapsed time 8.30
perl process_csv.pl 6 6
    Elapsed time 8.71
perl httpget.pl wikipedia.txt 0 20
    Elapsed time 18.72
perl httpget.pl wikipedia.txt 10 20
    Elapsed time 4.17

The Hardware

lscpu
cat /proc/cpuinfo
htop

Installations

cpanm Parallel::ForkManager
cpanm LWP::Simple
cpanm HTML::TreeBuilder::XPath

cpanm Win32::Getppid

Ubuntu system perl:

libwww-perl
libparallel-forkmanager-perl
libhtml-treebuilder-xpath-perl

Fork

  • fork
use strict;
use warnings;

main();

sub main {
    my $pid = fork();
    die "Could not fork" if not defined $pid;

    if ($pid) {
        parent_process();
    } else {
        child_process();
    }

}

sub child_process {
    print "In child process\n";
    exit;
}

sub parent_process {
    print "In parent process\n";
    my $finished = wait();
}

Fork with functions details

  • fork
use strict;
use warnings;

my $historical = 42;
main();

sub main {
    print "PID before fork $$\n";
    my $pid = fork();
    die "Could not fork" if not defined $pid;

    if ($pid) {
        print "PID of child: $pid\n";
        parent_process();
    } else {
        print "PID received in child: $pid\n";
        child_process();
    }

}

sub child_process {
    #sleep 1;
    my $ppid = getppid();
    print "In child process $$ of parent $ppid\n";
    print "value in child: $historical\n";
    $historical = 23;
    #my $x = 0;
    #my $y = 3 / $x;
    #die "bad thing";
    exit 42;
}

sub parent_process {
    #sleep 1;
    print "value in parent $historical\n";
    my $ppid = getppid();
    print "In parent process $$ of parent $ppid\n";
    my $finished = wait();
    my $exit_code = $? / 256;
    print "finished: $finished\n";
    print "exit code: $exit_code\n";
    print "value in parent $historical\n";
}

Fork random and seed

  • rand
  • srand
use strict;
use warnings;

srand 42;
print rand(), "\n";
print rand(), "\n";
print rand(), "\n";
#exit;

main();

sub main {
    my $pid = fork();
    die "Could not fork" if not defined $pid;

    if ($pid) {
        parent_process();
    } else {
        child_process();
    }

}

sub child_process {
    #srand;
    print "In child process\n";
    print "child ", rand(), "\n";
    print "child ", rand(), "\n";
    print "child ", rand(), "\n";
    exit;
}

sub parent_process {
    print "In parent process\n";
    print "parent ", rand(), "\n";
    print "parent ", rand(), "\n";
    print "parent ", rand(), "\n";
    my $finished = wait();
}

Fork without functions

  • fork
use strict;
use warnings;

main();

sub main {
    my $pid = fork();
    die "Could not fork" if not defined $pid;

    if (not $pid) {
        print "In child process\n";
        exit;
    }

    print "In parent process\n";
    my $finished = wait();
}

Fork details

use strict;
use warnings;

BEGIN {
    if ($^O eq "MSWin32") {
        print "Running on Windows\n";
        require Win32::Getppid;
        import Win32::Getppid qw(getppid);
    }
}

main();

sub main {
    my $shared = 42;

    my $pid = fork();
    die "Could not fork" if not defined $pid;
    if (not $pid) {
        sleep 1;
        my $parent = getppid();
        print "In Child:  PID: $$ from parnet: $parent\n";;
        print "Shared value: $shared\n";
        $shared = $$;
        exit;
    }
    sleep 1;
    print "In Parent: PID: $$ Child: $pid\n";
    my $finished = wait();
    my $exit_code = $? / 256;
    print "Child finised: $finished Exit code: $exit_code\n";
    print "Shared: $shared\n";
}

Fork many

use strict;
use warnings;

BEGIN {
    if ($^O eq "MSWin32") {
        print "Running on Windows\n";
        require Win32::Getppid;
        import Win32::Getppid qw(getppid);
    }
}

main();


sub main {
    my ($forks) = @ARGV;
    die "Usage: $0 FORKS\n" if not $forks;

    my $shared = 42;

    for my $id (1 .. $forks) {
        my $pid = fork();
        die "Could not fork" if not defined $pid;
        if (not $pid) {
            sleep 1;
            my $parent = getppid();
            print "Fork number $id, In Child:  PID: $$ from parnet: $parent\n";;
            print "Shared value: $shared\n";
            $shared = $$;
            exit;
        }
        print "In Parent: PID: $$ Child created: $pid\n";
    }
    sleep 1;
    print "In Parent: PID: $$\n";
    print "Shared value: $shared\n";
    for (1 .. $forks) {
        my $finished = wait();
        my $exit_code = $? / 256;
        print "Child finised: $finished with exit code $exit_code\n";
    }
}

Parent process ID

use strict;
use warnings;

BEGIN {
    if ($^O eq "MSWin32") {
        print "Running on Windows\n";
        require Win32::Getppid;
        import Win32::Getppid qw(getppid);
    }
}

main();

sub main {
    my $parent = getppid();
    print "PID: $$ from parnet: $parent\n";
}

pstree

pstree -p -a

htop

htop
  • H
  • F4 to filter

Active non-blocking waiting with waitpid

  • waitpid
  • POSIX
  • WNOHANG

Up till now we were usig the wait function to wait for a child process to terminate. It is a blocking call that will wait till any of the child processes terminates.

There are other options as well. Using the waitpid function you could wait for a specific child to terminate using its PID or you can have a non-blocking way to check if there is any child-processes that has already terminated. The non-blocking wait mode allows the parent process to do other things while waiting for the child processes to do their job.

use strict;
use warnings;
use Time::HiRes qw(sleep);

use POSIX ':sys_wait_h';

main();

sub main {
    my ($sleep, $exit) = @ARGV;

    die "Usage: $0 SLEEP EXIT\n" if not defined $exit;

    my $pid = fork();
    die 'Failed to fork' if not defined $pid;

    if ($pid == 0) {
        print "Child process PID $$\n";
        sleep $sleep;
        exit $exit;
    }

    while (1) {
        my $pid = waitpid(-1, WNOHANG);
        if ($pid > 0) {
            my $exit_code = $?/256;
            print "Child process $pid finished with exit code $exit_code\n";
            last;
        }
        print "Parent could do something else, but now sleeping\n";
        sleep 0.1;
    }
}
perl active_waiting.pl 0 0
perl active_waiting.pl 1 0
perl active_waiting.pl 1 1

Non-blocking waiting with waitpid - multiple forks

In this example we create multiple child processes and wait for them with a non-blocking waitpid. Each process will sleep for a random number of seconds imitating the randomness of the time it takes each one of them to finish doing their job. They also generate a random exit code to imitate that some of them might have failed.

use strict;
use warnings;
use Time::HiRes qw(sleep);
use POSIX ':sys_wait_h';

main();

sub main {
    my ($workers) = @ARGV;

    die "Usage: $0 WORKERS\n" if not defined $workers;
    $| = 1; # disable output buffering on STDOUT

    my %process;
    for my $worker_id (1 .. $workers) {
        my $pid = fork();
        die 'Failed to fork' if not defined $pid;

        if ($pid == 0) {
            my $exit_code = int rand(5);
            my $sleep_time = rand(5);
            print "Child process worker ID: $worker_id  PID $$ will sleep for $sleep_time and then exit with code $exit_code\n";
            sleep $sleep_time;
            exit $exit_code;
        }
        $process{$pid} = $worker_id;
        next;
    }

    while (1) {
        my $pid = waitpid(-1, WNOHANG);
        if ($pid > 0) {
            my $exit_code = $?/256;
            my $worker_id = delete $process{$pid};
            print "Child process $pid worker id $worker_id finished with exit code $exit_code\n";
            next;
        }
        print '.';
        sleep 0.1;
        last if not %process;
    }
}
perl active_waiting_loop.pl 1
perl active_waiting_loop.pl 5

Non-blocking waiting, rerun on failure

In this example we have a list of tasks we need to do. The user can supply the number of child processes that will deal with the tasks. Each child process generates a random number to wait to imitatet the work time and a random number as the exit code.

The parent monitors the child processes. If one of them exits with a non-zero error code the parent re-runs that job with another child process until all the tasks are done.

use strict;
use warnings;
use Time::HiRes qw(sleep);
use POSIX ':sys_wait_h';

main();

sub main {
    my ($workers) = @ARGV;
    die "Usage: $0 WORKERS\n" if not defined $workers;
    #my @tasks = 'a' .. 'z';
    my @tasks = 'a' .. 'd';

    my %process;
    while (1) {
        if (@tasks and scalar(keys %process) < $workers) {
            my $task = shift @tasks;

            my $pid = fork();
            die 'Failed to fork' if not defined $pid;

            if ($pid == 0) {
                my $exit_code = int rand(3);
                my $sleep_time = rand(5);
                print "Child process Task: $task PID $$ will sleep for $sleep_time and then exit with code $exit_code\n";
                sleep $sleep_time;
                exit $exit_code;
            }
            $process{$pid} = $task;
            next;
        }

        my $pid = waitpid(-1, WNOHANG);
        if ($pid > 0) {
            my $exit_code = $?/256;
            my $task = delete $process{$pid};
            print "Child process $pid task $task finished with exit code $exit_code\n";
            if ($exit_code > 0) {
                unshift @tasks, $task;
            }
            next;
        }
        sleep 0.1;
        last if not %process;
    }
}
Child process Task: a PID 128387 will sleep for 3.71995377981634 and then exit with code 2
Child process Task: b PID 128388 will sleep for 0.137503658640838 and then exit with code 0
Child process Task: c PID 128389 will sleep for 3.57264931009681 and then exit with code 1
Child process 128388 task b finished with exit code 0
Child process Task: d PID 128390 will sleep for 0.940422063447244 and then exit with code 0
Child process 128390 task d finished with exit code 0
Child process 128387 task a finished with exit code 2
Child process 128389 task c finished with exit code 1
Child process Task: a PID 128391 will sleep for 3.09917882712156 and then exit with code 2
Child process Task: c PID 128392 will sleep for 1.52811677938857 and then exit with code 2
Child process 128392 task c finished with exit code 2
Child process Task: c PID 128393 will sleep for 1.21268558593997 and then exit with code 0
Child process 128393 task c finished with exit code 0
Child process 128391 task a finished with exit code 2
Child process Task: a PID 128394 will sleep for 2.05004244389542 and then exit with code 2
Child process 128394 task a finished with exit code 2
Child process Task: a PID 128395 will sleep for 4.95541832222202 and then exit with code 0
Child process 128395 task a finished with exit code 0

Functions to be speed up

package Task;
use strict;
use warnings;
use Time::HiRes qw(time);
use LWP::Simple qw(get);
use HTML::TreeBuilder::XPath;

sub count {
    my ($max) = @_;

    my $counter = 0;
    while ($counter < $max) {
        $counter++;
    }
}

sub process_file {
    my ($file) = @_;

    my $total = 0;
    open my $fh, '<', $file or die;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /,/, $line;
        $total += $fields[2];
    }

    return $total;
}

sub get_title {
    my ($url) = @_;

    my $content = get $url;
    my $tree= HTML::TreeBuilder::XPath->new_from_content($content);
    my $nb = $tree->findvalue( '/html/head/title' );

    return $nb;
}


use lib '.';
use Transformers qw(show_elapsed_time);
show_elapsed_time('count', 'process_file', 'get_title');

1;

Counter process

use strict;
use warnings;
use Time::HiRes qw(time);
use lib '.';
use Task;
use ForkedCounter;

main();


sub main {
    my ($fork_count, $task_count, $max) = @ARGV;
    die "Usage: $0 FORK_COUNT HOW_MANY_TIMES TILL_WHICH_NUMBER\n" if not $max;
    my $start = time;

    if ($fork_count == 0) {
        for my $i (1..$task_count) {
            Task::count($max);
        }
    } elsif ($fork_count == -1) {
        ForkedCounter::counter_one_by_one($task_count, $max);
    } else {
        ForkedCounter::counter($fork_count, $task_count, $max);
    }


    my $end = time;
    my $elapsed = $end-$start;
    printf "Elapsed time %.2f\n", $elapsed;
}


perl count.pl 0 12 40000000

Forked counter process

package ForkedCounter;
use strict;
use warnings;
use Task;


sub counter_one_by_one {
    my ($task_count, $max) = @_;

    for my $fork_id (1 .. $task_count) {
        my $pid = fork();
        die "Could not fork" if not defined $pid;
        if (not $pid) {
            Task::count($max);
            exit;
        }
    }
    for my $fork_id (1 .. $task_count) {
        wait();
    }
}


sub counter {
    my ($fork_count, $task_count, $max) = @_;

    my @tasks;
    push @tasks, 1+int($task_count/$fork_count) for 1 .. ($task_count % $fork_count);
    push @tasks, int($task_count/$fork_count) for 1 .. ($fork_count - ($task_count % $fork_count));
    print "@tasks\n";
    for my $fork_id (1 .. $fork_count) {
        my $pid = fork();
        die "Could not fork" if not defined $pid;
        if (not $pid) {
            #print "PID $pid $tasks[$fork_id-1]\n";
            for my $i (1..$tasks[$fork_id-1]) {
                Task::count($max);
            }
            exit;
        }
    }
    for my $fork_id (1 .. $fork_count) {
        wait();
    }
}

1;

Prepare CSV files

Generate CSV data files to be used by the CSV processing task. Each file has rows with the 3rd column being increasing numbers. The number in the last row of each file contains the seriel number of the file. That will make the sum of the numbers different and we'll be able to verify if the results come from the different files.

use strict;
use warnings;

my ($n, $size) = @ARGV;
die "Usage: $0 NUMBER_OF_FILES  SIZE_OF_FILES\n" if not defined $size;

my $content = "";
my $cnt = 0;
while ($cnt < $size) {
    $cnt++;
    $content .= "a,b,$cnt,c,d\n";
}

for my $ix (1..$n) {
    my $filename = sprintf "data_%02s.csv", $ix;
    #print "$filename\n";
    open my $fh, '>', $filename or die "Could not open $filename\n";
        print $fh $content;
        print $fh "a,b,$ix,c,d\n";
}
perl prepare_files.pl 12 2000000

Process CSV files

We expect two parameters on the command line. The number of parallel processes to run and how many files to process. For parallel 0 means not to use the forking mechanizm at all. We use the number of files instead of accepting the list of files on the command line, becasue it is easier to select a subset of the files this way.

use strict;
use warnings;
use Time::HiRes qw(time);
use lib '.';
use Task;
use ForkedProcessCSV;

main();

sub main {
    my ($parallels, $file_count) = @ARGV;
    die "Usage $0 PARALLELS FILE_COUNT\n"  if not defined $file_count;

    my %results;
    my @files = sort glob "data_*.csv";
    die "Not enough files\n" if $file_count > @files;
    @files = @files[0 .. $file_count-1];
    #print "@files";

    my $start = time;
    if ($parallels == 0) {
        for my $file (@files) {
            my $total = Task::process_file($file);
            $results{$file} = $total;
        }
    } else {
        %results = ForkedProcessCSV::process_csv($parallels, @files);
    }

    for my $file (@files) {
        print "$file $results{$file}\n";
    }

    my $end = time;
    my $elapsed = $end-$start;
    printf "Elapsed time %.2f\n", $elapsed;
}
$ perl process_csv.pl 0 1
Elapsed time 1.51

$ perl process_csv.pl 0 4
Elapsed time 5.92

$ perl process_csv.pl 2 4
Elapsed time 4.02

$ perl process_csv.pl 4 4
Elapsed time 4.01

$ perl process_csv.pl 0 10
Elapsed time 15.18

$ perl process_csv.pl 4 10
Elapsed time 9.05

Use Parallel::ForkManager

use strict;
use warnings;
use Parallel::ForkManager;
use Data::Dumper qw(Dumper);

main();

sub main {
    my ($parallels) = @ARGV;
    die "Usage: $0 PARALLELS\n" if not defined $parallels;

    my $shared = 42;

    my $pm = Parallel::ForkManager->new($parallels);
    foreach my $input (2, 3, 5, 11) {
        my $pid = $pm->start and next;
        print "PID $$ input: $input shared: $shared\n";
        $shared = $$;
        $pm->finish();
    }
    $pm->wait_all_children;

    print "Shared: $shared\n";
}

Return values using Parallel::ForkManager

  • fork
use strict;
use warnings;
use Parallel::ForkManager;
use Data::Dumper qw(Dumper);

main();

sub main {
    my ($parallels) = @ARGV;
    die "Usage: $0 PARALLELS\n" if not defined $parallels;

    my %results;

    my $pm = Parallel::ForkManager->new($parallels);
    $pm->run_on_finish( sub {
        my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;
        my $input = $data_structure_reference->{input};
        $results{$input} = $data_structure_reference->{result};
        #print "Finished PID $pid and exit code: $exit_code\n";
    });
    foreach my $input (2, 3, 5, 11) {
        my $pid = $pm->start and next;
        print "PID $$\n";
        my $result = $input * 2;
        $pm->finish(0, {input => $input, result => $result});
    }
    $pm->wait_all_children;

    print Dumper \%results;
}

Forked process CSV files

package ForkedProcessCSV;
use strict;
use warnings;
use Parallel::ForkManager;
use lib '.';
use Task;

sub process_csv {
    my ($parallels, @files) = @_;

    my %results;
    my $pm = Parallel::ForkManager->new($parallels);
    $pm->run_on_finish( sub {
        my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;
        my $filename = $data_structure_reference->{filename};
        $results{$filename} = $data_structure_reference->{total};
        #print "Finished PID $pid and exit code: $exit_code\n";
    });
    foreach my $file (@files) {
        my $pid = $pm->start and next;
        print "PID $$\n";
        my $total = Task::process_file($file);
        $pm->finish(0, {filename => $file, total => $total});
    }
    $pm->wait_all_children;

    return %results;
}

1;



HTTP GET

use strict;
use warnings;
use Time::HiRes qw(time);
use lib '.';
use Task;
use ForkedHTTP;

binmode(STDOUT, ":utf8");

main();


sub main {
    my ($filename, $parallels, $limit) = @ARGV;
    die "Usage: $0 FILENAME PARALLEL LIMIT\n" if not defined $limit;

    open my $fh, '<', $filename or die;
    my @urls = <$fh>;
    chomp @urls;
    if ($limit and $limit < scalar @urls) {
        @urls = @urls[0..$limit-1];
    }
    #print scalar @urls;

    my %results;
    my $start = time;
    if ($parallels == 0) {
        for my $url (@urls) {
            $results{$url} = Task::get_title($url);
        }
    } else {
        %results = ForkedHTTP::get_titles($parallels, @urls);
    }

    for my $url (@urls) {
        print "$url  $results{$url}\n";
    }

    my $end = time;
    my $elapsed = $end-$start;
    printf "Elapsed time %.2f\n", $elapsed;
}

Forked HTTP requests

package ForkedHTTP;
use strict;
use warnings;
use Parallel::ForkManager;
use lib '.';
use Task;

sub get_titles {
    my ($parallels, @urls) = @_;

    my %results;

    my $pm = Parallel::ForkManager->new($parallels);
    $pm->run_on_finish( sub {
        my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;
        my $url = $data_structure_reference->{url};
        $results{$url} = $data_structure_reference->{title};
    });
    foreach my $url (@urls) {
        my $pid = $pm->start and next;
        print "PID $$\n";
        my $title = Task::get_title($url);
        $pm->finish(0, {url => $url, title => $title});
    }
    $pm->wait_all_children;

    return %results;
}

1;

Exercise: Process Excel files

  • Create N excel files, similar to the CSV files we had in the example and then process them. Both in a serial manner and in parallel.

Exercise: Web crawler

  • Build a crawler that, given a single URL and a number N will visit N pages linked from that page. (Maybe need to also get to the links of the links etc.)
  • You can use HTML::TreeBuilder::XPath to extract the links from the html document.
  • It is probably better to allow only the main process to fork.

MCE - Many-Core Engine

  • MCE

  • fork

  • MCE

  • On Debian/Ubuntu it is called libmce-perl

use strict;
use warnings;

use MCE;

main();


sub main {
    my ($workers) = @ARGV;
    die "Usage: $0 WORKERS\n" if not defined $workers;

    my $mce = MCE->new(
        max_workers => $workers,
        user_func => sub {
            my ($mce) = @_;
            $mce->say("Hello from PID $$ WID " . $mce->wid);
        }
     );

    $mce->run;
}


$ perl use_mce.pl 5
Hello from PID 210834 WID 2
Hello from PID 210833 WID 1
Hello from PID 210835 WID 3
Hello from PID 210836 WID 4
Hello from PID 210837 WID 5

MCE - map running in parallel

The MCE package provides a map-like function that automatically runs the different tasks in separate processes then collects the results in the correct order.

By default it creates 4 child processes, but you can control that and a few other things by calling the init method.

use strict;
use warnings;

use MCE::Map;

main();

sub main {
    print "main PID: $$\n";
    my @results = mce_map { work($_) } 1..10;
    print "Results: @results\n";
}

sub work {
    my ($param) = @_;
    print "Param $param PID: $$\n";
    return $param * $param;
}
main PID: 150164
Param 1 PID: 150168
Param 2 PID: 150168
Param 7 PID: 150167
Param 8 PID: 150167
Param 3 PID: 150165
Param 5 PID: 150166
Param 4 PID: 150165
Param 6 PID: 150166
Param 9 PID: 150168
Param 10 PID: 150168
Results: 1 4 9 16 25 36 49 64 81 100

MCE - map with init

use strict;
use warnings;

use MCE::Map;

main();

sub main {
    MCE::Map->init(
        max_workers => 3, # defaults to 4
        chunk_size => 2, # defaults to 1
    );
    print "main PID: $$\n";
    my @results = mce_map { work($_) } 1..10;
    print "Results: @results\n";
}

sub work {
    my ($param) = @_;
    print "Param $param PID: $$\n";
    #exit 23 if $param == 2;
    return $param * $param;
}
main PID: 210966
Param 3 PID: 210969
Param 1 PID: 210967
Param 5 PID: 210968
Param 4 PID: 210969
Param 2 PID: 210967
Param 6 PID: 210968
Param 7 PID: 210968
Param 9 PID: 210969
Param 8 PID: 210968
Param 10 PID: 210969
Results: 1 4 9 16 25 36 49 64 81 100

other modules

Database access using Perl DBI

Architecture of a DBI Application

  • dbi
  • dbd

             |<- Scope of DBI ->|
                  .-.   .--------------.   .-------------.
  .-------.       | |---| XYZ Driver   |---| XYZ Engine  |
  | Perl  |       | |   `--------------'   `-------------'
  | script|  |A|  |D|   .--------------.   .-------------.
  | using |--|P|--|B|---|Oracle Driver |---|Oracle Engine|
  | DBI   |  |I|  |I|   `--------------'   `-------------'
  | API   |       | |...
  |methods|       | |... Other drivers
  `-------'       | |...
                  `-'


Taken from the DBI documentation.
Any number of databases at the same time.
Probably every RDBMS can be accessed with its own DBD.

While in many areas Perl has plenty of ways to achieve things in the area of database access a single package has evolved as the de-facto standard way of the low level access. The DBI - Database Independent Interface of Perl - is the module that most of the perl applications in the world use as their core.

Between this module and between virtually and relational database we have a Database Driver - a module in the DBD::* namespace. For example there are modules such as DBD::Oracle and DBD::MySQL or DBD::Pg for PostgreSQL.

Above DBI there are hundreds of higher level abstractions. Most of them live in the DBIx::* namespace on CPAN.

Create Sample Database

  • sqlite

For our examples we are going to use an SQLite database. It is simple to install as it comes within DBD::SQLite, its own database driver and it provides everything we need for our examples.

Actually SQLite is a very good database useful in many applications that don't require concurrent write access to the database frequently.

In order to create the sample database run examples/dbi/create_sample.pl

{% embed include file="src/examples/dbi/sample.sql)

Connect to database

  • connect
#!/usr/bin/perl
use strict;
use warnings;

use DBI;

my $dbfile = "sample.db";

my $dsn      = "dbi:SQLite:dbname=$dbfile";
my $user     = "";
my $password = "";
my $dbh = DBI->connect($dsn, $user, $password);

$dbh->disconnect;

Connecting to other databases:

my $dsn = "DBI:mysql:database=$database;host=$hostname;port=$port";
my $dbh = DBI->connect($dsn, $user, $password);

SELECT with one result

  • SELECT
#!/usr/bin/perl
use strict;
use warnings;

use DBI;

my $dbfile = "sample.db";

my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile");


my $sth = $dbh->prepare('SELECT COUNT(*) FROM people');
$sth->execute;

my ($count) = $sth->fetchrow_array();

print "There are $count number of rows.\n";

SELECT with more results

#!/usr/bin/perl
use strict;
use warnings;

use DBI;

my $dbfile = "sample.db";

my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile");


my $sth = $dbh->prepare('SELECT fname, lname FROM people');
$sth->execute;

while (my @row = $sth->fetchrow_array()) {
    print "$row[0] $row[1]\n";
}

SELECT, prepare with placeholders

  • placeholder
#!/usr/bin/perl
use strict;
use warnings;

use DBI;

my $dbfile = "sample.db";

my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile");

my $id = shift;
$id = 1 if not defined $id;

my $sth = $dbh->prepare('SELECT fname, lname FROM people WHERE id <= ?');
$sth->execute($id);

while (my @row = $sth->fetchrow_array()) {
    print "$row[0] $row[1]\n";
}

SELECT, using hashref

  • NAME_lc
#!/usr/bin/perl
use strict;
use warnings;

use DBI;

my $dbfile = "sample.db";

my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile");

my $id = shift;
$id = 1 if not defined $id;

my $sth = $dbh->prepare('SELECT fname, lname FROM people WHERE id <= ?');
$sth->execute($id);

while (my $h = $sth->fetchrow_hashref('NAME_lc')) {
    print "$h->{fname} $h->{lname}\n";
}

INSERT

  • INSERT
#!/usr/bin/perl
use strict;
use warnings;

use DBI;

my $dbfile = "sample.db";

my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile");

my ($fname, $lname, $email, $pw) = qw(Moose Foobar moose@foobar.com really?);


$dbh->do('INSERT INTO people (fname, lname, email, pw) VALUES (?, ?, ?, ?)',
            undef, 
            $fname, $lname, $email, $pw);

We might need to insert some data.

UPDATE

  • UPDATE
#!/usr/bin/perl 
use strict;
use warnings;

use DBI;

my $dbfile = "sample.db";

my $dsn      = "dbi:SQLite:dbname=$dbfile";
my $dbh = DBI->connect($dsn);

show_table("before");

my ($email, $id) = ('new@address.com', 1);

$dbh->do("UPDATE people SET email = ? WHERE id = ?", undef, $email, $id);

show_table("after");


sub show_table {
    print "----- $_[0]\n";
    my $sth = $dbh->prepare("SELECT id, email FROM people");
    $sth->execute();
    while (my $h = $sth->fetchrow_hashref('NAME_lc')) {
        print "$h->{id}   $h->{email}\n";
    }
}

----- before
1   foo@bar.com
2   peti@bar.com

----- after
1   new@address.com
2   peti@bar.com

We might need to update some data.

The same can be also done using a prepared a statement. It is better to use prepared statements if you would like to execute the same SQL query several times with different values.

Simple Database access using Perl DBI and SQL

Advanced Database access using Perl DBI

Data integrity

When you are working in a place where data integrity is important you have to use transactions when executing multiple queries that should either all succeed or all fail.

#!/usr/bin/perl 
use strict;
use warnings;

system "$^X examples/dbi/create_sample.pl";

use DBI;

my $dbfile = "sample.db";

my $dsn = "dbi:SQLite:dbname=$dbfile";
my $dbh = DBI->connect($dsn);

system "$^X examples/dbi/show_accounts.pl before";

debit(1, 100);

system "$^X examples/dbi/show_accounts.pl middle";
#exit;   # process killed

credit(2, 100);

system "$^X examples/dbi/show_accounts.pl account";

sub debit {
    credit($_[0], -1 * $_[1]);
}
sub credit {
    my ($id, $amount) = @_;

    my $sth = $dbh->prepare("SELECT amount FROM accounts WHERE id = ?");
    $sth->execute($id);
    my ($current) = $sth->fetchrow_array();
    $sth->finish;
    $dbh->do("UPDATE accounts SET amount = ? WHERE id = ?",
        undef, $current + $amount, $id);
}

Try to see what happens when you enable the exit() function

#!/usr/bin/perl 
use strict;
use warnings;

use DBI;

my $dbfile = "sample.db";

my $dsn      = "dbi:SQLite:dbname=$dbfile";
my $dbh = DBI->connect($dsn);


if ($ARGV[0]) {
    print "----- $ARGV[0]\n";
}

my $sth = $dbh->prepare("SELECT id, amount FROM accounts");
$sth->execute();
while (my $h = $sth->fetchrow_hashref('NAME_lc')) {
    print "$h->{id}   $h->{amount}\n";
}


Transactions

  • transactions
  • begin_work
  • commit
  • rollback
$dbh->begin_work
$dbh->commit
$dbh->rollback
#!/usr/bin/perl
use strict;
use warnings;

system "$^X examples/dbi/create_sample.pl";

use DBI;

my $dbfile = "sample.db";

my $dsn = "dbi:SQLite:dbname=$dbfile";
my $dbh = DBI->connect($dsn);

system "$^X examples/dbi/show_accounts.pl before";

$dbh->begin_work;
debit(1, 100);

system "$^X examples/dbi/show_accounts.pl middle";
exit;   # process killed

credit(2, 100);
$dbh->commit;


system "$^X examples/dbi/show_accounts.pl account";

sub debit {
    credit($_[0], -1 * $_[1]);
}
sub credit {
    my ($id, $amount) = @_;

    my $sth = $dbh->prepare("SELECT amount FROM accounts WHERE id = ?");
    $sth->execute($id);
    my ($current) = $sth->fetchrow_array();
    $sth->finish;
    $dbh->do("UPDATE accounts SET amount = ? WHERE id = ?",
        undef, $current + $amount, $id);
}

Sample database creation

#!/usr/bin/perl
use strict;
use warnings;

use FindBin qw($Bin);
use DBI;

my $verbose = shift;

my $dbfile = "sample.db";

unlink $dbfile;

print "Creating sample SQLite database at $dbfile\n" if $verbose;
my $dbh = DBI->connect("dbi:SQLite:dbname=sample.db");

my $schema;
{
    open my $fh, '<', "$Bin/sample.sql" or die;
    local $/ = undef;
    $schema = <$fh>;
}
foreach my $sql (split /;/, $schema) {
    next if $sql !~ /\S/; # skip empty entries
    $dbh->do($sql);
}
$dbh->disconnect;


Disconnect from the database

  • disconnect
$dbh->disconnect

Attributes

  • attributes|dbi
  • AutoCommit
  • PrintError
  • RaiseError
  • NAME_lc
  • FetchHashKeyName
  • TraceLevel

In addition to the connection string, the username and the password one can and should configure the database with the DBI specific configuration options (or attributes). These options all go in a HASH reference in the 4th parameter of the DBI connect method.

As the databases and the DBD - Database drivers might all decide arbitrarily on their mode regarding transactions one should always explicitly declare how she wants to operate.

The AutoCommit option.

my %attributes = (
    PrintError  => 1,
    RaiseError  => 1,
    AutoCommit  => 1,
    FetchHashKeyName => 'NAME_lc',  #   NAME_uc
    TraceLevel  => 1,  # see Debugging and Trace levels later
);

my $dsn = "dbi:SQLite:dbname=$dbfile";
my $dbh = DBI->connect($dsn, $username, $password, \%attributes);

Error handling

  • Set the attributes PrintError and RaiseError
  • Check for returned undef (or empty lists)
  • Check $h->err and $h->errstr (aka. $DBI::err and $DBI::errstr)
  • err - Native DB engine error code
  • errstr - Native DB engine error string
$sth = $dbh->prepare($statement)   or die $dbh->errstr;
$rv  = $sth->execute(@bind_values) or die $sth->errstr;

fetchrow_array (and others) return undef when no more row or if they encounter an error. Use RaiseError or check $sth->err

Debugging (Trace levels)

$dbh->trace();
$sth->trace();
DBI->trace();

TraceLevel attribute

Can be used like this:

DBI->trace(1);
DBI->trace(1, '/tmp/dbitrace.log');

The trace level can be 0 (off) .. 15 (usually 1-4 is more than enough)

In CGI scripts add the following:
BEGIN { $ENV{DBI_TRACE}='1=/tmp/dbitrace.log'; }

Calculating bank balance, take two: DBD::CSV

Database access using DBIx::Class

DBIx::Class

Use SQL Database without writing SQL

Create Sample Database

For our examples we are going to use an SQLite database. It is simple to install as it comes within DBD::SQLite, its own database driver and it provide everything we need for our examples.

Actually SQLite is a very good database useful in many applications that don't require concurrent write access to the database frequently.

In order to create the sample database run examples/dbi/create_sample.pl

package My::DB;
use strict;
use warnings;

our $VERSION = '0.01';

use base qw/DBIx::Class::Schema/;

__PACKAGE__->load_classes();

1;


Internet Explorer

Win32::IE::Mechanize

  • Written by Abe Timmerman
  • Currently at version 0.003
  • It is like WWW::Mechanize but using IE as the user agent
  • Connecting to IE using Win32::OLE automation
  • Win32::IE::Link is just like WWW::Mechanize::Link
  • Win32::IE::Form is just like HTML::Form

Accessing Google

use strict;
use warnings;

use Win32::IE::Mechanize;
my $w = Win32::IE::Mechanize->new(visible => 1);
$w->get("http://www.google.com/ncr");
$w->follow_link(text_regex => qr/advanced/i);

=pod
foreach my $f ($w->forms()) {
    printf("Name: %s\n", $f->name);
    foreach my $in ($f->inputs) {
        printf("   Input: %s %s %s\n",
            $in->name, $in->type, $in->value);
    }
}
=cut

$w->form_name("f");
$w->select("num", 20);
$w->submit_form(
#    form_name => 'f',
    fields => {
        as_q => 'perl training',
    },
);

# But specifically with Google, you should use their API and not their web site

Examples

We have already seen examples using WWW::Mechanize

They all work here too.

X-Windows

X11

  • X11
  • xcalc

X11 X-Windows is the mostly used windowing system for unix. There are several window managers running on X11 which makes the capturing part difficult. On unix machines we have a nice graphical calculator called xcalc.

X11 Tools

X11::GUITest

  • X11::GUITest

  • Available since 2003 by Dennis K. Paulsen.

  • X11::GUITest

LDTP - Linux Desktop Testing Project

Manual steps

Launch the application manually and look at its title

xcalc

X11: Launch the application

  • StartApp
  • WaitWindowViewable
  • GetInputFocus
#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}


perl examples/X/xcalc01.pl

As you can see we only open the xcalc window and do not close it yet. So now if you run this for a second time (without manually closing the first xcalc window) you will see the error message and you'll see that the Main windows id is the same as previously as the first windows was found first but the Focus is on another window, the one we have just opened.

Let's see the coordinates

  • GetWindowPos
my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";
#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}

my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";



perl examples/X/xcalc02.pl

Nice, but where exactly are these coordinates ? Let's put the mouse to the top left corner.

Where is this top left corner ?

  • MoveMouseAbs
MoveMouseAbs($x, $y);
#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}

my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";
MoveMouseAbs($x, $y);



perl examples/X/xcalc03.pl

And let's see the rest of the coordinates

MoveMouseAbs($x, $y);
sleep(2);
MoveMouseAbs($x+$width, $y);
sleep(2);
MoveMouseAbs($x+$width, $y+$height);
sleep(2);
MoveMouseAbs($x, $y+$height);
#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}

my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";
MoveMouseAbs($x, $y);
sleep(2);
MoveMouseAbs($x+$width, $y);
sleep(2);
MoveMouseAbs($x+$width, $y+$height);
sleep(2);
MoveMouseAbs($x, $y+$height);

perl examples/X/xcalc04.pl

The sleep() commands were added only so that we can see the mouse moving around.

Smooth on the edges

MoveMouseAbs($x, $y);
foreach my $w (0..$width) {
   MoveMouseAbs($x+$w, $y);
}
foreach my $h (0..$height) {
   MoveMouseAbs($x+$width, $y+$h);
}
foreach my $w (0..$width) {
   MoveMouseAbs($x+$width-$w, $y+$height);
}
foreach my $h (0..$height) {
   MoveMouseAbs($x, $y+$height-$h);
}
perl examples/X/xcalc05.pl
#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}

my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";
MoveMouseAbs($x, $y);
foreach my $w (0..$width) {
   MoveMouseAbs($x+$w, $y);
}
foreach my $h (0..$height) {
   MoveMouseAbs($x+$width, $y+$h);
}
foreach my $w (0..$width) {
   MoveMouseAbs($x+$width-$w, $y+$height);
}
foreach my $h (0..$height) {
   MoveMouseAbs($x, $y+$height-$h);
}

Closing the window

  • ClickMouseButton
  • SendKeys
my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";

MoveMouseAbs($x+2, $y-2);
ClickMouseButton M_LEFT;
SendKeys('c');

We could close the window by typing Alt-F4 but it does not work on my notebook as I have set Alt-F4 to be something else.

So we can either click on the X in the top-right corner, or click on the X in the top left corner, and then press C. We chose the latter.

The above code will probably work on most of the possible screen resolutions but generally moving a fixed number of pixels might not be the best solution.

perl examples/X/xcalc06.pl

#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}

my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";


MoveMouseAbs($x+2, $y-2);
sleep(2);
ClickMouseButton M_LEFT;
sleep(2);
SendKeys('c');

Placing the cursor on one of the buttons and clicking on it

my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";

my $button_width = $width/5;
my $button_height = $height*0.8/8;
MoveMouseAbs($x + 2.5 * $button_width, $y+$height * 0.2 + 5.5 * $button_height);

ClickMouseButton M_LEFT;

Observing the xcalc application we see that it has 5 buttons in a row and 8 in a column. At the top of the application there is another window which is about 20% of the total height, full width.

So first we calculate the size of a button and then the center of button "5".

perl examples/X/xcalc07.pl

#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}

my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";

my $button_width = $width/5;
my $button_height = $height*0.8/8;
MoveMouseAbs($x + 2.5 * $button_width, $y+$height * 0.2 + 5.5 * $button_height);
ClickMouseButton M_LEFT;

sleep(2);


MoveMouseAbs($x+2, $y-2);
ClickMouseButton M_LEFT;
SendKeys('c');

So let's do all the calculations

my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";

my $button_width = $width/5;
my $button_height = $height*0.8/8;

MoveMouseAbs($x + 2.5 * $button_width, $y+$height * 0.2 + 5.5 * $button_height);
ClickMouseButton M_LEFT;
sleep(2);

MoveMouseAbs($x + 4.5 * $button_width, $y+$height * 0.2 + 4.5 * $button_height);
ClickMouseButton M_LEFT;
sleep(2);

MoveMouseAbs($x + 1.5 * $button_width, $y+$height * 0.2 + 4.5 * $button_height);
ClickMouseButton M_LEFT;
sleep(2);

MoveMouseAbs($x + 4.5 * $button_width, $y+$height * 0.2 + 7.5 * $button_height);
ClickMouseButton M_LEFT;
sleep(2);
perl examples/X/xcalc08.pl

That would be 5 * 7 =

#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}

my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";

my $button_width = $width/5;
my $button_height = $height*0.8/8;
MoveMouseAbs($x + 2.5 * $button_width, $y+$height * 0.2 + 5.5 * $button_height);
ClickMouseButton M_LEFT;
sleep(2);
MoveMouseAbs($x + 4.5 * $button_width, $y+$height * 0.2 + 4.5 * $button_height);
ClickMouseButton M_LEFT;
sleep(2);
MoveMouseAbs($x + 1.5 * $button_width, $y+$height * 0.2 + 4.5 * $button_height);
ClickMouseButton M_LEFT;
sleep(2);
MoveMouseAbs($x + 4.5 * $button_width, $y+$height * 0.2 + 7.5 * $button_height);
ClickMouseButton M_LEFT;
sleep(2);


MoveMouseAbs($x+2, $y-2);
ClickMouseButton M_LEFT;
SendKeys('c');

Fetch the list of Child Windows

  • GetChildWindows
  • GetWindowName
my @children = GetChildWindows $Main;

foreach my $id (@children) {
  my $name = GetWindowName($id) || '';
  print "Child: '$id'  '$name'\n";
}
perl examples/X/xcalc11.pl

Non of them has a title, this seem to be like a dead-end, but wait a second, maybe we can find the location of the child windows.

#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}


my @children = GetChildWindows $Main;

foreach my $id (@children) {
  my $name = GetWindowName($id) || '';
  print "Child: '$id'  '$name'\n";
}





{
   my ($x, $y, $width, $height) = GetWindowPos($Main);
   MoveMouseAbs($x+2, $y-2);
   ClickMouseButton M_LEFT;
   SendKeys('c');
}

Locate the Child Windows

printf "Main      %6s %6s %6s %6s\n", GetWindowPos($Main);

my @children = GetChildWindows $Main;

foreach my $id (@children) {
  printf "%6s  %6s %6s %6s %6s\n", $id, GetWindowPos($id);
}
perl examples/X/xcalc12.pl


Main: 18874425
Focus: 18874425
Main           6     26    226    304
18874426       6     26    226    304
18874428     186    298     40     26
18874429     142    298     40     26
18874430      98    298     40     26
18874431      54    298     40     26
18874432      10    298     40     26
18874433     186    268     40     26
18874434     142    268     40     26
18874435      98    268     40     26
18874436      54    268     40     26
18874437      10    268     40     26
18874438     186    238     40     26
18874439     142    238     40     26
18874440      98    238     40     26
18874441      54    238     40     26
18874442      10    238     40     26
18874443     186    208     40     26
18874444     142    208     40     26
18874445      98    208     40     26
18874446      54    208     40     26
18874447      10    208     40     26
18874448     186    178     40     26
18874449     142    178     40     26
18874450      98    178     40     26
18874451      54    178     40     26
18874452      10    178     40     26
18874453     186    148     40     26
18874454     142    148     40     26
18874455      98    148     40     26
18874456      54    148     40     26
18874457      10    148     40     26
18874458     186    118     40     26
18874459     142    118     40     26
18874460      98    118     40     26
18874461      54    118     40     26
18874462      10    118     40     26
18874463     186     88     40     26
18874464     142     88     40     26
18874465      98     88     40     26
18874466      54     88     40     26
18874467      10     88     40     26
18874468      10     28    216     46
18874469      17     31    204     38
18874470     145     53     18     15
18874471     109     53     34     15
18874472      79     53     26     15
18874473      49     53     26     15
18874474      22     55     26     15
18874475      36     34    186     17
18874476      22     34     10     15
#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}

printf "Main      %6s %6s %6s %6s\n", GetWindowPos($Main);

my @children = GetChildWindows $Main;

foreach my $id (@children) {
  printf "%6s  %6s %6s %6s %6s\n", $id, GetWindowPos($id);
}





{
   my ($x, $y, $width, $height) = GetWindowPos($Main);
   MoveMouseAbs($x+2, $y-2);
   ClickMouseButton M_LEFT;
   SendKeys('c');
}

Using the keyboard

SendKeys("7*5=");
sleep(5);
perl examples/X/xcalc21.pl
#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}

SendKeys("7*5=");
sleep(5);


{
   my ($x, $y, $width, $height) = GetWindowPos($Main);
   MoveMouseAbs($x+2, $y-2);
   ClickMouseButton M_LEFT;
   SendKeys('c');
}

Separate the keystrokes

SendKeys("7");
sleep(2);
SendKeys("*");
sleep(2);
SendKeys("5");
sleep(2);
SendKeys("=");
sleep(2);
perl examples/X/xcalc22.pl
#!/usr/bin/perl

use strict;
use warnings;

use X11::GUITest qw(:ALL);

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
if (!$Main) {
  die("Couldn't find xcalc window in time!");
}
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

if ($Focus != $Main) {
   die "The focus is not on the main window or you have two xcalcs open\n";
}

SendKeys("7");
sleep(2);
SendKeys("*");
sleep(2);
SendKeys("5");
sleep(2);
SendKeys("=");
sleep(2);


{
   my ($x, $y, $width, $height) = GetWindowPos($Main);
   MoveMouseAbs($x+2, $y-2);
   ClickMouseButton M_LEFT;
   SendKeys('c');
}

Full solution for xcalc

perl examples/X/xcalc.pl keyboard
perl examples/X/xcalc.pl mouse
#!/usr/bin/perl


use strict;
use warnings;

use X11::GUITest qw{:ALL};

if (not @ARGV or
    not grep {$ARGV[0] eq $_} qw(keyboard mouse)) {
    die "Usage: $0 [keyboard|mouse]\n";
}

StartApp('xcalc');

my ($Main) = WaitWindowViewable('Calculator');
print "Main: $Main\n";

my $Focus = GetInputFocus();
print "Focus: $Focus\n";

die("Couldn't find xcalc window in time!") if not $Main;

my ($x, $y, $width, $height) = GetWindowPos($Main);
print "$x $y $width $height\n";


if ($ARGV[0] eq "keyboard") {
    print "Testing keyboard\n";
    SendKeys('7');
    sleep(1);   # just for the show
    SendKeys('*');
    SendKeys('5');
    sleep(1);   # just for the show
    SendKeys('=');
}



if ($ARGV[0] eq "mouse") {
    print "Testing mouse\n";
    press(2,5); #7
    sleep(1);   # just for the show
    press(5,5); #*
    sleep(1);   # just for the show
    press(3,6); #5
    sleep(1);   # just for the show
    press(5,8); #=
}

# Here we should be able to read the result and check if it is correct
sleep(2);

# Hopefully a good guess for clicking on the upper left corner of a window
# (the X - sign) and then pressing 'c' in order to close the applicaton
MoveMouseAbs($x+2, $y-2);
ClickMouseButton M_LEFT;
SendKeys('c');


sub press {
    my ($w, $h) = @_;
    MoveMouseAbs(button_width($w), button_height($h));
    ClickMouseButton M_LEFT;
}


# We assume that the window has a display part at the top of about 20% of the
# total height and it has a keypad of 5x8 size.
sub button_height{
   my $n = shift;
   # 80% of the full window has 8 buttons in it
   my $button_height = $height*0.8/8;
   return int $y + $height*0.2 + ($n-0.5) * $button_height;
}
sub button_width {
   my $n = shift;
   my $button_width = $width/5; # total width has 5 buttons in it
   return int $x + ($n-0.5) * $button_width;
}