hippodata: Perl Scripting: Part 2

(This post continues from Part 1.)

Key Concepts:
==========
Associative Arrays (Hash Arrays)
File Input and output
Strings
Subroutines
Running external programs

Hash arrays:
=========
-Hash arrays are also called associative arrays.
-Instead of "@" and "[ ]", hash arrays use "%" and "{ }".
-A hash array looks up a value by its key: the key is used to retrieve the value associated with it.
-To access an element of a hash array:
$h_array{key}
-Perl borrowed associative arrays from awk.
-Hashing speeds up searching significantly, especially for large amounts of data.

Hashing and hash function:
====================
-Hashing is a method for directly referencing records in a table by doing arithmetic transformations on keys into table addresses.
-If keys are distinct integers from 1 to N, then the records can be stored in the table by the key position.
-In this case, the data can be retrieved directly from the table using the key.
-The first step in using hashing is to transform the search key into a table address.
-The function or method used to convert the key into table address is the hash function.
-Ideal hash functions should map different keys into different table addresses.
-If different keys hash into the same address, then hash collision happens.
-Good hash functions produce fewer collisions.

Array and Hash comparison:
=====================
Access Data From Array:
-Start from index 0
-for each record, compare the key to that of the record.
-If match then access the data: retrieve or update.

index data
----- ----
0 record0
1 record1
2 record2
3 record3
4 record4
5 record5

Access Data From Hash:
-convert the key to hash table index.
-Get the first record from the table index.
-Search through the list at that index until the record is found.

hashtable
---------
0
1 --------> record1a
2
3
4 --------> record4a ------> record4b
5

Hash Function:
===========
-Most of the time, the search key is a string.
-When converting a string into a hash table index, the following factors must be considered:
a. Speed of the function
b. Index less than zero
c. Index larger than table size
d. Frequency of collision
e. Since the names used for hash typically do not have limits, complex computation for a hash function may be too expensive.
f. The goal of the hash function is to minimize collision, but in practice, collisions always happen.
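
As a rough illustration of the factors above, here is a minimal string hash function. The name hash_index, the multiplier 31 and the table size 7 are assumptions chosen for the example; this is not the function Perl uses internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a string key to an index in 0 .. $size-1.
sub hash_index {
    my ($key, $size) = @_;
    my $h = 0;
    foreach my $ch (split //, $key) {
        # Cheap multiply-and-add per character (factor a: speed).
        # The modulo keeps the index non-negative and below the
        # table size (factors b and c).
        $h = ($h * 31 + ord($ch)) % $size;
    }
    return $h;
}

my $size = 7;
foreach my $key (qw(cat dog fish)) {
    printf "%-5s -> %d\n", $key, hash_index($key, $size);
}
```

Different keys can still land on the same index, which is why the collision handling described in the next sections is needed.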

Closed Hash:
===========
-Closed hashing is used when the number of entries is known and it is less than the size of the hash table.
-Closed hashing is also referred to as open addressing (address, not hash. Do not let the terms confuse you).
-The simplest closed hash (open addressing) scheme is called linear probing.
-The way closed hashing works (for searching):
a. Use the hash function to get the table index.
b. If the entry at the table index is the one being searched for, the search is successful.
c. If the entry at the table index is empty, the search fails: the key is not in the table.
d. If the entry at the table index is not the one being searched for, continue with the next address.
e. Continue until an empty entry, a match or the end of the table is reached.
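
Steps a to e can be sketched with a plain Perl array as the table. The names slot_for, probe_insert and probe_search are hypothetical, the hash function is an arbitrary one, and, following step e, the probe stops at the end of the table (many implementations wrap around to index 0 instead).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $SIZE  = 8;
my @table = (undef) x $SIZE;    # undef marks an empty entry

# Hypothetical hash function: string key -> index in 0 .. $SIZE-1.
sub slot_for {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 31 + ord($_)) % $SIZE for split //, $key;
    return $h;
}

# Store a key in the first free entry at or after its hashed index.
sub probe_insert {
    my ($key) = @_;
    for (my $i = slot_for($key); $i < $SIZE; $i++) {
        if (!defined $table[$i] || $table[$i] eq $key) {
            $table[$i] = $key;
            return $i;
        }
    }
    return -1;    # reached the end of the table without finding room
}

# Steps a-e: return the index of the key, or -1 if it is not found.
sub probe_search {
    my ($key) = @_;
    for (my $i = slot_for($key); $i < $SIZE; $i++) {
        return -1 if !defined $table[$i];    # empty entry: not found
        return $i if $table[$i] eq $key;     # match: found
    }
    return -1;                               # end of the table
}

probe_insert($_) for qw(cat dog pig);
print "dog is at index ", probe_search("dog"), "\n";
print "emu is at index ", probe_search("emu"), "\n";
```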

Open Hash:
==========
-Open hashing is also called separate chaining.
-When multiple entries hash to the same table index, a linked list is built to keep all the entries.
-Under open hashing, there is no probing through other table entries; only the list at the hashed index is searched.
-If the table size is H, entries will be distributed evenly over the H linked lists if the hash function is good.
-Compared with a single linked list, a search using a hash table of size H can be close to H times faster (not exactly, because of the hash function overhead and the imperfection of the hash function).
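
Separate chaining can be sketched as follows. For brevity each chain is a Perl array of [key, value] pairs rather than a true linked list, and the names bucket_of, chain_put and chain_get are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $H = 4;                       # table size
my @buckets;
push @buckets, [] for 1 .. $H;   # one (initially empty) chain per index

# Hypothetical hash function: string key -> index in 0 .. $H-1.
sub bucket_of {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 31 + ord($_)) % $H for split //, $key;
    return $h;
}

sub chain_put {
    my ($key, $value) = @_;
    my $chain = $buckets[ bucket_of($key) ];
    foreach my $entry (@$chain) {
        if ($entry->[0] eq $key) {     # key already present: overwrite
            $entry->[1] = $value;
            return;
        }
    }
    push @$chain, [ $key, $value ];    # new key: append to the chain
}

sub chain_get {
    my ($key) = @_;
    foreach my $entry (@{ $buckets[ bucket_of($key) ] }) {
        return $entry->[1] if $entry->[0] eq $key;
    }
    return;    # not found
}

chain_put("cat", 1);
chain_put("dog", 2);
chain_put("cat", 3);    # overwrites the first value
print "cat => ", chain_get("cat"), "\n";
print "dog => ", chain_get("dog"), "\n";
```

Only the chain at the hashed index is scanned, so with a good hash function each lookup touches roughly 1/H of the entries.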

Hash array access: existence
====================
The function exists can be used to check the existence of a given key:

if (exists $books{"cats"}) { print "Yes, there is an entry for cats!\n"; }

Hash array access: assign/overwrite
===========================
-If an entry does not exist, it can be assigned: $name{"john"} = "tall guy";
-If an entry exists already, it will be overwritten.
-A hash array can be copied to another one:
%new_hash = %old_hash;
-A hash array can also be inverted, so that the values become keys and the keys become values (this only works cleanly when the values are unique):
%inverse_hash = reverse %old_hash;
-A hash array can also be assigned name/value pairs directly:
%name_hash = ("key1", 12, "key2", 13, "key3", 14);

Hash array access : name/value pair (=>)
==============================
-When name and value pairs are assigned to a hash array as a plain list, it is difficult to tell which entry is a key and which is a value.
-Perl provides an easy way to represent name/value pairs:
%names = ( "key1" => "value1", "key2" => "value2" );
-This way, it is very clear which entry is the name and which entry is the value.
-The big arrow (=>) can be used in place of a comma. It is also called the fat comma.
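
A short sketch of copying, inverting and fat-comma assignment. The hash names and the fruit data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data; the values are unique so the inversion loses nothing.
my %color_of = (
    apple        => "red",       # the fat comma also quotes a bareword key
    banana       => "yellow",
    "kiwi fruit" => "green",     # keys with spaces still need quotes
);

my %copy       = %color_of;            # independent copy
my %fruit_with = reverse %color_of;    # values become keys, keys become values

$copy{apple} = "green";                # does not affect %color_of
print "apple is $color_of{apple}\n";
print "the yellow fruit is $fruit_with{yellow}\n";
```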

Hash array access: delete
===================
-The delete function can be used to remove a given key.
-If there is no such key to be deleted, there will not be any warning or error.
Syntax: delete $h_array{"key"} ;
-It is not the same as storing undef into that hash element.
-The function exists returns true if the key is present, even when its value is undef.
-The function exists returns false once a key has been deleted.
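
The difference between storing undef and deleting a key can be checked with a small sketch (the hash %h and its keys are invented for the example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %h = (cat => 1, dog => 2);

$h{cat} = undef;    # the key stays, the value is undef
delete $h{dog};     # the key is removed
delete $h{emu};     # no such key: no warning, no error

my $cat_exists  = exists($h{cat})  ? 1 : 0;
my $cat_defined = defined($h{cat}) ? 1 : 0;
my $dog_exists  = exists($h{dog})  ? 1 : 0;

print "cat: exists=$cat_exists defined=$cat_defined\n";   # cat: exists=1 defined=0
print "dog: exists=$dog_exists\n";                        # dog: exists=0
```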

Hash array access: keys and values
==========================
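-Hash arrays support two functions:
a. keys: get all the keys at once.
b. values: get all the values at once.

-Both keys and values return empty lists if the hash has no elements.
-The order of the returned keys is not guaranteed, but keys and values return their lists in matching order for the same unmodified hash.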

Snippet:
----------
unix>cat keys_values.pl
#!/usr/bin/perl -w
%hash = ("a" => 1, "b" => 2, "z"=>4, "x" =>23);

@ks = keys %hash;
@vs = values %hash;

printf "keys:";
printf " %s " x @ks, @ks ;
printf "\n";
printf "values:";
printf " %s " x @vs, @vs ;
printf "\n";
unix>keys_values.pl
keys: a b x z
values: 1 2 23 4
unix>
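
Because the order of keys is not guaranteed, a common pattern is to sort them before printing. A minimal sketch, reusing the same sample hash (the array @lines is just a helper for the example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash = ("a" => 1, "b" => 2, "z" => 4, "x" => 23);

# Walk the keys in sorted order and look up each value.
my @lines;
foreach my $key (sort keys %hash) {
    push @lines, "$key => $hash{$key}";
}
print "$_\n" for @lines;

# In scalar context, keys returns the number of entries.
my $count = keys %hash;
print "entries: $count\n";
```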

File Handles:
=========
-Variables that represent files are called file handles.
-Traditional file handles do not have any prefix character ($, @, %, &).
-They are typically written in UPPER case.
-File handles of this kind are global; they cannot be declared locally with "my". Since Perl 5.6 a handle can also be stored in a lexical scalar: open(my $fh, "<", "filename").

The standard files:
=============
-Before a perl program runs, three standard files are opened:
STDIN
STDOUT
STDERR
-The <STDIN> operator returns one line from standard input. It returns undef when there is no more input.

$line = <STDIN>;             # read a single line
while ($line = <STDIN>) {    # read the remaining lines, one at a time
    chomp ($line);
}

Open and Close files:
===============
-The "open" and "close" operators work much as in other programming languages:
open (F1, "filename"); # open "filename" for reading with handle F1.
open (F2, ">filename"); # open "filename" for writing as file handle F2.
open (F3, ">>filename"); # open "filename" for appending.
close (F1); # close a file handle.

-Open can be used to establish a read or write connection to a separate process launched by the OS (on Unix):
open (F, "ls -l |"); # open a pipe to read from an ls process.
open (F, "| mail $addr"); # open a pipe to write to a mail process.

-A convenient way to exit a program when a file cannot be opened: open(F, $filename) || die "could not open $filename: $!\n" ;

Read data from a file:
================
-In a scalar context, the input operator reads one line at a time.
$line = <F> ; # reads in a line at a time.

-In an array context, the input operator reads the whole file into memory as an array of its lines:
@a = <F> ; # reads the whole file

-The global variable "$/" is the input record separator, which marks the end of a line (default "\n"). Setting it to undef causes the rest of the file to be read in as a single string:
$/ = undef;
$all_lines = <F> ; # read the whole file into one string.

Print output:
==========
-By default, print sends its output to STDOUT.
-An output file handle can be given to print (with no comma after the handle): print F "here", " comes", " the rain!\n";

Snippet:
----------
#!/usr/bin/perl -w
$fname = "fileline";
$line = "";
open (HERE, $fname) or die ("Could not open $fname\n");
while (<HERE>) {       # each line is read into $_
    chomp($_);         # strip the trailing newline
    $line = $_;
    print "$line\n";
}
close (HERE);
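
The open, read and print operations above can be exercised together in one sketch. Two things here go beyond the text and are my own choices: the handles are lexical scalars opened with the three-argument form of open, and the scratch file comes from the core File::Temp module so that no real file is touched.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the sketch does not touch real data.
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Write two lines, then append a third.
open(my $out, ">", $path) or die "could not open $path: $!\n";
print $out "first\n", "second\n";
close($out);

open(my $app, ">>", $path) or die "could not open $path: $!\n";
print $app "third\n";
close($app);

# Scalar context: one line per read.
open(my $in, "<", $path) or die "could not open $path: $!\n";
my $first = <$in>;
chomp($first);
# List context: all remaining lines at once.
my @rest = <$in>;
close($in);

# Read the whole file as one string by undefining $/ locally.
open($in, "<", $path) or die "could not open $path: $!\n";
my $all = do { local $/; <$in> };
close($in);

print "first line: $first\n";
print "remaining lines: ", scalar(@rest), "\n";
print "total characters: ", length($all), "\n";
```

Using "local $/" inside a do block restores the separator afterwards, which avoids surprising later reads in the same program.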

Printing array:
===========
-Neither print nor printf formats an array on its own: print @ary runs the elements together with nothing in between.
-With a little effort, printf can be used to dump the contents of an array.
-Note that when an array is used in a scalar context, it returns the number of elements.
-"format_str x @ary" can be used to create the format string.
-"x" is the repetition operator and @ary gives the number of elements.

Snippet:
----------
unix>cat print_array.pl
#!/usr/bin/perl -w
my @ary = qw/cat dog pig fish fly/ ;
my $format = "Array content: \n" . ("%15s \n" x @ary);
my $ary_n = "";
printf STDOUT $format, @ary;

print "Using print:\n";
print @ary ;
print "\n";

print "Using printf:\n";
$ary_n=@ary;    # scalar context: the number of elements
printf "$ary_n\n" ;
print "\n";

unix>print_array.pl
Array content:
            cat
            dog
            pig
           fish
            fly
Using print:
catdogpigfishfly
Using printf:
5

unix>

String Processing (binding operator):
===========================
-String manipulation is one of Perl's most powerful features.
-Perl uses regular expressions extensively for string manipulation.
-Perl uses the binding operator (=~) to match a pattern: ($string =~ /pattern/)
-The expression ( $string =~ /pattern/ ) returns true as long as the pattern occurs somewhere in the string.
-($string =~ /pattern/i ) makes the matching case insensitive.
-If the pattern contains parenthesized groups and it matches, $1, $2, etc. hold the text captured by each group. Inside the pattern itself, the same groups are referred to as \1, \2, as in other regular expression tools; Perl supports that as well.
-If the string matches, three special variables can be used:
a. $& (dollar-ampersand) holds the matched string.
b. $` (dollar-back-quote) holds the string before the matched portion.
c. $' (dollar-quote) holds the string after the matched portion.
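
A small sketch of the capture and match variables. The sample string and the variable names are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "pages 10-12 only";
my ($before, $matched, $after, $first, $second);

if ($string =~ /(\d+)-(\d+)/) {
    # Copy the special variables right away; the next successful
    # match overwrites them.
    ($before, $matched, $after) = ($`, $&, $');
    ($first, $second) = ($1, $2);
}

print "before:  <", $before,  ">\n";    # before:  <pages >
print "matched: <", $matched, ">\n";    # matched: <10-12>
print "after:   <", $after,   ">\n";    # after:   < only>
print "groups:  ", $first, " and ", $second, "\n";    # groups:  10 and 12
```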

The m// construct:
=============
-The /pattern/ in the previous examples is actually a shortcut for m/pattern/. When the slash is used as the delimiter, the "m" character can be omitted.
-With the "m" (for match) character, almost any punctuation character can be used as the delimiter.
-The following are equivalent:
"string" =~ m/str/ ;
"string" =~ m"str" ;
"string" =~ m'str' ;
"string" =~ m#str# ;
-Note that with single quotes as the delimiter, variables inside the pattern are not interpolated.

Limit over matching with ?
===================
-Both * and + are greedy: they match as much as possible.
-m/{(.*)}/ captures the longest string between a "{" and a "}".
-If the string is "{ group 1} and {group 2}", m/{(.*)}/ captures " group 1} and {group 2".
-If ? is placed after * or +, Perl matches the shortest string instead of the longest: m/{(.*?)}/ captures " group 1". (On its own, ? means "at most one occurrence"; after a quantifier it means "match as little as possible".)
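
The greedy and non-greedy behaviour can be compared directly. In this sketch the braces are escaped (\{ and \}) as a precaution, since some Perl versions restrict unescaped "{" inside a pattern; the variable names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "{ group 1} and {group 2}";

# A match in list context returns the captured groups.
my ($greedy) = $text =~ /\{(.*)\}/;     # longest possible capture
my ($lazy)   = $text =~ /\{(.*?)\}/;    # shortest possible capture
my @all      = $text =~ /\{(.*?)\}/g;   # every non-greedy capture

print "greedy: <", $greedy, ">\n";    # greedy: < group 1} and {group 2>
print "lazy:   <", $lazy,   ">\n";    # lazy:   < group 1>
print "all:    <", join("> <", @all), ">\n";    # all:    < group 1> <group 2>
```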

String option modifiers: i, s, x, g
========================
-The “i” option makes the matching case insensitive.

Snippet:
----------
print "Do you want to continue? ";
chomp ($_ = <STDIN>) ;
if (/yes/i) { # when matching against the default variable $_, "$_ =~" can be omitted.
    print "Thank you for the positive response!\n" ;
}

-The "s" option makes the dot (.) match a newline as well.

Snippet:
-----------
$str = "The dog runs \n after the cat.\n" ;
if ($str =~ /dog.*cat/s) { # the match fails without the "s".
    print "See dog ahead of cat.\n" ;
}

-The "x" option allows white space in the pattern to improve readability.
"/-?\d+\.?\d*/" and "/-? \d+ \.? \d* /x" are equivalent.

-The "g" option applies the match or substitution to every occurrence in the string, not only the first.

-An "s" in front of the pattern substitutes the matched substring with a new substring:
$str =~ s/old/new/ig ; # i for mixed case, g for every occurrence.

-$1 and $2 can be used in the replacement to refer to parts of the matched string (with groups):
$x = "This dress exacerbates the generic betrayal that is my legacy.\n" ;
$x =~ s/(r|l)(\w)/z$2/ig ; # r or l followed by a word char.
$x is now: "This dzess exacezbates the genezic betzayal that is my zegacy.\n"

String “split”:
==========
-The split construct takes a regular expression and a string and returns a list of the substrings found between the matches.
-By default, empty elements at the end of the list are dropped. If "-1" is passed as the third argument, they are kept.

Snippet:
-----------
split (/\s+/, "this is a string"); gives ("this", "is", "a", "string")

split (/\s*,\s*/, "that , tree, is,very ,tall,"); gives ("that", "tree", "is", "very", "tall")
split (/\s*,\s*/, "that , tree, is,very ,tall,", -1); gives ("that", "tree", "is", "very", "tall", "")

String “tr”:
========
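-The "tr" construct translates characters in the string one for one:
$str =~ tr/a/b/ ; $str =~ tr/A-Z/a-z/ ;

-Note that the two sets should match in size. tr works on single characters, not on words: tr/cat/dog/ maps c to d, a to o and t to g wherever they occur.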

Snippet:
----------
unix>cat tr.pl

$str = "this is a cat and my cat can take on a dog.\n";
print $str ;
$str =~ tr/cat/dog/;
print $str ;

unix>
unix>perl tr.pl
this is a cat and my cat can take on a dog.
ghis is o dog ond my dog don goke on o dog.
unix>
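
Two other common uses of tr are shown in this sketch: converting case with ranges, and counting characters. Counting relies on tr returning the number of characters it matched; with an empty second set the string itself is left unchanged. The variable names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "this is a cat and my cat can take on a dog.";

# Copy the string, then translate the copy to upper case.
(my $upper = $str) =~ tr/a-z/A-Z/;

# Empty replacement set: count the matches without changing $str.
my $count_a = ($str =~ tr/a//);

print "$upper\n";
print "number of a's: $count_a\n";
```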

Local Variables:
============
-Starting with Perl version 5, the "my" construct was introduced to create local variables, visible only inside the enclosing block or file:
my $var ; # declares a variable $var
my $v1 = "cat" ; # declares $v1 and assigns "cat" to it.
my @ary = (1..10) ; # declares array @ary and assigns 1 through 10 to it.
my ($x, $y); # declares two local variables $x and $y
my ($p, $q) = (100, 200) ; # declares $p and $q and assigns 100 and 200 to them.

-Local variables are mostly used in subroutines.

Return multiple values:
=================
-What if multiple values need to be returned to the caller?
-One approach is to pack the values into an array and return it.
…
my ($str1, $value1, $str2) = myroutine();
…
sub myroutine {
…
    my @ary = ("this", -1, "that");
    return (@ary);
}
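
A complete, runnable variant of the same idea is sketched below. The subroutine min_max and its arguments are hypothetical; it returns three values as a list, and the caller unpacks them into separate variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine: return the smallest value, the largest
# value and the number of arguments, all in one list.
sub min_max {
    my @numbers = @_;
    my ($min, $max) = ($numbers[0], $numbers[0]);
    foreach my $n (@numbers) {
        $min = $n if $n < $min;
        $max = $n if $n > $max;
    }
    return ($min, $max, scalar @numbers);
}

my ($lo, $hi, $count) = min_max(7, 3, 9, 4);
print "min=$lo max=$hi count=$count\n";    # min=3 max=9 count=4
```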

Running External Programs:
======================
-Perl can invoke other programs using the "system" function.
-Perl passes the argument of "system" as a command line to the operating system.
-system returns 0 when the program completes successfully. On failure it returns a non-zero status, which is also stored in the global variable "$?".

system ("mail hippo\@rkguru.com < mail.txt") == 0 || die "system error $?";
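
The status handling can be tried without depending on any particular Unix command. In this sketch the child program is Perl itself ($^X holds the path of the running perl), asked to exit with code 3; the list form of system is used so that no shell is involved. The exit code of the child sits in the upper byte of $?, hence the shift by 8.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical child command: run perl itself and exit with code 3.
my @cmd = ($^X, "-e", "exit 3");

my $rc        = system(@cmd);
my $exit_code = -1;

if ($rc == -1) {
    print "could not start the program: $!\n";
} else {
    $exit_code = $? >> 8;    # the exit code is in the upper byte of $?
    print "program exited with code $exit_code\n";
}
```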

(This post continues in Part 3.)

Sunday, May 12, 2013
