Friday 2 January 2009

Learning Perl

  • Difference between enclosing strings in " " and ' ' :

    There are two differences between double-quoted strings and single-quoted strings. The
    first difference is that scalar variables are replaced by their values in double-quoted
    strings but not in single-quoted strings. The following is an example:

    $string = "a string";
    $text = "This is $string"; # becomes "This is a string"
    $text = 'This is $string'; # remains 'This is $string'

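    The second difference is that the backslash character, \, has no special meaning in
    single-quoted strings (except before another backslash or a single quote). This means
    that the statement
    $text = 'This is a string.\n';
    stores the two characters \ and n at the end of the string, not a newline.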
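
    The two quoting rules above can be checked with a small sketch. The variable names
    $double and $single are only illustrative and are not part of the original notes.

```perl
use strict;
use warnings;

my $string = "a string";
my $double = "This is $string.\n";   # $string is interpolated, \n becomes a newline
my $single = 'This is $string.\n';   # $string and \n stay as literal characters

print $double;
print $single, "\n";
```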
  • chop function :

    $number = <STDIN>;
    chop ($number);
    $result = $number + 1;
    This is what is happening: When $number is assigned a line of standard input, it really is
    being assigned a string. For instance, if you enter 22, $number is assigned the string 22\n
    (the \n represents the newline character). The chop function removes the \n, leaving the
    string 22, and this string is converted to the number 22 in the arithmetic expression.
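
    chop removes the last character whatever it is, so it is only safe when the input really
    ends in a newline. The sketch below compares it with chomp, which removes a trailing
    newline only. The sample strings stand in for lines of standard input and are assumptions.

```perl
use strict;
use warnings;

# Hypothetical input lines: one ends in a newline, one does not.
my $with_newline    = "22\n";
my $without_newline = "22";

# chomp removes a trailing newline only.
chomp(my $chomped_1 = $with_newline);      # "22"
chomp(my $chomped_2 = $without_newline);   # still "22"

# chop removes the last character, whatever it is.
my $chopped_1 = $with_newline;     chop($chopped_1);   # "22"
my $chopped_2 = $without_newline;  chop($chopped_2);   # "2"

my $sum_chomp = $chomped_2 + 1;
my $sum_chop  = $chopped_2 + 1;
print "after chomp: $sum_chomp, after chop: $sum_chop\n";
```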
  • Also, strings containing misprints might not contain
    what you expect.
    For example:
    $result = "12O34"; # the letter O, not the number 0
    When converting from a string to an integer, Perl starts
    at the left and continues until it sees a character that is
    not a digit. In the preceding instance, 12O34 is converted
    to the integer 12, not 12034.
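
    A minimal sketch of this conversion follows. It also uses looks_like_number from the core
    Scalar::Util module as one possible way to detect such misprints before doing arithmetic;
    that check is an addition, not something the notes above prescribe.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my $misprint = "12O34";   # contains the letter O
my $correct  = "12034";   # contains the digit 0

my $value;
{
    no warnings 'numeric';    # suppress the "isn't numeric" warning for this demo
    $value = $misprint + 0;   # conversion stops at the letter O
}
print "$misprint converts to $value\n";

for my $candidate ($misprint, $correct) {
    my $verdict = looks_like_number($candidate) ? "numeric" : "not numeric";
    print "$candidate is $verdict\n";
}
```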
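  • In Perl, a scalar variable that has not been assigned a value holds undef, which behaves as
    the null string, "", in string context and as 0 in numeric context. This means that
    you do not need to define a value for a scalar variable before using it (although perl -w
    warns when an undefined value is used).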
  • Joining a list of strings to form a sentence:

    perl -e '@var=qw(I am sriram in bangalore); $text=join(" ",@var); print("$text\n");'
  • Split the above line and count the number of words in the sentence :

    perl -e '$text="I am sriram from bangalore" ; @words = split(" ",$text); $count = @words; print ("$count\n");'
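
    The same join and split steps as a short script, a sketch with illustrative variable
    names. Assigning an array to a scalar gives the number of elements.

```perl
use strict;
use warnings;

my @var   = qw(I am sriram in bangalore);
my $text  = join(" ", @var);     # one string, words separated by a space
my @words = split(" ", $text);   # back to a list of words
my $count = @words;              # array in scalar context: number of words

print "$text\n";
print "$count words\n";
```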
  • OPERATORS in Perl :
    Here are simple examples of each of the first six comparison operators :
    $x < 10 # true if $x is less than 10
    $x > 10 # true if $x is greater than 10
    $x == 10 # true if $x is equal to 10
    $x <= 10 # true if $x is less than or equal to 10
    $x >= 10 # true if $x is greater than or equal to 10
    $x != 10 # true if $x is not equal to 10

    Each of these operators yields one of two values:
    -- True, or nonzero
    -- False, or zero

    The <=> operator is a special case. Unlike the other integer comparison operators, <=>
    returns one of three values:

    -- 0, if the two values being compared are equal
    -- 1, if the first value is greater
    -- -1, if the second value is greater
    For example, consider the following statement:
    $y = $x <=> 10;
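
    To see the three possible results, here is a small sketch that compares a few sample
    values (chosen only for illustration) against 10:

```perl
use strict;
use warnings;

# Compare three sample values against 10 with the <=> operator.
my @samples = (5, 10, 15);
my @results = map { $_ <=> 10 } @samples;

for my $i (0 .. $#samples) {
    print "$samples[$i] <=> 10 gives $results[$i]\n";
}
```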
  • perl -w program.pl runs program.pl with warnings enabled, so Perl reports suspicious constructs in the program.
  • Perl special variables :

    The following variables are global and should be localized in subroutines:

    $_ The default input and pattern-searching space.
    $. The current input line number of the last filehandle that was read. Reset only
    when the filehandle is closed explicitly.
    $/ The input record separator, newline by default. May be multi-character.
    $, The output field separator for the print operator.
    $" The separator which joins elements of arrays interpolated in strings.
    $\ The output record separator for the print operator.
    $# The output format for printed numbers. Deprecated.
    $* Set to 1 to do multiline matching within strings. Deprecated, see the m and s
    modifiers in section 'Search and replace functions'.
    $? The status returned by the last `...` command, pipe close or system
    operator.
    $] The Perl version number, e.g. 5.004.
    $[ The index of the first element in an array, and of the first character in a
    substring. Default is 0. Deprecated.
    $; The subscript separator for multi-dimensional array emulation. Default is
    "\034".
    $! If used in a numeric context, yields the current value of errno. If used in a
    string context, yields the corresponding error string.
    $@ The Perl error message from the last eval or do EXPR command.
    $: The set of characters after which a string may be broken to fill continuation
    fields (starting with '^') in a format.
    $0 The name of the file containing the Perl script being executed. May be
    assigned to.
    $$ The process ID of the Perl interpreter running this script. Altered (in the
    child process) by fork.
    $< The real user ID of this process.
    $> The effective user ID of this process.
    $( The real group ID of this process.
    $) The effective group ID and groups of this process.

    The following variables are context dependent and need not be localized:

    $% The current page number of the currently selected output channel.
    $= The page length of the current output channel. Default is 60 lines.
    $- The number of lines remaining on the page.
    $| If set to nonzero, forces a flush after every write or print on the output
    channel currently selected. Default is 0.
    $ARGV The name of the current file when reading from <> .

    The following variables are always local to the current block:

    $& The string matched by the last successful pattern match.
    $` The string preceding what was matched by the last successful match.
    $' The string following what was matched by the last successful match.
    $+ The last bracket matched by the last search pattern.
    $1...$9... Contain the subpatterns from the corresponding sets of parentheses in
    the last pattern successfully matched. $10 and up are only available if the
    match contained that many subpatterns.
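
    As a sketch of what "localized" means for the global variables in this list, the example
    below changes $" (the list separator) with local inside a block; the old value comes back
    when the block ends. The array contents are illustrative.

```perl
use strict;
use warnings;

my @items = qw(a b c);
my $joined;
{
    local $" = "-";        # list separator, changed only inside this block
    $joined = "@items";
}
my $restored = "@items";   # the default separator (a space) is back

print "$joined\n$restored\n";
```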
  • List assignment works just like scalar assignment:

    ($a,$b,$c) = (1,2,3) or, using a range, ($a,$b,$c) = (1..3)
    The same thing can be written as ($a,$b,$c) = qw(1 2 3) and so on.
  • An array does not become a single element of a list: because lists contain only scalars,
    an array placed inside a list is flattened into its elements.
    Just as empty scalar variables start out with undef, empty arrays start out with the empty list.
    Array elements (indexes start at 0) :
    $fruit[0] = "apple" ; $fruit[1] = "banana" ;

    same as :

    @fruit = ("apple" , "banana") ;
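
    A short sketch of flattening and of element indexes, using made-up arrays:

```perl
use strict;
use warnings;

my @inner = (2, 3);
my @outer = (1, @inner, 4);   # @inner is flattened: (1, 2, 3, 4)
my $count = @outer;           # 4 elements, not 3

my @fruit = ("apple", "banana");
my @empty;                    # a new array starts out as the empty list

print "outer has $count elements: @outer\n";
print "first fruit is $fruit[0], second is $fruit[1]\n";
print "empty has ", scalar(@empty), " elements\n";
```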
  • Push and pop operators add and remove elements at the end of an array.

    $ perl -e 'push @array, 1..10 ; print "@array\n"'
    1 2 3 4 5 6 7 8 9 10

    $ perl -e 'push @array, "a".."z" ;
    print "@array\n" ;
    pop @array;
    print "@array\n" ;'
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    a b c d e f g h i j k l m n o p q r s t u v w x y
  • push and pop operate on the end (the right-hand side) of an array,
    whereas unshift and shift operate on the start of an array.

    @array = 1..5;
    $num = shift @array;  # $num gets 1 and @array is now (2,3,4,5)
    unshift(@array, 1);   # @array is now (1,2,3,4,5) again
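
    All four operators in one sketch, with the array contents after each step noted in the
    comments:

```perl
use strict;
use warnings;

my @array = (1 .. 5);

push @array, 6;              # add at the end:      (1,2,3,4,5,6)
my $last  = pop @array;      # remove from the end: 6, leaving (1,2,3,4,5)
my $first = shift @array;    # remove from the start: 1, leaving (2,3,4,5)
unshift @array, 0;           # add at the start:    (0,2,3,4,5)

print "popped $last, shifted $first, array is now @array\n";
```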
  • The foreach control structure processes an entire array or list, one element at a time.
  • Hereafter, sample programs are denoted as SP:
    SP 1 :

    @NAMES = qw(ramu khanna tinki dumbu sekhar rajesh) ;

    foreach $name (@NAMES) {
        $name = "\t$name" ; # add a tab before each element of array NAMES
        $name .= "\n" ;     # put a newline at the end of each name
    }

    print "the names are :\n", @NAMES ;


    If you omit the control variable in a foreach loop (i.e. $name above), Perl uses its default variable, $_, so we need
    not worry about inventing a new variable name.

    SP 2 :

    foreach (1..10) {
        print "just printing $_ !\n" ;
    }
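
    SP 1 works because the foreach control variable is an alias for the array element, not a
    copy. A minimal sketch of that behaviour, with and without an explicit control variable
    (the names are sample data):

```perl
use strict;
use warnings;

my @names = qw(ramu khanna tinki);

# The loop variable is an alias: changing it changes the array element.
foreach my $name (@names) {
    $name = ucfirst $name;
}

# With no loop variable, each element is aliased to $_ instead.
my $total_length = 0;
foreach (@names) {
    $total_length += length;   # length defaults to $_
}

print "@names ($total_length characters)\n";
```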
  • File handling in perl:

    * open(IN, "< $file"); - Opening a filehandle on a file for data input.
    * <IN> - Reading one record (in scalar context) or all records (in list context) from a filehandle.
    * open(OUT, "> $file"); - Opening a filehandle on a file for data output.
    * print OUT @list; - Printing a list of values to a filehandle.


  • subroutines in perl :

    A subroutine in perl returns the value of the last expression evaluated in it, so an explicit return statement is optional.

    This is unlike functions in many other languages, where a value must be returned explicitly.

    By default, all variables used in a subroutine are global variables, unless they are explicitly scoped with the "my" keyword. Variables declared with my are called lexical variables and are private to the enclosing block.


    E.g: my ($square) = @_ ; # list context: assigns the first element of @_

    my $square = @_ ; # scalar context: assigns the number of parameters in @_

    @_ is the special perl array variable which holds the list of parameters passed to the subroutine for each invocation. Once the subroutine call is complete, @_ holds its previous values again.
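
    A minimal sketch of these points. The subroutine names square and count_args are
    hypothetical and chosen only for this illustration.

```perl
use strict;
use warnings;

sub square {
    my ($n) = @_;    # list context: $n gets the first argument
    $n * $n;         # value of the last expression is the return value
}

sub count_args {
    my $count = @_;  # scalar context: $count gets the number of arguments
    return $count;   # an explicit return works too
}

my $squared = square(4);
my $counted = count_args(7, 8, 9);
print "square(4) = $squared, count_args(7, 8, 9) = $counted\n";
```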
  • Looping conditions :

    Reading input from standard input with the line-input operator and displaying each line:
    # Evaluating the line-input operator in scalar context gives the next line.
    E.g: while ($line = <STDIN>) {
        chomp ($line); # In this kind of loop, use chomp as the first line of the body.
        print "every line is $line\n";
    }

    # Evaluating the line-input operator in list context (foreach) reads all the lines into the array
    foreach (@IN = <STDIN>) {
        chomp;
        print "This line is $_\n" ;
    }

    Note : reading line by line with a while loop needs far less memory, and is usually faster, when the input comes from large log files of over 500 MB, because foreach reads every line into memory before the loop starts.
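
    The while form can be tried without typing any input by reading from an in-memory
    filehandle. This is a sketch: the string $data stands in for standard input and is an
    assumption made only for the example.

```perl
use strict;
use warnings;

# Sample data standing in for standard input (an assumption for this demo).
my $data = "first\nsecond\nthird\n";
open(my $fh, "<", \$data) or die "Cannot open in-memory file: $!";

my @lines;
while (my $line = <$fh>) {    # scalar context: one line per iteration
    chomp $line;
    push @lines, $line;
    print "every line is $line\n";
}
close $fh;
```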
  • Diamond operator <>:

    It is a special operator that is especially useful for writing unix-like utilities in perl (e.g: awk, sed, cat, grep etc.)
    It is similar to the line-input operator, but it reads either from standard input or from the files the user names on the command line.

    E.g: print all lines of file1, file2 and file3 one after the other, until <> returns undef at the end of the last file.

    perl -e 'while (<>) { chomp; print "this is the line $_ from file $ARGV I saw\n" ; } ' file1 file2 file3

    Note : The diamond operator looks in @ARGV for the list of file names given as command line arguments.

    If @ARGV is an empty list, it reads from the standard input stream.
  • Using printf for formatted output, and its usage when printing an array with an unknown number of elements.

    printf "%g %10s %10.2f\n",5/2,"sriram",8 * 9 + 3/2;

    %g chooses integer, floating-point or exponential notation as needed.

    If we take elements from a file or from standard input, we don't know the number of elements to print.

    In that case, we can build the format string like this :

    @array = <STDIN> ;
    chomp(@array) ;

    printf "my elements of the input are :\n" . ("%10s\n" x @array), @array ;

    Note : "y" x 3 means the string y is replicated 3 times. @array above is used once in scalar context (as the repeat count) and once in list context (as the values to print).
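
    The format-building trick can be sketched with a fixed list in place of standard input.
    sprintf is used here instead of printf so that the result is kept in a variable; the
    list contents are illustrative.

```perl
use strict;
use warnings;

my @items = qw(one two three);

# @items in scalar context gives 3, so the "%10s\n" piece is repeated 3 times.
my $format = "Items:\n" . ("%10s\n" x @items);

# @items in list context supplies one value for each %10s.
my $report = sprintf($format, @items);
print $report;

my $general = sprintf("%g", 5 / 2);   # %g picks a compact notation: 2.5
print "$general\n";
```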
  • I/O - Filehandling :
    Perl provides six special filehandles by default: STDIN, STDOUT, STDERR, ARGV, ARGVOUT
    and DATA.

    To use custom filehandles, ask the O.S to open a file with open, like this:

    open LOG, ">>logfile" ;             # append to logfile
    open LOG, ">>" . &log_filename() ;  # append to the file named by a subroutine
    open CONFIG, "<filename";           # read from filename
    open CONFIG, ">filename";           # write to filename, truncating it first


    When you are done with the opened file, it is good to close the filehandle by saying

    close LOG;

    Note : closing a filehandle flushes any output buffers and releases any locks on the opened file.

  • Use perl's die to stop on fatal errors, such as a failed system request or any other error you want to catch :

    E.g: since open returns true for success and false for failure, we can report the failure like this :


    if ( ! open LOG, ">>logfile" ) {
        die "Cannot create logfile : $!" ;
    }


    Note : By convention, a unix program returns 0 upon successful execution and a non-zero value upon failure.

    The die function prints the message and ensures that the program exits with a non-zero exit status.

    Note : $! is the special variable that contains the error message returned by the system.

    In numeric context it yields the error number (errno) instead.
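
    A sketch of die and the two faces of $!. It uses eval to trap the die so the script can
    carry on and show the message, and the three-argument form of open. The path in $missing
    is hypothetical and is assumed not to exist.

```perl
use strict;
use warnings;

my $missing = "/no/such/dir/logfile";   # hypothetical path, assumed not to exist
my ($errno, $errstr);

my $ok = eval {
    unless (open(my $log, ">>", $missing)) {
        ($errno, $errstr) = ($! + 0, "$!");   # numeric context, then string context
        die "Cannot open $missing: $errstr\n";
    }
    1;
};
my $error = $@;   # the message passed to die, if the block died

print "open failed with errno $errno\n" unless $ok;
print "die said: $error"                unless $ok;
```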

  • Regular Expression Precedence :

    Like awk and sed, Perl supports pattern matching with regular expressions, and its regular expressions are more powerful.

    Metacharacters like *, + and ? are called quantifiers. They match the preceding item zero or more times,
    one or more times, and zero or one time respectively.

    {} can also be used to match an item a specified number of times, e.g. a{2,4} matches 2 to 4 occurrences of a.


    Precedence : Parentheses, for grouping and memorizing the selected pattern, always come first, followed by the quantifiers *, +, ? and { }.


    \s identifies a whitespace character
    \d identifies [0-9]
    \D identifies non-digits
    \b matches at a word boundary (the start or end of a word), much as ^ and $ anchor the start and end of a line
    \B is for non-boundary matching
    . (dot) identifies any single character except newline.

    Note : To match any character including newline, we can use the character class [\d\D].
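
    A few of these shortcuts in action, as a sketch with made-up test strings:

```perl
use strict;
use warnings;

my $text = "line one\nline 22";

my ($digits)    = $text =~ /(\d+)/;                      # first run of digits
my $dot_spans   = ($text =~ /one.line/)       ? 1 : 0;   # dot stops at the newline
my $class_spans = ($text =~ /one[\d\D]line/)  ? 1 : 0;   # [\d\D] matches the newline too
my $whole_word  = ("a cat sat" =~ /\bcat\b/)   ? 1 : 0;  # cat as a separate word
my $inside_word = ("concatenate" =~ /\bcat\b/) ? 1 : 0;  # cat inside a longer word

print "digits=$digits dot=$dot_spans class=$class_spans ",
      "word=$whole_word inside=$inside_word\n";
```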

  • Alternatively we can use modifiers for case insensitive matching and newline matching.

    Some examples of modifiers used inside if or while conditions :

    a)/i - case insensitive matching.

    E.g:
    $_ = "say YES" ;
    if(/yes/i) {

    print "Accepted with YES or yes" ;
    }

    b)/s - lets . (dot) match newlines in the string.
    E.g:
    $_ = "I am sriram\n working in bangalore\nI stay near forum";
    if(/sriram.*bangalore/s) {
    print "sriram is from bangalore" ;
    }

    c)/x - allows user to add the white-space to a pattern to make it easily readable.

    E.g:

    /-?\d+\.?\d*/ - it matches an optional minus followed by one or more digits followed by optional decimal point followed
    by 0 or more occurrences of digits. (like -55.678 or 555)

    perl -e '
    $digit="this contains -567.890 or 5555 " ;
    if ($digit =~ /-?\d+\.?\d*/) {
    print " found digits"
    }'
    prints o/p : found digits

    the above pattern can be written to make it more readable like this:


    / -? \d+ \.? \d* /x or we can write each part on a separate line between / and /x

    d) Another important thing to memorize is the usage of perl's special variables that hold the matched text, and the text
    before and after it.

    They are $`, $& and $' :

    $` is the text before the match, $& is exactly the matched text, $' is the text after the match.

    These are useful for checking whether the pattern matched what we expected.
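
    A sketch that reuses the number pattern from (c) with /x and copies the three match
    variables into ordinary variables. The names $before, $match and $after are illustrative.

```perl
use strict;
use warnings;

my $text = "this contains -567.890 or 5555";
my ($before, $match, $after);

if ($text =~ / -? \d+ \.? \d* /x) {
    # Copy the match variables straight away; the next match overwrites them.
    ($before, $match, $after) = ($`, $&, $');
}

print "before: [$before]\n";
print "match:  [$match]\n";
print "after:  [$after]\n";
```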
  • sort function :

    # sort lexically
    @articles = sort @files;

    # same thing, but with explicit sort routine
    @articles = sort {$a cmp $b} @files;

    # now case-insensitively
    @articles = sort {uc($a) cmp uc($b)} @files;

    # same thing in reversed order
    @articles = sort {$b cmp $a} @files;

    # sort numerically ascending
    @articles = sort {$a <=> $b} @files;

    # sort numerically descending
    @articles = sort {$b <=> $a} @files;

    # this sorts the %age hash by value instead of key
    # using an in-line function
    @eldest = sort { $age{$b} <=> $age{$a} } keys %age;

    # sort using explicit subroutine name
    sub byage {
    $age{$a} <=> $age{$b}; # presuming numeric
    }
    @sortedclass = sort byage @class;
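
    A few of these sort routines applied to small sample data, as a sketch. The hash
    contents and lists are made up for the example.

```perl
use strict;
use warnings;

my %age = (ramu => 30, tinki => 25, sekhar => 41);

my @eldest_first = sort { $age{$b} <=> $age{$a} } keys %age;       # by value, descending
my @ignore_case  = sort { uc($a) cmp uc($b) } qw(banana Apple cherry);
my @numeric      = sort { $a <=> $b } (10, 9, 100);                # compares as numbers
my @lexical      = sort (10, 9, 100);                              # compares as strings

print "eldest first: @eldest_first\n";
print "ignore case:  @ignore_case\n";
print "numeric:      @numeric\n";
print "lexical:      @lexical\n";
```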


