ITNW 2310 - PERL
Allan Kochis, Adjunct Professor - CIT

Regular Expressions


  1. Matching
    We have been writing / /, which is just shorthand for m/ /, the match operator. When we omit the variable, Perl matches against $_, but we can test any variable for a match (=~) or for no match (!~).
    Perl script and its output:
    #!/usr/bin/perl -w
    
    $string="one two";
    
    if($string =~ /one/) {
         print "Matched\n";
    }
    
    if($string !~ /three/) {
      print "True I did not find three\n";
    }
    

    Matched
    True I did not find three
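
The script above always names the variable it tests. As a minimal sketch of the other case, where the variable is omitted and Perl matches against $_, consider the loop below. The names @words and @hits are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = ("one", "two", "three");
my @hits;

# for with no loop variable puts each element in $_ in turn.
for (@words) {
    # No =~ here, so /o/ is tested against $_.
    push @hits, $_ if /o/;
}

print "@hits\n";
```

Only "one" and "two" contain the letter o, so those are the two elements collected in @hits.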
    

  2. Memory and backreference variables.
    Whenever a regex contains parentheses ( ), Perl saves the text matched by each group in a memory variable: the first group in $1, the second in $2, and so on.
    Perl script and its output:
    #!/usr/bin/perl -w
    
    $string="one two";
    
    $_=$string;
    
    m/(\w+)\W+(\w+)/;
    
    print $1."\n";
    print $2."\n";
    
    

    one
    two
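
One thing to keep in mind is that $1 and $2 are only set by a successful match, so it is safer to test the match before using them. The sketch below shows that check, plus the alternative of assigning the groups straight to named variables in list context. The names $first and $second are chosen for this illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "one two";

# Test the match first; $1 and $2 are only meaningful if it succeeded.
if ($string =~ /(\w+)\W+(\w+)/) {
    print "first=$1 second=$2\n";
}

# In list context a successful match returns the captured groups,
# so they can be assigned directly to named variables.
my ($first, $second) = $string =~ /(\w+)\W+(\w+)/;
print "first=$first second=$second\n";
```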
    

  3. Multiple matches
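    To get more than the first match, use the g flag. In list context the match then returns a list of all the matches (here, everything captured by the parentheses), not just the first one.
    Perl script and its output: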
    #!/usr/bin/perl -w
    
    $string="one two three four";
    
    @a=($string =~ m/(\w+)/g);
    
    for $item (@a) {
       print $item."\n";
    }
    
    

    one
    two
    three
    four
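
The g flag can also be used in scalar context, for example as the condition of a while loop. Each pass then finds the next match, picking up where the previous one stopped. A minimal sketch, where @words is a name introduced for this illustration and pos() reports the offset at which the last match ended:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "one two three four";
my @words;

# In scalar context each /g match continues from the end of the last one.
while ($string =~ /(\w+)/g) {
    push @words, $1;
    print "$1 ends at offset ", pos($string), "\n";
}
```

This form is handy when something has to be done with each match as it is found, rather than after collecting them all into an array.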
    

  4. The s and m flags.
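    s treats the string as a single line: . will also match a \n.
    m treats the string as multiple lines: ^ and $ match at the start and end of every line, not just of the whole string. It does not change ., which still does not match \n.
    Perl script and its output: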
    #!/usr/bin/perl -w
    
    @a=(<<"FINIS" =~ m/(.+)/g);
    Hello
    my
    name
    is
    mud
    FINIS
    
    print join(" ",@a)."\n";
    
    @a=(<<"EOD"=~m/([\w'?]+)/gs);
    What's
    up
    Doc?
    EOD
    
    
    print join(" ",@a)."\n";
    

    Hello my name is mud
    What's up Doc?
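
To see each flag on its own, the sketch below runs the same two-line string through patterns with and without s and m. The variable names are invented for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# No flags: . stops at the newline, so only the first line is captured.
my ($plain) = $text =~ /(.+)/;

# s flag: . also matches \n, so the capture runs to the end of the string.
my ($dotall) = $text =~ /(.+)/s;

# Without m, ^ matches only at the very start of the string.
my @no_m = $text =~ /^(\w+)/g;

# m flag: ^ matches at the start of every line.
my @with_m = $text =~ /^(\w+)/mg;

print "plain:  $plain\n";
print "no m:   @no_m\n";
print "with m: @with_m\n";
```

Here $plain holds only "first line", $dotall holds the whole string including both newlines, @no_m holds just "first", and @with_m holds "first" and "second".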
    

  5. i and x modifiers
    i makes the match case insensitive.
    x allows whitespace and comments inside the pattern, so it can be spread over multiple lines; see pages 171-172.
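
As a small sketch of both modifiers, the script below first matches without regard to case, then uses x to lay a pattern out over several commented lines. The sample string and the names $name and $version are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "Server: APACHE/2.4";

# i: the lowercase pattern still matches the uppercase text.
my $found = ($line =~ /apache/i) ? 1 : 0;
print "case-insensitive match\n" if $found;

# x: whitespace in the pattern is ignored and # starts a comment.
my ($name, $version) = $line =~ m{
    (\w+)       # product name
    /           # a literal slash
    ([\d.]+)    # version number: digits and dots
}x;

print "name=$name version=$version\n";
```

Because x ignores whitespace in the pattern, a literal space has to be written as \s or escaped with a backslash when it is meant to match.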

  6. Substitution.

    s/find/replace/;
    Perl script and its output:
    #!/usr/bin/perl -w
    
    
    $a="The rain in Spain, stays mainly on the plain.";
    
    $a=~s/ain/out/;
    print $a."\n";
    
    
    $a="The rain in Spain, stays mainly on the plain.";
    $a=~s/ain/out/g;
    print $a."\n";
    

    The rout in Spain, stays mainly on the plain.
    The rout in Spout, stays moutly on the plout.
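
The substitution operator also returns a value: the number of replacements it made, which is false when nothing matched. A minimal sketch, using $text instead of $a and a hypothetical $copy to show how to leave the original string untouched:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "The rain in Spain, stays mainly on the plain.";

# With the g flag, s returns how many replacements were made.
my $count = ($text =~ s/ain/out/g);
print "$count replacements: $text\n";

# To keep the original, copy it first and substitute in the copy.
my $original = "rain";
(my $copy = $original) =~ s/ain/out/;
print "$original -> $copy\n";
```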
    

  7. Substitute with memory
    Perl script and its output:
    #!/usr/bin/perl -w
    #
    #
    print "----------------------------------------------\n";
    print `cat ip.data`;
    print "----------------------------------------------\n";
    open(IN,"<ip.data") || die "could not open data file\n";
    
    while(<IN>) {
       chomp;
       next if !(/ A /);
       @fields=split;
       $_=$fields[2];
       s/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/$4.$3.$2.$1/;
       print $_."\t\tIN\tPTR\t".$fields[0].".mydomain.com.\n";
    }
    
    close(IN);
    

    ----------------------------------------------
    @       IN      SOA     ns.mydomain.com. hostmaster.mydomain.com. (
                            1999021004      ; serial, todays date + todays serial #
                            8H              ; refresh, seconds
                            2H              ; retry, seconds
                            1W              ; expire, seconds
                            1D )            ; minimum, seconds
    
                    NS      ns              ; Inet Address of name server
                    MX      10 mail         ; Primary Mail Exchanger
    localhost       A       127.0.0.1
    ns              A       192.168.102.1
    linuxhost       A       192.168.102.2
    mail            A       192.168.102.3
    www             A       192.168.102.4 
    ----------------------------------------------
    1.0.0.127		IN	PTR	localhost.mydomain.com.
    1.102.168.192		IN	PTR	ns.mydomain.com.
    2.102.168.192		IN	PTR	linuxhost.mydomain.com.
    3.102.168.192		IN	PTR	mail.mydomain.com.
    4.102.168.192		IN	PTR	www.mydomain.com.
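
The script above needs the file ip.data and the cat command. As a self-contained sketch of the same idea, the version below keeps two sample records in an array (@zone and @ptr are names made up here) and reverses the address with the same four memory groups. It is only an illustration, not a replacement for the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two sample A records, in the same layout as ip.data, plus one other line.
my @zone = (
    "                NS      ns",
    "ns              A       192.168.102.1",
    "mail            A       192.168.102.3",
);

my @ptr;
for my $line (@zone) {
    my ($host, $type, $ip) = split ' ', $line;

    # Skip anything that is not an A record.
    next unless defined $ip && $type eq 'A';

    # Each (\d{1,3}) remembers one octet; the replacement reverses them.
    (my $rev = $ip) =~ s/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/$4.$3.$2.$1/;

    push @ptr, "$rev\tIN\tPTR\t$host.mydomain.com.";
}

print "$_\n" for @ptr;
```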
    

  8. Delimiters
    Like q and qq, m and s can use any delimiter.
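
A different delimiter pays off mostly when the pattern itself contains slashes. The sketch below matches and rewrites a made-up path three ways; the variable names are assumptions for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# With / as the delimiter, every slash in the pattern must be escaped.
my $escaped = ($path =~ /^\/usr\/local\//) ? 1 : 0;

# With braces as delimiters the slashes need no escaping.
my $braces = ($path =~ m{^/usr/local/}) ? 1 : 0;

# s accepts other delimiters too, here the # character.
(my $moved = $path) =~ s#/usr/local#/opt#;

print "escaped=$escaped braces=$braces moved=$moved\n";
```

Note that the leading m may only be dropped when the delimiter is /, so m{ } needs the m spelled out.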

  9. Lazy vs Greedy evaluation.
    Lazy    Greedy
    *?      *
    +?      +

    Perl script and its output:
    #!/usr/bin/perl -w
    
    $a=q/Test text "one" and "two" and "three"/;
    print "Text   : ".$a."\n";
    
    print "Greedy : ";
    
    $a=~m/(".*")/;
    
    print $1."\n";
    
    print "Lazy   : ";
    
    $a=~m/(".*?")/;
    
    print $1."\n";
    
    
    

    Text   : Test text "one" and "two" and "three"
    Greedy : "one" and "two" and "three"
    Lazy   : "one"
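
Combining the lazy quantifier with the g flag picks out every quoted word rather than just the first. The sketch below shows that. For comparison it also shows a different technique that is not part of the notes above: a negated character class, [^"]*, which cannot run past the closing quote even though it is greedy. The array names are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = q/Test text "one" and "two" and "three"/;

# Lazy .*? stops at the nearest closing quote; g repeats for each pair.
my @lazy = $text =~ /"(.*?)"/g;

# Alternative: a greedy match that simply cannot include a quote.
my @negated = $text =~ /"([^"]*)"/g;

print "Lazy    : @lazy\n";
print "Negated : @negated\n";
```

Both patterns capture one, two and three from this string.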
    

© Allan Kochis Last revision 10/10/2005