Perl
Perl 101
Perl is an interpreted language, but exist tool to turn them into binary.
Perl script can be setuid!!
The first column is $_[0], next one is $_[1], the whole line is $_ (or @_)
Contrast to Awk first column is $1, second is $2, and $0 is the whole input line.
reference = pointer.
@array isn an array, like: [,,] ($array[0], $array[1], etc)
\@array address of array [,,] (a pointer, stored as 32-bit integer, ie scalar)
@$array $array is a pointer, take that as address, and there is the array.
Ie decypher \@array
when declaring ref, use my $array.
so, that should be what is returned in a fn call.
Finding array size:
my @arr = (2);
print scalar @arr; # first way to print array size
print $#arr; # second way to print array size
my $arrSize = @arr;
print $arrSize; # third way to print array size
Tracing script
There are a couple of ways to trace a perl script, kinda similar to how bash -x ./myscript.sh works.
See
this stackoverflow post for discussion, and
perldoc for all the gory details details.
Method 1, use one of:
PERLDB_OPTS="NonStop AutoTrace" perl -d ./myscript.pl
PERLDB_OPTS="NonStop AutoTrace LineInfo=trace.txt" perl -d ./myscript.pl | tee stdout.txt
PERLDB_OPTS="NonStop AutoTrace frame=1 LineInfo=trace.txt" perl -d ./myscript.pl | tee stdout.txt
I am not sure where the tracer write its output, trying to capture stream 2 with 2>&1 doesnt work.
The frame=1 clause add lots more output!
Method 2:
cpan install "Devel::Trace"
perl -d:Trace ./myscript.pl
Perl Variables
$scalar = 1;
@array = ( 'a', 'b' );
%hash = ( 'key1' => 'value1', 'key2' => 'value2' );
FILEHANDLE
Objects and Class
It is essentially a hack on Packages, but bless()ed into a class.
Default Package variables are class-wide variables.
Instance/method variables need to be inside data structure that has been
bless()ed. But at time may need to check whether method was called
as class subroutine or as object's method!!
First param for a method call, $_[0], is a ref to the object (or its class), it is done automatically by Perl (using unshift or something).
Be sure to "use strict", or Perl may think of some closure need to applied
to specific variables declared in methods and those may become like
class-wide data, but in fact "use strict" would mean they should be scoped correctly and destroyed when method is completed.
constructor can be named anything, new() is just the most popular name.
Destructor is called automagically and must be named DESTROY.
ref: perltoot, perltooc, perlref
specifying libraries
perl -I/path/to/my/lib
export PERL5LIB=$PERL5LIB:/path/to/my/lib
export PERL5LIB=$PERL5LIB:/usr/prog/perl/current/lib/5.10.1:/usr/prog/perl/current/lib/site_perl/5.10.1/ # for CPAN installed lib, add path to the point where all files/dirs are added by CPAN.
inside perl program: use lib "/path/to/my/lib"
Note that :: is used as directory separator from the "library root dir". eg
use Bio::Perl::foo;
should find something in $PERL5LIB/Bio/Perl/foo.pm
Checking @INC for PERL5LIB
checking PERL5LIB @INC
env -i perl -V # ignores the PERL5LIB env var
env perl -V
both should return the same output, but if root's env got inherited, clear it with something like export PERL5LIB=''
One-Liner PERL in shell as AWK replacement
command line options:
-e = inline input for program file
-n = do not print/display input line
-p = print input line
-a = split array automatically (input line into @F field)
-F: = Use : as field separator (like AWK). $F[0] is first column , like python. unlike awk
-l = add \n at the end of each array element... may or may not be what is needed!!
ls -l | perl -nle 'print $_;' # plain pass thru
ls -l | perl -lne 'if ($_ =~ pattern) {print $_}' # like grep pattern
ls -l | perl -lane 'print @F' # split input into @F "field" array
ypcat passwd | perl -F: -lane 'print "$F[0] $F[2]";' # print 1st and 3rd column only (ie username and uid)
ls -l | perl -lane 'if ($F[2] !~ $F[8]) { print $_ };' # find out dir not owned by the user (eg run in /nfshome)
# $F[2] is owner column, $F[8] is last column with file/dir name
ls -l | perl -lane 'if ($F[2] !~ $F[8]) { print "chown $F[8] $F[8]"};' > fix.sh
# edit fix.sh and run to fix ownership
ls -l | perl -lane 'if ( $F[4] == 0 ) { print $_ };' # list file of size 0 (perl array is 0-based indexing)
ls -l | perl -lane 'if ( $F[4] != 0 ) { print $_ };' # list file of size not 0
qstat -u \* | perl -lane 'print $_ if F[4] eq "qw"' # print whole line when 5th column has word qw (UGE queued/waiting state column)
cat /etc/passwd | perl -F: -lane 'print "$F[6] $F[0]"'
cat /etc/passwd | perl -F: -nle 'chomp; @F = split( /:/, $_ ); print "$F[6] $F[0]";'
# above 2 lines should be equivalent,
# but sometime newline is attached to last column and automatic
# array split can't chop it off, and so need
# to do it manually
echo "az" | perl -nle 'print ++$_' # increment an alpha string, eg return "ba"
perl -e 'use Bio::Perl;' # test whether a given package is installed
perl -e 'use DBI;'
perl -spi 's/ftp/FTP-disabled/' /etc/passwd # inline substitution in a file, no (manual) tmp file creation needed.
# same as: sed -i 's/ftp/FTP-disabled/' /etc/passwd
# print the directory stack one line at a time with a number prefix usable with pushd +N
# have not figure out a way to put this as an alias, thus, place in script called Dirs
dirs | perl -lane 'for( $N=0; $N < @F; $N++ ) {print "$N: $F[$N]"}'
Bunch of baby things...
RegExp
\d digit: ie [0-9]
\w word: ie [A-Za-z0-9_]
\W non-word
\s space
\S non-space
([\w]+)\s([\w]+) beomce $1 $2 for two words separated by white space
if( $string =~ "patter" ) {
# regexp pattern is a substring in $string
}
String comparison operators:
ne not equal
eq equal
lt less than (how?)
numeric comparison operators:
== numeric equal
!= not equal, ie different
<= numeric less than
= single equal sign is for assignment
var = value
Loops
OUTER: while( <> ) {
INNER: while( ... ) {
if( $string1 eq $string2 ) {
next; # break INNER loop
} else {
last OUTER; #
}
}
# statements that will continue at end of INNER
#
stm1;
stm2;
# last OUTER will reach here
# and won't run stm1, stm2 above.
}
# LABEL: (those var ending with :) are optional in simple loop.
# next and last will jump to end of such loop automatically if no label are present.
# there is also redo.
# if-statement does not count as loop block.
# break is used to jump out of if-else block?
# ref: Learning Perl pp 104
Snipplets in stand alone program
# generating unix password crypt
my $en = crypt( "FixMyPw!", "pw");
print "$en\n";
# eg of command line argument
# 0 is not the command itself, but the first argument
print "arg 0: $ARGV[0] \n";
print "arg 1: $ARGV[1] \n";
# ls -l but only for file of non zero size
my @lsOut = qx "ls -l" ;
foreach my $line ( @lsOut ) {
if( $line =~ /^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)\s+/ ) {
if( $5 != 0 ) {
print $line;
}
}
}
Sample Code Useful in Bioinformatics
# Input file: 34 column .maf file (tab delimited)
# Output file: 9 column tab delimited
# eg for use with Genome MuSiC pfam
cat 34col.maf | perl -F\\\t -a -ne 'for ( $c = 0; $c < 8; $c++ ) {print $F[$c] . "\t"} ; print $F[$c] . "\n"' > 9col.tsv
# add column 10 thru column 34 with dummy "0"
cat 9col.tsv | perl -F\\\t -ne 'chomp; print $_; for ( $c = 10; $c <= 34; $c++ ) {print "\t0"}; print "\n"' > 34col.tsv
# print a comment line at the top of the file that contains column number
# and header lines from the original source maf file
(perl -e 'print "#" ; for ( $c = 1; $c <= 34; $c++) { print $c . "\t" }; print "\n"' ; \
head -2 source.maf ; cat 34col.tsv) > 34col+header.maf
# print column 1 of a tab delimted file
# perl column numbering starts with 0
# python column numbering starts with 0
# awk column numbering starts with 1
# cut column numbering starts with 1
cat file.tsv | perl -F\\\t -lane ' print "$F[0]"; '
cat file.tsv | awk -F\\\t '{print $1} '
cat /etc/passwd | cut -d: -f3 # display UID field
## in vim, use :set tabstop=20 and :set nowrap to get better view of tab-delimited file
sample code in ~/dp/code/...
POD/Perldoc
Plain Old Doc can be embeded inside perl code itself, and perldoc perl.html (this file)
will produce man page like documentation.
pod commands start with = , and most often, need to have blank line before
and after it!
eg:
=head1 Description
Plain text can now go in here, and it will be wrapped.
New line isn't really headed, and formatter will
join them and wrap at end of screen line.
=cut
=pod
= pod doesn't do much, other than it is text to be placed in pod doc.
when space/tab is prefixed in
the begining of the line
then perldoc will treat it like quoted text
and format them verbatim
lastly, for html OL item list stuff, need to have a block delimited as
= over
= back
and add entries with
= item itemName
=cut
=over
=item ItemName
what ever describe an item, these will be indented
=item *
instead of name, * should produce bullet, but seems to be formated as ? :(
in real life, don't mix name with bullet was a best practice...
=back
=cut
# the =cut is needed to indicate the end of pod in the =over/=back block
perldoc -t perllocal # list set of installed CPAN modules
ref:
perldoc.perl.org
5 min pod
quick perl ref
for( my $i; $i < 10; $i++ ) { ... }
foreach my $entry ( @list ) { ... }
# specify where perl will search for the libraries, @INC define the search path.
BEGIN {
push(@INC, '/zambeel/lib/perl/site_perl');
push(@INC, '/opt/zambeel/lib');
}
open( FH-IN, "< /path/to/inputfile" );
open( FH-OUT, "> /path/to/outputfile" );
while( ) { print (FH-OUT $_); }
close( FH-IN );
Note that it seems that while the file handle is open, a self-recursive call
will mangle up the FH internal db, thereby closing it when inner code returns.
One solution is read file into memory bugger and close FH before making recursive call.
calling main will all input args:
main($#ARGV+1, @ARGV);
see email ref folder for more perl stuff
---
# Perl module creation
# ch 11.2 of Programming Perl
package Bestiary; # store this in file Bestiary.pm
require Exporter; # copy these two lines verbatim
our @ISA = ("Exporter"); # work as inheritance from exporter class.
# then, have these:
our @EXPORT = qw($camel %wolf ram); # Export by default
our @EXPORT_OK = qw(leopard @llama $emu); # Export by request
our %EXPORT_TAGS = ( # Export as group
camelids => [qw($camel @llama)],
critters => [qw(ram $camel %wolf)],
);
# so, in a program that "use Bestiary",
# by default, it can access ram instead of having to do Restiary::ram
# The export as group need to be imported as "use Bestiary qw( :camelids )"
# colon is needed for tags, not for the export by request list.
# for variables declared in the module w/o my, they can be accessed as
# $Bestiary::emu (funny symbol before package name).
# running cmd in shell.
$| = 1; # avoid buffering
$exitCode = system( 'ls -l' );
$exitCode /= 256;
# Manish said that system use lot of buffering, and can something get confused and stuck if the are closed output... use $! = 1 to get around the problem.
###### styles ########
for sizable script, use a file to define all the commands, scripts, etc that will be called, eg defineCmd.pm.
This way, the defineCmd.pm can be smart enough to detect platform changes of where commands are, or the need to move script to different dir.
if( os == rh )
ping = ...
ifconfig = ...
elsif( os = sunos )
pint = ...
ifconfig = ...
if sys admin, think about setting a dir where all executable are located, independent of platform, but this will result to non-portable code.
############### net stuff #################
C:\refMaterial.YaEnCasa\ref.from.cd\oreileyPerlCDbookShelf\perlnut\c13_001.htm
inet_ntoa
inet_aton
convert from/to dotted ip (and english name) to 32bit number for low level manipulation of the strings.
inet_aton() will convert single integer number (eg 19) and treat that as last octet and create IP (32 bit bin) equivalent to 0.0.0.19, so that two 32bit number can be ORed together (via |)
Numberic addition will not work :(
---
Big endian = MSB = most significant bit first = network order =
ie, normal human looking pattern order for 32bit num, right most bit = 1, left most bit is most significant, highest value.
C:\refMaterial.YaEnCasa\ref.from.cd\oreileyPerlCDbookShelf\cookbook\ch02_05.htm
perl
N format = network order, (those produced by inet_aton).
B32 = bit by bit (the "normal" perl scalar data?)
CPANMINUS
a newer approach to cpan lib install. User will install this module in their own dir.
When installed to say NFS mounted home dir, setting perl5lib to include such dir, then the cpan mod will be avail to the user on any machine with NFS access to the dir
curl -L http://cpanmin.us | perl - App::cpanminus
cpanm -l ~/perl5.20 DBD
CPAN
Sample session
cpan # rest of commands are inside cpan shell
h # display help
o conf # display config
o conf ftp_proxy # display config of given scalar (CPAN config)
o conf ftp_proxy myproxy:2010 # change settings (permanent)
m DBD::mysql # search for (m)odule, display info
i /DBD::mysql/ # search for (m)odule, (b)undle, etc, REGEXP match
install DBD::mysql # do everyone in one command
get DBD::mysql # the command chain to download, compile, test, install
make DBD::mysql
test DBD::mysql
install DBD::mysql
look DBD::mysql # create a shell in the dir where modules is stored.
clean DBD::mysql # in case need to remove and restart
readme DBD::mysql # get basic info about the module
perldoc DBD::mysql # perldoc inline doc of module/bundle...
# note: DBD::mysql requires mysql-devel rpm, and the user running cpan install
# to be granted priv to drop tables etc in the TEST db
# the mysql db server need to be running on the local machine
# mysql -u root -e "grant all privileges on test.* to 'sysadm'@'localhost'"
# force install DBD:mysql may be able to install without test passing...
more details...
Download CPAN modules and install automatically as root to local system:
Note that which perl is invoked is important.
sun stock perl used cc for compile, prefer to use sunfreeware.com that install to /usr/local, and use gcc.
Also, remember that module names are case sensitive.
If there are more than one compiler,
def env var may help:
CC=/usr/local/bin/gcc (does it really use a CC compiler??)
MAKE= ... (is it $MAKE?)
Best to config CPAN as root and define the setting of where all the programs are located.
perl -MCPAN -e shell # abreviated as cpan in perl5
o conf prerequisites_policy # learn about the config
o conf prerequisites_policy follow # set to automatically get dependencies (def?)
o conf prerequisites_policy ask # change to prompt before adding prereq
install Mail::SpamAssasin (spell?)
install Digest::Nilsima
install URI::Escape
quit
or in non-interactive mode:
perl -MCPAN -e 'install Chocolate::Belgian'
perl -MCPAN -e 'install Net::DNS'
Files are stored in a variety of places :(
@INC is probably set differently, and multiple version can exist :(
/usr/local/lib/perl5/site_perl/5.8.0...
find /usr/local/lib/perl5/ -name '*SHA1*'
/usr/local/lib/perl5/site_perl/5.8.0/sun4-solaris/auto/Digest/SHA1
/usr/local/lib/perl5/site_perl/5.8.0/sun4-solaris/auto/Digest/SHA1/SHA1.so
/usr/local/lib/perl5/site_perl/5.8.0/sun4-solaris/auto/Digest/SHA1/SHA1.bs
/usr/local/lib/perl5/site_perl/5.8.0/sun4-solaris/Digest/SHA1.pm
/usr/local/lib/perl5/site_perl/5.8.0/Digest/HMAC_SHA1.pm
/usr/local/lib/perl5/site_perl/5.8.0/Digest/SHA1.pm
/usr/local/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/SHA1.pm
One can set PERL5LIB to desired root location, but this may still happen I think.
--
Clearing CPAN config setup (eg, to change proxy settings, compiler locations, etc)
Need to remove (or edit) Config.pm, at:
SMCperl that install to /usr/local: /usr/local/lib/perl5/5.8.0/CPAN/Config.pm
Troubleshooting CPAN install
Uninstalling/Reinstalling a module
There is no easy way to uninstall a module. CPAN is not a package manager :(
But most package have a .package list. It list all the files and the path they were installed.
Can use this to remove the files.
After this step, cpan can be run and installing the module should start from scratch again.
(cpan should not complain that the module is already installed, given it find the files are not there. partial file delete should work, but if trying to make backup, be careful to copy/move all files).
eg. cpan install of DBD::Pg left package list at
/usr/prog/perl/5.16.3_gcc/lib/site_perl/5.16.3/x86_64-linux/auto/DBD/Pg/.packlist
Manual install of modules
If a CPAN module won't install, go to the shell and
cd to the dir where the build location is, and try to run "make" manually, eg
cd /home/tin/.cpan/build/Inline-Java-0.53-Fyk_8W
perl Makefile.PL
Perl Debugger
perl debugger.
invoked by calling perl with -d option:
perl -d
perl debugger is really just perl running with addional support (from DB module to store state info).
It is not totally seprated from the program as traditional debugger.
To execute any perl code w/in a debugging program, just enter the perl code.
Anything that is not understood by debugger is considered perl code and executed via eval,
in the context at the debugging point of the program.
Debugger uses csh-like cmd edit.
! num, ! cmd, etc to excute cmd, H for history
debugger cmd (commoner stuff):
t : toggle trace mode (on emulate sh -x, print all code as executed)
R : reload/restart the debugger
b fn-name : set breakpoint at fn
b LINE : set breakpoint at line num (breakable lines has colon after num)
b sub-name : set breakpoint at a subroutine name
b load FILE : set breakpint when FILE is first loaded (by use, require?)
d LINE : remove breakpoint at line
D : remove all breakpoint
L : list all breakpoint
c : continue
s : step (will enter into subroutine)
n : next (execute subroutine completely and return, ie "step over")
: previous n or s cmd is repeated
r : return from current running subroutine
T : backTrace of fn call seq
S pattern : list all subroutines (with optional pattern for name)
w lineNum : look at "window" of source code around break point (or around lineNum)
w w : display more lines than single w by itself.
- : print previous few lines
l LINE|SUB : display Lines of code around LINE, SUB, (etc).
/PATTERN : search fwd in prog for PATTERN (think of vi search)
p $varname : print value of $varname.
f : something about switching file (to be read, set breakpoint?)
o : set lot of option stuff about the dbg
h : help. Use h h for concise usage.
h CMD : help on the given command.
H : cmd line History
use pipe preceding cmd will run cmd thru the pager. eg |h to page thru help
Inlined debugger command inside perl programs. If debugger isn't running, these command will be harmless.
$DB::single = 1 : s = stop program, return command to debugger.
$DB::single = 2 : n
$DB::trace = 1 : t = trace on ?
hoti1
bofh1