Perl for Bioinformatics
- What is Perl?
- Perl, a scripting language, stands for Practical Extraction and Report Language. It is also sometimes referred to as Pathologically Eclectic Rubbish Lister. It was created in 1987 by Larry Wall, the chief architect, maintainer, and implementor of Perl. It is a popular programming language and is used extensively in bioinformatics. The language has become popular with biologists because several bioinformatics tasks can be accomplished in it with relative ease. It is a portable (it runs on all major platforms), powerful, and flexible language. Moreover, it is not difficult to learn!
- Getting Perl
- Many computers come with Perl already installed, especially Unix and Linux computers. To check whether Perl is already installed (on Unix and Linux systems), type the following at a command prompt:
$ perl -v
The following message will be displayed if Perl is already installed:
This is perl, v5.8.8 built...
Copyright 1987-2006, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found ...
Complete documentation for perl, including FAQ lists, should be found...
(This is not the complete message; it is shown only to give you an idea of what is displayed if Perl is already installed.)
The following message will be displayed if Perl isn't installed:
perl: command not found
On a Windows system, type
perl -v
in an MS-DOS command window to check whether Perl is already installed. If it is, the version message shown above will be displayed; if it is not, Windows will report that the command is not recognized.
A standard Perl distribution, particularly for Windows, is ActivePerl from ActiveState. To get it, click on the following link:
http://www.activestate.com/Products/ActivePerl/
Look for the "Get Active Perl" button on the page that opens, click it, and then click the "Download" button. You will be asked to give your contact details; this step is optional and can be skipped. Press the "Continue" button to proceed. You will be taken to the page where you can download ActivePerl 5.8.8.820 for the platform you are using (Windows, Linux, Solaris, Mac OS X).
After you have downloaded the package, it is time to install and run it on your computer.
- Installing Perl
- Now install the package you have downloaded. On Windows, double-click the Perl installer ActivePerl-5.8.8.820-MSWin32-x86-274739; the installation will take some time. After it is complete, check whether the installation was successful. Type
perl -v (on a Windows system)
in an MS-DOS command window. If the installation was successful, you should see
This is perl, v5.8.8 built...
Copyright 1987-2006, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found ...
Complete documentation for perl, including FAQ lists, should be found...
- Running Perl
- A Perl script can be written in any standard text editor. The text file is then saved with the .pl extension (e.g. my_perl_script.pl). Now open the command prompt or MS-DOS prompt window, and type the path where you have stored the file followed by the Perl file name (e.g. c:\windows\desktop\my_perl_script.pl). If the .pl extension is not associated with Perl on your system, type perl before the path (e.g. perl c:\windows\desktop\my_perl_script.pl). Press Enter! You should get the desired output.
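As a quick check that everything is set up, the steps above can be tried with a very small script. This is only a minimal sketch: the file name my_perl_script.pl is taken from the example above, while the message text and the helper name greeting are made up for illustration.

```perl
#!/usr/bin/perl -w
# my_perl_script.pl: a minimal test script (illustrative only)

# greeting is a hypothetical helper that returns the text to print
sub greeting {
    return "Perl is working!\n";
}

print greeting();
```

Save the file and run it by typing perl my_perl_script.pl at the command prompt, in the directory where the file is stored.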
- Perl Scripts
- The information provided above is just the tip of the iceberg, but I think it should suffice as a beginning.
Perl scripts for bioinformatics now follow. Most of the scripts given below were written by the site author; a reference will be provided along with each script that was not. Each script is commented so that users can understand what goes on "behind the scenes". You can copy and use these scripts.
- 1. A program to write complementary DNA sequence
#!/usr/bin/perl -w
#write complementary DNA strand
#let's store a DNA sequence in variable $DNA
#a variable in Perl with a $ sign is a scalar variable
$DNA="5'ACGTCGTC 3'";
print "Here is a DNA sequence: \n\n";
print "$DNA\n\n";
#"\n" or "\n\n" refers to newline
#copy the DNA sequence into another variable, $DNACOMP
$DNACOMP=$DNA;
$DNACOMP =~ tr/ACGT5'3'/TGCA3'5'/;
# =~ is the binding operator and tr is the transliteration operator;
#it replaces each character in the first list with the character
#at the same position in the second list
print "Here is the complementary sequence of the above:\n\n";
print "$DNACOMP\n";
exit;
The output of the above script would be:
Here is a DNA sequence:
5'ACGTCGTC 3'
Here is the complementary sequence of the above:
3'TGCAGCAG 5'
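The same idea can be wrapped in reusable subroutines. The sketch below is not part of the original script: the names complement and reverse_complement are hypothetical, and it assumes a bare uppercase DNA sequence without the 5' and 3' labels. Because a complementary strand is conventionally also written 5' to 3', the second subroutine reverses the complemented string with Perl's built-in reverse.

```perl
#!/usr/bin/perl -w
# Sketch: complement and reverse complement of a bare DNA sequence.
# The subroutine names are illustrative, not from the original script.

sub complement {
    my ($dna) = @_;
    # swap each base for its partner: A<->T and C<->G
    $dna =~ tr/ACGT/TGCA/;
    return $dna;
}

sub reverse_complement {
    my ($dna) = @_;
    # scalar context makes reverse act on the characters of the string
    return scalar reverse(complement($dna));
}

my $dna = "ACGTCGTC";
print "Sequence:           $dna\n";
print "Complement:         ", complement($dna), "\n";
print "Reverse complement: ", reverse_complement($dna), "\n";
```

For the sequence ACGTCGTC this prints TGCAGCAG as the complement and GACGACGT as the reverse complement.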
- 2. A program to reverse transcribe RNA to DNA
#!/usr/bin/perl -w
#let's store an RNA sequence in variable $rna
$rna="5'AGUGCUGCUGUCGUGCAGUCAGUCGCUGCAUGCUCGUAAAAAAAA3'";
print "Here is an RNA sequence: \n\n";
print "$rna\n\n";
#copy the RNA sequence into another variable, $rnarevtrans
$rnarevtrans=$rna;
#replace each RNA base with its complementary DNA base
#and swap the 5' and 3' labels
$rnarevtrans =~ tr/ACGU5'3'/TGCA3'5'/;
print "Here is the sequence after reverse transcription:\n\n";
print "$rnarevtrans\n";
exit;
The output of the above script would be:
Here is an RNA sequence:
5'AGUGCUGCUGUCGUGCAGUCAGUCGCUGCAUGCUCGUAAAAAAAA3'
Here is the sequence after reverse transcription:
3'TCACGACGACAGCACGTCAGTCAGCGACGTACGAGCATTTTTTTT5'
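The script above assumes that the input contains only uppercase A, C, G, and U. As a hedged sketch (the subroutine name reverse_transcribe and the validation rule are assumptions, not part of the original script), the version below works on a bare sequence without the 5' and 3' labels, converts it to uppercase first, and rejects input that contains any other character.

```perl
#!/usr/bin/perl -w
# Sketch: reverse transcription with a simple input check.
# reverse_transcribe is a hypothetical name used for illustration.

sub reverse_transcribe {
    my ($rna) = @_;
    $rna = uc $rna;                         # accept lowercase input too
    # give back undef if anything other than A, C, G, or U is present
    return undef unless $rna =~ /\A[ACGU]+\z/;
    $rna =~ tr/ACGU/TGCA/;                  # same base mapping as the script above
    return $rna;
}

my $cdna = reverse_transcribe("agugcu");
print defined $cdna ? "$cdna\n" : "Not a valid RNA sequence\n";
```

Returning undef rather than printing inside the subroutine leaves it to the caller to decide how to report bad input.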
- 3. A program to find the length of a sequence
#!/usr/bin/perl -w
$sequence=0;
print "\nType a DNA, RNA or protein sequence and then press Enter:\n\n";
#now get the user input from the keyboard
$sequence=<STDIN>;
chomp ($sequence);
$result = length ($sequence);
#length is a built-in Perl function
print "\nThe length of the above sequence is: $result\n\n";
exit;
The output of the above script would be something like this:
Type a DNA, RNA or protein sequence and then press Enter:
CGATGACGATGCAGAGCAGAGACGCAGCTGAGCAGACTGA
The length of the above sequence is: 40
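Note that chomp removes only the trailing newline, so any spaces typed or pasted inside the sequence are counted by length. The sketch below (the subroutine name sequence_length is hypothetical) removes all whitespace before counting; whether that is desirable depends on how your sequences are formatted.

```perl
#!/usr/bin/perl -w
# Sketch: sequence length that ignores spaces, tabs, and newlines.
# sequence_length is a hypothetical name used for illustration.

sub sequence_length {
    my ($sequence) = @_;
    $sequence =~ s/\s+//g;    # delete every run of whitespace
    return length $sequence;
}

# two blocks of 10 letters separated by a space, plus a trailing newline
print sequence_length("CGATGACGAT GCAGAGCAGA\n"), "\n";
```

Here the space and the newline are ignored, so the script prints 20.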
New scripts coming every week