Creative Commons License
Drekendrop | Blog of Tutorial by Mei Pakpahan is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Based on a work at drekendrop.blogspot.com.
Permissions beyond the scope of this license may be available at http://softdadesign.co.nr.

Saturday, July 27, 2013

How to Create Your Own BLAST protein database and use BLAST in Windows

Hi, this is my second post about science on this blog. I actually have a lot in mind to write here, but I have been extremely busy doing my research. Anyway, what I want to share here is very simple; however, for a beginner in protein sequence analysis like me, it should be a useful post.
So here is the thing: I want to find homologs of a protein. BLAST, particularly blastp, can compare a query protein against a set of other proteins and score the ones that are potentially homologous to it. Let's start.

1st step ---
Download the latest BLAST release from the NCBI site. In my case I used the BLAST Win-32 build. Just follow the instructions when installing; it should then be installed in C:\Program Files\NCBI\blast-2.2.28+.

2nd step ---
Prepare your FASTA sequence file. In my case I tried to find homologs of the protein malate dehydrogenase from Salinibacter ruber (PDB ID: 3NEP). You can download it easily from the Protein Data Bank website by going to the Download Files hyperlink and choosing FASTA seq. This is what the FASTA file looks like:

>3NEP:X|PDBID|CHAIN|SEQUENCE
MKVTVIGAGNVGATVAECVARQDVAKEVVMVDIKDGMPQGKALDMRESSPIHGFDTRVTGTNDYGPTEDSDVCIITAGLP
RSPGMSRDDLLAKNTEIVGGVTEQFVEGSPDSTIIVVANPLDVMTYVAYEASGFPTNRVMGMAGVLDTGRFRSFIAEELD
VSVRDVQALLMGGHGDTMVPLPRYTTVGGIPVPQLIDDARIEEIVERTKGAGGEIVDLMGTSAWYAPGAAAAEMTEAILK
DNKRILPCAAYCDGEYGLDDLFIGVPVKLGAGGVEEVIEVDLDADEKAQLKTSAGHVHSNLDDLQRLRDEGKIG


3rd step ---
Prepare your database. First of all, you need a FASTA file that contains the sequences of all the proteins you would like to search against. In the example below, my FASTA file contains 4 PDB entries (proteins), namely 1EMD, 1IB6, 1Z2I, and 3TL2, with one record per chain. I saved it as meso.fasta.

>1EMD:A|PDBID|CHAIN|SEQUENCE
MKVAVLGAAGGIGQALALLLKTQLPSGSELSLYDIAPVTPGVAVDLSHIPTAVKIKGFSGEDATPALEGADVVLISAGVR
RKPGMDRSDLFNVNAGIVKNLVQQVAKTCPKACIGIITNPVNTTVAIAAEVLKKAGVYDKNKLFGVTTLDIIRSNTFVAE
LKGKQPGEVEVPVIGGHSGVTILPLLSQVPGVSFTEQEVADLTKRIQNAGTEVVEAKAGGGSATLSMGQAAARFGLSLVR
ALQGEQGVVECAYVEGDGQYARFFSQPLLLGKNGVEERKSIGTLSAFEQNALEGMLDTLKKDIALGQEFVNK
>3TL2:A|PDBID|CHAIN|SEQUENCE
SNAMTIKRKKVSVIGAGFTGATTAFLLAQKELADVVLVDIPQLENPTKGKALDMLEASPVQGFDANIIGTSDYADTADSD
VVVITAGIARKPGMSRDDLVATNSKIMKSITRDIAKHSPNAIIVVLTNPVDAMTYSVFKEAGFPKERVIGQSGVLDTARF
RTFIAQELNLSVKDITGFVLGGHGDDMVPLVRYSYAGGIPLETLIPKERLEAIVERTRKGGGEIVGLLGNGSAYYAPAAS
LVEMTEAILKDQRRVLPAIAYLEGEYGYSDLYLGVPVILGGNGIEKIIELELLADEKEALDRSVESVRNVMKVLV
>1IB6:A|PDBID|CHAIN|SEQUENCE
MKVAVLGAAGGIGQALALLLKTQLPSGSELSLYDIAPVTPGVAVDLSHIPTAVKIKGFSGEDATPALEGADVVLISAGVA
RKPGMDRSDLFNVNAGIVKNLVQQVAKTCPKACIGIITNPVNTTVAIAAEVLKKAGVYDKNKLFGVTTLDIICSNTFVAE
LKGKQPGEVEVPVIGGHSGVTILPLLSQVPGVSFTEQEVADLTKRIQNAGTEVVEAKAGGGSATLSMGQAAARFGLSLVR
ALQGEQGVVECAYVEGDGQYARFFSQPLLLGKNGVEERKSIGTLSAFEQNALEGMLDTLKKDIALGQEFVNK
>1IB6:B|PDBID|CHAIN|SEQUENCE
MKVAVLGAAGGIGQALALLLKTQLPSGSELSLYDIAPVTPGVAVDLSHIPTAVKIKGFSGEDATPALEGADVVLISAGVA
RKPGMDRSDLFNVNAGIVKNLVQQVAKTCPKACIGIITNPVNTTVAIAAEVLKKAGVYDKNKLFGVTTLDIICSNTFVAE
LKGKQPGEVEVPVIGGHSGVTILPLLSQVPGVSFTEQEVADLTKRIQNAGTEVVEAKAGGGSATLSMGQAAARFGLSLVR
ALQGEQGVVECAYVEGDGQYARFFSQPLLLGKNGVEERKSIGTLSAFEQNALEGMLDTLKKDIALGQEFVNK
>1IB6:C|PDBID|CHAIN|SEQUENCE
MKVAVLGAAGGIGQALALLLKTQLPSGSELSLYDIAPVTPGVAVDLSHIPTAVKIKGFSGEDATPALEGADVVLISAGVA
RKPGMDRSDLFNVNAGIVKNLVQQVAKTCPKACIGIITNPVNTTVAIAAEVLKKAGVYDKNKLFGVTTLDIICSNTFVAE
LKGKQPGEVEVPVIGGHSGVTILPLLSQVPGVSFTEQEVADLTKRIQNAGTEVVEAKAGGGSATLSMGQAAARFGLSLVR
ALQGEQGVVECAYVEGDGQYARFFSQPLLLGKNGVEERKSIGTLSAFEQNALEGMLDTLKKDIALGQEFVNK
>1IB6:D|PDBID|CHAIN|SEQUENCE
MKVAVLGAAGGIGQALALLLKTQLPSGSELSLYDIAPVTPGVAVDLSHIPTAVKIKGFSGEDATPALEGADVVLISAGVA
RKPGMDRSDLFNVNAGIVKNLVQQVAKTCPKACIGIITNPVNTTVAIAAEVLKKAGVYDKNKLFGVTTLDIICSNTFVAE
LKGKQPGEVEVPVIGGHSGVTILPLLSQVPGVSFTEQEVADLTKRIQNAGTEVVEAKAGGGSATLSMGQAAARFGLSLVR
ALQGEQGVVECAYVEGDGQYARFFSQPLLLGKNGVEERKSIGTLSAFEQNALEGMLDTLKKDIALGQEFVNK
>1Z2I:A|PDBID|CHAIN|SEQUENCE
MAHGNEKATVLARLDELERFCRAVFLAVGTDEETADAATRAMMHGTRLGVDSHGVRLLAHYVTALEGGRLNRRPQISRVS
GFGAVETIDADHAHGARATYAAMENAMALAEKFGIGAVAIRNSSHFGPAGAYALEAARQGYIGLAFCNSDSFVRLHDGAM
RFHGTNPIAVGVPAADDMPWLLDMATSAVPYNRVLLYRSLGQQLPQGVASDGDGVDTRDPNAVEMLAPVGGEFGFKGAAL
AGVVEIFSAVLTGMRLSFDLAPMGGPDFSTPRGLGAFVLALKPEAFLERDVFDESMKRYLEVLRGSPAREDCKVMAPGDR
EWAVAAKREREGAPVDPVTRAAFSELAEKFSVSPPTYH
>1Z2I:B|PDBID|CHAIN|SEQUENCE
MAHGNEKATVLARLDELERFCRAVFLAVGTDEETADAATRAMMHGTRLGVDSHGVRLLAHYVTALEGGRLNRRPQISRVS
GFGAVETIDADHAHGARATYAAMENAMALAEKFGIGAVAIRNSSHFGPAGAYALEAARQGYIGLAFCNSDSFVRLHDGAM
RFHGTNPIAVGVPAADDMPWLLDMATSAVPYNRVLLYRSLGQQLPQGVASDGDGVDTRDPNAVEMLAPVGGEFGFKGAAL
AGVVEIFSAVLTGMRLSFDLAPMGGPDFSTPRGLGAFVLALKPEAFLERDVFDESMKRYLEVLRGSPAREDCKVMAPGDR
EWAVAAKREREGAPVDPVTRAAFSELAEKFSVSPPTYH
>1Z2I:C|PDBID|CHAIN|SEQUENCE
MAHGNEKATVLARLDELERFCRAVFLAVGTDEETADAATRAMMHGTRLGVDSHGVRLLAHYVTALEGGRLNRRPQISRVS
GFGAVETIDADHAHGARATYAAMENAMALAEKFGIGAVAIRNSSHFGPAGAYALEAARQGYIGLAFCNSDSFVRLHDGAM
RFHGTNPIAVGVPAADDMPWLLDMATSAVPYNRVLLYRSLGQQLPQGVASDGDGVDTRDPNAVEMLAPVGGEFGFKGAAL
AGVVEIFSAVLTGMRLSFDLAPMGGPDFSTPRGLGAFVLALKPEAFLERDVFDESMKRYLEVLRGSPAREDCKVMAPGDR
EWAVAAKREREGAPVDPVTRAAFSELAEKFSVSPPTYH
>1Z2I:D|PDBID|CHAIN|SEQUENCE
MAHGNEKATVLARLDELERFCRAVFLAVGTDEETADAATRAMMHGTRLGVDSHGVRLLAHYVTALEGGRLNRRPQISRVS
GFGAVETIDADHAHGARATYAAMENAMALAEKFGIGAVAIRNSSHFGPAGAYALEAARQGYIGLAFCNSDSFVRLHDGAM
RFHGTNPIAVGVPAADDMPWLLDMATSAVPYNRVLLYRSLGQQLPQGVASDGDGVDTRDPNAVEMLAPVGGEFGFKGAAL
AGVVEIFSAVLTGMRLSFDLAPMGGPDFSTPRGLGAFVLALKPEAFLERDVFDESMKRYLEVLRGSPAREDCKVMAPGDR
EWAVAAKREREGAPVDPVTRAAFSELAEKFSVSPPTYH
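
Before building the database, it can help to check what is actually in the FASTA file. In meso.fasta above, the four chains of 1IB6 appear to carry the same sequence, and so do the four chains of 1Z2I, so blastp should report near-identical hits several times for those entries. The following is a minimal Python sketch (Python is not part of the original workflow, and the function names read_fasta and group_identical are my own) that reads a FASTA file and groups headers that share an identical sequence:

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined into one string
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_identical(records):
    """Map each distinct sequence to the list of headers that carry it."""
    groups = defaultdict(list)
    for header, seq in records:
        groups[seq].append(header)
    return dict(groups)


if __name__ == "__main__":
    # Path taken from this post; adjust it to wherever your file lives.
    fasta_path = Path(r"D:\proteins\blastdata\meso.fasta")
    if fasta_path.exists():
        records = read_fasta(fasta_path.read_text())
        print(len(records), "records")
        for seq, headers in group_identical(records).items():
            print(len(seq), "residues:", ", ".join(headers))
```

If you only care about one hit per PDB entry, you could keep a single chain from each group before running makeblastdb; keeping all chains, as in this post, also works.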

When you're done creating this file, you can create your database.
1. Open the command prompt and go to your BLAST bin folder:
cd "C:\Program Files\NCBI\blast-2.2.28+\bin"
2. Create the database. The general form is
makeblastdb -in YOUR_FASTA_FILE -dbtype prot -out YOUR_WISH_DBNAME
and in my case:
makeblastdb -in D:\proteins\blastdata\meso.fasta -dbtype prot -out Meso
3. Once it's done, look in your BLAST bin folder (the database is written to the current folder because -out was given without a path) and you'll find 3 new files there: YOUR_WISH_DBNAME.phr, YOUR_WISH_DBNAME.pin, and YOUR_WISH_DBNAME.psq
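
If you prefer to script this step, the same makeblastdb call can be driven from Python. This is only a sketch under a few assumptions: makeblastdb is on your PATH (or you run the script from the BLAST bin folder), the helper names makeblastdb_args and missing_db_files are hypothetical, and the three extensions checked are the ones listed above for blast-2.2.28+ (other BLAST versions may write additional files).

```python
import shutil
import subprocess
from pathlib import Path


def makeblastdb_args(fasta_file, db_name):
    """Build the same makeblastdb argument list as the command in step 3."""
    return ["makeblastdb", "-in", str(fasta_file),
            "-dbtype", "prot", "-out", str(db_name)]


def missing_db_files(db_name):
    """Return the expected protein database files that do not exist yet."""
    expected = [Path(f"{db_name}{ext}") for ext in (".phr", ".pin", ".psq")]
    return [p for p in expected if not p.exists()]


if __name__ == "__main__":
    fasta_file = Path(r"D:\proteins\blastdata\meso.fasta")  # path from this post
    args = makeblastdb_args(fasta_file, "Meso")
    if shutil.which("makeblastdb") is None or not fasta_file.exists():
        # Nothing is executed; just show the command that would be run.
        print("Skipping run; command would be:", " ".join(args))
    else:
        subprocess.run(args, check=True)  # raises if makeblastdb fails
        print("Missing database files:", missing_db_files("Meso"))
```

An empty list from missing_db_files means the three files from item 3 were found in the current folder.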

4th step ---
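Now we are ready to BLAST. Stay in the BLAST bin folder, where the database files are, and run blastp. The general form is
blastp -query YOUR_FASTA_SEQUENCE_file.txt -db YOUR_WISH_DBNAME -out YOUR_RESULT.txt
and in my case:
blastp -query "D:\proteins\blastdata\3NEP.fasta.txt" -db Meso -out "D:\proteins\blastdata\homo1.txt"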
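
The command above writes the default human-readable report. If you want to process the hits in a script, BLAST+ can also write tab-separated output with -outfmt 6, which is an option I did not use in this post. The sketch below is an assumption-laden example, not part of the original workflow: the helper names blastp_args and parse_tabular and the output file homo1.tsv are hypothetical, and the column list is the default column order of -outfmt 6.

```python
from pathlib import Path

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blastp_args(query, db, out):
    """Build a blastp command like the one in step 4, plus tabular output."""
    return ["blastp", "-query", str(query), "-db", str(db),
            "-out", str(out), "-outfmt", "6"]


def parse_tabular(text):
    """Parse -outfmt 6 text into dicts, sorted by E-value (best hit first)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])


if __name__ == "__main__":
    # Hypothetical output file; create it by running the blastp_args command.
    result = Path(r"D:\proteins\blastdata\homo1.tsv")
    print("Command:", " ".join(blastp_args(
        r"D:\proteins\blastdata\3NEP.fasta.txt", "Meso", result)))
    if result.exists():
        for hit in parse_tabular(result.read_text()):
            print(hit["sseqid"], hit["pident"], hit["evalue"], hit["bitscore"])
```

Sorting by E-value puts the most significant hits first, which is usually what you want when looking for the closest homolog.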

Easy huh? Good luck! ;)
