Creating your own local BLAST nucleotide database and performing a local BLAST search - brief tutorial
This is a very brief step-by-step tutorial for creating a local BLAST database from a set of sequences of interest and then running a BLAST search against it for similar sequences in another sequence dataset. The created database, however, can also be used with the NCBI BLAST webtool.
Links and example files for this tutorial:
1. Go to the BLAST download folder and download the BLAST executables for your operating system:
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ - unzip this archive in a folder of your choice.
2. Prepare a fasta sequence file that contains the sequences of interest, which you would like to have in your local BLAST database. A fasta file looks like this and can be opened and edited with any text editor:
>name_sequence1
AGTCGTCCTCT
>name_sequence2
GGTAGTACCTGAAGTA
It is not necessary to align the sequences or trim them all to the same length. For this tutorial I have linked the algae.fasta file - try to open it with your text editor.
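Before building the database it can help to confirm that a file really follows the fasta layout shown above. The following is a minimal sketch, assuming a Unix-like shell with grep and sed; the file name example.fasta and its two records are made up for illustration, so replace them with algae.fasta or your own file:

```shell
# Write a tiny two-record fasta file (hypothetical example data).
printf '>name_sequence1\nAGTCGTCCTCT\n>name_sequence2\nGGTAGTACCTGAAGTA\n' > example.fasta

# Count the header lines; every sequence record starts with '>'.
n_records=$(grep -c '^>' example.fasta)
echo "records: $n_records"

# List the sequence names without the leading '>'.
names=$(grep '^>' example.fasta | sed 's/^>//')
echo "$names"
```

If the record count is 0, the file is probably not in fasta format or uses a different header character.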
3. Copy the fasta file into the ncbi-blast-2.3.x bin folder. This folder should contain all the BLAST executables - important for this tutorial are the executables: makeblastdb & blastn
4. Open your terminal and go to the bin folder. Use the following command to create the algae nucleotide BLAST database:
makeblastdb -in algae.fasta -out algaedb -dbtype 'nucl' -hash_index
Great, you created your own BLAST database. Type:
makeblastdb -help to learn more about the available and used parameters.
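To check that the database was actually written, you can look for the files that makeblastdb created and ask blastdbcmd for a summary. This is only a sketch: it assumes a Unix-like shell, that you are still in the folder where algaedb was created, and that blastdbcmd (another executable shipped in the same bin folder) can be found by your shell:

```shell
db=algaedb   # database name given to -out in the makeblastdb command

# makeblastdb writes several files that share the database name as prefix.
ls "$db".n* 2>/dev/null || echo "no files starting with $db.n found in this folder"

# If blastdbcmd is available and the database files exist, print a summary
# (title, number of sequences, total length).
if command -v blastdbcmd >/dev/null 2>&1 && ls "$db".n* >/dev/null 2>&1; then
    blastdbcmd -db "$db" -info
fi
```

The number of sequences reported should match the number of records in algae.fasta.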
5. Now, use your own large sequence collection (for example the OTU reference fasta file created with qiime or mothur) and BLAST these sequences against the database.
Copy & paste the fasta file you would like to use for the BLAST search (for example the OTU reference fasta file - let's call it OTU_reference.fasta) into the ncbi-blast-2.3.x bin folder.
To perform a blastn search use the following command line in your terminal window:
blastn -query OTU_reference.fasta -task blastn -db algaedb -out algae_blast.html -evalue 10 -word_size 4 -num_threads 2 -html
Great, you performed your own local BLAST similarity search. Type:
blastn -help to learn more about the available and used parameters & watch the Biological Sequence Analysis I 2014 lecture to learn more about the evalue and word_size parameters: NCBI BLAST Lecture
6. Open the generated algae_blast.html file with your browser to take a look at your results - the search results in the present example are very underwhelming, and a more conservative BLAST search (a lower evalue cutoff or a larger word size) would probably yield even fewer hits.
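A more conservative search could look like the sketch below. The cutoff 1e-5, the word size 11 and the output name algae_blast_strict.tsv are illustrative assumptions, not settings recommended by this tutorial. Instead of -html it uses -outfmt 6, which writes one tab-separated line per hit:

```shell
query=OTU_reference.fasta
db=algaedb
out=algae_blast_strict.tsv   # hypothetical output file name

# Only run the search if blastn and the query file can be found.
if command -v blastn >/dev/null 2>&1 && [ -f "$query" ]; then
    # Stricter than step 5: lower e-value cutoff and a longer seed word.
    blastn -query "$query" -task blastn -db "$db" -out "$out" \
        -evalue 1e-5 -word_size 11 -num_threads 2 -outfmt 6
    # Each line of the output is one hit, so this prints the number of hits.
    wc -l < "$out"
else
    echo "blastn or $query not found; run this from the bin folder"
fi
```

With -outfmt 6 the default columns are query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, evalue and bit score, which makes the results easy to sort or filter in a spreadsheet.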