http://www.ncbi.nlm.nih.gov/books/NBK1763/

BLAST Command Line Applications User Manual

Christiam Camacho, Thomas Madden, George Coulouris, Ning Ma, Tao Tao, and Richa Agarwala.

Created: June 23, 2008; Last Update: January 30, 2012.

1. Introduction

This manual documents the BLAST (Basic Local Alignment Search Tool) command line applications developed at the National Center for Biotechnology Information (NCBI). These applications have been revamped to provide an improved user interface, new features, and performance improvements compared to their counterparts in the NCBI C Toolkit. Hereafter we distinguish the C Toolkit BLAST command line applications from these command line applications by referring to the latter as the BLAST+ applications, which have been developed using the NCBI C++ Toolkit (http://www.ncbi.nlm.nih.gov/books/NBK7160/).

Please feel free to contact us with any questions, feedback, or bug reports at blast-help@ncbi.nlm.nih.gov.

2. Installation

The BLAST+ applications are distributed in executable and source code format. For the executable formats we provide installers as well as tarballs; the source code is only provided as a tarball. These are freely available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/. Please be sure to use the most recent available version; this will be indicated in the file name (for instance, in the sections below, version 2.2.18 is listed, but this should be replaced accordingly).

2.1 Windows

Download the executable installer ncbi-blast-2.2.18+.exe and double click on it. After accepting the license agreement, select the install location, click “Install”, and then click “Close”.

2.2 MacOSX

For users without administrator privileges: Download the ncbi-blast-2.2.18+-universal-macosx.tar.gz tarball and follow the procedure described in Other Unix platforms.

For users with administrator privileges and machines running MacOSX version 10.5 or higher: Download the ncbi-blast-2.2.18+.dmg installer and double click on it. Double click the newly mounted ncbi-blast-2.2.18+ volume, double click on ncbi-blast-2.2.18+.pkg and follow the instructions in the installer. By default the BLAST+ applications are installed in /usr/local/ncbi/blast, overwriting its previous contents (an uninstaller is provided and its use is recommended when upgrading a BLAST+ installation).

2.3 RedHat Linux

Download the appropriate *.rpm file for your platform and either install or upgrade the ncbi-blast+ package as appropriate using the commands:

Install:
rpm -ivh ncbi-blast-2.2.18-1.x86_64.rpm
Upgrade:
rpm -Uvh ncbi-blast-2.2.18-1.x86_64.rpm

Note: one must have root privileges to run these commands. If you do not have root privileges, please use the procedure described in Other Unix platforms.

2.4 Other Unix platforms

Download the tarball and expand it in the location of your choice.
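The step above can be sketched as a small shell function. This is only an illustration, assuming the archive contains a single top-level directory with a bin subdirectory; the function name install_blast_tarball and the destination directory are hypothetical.

```shell
# Sketch: unpack a BLAST+ tarball and report the directory to add to PATH.
# Assumption: the archive has one top-level directory that contains bin/.
install_blast_tarball() {
    tarball=$1    # e.g. ncbi-blast-2.2.18+-universal-macosx.tar.gz
    dest=$2       # hypothetical install location, e.g. $HOME/opt

    mkdir -p "$dest" || return 1
    # Read the top-level directory name from the archive listing.
    topdir=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    [ -n "$topdir" ] || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    # Print the directory that holds the executables.
    printf '%s\n' "$dest/$topdir/bin"
}

# Example usage (adjust the file name to the version actually downloaded):
# bindir=$(install_blast_tarball ncbi-blast-2.2.18+-universal-macosx.tar.gz "$HOME/opt")
# PATH="$bindir:$PATH"; export PATH
```

Adding the printed directory to PATH lets the applications be invoked by name from any working directory.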

2.5 Source tarball

Use this approach if you would like to build the BLAST+ applications yourself. Download the tarball, expand it and in the expanded directory type the following commands:

cd c++
./configure --without-debug --with-strip --with-mt --with-build-root=ReleaseMT
cd ReleaseMT/build
make all_r

The compiled executables will be found in c++/ReleaseMT/bin.

In Windows, extract the tarball and open the appropriate MSVC solution or project file (e.g.: c++\compilers\msvc800_prj\static\build), build the -CONFIGURE- project, click on “Reload” when prompted by the development environment, and then build the -BUILD-ALL- project. The compiled executables will be found in the directory corresponding to the build configuration selected (e.g.: c++\compilers\msvc800_prj\static\bin\debugdll).

3. Quick start

3.1 For users of NCBI C Toolkit BLAST

The easiest way to get started using these command line applications is by means of the legacy_blast.pl Perl script, which is bundled with the BLAST+ applications. To use this script, simply prefix it to the invocation of the C Toolkit BLAST command line application and append the --path option pointing to the installation directory of the BLAST+ applications. For example, instead of using

    blastall -i query -d nr -o blast.out 

use

    legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin

For more details, refer to the section titled Backwards compatibility script.

3.2 For users of Web BLAST (http://blast.ncbi.nlm.nih.gov)

Users of Web BLAST can take advantage of the search strategies to quickly get started using the BLAST+ applications, as these are intended to allow seamless integration between the Web and command line BLAST tools. For more details, refer to the section on BLAST search strategies.

3.3 For new users of BLAST

An introduction to BLAST is outside the scope of this manual; more information on this subject can be found at http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs. Nonetheless, new users will benefit from the examples in the cookbook as well as from reading the user manual.

3.4 Downloading BLAST databases

The BLAST databases are required to run BLAST locally and to support automatic resolution of sequence identifiers. Documentation about these can be found at ftp://ftp.ncbi.nlm.nih.gov/blast/db/README. These databases may be retrieved automatically with the update_blastdb.pl Perl script, which is included as part of this distribution.

This script will download multiple tar files for each BLAST database volume if necessary, without having to designate each volume. For example:

./update_blastdb.pl htgs 

will download all the relevant HTGs tar files (htgs.00.tar.gz, …, htgs.N.tar.gz).

The script can also check your local copy of the database tar file(s) and download tar files only if the date stamp has changed, reflecting a newer version of the database. This allows the script to run on a schedule and download tar files only when needed. Documentation for the update_blastdb.pl script can be obtained by running the script without any arguments (Perl is required).
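A scheduled refresh could be wrapped in a function along the following lines. This is a minimal sketch, not part of the distribution: the function name refresh_blastdb and the target directory are hypothetical, update_blastdb.pl is assumed to be on PATH, and the unpacking step is an assumption about how the downloaded tar files are put to use.

```shell
# Sketch: download a BLAST database into its own directory and unpack it.
# Runs in a subshell so the caller's working directory is left unchanged.
refresh_blastdb() (
    db_name=$1    # e.g. htgs
    db_dir=$2     # hypothetical directory holding the local databases

    command -v update_blastdb.pl >/dev/null 2>&1 || {
        echo "update_blastdb.pl not found on PATH" >&2
        exit 1
    }
    mkdir -p "$db_dir" && cd "$db_dir" || exit 1

    # The script decides which tar files need downloading. Its exit status
    # conventions are not documented in this section, so they are not used here.
    update_blastdb.pl "$db_name"

    # Unpack every tar file present for this database (htgs.00.tar.gz, ...).
    for archive in "$db_name".*tar.gz; do
        [ -f "$archive" ] || continue
        tar -xzf "$archive" || exit 1
    done
)

# Example usage from a scheduled job:
# refresh_blastdb htgs /data/blastdb
```

This simple version unpacks all tar files on every run; a production job would likely unpack only the files that were just downloaded.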

4. User manual

4.1 Functionality offered by BLAST+ applications

The functionality offered by the BLAST+ applications has been organized by program type so as to more closely resemble Web BLAST. The following graph depicts the correspondence between the NCBI C Toolkit BLAST command line applications and the BLAST+ applications:

Image graph1.jpg

As an example, to run a search of a nucleotide query (translated “on the fly” by BLAST) against a protein database one would use the blastx application instead of blastall. The blastx application will also work in “Blast2Sequences” mode (i.e.: accept FASTA sequences instead of a BLAST database as targets) and can also send BLAST searches over the network to the public NCBI server if desired.
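The three blastx modes just described might be invoked as sketched below. The file names and the function name run_blastx_examples are hypothetical, and the -query, -subject, and -remote option names are not described in this section, so please confirm them with the application's -help output.

```shell
# Sketch: one nucleotide query run through blastx in three different modes.
run_blastx_examples() {
    query=$1      # nucleotide FASTA file (hypothetical name supplied by caller)
    subject=$2    # protein FASTA file used as the target in mode 2

    # 1. Search a local protein BLAST database.
    blastx -query "$query" -db nr -out blastx_local.out || return 1

    # 2. "Blast2Sequences" mode: FASTA sequences as targets, no database.
    blastx -query "$query" -subject "$subject" -out blastx_bl2seq.out || return 1

    # 3. Send the search over the network to the public NCBI server.
    blastx -query "$query" -db nr -remote -out blastx_remote.out || return 1
}

# Example usage:
# run_blastx_examples nt_query.fasta proteins.fasta
```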

The blastn, blastp, blastx, tblastx, tblastn, psiblast, rpsblast, and rpstblastn applications are considered search applications, as they execute a BLAST search, whereas makeblastdb, blastdb_aliastool, and blastdbcmd are considered BLAST database applications, as they either create or examine BLAST databases.

There is also a new set of sequence filtering applications described in the section Sequence filtering applications and an application to build database indices that greatly speed up megablast in some cases (see section titled Megablast indexed searches).

Please note that the NCBI C Toolkit applications seedtop and blastclust are not available in this release.

4.2 Common options

The following is a listing of options that are common to the majority of BLAST+ applications followed by a brief description of what they do:

4.2.1 best_hit_overhang: Overhang value for Best-Hit algorithm. For more details, see the section Best-Hits filtering algorithm.

4.2.2 best_hit_score_edge: Score edge value for Best-Hit algorithm. For more details, see the section Best-Hits filtering algorithm.

4.2.3 db: File name of BLAST database to search the query against. Unless an absolute path is used, the database will be searched relative to the current working directory first, then relative to the value specified by the BLASTDB environment variable, then relative to the BLASTDB configuration value specified in the configuration file. Multiple databases may be provided as a single argument, separated by spaces. Because many operating systems now allow spaces in file names and paths, quotes are necessary to distinguish the two cases. See section 5.15 for details.
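As a brief illustration of the db option, the sketch below sets the BLASTDB environment variable and passes two databases in one quoted argument. The directory, the database names db_one and db_two, and the function name are hypothetical; the -query option is assumed from the applications' -help output.

```shell
# Hypothetical directory searched after the current working directory.
BLASTDB=/data/blastdb
export BLASTDB

# Sketch: both database names travel in ONE quoted, space-separated argument.
search_two_databases() {
    query=$1
    blastn -query "$query" -db "db_one db_two" -out blast.out
}

# Example usage:
# search_two_databases query.fasta
```

Without the quotes the shell would pass db_two as a separate argument rather than as part of the database list.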

4.2.4 dbsize: Effective length of the database.

4.2.5 dbtype: Molecule type stored, or to be stored, in a BLAST database.

4.2.6 db_soft_mask: Filtering algorithm ID to apply to the database as soft masking for subject sequences. The algorithm IDs for a given BLAST database can be obtained by invoking blastdbcmd with its -info flag (only shown if such filtering in the BLAST database is available). For more details see the section Masking in BLAST databases.

4.2.7 culling_limit: Ensures that no more than the specified number of HSPs are aligned to the same part of the query. This option was designed for searches with many repetitive matches, but where possible it is probably more efficient to mask the query to remove the repetitive sequences.

4.2.8 entrez_query: Restrict the search of the BLAST database to the results of the Entrez query provided.

4.2.9 evalue: Expectation value threshold for saving hits.

4.2.10 export_search_strategy: Name of the file in which to save the search strategy (see section titled BLAST search strategies).

4.2.11 gapextend: Cost to extend a gap.

4.2.12 gapopen: Cost to open a gap.

4.2.13 gilist: File containing a list of GIs to restrict the BLAST database to search. The expect values in the BLAST results are based upon the sequences actually searched and not on the underlying database.
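A GI list file could be prepared as sketched below. The sketch assumes the plain-text form of the list with one GI per line; the GI values, file names, and the function name write_gi_list are placeholders.

```shell
# Sketch: write the given GIs to a file, one per line (assumed text format).
write_gi_list() {
    outfile=$1
    shift
    printf '%s\n' "$@" > "$outfile"
}

# Example usage (placeholder GIs):
# write_gi_list my_gis.txt 12345 67890
# blastp -query query.fasta -db nr -gilist my_gis.txt -out restricted.out
```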

4.2.14 h: Displays the application’s brief documentation.

4.2.15 help: Displays the application’s detailed documentation.

4.2.16 html: Enables the generation of HTML output suitable for viewing in a web browser.

4.2.17 import_search_strategy: Name of the file from which to read the search strategy to execute (see section titled BLAST search strategies).

4.2.18 lcase_masking: Interpret lowercase letters in query sequence(s) as masked.

4.2.19 matrix: Name of the scoring matrix to use.

4.2.20 max_target_seqs: Maximum number of aligned sequences to keep from the BLAST database. This option should only be used with formats that do not have separate descriptions and alignments sections, such as XML, tabular, ASN.1 or BLAST archive.

4.2.21 negative_gilist: File containing a list of GIs to exclude from the BLAST database.

4.2.22 num_alignments: Number of alignments to show in the BLAST output. This option should only be used with formats that have a separate alignments section, such as the standard BLAST report, including pairwise and any query-anchored flavor. This option may not work as expected with formats such as XML, tabular, etc. that do not have a separate alignments section. The max_target_seqs option should be used in that case.

4.2.23 num_descriptions: Number of one-line descriptions to show in the BLAST output. This option should be used with output formats that have a separate descriptions section, such as the standard BLAST report, including pairwise and any query-anchored flavor. This option may not work as expected with formats such as XML, tabular, etc. that do not have a separate descriptions section. The max_target_seqs option should be used in that case.
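The choice between num_descriptions/num_alignments and max_target_seqs can be captured in a small helper. This is a sketch under one stated assumption: that the numeric outfmt codes 0 through 4 denote the pairwise and query-anchored report formats, which should be confirmed against the application's -help output. The function name hit_limit_args is hypothetical.

```shell
# Sketch: print the hit-limiting options appropriate for an output format.
hit_limit_args() {
    outfmt=$1    # numeric output format code
    limit=$2     # number of hits to keep or show

    case $outfmt in
        0|1|2|3|4)
            # Assumed report formats with separate descriptions and
            # alignments sections.
            printf '%s\n' "-num_descriptions $limit -num_alignments $limit" ;;
        *)
            # XML, tabular, ASN.1, BLAST archive and similar formats.
            printf '%s\n' "-max_target_seqs $limit" ;;
    esac
}

# Example usage (the command substitution is left unquoted on purpose so the
# shell splits it into separate arguments):
# blastn -query query.fasta -db nt -outfmt 6 $(hit_limit_args 6 50) -out hits.tsv
```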

4.2.24 num_threads: Number of threads to use during the search.

4.2.25 out: Name of the file to write the application’s output. Defaults to stdout.

4.2.26 outfmt: Allows for the specification of the search application’s output format. A listing of the possible format types is available via the search application’s -help option.
