FASTA/TFASTA/FASTX/TFASTXv3(1)                  FASTA/TFASTA/FASTX/TFASTXv3(1)


NAME
       fasta3,  fasta3_t  - scan a protein or DNA sequence library for similar
       sequences

       tfasta3, tfasta3_t - compare a  protein  sequence  to  a  DNA  sequence
       library, translating the DNA sequence library `on-the-fly'.

       fastx3,  fastx3_t   -  compare  a  DNA  sequence  to a protein sequence
       database, comparing the translated DNA sequence in forward and  reverse
       frames.

       tfastx3,  tfastx3_t   -  compare  a  protein sequence to a DNA sequence
       database, calculating similarities with frameshifts to the forward  and
       reverse orientations.

       fasty3,  fasty3_t   -  compare  a  DNA  sequence  to a protein sequence
       database, comparing the translated DNA sequence in forward and  reverse
       frames.

       tfasty3,  tfasty3_t   -  compare  a  protein sequence to a DNA sequence
       database, calculating similarities with frameshifts to the forward  and
       reverse orientations.

       fasts3,  fasts3_t  -  compare  unordered peptides to a protein sequence
       database

       tfasts3, tfasts3_t - compare unordered peptides  to  a  translated  DNA
       sequence database

       fastf3,  fastf3_t  -  compare  mixed  peptides  to  a  protein sequence
       database

       tfastf3, tfastf3_t  -  compare  mixed  peptides  to  a  translated  DNA
       sequence database

       ssearch3,  ssearch3_t - compare a protein or DNA sequence to a sequence
       database using the Smith-Waterman algorithm.

       prss3, prfx3 - estimate statistical significance  of  an  alignment  by
       comparing  the score to the distribution of similarity scores generated
       by shuffling the second sequence.  prss3  uses  Smith-Waterman.   prfx3
       uses the fastx algorithm.


DESCRIPTION
       Release  3.x  of  the  FASTA package provides a modular set of sequence
       comparison programs that can run on conventional single processor  com-
       puters or in parallel on multiprocessor computers. Seven different pro-
       grams - fasta3, fastx3, fasty3, tfastx3, tfasty3, tfasta3, and ssearch3
       - are currently available.

       All  of  the  comparison  programs  share  a  set of basic command line
       options; additional options are  available  for  individual  comparison
       functions.

       The  fasta3_t,  fastx3_t, fasty3_t, tfasta3_t, tfastx3_t, tfasty3_t and
       ssearch3_t programs are threaded versions that will run in parallel  on
       Digital Equipment, Sun, and SGI multiprocessor computers.


Options for comparison functions
       These  versions  of  the  fasta programs have been modified to accept a
       query sequence from the unix "stdin" data stream.  This makes  it  much
       easier  to use fasta3 and its relatives as part of a WWW page. To indi-
       cate that stdin is to be used, use "@" as the query sequence file name.
       "@"  can  also  be used to specify a subset of the query sequence to be
       used, e.g:

     cat query.aa | fasta3 -q @:50-150 s

       would search the 's' database with residues 50-150 of query.aa.   FASTA
       cannot  automatically  detect  the  sequence type (protein vs DNA) when
       "stdin" is used, so the '-n' option is required for DNA.

       -1     Sort by "init1" score.

       -3     (TFASTA3, TFASTX/Y3 only) use only forward frame translations

       -a #   "SHOWALL" option attempts to align  all  of  both  sequences  in
              FASTA and SSEARCH.

       -A     force  Smith-Waterman  alignment  for output.  Smith-Waterman is
              the default for  protein  sequences  and  FASTX3,  but  not  for
              TFASTA3 or DNA comparisons with FASTA3.

       -b #   number  of  best  scores  to  show (must be < -E cutoff if -E is
              given)

       -B     show z-scores rather than bit scores

       -c #   threshold for band optimization (FASTA, FASTX)

       -C #   (fasta34t11d4)  length  of  name  abbreviation  in   alignments,
              default = 6.

       -d #   number of best alignments to show ( must be < -e cutoff)

       -D     turn  on  debugging  mode.   Enables checks on sequence alphabet
              that cause problems with tfastx3, tfasty3, tfasta3.

       -E #   expectation value upper limit for score and  alignment  display.
              Defaults  are 10.0 for FASTA3 and SSEARCH3 protein searches, 5.0
              for translated DNA/protein  comparisons,  and  2.0  for  DNA/DNA
              searches.

       -f #   penalty for opening a gap (or first residue for older versions)

       -F #   expectation  value  lower limit for score and alignment display.
              -F 1e-6 prevents library sequences with  E()-values  lower  than
              1e-6  from being displayed. This allows the use to focus on more
              distant relationships.

       -g #   penalty for additional residues in a gap

       -h #   (FASTX3, TFASTX3, FASTY3, TFASTY3 only) penalty for a frameshift
              between two codons.

       -j #   (FASTY3,  TFASTY3 only) penalty for a frameshift within a codon.

       -H     turn off histogram display

       -i     (DNA only) reverse complement the query sequence. (TFASTX)  com-
              pare   against  only  the  reverse  complement  of  the  library
              sequence.

       -l str specify FASTLIBS file

       -L     report long sequence description in alignments

       -m 0,1,2,3,4,5,6,9,10 alignment display options.  -m 0, 1, 2, 3
              display different types of alignments.  -m 4 provides an  align-
              ment  "map"  on the query. -m 5 combines the alignment map and a
              -m 0 alignment.  -m 6 provides an HTML output.  -m  9  does  not
              change  the  alignment output, but provides alignment coordinate
              and percent identity information with the  best  scores  report.
              -m 9c adds encoded alignment information to the -m 9; -m 9i pro-
              vides only percent identity  and  alignment  length  information
              with  the  best scores.  With current versions of the FASTA pro-
              grams, independent -m options can be combined; e.g. -m 1  -m  9c
              -m 6.

       -M #-# molecular  weight (residue) cutoffs.  -M "101-200" examines only
              sequences that are 101-200 residues long.

       -n     force query to nucleotide sequence

       -N #   break long library sequences into blocks of # residues.   Useful
              for  bacterial  genomes, which have only one sequence entry.  -N
              2000 works well for well for bacterial genomes.

       -o     (FASTA) turn fasta band optimization off during  initial  phase.
              This was the behavior of fasta1.x versions.

       -O file
              send output to file

       -q/-Q  quiet option; do not prompt for input

       -r "+n/-m"
              values  for  match/mismatch  for DNA comparisons. +n is used for
              the maximum positive value and -m is used for the maximum  nega-
              tive  value.  Values  between  max  and  min,  are rescaled, but
              residue pairs having the value -1 continue to be -1.

       -R file
              save all scores to statistics file (previously -r file)

       -s name
              specify substitution  matrix.   BLOSUM50  is  used  by  default;
              PAM250,  PAM120,  and  BLOSUM62  can  be specified by setting -s
              P120, P250, or BL62.   With  this  version,  many  more  scoring
              matrices  are  available,  including BLOSUM80 (BL80), and MDM10,
              MDM20,  MDM40  (Jones,  Taylor,  and   Thornton,   1992   CABIOS
              8:275-282;  specified as -s M10, -s M20, -s M40). Alternatively,
              BLASTP1.4 format scoring matrix files can be  specified.   BL80,
              BL62, and P120 are scaled in 1/2 bit units; all the other matri-
              ces use 1/3 bit units.  DNA scoring matrices can also be  speci-
              fied with the "-r" option.

       -S     treat  lower  case  letters in the query or database as low com-
              plexity regions that are equivalent to 'X'  during  the  initial
              database  scan, but are treated as normal residues for the final
              alignment display.  Statistical estimates are based on the 'X'ed
              out  sequence  used during the initial search. Protein databases
              (and query sequences) can be generated in the appropriate format
              using    John    Wooton's   "pseg"   program,   available   from
              ftp://ncbi.nlm.nih.gov/pub/seg/pseg.  Once you have compiled the
              "pseg" program, use the command:

              pseg database.fasta -z 1 -q  > database.lc_seg

       -t #   Translation  table  -  tfasta3,  fastx3,  tfastx3,  fasty3,  and
              tfasty3  now  support  the   BLAST   tranlation   tables.    See
              http://www.ncbi.nlm.nih.gov/htbin-post/Taxon-
              omy/wprintgc?mode=c/.

       -T #   (threaded, parallel only) number of threads or  workers  to  use
              (set by default to 4 at compile time).

       -U     Do  RNA  sequence  comparisons: treat 'T' as 'U', allow G:U base
              pairs (by scoring "G-A" and "T-C" as "G-G" -1).  Search only one
              strand.

       -V "?$%*"
              Allow  special  annotation  characters in query sequence.  These
              characters will be displayed in the alignments on the coordinate
              number line.

       -w # line width for similarity score, sequence alignment, output.

       -W # context length (default is 1/2 of line width -w) for alignment,
              like  fasta  and  ssearch, that provide additional sequence con-
              text.

       -x #   score used for matches to 'X' or 'N', overriding the value spec-
              ified in the scoring matrix.

       -X "#,#"
              offsets query, library sequence for numbering alignments

       -y #   Width  for  band optimization; by default 16 for DNA and protein
              ktup=2; 32 for protein ktup=1;

       -z #   Specify statistical calculation. Default is  -z  1,  which  uses
              regression against the length of the library sequence. -z 0 dis-
              ables statistics.  -z 2 provides  maximum  likelihood  estimates
              for  lambda  and  K,  censoring  the  250 lowest and 250 highest
              scores. -z 3 uses Altschul and Gish's statistical estimates  for
              specific  protein  BLOSUM scoring matrices and gap penalties. -z
              4,5: an alternate regression method.  -z 6  uses  a  composition
              based  maximum  likelihood  estimate based on the method of Mott
              (1992) Bull. Math. Biol. 54:59-75.  -z  11,12,14,15,16:  compute
              the regression against scores of randomly shuffled copies of the
              library sequences.  Twice as many comparisons are performed, but
              accurate  estimates  can  be generated from databases of related
              sequences. -z 11 uses the -z 1 regression strategy, etc.

       -Z db_size
              Set the apparent database size used for expectation value calcu-
              lations  (used  for  protein/protein  FASTA and SSEARCH, and for
              FASTX, FASTY, TFASTX, and TFASTY).

Environmental variables:
       FASTLIBS
              location of library choice file (-l FASTLIBS)

       SMATRIX
              default scoring matrix (-s SMATRIX)

       SRCH_URL
              the format string used to define the  option  to  re-search  the
              database.

       REF_URL
              the  format  string  used  to  define  the  option to lookup the
              library sequence in entrez, or some other database.


AUTHOR
       Bill Pearson
       wrp@virginia.EDU