README file for Asatura

This program was developed as part of the diploma thesis of Tancred Frickey 
(at the University of Konstanz, Germany), supervised by Yves Van de Peer and 
Axel Meyer.

Most of the program should be self-explanatory (we think), but for those who run 
into problems, or who genuinely like manuals, we have provided this text file, 
which will hopefully help.

The aim of the work was to develop a software tool that would let one estimate 
the level of mutational saturation present in a set of aligned amino acid (aa) 
sequences. Trees can then be inferred from the unsaturated fraction only. 

Basics:
AsaturA displays a plot of observed sequence changes against assumed evolutionary 
distance (see Van de Peer et al. 2002, Gene 295, 205-211). Visualizing the amount 
of saturation is meant to help the user select an appropriate cutoff value. A 
cutoff value is a numerical score threshold: all amino acid replacements whose 
substitution value (as read from a substitution matrix) is above the cutoff are 
defined as frequent, and those whose value is equal to or below the cutoff are 
termed rare. The plot then shows the numbers of rare and frequent changes against 
the assumed evolutionary distance. Careful selection of the cutoff value generally 
makes it possible to divide a mutationally saturated dataset into a highly 
saturated part (increased evolutionary distance does not increase the number of 
observed changes) and a part that behaves almost clocklike (observed changes 
increase linearly with evolutionary distance). Once a cutoff value has been set, a 
distance matrix and a neighbor-joining tree are calculated from the rare 
substitutions only.
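The rare/frequent split described above can be sketched in Java for a single pair of 
aligned sequences. This is a minimal illustration, not AsaturA's source: the class and 
method names are hypothetical, the matrix is assumed to be a 20 x 20 array indexed in 
the order "ARNDCQEGHILKMFPSTWYV", and gaps and unknown residues are simply skipped.

```java
/**
 * Hypothetical sketch (not AsaturA's code): count the rare and frequent
 * replacements between two aligned sequences for a given cutoff.
 */
final class RareFrequentCounter {
    // Assumed index order of the 20 x 20 score matrix.
    static final String AA = "ARNDCQEGHILKMFPSTWYV";

    /** Returns {rare, frequent} for one aligned pair of sequences. */
    static int[] count(String seqA, String seqB, double[][] scores, double cutoff) {
        if (seqA.length() != seqB.length()) {
            throw new IllegalArgumentException("sequences must be aligned (equal length)");
        }
        int rare = 0;
        int frequent = 0;
        for (int k = 0; k < seqA.length(); k++) {
            int i = AA.indexOf(Character.toUpperCase(seqA.charAt(k)));
            int j = AA.indexOf(Character.toUpperCase(seqB.charAt(k)));
            if (i < 0 || j < 0 || i == j) {
                continue; // gap, unknown residue, or identical residues: not a replacement
            }
            if (scores[i][j] > cutoff) {
                frequent++; // substitution value above the cutoff
            } else {
                rare++;     // substitution value equal to or below the cutoff
            }
        }
        return new int[] {rare, frequent};
    }
}
```

Summing these counts over all sequence pairs and plotting them against the pairwise 
evolutionary distance gives the kind of saturation plot the program displays.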

To use the program, the user has to provide the following:
1. A Java runtime environment (1.4 or later)
2. Substitution matrices (should be included in the zip file)
3. A sequence alignment (Fasta, Clustal, PHYLIP or Treecon format)
4. A graphical environment (Windows, GNOME, KDE, etc.)

To run the program from the command line, type: "java -jar Asatura.jar"
If you run into java.lang.OutOfMemoryError (which can happen with LARGE alignments), 
try "java -Xmx300m -jar Asatura.jar". This raises the maximum heap size the Java 
virtual machine may allocate to this application to 300 megabytes (increase the 
value as needed).

Under Windows: double-click the icon; if that doesn't work, try the command line 
or check your Java installation.

Once the program has read in an alignment and a substitution matrix, the user needs to:
(1) Select a distance correction method. The options are: no correction 
    (dissimilarity = distance, also called p-distance), Poisson correction, Kimura 
    correction and Tajima-Nei correction.
(2) Decide whether to create bootstrap replicates and, if so, how many.
(3) Press the "Make it so" button, which starts generating the bootstrap replicates 
    and distance matrices.
(4) Once the program has finished, select "Draw Tree". This reads in the distance 
    matrices (one for the alignment and one for each bootstrap replicate) and 
    calculates a phylogenetic tree from them by the neighbor-joining method.
(5) Press "view tree" and take a look at your tree.
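Steps (1) to (3) can be sketched in Java using the standard textbook formulas. This is 
an illustration under assumptions, not AsaturA's implementation: the names are 
hypothetical, p is the observed proportion of differing sites (how AsaturA derives p 
from the rare substitutions alone is not described here), the Kimura variant shown is 
Kimura's empirical approximation for proteins, and the Tajima-Nei correction is omitted. 
A bootstrap replicate is drawn by resampling alignment columns with replacement.

```java
import java.util.Random;

/** Hypothetical sketch of distance corrections and bootstrap resampling. */
final class DistanceSketch {
    /** No correction: the distance is the observed proportion p itself. */
    static double pDistance(double p) {
        return p;
    }

    /** Poisson correction: d = -ln(1 - p); undefined for p >= 1. */
    static double poisson(double p) {
        double arg = 1.0 - p;
        if (arg <= 0.0) {
            throw new ArithmeticException("Poisson correction undefined for p = " + p);
        }
        return -Math.log(arg);
    }

    /** Kimura's protein approximation: d = -ln(1 - p - 0.2 p^2). */
    static double kimura(double p) {
        double arg = 1.0 - p - 0.2 * p * p;
        if (arg <= 0.0) {
            throw new ArithmeticException("Kimura correction undefined for p = " + p);
        }
        return -Math.log(arg);
    }

    /** One bootstrap replicate: draw alignment columns with replacement. */
    static String[] bootstrap(String[] alignment, Random rng) {
        int len = alignment[0].length();
        int[] cols = new int[len];
        for (int c = 0; c < len; c++) {
            cols[c] = rng.nextInt(len); // the same column choice is applied to every sequence
        }
        String[] replicate = new String[alignment.length];
        for (int s = 0; s < alignment.length; s++) {
            StringBuilder sb = new StringBuilder(len);
            for (int c : cols) {
                sb.append(alignment[s].charAt(c));
            }
            replicate[s] = sb.toString();
        }
        return replicate;
    }
}
```

The exceptions thrown when the logarithm's argument is not positive correspond to the 
situation described under "Log Error" in the Errors section below. Each replicate would 
then get its own distance matrix, as in step (3).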


This is the simplest way to use AsaturA. For more adventurous users, the following 
options are available.

- Choose species: pressing that button will pop up a window in which you can 
include/exclude sequences from analysis.

- Show substitution sites: shows a plot of the number of observed changes (rare and 
frequent) at each alignment position.
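A per-position count of this kind can be sketched as follows. The sketch assumes that 
"observed changes" at a site means pairwise differences summed over all sequence pairs; 
AsaturA may count them differently. The names and the matrix index order 
"ARNDCQEGHILKMFPSTWYV" are hypothetical.

```java
/** Hypothetical sketch: rare and frequent changes per alignment column. */
final class SiteCounter {
    static final String AA = "ARNDCQEGHILKMFPSTWYV"; // assumed matrix index order

    /** Returns counts[column][0] = rare, counts[column][1] = frequent. */
    static int[][] perSite(String[] alignment, double[][] scores, double cutoff) {
        int len = alignment[0].length();
        int[][] counts = new int[len][2];
        for (int a = 0; a < alignment.length; a++) {
            for (int b = a + 1; b < alignment.length; b++) { // every unordered pair once
                for (int k = 0; k < len; k++) {
                    int i = AA.indexOf(Character.toUpperCase(alignment[a].charAt(k)));
                    int j = AA.indexOf(Character.toUpperCase(alignment[b].charAt(k)));
                    if (i < 0 || j < 0 || i == j) {
                        continue; // gaps, unknown residues and identities are not changes
                    }
                    counts[k][scores[i][j] > cutoff ? 1 : 0]++;
                }
            }
        }
        return counts;
    }
}
```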

- Extra utilities: These were added later to remedy some problems caused by differences 
between substitution matrices: some give substitution values for equal amino acid 
frequencies, others already incorporate generalized frequencies. Changes made here 
should not influence the resulting tree much, unless you are working with alignments 
that have highly skewed amino acid frequencies (in that case, select your matrix VERY 
carefully).

Show AA Frequencies: shows the frequencies of the different amino acids in your 
alignment.
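Amino acid frequencies of this kind can be computed by counting residues over the whole 
alignment. In this hypothetical sketch, gaps and non-standard symbols are ignored and the 
frequencies are relative to the total number of standard residues; AsaturA's exact 
treatment of gaps is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: relative amino acid frequencies in an alignment. */
final class AaFrequencies {
    static final String AA = "ARNDCQEGHILKMFPSTWYV";

    static Map<Character, Double> frequencies(String[] alignment) {
        long[] counts = new long[AA.length()];
        long total = 0;
        for (String seq : alignment) {
            for (char ch : seq.toCharArray()) {
                int idx = AA.indexOf(Character.toUpperCase(ch));
                if (idx >= 0) { // skip gaps and non-standard symbols
                    counts[idx]++;
                    total++;
                }
            }
        }
        Map<Character, Double> freq = new LinkedHashMap<>(); // keeps the AA order
        for (int i = 0; i < AA.length(); i++) {
            freq.put(AA.charAt(i), total == 0 ? 0.0 : (double) counts[i] / total);
        }
        return freq;
    }
}
```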

Show AA Mutabilities: a measure of how readily each residue is replaced, based on your 
alignment (M. Dayhoff's definition).
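The idea behind relative mutability can be sketched as "changes involving a residue 
divided by that residue's occurrences" over all pairwise comparisons. This is a 
simplified, hypothetical version: Dayhoff's original procedure additionally weights each 
comparison by its evolutionary distance and scales the result so that alanine = 100, and 
neither step is shown here. It is not AsaturA's code.

```java
/** Hypothetical, simplified relative mutability per residue. */
final class Mutability {
    static final String AA = "ARNDCQEGHILKMFPSTWYV";

    static double[] mutabilities(String[] alignment) {
        int n = AA.length();
        double[] changes = new double[n];  // times a residue took part in a change
        double[] exposure = new double[n]; // times a residue occurred in a compared site pair
        for (int a = 0; a < alignment.length; a++) {
            for (int b = a + 1; b < alignment.length; b++) {
                for (int k = 0; k < alignment[a].length(); k++) {
                    int i = AA.indexOf(Character.toUpperCase(alignment[a].charAt(k)));
                    int j = AA.indexOf(Character.toUpperCase(alignment[b].charAt(k)));
                    if (i < 0 || j < 0) {
                        continue; // skip gaps and unknown residues
                    }
                    exposure[i]++;
                    exposure[j]++;
                    if (i != j) {
                        changes[i]++;
                        changes[j]++;
                    }
                }
            }
        }
        double[] m = new double[n];
        for (int r = 0; r < n; r++) {
            m[r] = exposure[r] == 0 ? 0.0 : changes[r] / exposure[r];
        }
        return m;
    }
}
```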

Compute Mutation Probability Matrix: If you loaded a symmetrical matrix (i.e. one whose 
values are independent of actual amino acid frequencies), this lets you incorporate the 
amino acid frequencies and mutabilities specific to your alignment, so that the matrix 
hopefully better reflects the evolutionary constraints the alignment was under.
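For orientation, Dayhoff's classical mutation probability matrix is defined below, where 
M_ij is the probability that original amino acid j is replaced by i, m_j is the relative 
mutability of j, A_ij are Dayhoff's accepted point mutation counts, lambda is a scaling 
constant and f_i is the frequency of amino acid i. This README does not say which formula 
AsaturA uses or how it maps a loaded symmetrical matrix onto A_ij, so treat this as 
background rather than as the program's implementation.

```latex
\begin{aligned}
M_{ij} &= \frac{\lambda\, m_j\, A_{ij}}{\sum_{k \ne j} A_{kj}} \quad (i \ne j),
\qquad M_{jj} = 1 - \lambda\, m_j, \\
S_{ij} &= 10 \log_{10} \frac{M_{ij}}{f_i} \quad \text{(log-odds score)}
\end{aligned}
```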

Show Base Matrix: shows the original matrix.

Update main program matrix: this will feed the new matrix generated by "Compute Mutation 
Probability Matrix" to the main program. If this is not done, the main program will 
use the matrix originally loaded from file to do its computations.

Save frequencies and mutabilities / Save Matrix: saves your alignment's amino acid 
frequencies and mutabilities, or the customized substitution matrix, to a file.

If you have any problems running or installing the program, please contact Yves Van de Peer 
(yvdp@gengenp.rug.ac.be) or Tancred Frickey (tancred.frickey@tuebingen.mpg.de)


Errors:

Log Error between Sequences X and Y: This indicates that two of your sequences are more 
dissimilar than the distance correction model can handle.

In this case, either use a different model to correct for multiple mutations and/or 
remove the most distantly related sequences (and check your alignment for frame shifts 
or the like).
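Assuming AsaturA uses the standard forms of these corrections (the README does not give 
them), the limits at which the logarithm becomes undefined follow directly from the 
formulas, with p the observed proportion of differences between two sequences. The 
uncorrected p-distance never fails; Tajima-Nei-type corrections of the form 
d = -b ln(1 - p/b) fail once p reaches b.

```latex
\begin{aligned}
\text{Poisson:} \quad & d = -\ln(1 - p), && \text{defined only for } p < 1 \\
\text{Kimura:} \quad & d = -\ln(1 - p - 0.2\,p^2), && \text{defined only for }
  p < \frac{\sqrt{1.8} - 1}{0.4} \approx 0.854 \\
\text{Tajima-Nei:} \quad & d = -b \ln\!\left(1 - \frac{p}{b}\right), && \text{defined only for } p < b
\end{aligned}
```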
  
Error deleting File-x or Error while writing to outfile!: Check that you have read/write 
permissions for that folder/file.

Alignment is not recognizable as either Fasta,Clustal,Phylip,Treecon: Make sure that your 
alignment is in one of those formats, and if it is, try avoiding special characters in 
the sequence names.
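A quick way to see which format a file looks like is to inspect its first non-blank line. 
The following is a hypothetical heuristic, not AsaturA's detection logic: it recognizes 
Fasta (a line starting with ">"), Clustal (a "CLUSTAL" header) and PHYLIP (a header with 
two integers, the number of sequences and the alignment length); Treecon is not handled.

```java
import java.util.List;

/** Hypothetical heuristic for guessing an alignment's format. */
final class FormatSniffer {
    enum Format { FASTA, CLUSTAL, PHYLIP, UNKNOWN }

    static Format guess(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // only the first non-blank line is inspected
            }
            if (line.startsWith(">")) {
                return Format.FASTA;
            }
            if (line.toUpperCase().startsWith("CLUSTAL")) {
                return Format.CLUSTAL;
            }
            if (line.matches("\\d+\\s+\\d+.*")) {
                return Format.PHYLIP; // "<number of sequences> <alignment length>"
            }
            return Format.UNKNOWN;
        }
        return Format.UNKNOWN;
    }
}
```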

Error Reading Matrix: The matrix file should have the following format:
1st line: a tab or space, followed by the 20 amino acids in alphabetical order (3-letter 
code), separated by tabs.
Following lines: the amino acid name, followed by the corresponding substitution values 
(separated by tabs or spaces).
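A reader for the format described above might look like the following sketch. The names 
are hypothetical, and it makes two assumptions beyond the description: the rows appear in 
the same order as the header columns, and blank lines may be skipped.

```java
import java.util.List;

/** Hypothetical reader for the substitution matrix format described above. */
final class MatrixReader {
    /** Residue names in header order plus the 20 x 20 score table. */
    static final class Matrix {
        final List<String> names;
        final double[][] scores;

        Matrix(List<String> names, double[][] scores) {
            this.names = names;
            this.scores = scores;
        }
    }

    static Matrix parse(List<String> lines) {
        // Header: leading tab/space, then the 20 three-letter codes.
        String[] header = lines.get(0).trim().split("\\s+");
        if (header.length != 20) {
            throw new IllegalArgumentException("expected 20 amino acids in header, found " + header.length);
        }
        double[][] scores = new double[20][20];
        int row = 0;
        for (int l = 1; l < lines.size(); l++) {
            String line = lines.get(l).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+"); // tabs or spaces
            if (row >= 20 || fields.length != 21 || !fields[0].equalsIgnoreCase(header[row])) {
                throw new IllegalArgumentException("bad matrix row at line " + (l + 1));
            }
            for (int c = 0; c < 20; c++) {
                scores[row][c] = Double.parseDouble(fields[c + 1]); // NumberFormatException on junk
            }
            row++;
        }
        if (row != 20) {
            throw new IllegalArgumentException("expected 20 rows, found " + row);
        }
        return new Matrix(List.of(header), scores);
    }
}
```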