README file for AsaturA

This program was developed as part of the diploma thesis of Tancred Frickey (at the University of Konstanz, Germany), supervised by Yves Van de Peer and Axel Meyer. Most of the program should be self-explanatory (we think), but for those who do run into problems, or genuinely like manuals, we have provided this text file, which should hopefully alleviate some of them.

The aim of the work was to develop a software tool that lets one estimate the level of mutational saturation present in a set of aligned amino acid (aa) sequences. Trees can then be inferred from the unsaturated fraction only.

Basics:

AsaturA displays a plot of observed sequence changes over assumed evolutionary distance (see Van de Peer et al. 2002, Gene 295, 205-211). Visualizing the amount of saturation present is meant to facilitate the selection of an appropriate cutoff value.

A cutoff value is a number on the same scale as the scores in the substitution matrix. All amino acid replacements with substitution values (as read from the substitution matrix) above the cutoff are defined as frequent; those with substitution values equal to or below the cutoff are termed rare. The plot then shows the number of rare and of frequent changes over the assumed evolutionary distance.

Careful selection of the cutoff value generally makes it possible to divide a mutationally saturated dataset into two fractions: one that is highly saturated (increased evolutionary distance does not increase the number of observed changes) and one that displays almost clocklike behavior (linear correlation of evolutionary distance to observable changes). Once a cutoff value has been set, a distance matrix and a neighbor-joining tree are calculated based only on the rare substitutions.

To use the program, the user has to provide the following:

1. Java runtime environment (1.4 or later)
2. Substitution matrices (should be included in the zipfile)
3. A sequence alignment (Fasta, Clustal, PHYLIP or Treecon format)
4. A graphical interface (Windows, GNOME, KDE, etc.)

To run the program from the command line, try typing:

    java -jar Asatura.jar

If you run into java.lang.OutOfMemoryError (this might happen for LARGE alignments), try:

    java -Xmx300m -jar Asatura.jar

This increases the maximum memory you are willing to allocate to this application to 300 megabytes (increase the value as needed).

Under Windows: double-click the icon. If that doesn't work, try the command line or check your Java installation.

Once the program has read in an alignment and a substitution matrix, the user needs to:

(1) Select a distance correction method. The options are: no correction (dissimilarity = distance, also called p-distance), Poisson correction, Kimura correction and Tajima-Nei correction.
(2) Decide whether to create bootstrap replicates and, if yes, how many.
(3) Push the "Make it so" button, which starts generating bootstrap replicates and distance matrices.
(4) Once the program is finished, select "Draw Tree". This reads in the distance matrices (one for the alignment and one for each bootstrap replicate) and calculates a phylogenetic tree from that information (by the neighbor-joining method).
(5) Press "view tree" and take a look at your tree.

This is the simplest way to use AsaturA. For more adventurous users the following options are available:

- Choose species: pressing this button pops up a window in which you can include sequences in, or exclude them from, the analysis.
- Show substitution sites: shows a plot of the number of observed changes (rare and frequent) for each alignment position.
- Extra utilities: these were added a bit later to remedy some problems caused by differences between substitution matrices. Some matrices give substitution values for equal amino acid frequencies, others already incorporate generalized frequencies.
  Anything you change here should not influence the resulting tree too much, unless you are working with alignments with highly skewed amino acid frequencies (in that case, select the matrix you use VERY carefully).

  - Show AA Frequencies: shows the frequencies of the different amino acids in your alignment.
  - Show AA Mutabilities: shows a measure of how invariant the various residues are, based on your alignment (as defined by M. Dayhoff).
  - Compute Mutation Probability Matrix: if you loaded a symmetrical matrix (i.e. the values are independent of actual amino acid frequencies), this lets you incorporate information about the AA frequencies and mutabilities specific to your alignment. The resulting matrix should hopefully better reflect the evolutionary constraints the alignment was under.
  - Show Base Matrix: shows the original matrix.
  - Update main program matrix: feeds the new matrix generated by "Compute Mutation Probability Matrix" to the main program. If this is not done, the main program will use the matrix originally loaded from file for its computations.
  - Save frequencies and mutabilities / Save Matrix: saves your alignment's amino acid frequencies and mutabilities, or the customized substitution matrix, to a file.

If you have any problems running or installing the program, please contact Yves Van de Peer (yvdp@gengenp.rug.ac.be) or Tancred Frickey (tancred.frickey@tuebingen.mpg.de).

Errors:

Log Error between Sequences X and Y:
  This indicates that you have two sequences that are more dissimilar than the distance correction model can handle. In this case, use a different model to correct for multiple mutations and/or remove the most distantly related sequences (and check your alignment for frame shifts or the like).
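As background to the Log Error above, here is a minimal Python sketch of why a logarithm-based correction can fail for very dissimilar sequences. It uses the textbook forms of the Poisson and Kimura corrections; we have not confirmed that these match AsaturA's internal code exactly, and the function names (p_distance, poisson_distance, kimura_distance) are made up for this illustration. The Tajima-Nei correction is left out.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing positions between two aligned sequences.

    Positions where either sequence has a gap are skipped (an assumption
    made for this sketch, not necessarily what AsaturA does).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def poisson_distance(p):
    """Textbook Poisson correction: d = -ln(1 - p)."""
    arg = 1.0 - p
    if arg <= 0.0:
        # The logarithm is undefined here: the sequences are too dissimilar.
        raise ValueError("log error: p is too large for the Poisson correction")
    return -math.log(arg)


def kimura_distance(p):
    """Textbook Kimura (1983) correction: d = -ln(1 - p - 0.2 * p^2)."""
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        # Reached for p above roughly 0.854, earlier than for Poisson.
        raise ValueError("log error: p is too large for the Kimura correction")
    return -math.log(arg)


if __name__ == "__main__":
    p = 0.9  # a very dissimilar pair of sequences
    print("Poisson:", poisson_distance(p))
    try:
        print("Kimura:", kimura_distance(p))
    except ValueError as err:
        print("Kimura:", err)
```

In this sketch a dissimilarity of 0.9 can still be corrected with the Poisson formula but not with the Kimura formula, which is the kind of situation where switching the correction model, as suggested above, helps.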
Error deleting File-x or Error while writing to outfile!:
  Check that you have read/write permissions for that folder/file.

Alignment is not recognizable as either Fasta,Clustal,Phylip,Treecon:
  Make sure that your alignment is in one of those formats. If it is, try avoiding special characters in the sequence names.

Error Reading Matrix:
  The format of the matrix should be:
  1st line: a tab or space, followed by the 20 amino acids in alphabetical order (3-letter code), separated by tabs
  following lines: the AA name, followed by the corresponding substitution values (tab or space separated)
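For those who want to check a matrix file before loading it, the layout above can be sketched in Python as follows. This is an illustration under stated assumptions, not AsaturA's own reader: the function names (format_matrix, parse_matrix, classify) are hypothetical, the spelling "Ala", "Arg", ... for the 3-letter codes is assumed, and the header check ignores upper/lower case, which the program itself may not. The classify function applies the cutoff rule from the Basics section (above the cutoff = frequent, equal or below = rare).

```python
# The 20 amino acids in alphabetical order of their 3-letter codes
# (assumed spelling; the program may expect different capitalization).
AMINO_ACIDS = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
]


def format_matrix(score):
    """Write a matrix in the described layout.

    score is a callable (row_aa, col_aa) -> number.
    """
    lines = ["\t" + "\t".join(AMINO_ACIDS)]  # 1st line starts with a tab
    for row in AMINO_ACIDS:
        values = "\t".join(str(score(row, col)) for col in AMINO_ACIDS)
        lines.append(row + "\t" + values)
    return "\n".join(lines) + "\n"


def parse_matrix(text):
    """Parse the described layout into a dict {(row_aa, col_aa): float}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty matrix")
    if lines[0][0] not in (" ", "\t"):
        raise ValueError("1st line must start with a tab or space")
    header = [name.capitalize() for name in lines[0].split()]
    if header != AMINO_ACIDS:
        raise ValueError("1st line must list the 20 amino acids in order")
    if len(lines) - 1 != len(AMINO_ACIDS):
        raise ValueError("expected one row per amino acid")
    scores = {}
    for ln in lines[1:]:
        fields = ln.split()  # accepts tabs or spaces as separators
        row, values = fields[0].capitalize(), fields[1:]
        if row not in AMINO_ACIDS or len(values) != len(header):
            raise ValueError("malformed row: " + ln)
        for col, value in zip(header, values):
            scores[(row, col)] = float(value)
    return scores


def classify(score, cutoff):
    """Cutoff rule from the Basics section."""
    return "frequent" if score > cutoff else "rare"


if __name__ == "__main__":
    # Toy scores for illustration only, not a real substitution matrix.
    text = format_matrix(lambda a, b: 2 if a == b else -1)
    matrix = parse_matrix(text)
    print(classify(matrix[("Ala", "Arg")], cutoff=-1))
```

A file that fails parse_matrix in this sketch will not necessarily fail in AsaturA, and vice versa, but a missing leading tab or a misordered header line is a good first thing to look for when "Error Reading Matrix" appears.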