Functional Categories
  • Description of the procedure

    We assigned functional categories (funcats) to each of the 906 significantly differentially expressed genes, based on each gene's best-hit homologies to genes whose funcats are known to MIPS. By comparing the number of hits in each major funcat group across the clusters, we get an indication of which types of genes are activated at which time. Below we show the workflow of the procedure, followed by a textual description.

    For each known Arabidopsis thaliana protein, MIPS has calculated a list of homologous proteins together with their e-values. For the ten most homologous proteins, the known functional categories were collected. These best-hit funcats for each Arabidopsis thaliana protein were made available in the MIPS MAtDB database. We downloaded the MAtDB dump (version v310103), parsed it into a locally stored database, and wrote queries against that database.

    One important query takes a gene's list of assigned funcats and cuts off the sublevels of the categorization, so that only the major-level funcats remain. The query then removes any duplicates that arise in this main-level funcat list. The result is a list of unique main-level functional categories for each At-gene.
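
    The truncate-and-deduplicate step can be illustrated with a minimal Python sketch. The original was a database query, so this only mirrors its logic. It assumes that funcat codes are dot-separated hierarchical strings such as "01.05.01", where the part before the first dot is the main level; the function name and the sample codes are hypothetical.

```python
def main_level_funcats(funcats):
    """Reduce full funcat codes to a sorted list of unique main-level codes.

    Assumption: codes are dot-separated, e.g. "01.05.01", and the first
    component ("01") is the main-level category.
    """
    # Keep only the part before the first dot; the set removes duplicates.
    main_levels = {code.split(".")[0] for code in funcats if code}
    return sorted(main_levels)


# Hypothetical funcat list for a single gene.
example = ["01.05.01", "01.03", "30.01", "30"]
print(main_level_funcats(example))  # ['01', '30']
```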

    Next, we translated the spot numbers of the 11 clusters to MIPS AT-codes (a few spots for which MIPS has no AT-code had to be left out). With these AT-codes we ran the query described above for all the genes in a cluster and counted how many hits each main-level funcat received, summed over all the genes of that cluster.
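
    The per-cluster counting can be sketched as follows. This is an illustration in Python rather than the database query actually used. The mapping from AT-code to unique main-level funcats is assumed to be available already, and the gene identifiers and funcat codes are made-up placeholders.

```python
from collections import Counter


def count_main_funcats(cluster_genes, gene_to_main_funcats):
    """Sum main-level funcat hits over all genes of one cluster.

    cluster_genes: iterable of AT-codes belonging to the cluster.
    gene_to_main_funcats: dict mapping an AT-code to its list of unique
        main-level funcats (hypothetical, precomputed structure).
    """
    counts = Counter()
    for at_code in cluster_genes:
        # Genes without any known funcat contribute no hits.
        counts.update(gene_to_main_funcats.get(at_code, []))
    return counts


# Placeholder data, not taken from the experiment.
gene_to_main_funcats = {
    "geneA": ["01", "30"],
    "geneB": ["01"],
}
counts = count_main_funcats(["geneA", "geneB", "geneC"], gene_to_main_funcats)
print(dict(counts))
```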

    We then wanted to know whether the distribution of main-level funcats in a cluster differs significantly from what one would expect given the distribution of funcats over all the genes on the microarray. We therefore ran the same query again, this time for all the genes on the microarray together.

    Finally, we compared the observed main-funcat distribution of each cluster to the distribution of all the genes on the microarray. For this comparison we used the chi-square statistic, explained by the figure below.
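
    As a rough illustration of such a comparison, the sketch below computes a per-funcat chi-square term in Python. It assumes the common form (observed - expected)^2 / expected, with the expected count obtained by scaling the whole-array funcat fractions to the total number of hits in the cluster. The exact formula used in the original analysis may differ, and all names and numbers here are hypothetical.

```python
def chi_square_per_funcat(cluster_counts, array_counts):
    """Return {funcat: (observed, expected, chi_square)} for one cluster.

    Assumption: expected = cluster total hits * (array hits / array total),
    and chi_square = (observed - expected)**2 / expected per funcat.
    """
    cluster_total = sum(cluster_counts.values())
    array_total = sum(array_counts.values())
    result = {}
    for funcat, array_hits in array_counts.items():
        if array_hits == 0:
            continue  # no expectation can be formed for an empty category
        expected = cluster_total * array_hits / array_total
        observed = cluster_counts.get(funcat, 0)
        chi2 = (observed - expected) ** 2 / expected
        result[funcat] = (observed, expected, chi2)
    return result


# Made-up counts: funcat "01" is over-represented in the cluster.
array_counts = {"01": 100, "30": 300}
cluster_counts = {"01": 20, "30": 20}
stats = chi_square_per_funcat(cluster_counts, array_counts)
for funcat, (obs, exp, chi2) in sorted(stats.items()):
    print(funcat, obs, round(exp, 2), round(chi2, 2))
```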

    The chi-square value can be converted to a statistical significance: the probability that a deviation at least as large as the observed one would arise by chance alone. We used a cutoff of 5%, corresponding to a chi-square of 3.84. We also checked whether a slightly less stringent cutoff of chi-square = 3.00 (corresponding to a probability of 8.3%) would still yield interesting information. [Calculated with this chi-square ~ chance converter]
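
    The two cutoffs can be checked with a few lines of Python. The sketch assumes one degree of freedom, which is what the pairs 3.84 ~ 5% and 3.00 ~ 8.3% imply; for that case the upper-tail probability equals erfc(sqrt(chi-square / 2)). The function name is hypothetical and this is not the converter referred to above.

```python
import math


def chi_square_to_p(chi2):
    """Upper-tail probability of a chi-square value, 1 degree of freedom.

    Assumption: one degree of freedom, so that
    P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


for cutoff in (3.84, 3.00):
    print(cutoff, round(chi_square_to_p(cutoff), 3))
```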

     
  • Presentation for all clusters
  •   Excel file with a worksheet for each cluster: Funcats.xls. (Netscape users: please right-click and select "Save link as")
     
  • Funcat graphs for important clusters:
  •   EXP = expected number of homology-hits, used as the reference for the chi-square statistic
    CLUx = actual number of homology-hits in cluster x
    red arrow = more than expected
    green arrow = fewer than expected
    thick arrow = significant at the 5% level (chi-square >= 3.84)
    thin arrow = significant at the 8.3% level (3.00 <= chi-square < 3.84)
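
    The legend rules can be written down as a small Python helper. This is only a sketch of how the arrow annotations follow from the numbers; the function name and return format are hypothetical, and the graphs themselves were not necessarily produced this way.

```python
def legend_arrow(observed, expected, chi2):
    """Map one funcat's numbers to the arrow described in the legend.

    Returns (colour, thickness), or None when chi-square is below 3.00
    and no arrow is drawn.
    """
    if chi2 >= 3.84:
        thickness = "thick"  # 5% level
    elif chi2 >= 3.00:
        thickness = "thin"   # 8.3% level
    else:
        return None
    # Red marks more hits than expected, green marks fewer.
    colour = "red" if observed > expected else "green"
    return colour, thickness


print(legend_arrow(20, 10.0, 10.0))
print(legend_arrow(20, 30.0, 3.33))
print(legend_arrow(11, 10.0, 0.1))
```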