GENOPLANTE™ S.P.A.D.S. (Specific Primers & Amplicon Design Software)

Please send comments, suggestions and bug reports to: spads@gengenp.rug.ac.be
Note: For high-throughput design, please contact spads@gengenp.rug.ac.be to obtain the GENOPLANTE™ SPADS distribution


Your email :  required
Amplicon(s) Id : 

Input data

- Sequence data
Download sequence file (*) : 
or Cut&Paste here (FASTA / GenBank format) : 

- Localisation data (gene and exon coordinates)
No coordinates (whole input sequence for design) (unselect this option if you want to use the following gene structure options)
or Download genes annotation file (*) (GenBank format): 
or Cut&Paste annotations (GenBank format) here. Go here for an example 

and select the structure to consider : CDS / mRNA


(*) For complete GenBank entries as input, upload the GenBank file twice: once for the sequence and once for the gene features


GENOPLANTE™ SPADS parameters table


Amplicon in exon
Amplicon overlapping intron Percentage of intronic sequence allowed

- Amplicon design parameters
No amplicon specificity required 
or Amplicon Specificity (Reference db) Amplicon specificity (%)
Amplicon Size Min: Opt: Max: 

- Primer design parameters
Primer Specificity (template db)
Primer Size Min:  Opt:  Max: 
Primer Tm Min:  Opt:  Max:  Max Tm Difference:
Product Tm Min:  Opt:  Max: 
Primer GC% Min:  Opt:  Max: 
Max Self Complementarity:   Max 3' Self Complementarity:  
Max #N's:   Max Poly-X:  
Salt Concentration:   Annealing Oligo Concentration:  
CG Clamp:  
Primer designed on phase
Liberal base


A few explanations about GENOPLANTE™ SPADS

FUNCTION

GENOPLANTE™ SPADS tries to design a unique GST (Gene Sequence Tag), and a specific primer set for its PCR amplification, on a genomic sequence whose gene structure is known.

DESCRIPTION

In transcriptome approaches, probes corresponding to individual members of multigenic families often lack specificity and are responsible for molecular cross-hybridization events. GENOPLANTE™ SPADS uses a strategy to optimize the selection of specific probes, or unique GSTs (Gene Sequence Tags), in a poorly conserved region of each gene in a given genome. It can be applied when the genome considered is fully sequenced or when the gene family involved is exhaustively characterized. Starting from a gene whose intron-exon structure is known (or predicted), the program selects, by comparison with the whole genome, the least conserved region(s) and designs a primer pair specific to that region for its amplification from genomic DNA.

GENOPLANTE™ SPADS algorithm

GENOPLANTE™ SPADS has been developed in Perl and combines the BLASTn [1] and Primer3 [2] programs. The steps of the GENOPLANTE™ SPADS algorithm are the following:
1 - Search for the most specific region within each gene
Each exon is tested with BLASTn against the whole genome sequence. The regions where hits are found are removed, which leaves the specific regions in which primers can be designed. GSTs are searched for in the 3'->5' direction. If none is detected, the mismatch parameter of BLASTn is decreased so that only more stringent hits are found, which enlarges the gene-specific target regions available for primer design. Consequently, the specificity of the GST obtained will be lower.
2 - Primer design
The specific region is used as input for the Primer3 software, which designs primer pairs according to the chosen parameters and avoids primer-dimer and hairpin formation. All parameters (primer size, GC% and Tm, amplicon size) can be chosen by the user.
3 - Selection of specific primer pairs
All the primers designed by Primer3 are tested for specificity with BLASTn against the whole genome. They are excluded if any match indicates a potential unwanted PCR amplification.
4 - Analysis of amplicon specificity
Each amplicon is tested with BLASTn to determine its specificity. If the similarity percentage with putative paralogues is over 70%, the amplicon is removed and the next specific region is processed.
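
Step 1 of the algorithm (removing the regions where BLASTn hits are found) amounts to an interval subtraction. The following is only a minimal sketch in Python, not the program's own Perl code: the function names, the hit coordinates and the minimum size are hypothetical, and real hits would come from a BLASTn run.

```python
def subtract_hits(region, hits):
    """Return the sub-intervals of `region` not covered by any hit.

    Coordinates are 1-based and inclusive, as in GenBank feature locations.
    """
    start, end = region
    free = []
    cursor = start  # first position not yet known to be covered by a hit
    for h_start, h_end in sorted(hits):
        if h_end < cursor:
            continue  # hit lies entirely before the cursor
        if h_start > end:
            break  # hits are sorted, so the rest lie after the region
        if h_start > cursor:
            free.append((cursor, h_start - 1))
        cursor = max(cursor, h_end + 1)
    if cursor <= end:
        free.append((cursor, end))
    return free


def candidate_regions(region, hits, min_size):
    """Keep only the hit-free stretches long enough to hold an amplicon."""
    return [(s, e) for s, e in subtract_hits(region, hits) if e - s + 1 >= min_size]


exon = (3411, 3803)                  # an exon given by its coordinates
hits = [(3411, 3500), (3580, 3620)]  # hypothetical BLASTn hit coordinates
print(candidate_regions(exon, hits, min_size=100))
```

With these made-up hits, the only stretch of at least 100 bases left for primer design is 3621..3803.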

GENOPLANTE™ SPADS evaluation

Evaluation of the software is available in the GENOPLANTE™ SPADS poster [Download PDF] .

GENOPLANTE™ SPADS input format :

 

- sequence : fasta or GenBank format
example : gene sequence of the Arabidopsis thaliana predicted gene At1g01220 (exons are in red and introns in black in this figure, but this is not required for the input format)
>At1g01220 hypothetical protein
ATGTCTAAGCAGAGGAAGAAAGCTGACTTAGCCACCGTTTTGCGCAAGTCATGGTACCAC
TTAAGGCTCTCGGTGCGCCATCCCACTCGGGTCCCGACTTGGGATGCGATTGTGCTCACA
GCGGCTAGTCCTGAACAAGCGGAGCTCTACGACTGGCAGCTCCGGCGAGCGAAACGTATG
GGACGAATAGCTAGCTCCACTGTCACTTTGGCCGTTCCTGATCCAGATGGCAAACGGATC
GGGTCTGGTGCTGCTACTCTCAACGCCATTTATGCTCTCGCTCGTCATTATGAGAAATTG
GGTTTTGATCTTGGTCCCGAGGTAAACATTGTGTTGACAGGTTAGACTATTCATAATTTG
ACCTCACTGTATCTCTTGCTTGAGTTGATATCTGAATCATTACGGTAGTTGGTTTTGTTG
ATTAACTTTGTTAATTTGATGAATCTGGGATGTGCAATAAACTACATCTGTTTATATAGC
AGTGTGGTGTGATTTGATGTTGTATGTTAATAATCAACAGATGGAAGTTGCGAATGGTGC
TTGCAAATGGGTTAGATTCATCTCTGCAAAGCATGTATTGATGCTTCATGCTGGAGGTGA
CTCCAAAAGGGTTCCATGGGCAAATCCTATGGGCAAAGTATTCCTCCCACTTCCTTATCT
TGCAGCTGATGACCCTGATGGTCCTGTTCCTCTCCTTTTTGATCATATTCTTGCTATCGC
TTCATGTGCAAGACAAGCTTTCCAAGACCAAGGTGATATCCTTTTTTTAGCTATGTAAAA
CATACAACGGATGCTGATTTTTGAATTTTATTTGTGAAGGTGGATTATTTATTATGACTG
GAGACGTCCTTCCTTGTTTTGATGCTTTTAAAATGACTCTCCCTGAAGACGCAGCTTCCA
TAGTTACTGTGCCTATTACTCTCGATATTGCCTCCAACCATGGTGTTATTGTCACATCAA
AATCTGAGTCACTTGCTGAAAGCTATACAGTTAGTTTAGTCAATGATCTTCTGCAGAAGC
CTACAGTAGAGGATCTTGTCAAGAAAGATGCAATTTTACATGATGGACGGACACTCCTTG
ACACTGGGATAATATCTGCTAGGGGCAGAGCATGGTCGGACCTGGTCGCTCTTGGATGCT
CGTGCCAACCCATGATCTTAGAGCTTATAGGTAGTAAGAAAGAGGTAAGTTTCTCATATT
ATGCAGCTTATTCCTAAAGAATGATATTGTACTTCTTCAACAGACTTTGACCTCTTATAC
ATGTTTTGTAATTGAATGGGTTTTTGACTTTGCAGATGAGTTTGTATGAAGATTTGGTGG
CTGCTTGGGTTCCTTCAAGGCATGATTGGCTGCGAACCAGACCTTTGGGTGAACTTCTTG
TTAACAGTCTGGGGAGGCAAAAGATGTACAGCTACTGCACCTGTATGTTTGTACTGATTT
CAAGACTAGCTAAACTTAAAAAAAAAGAAATCGAGATTGCTATGCTTACTTTTCTAATCT
CTTTGTATCATCTTGTGTCAGATGATTTGCAGTTTTTGCATTTTGGAACATCAAGTGAGG
TATTGGATCATTTAAGCGGGGATGCTTCAGGAATTGTTGGTCGGAGACACTTATGTTCCA
TCCCTGCAACTACGGTTTCTGATATTGCAGCATCTTCCGTTATTTTGTCTAGTGAAATTG
CACCTGGTGTCTCCATTGGTGAAGATTCACTTATATATGATTCAACAGTTTCTGGTGCTG
TACAAATTGGTTCTCAGTCCATAGTTGTTGGTATTCACATCCCGAGCGAAGATCTTGGAA
CTCCAGAGAGTTTCAGGTTCATGCTTCCTGATAGGCATTGTCTTTGGGAGGTCCCACTAG
TGGGACATAAGGGAAGAGTGATTGTGTATTGTGGTCTCCATGACAATCCAAAGAACTCAA
TTCATAAAGATGGAACTTTTTGCGGTAAACCCTTGGAGAAGGTATTGTTTGATCTTGGCA
TTGAGGAAAGCGACCTCTGGAGCTCGTATGTTGCACAAGATAGATGTTTGTGGAATGCAA
AACTGTTCCCGATTCTTACGTATAGTGAAATGCTGAAGTTAGCGTCGTGGTTGATGGGTT
TAGATGATAGTAGAAACAAGGAGAAGATTAAGTTGTGGAGAAGCTCACAACGTGTAAGCT
TAGAAGAGTTGCATGGATCAATCAACTTTCCTGAGATGTGCAATGGTTCCAGCAATCATC
AAGCTGATCTTGCGGGTGGAATCGCTAAAGCATGTATGAACTATGGTATGCTTGGGCGTA
ATTTGTCTCAGCTGTGCCATGAGATTTTACAGAAAGAGTCATTAGGATTGGAAATATGCA
AGAATTTTCTGGATCAATGTCCCAAATTTCAGGAGCAGAACTCCAAAATTCTTCCAAAGA
GTCGAGCATACCAGGTAGAAGTTGATCTTCTTCGAGCATGTGGGGATGAAGCAAAAGCTA
TAGAGTTGGAGCATAAAGTATGGGGAGCAGTTGCAGAAGAAACTGCTTCAGCTGTGAGAT
ATGGTTTTAGAGGTAAAAATCTAGCCACCACCGTTTGGTATAACACCTTTCATAAACCTG
GATTTAACTCTTTTATTTGTTCTTCAGAACATCTGTTGGAATCAAGTGGCAAGTCTCATT
CTGAGAATCATATTTCTCATCCGGATCGAGTTTTTCAACCAAGAAGGACAAAAGTTGAAC
TACCAGTTCGGGTAGATTTTGTAGGAGGTTGGAGTGATACACCTCCATGGAGCTTAGAGC
GTGCAGGTTACGTCCTGAACATGGCTATAACCTTAGAAGGTTCACTTCCAATTGGCACAA
TCATTGAAACAACAAATCAGATGGGAATCTCAATCCAAGACGACGCTGGAAACGAGCTAC
ACATCGAAGATCCAATAAGCATTAAGACACCATTTGAAGTCAATGATCCATTCAGGCTTG
TTAAATCTGCTCTATTGGTAACCGGCATTGTCCAAGAAAATTTTGTTGACTCCACAGGGT
TAGCAATAAAGACATGGGCCAATGTTCCTCGTGGCAGTGGTCTAGGAACCTCGAGCATTC
TAGCTGCAGCTGTTGTGAAAGGACTTCTCCAGATATCTAATGGAGATGAAAGCAATGAAA
ACATTGCAAGACTTGTCTTGGTTCTGGAGCAACTCATGGGTACAGGAGGTGGCTGGCAAG
ATCAGATTGGTGGATTATATCCAGGAATCAAATTCACTTCAAGTTTTCCAGGAATCCCTA
TGCGTCTTCAAGTTGTTCCTTTACTCGCCTCGCCACAGCTAATTTCAGAGTTGGAGCAAC
GCCTCCTTGTTGTTTTCACGGGTCAAGTAAGTAGCAACCACTGAGAGGAAGAAAAGATTT
TTTGTTAGCTACAGAGTCTCATTCATTTTATGCCTTTTTTATATAAACAGGTCAGGCTAG
CTCATCAAGTCCTACACAAGGTCGTTACAAGGTATTTGCAAAGAGATAATCTCCTAATTT
CAAGCATTAAGCGATTGACGGAGCTGGCGAAATCCGGTAGAGAAGCGTTGATGAACTGTG
AAGTTGACGAGGTAGGCGACATAATGTCAGAAGCTTGGAGACTGCATCAAGAGCTGGATC
CGTATTGCAGCAATGAGTTTGTGGATAAGCTTTTTGAGTTTTCGCAACCTTATAGCTCAG
GATTCAAGCTGGTAGGTGCAGGTGGTGGTGGATTCTCACTTATATTGGCTAAGGACGCAG
AGAAAGCCAAGGAGTTAAGACAGAGATTGGAAGAACATGCAGAGTTTGATGTCAAAGTTT
ACAACTGGAGCATCTGTATTTGA

- gene structure format : tabulated format as shown below (a file containing many gene features can be given, and the program will try to design a GST for each gene)
example for At1g01220 (delimiting the exons shown in red in the gene sequence)

CDS join(1..321,521..752,820..1244,1296..1422,
1522..2535,2608..3326,3411..3803)

or
mRNA join(<1..321,521..752,820..1244,1296..1422,
1522..2535,2608..3326,3411..3803>)

Other gene feature formats could be added on request => email : spads@gengenp.rug.ac.be
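
To illustrate how the gene structure lines above translate into exon coordinates, here is a small Python sketch. It is an illustration only: `parse_feature` is a hypothetical helper, it handles only the `CDS join(...)` and `mRNA join(...)` forms shown above, and it simply ignores the `<` and `>` partial-end markers.

```python
import re


def parse_feature(text):
    """Parse a 'CDS join(...)' or 'mRNA join(...)' feature into (kind, exons)."""
    flat = "".join(text.split())  # drop line breaks and spaces
    match = re.fullmatch(r"(CDS|mRNA)join\((.*)\)", flat)
    if match is None:
        raise ValueError("unsupported feature: " + text)
    kind, body = match.groups()
    exons = []
    for part in body.split(","):
        # '<' and '>' mark partial ends in GenBank locations; ignored here
        start, end = part.strip("<>").split("..")
        exons.append((int(start), int(end)))
    return kind, exons


feature = """CDS join(1..321,521..752,820..1244,1296..1422,
1522..2535,2608..3326,3411..3803)"""
kind, exons = parse_feature(feature)
print(kind, len(exons), "exons")
print("coding length:", sum(end - start + 1 for start, end in exons))
```

For the example gene this gives 7 exons, and the last exon ends at position 3803, the last base of the example sequence.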
 

GENOPLANTE™ SPADS input parameters :

All parameters are listed in this table
No feature :
If the specific amplicon design is done on the whole sequence without taking into account the gene structure, select "No feature".

Amplicon specificity (reference db):
The BLAST database against which the amplicon specificity is determined. This can be the sequence of a whole genome, but also a whole gene family against which the amplicon must be specific. Contact the administrator to add more databases.

Amplicon specificity (%):
To prevent cross-hybridization events, the similarity between the amplicon sequence and its paralogues should be lower than 70% (default value). This value can be increased to relax the constraint (less specific GSTs), or decreased to obtain only highly specific amplicons. To obtain only highly specific amplicons, set the value to 40%.

Amplicon in exon :
GENOPLANTE™ SPADS tries to find a specific amplicon within an exon (no intronic sequence).

Amplicon overlapping intron :
The amplicon can be allowed to overlap introns. If this option is selected, GENOPLANTE™ SPADS will look for an amplicon overlapping an intron only if it does not find any amplicon in an exon.
If both "Amplicon in exon" and "Amplicon overlapping intron" are selected, GENOPLANTE™ SPADS will first try to find a specific amplicon in an exon, then one overlapping an intron.

Percentage of intronic sequence allowed :
If the option "Amplicon overlapping intron" is selected, the maximum percentage of intronic sequence within the amplicon can be set here.

Amplicon size :
The minimal, optimal, and maximal amplicon length can be set here.

Primer specificity db :
The BLAST database against which the specificity of the primer set is determined. The default is the database used for amplicon specificity. However, if the PCR template is not the whole genome (for instance a BAC), it can be advantageous to use a more appropriate BLAST database: this increases both the success rate and the speed of GENOPLANTE™ SPADS.
If you want to add a personal BLAST database, contact the administrator.

Primer size :
The minimal, optimal, and maximal primer length can be set here.

Primer Tm :
The minimal, optimal, and maximal primer Tm (nearest-neighbor calculation) can be set here. The maximum Tm difference between the two primers can also be set.

Primer GC :
The minimal, optimal, and maximal primer GC content can be set here.

Primer designed on phase :
When this option is selected, the primers are designed so that their 3' end base corresponds to the first base of a codon (according to the exon coordinates given, or assuming that the given sequence is in phase).
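
The phase condition can be checked from the exon coordinates alone. The Python sketch below is an illustration under stated assumptions, not the program's own code: `codon_position` is a hypothetical helper, the gene is assumed to lie on the plus strand with a coding sequence that starts in phase, and only the forward primer is considered.

```python
def codon_position(pos, exons):
    """Return 0, 1 or 2 for the codon position of genomic coordinate `pos`.

    `exons` are 1-based inclusive (start, end) pairs of a plus-strand CDS
    that starts in phase; 0 means "first base of a codon". Returns None
    when `pos` does not fall inside any exon.
    """
    offset = 0  # number of coding bases located before the current exon
    for start, end in exons:
        if start <= pos <= end:
            return (offset + pos - start) % 3
        offset += end - start + 1
    return None


# Exon coordinates of the At1g01220 example given in the input format section
EXONS = [(1, 321), (521, 752), (820, 1244), (1296, 1422),
         (1522, 2535), (2608, 3326), (3411, 3803)]

for position in (521, 3661, 400):
    print(position, codon_position(position, EXONS))
```

A forward primer is "on phase" in this sense when the function returns 0 for the coordinate of its 3' end base; position 400 lies in an intron, so it has no codon position.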

For long oligo design :
Because GENOPLANTE™ SPADS checks specificity throughout the probe design process, it can also be used to select unique long oligonucleotides (i.e. 50-80 bp) for DNA arrays. In this case, the risk of hairpin formation should be tested with a single-stranded DNA secondary structure prediction tool such as Mfold (Zuker et al., 1999).
To perform long oligo design, set the parameters as follows :
Amplicon size = oligo size
Primer size = (oligo size / 2) (up to the maximum of 35 allowed by Primer3)
Example for a 60mer : set amplicon size to 60, and primer size to 30
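
The rule above is simple arithmetic, sketched here in Python. The function name and the returned keys are hypothetical (they are not form field names), and rounding an odd oligo size down is an assumption.

```python
PRIMER3_MAX_PRIMER_SIZE = 35  # limit quoted in the text above


def long_oligo_settings(oligo_size):
    """Return the amplicon and primer sizes to enter for a long oligo."""
    # Integer division: rounding down for odd sizes is an assumption
    primer_size = min(oligo_size // 2, PRIMER3_MAX_PRIMER_SIZE)
    return {"amplicon_size": oligo_size, "primer_size": primer_size}


for size in (50, 60, 80):
    print(size, long_oligo_settings(size))
```

For an 80mer, half the oligo size (40) exceeds the limit, so the primer size is capped at 35.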

 

GENOPLANTE™ SPADS output format

Example :

>At1g01220_1    exon 7-7/7      strand +        type E1 (26.95%)        pos : 3'
                seq                               begin     end  length      Tm     %GC
PRIMER 5':      AGTTTTCGCAACCTTATAGCTCAGG          3637    3661      25   63.14   44.00
PRIMER 3':      TCAAATACAGATGCTCCAGTTGT            3803    3781      23   58.78   39.13
Amplicon        %GC=43  length 167      %intronic=0
AGTTTTCGCAACCTTATAGCTCAGGATTCAAGCTGGTAGGTGCAGGTGGTGGTGGATTCT
CACTTATATTGGCTAAGGACGCAGAGAAAGCCAAGGAGTTAAGACAGAGATTGGAAGAAC
ATGCAGAGTTTGATGTCAAAGTTTACAACTGGAGCATCTGTATTTGA
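
For downstream processing, a record like the one above can be read with two regular expressions. This Python sketch is an assumption about the layout based only on this single example (fields separated by whitespace); `parse_record` and the dictionary keys are hypothetical names, and all values are kept as strings.

```python
import re

HEADER = re.compile(
    r">(?P<name>\S+)\s+exon\s+(?P<first>\d+)-(?P<last>\d+)/(?P<total>\d+)\s+"
    r"strand\s+(?P<strand>[+-])\s+type\s+(?P<type>[EI][123])\s+"
    r"\((?P<identity>[\d.]+)%\)\s+pos\s*:\s*(?P<pos>\S+)"
)
PRIMER = re.compile(
    r"PRIMER\s+(?P<side>[35])':\s+(?P<seq>[A-Za-z]+)\s+(?P<begin>\d+)\s+"
    r"(?P<end>\d+)\s+(?P<length>\d+)\s+(?P<tm>[\d.]+)\s+(?P<gc>[\d.]+)"
)


def parse_record(text):
    """Extract the header fields and both primer lines from one record."""
    header = HEADER.search(text)
    if header is None:
        raise ValueError("no GST header line found")
    record = header.groupdict()
    # Index the primer lines by their side, "5" or "3"
    record["primers"] = {m["side"]: m.groupdict() for m in PRIMER.finditer(text)}
    return record


EXAMPLE = """>At1g01220_1    exon 7-7/7      strand +        type E1 (26.95%)        pos : 3'
                seq                               begin     end  length      Tm     %GC
PRIMER 5':      AGTTTTCGCAACCTTATAGCTCAGG          3637    3661      25   63.14   44.00
PRIMER 3':      TCAAATACAGATGCTCCAGTTGT            3803    3781      23   58.78   39.13
"""

record = parse_record(EXAMPLE)
print(record["name"], record["type"], record["identity"])
print(record["primers"]["5"]["seq"], record["primers"]["3"]["seq"])
```
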
 

exon
The exon in which the GST has been designed : [GST_begin] - [GST_end] / number of exons in the gene

strand
The orientation of the gene in the input sequence (plus/minus)

type
The GST type is a letter followed by a digit (for example E1, E2, I1 or I2). The letter 'E' means that the GST has been designed in an exon, 'I' that it overlaps an intron. The digit gives the specificity class:
'1' for highly specific : <40% identity with the closest paralogue in the BLAST database
'2' for medium specific : identity between 40% and 70%
'3' for low specific : identity over 70% (risk of cross-hybridization)
The estimated identity percentage is given in brackets.
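
The type code can be derived from two numbers, as in this minimal Python sketch. `gst_type` is a hypothetical helper, and treating exactly 40% and exactly 70% as class '2' is an assumption drawn from the wording "between 40% and 70%".

```python
def gst_type(identity_percent, intronic_percent=0.0):
    """Build the GST type code from identity and intronic percentages."""
    # 'I' as soon as the amplicon contains any intronic sequence
    letter = "I" if intronic_percent > 0 else "E"
    if identity_percent < 40:
        digit = "1"  # highly specific
    elif identity_percent <= 70:
        digit = "2"  # medium specific
    else:
        digit = "3"  # low specific, risk of cross-hybridization
    return letter + digit


print(gst_type(26.95))       # values of the example record above
print(gst_type(55.0, 12.0))  # hypothetical intron-overlapping GST
```

The first call uses the values of the example record (26.95% identity, no intronic sequence) and yields E1, as in the example header line.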

pos
The GST position in the transcript (5', center or 3')

seq (primer sequence)
The sequence of the selected primer, always 5'->3'

begin (primer begin)
The position of the 5' base of the primer

end (primer end)
The position of the 3' base of the primer

length (primer length)
The length of the primer

Tm (primer Tm)
The melting temperature of the primer (based on nearest-neighbor calculation)

%GC (primer %GC)
The percent of G or C bases in the primer
 

AMPLICON information

%GC content
The percent of G or C bases in the amplicon

amplicon length
The length of the amplicon

%intronic
The percent of intronic sequence (never over 50%), if the GST is overlapping an intron (type I)
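
Several of the reported values can be recomputed from the sequences themselves, which is a convenient sanity check. The Python sketch below uses the primers and amplicon of the example record; the helper names are hypothetical, and Tm is not recomputed because the nearest-neighbor parameters used by the program are not given here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G or C bases, rounded to two decimals."""
    gc = sum(base in "GCgc" for base in seq)
    return round(100.0 * gc / len(seq), 2)


forward = "AGTTTTCGCAACCTTATAGCTCAGG"  # PRIMER 5' of the example
reverse = "TCAAATACAGATGCTCCAGTTGT"    # PRIMER 3' of the example
amplicon = (
    "AGTTTTCGCAACCTTATAGCTCAGGATTCAAGCTGGTAGGTGCAGGTGGTGGTGGATTCT"
    "CACTTATATTGGCTAAGGACGCAGAGAAAGCCAAGGAGTTAAGACAGAGATTGGAAGAAC"
    "ATGCAGAGTTTGATGTCAAAGTTTACAACTGGAGCATCTGTATTTGA"
)

print(gc_percent(forward), gc_percent(reverse))
print(len(amplicon), 3803 - 3637 + 1)
# The amplicon starts with the 5' primer and ends with the reverse
# complement of the 3' primer, since both primers are written 5'->3'.
print(amplicon.startswith(forward), amplicon.endswith(reverse_complement(reverse)))
```

The recomputed primer %GC values (44.0 and 39.13) and the amplicon length (167) agree with the example record.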
 
 

AUTHORS

 
Vincent Thareau
URGV
2, rue Gaston Cremieux CP 5708,
F-91057 Evry cedex
France
vincent.thareau@evry.inra.fr
Patrice Déhais
INRA-AGENA
BP 27
F-31326 Castanet-Tolosan cedex
France
Patrice.Dehais@toulouse.inra.fr
Pierre Rouzé
Bioinformatics team leader
Dept. of Plant Genetics
KL Ledeganckstraat 35
9000 GENT
Belgium
pirou@gengenp.rug.ac.be
Sébastien Aubourg
Unité de Recherche en Génomique Végétale
INRA
Evry
France
aubourg@evry.inra.fr

LICENCE

Versions 1.1.4 and 1.2.0 of GENOPLANTE™ SPADS are available on CD-ROM after signing a licence agreement.
People in academic and non-profit organizations can obtain a free licence by clicking here.
If you work in a commercial organization, please contact us by email at the address: contactbioinf@genoplante.com.



Disclaimers and copyright

GENOPLANTE™ SPADS received financial support from the GENOPLANTE™ programme.

GENOPLANTE™ SPADS was filed at the Agence pour la Protection des Programmes (APP) under the number IDDN.FR.001.350016.000.D.P.2002.000.10000. This software, its constitutive components and its documentation, are the intellectual property of GENOPLANTE-VALOR. GENOPLANTE™ SPADS is copyright protected in France and abroad. Neither the reproduction, nor its modification or diffusion is permitted without prior written consent of GENOPLANTE-VALOR.
GENOPLANTE™ SPADS is furnished "as is" without warranties of any kind nor associated service. You assume all risks associated with the use of this software.


 
 