
This work is licensed under a Creative Commons License.
SiDE help
- Introduction and purpose
- Usage
- How long does it take
- Authors and Acknowledgements
- Before you start: If "The program doesn't work"
- How to cite SiDE
- Terms of use
- Privacy and Security
- Disclaimer
- References
Introduction and purpose
RNA interference (RNAi) is a sequence-specific post-transcriptional
gene-silencing process that constitutes a powerful new tool for analyzing
gene knockdown phenotypes (Fire et al., 1998). This mechanism is evolutionarily conserved
among eukaryotes and plays an essential role in mediating responses to
exogenous RNAs (such as viruses) and in stabilizing the genome by
sequestering repetitive sequences (such as transposons) (Hannon, 2002; Tijsterman et al.,
2002).
RNA silencing is triggered by double-stranded RNA molecules, which
are cleaved by the enzyme DICER into 21- to 23-nt duplexes containing
a 2-nt overhang at the 3' end of each strand. These duplexes are
incorporated into a protein complex called the RNA-induced silencing
complex (RISC). The RISC is responsible for the sequence-specific degradation
of target RNAs that contain homologous sequences. The resulting downmodulation
of the encoded proteins can subsequently lead to the induction of a
specific phenotype.
Here we describe a new tool for the automatic design of human and mouse
siRNA, SiDE (which stands for Small interfering RNA design), that applies
rules for pattern sequence selection and performs similarity searches in
order to minimise cross-hybridisation. In addition, the presence of genomic
features such as single nucleotide polymorphisms (SNPs) can be considered
in the design. The tool is designed for high throughput design of siRNAs,
which implies high specificity and capacity for dealing with a large number
of target genes.
The main objective of the tool is the high throughput design of siRNAs.
This requires a programme capable of processing a large number of genes
and producing lists of highly specific, putative siRNAs that also fulfil
a series of thermodynamic and empirical rules. The
novelty of SiDE is its high specificity in the detection of targets for
siRNA interference. Currently, SiDE can find siRNA candidates in human and mouse.
Specificity is probably the most important factor for the successful
selection of proper siRNA candidates. If the siRNA matches another gene,
silencing side effects cannot be ruled out, and no conclusions could be
drawn from the functional assay. The goal is a unique match in the target
gene, with the possibility of cross-hybridisation with any other gene
reduced to a minimum.
Specificity is usually assessed with a similarity search
algorithm such as BLAST.
We built the BLAST databases from the genes annotated in the latest
NCBI genome assembly build (in the current version, 34 for human and 32
for mouse). Sequences, unique for each gene, are obtained from Ensembl
(Birney et al., 2004) and formatted for BLAST use with the formatdb tool. Consequently, our
databases (one for human and one for mouse genes) contain a single copy
of each gene sequence.
Usage
SiRNA candidate selection and parameters for siRNA design
The starting point is a list of genes, which can be provided using different
identifiers. These currently include HUGO systematic names, RefSeq and LocusLink
IDs, Ensembl IDs, and EMBL, SwTrEMBL and SwissProt IDs.
ORF target:
This parameter defines the limits for searching candidate siRNAs with respect to the start and
stop codons. It is advisable to avoid the 5' and 3' UTRs as well as the
neighbourhood of the start and stop codons, because these regions are rich in
regulatory motifs, and UTR-binding proteins and translation initiation complexes
could interfere with siRNA binding. Within the defined limits (75 nucleotides
after the start codon and 50 before the stop codon, by default), a sliding
window of 23 nt is used to scan downstream for siRNA candidates. All candidates
must also meet the additional parameters selected below.
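The scan described above can be pictured with a short Python sketch. It is only an illustration, not the SiDE source code: the function name `candidate_windows` is hypothetical, and we assume that the input string runs from the first base of the start codon to the last base of the stop codon, with both offsets counted from the ends of that string.

```python
def candidate_windows(cds, start_offset=75, stop_offset=50, window=23):
    """Yield (position, subsequence) for every window inside the search limits.

    Assumption: `cds` starts at the start codon and ends at the stop codon.
    Positions are 0-based offsets into `cds`.
    """
    first = start_offset
    # Last window start such that the window still ends before the excluded
    # region in front of the stop codon.
    last = len(cds) - stop_offset - window
    for pos in range(first, last + 1):
        yield pos, cds[pos:pos + window]


if __name__ == "__main__":
    toy_cds = "ACGT" * 50  # 200 nt toy sequence, not a real gene
    windows = list(candidate_windows(toy_cds))
    print(len(windows), "candidate windows")
```

If the sequence is shorter than the two offsets plus the window length, the generator simply yields nothing.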
Patterns:
Candidate siRNAs can be searched to fit one or more predefined patterns
(AAN19TT, NARN17YNN, NNN19NN, AAN19NN, NANN17YNN, AAN19, NAN19NN, AARN18NN, NNN19;
letters are IUPAC codes: N any nucleotide, R purine and Y pyrimidine).
User-defined patterns covering between 19 and 23 nucleotides can also be used.
Selecting a more degenerate pattern disables the more specific patterns it includes (e.g.
selecting NAN19NN disables the patterns AAN19TT, NARN17YNN, AAN19NN, NANN17YNN
and AARN18NN, as they are all included in NAN19NN).
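One way to read these patterns is as compact regular expressions, where a number after a letter is a repeat count. The sketch below expands a pattern into a Python regular expression. The helper name `pattern_to_regex` is hypothetical, and only the IUPAC codes mentioned above (A, C, G, T, N, R, Y) are handled. This is an assumption about how the notation can be interpreted, not a description of the SiDE implementation.

```python
import re

# IUPAC codes used in the patterns above, mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def pattern_to_regex(pattern):
    """Return (compiled_regex, length) for a pattern such as 'AAN19TT'."""
    if not re.fullmatch(r"(?:[ACGTNRY]\d*)+", pattern):
        raise ValueError(f"unsupported pattern: {pattern!r}")
    parts = []
    length = 0
    for letter, count in re.findall(r"([ACGTNRY])(\d*)", pattern):
        n = int(count) if count else 1  # a bare letter counts once
        parts.append(f"{IUPAC[letter]}{{{n}}}")
        length += n
    return re.compile("".join(parts)), length


if __name__ == "__main__":
    regex, length = pattern_to_regex("AAN19TT")
    print(regex.pattern, length)
```

With this reading, any sequence matching AAN19TT also matches NAN19NN, which is why selecting the more degenerate pattern makes the more specific one redundant.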
Pattern filtering:
A series of filters is applied to the siRNA candidates. Most of these filters
are parameters with the recommended values set by default. The values can be changed
to less stringent ones if required. These are the available filters:
- GC content. This filter allows the selection of siRNAs with a GC content between the
percentages selected. It is strongly recommended to keep the percentages between
26% and 56%, equivalent to the 30%-52% suggested by Reynolds et al., 2004.
- CT content. The same as above but for CT content.
- Avoid more than 4 (default) contiguous G. Poly-G stretches can form
agglomerates that may interfere with siRNA binding. This filter
discards patterns containing poly-G stretches.
- Avoid more than 4 (default) contiguous A. The same as above but for A.
- Variation of up to 20% (default) in nucleotide composition. This prevents
any nucleotide from reaching an extreme percentage value.
- Avoid A in position 3. For example, the pattern NNAN20 would be avoided.
- After nucleotide 2 (default), require a minimum of 3 (default; range 2 to 4)
G/Cs in the next 4 bases.
- After nucleotide 17 (default), require a minimum of 3 (default; range 2 to 4)
A/Ts in the next 4 bases.
- Avoid SNPs. This option may be useful if the gene is suspected of
being polymorphic. In this instance, silencing could be affected by mismatches
due to allele variants. This option and the next two are active by default.
- Avoid exon boundaries. This option might be useful if alternative
splicing can take place in the gene. Avoiding exon boundaries thus reduces
the possibility of mispairing siRNA.
- Avoid repeat regions. Again, if repeated regions do exist it is
good practice to exclude them.
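The composition filters at the start of this list are simple to express in code. The following Python sketch shows the GC content window and the limits on contiguous G and A. The function names are hypothetical and the defaults are the values quoted above. It is a minimal illustration and does not cover the other filters.

```python
import re


def gc_percent(seq):
    """Percentage of G and C bases in `seq`."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def has_run(seq, base, max_run=4):
    """True if `seq` contains more than `max_run` contiguous copies of `base`."""
    return re.search(f"{base}{{{max_run + 1},}}", seq) is not None


def passes_basic_filters(seq, gc_min=26.0, gc_max=56.0, max_run=4):
    """Apply the GC window and the poly-G / poly-A limits to one candidate."""
    if not gc_min <= gc_percent(seq) <= gc_max:
        return False
    return not (has_run(seq, "G", max_run) or has_run(seq, "A", max_run))


if __name__ == "__main__":
    print(passes_basic_filters("AAGCTAGCTAGGATCCATGCATT"))
```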
- Filter oligos with less than 6 (default) points of Reynolds et al. score. This score is
calculated in the same way as in Reynolds et al., 2004. There, the authors identified a set of
8 characteristics associated with siRNA functionality and developed an algorithm to
predict the efficiency of a siRNA. The algorithm assigns a siRNA a score according
to whether or not it satisfies each of the 8 characteristics. The score was calculated as follows:
- GC content between 30%-52% -> +1 point
- +1 point for each A/T at positions 15-19
- Absence of internal repeats -> +1 point
- An 'A' at position 19 -> +1 point
- An 'A' at position 3 -> +1 point
- A 'T' at position 10 -> +1 point
- A 'G' or 'C' at position 19 -> -1 point
- A 'G' at position 13 -> -1 point
Using this algorithm, most functional siRNAs tested in that paper had a score of 6 or more, so
a score of 6 was defined as the cutoff for selecting siRNAs.
Here we calculate the score in the same way, using the central 19 nucleotides of the pattern
(for example, if you select the pattern AAN19TT, we use the N19 to calculate the score). If you
use a custom pattern, this filter is available only if the pattern has a length of 19 bases.
To check for internal repeats, the mfold program (Mathews et al., 1999) is used
instead of calculating the melting temperature of the siRNA hairpin. If the minimum
folding energy is positive, the formation of internal repeats is thermodynamically
unfavourable, so we assume an absence of internal repeats and rule number
three is considered satisfied.
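The scoring rules listed above can be written down as a short Python function. This is a sketch, not the SiDE code. The name `reynolds_score` is hypothetical, positions are 1-based on the central 19 nucleotides of the sense strand, and the internal-repeat check (done with mfold in SiDE) is replaced here by a boolean argument that the caller must supply.

```python
def reynolds_score(n19, no_internal_repeats=True):
    """Score a 19-nt sequence with the rules listed above.

    `no_internal_repeats` stands in for the mfold check, which this sketch
    does not perform.
    """
    seq = n19.upper().replace("U", "T")
    if len(seq) != 19:
        raise ValueError("expected the central 19 nucleotides")

    def base(pos):
        return seq[pos - 1]  # 1-based position

    gc = 100.0 * sum(b in "GC" for b in seq) / len(seq)
    score = 0
    if 30.0 <= gc <= 52.0:
        score += 1
    # One point for each A/T at positions 15 to 19.
    score += sum(base(p) in "AT" for p in range(15, 20))
    if no_internal_repeats:
        score += 1
    if base(19) == "A":
        score += 1
    if base(3) == "A":
        score += 1
    if base(10) == "T":
        score += 1
    if base(19) in "GC":
        score -= 1
    if base(13) == "G":
        score -= 1
    return score


if __name__ == "__main__":
    print(reynolds_score("GCAGCATACTGACGATATA"))
```

A candidate would then be kept when its score reaches the selected cutoff (6 by default).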
- Filter oligos with a classification lower than Ib (default) based on the Ui-Tei et al., 2004
classification. In this filter, siRNAs are classified according to
Ui-Tei et al., 2004:
- Class Ia: A/T at the 5' antisense (AS) end, C/G at the 5' sense (SS) end and 5-7 A/T residues in a
7 nt 5'-terminal end of AS.
- Class Ib: A/T at the 5' AS end, C/G at the 5' SS end and 4 A/T residues in a
7 nt 5'-terminal end of AS.
- Class II: All siRNAs that do not belong to either class I or class III.
- Class III: siRNAs with opposite features with respect to class I
(i.e. C/G at the 5' AS end, A/T at the 5' SS end and fewer than 4 A/T residues in a
7 nt 5'-terminal end of the AS).
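The classification can be sketched in Python from the sense strand alone. We assume here that the input is the 19-nt duplex region without overhangs, so that the 5' end of the antisense strand pairs with the 3' end of the sense strand, and the number of A/T residues in the 7-nt 5' terminus of the antisense strand equals the number of A/T residues in the last 7 nt of the sense strand. The function name `uitei_class` is hypothetical.

```python
def uitei_class(sense19):
    """Return 'Ia', 'Ib', 'II' or 'III' for a 19-nt sense-strand sequence."""
    s = sense19.upper().replace("U", "T")
    # The 5' antisense end is the complement of the 3' sense end, so A/T at
    # one means A/T at the other.
    as5_is_at = s[-1] in "AT"
    ss5_is_gc = s[0] in "GC"
    # A/T count in the 7-nt 5' terminus of the antisense strand.
    at_count = sum(b in "AT" for b in s[-7:])

    if as5_is_at and ss5_is_gc:
        if at_count >= 5:
            return "Ia"
        if at_count == 4:
            return "Ib"
    if not as5_is_at and not ss5_is_gc and at_count < 4:
        return "III"
    return "II"


# Ranking from best to worst, usable for a "lower than Ib" cutoff.
CLASS_RANK = {"Ia": 0, "Ib": 1, "II": 2, "III": 3}

if __name__ == "__main__":
    print(uitei_class("GCAGCATACTGACGATATA"))
```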
- Filter oligos with less than 2 points of Amarzguioui & Prydz, 2004 score. This score is
calculated as explained in Amarzguioui & Prydz, 2004:
- A/T differential between the three terminal nucleotides in the 3' and 5' ends of the duplex -> -3/+3 points.
- A 'C' or 'G' at position 1 -> +1 point.
- A 'T' at position 1 -> -1 point.
- An 'A' at position 6 -> +1 point.
- A 'G' at position 19 -> -1 point.
- An 'A' or 'T' at position 19 -> +1 point.
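As an illustration, the rules above can be coded as follows. The direction of the A/T differential is an assumption on our part: we take it as the number of A/T among the last three nucleotides of the 19-nt sense sequence minus the number among the first three, which gives a value between -3 and +3. The function name `amarzguioui_score` is hypothetical.

```python
def amarzguioui_score(n19):
    """Score a 19-nt sense sequence with the rules listed above."""
    s = n19.upper().replace("U", "T")
    if len(s) != 19:
        raise ValueError("expected 19 nucleotides")

    def at_count(fragment):
        return sum(b in "AT" for b in fragment)

    # Assumed direction: 3' end of the sense strand minus its 5' end.
    score = at_count(s[-3:]) - at_count(s[:3])
    if s[0] in "CG":
        score += 1
    if s[0] == "T":
        score -= 1
    if s[5] == "A":   # position 6
        score += 1
    if s[18] == "G":  # position 19
        score -= 1
    if s[18] in "AT":
        score += 1
    return score


if __name__ == "__main__":
    print(amarzguioui_score("GCAGCATACTGACGATATA"))
```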
- Filter oligos with less than 2 points based on the Hsieh et al., 2004 paper. The authors of
this paper do not specifically define an algorithm to score a particular siRNA, but the sequence criteria
they propose can be used to construct one. Here we follow the implementation suggested in Saetrom & Snove, 2004,
with an additional -1 for A/T in position 11 as suggested in Hsieh et al., 2004.
We calculate the score as follows:
- A 'C' at position 6 -> -1 point.
- A 'C' or 'G' at position 11 -> +1 point.
- An 'A' or 'T' at position 11 -> -1 point.
- An 'A' at position 13 -> +1 point.
- A 'G' at position 16 -> +1 point.
- A 'G' at position 19 -> -1 point.
- A 'T' at position 19 -> +1 point.
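Because every rule in this score is a position and base test, a small rule table is enough to sketch it in Python. The names `HSIEH_RULES` and `hsieh_score` are hypothetical; positions are 1-based on the 19-nt sense sequence.

```python
# (position, bases, points) taken from the list above.
HSIEH_RULES = [
    (6, "C", -1),
    (11, "CG", +1),
    (11, "AT", -1),
    (13, "A", +1),
    (16, "G", +1),
    (19, "G", -1),
    (19, "T", +1),
]


def hsieh_score(n19):
    """Sum the points of every rule that the 19-nt sequence satisfies."""
    s = n19.upper().replace("U", "T")
    if len(s) != 19:
        raise ValueError("expected 19 nucleotides")
    return sum(points for pos, bases, points in HSIEH_RULES
               if s[pos - 1] in bases)


if __name__ == "__main__":
    best = "A" * 10 + "G" + "A" * 4 + "G" + "AA" + "T"
    print(hsieh_score(best))
```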
- Filter oligos with less than 10 points of Takasaki et al., 2004 score.
Here the score is calculated as in Takasaki et al. 2004:
- An 'A' at position 1 -> -3.97 points.
- A 'T' at position 1 -> -3.75 points.
- A 'G' at position 1 -> +7.4 points.
- An 'A' at position 6 -> +2.33 points.
- A 'G' at position 7 -> +2.4 points.
- A 'T' at position 7 -> -2.59 points.
- An 'A' at position 8 -> +3.02 points.
- A 'G' at position 8 -> -2.35 points.
- A 'G' at position 9 -> -2.35 points.
- A 'T' at position 9 -> +2.3 points.
- A 'T' at position 15 -> +2.7 points.
- A 'G' at position 19 -> -2 points.
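The weighted rules above map naturally onto a lookup keyed by position and base. The Python sketch below uses the weights exactly as listed. The names `TAKASAKI_WEIGHTS`, `takasaki_score` and `passes_takasaki` are hypothetical, and the default cutoff is the value of 10 points quoted above.

```python
# Weights keyed by (1-based position, base), as listed above.
TAKASAKI_WEIGHTS = {
    (1, "A"): -3.97, (1, "T"): -3.75, (1, "G"): 7.4,
    (6, "A"): 2.33,
    (7, "G"): 2.4, (7, "T"): -2.59,
    (8, "A"): 3.02, (8, "G"): -2.35,
    (9, "G"): -2.35, (9, "T"): 2.3,
    (15, "T"): 2.7,
    (19, "G"): -2.0,
}


def takasaki_score(n19):
    """Sum the weights that apply to the 19-nt sequence."""
    s = n19.upper().replace("U", "T")
    if len(s) != 19:
        raise ValueError("expected 19 nucleotides")
    return sum(TAKASAKI_WEIGHTS.get((pos, base), 0.0)
               for pos, base in enumerate(s, start=1))


def passes_takasaki(n19, cutoff=10.0):
    """Keep the candidate when its score reaches the cutoff."""
    return takasaki_score(n19) >= cutoff


if __name__ == "__main__":
    seq = "G" + "C" * 4 + "AGAT" + "C" * 5 + "T" + "C" * 3 + "A"
    print(round(takasaki_score(seq), 2), passes_takasaki(seq))
```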
Blast filtering:
- Allow unspecific BLAST (Altschul et al., 1990) alignments with more than 4 (default; range 1 to 10)
non-identical bases (gap). A putative siRNA target is included in the results only if the next best,
unspecific match (if any) has more non-identical bases than the number selected.
- BLAST alignments specific to transcript. Ensembl provides individual
transcripts in addition to the complete gene (when different transcripts are
available), and siRNAs can be selected to silence specific transcripts if this filter is
selected.
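The gap criterion can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the hits are given as hypothetical (gene identifier, number of identical bases) pairs, and the gap of a hit is taken as the siRNA length minus its identical bases. How SiDE derives the gap from the BLAST output is not described here, so this should not be read as its actual implementation.

```python
def is_specific(sirna_length, hits, target_gene, min_gap=4):
    """Return True if every off-target hit has more than `min_gap` mismatches.

    `hits` is an iterable of (gene_id, identical_bases) pairs; hits on
    `target_gene` are the intended match and are ignored.
    """
    gaps = [sirna_length - identical
            for gene_id, identical in hits
            if gene_id != target_gene]
    if not gaps:
        return True  # no unspecific match at all
    return min(gaps) > min_gap


if __name__ == "__main__":
    # GENE_X is a made-up identifier used only for illustration.
    hits = [("TP53", 23), ("GENE_X", 17)]
    print(is_specific(23, hits, target_gene="TP53"))
```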
Output:
- Sorting. Results can be sorted using different criteria, including all the
scores, differential Tm, number of gaps or Tm. The number of mismatches is
usually intuitive enough to establish a threshold. Nevertheless, in some situations,
such as genes with biases or strong non-uniformities in base composition, mismatch counts
can give erroneous estimates of the likelihood of cross-hybridisation. In these cases, a criterion
more closely related to the physical likelihood of matching, such as the melting temperature,
can be more useful than the simple number of mismatches. The melting temperature (Tm) of the
hybrid between the siRNA and the target sequence is estimated using the program MELTING (Le Novère, 2001). The
differential of Tm between the specific target sequence and the next unspecific candidate
can thus be used as a more rational criterion to set a threshold. The scores can also be
good criteria for selecting siRNAs. A comparison of the 5 scores can be found in Saetrom and Snove, 2004.
- Remove the first 2 nucleotides at 5' on Sense and last 2 nucleotides at 3'
on Antisense. This option and the next two can be used for convenient representation
of the final siRNA sequences.
- Remove the first 2 nucleotides at 5' on Sense and Antisense.
- Remove the last 2 nucleotides at 3' on Sense and Antisense.
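The sorting idea can be illustrated on a downloaded result list. In the Python sketch below, each candidate is a dictionary with hypothetical keys `gap` (mismatches with the next best hit) and `diff_tm` (differential Tm). The values are invented for illustration, and we assume that a larger differential Tm and a larger gap both indicate a more specific siRNA.

```python
def rank_candidates(candidates, min_diff_tm=None):
    """Sort by differential Tm, then by gap, both in descending order.

    If `min_diff_tm` is given, candidates below that threshold are dropped.
    """
    kept = [c for c in candidates
            if min_diff_tm is None or c["diff_tm"] >= min_diff_tm]
    return sorted(kept, key=lambda c: (-c["diff_tm"], -c["gap"]))


if __name__ == "__main__":
    # Invented example records, not real SiDE output.
    results = [
        {"id": "a", "gap": 5, "diff_tm": 10.0},
        {"id": "b", "gap": 7, "diff_tm": 25.5},
        {"id": "c", "gap": 6, "diff_tm": 10.0},
    ]
    for cand in rank_candidates(results):
        print(cand["id"], cand["diff_tm"], cand["gap"])
```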
This is an example of the SiDE output in HTML format:
These are the results for p53 (HUGO TP53) selecting the NAN19NN pattern with the default options
(note that these results are based on Ensembl 22_34d, so they may not correspond to what you
would obtain now).
The first 2 columns show the start and end positions of the designed siRNA with respect to the
start of transcription of the gene.
The next column is the pattern (or patterns) selected to make the design.
Columns four and five show the designed siRNA (sense and antisense strands respectively).
The following columns give several characteristics of each siRNA: the GC% content (computed over the
whole siRNA, not just the central N19), the melting temperature (Tm) of the sense and antisense
strands of the siRNA, and the five scores implemented for that siRNA (calculated as described above,
see the "Pattern filtering" section).
The salt and DNA concentrations used to calculate the melting temperature are:
Sodium concentration: 20 mM
DNA concentration: 10 microM
The last 2 columns are the gap (number of mismatches) with the next best hit in the BLAST
results, and the differential Tm between the designed siRNA and the next best hit.
How long does it take
The time needed to predict the siRNAs is highly variable. It depends on the number of genes being scanned,
the number of transcripts for each gene, the length of those genes, and the options selected. The following
examples may help you estimate how long your prediction will take:
Running times for individual human genes.
| Gene | # of transcripts | Pattern selected | Time |
|---|---|---|---|
| TP53 | 1 | AAN19NN | less than 5 secs |
| TP53 | 1 | NAN19NN | 35 secs |
| BRCA2 | 2 | AAN19NN | 1 min 50 secs |
| BRCA2 | 2 | NAN19NN | 25 mins |
Running times for several sets of genes.
| Organism | Number of genes | Pattern selected | Time |
|---|---|---|---|
| Human | 10 | AAN19NN | 45 secs |
| Human | 10 | NAN19NN | 35 mins |
| Mouse | 20 | AAN19NN | 45 secs |
| Mouse | 20 | NAN19NN | 25 mins |
| Mouse | 20 | NNN19NN | 2 hours |
Please note that these times may vary depending on overall system load.
Authors and Acknowledgements
This program was developed by Javier Santoyo and Juan M. Vaquerizas
from the Bioinformatics Unit at
CNIO under the supervision of
Joaquín Dopazo. The program is now run from the Bioinformatics Department at the
Centro de Investigación Príncipe Felipe (CIPF).
We want to thank the people of the
CNIO Bioinformatics Unit and the Cell Division and Cancer Group for beta testing and for
comments on the design of the tool.
Before you start: If "The program doesn't work"
Before calling or emailing us because "the program doesn't work", please make
sure you have read this documentation.
Please also make sure that JavaScript is enabled in your web browser.
This tool has been tested in Explorer 6.0, Netscape 7.0 and Konqueror 3.1.1.
How to cite SiDE
When using SiDE for your scientific work, please use the following reference
for your publications:
Terms of use
- You acknowledge that the SiDE Software is experimental in nature
and is supplied "AS IS", without obligation by the authors, the CIPF's
Bioinformatics Department or the CIPF to provide accompanying
services or support. The entire risk as to the quality and performance of the
Software is with you. The CIPF and the authors expressly disclaim any and all
warranties regarding the software, whether express or implied, including but
not limited to warranties pertaining to merchantability or fitness for a
particular purpose.
- If you use results obtained with SiDE in any publication, we ask you to cite
the corresponding references and the main web site
(http://side.bioinfo.ochoa.fib.es) in
that publication.
- We ask you to give us feedback concerning bugs, errors or misconfigurations.
Complaints or suggestions are welcome.
Privacy and Security
Uploaded data sets are saved in temporary directories on the server and are
accessible through the web until they are erased after some time. Anybody can
access those directories; nevertheless, the names of the directories are not
trivial, so it is not easy for a third person to access your data.
In any case, you should keep in mind that communications between the client
(your computer) and the server are not encrypted at all, so it is also
possible for somebody else to look at your data while you are uploading or
downloading them.
Disclaimer
This software is experimental in nature and is supplied "AS IS", without
obligation by the authors or the CIPF to provide accompanying services or
support. The entire risk as to the quality and performance of the software is
with you. The authors expressly disclaim any and all warranties regarding the
software, whether express or implied, including but not limited to warranties
pertaining to merchantability or fitness for a particular purpose.
References
- Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990).
Basic local alignment search tool. J. Mol. Biol. 215, 403-410.
- Amarzguioui, M and Prydz, H. (2004). An algorithm for selection of functional siRNA sequences.
Biochem. Biophys. Res. Commun. 316, 4, 1050-1058.
- Ashrafi, K., Chang, F.Y., Watts, J.L., Fraser, A.G., Kamath, R.S., Ahringer, J.
and Ruvkun, G. (2003). Genome-wide RNAi analysis of Caenorhabditis elegans fat regulatory genes.
Nature 421, 268-272
- Elbashir SM, Harborth J, Lendeckel W, Yalcin A, Weber K, Tuschl T. (2001) Duplexes of
21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature 411, 494-498
- Elbashir, S.M. Lendeckel, W. and Tuschl, T. (2001b) RNA interference is mediated by
21- and 22-nucleotide RNAs. Genes Dev. 15, 188-200
- Hannon, G.J. (2002) RNA interference. Nature 418, 244-251
- Hsieh, A., Bo, R., Manola, J., Vazquez, F., Bare, O., Khvorova, A., Scaringe, S., Sellers, W. (2004).
A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants
of gene silencing for use in cell-based screens. Nucleic Acids Res. 32, 3, 893-901.
- Le Novère, N. (2001). MELTING, computing the melting temperature of nucleic
acid duplex. Bioinformatics 17:1226-1227
- Mathews DH, Sabina J, Zuker M, Turner DH (1999). Expanded sequence dependence of
thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 288:911-940
- Saetrom P, Snove O Jr. (2004). A comparison of siRNA efficacy predictors.
Biochem Biophys Res Commun. 321, 1, 247-53.
- Reynolds, A., Leake, D., Boese, Q., Scaringe, S., Marshall, W.S. Khvorova, A.
(2004). Rational siRNA design for RNA interference. Nat. Biotechnol. 22:326-330
- Takasaki, S., Kotani, S., Konagaya, A. (2004). An effective method for selecting siRNA target
sequences in mammalian cells. Cell Cycle. Epub ahead of print.
- Ui-Tei, K., Naito, Y., Takahashi, F., Haraguchi, T., Ohki-Hamazaki, H., Juni, A., Ueda, R., Saigo, K. (2004).
Guidelines for the selection of highly effective siRNA sequences for mammalian and chick
RNA interference. Nucleic Acids Res. 32, 3, 936-948.
Questions? Comments? Send Nacho an email.
Last modified: Mon Apr 11 13:41:46 CEST 2005
Copyright
This document is copyrighted. Copyright © 2004, 2005, Javier Santoyo, Juan M. Vaquerizas, Joaquín Dopazo.