This work is licensed under a Creative Commons License.


SiDE help

Reference: Santoyo, J., Vaquerizas, J.M. & Dopazo, J. 2005
Highly specific and accurate selection of siRNAs for high-throughput functional assays.
Bioinformatics, 21(8), 1376-1382.

  1. Introduction and purpose
  2. Usage
  3. How long does it take
  4. Authors and Acknowledgements
  5. Before you start: If "The program doesn't work"
  6. How to cite SiDE
  7. Terms of use
  8. Privacy and Security
  9. Disclaimer
  10. References

Introduction and purpose

RNA interference (RNAi) is a sequence-specific post-transcriptional gene-silencing process that constitutes a powerful new tool for analyzing gene knockdown phenotypes (Fire et al., 1998). This mechanism is evolutionarily conserved among eukaryotes and plays an essential role in mediating responses to exogenous RNAs (such as viruses) and in stabilizing the genome by sequestering repetitive sequences such as transposons (Hannon, 2002; Tijsterman et al., 2002). RNA silencing is triggered by double-stranded RNA molecules, which are cleaved by the enzyme DICER into 21- to 23-nt duplexes containing a 2-nt overhang at the 3' end of each strand. These duplexes are incorporated into a protein complex called the RNA-induced silencing complex (RISC). The RISC is responsible for the sequence-specific degradation of target RNAs that contain homologous sequences. The resulting down-modulation of the encoded proteins can subsequently lead to the induction of a specific phenotype.

Here we describe a new tool for the automatic design of human and mouse siRNAs, SiDE (which stands for Small interfering RNA design), that applies rules for sequence pattern selection and performs similarity searches in order to minimise cross-hybridisation. In addition, the presence of genomic features such as single nucleotide polymorphisms (SNPs) can be considered in the design. The tool is intended for high-throughput design of siRNAs, which requires high specificity and the capacity to deal with a large number of target genes.


Usage

The main objective of the tool is the high-throughput design of siRNAs. This requires a programme that can process a large number of genes and produce lists of highly specific putative siRNAs that also fulfil a series of thermodynamic and empirical rules. The novelty of SiDE is the high specificity in the detection of targets for siRNA interference. Currently, SiDE can find siRNA candidates in humans and mice.


Specificity

Specificity is probably the most important factor for the successful selection of proper siRNA candidates. If the siRNA matches another gene, silencing side effects cannot be ruled out, and no conclusions could be drawn from the functional assay. Specificity means having a unique match in the target gene while reducing the possibility of cross-hybridisation with any other gene to a minimum. It is usually achieved by means of a similarity search algorithm such as BLAST. We built the BLAST databases using the genes annotated in the latest NCBI genome assembly build (in the current version, 34 for human and 32 for mouse). Sequences, unique for each gene, are obtained from Ensembl (Birney et al., 2004) and formatted (using the formatdb tool) for BLAST use. Consequently, our databases (one for human and one for mouse genes) contain a single copy of each gene sequence.


SiRNA candidate selection and parameters for siRNA design

The starting point is a list of genes, which can be provided using different identifiers. These currently include HUGO systematic names, RefSeq and LocusLink IDs, Ensembl IDs, and EMBL, SwTrEMBL and SwissProt IDs.


ORF target:

This parameter defines the limits for searching candidate siRNAs with respect to the start and stop codons. It is advisable to avoid the 5' and 3' UTR regions as well as the neighbourhood of the start and stop codons, because these regions are rich in regulatory motifs, and UTR-binding proteins and translation initiation complexes could interfere with siRNA binding. Within the defined limits (by default, 75 nucleotides after the start codon and 50 before the stop codon), a sliding window of 23 bp is used to scan downstream for siRNA candidates. All candidates must meet the additional parameters selected below.
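
The default scan described above can be sketched in Python as follows. This is only an illustration, not the SiDE implementation: the function name, the 0-based coordinates, and the reading that the 75 and 50 nt margins are counted from the end of the start codon and from the beginning of the stop codon are all assumptions made for this example.

```python
def candidate_windows(transcript, cds_start, cds_end,
                      skip_after_start=75, skip_before_stop=50, window=23):
    """Yield (start, end, subsequence) for every window in the allowed ORF region.

    Hypothetical helper. Assumed conventions:
      cds_start: 0-based index of the first base of the start codon
      cds_end:   0-based index one past the last base of the stop codon
    """
    # First allowed base: skip the start codon (3 nt) plus the margin.
    first = cds_start + 3 + skip_after_start
    # Exclusive upper limit: stop before the stop codon (3 nt) and the margin.
    limit = cds_end - 3 - skip_before_stop
    # Slide the window one base at a time in the downstream direction.
    for start in range(first, limit - window + 1):
        end = start + window
        yield start, end, transcript[start:end]
```

Each yielded window would then be checked against the selected patterns and filters; if the ORF is too short for the chosen margins, the generator simply yields nothing.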


Patterns:

Candidate siRNAs can be searched to fit one or more predefined patterns (AAN19TT, NARN17YNN, NNN19NN, AAN19NN, NANN17YNN, AAN19, NAN19NN, AARN18NN, NNN19; the letters are IUPAC codes: N any nucleotide, R purine and Y pyrimidine). Furthermore, user-defined patterns covering between 19 and 23 nucleotides can be used. Selecting a more degenerate pattern disables the more specific ones it includes (e.g. selecting NAN19NN disables the patterns AAN19TT, NARN17YNN, AAN19NN, NANN17YNN and AARN18NN, as they are included in NAN19NN).
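
To make the pattern notation concrete, here is a minimal Python sketch that expands a pattern such as AAN19TT into one IUPAC code per position, matches it against a sequence, and tests whether one pattern is included in another. It assumes that a number repeats the letter immediately before it (so N19 means 19 consecutive N) and a DNA alphabet; the function names are hypothetical and this is not the code used by SiDE.

```python
import re

# Bases allowed by each IUPAC code used in the predefined patterns.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "R": "AG", "Y": "CT",
}

def expand(pattern):
    """Expand e.g. 'AAN19TT' into a list with one IUPAC code per position."""
    if not re.fullmatch(r"(?:[ACGTNRY]\d*)+", pattern):
        raise ValueError(f"unsupported pattern: {pattern!r}")
    symbols = []
    for code, count in re.findall(r"([ACGTNRY])(\d*)", pattern):
        symbols.extend(code * int(count or 1))  # no number means one position
    return symbols

def pattern_to_regex(pattern):
    """Compile the pattern into a regular expression, one character class per base."""
    return re.compile("".join(f"[{IUPAC[s]}]" for s in expand(pattern)))

def matches(pattern, seq):
    """True if the whole sequence fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(seq.upper()) is not None

def is_included(specific, general):
    """True if every sequence fitting `specific` also fits `general`."""
    a, b = expand(specific), expand(general)
    return len(a) == len(b) and all(
        set(IUPAC[x]) <= set(IUPAC[y]) for x, y in zip(a, b)
    )
```

Under these assumptions, is_included("AARN18NN", "NAN19NN") is true, whereas NNN19NN and the 21-nt pattern AAN19 are not included in NAN19NN, which agrees with the list of disabled patterns given above.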


Pattern filtering:

A series of filters is applied to the siRNA candidates. Most of these filters are parameters with the recommended values set by default. The values can be changed to less stringent ones if required. These are the available filters:


Blast filtering:




Output:




Output

This is an example of the SiDE output in HTML format:


Results in HTML format

These are the results for p53 (HUGO TP53) selecting the NAN19NN pattern with the default options (note that these results are based on Ensembl 22_34d, so they may not correspond to what you obtain now).

The first 2 columns show the start and end positions of the designed siRNA with respect to the start of transcription of the gene.

The next column shows the pattern (or patterns) selected to make the design.

Columns four and five show the designed siRNA (sense and antisense strands respectively).
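
The base-pairing relation between the two strands can be sketched in Python as below. This is only an illustration of a plain reverse complement over DNA letters; how SiDE trims the strands or handles the 3' overhangs in its output is not described here, and the function name is hypothetical.

```python
# Watson-Crick complement table for the DNA alphabet (assumed input alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def antisense(sense):
    """Return the reverse complement of a 5'->3' sense sequence, also 5'->3'."""
    return sense.upper().translate(_COMPLEMENT)[::-1]
```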

The following columns give several characteristics of each siRNA: the GC% content (computed over the whole siRNA, not just the central N19), the melting temperature (Tm) of the sense and antisense strands of the siRNA, and the five scores implemented for that siRNA (calculated as described above; see the "Pattern filtering" section).

The salt and DNA concentrations used to calculate the melting temperature are:
Sodium concentration: 20 mM
DNA concentration: 10 µM
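
As an illustration of the GC% column described above, the following Python sketch computes the GC percentage over the whole sequence rather than only the central N19. The function name is hypothetical. The Tm formula used by SiDE is not given in this help page, so no Tm calculation is attempted here.

```python
def gc_percent(seq):
    """Percentage of G and C bases over the full length of the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)
```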

The last 2 columns are the gap (number of mismatches) with the next best hit in the BLAST results, and the difference in Tm between the designed siRNA and the next best hit.
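
The gap can be pictured as a simple mismatch count between the siRNA and the aligned region of the next best hit. The Python sketch below assumes an ungapped alignment of two sequences of equal length; how SiDE actually derives this value from the BLAST output is not described here, and the function name is hypothetical.

```python
def count_mismatches(sirna, hit):
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(sirna) != len(hit):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(sirna.upper(), hit.upper()) if a != b)
```

A larger count means that the closest off-target sequence is less similar to the siRNA.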


How long does it take

The time needed to predict the siRNAs is highly variable. It depends on the number of genes being scanned, the number of transcripts for each gene, the length of those genes, and the options selected. The following examples may help you estimate how long your prediction is going to take:

Running times for individual human genes.

Gene     # of transcripts   Pattern selected   Time
TP53     1                  AAN19NN            less than 5 secs
TP53     1                  NAN19NN            35 secs
BRCA2    2                  AAN19NN            1 min 50 secs
BRCA2    2                  NAN19NN            25 mins


Running times for several sets of genes.

Organism   Number of genes   Pattern selected   Time
Human      10                AAN19NN            45 secs
Human      10                NAN19NN            35 mins
Mouse      20                AAN19NN            45 secs
Mouse      20                NAN19NN            25 mins
Mouse      20                NNN19NN            2 hours

Please note that these times may vary with the overall system load.

Authors and Acknowledgements

This program was developed by Javier Santoyo and Juan M. Vaquerizas from the Bioinformatics Unit at CNIO under the supervision of Joaquín Dopazo. We now run the program from the Bioinformatics Department at the Centro de Investigación Príncipe Felipe (CIPF).

We want to thank the people of the CNIO Bioinformatics Unit and the Cell Division and Cancer Group for beta testing and for their comments on the design of the tool.

Before you start: If "The program doesn't work"

Before calling or emailing us because "the program doesn't work", please make sure you have read this documentation. Please also make sure that JavaScript is enabled in your web browser. This tool has been tested in Explorer 6.0, Netscape 7.0 and Konqueror 3.1.1.

How to cite SiDE

When using SiDE for your scientific work, please use the following reference for your publications:
Reference: Santoyo, J., Vaquerizas, J.M. & Dopazo, J. 2005
Highly specific and accurate selection of siRNAs for high-throughput functional assays.
Bioinformatics, 21(8), 1376-1382.

Terms of use


Privacy and Security

Uploaded data sets are saved in temporary directories on the server and are accessible through the web until they are erased after some time. Anybody can access those directories; nevertheless, the names of the directories are not trivial, so it is not easy for a third person to access your data.

In any case, you should keep in mind that communications between the client (your computer) and the server are not encrypted at all, so it is also possible for somebody else to look at your data while you are uploading or downloading them.

Disclaimer

This software is experimental in nature and is supplied "AS IS", without obligation by the authors or the CIPF to provide accompanying services or support. The entire risk as to the quality and performance of the software is with you. The authors expressly disclaim any and all warranties regarding the software, whether express or implied, including but not limited to warranties pertaining to merchantability or fitness for a particular purpose.

References


Questions? Comments? Send Nacho an email.
Last modified: Mon Apr 11 13:41:46 CEST 2005

Copyright

This document is copyrighted. Copyright © 2004, 2005, Javier Santoyo, Juan M. Vaquerizas, Joaquín Dopazo.
