Wikipedia results: Analyzing RNA-Seq (Mars-Zhan, Sina Blog)
Published: 2025-01-10
RNA-Seq is used to perform transcriptome studies based on next-generation
sequencing technologies. This technique is largely dependent
on bioinformatics tools developed to support the different steps of
the process. Listed here are some of the principal tools commonly
employed, along with links to some related web resources.
To follow an integrated guide to the analysis of RNA-Seq data, please see Next
Generation Sequencing (NGS)/RNA or RNA-Seq Workflow. Other important
links are the SEQanswers wiki and RNA-Seq Blog.
List of bioinformatics tools associated with RNA-Seq
Quality control and filtering data
Quality assessment is essential to the overall comprehension of
RNA-Seq data, as well as to guarantee that the data are in the right
format and suitable for subsequent analyses. Often, it is necessary to
filter the data, removing low-quality sequences, linkers,
overrepresented sequences or noise, to ensure a coherent final result.
cutadapt: cutadapt removes adapter sequences from
next-generation sequencing data (Illumina, SOLiD and 454). It is
used especially when the read length of the sequencing machine is
longer than the sequenced molecule, as in the microRNA case.
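The adapter-removal idea can be illustrated with a minimal Python sketch. It is not cutadapt's actual algorithm (which tolerates sequencing errors); it assumes exact matching only, and the function name, the `min_overlap` parameter and the example sequences are hypothetical.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Trim a 3' adapter from a read using exact matching only."""
    # Case 1: the complete adapter occurs inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter (partial adapter).
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    # No adapter found: return the read unchanged.
    return read


# A short insert (such as a microRNA) followed by the start of the adapter.
print(trim_3prime_adapter("ACGTACGTTGGAATTC", "TGGAATTCTCGG"))  # ACGTACGT
```

The `min_overlap` threshold keeps very short chance matches at the read end from being trimmed.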
FastQC: FastQC is a quality control tool for
high-throughput sequence data (Babraham Institute), developed
in Java. Data can be imported from FastQ files or from BAM or SAM
format. The tool provides an overview that flags problematic
areas, with summary graphs and tables for rapid assessment of the data.
Results are presented as permanent HTML reports. FastQC can be run as
a stand-alone application or integrated into a larger
pipeline solution. See also seqanswers/FastQC.
FASTX: FASTX Toolkit is a set of command-line tools
to manipulate reads in FASTA or FASTQ format files. These commands make
it possible to preprocess the files before mapping with tools like
Bowtie. Some of the tasks allowed are: conversion from FASTQ to
FASTA format, reporting quality statistics, removing
sequencing adapters, filtering and cutting sequences based on
quality, and DNA/RNA conversion.
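Two of the tasks listed for FASTX, FASTQ-to-FASTA conversion and quality-based filtering, can be sketched in a few lines of Python. This is an illustration and not the FASTX Toolkit's code; it assumes four-line FASTQ records with Phred+33 quality encoding, and all function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def fastq_to_fasta(lines: Iterable[str], min_mean_quality: float = 0.0) -> str:
    """Convert FASTQ to FASTA, dropping reads below a mean quality."""
    out = []
    for name, seq, qual in parse_fastq(lines):
        if mean_phred(qual) >= min_mean_quality:
            out.append(f">{name}\n{seq}")
    return "\n".join(out)


records = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "!!!!"]
print(fastq_to_fasta(records, min_mean_quality=20))
```

Here read `r2` is dropped because every base has Phred score 0.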
htSeqTools: htSeqTools is a Bioconductor package
able to perform quality control, processing of data and
visualization. htSeqTools makes it possible to visualize sample
correlations, remove over-amplification artifacts, assess
enrichment efficiency, correct strand bias and visualize
hits.
RNA-SeQC: RNA-SeQC is a tool with applications in
experiment design, process optimization and quality control before
computational analysis. Essentially, it provides three types of
quality control: read counts (such as duplicate reads, mapped reads
and mapped unique reads, rRNA reads, transcript-annotated reads,
strand specificity), coverage (such as mean coverage, mean coefficient
of variation, 5'/3' coverage, gaps in coverage, GC bias) and
expression correlation (the tool provides RPKM-based estimation of
expression levels). RNA-SeQC is implemented in Java and does not
require installation; it can also be run using the GenePattern web
interface. The input can be one or more BAM files. HTML reports
are generated as output.
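The RPKM measure mentioned above (reads per kilobase of transcript per million mapped reads) is simple to compute. The sketch below shows only the standard formula; it is not taken from RNA-SeQC, and the function and parameter names are hypothetical.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# 500 reads on a 2 kb transcript in a library of 10 million mapped reads
print(rpkm(500, 2000, 10_000_000))
```

Normalizing by both transcript length and library size is what makes values comparable across genes and samples.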
RSeQC: RSeQC analyzes diverse aspects of RNA-Seq
experiments: sequence quality, sequencing depth, strand
specificity, GC bias, read distribution over the genome structure
and coverage uniformity. The input can be SAM, BAM, FASTA or BED
files, or a chromosome size file (two-column, plain text file).
Visualization can be performed with genome browsers like UCSC, IGB
and IGV. R scripts can also be used for visualization.
SAMStat: SAMStat identifies problems and reports
several statistics at different phases of the process. This tool
evaluates unmapped, poorly mapped and accurately mapped sequences
independently to infer possible causes of poor mapping.
ShortRead: ShortRead is a package provided in the R
(programming language)/BioConductor environments that allows input,
manipulation, quality assessment and output of next-generation
sequencing data. This tool makes it possible to manipulate data,
for example with filters that remove reads based on predefined
criteria. ShortRead can be complemented with several Bioconductor
packages for further analysis and visualization
(BioStrings, BSgenome, IRanges, and so on). See
also seqanswers/ShortRead.
Trimmomatic: Trimmomatic performs trimming for
Illumina platforms and works with FASTQ reads (single or
paired-end). Some of the tasks executed are: cutting adapters, cutting
bases in optional positions based on quality thresholds, cutting reads
to a specific length, and converting quality scores to Phred-33/64.
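Three of the trimming tasks just listed can be sketched in Python: removing low-quality bases from the read end, cropping to a fixed length, and converting Phred+64 quality strings to Phred+33. This is an illustrative sketch, not Trimmomatic's implementation; the function names and the default threshold are hypothetical.

```python
from typing import Tuple


def trim_trailing(seq: str, qual: str, min_quality: int = 20,
                  offset: int = 33) -> Tuple[str, str]:
    """Remove bases from the 3' end while their quality is below a threshold."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def crop(seq: str, qual: str, length: int) -> Tuple[str, str]:
    """Cut a read (and its qualities) to at most `length` bases."""
    return seq[:length], qual[:length]


def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (shift by 31)."""
    return "".join(chr(ord(c) - 31) for c in qual)


print(trim_trailing("ACGTAC", "IIII##"))
print(crop("ACGTAC", "IIIIII", 4))
print(phred64_to_phred33("hhhh"))
```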
Alignment Tools
After quality assessment, the first step of RNA-Seq analysis
involves alignment (RNA-Seq alignment) of the sequenced reads
to a reference genome (if available) or to a transcriptome
database. See List of sequence alignment software and HTS
Mappers.
Short (Unspliced) aligners
Short aligners are able to align continuous reads (not
containing gaps resulting from splicing) to a reference genome.
Basically, there are two types: 1) those based on the Burrows-Wheeler
transform method, such as Bowtie and BWA, and 2) those based on
seed-extend methods, using the Needleman-Wunsch or Smith-Waterman
algorithms. The first group (Bowtie and BWA) is many times faster;
however, some tools of the second group, despite the time spent, tend
to be more sensitive, generating more correctly aligned reads.
BFAST: BFAST aligns short reads to reference
sequences and is particularly sensitive to errors, SNPs,
insertions and deletions. BFAST works with
the Smith-Waterman algorithm. See also seqanswers/BFAST.
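For readers unfamiliar with Smith-Waterman, the following Python sketch computes the best local alignment score between two sequences with a linear gap penalty. It is a textbook version, not BFAST's implementation, and the scoring values are arbitrary assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

The dynamic programming table makes the method tolerant of mismatches and indels, at a cost proportional to the product of the two sequence lengths.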
Bowtie: Bowtie is a fast short aligner using an
algorithm based on the Burrows-Wheeler transform and the FM-index.
Bowtie tolerates a small number of mismatches. See
also seqanswers/Bowtie.
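The combination of the Burrows-Wheeler transform and FM-index style backward search can be sketched as follows. This is a naive teaching version (sorted rotations, linear-time rank queries) for exact matching only; it is not Bowtie's implementation, and it assumes the text does not contain the `$` sentinel.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (naive)."""
    text += "$"  # sentinel, assumed absent from the text
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern using backward search."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c]: number of characters in the text that sort before c
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        # Rank queries done by counting; a real FM-index precomputes them.
        lo = first[ch] + bwt_str[:lo].count(ch)
        hi = first[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("banana")
print(index, count_occurrences(index, "ana"))
```

Each pattern character narrows an interval of the sorted rotations, so search time depends on the pattern length rather than on the genome length once the rank tables are precomputed.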
Burrows-Wheeler Aligner (BWA): BWA implements two
algorithms based on the Burrows-Wheeler transform. The first algorithm
is used for reads with a low error rate (< 3%). The second
algorithm was designed to handle more errors and implements
a Smith-Waterman strategy. BWA allows mismatches and small gaps
(insertions and deletions). The output is presented in SAM format.
See also seqanswers/BWA.
Short Oligonucleotide Analysis Package
(SOAP).
GNUMAP: GNUMAP performs alignment using a
probabilistic Needleman-Wunsch algorithm. This tool is able to handle
alignment in repetitive regions of a genome without losing
information. The program's output was designed to allow
easy visualization with available software.
Maq: Maq first aligns reads to reference sequences
and then performs a consensus stage. In the first stage it performs
only ungapped alignment and tolerates up to 3 mismatches. See
also seqanswers/Maq.
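Ungapped alignment with a mismatch limit can be illustrated by a brute-force scan that compares the read with every reference position. This sketch only shows the criterion (no gaps, at most a fixed number of mismatches); Maq itself uses indexing to avoid scanning every position, and the function name is hypothetical.

```python
from typing import List, Tuple


def ungapped_hits(read: str, reference: str,
                  max_mismatches: int = 3) -> List[Tuple[int, int]]:
    """Return (position, mismatches) for every ungapped placement allowed."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


print(ungapped_hits("CGTTCGT", "AAAACGTACGTAAAA", max_mismatches=1))
```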
Mosaik: Mosaik is able to align reads
containing short gaps using the Smith-Waterman algorithm, which makes
it well suited to handling SNPs, insertions and deletions. See
also seqanswers/Mosaik.
NovoAlign: NovoAlign (commercial) is a short aligner
for the Illumina platform based on the Needleman-Wunsch algorithm.
Novoalign tolerates up to 8 mismatches per read, and up to 7 bp of
indels. It is able to deal with bisulphite data. Output is in SAM
format. See also seqanswers/NovoAlign.
SEAL: SEAL uses a MapReduce model to run
distributed computing on clusters of computers. Seal uses BWA to
perform alignment and Picard MarkDuplicates for detection and
removal of duplicate reads. See also seqanswers/SEAL.
SHRiMP: SHRiMP employs two techniques to align short
reads. Firstly, the q-gram filtering technique based on multiple
seeds identifies candidate regions. Secondly, these regions are
investigated in detail using the Smith-Waterman algorithm. See
also seqanswers/SHRiMP.
Stampy: Stampy combines the sensitivity of hash
tables and the speed of BWA. Stampy is designed to align
reads containing sequence variation such as insertions and deletions.
It is able to deal with reads of up to 4500 bases and presents the
output in SAM format. See also seqanswers/Stampy.
ZOOM (commercial): ZOOM is a short aligner for the
Illumina/Solexa 1G platform. ZOOM uses an extended spaced seeds
methodology, building hash tables for the reads, and tolerates
mismatches, insertions and deletions. See
also seqanswers/ZOOM.
Spliced aligners
Many reads span exon-exon junctions and cannot be aligned
directly by short aligners, so different approaches were
necessary. Some spliced aligners employ short aligners to first
align unspliced/continuous reads (exon-first approach), and then
follow a different strategy to align the remaining reads, which
contain spliced regions. Normally those reads are split into smaller
segments and mapped independently.
Aligners based on known splice junctions
In this case the detection of splice junctions is based on data
available in databases of known junctions. This type of tool
cannot identify novel splice junctions. Some of this data comes
from other expression methods such as expressed sequence tags (EST).
Erange: Erange is a tool for alignment and data
quantification for mammalian transcriptomes. See
also seqanswers/Erange.
RNA-MATE: RNA-MATE is a computational pipeline for
alignment of data from the Applied Biosystems SOLiD system. It offers
quality control and trimming of reads. The genome
alignments are performed using mapreads and the splice
junctions are identified based on a library of known exon-junction
sequences. This tool allows visualization of alignments and tag
counting. See also seqanswers/RNA-MATE.
RUM: RUM performs alignment based on a pipeline
and is able to handle reads with splice junctions, using Bowtie
and BLAT. The workflow starts with alignment against a genome and
a transcriptome database, executed by Bowtie. The next step is to
align the unmapped sequences to the reference genome
using BLAT. In the final step all alignments are merged to produce the
final alignment. The input files can be in FASTA or FASTQ format.
The output is presented in RUM and SAM format.
De novo Splice Aligners
De novo splice aligners allow the detection of new splice
junctions without previously annotated information. See also De novo
Splice Aligners.
SpliceMap. See
also seqanswers/SpliceMap.
SuperSplat: SuperSplat was developed to find all
types of splice junctions. The algorithm iteratively splits each read
into all possible two-chunk combinations, and alignment
is attempted for each chunk. Output is in "Supersplat" format. See
also seqanswers/SuperSplat.
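The two-chunk splitting idea can be sketched in Python. The code below enumerates every split of a read and looks for exact matches of both chunks separated by a plausible intron; it is an illustration of the idea only, not SuperSplat's implementation, and the function names, parameters and toy sequences are hypothetical.

```python
from typing import List, Tuple


def two_chunk_splits(read: str, min_chunk: int = 1) -> List[Tuple[str, str]]:
    """All ways to cut a read into a left and a right chunk."""
    return [(read[:i], read[i:])
            for i in range(min_chunk, len(read) - min_chunk + 1)]


def spliced_hits(read: str, reference: str, min_chunk: int = 4,
                 min_intron: int = 5, max_intron: int = 20) -> List[Tuple[int, int, int]]:
    """Return (left_start, left_end, right_start) for exact spliced matches."""
    hits = []
    for left, right in two_chunk_splits(read, min_chunk):
        start = reference.find(left)
        while start != -1:
            left_end = start + len(left)
            right_start = reference.find(right, left_end + min_intron)
            while right_start != -1 and right_start - left_end <= max_intron:
                hits.append((start, left_end, right_start))
                right_start = reference.find(right, right_start + 1)
            start = reference.find(left, start + 1)
    return hits


# Toy reference: exon, intron, exon. The read spans the exon-exon junction.
reference = "ACGTACGG" + "GTAAAAAAAG" + "TTCCGGAT"
print(spliced_hits("ACGGTTCC", reference))
```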
TopHat: TopHat [2] is designed to find de novo
junctions. TopHat aligns reads in two steps. Firstly, unspliced
reads are aligned with Bowtie. The aligned reads are then
assembled with Maq, resulting in islands of sequences. Secondly, the
splice junctions are determined based on the initially unmapped
reads and the possible canonical donor and acceptor sites within
the island sequences. See also seqanswers/TopHat.
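Locating possible canonical donor and acceptor sites can be sketched as a search for the canonical GT...AG intron boundaries within a sequence. This shows only the candidate enumeration step, not TopHat's junction-finding procedure; the function name and length limits are hypothetical.

```python
from typing import List, Tuple


def canonical_introns(seq: str, min_len: int = 4,
                      max_len: int = 100) -> List[Tuple[int, int]]:
    """Candidate introns as (start, end) half-open intervals with GT...AG ends."""
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, a + 2)
            for d in donors
            for a in acceptors
            if min_len <= a + 2 - d <= max_len]


print(canonical_introns("CCGTAAAAGCC"))
```

Each candidate would then have to be supported by reads that were initially unmapped and that align across the implied junction.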
QPALMA: QPALMA predicts splice junctions using
machine learning algorithms. In this case the training set is a
set of spliced reads with quality information and already known
alignments. See also seqanswers/QPALMA.
Pass: Pass aligns gapped and ungapped reads, as well as
bisulfite sequencing data. It includes the option to filter
data before alignment (removal of adapters). Pass
uses the Needleman-Wunsch and Smith-Waterman algorithms, and performs
alignment in 3 stages: scanning positions of seed sequences in the
genome, testing the contiguous regions and finally refining the
alignment. See also seqanswers/Pass.
ContextMap: ContextMap was developed to overcome
some limitations of TopHat and MapSplice, such as the resolution of
ambiguities. The central idea of this tool is to consider reads in
their gene expression context, thereby improving alignment accuracy.
ContextMap can be used stand-alone or supported by TopHat or
MapSplice. In stand-alone mode it aligns reads to a genome, to a
transcriptome database or both.
HMMSplicer: HMMSplicer can identify canonical and
non-canonical splice junctions in short reads. Firstly, unspliced
reads are removed with Bowtie. After that, the remaining reads are
divided in half one at a time, then each part is seeded against a
genome and the exon borders are determined based on a Hidden
Markov Model. A quality score is assigned to each junction, useful
for detecting false positives. See
also seqanswers/HMMSplicer.
STAR: STAR is an ultrafast tool that employs
"sequential maximum mappable seed search in uncompressed suffix
arrays followed by seed clustering and stitching procedure". It
detects canonical and non-canonical splice junctions and
chimeric-fusion sequences. It is already adapted to align long
reads (third-generation sequencing technologies). See
also seqanswers/STAR.
Quantitative analysis
These tools calculate the abundance of each gene expressed in an
RNA-Seq sample. See also Quantification models.
Alexa-Seq: Alexa-Seq is a pipeline that makes it
possible to perform gene expression analysis, transcript-specific
expression analysis, exon junction expression and quantitative
alternative analysis. It offers extensive alternative expression
visualization, statistics and graphs. See
also seqanswers/Alexa-Seq.
MMSEQ: MMSEQ is a pipeline for estimating isoform
expression and allelic imbalance in diploid organisms based on
RNA-Seq. The pipeline employs tools like Bowtie, TopHat,
ArrayExpressHTS and SAMtools, as well as edgeR or DESeq to perform
differential expression. See also seqanswers/MMSEQ.
rQuant: rQuant is a web service (Galaxy
(computational biology) installation) that determines abundances of
transcripts per gene locus, based on quadratic programming. rQuant
is able to evaluate biases introduced by experimental conditions. A
combination of tools is employed: PALMapper (read alignment), mTiM
and mGene (inference of new transcripts).
NSMAP: NSMAP allows inference of isoforms as well as
estimation of expression levels, without annotated information. The
exons are identified and splice junctions are detected using
TopHat. All the possible isoforms are computed by combination of
the detected exons.
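Enumerating candidate isoforms by combining detected exons can be sketched with `itertools.combinations`. This covers only the enumeration idea under the assumption that exons are non-overlapping (start, end) intervals kept in genomic order; it is not NSMAP's code and omits its statistical selection of isoforms.

```python
from itertools import combinations
from typing import List, Tuple

Exon = Tuple[int, int]


def candidate_isoforms(exons: List[Exon]) -> List[Tuple[Exon, ...]]:
    """Every non-empty, order-preserving combination of the given exons."""
    ordered = sorted(exons)
    return [combo
            for k in range(1, len(ordered) + 1)
            for combo in combinations(ordered, k)]


isoforms = candidate_isoforms([(100, 200), (300, 400), (500, 600)])
print(len(isoforms))
```

The number of candidates grows as 2^n - 1 for n exons, which is why such methods need a way to select a small set of isoforms that explains the data.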
Differential expression
Tools designed to study the variability of gene expression
between samples. See a comparative study of differential
expression.
BaySeq. See
also seqanswers/BaySeq.
Cuffdiff.
DESeq. See also seqanswers/DESeq.
DEGSeq. See
also seqanswers/DEGSeq.
EdgeR: EdgeR is an R package for analysis of
differential expression of data from DNA sequencing methods, like
RNA-Seq, SAGE or ChIP-Seq data. edgeR employs statistical methods
based on the negative binomial distribution as a model for count
variability. See also seqanswers/EdgeR.
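Under a negative binomial model with mean mu and dispersion phi, the variance of a count is mu + phi * mu^2, so counts are more variable than a Poisson model (variance = mu) would predict. The Python sketch below shows this relation and a crude method-of-moments dispersion estimate; the estimator is an assumption chosen for illustration and is not the procedure edgeR uses.

```python
from statistics import mean, variance
from typing import Sequence


def nb_variance(mu: float, dispersion: float) -> float:
    """Variance of a negative binomial count with mean mu."""
    return mu + dispersion * mu ** 2


def moment_dispersion(counts: Sequence[float]) -> float:
    """Crude method-of-moments dispersion (illustrative only)."""
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)  # sample variance, needs at least two counts
    # Solve v = m + phi * m^2 for phi; clamp at 0 (Poisson-like data).
    return max(0.0, (v - m) / m ** 2)


print(nb_variance(100, 0.1))
print(moment_dispersion([0, 20]))
```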
Limma
Myrna: Myrna is a pipeline tool that runs in a cloud
environment (Elastic MapReduce) or on a single computer for
estimating differential gene expression in RNA-Seq datasets. Bowtie
is employed for short read alignment and R algorithms for interval
calculations, normalization, and statistical processing. See
also seqanswers/Myrna.
NOISeq. See
also seqanswers/NOISeq.
Scotty: Scotty performs power analysis to estimate
the number of replicates and depth of sequencing required to call
differential expression.
Statistical analysis
MultiExperiment Viewer (MeV). See
also seqanswers/MeV.
Fusion genes/chimeras/translocation finders
Genome rearrangements resulting from cancer can produce aberrant
genetic modifications such as fusions or translocations.
Identifying these modifications plays an important role in
carcinogenesis studies.
ChimeraScan.
FusionCatcher.
FusionHunter: FusionHunter identifies fusion
transcripts without depending on already known annotations. It uses
Bowtie as a first aligner and paired-end reads. See
also seqanswers/FusionHunter.
FusionSeq. See
also seqanswers/FusionSeq.
SOAPFuse.
TopHat-Fusion: TopHat-Fusion is based on TopHat
and was developed to handle reads resulting from fusion
genes. It does not require previous data about known genes and uses
Bowtie to align continuous reads. See
also seqanswers/TopHat-Fusion.
FusionMap.
Copy Number Variations identification
CNVseq: CNVseq detects copy number
variations using a statistical model derived
from array-comparative genomic hybridization. Sequence alignments
are performed by BLAT, calculations are executed by R modules, and
the whole process is automated using Perl. See
also seqanswers/CNVseq.
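A common way to look for copy number changes from sequencing data is to compare read counts between a test and a reference sample in fixed windows along the genome. The sketch below computes library-size-normalized log2 ratios per window. It is an assumption-laden illustration of that general idea, not CNVseq's statistical model, and the function name and inputs are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def window_log2_ratios(test_pos: Sequence[int], ref_pos: Sequence[int],
                       window: int, genome_length: int) -> List[Optional[float]]:
    """log2(test/reference) read-count ratio per window (None if a count is 0)."""
    n_windows = math.ceil(genome_length / window)
    test_counts = [0] * n_windows
    ref_counts = [0] * n_windows
    for p in test_pos:  # 0-based read start positions
        test_counts[p // window] += 1
    for p in ref_pos:
        ref_counts[p // window] += 1
    # Scale the test counts to the reference library size.
    scale = len(ref_pos) / len(test_pos)
    return [math.log2(t * scale / r) if t > 0 and r > 0 else None
            for t, r in zip(test_counts, ref_counts)]


print(window_log2_ratios([5, 15, 16, 25], [5, 15, 25, 26], 10, 30))
```

Windows with a ratio clearly above or below 0 are candidates for gains or losses; deciding which deviations are significant is the job of the statistical model.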
CnvHMM
RNA-Seq simulators
Flux Simulator. See
also seqanswers/Flux.
RNASeqReadSimulator.
RSEM Read Simulator: rsem-simulate-reads.
BEERS Simulator: BEERS is designed for mouse or
human data, and paired-end reads sequenced on the Illumina platform.
BEERS generates reads starting from a pool of gene models drawn
from different published annotation sources. Some genes are chosen
randomly, and errors are then deliberately introduced (such as
indels, base changes and low-quality tails), followed by the
construction of novel splice junctions.
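Deliberately introducing errors into simulated reads can be sketched as follows. The code covers only two of the error types mentioned (base changes and low-quality tails) and is not BEERS code; the function names, the error rate and the quality characters are hypothetical.

```python
import random


def add_substitutions(read: str, error_rate: float, rng: random.Random) -> str:
    """Replace each base with a different base with probability error_rate."""
    bases = "ACGT"
    out = []
    for base in read:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)


def add_low_quality_tail(qual: str, tail_length: int, low_char: str = "#") -> str:
    """Overwrite the last tail_length quality values with a low-quality symbol."""
    keep = max(0, len(qual) - tail_length)
    return qual[:keep] + low_char * (len(qual) - keep)


rng = random.Random(42)  # fixed seed so the simulation is reproducible
print(add_substitutions("ACGTACGTACGT", 0.1, rng))
print(add_low_quality_tail("IIIIIIIIIIII", 3))
```

Because the true origin and the injected errors are known, simulated reads like these let alignment and quantification tools be benchmarked against ground truth.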
Transcriptome assemblers
Genome-Guided assemblers
Scripture. See
also seqanswers/Scripture.
IsoInfer.
IsoLasso.
Genome-Independent assemblers
KISSPLICE.
Functional, Network and Pathway Analysis Tools
Ingenuity Systems (commercial) iReport and IPA:
Ingenuity's IPA and iReport applications enable you to upload,
analyze, and visualize RNA-Seq datasets. Both IPA and iReport support
identification, analysis and interpretation of differentially
expressed isoforms between condition and control samples, and
support interpretation and assessment of expression changes in the
context of biological processes, disease and cellular phenotypes,
and molecular interactions. Ingenuity iReport supports the upload
of the native Cuffdiff file format as well as gene expression lists.
IPA supports the upload of gene expression lists.
Workbench (analysis pipeline / integrated solutions)
ArrayExpressHTS: ArrayExpressHTS (and ebi_ArrayExpressHTS)
is a BioConductor package that allows preprocessing, quality
assessment and estimation of expression of RNA-Seq datasets. It can
be run remotely at the European Bioinformatics Institute cloud or
locally. The package makes use of several tools: ShortRead (quality
control), Bowtie, TopHat or BWA (alignment to a reference genome),
SAMtools format, Cufflinks or MMSEQ (expression estimation). See
also seqanswers/ArrayExpressHTS.
Galaxy: Galaxy is a general-purpose workbench platform
for computational biology. There are several publicly accessible
Galaxy servers that support RNA-Seq tools and workflows,
including NBIC's Andromeda, the CBIIT-Giga server, the Galaxy
Project's public server, the GeneNetwork Galaxy server, the University
of Oslo's Genomic Hyperbrowser, URGI's server (which supports S-MART),
and many others.
GenePattern: GenePattern offers integrated solutions
for RNA-Seq analysis (Broad Institute).
Partek (commercial)
NextGENe (commercial)
RobiNA
S-MART: S-MART handles mapped RNA-Seq data, and
essentially performs data manipulation (selection/exclusion of
reads, clustering and differential expression analysis) and
visualization (read information, distribution, comparison with
epigenomic ChIP-Seq data). It can be run on any laptop by a person
without a computing background. A friendly graphical user interface
makes the tools easy to operate. See
also seqanswers/S-MART.
wapRNA
BiNGS!SL-seq
Further annotation tools for RNA-Seq data
seq2HLA: seq2HLA is an annotation tool for obtaining
an individual's HLA class I and II type and expression using standard
NGS RNA-Seq data in fastq format. It comprises mapping RNA-Seq reads
against a reference database of HLA alleles using bowtie,
and determining and reporting HLA type, confidence score and
locus-specific expression level. This tool is developed
in Python and R. It is available as a console tool or Galaxy module. See
also seqanswers/seq2HLA.
HLAminer: HLAminer is a computational method for
identifying HLA alleles directly from whole genome, exome and
transcriptome shotgun sequence datasets. HLA allele predictions are
derived by targeted assembly of shotgun sequence data and
comparison to a database of reference allele sequences. This tool
is developed in Perl and is available as a console tool.
Webinars and Presentations
RNASeq-Blog Presentations
RNA-Seq Workshop Documentation (UC Davis University)
VIDEO: Strategies for Identifying Biologically Compelling Genes from Breast Cancer Subtype RNA-Seq Profiles with Accompanying Analysis
Princeton Workshop
References
[1] Wang Z, Gerstein M, Snyder M (January 2009). "RNA-Seq: a revolutionary tool for
transcriptomics". Nature Reviews Genetics 10 (1):
57-63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.
[2] Cole Trapnell, Lior
Pachter and Steven Salzberg (2009). "TopHat: discovering splice
junctions with RNA-Seq". Bioinformatics 25 (9):
1105-1111. doi:10.1093/bioinformatics/btp120. PMC 2672628. PMID 19289445.
[3] Cole Trapnell, Brian A
Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van
Baren, Steven L Salzberg, Barbara J Wold and Lior Pachter
(2010). "Transcript assembly and abundance estimation from RNA-Seq
reveals thousands of new transcripts and switching among
isoforms". Nature Biotechnology 28 (5):
511-515. doi:10.1038/nbt.1621. PMC 3146043. PMID 20436464.
[4] Zerbino DR, Birney E
(2008). "Velvet: Algorithms for de novo short read assembly using de
Bruijn graphs". Genome Research 18 (5):
821-829. doi:10.1101/gr.074492.107. PMC 2336801. PMID 18349386.
Article link: http://genetools.immuno-online.com/view-1414077177.html