Wikipedia results: Analyzing RNA-Seq (Mars-Zhan, Sina Blog)
Published: 2025-01-10

RNA-Seq is a technique used to perform transcriptome studies based on next-generation sequencing technologies. This technique depends heavily on the bioinformatics tools developed to support the different steps of the process. Listed here are some of the principal tools commonly employed, with links to related web resources. For an integrated guide to the analysis of RNA-Seq data, see Next Generation Sequencing (NGS)/RNA or RNA-Seq Workflow. The SEQanswers wiki and RNA-Seq Blog are also important resources.

List of bioinformatics tools associated with RNA-Seq

Quality control and filtering data

Quality assessment is essential to the overall understanding of an RNA-Seq experiment, and it guarantees that the data are in the right format and suitable for the subsequent analyses. It is often necessary to filter the data, removing low-quality sequences, linkers, overrepresented sequences or noise, to ensure a coherent final result.

cutadapt removes adapter sequences from next-generation sequencing data (Illumina, SOLiD and 454). It is used especially when the read length of the sequencing machine is longer than the sequenced molecule, as in the microRNA case.

FastQC is a quality control tool for high-throughput sequence data (Babraham Institute), developed in Java. Data can be imported from FastQ, BAM or SAM files. The tool gives an overview that flags problematic areas, with summary graphs and tables for rapid assessment of the data. Results are presented as permanent HTML reports. FastQC can be run as a stand-alone application or integrated into a larger pipeline. See also seqanswers/FastQC.

FASTX: the FASTX Toolkit is a set of command-line tools for manipulating reads in FASTA or FASTQ files. These commands make it possible to preprocess the files before mapping with tools like Bowtie.
Some of the tasks supported are: conversion from FASTQ to FASTA format, quality statistics, removal of sequencing adapters, filtering and trimming of sequences based on quality, and DNA/RNA conversion.

htSeqTools is a Bioconductor package for quality control, data processing and visualization. htSeqTools makes it possible to visualize sample correlations, remove over-amplification artifacts, assess enrichment efficiency, correct strand bias and visualize hits.

RNA-SeQC is a tool with applications in experiment design, process optimization and quality control before computational analysis. It provides three types of quality control: read counts (such as duplicate reads, mapped reads and mapped unique reads, rRNA reads, transcript-annotated reads, strand specificity), coverage (such as mean coverage, mean coefficient of variation, 5’/3’ coverage, gaps in coverage, GC bias) and expression correlation (the tool provides RPKM-based estimation of expression levels). RNA-SeQC is implemented in Java and does not require installation; it can also be run through the GenePattern web interface. The input can be one or more BAM files. HTML reports are generated as output.

RSeQC analyzes diverse aspects of RNA-Seq experiments: sequence quality, sequencing depth, strand specificity, GC bias, read distribution over the genome structure and coverage uniformity. The input can be SAM, BAM, FASTA or BED files, or a chromosome size file (a two-column plain text file). Visualization can be performed with genome browsers such as UCSC, IGB and IGV; R scripts can also be used for visualization.

SAMStat identifies problems and reports several statistics at different phases of the process. This tool evaluates unmapped, poorly mapped and accurately mapped sequences independently to infer possible causes of poor mapping.
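
Several of the tools above filter reads by base quality and convert FASTQ to FASTA. The following is a minimal Python sketch of that idea, not the implementation of FASTX, cutadapt or any other tool listed here. It assumes Phred+33 quality encoding and four-line FASTQ records; the function names and the mean-quality threshold of 20 are hypothetical choices made for illustration.

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def mean_quality(qual: str) -> float:
    """Mean Phred score of one quality string."""
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


def filter_to_fasta(handle: TextIO, min_mean_q: float = 20.0) -> str:
    """Keep reads whose mean quality is at least min_mean_q; return FASTA text."""
    kept = [
        f">{name}\n{seq}"
        for name, seq, qual in read_fastq(handle)
        if qual and mean_quality(qual) >= min_mean_q
    ]
    return "\n".join(kept)


# Two toy reads: 'I' encodes Phred 40, '#' encodes Phred 2.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n####\n"
fasta = filter_to_fasta(StringIO(EXAMPLE))
print(fasta)
```

In this toy input only read1 passes the threshold. Real pipelines usually trim low-quality tails rather than discard whole reads, and they must also handle Phred+64 data from older Illumina runs.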

ShortRead is a package provided in the R/Bioconductor environment that allows input, manipulation, quality assessment and output of next-generation sequencing data. It supports data manipulation such as filters that remove reads based on predefined criteria. ShortRead can be complemented with several Bioconductor packages for further analysis and visualization (Biostrings, BSgenome, IRanges, and so on). See also seqanswers/ShortRead.

Trimmomatic performs trimming for Illumina platforms and works with FASTQ reads (single-end or paired-end). Its tasks include cutting adapters, cutting bases at optional positions based on quality thresholds, cropping reads to a specific length, and converting quality scores to Phred-33/64.

Alignment Tools

After quality control, the first step of RNA-Seq analysis is alignment (RNA-Seq alignment) of the sequenced reads to a reference genome (if available) or to a transcriptome database. See List of sequence alignment software and HTS Mappers.

Short (Unspliced) aligners

Short aligners can align continuous reads (reads containing no gaps resulting from splicing) to a reference genome. There are basically two types: 1) those based on the Burrows-Wheeler transform, such as Bowtie and BWA, and 2) those based on seed-extend methods using the Needleman-Wunsch or Smith-Waterman algorithms. The first group (Bowtie and BWA) is many times faster; some tools of the second group, despite the extra time spent, tend to be more sensitive and align more reads correctly.

BFAST aligns short reads to reference sequences and is particularly sensitive to errors, SNPs, insertions and deletions. BFAST works with the Smith-Waterman algorithm. See also seqanswers/BFAST.

Bowtie is a fast short aligner using an algorithm based on the Burrows-Wheeler transform and the FM-index. Bowtie tolerates a small number of mismatches. See also seqanswers/Bowtie.
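
The Burrows-Wheeler transform and FM-index behind this family of aligners can be illustrated with a short Python sketch. It is a toy, exact-match version under stated assumptions and does not reflect how Bowtie or BWA are implemented: the transform is built by sorting all rotations (real tools use suffix arrays and compressed tables), mismatches are not handled, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bwt(text: str) -> str:
    """Burrows-Wheeler transform by sorting rotations ('$' must not occur in text)."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt_str: str) -> Tuple[Dict[str, int], Dict[str, List[int]]]:
    """Return the two FM-index tables for a BWT string.

    c_table[c] = number of characters in the text strictly smaller than c
    occ[c][i]  = number of occurrences of c in bwt_str[:i]
    """
    counts = Counter(bwt_str)
    c_table: Dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    occ: Dict[str, List[int]] = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for sym in occ:
            occ[sym][i + 1] = occ[sym][i] + (1 if sym == ch else 0)
    return c_table, occ


def count_occurrences(pattern: str, bwt_str: str) -> int:
    """Count exact occurrences of pattern using backward search."""
    c_table, occ = build_fm_index(bwt_str)
    lo, hi = 0, len(bwt_str)  # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


reference = "GATTACAGATTACA"  # toy reference sequence
index = bwt(reference)
print(count_occurrences("TTAC", index))
```

Backward search processes the read from its last base to its first and narrows a range of sorted rotations at each step, so the cost of a lookup depends on the read length rather than on the genome length. That property is the main reason this group of aligners is fast.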

Burrows-Wheeler Aligner (BWA): BWA implements two algorithms based on the Burrows-Wheeler transform. The first algorithm is used for reads with a low error rate (<3%). The second algorithm was designed to handle more errors and implements a Smith-Waterman strategy. BWA allows mismatches and small gaps (insertions and deletions). The output is presented in SAM format. See also seqanswers/BWA.

Short Oligonucleotide Analysis Package (SOAP).

GNUMAP performs alignment using a probabilistic Needleman-Wunsch algorithm. This tool can handle alignment in repetitive regions of a genome without losing information. The output of the program was designed for easy visualization with available software.

Maq first aligns reads to reference sequences and then performs a consensus stage. In the first stage it performs only ungapped alignment and tolerates up to 3 mismatches. See also seqanswers/Maq.

Mosaik can align reads containing short gaps using the Smith-Waterman algorithm, which is ideal for overcoming SNPs, insertions and deletions. See also seqanswers/Mosaik.

NovoAlign (commercial) is a short aligner for the Illumina platform based on the Needleman-Wunsch algorithm. NovoAlign tolerates up to 8 mismatches per read and up to 7 bp of indels. It can handle bisulphite data. Output is in SAM format. See also seqanswers/NovoAlign.

SEAL uses a MapReduce model to distribute computation across clusters of computers. SEAL uses BWA to perform alignment and Picard MarkDuplicates for detection and removal of duplicate reads. See also seqanswers/SEAL.

SHRiMP employs two techniques to align short reads. First, a q-gram filtering technique based on multiple seeds identifies candidate regions. Second, these regions are investigated in detail using the Smith-Waterman algorithm. See also seqanswers/SHRiMP.

Stampy combines the sensitivity of hash tables and the speed of BWA.
Stampy is designed to align reads containing sequence variation such as insertions and deletions. It can handle reads of up to 4500 bases and presents its output in SAM format. See also seqanswers/Stampy.

ZOOM (commercial): ZOOM is a short aligner for the Illumina/Solexa 1G platform. ZOOM uses an extended spaced-seed methodology, building hash tables for the reads, and tolerates mismatches, insertions and deletions. See also seqanswers/ZOOM.

Spliced aligners

Many reads span exon-exon junctions and cannot be aligned directly by short aligners, so different approaches were necessary. Some spliced aligners use short aligners to align the unspliced (continuous) reads first (the exon-first approach) and then follow a different strategy to align the remaining reads, which contain spliced regions. Normally those reads are split into smaller segments and mapped independently.

Aligners based on known splice junctions

In this case the detection of splice junctions is based on database records of known junctions. Tools of this type cannot identify novel splice junctions. Some of these data come from other expression methods such as expressed sequence tags (EST).

Erange is a tool for alignment and data quantification for mammalian transcriptomes. See also seqanswers/Erange.

RNA-MATE is a computational pipeline for alignment of data from the Applied Biosystems SOLiD system. It offers quality control and trimming of reads. The genome alignments are performed using mapreads, and the splice junctions are identified from a library of known exon-junction sequences. This tool allows visualization of alignments and tag counting. See also seqanswers/RNA-MATE.

RUM performs alignment as a pipeline and can handle reads with splice junctions, using Bowtie and BLAT. The workflow starts with alignment against a genome and a transcriptome database, executed by Bowtie.
The next step is to align the unmapped sequences to the reference genome using BLAT. In the final step all alignments are merged to produce the final alignment. The input files can be in FASTA or FASTQ format. The output is presented in RUM and SAM format.

De novo Splice Aligners

De novo splice aligners allow the detection of new splice junctions without previously annotated information. See also De novo Splice Aligners.

SpliceMap. See also seqanswers/SpliceMap.

SuperSplat was developed to find all types of splice junctions. The algorithm iteratively splits each read into all possible two-chunk combinations and attempts to align each chunk. Output is in “Supersplat” format. See also seqanswers/SuperSplat.

TopHat [2] is designed to find de novo junctions. TopHat aligns reads in two steps. First, unspliced reads are aligned with Bowtie, and the aligned reads are then assembled with Maq, producing islands of sequence. Second, the splice junctions are determined from the initially unmapped reads and the possible canonical donor and acceptor sites within the island sequences. See also seqanswers/TopHat.

QPALMA predicts splice junctions using machine learning algorithms. The training set is a set of spliced reads with quality information and already known alignments. See also seqanswers/QPALMA.

Pass aligns gapped and ungapped reads as well as bisulfite sequencing data. It can filter data before alignment (removal of adapters). Pass uses the Needleman-Wunsch and Smith-Waterman algorithms and performs alignment in 3 stages: scanning the positions of seed sequences in the genome, testing the contiguous regions and finally refining the alignment. See also seqanswers/Pass.

ContextMap was developed to overcome some limitations of TopHat and MapSplice, such as the resolution of ambiguities.
The central idea of this tool is to consider reads in their gene expression context, which improves alignment accuracy. ContextMap can be used stand-alone or supported by TopHat or MapSplice. In stand-alone mode it aligns reads to a genome, to a transcriptome database or to both.

HMMSplicer can identify canonical and non-canonical splice junctions in short reads. First, unspliced reads are removed with Bowtie. After that, the remaining reads are divided in half one at a time, each half is seeded against a genome, and the exon borders are determined with a Hidden Markov Model. A quality score is assigned to each junction, which is useful for detecting false positives. See also seqanswers/HMMSplicer.

STAR is an ultrafast tool that employs “sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure”, and detects canonical and non-canonical splice junctions and chimeric-fusion sequences. It is already adapted to align long reads (third-generation sequencing technologies). See also seqanswers/STAR.

Quantitative analysis

These tools calculate the abundance of each gene expressed in an RNA-Seq sample. See also Quantification models.

Alexa-Seq is a pipeline for gene expression analysis, transcript-specific expression analysis, exon junction expression and quantitative alternative analysis. It provides wide-ranging visualization of alternative expression, statistics and graphs. See also seqanswers/Alexa-Seq.

MMSEQ is a pipeline for estimating isoform expression and allelic imbalance in diploid organisms based on RNA-Seq. The pipeline employs tools such as Bowtie, TopHat, ArrayExpressHTS and SAMtools, and uses edgeR or DESeq for differential expression. See also seqanswers/MMSEQ.

rQuant is a web service (Galaxy installation) that determines abundances of transcripts per gene locus, based on quadratic programming.
rQuant can evaluate biases introduced by experimental conditions. A combination of tools is employed: PALMapper (read alignment), and mTiM and mGene (inference of new transcripts).

NSMAP allows inference of isoforms as well as estimation of expression levels, without annotated information. The exons are identified and splice junctions are detected using TopHat. All possible isoforms are computed by combining the detected exons.

Differential expression

Tools designed to study the variability of gene expression between samples. See a comparative study of differential expression.

BaySeq. See also seqanswers/BaySeq.

Cuffdiff.

DESeq. See also seqanswers/DESeq.

DEGSeq. See also seqanswers/DEGSeq.

EdgeR is an R package for differential expression analysis of data from DNA sequencing methods such as RNA-Seq, SAGE or ChIP-Seq. edgeR employs statistical methods that use the negative binomial distribution as a model for count variability. See also seqanswers/EdgeR.

Limma.

Myrna is a pipeline tool for estimating differential gene expression in RNA-Seq datasets that runs in a cloud environment (Elastic MapReduce) or on a single computer. Bowtie is employed for short read alignment, and R algorithms for interval calculations, normalization and statistical processing. See also seqanswers/Myrna.

NOISeq. See also seqanswers/NOISeq.

Scotty performs power analysis to estimate the number of replicates and depth of sequencing required to call differential expression.

Statistical analysis

MultiExperiment Viewer (MeV). See also seqanswers/MeV.

Fusion genes/chimeras/translocation finders

Genome rearrangements arising in cancer can produce aberrant genetic modifications such as fusions or translocations. Identifying these modifications plays an important role in carcinogenesis studies.

ChimeraScan.

FusionCatcher.

FusionHunter identifies fusion transcripts without depending on already known annotations. It uses Bowtie as a first aligner and works with paired-end reads. See also seqanswers/FusionHunter.

FusionSeq. See also seqanswers/FusionSeq.

SOAPFuse.

TopHat-Fusion is based on TopHat and was developed to handle reads resulting from fusion genes. It does not require previous data about known genes and uses Bowtie to align continuous reads. See also seqanswers/TopHat-Fusion.

FusionMap.

Copy Number Variations identification

CNVseq detects copy number variations using a statistical model derived from array-comparative genomic hybridization. Sequence alignments are performed by BLAT, calculations are executed by R modules, and the whole process is fully automated using Perl. See also seqanswers/CNVseq.

CnvHMM.

RNA-Seq simulators

Flux Simulator. See also seqanswers/Flux.

RNASeqReadSimulator.

RSEM Read Simulator: rsem-simulate-reads.

BEERS Simulator: BEERS is formatted for mouse or human data and for paired-end reads sequenced on the Illumina platform. BEERS generates reads starting from a pool of gene models drawn from different published annotation sources. Some genes are chosen at random, errors (such as indels, base changes and low-quality tails) are then deliberately introduced, and novel splice junctions are constructed.

Transcriptome assemblers

Genome-Guided assemblers

Scripture. See also seqanswers/Scripture.

IsoInfer.

IsoLasso.

Genome-Independent assemblers

KISSPLICE.

Functional, Network and Pathway Analysis Tools

Ingenuity Systems (commercial), iReport and IPA: Ingenuity’s IPA and iReport applications enable you to upload, analyze, and visualize RNA-Seq datasets, eliminating the obstacles between data and biological insight.
Both IPA and iReport support identification, analysis and interpretation of differentially expressed isoforms between condition and control samples, and support interpretation and assessment of expression changes in the context of biological processes, disease and cellular phenotypes, and molecular interactions. Ingenuity iReport supports the upload of the native Cuffdiff file format as well as gene expression lists. IPA supports the upload of gene expression lists.

Workbench (analysis pipeline / integrated solutions)

ArrayExpressHTS (and ebi_ArrayExpressHTS) is a Bioconductor package that allows preprocessing, quality assessment and estimation of expression of RNA-Seq datasets. It can be run remotely on the European Bioinformatics Institute cloud or locally. The package makes use of several tools: ShortRead (quality control), Bowtie, TopHat or BWA (alignment to a reference genome), SAMtools format, and Cufflinks or MMSEQ (expression estimation). See also seqanswers/ArrayExpressHTS.

Galaxy is a general-purpose workbench platform for computational biology. There are several publicly accessible Galaxy servers that support RNA-Seq tools and workflows, including NBIC's Andromeda, the CBIIT-Giga server, the Galaxy Project's public server, the GeneNetwork Galaxy server, the University of Oslo's Genomic Hyperbrowser, URGI's server (which supports S-MART), and many others.

GenePattern offers integrated solutions for RNA-Seq analysis (Broad Institute).

Partek (commercial).

NextGENe (commercial).

RobiNA.

S-MART handles mapped RNA-Seq data and performs mainly data manipulation (selection/exclusion of reads, clustering and differential expression analysis) and visualization (read information, distribution, comparison with epigenomic ChIP-Seq data). It can be run on any laptop by a person without a computing background. A friendly graphical user interface makes the tools easy to operate. See also seqanswers/S-MART.

wapRNA.

BiNGS!SL-seq.

Further annotation tools for RNA-Seq data

seq2HLA is an annotation tool for obtaining an individual's HLA class I and II type and expression using standard NGS RNA-Seq data in fastq format. It maps RNA-Seq reads against a reference database of HLA alleles using bowtie, then determines and reports the HLA type, a confidence score and the locus-specific expression level. This tool is developed in Python and R. It is available as a console tool or a Galaxy module. See also seqanswers/seq2HLA.

HLAminer is a computational method for identifying HLA alleles directly from whole genome, exome and transcriptome shotgun sequence datasets. HLA allele predictions are derived by targeted assembly of shotgun sequence data and comparison to a database of reference allele sequences. This tool is developed in Perl and is available as a console tool.

Webinars and Presentations

RNASeq-Blog Presentations
RNA-Seq Workshop Documentation (UC Davis University)
VIDEO: Strategies for Identifying Biologically Compelling Genes from Breast Cancer Subtype RNA-Seq Profiles, with Accompanying Analysis
Princeton Workshop

References

1. Wang Z, Gerstein M, Snyder M (January 2009). "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews Genetics 10 (1): 57-63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.
2. Cole Trapnell, Lior Pachter and Steven Salzberg (2009). "TopHat: discovering splice junctions with RNA-Seq". Bioinformatics 25 (9): 1105-1111. doi:10.1093/bioinformatics/btp120. PMC 2672628. PMID 19289445.
3. Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold and Lior Pachter (2010). "Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms". Nature Biotechnology 28 (5): 511-515. doi:10.1038/nbt.1621. PMC 3146043. PMID 20436464.
4. Zerbino DR, Birney E (2008). "Velvet: Algorithms for de novo short read assembly using de Bruijn graphs". Genome Research 18 (5): 821-829. doi:10.1101/gr.074492.107. PMC 2336801. PMID 18349386.

Article link: http://genetools.immuno-online.com/view-1414077177.html
