Bioinformatics Recipes for Analysis of Omics Data

Bioinformatics recipes for analysis of next-generation sequencing (NGS) data

This course is intended for students and researchers without prior experience in bioinformatics or programming. It presents an overview of the major types of high-throughput omics data and the use of software packages for basic processing and downstream analysis of these data.
Major topics covered are:

Overview of high-throughput functional genomics data:
Functional genomics players
DNA hybridization technologies: microarrays
DNA hybridization technologies: DNA and RNA sequencing
DNA microarrays: expression, tiling arrays, SNP chips
Normalization, artifacts, batch effect correction
Design of microarray experiments
Next generation sequencing (NGS): DNA-seq, RNA-seq, ChIP-seq:
RNA-seq challenges
What coverage of my transcriptome by NGS reads do I need?
NGS: read mapping approaches:
Hash-table approaches
Burrows-Wheeler transform (BWT) approaches
ChIP-seq and RNA-seq: genome segmentation
Analysis of transcriptome data (RNA-seq), detection of transcribed isoforms:
rQuant approach
Tophat/Cufflinks approach
RNA-seq, transcript abundance estimation:
edgeR and rQuant: distribution-based expression estimation
Cufflinks: likelihood approach
RSEM: Bayesian Network and EM-algorithm approach
Correction of errors in reads:
Eriksson’s window alignment approach
EDAR and KEC k-mer based approaches
Haplotype reconstruction and haplotype frequency estimation:
Eriksson’s clustering approach
ShoRAH software
Statistical estimations
RNA editing detection:
Comparison versus sequenced genome
RNA multi-sample approach
Genome assembly from short reads:
Overlap-layout-consensus (OLC) techniques
de Bruijn graph approaches
Transcriptome assembly from short reads:
ABySS approach
Trinity algorithm

Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, Berlin AM, Aird D, Costello M, Daza R, Williams L, Nicol R, Gnirke A, Nusbaum C, Lander ES, Jaffe DB: High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A 2011, 108:1513-1518.
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I: ABySS: A parallel assembler for short read sequence data. Genome Res 2009, 19:1117-1123.
Simpson JT, Durbin R: Efficient de novo assembly of large genomes using compressed data structures. Genome Res 2012, 22:549-556.
Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, Li S, Yang H, Wang J, Wang J: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res 2010, 20:265-272.
Blankenberg, D., N.Coraor, G.Von Kuster, J.Taylor, and A.Nekrutenko. 2011. Integrating diverse databases into an unified analysis framework: a Galaxy approach. Database (Oxford) 2011:bar011.
Kugel, J.F., and J.A.Goodrich. 2012. Non-coding RNAs: key regulators of mammalian transcription. Trends Biochem. Sci.
Loots, G.G. 2008. Genomic identification of regulatory elements by evolutionary sequence comparison and functional analysis. Adv. Genet. 61:269-293.
Mortazavi, A., B.A.Williams, K.McCue, L.Schaeffer, and B.Wold. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5:621-628.
Zhang, Y., T.Liu, C.A.Meyer, J.Eeckhoute, D.S.Johnson, B.E.Bernstein, C.Nusbaum, R.M.Myers, M.Brown, W.Li, and X.S.Liu. 2008. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9:R137.
Li B. and Dewey C.N. 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome, BMC Bioinformatics, 12:323
Trapnell C., Williams B.A., et al. 2010. Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms, Nat. Biotechnol., 28(5): 511-515
Grabherr M.G., Haas B.J., et al. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome, Nat. Biotechnol., 29(7): 644-652
Ramaswami G., Zhang R. 2013. Identifying RNA editing sites using RNA sequencing data alone, Nature Methods, 10(2): 128-132


  1. Topic1_BioRecipes_FunctGenomicsPlayers
  2. Topic2_BioRecipes_DNAhybridTechnoMicroarray
  3. Topic 3_BioRecipes_DNAhybridTechnoNGS
  4. Topic 4_BioRecipes_MicroaarrayArtifacts
  5. Topic 5_BioRecipes_DesignOfExperiment
  6. Topic 6_BioRecipes_RNA_seqChallengesCoverage
  7. Topic 7_BioRecipes_ReadMappingHashTable
  8. Topic 8_BioRecipes_ReadMappingBWT
  9. Topic 9_BioRecipes_TranscriptReconstructionSegmentation
  10. Topic 10_BioRecipes_IsoformRquantTopHat
  11. Topic 11_BioRecipes_IsoformCufflinks
  12. Topic 12_BioRecipes_TranscriptQuantifRquantEdgeR_ParamDiff
  13. Topic 13_BioRecipes_TranscriptQuantifRquant_SVM
  14. Topic 14_BioRecipes_TranscriptQuantificationCufflinks
  15. Topic 15_BioRecipes_TranscriptQuantifRSEM_BayesNetwork
  16. Topic 16_BioRecipes_TranscriptQuantifRSEM_EMalgorithm
  17. Topic 17_BioRecipes_ReadErrorCorrectionEriksson
  18. Topic 18_BioRecipes_HaplotypesEriksson
  19. Topic 19_BioRecipes_ReadErrorCorrectionEDAR
  20. Topic 20_BioRecipes_HaplotypesShoRAH
  21. Topic 21_BioRecipes_RNA-editingVsGenome
  22. Topic 22_BioRecipes_RNA-editingMultiSample
  23. Topic 23_BioRecipes_DenovoGenomeAssemblyDeBruijn
  24. Topic 24_BioRecipes_DenovoGenomeAssemblyOLC
  25. Topic 25_BioRecipes_DenovoTranscriptomeAssemblyDeBruijn