The purpose of this course is to introduce students to the applications of high-throughput sequencing, including ChIP-Seq, RNA-Seq, SNP calling, metagenomics, de novo assembly, and others. The course material concentrates on presenting complete data analysis scenarios for each of these application domains and introduces students to a wide variety of existing tools and techniques. We expect that by the end of the coursework students will:
A laptop with enough battery power for 25 minutes of work may be required to perform data analysis tasks in class. Only Mac OS X (Tiger/Leopard) and Linux operating systems are supported.
Practical data analysis for life scientists
BMMB 597D - Bio Data Analysis (2 cr.)
Schedule #398704
Tuesday/Thursday 2:30-3:20 in 120 Thomas Building
Limit of 25 students.
Office hours: MW 2-3pm 502B Wartik
Lectures will appear below as they are presented. Homework assignments are included in the handouts.
Lecture 2 - slides, handouts, from the SGD. The GFF format, sequence ontologies, basic Unix commands: input and output streams, piping commands, processing a tabular file with Unix tools, homework 2
Lecture 6 - slides, handouts, running BLAST+, generating alignments with blastn, customizing the alignment output, using makeblastdb to query the database and extract sequences, blast+ manual, homework 6
Lecture 8 - slides, handouts, lecture-8.tar.gz (140 MB), tarbomb.tar.gz, compressing files, creating and unpacking archives, FASTQ quality control, running the FastQC tool and interpreting its output, homework 8
Lecture 9 - slides, handouts, Fastx Toolkit 64bit Mac, more about the FASTQ format, interpreting the FastQC plots, installing tools on your computer, running quality control tools, the Fastx Toolkit, Fastq Quality Control Shootout, homework 9
Lecture 17 - slides, handouts, the VCF and BCF formats, the VCF Poster, SNP calling with samtools, A statistical framework for SNP calling (Bioinformatics, 2011), homework 17
Lecture 24 - slides, handouts, ChIP-Seq frameworks, peak callers: MACS (Model-based Analysis of ChIP-Seq), SISSRs (Site Identification from Short Sequence Reads), GeneTrack, helpful awk utilities: bioawk-tools, homework 24
The final grade will be an average of the grades obtained on the homework and a project. Please refer to the information in the first lecture. Homework will be handed out during each lecture in the form of exercises that must be turned in at the beginning of each week.
We want to emphasize that the primary goal of this coursework is to improve students' ability to handle and interpret data sets. The evaluation is therefore relative to each student's initial aptitude. We aim to develop lasting skills that are not just immediately useful but also provide the foundation for a deeper understanding of informatics in general.
Created by Istvan Albert • Last updated on Tuesday, March 31, 2015 • Site powered by PyBlue