Applied Bioinformatics 2014


Applied Bioinformatics
BMMB 852 - App Bioinfo (2 cr.)
Schedule #352210
Tuesday/Thursday 2:30-3:20 in 106 Wartik Hall
Limit of 25 students.
Office hours: MW 1-2:30pm MSC 130 (Crick Room)

This year (2014) the course will focus on genomic data analysis of the Ebola virus responsible for the 2014 West Africa outbreak.

Supporting Materials

Lecture Notes

Lectures will appear below as they are presented. Homeworks are specified in each handout.

Suggested reading

  1. Lecture 1 - slides, handouts. Course information, homework and project information, introduction to computing, setting up your computer, basic Unix command line usage, organizing your projects, homework 1.

  2. Lecture 2 - slides, handouts, Biological databases, the GFF format, sequence ontologies, basic Unix commands: wc, grep, cut, sort, redirecting input and output streams, piping commands, processing a tabular file with UNIX tools, homework 2.

  3. Lecture 3 - slides, handouts, Inside the data factory: how the Ebola paper was written, the GenBank format, core concepts for the Sequence Read Archive (SRA), automated download of data from NCBI, installing and using Entrez Direct, homework 3.

  4. Lecture 4 - slides, handouts, Installing and using the SRA toolkit, setting up paths, installing a proper text editor, using the SRA tools to download project-wide data, homework 4.

  5. Lecture 5 - slides, handouts, FASTA format, accession numbers, fetching subsequences from NCBI, creating scripts and reusable components, bash programming, homework 5.

  6. Lecture 6 - slides, handouts, An overview of single end sequencing, quality values, encodings, the Phred encoding, FASTQ format, homework 6.

  7. Lecture 7 - slides, handouts, Dealing with compressed files and file archives. Using gzip, gunzip and tar. Installing and running FastQC, interpreting the FastQC outputs, homework 7.

  8. Lecture 8 - slides, handouts, Base quality trimming, installing tools, evaluating the results of quality control, paired end sequencing concepts, homework 8.

  9. Lecture 9 - slides, handouts, Advanced pattern matching, regular expressions, detecting and trimming adaptor sequences, homework 9.

  10. Lecture 10 - slides, handouts, The basics of alignments, global, local and semiglobal alignments, scoring matrices, pairwise alignments, homework 10.

  11. Lecture 11 - slides, handouts, Installing and Using BLAST, search strategies, Blast settings and configuration, homework 11.

  12. Lecture 12 - slides, handouts, Blast Cookbook, short usage examples, tips and tricks, homework 12.

  13. Lecture 13 - slides, handouts, installing tools, short read aligners, installing and running bwa, homework 13.

  14. Lecture 14 - slides, handouts, the SAM (Sequence Alignment Map) format, homework 14.

  15. Lecture 15 - slides, handouts, the SAM/BAM formats and samtools, filtering and selecting data, homework 15.

  16. Lecture 16 - slides, handouts, genomic data visualization, IGV, IGB, converting formats, homework 16.

  17. Lecture 17 - slides, handouts, some programming required, introduction to the AWK programming language, tabular file processing, filtering by feature types, homework 17.

  18. Lecture 18 - slides, handouts, the origins of genomic variation, a case study, comparing and evaluating alignment tools, homework 18.

  19. Lecture 19 - slides, handouts, sequencing coverages, pileups and the variant call format, homework 19.

  20. Lecture 20 - slides, handouts, aligner evaluation, computing coverages, the pileup formats, introduction to VCF formats, homework 20.

  21. Lecture 21 - slides, handouts, the variant call format, generating variant calls with samtools, homework 21.

  22. Lecture 22 - slides, handouts, bioinformatics survival toolkit: bioawk, seqtk, tabix, tabtk, align two genomes, annotate the effect of snps with snpEff, homework 22.

  23. Lecture 23 - slides, handouts, automating data processing, building an entire SNP calling pipeline, homework 23.

  24. Lecture 24 - slides, handouts, interval datatypes, BED, GFF2, GTF, GFF3, specifying hierarchical relationships, homework 24.

  25. Lecture 25 - slides, handouts, interval handling, extending, flanking intervals with bedtools, extract sequences, homework 25.

  26. Lecture 26 - slides, handouts, intersecting and querying intervals data, homework 26.

  27. Lecture 27 - slides, handouts, rnaseq-data.tar.gz, introduction to RNA-Seq, approaches, splice-aware alignments, homework 27.

  28. Lecture 28 - slides, handouts, rnaseq-data.tar.gz, running the Tuxedo suite: tophat, cuffdiff, cufflinks, homework 28.

  29. Lecture 29 - slides, handouts, rnaseq-data.tar.gz, comparing different RNA-Seq methodologies: eXpress, featureCounts, homework 29.

  30. Lecture 30 - slides, handouts, the Gene Ontology, homework 30.
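
Lecture 6 above covers quality values and the Phred encoding used in FASTQ files. The following is a minimal sketch, not taken from the course handouts, of how a quality string could be decoded. It assumes the common Phred+33 offset (older data may use a different offset), and the function name and the sample quality string are hypothetical.

```python
def phred_to_error_prob(quality_string, offset=33):
    """Decode a FASTQ quality string into (score, error probability) pairs.

    Assumes Phred+offset ASCII encoding; offset=33 is the common default.
    A Phred score Q corresponds to an error probability of 10 ** (-Q / 10).
    """
    pairs = []
    for ch in quality_string:
        q = ord(ch) - offset          # ASCII code minus the offset gives Q
        pairs.append((q, 10 ** (-q / 10)))
    return pairs


# Hypothetical quality string: 'I' decodes to Q40, '5' to Q20, '+' to Q10, '!' to Q0.
for score, prob in phred_to_error_prob("I5+!"):
    print(score, prob)
```

A score of 20 means roughly a 1 in 100 chance that the base call is wrong, and a score of 40 roughly 1 in 10,000.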

Course Syllabus

Instructor: Istvan Albert

Course records: PSU ELion

Course registration: BMMB 852 - Applied Bioinformatics

The purpose of this course is to introduce students to the various applications of high-throughput sequencing, including ChIP-Seq, RNA-Seq, SNP calling, metagenomics, de novo assembly and others. The course material will concentrate on presenting complete data analysis scenarios for each of these application domains and will introduce students to a wide variety of existing tools and techniques. We expect that by the end of the course work students will:

Access to a Mac or Linux computer is necessary to complete the homework. Only Mac OS X (Tiger/Leopard) and Linux operating systems are supported.

Grading and Homework

This course has a total of 30 homeworks. Each is given out at the end of a lecture and is due by the first lecture (Tuesday) each week. The final, 30th homework will be a more complex project that requires more effort than a regular homework.

The final grade will be a weighted average of the grades obtained on the homeworks (the last homework has a weight of 5, the rest have a weight of 1).
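
As an illustration of the weighting described above, here is a minimal sketch of the calculation. The 0-100 grade scale, the function name, and the sample grades are assumptions made for the example; only the weights (5 for the last homework, 1 for the rest) come from this page.

```python
def final_grade(homework_grades, final_weight=5, regular_weight=1):
    """Weighted average in which the last homework counts final_weight times."""
    if not homework_grades:
        raise ValueError("at least one homework grade is required")
    # Every homework gets regular_weight except the last one.
    weights = [regular_weight] * (len(homework_grades) - 1) + [final_weight]
    total = sum(g * w for g, w in zip(homework_grades, weights))
    return total / sum(weights)


# Hypothetical grades on an assumed 0-100 scale:
# 29 regular homeworks at 90 and the final project at 80.
grades = [90] * 29 + [80]
print(round(final_grade(grades), 2))  # (29 * 90 + 5 * 80) / 34 -> 88.53
```

With these weights the final project contributes 5 of the 34 total weight units, or about 15% of the final grade.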

For more details please refer to the information presented during the first lecture.

We want to emphasize that the primary goal of this course is to improve students' ability to handle and interpret data sets. Therefore the evaluation process is relative to each student's initial aptitude. We aim to develop lasting skills that are not just immediately useful but also provide the foundation for a deeper understanding of informatics in general.

All Penn State Policies regarding ethics and honorable behavior apply to this course.

Created by Istvan Albert • Last updated on Thursday, December 11, 2014 • Site powered by PyBlue