Doctoral Training in Computational Genomics


Required Courses

Computation Theme:
STAT 24400 Statistical Theory and Methods I. Fall
This course is the first quarter of a two-quarter systematic introduction to the principles and techniques of statistics, as well as to practical considerations in the analysis of data, with emphasis on the analysis of experimental data. This course covers tools from probability and the elements of statistical theory. Topics include the definitions of probability and random variables, binomial and other discrete probability distributions, normal and other continuous probability distributions, joint probability distributions and the transformation of random variables, principles of inference (including Bayesian inference), maximum likelihood estimation, hypothesis testing and confidence intervals, likelihood ratio tests, multinomial distributions, and chi-square tests. Examples are drawn from the social, physical, and biological sciences. The coverage of topics in probability is limited and brief, so students who have taken a course in probability find reinforcement rather than redundancy. Students who have already taken STAT 25100 may choose to take STAT 24410 (if offered) instead of STAT 24400. Students taking either STAT 24400 or STAT 24410 will have appropriate preparation for STAT 24500.

HGEN 48600 Fundamentals of Computational Biology: Models and Inference. Winter
Covers key principles in probability and statistics that are used to model and understand biological data. There is a strong emphasis on stochastic processes and inference in complex hierarchical statistical models. Topics vary, but typical content includes likelihood-based and Bayesian inference, Poisson processes, Markov models, hidden Markov models, Gaussian processes, Brownian motion, birth-death processes, the coalescent, graphical models, Markov processes on trees and graphs, and Markov chain Monte Carlo. PQ: STAT 244 or equivalent.

HGEN 48800 Fundamentals of Computational Biology: Algorithms and Applications. Spring
This course covers the principles of data structures and algorithms, with emphasis on algorithms that have broad applications in computational biology. Specific topics may include dynamic programming, graph algorithms, numerical optimization, finite-difference schemes, matrix operations and factor analysis, and data management (e.g., SQL, HDF5). We will also discuss applications of these algorithms (as well as commonly used statistical techniques) in genomics and systems biology, such as genome assembly, variant calling, and transcriptome inference.

Core Electives

BIOS 20186 Fundamentals of Cell and Molecular Biology. Fall
This course is an introduction to molecular and cellular biology that emphasizes the unity of cellular processes amongst all living organisms. Topics are the structure, function, and synthesis of nucleic acids and proteins; structure and function of cell organelles and extracellular matrices; energetics; cell cycle; cells in tissues and cell signaling; temporal organization and regulation of metabolism; regulation of gene expression; and altered cell functions in disease states.

HGEN 47000 Human Genetics I. Fall
This course covers classical and modern approaches to studying cytogenetic, Mendelian, and complex human diseases. Topics include chromosome biology, human gene discovery for single-gene and complex diseases, non-Mendelian inheritance, mouse models of human disease, cancer genetics, and human population genetics. The format includes lectures and student presentations.

MGCB 31400 Genetic Analysis of Model Organisms. Fall
Coverage of the fundamental tools of genetic analysis as used to study biological phenomena. Topics include genetic exchange in prokaryotes and eukaryotes, analysis of gene function, and epigenetics.

BIOS 20187 Fundamentals of Genetics. Winter
The goal of this course is to integrate recent developments in molecular genetics and the Human Genome Project into the structure of classical genetics. Topics include Mendelian inheritance, linkage, tetrad analysis, DNA polymorphisms, the human genome, chromosome aberrations and their molecular analysis, bacterial and viral genetics, regulatory mechanisms, DNA cloning, mechanisms of mutation and recombination, and transposable elements.

ECEV 35600 Principles of Population Genetics I. Winter
Examines the basic theoretical principles of population genetics and their application to the study of variation and evolution in natural populations. Topics include selection, mutation, random genetic drift, quantitative genetics, molecular evolution and variation, the evolution of selfish genetic systems, and human evolution.

HGEN/ECEV/BCMB 31100 Evolution of Biological Molecules. Winter
This introductory graduate-level course connects evolutionary changes imprinted in genes and genomes with the structure, function, and behavior of the encoded protein and RNA molecules. Central themes are the mechanisms and dynamics by which molecular structure and function evolve, how protein and RNA architecture shapes evolutionary trajectories, and how patterns in present-day sequences can be interpreted to reveal the interplay of evolutionary history and molecular properties.

BCMB 32200 Biophysics of Biomolecules. Spring
This course covers the properties of proteins, RNA, and DNA, as well as their interactions. We emphasize the interplay between structure, thermodynamics, folding, and function at the molecular level. Topics include cooperativity, linked equilibrium, hydrogen exchange, electrostatics, diffusion, and binding.

HGEN 46900 Human Variation and Disease. Spring
This course focuses on the principles of population and evolutionary genetics and complex-trait mapping as they apply to humans. It includes discussion of genetic variation and disease-mapping data.

HGEN 47300 Genomics and Systems Biology. Spring
This lecture course explores the technologies that enable high-throughput collection of genomic-scale data, including sequencing, genotyping, gene expression profiling, assays of copy number variation, protein expression and protein-protein interaction. We also cover study design and statistical analysis of large data sets, as well as how data from different sources can be used to understand regulatory networks (i.e., systems). Statistical tools introduced include linear models, likelihood-based inference, supervised and unsupervised learning techniques, methods for assessing quality of data, hidden Markov models, and controlling for false discovery rates in large data sets. Readings are drawn from the primary literature.

HGEN 47100 Human Genetics III: Introductory Statistical Genetics. Winter
This course focuses on genetic models for complex human disorders and quantitative traits. Topics also include linkage and linkage disequilibrium mapping, genetic models for complex traits, and the explicit and implicit assumptions of such models.

MGCB 32000 Quantitative Analysis of Biological Dynamics. Spring
This course covers quantitative approaches to understanding biological organization and dynamics at the molecular, subcellular, and cellular levels. A key emphasis is on using simple mathematical models to gain insight into complex biological dynamics. We will also cover modern approaches to quantitative imaging and image analysis, and methods for comparing models to experimental data. A series of weekly computer labs will introduce students to scientific programming in MATLAB and reinforce basic concepts covered in the lectures.

Additional Electives

STAT 34300 Applied Linear Statistical Methods. Fall
This course introduces the theory, methods, and applications of fitting and interpreting multiple regression models. Topics include the examination of residuals, the transformation of data, strategies and criteria for the selection of a regression equation, nonlinear models, biases due to excluded variables and measurement error, and the use and interpretation of regression output from statistical software packages. The theoretical basis of the methods, the relation to linear algebra, and the effects of violations of assumptions are studied. Techniques discussed are illustrated by examples involving both physical and social science data.

ECEV 32000 Introduction to Scientific Computing for Biologists. Fall
The course will cover basic concepts in computing for an audience of biology graduate students. Students will receive basic training in the use of version control systems, databases, and regular expressions. They will learn to program in Python and R, to use R to produce publication-quality figures for their manuscripts, and to typeset scientific manuscripts and theses using LaTeX. All examples and exercises will be biologically motivated and will use real data. The approach is hands-on, with lectures followed by in-class exercises.

STAT 30900/CMSC 37810 Mathematical Computation I: Matrix Computation. Fall
This is an introductory course on numerical linear algebra, which is quite different from linear algebra. We will be much less interested in algebraic results that follow from axiomatic definitions of fields and vector spaces but much more interested in analytic results that hold only over the real and complex fields. The main objects of interest are real- or complex-valued matrices, which may come from differential operators, integral transforms, bilinear and quadratic forms, boundary and coboundary maps, Markov chains, correlations, DNA microarray measurements, movie ratings by viewers, friendship relations in social networks, etc. Numerical linear algebra provides the mathematical and algorithmic tools for analyzing these matrices. Topics covered: basic matrix decompositions (LU, QR, SVD); Gaussian elimination and LU/LDU decompositions; backward error analysis; Gram-Schmidt orthogonalization and QR/complete orthogonal decompositions; solving linear systems, least squares, and total least squares problems; and low-rank matrix approximations and matrix completion. We shall also include a brief overview of stationary and Krylov subspace iterative methods, eigenvalue and singular value problems, and sparse linear algebra.

CMSC 37720 Computational Systems Biology. Fall
Introductory concepts of systems biology; computational methods for the analysis, reconstruction, visualization, modeling, and simulation of complex cellular networks, including biochemical pathways for metabolism, regulation, and signaling. Students will have the opportunity to explore systems of their own choosing and will participate in developing algorithms and tools for comparative genomic analysis, metabolic pathway construction, stoichiometric analysis, flux analysis, metabolic modeling, and cell simulation. A particular focus of the course will be on furthering our understanding of the computer science challenges in the engineering of prokaryotic organisms. The course requires written assignments, programming assignments, and a final course project.

STAT 24500 Statistical Theory and Methods II. Winter
This course is the second quarter of a two-quarter systematic introduction to the principles and techniques of statistics, as well as to practical considerations in the analysis of data, with emphasis on the analysis of experimental data. This course continues from either STAT 24400 or STAT 24410 and covers statistical methodology, including the analysis of variance, regression, correlation, and some multivariate analysis. Some principles of data analysis are introduced, and an attempt is made to present the analysis of variance and regression in a unified framework. Statistical software is used.

STAT 32950 Multivariate Statistical Analysis: Applications and Techniques. Winter
Multivariate statistical analysis concerns methods for the simultaneous analysis of multiple outcome variables. The course will introduce basic theory and applications for analyzing multidimensional data. Topics include principal component analysis, factor analysis, canonical correlation, multidimensional scaling, discriminant analysis, clustering methods, and common techniques of dimension reduction. In addition to traditional multivariate statistical inference and methods based on Gaussian models, new developments in high-dimensional data analysis will be discussed. Theoretical derivations will be presented with emphasis on motivations, applications, and hands-on data analysis.

ECEV 42900 Theoretical Ecology. Winter
An introduction to mathematical modeling in ecology. The course begins with linear growth and Lotka-Volterra models and proceeds to partial differential equations. Throughout, the course emphasizes numerical computation and fitting models to data.

STAT 35400/ECEV 35400/MGCB 35401 Gene Regulation. Spring
This course covers the fundamental theory of gene expression in prokaryotes and eukaryotes through lectures and readings in the primary literature. Natural and synthetic genetic systems arising in the context of E. coli physiology and Drosophila development will be used to illustrate fundamental biological problems together with the computational and theoretical tools required for their solution. These tools include large-scale optimization, image processing, ordinary and partial differential equations, the chemical Langevin and Fokker-Planck equations, and the chemical master equation. A central theme of the class is the art of identifying biological problems that require theoretical analysis and choosing the correct mathematical framework with which to solve them.

STAT 35500 Statistical Genetics. Spring
This is an advanced course in statistical genetics. We will take an in-depth look at statistical methods development in recent genetics literature, with the aim of achieving a deep understanding of the modeling approaches and assumptions, statistical principles, mathematical theorems, computational issues, and data analytic approaches underlying the methods. The goal is for the student to be able to ultimately apply the principles learned to future statistical methods development for genetic data analysis. This is a discussion course and student presentations will be required. Topics depend on the interests of the participants and will be based on recent published literature. Topics may include, but are not limited to, statistical problems in genetic association mapping, population genetics, integration of different types of genetic data, and genetic models for complex traits. The course material changes every year, and the course may be repeated for credit.

STAT 24610 Pattern Recognition. Spring
This course treats statistical models and methods for pattern recognition and machine learning. Topics include a review of the multivariate normal distribution, graphical models, and computational methods for inference in graphical models, in particular the EM algorithm for mixture models and HMMs, and the sum-product algorithm. Linear discriminant analysis and other discriminative methods, such as decision trees and SVMs, are covered as well.

STAT 30210 Bayesian Analysis and Principles of Statistics. Spring
This course continues the development of mathematical statistics, with an emphasis on Bayesian analysis and the underlying principles of inference. Topics include Bayesian inference and computation, frequentist inference and the interpretation of p values and confidence intervals, decision theory, admissibility and Stein’s paradox, the likelihood principle, exchangeability and de Finetti’s theorem, hierarchical modelling, multiple comparisons, and false discovery rates. The mathematical level will generally be that of an easy advanced calculus course. We will assume familiarity with standard statistical distributions (e.g., Normal, Poisson, Binomial, Exponential), with the laws of probability, expectation, conditional expectation, etc., and exposure to common statistical concepts such as p values and confidence intervals. Familiarity with the R statistical language will also be expected, and homework assignments will include programming problems in R.

STAT 37710 Machine Learning. Fall, Spring
This course provides hands-on experience with a range of contemporary machine learning algorithms, as well as an introduction to the theoretical aspects of the subject. Topics covered include: the PAC framework, elements of computational learning theory, the VC dimension, boosting, Bayesian learning, graphical models, clustering, dimensionality reduction, linear classifiers, kernel methods including SVMs, and an introduction to statistical learning theory.

STAT 37790 Topics in Statistical Machine Learning. Spring
This course is a second graduate-level course in machine learning, assuming students have had previous exposure to machine learning and statistical theory. The emphasis of the course is on statistical methodology, learning theory, and algorithms for large-scale, high-dimensional data. The selection of topics is influenced by recent research results, and students can take the course in more than one quarter.

Reading and Research Courses

English 33000
An advanced writing course for graduate students in all of the divisions and university professional programs. The Little Red Schoolhouse helps writers learn to communicate complex and difficult material clearly to a wide variety of expert and non-expert readers, including the readers in the academic community you are currently working to join. It is designed to prepare you for the demands of academic writing at the level of the dissertation, the academic or professional article, and the academic or professional book.

TBD: Seminar in Computational Biology.
Student presentations of genetics or genomics research papers that draw on both experimental and statistical backgrounds. A key goal is to improve communication between biologically and quantitatively trained students presenting to a heterogeneous audience. Students will present in pairs, typically chosen to represent complementary expertise (biology/quantitative).

GENE 31900 Introduction to Research (Allstars). Autumn, Winter.
Lectures on current research by departmental faculty and other invited speakers. A required course for all first-year graduate students.