Population Genomic Data Analysis

Learning outcome

After the course, students can derive the key population genomic parameters from coalescent theory, use these parameters to compare and classify populations of different sizes, and apply F-statistics to analyse structured and admixed populations. They can efficiently perform computational analyses on genome-scale resequencing data, automating analysis steps with bash and awk. They can retrieve data from databases programmatically, and process and visualise both retrieved and self-produced data with R. With these skills, they can interpret and critically evaluate the results of modern evolutionary genomic data analyses presented in original research articles.

Content

The course teaches efficient use of computational tools in the analysis of high-throughput resequencing data from model and non-model organisms, along with the theoretical background of the key analyses. Particular emphasis is placed on efficient use of command-line tools on Linux, automation (scripting) of analysis steps, and statistical analysis and visualisation of large-scale data with R. The central concepts of population genetics (pairwise differences, segregating sites, site frequency spectrum, F-statistics) are derived from coalescent theory, and these statistics are applied to study effective population size, population structure, demography and natural selection.

Additional information

Completion methods
The course follows a lecture → exercise → feedback cycle. Lectures focus on the theoretical concepts and background of the week’s topic. The computational exercises are performed independently (collaboration is allowed) following written instructions; assignments are submitted electronically before the feedback session, where the exercises are reviewed. Active participation is required to take the course exam. Lectures are held in regular classrooms; bringing your own laptop is strongly recommended.

Assessment practices and criteria
The final grade (1–5) is based on the course assignments, a research project carried out in small groups, and the final exam. In the research project, students apply the skills they have learned, draw conclusions and write up their findings as a mini-article. The final exam is taken in Moodle. It consists of questions testing general knowledge of the course topics and practical tasks that use the methods learned in the course.

Activities and methods in support of learning
The course consists of 135 study hours. Contact teaching accounts for 40 hours: 30 hours of lectures and 10 hours of voluntary guidance sessions for the computer practicals. The remaining 95 hours consist of independent study, computer practicals and the group work.

Target groups
Primarily Master’s students in Ecology and Evolutionary Biology (EEB), Genetics and Molecular Biosciences (GMB) and Life Science Informatics (LSI). Open to students from other programmes and exchange students.

Teaching period
Period 2

Study module
The course is optional in Ecology and Evolutionary Biology, advanced studies.

Language of instruction
English

Study materials
The online learning material is sufficient for completing the exercises. The following books give a broader treatment of the topics covered in the course and are recommended for students who wish to continue in the field:

An Introduction to Population Genetics / Rasmus Nielsen, Montgomery Slatkin

Coalescent Theory: An Introduction / John Wakeley

Bioinformatics Data Skills / Vince Buffalo