Practicals for 2024 Genome Bioinformatics module.

Introduction

Cheap sequencing has created the opportunity to perform molecular-genetic analyses on just about anything. Conceptually, doing this would be similar to working with traditional genetic model organisms. But there is one major difference: for traditional genetic model organisms, large teams and communities of expert assemblers, predictors, and curators have put years of effort into the prerequisites for most genomic analyses, including a reference genome and a set of gene predictions. In contrast, those of us working on “emerging” model organisms often have limited or no pre-existing resources and are part of much smaller teams. Emerging model organisms include most crops, animal and plant pest species, many pathogens, and major models for ecology & evolution.

At the end of this module, you should be able to:

  1. inspect and clean short (Illumina) reads,
  2. perform genome assembly,
  3. assess the quality of the genome assembly using simple statistics,
  4. predict protein-coding genes,
  5. assess the quality of gene predictions,
  6. assess the quality of the entire process using a biologically meaningful measure.

_NOTE:_ The exemplar datasets are simplified to run on laptops and to fit into the short format of this course. For real projects, much more sophisticated approaches are needed!


1. Prerequisites

Prerequisites for the practicals are:

2. Practicals

  1. Short read cleaning: Illumina short read cleaning
  2. Reads to genome: genome assembly, quality control
  3. Gene prediction: gene prediction, quality control
  • Population sequencing to genotypes to population genetics statistics:
    4. Mapping reads, calling variants, visualizing variant calls.
    5. Analysing variants: PCA, measuring differentiation & diversity.

3. Computers

To perform the practicals, you will connect remotely to Amazon Web Services (AWS) (see here for more information). You will use an SSH client (see here for more information) to connect to a remote shell, where you will run the first three practicals. Some results will be available on a personal web page created for the course. The same web page will allow you to perform the fourth and fifth practicals.

4. Authors/Credits

The initial version of this practical was put together by:

* [Yannick Wurm](http://wurmlab.com) [@yannick__](http://twitter.com/yannick__)
* [Oksana Riba-Grognuz](https://www.linkedin.com/in/oksana80)

It was heavily heavily heavily revised and improved thanks to the efforts and new content of the [contributors](https://github.com/wurmlab/genomicscourse/graphs/contributors).