Bioinformatics summer school 2022 extended version

CRUK Cambridge Centre annual Bioinformatics summer school

Zoom virtual school, University of Cambridge

This course is free of charge if you are funded by CRUK.


Deadline 15 August

Functional genomics looks at the dynamic aspects of how the genome functions within cells, particularly in the form of gene expression (transcription) and gene regulation. This workshop surveys current methods for functional genomics using high-throughput technologies.

High-throughput technologies such as next generation sequencing (NGS) can routinely produce massive amounts of data. However, such datasets pose new challenges in how the data are analyzed, annotated and interpreted; these challenges are not trivial and can be daunting for the wet-lab biologist. This course covers state-of-the-art and best-practice tools for bulk RNA-seq and ChIP-seq data analysis, and will also introduce approaches to prognostic gene signatures.

This year's summer school consists of four modules:

Module 1: Introduction to R and Unix (5th – 7th September 2022, 09:30-17:30)

Module 2: Analysis of bulk RNA-seq data (18th Nov, 25th Nov, 2nd Dec 2022, 09:30-17:30)

Module 3: Analysis of single cell RNA-seq data (18th Jan, 25th Jan, 1st Feb 2023, 09:30-17:30)

Module 4: ChIP-seq and ATAC-seq analysis (2 days in April 2023, exact dates TBA)

Module 1 is a prerequisite for all subsequent modules.

Participants are free to choose any or all of Modules 2-4, offering full flexibility in needs-based training.

Module 1, Part 1 – Introduction to R

During the course we will be working with one of the most popular collections of R packages, the tidyverse, which will allow you to manipulate your data effectively and visualise it to a publication-level standard.

During this course you will learn about:

  • Basic programming concepts in R
  • Data manipulation
  • Data visualization

After this course you should be able to:

  • Write and execute R code
  • Know where to look for help about R code
  • Read data from a file and write data to a file
  • Format data from tables
  • Create plots from data

Part 2 – Introduction to Unix

Using the Linux operating system and the bash command line interface, we will demonstrate the basic structure of the UNIX operating system.

During this course you will learn about:

  • The basic features of a UNIX operating system
  • Navigating the filesystem using a terminal (text-based interface)
  • Using bash as a tool for data manipulation and automation
  • Incorporating external tools and resources into your UNIX environment
  • Best practices for managing and maintaining scripts written in bash

After this course you should be able to:

  • Know how to look for help when writing bash commands
  • Navigate the filesystem using the bash command line
  • Perform basic file and data manipulation in bash
  • Structure bash commands into simple pipelines
  • Run commands on external servers
  • Access data on external servers
  • Manage the installation of third-party tools into a UNIX environment
  • Write bash scripts to automate multiple commands

Module 2 – Analysis of bulk RNA-seq data

During this course you will learn about:

  • RNA sequencing technology and considerations on experimental design
  • Quality control of raw and aligned sequencing reads: FastQC and Picard
  • Read alignment to a reference genome: HISAT2
  • Extract information from SAM/BAM files: samtools
  • Sources of variation in RNA-seq data
  • Differential expression analysis using DESeq2
  • Annotation resources in Bioconductor
  • Identifying over-represented gene sets among a list of differentially expressed genes

After this course you should be able to:

  • Design your RNA-seq experiments properly, considering the advantages and limitations of RNA-seq assays
  • Assess the quality of your datasets
  • Perform alignment and quantification of expression through different tools and pipelines
  • Know what tools are available in Bioconductor for RNA-seq data analysis and understand the basic object types that are utilised
  • Produce a list of differentially expressed genes from an RNA-seq experiment

Module 3 – Analysis of single cell RNA-seq data

Recent technological advances have made it possible to obtain genome-wide transcriptome data from single cells using high-throughput sequencing (scRNA-seq). For this module it is beneficial if you have attended Module 2 and/or have some familiarity with the analysis of bulk RNA-seq data.

During this course you will learn about:

  • Different scRNA-seq technologies and what kind of data you obtain from each
  • Processing raw sequencing data from the commonly used 10x Chromium platform using cellranger, exploring the data with the Loupe browser, and preparing reference genomes for mapping with cellranger
  • Using several R/Bioconductor packages for downstream analysis of scRNA-seq data, including data normalization, correction for batch effects, dimensionality reduction methods (PCA, t-SNE and UMAP), cell clustering and differential expression analysis

After this course you should be able to: 

  • Know about different single cell sequencing technologies
  • Process raw single-cell sequencing data and assess the quality of your data
  • Normalise scRNA-seq data
  • Visualise the data and apply dimensionality reduction
  • Apply methods for batch correction and data integration
  • Identify groups of similar cells by clustering and identify marker genes to differentiate them
  • Perform differential expression analysis between conditions

Module 4 – ChIP-seq and ATAC-seq analysis

The course starts with an introduction to ChIP-seq experiments for the detection of genome-wide DNA binding sites of transcription factors and other proteins. On the second day, we focus on the analysis of differential binding, comparing different samples. We will also give an introduction to ATAC-seq data analysis for the detection of regions of open chromatin.

It would be beneficial if you have attended Module 2 and/or have some familiarity with the analysis of bulk RNA-seq data.

During this course you will learn about:

  • Considerations on experimental design for ChIP-seq
  • Quality control of raw reads
  • Read alignment to a reference genome
  • Peak calling and motif analysis
  • Differential binding analysis
  • Quality control, processing and analysis of ATAC-seq data

After this course you should be able to:

  • Understand the quality of high-throughput sequencing data
  • Assess the quality of your ChIP-seq datasets and reproducibility of replicates
  • Perform alignment and peak calling of ChIP-seq datasets
  • Compare samples by performing differential binding analysis
  • Detect regions of open chromatin by analysing ATAC-seq datasets
  • Visualize multiple layers of epigenomic data


FULL - no longer accepting applications
