CRG PhD Course 2017 Introduction to Statistics in R

From Bioinformatics Core Wiki
Revision as of 13:18, 2 October 2017 by Jponomarenko


Course Description

This introductory course in statistics and R is offered as 3 consecutive modules (see the Course Syllabus below), each consisting of a two-hour practicum in a computer classroom using R Studio.

Course Objectives

To introduce the basic concepts of statistics and show how they can be applied to real-life datasets using R. Students will write their first scripts, which they can re-use when they start analyzing their own data. No prior knowledge of statistics or R is required. However, if the modules are not taken in sequence, familiarity with the material of the preceding modules is recommended.

Course Instructors

  • Sarah Bonnin (Module I) sarah.bonnin@crg.eu
  • German Demidov (Module II) german.demidov@crg.eu
  • Julia Ponomarenko (organizer, Module III) julia.ponomarenko@crg.eu

Time and Location

  • Oct 5, 10, 11, 2016. 11:00 - 13:00. PRBB Building, Bioinformatics classroom (room 468), 4th floor, hotel wing.


Course Syllabus, Schedule, and Materials

MODULE I. Introduction to R. Basic functions and plotting. Oct 5, 2016.

  • Introduction to R programming language:
    • Introduction to R Studio.
    • Data types, variables, packages, handling files/scripts, functions.
  • Basic plots in R. The ggplot2 package.
    • Exploratory data analysis: bar-plot, histogram, box-plot, scatter-plot.
    • R Studio ggplot2 cheatsheet
  • R Markdown: how to produce HTML, PDF, and Word reports with R.


  • Slides for Module 1: here
  • Template for Exercise 4: here


MODULE II. Introduction to Probability & Hypothesis testing. Oct 10, 2016.

  • Independence, conditional probability, Bayes formula.
  • Distributions, population mean and population variance.
  • Central limit theorem and the law of large numbers.
  • The concept of hypothesis testing, type I and type II errors, false discovery rate.
  • Significance and confidence levels, p-values. Confidence intervals. One-sided and two-sided tests.
  • One-sample and two-sample tests for independent and matched samples with known and unknown variance.
  • Student's t-distribution, assumption of normality.
  • Tests for proportions.
  • Download the zip-file of the module's materials.


MODULE III. Non-parametric tests & Linear regression. Oct 11, 2016.

  • Non-parametric tests: sign test, Wilcoxon rank-sum test (Mann-Whitney U test), Wilcoxon signed-rank test, Kruskal-Wallis test.
  • Kolmogorov-Smirnov (KS) test. Shapiro-Wilk test for normality. Q-Q plot.
  • Data transformation.
  • Download the zip-file for this part of the practicum.



External Resources

Bioinformatics Core Facility @ CRG — 2011-2024