CRG Introduction to Statistics and R 2017


This is an introductory course on statistics and R programming. For the previous edition of this course, please refer to this page.
The R part is offered in 4 slow-paced practicums for absolute beginners, followed by 3 fast-paced practicums that accompany the statistical modules.
For the practical exercises we will use the R programming language and RStudio.

The statistics material is offered in 3 consecutive modules (please see the Course Syllabus below), each consisting of a morning lecture and an afternoon practicum in a computer classroom. The practicums focus on using statistics in R. Their aim is to demonstrate and reinforce the concepts introduced in the lectures, not to teach R programming.

Course Instructors

Dates, Time and Location

  • Module 0. Introduction to R. May 25, 26, 29, 30, 2017.
    • 10:00 - 13:00.
    • PRBB. Bioinformatics classroom (468). 4th floor. The hotel/North wing.

  • Modules I, II, III. Introduction to Statistics. June 6, 8, 9, 2017.
    • 10:00 - 13:00.
    • PRBB. Ramon y Cajal.
    • 14:00 - 17:00.
    • PRBB. Bioinformatics classroom (468). 4th floor. The hotel/North wing.

Course Syllabus, Schedule, and Materials

MODULE 0. Introduction to R. May 25, 26, 29, 30.

  • PRACTICUM I. Intro to R and RStudio. May 25. 10:00 - 13:00.
    • Introduction to RStudio: explore environment variables, navigate the command history, navigate the directory and file structure, workspace and files.
    • Simple arithmetic in R console.
    • Create and delete an object.
    • Introduction to data types and the "vector" data structure.
    • Create and run a short script.
    • Read and write a file.
    • OUTCOME: Write a script that creates (and enters) a directory, performs a simple manipulation, and writes the result to a file.
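
A minimal sketch of what such a script could look like, not the official course solution. The directory name "r_course_day1" and the file name "squares.txt" are hypothetical, and the script works under tempdir() so that it does not touch any existing files.

```r
# Hypothetical names; a temporary location keeps the example harmless.
work_dir <- file.path(tempdir(), "r_course_day1")
dir.create(work_dir, showWarnings = FALSE)   # create the directory
old_dir <- setwd(work_dir)                   # enter it, remembering the previous one

x <- 1:10                                    # a simple integer vector
squares <- x^2                               # element-wise arithmetic

# Write one value per line, then read the file back as a check.
write(squares, file = "squares.txt", ncolumns = 1)
read_back <- scan("squares.txt", quiet = TRUE)

setwd(old_dir)                               # return to the original directory
```

Saving the code in a file and running it with source() or from the RStudio editor reproduces the same steps as typing them in the console.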

Slides day1
Exercises day1
Correction for exercise 1, exercise 2, exercise 3

  • PRACTICUM II. Data structures in R. May 26. 10:00 - 13:00.
    • More on vectors and factors.
    • Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions.
    • OUTCOME: Produce a script that creates or reads matrices and data frames, manipulates them, and writes them to files.
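
The operations listed above can be sketched as follows. This is an illustrative example with made-up values, not taken from the course exercises; the object names are arbitrary and the output file is a temporary one.

```r
# A 2 x 3 matrix, filled column by column, with named dimensions.
m <- matrix(1:6, nrow = 2, ncol = 3)
rownames(m) <- c("r1", "r2")
colnames(m) <- c("a", "b", "c")
dim(m)                      # check the dimensions
elem <- m["r2", "b"]        # extract one element by row and column name
m10 <- m * 10               # element-wise arithmetic

# Convert to a data frame, add a column, and subset rows by a condition.
df <- as.data.frame(m)
df$total <- df$a + df$b + df$c
big <- df[df$total > 9, ]

# Write the data frame to a CSV file and read it back.
out_file <- tempfile(fileext = ".csv")
write.csv(df, out_file, row.names = TRUE)
df2 <- read.csv(out_file, row.names = 1)
```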

Slides day2
Exercises day2

  • PRACTICUM III. Lists & Packages. May 29. 10:00 - 13:00.
    • More on data structures. Lists: create, access/extract/subset, modify.
    • Packages: find, install, load, explore/find functions and documentation, get help on functions.
    • OUTCOME: Install the packages "ggplot2" (which provides the diamonds data set) and "WriteXLS". Use them in a script that manipulates the diamonds data frame and writes it into an Excel file.

Slides day3
Exercises day3

  • PRACTICUM IV. Plots & Graphics in R. May 30. 10:00 - 13:00.
    • Basic plotting: scatter plots, box plots, histograms, density plots. Changing colors, point shapes, titles, labels, legends, axes, etc.
    • Introduction to the ggplot2 package: structure of ggplot2 commands, scatter plots.
    • OUTCOME: Write a script that produces, customizes, and saves plots in files.
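
A small sketch of such a script using base R graphics only (ggplot2 is not shown here). The data are simulated, the file name "scatter_hist.pdf" is hypothetical, and the PDF device is used simply because it is available in every R installation.

```r
set.seed(1)                        # make the simulated data reproducible
x <- rnorm(50)
y <- 2 * x + rnorm(50)

plot_file <- file.path(tempdir(), "scatter_hist.pdf")
pdf(plot_file, width = 8, height = 4)   # open a file-based graphics device
par(mfrow = c(1, 2))                    # two panels side by side

# Scatter plot with custom color, point shape, title, labels, and legend.
plot(x, y, col = "steelblue", pch = 19,
     main = "Scatter plot", xlab = "x", ylab = "y")
legend("topleft", legend = "simulated points", col = "steelblue", pch = 19)

# Histogram of the x values.
hist(x, col = "grey80", main = "Histogram of x", xlab = "x")

invisible(dev.off())                    # close the device so the file is written
```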

Slides day4
Exercises day4

MODULE I. Descriptive Statistics & Intro to Probability. June 6.

  • LECTURE. 10:00 - 13:00.
    • Exploratory data analysis and graphical displays.
    • Samples, measures of center and spread, percentiles, odds ratio.
    • Outliers and robustness.
    • Independence, conditional probability, Bayes' formula.
    • Distributions, population mean and population variance, Binomial, Poisson, and Normal distributions.
    • Central limit theorem and the law of large numbers.
    • Continuity correction.
    • Sampling with and without replacement.
    • Correction for finite population size.

Lecture 1 slides.

  • PRACTICUM. 14:00 - 17:00.
    • Descriptive statistics.
    • Plots: bar plot, histogram, CDF, box plot, scatter plot, pie chart, etc.
    • Distributions, population mean and population variance.

Download the zipped html-file for the practicum.

MODULE II. Statistical Inference. June 8.

  • LECTURE. 10:00 - 13:00.
    • The concept of hypothesis testing, type I and type II error, false discovery rate.
    • Significance and confidence level, p-value.
    • One-sided and two-sided tests and confidence intervals.
    • Sampling distribution, estimators, standard error.
    • Normal probabilities in application to p-value.
    • One-sample and two-sample tests for independent and matched samples with known variance.
    • The case of unknown variance and Student t-distribution, assumption of normality.
    • Pooled variance and equal variances assumption.
    • Estimation of variance.
    • Fisher test for variance equality.
    • Non-parametric tests: sign test, Wilcoxon rank-sum test (Mann-Whitney U test), Wilcoxon signed-rank test.
    • Chi-square test for goodness of fit, chi-square test for independence.
    • Sample size estimation.

Lecture 2 slides.

  • PRACTICUM. 14:00 - 17:00.
    • One- and two-sample tests with known and unknown variance.
    • Test for proportions.
    • Confidence intervals and t-distribution.
    • Fisher test.
    • Sample size estimation.
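
As an illustration of the kind of calls involved, here is a short sketch on simulated data. It is not taken from the practicum materials; the sample sizes, means, and effect size are arbitrary assumptions, and all functions come from R's built-in stats package.

```r
set.seed(42)
a <- rnorm(20, mean = 5.0, sd = 1)   # simulated sample 1
b <- rnorm(20, mean = 5.8, sd = 1)   # simulated sample 2

# Two-sample t-test with unknown variances (Welch's version is R's default).
welch <- t.test(a, b)
welch$p.value
welch$conf.int                       # 95% confidence interval for the mean difference

# Pooled-variance t-test, which assumes equal variances.
pooled <- t.test(a, b, var.equal = TRUE)

# F test comparing the two variances.
f_res <- var.test(a, b)

# Test for a proportion: 18 successes out of 50 against p = 0.5.
prop_res <- prop.test(x = 18, n = 50, p = 0.5)

# Sample size per group needed to detect a difference of 0.8 (sd = 1)
# with 80% power at the 5% significance level.
n_needed <- power.t.test(delta = 0.8, sd = 1, sig.level = 0.05, power = 0.8)$n
```

Note that t.test() applies the Welch correction unless var.equal = TRUE is given, so the two calls above can report different degrees of freedom.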

Download the zipped html-file for the practicum Part 1.
Download the zipped html-file for the practicum Part 2.

MODULE III. Statistical modeling & Regression. June 9.

  • LECTURE. 10:00 - 13:00.
    • Simple linear regression model, residuals, degrees of freedom, least squares method, correlation coefficient, variance decomposition, coefficient of determination.
    • Interpretation of the slope, the correlation coefficient, and the coefficient of determination.
    • Standard error and statistical inference in the simple linear regression model.
    • Analysis of variance (ANOVA). One-way and two-way ANOVA.
    • Beyond simple regression models: multiple regression, logistic regression.
    • Correction for multiple testing, family-wise error rate.

Lecture 3 slides.

  • PRACTICUM. 14:00 - 17:00.
    • QQ-plot.
    • Tests for normality.
    • Data transformation.
    • Non-parametric tests.
    • Problems on linear regression.
    • ANOVA.
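
A brief sketch of a simple linear regression in R, using the built-in cars data set (speed and stopping distance) as a stand-in. The choice of data set is an assumption made for illustration and may differ from the practicum problems.

```r
# Fit stopping distance as a linear function of speed.
fit <- lm(dist ~ speed, data = cars)
coefs <- coef(fit)                   # intercept and slope
r2 <- summary(fit)$r.squared         # coefficient of determination

# In simple linear regression, R^2 equals the squared correlation coefficient.
r <- cor(cars$speed, cars$dist)

# Check the residuals for normality with a formal test and a QQ-plot.
sw <- shapiro.test(residuals(fit))
qq_file <- file.path(tempdir(), "qqplot.pdf")   # hypothetical output file
pdf(qq_file)
qqnorm(residuals(fit))
qqline(residuals(fit))
invisible(dev.off())

# ANOVA table for the fitted model.
aov_table <- anova(fit)
```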

Download the zipped html-file for the practicum Part 1.
Download the zipped html-file for the practicum Part 2.

External Resources

Bioinformatics Core Facility @ CRG — 2011-2018