BIST Introduction to Statistics 2017

From Bioinformatics Core Wiki
Revision as of 14:44, 30 March 2017 by Jponomarenko


Description

This is an introductory course in statistics and R programming. The R part consists of 4 practicums, followed by 3 practicums that accompany the statistics modules. The statistics material is offered in 3 consecutive modules (please see Course Syllabus below), each consisting of a morning lecture and an afternoon practicum in a computer classroom. The practical exercises use the R programming language and RStudio. However, the course focuses on statistics rather than R; each practicum is therefore designed to demonstrate and reinforce the concepts introduced in the lecture rather than to provide training in R.

Course Instructors

  • Dmitri Pervouchine (lectures) pervouchine@gmail.com
  • German Demidov (practicums V - VII) german.demidov@crg.eu
  • Sarah Bonnin (practicums I - IV) sarah.bonnin@crg.eu
  • Julia Ponomarenko (organizer, practicums V - VII) julia.ponomarenko@crg.eu

Dates, Time and Location

  • LECTURES: PRBB. AULA Auditorium. 4th floor. The hotel wing.
    • June 6, 8, 9, 2017. 10:00 - 13:00.
  • PRACTICUMS: PRBB. Bioinformatics classroom. 468. 4th floor. The hotel wing.
    • May 26, 26, 29, 30, 2017. 10:00 - 13:00.
    • June 6, 8, 9, 2017. 14:00 - 17:00.


Course Syllabus, Schedule, and Materials

MODULE 0. Workshop "Introduction to R". May 2, 2016. ICFO.

Download the workshop materials. The workshop was given by Dr. Alejandro Caceres, CREAL, and organized by the ICFO's Training and Development Program.


MODULE I. Descriptive statistics. May 6, 2016. CRG.

  • LECTURE I. View slides in this browser window. Exploratory data analysis: bar plot, histogram, CDF, box plot, scatter plot, pie chart, etc. Samples, measures of center and spread, percentiles, odds ratio. Outliers and robustness. Experiment versus observational study, confounding factors, simple random sample, other types of sampling, biases in sampling techniques.
  • LECTURE II. View slides in this browser window. Introduction to the R programming language and RStudio: data types, variables, packages, functions, handling files/scripts/projects.
  • PRACTICUM. View pdf-file in this browser window. Basic plots in R using the ggplot2 package.


MODULE II. Introduction to Probability. May 9, 2016. CRG.

  • LECTURE. View slides in this browser window. Independence, conditional probability, Bayes' formula. Distributions, population mean and population variance, binomial, Poisson, and normal distributions. Central limit theorem and the law of large numbers. Continuity correction. Sampling with and without replacement. Correction for finite population size.
  • PRACTICUM. Download the zip-file. Elementary probability problems in R, PDF and CDF functions, simulation illustrating the law of large numbers.
  • STATISTICAL TABLES
  • QUIZ 2


MODULE III. Statistical Inference, part I. May 13, 2016. CRG.

  • LECTURE. View slides in this browser window. Statistical Inference, part I. The concept of hypothesis testing, type I and type II errors, false discovery rate. Significance level and confidence level, p-value. Confidence intervals. One-sided and two-sided tests and confidence intervals. Sampling distribution, estimators, standard error. Normal probabilities as applied to p-values. One-sample and two-sample tests for independent and matched samples with known variance. The case of unknown variance and Student's t-distribution, assumption of normality. Pooled variance and the equal-variances assumption.
  • PRACTICUM. Download the zip-file. One- and two-sample tests with known and unknown variance, test for proportions, simulation involving confidence intervals and t-distribution.
  • QUIZ 3


MODULE IV. Statistical Inference, part II. May 18, 2016. CRG.

  • LECTURE. View slides in this browser window. Statistical Inference, part II. Estimation of variance. Fisher's F-test for equality of variances. Non-parametric tests: sign test, Wilcoxon rank-sum test (Mann-Whitney U test), Wilcoxon signed-rank test. Chi-square test for goodness of fit, chi-square test for independence. Kolmogorov-Smirnov (KS) test. Shapiro-Wilk test for normality. Sample size estimation. Correction for multiple testing, family-wise error rate.
  • PRACTICUM. Download the zip-file. Tests with unknown variance, non-parametric tests, simulations illustrating non-parametric tests, false discovery rate (FDR).
  • QUIZ 4


MODULE V. Statistical modeling, Regression. May 20, 2016. CRG.

  • LECTURE. View slides in this browser window. Simple linear regression model, residuals, degrees of freedom, least squares method, correlation coefficient, variance decomposition, coefficient of determination. Interpretation of the slope, the correlation coefficient, and the coefficient of determination. Standard error and statistical inference in the simple linear regression model. Analysis of variance (ANOVA). One-way and two-way ANOVA.
  • PRACTICUM. Download the zip-file. Problems on linear regression, ANOVA, data transformation.
  • QUIZ 5


External Resources

Bioinformatics Core Facility @ CRG — 2011-2024