Difference between revisions of "CRG Introduction to Statistics and R 2017"

From Bioinformatics Core Wiki
(MODULE III. Statistical modeling & Regression. June 9.)
 
(7 intermediate revisions by the same user not shown)
Line 105: Line 105:
 
** Plots: Bar-plot, histogram, CDF, box-plot, scatter-plot, pie charts etc.
 
** Plots: Bar-plot, histogram, CDF, box-plot, scatter-plot, pie charts etc.
 
** Distributions, population mean and population variance.  
 
** Distributions, population mean and population variance.  
[[Media:Module1_June_2017.html.zip|Download the zipped html-file for the practicum.]]  
+
[[Media:Module1_June_2017.html_2.zip|Download the zipped html-file for the practicum.]]  
 
<br>
 
<br>
  
Line 134: Line 134:
 
** Fisher test.
 
** Fisher test.
 
** Sample size estimation.  
 
** Sample size estimation.  
 +
[[Media:Module2_June_2017_Parametric_tests.html.zip|Download the zipped html-file for the practicum Part 1.]]<br>
 +
[[Media:Module2_June_2017_FDR_test_power.html.zip|Download the zipped html-file for the practicum Part 2.]]
 
<br>
 
<br>
 +
  
 
==== <b>MODULE III. Statistical modeling & Regression. June 9.</b> ====
 
==== <b>MODULE III. Statistical modeling & Regression. June 9.</b> ====
Line 145: Line 148:
 
** Beyond simple regression models: multiple regression, logistic regression.
 
** Beyond simple regression models: multiple regression, logistic regression.
 
** Correction for multiple testing, family-wise error rate.  
 
** Correction for multiple testing, family-wise error rate.  
 +
[[Media:Part3.pdf|Lecture 3 slides.]]
 
<br>
 
<br>
  
Line 154: Line 158:
 
** Problems on linear regression.
 
** Problems on linear regression.
 
** ANOVA.
 
** ANOVA.
 +
[[Media:Module3_June_2017.html_2.zip|Download the zipped html-file for the practicum Part 1.]]<br>
 +
[[Media:3rd_module_regression_anova.html.zip|Download the zipped html-file for the practicum Part 2.]]<br>
  
  
Line 179: Line 185:
 
* [https://www.edx.org/course/data-analysis-life-sciences-1-statistics-harvardx-ph525-1x Self-paced online course from Harvard "Statistics and R"]
 
* [https://www.edx.org/course/data-analysis-life-sciences-1-statistics-harvardx-ph525-1x Self-paced online course from Harvard "Statistics and R"]
 
* [https://www.edx.org/course/data-analysis-life-sciences-3-harvardx-ph525-3x Self-paced online course from Harvard "Statistical Inference and Modeling for High-throughput Experiments" ]
 
* [https://www.edx.org/course/data-analysis-life-sciences-3-harvardx-ph525-3x Self-paced online course from Harvard "Statistical Inference and Modeling for High-throughput Experiments" ]
* [http://data.bits.vib.be/pub/trainingen/StatTheory/SlidesFullDay.pdf VIB "Basic statistics theory" course slides.]
 
 
* [http://students.brown.edu/seeing-theory/ The Seeing Theory website visualizes the fundamental concepts covered in an introductory college statistics course, using D3.js]
 
* [http://students.brown.edu/seeing-theory/ The Seeing Theory website visualizes the fundamental concepts covered in an introductory college statistics course, using D3.js]

Latest revision as of 16:00, 9 June 2017


Description

This is an introductory course to statistics and R programming. For the previous edition of this course, please refer to this page.
The R part is offered in 4 slow-paced practicums for absolute beginners, followed by 3 fast-paced practicums of statistical modules.
For the practical exercises we will use the R programming language and RStudio.

The statistics material is offered in 3 consecutive modules (please see Course Syllabus below), each consisting of a morning lecture and an afternoon practicum in a computer classroom. The practicums focus on using statistics in R; their aim is to demonstrate and reinforce the concepts introduced in the lectures rather than to teach R programming.


Course Instructors


Dates, Time and Location

  • Module 0. Introduction to R. May 25, 26, 29, 30, 2017.
    • 10:00 - 13:00.
    • PRBB. Bioinformatics classroom (468). 4th floor. The hotel/North wing.


  • Modules I, II, III. Introduction to Statistics. June 6, 8, 9, 2017.
    • LECTURES.
      • 10:00 - 13:00.
      • PRBB. Ramon y Cajal.
    • PRACTICUMS.
      • 14:00 - 17:00.
      • PRBB. Bioinformatics classroom (468). 4th floor. The hotel/North wing.


Course Syllabus, Schedule, and Materials


MODULE 0. Introduction to R. May 25, 26, 29, 30.

  • PRACTICUM I. Intro to R and RStudio. May 25. 10:00 - 13:00.
    • Introduction to RStudio: explore environment variables, navigate the command history, navigate the directory and file structure, workspace and files.
    • Simple arithmetic in R console.
    • Create and delete an object.
    • Introduction to data types and the "vector" data structure.
    • Create and run a short script.
    • Read and write a file.
    • OUTCOME: Write a script that creates (and enters) a directory, performs a simple manipulation, and writes the result to a file.
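
A minimal sketch of what such a script could look like. The directory name, file name, and the manipulation itself are hypothetical choices made only for illustration; the course's own exercises may differ.

```r
# Hypothetical directory and file names, chosen only for this sketch.
work_dir <- file.path(tempdir(), "practicum1")
dir.create(work_dir, showWarnings = FALSE)   # create the directory
old_dir <- setwd(work_dir)                   # enter it, remembering the previous one

x <- c(2, 4, 6, 8)                           # a numeric vector
y <- x^2 + 1                                 # a simple element-wise manipulation

# Write both vectors to a tab-separated text file in the new directory.
write.table(data.frame(x = x, y = y), file = "result.txt",
            sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to check what was written.
result <- read.table("result.txt", header = TRUE, sep = "\t")

setwd(old_dir)                               # return to the original directory
```

Working inside tempdir() keeps the sketch from touching the user's own folders; in a real script the directory would normally be a project folder.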

Slides day1
Exercises day1
Correction for exercise 1, exercise 2, exercise 3


  • PRACTICUM II. Data structures in R. May 26. 10:00 - 13:00.
    • More on vectors and factors.
    • Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions.
    • OUTCOME: Produce a script that reads matrices and data frames, manipulates them, and reads and writes files.
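
The matrix and data frame operations listed above can be sketched as follows. The object names and values are made up for this example and are not taken from the course exercises.

```r
# Create a 2 x 3 matrix (filled column by column) and name its dimensions.
m <- matrix(1:6, nrow = 2, ncol = 3)
dimnames(m) <- list(c("r1", "r2"), c("a", "b", "c"))

dims <- dim(m)            # number of rows and columns
cell <- m["r2", "b"]      # extract one element by row and column name
m10  <- m * 10            # element-wise arithmetic

# Convert to a data frame, add a column, and subset rows by a condition.
df <- as.data.frame(m)
df$total <- df$a + df$b + df$c
big <- df[df$total > 9, ]

# Convert selected columns back to a matrix.
back <- as.matrix(df[, c("a", "b")])
```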

Slides day2
Exercises day2


  • PRACTICUM III. Lists & Packages. May 29. 10:00 - 13:00.
    • More on data structures. Lists: create, access/extract/subset, modify.
    • Packages: find, install, load, explore/find functions and documentation, get help on functions.
    • OUTCOME: Install the packages "diamonds" and "WriteXLS". Use them in a script that manipulates the diamonds data frame and writes it into an Excel file.
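
A small sketch of the list operations and the package workflow named above. The list contents are invented for illustration. To stay runnable without network access, the example uses the "tools" package that ships with R, and the CRAN installation step is shown only as commented-out lines.

```r
# A list can hold elements of different types and lengths.
sample_info <- list(id = "S1", counts = c(10, 20, 30), treated = TRUE)

counts   <- sample_info$counts      # extract an element by name
first_id <- sample_info[["id"]]     # extract an element with [[ ]]
sub      <- sample_info["counts"]   # single brackets return a (sub)list

sample_info$batch   <- 2            # add an element
sample_info$treated <- NULL         # remove an element
element_names <- names(sample_info)

# Packages: 'tools' ships with R, so nothing needs to be installed here.
has_tools <- requireNamespace("tools", quietly = TRUE)
ext <- tools::file_ext("results.xlsx")   # call a function via package::function

# Installing and exploring a CRAN package (not run: needs network access):
# install.packages("WriteXLS")
# library(WriteXLS)
# help(package = "WriteXLS")
```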

Slides day3
Exercises day3

  • PRACTICUM IV. Plots & Graphics in R. May 30. 10:00 - 13:00.
    • Basic plotting: scatter plots, box plots, histograms, density plots. Changing colors, point shapes, titles, labels, legend, axes, etc.
    • Introduction to ggplot2 package: structure of ggplot2 commands, scatter plots.
    • OUTCOME: Write a script that produces, customizes, and saves plots in files.
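
One possible shape for such a script, using base graphics only. The simulated data, colors, and output file name are assumptions made for this sketch; ggplot2 is left out so the example does not depend on an installed package.

```r
set.seed(1)                       # make the simulated data reproducible
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# Hypothetical output file; pdf() opens a file-based graphics device.
out_file <- file.path(tempdir(), "scatter.pdf")
pdf(out_file, width = 6, height = 4)

# Scatter plot with a custom title, axis labels, color, and point shape.
plot(x, y, main = "Simulated data", xlab = "x", ylab = "y",
     col = "steelblue", pch = 19)
abline(lm(y ~ x), col = "red", lty = 2)     # add the least-squares line
legend("topleft", legend = c("points", "least-squares line"),
       col = c("steelblue", "red"), pch = c(19, NA), lty = c(NA, 2))

dev.off()                         # close the device so the file is written
```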

Slides day4
Exercises day4

MODULE I. Descriptive Statistics & Intro to Probability. June 6.

  • LECTURE. 10:00 - 13:00.
    • Exploratory data analysis and graphical displays.
    • Samples, measures of center and spread, percentiles, odds ratio.
    • Outliers and robustness.
    • Independence, conditional probability, Bayes formula.
    • Distributions, population mean and population variance, Binomial, Poisson, and Normal distribution.
    • Central Limit theorem and the Law of large numbers.
    • Continuity correction.
    • Sampling with and without replacement.
    • Correction for finite population size.
  • STATISTICAL TABLES
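
As an illustration of the continuity correction mentioned in the list above, the following sketch compares an exact binomial probability with its normal approximation, with and without the correction. The values of n, p, and k are arbitrary choices for this example.

```r
n <- 50      # number of trials (arbitrary)
p <- 0.3     # success probability (arbitrary)
k <- 18      # we want P(X <= k)

exact <- pbinom(k, size = n, prob = p)       # exact binomial probability

mu    <- n * p                               # mean of the binomial
sigma <- sqrt(n * p * (1 - p))               # standard deviation of the binomial

plain     <- pnorm(k, mean = mu, sd = sigma)        # normal approximation
corrected <- pnorm(k + 0.5, mean = mu, sd = sigma)  # with continuity correction

round(c(exact = exact, plain = plain, corrected = corrected), 4)
```

The correction replaces k by k + 0.5 because the discrete event X <= k corresponds to the interval up to k + 0.5 on the continuous scale.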

Lecture 1 slides.

  • PRACTICUM. 14:00 - 17:00.
    • Descriptive statistics.
    • Plots: Bar-plot, histogram, CDF, box-plot, scatter-plot, pie charts etc.
    • Distributions, population mean and population variance.
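
A brief sketch of the descriptive statistics covered in this practicum, on a made-up vector. It is not taken from the practicum file itself.

```r
x <- c(4, 8, 15, 16, 23, 42)     # invented sample

center <- c(mean = mean(x), median = median(x))   # measures of center
spread <- c(var = var(x), sd = sd(x), iqr = IQR(x))  # measures of spread
quart  <- quantile(x, probs = c(0.25, 0.5, 0.75)) # percentiles

Fn <- ecdf(x)                    # empirical cumulative distribution function
share_le_16 <- Fn(16)            # fraction of observations <= 16

h <- hist(x, plot = FALSE)       # histogram bins and counts, without drawing

# Robustness: an outlier moves the mean much more than the median.
x_out <- c(x, 1000)
shift_mean   <- mean(x_out) - mean(x)
shift_median <- median(x_out) - median(x)
```

Calling hist(x), boxplot(x), or plot(Fn) would draw the corresponding plots.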

Download the zipped html-file for the practicum.


MODULE II. Statistical Inference. June 8.

  • LECTURE. 10:00 - 13:00.
    • The concept of hypothesis testing, type I and type II error, false discovery rate.
    • Significance and confidence level, p-value.
    • One-sided and two-sided tests and confidence intervals.
    • Sampling distribution, estimators, standard error.
    • Normal probabilities in application to p-value.
    • One-sample and two-sample tests for independent and matched samples with known variance.
    • The case of unknown variance and Student t-distribution, assumption of normality.
    • Pooled variance and equal variances assumption.
    • Estimation of variance.
    • Fisher test for variance equality.
    • Non-parametric tests: Sign test, Wilcoxon rank-sum test (Mann-Whitney U-test), Wilcoxon signed-rank test.
    • Chi-square test for goodness of fit, chi-square test for independence.
    • Sample size estimation.

Lecture 2 slides.

  • PRACTICUM. 14:00 - 17:00.
    • One- and two-sample tests with known and unknown variance.
    • Test for proportions.
    • Confidence intervals and t-distribution.
    • Fisher test.
    • Sample size estimation.
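
The topics above map onto standard functions in R's stats package. The sketch below uses simulated data and arbitrary parameters. It assumes that "Fisher test" refers to the F-test for equality of variances (var.test), as in the lecture outline; if Fisher's exact test for contingency tables is meant instead, fisher.test would be the function to use.

```r
set.seed(42)                               # reproducible simulated samples
a <- rnorm(20, mean = 10, sd = 2)
b <- rnorm(20, mean = 12, sd = 2)

# Two-sample t-test with unknown variances (Welch version by default).
t_res <- t.test(a, b)
ci    <- t_res$conf.int                    # confidence interval for the mean difference

# F-test for equality of the two variances.
f_res <- var.test(a, b)

# Test for a proportion: 30 successes out of 100 against p = 0.5.
p_res <- prop.test(x = 30, n = 100, p = 0.5)

# Sample size per group to detect a difference of 2 with sd 2 at 80% power.
pw <- power.t.test(delta = 2, sd = 2, sig.level = 0.05, power = 0.8)
n_per_group <- ceiling(pw$n)
```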

Download the zipped html-file for the practicum Part 1.
Download the zipped html-file for the practicum Part 2.


MODULE III. Statistical modeling & Regression. June 9.

  • LECTURE. 10:00 - 13:00.
    • Simple linear regression model, residuals, degrees of freedom, least squares method, correlation coefficient, variance decomposition, coefficient of determination.
    • Interpretation of the slope, the correlation coefficient, and the coefficient of determination.
    • Standard error and statistical inference in simple linear regression model.
    • Analysis of variance (ANOVA). One-way and two-way ANOVA.
    • Beyond simple regression models: multiple regression, logistic regression.
    • Correction for multiple testing, family-wise error rate.
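
To illustrate the correction for multiple testing, the sketch below adjusts a small set of made-up p-values with p.adjust from the stats package. Bonferroni and Holm control the family-wise error rate; Benjamini-Hochberg, which controls the false discovery rate, is shown only for comparison.

```r
p <- c(0.001, 0.01, 0.02, 0.04, 0.2)       # invented raw p-values

bonf <- p.adjust(p, method = "bonferroni") # multiply by the number of tests, cap at 1
holm <- p.adjust(p, method = "holm")       # step-down version, never larger than Bonferroni
bh   <- p.adjust(p, method = "BH")         # false discovery rate control

data.frame(raw = p, bonferroni = bonf, holm = holm, BH = bh)
```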

Lecture 3 slides.

  • PRACTICUM. 14:00 - 17:00.
    • QQ-plot.
    • Tests for normality.
    • Data transformation.
    • Non-parametric tests.
    • Problems on linear regression.
    • ANOVA.
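
A compact sketch touching on the practicum topics above, using the cars and PlantGrowth data sets that ship with R. The choice of data sets and of the log transformation is an assumption made for illustration and may not match the practicum files.

```r
# Simple linear regression: stopping distance as a function of speed.
fit <- lm(dist ~ speed, data = cars)
slope <- coef(fit)[["speed"]]
r2    <- summary(fit)$r.squared            # coefficient of determination

# Check normality of the residuals: QQ-plot coordinates and Shapiro-Wilk test.
res <- residuals(fit)
qq  <- qqnorm(res, plot.it = FALSE)        # set plot.it = TRUE to draw the plot
sw  <- shapiro.test(res)

# Data transformation: refit on the log scale.
fit_log <- lm(log(dist) ~ log(speed), data = cars)

# One-way ANOVA and its non-parametric counterpart (Kruskal-Wallis).
aov_fit <- aov(weight ~ group, data = PlantGrowth)
aov_p   <- summary(aov_fit)[[1]][["Pr(>F)"]][1]
kw      <- kruskal.test(weight ~ group, data = PlantGrowth)
```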

Download the zipped html-file for the practicum Part 1.
Download the zipped html-file for the practicum Part 2.



External Resources

Bioinformatics Core Facility @ CRG — 2011-2024