CRG Introduction to Statistics and R 2017
Latest revision as of 16:00, 9 June 2017
Description
This is an introductory course in statistics and R programming. For the previous edition of this course, please refer to the 2016 course page (https://biocore.crg.eu/wiki/BIST_Introduction_to_Statistics_2016).
The R part is offered in 4 slow-paced practicums for absolute beginners, followed by 3 fast-paced practicums in the statistical modules.
For practical exercises we will use the R programming language and RStudio (https://www.rstudio.com).
The statistics material is offered in 3 consecutive modules (please see Course Syllabus below), each containing a morning lecture and an afternoon practicum in a computer class. These practicums focus on using statistics in R, with the aim of demonstrating and reinforcing the concepts introduced in the lectures, rather than teaching R programming.
Course Instructors
- Dmitri Pervouchine (lectures)
- Sarah Bonnin (practicums I - IV)
- Estefania Mancini (practicums I - IV)
- German Demidov (practicums V - VII)
- Julia Ponomarenko (organizer, practicums V - VII)
Dates, Time and Location
- Module 0. Introduction to R. May 25, 26, 29, 30, 2017.
- 10:00 - 13:00.
- PRBB. Bioinformatics classroom (468). 4th floor. The hotel/North wing.
- Modules I, II, III. Introduction to Statistics. June 6, 8, 9, 2017.
- LECTURES.
- 10:00 - 13:00.
- PRBB. Ramon y Cajal.
- PRACTICUMS.
- 14:00 - 17:00.
- PRBB. Bioinformatics classroom (468). 4th floor. The hotel/North wing.
Course Syllabus, Schedule, and Materials
MODULE 0. Introduction to R. May 25, 26, 29, 30.
- PRACTICUM I. Intro to R and R Studio. May 25. 10:00 - 13:00.
- Introduction to RStudio: explore environment variables, navigate the command history, navigate the directory and file structure, workspace and files.
- Simple arithmetic in R console.
- Create and delete an object.
- Introduction to data types and the "vector" data structure.
- Create and run a short script.
- Read and write a file.
- OUTCOME: Write a script that creates (and enters) a directory, performs a simple manipulation, and writes the result to a file.
Slides day1
Exercises day1
Correction for exercise 1, exercise 2, exercise 3
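The outcome of Practicum I can be sketched as a short script. This is only an illustration, not the course's official solution: the directory name "practicum1", the file name "squares.txt", and the choice of squaring numbers as the "simple manipulation" are assumptions made for the example. The directory is created under tempdir() so that the sketch does not touch your own files.

```r
# Hypothetical names: "practicum1" and "squares.txt" are illustrative only.
work_dir <- file.path(tempdir(), "practicum1")

# Create the directory and enter it, remembering where we came from.
dir.create(work_dir, showWarnings = FALSE)
old_dir <- setwd(work_dir)

# A simple manipulation on a vector: square the numbers 1 to 10.
x <- 1:10
squares <- x^2

# Write the result to a tab-separated text file in the current directory.
out_file <- "squares.txt"
write.table(data.frame(x = x, square = squares), file = out_file,
            sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to check what was written.
check <- read.table(out_file, header = TRUE, sep = "\t")
print(head(check, 3))

# Return to the original working directory.
setwd(old_dir)
```

Saving the script as a .R file and running it with source() or the RStudio "Source" button executes all steps in order.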
- PRACTICUM II. Data structures in R. May 26. 10:00 - 13:00.
- More on vectors and factors.
- Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions.
- OUTCOME: Produce a script that reads matrices and data frames, manipulates them, and writes the results to files.
- PRACTICUM III. Lists & Packages. May 29. 10:00 - 13:00.
- More on data structures. Lists: create, access/extract/subset, modify.
- Packages: find, install, load, explore/find functions and documentation, get help on functions.
- OUTCOME: Install the packages "ggplot2" (which provides the diamonds data frame) and "WriteXLS". Use them in a script that manipulates the diamonds data frame and writes it into an Excel file.
- PRACTICUM IV. Plots & Graphics in R. May 30. 10:00 - 13:00.
- Basic plotting: scatter plots, box plots, histograms, density plots. Changing colors, point shapes, titles, labels, legend, axes, etc.
- Introduction to ggplot2 package: structure of ggplot2 commands, scatter plots.
- OUTCOME: Write a script that produces, customizes, and saves plots in files.
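The topics of Practicums II to IV (factors, data frames, lists, and saving plots) can be illustrated together in a small sketch. The data below are made up for the example, and the file name "signal_plots.pdf" is hypothetical. Only base R is used, so nothing needs to be installed; the ggplot2 and WriteXLS parts of the course are deliberately left out here.

```r
# Toy data frame with a character column, a factor, and a numeric column.
samples <- data.frame(
  id     = paste0("s", 1:6),
  group  = factor(c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat")),
  signal = c(1.2, 0.9, 1.1, 2.3, 2.8, 2.5)
)

# Subset rows with a logical condition and add a derived column.
treated <- samples[samples$group == "treat", ]
samples$log_signal <- log2(samples$signal)

# A list can hold objects of different types and lengths.
summary_list <- list(
  n           = nrow(samples),
  group_means = tapply(samples$signal, samples$group, mean),
  treated_ids = treated$id
)
print(summary_list$group_means)

# Produce, customize, and save two base-graphics plots in one PDF file.
plot_file <- file.path(tempdir(), "signal_plots.pdf")  # hypothetical file name
pdf(plot_file, width = 7, height = 4)
par(mfrow = c(1, 2))
boxplot(signal ~ group, data = samples, col = c("grey80", "tomato"),
        main = "Signal by group", ylab = "Signal")
plot(seq_along(samples$signal), samples$signal, pch = 19,
     col = as.integer(samples$group),
     xlab = "Sample index", ylab = "Signal", main = "Signal per sample")
dev.off()
```

The pdf() device is used because it is available in every R installation; png() or, with ggplot2, ggsave() would be alternatives.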
MODULE I. Descriptive Statistics & Intro to Probability. June 6.
- LECTURE. 10:00 - 13:00.
- Exploratory data analysis and graphical displays.
- Samples, measures of center and spread, percentiles, odds ratio.
- Outliers and robustness.
- Independence, conditional probability, Bayes formula.
- Distributions, population mean and population variance, Binomial, Poisson, and Normal distributions.
- Central Limit theorem and the Law of large numbers.
- Continuity correction.
- Sampling with and without replacement.
- Correction for finite population size.
- STATISTICAL TABLES
- PRACTICUM. 14:00 - 17:00.
- Descriptive statistics.
- Plots: bar plot, histogram, CDF, box plot, scatter plot, pie chart, etc.
- Distributions, population mean and population variance.
Download the zipped html-file for the practicum.
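As a rough illustration of Module I (not taken from the practicum file), the sketch below computes a few measures of center and spread for a small made-up sample, and then compares the exact Binomial probability P(X <= 45) for n = 100, p = 0.5 with its Normal approximation, with and without the continuity correction. All numbers are assumptions chosen for the example.

```r
# Measures of center and spread for a small made-up sample.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
desc <- c(mean = mean(x), median = median(x),
          var = var(x), sd = sd(x), iqr = IQR(x))
print(round(desc, 3))
print(quantile(x, probs = c(0.25, 0.5, 0.75)))

# Normal approximation to the Binomial, with and without continuity correction.
n <- 100; p <- 0.5; k <- 45
mu    <- n * p                    # population mean of Binomial(n, p)
sigma <- sqrt(n * p * (1 - p))    # population standard deviation

exact        <- pbinom(k, size = n, prob = p)   # exact P(X <= k)
approx_plain <- pnorm((k - mu) / sigma)         # no correction
approx_cc    <- pnorm((k + 0.5 - mu) / sigma)   # continuity correction

print(c(exact = exact, plain = approx_plain, corrected = approx_cc))
```

Note that var() and sd() in R use the sample (n - 1) denominator, which differs from the population variance discussed in the lecture.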
MODULE II. Statistical Inference. June 8.
- LECTURE. 10:00 - 13:00.
- The concept of hypothesis testing, type I and type II error, false discovery rate.
- Significance and confidence level, p-value.
- One-sided and two-sided tests and confidence intervals.
- Sampling distribution, estimators, standard error.
- Normal probabilities in application to p-value.
- One-sample and two-sample tests for independent and matched samples with known variance.
- The case of unknown variance and Student t-distribution, assumption of normality.
- Pooled variance and equal variances assumption.
- Estimation of variance.
- Fisher test for variance equality.
- Non-parametric tests: Sign test, Wilcoxon sum of ranks test (Mann-Whitney U-test), Wilcoxon signed rank test.
- Chi-square test for goodness of fit, chi-square test for independence.
- Sample size estimation.
- PRACTICUM. 14:00 - 17:00.
- One- and two-sample tests with known and unknown variance.
- Test for proportions.
- Confidence intervals and t-distribution.
- Fisher test.
- Sample size estimation.
Download the zipped html-file for the practicum Part 1.
Download the zipped html-file for the practicum Part 2.
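The following sketch, which is not taken from the practicum files, shows how a one-sample t-test with unknown variance and its confidence interval can be computed by hand and checked against t.test(), followed by a sample size estimate with power.t.test(). The simulated data, the null value mu0 = 5, and the effect size used for the sample size calculation are assumptions made for the example.

```r
# Simulated sample (assumed for illustration): 20 draws from N(5.5, 1).
set.seed(1)
x   <- rnorm(20, mean = 5.5, sd = 1)
mu0 <- 5                       # hypothesized population mean
n   <- length(x)

# One-sample t statistic with unknown variance.
se     <- sd(x) / sqrt(n)      # standard error of the mean
t_stat <- (mean(x) - mu0) / se

# Two-sided p-value and 95% confidence interval from the t-distribution.
p_manual  <- 2 * pt(-abs(t_stat), df = n - 1)
ci_manual <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# The built-in test should agree with the manual calculation.
tt <- t.test(x, mu = mu0)
print(c(t = t_stat, p = p_manual))
print(ci_manual)
print(tt)

# Sample size per group for a two-sample t-test detecting a difference
# of one standard deviation with 80% power at the 5% significance level.
ss <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8)
n_per_group <- ceiling(ss$n)
print(n_per_group)
```

Doing the calculation by hand once makes it easier to see which pieces (standard error, degrees of freedom, quantile) drive the result that t.test() reports.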
MODULE III. Statistical modeling & Regression. June 9.
- LECTURE. 10:00 - 13:00.
- Simple linear regression model, residuals, degrees of freedom, least squares method, correlation coefficient, variance decomposition, determination coefficient.
- Interpretation of the slope, correlation, and determination coefficients.
- Standard error and statistical inference in simple linear regression model.
- Analysis of variance (ANOVA). One-way and two-way ANOVA.
- Beyond simple regression models: multiple regression, logistic regression.
- Correction for multiple testing, family-wise error rate.
- PRACTICUM. 14:00 - 17:00.
- QQ-plot.
- Tests for normality.
- Data transformation.
- Non-parametric tests.
- Problems on linear regression.
- ANOVA.
Download the zipped html-file for the practicum Part 1.
Download the zipped html-file for the practicum Part 2.
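The Module III topics can be sketched with data sets that ship with R (cars and PlantGrowth); these are assumptions for the example and are not necessarily the data used in the practicum. The sketch fits a simple linear regression, checks the least squares slope and the determination coefficient against their formulas, inspects the residuals with a QQ-plot and a normality test, runs a one-way ANOVA, and adjusts a made-up vector of p-values for multiple testing.

```r
# Simple linear regression of stopping distance on speed (built-in 'cars' data).
fit <- lm(dist ~ speed, data = cars)

# Least squares estimates from the formulas: slope = cov(x, y) / var(x).
slope_manual     <- cov(cars$speed, cars$dist) / var(cars$speed)
intercept_manual <- mean(cars$dist) - slope_manual * mean(cars$speed)

# In simple regression the determination coefficient is the squared correlation.
r2_manual <- cor(cars$speed, cars$dist)^2
print(summary(fit))

# QQ-plot of the residuals, saved to a PDF (hypothetical file name),
# and a Shapiro-Wilk test for normality.
qq_file <- file.path(tempdir(), "residual_qq.pdf")
pdf(qq_file)
qqnorm(resid(fit))
qqline(resid(fit))
dev.off()
sw <- shapiro.test(resid(fit))
print(sw)

# One-way ANOVA on the built-in 'PlantGrowth' data (3 groups of 10 plants).
anova_tab <- anova(lm(weight ~ group, data = PlantGrowth))
print(anova_tab)

# Correction for multiple testing on made-up p-values:
# Bonferroni controls the family-wise error rate, BH the false discovery rate.
p_raw  <- c(0.01, 0.02, 0.03, 0.5)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_bh   <- p.adjust(p_raw, method = "BH")
print(rbind(raw = p_raw, bonferroni = p_bonf, BH = p_bh))
```

If the residuals look clearly non-normal, a data transformation or a non-parametric test, as listed in the practicum, would be the natural next step.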
External Resources
- Nature Web-collection "Statistics for Biologists"
- 100 Statistical Tests.pdf - ResearchGate - just search Google to get a link
- Book "Basics of Statistics" by Jarko Isotalo
- "Introduction to Probability and Statistics using R" by G. Jay Kerns
- R Tutorials by William B. King
- Tutorials "R for basic statistics"
- Blog "R-bloggers"
- StatsBlogs
- Blog "Learning R"
- Blog "R you ready?"
- "R-statistics blog"
- Guide and tool for design and analysis of biological experiments from the UK's National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs), covering control for confounding variables, sample size, effect size, standardised effect size, power of statistical tests, and multiple testing.
- Sample/effect size online calculators for designing biomedical experiments from UC San Francisco
- Self-paced online courses from UC Berkeley: Descriptive Statistics. Probability. Inference.
- Online book recommended for the UC Berkeley courses
- Self-paced online course "Explore Statistics with R"
- Online course from Stanford "An Introduction to Statistical Learning with Applications in R"
- Self-paced online course from Microsoft "Intro to R programming"
- Self-paced online course from Harvard "Statistics and R"
- Self-paced online course from Harvard "Statistical Inference and Modeling for High-throughput Experiments"
- The Seeing Theory website visualizes the fundamental concepts covered in an introductory college statistics course, using D3.js