CRG PhD & Masters Course 2016 Introduction to Statistics in R
From Bioinformatics Core Wiki
Latest revision as of 16:53, 10 October 2016
Course Description
This introductory course on statistics and R is offered in three consecutive modules (see the Course Syllabus below), each consisting of a two-hour practicum in a computer classroom, using R Studio.
Course Objectives
To introduce the basic concepts of statistics and how they can be applied to real-life datasets using R. Students will produce their first scripts, which they can reuse when they start analyzing their own data. No prior knowledge of statistics or R is required. However, familiarity with the material of the preceding modules is recommended if the modules are not taken in sequence.
Course Instructors
- Sarah Bonnin (Module I) sarah.bonnin@crg.eu
- German Demidov (Module II) german.demidov@crg.eu
- Julia Ponomarenko (organizer, Module III) julia.ponomarenko@crg.eu
Time and Location
- Oct 5, 10, and 11, 2016. 11:00 - 13:00. PRBB Building. Bioinformatics classroom 468, 4th floor, the hotel wing.
Course Syllabus, Schedule, and Materials
MODULE I. Introduction to R. Basic functions and plotting. Oct 5, 2016.
- Introduction to R programming language:
- Introduction to R Studio.
- Data types, variables, packages, handling files/scripts, functions.
- Basic plots in R. The ggplot2 package.
- Exploratory data analysis: bar-plot, histogram, box-plot, scatter-plot.
- R Studio ggplot2 cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
- R Markdown. How to produce html / pdf / Word reports with R.
- R Studio RMD cheatsheet: https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf
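As a rough illustration of the Module I topics, the following minimal sketch shows a few data types, a data frame, a user-defined function, and basic exploratory plots. It is not taken from the course materials: the values, the variable names, and the `coef_var` helper are hypothetical. The ggplot2 part runs only if that package is installed.

```r
# Basic data types and variables (all values are made up for illustration)
n_samples <- 5L                                   # integer
weights   <- c(61.2, 70.5, 58.9, 80.1, 66.4)      # numeric vector
groups    <- factor(c("ctrl", "ctrl", "treated", "treated", "treated"))
is_heavy  <- weights > 65                         # logical vector

# A data frame combines columns of different types
df <- data.frame(weight = weights, group = groups)

# A small user-defined function (hypothetical helper): coefficient of variation
coef_var <- function(x) sd(x) / mean(x)
cv <- coef_var(df$weight)

# Mean weight per group
group_means <- tapply(df$weight, df$group, mean)
print(group_means)

# Exploratory plots with base graphics
hist(df$weight, main = "Weight", xlab = "Weight")
boxplot(weight ~ group, data = df, main = "Weight by group")

# The same box-plot with ggplot2, if the package is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = group, y = weight)) + geom_boxplot()
  print(p)
}
```

Saved in a script file, these lines can be re-run unchanged on another data frame that has the same column names.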
MODULE II. Introduction to Probability & Hypothesis testing. Oct 10, 2016.
- Independence, conditional probability, Bayes formula.
- Distributions, population mean and population variance.
- Central limit theorem and the law of large numbers.
- The concept of hypothesis testing, type I and type II error, false discovery rate.
- Significance and confidence level, p-value. Confidence intervals. One-sided and two-sided tests.
- One-sample and two-sample tests for independent and matched samples with known and unknown variance.
- Student t-distribution, assumption of normality.
- Test for proportions.
- Download the zip-file of the module's materials.
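Several of the Module II topics map onto a few lines of base R. The sketch below is only an illustration under stated assumptions: the prevalence, sensitivity, and specificity figures are invented, and the samples are simulated with `rnorm`. It applies the Bayes formula by hand, then runs one-sample, two-sample, and paired t-tests and a test for a proportion.

```r
# Bayes formula: P(disease | positive test), with invented numbers
prevalence  <- 0.01   # P(D)
sensitivity <- 0.95   # P(+ | D)
specificity <- 0.90   # P(- | not D)
p_positive  <- sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior   <- sensitivity * prevalence / p_positive
print(posterior)

# Simulated samples (assumption: normally distributed measurements)
set.seed(1)
x <- rnorm(20, mean = 5.5, sd = 1)
y <- rnorm(20, mean = 5.0, sd = 1)

# One-sample two-sided t-test of H0: mean = 5 (variance unknown)
one_sample <- t.test(x, mu = 5)

# Two-sample test for independent samples (Welch's t-test is R's default)
two_sample <- t.test(x, y)

# Matched (paired) samples: the test is run on the pairwise differences
paired <- t.test(x, y, paired = TRUE)

# One-sided alternative: is the mean of x greater than 5?
one_sided <- t.test(x, mu = 5, alternative = "greater")

# Test for a proportion: 18 successes out of 50 trials against p = 0.5
prop <- prop.test(x = 18, n = 50, p = 0.5)

print(one_sample$conf.int)   # 95% confidence interval for the mean of x
print(two_sample$p.value)
```

Each call returns an object of class `htest`, so the p-value, confidence interval, and degrees of freedom can be extracted in the same way for all of these tests.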
MODULE III. Non-parametric tests & Linear regression. Oct 11, 2016.
- Non-parametric tests: Sign test, Wilcoxon sum of ranks test (Mann-Whitney U-test), Wilcoxon signed rank test, Kruskal-Wallis test.
- Kolmogorov-Smirnov (KS) test. Shapiro test for normality. QQ-plot.
- Data transformation.
- Download the zip-file for this part of the practicum.
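The non-parametric tests listed above are all available in base R. The following sketch is not the practicum itself; it uses simulated log-normal samples (an assumption chosen so that a log transformation is meaningful), and treating `a` and `b` as matched pairs is done purely for illustration.

```r
# Simulated right-skewed samples (assumption: log-normal data)
set.seed(42)
a  <- rlnorm(15, meanlog = 0.0, sdlog = 0.5)
b  <- rlnorm(15, meanlog = 0.5, sdlog = 0.5)
c3 <- rlnorm(15, meanlog = 1.0, sdlog = 0.5)

# Wilcoxon rank-sum test (Mann-Whitney U-test) for two independent samples
rank_sum <- wilcox.test(a, b)

# Wilcoxon signed-rank test, treating a and b as matched pairs
signed_rank <- wilcox.test(a, b, paired = TRUE)

# Sign test: a binomial test on the number of positive differences
d <- b - a
sign_test <- binom.test(sum(d > 0), sum(d != 0), p = 0.5)

# Kruskal-Wallis test for three or more independent groups
kw <- kruskal.test(list(a, b, c3))

# Two-sample Kolmogorov-Smirnov test: do a and b share a distribution?
ks <- ks.test(a, b)

# Shapiro-Wilk test for normality, before and after a log transformation
sw_raw <- shapiro.test(b)
sw_log <- shapiro.test(log(b))

# QQ-plot of the transformed data against the normal distribution
qqnorm(log(b))
qqline(log(b))

print(c(rank_sum = rank_sum$p.value, kruskal = kw$p.value))
```

Comparing `sw_raw` with `sw_log`, together with the QQ-plot, is one way to judge whether a transformation brings the data closer to normality.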
- Simple linear regression model, residuals, degrees of freedom.
- Interpretation of the slope, correlation, and determination coefficients.
- Standard error and statistical inference in simple linear regression model.
- Analysis of variance (ANOVA). One-way and two-way ANOVA.
- Download the zip-file for the practicum on linear regression and ANOVA.
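A simple linear regression and the two ANOVA designs can be sketched with datasets that ship with R. The choice of `cars`, `PlantGrowth`, and `ToothGrowth` is an assumption made for illustration; the practicum may use different data.

```r
# Simple linear regression: stopping distance as a function of speed
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)

slope    <- coef(fit)["speed"]        # change in dist per unit of speed
r        <- cor(cars$speed, cars$dist)  # correlation coefficient
r2       <- s$r.squared               # determination coefficient
df_resid <- df.residual(fit)          # n - 2 for one predictor
res      <- residuals(fit)            # observed minus fitted values

# Standard errors, t-statistics, and p-values of the coefficients
print(coef(s))

# 95% confidence intervals for intercept and slope
print(confint(fit))

# One-way ANOVA: plant weight across three treatment groups
one_way <- aov(weight ~ group, data = PlantGrowth)
print(summary(one_way))

# Two-way ANOVA with interaction: supplement type and dose
tg <- transform(ToothGrowth, dose = factor(dose))   # dose as a factor
two_way <- aov(len ~ supp * dose, data = tg)
print(summary(two_way))
```

In simple linear regression the determination coefficient equals the squared correlation coefficient, so `r^2` and `r2` agree up to rounding error.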
External Resources
- Nature Web-collection "Statistics for Biologists"
- "100 Statistical Tests" (PDF, available on ResearchGate; search Google to get a link)
- Book "Basics of Statistics" by Jarko Isotalo
- "Introduction to Probability and Statistics using R" by G. Jay Kerns
- R Tutorials by William B. King
- Tutorials "R for basic statistics"
- Blog "R-bloggers"
- StatsBlogs
- Blog "Learning R"
- Blog "R you ready?"
- "R-statistics blog"
- Self-paced online courses from UC Berkeley: Descriptive Statistics. Probability. Inference.
- Online book recommended for the UC Berkeley courses
- Self-paced online course "Explore Statistics with R"
- Online course from Stanford "An Introduction to Statistical Learning with Applications in R"
- Self-paced online course from Microsoft "Intro to R programming"
- Self-paced online course from Harvard "Statistics and R"
- Self-paced online course from Harvard "Statistical Inference and Modeling for High-throughput Experiments"
- VIB "Basic statistics theory" course slides.
- VIB "Basic statistics in R" course. Tutorial, exercises, cheat sheets.