Difference between revisions of "CRG PhD Course 2017 Introduction to Statistics in R"
From Bioinformatics Core Wiki
Jponomarenko (Talk | contribs) (Created page with "__TOC__ === Course Description === This introductory course to statistics and R is offered in 3 consecutive modules (please see Course Syllabus below), each consisting of A t...") |
Jponomarenko (Talk | contribs) |
||
Line 2: | Line 2: | ||
=== Course Description === | === Course Description === | ||
− | This introductory course to statistics and R is offered in 3 consecutive modules (please see Course Syllabus below), each consisting of | + | This introductory course to statistics and R is offered in 3 two-hour consecutive modules (please see Course Syllabus below), each consisting of a hands-on practicum in a computer class, using [https://www.rstudio.com R Studio]. |
=== Course Objectives === | === Course Objectives === | ||
− | To introduce the basic concepts of statistics and how they can be applied to real-life datasets using R. The students will produce their first scripts that can be re-used when they start analyzing their own data. Knowledge of statistics or R is not required for taking this course. However, familiarity with the material in the previous modules is recommended if the modules are not taken in a sequence. | + | To introduce or to refresh the basic concepts of statistics and how they can be applied to real-life datasets using R. The students will produce their first scripts that can be re-used when they start analyzing their own data. Knowledge of statistics or R is not required for taking this course. However, familiarity with the material in the previous modules is recommended if the modules are not taken in a sequence. |
=== Course Instructors === | === Course Instructors === | ||
Line 13: | Line 13: | ||
=== Time and Location === | === Time and Location === | ||
− | * Oct | + | * Oct 3, 4, 5, 2017. 11:00 - 13:00. PRBB Building. Bioinformatics classroom 468, 4th floor, the hotel wing. |
=== Course Syllabus, Schedule, and Materials === | === Course Syllabus, Schedule, and Materials === | ||
− | ==== MODULE I. Introduction to R | + | ==== MODULE I. Introduction to R. Oct 3, 2017. ==== |
− | * Introduction to R programming language: | + | * Introduction to R programming language and R Studio: |
− | ** Introduction to R | + | ** Introduction to R studio: explore environment variable, navigate the history of commands, navigate directory and file structure, workspace and files. |
− | ** | + | ** Basics of R language: syntax, special characters, simple arithmetic in R console, create/delete and manipulate an object. |
− | * | + | ** R scripts: create and run, comment. |
− | ** | + | * Functions in R. |
− | ** | + | ** Input/output: read and write a file, change and create a directory (functions: setwd, getwd). |
− | * | + | * Data structures in R. |
− | ** | + | ** Vectors and factors: create, modify, subset, manipulate, compare. |
+ | ** Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions. | ||
+ | ** Lists: create, access/extract/subset, modify. | ||
+ | ** Missing values: how to deal with NA values (functions: is.na, na.omit). | ||
+ | * OUTCOME: Write a script that reads matrices and data frames, manipulates them, reads and writes files. | ||
<br> | <br> | ||
* Slides for Module 1: [[Media:161005_Module1_Introduction_to_R.pdf|here]] | * Slides for Module 1: [[Media:161005_Module1_Introduction_to_R.pdf|here]] | ||
− | * | + | <br> |
+ | |||
+ | ==== MODULE II. Descriptive statistics and plotting in R. Oct 4, 2017. ==== | ||
+ | * Packages in R: find, install, load, explore/find functions and documentation, get help on functions. | ||
+ | * Exploratory data analysis | ||
+ | ** Descriptive statistical functions: | ||
+ | ** Plots: bar-plots, histograms, box-plots, scatter-plots. | ||
+ | * Introduction to the ggplot2 package: structure of ggplot2 commands | ||
+ | * OUTCOME: | ||
+ | ** Install the packages "diamonds" and "WriteXLS". | ||
+ | ** Write a script that manipulates the diamonds data frame, writes it into an Excel file, produces and saves plots. | ||
+ | <br> | ||
+ | * Slides for Module II: [[Media:Module1_June_2017.html_2.zip|Download the zipped html-file for the practicum.]] | ||
+ | * [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf cheatsheet] | ||
<br> | <br> | ||
Line 56: | Line 73: | ||
* [[Media:BIST_Module5_hands_on.zip|Download the zip-file for the practicum on linear regression and ANOVA.]] | * [[Media:BIST_Module5_hands_on.zip|Download the zip-file for the practicum on linear regression and ANOVA.]] | ||
+ | |||
+ | |||
+ | <br> | ||
=== External Resources === | === External Resources === | ||
Line 69: | Line 89: | ||
* [https://ryouready.wordpress.com Blog "R you ready?"] | * [https://ryouready.wordpress.com Blog "R you ready?"] | ||
* [http://www.r-statistics.com "R-statistics blog"] | * [http://www.r-statistics.com "R-statistics blog"] | ||
+ | * [https://eda.nc3rs.org.uk/experimental-design Guide and tool for design and analysis of biological experiments from the UK's National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs)], covering control for confounding variables, sample size, effect size, standardised effect size, power of statistical tests, and multiple testing. | ||
+ | * [http://www.sample-size.net/ Sample/effect size online calculators for designing biomedical experiments from UC San Francisco] | ||
* Self-paced online courses from UC Berkeley: [https://www.edx.org/course/introduction-statistics-descriptive-uc-berkeleyx-stat2-1x Descriptive Statistics.] [https://www.edx.org/course/introduction-statistics-probability-uc-berkeleyx-stat2-2x Probability.] [https://www.edx.org/course/introduction-statistics-inference-uc-berkeleyx-stat2-3x Inference.] | * Self-paced online courses from UC Berkeley: [https://www.edx.org/course/introduction-statistics-descriptive-uc-berkeleyx-stat2-1x Descriptive Statistics.] [https://www.edx.org/course/introduction-statistics-probability-uc-berkeleyx-stat2-2x Probability.] [https://www.edx.org/course/introduction-statistics-inference-uc-berkeleyx-stat2-3x Inference.] | ||
* [http://www.stat.berkeley.edu/~stark/SticiGui/ Online book recommended for the UC Berkeley courses] | * [http://www.stat.berkeley.edu/~stark/SticiGui/ Online book recommended for the UC Berkeley courses] | ||
Line 76: | Line 98: | ||
* [https://www.edx.org/course/data-analysis-life-sciences-1-statistics-harvardx-ph525-1x Self-paced online course from Harvard "Statistics and R"] | * [https://www.edx.org/course/data-analysis-life-sciences-1-statistics-harvardx-ph525-1x Self-paced online course from Harvard "Statistics and R"] | ||
* [https://www.edx.org/course/data-analysis-life-sciences-3-harvardx-ph525-3x Self-paced online course from Harvard "Statistical Inference and Modeling for High-throughput Experiments" ] | * [https://www.edx.org/course/data-analysis-life-sciences-3-harvardx-ph525-3x Self-paced online course from Harvard "Statistical Inference and Modeling for High-throughput Experiments" ] | ||
− | * [http:// | + | * [http://students.brown.edu/seeing-theory/ The Seeing Theory website visualizes the fundamental concepts covered in an introductory college statistics course, using D3.js] |
− | + |
Revision as of 14:17, 2 October 2017
Course Description
This introductory course in statistics and R is offered as three consecutive two-hour modules (please see Course Syllabus below), each consisting of a hands-on practicum in a computer classroom, using RStudio.
Course Objectives
To introduce or refresh the basic concepts of statistics and to show how they can be applied to real-life datasets using R. Students will produce their first scripts, which they can reuse when they start analyzing their own data. No prior knowledge of statistics or R is required. However, if the modules are not taken in sequence, familiarity with the material of the preceding modules is recommended.
Course Instructors
- Sarah Bonnin (Module I) sarah.bonnin@crg.eu
- German Demidov (Module II) german.demidov@crg.eu
- Julia Ponomarenko (organizer, Module III) julia.ponomarenko@crg.eu
Time and Location
- Oct 3, 4, 5, 2017. 11:00 - 13:00. PRBB Building. Bioinformatics classroom 468, 4th floor, the hotel wing.
Course Syllabus, Schedule, and Materials
MODULE I. Introduction to R. Oct 3, 2017.
- Introduction to the R programming language and RStudio:
- Introduction to RStudio: explore environment variables, navigate the history of commands, navigate the directory and file structure, workspace and files.
- Basics of R language: syntax, special characters, simple arithmetic in R console, create/delete and manipulate an object.
- R scripts: create and run, comment.
- Functions in R.
- Input/output: read and write a file, change and create a directory (functions: setwd, getwd).
- Data structures in R.
- Vectors and factors: create, modify, subset, manipulate, compare.
- Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions.
- Lists: create, access/extract/subset, modify.
- Missing values: how to deal with NA values (functions: is.na, na.omit).
- OUTCOME: Write a script that reads matrices and data frames, manipulates them, reads and writes files.
- Slides for Module 1: here
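The Module I topics above can be sketched as one short script. This is only an illustration, not the course's own practicum material: the data values, the column names, and the use of a temporary file instead of a working directory set with setwd are all assumptions made for the example.

```r
# Hypothetical example data: three samples, one missing measurement
expr <- data.frame(
  sample = c("s1", "s2", "s3"),
  group  = factor(c("control", "treated", "treated")),
  value  = c(2.5, NA, 4.1),
  stringsAsFactors = FALSE
)

# Missing values: locate them with is.na, drop incomplete rows with na.omit
missing_rows  <- is.na(expr$value)   # logical vector, TRUE where value is NA
expr_complete <- na.omit(expr)       # keeps only rows without any NA

# Modify a data frame (add a column) and subset it by a condition
expr_complete$log_value <- log2(expr_complete$value)
treated <- expr_complete[expr_complete$group == "treated", ]

# Input/output: write to a temporary file and read it back
out_file <- tempfile(fileext = ".csv")
write.csv(expr_complete, out_file, row.names = FALSE)
expr_read <- read.csv(out_file, stringsAsFactors = FALSE)

# Conversion: numeric columns of a data frame to a matrix with named rows
m <- as.matrix(expr_read[, c("value", "log_value")])
rownames(m) <- expr_read$sample
dim(m)

# Lists: create and access elements by name
info <- list(n = nrow(m), samples = rownames(m))
info$samples
```

In a real script the temporary file would usually be replaced by a path relative to the working directory reported by getwd.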
MODULE II. Descriptive statistics and plotting in R. Oct 4, 2017.
- Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
- Exploratory data analysis
- Descriptive statistical functions:
- Plots: bar-plots, histograms, box-plots, scatter-plots.
- Introduction to the ggplot2 package: structure of ggplot2 commands
- OUTCOME:
- Install the packages "ggplot2" (which provides the diamonds data frame) and "WriteXLS".
- Write a script that manipulates the diamonds data frame, writes it into an Excel file, produces and saves plots.
- Slides for Module II: Download the zipped html-file for the practicum.
- ggplot2 cheatsheet
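As a rough sketch of the descriptive statistics and plot types listed for this module, the script below uses base R only. The built-in mtcars data frame is an assumption standing in for the diamonds data used in the practicum, and the ggplot2 part at the end runs only if that package is installed.

```r
# Built-in dataset used as a stand-in for the course data
data(mtcars)

# Descriptive statistics
summary(mtcars$mpg)
mpg_mean   <- mean(mtcars$mpg)
mpg_sd     <- sd(mtcars$mpg)
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, median)  # median mpg per cylinder count

# Base-graphics plots, saved as pages of one PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
barplot(table(mtcars$cyl), xlab = "Cylinders", ylab = "Count")
hist(mtcars$mpg, xlab = "Miles per gallon", main = "Histogram of mpg")
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
invisible(dev.off())

# Structure of a ggplot2 command: data + aesthetic mapping + geometry layer(s)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(head(diamonds, 1000), aes(x = carat, y = price, colour = cut)) +
    geom_point(alpha = 0.5) +
    labs(x = "Carat", y = "Price (USD)")
  ggsave(tempfile(fileext = ".pdf"), plot = p, width = 6, height = 4)
}
```

The same ggplot object can take a different geometry, for example geom_boxplot or geom_histogram, without changing the data or the mapping, which is the main idea behind the layered syntax.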
MODULE II. Introduction to Probability & Hypothesis testing. Oct 10, 2016.
- Independence, conditional probability, Bayes formula.
- Distributions, population mean and population variance.
- Central Limit theorem and the Law of large numbers.
- The concept of hypothesis testing, type I and type II error, false discovery rate.
- Significance and confidence level, p-value. Confidence intervals. One-sided and two-sided tests.
- One-sample and two-sample tests for independent and matched samples with known and unknown variance.
- Student's t-distribution, assumption of normality.
- Test for proportions.
- Download the zip-file of the module's materials.
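The tests named in this module map onto a few base R functions. The following is a minimal sketch on simulated data; the sample sizes, means, counts, and p-values are invented for illustration and are not taken from the course materials.

```r
set.seed(1)
# Simulated measurements (hypothetical): two groups of 20 observations
control <- rnorm(20, mean = 10, sd = 2)
treated <- rnorm(20, mean = 12, sd = 2)

# One-sample t-test: is the mean of control equal to 10?
one_sample <- t.test(control, mu = 10)

# Two-sample t-test for independent samples (Welch test, unequal variances, by default)
two_sample <- t.test(treated, control)

# One-sided alternative: mean of treated greater than mean of control
one_sided <- t.test(treated, control, alternative = "greater")

# Matched (paired) samples: the test is applied to the pairwise differences
paired <- t.test(treated, control, paired = TRUE)

# Test for a proportion: 15 successes out of 40 trials against p = 0.5
prop <- prop.test(x = 15, n = 40, p = 0.5)

# p-value and 95% confidence interval of the two-sample test
two_sample$p.value
two_sample$conf.int

# Controlling the false discovery rate over several tests (Benjamini-Hochberg)
p_values <- c(0.001, 0.02, 0.04, 0.3)
p_adjusted <- p.adjust(p_values, method = "BH")
```

All of these functions return an object of class "htest", so the statistic, p-value, and confidence interval are accessed in the same way for each test.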
MODULE III. Non-parametric tests & Linear regression. Oct 11, 2016.
- Non-parametric tests: sign test, Wilcoxon rank-sum test (Mann-Whitney U test), Wilcoxon signed-rank test, Kruskal-Wallis test.
- Kolmogorov-Smirnov (KS) test. Shapiro-Wilk test for normality. Q-Q plot.
- Data transformation.
- Download the zip-file for this part of the practicum.
- Simple linear regression model, residuals, degrees of freedom.
- Interpretation of the slope, the correlation coefficient, and the coefficient of determination.
- Standard error and statistical inference in simple linear regression model.
- Analysis of variance (ANOVA). One-way and two-way ANOVA.
- Download the zip-file for the practicum on linear regression and ANOVA.
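A possible sketch of how the Module III topics look in base R is given below. The simulated samples and the choice of the built-in mtcars, PlantGrowth, and ToothGrowth datasets are assumptions made for the example, not the data from the practicum zip-files.

```r
set.seed(42)
# Simulated skewed samples (hypothetical data)
x <- rexp(25, rate = 1)
y <- rexp(25, rate = 0.5)

# Normality checks: Shapiro-Wilk test, KS test against a normal distribution, Q-Q plot
sw <- shapiro.test(x)
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))  # approximate: parameters are estimated from x
pdf(tempfile(fileext = ".pdf"))
qqnorm(x); qqline(x)
invisible(dev.off())

# Data transformation: a log transform often reduces right skew
sw_log <- shapiro.test(log(x))

# Non-parametric tests
sign_test   <- binom.test(sum(x > y), length(x), p = 0.5)   # sign test on paired data
rank_sum    <- wilcox.test(x, y)                            # Wilcoxon rank-sum (Mann-Whitney U)
signed_rank <- wilcox.test(x, y, paired = TRUE)             # Wilcoxon signed-rank
kw          <- kruskal.test(weight ~ group, data = PlantGrowth)

# Simple linear regression: mpg as a function of car weight
fit   <- lm(mpg ~ wt, data = mtcars)
slope <- unname(coef(fit)["wt"])        # change in mpg per unit of weight
r2    <- summary(fit)$r.squared         # coefficient of determination
r     <- cor(mtcars$wt, mtcars$mpg)     # correlation coefficient; r^2 equals r2 here
summary(fit)                            # standard errors, t-tests, residual degrees of freedom

# One-way and two-way ANOVA
one_way <- aov(weight ~ group, data = PlantGrowth)
two_way <- aov(len ~ supp * factor(dose), data = ToothGrowth)
summary(one_way)
summary(two_way)
```

In simple linear regression with one predictor, the coefficient of determination is the square of the correlation coefficient, which the values r2 and r above make easy to check.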
External Resources
- Nature Web-collection "Statistics for Biologists"
- 100 Statistical Tests.pdf - ResearchGate - just search Google to get a link
- Book "Basics of Statistics" by Jarko Isotalo
- "Introduction to Probability and Statistics using R" by G. Jay Kerns
- R Tutorials by William B. King
- Tutorials "R for basic statistics"
- Blog "R-bloggers"
- StatsBlogs
- Blog "Learning R"
- Blog "R you ready?"
- "R-statistics blog"
- Guide and tool for design and analysis of biological experiments from the UK's National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs), covering control for confounding variables, sample size, effect size, standardised effect size, power of statistical tests, and multiple testing.
- Sample/effect size online calculators for designing biomedical experiments from UC San Francisco
- Self-paced online courses from UC Berkeley: Descriptive Statistics. Probability. Inference.
- Online book recommended for the UC Berkeley courses
- Self-paced online course "Explore Statistics with R"
- Online course from Stanford "An Introduction to Statistical Learning with Applications in R"
- Self-paced online course from Microsoft "Intro to R programming"
- Self-paced online course from Harvard "Statistics and R"
- Self-paced online course from Harvard "Statistical Inference and Modeling for High-throughput Experiments"
- The Seeing Theory website visualizes the fundamental concepts covered in an introductory college statistics course, using D3.js