Difference between revisions of "CRG PhD Course 2017 Introduction to Statistics in R"

From Bioinformatics Core Wiki
(Created page with "__TOC__ === Course Description === This introductory course to statistics and R is offered in 3 consecutive modules (please see Course Syllabus below), each consisting of A t...")
 
Line 2: Line 2:
  
 
=== Course Description ===
 
This introductory course to statistics and R is offered in 3 consecutive modules (please see Course Syllabus below), each consisting of a two-hour practicum in a computer class, using [https://www.rstudio.com R Studio].  
+
This introductory course to statistics and R is offered in 3 two-hour consecutive modules (please see Course Syllabus below), each consisting of a hands-on practicum in a computer class, using [https://www.rstudio.com R Studio].  
  
 
=== Course Objectives ===
 
To introduce the basic concepts of statistics and how they can be applied to real-life datasets using R. The students will produce their first scripts that can be re-used when they start analyzing their own data. Knowledge of statistics or R is not required for taking this course. However, familiarity with the material in the previous modules is recommended if the modules are not taken in a sequence.
+
To introduce or to refresh the basic concepts of statistics and how they can be applied to real-life datasets using R. The students will produce their first scripts that can be re-used when they start analyzing their own data. Knowledge of statistics or R is not required for taking this course. However, familiarity with the material in the previous modules is recommended if the modules are not taken in a sequence.
  
 
=== Course Instructors ===
 
Line 13: Line 13:
  
 
=== Time and Location ===
 
* Oct 5, 10, 11, 2016. 11:00 - 13:00. PRBB Building. Bioinformatics classroom. 468. 4th floor. The hotel wing.
+
* Oct 3, 4, 5, 2017. 11:00 - 13:00. PRBB Building. Bioinformatics classroom. 468. 4th floor. The hotel wing.
  
  
 
=== Course Syllabus, Schedule, and Materials ===
 
  
==== MODULE I. Introduction to R. Basic functions and plotting. Oct 5, 2016. ====
+
==== MODULE I. Introduction to R. Oct 3, 2017. ====
* Introduction to R programming language:  
+
* Introduction to R programming language and R Studio:  
** Introduction to R Studio.
+
** Introduction to RStudio: explore environment variables, navigate the command history, navigate the directory and file structure, workspace and files.
** Data types, variables, packages, handling files/scripts,functions.  
+
** Basics of R language: syntax, special characters, simple arithmetic in R console, create/delete and manipulate an object.
* Basic plots in R. The ggplot2 package.
+
** R scripts: create and run, comment.
** Exploratory data analysis: bar-plot, histogram, box-plot, scatter-plot.
+
* Functions in R.
** R Studio ggplot2 [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf cheatsheet]
+
** Input/output: read and write a file, change and create a directory (functions: setwd, getwd).
* R Markdown. How to produce html / pdf / Word reports with R.
+
* Data structures in R.
** R Studio RMD [https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf cheatsheet]
+
** Vectors and factors: create, modify, subset, manipulate, compare.
 +
** Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions.
 +
** Lists: create, access/extract/subset, modify.
 +
** Missing values: how to deal with NA values (functions: is.na, na.omit).
 +
* OUTCOME: Write a script that reads matrices and data frames, manipulates them, reads and writes files.
 
<br>
 
 
* Slides for Module 1: [[Media:161005_Module1_Introduction_to_R.pdf|here]]
 
* Template for Exercise 4: [[Media:Exercise4.Rmd|here]]
+
<br>
 +
 
 +
==== MODULE II. Descriptive statistics and plotting in R. Oct 4, 2017. ====
 +
* Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
 +
* Exploratory data analysis
 +
** Descriptive statistical functions: 
 +
** Plots: bar-plots, histograms, box-plots, scatter-plots.
 +
* Introduction to the ggplot2 package: structure of ggplot2 commands
 +
* OUTCOME:
 +
** Install the packages "ggplot2" (which provides the diamonds data frame) and "WriteXLS".
 +
** Write a script that manipulates the diamonds data frame, writes it into an Excel file, produces and saves plots.
 +
<br>
 +
* Slides for Module II: [[Media:Module1_June_2017.html_2.zip|Download the zipped html-file for the practicum.]]
 +
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf cheatsheet]
 
<br>
 
  
Line 56: Line 73:
 
* [[Media:BIST_Module5_hands_on.zip|Download the zip-file for the practicum on linear regression and ANOVA.]]
 
  
 +
 +
 +
<br>
  
 
=== External Resources ===
 
Line 69: Line 89:
 
* [https://ryouready.wordpress.com Blog "R you ready?"]
 
 
* [http://www.r-statistics.com "R-statistics blog"]
 
 +
* [https://eda.nc3rs.org.uk/experimental-design Guide and tool for the design and analysis of biological experiments from the UK's National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs)], covering control of confounding variables, sample size, effect size, standardised effect size, power of statistical tests, and multiple testing.
 +
* [http://www.sample-size.net/ Sample/effect size online calculators for designing biomedical experiments from UC San Francisco]
 
* Self-paced online courses from UC Berkeley: [https://www.edx.org/course/introduction-statistics-descriptive-uc-berkeleyx-stat2-1x Descriptive Statistics.] [https://www.edx.org/course/introduction-statistics-probability-uc-berkeleyx-stat2-2x Probability.] [https://www.edx.org/course/introduction-statistics-inference-uc-berkeleyx-stat2-3x Inference.]
 
 
* [http://www.stat.berkeley.edu/~stark/SticiGui/ Online book recommended for the UC Berkeley courses]
 
Line 76: Line 98:
 
* [https://www.edx.org/course/data-analysis-life-sciences-1-statistics-harvardx-ph525-1x Self-paced online course from Harvard "Statistics and R"]
 
 
* [https://www.edx.org/course/data-analysis-life-sciences-3-harvardx-ph525-3x Self-paced online course from Harvard "Statistical Inference and Modeling for High-throughput Experiments" ]
 
* [http://data.bits.vib.be/pub/trainingen/StatTheory/SlidesFullDay.pdf VIB "Basic statistics theory" course slides.]
+
* [http://students.brown.edu/seeing-theory/ The Seeing Theory website visualizes the fundamental concepts covered in an introductory college statistics course, using D3.js]
* [https://www.bits.vib.be/index.php/training/180#download VIB "Basic statistics in R" course. Tutorial, exercises, cheat sheets.]
+

Revision as of 14:17, 2 October 2017

Course Description

This introductory course in statistics and R is offered as three consecutive two-hour modules (please see Course Syllabus below), each consisting of a hands-on practicum in a computer classroom, using RStudio.

Course Objectives

To introduce or refresh the basic concepts of statistics and show how they can be applied to real-life datasets using R. Students will write their first scripts, which they can reuse when they start analyzing their own data. No prior knowledge of statistics or R is required. However, students who do not take the modules in sequence should be familiar with the material covered in the earlier modules.

Course Instructors

  • Sarah Bonnin (Module I) sarah.bonnin@crg.eu
  • German Demidov (Module II) german.demidov@crg.eu
  • Julia Ponomarenko (organizer, Module III) julia.ponomarenko@crg.eu

Time and Location

  • Oct 3, 4, 5, 2017, 11:00 - 13:00. PRBB Building, Bioinformatics classroom 468, 4th floor, hotel wing.


Course Syllabus, Schedule, and Materials

MODULE I. Introduction to R. Oct 3, 2017.

  • Introduction to R programming language and R Studio:
    • Introduction to RStudio: explore environment variables, navigate the command history, navigate the directory and file structure, workspace and files.
    • Basics of R language: syntax, special characters, simple arithmetic in R console, create/delete and manipulate an object.
    • R scripts: create and run, comment.
  • Functions in R.
    • Input/output: read and write a file, change and create a directory (functions: setwd, getwd).
  • Data structures in R.
    • Vectors and factors: create, modify, subset, manipulate, compare.
    • Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions.
    • Lists: create, access/extract/subset, modify.
    • Missing values: how to deal with NA values (functions: is.na, na.omit).
  • OUTCOME: Write a script that reads matrices and data frames, manipulates them, reads and writes files.


  • Slides for Module 1: here
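
As an illustration of the Module I outcome, the following is a minimal sketch rather than the course's own script. The data frame, its column names, and the output file name are made up for the example; only base R functions are used.

```r
# Hypothetical example data; column names and values are illustrative only.
samples <- data.frame(
  id        = c("s1", "s2", "s3", "s4"),
  condition = factor(c("control", "treated", "control", "treated")),
  value     = c(2.5, NA, 3.1, 4.8),
  stringsAsFactors = FALSE
)

# Missing values: count them with is.na(), drop incomplete rows with na.omit().
n_missing <- sum(is.na(samples$value))
complete  <- na.omit(samples)

# Data frames: subset rows by a condition, then add a derived column.
treated <- complete[complete$condition == "treated", ]
complete$value_log2 <- log2(complete$value)

# Matrices: create one, name its dimensions, and check them.
m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b", "c")))
dim(m)

# Lists: hold objects of different types and access them by name.
results <- list(data = complete, n_missing = n_missing)
results$n_missing

# Input/output: show the working directory, write a file, and read it back.
# A temporary directory is used here so the sketch does not touch your files.
getwd()
out_file <- file.path(tempdir(), "samples_clean.csv")
write.csv(complete, out_file, row.names = FALSE)
reread <- read.csv(out_file)
```

In a real analysis you would typically call setwd() once, or use project-relative paths, instead of writing to tempdir().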


MODULE II. Descriptive statistics and plotting in R. Oct 4, 2017.

  • Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
  • Exploratory data analysis
    • Descriptive statistical functions:
    • Plots: bar-plots, histograms, box-plots, scatter-plots.
  • Introduction to the ggplot2 package: structure of ggplot2 commands
  • OUTCOME:
    • Install the packages "ggplot2" (which provides the diamonds data frame) and "WriteXLS".
    • Write a script that manipulates the diamonds data frame, writes it into an Excel file, produces and saves plots.
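
The outcome above could look roughly like the sketch below. It is not the practicum's actual solution: the numeric values, the carat cut-off, and the output file names are assumptions for illustration. The ggplot2 and WriteXLS parts run only if those packages are installed, and WriteXLS additionally relies on a working Perl installation, so that call is wrapped in tryCatch().

```r
# Descriptive statistics with base R on a small made-up vector.
carat <- c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24)  # illustrative values only
stats <- c(mean = mean(carat), median = median(carat), sd = sd(carat))
quantile(carat, probs = c(0.25, 0.5, 0.75))
summary(carat)

# The diamonds data frame ships with ggplot2, so this part is guarded.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Manipulate the data frame: keep small stones and three columns.
  small <- as.data.frame(diamonds[diamonds$carat < 1, c("carat", "cut", "price")])
  print(aggregate(price ~ cut, data = small, FUN = median))

  # Structure of a ggplot2 command: data + aesthetic mapping + geom layer(s).
  p <- ggplot(small, aes(x = carat, y = price, colour = cut)) +
    geom_point(alpha = 0.3) +
    labs(x = "Carat", y = "Price")

  # Save the plot; the file extension selects the graphics device.
  ggsave(file.path(tempdir(), "carat_vs_price.pdf"), plot = p, width = 6, height = 4)

  # Write the data frame to an Excel file (needs WriteXLS and Perl).
  if (requireNamespace("WriteXLS", quietly = TRUE)) {
    tryCatch(
      WriteXLS::WriteXLS(small, ExcelFileName = file.path(tempdir(), "diamonds_small.xlsx")),
      error = function(e) message("WriteXLS failed: ", conditionMessage(e))
    )
  }
}
```

Missing packages can be installed beforehand with install.packages(c("ggplot2", "WriteXLS")).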



MODULE II. Introduction to Probability & Hypothesis testing. Oct 10, 2016.

  • Independence, conditional probability, Bayes formula.
  • Distributions, population mean and population variance.
  • Central Limit theorem and the Law of large numbers.
  • The concept of hypothesis testing, type I and type II error, false discovery rate.
  • Significance and confidence level, p-value. Confidence intervals. One-sided and two-sided tests.
  • One-sample and two-sample tests for independent and matched samples with known and unknown variance.
  • Student t-distribution, assumption of normality.
  • Test for proportions.
  • Download the zip-file of the module's materials.
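
The topics in this module can be tried out with base R alone. The sketch below uses simulated data and made-up rates and p-values; it is an illustration under those assumptions, not the module's practicum material.

```r
set.seed(42)  # make the simulated data reproducible

# Bayes' formula: P(disease | positive test), using made-up rates.
prevalence  <- 0.01
sensitivity <- 0.90   # P(positive | disease)
specificity <- 0.95   # P(negative | no disease)
p_positive  <- sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior   <- sensitivity * prevalence / p_positive

# Simulated measurements for two independent groups.
control <- rnorm(20, mean = 10, sd = 2)
treated <- rnorm(20, mean = 12, sd = 2)

# One-sample, two-sided t-test of H0: mean = 10.
one_sample <- t.test(control, mu = 10)

# Two-sample t-test for independent samples (Welch's test is the default).
two_sample <- t.test(treated, control)

# One-sided alternative: the treated mean is greater than the control mean.
one_sided <- t.test(treated, control, alternative = "greater")

# Matched samples: the same 20 units measured before and after.
before <- control
after  <- control + rnorm(20, mean = 0.5, sd = 1)
paired <- t.test(after, before, paired = TRUE)

# Test for a proportion: 14 successes in 40 trials against H0: p = 0.5.
prop <- prop.test(x = 14, n = 40, p = 0.5)

# False discovery rate: Benjamini-Hochberg adjustment of hypothetical p-values.
pvals    <- c(0.01, 0.02, 0.03, 0.04)
adjusted <- p.adjust(pvals, method = "BH")

# Each test returns an "htest" object holding the p-value and confidence interval.
two_sample$p.value
two_sample$conf.int
```

If equal variances can be assumed, passing var.equal = TRUE to t.test() gives the classical Student t-test instead of Welch's version.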


MODULE III. Non-parametric tests & Linear regression. Oct 11, 2016.

  • Non-parametric tests: sign test, Wilcoxon rank-sum test (Mann-Whitney U test), Wilcoxon signed-rank test, Kruskal-Wallis test.
  • Kolmogorov-Smirnov (KS) test. Shapiro-Wilk test for normality. QQ-plot.
  • Data transformation.
  • Download the zip-file for this part of the practicum.
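
The tests listed for this module all have base R counterparts. The following sketch runs them on simulated, deliberately skewed data; the sample sizes and distributions are assumptions chosen for illustration. Base R has no dedicated sign-test function, so the sign test is expressed here as a binomial test on the signs of the paired differences.

```r
set.seed(1)  # reproducible simulated data

# Three skewed (exponential) samples, so normality is doubtful by construction.
x <- rexp(25, rate = 1)
y <- rexp(25, rate = 0.5)
z <- rexp(25, rate = 0.25)

# Normality checks: Shapiro-Wilk test and a QQ-plot saved to a temporary PDF.
normality <- shapiro.test(x)
pdf(file.path(tempdir(), "qqplot.pdf"))
qqnorm(x)
qqline(x)
invisible(dev.off())

# Two-sample Kolmogorov-Smirnov test: do x and y share a distribution?
ks <- ks.test(x, y)

# Wilcoxon rank-sum (Mann-Whitney U) test for two independent samples.
rank_sum <- wilcox.test(x, y)

# Wilcoxon signed-rank test, treating x and y as matched pairs.
signed_rank <- wilcox.test(x, y, paired = TRUE)

# Sign test via a binomial test on the number of positive differences.
d <- x - y
sign_test <- binom.test(sum(d > 0), n = sum(d != 0), p = 0.5)

# Kruskal-Wallis test for more than two independent groups.
kw <- kruskal.test(list(x, y, z))

# Data transformation: a log transform often reduces right skew.
log_x <- log(x)
shapiro.test(log_x)
```

Comparing the Shapiro-Wilk result before and after the transformation shows whether the transformed data are closer to normal for a given sample.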




External Resources

Bioinformatics Core Facility @ CRG — 2011-2024