CRG PhD Course 2017 Introduction to Statistics in R

From Bioinformatics Core Wiki

Revision as of 15:10, 2 October 2017

Course Description

This introductory course in statistics and R is offered as three consecutive two-hour modules (see the Course Syllabus below). Each module is a hands-on practicum in a computer classroom using RStudio.

Course Objectives

To introduce or refresh basic statistical concepts and show how they can be applied to real-life datasets using R. Students will write their first scripts, which they can reuse when they start analyzing their own data. No prior knowledge of statistics or R is required. However, if the modules are not taken in sequence, familiarity with the material of the preceding modules is recommended.

Course Instructors

  • Sarah Bonnin (Module I, II) sarah.bonnin@crg.eu
  • Julia Ponomarenko (organizer, Module II, III) julia.ponomarenko@crg.eu

Time and Location

  • Oct 3, 4, and 5, 2017, 11:00 - 13:00. PRBB Building, Bioinformatics classroom 468, 4th floor, hotel wing.


Course Syllabus, Schedule, and Materials

MODULE I. Introduction to R. Oct 3, 2017.

  • Introduction to the R programming language and RStudio:
    • Introduction to RStudio: explore environment variables, navigate the command history, the directory and file structure, the workspace, and files.
    • Basics of the R language: syntax, special characters, simple arithmetic in the R console, creating, deleting, and manipulating objects.
    • R scripts: create, run, and comment scripts.
  • Functions in R.
    • Input/output: read and write files, change and create directories (functions: setwd, getwd).
  • Data structures in R (functions: class, dim, sum)
    • Vectors and factors: create, modify, subset, manipulate, compare.
    • Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions.
    • Lists: create, access/extract/subset, modify.
    • Missing values: how to deal with NA values (functions: is.na, na.omit).
  • Slides for Module I: Open the pdf file.
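The Module I topics (vectors and factors, data frames, lists, NA handling, and file input/output) can be illustrated with a minimal base-R sketch. The object names (expr, samples, df, res) and the temporary output file are hypothetical examples, not taken from the course slides.

```r
# Vectors and factors: create, subset, compare
expr <- c(2.5, NA, 3.1, 4.8, 1.9)             # numeric vector with one missing value
samples <- factor(c("ctrl", "ctrl", "treat", "treat", "treat"))
class(expr)                                    # "numeric"
levels(samples)                                # "ctrl" "treat"
high <- expr[expr > 2 & !is.na(expr)]          # logical subsetting that skips NA

# Missing values: detect and drop NA
sum(is.na(expr))                               # number of missing values: 1
mean(expr)                                     # NA, because one value is missing
mean(expr, na.rm = TRUE)                       # mean of the non-missing values
clean <- na.omit(expr)                         # vector without the NA value

# Data frames: create, check dimensions, extract columns and rows
df <- data.frame(sample = samples, expr = expr)
dim(df)                                        # 5 rows, 2 columns
df$expr                                        # extract a column by name
df[df$sample == "treat", ]                     # subset rows with a condition
df_complete <- na.omit(df)                     # drop rows that contain NA

# Lists: create, access, modify
res <- list(name = "run1", values = expr, n = length(expr))
res$n                                          # access by name with $
res[["values"]][3]                             # access an element, then index into it
res$note <- "added later"                      # add a new element

# Input/output: write the data frame to a tab-separated file and read it back
out_file <- file.path(tempdir(), "expr_table.txt")
write.table(df, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
df_back <- read.table(out_file, header = TRUE, sep = "\t")
getwd()                                        # current working directory
```

Writing to tempdir() keeps the example from touching the user's working directory; in a real script the file would usually go into a project folder set with setwd().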


MODULE II. Descriptive Statistics & Plots in R. Oct 4, 2017.

  • Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
  • Basic plots in R: bar-plots, histograms, box-plots, scatter-plots.
  • Exploratory data analysis and descriptive statistical functions: summary, mean, sd, min, max, quantile.
  • OUTCOME:
    • Install the packages "ggplot2" (which provides the diamonds data frame) and "WriteXLS".
    • Write a script that manipulates the diamonds data frame, writes it into an Excel file, produces and saves plots.
  • Slides for Module II, Part 1:
  • Slides for Module II, Part 2: Download the zipped html-file for the practicum.
  • The ggplot2 cheatsheet
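The descriptive statistics and basic plots of Module II can be sketched in base R. The practicum itself uses the diamonds data frame from ggplot2. This sketch instead uses the built-in mtcars data set, so it runs on a bare R installation. The output file name is a hypothetical example.

```r
# Descriptive statistics on a built-in data set (mtcars ships with base R)
data(mtcars)
summary(mtcars$mpg)                            # min, quartiles, median, mean, max
mean(mtcars$mpg)
sd(mtcars$mpg)
min(mtcars$mpg)
max(mtcars$mpg)
quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75))

# Group-wise summary: mean mpg for each number of cylinders
tapply(mtcars$mpg, mtcars$cyl, mean)

# Basic plots, saved as pages of one PDF file in the temporary directory
plot_file <- file.path(tempdir(), "mtcars_plots.pdf")
pdf(plot_file)
barplot(table(mtcars$cyl), main = "Cars per cylinder count", xlab = "cylinders")
hist(mtcars$mpg, main = "Fuel consumption", xlab = "miles per gallon")
boxplot(mpg ~ cyl, data = mtcars, xlab = "cylinders", ylab = "miles per gallon")
plot(mtcars$wt, mtcars$mpg, xlab = "weight (1000 lbs)", ylab = "miles per gallon")
invisible(dev.off())                           # close the device to finish the file

# Not run here: with the WriteXLS package installed (it requires Perl),
# a data frame could be written to Excel, for example:
# WriteXLS::WriteXLS(mtcars, ExcelFileName = "mtcars.xlsx")
```

The WriteXLS call is left as a comment because it depends on an external Perl installation. With ggplot2 loaded, the same functions apply to diamonds, for example summary(diamonds$price).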


MODULE III. Introduction to Statistical Inference. Oct 5, 2017.

  • The concept of hypothesis testing, type I and type II errors, the false discovery rate.
  • Significance and confidence level, p-value. Confidence intervals. One-sided and two-sided tests.
  • One-sample and two-sample tests for independent and matched samples with known and unknown variance.
  • Student's t-distribution, the assumption of normality.
  • Tests for proportions.
  • Non-parametric tests: sign test, Wilcoxon rank-sum test (Mann-Whitney U-test), Wilcoxon signed-rank test, Kruskal-Wallis test.
  • Kolmogorov-Smirnov (KS) test. Shapiro-Wilk test for normality. QQ-plot.
  • Data transformation.
  • Download the zip-file of the module's materials.
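Most Module III tests have a counterpart in R's built-in stats package. The following sketch runs them on simulated data. The groups ctrl and treat, their means and standard deviations, and the proportion counts are hypothetical values chosen only for illustration. The sign test is expressed here as a binomial test on the signs of the paired differences, which is one common way to run it in base R.

```r
set.seed(1)                                    # reproducible simulated data
ctrl  <- rnorm(20, mean = 10, sd = 2)          # hypothetical control measurements
treat <- rnorm(20, mean = 12, sd = 2)          # hypothetical treatment measurements

# Normality checks: Shapiro-Wilk test and QQ-plot (saved to a PDF file)
sw <- shapiro.test(ctrl)
qq_file <- file.path(tempdir(), "qqplot.pdf")
pdf(qq_file)
qqnorm(ctrl)
qqline(ctrl)
invisible(dev.off())

# t-tests: one-sample (H0: mean = 10), two-sample, paired, one-sided
t_one <- t.test(ctrl, mu = 10)
t_two <- t.test(treat, ctrl)                   # Welch test (var.equal = FALSE by default)
t_pair <- t.test(treat, ctrl, paired = TRUE)   # matched samples
t_greater <- t.test(treat, ctrl, alternative = "greater")
t_two$conf.int                                 # 95% confidence interval for the mean difference

# Non-parametric alternatives
w_sum <- wilcox.test(treat, ctrl)              # rank-sum (Mann-Whitney U) test
w_signed <- wilcox.test(treat, ctrl, paired = TRUE)   # signed-rank test
sign_test <- binom.test(sum(treat > ctrl), n = length(ctrl))  # sign test, assumes no ties
kw <- kruskal.test(list(ctrl, treat, rnorm(20, mean = 11, sd = 2)))
ks <- ks.test(ctrl, "pnorm", mean = 10, sd = 2)       # compare with a fully specified normal

# Test for proportions: 45 of 100 versus 60 of 100 successes
prop <- prop.test(x = c(45, 60), n = c(100, 100))

# Multiple testing: Benjamini-Hochberg adjustment controls the false discovery rate
p_raw <- c(t_two$p.value, w_sum$p.value, prop$p.value)
p_adj <- p.adjust(p_raw, method = "BH")
```

Each test returns an object of class "htest", so p-values and confidence intervals are read the same way (for example t_two$p.value). The simulated data here are only for demonstration; they do not stand in for the module's own materials.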


External Resources

Bioinformatics Core Facility @ CRG — 2011-2024