Difference between revisions of "CRG PhD Course 2017 Introduction to Statistics in R"

From Bioinformatics Core Wiki
(MODULE I. Introduction to R. Oct 3, 2017.)
(MODULE III. Introduction to Statistical Inference. Oct 5, 2017.)
 
(14 intermediate revisions by 2 users not shown)
Line 2:
  
 
=== Course Description ===
 
- This introductory course to statistics and R is offered in 3 two-hour consecutive modules (please see Course Syllabus below), each consisting of a hands-on practicum in a computer class, using [https://www.rstudio.com R Studio].
+ This introductory course to exploratory data analysis and R is offered in 3 two-hour consecutive modules (please see Course Syllabus below), each consisting of a hands-on practicum in a computer class, using [https://www.rstudio.com R Studio].
  
 
=== Course Objectives ===
 
- To introduce or to refresh the basic concepts of statistics and how they can be applied to real-life datasets using R. The students will produce their first scripts that can be re-used when they start analyzing their own data. Knowledge of statistics or R is not required for taking this course. However, familiarity with the material in the previous modules is recommended if the modules are not taken in a sequence.
+ To introduce or to refresh the basic concepts of descriptive statistics and how they can be applied to real-life datasets using R. The students will produce their first scripts that can be re-used when they start analyzing their own data. Knowledge of statistics or R is not required for taking this course. However, familiarity with the material in the previous modules is recommended if the modules are not taken in a sequence.
  
 
=== Course Instructors ===
 
Line 35:
 
** Types: numeric, character, logical.
 
** Structures: vectors, data frames, matrices.
- * Slides for Module I: [[Media:171003_Module1_Introduction_to_R.pdf|Open the pdf file.]]
+ * Slides for Module I: [[Media:171003_Introduction_to_R_day1.pdf|Open the pdf file.]]
+ <br>
  
 
==== MODULE II. Descriptive Statistics & Plots in R. Oct 4, 2017. ====
 
- * Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
- * Basic plots in R: bar-plots, histograms, box-plots, scatter-plots.
- * Exploratory data analysis and descriptive statistical functions: summary, mean, sd, min, max, quantile.
- * Slides for Module II, Part 1:
- * Slides for Module II, Part 2: [[Media:Module1_June_2017.html_2.zip|Download the zipped html-file for the practicum.]]
- * [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf The ggplot2 cheatsheet]
+ * Part 1:
+ ** Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
+ ** Basic plots in R: bar-plots, histograms, box-plots, scatter-plots.
+ * Part 2:
+ ** Exploratory data analysis and descriptive statistical functions: summary, mean, sd, min, max, quantile.
+ * Materials:
+ ** Module II, Part 1: [[Media:171004_Introduction_to_R_day2.pdf|Open the pdf file]]
+ ** Module II, Part 2: [[Media:Module2_Descriptive_Stat_Oct_2017.html.zip|Download the zipped html-file for the practicum.]]
+ ** [[Media:Module_1_Lectures_June_2017.pdf|Slides: Lecture on the topic given at CRG in June 2017.]]
 
<br>
 
  
 
==== MODULE III. Introduction to Statistical Inference. Oct 5, 2017.  ====
 
- * The concept of hypothesis testing, type I and type II error, false discovery rate.
- * Significance and confidence level, p-value. Confidence intervals. One-sided and two-sided tests.
- * One-sample and two-sample tests for independent and matched samples with known and unknown variance.
- * Student t-distribution, assumption of normality.
- * Test for proportions.
- * Non-parametric tests: Sign test, Wilcoxon sum of ranks test (Mann-Whitney U-test), Wilcoxon signed rank test, Kruskal-Wallis test.
- * Kolmogorov-Smirnov (KS) test. Shapiro test for normality. QQ-plot.
- * Data transformation.
- * [[Media:BIST_Module3_practicum.zip|Download the zip-file of the module's materials.]]
+ * Part 1:
+ ** Input / Output: Reading data from a file and writing data in the file.
+ * Part 2:
+ ** Continue on exploratory data analysis: plots.
+ * Materials:
+ ** Part 1: [[Media:171005_Introduction_to_R_day3.pdf|Download the slides for Input/Output]]
+ ** Part 1: [[Media:Davis_car.txt|Download the Davis_car.txt file]]
+ ** Part 2: Use the zipped-file of the Module II.
+
 
<br>
 
  

Latest revision as of 12:18, 5 October 2017

Course Description

This introductory course on exploratory data analysis and R is offered in three consecutive two-hour modules (please see the Course Syllabus below), each consisting of a hands-on practicum in a computer class, using RStudio.

Course Objectives

The course introduces, or refreshes, the basic concepts of descriptive statistics and shows how to apply them to real-life datasets using R. Students will write their first scripts, which they can reuse when they start analyzing their own data. No prior knowledge of statistics or R is required. However, students who do not take the modules in sequence should be familiar with the material from the previous modules.

Course Instructors

  • Sarah Bonnin (Module I, II) sarah.bonnin@crg.eu
  • Julia Ponomarenko (organizer, Module II, III) julia.ponomarenko@crg.eu

Time and Location

  • Oct 3, 4, 5, 2017. 11:00 - 13:00. PRBB Building. Bioinformatics classroom 468, 4th floor, hotel wing.


Course Syllabus, Schedule, and Materials

MODULE I. Introduction to R. Oct 3, 2017.

  • Introduction to R:
    • What is R?
    • Why use R?
  • RStudio:
    • Local installation
    • Understand and explore panels
  • Basics of R language:
    • Simple arithmetic in R console
    • Syntax
    • Objects
  • Functions in R:
    • Use functions
    • Get help
    • Arguments in functions
  • Data types and data structures in R:
    • Types: numeric, character, logical.
    • Structures: vectors, data frames, matrices.
  • Slides for Module I: Open the pdf file.
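
The Module I topics above (objects, function arguments, data types, and data structures) can be illustrated with a minimal sketch. The object names and values below are made up for illustration and are not taken from the course slides.

```r
# Data types: numeric, character, logical (hypothetical example values)
x    <- 3.14        # numeric
name <- "gene_A"    # character
flag <- TRUE        # logical
class(x); class(name); class(flag)

# Using a function with a named argument
rounded <- round(3.14159, digits = 2)

# Vector: an ordered set of values of one type
expr <- c(2.5, 3.1, 4.8)

# Matrix: a two-dimensional structure of one type (2 rows, 3 columns)
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)

# Data frame: a table whose columns may have different types
df <- data.frame(
  gene        = c("A", "B", "C"),
  expression  = expr,
  significant = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
str(df)

# Getting help on a function (opens the help page in RStudio):
# ?mean
```

Running `str()` on an object is a quick way to check both its structure and the type of each column.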


MODULE II. Descriptive Statistics & Plots in R. Oct 4, 2017.

  • Part 1:
    • Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
    • Basic plots in R: bar-plots, histograms, box-plots, scatter-plots.
  • Part 2:
    • Exploratory data analysis and descriptive statistical functions: summary, mean, sd, min, max, quantile.
  • Materials:
    • Module II, Part 1: Open the pdf file (171004_Introduction_to_R_day2.pdf).
    • Module II, Part 2: Download the zipped html-file for the practicum (Module2_Descriptive_Stat_Oct_2017.html.zip).
    • Slides: Lecture on the topic given at CRG in June 2017 (Module_1_Lectures_June_2017.pdf).

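As a rough sketch of the descriptive functions and basic plots that Module II covers (summary, mean, sd, min, max, quantile; bar-plots, histograms, box-plots, scatter-plots), the example below uses the `mtcars` dataset that ships with R. The choice of dataset and columns is an assumption made for illustration; the practicum materials may use different data.

```r
# Built-in example dataset (an assumption; not necessarily the course data)
data(mtcars)
mpg <- mtcars$mpg

# Descriptive statistics
summary(mpg)
avg   <- mean(mpg)
stdev <- sd(mpg)
lo    <- min(mpg)
hi    <- max(mpg)
q     <- quantile(mpg, probs = c(0.25, 0.5, 0.75))

# Basic plots
hist(mpg, main = "Miles per gallon", xlab = "mpg")            # histogram
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "mpg")                     # box-plot
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "mpg")    # scatter-plot
barplot(table(mtcars$cyl), xlab = "Cylinders", ylab = "Count") # bar-plot
```

When the script is run outside RStudio (for example with `Rscript`), the plots are written to a file named `Rplots.pdf` instead of appearing in a plot panel.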

MODULE III. Introduction to Statistical Inference. Oct 5, 2017.

  • Part 1:
    • Input/Output: reading data from a file and writing data to a file.
  • Part 2:
    • Continuation of exploratory data analysis: plots.
  • Materials:
    • Part 1: Download the slides for Input/Output (171005_Introduction_to_R_day3.pdf).
    • Part 1: Download the Davis_car.txt file.
    • Part 2: Use the zipped file of Module II.

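The latest revision of Module III covers reading data from a file and writing data to a file. A minimal sketch of that round trip is shown below. It writes a small made-up table to a temporary file instead of using the course's Davis_car.txt, whose layout is not described here; the column names, the tab separator, and the file locations are all assumptions.

```r
# Hypothetical example table (not the course data)
df <- data.frame(
  id     = 1:3,
  weight = c(61.5, 72.0, 58.3),
  sex    = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# Write the table to a tab-separated text file
out_file <- tempfile(fileext = ".txt")
write.table(df, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back; header = TRUE treats the first line as column names
df2 <- read.table(out_file, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
str(df2)

# Write the same data as a comma-separated file
csv_file <- tempfile(fileext = ".csv")
write.csv(df2, file = csv_file, row.names = FALSE)
```

For a real file, replace `out_file` with its path and set `sep` and `header` to match how the file is laid out.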

External Resources
