Difference between revisions of "CRG PhD Course 2017 Introduction to Statistics in R"

From Bioinformatics Core Wiki
(MODULE III. Introduction to Statistical Inference. Oct 5, 2017.)
 
(19 intermediate revisions by 2 users not shown)
Line 2: Line 2:
  
 
=== Course Description ===
 
=== Course Description ===
This introductory course to statistics and R is offered in 3 two-hour consecutive modules (please see Course Syllabus below), each consisting of a hands-on practicum in a computer class, using [https://www.rstudio.com R Studio].  
+
This introductory course to exploratory data analysis and R is offered in 3 two-hour consecutive modules (please see Course Syllabus below), each consisting of a hands-on practicum in a computer class, using [https://www.rstudio.com R Studio].  
  
 
=== Course Objectives ===
 
=== Course Objectives ===
To introduce or to refresh the basic concepts of statistics and how they can be applied to real-life datasets using R. The students will produce their first scripts that can be re-used when they start analyzing their own data. Knowledge of statistics or R is not required for taking this course. However, familiarity with the material in the previous modules is recommended if the modules are not taken in a sequence.
+
To introduce or to refresh the basic concepts of descriptive statistics and how they can be applied to real-life datasets using R. The students will produce their first scripts that can be re-used when they start analyzing their own data. Knowledge of statistics or R is not required for taking this course. However, familiarity with the material in the previous modules is recommended if the modules are not taken in a sequence.
  
 
=== Course Instructors ===
 
=== Course Instructors ===
* Sarah Bonnin (Module I) sarah.bonnin@crg.eu
+
* Sarah Bonnin (Module I, II) sarah.bonnin@crg.eu  
* German Demidov (Module II) german.demidov@crg.eu
+
* Julia Ponomarenko (organizer, Module II, III) julia.ponomarenko@crg.eu
* Julia Ponomarenko (organizer, Module III) julia.ponomarenko@crg.eu
+
  
 
=== Time and Location ===
 
=== Time and Location ===
Line 19: Line 18:
  
 
==== MODULE I. Introduction to R. Oct 3, 2017. ====
 
==== MODULE I. Introduction to R. Oct 3, 2017. ====
* Introduction to R programming language and R Studio:  
+
* Introduction to R:  
** Introduction to R studio: explore environment variable, navigate the history of commands, navigate directory and file structure, workspace and files.
+
** What is R?
** Basics of R language: syntax, special characters, simple arithmetic in R console, create/delete and manipulate an object.
+
** Why to use R?
** R scripts: create and run, comment.
+
* R studio:  
* Functions in R.
+
** Local installation
** Input/output: read and write a file, change and create a directory (functions: setwd, getwd).
+
** Understand and explore panels
* Data structures in R.
+
* Basics of R language:  
** Vectors and factors: create, modify, subset, manipulate, compare.
+
** Simple arithmetic in R console
** Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions.
+
** Syntax
** Lists: create, access/extract/subset, modify.
+
** Objects
** Missing values: how to deal with NA values (functions: is.na, na.omit).
+
* Functions in R:
* OUTCOME: Write a script that reads matrices and data frames, manipulates them, reads and writes files.
+
** Use functions
<br>
+
** Get help
* Slides for Module 1: [[Media:161005_Module1_Introduction_to_R.pdf|here]]
+
** Arguments in functions
 +
* Data types and data structures in R
 +
** Types: numeric, character, logical.
 +
** Structures: vectors, data frames, matrices.
 +
* Slides for Module I: [[Media:171003_Introduction_to_R_day1.pdf|Open the pdf file.]]
 
<br>
 
<br>
  
==== MODULE II. Descriptive statistics and plotting in R. Oct 4, 2017. ====
+
==== MODULE II. Descriptive Statistics & Plots in R. Oct 4, 2017. ====
* Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
+
* Part 1:
* Exploratory data analysis
+
** Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
** Descriptive statistical functions: 
+
** Basic plots in R: bar-plots, histograms, box-plots, scatter-plots.
** Plots: bar-plots, histograms, box-plots, scatter-plots.
+
* Part 2:
* Introduction to the ggplot2 package: structure of ggplot2 commands
+
** Exploratory data analysis and descriptive statistical functions: summary, mean, sd, min, max, quantile.
* OUTCOME:  
+
* Materials:
** Install the packages "diamonds" and "WriteXLS".  
+
** Module II, Part 1: [[Media:171004_Introduction_to_R_day2.pdf|Open the pdf file]]
** Write a script that manipulates the diamonds data frame, writes it into an Excel file, produces and saves plots.
+
** Module II, Part 2: [[Media:Module2_Descriptive_Stat_Oct_2017.html.zip|Download the zipped html-file for the practicum.]]
 +
** [[Media:Module_1_Lectures_June_2017.pdf|Slides: Lecture on the topic given at CRG in June 2017.]]
 
<br>
 
<br>
* Slides for Module II: [[Media:Module1_June_2017.html_2.zip|Download the zipped html-file for the practicum.]]
 
* [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf cheatsheet]
 
<br>
 
 
==== MODULE II. Introduction to Probability & Hypothesis testing. Oct 10, 2016.  ====
 
* Independence, conditional probability, Bayes formula.
 
* Distributions, population mean and population variance.
 
* Central Limit theorem and the Law of large numbers.
 
* The concept of hypothesis testing, type I and type II error, false discovery rate.
 
* Significance and confidence level, p-value. Confidence intervals. One-sided and two-sided tests.
 
* One-sample and two-sample tests for independent and matched samples with known and unknown variance. 
 
* Student t-distribution, assumption of normality.
 
* Test for proportions.
 
* [[Media:BIST_Module3_practicum.zip|Download the zip-file of the module's materials.]]
 
<br>
 
 
==== MODULE III. Non-parametric tests & Linear regression. Oct 11, 2016. ====
 
* Non-parametric tests: Sign test, Wilcoxon sum of ranks test (Mann-Whitney U-test), Wilcoxon signed rank test, Kruskal-Wallis test.
 
* Kolmogorov-Smirnov (KS) test. Shapiro test for normality. QQ-plot.
 
* Data transformation.
 
*[[Media:Module_3_Oct_11_2016.html.zip|Download the zip-file for this part of the practicum.]]
 
<br>
 
* Simple linear regression model, residuals, degrees of freedom.
 
* Interpretation of the slope, correlation, and determination coefficients.
 
* Standard error and statistical inference in simple linear regression model.
 
* Analysis of variance (ANOVA). One-way and two-way ANOVA.
 
* [[Media:BIST_Module5_hands_on.zip|Download the zip-file for the practicum on linear regression and ANOVA.]]
 
 
 
  
 +
==== MODULE III. Introduction to Statistical Inference. Oct 5, 2017.  ====
 +
* Part 1:
 +
** Input / Output: reading data from a file and writing data to a file.
 +
* Part 2:
 +
** Continue on exploratory data analysis: plots.
 +
* Materials:
 +
** Part 1: [[Media:171005_Introduction_to_R_day3.pdf|Download the slides for Input/Output]]
 +
** Part 1: [[Media:Davis_car.txt|Download the Davis_car.txt file]]
 +
** Part 2: Use the zipped-file of the Module II.
 
<br>
 
<br>
  

Latest revision as of 12:18, 5 October 2017

Course Description

This introductory course on exploratory data analysis and R is offered as three consecutive two-hour modules (see Course Syllabus below), each a hands-on practicum in a computer classroom using RStudio.

Course Objectives

The course introduces or refreshes the basic concepts of descriptive statistics and shows how to apply them to real-life datasets using R. Students will write their first scripts, which they can reuse when they start analyzing their own data. No prior knowledge of statistics or R is required. However, familiarity with the material of the preceding modules is recommended if the modules are not taken in sequence.

Course Instructors

  • Sarah Bonnin (Module I, II) sarah.bonnin@crg.eu
  • Julia Ponomarenko (organizer, Module II, III) julia.ponomarenko@crg.eu

Time and Location

  • Oct 3, 4, 5, 2017. 11:00 - 13:00. PRBB Building. Bioinformatics classroom. 468. 4th floor. The hotel wing.


Course Syllabus, Schedule, and Materials

MODULE I. Introduction to R. Oct 3, 2017.

  • Introduction to R:
    • What is R?
    • Why use R?
  • RStudio:
    • Local installation
    • Understand and explore panels
  • Basics of R language:
    • Simple arithmetic in R console
    • Syntax
    • Objects
  • Functions in R:
    • Use functions
    • Get help
    • Arguments in functions
  • Data types and data structures in R
    • Types: numeric, character, logical.
    • Structures: vectors, data frames, matrices.
  • Slides for Module I: Open the pdf file.
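
The Module I topics above (objects, function arguments, data types, and data structures) can be illustrated with a short script. This is only a minimal sketch and is not taken from the course slides; all object names and values are made up for illustration.

```r
# Basic types: numeric, character, logical (values are hypothetical)
x    <- 3.5
name <- "gene"
flag <- TRUE
class(x)
class(name)
class(flag)

# Vector: create, subset, and do element-wise arithmetic
v <- c(2, 4, 6, 8)
v[2:3]
v * 2

# Matrix: 2 rows, 3 columns, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)
m[1, ]

# Data frame: columns may hold different types
df <- data.frame(
  sample  = c("a", "b", "c"),
  count   = c(10, 25, 7),
  treated = c(TRUE, FALSE, TRUE)
)
str(df)
df[df$count > 8, ]   # keep rows where count is greater than 8

# Calling a function with a named argument
round(3.14159, digits = 2)
# In an interactive session, ?round opens the help page for round()
```

Typing the lines one at a time in the R console, rather than running the whole script, makes it easier to inspect each object as it is created.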


MODULE II. Descriptive Statistics & Plots in R. Oct 4, 2017.
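
The descriptive functions named in this module's syllabus (summary, mean, sd, min, max, quantile) and the basic plot types (bar-plots, histograms, box-plots, scatter-plots) can be tried on any data frame. The sketch below assumes R's built-in `mtcars` dataset as a stand-in; it is not the dataset used in the practicum.

```r
# mtcars ships with base R; it is used here only as an assumed example dataset
data(mtcars)
mpg <- mtcars$mpg

# Descriptive statistics
summary(mpg)
mean(mpg)
sd(mpg)
min(mpg)
max(mpg)
quantile(mpg, probs = c(0.25, 0.5, 0.75))

# Basic plots
barplot(table(mtcars$cyl), xlab = "Cylinders", ylab = "Number of cars")
hist(mpg, main = "Miles per gallon", xlab = "mpg")
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "mpg")
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "mpg")
```

When run non-interactively (for example with Rscript), the plots are written to a file named Rplots.pdf in the working directory instead of being shown on screen.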


MODULE III. Introduction to Statistical Inference. Oct 5, 2017.
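
Part 1 of this module covers reading data from a file and writing data to a file. The sketch below shows one common way to do this with `write.table` and `read.table`. The table contents and column names are hypothetical, and a temporary file is used so the example does not depend on the course data file.

```r
# Hypothetical small table (not the course dataset)
measurements <- data.frame(
  id     = 1:3,
  weight = c(61.5, 72.0, 58.3),
  height = c(170, 182, 165)
)

# Write to a temporary tab-separated text file
out_file <- tempfile(fileext = ".txt")
write.table(measurements, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back; header = TRUE uses the first line as column names
read_back <- read.table(out_file, header = TRUE, sep = "\t")
str(read_back)

# Show the current working directory, where relative file paths are resolved
getwd()
```

To read a real file, the temporary path would be replaced by the path to that file, with `sep` and `header` set to match how the file is laid out.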


External Resources

Bioinformatics Core Facility @ CRG — 2011-2024