Difference between revisions of "BIST Introduction to Statistics 2017"

From Bioinformatics Core Wiki
(Created page with "__TOC__ === Description === This is an introductory course to statistics and R programming. The R part is offered in 4 practicums followed by 3 practicums of statistical mo...")
 
(Dates, Time and Location)
 
(40 intermediate revisions by 2 users not shown)
Line 3: Line 3:
  
 
=== Description ===
 
=== Description ===
This is an introductory course to statistics and R programming.  
+
This is an introductory course to statistics and R programming. <br>
The R part is offered in 4 practicums followed by 3 practicums of statistical modules.
+
The R part is offered in 4 slow-paced practicums for absolute beginners, followed by 3 fast-paced practicums of statistical modules. <br>
The statistics material is offered in 3 consecutive modules (please see Course Syllabus below), each containing a morning lecture and an afternoon practicum in a computer class. For practical exercises we will use R programming language and [https://www.rstudio.com R Studio]. However, this course is focused on statistics rather than R; therefore, each practicum is designed with the purpose to demonstrate and reinforce understanding of concepts introduced in the lecture rather than to provide a training in R.
+
For practical exercises we will use the R programming language and [https://www.rstudio.com RStudio].
 +
 
 +
The statistics material is offered in 3 consecutive modules (please see Course Syllabus below), each consisting of a morning lecture and an afternoon practicum in a computer classroom. These practicums focus on applying statistics in R, with the aim of demonstrating and reinforcing the concepts introduced in the lectures rather than teaching R programming.
 +
<br>
 +
 
  
 
=== Course Instructors ===
 
=== Course Instructors ===
* Dmitri Pervouchine (lectures) pervouchine@gmail.com
+
* [mailto:pervouchine@gmail.com Dmitri Pervouchine] (lectures)
* German Demidov (practicums V - VII) german.demidov@crg.eu
+
* [mailto:sarah.bonnin@crg.eu Sarah Bonnin] (practicums I - IV)  
* Sarah Bonnin (practicums I - IV) sarah.bonnin@crg.eu  
+
* [mailto:estefania.mancini@crg.eu Estefania Mancini] (practicums I - IV)
* Julia Ponomarenko (organizer, practicums V - VII) julia.ponomarenko@crg.eu
+
* [mailto:german.demidov@crg.eu German Demidov] (practicums V - VII)  
 +
* [mailto:julia.ponomarenko@crg.eu Julia Ponomarenko] (organizer, practicums V - VII)
 +
<br>
  
 
=== Dates, Time and Location ===
 
=== Dates, Time and Location ===
* LECTURES: PRBB. AULA Auditorium. 4th floor. The hotel wing.
+
* '''Module 0. Introduction to R. ''' May 25, 26, 29, 30, 2017.  
** June 6, 8, 9, 2017. 10:00 - 13:00.  
+
** 10:00 - 13:00.  
* PRACTICUMS: PRBB. Bioinformatics classroom. 468. 4th floor. The hotel wing.
+
** PRBB. Bioinformatics classroom (468). 4th floor. The hotel/North wing.
** May 26, 26, 29, 30, 2017. 10:00 - 13:00.
+
** June 6, 8, 9, 2017. 14:00 - 17:00.  
+
  
 +
 +
*''' Modules I, II, III. Introduction to Statistics. '''June 6, 8, 9, 2017.
 +
** LECTURES.
 +
*** 10:00 - 13:00.
 +
*** PRBB. Ramon y Cajal.
 +
** PRACTICUMS.
 +
*** 14:00 - 17:00.
 +
*** PRBB. Bioinformatics classroom (468). 4th floor. The hotel/North wing.
 +
<br>
  
 
=== Course Syllabus, Schedule, and Materials ===
 
=== Course Syllabus, Schedule, and Materials ===
  
 +
<br>
 +
==== <b>MODULE 0. Introduction to R. May 25, 26, 29, 30.</b> ====
  
==== MODULE 0. Workshop "Introduction to R". May 2, 2016. ICFO. ====
+
* <u>PRACTICUM I. Intro to R and R Studio. May 25. 10:00 - 13:00. </u>
[[Media:ICFO_R.zip|Download the workshop materials.]] The workshop was given by Dr. Alejandro Caceres, CREAL, and organized by the ICFO's Training and Development Program.
+
** Introduction to RStudio: explore environment variables, navigate the command history, navigate the directory and file structure, workspace and files.
 +
** Simple arithmetic in R console.
 +
** Create and delete an object.
 +
** Introduction to data types and the "vector" data structure.
 +
** Create and run a short script.
 +
** Read and write a file.
 +
** OUTCOME: Write a script that creates (and enters) a directory, and writes a simple calculation into a file.
  
  
==== MODULE I. Descriptive statistics. May 6, 2016. CRG. ====
+
* <u>PRACTICUM II. Data structures in R. May 26. 10:00 - 13:00.</u>
* LECTURE I. [[Media:Module1.pdf|View slides in this browser window.]] Exploratory data analysis: bar-plot, histogram, CDF, box-plot, scatter-plot, pie charts etc. Samples, measures of center and spread, percentiles, odds ratio. Outliers and robustness. Experiment versus observational study, confounding factors, simple random sample, other types of sampling, biases in sampling techniques.  
+
** More on vectors.
* LECTURE II. [[Media:Introduction to R Module1.pdf|View slides in this browser window.]] Introduction to R programming language and R Studio: Data types, variables, packages, functions, handling files/scripts/projects.
+
** Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions.
* PRACTICUM. [[Media:Practicum1 ggplot2.pdf|View pdf-file in this browser window.]] Basic plots in R using the ggplot2 package.
+
** OUTCOME: Produce a script that reads matrices and data frames, converts one into another, and makes calculations.
  
  
==== MODULE II. Introduction to Probability. May 9, 2016. CRG. ====
+
* <u>PRACTICUM III. Lists & Packages. May 29. 10:00 - 13:00.</u>
* LECTURE. [[Media:Module2.pdf|View slides in this browser window.]] Independence, conditional probability, Bayes formula. Distributions, population mean and population variance, Binomial, Poisson, and Normal distribution. Central Limit theorem and the Law of large numbers. Continuity correction. Sampling with and without replacement. Correction for finite population size.
+
** More on data structures. Lists: create, access/extract/subset, modify.
* PRACTICUM. [[Media:Practicum2.zip|Download the zip-file.]] Elementary probability problems in R, pdf and cdf functions, simulation explicating the law of large numbers.  
+
** Packages: find, install, load, explore/find functions and documentation, get help on functions.
 +
** OUTCOME: Install the packages "ggplot2" (which provides the diamonds data frame) and "WriteXLS". Use them in a script that manipulates the diamonds data frame and writes it into an Excel file.
 +
 
 +
 
 +
* <u>PRACTICUM IV. Plots & Graphics in R. May 30. 10:00 - 13:00.</u>
 +
** Basic plotting: scatter plots, box plots, histograms, density plots. Changing colors, point shapes, titles, labels, legend, axes, etc.
 +
** Introduction to ggplot2 package: structure of ggplot2 commands, scatter plots.
 +
** OUTCOME: Write a script that produces, customizes, and saves plots in files.
 +
<br>
 +
 
 +
==== <b>MODULE I. Descriptive Statistics & Intro to Probability. June 6. </b> ====
 +
 
 +
* <u>LECTURE. 10:00 - 13:00.<br></u>
 +
** Exploratory data analysis and graphical displays.
 +
** Samples, measures of center and spread, percentiles, odds ratio.
 +
** Outliers and robustness.
 +
** Independence, conditional probability, Bayes formula.
 +
** Distributions, population mean and population variance, Binomial, Poisson, and Normal distribution.
 +
** Central Limit theorem and the Law of large numbers.
 +
** Continuity correction.
 +
** Sampling with and without replacement.
 +
** Correction for finite population size.  
 
* [[Media:Tables corrected.pdf|STATISTICAL TABLES]]
 
* [[Media:Tables corrected.pdf|STATISTICAL TABLES]]
* [[Media:QUIZ2.pdf|QUIZ 2]]
+
<br>
 +
 
 +
* <u>PRACTICUM. 14:00 - 17:00.</u>
 +
** Descriptive statistics.
 +
** Plots: Bar-plot, histogram, CDF, box-plot, scatter-plot, pie charts etc.
 +
** Independence, conditional probability, Bayes formula.
 +
** Distributions, population mean and population variance.
 +
** Central Limit theorem and the Law of large numbers.
 +
<br>
  
 +
==== <b>MODULE II. Statistical Inference. June 8. </b> ====
  
==== MODULE III. Statistical Inference, part I. May 13, 2016. CRG. ====
+
* <u>LECTURE. 10:00 - 13:00.</u>
* LECTURE. [[Media:Module3.pdf|View slides in this browser window.]] Statistical Inference, part I. The concept of hypothesis testing, type I and type II error, false discovery rate. Significance and confidence level, p-value. Confidence intervals. One-sided and two-sided tests and confidence intervals. Sampling distribution, estimators, standard error. Normal probabilities in application to p-value. One-sample and two-sample tests for independent and matched samples with known variance. The case of unknown variance and Student t-distribution, assumption of normality. Pooled variance and equal variances assumption.  
+
** The concept of hypothesis testing, type I and type II error, false discovery rate.
* PRACTICUM. [[Media:BIST_Module3_practicum.zip|Download the zip-file.]] One- and two-sample tests with known and unknown variance, test for proportions, simulation involving confidence intervals and t-distribution.
+
** Significance and confidence level, p-value.
* [[Media:QUIZ3.pdf|QUIZ 3]]
+
** One-sided and two-sided tests and confidence intervals.
 +
** Sampling distribution, estimators, standard error.  
 +
** Normal probabilities in application to p-value.  
 +
** One-sample and two-sample tests for independent and matched samples with known variance.
 +
** The case of unknown variance and Student t-distribution, assumption of normality.
 +
** Pooled variance and equal variances assumption.  
 +
** Estimation of variance.  
 +
** Fisher test for variance equality.  
 +
** Non-parametric tests: Sign test, Wilcoxon sum of ranks test (Mann-Whitney U-test), Wilcoxon signed rank test.
 +
** Chi-square test for goodness of fit, chi-square test for independence.  
 +
** Sample size estimation.  
 +
<br>
  
 +
* <u>PRACTICUM. 14:00 - 17:00.</u>
 +
** One- and two-sample tests with known and unknown variance, test for proportions, simulation involving confidence intervals and t-distribution.
 +
** Non-parametric tests.
 +
** Kolmogorov-Smirnov (KS) test.
 +
** Shapiro test for normality.
 +
** QQ-plot.
 +
** Data transformation.
 +
<br>
  
==== MODULE IV. Statistical Inference, part II. May 18, 2016. CRG. ====
+
==== <b>MODULE III. Statistical modeling & Regression. June 9.</b> ====
* LECTURE. [[Media:Module4-2.pdf|View slides in this browser window.]] Statistical Inference, part II. Estimation of variance. Fisher test for variance equality. Non-parametric tests. Sign test, Wilcoxon sum of ranks test (Mann-Whitney U-test), Wilcoxon signed rank test. Chi-square test for goodness of fit, chi-square test for independence. Kolmogorov-Smirnov (KS) test. Shapiro test for normality. Sample size estimation. Correction for multiple testing, family-wise error rate.
+
* PRACTICUM. [[Media:Module4.zip|Download the zip-file.]] Tests with unknown variance, non-parametric tests, simulations explicating non-parametric tests, FDR.
+
* [[Media:QUIZ4.pdf|QUIZ 4]]
+
  
 +
* <u>LECTURE. 10:00 - 13:00.</u>
 +
** Simple linear regression model, residuals, degrees of freedom, least squares method, correlation coefficient, variance decomposition, determination coefficient.
 +
** Interpretation of the slope, correlation, and determination coefficients.
 +
** Standard error and statistical inference in simple linear regression model.
 +
** Analysis of variance (ANOVA). One-way and two-way ANOVA.
 +
** Beyond simple regression models: multiple regression, logistic regression.
 +
** Correction for multiple testing, family-wise error rate.
 +
<br>
  
==== MODULE V. Statistical modeling, Regression. May 20, 2016. CRG. ====
+
* <u>PRACTICUM. 14:00 - 17:00.</u>
* LECTURE. [[Media:Module5-2.pdf|View slides in this browser window.]] Simple linear regression model, residuals, degrees of freedom, least squares method, correlation coefficient, variance decomposition, determination coefficient. Interpretation of the slope, correlation, and determination coefficients. Standard error and statistical inference in simple linear regression model. Analysis of variance (ANOVA). One-way and two-way ANOVA.  
+
** Problems on linear regression.
* PRACTICUM. [[Media:BIST_Module5_hands_on.zip|Download the zip-file.]] Problems on linear regression, ANOVA, data transformation.
+
** ANOVA.
* [[Media:QUIZ5.pdf|QUIZ 5]]
+
  
  
 +
<br>
 
=== External Resources ===
 
=== External Resources ===
 
* [http://www.nature.com/collections/qghhqm Nature Web-collection "Statistics for Biologists"]  
 
* [http://www.nature.com/collections/qghhqm Nature Web-collection "Statistics for Biologists"]  
Line 71: Line 146:
 
* [https://ryouready.wordpress.com Blog "R you ready?"]
 
* [https://ryouready.wordpress.com Blog "R you ready?"]
 
* [http://www.r-statistics.com "R-statistics blog"]
 
* [http://www.r-statistics.com "R-statistics blog"]
 +
* [https://eda.nc3rs.org.uk/experimental-design Guide and tool for the design and analysis of biological experiments from the UK's National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs)], covering control for confounding variables, sample size, effect size, standardised effect size, power of statistical tests, and multiple testing.
 +
* [http://www.sample-size.net/ Sample/effect size online calculators for designing biomedical experiments from UC San Francisco]
 
* Self-paced online courses from UC Berkeley: [https://www.edx.org/course/introduction-statistics-descriptive-uc-berkeleyx-stat2-1x Descriptive Statistics.] [https://www.edx.org/course/introduction-statistics-probability-uc-berkeleyx-stat2-2x Probability.] [https://www.edx.org/course/introduction-statistics-inference-uc-berkeleyx-stat2-3x Inference.]
 
* Self-paced online courses from UC Berkeley: [https://www.edx.org/course/introduction-statistics-descriptive-uc-berkeleyx-stat2-1x Descriptive Statistics.] [https://www.edx.org/course/introduction-statistics-probability-uc-berkeleyx-stat2-2x Probability.] [https://www.edx.org/course/introduction-statistics-inference-uc-berkeleyx-stat2-3x Inference.]
 
* [http://www.stat.berkeley.edu/~stark/SticiGui/ Online book recommended for the UC Berkeley courses]
 
* [http://www.stat.berkeley.edu/~stark/SticiGui/ Online book recommended for the UC Berkeley courses]
Line 80: Line 157:
 
* [http://data.bits.vib.be/pub/trainingen/StatTheory/SlidesFullDay.pdf VIB "Basic statistics theory" course slides.]
 
* [http://data.bits.vib.be/pub/trainingen/StatTheory/SlidesFullDay.pdf VIB "Basic statistics theory" course slides.]
 
* [https://www.bits.vib.be/index.php/training/180#download VIB "Basic statistics in R" course. Tutorial, exercises, cheat sheets.]
 
* [https://www.bits.vib.be/index.php/training/180#download VIB "Basic statistics in R" course. Tutorial, exercises, cheat sheets.]
 +
* [http://students.brown.edu/seeing-theory/ The Seeing Theory website visualizes the fundamental concepts covered in an introductory college statistics course, using D3.js]

Latest revision as of 10:15, 13 April 2017


Description

This is an introductory course to statistics and R programming.
The R part is offered in 4 slow-paced practicums for absolute beginners, followed by 3 fast-paced practicums of statistical modules.
For practical exercises we will use the R programming language and RStudio.

The statistics material is offered in 3 consecutive modules (please see Course Syllabus below), each consisting of a morning lecture and an afternoon practicum in a computer classroom. These practicums focus on applying statistics in R, with the aim of demonstrating and reinforcing the concepts introduced in the lectures rather than teaching R programming.


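Course Instructors

  • Dmitri Pervouchine (lectures)
  • Sarah Bonnin (practicums I - IV)
  • Estefania Mancini (practicums I - IV)
  • German Demidov (practicums V - VII)
  • Julia Ponomarenko (organizer, practicums V - VII)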


Dates, Time and Location

  • Module 0. Introduction to R. May 25, 26, 29, 30, 2017.
    • 10:00 - 13:00.
    • PRBB. Bioinformatics classroom (468). 4th floor. The hotel/North wing.


  • Modules I, II, III. Introduction to Statistics. June 6, 8, 9, 2017.
    • LECTURES.
      • 10:00 - 13:00.
      • PRBB. Ramon y Cajal.
    • PRACTICUMS.
      • 14:00 - 17:00.
      • PRBB. Bioinformatics classroom (468). 4th floor. The hotel/North wing.


Course Syllabus, Schedule, and Materials


MODULE 0. Introduction to R. May 25, 26, 29, 30.

  • PRACTICUM I. Intro to R and R Studio. May 25. 10:00 - 13:00.
    • Introduction to RStudio: explore environment variables, navigate the command history, navigate the directory and file structure, workspace and files.
    • Simple arithmetic in R console.
    • Create and delete an object.
    • Introduction to data types and the "vector" data structure.
    • Create and run a short script.
    • Read and write a file.
    • OUTCOME: Write a script that creates (and enters) a directory, and writes a simple calculation into a file.
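
The outcome above could look roughly like the following. This is only a minimal sketch, not the course's own solution; the directory name "practicum1_demo" and the file name "result.txt" are made up for illustration, and the script works inside a temporary directory so that nothing in the real workspace is touched.

```r
# Hypothetical names: "practicum1_demo" and "result.txt" are illustrative only.
work_dir <- file.path(tempdir(), "practicum1_demo")  # stay inside a temporary location
dir.create(work_dir, showWarnings = FALSE)           # create the directory
old_dir <- setwd(work_dir)                           # enter it; setwd() returns the previous directory

result <- (3 + 4) * 2                                # a simple calculation
writeLines(as.character(result), "result.txt")       # write the result into a file
read_back <- as.numeric(readLines("result.txt"))     # read the file back to check its content

setwd(old_dir)                                       # return to where we started
```

Reading the file back is not required by the outcome; it is included as a quick way to confirm that the write succeeded.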


  • PRACTICUM II. Data structures in R. May 26. 10:00 - 13:00.
    • More on vectors.
    • Matrices and data frames: create, access/extract/subset, modify, arithmetic, conversions, check and name dimensions.
    • OUTCOME: Produce a script that reads matrices and data frames, converts one into another, and makes calculations.
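
As a rough illustration of the topics in this practicum, the sketch below builds a small matrix, names its dimensions, converts it to a data frame and back, and round-trips the data frame through a CSV file. The object names and the file name are assumptions chosen for the example, not course material.

```r
# A 2 x 3 matrix, filled column by column, with named dimensions
m <- matrix(1:6, nrow = 2, ncol = 3)
dimnames(m) <- list(c("r1", "r2"), c("a", "b", "c"))

# Matrix -> data frame, then add a column computed from existing ones
df <- as.data.frame(m)
df$total <- df$a + df$b + df$c

# Subset rows by a logical condition
big_rows <- df[df$total > 9, ]

# Data frame -> matrix, followed by matrix arithmetic
m2 <- as.matrix(df)
col_means <- colMeans(m2)

# Write the data frame to a file and read it back (hypothetical file name)
csv_file <- file.path(tempdir(), "practicum2_demo.csv")
write.csv(df, csv_file)                        # row names are stored in the first column
df_back <- read.csv(csv_file, row.names = 1)   # restore them as row names
```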


  • PRACTICUM III. Lists & Packages. May 29. 10:00 - 13:00.
    • More on data structures. Lists: create, access/extract/subset, modify.
    • Packages: find, install, load, explore/find functions and documentation, get help on functions.
    • OUTCOME: Install the packages "ggplot2" (which provides the diamonds data frame) and "WriteXLS". Use them in a script that manipulates the diamonds data frame and writes it into an Excel file.
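
A hedged sketch of the list and package handling covered here. The list contents are invented for illustration. Note that the diamonds data frame ships with the ggplot2 package, so the example checks for ggplot2; the installation and Excel-writing steps are left as comments because they depend on network access and, for WriteXLS, on a working Perl installation.

```r
# Lists: create, access, modify (contents are illustrative only)
sample_info <- list(id = "S1", counts = c(10, 20, 30), paired = TRUE)
second_count <- sample_info$counts[2]   # extract an element of a list member
sample_id <- sample_info[["id"]]        # [[ ]] returns the member itself
sample_info$tissue <- "liver"           # add a member
sample_info$paired <- NULL              # remove a member

# Packages: check which ones are available without attaching them
pkgs <- c("ggplot2", "WriteXLS")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
# install.packages(pkgs[!installed])    # uncomment to install the missing ones from CRAN

if (installed[["ggplot2"]]) {
  diamonds <- ggplot2::diamonds                       # the diamonds data frame
  expensive <- diamonds[diamonds$price > 15000, ]     # a simple manipulation
  # With WriteXLS installed, the subset could then be written to Excel, e.g.:
  # WriteXLS::WriteXLS(expensive, ExcelFileName = file.path(tempdir(), "expensive.xls"))
}
```

Help on any function is available with `?` or `help()`, for example `help("requireNamespace")`.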


  • PRACTICUM IV. Plots & Graphics in R. May 30. 10:00 - 13:00.
    • Basic plotting: scatter plots, box plots, histograms, density plots. Changing colors, point shapes, titles, labels, legend, axes, etc.
    • Introduction to ggplot2 package: structure of ggplot2 commands, scatter plots.
    • OUTCOME: Write a script that produces, customizes, and saves plots in files.
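
One possible shape for such a script, using base graphics only. The data are simulated and the output file name is an assumption; a ggplot2 version would follow the same produce, customize, save pattern.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)                      # simulated data, for illustration only

out_file <- file.path(tempdir(), "practicum4_plots.pdf")  # hypothetical file name
pdf(out_file, width = 7, height = 7)         # open a PDF graphics device
par(mfrow = c(2, 2))                         # 2 x 2 grid of panels

plot(x, y, col = "steelblue", pch = 19,      # scatter plot with custom color and point shape
     main = "Scatter plot", xlab = "x", ylab = "y")
legend("topleft", legend = "simulated points", col = "steelblue", pch = 19)
boxplot(list(x = x, y = y), main = "Box plots", col = c("grey80", "tomato"))
hist(x, main = "Histogram", xlab = "x", col = "grey80")
plot(density(y), main = "Density plot")

dev.off()                                    # close the device so the file is written
```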


MODULE I. Descriptive Statistics & Intro to Probability. June 6.

  • LECTURE. 10:00 - 13:00.
    • Exploratory data analysis and graphical displays.
    • Samples, measures of center and spread, percentiles, odds ratio.
    • Outliers and robustness.
    • Independence, conditional probability, Bayes formula.
    • Distributions, population mean and population variance, Binomial, Poisson, and Normal distribution.
    • Central Limit theorem and the Law of large numbers.
    • Continuity correction.
    • Sampling with and without replacement.
    • Correction for finite population size.
  • STATISTICAL TABLES
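
To make the continuity correction and the finite population correction concrete, here is a small sketch. The numbers (100 trials, success probability 0.5, population of 1000, sample of 100) are arbitrary choices for the example.

```r
# Continuity correction: P(X <= 45) for X ~ Binomial(n = 100, p = 0.5)
n <- 100; p <- 0.5
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

exact      <- pbinom(45, size = n, prob = p)     # exact binomial probability
approx_raw <- pnorm(45,   mean = mu, sd = sigma) # normal approximation, no correction
approx_cc  <- pnorm(45.5, mean = mu, sd = sigma) # normal approximation with continuity correction

# Finite population correction for sampling without replacement
N_pop <- 1000; n_samp <- 100
fpc <- sqrt((N_pop - n_samp) / (N_pop - 1))      # multiplies the standard error
```

Comparing `approx_cc` and `approx_raw` with `exact` shows how much the half-unit shift improves the approximation in this setting.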


  • PRACTICUM. 14:00 - 17:00.
    • Descriptive statistics.
    • Plots: Bar-plot, histogram, CDF, box-plot, scatter-plot, pie charts etc.
    • Independence, conditional probability, Bayes formula.
    • Distributions, population mean and population variance.
    • Central Limit theorem and the Law of large numbers.
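
The simulations below sketch how these topics might be explored in R. All parameter values (coin probability, sample sizes, test characteristics) are assumptions picked for illustration and are not taken from the course materials.

```r
set.seed(42)

# Law of large numbers: the running mean of fair coin flips approaches 0.5
flips <- rbinom(10000, size = 1, prob = 0.5)
running_mean <- cumsum(flips) / seq_along(flips)

# Central limit theorem: means of 30 exponential draws are roughly normal,
# centered at 1 with standard deviation close to 1 / sqrt(30)
sample_means <- replicate(5000, mean(rexp(30, rate = 1)))
clt_summary <- c(mean = mean(sample_means), sd = sd(sample_means))

# Bayes formula: probability of disease given a positive test
prevalence  <- 0.01
sensitivity <- 0.95
specificity <- 0.90
p_positive  <- sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior   <- sensitivity * prevalence / p_positive
```

Plotting `running_mean` against the flip index, or a histogram of `sample_means`, gives the usual pictures for the two theorems.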


MODULE II. Statistical Inference. June 8.

  • LECTURE. 10:00 - 13:00.
    • The concept of hypothesis testing, type I and type II error, false discovery rate.
    • Significance and confidence level, p-value.
    • One-sided and two-sided tests and confidence intervals.
    • Sampling distribution, estimators, standard error.
    • Normal probabilities in application to p-value.
    • One-sample and two-sample tests for independent and matched samples with known variance.
    • The case of unknown variance and Student t-distribution, assumption of normality.
    • Pooled variance and equal variances assumption.
    • Estimation of variance.
    • Fisher test for variance equality.
    • Non-parametric tests: Sign test, Wilcoxon sum of ranks test (Mann-Whitney U-test), Wilcoxon signed rank test.
    • Chi-square test for goodness of fit, chi-square test for independence.
    • Sample size estimation.
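
As an illustration of sample size estimation, the sketch below compares the textbook normal-approximation formula for a two-sample, two-sided comparison of means with R's `power.t.test()`. The effect size, standard deviation, significance level and power are assumed values for the example.

```r
delta <- 1      # difference in means to detect (assumed)
sigma <- 1      # common standard deviation (assumed)
alpha <- 0.05   # two-sided significance level
power <- 0.80   # desired power

# Normal approximation: n per group = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2
n_normal <- 2 * (qnorm(1 - alpha / 2) + qnorm(power))^2 * sigma^2 / delta^2

# Calculation based on the t-distribution (slightly larger, since variance is estimated)
n_t <- power.t.test(delta = delta, sd = sigma, sig.level = alpha, power = power)$n

n_per_group <- ceiling(n_t)   # round up to a whole number of observations
```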


  • PRACTICUM. 14:00 - 17:00.
    • One- and two-sample tests with known and unknown variance, test for proportions, simulation involving confidence intervals and t-distribution.
    • Non-parametric tests.
    • Kolmogorov-Smirnov (KS) test.
    • Shapiro test for normality.
    • QQ-plot.
    • Data transformation.
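
A possible walk-through of these tests on simulated data. The log-normal samples and their parameters are assumptions for the example; the point is only to show which base R functions correspond to each topic.

```r
set.seed(7)
# Simulated right-skewed measurements (log-normal), purely illustrative
control <- rlnorm(30, meanlog = 0.0, sdlog = 0.5)
treated <- rlnorm(30, meanlog = 0.5, sdlog = 0.5)

# Shapiro test for normality, before and after a log transformation
shapiro_raw <- shapiro.test(control)
shapiro_log <- shapiro.test(log(control))

# QQ-plot coordinates; drop plot.it = FALSE to draw the plot, then add qqline()
qq <- qqnorm(log(control), plot.it = FALSE)

# Two-sample Kolmogorov-Smirnov test comparing the two distributions
ks_res <- ks.test(treated, control)

# Two-sample t-test with unknown, possibly unequal variances (Welch, R's default)
a <- log(treated)
b <- log(control)
t_res <- t.test(a, b)
# The same statistic computed by hand
t_manual <- (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))

# Non-parametric alternative: Wilcoxon rank-sum (Mann-Whitney U) test
w_res <- wilcox.test(treated, control)
```

Each test returns an object of class "htest"; its `p.value` and `statistic` components can be extracted for further use.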


MODULE III. Statistical modeling & Regression. June 9.

  • LECTURE. 10:00 - 13:00.
    • Simple linear regression model, residuals, degrees of freedom, least squares method, correlation coefficient, variance decomposition, determination coefficient.
    • Interpretation of the slope, correlation, and determination coefficients.
    • Standard error and statistical inference in simple linear regression model.
    • Analysis of variance (ANOVA). One-way and two-way ANOVA.
    • Beyond simple regression models: multiple regression, logistic regression.
    • Correction for multiple testing, family-wise error rate.
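
For the multiple-testing topic, base R's `p.adjust()` covers the common corrections. The p-values below are invented for illustration.

```r
p_values <- c(0.001, 0.01, 0.02, 0.04, 0.2)   # hypothetical raw p-values from 5 tests

# Family-wise error rate control
p_bonferroni <- p.adjust(p_values, method = "bonferroni")  # multiply by the number of tests, cap at 1
p_holm       <- p.adjust(p_values, method = "holm")        # step-down version, never less powerful

# False discovery rate control (Benjamini-Hochberg)
p_bh <- p.adjust(p_values, method = "BH")

comparison <- data.frame(raw = p_values, bonferroni = p_bonferroni,
                         holm = p_holm, BH = p_bh)
```

Printing `comparison` shows the three sets of adjusted values next to the raw ones, which makes the relative strictness of the methods easy to see.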


  • PRACTICUM. 14:00 - 17:00.
    • Problems on linear regression.
    • ANOVA.
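
A short sketch of what a regression and ANOVA exercise might look like. It uses the built-in `mtcars` data set as a stand-in; the choice of data set and variables is an assumption, not the course's own exercise.

```r
# Simple linear regression: fuel consumption (mpg) as a function of weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
slope <- coef(fit)[["wt"]]
r_squared <- summary(fit)$r.squared

# Least squares slope and determination coefficient computed by hand
slope_manual <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)
r_squared_manual <- cor(mtcars$wt, mtcars$mpg)^2

# One-way ANOVA: does mean mpg differ between cylinder groups?
aov_fit <- aov(mpg ~ factor(cyl), data = mtcars)
anova_p <- summary(aov_fit)[[1]][["Pr(>F)"]][1]
```

In simple linear regression the determination coefficient equals the squared correlation coefficient, which the two `r_squared` values confirm. `summary(fit)` additionally reports standard errors and t-tests for the coefficients.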



External Resources

Bioinformatics Core Facility @ CRG — 2011-2024