BIST Introduction to Statistics 2016
From Bioinformatics Core Wiki
Revision as of 08:56, 28 April 2016 by Jponomarenko (Talk | contribs)
BIST "Introduction to Biostatistics" Course
Online Resources
- Nature Web-collection "Statistics for Biologists": http://www.nature.com/collections/qghhqm
- Self-paced online UC Berkeley courses
- Online book recommended by the above courses http://www.stat.berkeley.edu/~stark/SticiGui/
- Upcoming (Feb 2, 2016) MIT course Introduction to Probability - The Science of Uncertainty. https://www.edx.org/course/introduction-probability-science-mitx-6-041x-1
- Self-paced online course "Explore Statistics with R" https://www.edx.org/course/explore-statistics-r-kix-kiexplorx-0
- "An Introduction to Statistical Learning with Applications in R" from Stanford http://www-bcf.usc.edu/~gareth/ISL/
- 100 Statistical Tests.pdf - ResearchGate - just search Google to get a link
- VIB "Basic statistics in R" course. Tutorial and links. https://www.bits.vib.be/index.php/training/180#download
Comparison of two samples
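- The t-test, paired or unpaired. In R, >t.test(x,y) for independent samples and >t.test(x,y, paired=TRUE) for paired samples. Student's t-test provides an exact test for the equality of the means of two normal populations with unknown, but equal, variances. The latter can be checked with the F-test, in R >var.test(x,y). Note that t.test defaults to the Welch test (var.equal=FALSE), which does not assume equal variances; pass var.equal=TRUE to get the classic Student's test. https://en.wikipedia.org/wiki/Student's_t-test#Paired_samples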
- Non-parametric tests. These do not assume that the data are normally distributed.
  - Independent samples. The Wilcoxon rank-sum test, aka the Mann-Whitney test. https://en.wikipedia.org/wiki/Mann–Whitney_U_test. In R, >wilcox.test(x,y). H0: the two samples come from the same distribution, i.e. a value drawn from one sample is equally likely to be larger or smaller than a value drawn from the other.
  - Paired samples. The Wilcoxon signed-rank test. In R, >wilcox.test(x,y, paired=TRUE). See http://vassarstats.net/textbook/ch12a.html.
- The Kolmogorov-Smirnov test. In R, >ks.test(x,y). https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test. The test is sensitive to any difference between the two distributions, so it can spot two samples that have the same mean but different variance and/or shape. For a pure shift in location it is generally less powerful than the Wilcoxon test. The statistic is the maximum absolute difference between the two sample cumulative distribution functions. See http://www.physics.csbsju.edu/stats/KS-test.html.
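The tests listed above can be tried side by side on simulated data. The following is a minimal sketch, not a recommended analysis: the sample sizes, means, standard deviations and all variable names are assumptions made for illustration. It also recomputes the Kolmogorov-Smirnov statistic by hand from the two empirical cumulative distribution functions.

```r
# Simulated data; sizes, means and SDs are illustrative assumptions.
set.seed(1)
x <- rnorm(30, mean = 10, sd = 2)   # sample 1
y <- rnorm(30, mean = 11, sd = 2)   # sample 2 (same length, so pairing is possible)

# F-test for equality of variances (assumption behind Student's t-test).
f_res <- var.test(x, y)

# Student's t-test (equal variances) and R's default Welch test.
t_student <- t.test(x, y, var.equal = TRUE)
t_welch   <- t.test(x, y)

# Paired t-test: x[i] and y[i] are treated as two measurements on the same unit.
t_paired <- t.test(x, y, paired = TRUE)

# Non-parametric alternatives.
w_ind    <- wilcox.test(x, y)                 # Wilcoxon rank-sum (Mann-Whitney)
w_paired <- wilcox.test(x, y, paired = TRUE)  # Wilcoxon signed-rank
ks_res   <- ks.test(x, y)                     # two-sample Kolmogorov-Smirnov

# KS statistic by hand: the largest absolute gap between the two
# empirical CDFs, evaluated at every observed value.
grid     <- sort(c(x, y))
d_manual <- max(abs(ecdf(x)(grid) - ecdf(y)(grid)))

# Collect the p-values for comparison.
p_values <- c(
  var.test     = f_res$p.value,
  t.student    = t_student$p.value,
  t.welch      = t_welch$p.value,
  t.paired     = t_paired$p.value,
  wilcox       = w_ind$p.value,
  wilcox.pair  = w_paired$p.value,
  ks           = ks_res$p.value
)
print(round(p_values, 4))
print(c(ks.test = unname(ks_res$statistic), by_hand = d_manual))
```

The paired calls are included only to show the syntax; the simulated samples here are independent, so in a real analysis the paired versions would apply only when each x[i] and y[i] come from the same subject or unit.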
Comparison of two microbiome samples
- New (2012) biostatistical methods for the analysis of microbiome data based on a fully parametric approach using all the data. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3527355/#pone.0052078-LaRosa1
- The authors argue that a fully parametric model for these data has an advantage over non-parametric alternatives such as bootstrapping and permutation testing: it retains more of the information contained in the data.
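- The R package "HMP" is available from CRAN. http://cran.r-project.org/web/packages/HMP/HMP.pdf. To install it: > install.packages("HMP"). The older route, > source("http://www.bioconductor.org/biocLite.R"); biocLite("HMP"), relies on the biocLite installer, which Bioconductor has since replaced with BiocManager.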
Online courses
- Ten rules for online learning. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002631
- Self-paced online course "Intro to Linux" https://www.edx.org/course/introduction-linux-linuxfoundationx-lfs101x-2
- Self-paced online course from Microsoft "Intro to R programming" https://www.edx.org/course/introduction-r-programming-microsoft-dat204x-0
- Self-paced online course "Introduction to Cloud Computing" https://www.edx.org/course/introduction-cloud-computing-ieeex-cloudintro-x-0
- Self-paced online course from Microsoft "Data Science and Machine Learning Essentials" https://www.edx.org/course/data-science-machine-learning-essentials-microsoft-dat203x-0
- Self-paced online courses in the series "Data Analysis for Life Sciences" from Harvard:
  - 1: Statistics and R https://www.edx.org/course/data-analysis-life-sciences-1-statistics-harvardx-ph525-1x
  - 2: Introduction to Linear Models and Matrix Algebra https://www.edx.org/course/data-analysis-life-sciences-2-harvardx-ph525-2x
  - 3: Statistical Inference and Modeling for High-throughput Experiments https://www.edx.org/course/data-analysis-life-sciences-3-harvardx-ph525-3x
  - 4: High-Dimensional Data Analysis https://www.edx.org/course/data-analysis-life-sciences-4-high-harvardx-ph525-4x
  - 5: Introduction to Bioconductor: Annotation and Analysis of Genomes and Genomic Assays https://www.edx.org/course/data-analysis-life-sciences-5-harvardx-ph525-5x
  - 6: High-performance Computing for Reproducible Genomics https://www.edx.org/course/data-analysis-life-sciences-6-high-harvardx-ph525-6x
  - 7: Case Studies in Functional Genomics https://www.edx.org/course/data-analysis-life-sciences-7-case-harvardx-ph525-7x
- Excellent self-paced course from MIT "Introduction to Biology - The Secret of Life" https://www.edx.org/course/introduction-biology-secret-life-mitx-7-00x-2