Data Analysis

Basics

Yangyong Ye

SOE, RUC

2020-12-14

Probability and distributions

Important Distributions

  • normal

  • Student's t

  • chi-squared (χ²)

  • F

Relation Test

estimation and inference

t test

  • one sample t test

  • two sample t test

  • paired sample t test
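
The three variants listed above can be sketched with `t.test` from the stats package. This is a minimal illustration, not part of the original slides: it assumes the built-in `sleep` data set, and the variable names `g1` and `g2` are chosen here for the example.

```r
# Built-in 'sleep' data: extra hours of sleep for 10 patients under two drugs
data(sleep)
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# One-sample t test: does the mean of g1 differ from 0?
one <- t.test(g1, mu = 0)

# Two-sample t test (Welch by default; var.equal = TRUE gives the pooled test)
two <- t.test(g1, g2)

# Paired-sample t test: the same patients were measured under both drugs
paired <- t.test(g1, g2, paired = TRUE)

print(c(one = one$p.value, two = two$p.value, paired = paired$p.value))
```

On these data the paired test is far more sensitive than the two-sample test, because pairing removes the between-patient variation.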

The t-distribution, also known as Student's t-distribution, gets its name from William Sealy Gosset, who first published it in English in 1908 in the journal Biometrika under the pseudonym "Student". His employer preferred that staff publish scientific papers under pen names rather than their real names.

ANOVA

  • one-way ANOVA
  • two-way ANOVA
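
A minimal sketch of both designs using `aov` from the stats package. The data sets (`PlantGrowth`, `ToothGrowth`) are built-in examples chosen here for illustration and are not taken from the slides.

```r
# One-way ANOVA: plant weight across three treatment groups
one_way <- aov(weight ~ group, data = PlantGrowth)
one_way_tab <- summary(one_way)[[1]]   # ANOVA table as a data frame

# Two-way ANOVA with interaction: tooth length by supplement and dose
# (dose is numeric in the raw data, so it is converted to a factor first)
tg <- transform(ToothGrowth, dose = factor(dose))
two_way <- aov(len ~ supp * dose, data = tg)
two_way_tab <- summary(two_way)[[1]]

# Post-hoc pairwise comparisons for the one-way model
posthoc <- TukeyHSD(one_way)

print(one_way_tab)
print(two_way_tab)
print(posthoc)
```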

Ronald Fisher introduced the term variance and proposed its formal analysis in a 1918 article, The Correlation Between Relatives on the Supposition of Mendelian Inheritance. His first application of the analysis of variance was published in 1921. Analysis of variance became widely known after being included in Fisher's 1925 book Statistical Methods for Research Workers.

χ² test

In 1900, Pearson published a paper on the χ² test, which is considered to be one of the foundations of modern statistics. In this paper, Pearson investigated a test of goodness of fit.
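
As a rough sketch, `chisq.test` from the stats package covers both the goodness-of-fit test mentioned above and the test of independence for a contingency table. The die counts below are made-up numbers for illustration only; the second example uses the built-in `HairEyeColor` table.

```r
# Goodness of fit: are 120 die rolls consistent with a fair die?
# (hypothetical counts, invented for this example)
observed <- c(18, 22, 20, 25, 15, 20)
gof <- chisq.test(observed, p = rep(1 / 6, 6))

# Test of independence: hair colour vs eye colour, summed over sex
hair_eye <- margin.table(HairEyeColor, c(1, 2))
indep <- chisq.test(hair_eye)

print(gof)
print(indep)
```

For the die example the statistic is the sum of (observed - expected)² / expected with 20 expected rolls per face, on 6 - 1 = 5 degrees of freedom.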

Interpreting correlations

An Interactive Visualization

The Pearson correlation coefficient was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s; its mathematical formula had already been derived and published by Auguste Bravais in 1844. The naming of the coefficient is thus an example of Stigler's Law.
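
A minimal sketch of computing and testing a Pearson correlation with `cor` and `cor.test` from the stats package. The choice of the built-in `mtcars` data (fuel economy against car weight) is an assumption made here for illustration.

```r
# Pearson correlation between fuel economy (mpg) and weight (wt)
r <- cor(mtcars$mpg, mtcars$wt)

# Same estimate plus a t-based test of H0: correlation = 0
# and a confidence interval for the correlation
ct <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")

print(r)
print(ct)
```

The sign gives the direction of the linear association and the magnitude its strength; a strong correlation by itself says nothing about causation.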

Central Limit Theorem

The central limit theorem has an interesting history. The first version of this theorem was postulated by the French-born mathematician Abraham de Moivre who, in a remarkable article published in 1733, used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This finding was far ahead of its time, and was nearly forgotten until the famous French mathematician Pierre-Simon Laplace rescued it from obscurity in his monumental work Théorie analytique des probabilités, which was published in 1812. Laplace expanded De Moivre’s finding by approximating the binomial distribution with the normal distribution. But as with De Moivre, Laplace’s finding received little attention in his own time. It was not until the nineteenth century was at an end that the importance of the central limit theorem was discerned, when, in 1901, Russian mathematician Aleksandr Lyapunov defined it in general terms and proved precisely how it worked mathematically. Nowadays, the central limit theorem is considered to be the unofficial sovereign of probability theory.
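
De Moivre's coin-tossing setting can be reproduced with a short simulation. This is only a sketch under assumed settings: the seed, the number of tosses, and the number of repetitions are arbitrary choices made for this example.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
n_tosses <- 1000     # tosses of a fair coin per experiment
n_reps   <- 10000    # number of repeated experiments

# Number of heads in each experiment
heads <- rbinom(n_reps, size = n_tosses, prob = 0.5)

# Mean and standard deviation of the binomial count
mu    <- n_tosses * 0.5
sigma <- sqrt(n_tosses * 0.5 * 0.5)

# Standardised counts should look approximately standard normal
z <- (heads - mu) / sigma
share_within <- mean(abs(z) <= 1.96)   # should be close to 0.95

print(c(mean = mean(heads), sd = sd(heads), within_1.96 = share_within))

# Optional visual check against the normal density:
# hist(z, breaks = 40, freq = FALSE); curve(dnorm(x), add = TRUE)
```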

Packages

  • stats: t.test, aov, chisq.test, cor, cor.test

  • sjPlot: sjt.xtab

  • rstatix: t_test, anova_test, tukey_hsd, chisq_test, cor_test