Inferential Statistics:

Testing Hypotheses and Determining Significance Using Variance

Chandler O'Neal
2 min read · Jun 22, 2021

What is inferential statistics?

  • Inferential statistics is used to “draw conclusions about the characteristics of a population from the characteristics of a sample” and to determine how reliable those conclusions are.
  • Given those conclusions, the sample size and the distribution of the sample can then be used to estimate the probability of an event occurring.

What it measures:

  • Central tendency, variability, distribution, and relationships between characteristics within a data sample.

Further uses:

  • To make predictions.
  • To make generalizations about large groups.
  • To relate variables in a data set to one another.

Conducting an Inferential Statistics Test:

1. To begin, determine the type of data in the data set:

2. Formulate the difference between the Null and Alternative Hypothesis:

3. Identify potential errors and decide which is preferable to commit.

Every prediction leaves room for error. A Type I error (false positive) occurs when the data appear to support the alternative hypothesis (the new theory) even though the effect does not actually exist. A Type II error (false negative) occurs when a real effect goes undetected.
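To make the two error types concrete, here is a minimal simulation sketch using only the Python standard library. It is an illustration under assumptions that are not in the original notes: the data are normal with a known standard deviation of 1, the test is a two-sided z-test of "mean = 0", and the function name `rejection_rate` and all the numbers are hypothetical. A rejection when the true mean really is 0 is a Type I error (false positive); a failure to reject when the true mean is not 0 is a Type II error (false negative).

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n=30, alpha=0.05, trials=5000, seed=0):
    """Fraction of simulated experiments that reject H0: mean = 0.

    Assumes normal data with known standard deviation 1 (illustrative only).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, used for the z-test p-value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # z statistic when sigma = 1
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p_value < alpha:
            rejections += 1
    return rejections / trials


# H0 is true here, so every rejection is a false positive (Type I error).
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false here, so every non-rejection is a miss (Type II error).
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The Type I rate should land close to the chosen `alpha`, while the Type II rate depends on the sample size and on how large the true effect is. Lowering `alpha` reduces false positives but makes misses more likely, which is why it helps to decide up front which error is less costly.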

Type I Error preferred when:

  • Testing for a disease: a doctor and patient might prefer to give/receive a false positive test result, since unnecessary follow-up or treatment would harm fewer people than missing a rare disease altogether.

Type II Error preferred when:

  • will add later…

4. Quantify our uncertainty using Significance Tests:

Tests for statistical significance “tell us what the probability is that the relationship we think we have found is due only to random chance.”

Potential Significance tests to use:

  • T-tests
  • P-value
  • Hypothesis Test
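As a rough illustration of what "due only to random chance" means, the sketch below estimates a p-value with a permutation test. Note that this is a swapped-in technique, not one of the tests listed above: it was chosen because it needs nothing beyond the Python standard library. The function name `permutation_p_value` and the two small data sets are made up for the example.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the labels simulates "no real difference between groups".
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements, invented for illustration.
control = [5.1, 4.9, 5.6, 5.0, 5.3, 4.8, 5.2, 5.4]
treated = [5.9, 6.1, 5.7, 6.3, 5.8, 6.0, 6.2, 5.5]

p_value = permutation_p_value(control, treated)
print(f"p-value: {p_value:.4f}")
```

A small p-value says that a difference this large rarely shows up when the group labels are assigned at random, which is evidence against the null hypothesis.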

5. If a relationship does exist, determine the effect:

Linear regression (simple or multiple):
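For the simple (one predictor) case, the effect can be read off the slope of the fitted line. Below is a minimal least-squares sketch in plain Python; the function name `simple_linear_regression` and the data are hypothetical, and in practice a library such as statsmodels or scikit-learn would more likely be used.

```python
from statistics import fmean


def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data that lies exactly on the line y = 1 + 2x.
hours = [1, 2, 3, 4]
score = [3, 5, 7, 9]

slope, intercept = simple_linear_regression(hours, score)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The slope estimates how much the outcome changes for each one-unit increase in the predictor.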

Cohen’s d is an effect size used to indicate the standardized difference between two means. It can be used, for example, to accompany the reporting of t-test results.

  • Measures the Effect
  • Cohen’s d = (M1 − M2) / s_pooled
  • M1 = mean of group 1
  • M2 = mean of group 2
  • s_pooled = pooled standard deviation of the two groups. The formula is: √[(s1² + s2²) / 2]
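The formula above translates almost line for line into Python. This is a minimal sketch using the standard library; the function name `cohens_d` and the sample data are made up for illustration, and it uses the simple pooled standard deviation given above, which weights both groups equally.

```python
from statistics import fmean, stdev


def cohens_d(group_1, group_2):
    """Cohen's d using s_pooled = sqrt((s1^2 + s2^2) / 2)."""
    m1, m2 = fmean(group_1), fmean(group_2)
    s1, s2 = stdev(group_1), stdev(group_2)  # sample standard deviations
    s_pooled = ((s1 ** 2 + s2 ** 2) / 2) ** 0.5
    return (m1 - m2) / s_pooled


# Hypothetical groups, invented for illustration.
group_1 = [2, 4, 6]
group_2 = [1, 3, 5]

d = cohens_d(group_1, group_2)
print(f"Cohen's d: {d:.2f}")
```

Here the means differ by 1 and each group has a standard deviation of 2, so the standardized difference is 0.5.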

To Be Continued…

Chandler O'Neal

Currently attending the data science bootcamp with Flatiron. My goal on Medium is to better equip myself and others with simplified explanations of material.