The Kruskal-Wallis Test

Tilak Zade
Dec 9, 2020


In this blog, we will see what the Kruskal-Wallis test is, the assumptions it makes, the steps involved in performing it, a numerical example of how to perform the test, and some practical examples.

Introduction

The Kruskal-Wallis test is the non-parametric alternative to the one-way ANOVA. Non-parametric means that the test doesn’t assume your data come from a particular distribution. The test, also known as the Kruskal-Wallis H test, is used when the assumptions for ANOVA aren’t met (like the assumption of normality). It is sometimes called the one-way ANOVA on ranks, because it uses the ranks of the data values rather than the actual data points. The test determines whether the medians of two or more groups are different; strictly speaking, it compares the groups’ distributions, and it can be read as a test of medians when those distributions have the same shape. As with most statistical tests, you calculate a test statistic, called the H statistic, and compare it to a critical value from a reference distribution (here, the chi-square distribution). The hypotheses for the test are:

H0: the population medians are all equal.

H1: not all population medians are equal (at least one differs).
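If you work in Python, SciPy’s `scipy.stats.kruskal` function runs the whole test in one call. The sketch below assumes SciPy is installed and uses made-up exam scores for three anxiety levels purely for illustration. It compares the p-value with α = 0.05, which leads to the same decision as comparing H with the critical chi-square value.

```python
from scipy.stats import kruskal

# Hypothetical exam scores (%) for three test-anxiety levels; illustrative only.
no_anxiety = [82, 75, 90, 68, 88]
low_medium_anxiety = [70, 65, 78, 72, 60]
high_anxiety = [55, 62, 48, 66, 58]

alpha = 0.05  # significance level
result = kruskal(no_anxiety, low_medium_anxiety, high_anxiety)

print(f"H = {result.statistic:.3f}, p = {result.pvalue:.4f}")
if result.pvalue < alpha:
    print("Reject H0: not all population medians are equal.")
else:
    print("Fail to reject H0.")
```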

Fig 1. A glimpse of “The Kruskal-Wallis Test”

The Kruskal-Wallis test will tell you if there is a significant difference between groups. However, it won’t tell you which groups are different. For that, you’ll need to run a post hoc test. Both the Kruskal-Wallis test and the one-way ANOVA assess for significant differences on a continuous dependent variable by a categorical independent variable (with two or more groups). The ANOVA assumes that the dependent variable is normally distributed and that the variance of the scores is approximately equal across groups. The Kruskal-Wallis test does not require either of these assumptions, and because it works on ranks, it can be used for both continuous and ordinal-level dependent variables. However, like most non-parametric tests, the Kruskal-Wallis test is less powerful than the ANOVA when the ANOVA’s assumptions are actually met.
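One common choice of post hoc test after a significant Kruskal-Wallis result is Dunn’s test, which compares the mean ranks of each pair of groups. Below is a minimal pure-Python sketch, assuming a Bonferroni adjustment for multiple comparisons and the usual tie-adjusted standard error. The helper names `average_ranks` and `dunn_test` and the data are hypothetical, and dedicated statistics packages may use different defaults.

```python
from itertools import combinations
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks of values; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dunn_test(groups, labels):
    """Hypothetical helper: pairwise Dunn's z-tests with Bonferroni-adjusted p-values."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Mean rank of each group within the pooled ranking
    mean_ranks, start = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[start:start + len(g)]) / len(g))
        start += len(g)

    # Tie adjustment: sum of (t^3 - t) over each set of t tied values
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values()) / (12 * (n - 1))

    pairs = list(combinations(range(len(groups)), 2))
    adjusted = {}
    for a, b in pairs:
        se = ((n * (n + 1) / 12 - tie_term)
              * (1 / len(groups[a]) + 1 / len(groups[b]))) ** 0.5
        z = (mean_ranks[a] - mean_ranks[b]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
        adjusted[(labels[a], labels[b])] = min(1.0, p * len(pairs))  # Bonferroni
    return adjusted


# Hypothetical data for three groups
groups = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
for pair, p_adj in dunn_test(groups, ["A", "B", "C"]).items():
    print(pair, round(p_adj, 4))
```

With these numbers only the A vs C comparison falls below 0.05, which illustrates how an overall significant result can come from just one pair of groups.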

Assumptions

The Kruskal-Wallis test makes the following assumptions:

· One independent variable with two or more levels (independent groups). The test is more commonly used when you have three or more levels; for two levels, consider using the Mann-Whitney U test instead.

· A dependent variable measured on an ordinal, interval, or ratio scale.

· Your observations should be independent. In other words, there should be no relationship between the members in each group or between groups.

· All groups should have distributions of the same shape; this is what allows a significant result to be interpreted as a difference in medians. Statistical software such as SPSS or Minitab can help you check this condition.
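Why is the Mann-Whitney U test the natural choice for two levels? With exactly two groups, the Kruskal-Wallis H statistic equals the square of the Mann-Whitney normal-approximation z statistic, so the two tests give the same p-value when the Mann-Whitney test is run without a continuity correction. A sketch with SciPy, assuming version 1.7 or later (for the `method` argument) and hypothetical data:

```python
from scipy.stats import kruskal, mannwhitneyu

# Two hypothetical independent groups (illustrative data only)
group_1 = [12, 15, 11, 18, 14, 16]
group_2 = [20, 17, 22, 19, 25, 21]

kw = kruskal(group_1, group_2)

# Normal-approximation Mann-Whitney U test without a continuity correction
mw = mannwhitneyu(group_1, group_2, alternative="two-sided",
                  method="asymptotic", use_continuity=False)

print(f"Kruskal-Wallis p = {kw.pvalue:.6f}")
print(f"Mann-Whitney   p = {mw.pvalue:.6f}")
```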

Methodology

Let’s walk through the Kruskal-Wallis test step by step.

Step 1: Sort the data for all groups/samples into ascending order in one combined set.

Step 2: Assign ranks to the sorted data points. Give tied values the average rank.

Step 3: Add up the ranks for each group/sample.
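Steps 1 to 3 are easy to express in code. The sketch below is plain Python with no dependencies. `average_ranks` and `rank_sums` are hypothetical helper names, and tied values receive the average of the positions they occupy, as Step 2 describes.

```python
def average_ranks(values):
    """Step 2: 1-based ranks of values; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # Step 1: sort ascending
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def rank_sums(groups):
    """Steps 1-3: pool all groups, rank the pooled data, and sum the ranks per group."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return sums


# Hypothetical samples; the two 2s are tied and share rank 2.5
print(rank_sums([[1, 2, 2], [3, 4]]))  # → [6.0, 9.0]
```

A useful sanity check: the rank sums always add up to n(n + 1)/2, because together they cover every rank from 1 to n.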

Step 4: Calculate the H statistic:

The formula for the H statistic is:

$$H = \frac{12}{n(n+1)} \sum_{j=1}^{c} \frac{T_j^2}{n_j} - 3(n+1)$$

Where:

n = sum of sample sizes for all samples,

c = number of samples,

Tj = sum of ranks in the jth sample,

nj = size of the jth sample.
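As a check on the formula, here is a direct Python translation using the symbols defined above. It is a sketch that assumes you already have the rank sums T_j and sample sizes n_j (for instance from Steps 1 to 3); `h_statistic` is a hypothetical helper name. It applies the formula exactly as written, without the tie correction that statistical packages usually apply when there are tied ranks. For the example at the bottom, n = 15, so H = 12/(15 × 16) × (15²/5 + 40²/5 + 65²/5) - 3 × 16 = 0.05 × 1210 - 48 = 12.5.

```python
def h_statistic(rank_sums, sizes):
    """H = 12 / (n(n+1)) * sum_j(T_j^2 / n_j) - 3(n+1), with no tie correction.

    rank_sums: T_j for each sample; sizes: n_j for each sample.
    """
    if len(rank_sums) != len(sizes):
        raise ValueError("need one rank sum per sample")
    n = sum(sizes)  # total number of observations across all c samples
    weighted = sum(t * t / n_j for t, n_j in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical case: three samples of 5 whose values do not overlap,
# so the rank sums are 1+...+5, 6+...+10 and 11+...+15.
print(h_statistic([15, 40, 65], [5, 5, 5]))
```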

Step 5: Find the critical chi-square value with c - 1 degrees of freedom at your chosen significance level (commonly α = 0.05).
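Instead of reading the critical value from a printed chi-square table, you can compute it. If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, df)` does this in one call. The dependency-free sketch below computes the chi-square CDF from the series for the regularized incomplete gamma function and inverts it by bisection; `chi2_cdf` and `chi2_critical` are hypothetical helper names.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF, computed as the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = total = 1 / a  # k = 0 term of the series sum_k z^k / (a (a+1) ... (a+k))
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_critical(df, alpha=0.05):
    """Value x with CDF(x) = 1 - alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# c = 3 samples gives c - 1 = 2 degrees of freedom
print(round(chi2_critical(2), 3))  # → 5.991
```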

Step 6: Compare the H value from Step 4 to the critical chi-square value from Step 5.

Step 7: If H is greater than the critical value, reject the null hypothesis; otherwise, fail to reject it. Then state your conclusion in terms of the original question.
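Putting Steps 1 to 7 together, a compact sketch with SciPy (assumed installed) might look like the following. `scipy.stats.kruskal` handles the ranking and the H statistic, applying a tie correction when there are tied values, and `scipy.stats.chi2.ppf` supplies the critical value. The data are hypothetical.

```python
from scipy import stats

# Hypothetical scores for three groups, used only for illustration
group_a = [1, 2, 3, 4, 5]
group_b = [6, 7, 8, 9, 10]
group_c = [11, 12, 13, 14, 15]
groups = [group_a, group_b, group_c]
alpha = 0.05

# Steps 1-4: kruskal ranks the pooled data and computes H
h_stat, p_value = stats.kruskal(*groups)

# Step 5: critical chi-square value with c - 1 degrees of freedom
df = len(groups) - 1
critical = stats.chi2.ppf(1 - alpha, df)

# Steps 6-7: compare H with the critical value and decide
if h_stat > critical:
    decision = "reject H0"
else:
    decision = "fail to reject H0"

print(f"H = {h_stat:.3f}, critical = {critical:.3f}, p = {p_value:.4f}: {decision}")
```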

Practical Examples

1. You want to find out how test anxiety affects actual test scores. The independent variable “test anxiety” has three levels: no anxiety, low-medium anxiety, and high anxiety. The dependent variable is the exam score, rated from 0 to 100%.

2. You want to find out how socioeconomic status affects attitude towards sales tax increases. Your independent variable is “socioeconomic status” with three levels: working class, middle class, and wealthy. The dependent variable is measured on a 5-point Likert scale from strongly agree to strongly disagree.

3. A medical researcher has heard anecdotal evidence that certain anti-depressive drugs, when administered in doses lower than those prescribed for depression, can have the positive side effect of lowering neurological pain in individuals with chronic, neurological back pain. The researcher would like to investigate this anecdotal evidence with a study. The researcher identifies 3 well-known anti-depressive drugs that might have this positive side effect and labels them Drug A, Drug B, and Drug C. The researcher then recruits a group of 60 individuals with a similar level of back pain, randomly assigns them to one of three treatment groups (Drug A, Drug B, or Drug C), and prescribes the relevant drug for a 4-week period. At the end of the 4-week period, the researcher asks the participants to rate their back pain on a scale of 1 to 10, with 10 indicating the greatest level of pain. To compare the levels of pain experienced by the different groups at the end of the treatment period, the researcher runs a Kruskal-Wallis H test on this ordinal dependent measure (Pain_Score) across the three drug treatments (the independent variable is the type of drug, which has more than two groups).
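Ordinal measures such as a 1-to-10 pain score usually contain many tied values. A common refinement, and the one SciPy’s `kruskal` applies, divides H by the tie-correction factor 1 - Σ(t³ - t)/(n³ - n), where t is the number of observations sharing a tied value. The sketch below uses a few invented pain ratings (not data from the study described above) and a hypothetical helper `kruskal_wallis_h`.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks of values; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups, tie_correction=True):
    """H statistic for a list of samples, optionally divided by the tie-correction factor."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        t_j = sum(ranks[start:start + len(g)])  # rank sum T_j of this sample
        weighted += t_j ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    if tie_correction:
        # Counter gives the size t of every set of tied values (t = 1 adds nothing)
        ties = sum(t ** 3 - t for t in Counter(pooled).values())
        h /= 1 - ties / (n ** 3 - n)  # division by zero if every value is identical
    return h


# Hypothetical pain ratings (1 = least, 10 = most pain) for three drug groups
drug_a = [3, 4, 4, 5]
drug_b = [5, 6, 6, 7]
drug_c = [7, 8, 8, 9]
groups = [drug_a, drug_b, drug_c]
print(f"H without tie correction: {kruskal_wallis_h(groups, tie_correction=False):.4f}")
print(f"H with tie correction:    {kruskal_wallis_h(groups):.4f}")
```

Because the correction factor is at most 1, the corrected H is never smaller than the uncorrected one, so ignoring ties gives a slightly conservative result.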

Conclusion

The Kruskal-Wallis test is a non-parametric test to check whether there is a significant difference between two or more populations.

In this blog, we saw the assumptions involved in the Kruskal-Wallis test, the methodology behind the test, how to solve a problem using the test, and some of its practical examples.

