Since Markdown embeds do not work with nbviewer, this article must be downloaded and viewed locally for the images to appear!


Friedman's Rank Test

Date: January 26th, 2016

An exploration of Friedman's Test based on an example from a Stats II lecture by Dwayne Schindler.


Choosing the Cheapest Grocery Store

For the uninitiated:

"The Friedman test is the non-parametric alternative to the one-way ANOVA with repeated measures. It is used to test for differences between groups when the dependent variable being measured is ordinal. It can also be used for continuous data that has violated the assumptions necessary to run the one-way ANOVA with repeated measures (e.g., data that has marked deviations from normality)." – Laerd Statistics

Say there are three grocery stores close by and you want to figure out which store has the best prices.

You create the above spreadsheet and go out to collect the prices for your eight products of interest at each store. For example, eggs sell for $1.12 at store 1, $1.10 at store 2, and $1.05 at store 3. With the data collected and entered, Friedman's test is run. Technically, this is a questionable choice: the prices are interval data (not ordinal), and ranking them throws away the size of the differences. Let's go ahead anyway.

Basically the Friedman test works by comparing ranked scores across columns of data. Ranks are assigned within each row, across ALL the groups, so Store 3's price on lettuce is ranked #1 compared with all other stores (i.e. Store 3 is the place to go for lettuce). Tied prices share the average of the ranks they occupy, which is why a rank sum can be fractional. This procedure is done for every product and the ranks in each column are summed (e.g. Store 1's summed rank score is 19.5). We then put these rank sums into a magic formula that measures how far they are from the equal sums we would expect if no store were consistently cheaper.
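The ranking step can be sketched in a few lines of Python. This is only an illustration: the eggs row uses the prices quoted above, while the lettuce and coffee rows are made-up values, and `rank_row` is a hypothetical helper name.

```python
def rank_row(values):
    """Rank one row (1 = cheapest); tied values share the average rank."""
    ordered = sorted(values)
    # Average of the 1-based positions a value occupies in the sorted row.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

# Prices in dollars for Stores 1, 2, 3.
# Only the eggs row comes from the article; the other rows are hypothetical.
prices = {
    "eggs":    [1.12, 1.10, 1.05],
    "lettuce": [0.99, 0.99, 0.89],  # Stores 1 and 2 tie
    "coffee":  [4.39, 4.25, 4.30],
}

# Rank within each product, then sum the ranks down each store's column.
ranks = {item: rank_row(row) for item, row in prices.items()}
rank_sums = [sum(col) for col in zip(*ranks.values())]

for item, row in ranks.items():
    print(item, row)
print("rank sums:", rank_sums)
```

The tie on lettuce gives both stores a rank of 2.5, which is how a fractional rank sum arises.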

The magic formula:
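A sketch of the usual textbook form, assuming n is the number of products (rows), k is the number of stores (columns), and R_j is the sum of ranks in store j's column (these symbol names are assumptions, not taken from the original):

$$
\chi^2_r = \frac{12}{n\,k\,(k+1)} \sum_{j=1}^{k} R_j^2 \;-\; 3\,n\,(k+1)
$$

This form omits the correction for ties that statistical packages typically apply, so it may differ slightly from software output when tied prices are present.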

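The magic formula outputs a chi-squared statistic, which is compared against the chi-squared distribution with k - 1 degrees of freedom (k being the number of stores) to produce a p-value.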
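As a rough sketch of that calculation for three stores, the code below uses the standard textbook statistic without a tie correction. Only the rank sum of 19.5 comes from the article. The other two rank sums are made up so that the three total 48, as they must for eight products and three stores. The function names are hypothetical. The p-value shortcut only holds for 2 degrees of freedom, where the chi-squared upper tail is exactly exp(-x/2).

```python
import math

def friedman_statistic(rank_sums, n_rows):
    """Friedman chi-squared from per-column rank sums (no tie correction)."""
    k = len(rank_sums)
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n_rows * k * (k + 1)) * sum_sq - 3.0 * n_rows * (k + 1)

def p_value_three_groups(statistic):
    """Chi-squared upper tail for 2 degrees of freedom (valid for k = 3 only)."""
    return math.exp(-statistic / 2.0)

# 19.5 is Store 1's rank sum from the article; 14.5 and 14.0 are hypothetical.
rank_sums = [19.5, 14.5, 14.0]
stat = friedman_statistic(rank_sums, n_rows=8)
p = p_value_three_groups(stat)
print(f"chi-squared = {stat:.4f}, p = {p:.4f}")
```

With more than three stores, a general chi-squared survival function (for example `scipy.stats.chi2.sf`) would be needed in place of the shortcut.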

spss1.png

spss2.png

Null hypothesis: There is no difference in the overall price charged across the three stores

Alternative hypothesis: There is a difference in the overall price charged across the three stores

In this case, we would fail to reject the null hypothesis (p > 0.05). The data provide no evidence that the stores differ in overall pricing.

Experimentation

Q: What if Store 4 sold coffee for 1000 cents and everything else for 10 cents?

A: Store 4 would still be significantly different.

The magnitude of a difference does not matter. All that is compared is the rank each store receives for each product.

Store 4 (everything at 10 cents):

2.png

Store 4 (everything at 10 cents except coffee which costs 1000 cents):

3.png

Store 4 (everything at 10 cents except coffee which costs 440 cents, 1 cent higher than the second most expensive store):

4.png

The 1000-cent and 440-cent results are the same, since Store 4's coffee is ranked most expensive in both cases.
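The same behaviour can be reproduced with a small sketch. All prices below are hypothetical (in cents, with only four products), and the statistic is the textbook form without a tie correction, so the numbers will not match the SPSS output. The point is only that 1000 cents and 440 cents give an identical statistic.

```python
def rank_row(values):
    """Rank one row (1 = cheapest); tied values share the average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def friedman_statistic(rows):
    """Friedman chi-squared for rows (products) by columns (stores).

    No tie correction is applied.
    """
    n, k = len(rows), len(rows[0])
    ranked = [rank_row(row) for row in rows]
    rank_sums = [sum(col) for col in zip(*ranked)]
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)

# Hypothetical prices in cents for Stores 1-3; the last row is coffee.
base = [
    [112, 110, 105],
    [99, 105, 89],
    [250, 245, 260],
    [439, 425, 430],
]

def with_store4(coffee_price):
    """Append a Store 4 that charges 10 cents for everything except coffee."""
    rows = [row + [10] for row in base[:-1]]
    rows.append(base[-1] + [coffee_price])
    return rows

stat_1000 = friedman_statistic(with_store4(1000))
stat_440 = friedman_statistic(with_store4(440))
print(stat_1000, stat_440)
```

Both calls rank Store 4's coffee last and everything else first, so the rank sums, and therefore the statistic, cannot differ.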

Why does this matter? Since the Friedman test cannot detect the magnitude of the price differences, there could be situations where a store is not considered significantly cheaper despite offering massive discounts.

For example, suppose Store 2 cut the price of every product it was already cheapest on to 10 cents.

Regular Store 2:

5.png

Cheap Store 2:

6.png

They are the same.
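A minimal sketch of why the discount is invisible to the test, using hypothetical prices in cents and hypothetical helper names: cutting a price that is already the unique lowest in its row leaves every rank in that row unchanged.

```python
def rank_row(values):
    """Rank one row (1 = cheapest); tied values share the average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

# Hypothetical prices in cents; columns are Stores 1, 2, 3.
regular = [
    [112, 110, 105],
    [99, 95, 102],
    [250, 245, 260],
    [439, 425, 430],
]

def discount_store2(rows, sale_price=10):
    """Set Store 2's price to sale_price wherever Store 2 is already cheapest."""
    discounted = []
    for row in rows:
        new_row = list(row)
        if row[1] == min(row):
            new_row[1] = sale_price
        discounted.append(new_row)
    return discounted

cheap = discount_store2(regular)
regular_ranks = [rank_row(row) for row in regular]
cheap_ranks = [rank_row(row) for row in cheap]
print(regular_ranks == cheap_ranks)
```

This sketch assumes Store 2 is never tied for the lowest price. If it were, the discount would break the tie and shift the ranks slightly.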

I'm guessing this occurs because the Friedman test treats its input as ordinal data (where all you can be sure of is less than or greater than). Since this example uses interval data, and the size of the price differences is exactly what a shopper cares about, the Friedman test does not seem appropriate.