diff --git a/examples/mann_whitney.ipynb b/examples/mann_whitney.ipynb new file mode 100644 index 000000000..f4993bb9d --- /dev/null +++ b/examples/mann_whitney.ipynb @@ -0,0 +1,407 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Mann-Whitney U test\n", + "\n", + "Allen Downey\n", + "\n", + "[MIT License](https://en.wikipedia.org/wiki/MIT_License)" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import numpy as np\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "sns.set(style='white')\n", + "\n", + "np.random.seed(18)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "[I saw a tweet recently](https://twitter.com/MaartenvSmeden/status/1100382052367691776) that made me realize that many people are confused about the [Mann-Whitney U test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test). A common misconception is that it tests for a difference in medians. Apparently that is true under a specific assumption, but not true in general.\n", + "\n", + "I have never been confused about the Mann-Whitney U test, because I believe \"There is Only One Test\". As I explain in [this article](https://allendowney.blogspot.com/2016/06/there-is-still-only-one-test.html), all hypothesis tests fit into a general framework; the only differences are:\n", + "\n", + "1. The test statistic, and\n", + "2. 
A model of the null hypothesis.\n", + "\n", + "If you have these two pieces, it is easy to compute the sampling distribution of the test statistic under the null hypothesis, and from that you can compute a p-value.\n", + "\n", + "To demonstrate, I'll use this framework to implement a version of the Mann-Whitney U test." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Example data\n", + "\n", + "First I'll generate samples from two gamma distributions with slightly different parameters." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "n1 = 90\n", + "n2 = 110" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2.62509416921364, 1.257132162349856)" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "group1 = np.random.gamma(5, 0.5, size=n1)\n", + "group1.mean(), group1.std()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2.4520533570340324, 1.0190315744898875)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "group2 = np.random.gamma(4.9, 0.48, size=n2)\n", + "group2.mean(), group2.std()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's what the distributions look like for the two groups." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "sns.kdeplot(group1, label='Group 1')\n", + "sns.kdeplot(group2, label='Group 2')\n", + "\n", + "plt.xlabel('x')\n", + "plt.ylabel('PDF')\n", + "plt.title('Estimated PDFs for the two groups');" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### The test statistic\n", + "\n", + "The test statistic for the Mann-Whitney U test is the probability of superiority; that is, if I choose a random observation from each group, what is the probability that the observation from Group 1 exceeds the observation from Group 2.\n", + "\n", + "The following function takes two arrays representing the observed samples, and computes the probablity of superiority for Group 1 by comparing all pairs. This is not the most efficient implementation, but it is easy to compute using one of NumPy's `outer` methods (one of my favorite under-used features).\n", + "\n", + "In this example, because I sampled from a continuous distribution, the probability of a tie is small so I will ignore it." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "def test_stat(data):\n", + " \"\"\"Compute probability of superiority\n", + " \n", + " data: tuple of arrays\n", + " returns: float probability\n", + " \"\"\"\n", + " group1, group2 = data\n", + " array = np.greater.outer(group1, group2)\n", + " return np.mean(array)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's the test statistic for the observed data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.5253535353535354" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data = group1, group2\n", + "actual = test_stat(data)\n", + "actual" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The test statistic is far enough from 0.5 to be suspicious, but it's not clear whether I would expect that to happen by chance. That's the question we'll answer, but first we need a null hypothesis." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### The null hypothesis\n", + "\n", + "There are several ways to model the null hypothesis. A simple choice is to assume that both samples were drawn from the same distribution.\n", + "\n", + "We can model that hypothesis by [permutation](https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests); that is, we can combine the observations into a pool and then shuffle the pool.\n", + "\n", + "The following function implements this model. Each time it runs, it shuffles the pool, then splits it into two groups with the same sample sizes as the observed data." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "n1 = len(group1)\n", + "pool = np.hstack(data)\n", + " \n", + "def run_model(): \n", + " np.random.shuffle(pool)\n", + " return pool[:n1], pool[n1:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's an example where we generate one batch of permuted data and compute the test statistic." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "fake = run_model()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.45494949494949494" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test_stat(fake)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### The sampling distribution\n", + "\n", + "To estimate the sampling distribution, we can run `run_model` and `test_stat` 1000 times and save the results. " + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "def sampling_dist(iters=1000):\n", + " \"\"\"Samples the distribution of the test statistic under the null hypothesis.\n", + "\n", + " iters: number of iterations\n", + "\n", + " returns: array\n", + " \"\"\"\n", + " return [test_stat(run_model()) for _ in range(iters)]" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "sample = sampling_dist()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's what the distribution of the test statistic looks like under the null hypothesis. The gray line shows the observed test statistic." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.axvline(actual, color='0.8', linewidth=3)\n", + "sns.kdeplot(sample, color='C0', linewidth=3, alpha=0.8)\n", + "\n", + "plt.xlabel('Test statistic')\n", + "plt.ylabel('PDF')\n", + "plt.title('Distribution of the test statistic under the null hypothesis');" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### The p-value\n", + "\n", + "Finally, we can compute the p-value, which is the probability that the test statistic exceeds the observed value." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.25" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.mean(sample > actual)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, the p-value is fairly large, which means that the observed value (or higher) could plausibly occur even if the samples were drawn from the same distribution.\n", + "\n", + "In other words, with these sample sizes, this test is not able to rule out the possibility that the observed difference between the groups is due to chance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}