
Wilcoxon Signed-Rank Test for R and Python

In IR and many other research domains, we routinely use statistical tests to evaluate whether a newly proposed model brings a significant improvement over the baselines. I will not judge here whether this is good practice. I simply introduce how to conduct such a test using R and Python.
————————————————–
http://www.r-tutor.com/elementary-statistics/non-parametric-methods/wilcoxon-signed-rank-test

Wilcoxon Signed-Rank Test

Two data samples are matched if they come from repeated observations of the same subject. Using the Wilcoxon Signed-Rank Test, we can decide whether the corresponding data population distributions are identical without assuming them to follow the normal distribution.
Example
In the built-in data set named immer, the barley yields of the same fields in 1931 and 1932 are recorded. The yield data are presented in the data frame columns Y1 and Y2.
> library(MASS) # load the MASS package
> head(immer)
Loc Var Y1 Y2
1 UF M 81.0 80.7
2 UF S 105.4 82.3
…..
Problem
Without assuming the data to have a normal distribution, test at the .05 significance level whether the barley yields of 1931 and 1932 in data set immer have identical data distributions.
Solution
The null hypothesis is that the barley yields of the two sample years are identical populations. To test the hypothesis, we apply the wilcox.test function to compare the matched samples. For the paired test, we set the “paired” argument to TRUE. As the p-value turns out to be 0.005318, which is less than the .05 significance level, we reject the null hypothesis.
> wilcox.test(immer$Y1, immer$Y2, paired=TRUE)
Wilcoxon signed rank test with continuity correction
data: immer$Y1 and immer$Y2
V = 368.5, p-value = 0.005318
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(immer$Y1, immer$Y2, paired = TRUE) :
cannot compute exact p-value with ties
Answer
At the .05 significance level, we conclude that the barley yields of 1931 and 1932 from the data set immer are nonidentical populations.
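The V value in the R output is the sum of the ranks of the positive paired differences, after zero differences are dropped and tied absolute differences receive their average rank. A minimal pure-Python sketch of that computation follows. The function name signed_rank_statistic and the sample numbers are made up for illustration (they are not the immer data), and the sketch computes only the statistic, not the p-value.

```python
def signed_rank_statistic(x, y):
    """Return (V, W_minus, n) for paired samples x and y.

    V is the rank sum of the positive differences x - y,
    W_minus the rank sum of the negative ones, and n the
    number of non-zero differences.
    """
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)

    # Indices of the differences, ordered by absolute value.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))

    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied absolute differences.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # Average of the 1-based ranks i+1 .. j+1.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    v = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return v, w_minus, n


# Hypothetical paired measurements, made up for illustration.
x = [10, 12, 9, 15, 11, 14]
y = [8, 13, 9, 11, 8, 16]
v, w_minus, n = signed_rank_statistic(x, y)
print("V =", v, "W- =", w_minus, "n =", n)
```

The two rank sums always add up to n(n+1)/2, which is a quick sanity check on any implementation.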
————————————-

Wilcoxon Signed-Rank Test with Python

Running the Wilcoxon signed-rank test is even easier in Python.
It takes just the following lines:
import scipy.stats as stat
# diffs: the paired differences, e.g. new-model score minus baseline score per query
wvalue = stat.wilcoxon(diffs)
print("wilcoxon value:", wvalue)
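To make the snippet concrete, here is a small self-contained sketch of a paired comparison between a baseline and a proposed model. The per-query scores are hypothetical numbers invented for illustration, and the variable names baseline and proposed are assumptions rather than part of any real evaluation. Passing the two samples directly lets scipy compute the paired differences itself.

```python
from scipy import stats

# Hypothetical per-query scores (e.g. average precision), made up for illustration.
baseline = [0.31, 0.42, 0.25, 0.50, 0.38, 0.29, 0.47, 0.33, 0.40, 0.36]
proposed = [0.35, 0.44, 0.24, 0.58, 0.45, 0.32, 0.52, 0.43, 0.46, 0.45]

# Two-sided paired test; scipy forms the differences proposed - baseline.
result = stats.wilcoxon(proposed, baseline)
statistic = result.statistic  # smaller of the two signed-rank sums
p_value = result.pvalue

print("statistic:", statistic)
print("p-value:", p_value)

# Decision at the .05 significance level.
if p_value < 0.05:
    print("Reject the null hypothesis of identical distributions.")
else:
    print("Cannot reject the null hypothesis.")
```

Note that for the two-sided test scipy reports the smaller of the two rank sums, whereas R's V is the rank sum of the positive differences, so the two statistics can differ even when the p-values agree.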
A tool that runs the Wilcoxon signed-rank test for TREC evaluation is provided at the following link:
