Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spearman cor test #304

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/src/multivariate.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,4 +21,5 @@ BartlettTest

```@docs
CorrelationTest
SpearmanCorrelationTest
```
90 changes: 79 additions & 11 deletions src/correlation.jl
Original file line number Diff line number Diff line change
Expand Up @@ -2,21 +2,25 @@
# TODO: Implement `CorrelationTest` for two arguments

using Statistics: clampcor
using StatsBase: corspearman

export CorrelationTest
export CorrelationTest, SpearmanCorrelationTest
nalimilan marked this conversation as resolved.
Show resolved Hide resolved

abstract type AbstractCorrelationTest <: HypothesisTest end

"""
CorrelationTest(x, y)
nalimilan marked this conversation as resolved.
Show resolved Hide resolved

Perform a t-test for the hypothesis that ``\\text{Cor}(x,y) = 0``, i.e. the correlation
of vectors `x` and `y` is zero.
Perform a t-test for the hypothesis that ``\\text{Cor}(x,y) = 0``, i.e. the Pearson
correlation of vectors `x` and `y` is zero.

CorrelationTest(x, y, Z)

Perform a t-test for the hypothesis that ``\\text{Cor}(x,y|Z=z) = 0``, i.e. the partial
correlation of vectors `x` and `y` given the matrix `Z` is zero.

Implements `pvalue` for the t-test.

Implements `confint` using an approximate confidence interval based on Fisher's
``z``-transform.

Expand All @@ -29,7 +33,7 @@ See also `partialcor` from StatsBase.
* [Section testing using Student's t-distribution from Pearson correlation coefficient on Wikipedia](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#Testing_using_Student's_t-distribution)
* [K.J. Levy and S.C. Narula (1978): Testing Hypotheses concerning Partial Correlations: Some Methods and Discussion. International Statistical Review 46(2).](https://www.jstor.org/stable/1402814)
"""
struct CorrelationTest{T<:Real} <: HypothesisTest
struct CorrelationTest{T<:Real} <: AbstractCorrelationTest
r::T
n::Int
k::Int
Expand Down Expand Up @@ -60,15 +64,15 @@ struct CorrelationTest{T<:Real} <: HypothesisTest
end

testname(p::CorrelationTest) =
string("Test for nonzero ", p.k != 0 ? "partial " : "", "correlation")
string("Test for nonzero ", p.k != 0 ? "partial " : "", "Pearson correlation")

function population_param_of_interest(p::CorrelationTest)
param = p.k != 0 ? "Partial correlation" : "Correlation"
param = p.k != 0 ? "Partial Pearson correlation" : "Pearson correlation"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that nobody says "partial Pearson correlation" even if that would sound more explicit.

Suggested change
param = p.k != 0 ? "Partial Pearson correlation" : "Pearson correlation"
param = p.k != 0 ? "Partial correlation" : "Pearson correlation"

(param, zero(p.r), p.r)
end

StatsAPI.nobs(p::CorrelationTest) = p.n
StatsAPI.dof(p::CorrelationTest) = p.n - 2 - p.k
StatsAPI.nobs(p::AbstractCorrelationTest) = p.n
StatsAPI.dof(p::AbstractCorrelationTest) = p.n - 2 - p.k

function StatsAPI.confint(test::CorrelationTest{T}, level::Float64=0.95) where T
dof(test) > 1 || return (-one(T), one(T)) # Otherwise we can get NaNs
Expand All @@ -80,12 +84,76 @@ function StatsAPI.confint(test::CorrelationTest{T}, level::Float64=0.95) where T
return (elo, ehi)
end

default_tail(::CorrelationTest) = :both
StatsAPI.pvalue(test::CorrelationTest; tail=:both) = pvalue(TDist(dof(test)), test.t, tail=tail)
default_tail(::AbstractCorrelationTest) = :both
StatsAPI.pvalue(test::AbstractCorrelationTest; tail=:both) = pvalue(TDist(dof(test)), test.t, tail=tail)

function show_params(io::IO, test::CorrelationTest, indent="")
function show_params(io::IO, test::AbstractCorrelationTest, indent="")
println(io, indent, "number of observations: ", nobs(test))
println(io, indent, "number of conditional variables: ", test.k)
println(io, indent, "t-statistic: ", test.t)
println(io, indent, "degrees of freedom: ", dof(test))
end

"""
SpearmanCorrelationTest(x, y)

Perform a t-test for the hypothesis that ``\\text{Cor}(x,y) = 0``, i.e. the Spearman rank
correlation ρₛ of vectors `x` and `y` is zero.

Implements `pvalue` for the t-test.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Implements `pvalue` for the t-test.
Implements `pvalue` for the t-test using the Fisher transformation.


Implements `confint` using an approximate confidence interval adjusting for the
non-normality of the ranks based on [1]. This is still an approximation, which performs
insufficiently in the case of:

* sample sizes below 25
* a true population Spearman correlation |ρₛ| above 0.95
* ordinal data

In these cases a bootstrap confidence interval can perform better [2,3].

# External resources
[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating Pearson,
Kendall and Spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000,
doi: 10.1007/BF02294183.

[2] A. J. Bishara and J. B. Hittner, “Confidence intervals for correlations when data are
not normal,” Behav Res, vol. 49, no. 1, pp. 294–309, Feb. 2017,
doi: 10.3758/s13428-016-0702-8.

[3] J. Ruscio, “Constructing Confidence Intervals for Spearman’s Rank Correlation with
Ordinal Data: A Simulation Study Comparing Analytic and Bootstrap Methods,”
J. Mod. App. Stat. Meth., vol. 7, no. 2, pp. 416–434, Nov. 2008,
doi: 10.22237/jmasm/1225512360
"""
struct SpearmanCorrelationTest{T<:Real} <: AbstractCorrelationTest
r::T
n::Int
k::Int
t::T

# Error checking is done in `corspearman`
function SpearmanCorrelationTest(x::AbstractVector, y::AbstractVector)
r = corspearman(x, y)
n = length(x)
t = r * sqrt((n - 2) / (1 - r^2))
return new{typeof(r)}(r, n, 0, t)
end
end

testname(p::SpearmanCorrelationTest) = "Spearman correlation"

function population_param_of_interest(p::SpearmanCorrelationTest)
param = "Spearman correlation"
(param, zero(p.r), p.r)
end

function StatsAPI.confint(test::SpearmanCorrelationTest{T}, level::Float64=0.95) where T
dof(test) > 1 || return (-one(T), one(T)) # Otherwise we can get NaNs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for this case? Maybe also for other corner cases like having NaNs or Inf in the input.

q = quantile(Normal(), 1 - (1-level) / 2)
fisher = atanh(test.r)
bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q # Estimates variance as in Bonnet et al. (2000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q # Estimates variance as in Bonnet et al. (2000)
# Estimates variance as in Bonett and Wright (2000)
bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q

elo = clampcor(tanh(fisher - bound))
ehi = clampcor(tanh(fisher + bound))
return (elo, ehi)
end
33 changes: 33 additions & 0 deletions test/correlation.jl
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,39 @@ using DelimitedFiles
@test pvalue(x) ≈ 0.2948405 atol=1e-6
end

@testset "SpearmanCorrelation" begin
# Columns are line number, calcium, iron
nutrient = readdlm(joinpath(@__DIR__, "data", "nutrient.txt"))[:, 1:3]
w = SpearmanCorrelationTest(nutrient[:,2], nutrient[:,3])
let out = sprint(show, w)
@test occursin("reject h_0", out) && !occursin("fail to", out)
end
let ci = confint(w)
# Compare against values from R using library(spearmanCI)
@test first(ci) ≈ 0.3938044 atol=1e-3
@test last(ci) ≈ 0.5178866 atol=1e-2
# Test for consistency against our results
@test first(ci) ≈ 0.3934359 atol=1e-6
@test last(ci) ≈ 0.5137945 atol=1e-6
end
@test nobs(w) == 737
@test dof(w) == 735
@test pvalue(w) < 1e-25

x = SpearmanCorrelationTest(nutrient[:,1], nutrient[:,2])
@test occursin("fail to reject", sprint(show, x))
let ci = confint(x)
# Compare against values from R using library(spearmanCI)
@test first(ci) ≈ -0.1359917 atol=1e-2
@test last(ci) ≈ 0.01197175 atol=1e-2
# Test for consistency against our results
@test first(ci) ≈ -0.1333692 atol=1e-6
@test last(ci) ≈ 0.01065576 atol=1e-6
end
@test pvalue(x) ≈ 0.09274721 atol=1e-2 # value from R's cor.test(..., method="spearman") which does not use a t test algorithm AS 89
Copy link
Member

@nalimilan nalimilan Sep 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT R will use AS 89 if you pass exact=TRUE, right? It would be good to test against an implementation which uses AS 89, even if we need to use another software.

@test pvalue(x) ≈ 0.09429494 atol=1e-6 # Test for consistency against our results
end

@testset "Partial correlation" begin
# Columns are information, similarities, arithmetic, picture completion
wechsler = readdlm(joinpath(@__DIR__, "data", "wechsler.txt"))[:,2:end]
Expand Down