Compute chi^2-statistic for test on proportions as the categorical variable (Chi2Score)

class cerebstats.stat_scores.chi2Score.Chi2Score(*args: Any, **kwargs: Any)

Compute chi2-statistic for chi squared Test of proportions.

For any two-way contingency tables.

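+-------------------+--------------------------------------------+
| Possibilities for | Possibilities for categorical variable, B  |
| categorical       +----------------------+---------------------+
| variable, A       | Yes                  | No                  |
+===================+======================+=====================+
| a1                | \(O_{00}\)           | \(O_{01}\)          |
+-------------------+----------------------+---------------------+
| a2                | \(O_{10}\)           | \(O_{11}\)          |
+-------------------+----------------------+---------------------+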

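+----------------+----------------------------------------------------------------------------+
| Definitions    | Interpretation                                                             |
+================+============================================================================+
| \(r\)          | number of row variables                                                    |
+----------------+----------------------------------------------------------------------------+
| \(c\)          | number of column variables                                                 |
+----------------+----------------------------------------------------------------------------+
| \(O_{ij}\)     | observed count for a cell in \(i^{th}\) row, \(j^{th}\) column             |
+----------------+----------------------------------------------------------------------------+
| \(R_{i}\)      | total observations in \(i^{th}\) row, \(\sum_{\forall j \in c} O_{ij}\)    |
+----------------+----------------------------------------------------------------------------+
| \(C_{j}\)      | total observations in \(j^{th}\) column, \(\sum_{\forall i \in r} O_{ij}\) |
+----------------+----------------------------------------------------------------------------+
| \(n\)          | total count for entire table, \(\sum_{\forall i \in r} R_i\) or            |
|                | \(\sum_{\forall j \in c} C_j\)                                             |
+----------------+----------------------------------------------------------------------------+
| \(E_{ij}\)     | expected count for a cell in \(i^{th}\) row, \(j^{th}\) column,            |
|                | \(E_{ij} = \frac{R_i C_j}{n}\)                                             |
+----------------+----------------------------------------------------------------------------+
| test-statistic | \(\chi^2 = \sum_{\forall i,j} \frac{(O_{ij}-E_{ij})^2}{E_{ij}}\)           |
+----------------+----------------------------------------------------------------------------+
| \(df\)         | degrees of freedom, \(df = (r-1)(c-1)\)                                    |
+----------------+----------------------------------------------------------------------------+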
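
The definitions above map directly onto a few array operations. The following is a minimal sketch, assuming NumPy and SciPy are installed; the helper name `chi2_from_table` is hypothetical and is not part of cerebstats. It computes the statistic by hand and cross-checks it against scipy.stats.chi2_contingency.

```python
import numpy as np
from scipy import stats

def chi2_from_table(observed):
    """Hypothetical helper: return (chi2, df, expected) for a two-way table."""
    O = np.asarray(observed, dtype=float)
    R = O.sum(axis=1)            # row totals R_i
    C = O.sum(axis=0)            # column totals C_j
    n = O.sum()                  # grand total n
    E = np.outer(R, C) / n       # expected counts E_ij = R_i * C_j / n
    chi2 = ((O - E) ** 2 / E).sum()
    r, c = O.shape
    df = (r - 1) * (c - 1)
    return chi2, df, E

obs = [[129, 49], [150, 29], [137, 39]]   # illustrative 3 x 2 table
chi2, df, expected = chi2_from_table(obs)
p = stats.chi2.sf(chi2, df)      # upper-tail probability for df degrees of freedom

# Cross-check against SciPy; its continuity correction only applies when df == 1.
ref_chi2, ref_p, ref_df, ref_expected = stats.chi2_contingency(np.array(obs))
print(np.isclose(chi2, ref_chi2), df == ref_df)
```

Here the p-value is taken from the upper tail of the chi-squared distribution with \(df\) degrees of freedom, which is why the sketch calls `stats.chi2.sf` rather than a two-tailed routine.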

Special note. For the case of a 2 x 2 table like the one below

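+-------------------+-------------------------------------------+-------+
| Possibilities for | Possibilities for categorical variable, B | Total |
| categorical       +---------------------+---------------------+       |
| variable, A       | column-1            | column-2            |       |
+===================+=====================+=====================+=======+
| row-1             | A                   | B                   | R1    |
+-------------------+---------------------+---------------------+-------+
| row-2             | C                   | D                   | R2    |
+-------------------+---------------------+---------------------+-------+
| Total             | C1                  | C2                  | N     |
+-------------------+---------------------+---------------------+-------+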

Notice that for a 2 x 2 table, \(df = 1\) and its test statistic can be calculated with the shortcut formula

\(\chi^2 = \frac{ N(AD-BC)^2 }{ R_1 R_2 C_1 C_2 }\)
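
As a quick check, the shortcut formula above can be compared with the general computation. This is a sketch under stated assumptions: the counts are made up for illustration, the helper name `chi2_shortcut_2x2` is hypothetical, and `correction=False` is passed because SciPy otherwise applies Yates' continuity correction to 2 x 2 tables by default. Whether Chi2Score itself applies that correction is not stated here.

```python
import numpy as np
from scipy import stats

def chi2_shortcut_2x2(table):
    """Hypothetical helper: shortcut chi^2 for a 2 x 2 table [[A, B], [C, D]]."""
    (A, B), (C, D) = np.asarray(table, dtype=float)
    R1, R2 = A + B, C + D        # row totals
    C1, C2 = A + C, B + D        # column totals
    N = R1 + R2                  # grand total
    return N * (A * D - B * C) ** 2 / (R1 * R2 * C1 * C2)

table = [[20, 30], [35, 15]]     # illustrative counts, not from the source
shortcut = chi2_shortcut_2x2(table)

# General formula via SciPy, with the continuity correction switched off
# so that it is comparable to the uncorrected shortcut.
general, p, df, expected = stats.chi2_contingency(np.array(table), correction=False)
print(np.isclose(shortcut, general), df)
```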

This class uses scipy.stats.chi2_contingency. chi2_contingency is a special case of chisquare as demonstrated below

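import math
import numpy as np
import scipy.stats

obs = np.array([ [129, 49], [150, 29], [137, 39] ])
chi2, p, df, expected = scipy.stats.chi2_contingency( obs )
chi2_, p_ = scipy.stats.chisquare( obs.ravel(), f_exp=expected.ravel(), ddof=obs.size-1-df )
# compare with a tolerance; exact float equality against rounded values would fail
math.isclose( chi2, chi2_ ) and math.isclose( chi2, 6.69, abs_tol=0.01 )
True
math.isclose( p, p_ ) and math.isclose( p, 0.035, abs_tol=0.001 )
True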

Use Case:

x = Chi2Score.compute( observation, prediction )
score = Chi2Score(x)

Note: As part of the SciUnit framework, this custom score should provide the following methods:

  • compute() (class method)

  • sort_key() (property)

  • __str__()

classmethod compute(observation, prediction)

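+-----------------+----------------------------------------------------+
| Argument        | Value type                                         |
+=================+====================================================+
| first argument  | dictionary; observation/experimental data must     |
|                 | have keys “sample_size” and “success_numbers”      |
+-----------------+----------------------------------------------------+
| second argument | dictionary; model prediction must also have keys   |
|                 | “sample_size” and “success_numbers”                |
+-----------------+----------------------------------------------------+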

Note:

  • for a 2 x 2 table, the value for the key “success_numbers” is a number for both observation and prediction.

  • for a 2 x k table, the values for the key “success_numbers” (in both observation and prediction) are either lists or arrays.

  • chi squared tests are by nature two-sided, so there is no option for one-sidedness.
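
To make the expected input shape concrete, here is a hedged sketch of how two such dictionaries could be turned into a 2 x 2 table and scored with SciPy directly. It is not the class's actual implementation: the helper `table_from_dicts`, the row/column layout (rows are observation and prediction, columns are successes and failures), the sample values, and the choice of `correction=False` are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def table_from_dicts(observation, prediction):
    """Hypothetical helper: build a 2 x 2 table from the two dictionaries.

    Assumed layout: one row per dictionary, columns are (successes, failures).
    """
    rows = []
    for d in (observation, prediction):
        successes = d["success_numbers"]
        failures = d["sample_size"] - successes
        rows.append([successes, failures])
    return np.array(rows)

# Illustrative values only; the keys are the ones required by compute().
observation = {"sample_size": 50, "success_numbers": 20}
prediction = {"sample_size": 50, "success_numbers": 35}

table = table_from_dicts(observation, prediction)
# correction=False is an assumption; SciPy's default applies Yates' correction to 2 x 2 tables.
chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
print(table.tolist(), df)
```

The sketch covers only the 2 x 2 case, where “success_numbers” is a single number; how list-valued “success_numbers” map onto a 2 x k table is not specified here and is left out.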