Compute chi^2-statistic for chi^2 goodness-of-fit test on proportions of categories of a categorical variable (Chi2GOFScore)

class cerebstats.stat_scores.chi2GOFScore.Chi2GOFScore(*args: Any, **kwargs: Any)

Compute the \(\chi^2\) statistic for the chi-squared goodness-of-fit test of proportions.

One may think of this as a one-way contingency table.

sample size                                               \(n\)
\(k\) categories of a categorical variable of interest    \(x_1\)    \(x_2\)    \(\ldots\)    \(x_k\)
observations                                              \(O_1\)    \(O_2\)    \(\ldots\)    \(O_k\)
probabilities                                             \(p_1\)    \(p_2\)    \(\ldots\)    \(p_k\)
expected                                                  \(np_1\)   \(np_2\)   \(\ldots\)    \(np_k\)

Note that the probabilities of the \(k\) categories sum to one, \(\sum_{\forall i} p_i = 1\). The expected count for each category is either given directly or derived from its probability as \(np_i\), so that \(\sum_{\forall i} np_i = n\).
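As an illustration of the relation above, the following is a minimal sketch that derives expected counts from probabilities. The helper name `expected_counts` and the sample values are assumptions made for this example; they are not part of cerebstats.

```python
import math


def expected_counts(n, probabilities):
    """Return E_i = n * p_i for each category (hypothetical helper)."""
    # The probabilities of the k categories must sum to one.
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return [n * p for p in probabilities]


n = 100                # sample size (made-up value)
p = [0.5, 0.3, 0.2]    # probabilities for k = 3 categories (made-up values)
E = expected_counts(n, p)

# The expected counts add up to the sample size.
assert math.isclose(sum(E), n)
```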

Definitions       Interpretation
\(n\)             sample size; total number of experiments done
\(k\)             number of categories of the categorical variable
\(O_i\)           observed count (frequency) for the \(i^{th}\) category
\(p_i\)           probability for the \(i^{th}\) category such that \(\sum_{\forall i} p_i = 1\)
\(E_i\)           expected count for the \(i^{th}\) category such that \(E_i = n p_i\)
test-statistic    \(\chi^2 = \sum_{\forall i} \frac{(O_i - E_i)^2}{E_i}\)
\(df\)            degrees of freedom, \(df = k-1\)
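The definitions above can be worked through by hand. The sketch below applies the test-statistic formula directly in plain Python; the function name `chi2_gof_statistic` and the counts are hypothetical and serve only to illustrate the arithmetic.

```python
def chi2_gof_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories (hypothetical helper)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [45, 35, 20]   # O_i, made-up counts with n = 100
expected = [50, 30, 20]   # E_i = n * p_i for p = (0.5, 0.3, 0.2)

chi2 = chi2_gof_statistic(observed, expected)
# (45-50)^2/50 + (35-30)^2/30 + (20-20)^2/20 = 0.5 + 0.8333... + 0

df = len(observed) - 1    # degrees of freedom, df = k - 1
```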

Note that, compared with a two-way \(\chi^2\) test, the modifications are

  • the calculation of expected counts \(E_i = n p_i\)

  • the degrees of freedom \(df = k-1\)

This class uses scipy.stats.chisquare.
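The page does not show how the class calls scipy.stats.chisquare internally. As a rough guide, the sketch below calls the SciPy function directly with made-up counts. With the default `ddof=0`, SciPy computes the p-value from a chi-squared distribution with \(k-1\) degrees of freedom, which matches the \(df\) given above.

```python
from scipy.stats import chisquare

observed = [45, 35, 20]   # O_i, made-up counts with n = 100
expected = [50, 30, 20]   # E_i, must sum to the same total as observed

# f_exp defaults to equal frequencies, so pass the expected counts explicitly.
result = chisquare(f_obs=observed, f_exp=expected)

statistic = result.statistic   # chi-squared test statistic
pvalue = result.pvalue         # upper-tail probability with df = k - 1
```

Whether the class passes any further arguments to SciPy is not stated here, so treat this only as an indication of the underlying computation.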

Use Case:

from cerebstats.stat_scores.chi2GOFScore import Chi2GOFScore
x = Chi2GOFScore.compute( observation, prediction )
score = Chi2GOFScore(x)

Note: As part of the SciUnit framework, this custom score class should have the following methods:

  • compute() (class method)

  • sort_key() (property)

  • __str__()

classmethod compute(observation, prediction)

Argument           Value type
first argument     dictionary; the observation/experimental data must have the keys “sample_size”, whose value is a number, and “observed_freq”, whose value is an array
second argument    dictionary; the model prediction must have either “probabilities” or “expected”, whose value is an array (same length as “observed_freq”)
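To make the two arguments concrete, the sketch below builds example dictionaries with the documented keys and resolves the expected counts from either form of the prediction. The helper `resolve_expected`, the values, and the choice to prefer “expected” when both keys are present are assumptions for illustration; they are not taken from the cerebstats source.

```python
def resolve_expected(observation, prediction):
    """Return expected counts from a prediction dictionary (hypothetical helper)."""
    n = observation["sample_size"]
    k = len(observation["observed_freq"])
    if "expected" in prediction:
        # Assumption: use the expected counts as given.
        expected = list(prediction["expected"])
    elif "probabilities" in prediction:
        # Otherwise derive E_i = n * p_i from the probabilities.
        expected = [n * p for p in prediction["probabilities"]]
    else:
        raise KeyError('prediction needs "probabilities" or "expected"')
    if len(expected) != k:
        raise ValueError('array must have the same length as "observed_freq"')
    return expected


# Made-up data using the documented keys.
observation = {"sample_size": 100, "observed_freq": [45, 35, 20]}
prediction_from_probabilities = {"probabilities": [0.5, 0.3, 0.2]}
prediction_from_counts = {"expected": [50, 30, 20]}

expected_a = resolve_expected(observation, prediction_from_probabilities)
expected_b = resolve_expected(observation, prediction_from_counts)
```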

Note:

  • Chi-squared tests (for goodness-of-fit or contingency tables) are by nature two-sided, so there is no option for one-sidedness.