Compute chi^2-statistic for chi^2 goodness-of-fit test on proportions of categories of a categorical variable (Chi2GOFScore)
- class cerebstats.stat_scores.chi2GOFScore.Chi2GOFScore(*args: Any, **kwargs: Any)
Compute the chi-squared statistic for a chi-squared goodness-of-fit test of proportions.
One may think of this as a one-way contingency table.
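+---------------+-----------------------------------------------------------+
| sample size   | \(k\) categories of a categorical variable of interest    |
|               +--------------+--------------+--------------+--------------+
| \(n\)         | \(x_1\)      | \(x_2\)      | \(\ldots\)   | \(x_k\)      |
+===============+==============+==============+==============+==============+
| observations  | \(O_1\)      | \(O_2\)      | \(\ldots\)   | \(O_k\)      |
+---------------+--------------+--------------+--------------+--------------+
| probabilities | \(p_1\)      | \(p_2\)      | \(\ldots\)   | \(p_k\)      |
+---------------+--------------+--------------+--------------+--------------+
| expected      | \(np_1\)     | \(np_2\)     | \(\ldots\)   | \(np_k\)     |
+---------------+--------------+--------------+--------------+--------------+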
Notice that the probabilities of the \(k\) categories satisfy \(\sum_{\forall i} p_i = 1\). The expected count for each category is either derived from its probability as \(np_i\) or given directly, so that \(\sum_{\forall i} np_i = n\).
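As a small illustration of the relation above, the sketch below derives expected counts from probabilities. The sample size and probabilities are made-up values, not data from this package.

```python
import numpy as np

n = 100                          # sample size (made-up value)
p = np.array([0.5, 0.3, 0.2])    # hypothesised probabilities for k = 3 categories

# The probabilities must sum to one.
assert np.isclose(p.sum(), 1.0)

# Expected count for each category: E_i = n * p_i
expected = n * p

# The expected counts therefore sum to the sample size.
total_expected = expected.sum()
print(expected, total_expected)
```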
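================  ==============================================================================
Definitions       Interpretation
================  ==============================================================================
\(n\)             sample size; total number of experiments done
\(k\)             number of categories of the categorical variable
\(O_i\)           observed count (frequency) for the \(i^{th}\) category
\(p_i\)           probability for the \(i^{th}\) category such that \(\sum_{\forall i} p_i = 1\)
\(E_i\)           expected count for the \(i^{th}\) category such that \(E_i = n p_i\)
test-statistic    \(\chi^2 = \sum_{\forall i} \frac{(O_i - E_i)^2}{E_i}\)
\(df\)            degrees of freedom, \(df = k-1\)
================  ==============================================================================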
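The definitions above can be worked through by hand. The following is a minimal sketch with made-up counts; it applies the formulas directly rather than reproducing this class's internals, and the upper-tail p-value via `scipy.stats.chi2.sf` is an addition for illustration.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([45, 35, 20])    # O_i (made-up counts)
p = np.array([0.5, 0.3, 0.2])        # p_i (made-up probabilities)

n = observed.sum()                   # sample size
expected = n * p                     # E_i = n * p_i

# chi^2 = sum over i of (O_i - E_i)^2 / E_i
statistic = np.sum((observed - expected) ** 2 / expected)

# df = k - 1, where k is the number of categories
df = len(observed) - 1

# Upper-tail probability of the chi-squared distribution with df degrees of freedom.
p_value = chi2.sf(statistic, df)
print(statistic, df, p_value)
```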
Note that, compared with a two-way \(\chi^2\) test, two things change:
- the calculation of expected counts, \(E_i = n p_i\)
- the degrees of freedom, \(df = k-1\)
This class uses scipy.stats.chisquare.
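For orientation, here is a sketch of calling `scipy.stats.chisquare` directly with observed and expected counts. How this class wraps the call is not shown here; the counts are illustrative and the equal probabilities are an assumption.

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([16, 18, 16, 14, 12, 12])   # illustrative counts, k = 6
n = observed.sum()
probabilities = np.full(6, 1.0 / 6.0)           # assumed equal probabilities
expected = n * probabilities                    # E_i = n * p_i

# With the default ddof=0 the p-value uses k - 1 degrees of freedom.
result = chisquare(f_obs=observed, f_exp=expected)
statistic, p_value = result.statistic, result.pvalue
print(statistic, p_value)
```

`f_exp` must sum to the same total as `f_obs`, which holds here because the probabilities sum to one.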
Use Case:
    x = Chi2GOFScore.compute(observation, prediction)
    score = Chi2GOFScore(x)
Note: As part of the SciUnit framework, this custom TScore should have the following methods:
- compute() (class method)
- sort_key() (property)
- __str__()
- classmethod compute(observation, prediction)
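================  ==============================================================================
Argument          Value type
================  ==============================================================================
first argument    dictionary; observation/experimental data must have keys “sample_size” with a number as its value and “observed_freq” whose value is an array
second argument   dictionary; model prediction must have either “probabilities” or “expected” whose value is an array (same length as “observed_freq”)
================  ==============================================================================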
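The two argument shapes described above might look like the dictionaries below. The helper `expected_counts` is hypothetical and written only for illustration; it is not part of cerebstats and only sketches how either prediction form leads to the same expected counts.

```python
import numpy as np

# Observation: "sample_size" is a number, "observed_freq" is an array.
observation = {"sample_size": 88, "observed_freq": [16, 18, 16, 14, 12, 12]}

# Prediction: either "probabilities" or "expected", same length as "observed_freq".
prediction_from_probs = {"probabilities": [1.0 / 6.0] * 6}
prediction_from_counts = {"expected": [88.0 / 6.0] * 6}


def expected_counts(observation, prediction):
    """Hypothetical helper: return expected counts for either prediction form."""
    if "expected" in prediction:
        expected = np.asarray(prediction["expected"], dtype=float)
    elif "probabilities" in prediction:
        probs = np.asarray(prediction["probabilities"], dtype=float)
        expected = observation["sample_size"] * probs   # E_i = n * p_i
    else:
        raise KeyError("prediction needs 'probabilities' or 'expected'")
    if expected.shape != np.shape(observation["observed_freq"]):
        raise ValueError("prediction and 'observed_freq' differ in length")
    return expected


from_probs = expected_counts(observation, prediction_from_probs)
from_counts = expected_counts(observation, prediction_from_counts)
print(from_probs, from_counts)
```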
Note:
Chi-squared tests (for goodness-of-fit or for a contingency table) are by nature two-sided, so there is no option for one-sidedness.