Hypothesis testing about proportions (HtestAboutProportions)
- class cerebstats.hypothesis_testings.aboutproportions.HtestAboutProportions(observation, prediction, test={'name': 'proportions_z_test_1pop', 'sample_statistic': 0.0, 'side': 'not_equal', 'z_statistic': 0.0})
Hypothesis Testing (significance testing) about proportions.
This is a parametric test. It assumes that the individuals in the sample are chosen randomly and that the experiments are equivalent to binomial experiments.
1. Verify necessary data conditions.
Verification is based on the sample size requirement only. The other condition, a random sample or a binomial experiment with independent trials, is assumed.
| Statistic name | Single sample test | Double sample test |
|---|---|---|
| sample size | \(n\) (observation) | \(n_1\) (observation), \(n_2\) (prediction) |
| null value | \(p_0\) (prediction) | \(p_0 = 0\) |
| proportions with trait (successes) | | \(p_1\) (observation), \(p_2\) (prediction) |
Such that,
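\(np_0 \geq lb \land n(1-p_0) \geq lb\) (single sample test)
\(n_1p_1 \geq lb \land n_1(1-p_1) \geq lb\) (double sample test, observation)
\(n_2p_2 \geq lb \land n_2(1-p_2) \geq lb\) (double sample test, prediction)
where the lower bound is \(lb = 5\) by default; an alternative value is \(lb = 10\).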
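The sample size requirement above can be sketched in a few lines of Python. This is only an illustration of the inequalities; the function name `sample_size_ok` is hypothetical and is not part of the class.

```python
def sample_size_ok(n, p, lb=5):
    """Return True when both n*p and n*(1-p) reach the lower bound lb."""
    return n * p >= lb and n * (1 - p) >= lb


# Single sample test: check the observation size against the null value p0.
print(sample_size_ok(n=50, p=0.3))

# Double sample test: each sample is checked against its own proportion.
print(sample_size_ok(n=40, p=0.1) and sample_size_ok(n=60, p=0.2))
```

Passing `lb=10` applies the stricter alternative bound.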
2. Defining null and alternate hypotheses.
For single sample test
| Statistic | Interpretation |
|---|---|
| sample statistic, \(\hat{p}\) | proportion of observations with the characteristic trait (successes) |
| null value/population parameter, \(p_0\) | proportion given by the prediction, taken as the specified value |
| null hypothesis, \(H_0\) | \(\hat{p} = p_0\) |
| alternate hypothesis, \(H_a\) | \(\hat{p} \neq p_0\), \(\hat{p} < p_0\), or \(\hat{p} > p_0\) |
For two sample test
| Statistic | Interpretation |
|---|---|
| sample statistic, \(\hat{p}_1-\hat{p}_2\) | difference between the proportions (observation, 1, and prediction, 2) with the characteristic trait (successes) |
| null value/population parameter, \(p_0\) | 0 |
| null hypothesis, \(H_0\) | \(\hat{p}_1-\hat{p}_2 = 0\) |
| alternate hypothesis, \(H_a\) | \(\hat{p}_1-\hat{p}_2 \neq 0\), \(\hat{p}_1-\hat{p}_2 < 0\), or \(\hat{p}_1-\hat{p}_2 > 0\) |
3. Assuming H0 is true, find p-value.
For single sample test
| Statistic | Interpretation |
|---|---|
| \(n\) | number of observations |
| \(x\) | number of observations with characteristic trait (successes) |
| \(\hat{p}\) | sample statistic, \(\hat{p} = \frac{x}{n}\) |
| \(se_{\hat{p}}\) | standard error assuming \(H_0\) is true, \(se_{\hat{p}} = \sqrt{\frac{ p_0(1-p_0) }{ n }}\) |
| z_statistic, z | z = \(\frac{ \hat{p}-p_0 }{ se_{\hat{p}} }\) |
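As a rough illustration of the single sample quantities, the sketch below computes \(\hat{p}\), the standard error under \(H_0\), the z-statistic, and an approximate p-value from the standard normal curve. The function names and the example numbers are hypothetical, and the mapping of each `side` value to a tail of the normal curve is an assumption rather than the class's confirmed behaviour.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(x, n, p0):
    """Return (p_hat, z) for x successes in n observations, null value p0."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error assuming H0 is true
    return p_hat, (p_hat - p0) / se


def p_value_from_z(z, side="not_equal"):
    """Approximate p-value from the standard normal curve (assumed tail mapping)."""
    cdf = NormalDist().cdf
    if side == "less_than":
        return cdf(z)
    if side == "greater_than":
        return 1 - cdf(z)
    if side == "not_equal":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError(f"unknown side: {side!r}")


# Made-up example: 60 successes in 100 observations, null value 0.5.
p_hat, z = one_sample_z(x=60, n=100, p0=0.5)
p_value = p_value_from_z(z, side="not_equal")
print(p_hat, z, p_value)
```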
For two sample test
| Statistic | Interpretation |
|---|---|
| \(n_1\) | number of observations |
| \(n_2\) | number of predictions |
| \(x_1\) | number of observations with characteristic trait (successes) |
| \(x_2\) | number of predictions with characteristic trait (successes) |
| \(\hat{p}_1\) | proportion of observations with successes, \(\hat{p}_1 = \frac{x_1}{n_1}\) |
| \(\hat{p}_2\) | proportion of predictions with successes, \(\hat{p}_2 = \frac{x_2}{n_2}\) |
| \(\hat{p}\) | combined proportion assuming that \(H_0: p_1 = p_2 = p\) is true, \(\hat{p} = \frac{x_1+x_2}{n_1+n_2}\) |
| \(\hat{p}_1-\hat{p}_2\) | sample statistic |
| \(se_{\hat{p}_1-\hat{p}_2}\) | standard error assuming \(H_0\) is true, \(se_{\hat{p}_1-\hat{p}_2}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2} }\) |
| z_statistic, z | z = \(\frac{\hat{p}_1-\hat{p}_2 - p_0}{se_{\hat{p}_1-\hat{p}_2}}\) |
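The two sample quantities can be sketched the same way. The function name `two_sample_z` and the example counts are hypothetical; the arithmetic follows the formulas listed above, with the null value \(p_0 = 0\).

```python
from math import sqrt


def two_sample_z(x1, n1, x2, n2, p0=0.0):
    """Return (p1_hat - p2_hat, z) using the combined proportion under H0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_comb = (x1 + x2) / (n1 + n2)  # combined proportion assuming p1 = p2
    se = sqrt(p_comb * (1 - p_comb) / n1 + p_comb * (1 - p_comb) / n2)
    diff = p1_hat - p2_hat
    return diff, (diff - p0) / se


# Made-up example: 45/100 successes observed versus 30/100 predicted.
diff, z2 = two_sample_z(x1=45, n1=100, x2=30, n2=100)
print(diff, z2)
```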
Note:
The p-value is obtained by looking up z in a table for the standard normal curve, which returns the corresponding p.
The p-value derived from the z-statistic is approximate.
For the single sample test, the exact p-value can be calculated from the binomial distribution.
The notation \(\hat{p}\) represents the sample statistic in the single sample test. In the two sample test it represents the combined proportion instead, and the sample statistic is \(\hat{p}_1-\hat{p}_2\).
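For comparison with the approximate z-based value, an exact single sample p-value can be sketched directly from the binomial distribution. The function names are hypothetical. The two-sided rule used here (summing the probabilities of all outcomes no more likely than the observed one) is one common convention and is an assumption, since the source does not say which rule it intends.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n binomial trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_p_value(x, n, p0, side="not_equal"):
    """Exact p-value for x successes in n trials under H0: p = p0."""
    if side == "less_than":
        return sum(binom_pmf(k, n, p0) for k in range(0, x + 1))
    if side == "greater_than":
        return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
    if side == "not_equal":
        # Assumed convention: add up every outcome that is no more likely
        # than the observed one (small relative tolerance for float ties).
        p_obs = binom_pmf(x, n, p0)
        total = sum(
            pk
            for pk in (binom_pmf(k, n, p0) for k in range(n + 1))
            if pk <= p_obs * (1 + 1e-7)
        )
        return min(1.0, total)
    raise ValueError(f"unknown side: {side!r}")


# Made-up example: 60 successes in 100 observations, null value 0.5.
print(exact_binomial_p_value(60, 100, 0.5, side="greater_than"))
print(exact_binomial_p_value(60, 100, 0.5, side="not_equal"))
```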
4. Report and answer the question: based on the p-value, is the result (true H0) statistically significant?
The class does not provide the answer; it is left to the person viewing the reported result. The reports are obtained by calling the attributes .statistics and .description. This is illustrated below.

    ht = HtestAboutProportions( observation, prediction, test_result, side="less_than" )
    score.description = ht.outcome
    score.statistics = ht.statistics

Arguments
| Argument | Representation | Value type |
|---|---|---|
| first | experiment/observation | dictionary that must have the keys "sample_size" and "success_numbers" |
| second | model prediction | float or dictionary; the latter for two sample cases, with keys "sample_size" and "success_numbers" |
| third (keyword) | test result | dictionary with keys "name": string, "proportions_z_test_1pop" or "proportions_z_test_2pop"; "sample_statistic": float; "z_statistic": float; "side": string, "not_equal", "less_than" or "greater_than"; and any additional names that are specific to the test |
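To make the expected shapes concrete, the arguments described above might look like the following. All numeric values are made-up placeholders, and the variable names are hypothetical; only the dictionary keys and the allowed strings come from the table.

```python
# First argument: the observation is always a dictionary with these two keys.
observation = {"sample_size": 100, "success_numbers": 60}

# Second argument, single sample case: a float (the null value p0).
prediction_1pop = 0.5

# Second argument, two sample case: a dictionary with the same keys.
prediction_2pop = {"sample_size": 100, "success_numbers": 45}

# Third (keyword) argument: the test result dictionary.
test = {
    "name": "proportions_z_test_1pop",  # or "proportions_z_test_2pop"
    "sample_statistic": 0.6,            # placeholder value
    "z_statistic": 2.0,                 # placeholder value
    "side": "not_equal",                # or "less_than", "greater_than"
}
```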
This constructor method generates statistics and outcome (which is then assigned to description within the validation test class where this hypothesis test class is implemented).
- static alternate_hypothesis(side, symbol_null_value, symbol_sample_statistic)
Returns the statement for the alternate hypothesis, Ha.
- static null_hypothesis(symbol_null_value, symbol_sample_statistic)
Returns the statement for the null hypothesis, H0.
- test_outcome()
Puts together the returned values of null_hypothesis(), alternate_hypothesis(), and _compute_pvalue(). Then returns the string value for .outcome.