Generating random mock data files to evaluate validation tests (MockData)

class cerebstats.test_mock_data.mock_data.MockData

Available methods:

Method name                    Method type
-----------------------------  -------------
count_files()                  static method
display_files()                static method
clear_files()                  static method
generate_random_data_files()   static method

Example: generating mock data from a reference dataset:

from cerebstats.test_mock_data.mock_data import MockData

ref_datasetlink = "https://raw.githubusercontent.com/cerebunit/cerebdata/master/expdata/cells/PurkinjeCell/Llinas_Sugimori_1980_soma_restVm.json"
MockData.generate_random_data_files(ref_datasetlink, sample_low=-70, sample_high=-46, num_of_files=10)

static clear_files()

Deletes all the files in the directory where the mock data is generated.

static count_files()

Counts the number of files present in the mock data directory.

static display_files()

Displays all the files present in the mock data directory.
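The three directory helpers above can be pictured with a minimal standard-library sketch. This is not the cerebstats implementation: the real methods take no arguments and operate on the library's own mock data directory, whereas the hypothetical functions below take the directory as a parameter and run against a temporary folder, so nothing real is deleted.

```python
import tempfile
from pathlib import Path


def count_files(directory):
    """Return the number of regular files directly inside `directory`."""
    return sum(1 for p in Path(directory).iterdir() if p.is_file())


def display_files(directory):
    """Print and return the sorted names of the files in `directory`."""
    names = sorted(p.name for p in Path(directory).iterdir() if p.is_file())
    for name in names:
        print(name)
    return names


def clear_files(directory):
    """Delete every regular file directly inside `directory`."""
    for p in Path(directory).iterdir():
        if p.is_file():
            p.unlink()


# Demonstration in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"mock_{i}.json").write_text("{}")
    count_before = count_files(tmp)
    listed = display_files(tmp)
    clear_files(tmp)
    count_after = count_files(tmp)
```

Calling the count helper before and after clearing, as done here, is a simple way to confirm that a cleanup actually emptied the directory.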

static generate_random_data_files(reference_datasetlink, sample_low=None, sample_high=None, num_of_files=None)

Accepts the link to the reference dataset, the lower and upper bounds of the raw_data values, and the number of mock data files to generate. It randomly generates mock data from which the statistics required to evaluate the validation test are calculated.

Arguments

Argument  Representation          Value type
--------  ----------------------  --------------------------------------------------
first     reference dataset link  URL to dataset; the dataset must be in JSON format
second    sample low              float
third     sample high             float
fourth    number of files         int; number of files to be generated

Note: The second and third arguments, sample low and sample high, bound the raw data array that is generated internally. Sample low is the lowest possible value and sample high is the largest possible value in that array.

If the raw_data key is present in the reference dataset, the newly generated mock dataset will also have the raw_data key.

If the raw_data key is not present in the reference dataset, the newly generated mock dataset will not contain it either. The raw data array is still generated internally from sample low and sample high, so that other values such as the mean and SD can be calculated.
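The raw_data rule can be illustrated with a small sketch. It rests on assumptions that the documentation does not confirm: samples are drawn uniformly between the two bounds, the helper name make_mock_dataset, the n_samples and seed parameters, and the output keys "mean" and "SD" are all hypothetical, and the actual cerebstats code may differ.

```python
import random
import statistics


def make_mock_dataset(reference, sample_low, sample_high, n_samples=50, seed=None):
    """Build one mock dataset (a dict) modelled on `reference`.

    Hypothetical helper: uniform sampling and the output key names
    ("mean", "SD") are assumptions, not cerebstats behaviour.
    """
    rng = random.Random(seed)
    # Every sample lies between sample_low and sample_high.
    raw = [rng.uniform(sample_low, sample_high) for _ in range(n_samples)]

    # The statistics are always derived from the internal raw array.
    mock = {
        "mean": statistics.mean(raw),
        "SD": statistics.stdev(raw),
    }
    # Mirror the reference: keep raw_data only if the reference has it.
    if "raw_data" in reference:
        mock["raw_data"] = raw
    return mock


# Reference with raw_data: the mock dataset keeps the key.
with_raw = make_mock_dataset({"raw_data": [-65.0, -60.0]}, -70, -46, seed=1)
# Reference without raw_data: the key is dropped, the statistics remain.
without_raw = make_mock_dataset({"mean": -62.5}, -70, -46, seed=1)
print(sorted(with_raw))
print(sorted(without_raw))
```

Because both calls use the same seed, they draw the same internal array, so the two results share identical statistics and differ only in whether raw_data is exposed.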