kailo_beewell_dashboard.synthesise_aggregate

Functions which aggregate pupil-level data. This is one of several files providing functions for the synthesis (creation and aggregation) of data for the dashboard.

Module Contents

Functions

results_by_site_and_group(data, agg_func, no_pupils[, ...])

Aggregate results for all possible sites (schools or areas) and groups

aggregate_scores(df)

Aggregate the score columns in the provided dataset, finding the mean and count of non-NaN values for each score.

convert_boolean(true_list, false_list, mask)

Conditionally replace the values of a boolean list, taking them from one list when True and from another when False.

aggregate_proportions(data, response_col, labels[, ...])

Aggregates each of the columns provided by response_col, for the chosen dataset.

aggregate_counts(df)

Aggregates the provided dataframe by counting the total number of pupils in it.

aggregate_demographic(data, response_col, labels)

Aggregates the demographic data by school and group, comparing each school with all other schools.

kailo_beewell_dashboard.synthesise_aggregate.results_by_site_and_group(data, agg_func, no_pupils, response_col=None, labels=None, group_type='standard', site_col='school_lab')

Aggregate results for all possible sites (schools or areas) and groups (setting result to 0 or NaN if no pupils from a particular group are present).

Parameters

data: pandas dataframe

Pupil-level survey responses, with their school and demographics

agg_func: function

Method for aggregating the dataset

no_pupils: pandas dataframe

Output of agg_func() where all counts are set to 0 and other results set to NaN, to be used in cases where there are no pupils of a particular group (e.g. no FSM / SEN / Year 8)

response_col: list

Optional argument used when agg_func is aggregate_proportions(). It is the list of columns that we want to aggregate.

labels: dictionary

Optional argument used when agg_func is aggregate_proportions(). It is a dictionary with all possible questions as keys. Each value is another dictionary, whose keys are all the possible numeric (or nan) answers to the question and whose values are the relevant label for each answer.

group_type: string

The type of demographic grouping to perform: either ‘standard’ (default), ‘symbol’ or ‘none’.

site_col: string

Name of column with site - e.g. ‘school_lab’ (default), ‘msoa’.

Returns

result: pandas DataFrame

Dataframe where each row has the aggregation results, along with the relevant school and pupil groups used in that calculation
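As a rough illustration of the fallback described above, the sketch below loops over every site and every value of a single grouping column. It calls agg_func on each subset and substitutes a copy of no_pupils when a subset is empty. The function name results_by_site_and_group_sketch and the single group_col parameter (with a hypothetical ‘gender’ default) are assumptions made for illustration. The real function supports the ‘standard’, ‘symbol’ and ‘none’ grouping types, which are not reproduced here.

```python
import pandas as pd


def results_by_site_and_group_sketch(data, agg_func, no_pupils,
                                     group_col="gender", site_col="school_lab"):
    """Hypothetical sketch: run agg_func for every site x group combination.

    Assumes a single grouping column (group_col); the real function handles
    several grouping schemes via group_type.
    """
    results = []
    # Loop over all sites and all group values, so that combinations with
    # no pupils still produce a row
    for site in data[site_col].unique():
        for group in data[group_col].dropna().unique():
            subset = data[(data[site_col] == site) & (data[group_col] == group)]
            if len(subset) > 0:
                res = agg_func(subset)
            else:
                # no_pupils is expected to hold counts of 0 and NaN results
                res = no_pupils.copy()
            res[site_col] = site
            res["group"] = group
            results.append(res)
    return pd.concat(results, ignore_index=True)


# Usage with a simple counting aggregation
pupils = pd.DataFrame({"school_lab": ["A", "A", "B"],
                       "gender": ["Girl", "Boy", "Girl"]})
result = results_by_site_and_group_sketch(
    pupils,
    agg_func=lambda df: pd.DataFrame({"count": [len(df)]}),
    no_pupils=pd.DataFrame({"count": [0]}),
)
```

School B has no boys, but it still gets a row, with a count of 0 taken from no_pupils.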

kailo_beewell_dashboard.synthesise_aggregate.aggregate_scores(df)

Aggregate the score columns in the provided dataset, finding the mean and count of non-NaN values for each score.

Parameters

df: dataframe

Dataframe with a row for each pupil, containing the score columns

Returns

res: dataframe

Dataframe with mean and count for each score
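In pandas, the mean and non-NaN count per column can be computed in one call. This minimal sketch assumes the score columns are passed in explicitly as score_cols, which is a hypothetical parameter (the real function takes only df). The output layout, with one row per statistic and one column per score, is also an assumption.

```python
import pandas as pd


def aggregate_scores_sketch(df, score_cols):
    """Hypothetical sketch: mean and count of non-NaN values per score.

    score_cols is an assumption; the real function takes only df.
    """
    # "mean" skips NaN by default and "count" counts only non-NaN values
    return df[score_cols].agg(["mean", "count"])


scores = pd.DataFrame({"score_a": [1.0, 3.0, None],
                       "score_b": [2.0, None, None]})
res = aggregate_scores_sketch(scores, ["score_a", "score_b"])
# res has rows 'mean' and 'count' and one column per score
```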

kailo_beewell_dashboard.synthesise_aggregate.convert_boolean(true_list, false_list, mask)

Conditionally replace the values of a boolean list, taking each value from one list when True and from another when False.

Parameters

true_list: list

Contains values to use if True

false_list: list

Contains values to use if False

mask: list

Boolean list
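The replacement described above is an element-wise selection. This is a minimal pure-Python sketch. It assumes the three lists have equal length and that the function returns a new list; the docstring does not document a return value. The name convert_boolean_sketch is hypothetical.

```python
def convert_boolean_sketch(true_list, false_list, mask):
    """Hypothetical sketch: element i comes from true_list if mask[i] is
    True, otherwise from false_list. Assumes equal-length inputs."""
    return [t if m else f for t, f, m in zip(true_list, false_list, mask)]


print(convert_boolean_sketch(["a", "b", "c"], ["x", "y", "z"], [True, False, True]))
# ['a', 'y', 'c']
```

For array inputs, numpy.where(mask, true_list, false_list) performs the same element-wise selection and returns a NumPy array.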

kailo_beewell_dashboard.synthesise_aggregate.aggregate_proportions(data, response_col, labels, hide_low_response=False)

Aggregates each of the columns provided by response_col, for the chosen dataset.

This function uses the known possible values for each column. It counts the occurrences of each value (including the number missing) and stores the result as a single dataframe row, with the counts, percentages and categories held as lists within the cells of that row. It returns a dataframe containing one such row per column. The counts are based on all possible values rather than only on the values present. Otherwise, if no-one responded 3, the function would return counts only for responses 1, 2 and 4, which would cause problems when plotting the data.
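The "all possible values" approach can be sketched for a single question with value_counts() followed by reindex(), which adds options nobody chose with a count of 0. This is a minimal sketch, not the package's implementation. The function name and output column names are hypothetical. Using non-missing answers as the percentage denominator is also an assumption.

```python
import pandas as pd


def aggregate_one_question_sketch(data, col, labels):
    """Hypothetical sketch: summarise one question as a single row.

    Output column names and the choice of denominator for percentages
    (non-missing answers) are assumptions.
    """
    # All known non-missing answer codes for this question
    options = [k for k in labels[col] if not pd.isna(k)]
    # reindex adds options nobody chose, with a count of 0
    counts = data[col].value_counts().reindex(options, fill_value=0)
    n_missing = int(data[col].isna().sum())
    total = int(counts.sum())
    percentages = (counts / total * 100).round(1)
    # Wrap each list in another list so it is stored in a single cell
    return pd.DataFrame({
        "measure": [col],
        "cat": [options],
        "cat_lab": [[labels[col][k] for k in options]],
        "count": [counts.tolist()],
        "percentage": [percentages.tolist()],
        "n_responses": [total],
        "missing": [n_missing],
    })


pupils = pd.DataFrame({"q1": [1, 2, 2, 4, None]})
q_labels = {"q1": {1: "Never", 2: "Sometimes", 3: "Often", 4: "Always",
                   float("nan"): "Missing"}}
row = aggregate_one_question_sketch(pupils, "q1", q_labels)
# row["count"].iloc[0] is [1, 2, 0, 1]: option 3 is kept with a count of 0
```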

For the branching question (talking about feelings), the value counts are calculated from a subset of the data. The "no response" count should include only pupils who were routed onto that question. It should exclude those routed onto the other question and those who never answered the first branching question.
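The subsetting for the branching question can be sketched as filtering on the answer that routes pupils to the follow-up question before counting missing values. The function name, column names and routing value below are hypothetical. The real function's branching logic is not documented here.

```python
import pandas as pd


def routed_missing_count(data, branch_col, branch_value, follow_up_col):
    """Hypothetical sketch: count missing answers to a follow-up question
    among only the pupils routed to it by branch_col == branch_value."""
    # Comparisons with NaN are False, so pupils who skipped the branching
    # question are excluded along with those routed elsewhere
    routed = data[data[branch_col] == branch_value]
    return int(routed[follow_up_col].isna().sum())


pupils = pd.DataFrame({
    "talk": [1, 1, 0, None],  # hypothetical branching question
    "talk_follow_up": [2, None, None, None],
})
print(routed_missing_count(pupils, "talk", 1, "talk_follow_up"))  # 1
print(int(pupils["talk_follow_up"].isna().sum()))  # 3 without subsetting
```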

Parameters

data: dataframe

Dataframe with rows for each pupil and including all the response_col

response_col: list

List of columns that we want to aggregate

labels: dictionary

Dictionary with all possible questions as keys, then values are another dictionary where keys are all the possible numeric (or nan) answers to the question, and values are the relevant label for each answer.

hide_low_response: boolean

Whether to hide the results for any response option with fewer than 10 responses (default False). This is stricter than the norm elsewhere, which only requires 10 responses to the item as a whole rather than to each response option.

Returns

pd.concat(rows): dataframe

Dataframe with the aggregate responses to each of the response_col
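The per-option threshold behind hide_low_response can be sketched as masking any option count below 10. The threshold comes from the parameter description above. Replacing hidden counts with NaN is an assumption, since the real function may hide results differently. The name hide_low_counts is hypothetical.

```python
import math


def hide_low_counts(counts, threshold=10):
    """Hypothetical sketch: replace any per-option count below threshold
    with NaN (the use of NaN is an assumption)."""
    return [c if c >= threshold else math.nan for c in counts]


print(hide_low_counts([12, 3, 10]))  # [12, nan, 10]
```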

kailo_beewell_dashboard.synthesise_aggregate.aggregate_counts(df)

Aggregates the provided dataframe by counting the total number of pupils in it.

Parameters

df: dataframe

Dataframe with row for each pupil and columns that include the school and groups needed by results_by_site_and_group()

Returns

res: dataframe

Dataframe with the count of pupils in each school and group
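aggregate_counts() appears to be intended for use as agg_func in results_by_site_and_group(), so it counts the pupils in one school and group subset at a time. To cross-check those counts outside the dashboard, pandas groupby can produce the same table in one pass. This is an alternative technique, not the package's implementation. The year_group column and the function name are hypothetical.

```python
import pandas as pd


def count_by_school_and_group(df, site_col="school_lab", group_col="year_group"):
    """Hypothetical sketch: pupils per school x group in one pass."""
    # size() counts rows (pupils) per combination; unstack(fill_value=0)
    # turns this into a school x group table with 0 for empty combinations
    return df.groupby([site_col, group_col]).size().unstack(fill_value=0)


pupils = pd.DataFrame({"school_lab": ["A", "A", "B"],
                       "year_group": [8, 10, 8]})
table = count_by_school_and_group(pupils)
# School B has no Year 10 pupils, so table.loc["B", 10] is 0
```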

kailo_beewell_dashboard.synthesise_aggregate.aggregate_demographic(data, response_col, labels)

Aggregates the demographic data by school and group. This is kept separate from results_by_site_and_group() because we want to aggregate each school vs. all other schools rather than for each school alone, and because we do not want to break the results down any further by demographic characteristics.

Parameters

data: dataframe

Dataframe containing pupil-level demographic data

response_col: array

List of demographic columns to be aggregated

labels: dictionary

Dictionary with response options for each variable

Returns

result: dataframe

Dataframe with the percentage of responses to each demographic question, for each school compared with all other schools
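The "school vs. all other schools" comparison can be sketched for one demographic column with two value_counts(normalize=True) calls, one on the chosen school and one on everyone else. This is a minimal sketch under assumptions. The function name, arguments and output layout are hypothetical. The real function also uses labels to supply every response option, which is omitted here. Percentages are of non-missing answers.

```python
import pandas as pd


def school_vs_others_sketch(data, col, school, site_col="school_lab"):
    """Hypothetical sketch: % of each response for one school vs. all others."""
    in_school = data[site_col] == school
    # normalize=True gives proportions of non-missing answers
    this = data.loc[in_school, col].value_counts(normalize=True) * 100
    others = data.loc[~in_school, col].value_counts(normalize=True) * 100
    # Align on response option; options absent from one side become 0
    return pd.DataFrame({"school": this, "others": others}).fillna(0)


pupils = pd.DataFrame({"school_lab": ["A", "A", "B", "C"],
                       "gender": ["Girl", "Boy", "Girl", "Girl"]})
res = school_vs_others_sketch(pupils, "gender", "A")
# School A: 50% Girl, 50% Boy; all other schools: 100% Girl, 0% Boy
```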