kailo_beewell_dashboard.synthesise_aggregate
Functions that aggregate pupil-level data. This is one of several modules that provide functions for synthesising (creating and aggregating) data for the dashboard.
Module Contents
Functions
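results_by_site_and_group(data, agg_func, no_pupils, ...) | Aggregate results for all possible sites (schools or areas) and groups.
aggregate_scores(df) | Aggregate the score columns in the provided dataset, finding the mean and count of non-NaN values.
convert_boolean(true_list, false_list, mask) | Conditionally replace values of boolean list from one list when True and another when False.
aggregate_proportions(data, response_col, labels, ...) | Aggregates each of the columns provided by response_col, for the chosen dataset.
aggregate_counts(df) | Aggregates the provided dataframe by finding the total people in it.
aggregate_demographic(data, response_col, labels) | Aggregates the demographic data by school and group.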
- kailo_beewell_dashboard.synthesise_aggregate.results_by_site_and_group(data, agg_func, no_pupils, response_col=None, labels=None, group_type='standard', site_col='school_lab')
Aggregate results for all possible sites (schools or areas) and groups (setting result to 0 or NaN if no pupils from a particular group are present).
Parameters
- data: pandas dataframe
Pupil-level survey responses, with their school and demographics
- agg_func: function
Method for aggregating the dataset
- no_pupils: pandas dataframe
Output of agg_func() where all counts are set to 0 and other results set to NaN, to be used in cases where there are no pupils of a particular group (e.g. no FSM / SEN / Year 8)
- response_col: list
Optional argument used when agg_func is aggregate_proportions(). It is the list of columns that we want to aggregate.
- labels: dictionary
Optional argument used when agg_func is aggregate_proportions(). It is a dictionary with all possible questions as keys, where each value is another dictionary whose keys are all the possible numeric (or nan) answers to the question and whose values are the relevant label for each answer.
- group_type: string
Links to the type of demographic groupings performed. Either ‘standard’, ‘symbol’ or ‘none’. The default is ‘standard’.
- site_col: string
Name of column with site - e.g. ‘school_lab’ (default), ‘msoa’.
Returns
- result: pandas DataFrame
Dataframe where each row has the aggregation results, along with the relevant school and pupil groups used in that calculation
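The role of no_pupils can be illustrated with a minimal sketch. This is not the package's implementation: aggregate_by_site_and_group, mean_and_count, the year_group column and the score column are all hypothetical names chosen for illustration, and only a single grouping column is handled. It assumes that agg_func returns a one-row dataframe and that no_pupils has the same columns.

```python
import numpy as np
import pandas as pd


def mean_and_count(df):
    """Hypothetical agg_func: one-row dataframe with mean and non-NaN count."""
    return pd.DataFrame({"mean": [df["score"].mean()],
                         "count": [int(df["score"].count())]})


def aggregate_by_site_and_group(data, agg_func, no_pupils, group_col,
                                group_values, site_col="school_lab"):
    """Illustrative only: apply agg_func to every site and group combination."""
    rows = []
    for site in sorted(data[site_col].unique()):
        for value in group_values:
            subset = data[(data[site_col] == site) & (data[group_col] == value)]
            # Fall back to the placeholder when the group is absent at this site
            res = agg_func(subset) if len(subset) > 0 else no_pupils.copy()
            # Record which site and group this row of results refers to
            res[site_col] = site
            res[group_col] = value
            rows.append(res)
    return pd.concat(rows, ignore_index=True)


# Made-up pupil-level data: School B has no Year 10 pupils
pupils = pd.DataFrame({
    "school_lab": ["School A", "School A", "School A", "School B"],
    "year_group": ["Year 8", "Year 8", "Year 10", "Year 8"],
    "score": [4.0, 6.0, 5.0, 7.0],
})

# Placeholder result: count of 0 and NaN for everything else
no_pupils = pd.DataFrame({"mean": [np.nan], "count": [0]})

result = aggregate_by_site_and_group(
    pupils, mean_and_count, no_pupils,
    group_col="year_group", group_values=["Year 8", "Year 10"])
print(result)
```

In this sketch every site gets a row for every group, so the School B / Year 10 row is present with a count of 0 and a NaN mean rather than being silently absent.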
- kailo_beewell_dashboard.synthesise_aggregate.aggregate_scores(df)
Aggregate the score columns in the provided dataset, finding the mean and count of non-NaN values for each.
Parameters
- df: dataframe
Dataframe with a row for each pupil, containing the score columns
Returns
- res: dataframe
Dataframe with mean and count for each score
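As a rough illustration of "mean and count of non-NaN", the sketch below computes both for each score column using pandas. It is a stand-in rather than the package's code: aggregate_scores_sketch, the "_score" suffix used to find score columns, and the output column names are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def aggregate_scores_sketch(df, score_suffix="_score"):
    """Hypothetical stand-in: mean and non-NaN count for each score column."""
    # Assumption: score columns can be identified by a shared suffix
    score_cols = [col for col in df.columns if col.endswith(score_suffix)]
    res = {}
    for col in score_cols:
        # Series.mean() skips NaN by default
        res[f"{col}_mean"] = df[col].mean()
        # Series.count() counts non-NaN values only
        res[f"{col}_count"] = int(df[col].count())
    # Return the results as a single-row dataframe
    return pd.DataFrame([res])


# Made-up data: one pupil has no autonomy score
pupils = pd.DataFrame({
    "school_lab": ["School A", "School A", "School A"],
    "autonomy_score": [10.0, np.nan, 14.0],
    "stress_score": [3.0, 5.0, 4.0],
})

res = aggregate_scores_sketch(pupils)
print(res)
```

Here the missing autonomy score is excluded from both the mean and the count, so the two statistics always describe the same set of pupils.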
- kailo_beewell_dashboard.synthesise_aggregate.convert_boolean(true_list, false_list, mask)
Conditionally replace values of boolean list from one list when True and another when False.
Parameters
- true_list: list
Contains values to use if True
- false_list: list
Contains values to use if False
- mask: list
Boolean list
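One plausible reading of this behaviour is a position-by-position selection between the two lists. The sketch below assumes that the three lists have equal length and are matched by position, which the description above does not state explicitly. convert_boolean_sketch is a hypothetical name, not the package function.

```python
def convert_boolean_sketch(true_list, false_list, mask):
    """Hypothetical stand-in: take from true_list where mask is True,
    otherwise from false_list."""
    # Assumption: all three lists have the same length, matched by position
    return [t if m else f for t, f, m in zip(true_list, false_list, mask)]


picked = convert_boolean_sketch(
    true_list=["a", "b", "c"],
    false_list=["x", "y", "z"],
    mask=[True, False, True])
print(picked)  # ['a', 'y', 'c']
```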
- kailo_beewell_dashboard.synthesise_aggregate.aggregate_proportions(data, response_col, labels, hide_low_response=False)
Aggregates each of the columns provided by response_col, for the chosen dataset.
This function uses the known possible values for each column. It counts occurrences of each value (including the number missing) and stores the answer as a single dataframe row, where counts, percentages and categories are held as lists within cells of that row. The function returns a dataframe containing all of those rows. It is designed to be based on all possible values rather than only on the values present. Otherwise, if for example no-one responded 3, the function would return counts only for responses 1, 2 and 4, which would create issues when plotting the data.
For the branching question (talking about feelings), the value counts are calculated from a subset of the data. This is because the "no response" count should include only those who branched onto that question, and not those who branched onto the other question or who never answered the first branching question.
Parameters
- data: dataframe
Dataframe with rows for each pupil and including all the response_col
- response_col: list
List of columns that we want to aggregate
- labels: dictionary
Dictionary with all possible questions as keys, where each value is another dictionary whose keys are all the possible numeric (or nan) answers to the question and whose values are the relevant label for each answer.
- hide_low_response: boolean
Whether to hide responses when a response option gets fewer than 10 responses. The norm elsewhere is to require only 10 responses to the entire item, rather than to each response option.
Returns
- pd.concat(rows): dataframe
Dataframe with the aggregate responses to each of the response_col
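The idea of counting every possible answer, and storing the results as lists within a single row, can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: count_all_options, the output column names (measure, cat_lab, count, percentage), the example questions and the "No response" label are all hypothetical. The branching-question logic and hide_low_response are not covered, and missing answers are counted separately rather than through a nan key in the labels dictionary.

```python
import numpy as np
import pandas as pd


def count_all_options(data, col, options):
    """Hypothetical helper: one row with counts and percentages for every
    possible answer to one question, including answers nobody chose."""
    # Count each known answer, so unused options still appear with a 0
    counts = [int((data[col] == value).sum()) for value in options]
    # Missing answers are counted separately and appended at the end
    counts.append(int(data[col].isna().sum()))
    cat_lab = list(options.values()) + ["No response"]
    total = sum(counts)
    percentages = [100 * n / total if total > 0 else np.nan for n in counts]
    # Wrapping each list in another list stores it inside a single cell
    return pd.DataFrame({
        "measure": [col],
        "cat_lab": [cat_lab],
        "count": [counts],
        "percentage": [percentages],
    })


# Made-up labels: question -> {numeric answer: label}
labels = {
    "talk": {1: "Never", 2: "Rarely", 3: "Often", 4: "Always"},
    "sleep": {1: "No", 2: "Yes"},
}

# Made-up responses: nobody answered 3 to "talk"
pupils = pd.DataFrame({
    "talk": [1, 2, 2, 4, np.nan],
    "sleep": [2, 2, 1, np.nan, np.nan],
})

rows = [count_all_options(pupils, col, labels[col]) for col in labels]
result = pd.concat(rows, ignore_index=True)
print(result.loc[0, "count"])  # [1, 2, 0, 1, 1]
```

Because the counts are built from the labels rather than from the observed values, the "Often" option is kept with a count of 0, so the lists for a given question always have the same length and order.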
- kailo_beewell_dashboard.synthesise_aggregate.aggregate_counts(df)
Aggregates the provided dataframe by finding the total people in it.
Parameters
- df: Dataframe
Dataframe with a row for each pupil and columns that include the school and groups needed by results_by_site_and_group()
Returns
- res: Dataframe
Dataframe with the count of pupils in each school and group
- kailo_beewell_dashboard.synthesise_aggregate.aggregate_demographic(data, response_col, labels)
Aggregates the demographic data by school and group. This is separate from results_by_site_and_group() because we want to aggregate by school vs. all others rather than for each school, and because we don’t want to break down results any further by any demographic characteristics.
Parameters
- data: dataframe
Dataframe containing pupil-level demographic data
- response_col: array
List of demographic columns to be aggregated
- labels: dictionary
Dictionary with response options for each variable
Returns
- result: dataframe
Dataframe with % responses to demographic questions, for each school, compared with all other schools
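The "school vs. all others" comparison can be sketched for a single school and a single demographic column as below. This is only an illustration of the comparison, not the package's code: school_vs_others, the output column names, the gender column and the "Other schools" label are hypothetical.

```python
import pandas as pd


def school_vs_others(data, col, options, school, site_col="school_lab"):
    """Hypothetical helper: % of each response option at one school,
    compared with pupils at all other schools combined."""
    is_school = data[site_col] == school
    rows = []
    for group, subset in [(school, data[is_school]),
                          ("Other schools", data[~is_school])]:
        # Proportion of each answer, keeping options that nobody chose
        pct = subset[col].value_counts(normalize=True).reindex(
            options, fill_value=0) * 100
        for answer, value in pct.items():
            rows.append({"school_group": group, "measure": col,
                         "answer": answer, "percentage": float(value)})
    return pd.DataFrame(rows)


# Made-up demographic data for three schools
pupils = pd.DataFrame({
    "school_lab": ["School A"] * 4 + ["School B"] * 2 + ["School C"] * 2,
    "gender": ["Girl", "Boy", "Girl", "Girl", "Boy", "Boy", "Girl", "Boy"],
})

result = school_vs_others(pupils, "gender", ["Girl", "Boy"], "School A")
print(result)
```

Repeating this for each school in turn would give one block of rows per school, each comparing that school with the remaining schools pooled together.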