Introduction to PHEindicatormethods

Georgina Anderson

2019-09-05

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
phe_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
phe_smr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
phe_isr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop     value    lowercl   uppercl confidence
#> 1 Area1 2015  47 128 0.3671875 0.28868455 0.4534300        95%
#> 2 Area2 2015  16 112 0.1428571 0.08988661 0.2195144        95%
#> 3 Area3 2015  78 182 0.4285714 0.35888349 0.5012123        95%
#> 4 Area4 2015  92 163 0.5644172 0.48768236 0.6381856        95%
#> 5 Area1 2016  74 145 0.5103448 0.42976859 0.5903871        95%
#> 6 Area2 2016  41 110 0.3727273 0.28809596 0.4659479        95%
#> 7 Area3 2016  26 117 0.2222222 0.15640397 0.3057012        95%
#> 8 Area4 2016  33 153 0.2156863 0.15790572 0.2873940        95%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop     value    lowercl   uppercl confidence
#> 1 Area1 2015  47 128 0.3671875 0.24906241 0.5037539      99.8%
#> 2 Area2 2015  16 112 0.1428571 0.06889889 0.2729332      99.8%
#> 3 Area3 2015  78 182 0.4285714 0.32157994 0.5426849      99.8%
#> 4 Area4 2015  92 163 0.5644172 0.44415166 0.6775525      99.8%
#> 5 Area1 2016  74 145 0.5103448 0.38544314 0.6339681      99.8%
#> 6 Area2 2016  41 110 0.3727273 0.24585624 0.5199312      99.8%
#> 7 Area3 2016  26 117 0.2222222 0.12707158 0.3592956      99.8%
#> 8 Area4 2016  33 153 0.2156863 0.13130903 0.3334695      99.8%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic
#> 1 Area1 2015  47 128 36.71875 28.868455 45.34300        95% percentage
#> 2 Area2 2015  16 112 14.28571  8.988661 21.95144        95% percentage
#> 3 Area3 2015  78 182 42.85714 35.888349 50.12123        95% percentage
#> 4 Area4 2015  92 163 56.44172 48.768236 63.81856        95% percentage
#> 5 Area1 2016  74 145 51.03448 42.976859 59.03871        95% percentage
#> 6 Area2 2016  41 110 37.27273 28.809596 46.59479        95% percentage
#> 7 Area3 2016  26 117 22.22222 15.640397 30.57012        95% percentage
#> 8 Area4 2016  33 153 21.56863 15.790572 28.73940        95% percentage
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic
#> 1 Area1 2015  47 128 36.71875 24.906241 50.37539      99.8% percentage
#> 2 Area2 2015  16 112 14.28571  6.889889 27.29332      99.8% percentage
#> 3 Area3 2015  78 182 42.85714 32.157994 54.26849      99.8% percentage
#> 4 Area4 2015  92 163 56.44172 44.415166 67.75525      99.8% percentage
#> 5 Area1 2016  74 145 51.03448 38.544314 63.39681      99.8% percentage
#> 6 Area2 2016  41 110 37.27273 24.585624 51.99312      99.8% percentage
#> 7 Area3 2016  26 117 22.22222 12.707158 35.92956      99.8% percentage
#> 8 Area4 2016  33 153 21.56863 13.130903 33.34695      99.8% percentage
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  47 128 36.71875 24.906241 50.37539
#> 2 Area2 2015  16 112 14.28571  6.889889 27.29332
#> 3 Area3 2015  78 182 42.85714 32.157994 54.26849
#> 4 Area4 2015  92 163 56.44172 44.415166 67.75525
#> 5 Area1 2016  74 145 51.03448 38.544314 63.39681
#> 6 Area2 2016  41 110 37.27273 24.585624 51.99312
#> 7 Area3 2016  26 117 22.22222 12.707158 35.92956
#> 8 Area4 2016  33 153 21.56863 13.130903 33.34695

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop    value   lowercl  uppercl confidence
#> 1 Area1 2015  47 128 36718.75 26977.233 48829.25        95%
#> 2 Area2 2015  16 112 14285.71  8160.276 23200.39        95%
#> 3 Area3 2015  78 182 42857.14 33875.592 53488.33        95%
#> 4 Area4 2015  92 163 56441.72 45498.775 69221.48        95%
#> 5 Area1 2016  74 145 51034.48 40071.490 64069.99        95%
#> 6 Area2 2016  41 110 37272.73 26744.610 50565.98        95%
#> 7 Area3 2016  26 117 22222.22 14512.612 32562.00        95%
#> 8 Area4 2016  33 153 21568.63 14844.428 30291.35        95%
#>         statistic method
#> 1 rate per 100000  Byars
#> 2 rate per 100000  Byars
#> 3 rate per 100000  Byars
#> 4 rate per 100000  Byars
#> 5 rate per 100000  Byars
#> 6 rate per 100000  Byars
#> 7 rate per 100000  Byars
#> 8 rate per 100000  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence    statistic
#> 1 Area1 2015  47 128 36.71875 22.342317 56.49356      99.8% rate per 100
#> 2 Area2 2015  16 112 14.28571  5.684786 29.17111      99.8% rate per 100
#> 3 Area3 2015  78 182 42.85714 29.399746 60.08691      99.8% rate per 100
#> 4 Area4 2015  92 163 56.44172 39.977631 77.11026      99.8% rate per 100
#> 5 Area1 2016  74 145 51.03448 34.631101 72.17572      99.8% rate per 100
#> 6 Area2 2016  41 110 37.27273 21.810353 59.02921      99.8% rate per 100
#> 7 Area3 2016  26 117 22.22222 11.111580 39.29392      99.8% rate per 100
#> 8 Area4 2016  33 153 21.56863 11.776009 35.90156      99.8% rate per 100
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  47 128 36.71875 22.342317 56.49356
#> 2 Area2 2015  16 112 14.28571  5.684786 29.17111
#> 3 Area3 2015  78 182 42.85714 29.399746 60.08691
#> 4 Area4 2015  92 163 56.44172 39.977631 77.11026
#> 5 Area1 2016  74 145 51.03448 34.631101 72.17572
#> 6 Area2 2016  41 110 37.27273 21.810353 59.02921
#> 7 Area3 2016  26 117 22.22222 11.111580 39.29392
#> 8 Area4 2016  33 153 21.56863 11.776009 35.90156

These functions can also return aggregate data if the input dataframes are grouped:



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1588    269585  542.    513.    573. 95%       
#>  2 Area1  2006 Male         2051    312726  682.    651.    713. 95%       
#>  3 Area1  2007 Fema~        1976    271347  748.    712.    785. 95%       
#>  4 Area1  2007 Male         2071    277302  741.    707.    776. 95%       
#>  5 Area1  2008 Fema~        1654    293895  597.    566.    629. 95%       
#>  6 Area1  2008 Male         1692    285377  604.    574.    636. 95%       
#>  7 Area1  2009 Fema~        1829    305132  659.    628.    692. 95%       
#>  8 Area1  2009 Male         1755    288565  598.    568.    629. 95%       
#>  9 Area1  2010 Fema~        2254    292523  722.    690.    755. 95%       
#> 10 Area1  2010 Male         1951    284716  671.    640.    704. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 x 8
#> # Groups:   area, year [20]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <fct> <int> <fct>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1588    269585  542.    513.    573.
#>  2 Area1  2006 Male          2051    312726  682.    651.    713.
#>  3 Area1  2007 Female        1976    271347  748.    712.    785.
#>  4 Area1  2007 Male          2071    277302  741.    707.    776.
#>  5 Area1  2008 Female        1654    293895  597.    566.    629.
#>  6 Area1  2008 Male          1692    285377  604.    574.    636.
#>  7 Area1  2009 Female        1829    305132  659.    628.    692.
#>  8 Area1  2009 Male          1755    288565  598.    568.    629.
#>  9 Area1  2010 Female        2254    292523  722.    690.    755.
#> 10 Area1  2010 Male          1951    284716  671.    640.    704.
#> # ... with 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1588    269585  542.    513.    573. 95%       
#>  2 Area1  2006 Male         2051    312726  682.    651.    713. 95%       
#>  3 Area1  2007 Fema~        1976    271347  748.    712.    785. 95%       
#>  4 Area1  2007 Male         2071    277302  741.    707.    776. 95%       
#>  5 Area1  2008 Fema~        1654    293895  597.    566.    629. 95%       
#>  6 Area1  2008 Male         1692    285377  604.    574.    636. 95%       
#>  7 Area1  2009 Fema~        1829    305132  659.    628.    692. 95%       
#>  8 Area1  2009 Male         1755    288565  598.    568.    629. 95%       
#>  9 Area1  2010 Fema~        2254    292523  722.    690.    755. 95%       
#> 10 Area1  2010 Male         1951    284716  671.    640.    704. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1588    269585  542.    513.    573. 95%       
#>  2 Area1  2006 Male         2051    312726  682.    651.    713. 95%       
#>  3 Area1  2007 Fema~        1976    271347  748.    712.    785. 95%       
#>  4 Area1  2007 Male         2071    277302  741.    707.    776. 95%       
#>  5 Area1  2008 Fema~        1654    293895  597.    566.    629. 95%       
#>  6 Area1  2008 Male         1692    285377  604.    574.    636. 95%       
#>  7 Area1  2009 Fema~        1829    305132  659.    628.    692. 95%       
#>  8 Area1  2009 Male         1755    288565  598.    568.    629. 95%       
#>  9 Area1  2010 Fema~        2254    292523  722.    690.    755. 95%       
#> 10 Area1  2010 Male         1951    284716  671.    640.    704. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1034    216699  501.    470.    533. 95%       
#>  2 Area1  2006 Male         1728    252257  708.    675.    743. 95%       
#>  3 Area1  2007 Fema~        1526    210474  740.    702.    780. 95%       
#>  4 Area1  2007 Male         1570    215934  725.    689.    763. 95%       
#>  5 Area1  2008 Fema~        1271    235630  593.    559.    628. 95%       
#>  6 Area1  2008 Male         1410    226722  620.    588.    654. 95%       
#>  7 Area1  2009 Fema~        1610    241629  691.    657.    727. 95%       
#>  8 Area1  2009 Male         1344    218049  602.    569.    635. 95%       
#>  9 Area1  2010 Fema~        1584    235585  667.    634.    702. 95%       
#> 10 Area1  2010 Male         1599    230790  680.    646.    714. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>
    
# calculate separate dsrs for persons for each area and year)
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(area, year) %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 x 10
#> # Groups:   area [4]
#>    area   year total_count total_pop value lowercl uppercl confidence
#>    <fct> <int>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006        3639    582311  601.    580.    623. 95%       
#>  2 Area1  2007        4047    548649  739.    715.    764. 95%       
#>  3 Area1  2008        3346    579272  587.    566.    608. 95%       
#>  4 Area1  2009        3584    593697  624.    603.    646. 95%       
#>  5 Area1  2010        4205    577239  698.    675.    721. 95%       
#>  6 Area2  2006        3732    595337  620.    599.    642. 95%       
#>  7 Area2  2007        3656    576631  671.    648.    694. 95%       
#>  8 Area2  2008        3367    604795  604.    583.    626. 95%       
#>  9 Area2  2009        3611    563165  652.    629.    675. 95%       
#> 10 Area2  2010        3159    602672  513.    494.    532. 95%       
#> 11 Area3  2006        4166    574100  730.    707.    755. 95%       
#> 12 Area3  2007        3791    585599  658.    636.    681. 95%       
#> 13 Area3  2008        3907    560264  728.    704.    753. 95%       
#> 14 Area3  2009        3744    602071  613.    592.    634. 95%       
#> 15 Area3  2010        4227    589856  747.    723.    771. 95%       
#> 16 Area4  2006        3545    553076  662.    639.    685. 95%       
#> 17 Area4  2007        3911    571843  696.    673.    720. 95%       
#> 18 Area4  2008        3860    582601  693.    670.    716. 95%       
#> 19 Area4  2009        3349    588160  602.    581.    624. 95%       
#> 20 Area4  2010        4227    571729  724.    701.    748. 95%       
#> # ... with 2 more variables: statistic <chr>, method <chr>

Execute phe_smr and phe_isr

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the phe_smr and phe_isr functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (isr only), the smr or isr, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

Here are some example code chunks to demonstrate the phe_smr function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_smr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1588    1760. 0.902   0.858   0.948 95%       
#>  2 Area1  2006 Male      2051    2047. 1.00    0.959   1.05  95%       
#>  3 Area1  2007 Fema~     1976    1827. 1.08    1.03    1.13  95%       
#>  4 Area1  2007 Male      2071    1850. 1.12    1.07    1.17  95%       
#>  5 Area1  2008 Fema~     1654    1968. 0.841   0.801   0.882 95%       
#>  6 Area1  2008 Male      1692    1886. 0.897   0.855   0.941 95%       
#>  7 Area1  2009 Fema~     1829    2024. 0.904   0.863   0.946 95%       
#>  8 Area1  2009 Male      1755    1913. 0.917   0.875   0.961 95%       
#>  9 Area1  2010 Fema~     2254    1932. 1.17    1.12    1.22  95%       
#> 10 Area1  2010 Male      1951    1879. 1.04    0.993   1.09  95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>

# calculate the same smrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    phe_smr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1588    1760. 0.902   0.858   0.948 95%       
#>  2 Area1  2006 Male      2051    2047. 1.00    0.959   1.05  95%       
#>  3 Area1  2007 Fema~     1976    1827. 1.08    1.03    1.13  95%       
#>  4 Area1  2007 Male      2071    1850. 1.12    1.07    1.17  95%       
#>  5 Area1  2008 Fema~     1654    1968. 0.841   0.801   0.882 95%       
#>  6 Area1  2008 Male      1692    1886. 0.897   0.855   0.941 95%       
#>  7 Area1  2009 Fema~     1829    2024. 0.904   0.863   0.946 95%       
#>  8 Area1  2009 Male      1755    1913. 0.917   0.875   0.961 95%       
#>  9 Area1  2010 Fema~     2254    1932. 1.17    1.12    1.22  95%       
#> 10 Area1  2010 Male      1951    1879. 1.04    0.993   1.09  95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>

# calculate separate smrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(year) %>%
    phe_smr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 6
#>    year observed expected value lowercl uppercl
#>   <int>    <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    15082   15082  1       0.984   1.02 
#> 2  2007    15405   15086. 1.02    1.01    1.04 
#> 3  2008    14480   15383. 0.941   0.926   0.957
#> 4  2009    14288   15485. 0.923   0.908   0.938
#> 5  2010    15818   15432. 1.03    1.01    1.04

The phe_isr function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate isrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_isr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   observed expected ref_rate value lowercl uppercl
#>    <fct> <int> <fct>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Fema~     1588    1760.     654.  590.    562.    620.
#>  2 Area1  2006 Male      2051    2047.     654.  656.    628.    685.
#>  3 Area1  2007 Fema~     1976    1827.     654.  708.    677.    740.
#>  4 Area1  2007 Male      2071    1850.     654.  732.    701.    765.
#>  5 Area1  2008 Fema~     1654    1968.     654.  550.    524.    577.
#>  6 Area1  2008 Male      1692    1886.     654.  587.    559.    616.
#>  7 Area1  2009 Fema~     1829    2024.     654.  591.    565.    619.
#>  8 Area1  2009 Male      1755    1913.     654.  600.    573.    629.
#>  9 Area1  2010 Fema~     2254    1932.     654.  763.    732.    796.
#> 10 Area1  2010 Male      1951    1879.     654.  680.    650.    710.
#> # ... with 30 more rows, and 3 more variables: confidence <chr>,
#> #   statistic <chr>, method <chr>

# calculate the same isrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    phe_isr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   observed expected ref_rate value lowercl uppercl
#>    <fct> <int> <fct>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Fema~     1588    1760.     654.  590.    562.    620.
#>  2 Area1  2006 Male      2051    2047.     654.  656.    628.    685.
#>  3 Area1  2007 Fema~     1976    1827.     654.  708.    677.    740.
#>  4 Area1  2007 Male      2071    1850.     654.  732.    701.    765.
#>  5 Area1  2008 Fema~     1654    1968.     654.  550.    524.    577.
#>  6 Area1  2008 Male      1692    1886.     654.  587.    559.    616.
#>  7 Area1  2009 Fema~     1829    2024.     654.  591.    565.    619.
#>  8 Area1  2009 Male      1755    1913.     654.  600.    573.    629.
#>  9 Area1  2010 Fema~     2254    1932.     654.  763.    732.    796.
#> 10 Area1  2010 Male      1951    1879.     654.  680.    650.    710.
#> # ... with 30 more rows, and 3 more variables: confidence <chr>,
#> #   statistic <chr>, method <chr>

# calculate separate isrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(year) %>%
    phe_isr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 7
#>    year observed expected ref_rate value lowercl uppercl
#>   <int>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    15082   15082      654.  654.    644.    665.
#> 2  2007    15405   15086.     654.  668.    658.    679.
#> 3  2008    14480   15383.     654.  616.    606.    626.
#> 4  2009    14288   15485.     654.  604.    594.    614.
#> 5  2010    15818   15432.     654.  671.    660.    681.