R-Package “plnr” (version 2022.6.8) Published on CRAN

“plnr” is a system to plan analyses within the mental model where you have one (or more) datasets and want to run either A) the same function multiple times with different arguments, or B) multiple functions. This is appropriate when you have multiple strata (e.g. locations, age groups) that you want to apply the same function to, or you have multiple variables (e.g. exposures) that you want to apply the same statistical method to, or when you are creating the output for a report and you need multiple different tables or graphs.

Author
Affiliation
Published

June 8, 2022

This blog post has also been posted here.

Changes since last version

The R-package “plnr” (version 2022.6.8) has been published on CRAN. “plnr” is a part of the splverse, a set of R packages developed to help solve problems that frequently occur when performing infectious disease surveillance. “plnr” has two vignettes that briefly show the mental model behind “plnr”:

Concept

Broad technical terms
Object Description
argset A named list containing a set of arguments.
analysis

These are the fundamental units that are scheduled in plnr:

  • 1 argset
  • 1 (action) function that takes two arguments
    1. data (named list)
    2. argset (named list)
plan

This is the overarching “scheduler”:

  • 1 data pull
  • 1 list of analyses
Different types of plans
Plan Type Description
Single-function plan Same action function applied multiple times with different argsets applied to the same datasets.
Multi-function plan Different action functions applied to the same datasets.
Plan Examples
Plan Type Example
Single-function plan Multiple strata (e.g. locations, age groups) that you need to apply the same function to to (e.g. outbreak detection, trend detection, graphing).
Single-function plan Multiple variables (e.g. multiple outcomes, multiple exposures) that you need to apply the same statistical methods to (e.g. regression models, correlation plots).
Multi-function plan Creating the output for a report (e.g. multiple different tables and graphs).

In brief, we work within the mental model where we have one (or more) datasets and we want to run multiple analyses on these datasets. These multiple analyses can take the form of:

  • Single-function plans: One action function (e.g. table_1) called multiple times with different argsets (e.g. year=2019, year=2020).
  • Multi-function plans: Multiple action functions (e.g. table_1, table_2) called multiple times with different argsets (e.g. table_1: year=2019, while for table_2: year=2019 and year=2020)

By demanding that all analyses use the same data sources we can:

  1. Be efficient with requiring the minimal amount of data-pulling (this only happens once at the start).
  2. Better enforce the concept that data-cleaning and analysis should be completely separate.

By demanding that all analysis functions only use two arguments (data and argset) we can:

  1. Reduce mental fatigue by working within the same mental model for each analysis.
  2. Make it easier for analyses to be exchanged with each other and iterated on.
  3. Easily schedule the running of each analysis.

By including all of this in one Plan class, we can easily maintain a good overview of all the analyses (i.e. outputs) that need to be run.