Initial release.

Display labels and API simplification

  • load() is no longer a public method; the constructor is the only way to install the base data. Use CohortPipeline$new(dt) (with optional cache_file).
  • CohortPipeline$new() and new_cohort() accept a label = argument: a human-readable display label used by CONSORT diagrams and list_cohorts(). Cohort identifiers ("root", "adults_female") remain how you reference cohorts in code; labels are presentation only.
  • Re-issuing new_cohort() with a different label silently updates the label without invalidating the cache.

Incremental caching (new)

  • CohortPipeline$new() accepts a cache_file = argument. When the file exists, the pipeline is restored from it and subsequent operations replay against the cached log: matching operations are no-ops (cache hits), while divergent operations recompute only the affected cohort and its descendants. cp$save() writes the current state to the cache file.
  • set_artifact() gains an argset parameter and accepts the new 3-argument fn signature function(dt, sib, argset). The cache key is (name, from, body(fn), argset). The legacy function(dt, sib) signature is still accepted (compat shim).
  • New cp$invalidate(cohort, artifact) method for manually dropping a cached cohort or artifact when a helper changes in a way the cache key cannot detect.
  • new_cohort(name, from) is now idempotent on identical (name, parent) pairs (required for cache replay). It still errors loudly if the same name is reissued with a different parent.
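
The cache-key behaviour described above can be illustrated with a small base-R sketch. This is not the package's implementation: make_key() and helper() are hypothetical names, and comparing keys with identical() is an assumption made for illustration. The sketch shows why a change to argset or to the body of fn is detected, while a change inside a helper called by fn is not, which is the case cp$invalidate() is meant for.

```r
# Hypothetical helper: bundle the four key components named in the notes.
make_key <- function(name, from, fn, argset = list()) {
  list(name = name, from = from, body = body(fn), argset = argset)
}

# A helper that the artifact function calls; its code is NOT part of the key.
helper <- function(x) x * 2

fn_v1 <- function(dt, sib, argset) helper(dt$age) + argset$offset
fn_v2 <- function(dt, sib, argset) helper(dt$age) - argset$offset

k1 <- make_key("score", "adults_female", fn_v1, list(offset = 1))
k2 <- make_key("score", "adults_female", fn_v1, list(offset = 1))
k3 <- make_key("score", "adults_female", fn_v1, list(offset = 2))
k4 <- make_key("score", "adults_female", fn_v2, list(offset = 1))

same_key       <- identical(k1, k2)   # same inputs: would be a cache hit
argset_changed <- !identical(k1, k3)  # different argset: key differs
body_changed   <- !identical(k1, k4)  # different body(fn): key differs

# Redefine the helper: body(fn_v1) is unchanged, so the key is unchanged too.
helper <- function(x) x * 3
k5 <- make_key("score", "adults_female", fn_v1, list(offset = 1))
helper_change_missed <- identical(k1, k5)
```

In this sketch all four flags come out TRUE. The last one is the situation in which a manual invalidation is needed, because the key alone gives no signal that the result is stale.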

Branding

  • Hex sticker logo at man/figures/logo.png, generated by data-raw/logo.R and matching the house style of sibling packages (plnr, attrib, org).

Features

  • CohortPipeline R6 class for cohort construction with full provenance:
    • Branched cohort trees via $new_cohort(name, from).
    • Per-step exclusion logging via $exclude_and_track(branch, reason, expr_str).
    • Cached derived artifacts via $set_artifact(name, from, fn).
    • Schema validation via $declare_schema() and $validate().
    • CONSORT diagram generation via $plot() (auto-discovered) or $draw_consort_panels() (manual layouts).

The freeze rule

A cohort becomes frozen the first time another cohort branches from it, or the first time an artifact is set on it. After freezing, $exclude_and_track() on that cohort errors. The rule guarantees a cohort’s name maps to exactly one definition forever and that cached artifacts stay consistent with the included rows that produced them. Multi-way forks are unaffected: a frozen cohort can still be branched from any number of times.
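
As a rough illustration of the rule, here is a toy registry in base R. It is a sketch, not the CohortPipeline class: new_registry(), branch() and exclude() are hypothetical stand-ins, and the artifact trigger is left out for brevity.

```r
# Toy registry: one entry per cohort, each with a parent and a frozen flag.
new_registry <- function() {
  reg <- new.env()
  reg$cohorts <- list(root = list(parent = NA_character_, frozen = FALSE))
  reg
}

# Branching a child freezes the parent.
branch <- function(reg, name, from) {
  stopifnot(from %in% names(reg$cohorts))
  reg$cohorts[[from]]$frozen <- TRUE
  reg$cohorts[[name]] <- list(parent = from, frozen = FALSE)
  invisible(reg)
}

# Exclusions are refused once the cohort is frozen.
exclude <- function(reg, cohort, reason) {
  if (reg$cohorts[[cohort]]$frozen) {
    stop("cohort '", cohort, "' is frozen; cannot apply exclusion: ", reason)
  }
  invisible(reg)
}

reg <- new_registry()
exclude(reg, "root", "missing age")      # allowed: root is not frozen yet
branch(reg, "adults", from = "root")     # first branch freezes root
branch(reg, "children", from = "root")   # a second fork from root is still fine

# A late exclusion on the frozen cohort now fails; capture the message.
err <- tryCatch(
  exclude(reg, "root", "late exclusion"),
  error = function(e) conditionMessage(e)
)

exclude(reg, "adults", "age < 18")       # leaf cohorts remain editable
```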

Implementation notes

  • Shared base table + per-branch integer index of included rows. Branching never copies the data values.
  • Integer status codes (per branch) replace in-band string status columns. The user’s data table is never mutated.
  • Exclusion logs accumulate in a list and materialize on read, avoiding quadratic rbind growth.
  • $get_included() always returns an independent copy.
  • $get_everyone() reconstructs a per-branch full view (rows + a .cohort_status column) so the returned object is meaningful for any branch in the tree.
  • $plot() defaults to plotting every cohort regardless of freeze state, so unfrozen leaf cohorts (the typical analytic endpoint) are always shown.
  • CONSORT rendering is now implemented directly in grid instead of via the consort package. Arrows now terminate exactly at side-box edges (the consort rendering placed them inside the box), and multi-panel layouts top-align panel content. The consort package has been dropped from Imports.
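
The first three notes above (shared base table with per-branch indices, integer status codes, list-accumulated logs) can be sketched in base R as follows. This is an illustrative toy under stated assumptions, not the package's internals: the state list, the exclude() function, the predicate-style keep argument and the 0/1 status coding are all hypothetical.

```r
# Shared base table; it is read but never modified below.
base_dt <- data.frame(
  id  = 1:6,
  age = c(15, 34, 70, 22, 41, 9),
  sex = c("F", "M", "F", "F", "M", "F")
)

# Each branch holds only integer row positions into base_dt.
state <- list(idx = list(root = seq_len(nrow(base_dt))), log = list())

# Apply one exclusion: shrink the branch index and append one log entry.
exclude <- function(state, branch, reason, keep) {
  before <- state$idx[[branch]]
  after  <- before[keep(base_dt[before, ])]
  entry  <- data.frame(branch = branch, reason = reason,
                       n_excluded = length(before) - length(after))
  state$log[[length(state$log) + 1L]] <- entry   # list append, no rbind here
  state$idx[[branch]] <- after
  state
}

state <- exclude(state, "root", "age < 18", function(d) d$age >= 18)

# Branching copies the index vector only, not the data values.
state$idx$adults_female <- state$idx$root
state <- exclude(state, "adults_female", "not female", function(d) d$sex == "F")

# Per-branch integer status (assumed coding: 0 = included, 1 = excluded).
status <- ifelse(seq_len(nrow(base_dt)) %in% state$idx$adults_female, 0L, 1L)

# Materialize the log once, on read.
exclusion_log <- do.call(rbind, state$log)
included      <- base_dt[state$idx$adults_female, ]
```

Appending entries to a list and binding them once on read keeps the cost of logging linear in the number of steps, whereas calling rbind after every exclusion would re-copy the growing log each time.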