Bug fixes

  • $exclude_and_track() now errors when a predicate evaluates to a non-logical result (character, numeric, factor, or a logical matrix) instead of silently coercing it with as.logical(). Previously, a numeric predicate excluded every non-zero row and a character predicate logged a phantom exclusion, corrupting the audit trail. The documented contract – predicates return TRUE/FALSE, NA means keep – is now enforced.
  • Warm-cache replay: changing a later exclusion that a non-root cohort defines itself (as opposed to one inherited from its parent) no longer leaves the stale step in the log. .invalidate_from() treated its cursor as a branch-local count while the caller passed an absolute log index, so the inherited prefix was counted twice and the truncation removed too few steps. Root cohorts were unaffected (zero prefix).
  • Warm-cache replay: moving $new_cohort() before a parent exclusion (an order that errors on a cold run via the freeze rule) now errors instead of silently reusing the child’s stale post-exclusion snapshot. The parent’s replay position is checked against the child’s branch point; on mismatch you are directed to $invalidate() (or to delete the cache).
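
The predicate contract from the first item above can be sketched in base R. This is a minimal illustration, not the package's internal code: check_predicate() is a hypothetical helper name, and the error wording is invented for the example.

```r
# Hypothetical helper (not part of the package): validates the value an
# exclusion predicate evaluated to, then applies the "NA means keep" rule.
check_predicate <- function(res) {
  # Reject character, numeric, and factor results, and logical matrices.
  if (!is.logical(res) || !is.null(dim(res))) {
    stop("predicate must return a plain logical vector, got: ",
         paste(class(res), collapse = "/"), call. = FALSE)
  }
  # NA means keep: only rows that are definitely TRUE are excluded.
  !is.na(res) & res
}

age <- c(15, 40, NA, 70)

excluded <- check_predicate(age < 18)
excluded
# TRUE FALSE FALSE FALSE  (the NA row is kept)

# The old silent coercion would have excluded every non-zero row:
as.logical(c(0, 2, 5))
# FALSE TRUE TRUE

# A numeric result is now an error rather than being coerced.
numeric_result <- try(check_predicate(age), silent = TRUE)
inherits(numeric_result, "try-error")
# TRUE
```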

Initial release.

Display labels and API simplification

  • $load() is no longer a public method; the constructor is the only way to install the base data. Use CohortPipeline$new(dt) (with optional cache_file).
  • CohortPipeline$new() and $new_cohort() accept a label = argument: a human-readable display label used by CONSORT diagrams and $list_cohorts(). Cohort identifiers ("root", "adults_female") remain how you reference cohorts in code; labels are presentation only.
  • Re-issuing $new_cohort() with a different label silently updates the field without invalidating the cache.

Cache integrity

  • The cache now stores a digest::digest(dt, algo = "spookyhash") hash of the base table alongside the operation log. On warm load with a supplied dt, the hash is recomputed and compared; a mismatch warns loudly and rebuilds from scratch. Catches silent data updates that would otherwise produce wrong results from stale cached state. Adds digest to Imports.
  • Cache schema version bumped to 3.
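
The hash comparison can be sketched as follows, assuming the digest package is installed. Only the digest::digest() call is taken from the entry above; base_hash(), check_base_hash(), the example table, and the warning text are hypothetical and stand in for whatever the pipeline does internally.

```r
library(digest)

# Same call as named in the changelog entry.
base_hash <- function(dt) digest::digest(dt, algo = "spookyhash")

# Hypothetical check: TRUE when the supplied table matches the cached hash,
# otherwise warn and return FALSE so the caller can rebuild from scratch.
check_base_hash <- function(dt, cached_hash) {
  ok <- identical(base_hash(dt), cached_hash)
  if (!ok) {
    warning("base table differs from the cached one; rebuilding from scratch",
            call. = FALSE)
  }
  ok
}

dt_saved <- data.frame(id = 1:3, age = c(34, 51, 67))
cached_hash <- base_hash(dt_saved)   # stored alongside the operation log

dt_now <- dt_saved
dt_now$age[2] <- 52                  # a silent upstream data correction

unchanged <- check_base_hash(dt_saved, cached_hash)  # TRUE, no warning
changed   <- check_base_hash(dt_now, cached_hash)    # FALSE, with a warning
```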

Incremental caching (new)

  • CohortPipeline$new() accepts a cache_file = argument. When the file exists, the pipeline is restored from it and operations replay against the cached log; matching operations are no-ops (cache hits), divergent operations recompute only the affected cohort and its descendants. cp$save() writes the current state to the cache file.
  • $set_artifact() gains an argset parameter and accepts the new 3-argument fn signature function(dt, sib, argset). The cache key is (name, from, body(fn), argset). The legacy function(dt, sib) signature is still accepted through a compatibility shim.
  • New cp$invalidate(cohort, artifact) to manually drop a cached cohort or artifact when a helper changes that the cache key cannot detect.
  • $new_cohort(name, from) is now idempotent on identical (name, parent) pairs (required for cache replay). It still errors loudly if the same name is reissued with a different parent.
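
A base R sketch of why the (name, from, body(fn), argset) key detects some changes but not others, which is the case $invalidate() exists for. artifact_key(), scale_age(), and the artifact functions are hypothetical names for illustration; the package's actual key construction may differ.

```r
# Hypothetical key builder mirroring the tuple named in the entry above.
artifact_key <- function(name, from, fn, argset = list()) {
  list(name = name, from = from, body = body(fn), argset = argset)
}

scale_age <- function(x) x / 10   # helper called from inside the artifact fn

fn_v1 <- function(dt, sib, argset) scale_age(dt$age) * argset$k
fn_v2 <- function(dt, sib, argset) scale_age(dt$age) + argset$k

k1 <- artifact_key("age_scaled", "adults", fn_v1, list(k = 2))
k2 <- artifact_key("age_scaled", "adults", fn_v2, list(k = 2))
k3 <- artifact_key("age_scaled", "adults", fn_v1, list(k = 3))

identical(k1, k2)  # FALSE: body(fn) changed, so the artifact recomputes
identical(k1, k3)  # FALSE: argset changed, so the artifact recomputes

# Redefining the helper changes the result but not the key: body(fn_v1)
# still only says "call scale_age". This is the undetectable case.
scale_age <- function(x) x / 100
k4 <- artifact_key("age_scaled", "adults", fn_v1, list(k = 2))
identical(k1, k4)  # TRUE: the cache cannot see the helper change
```

Passing values through argset rather than capturing them in the function's closure keeps them visible to the key.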

Branding

  • Hex sticker logo at man/figures/logo.png, generated by data-raw/logo.R and matching the house style of sibling packages (plnr, attrib, org).

Features

  • CohortPipeline R6 class for cohort construction with full provenance:
    • Branched cohort trees via $new_cohort(name, from).
    • Per-step exclusion logging via $exclude_and_track(branch, reason, expr_str).
    • Cached derived artifacts via $set_artifact(name, from, fn).
    • Schema validation via $declare_schema() and $validate().
    • CONSORT diagram generation via $plot() (auto-discovered) or $draw_consort_panels() (manual layouts).

The freeze rule

A cohort becomes frozen the first time another cohort branches from it, or the first time an artifact is set on it. After freezing, $exclude_and_track() on that cohort errors. The rule guarantees that a cohort’s name maps to exactly one definition forever and that cached artifacts stay consistent with the included rows that produced them. Multi-way forks are unaffected: further cohorts can still branch from a cohort that is already frozen.
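
The rule can be modelled with a toy registry in base R. This is a sketch of the behaviour described above, not the R6 implementation: new_registry(), branch(), and add_exclusion() are hypothetical names, and freezing via artifacts is left out for brevity.

```r
# Toy model: each cohort records its parent, a frozen flag, and its exclusions.
new_registry <- function() {
  reg <- new.env()
  reg$cohorts <- list(
    root = list(parent = NA_character_, frozen = FALSE, exclusions = character())
  )
  reg
}

# Branching freezes the parent; the new child starts unfrozen.
branch <- function(reg, name, from) {
  stopifnot(from %in% names(reg$cohorts))
  reg$cohorts[[from]]$frozen <- TRUE
  reg$cohorts[[name]] <- list(parent = from, frozen = FALSE,
                              exclusions = character())
  invisible(reg)
}

# Exclusions are only allowed on cohorts that are not yet frozen.
add_exclusion <- function(reg, cohort, reason) {
  if (reg$cohorts[[cohort]]$frozen) {
    stop("cohort '", cohort, "' is frozen", call. = FALSE)
  }
  reg$cohorts[[cohort]]$exclusions <-
    c(reg$cohorts[[cohort]]$exclusions, reason)
  invisible(reg)
}

reg <- new_registry()
add_exclusion(reg, "root", "missing age")  # allowed: root is not frozen yet
branch(reg, "adults", from = "root")       # freezes root
branch(reg, "minors", from = "root")       # a second fork from root is fine
add_exclusion(reg, "adults", "age < 18")   # allowed: adults is an unfrozen leaf

late_edit <- try(add_exclusion(reg, "root", "late edit"), silent = TRUE)
inherits(late_edit, "try-error")
# TRUE
```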

Implementation notes

  • Shared base table + per-branch integer index of included rows. Branching never copies the data values.
  • Integer status codes (per branch) replace in-band string status columns. The user’s data table is never mutated.
  • Exclusion logs accumulate in a list and materialize on read, avoiding quadratic rbind growth.
  • $get_included() always returns an independent copy.
  • $get_everyone() reconstructs a per-branch full view (rows + a .cohort_status column) so the returned object is meaningful for any branch in the tree.
  • $plot() defaults to plotting every cohort regardless of freeze state, so unfrozen leaf cohorts (the typical analytic endpoint) are always shown.
  • CONSORT rendering is now implemented directly in grid instead of via the consort package. Arrows now terminate exactly at side-box edges (the consort rendering placed them inside the box), and multi-panel layouts top-align panel content. The consort package has been dropped from Imports.
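
The first and third notes above (shared base table with per-branch integer indices, and a log that accumulates in a list) can be illustrated with a small base R sketch. All names here (base_dt, state, exclude_rows) are hypothetical and the data are made up; the package's internals are not shown in this changelog.

```r
base_dt <- data.frame(
  id  = 1:6,
  age = c(12, 25, 40, 17, 66, 33),
  sex = c("F", "M", "F", "F", "M", "F")
)

# Shared state: one integer index of included rows per branch, plus a log
# that grows by appending list elements rather than by repeated rbind().
state <- new.env()
state$idx <- list(root = seq_len(nrow(base_dt)))
state$log <- list()

exclude_rows <- function(branch, reason, pred) {
  rows <- state$idx[[branch]]
  hit <- pred(base_dt[rows, , drop = FALSE])
  hit <- !is.na(hit) & hit                      # NA means keep
  state$log[[length(state$log) + 1L]] <- data.frame(
    branch = branch, reason = reason, n_excluded = sum(hit)
  )
  state$idx[[branch]] <- rows[!hit]             # base_dt itself is never modified
  invisible(sum(hit))
}

exclude_rows("root", "under 18", function(d) d$age < 18)

# Branching copies only the integer index, never the data values.
state$idx$adults_female <- state$idx$root
exclude_rows("adults_female", "not female", function(d) d$sex != "F")

# The log is bound into a data frame once, when it is read.
exclusion_log <- do.call(rbind, state$log)

included_ids <- base_dt[state$idx$adults_female, "id"]
included_ids
# 3 6
```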