prune¶
Create a "pruned" dataset zip for sharing or archiving, using a pruner spec.
The goal of brkraw prune is to make Paravision datasets easier to share by:
- keeping only the files you need (or dropping sensitive/unnecessary files)
- optionally stripping JCAMP comment lines ($$ ...)
- optionally editing or deleting specific JCAMP parameters (via update_params)
- producing a reproducible sidecar (.prune.yaml) describing what was done
This is especially useful when you want to share a dataset with collaborators without exposing private metadata or irrelevant files.
Basic usage¶
Prune a dataset using a spec file path:
brkraw prune /path/to/dataset --spec /path/to/prune_spec.yaml
Use an installed pruner spec by name:
brkraw prune /path/to/dataset --spec-name minimal_share
Write to a specific zip path:
brkraw prune /path/to/dataset --spec prune_spec.yaml --output out.zip
What a pruner spec controls¶
A pruner spec is a YAML mapping that defines:
- which files to keep or drop (files + mode)
- optional directory-level filters (dirs)
- optional JCAMP edits (update_params)
- optional root folder handling inside the zip (add_root, root_name)
- optional comment stripping for JCAMP files (strip_jcamp_comments)
files is always required and must contain at least one selector.
Selectors are matched by either:
- full dataset-relative path (e.g. pdata/1/visu_pars)
- basename only (e.g. visu_pars)
keep vs drop¶
mode: keep¶
Only files matching files are included.
Example:
mode: keep
files:
- visu_pars
- reco
- method
- acqp
mode: drop¶
Files matching files are excluded, everything else is included.
Example:
mode: drop
files:
- subject
- patient
- private_notes.txt
Notes:
- The selection is evaluated after directory rules (if any).
- If no files remain after applying rules, the prune fails.
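The keep/drop semantics can be sketched in a few lines of Python. This is only an illustration of the rules described above, not BrkRaw's implementation: the helper names `matches` and `select_files` are hypothetical, and the sketch assumes a full-path selector must equal the dataset-relative path exactly.

```python
from pathlib import PurePosixPath


def matches(path: str, selectors: list[str]) -> bool:
    """True if any selector equals the dataset-relative path or its basename."""
    basename = PurePosixPath(path).name
    return any(sel == path or sel == basename for sel in selectors)


def select_files(paths: list[str], files: list[str], mode: str) -> list[str]:
    """Apply keep/drop semantics and fail if nothing is left."""
    if not files:
        raise ValueError("files must contain at least one selector")
    if mode == "keep":
        selected = [p for p in paths if matches(p, files)]
    elif mode == "drop":
        selected = [p for p in paths if not matches(p, files)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if not selected:
        raise ValueError("no files remain after applying rules")
    return selected


# Made-up dataset layout, for illustration only.
paths = [
    "subject",
    "3/method",
    "3/acqp",
    "3/fid",
    "3/pdata/1/visu_pars",
    "3/pdata/2/visu_pars",
]
kept = select_files(paths, ["method", "3/pdata/1/visu_pars"], "keep")
dropped = select_files(paths, ["subject", "fid"], "drop")
print(kept)
print(dropped)
```

Note how the basename selector method would match a method file in every scan, while the full-path selector pins one specific visu_pars.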
Directory rules (dirs)¶
dirs allows filtering by directory names at specific path levels.
Each rule is a mapping:
- level: integer (1-based)
- dirs: list of directory names allowed or disallowed (depends on mode)
Example: keep only scans 3 and 5 (level 1 is usually the scan folder level)
dirs:
- level: 1
  dirs: [3, 5]
Example: keep only reco folders 1 and 2 (level 3 is often the reco folder level under pdata)
dirs:
- level: 3
  dirs: [1, 2]
CLI overrides:
- --scan-ids overrides a level=1 dirs rule
- --reco-ids overrides a level=3 dirs rule
Examples:
brkraw prune /path/to/dataset --spec prune.yaml --scan-ids 3 5
brkraw prune /path/to/dataset --spec prune.yaml --reco-ids 1,2
Notes:
- The CLI override rules are applied as:
  - scan_ids: level=1
  - reco_ids: level=3
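As a rough mental model, level-based filtering and the CLI overrides might behave like the Python sketch below. It is not BrkRaw's code: `passes_dir_rules` and `apply_cli_overrides` are hypothetical names, and the sketch assumes that a rule is skipped for paths with no directory at that level (so a root-level file such as subject is unaffected by a level=1 rule). It also compares names as strings, since YAML loads 3 as an integer.

```python
from pathlib import PurePosixPath


def passes_dir_rules(path: str, rules: list[dict], mode: str = "keep") -> bool:
    """Check a dataset-relative path against 1-based directory-level rules."""
    dir_parts = PurePosixPath(path).parts[:-1]  # last part is the file name
    for rule in rules:
        level = int(rule["level"])
        if level > len(dir_parts):
            continue  # assumption: no directory at this level, rule not applied
        names = {str(d) for d in rule["dirs"]}
        listed = dir_parts[level - 1] in names
        if mode == "keep" and not listed:
            return False  # keep: listed names are the only ones allowed
        if mode == "drop" and listed:
            return False  # drop: listed names are disallowed
    return True


def apply_cli_overrides(rules, scan_ids=None, reco_ids=None):
    """Replace the level=1 / level=3 rules when scan_ids / reco_ids are given."""
    overrides = {1: scan_ids, 3: reco_ids}
    result = [r for r in rules if overrides.get(int(r["level"])) is None]
    for level, ids in overrides.items():
        if ids is not None:
            result.append({"level": level, "dirs": list(ids)})
    return result


rules = [{"level": 1, "dirs": [3]}, {"level": 3, "dirs": [1]}]
paths = [
    "subject",
    "3/method",
    "3/pdata/1/visu_pars",
    "3/pdata/2/visu_pars",
    "5/method",
]
kept = [p for p in paths if passes_dir_rules(p, rules)]
overridden = apply_cli_overrides(rules, scan_ids=[5])
kept_after = [p for p in paths if passes_dir_rules(p, overridden)]
print(kept)
print(kept_after)
```

Here the spec's level=1 rule selects scan 3, and passing scan_ids=[5] swaps it for scan 5 while the level=3 rule is left in place.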
JCAMP parameter edits (update_params)¶
update_params allows you to edit or delete JCAMP parameter keys in selected files.
Structure:
update_params:
  <filename>:
    <PARAM_KEY>: <value-or-null>
Rules:
- The map key is a filename (basename only), not a full path.
- If a file with that basename is included, it will be rewritten in the output zip.
- Values are converted to strings internally (except null).
- If the value is null, the key is removed (or cleared depending on Parameters behavior).
Example:
update_params:
  subject:
    SUBJECT_id: null
    SUBJECT_name: null
  method:
    Operator: null
Important:
- Updates are applied by parsing the file as JCAMP parameters.
- If parsing fails, prune fails with an error.
- Updates apply only to files that are included by keep/drop selection.
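The basename rule has a consequence worth spelling out: one update_params entry applies to every included file with that name, across all scans. The Python sketch below illustrates this under stated assumptions. `plan_rewrites` is a hypothetical helper, not part of BrkRaw, and PLACEHOLDER_KEY is an invented parameter name; the actual JCAMP parsing and rewriting is left out entirely.

```python
from pathlib import PurePosixPath


def plan_rewrites(included: list[str], update_params: dict) -> dict:
    """Map each included path to the edits that apply to it, by basename."""
    plan = {}
    for path in included:
        edits = update_params.get(PurePosixPath(path).name)
        if edits:
            # Values become strings; None (YAML null) marks the key for removal.
            plan[path] = {k: None if v is None else str(v) for k, v in edits.items()}
    return plan


# Files that survived keep/drop selection (made-up layout).
included = ["subject", "3/method", "5/method", "3/pdata/1/visu_pars"]
update_params = {
    "subject": {"SUBJECT_id": None, "SUBJECT_name": None},
    "method": {"Operator": "anon"},
    "acqp": {"PLACEHOLDER_KEY": 1},  # no acqp file is included, so nothing happens
}
plan = plan_rewrites(included, update_params)
for path, edits in plan.items():
    print(path, edits)
```

Both method files end up in the plan, while the acqp entry is ignored because no file with that basename was selected.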
Strip JCAMP comments¶
Some Paravision parameter files include comment lines starting with $$.
You can remove them from files that are included in the zip.
From CLI:
brkraw prune /path/to/dataset --spec prune.yaml --strip-jcamp-comments
From spec:
strip_jcamp_comments: true
Behavior:
- If a file is being rewritten due to update_params, comment stripping is applied after edits.
- If a file is included and looks like JCAMP, it can be stripped even without update_params.
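Conceptually, comment stripping is a line filter. The Python sketch below shows the idea on a made-up JCAMP fragment. It assumes that only lines beginning with $$ are removed and everything else is left untouched; the function name is hypothetical and BrkRaw's own handling may differ in details.

```python
def strip_jcamp_comments(text: str) -> str:
    """Drop every line that starts with '$$'; keep all other lines as-is."""
    kept = [line for line in text.splitlines(keepends=True) if not line.startswith("$$")]
    return "".join(kept)


# Made-up file content, for illustration only.
sample = (
    "##TITLE=Parameter List\n"
    "$$ /opt/data/example/subject\n"
    "##$SUBJECT_id=( 60 )\n"
    "<anon>\n"
    "##END=\n"
)
stripped = strip_jcamp_comments(sample)
print(stripped)
```

In this example the comment line carrying a file path is removed, which is the kind of incidental metadata that stripping is meant to keep out of a shared dataset.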
Output zip naming¶
Default behavior¶
If --output is not provided, BrkRaw tries to use root_name from the spec.
If the spec has no root_name, you must provide --output.
When a default output is generated, it is written to the current working directory.
Root folder in the zip¶
The zip can include a top-level root directory (recommended for clean unpacking).
Spec fields:
- add_root: true or false (default: true)
- root_name: string (optional)
Notes:
- When --output is provided, the root folder name defaults to the output filename stem.
- You can override that with root_name in the spec (or by not providing --output).
Template variables in spec¶
The CLI supports simple template variables, substituted into the spec before execution.
Use:
--set-var KEY=VALUE
Example:
brkraw prune /path/to/dataset --spec prune.yaml --set-var Project=CAMRI
In the spec, reference it using $KEY:
root_name: "$Project_shared"
Notes:
- Substitution is recursive for all strings in the spec.
- Unknown variables are left unchanged.
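To make the substitution rules concrete, here is a minimal Python sketch of recursive $KEY replacement over a nested spec. It is an assumption about the behavior, not BrkRaw's implementation: the function name `substitute` is hypothetical, and the sketch uses plain text replacement so that $Project followed by _shared resolves as in the example above.

```python
def substitute(node, variables: dict):
    """Recursively replace $KEY in every string of a nested spec."""
    if isinstance(node, str):
        # Longest keys first, so a short key cannot clobber a longer one.
        for key in sorted(variables, key=len, reverse=True):
            node = node.replace(f"${key}", variables[key])
        return node
    if isinstance(node, dict):
        return {k: substitute(v, variables) for k, v in node.items()}
    if isinstance(node, list):
        return [substitute(v, variables) for v in node]
    return node  # numbers, booleans, None are left untouched


spec = {
    "root_name": "$Project_shared",
    "files": ["method", "$Unknown"],
    "add_root": True,
}
result = substitute(spec, {"Project": "CAMRI"})
print(result)
```

A parser in the style of Python's string.Template would instead read $Project_shared as a single variable named Project_shared and leave it unchanged, so it is worth checking the resulting root folder name when a variable is followed directly by letters, digits, or an underscore.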
Spec validation¶
By default, prune specs are validated against the schema.
Disable validation:
brkraw prune /path/to/dataset --spec prune.yaml --no-validate
Sidecar output (.prune.yaml)¶
After pruning, BrkRaw writes a sidecar next to the output zip:
<output>.prune.yaml
It contains:
- timestamp (UTC)
- input path and output path
- the spec path and a summary of spec keys
- CLI overrides (mode, strip_jcamp_comments, scan_ids, reco_ids, set_vars)
- computed overrides (root_name_override, dirs_override, template_vars)
This sidecar is meant to make pruning reproducible and auditable.
Example prune spec¶
This is a minimal example that keeps only a few core parameter files, drops large raw data, and removes subject identifiers.
__meta__:
  name: minimal_share
  description: Minimal shareable dataset (no raw data, anonymized params)
mode: keep
files:
- method
- acqp
- reco
- visu_pars
- subject
dirs:
- level: 1
  dirs: [3]
- level: 3
  dirs: [1]
update_params:
  subject:
    SUBJECT_id: null
    SUBJECT_name: null
add_root: true
root_name: "shared_scan3"
strip_jcamp_comments: true
Common pitfalls¶
- A prune spec must include files with at least one selector.
- update_params matches by basename only (not full path).
- --scan-ids and --reco-ids override directory rules at fixed levels (1 and 3).
- If filtering removes all files, pruning fails.
- If a JCAMP file cannot be parsed for updates, pruning fails.