YAML Configuration
This section is continually being expanded as the configuration feature set is modified. For the time being, see existing dataset configuration files in this directory for examples.
Top-level YAML Sections
| Section | Description |
|---|---|
| | Dataset tag name (e.g. “HW”); used minimally but not quite |
| | Contains global settings that are applied across all variables |
| | Required. Under |
| | Required. Under |
| | Under |
| | Under |
| | This section contains one block for each variable in the dataset, |
| | This section defines variables to be derived from existing variables |
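As a rough orientation, the top-level layout described in the table might look like the skeleton below. The key names `tag`, `globals`, `variables`, and `derived` are assumptions inferred from the descriptions and section headings in this document. The required global settings are left out because their names are not given here, so this fragment is not expected to pass validation as written. The existing dataset configuration files remain the authoritative reference.

```yaml
# Hypothetical skeleton; key names are assumptions, not confirmed against the schema.
tag: HW          # dataset tag name
globals: {}      # global settings applied across all variables (required entries omitted)
variables: {}    # one block per variable, keyed by normalized name (e.g. HW00001)
derived: {}      # variables derived from existing variables
```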
Variables YAML Section
Each variable in the dataset is assigned a normalized encoded value (e.g. HW00001, HW00002, etc.). Under each variable block, there are a variety of other possible configuration settings:
| Section | Description |
|---|---|
| | This is the header of the variable in the input dataset |
| | Either this or shared_model is required. Expected variable type; one of: |
| | Either this or type is required. Expected variable type as defined |
| | If desired, a string with a more descriptive variable name than what’s present in |
| | Accepts numeric values for tags |
| | A boolean to turn off printing a table of unique values and counts in the |
| | A boolean to override cleaned output for a variable: all values will be set |
| | For |
| | Required once per dataset. Boolean flag to mark which variable is the accepted age of the subjects |
| | Required once per dataset. Boolean flag to mark which variable is the accepted unique subject ID |
| | Any non-canonical values to be treated as NA (e.g. nil, not specified, etc.) |
| | Used to define another variable for plotting overlaid histograms, |
| | Only for variables of type |
| | Test for expected relationships between variables; can also include |
| | For |
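To make the table more concrete, here is a minimal sketch of two variable blocks. Only `type` and `shared_model` are named in this document. The `name` key, the `numeric` type value, the `yes_no` model name, and the column headers are all illustrative assumptions, so compare against an existing dataset configuration file before reusing them.

```yaml
# Hypothetical variable blocks; only `type` and `shared_model` are named in this document.
variables:
  HW00001:
    name: "Participant weight (kg)"   # assumed key: header of the variable in the input dataset
    type: numeric                     # assumed value; the accepted type names are not listed here
  HW00002:
    name: "Ever smoked?"
    shared_model: yes_no              # hypothetical model name, used instead of `type`
```

Each block sets either `type` or `shared_model`, reflecting the rule that one of the two is required.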
Derived YAML Section
Derived variables are calculated from existing data, e.g. calculating BMI from reported weight and height measurements. This section allows the user to define arbitrary new variables to derive.
Most sections here have been previously described, but `code` is where the logic is injected to create the derived variable, written in R syntax with access to the normalized variable names.
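A derived BMI variable could be sketched roughly as follows. Only the `code` key is confirmed by this document. The block name `bmi_derived`, the `name` and `type` keys, and the input variables `HW00010` (weight in kg) and `HW00011` (height in cm) are hypothetical. Treating the last R expression as the derived value is also an assumption.

```yaml
# Hypothetical derived variable; block name, `name`, `type`, and inputs are assumptions.
derived:
  bmi_derived:
    name: "Body mass index (kg/m^2)"
    type: numeric
    # R code; assumes HW00010 is weight in kg and HW00011 is height in cm
    code: |
      HW00010 / (HW00011 / 100)^2
```

The literal block scalar (`|`) keeps the R code on its own lines, which tends to stay readable if the logic grows beyond a single expression.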
YAML Validation
Prior to running this tool, you should validate the YAML configurations you’ve set up as follows:
```r
dataset.schema <- system.file("validator",
                              "schema.datasets.yaml",
                              package = "process.phenotypes")
shared.models.schema <- system.file("validator",
                                    "schema.shared-models.yaml",
                                    package = "process.phenotypes")
process.phenotypes::config.validation("/path/to/your.dataset.yaml",
                                      "/path/to/your.shared-models.yaml",
                                      dataset.schema,
                                      shared.models.schema)
```
This command will compare your configuration files to the set of guidelines and restrictions we’ve specified for the package. If your configuration settings are valid, you’ll get a confirmation message to that effect; otherwise, the function will emit a summary of the restriction that wasn’t met.