process.phenotypes: automated phenotype standardization and reporting
Documentation of phenotype cleaning at 54gene
- Overview
- Installation
- Execution
- YAML Configuration
- Integration with SurveyCTO
- Description of Data Cleaning
  - Load Raw Input Phenotype Data
  - Read in Configuration YAML
  - Drop Invalid Columns
  - Sanitize Header Content in the Input Phenotype Data
  - String Cleanup
  - Apply Consent Exclusions
  - Apply Variable-specific NA Values
  - Apply Type Conversions
  - Exclude Subjects by Age
  - Exclude Subjects Missing Subject IDs
  - Apply Bounds on Numeric Data
  - Attempt to Harmonize Self-reported Ancestry Labels
  - Create Derived Variables
  - Re-apply Bounds on Derived Numeric Variables
  - Check Cross-variable Dependencies
  - Handle Dependency Failures
  - Compute Distribution Data for Numeric Variables
  - Remove Subjects with Excess Invalid Entries Across All Variables
  - Emit an HTML Report
  - Write Out Cleaned Phenotype Data in TSV and/or Various Other Formats
- Variable Types Available
- Derived Variables
- Helper Functions
- HTML Report
- Vignettes
- How to Contribute to Development