DEX Guide
DEX Users Guide
Data Preparation Process
To prepare your dataset for analysis using PatternDE, follow these steps in the recommended order:
- Edit covariate column headers, adding appropriate metadata
  - Flag the Target Covariate. One and only one covariate must be flagged as the target
  - Flag any Enumerated Covariates
  - Flag any Covariates that you want to ignore
  - If a column is row labels, flag it. No more than one column may be flagged as row labels
  - Flag any Nominal Covariates
- Use the Validation tools to find and fix missing and malformed data
  - Run the VALIDATE ALL menu command to see what issues remain
  - Run the MARK INVALID COVARIATES menu command to ignore covariates that are missing data
  - Optionally, run the EDIT BAD CELLS menu command to edit each cell that contains problem data. Replace bad data only with accurate observed values
  - If you do not know the accurate values, it is better to ignore the entire covariate or remove the observation row
  - Run the VALIDATE ALL menu command again to ensure all issues have been fixed
  - SAVEAS your revised dataset with a different file name. Be sure to upload the revised dataset, not the original, for analysis
- After completing the above steps and passing VALIDATE ALL, you are ready to upload your dataset for analysis
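
The two flagging constraints in the steps above (exactly one target covariate, at most one row-label column) can be checked mechanically before you start validation. The following is a minimal sketch, not part of DEX: the flag names and the `check_flags` helper are hypothetical, and the actual metadata syntax used in DEX column headers may differ.

```python
from collections import Counter

# Hypothetical flag names for illustration; DEX's real header metadata may differ.
TARGET = "target"
ENUMERATED = "enumerated"
IGNORE = "ignore"
ROW_LABEL = "row_label"
NOMINAL = "nominal"


def check_flags(column_flags):
    """Return a list of problems found in the per-column flags.

    column_flags maps a column name to its flag, or to None if unflagged.
    """
    counts = Counter(flag for flag in column_flags.values() if flag)
    problems = []
    if counts[TARGET] != 1:
        problems.append(
            f"expected exactly 1 target covariate, found {counts[TARGET]}"
        )
    if counts[ROW_LABEL] > 1:
        problems.append(
            f"expected at most 1 row-label column, found {counts[ROW_LABEL]}"
        )
    return problems


flags = {
    "ID": ROW_LABEL,
    "Age": None,
    "Gender": NOMINAL,
    "Weight_kg": None,
    "Height_cm": None,
    "Diagnosis": TARGET,
}
print(check_flags(flags))  # []
```

An empty list means both constraints hold. The check says nothing about whether the right column was chosen as the target.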
Data Preparation Best Practices
Recommendations
- Always keep a copy of your original dataset
- Ensure your dataset has consistent units across all observations
- Document any changes you make to the original data
- When in doubt about a data value, it's better to mark it as missing than to guess
- Use descriptive column names that clearly indicate what the data represents
Common Pitfalls
- Using inconsistent data formats (especially for dates)
- Including calculated columns that depend on other columns
- Mixing different units in the same column
- Not properly identifying categorical variables
- Including identifying information (e.g., patient names, IDs) that should be removed
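
Mixed units and inconsistent formats are easy to miss by eye in a long column. As a rough illustration, the sketch below classifies each cell of a column by its apparent format and reports the distinct formats found. The `classify` and `column_formats` helpers and the set of missing-value tokens are assumptions made for this example and are not DEX features.

```python
import re

# Assumed spellings of "missing"; adjust to match your own data.
MISSING_TOKENS = {"", "n/a", "na", "missing", "??", "null"}

NUMBER = re.compile(r"^-?\d+(\.\d+)?$")
NUMBER_WITH_UNIT = re.compile(r"^-?\d+(\.\d+)?\s*([A-Za-z]+)$")


def classify(cell):
    """Label a cell as 'missing', 'number', 'number:<unit>', or 'other'."""
    text = cell.strip()
    if text.lower() in MISSING_TOKENS:
        return "missing"
    if NUMBER.match(text):
        return "number"
    match = NUMBER_WITH_UNIT.match(text)
    if match:
        return "number:" + match.group(2).lower()
    return "other"


def column_formats(cells):
    """Return the sorted distinct non-missing formats seen in a column."""
    return sorted({classify(cell) for cell in cells} - {"missing"})


weights = ["180 lbs", "140", "N/A", "62 kg"]
print(column_formats(weights))  # ['number', 'number:kg', 'number:lbs']
```

A column that yields more than one format, as here, probably mixes units or notations and should be normalized before upload.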
Example: Before and After Preparation
Before Preparation
Patient | Age | Gender | Weight | Height | BMI | Diagnosis |
---|---|---|---|---|---|---|
John Smith | 42 | M | 180 lbs | 5'11" | 25.1 | Type 2 Diabetes |
Mary Jones | -35 | F | 140 | 5'4" | 24.0 | None |
Robert Lee | 58 | M | N/A | missing | ?? | Hypertension |
Susan Chen | 29 | F | 62 kg | 168 cm | 22.0 | Healthy |
Issues: Inconsistent units, invalid age value, missing data, calculated BMI column, mixed formats, patient names included
After Preparation
ID | Age | Gender | Weight_kg | Height_cm | Diagnosis |
---|---|---|---|---|---|
001 | 42 | M | 81.6 | 180.3 | Type 2 Diabetes |
002 | 35 | F | 63.5 | 162.6 | Healthy |
003 | 58 | M | NULL | NULL | Hypertension |
004 | 29 | F | 62.0 | 168.0 | Healthy |
Improvements: Consistent units, corrected age value, missing values marked as NULL, clear labeling of units in column headers, removed calculated columns, anonymized IDs instead of names
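
The converted values in the After table follow from the standard definitions of the pound (0.45359237 kg) and the inch (2.54 cm). The sketch below reproduces them. The helper names are illustrative only, and it assumes the unlabeled weight of 140 in the second row was recorded in pounds, which is consistent with the 63.5 shown in the After table.

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound in kilograms
IN_TO_CM = 2.54        # exact definition of the inch in centimeters


def pounds_to_kg(pounds):
    """Convert pounds to kilograms, rounded to one decimal place."""
    return round(pounds * LB_TO_KG, 1)


def feet_inches_to_cm(feet, inches):
    """Convert a height in feet and inches to centimeters, rounded to one decimal place."""
    return round((feet * 12 + inches) * IN_TO_CM, 1)


print(pounds_to_kg(180))         # 81.6
print(pounds_to_kg(140))         # 63.5 (assumes the unlabeled 140 is in pounds)
print(feet_inches_to_cm(5, 11))  # 180.3
print(feet_inches_to_cm(5, 4))   # 162.6
```

If the unit of a value cannot be confirmed, the recommendations above apply: mark the value as missing rather than converting on a guess.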