PMAP Labs Feature Engineering

Created: December 12, 2022
Last Modified: February 15, 2023
Total runtime: 1547 sec

This notebook explores which labs we have and how prevalent they are, and identifies which of them would be important for our models. Once the initial analysis is done, the results are used to generate the lab-based features.

Author: Vina Ro

Subsetting from labs dataframe

  1. Get the labs for the pat_enc_csn_sids associated with a CHF hospital stay.
  2. Keep only the rows where the sample was collected during the hospital stay, i.e. between hosp_admsn_time and hosp_disch_time.
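
The two subsetting steps above could be sketched in pandas roughly as follows. This is a minimal illustration on made-up data, not the notebook's actual code: the column name collection_time and the toy values are assumptions, while pat_enc_csn_sid, hosp_admsn_time, hosp_disch_time and ord_value come from the notes above.

```python
import pandas as pd

# Toy encounter table: one row per CHF hospital stay (values are made up).
stays = pd.DataFrame({
    "pat_enc_csn_sid": [1, 2],
    "hosp_admsn_time": pd.to_datetime(["2020-01-01 08:00", "2020-02-10 12:00"]),
    "hosp_disch_time": pd.to_datetime(["2020-01-05 10:00", "2020-02-12 09:00"]),
})

# Toy labs table; "collection_time" is an assumed column name.
labs = pd.DataFrame({
    "pat_enc_csn_sid": [1, 1, 2, 3],
    "collection_time": pd.to_datetime([
        "2020-01-02 09:00",  # inside stay 1
        "2020-01-06 09:00",  # after discharge of stay 1
        "2020-02-11 00:00",  # inside stay 2
        "2020-03-01 00:00",  # encounter 3 is not a CHF stay
    ]),
    "ord_value": ["4.1", "3.9", "138", "7.2"],
})

# Step 1: an inner merge keeps only labs whose encounter is a CHF stay.
merged = labs.merge(stays, on="pat_enc_csn_sid", how="inner")

# Step 2: keep samples collected between admission and discharge (inclusive).
in_stay = merged["collection_time"].between(
    merged["hosp_admsn_time"], merged["hosp_disch_time"]
)
labs_in_stay = merged.loc[in_stay].reset_index(drop=True)
```

Whether the window boundaries should be inclusive is a design choice; `between` is inclusive on both ends by default.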

After this, the information was combined and annotated by hand.

I looked at the distribution of the 'ord_value' column for each lab to determine whether the field was:

  1. 'numerical': all numbers ('see comments' and similar statements were allowed, since these were found in all labs).
  2. 'categorical': responses belonged to a set of ranges or strings.
  3. 'mixed': mostly numerical values, with some '<x' or 'x-y' type values.
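
The annotation itself was done by hand, but a first-pass heuristic for the three categories might look like the sketch below. The regular expressions, the placeholder set, the 'unknown' fallback label and the function name classify_lab are all assumptions made for illustration.

```python
import re
import pandas as pd

# Assumed placeholder strings that appear in every lab and are ignored.
PLACEHOLDERS = {"see comment", "see comments"}
# A plain number such as '4.1', '138' or '-0.5'.
NUMBER = re.compile(r"[-+]?\d*\.?\d+")
# Bounded or range values such as '<0.3', '>=5' or '1-2'.
BOUND_OR_RANGE = re.compile(r"([<>]=?\s*\d*\.?\d+)|(\d*\.?\d+\s*-\s*\d*\.?\d+)")


def classify_lab(values: pd.Series) -> str:
    """Heuristically label one lab's ord_value column."""
    cleaned = values.dropna().astype(str).str.strip().str.lower()
    cleaned = cleaned[~cleaned.isin(PLACEHOLDERS)]
    if cleaned.empty:
        return "unknown"  # hypothetical label: nothing left to inspect
    is_num = cleaned.map(lambda v: NUMBER.fullmatch(v) is not None)
    is_bound = cleaned.map(lambda v: BOUND_OR_RANGE.fullmatch(v) is not None)
    if is_num.all():
        return "numerical"
    # Numbers plus some '<x' / 'x-y' values, and nothing else.
    if (is_num | is_bound).all() and is_num.any():
        return "mixed"
    return "categorical"
```

For example, `classify_lab(pd.Series(["0.5", "<0.3", "1-2"]))` returns `"mixed"`, while a column of only ranges or strings falls through to `"categorical"`. A heuristic like this would only pre-sort labs for manual review.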

By looking at the distributions of the specimen type / specimen source columns, as well as the component names, I assigned a 'lab_custom_group' name to each lab.
If all 'specimentype' values for a lab were 'blood', its group name begins with 'blood'.
If a lab's 'specimen_type' values were ['Blood', nan], I assumed all the specimens were blood for that particular lab_id (componentid in PMAP) and named it 'blood' as well.
If a lab had more than one source (instruments, bacteria cultures, etc.), its group name has either no prefix or the prefix 'mixed'.
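
The prefix rules above could be expressed as a small helper, sketched here under assumptions: the function name group_prefix and the toy data are hypothetical, and the choice between 'mixed' and no prefix for multi-source labs was made by hand in the notebook, so this sketch simply returns 'mixed'. Cases the notes do not describe (a single non-blood source, or no recorded source) return an empty prefix.

```python
import pandas as pd


def group_prefix(specimen_types: pd.Series) -> str:
    """Suggest a lab_custom_group prefix from one lab's specimen types."""
    # Missing specimen types are ignored, so ['Blood', nan] counts as blood.
    known = specimen_types.dropna().astype(str).str.strip().str.lower().unique()
    if len(known) == 1 and known[0] == "blood":
        return "blood"
    if len(known) > 1:
        return "mixed"
    return ""  # not covered by the notes; left without a prefix here


# Toy data (made up): one row per lab result.
labs = pd.DataFrame({
    "componentid": [101, 101, 102, 102, 103, 103],
    "specimentype": ["Blood", "Blood", "Blood", None, "Urine", "Blood"],
})

prefixes = labs.groupby("componentid")["specimentype"].apply(group_prefix)
```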

This information is stored in "LabsRelevantInfo.csv".

Get Proportion of Each Lab Custom Group in our Cohort

Find the number and proportion of hospital stays in the cohort associated with each lab_custom_group.
Store the information in a .csv
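
A minimal sketch of a per-group prevalence table for lab custom groups, as named in this section's heading, is shown below. The toy data, the cohort size n_stays and the output column names are assumptions; the sketch counts distinct stays rather than rows so that repeated measurements within one stay are not double counted.

```python
import pandas as pd

# Toy labs table (made up): several results per stay are possible.
labs = pd.DataFrame({
    "pat_enc_csn_sid": [1, 1, 2, 3, 3],
    "lab_custom_group": [
        "blood_sodium", "blood_sodium", "blood_sodium",
        "blood_bnp", "blood_sodium",
    ],
})
n_stays = 4  # assumed cohort size, including stays with no labs at all

# Number of distinct stays in which each lab group appears.
prevalence = (
    labs.groupby("lab_custom_group")["pat_enc_csn_sid"]
    .nunique()
    .rename("n_stays")
    .to_frame()
)
prevalence["proportion"] = prevalence["n_stays"] / n_stays
prevalence = prevalence.sort_values("proportion", ascending=False)

# With no path, to_csv returns the CSV text; pass a filename to write a file.
csv_text = prevalence.to_csv()
```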

Data Cleaning

Special treatment for labs, discussed with Dr. F:
  1. rare_labs: labs to keep, which usually have a lower prevalence but are associated with CHF
  2. skipped_labs: labs to delete, which are irrelevant or whose distribution does not make sense

1. Prepare df

2. Create unit conversion dictionary for each lab_custom_group

This dictionary is used when cleaning the numerical labs.
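
One possible shape for such a dictionary is sketched below: each lab_custom_group maps to the factors that bring its source units to a single target unit. The group names, the column names unit and value, and the choice of target units are hypothetical; the glucose factor is the usual approximate mmol/L to mg/dL conversion.

```python
import pandas as pd

# Hypothetical mapping: lab_custom_group -> {source unit: factor to target unit}.
UNIT_CONVERSIONS = {
    "blood_hemoglobin": {"g/dL": 1.0, "g/L": 0.1},    # target unit: g/dL
    "blood_glucose": {"mg/dL": 1.0, "mmol/L": 18.0},  # target unit: mg/dL (approximate)
}


def to_target_unit(row: pd.Series) -> float:
    """Convert one result to its group's target unit; NaN if the unit is unknown."""
    factor = UNIT_CONVERSIONS.get(row["lab_custom_group"], {}).get(row["unit"])
    return row["value"] * factor if factor is not None else float("nan")


# Toy data (made up).
labs = pd.DataFrame({
    "lab_custom_group": ["blood_hemoglobin", "blood_hemoglobin",
                         "blood_glucose", "blood_glucose"],
    "unit": ["g/L", "g/dL", "mmol/L", "furlongs"],
    "value": [130.0, 12.5, 5.0, 1.0],
})
labs["value_std"] = labs.apply(to_target_unit, axis=1)
```

Returning NaN for an unrecognized unit, rather than passing the raw value through, makes unexpected units visible during cleaning.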

3. Cleaning numerical labs

4. Cleaning categorical labs

Create a dictionary to inspect the unique values in each lab custom group.
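
A dictionary like this could be built as in the sketch below, which normalizes case and whitespace before collecting the distinct values. The toy data and group names are made up, and the normalization step is an assumption about what is useful when reviewing categorical responses.

```python
import pandas as pd

# Toy categorical labs (made up).
labs = pd.DataFrame({
    "lab_custom_group": ["urine_protein", "urine_protein", "urine_protein",
                         "blood_culture", "blood_culture"],
    "ord_value": ["Negative", "negative ", "Trace", "No growth", None],
})

# lab_custom_group -> sorted list of distinct, normalized ord_value strings.
unique_values = {
    group: sorted(values.dropna().str.strip().str.lower().unique())
    for group, values in labs.groupby("lab_custom_group")["ord_value"]
}
```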

Feature Engineering

Data Visualization

Lab Prevalence in our cohort

Statistical Analysis

1. Summary Statistics

2. Mann-Whitney U test on different numerical lab groups
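
As a sketch of how a Mann-Whitney U test could be run per numerical lab group with SciPy: the notes do not say which two samples are compared, so the binary outcome column here is purely hypothetical, as are the toy values and group names.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Toy data (made up); "outcome" is a hypothetical binary grouping variable.
labs = pd.DataFrame({
    "lab_custom_group": ["blood_sodium"] * 6 + ["blood_bnp"] * 6,
    "value": [138, 140, 136, 137, 141, 139,
              100, 120, 110, 900, 950, 1000],
    "outcome": [0, 0, 0, 1, 1, 1] * 2,
})

results = {}
for group, df in labs.groupby("lab_custom_group"):
    x = df.loc[df["outcome"] == 0, "value"]
    y = df.loc[df["outcome"] == 1, "value"]
    # Two-sided test: do the two samples come from the same distribution?
    stat, p_value = mannwhitneyu(x, y, alternative="two-sided")
    results[group] = {"U": stat, "p_value": p_value}

summary = pd.DataFrame(results).T
```

When many lab groups are tested this way, the resulting p-values would normally need a multiple-comparison correction before being interpreted.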