Visualizations

The five visualizations below move from a broad overview of risk factor patterns to patient-level comparisons, then close with a summary profile of both patient groups. Together they tell a connected story about which factors are most associated with pulmonary disease in this dataset.

Visualization 1 — Indicators Most Associated with Pulmonary Disease

How to interact: Hover over any bar to highlight it and see the indicator name, type, and exact correlation value. The other bars fade to make the comparison easier to read.

Takeaway

This chart ranks the indicators by how strongly they are associated with pulmonary disease in the dataset. Smoking shows the strongest association, followed by smoking family history and symptom-related variables such as throat discomfort and breathing issues. In contrast, age and gender show very weak associations, suggesting that lifestyle and symptom indicators are more informative than demographic variables in this analysis.
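The ranking behind a chart like this can be sketched in a few lines. The example below is a minimal illustration, not the code used for the chart. It assumes a Pearson correlation between each indicator and a 0/1 disease flag, since the write-up does not say which coefficient was used. The column names and toy values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_indicators(columns, outcome):
    """Sort indicators by absolute correlation with the outcome, strongest first."""
    scores = {name: pearson(values, outcome) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Toy values invented for illustration only; they are not taken from the dataset.
toy_columns = {
    "smoking": [1, 1, 1, 0, 0, 0],
    "age": [60, 30, 45, 35, 50, 55],
}
toy_disease = [1, 1, 0, 0, 0, 0]

ranking = rank_indicators(toy_columns, toy_disease)
for name, r in ranking:
    print(f"{name}: {r:+.2f}")
```

Sorting by absolute value keeps strongly negative associations near the top of the ranking, which matters if any indicator is protective rather than harmful.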

Visualization 2 — Disease Outcomes by Top Risk Factors

How to interact: Use the dropdown menu to filter by a specific factor. Hover over any bar to see the exact proportion, raw count, and total group size. "Yes" and "No" indicate whether the patient has that risk factor.

Takeaway

This stacked bar chart compares the proportion of patients with and without pulmonary disease across the three strongest risk factors: smoking, family history, and throat discomfort. In all three cases, patients with the risk factor show a noticeably higher disease rate than those without it. Smoking shows the largest difference, with over half of smokers having pulmonary disease compared to under 10% of non-smokers. This supports the view that lifestyle and symptom-related factors are meaningful predictors of disease outcome in this dataset.
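The proportion, raw count, and group size shown in each tooltip can be computed as in the sketch below. This is an illustrative example under assumed conditions: each factor and the disease outcome are coded as 0/1, and the function name and toy values are hypothetical rather than taken from the project.

```python
def disease_rate_by_factor(factor, disease):
    """Return {"Yes" / "No": (proportion, disease count, group size)} for one 0/1 factor."""
    summary = {}
    for label, flag in (("Yes", 1), ("No", 0)):
        # Disease outcomes of the patients who do (or do not) have the factor.
        group = [d for f, d in zip(factor, disease) if f == flag]
        positives = sum(group)
        proportion = positives / len(group) if group else float("nan")
        summary[label] = (proportion, positives, len(group))
    return summary

# Toy values invented for illustration only; they are not taken from the dataset.
toy_smoking = [1, 1, 1, 1, 0, 0, 0, 0]
toy_disease = [1, 1, 1, 0, 0, 0, 0, 1]

rates = disease_rate_by_factor(toy_smoking, toy_disease)
for label, (proportion, positives, size) in rates.items():
    print(f"{label}: {proportion:.0%} ({positives} of {size})")
```

Reporting the group size alongside the proportion helps the reader judge how much weight a given bar deserves, since a rate based on a small group is less stable.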

Visualization 3 — Symptom Prevalence Among Disease-Positive Patients

How to interact: Hover over any cell to see the symptom name, age group, prevalence rate, and raw patient counts.

Takeaway

This heatmap shows the prevalence of each risk factor among disease-positive patients only, broken down by age group. Smoking and breathing issues are consistently the most prevalent factors, appearing in over 90% of disease-positive patients in every age group. Family history and stress-related factors show lower prevalence overall. Because the profiles are fairly consistent across age groups, the chart supports the earlier finding that lifestyle factors, rather than age, are the stronger predictors in this dataset.
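The cell values of a heatmap like this can be derived as sketched below. This is a minimal example under stated assumptions: patients are plain dictionaries with 0/1 flags, and the field names, age bins, and toy records are all hypothetical. The write-up does not specify the age group boundaries it uses.

```python
def prevalence_by_age_group(patients, factors, age_bins):
    """Share of disease-positive patients in each age bin who have each factor."""
    totals = {}  # age label -> number of disease-positive patients
    counts = {}  # (age label, factor) -> patients with that factor
    for patient in patients:
        if not patient["disease"]:
            continue  # the heatmap covers disease-positive patients only
        label = next(
            (name for name, low, high in age_bins if low <= patient["age"] <= high),
            None,
        )
        if label is None:
            continue  # age falls outside every bin
        totals[label] = totals.get(label, 0) + 1
        for factor in factors:
            key = (label, factor)
            counts[key] = counts.get(key, 0) + patient[factor]
    return {
        label: {factor: counts[(label, factor)] / total for factor in factors}
        for label, total in totals.items()
    }

# Toy records and bins invented for illustration only.
toy_patients = [
    {"age": 34, "disease": 1, "smoking": 1, "breathing_issue": 1},
    {"age": 38, "disease": 1, "smoking": 1, "breathing_issue": 0},
    {"age": 62, "disease": 1, "smoking": 1, "breathing_issue": 1},
    {"age": 65, "disease": 0, "smoking": 0, "breathing_issue": 0},
]
toy_bins = [("30-49", 30, 49), ("50-69", 50, 69)]

heatmap = prevalence_by_age_group(toy_patients, ["smoking", "breathing_issue"], toy_bins)
print(heatmap)
```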

Visualization 4 — Age vs Oxygen Saturation by Disease Status

How to interact: Use the dropdown to filter by smoking status. Hover over any point to see the patient's age, oxygen saturation, disease status, and smoking status.

Takeaway

This scatter plot shows the relationship between age and oxygen saturation for individual patients, colored by disease status. Disease-positive patients tend to cluster at slightly lower oxygen saturation levels than disease-negative patients, regardless of age. This suggests that oxygen saturation is a physiological signal associated with pulmonary disease in this dataset, although the difference is small. The dropdown filter lets the viewer isolate smokers or non-smokers to check whether the pattern holds within each group.
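One way to check numerically whether the pattern holds within each smoking group is to compare mean oxygen saturation by disease status inside each group. The sketch below illustrates that check. The field names and toy readings are hypothetical, and a comparison of means is an assumed summary here, not something the write-up reports.

```python
from statistics import mean

def mean_saturation_by_group(patients):
    """Mean oxygen saturation keyed by (smoking flag, disease flag)."""
    groups = {}
    for patient in patients:
        key = (patient["smoking"], patient["disease"])
        groups.setdefault(key, []).append(patient["oxygen_saturation"])
    return {key: mean(values) for key, values in groups.items()}

# Toy readings invented for illustration only; they are not taken from the dataset.
toy_patients = [
    {"smoking": 1, "disease": 1, "oxygen_saturation": 93.0},
    {"smoking": 1, "disease": 1, "oxygen_saturation": 95.0},
    {"smoking": 1, "disease": 0, "oxygen_saturation": 97.0},
    {"smoking": 0, "disease": 1, "oxygen_saturation": 94.0},
    {"smoking": 0, "disease": 0, "oxygen_saturation": 96.0},
    {"smoking": 0, "disease": 0, "oxygen_saturation": 98.0},
]

means = mean_saturation_by_group(toy_patients)
for smoking_flag in (1, 0):
    gap = means[(smoking_flag, 0)] - means[(smoking_flag, 1)]
    print(f"smoking={smoking_flag}: disease-negative mean is {gap:.1f} points higher")
```

If the gap keeps the same sign in both groups, the lower saturation among disease-positive patients is not simply a side effect of smoking status.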

Visualization 5 — Risk Factor Profile: Disease vs. No Disease

How to interact: Hover over any dot on the radar chart to see the exact factor name, group, and prevalence rate.

Takeaway

This radar chart shows how the risk factor profile differs between disease-positive and disease-negative patients across all nine factors at once. The red polygon representing disease-positive patients is consistently larger than the blue polygon, particularly for smoking, breathing issues, and throat discomfort. Family history and stress-related factors show smaller differences between the two groups. The shape comparison makes it easy to see at a glance which factors separate the two groups the most and which ones are more similar.
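The visual gap between the two polygons corresponds to a difference in prevalence per factor, which can be ranked directly. The sketch below is an illustrative example under assumed conditions: factors are 0/1 flags, both groups are non-empty, and the field names and toy records are hypothetical.

```python
def prevalence_gaps(patients, factors):
    """Prevalence among disease-positive minus disease-negative patients, largest gap first.

    Assumes both groups contain at least one patient.
    """
    positives = [p for p in patients if p["disease"]]
    negatives = [p for p in patients if not p["disease"]]
    gaps = {}
    for factor in factors:
        rate_positive = sum(p[factor] for p in positives) / len(positives)
        rate_negative = sum(p[factor] for p in negatives) / len(negatives)
        gaps[factor] = rate_positive - rate_negative
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)

# Toy records invented for illustration only; they are not taken from the dataset.
toy_patients = [
    {"disease": 1, "smoking": 1, "stress": 1},
    {"disease": 1, "smoking": 1, "stress": 0},
    {"disease": 0, "smoking": 0, "stress": 1},
    {"disease": 0, "smoking": 0, "stress": 0},
]

gaps = prevalence_gaps(toy_patients, ["smoking", "stress"])
for factor, gap in gaps:
    print(f"{factor}: {gap:+.2f}")
```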

Conclusion

Our analysis suggests that smoking is the indicator most strongly associated with pulmonary disease in this dataset, with smoking family history and several symptom-related variables also standing out. Lifestyle and symptom indicators appear to be more informative than demographic variables such as age and gender, which show much weaker associations in the current analysis.

The heatmap supports this by showing that symptom profiles among disease-positive patients remain fairly consistent across age groups, which indicates that age alone is not a strong differentiator. The scatter plot adds a physiological dimension: disease-positive patients tend toward slightly lower oxygen saturation levels regardless of age. This suggests that measurable health signals may be more useful than demographic factors for identifying at-risk patients.

The radar chart ties these findings together by showing the full risk factor profile of both groups side by side. The shape difference between disease-positive and disease-negative patients is driven primarily by smoking, breathing issues, and throat discomfort, while factors like family history and stress show smaller but still visible gaps between the two groups.

Taken together, these visualizations show that pulmonary disease risk in this dataset is best understood through a combination of lifestyle behaviors and physical symptoms rather than demographics. Future work could explore building a predictive model from these factors or validating these patterns against real clinical data. Because this dataset is synthetic, all findings should be interpreted as patterns within the data rather than as medical evidence.