ST558_Project_3

Purpose of the repo

This repository has been created with the purpose of analyzing data related to one of the most widespread chronic diseases in the United States, Diabetes. The Diabetes Health Indicators Dataset encompasses over 200,000 survey responses, offering valuable insights into how this disease affects the health of millions of Americans and its impact on both individuals and the nation’s economy.

The data set includes information on various aspects such as the current diabetes status of respondents, their demographics, physical and mental health metrics, habits, addictions, diet, and more. These data hold the potential to predict an individual’s likelihood of having diabetes or being at a high risk of developing it, based on specific factors. Additionally, it aims to identify the key factors contributing to the risk of diabetes.

Furthermore, this analysis will be presented separately based on the level of education, allowing us to examine how this demographic factor influences the overall findings.

List of R packages

library(readr)

library(dplyr)

library(ggplot2)

library(caret)

library(rmarkdown)

library(purrr)

library(glmnet)

library(randomForest)

library(gbm)

library(Metrics) #For logLoss()

library(cvms)

library(rpart)

library(pls)

Automation code of different education level .md files

lapply(unique(diabetes_data$Education), function(Education.i) { rmarkdown::render(“Project3.Rmd”, params = list(Education = Education.i), output_file = paste0(Education.i, “.md”)) })

Links to .html files of the generated analyses

Analysis for Elementary

Analysis for Some High School

Analysis for High School Graduate

Analysis for Some Collage

Analysis for College Graduate