Accepted Papers


Estimation of Obesity Levels based on Lifestyle Factors using Computational Intelligence

Juan Piero Santisteban Quiroz, Faculty of Systems Engineering and Informatics, Universidad Nacional Mayor de San Marcos, Lima, Peru

ABSTRACT

Obesity is a disease that affects the health of men and women, and its prevalence has risen in recent decades. The WHO estimates that “by the year 2030 more than 40% of the world population will be overweight and more than a fifth will be obese”. Consequently, researchers have made great efforts to identify early the factors that contribute to obesity. Existing tools are often limited to calculating BMI and omit other relevant factors, such as a family history of obesity, time spent on exercise routines and genetic expression profiles. In this study, a computational intelligence model is created, based on supervised and unsupervised data mining techniques such as Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), random forest (RF), decision tree (DT), Extremely Randomized Trees (ET) and logistic regression (LR), to identify obesity levels based on lifestyle. The main source of data was a study of 2,111 people from Colombia, Mexico and Peru, aged between 14 and 61 years. The data set covers the main causes of obesity, namely "high caloric intake, the decrease in energy expenditure due to lack of physical activity, eating disorders, genetics and socioeconomic factors" [1]. The results show that the LightGBM classification model achieves the highest weighted AUC (0.99854), improving on the results of comparable previous studies.

KEYWORDS

Obesity, LightGBM, XGBoost, random forest, decision tree, extremely randomized trees, logistic regression, data mining, AUC, ROC.
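
The abstract above reports a weighted AUC for a multi-class obesity-level classifier but does not say how the weighting is done. A common convention, assumed here and not confirmed by the source, is to compute a one-vs-rest AUC for each class and average the results weighted by class frequency (the same idea as scikit-learn's roc_auc_score with multi_class="ovr" and average="weighted"). The minimal sketch below uses only the Python standard library; the class names, labels and probabilities are made-up illustration values, not data from the study.

```python
from collections import Counter


def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_ovr_auc(y_true, proba, classes):
    """One-vs-rest AUC per class, averaged with class-frequency weights.

    proba[i][k] is the predicted probability that sample i belongs to
    classes[k].
    """
    counts = Counter(y_true)
    total = len(y_true)
    result = 0.0
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in proba]
        result += counts[cls] / total * binary_auc(labels, scores)
    return result


# Hypothetical toy data: three obesity levels, six samples.
classes = ["normal", "overweight", "obese"]
y_true = ["normal", "normal", "overweight", "obese", "obese", "obese"]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
    [0.5, 0.1, 0.4],
]

score = weighted_ovr_auc(y_true, proba, classes)
print(f"weighted one-vs-rest AUC: {score:.5f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; on a data set of a few thousand rows a rank-based implementation or a library routine would be the more practical choice.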