Using big data to examine the social factors affecting child health

Researchers at UC Berkeley School of Public Health, working with a multi-institutional team of experts led by Weill Cornell Medicine, have used a machine learning–based approach to create a comprehensive picture of the social determinants of child health that they hope will enable more successful interventions.

As reported Oct. 16 in JAMA Pediatrics, the researchers analyzed data on more than 10,500 American children in communities across 17 U.S. states.

They quantified dozens of neighborhood-level social determinant factors for each child, among them: poverty rates, family unemployment, educational opportunities, access to quality health care, exposure to crime and drug sales, and other socioeconomic factors.

The team identified four distinct community profiles and found that children in the profile marked by high socioeconomic deprivation had more mental health problems, lower cognitive performance, and worse physical health than children in the other three.

“This is a new approach to summarizing social determinants of health into more usable and interpretable clusters,” said Timothy T. Brown, associate adjunct professor of health economics at Berkeley Public Health, who was a study co-author. “Since social determinants of health necessarily occur in clusters, this approach will also help policymakers to see what specific clusters of policies are needed to address the specific problems of each area.”

Lonnie Snowden, Berkeley Public Health professor of health policy and management, whose research focuses on mental health care access and quality, was also a co-author, as was Julian Chun-Chung Chow, a UC Berkeley professor of social welfare.

“Social determinants are core public health research and practice concerns, but their number and diversity can sometimes be overwhelming,” Snowden said. “Underutilized, advanced methods permitted us to distill a handful of community profiles, each intriguingly characterized, from comprehensive social determinant lists.

“Children’s health diverged across many facets depending on the community of residence. This novel way of organizing social determinants ties them to community science and paves the way for better intervention targeting,” he continued.

Dr. Yunyu Xiao, an assistant professor of population health sciences at Weill Cornell Medicine and lead author of the study, was pleased with the results.

“A complex set of social factors can influence children’s health, and I think our results underscore the importance of using methods that can handle such complexity,” she said.

Prior studies in this field have tended to focus on narrow sets of socioeconomic variables and health outcomes, and typically have examined outcomes that are averaged over large geographic areas such as counties or states.

In the new study, the researchers took a different approach. They used machine learning techniques that allow relatively unbiased, fine-grained analyses of large datasets.

The dataset in the new study was generated by an ongoing, survey-based, National Institutes of Health (NIH)–sponsored project called the Adolescent Brain Cognitive Development (ABCD) Study. The 10,504 children in the cohort were aged 9–10 in 2016 and were tracked through 2021. The sample’s ethnic and racial mix broadly reflected that of the U.S. as a whole.

In the analysis, each child’s record was scored on 84 different social determinant variables relating to educational resources, physical infrastructure, perceived bias and discrimination, household income, and neighborhood crime and drugs. The machine learning algorithm identified underlying patterns in the children’s social determinant profiles—and also looked for statistical associations between these patterns and health outcomes.
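For readers curious what "identifying underlying patterns" can look like in practice, the sketch below groups synthetic records into four clusters. It is an illustration only: the article does not name the algorithm the team used, so plain k-means is swapped in here as a stand-in. The data are randomly generated, not ABCD data, and the three columns stand for hypothetical social determinant scores rather than any of the study's 84 variables. The function names are likewise made up for this example.

```python
import random
from math import dist


def farthest_point_init(points, k):
    """Pick k spread-out starting centroids (a simple deterministic choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iters=100):
    """Plain k-means. Returns (centroids, labels), one label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so the clusters are stable.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels


def make_synthetic_profiles(n_per_group=25, seed=0):
    """Synthetic records: four well-separated groups, three made-up scores each."""
    rng = random.Random(seed)
    centers = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10)]
    return [
        tuple(value + rng.uniform(-1, 1) for value in center)
        for center in centers
        for _ in range(n_per_group)
    ]


points = make_synthetic_profiles()
centroids, labels = kmeans(points, k=4)
# Number of records assigned to each cluster.
print(sorted(labels.count(c) for c in range(4)))
```

In a real analysis, the number of clusters would have to be chosen and validated rather than fixed in advance. A separate step would then compare health outcomes across the resulting groups.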

“Our approach is data-driven, allowing us to see what underlying patterns there are in large-scale, multidimensional social determinants of health data, without prior hypotheses and other biases getting in the way,” said Dr. Chang Su, an assistant professor of population health sciences at Weill Cornell Medicine and another lead author of the study.

A key finding was that the data clustered into four broad social determinant patterns: affluent; high socioeconomic deprivation; urban high crime and low level of educational attainment and resources; and high-stigma, the last involving higher self-reported measures of bias and discrimination against women, immigrants, and other underrepresented groups. White children were overrepresented in the affluent and high-stigma areas, and Black and Hispanic children in the other two.

Each of the four profiles was associated with its own broad pattern of health outcomes, the “high socioeconomic deprivation” pattern being associated with the worst health outcomes. The other two non-affluent patterns were also associated generally with more adverse outcomes compared with the affluent pattern.

The study had some limitations, including the survey-based, self-reported nature of the ABCD data, which is generally considered less reliable than objectively measured data.

Also, epidemiological analyses like these can reveal only associations between social factors and health outcomes—they can’t prove that the former influence the latter.

Even so, the researchers said, the results demonstrate the power of a relatively unbiased, machine-learning approach to uncover potentially meaningful links, and should help inform future studies that can discover actual causative mechanisms connecting social factors to child health.

“This multi-dimensional, unbiased approach in principle can lead to more targeted and effective policy interventions that we are investigating in a current NIH-funded project,” Dr. Xiao said.


Additional authors: corresponding co-author Jyotishman Pathak, PhD, and Fei Wang, PhD, Weill Cornell Medicine; Dr. J. John Mann, Columbia University; Yu Hou, PhD, and Paul Siu-Fai Yip, PhD, The University of Hong Kong; and Dr. Alexander C. Tsai, Harvard Medical School.

Dr. Xiao is supported by the 2023 Google Research Scholar Program.