Connecting the Dots (and Finding New Ones)

“No pattern is an isolated entity. Each pattern can exist in the world only to the extent supported by other patterns..." Christopher Alexander

Jan 30, 2025

“In short, no pattern is an isolated entity. Each pattern can exist in the world only to the extent supported by other patterns: the larger patterns in which it is embedded, the patterns of the same size that surround it, and the smaller patterns that are embedded in it.” - Christopher Alexander

Connecting the Dots

My family has taken to working on the NY Times' tests of intuition, logic, and vocabulary. We compete on Wordle, Connections, and my favorite, Strands - a six-by-eight grid of letters that challenges us to connect the letters to form words that relate to the day's theme. The patterns are easily spotted when the letters connect left to right and more challenging when the sequence proceeds in random order and doubles back in the opposite direction. Working on these puzzles with a nine-year-old is humbling because he sees patterns of words in a grid that I find inscrutable.

Searching for patterns is an essential activity in day-to-day living, and perhaps nowhere is it more critical than in science and policy-making. It is how we seek the causal relationships that help us make predictions, guide decision-making, and much more.

Big Mechanism Program

In 2014, the Defense Advanced Research Projects Agency (DARPA) funded the ambitious Big Mechanism Program to define a systems biology model for RAS-driven cancers. These cancers are caused by mutations in the RAS genes, which encode proteins involved in cell signaling; the signaling pathways they control are among the most complex in cancer biology. The core idea was to read the literature in light of existing theories about cancer biology, assemble the salient fragments found in individual papers into causal models, and then reason with the resultant models.

Established pathways of metabolism and physiology, including known enzymatic processes, were mapped out to identify gaps. Computer algorithms searched massive amounts of biomedical literature to identify previously unrecognized information to fill in the gaps and discover extensions of enzymatic pathways.

Ultimately, the initiative succeeded in developing technology to read and extract mechanistic information from biomedical literature. However, synthesizing the information gleaned from the extraction process proved challenging, and the program had limited success in generating hypotheses that might have led to cancer treatments.

At a subsequent talk to the National Academy, Paul Cohen, author of the paper describing the Big Mechanism Program, observed:

“So here's a challenge for Artificial Intelligence: Help humans to model and manage the world's complicated, interacting systems.”

As a follow-up to the Big Mechanism Program, Paul Cohen initiated the World Modelers Program to develop technology integrating qualitative causal analyses with quantitative models and relevant data to understand complicated, dynamic national security questions. The initial application is to develop an understanding of food insecurity by incorporating dozens of contributing models, including climate, water availability, soil viability, market instability, and physical security.

Models in Global Health

Just as the challenge of food insecurity has many complex and intricately related causal factors, addressing complex global health challenges requires a holistic understanding of many interacting factors.

In global health, specialization in science and technology has yielded numerous detailed models of complex patterns of disease and interventions. For example, the catalog of models for investigating malaria disease patterns and treatment includes epidemiological models to simulate the transmission dynamics of malaria, economic models to assess the cost-effectiveness of malaria control strategies, climate models to predict alterations in mosquito habitats and behavior, genetic models to study the evolution of drug resistance, health system models to ensure adequate supplies of antimalarial regimens, and pharmacology models to explain patterns of antimalarial drug disposition, efficacy, and safety.

The difficulty of refining and integrating these models is a key obstacle to practical, sustainable intervention strategies. If adequately harnessed, artificial intelligence promises to be a critical tool in this work.

Retrieval-Augmented Generation (RAG) in Global Health

Large language models (LLMs) may produce incomplete or inaccurate responses to a given query when their training data lacks comprehensive coverage of specific topics or contains biased or out-of-date information. These deficiencies can also cause LLMs to generate hallucinated answers: responses that sound plausible but are factually incorrect or entirely fabricated.

Retrieval-augmented generation (RAG) is a framework for improving the quality of LLM responses by supplying external sources of knowledge to supplement the LLM’s internal representation of information.

Incorporating disease models into a RAG framework would help ensure that the LLM has access to current, reliable facts. It would allow LLM claims to be checked for accuracy and referenced against the source information, rendering the results explainable and auditable. These are critical attributes if LLMs are to contribute to problem-solving in clinical pharmacology and regulatory science.
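To make the retrieval step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular system: the snippet texts, their ids, and the function names are all hypothetical, and the crude word-overlap scoring stands in for the embedding-based search a real pipeline would more likely use. The point is only to show how retrieved passages, tagged with source ids, are placed in front of the question so that an answer can later be audited against them.

```python
import re
from collections import Counter

# Hypothetical knowledge snippets. The ids and texts are placeholders,
# not real sources or real model descriptions.
DOCUMENTS = {
    "pk-model": "One-compartment pharmacokinetic model describing drug absorption and elimination.",
    "glucose-model": "Model of glucose and insulin homeostasis in adults.",
    "malaria-model": "Epidemiological model of malaria transmission dynamics.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score(query, text):
    """Count query words that also appear in the text (crude lexical overlap)."""
    query_counts = Counter(tokenize(query))
    text_words = set(tokenize(text))
    return sum(n for word, n in query_counts.items() if word in text_words)


def retrieve(query, documents, k=2):
    """Return the k (id, text) pairs with the highest overlap score."""
    ranked = sorted(documents.items(), key=lambda item: score(query, item[1]), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble a prompt that places the retrieved, id-tagged sources before the question."""
    passages = retrieve(query, documents, k)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("How does insulin regulate glucose?", DOCUMENTS, k=1))
```

The resulting prompt would then be sent to an LLM of choice; that call is deliberately left out because it depends on the provider. Keeping the source ids in the prompt is what makes it possible to trace each claim in the response back to a specific model or document.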

Foundational pharmacokinetic and pharmacodynamic models have been developed based on many years of animal and human studies. These models are remarkably helpful in understanding the relationships between anatomy, physiology, and drug characteristics, including a drug's absorption, distribution, metabolism, excretion, and its effects on physiology and disease.

As information about a new drug's characteristics and clinical trial outcomes becomes available, the models are updated and become essential decision-support tools across the research and development lifecycle. They guide decisions about investing further in the drug's development, support regulatory reviews, and optimize dosing based on patient characteristics.
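For readers unfamiliar with these models, the simplest textbook case may help: a one-compartment model of an intravenous bolus dose, in which the concentration falls exponentially at a rate set by clearance and volume of distribution, C(t) = (Dose / V) · exp(−(CL / V) · t). The sketch below is only that textbook formula; the function names and the numbers in the usage example are hypothetical and do not describe any real drug or patient.

```python
import math


def concentration(dose_mg, volume_l, clearance_l_per_h, t_h):
    """Plasma concentration (mg/L) at time t_h hours after an IV bolus,
    under a one-compartment model with first-order elimination."""
    k = clearance_l_per_h / volume_l  # elimination rate constant (1/h)
    return (dose_mg / volume_l) * math.exp(-k * t_h)


def half_life(volume_l, clearance_l_per_h):
    """Elimination half-life (h): ln(2) * V / CL."""
    return math.log(2) * volume_l / clearance_l_per_h


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    dose, volume, clearance = 100.0, 20.0, 2.0
    for t in (0, 4, 8, 12):
        print(t, round(concentration(dose, volume, clearance, t), 3))
    print("half-life (h):", round(half_life(volume, clearance), 2))
```

Patient characteristics enter models like this through the parameters: if clearance or volume differs with body weight, organ function, or genetics, the same dose produces a different concentration curve, which is the basis for individualized dosing.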

Incorporating these models into a RAG framework for LLMs presents an opportunity to address the challenges of introducing new drugs into the worldwide market while adding new information to disease-drug models.

For example, when dolutegravir, an antiretroviral medication used in combination with other medicines to treat HIV/AIDS, was introduced into clinics in Uganda, a marked increase in the prevalence of hyperglycemia among patients was noted.

My colleagues and I wrote a prompt for an LLM asking about risk factors and possible mechanisms for dolutegravir-induced hyperglycemia. Unfortunately, the responses were superficial and merely reflected the product label. Repeated attempts to gain more mechanistic insight were not productive.

Indeed, more data from dolutegravir clinical trials, including blood glucose levels, insulin response, drug concentrations, and subject demographics, would be helpful, as would the existing models of glucose-insulin homeostasis, drug pharmacokinetics, and drug distribution to putative sites of action. Furthermore, depending on the context of use, e.g., clinics in Uganda, information specific to the target population, including nutritional status, environmental exposures, genetic influences, and so forth, will be necessary.

It is difficult to imagine synthesizing this information without the combined efforts of human insights, expert curation, the RAG framework for enhancing the LLM, and clinical expertise to interpret and assess the clinical validity of the results.

Finding New Patterns and Perspectives

“We are searching for some kind of harmony between two intangibles: a form that we have not yet designed and a context that we cannot properly describe.” - Christopher Alexander

Using models as feedstock for artificial intelligence presents several vital opportunities. First, reading the literature with models in hand enables the models to be refined to reflect the contexts of interest. Second, using the models as input to an LLM provides a framework for integrating models from different domains. Third, synthesizing multiple models may be the key to otherwise inaccessible inspiration at the interface between domains.

Lastly, there is a notion that we have reached the limits of human knowledge and that this will hamper the future evolution of AI, to say nothing of what it implies for societal development. This sentiment ignores the detritus of knowledge left on the cutting room floor after we have “finalized” our models. That knowledge and the constant flow of new information await their opportunity to contribute and take us beyond the perceived boundaries of human thought.

Sometimes, the fresh eyes of a nine-year-old are helpful in spotting patterns. At other times, emerging artificial intelligence tools may prove invaluable.

Velocity Made Good
