This is one of the projects that I’m most passionate about. As we look to solve the world’s diseases, we don’t have a good way to collate information from the primary research literature. Having this information organized in a standardized way could help us quickly make sense of research that’s been done and what research should be done for each disease. This post puts forward a framework to organize information for diseases.
The framework here uses type 1 diabetes (T1D) as an example, and has four key components:
Problem with the disease
Genetic factors
Environmental factors
Model for the disease
Problem with the disease
For many diseases, the case for investing in research is not clearly laid out. A database would bring together information on where current therapies fall short, the individual burden of the disease (what life looks like for a patient), and the societal burden. It would also bring together evidence that there is hope for better treatment.
In type 1 diabetes, the primary issue is that insulin is not enough. Conventional treatment alone doesn’t restore normal blood sugar levels, and higher blood sugar levels are associated with an increased risk of mortality. Other issues: treatment is inconvenient, the incidence of type 1 diabetes is on the rise, and although the lifetime cost savings of curing T1D are $423B, less than $0.5B per year is being invested to study the disease. These issues are depicted in the figures below.
Evidence for hope in type 1 diabetes: devices that allow more precise dosing of insulin, immune therapies that prevent the disease in animal models, and research that implicates vitamin D and the microbiome as targets for prevention.
Having this information spelled out creates i) an emotionally compelling rationale to invest in the disease and ii) a scientifically compelling argument for why there is hope for solving the disease.
Genetic factors
Genetic association data gives clues as to how a disease develops. Because a person’s genetic information doesn’t change, a genetic association cannot be a consequence of the disease, so it can be used to infer genes that are responsible for causing the disease or driving its progression. Although genetic association data is becoming more readily available, it has not yet been summarized in a way that easily shows which genes have been implicated in each disease.
A database would ideally collate genetic data and visualize it in a more accessible way. An example of how genetic association data has been visualized in type 1 diabetes is included below.
Genomic loci implicated in T1D.
50 hits have been identified in T1D. Shown is their overlap with other autoimmune diseases. Circle size denotes significance of the association (p value). Color indicates whether causal genes have been assigned and the strength of evidence for their assignment.
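One way a database could store the loci behind a figure like this is sketched below. This is a minimal illustration, not the method used to produce the figure: the `Locus` fields, the locus names, the p values, and the use of -log10(p) as the significance score are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Locus:
    name: str                  # hypothetical locus label
    p_value: float             # association p value (made-up numbers below)
    causal_gene_evidence: str  # e.g. "none", "candidate", "strong"


def significance_score(p_value: float) -> float:
    """Return -log10(p), so smaller p values give larger scores."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)


# Placeholder entries, not real T1D results.
loci = [
    Locus("locus_A", 1e-8, "strong"),
    Locus("locus_B", 1e-20, "candidate"),
]

# Most significant locus first; the score could also drive circle size in a plot.
ranked = sorted(loci, key=lambda locus: significance_score(locus.p_value), reverse=True)
```

Keeping the evidence level as its own field would let a plot color each locus separately from its size, as the figure does.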
Environmental factors
Knowledge of the environmental factors associated with a disease can tell us how to prevent the disease. Having this information in one place can tell us which factors are involved, and give an overall snapshot of research that has been done so far.
Ideally, we would list the environmental factors and the strength of each association. Since this information is not available for most diseases, an alternative is to list the factors and the number of publications on each. This is a starting point for organizing the information and for prioritizing which factors to investigate.
Environmental factors in T1D.
Candidate environmental factors and the number of times they appear in a Web of Science search combined with “type 1 diabetes.”
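The publication-count approach can be sketched in a few lines. The factor names and counts below are placeholders, not results from the Web of Science search, and `rank_factors` is a hypothetical helper written for this example.

```python
def rank_factors(publication_counts: dict[str, int]) -> list[tuple[str, int]]:
    """Order factors by publication count, highest first; ties sort by name."""
    return sorted(publication_counts.items(), key=lambda item: (-item[1], item[0]))


# Placeholder counts for illustration only.
example_counts = {"factor_a": 120, "factor_b": 45, "factor_c": 300}

priority = rank_factors(example_counts)
for factor, count in priority:
    print(f"{factor}: {count} publications")
```

A publication count measures research attention rather than strength of association, so the resulting order is only a first pass at prioritization.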
Model for the disease
A model for a disease is an illustration of how the disease develops. Combining this information into a single visual can reveal gaps in our knowledge and highlight new research directions. In type 1 diabetes, assembling this information through literature reviews and expert interviews showed us more clearly where our understanding of the disease was lacking and helped us put forward new research questions to address.
Model for T1D development:
This model describes the current theories for how T1D develops, including genetic and environmental factors relevant at each stage.
Summary
The aim of this post is to provide a framework for organizing information on every disease, as the basis for a database that makes the primary research accessible. The framework offers a standardized and scalable way to collate data for each disease, and each entry can be systematically updated as new research becomes available.
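As a rough sketch of what one standardized entry could look like, the four components can be held in a single record. The class name, field names, and the `missing_components` helper are assumptions made for illustration; the post does not prescribe a schema.

```python
from dataclasses import dataclass, field


@dataclass
class DiseaseRecord:
    name: str
    problem: list[str] = field(default_factory=list)                # where treatment falls short, burden
    genetic_factors: list[str] = field(default_factory=list)        # implicated loci or genes
    environmental_factors: list[str] = field(default_factory=list)  # candidate factors
    model_stages: list[str] = field(default_factory=list)           # steps in disease development

    def missing_components(self) -> list[str]:
        """Return the names of components that have no entries yet."""
        components = {
            "problem": self.problem,
            "genetic_factors": self.genetic_factors,
            "environmental_factors": self.environmental_factors,
            "model_stages": self.model_stages,
        }
        return [label for label, entries in components.items() if not entries]


# A partially filled entry using items mentioned in this post.
t1d = DiseaseRecord(
    name="type 1 diabetes",
    problem=["insulin alone does not normalize blood sugar"],
    environmental_factors=["vitamin D", "microbiome"],
)
print(t1d.missing_components())
```

Using the same fields for every disease is what would make entries comparable, and a check like `missing_components` would show at a glance which parts of an entry still need to be filled in.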