Jorge BM
Aug 11, 2024
Why is “Systems Biology” more relevant than you may think?
What is Systems Biology? Is it a mathematical, computer, or biological science? I must admit that I had difficulty coming up with a concise definition. After researching where the field is expected to develop in the coming years, though, it appears that understanding its prospective applications is required to comprehend its full significance. In this article, I will show you the future (if that is even possible) of the research goals, tools and applications of a field that, believe it or not, can bring principles from game development into experimental biology.
“Systems Biology”, a dynamic meaning
To this day, the term “Systems Biology” could still be read as naming a purely biology-based discipline. However, its interdisciplinary nature has broadened its meaning, as the field integrates multiple areas that converge on one common definition: “a computational and mathematical analysis and modelling of complex biological systems” [1]. In other words, this field takes a holistic view of the intertwined molecular and biophysical variables that work together to perform a biological function. Rather than focusing on single molecular players, the goal is to create a model that incorporates several "actors" that are part of the system.
That should be enough to understand what Systems Biology is, shouldn’t it? Not exactly. Even if the meaning can be more or less pinned down, it is the applications that diverge the most. The main reason is very basic: biological systems are complex, with numerous interconnected components operating across multiple scales, from molecular interactions to whole-organism behaviours. In addition, the field's holistic and multidisciplinary nature integrates knowledge from biology, chemistry, physics, engineering, data generation, mathematics and computational systems. In fact, how many experimental biologists do you know who have all this knowledge? I dare say that finding one is more difficult than finding a "unicorn" developer/designer. But the complexity does not end there. When discussing systems, we must include their processes, which are far from static in biological systems, as they vary over time in response to diverse stimuli. Capturing these dynamic processes is therefore often difficult, and the multimodal data that need to be integrated (such as -omics, imaging, gene expression and blood tests) are challenging to combine and analyse.
In this sense, acknowledging where Systems Biology comes from is just one piece of the puzzle, and it helps to show that there is no easy way to tackle this field. Nonetheless, in my effort to comprehend its multifaceted purposes, and potentially make an already complex term more accessible, I followed the US-based study by Voit et al. (2023), “What’s next for computational systems biology?”, in which the authors share their vision for the future of computational systems biology, a field that has become a central methodology for biological and medical research [2].
AlphaFold 3D protein (Google DeepMind - EMBL-EBI)
Where is “Systems Biology” going?
According to Voit’s group, Systems Biology offers an opportunity to advance novel applications across biology, medicine, agriculture, environmental stewardship and other societal problems. Nonetheless, some conceptual challenges need to be accounted for beforehand. The first relates to how detailed and realistic models should be: a model accurate enough to mimic a biological process faithfully also captures the inherent complexity of the original system, which makes it even more difficult to reduce the model to the most pertinent information. However, there are alternatives to this 1:1 realism, for example digital twins. These are intermediate computational analogues of individuals that aim to provide moderately complicated models of whole cells, tissues or organs. In fact, quite a few projects around the globe are developing virtual brains, virtual livers, virtual rats or even entire hospital systems, the latter to stress-test hospital capacity in COVID-19-like operational situations [3–6]. Nor is this territory reserved for human health and metabolism: agriculture is also moving forward with digital twins in application areas such as crop monitoring, resource optimisation, cultivation support, livestock management, and urban and controlled-environment farming [7].
Nevertheless, despite the variety of areas where systems biology applies, a second challenge affects all of them: the difficulty of defining, characterising and properly describing the most fundamental design and functioning principles that regulate biological systems. Biological systems have evolved and been optimised over billions of years, and the result is difficult to fathom. Let’s be honest, Mother Nature has been quite thoughtful in governing biological systems.
However, humans, facing this significant disadvantage, are attempting to find clever ways to understand the recurring patterns and structures that reflect these superior operating procedures or, in other words, to look for the so-called “motifs”. In some ways, looking for patterns in biology is analogous to trying to comprehend physical laws (gravity, mass versus weight, energy, Newton’s laws of motion, etc.). In fact, there is a very relevant example of a biological law: the general principles of the genetic code. This “law” is especially relevant because it provides a consistent, predictive framework that applies across nearly all of life, forming a cornerstone of our understanding of molecular biology. So, imagine having a catalogue of principles that could be applied to general or specific biological areas; that would be the dream, albeit a distant one so far.
How to actually use “Systems Biology”?
If we keep dreaming, certain ideas will eventually come true, unless we wake up mid-fall before hitting the ground. But, despite the recurring “falling into the void” moments in systems biology research, there is one emerging tool that might lead to a bright future for the field: Machine Learning (ML).
As you have probably heard, Machine Learning employs enormous datasets to model, analyse, and understand complicated data, and what's more complex than biological data? Machine learning and systems biology appear to be a natural fit, but the truth is a little more challenging.
To begin, Systems Biology often has too few samples relative to the number of variables, which invites overfitting when training ML algorithms. Some researchers have noticed that many ML applications contravene the rule-of-thumb that a dataset should include at least ten times as many data points as the number of independent variables [8]. Let’s consider a theoretical example of predicting drug response in cancer patients. Suppose we want to use ML to predict how patients with a specific type of cancer will respond to a new drug. We have the following data:
Gene expression data for 100 genes
Protein levels for 50 proteins
10 clinical parameters (age, tumour size, etc.)
In total, we have 160 independent variables (features). But, according to the rule-of-thumb mentioned, we would need approximately 1,600 patients (10 times the number of features) to build a reliable ML model. However, in reality, we might only have data from 100 patients due to the rarity of the cancer type and the newness of the drug.
Another example shows that a larger dataset does not necessarily solve the problem: suppose we wish to use machine learning to forecast the risk of inflammatory bowel disease (IBD) based on gut microbiota composition. In this example, we have significantly more data:
Microbiome composition (relative abundance of 1000 bacterial species)
Metabolomic data (levels of 500 metabolites)
Host genetic information (100 relevant Single Nucleotide Polymorphisms (SNPs))
Environmental factors (10 variables like diet, stress levels)
In total, this example has 1,610 independent variables, so we would need about 16,100 samples. In reality, however, we might only have data from 500 individuals due to cost and recruitment challenges.
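The arithmetic behind both examples can be checked with a few lines of Python. This is only a sketch of the ten-to-one rule-of-thumb cited in [8]; the function name `required_samples` and the dictionary layout are my own illustrative choices and do not come from any library or from the cited paper.

```python
# Sketch of the "ten data points per independent variable" rule-of-thumb.
# All names here are hypothetical, chosen only for illustration.

def required_samples(feature_counts, ratio=10):
    """Return (total number of features, samples suggested by the rule-of-thumb)."""
    total_features = sum(feature_counts.values())
    return total_features, ratio * total_features


# Feature counts and available cohort sizes taken from the two examples above.
examples = {
    "drug response": ({"genes": 100, "proteins": 50, "clinical": 10}, 100),
    "IBD risk": ({"species": 1000, "metabolites": 500, "SNPs": 100, "environment": 10}, 500),
}

for name, (counts, available) in examples.items():
    features, needed = required_samples(counts)
    factor = needed / available  # how many times larger the cohort would need to be
    print(f"{name}: {features} features, {needed} samples suggested, "
          f"{available} available (cohort would need to be {factor:.1f}x larger)")
```

Seen this way, the second cohort is five times bigger than the first, yet it falls further short of the rule: it would need to be roughly 32 times larger, against 16 times for the drug-response example.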
In this regard, in addition to sample sizes and independent variables, other factors such as data heterogeneity, generalisability, and biological complexity must be considered when employing ML. To address these challenges, researchers could select only the most relevant independent variables (feature selection), combine multiple models, transfer knowledge across related studies, integrate diverse biological data types, and carry out long-term observations to capture system dynamics over time. The right approach will always depend on the conditions of the research setting and the requirements of the ML algorithm.
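To make the first of these strategies more concrete, here is a minimal sketch of feature selection applied to the drug-response example. It assumes one simple approach among many: rank the features by their absolute Pearson correlation with the response and keep only as many as the ten-to-one rule allows. The data are synthetic and invented purely for illustration, and the helper names `pearson` and `select_features` are hypothetical.

```python
# Sketch: keep only as many features as the rule-of-thumb allows, ranked by
# absolute Pearson correlation with the response. Synthetic data, hypothetical names.
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, response, ratio=10):
    """Return indices of the top-k features, with k = n_samples // ratio."""
    k = len(response) // ratio
    ranked = sorted(range(len(columns)),
                    key=lambda j: abs(pearson(columns[j], response)),
                    reverse=True)
    return ranked[:k]


# Synthetic stand-in for the drug-response example: 100 patients, 160 features.
rng = random.Random(0)
n_patients, n_features = 100, 160
columns = [[rng.gauss(0, 1) for _ in range(n_patients)] for _ in range(n_features)]
# By construction, only feature 0 drives the (made-up) response.
response = [columns[0][i] + 0.1 * rng.gauss(0, 1) for i in range(n_patients)]

selected = select_features(columns, response)
print(len(selected), "features kept; top-ranked feature index:", selected[0])
```

With 100 patients the sketch keeps 10 of the 160 features. In a real study, the selection step should be repeated inside each cross-validation fold, because choosing features on the same data used to evaluate the model leaks information and makes the results look better than they are.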
Nevertheless, there is one solution that caught my attention more than the others. Maybe being an avid gamer helps in making this association, but Voit et al. proposed taking the gaming industry's AI and ML efforts as an example. For starters, a videogame requires what is called a game engine to run its different elements together, and these game engines are usually built as user-friendly environments that serve as toolboxes for game creation. A game engine integrates rendering, physics, scripting capabilities, sound integration, AI, streaming, networking, memory management, localization support, etc. Put differently, it is a complex system interconnecting multiple functionalities and modules. In this regard, Voit et al. suggest creating a “bioengine” that would function in Systems Biology much as a game engine does for a video game. This solution could have countless applications, facilitating interactive visualisations, real-time simulations of biological phenomena, ecosystem modelling, or even a virtual lab environment for running simulated experiments.
Now, despite what some may believe, this is not merely wishful thinking about futuristic uses; something similar is already a reality in the AlphaFold (2023) project. Google DeepMind created an AI system that computationally predicts the 3D structure of 230 million proteins. This technology functions as a "bioengine" that can aid in, among other things, working out the structure of SARS-CoV-2 viral proteins for innovative therapies, or characterising the structure of haemoglobin to create treatments for hereditary haemoglobin disorders. These breakthroughs have helped to reduce the cost and time needed to determine the exact structure of a protein, as structures can now be predicted computationally from the amino acid sequence rather than solved experimentally [9].
This image is a conceptual illustration of a Bioengine that would function in Systems Biology similar to a game engine for a video game
How can “Systems Biology” be applied?
Research will play a significant role in the progress of Systems Biology in the coming years, but the field's applicability in tackling real-world problems will define its success. For this reason, Voit et al. selected the most promising developments today (Table 1).
The crucial role of interdisciplinarity in shaping the future of “Systems Biology”
In addition to the challenges, research tools and applications that I have outlined thanks to the beautiful work of Voit et al., I want to highlight one additional component that will be critical to the future of Systems Biology. This reflection comes after I experienced "in situ" a series of challenges when working with AI-based technologies that use multimodal data and draw on diverse disciplines. That component is the "interdisciplinarity" needed to bridge the complexity of the areas involved. Please note that I did not use the word "multidisciplinarity": the former encompasses the holistic nature of integration in Systems Biology, whereas the latter overlooks the system's interconnectedness. I understand that research and experimentation in Systems Biology amount to a collection of snapshots of dynamic processes, involving actors from multiple disciplines working on the same “tool”, and that these disciplines are often bounded by borders that must be crossed. That is why I emphasise the need for interdisciplinarity in Systems Biology, even though bringing together specialists from different fields is a difficult innovation process. As a result, my recommendation when getting involved in Systems Biology is to gather experts with the necessary skills in data generation, computational systems, chemistry, engineering and physics, but don't forget about the individual or individuals with a more holistic vision who can serve as a link between them. To put it another way, Systems Biology needs you to view the individual cells, the individual tissue, the individual organ, and the individual organism, but more importantly, how they are connected.
Resources
1. Bizzarri M, editor. Systems biology. Second edition. New York, NY: Humana Press; 2024. ISBN:978-1-07-163577-3
2. Voit EO, Shah AM, Olivença D, Vodovotz Y. What’s next for computational systems biology? Front Syst Biol 2023 Sep 19;3:1250228. doi: 10.3389/fsysb.2023.1250228
3. Heidelberg Institute for Theoretical Studies. Virtual Liver Network. 2014. Available from: https://www.h-its.org/projects/virtual-liver/
4. National Center for Systems Biology. Virtual Physiological Rat. 2024. Available from: https://vpr.sites.uofmhosting.net
5. The Virtual Brain Foundation. Virtual Brain. 2023. Available from: www.thevirtualbrain.org
6. Khan A, Milne-Ives M, Meinert E, Iyawa GE, Jones RB, Josephraj AN. A Scoping Review of Digital Twins in the Context of the Covid-19 Pandemic. Biomed Eng Comput Biol 2022 Jan;13:117959722211021. doi: 10.1177/11795972221102115
7. Purcell W, Neubauer T. Digital Twins in Agriculture: A State-of-the-art review. Smart Agricultural Technology 2023 Feb;3:100094. doi: 10.1016/j.atech.2022.100094
8. Chicco D. Ten quick tips for machine learning in computational biology. BioData Mining 2017 Dec;10(1):35. doi: 10.1186/s13040-017-0155-3
9. Google DeepMind, EMBL-EBI. AlphaFold. 2020. Available from: https://alphafold.com/about