Accelerating Drug Discovery Research with New AI Models: A Look at the AstraZeneca–Cerebras Collaboration
We live in an era of data at a scale beyond human comprehension. In research, the pace of data generation and new publication is so great that it is impossible for any individual or group to track. As a result, new findings, or connections between disparate studies, may lie undiscovered. This represents a significant missed opportunity for research, with corollary costs to the advancement of healthcare, and it is particularly relevant in the pharmaceutical industry, where the need to leverage data for new drug discovery is urgent and complex.
The COVID-19 pandemic has further illustrated that the problem is not merely one of scale; speed is also a factor. The number of COVID-19-related papers skyrocketed from just 72 in January 2020 to 11,208 by July of that same year. To optimize research and identify therapeutic mechanisms more quickly for such emergent diseases, researchers need to be able to ask questions of these vast, rapidly growing bodies of literature as they evolve, and get accurate responses to their queries in close to real time.
Artificial intelligence (AI) presents a potential solution to this challenge. AstraZeneca is unlocking new scientific insights with data science and AI, and in a recent collaboration with Cerebras Systems has trained several large-scale Natural Language Processing (NLP) models to enable rapid, large-scale medical literature search, a critical capability for advancing drug discovery.
Much of the progress in AI over the past decade has been made using legacy, general-purpose compute devices such as graphics processing units (GPUs). While capable, these machines are not optimized for AI work, and the industry has reached an inflection point at which incremental improvements in legacy processor technologies are no longer sufficient to meet the growing compute needs of larger and more complex models and datasets. The Cerebras CS-1 is a purpose-built AI computer system that lets researchers train AI models orders of magnitude faster than otherwise possible. Researchers and pharmaceutical companies can leverage this best-of-breed architecture to unlock entirely new capabilities for AI model training and data processing in healthcare.
Quoting Nick Brown, Head of AI Engineering at AstraZeneca: “Testing technologies like Cerebras on real use cases is important to understand what investments we need to make in AI strategically. Our goal is to allow researchers to iterate and experiment in real-time by running queries on hundreds of thousands of abstracts and research papers, a task that previously proved impractical, if not impossible.”
To address this goal, Cerebras trained multiple BERT (Bidirectional Encoder Representations from Transformers) Large models from scratch on the CS-1, using large corpora of biomedical texts. Such large NLP models are often prohibitive for end users working with GPUs: the models and datasets demand cluster-scale compute performance, and with a GPU cluster, performance scaling becomes a major bottleneck as workloads grow. Programmability is another impediment for traditional GPU cluster implementations. The complexity of placing work across many small GPU nodes, interconnecting those nodes, and modifying machine learning code to run in a distributed cluster often requires supercomputer-style machine learning engineering expertise. The Cerebras CS-1, in contrast, concentrates all of its performance in a single system. It provides the deep learning compute resources of a cluster with the programming ease of a single node, delivering greater performance and ease of use for a massive acceleration in time to solution. High-level results showed that the CS-1 trained these large models to matching or better accuracy in just a fraction of the training time of GPU-based systems.
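To give a feel for what pretraining BERT from scratch involves, here is a deliberately simplified sketch of its masked-language-model objective: a fraction of tokens in each text is hidden, and the model learns to predict them from context. (This is an illustrative assumption-laden toy, not AstraZeneca's or Cerebras' code; real BERT also replaces some selected tokens with random tokens or leaves them unchanged, and operates on subword IDs rather than whole words.)

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide ~mask_prob of the tokens; the model must recover the originals.

    Simplified sketch: every selected token becomes [MASK]. Positions that
    were not selected get a None label and are not scored in the loss.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)   # prediction target: the original token
        else:
            masked.append(tok)
            labels.append(None)  # position excluded from the loss
    return masked, labels

# Hypothetical fragment of a biomedical abstract:
abstract = "the compound inhibits kinase activity in tumor cells".split()
masked, labels = mask_tokens(abstract, seed=42)
```

Pretraining repeats this over hundreds of thousands of abstracts, which is what makes the compute demand so large.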
Cerebras’ revolutionary wafer-scale engine (WSE) is the primary processor aboard the CS-1. The CS-1 has 400,000 AI-optimized cores, more silicon area than 56 GPUs, and more on-chip memory than 450 GPUs. But size and capacity are not the CS-1’s only advantages: unlike legacy, general-purpose processors, the WSE is designed to exploit the sparsity common to deep neural networks, detecting it and scheduling work so that only non-zero values are multiplied and propagated across the network. This saves both time and power, and it is part of why the CS-1 can handle workloads that are prohibitively expensive to train on other processors. Training the BERT Large model with sparsity reduced training time by an additional 20%. Perhaps more importantly in this early experiment, AstraZeneca found that models pre-trained with induced sparsity tend to perform better on downstream tasks after fine-tuning.
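The idea of skipping zero values can be sketched in a few lines. The toy function below (my illustration, not Cerebras hardware logic) computes a dot product the way a sparsity-aware core might: it multiplies only the pairs where both operands are non-zero and counts how many multiplications were actually performed, showing the work saved when activations are mostly zeros, as they often are after ReLU.

```python
def sparse_dot(weights, activations):
    """Dot product that skips zero operands.

    Returns the result and the number of multiplications actually
    performed, to illustrate the compute saved by exploiting sparsity.
    """
    total, mults = 0.0, 0
    for w, a in zip(weights, activations):
        if w != 0.0 and a != 0.0:  # only non-zero pairs do real work
            total += w * a
            mults += 1
    return total, mults

# Hypothetical values; ReLU-style activation vectors are often mostly zeros.
weights     = [0.5, -1.2, 0.0, 2.0, 0.3, 0.0]
activations = [1.0,  0.0, 3.0, 2.0, 0.0, 4.0]
result, mults = sparse_dot(weights, activations)
# result == 4.5, with only 2 of 6 possible multiplications performed
```

The answer matches the dense computation exactly; only the wasted multiply-by-zero work is eliminated, which is where the time and power savings come from.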
“AstraZeneca is dedicated to innovating in AI and pushing the boundaries of machine learning to advance science,” said Brown. “Cerebras opens the possibility to accelerate our AI efforts, ultimately helping us understand where to make strategic investments in AI. Training that historically took over 2 weeks to run on a large cluster of GPUs was accomplished in just over 2 days — 52 hours to be exact — on a single CS-1. This could allow us to iterate more frequently and get much more accurate answers, orders of magnitude faster.”
The ability to train these complex NLP models in reasonable periods of time will be critical to creating and maintaining effective records in the real-time biomedical databases of the future. Better, more intuitive search systems, ones that understand the results scientists are looking for and can surface them more readily, are clearly valuable. By using the Cerebras CS-1, it is possible to construct a more capable and much more accurate model than any GPU-based system could deliver in an equivalent amount of time.