The identification of potent and selective lead molecules is the essential first step in any drug discovery research project. Historically, successful drug discovery has focused on a small number of so-called tractable target classes, including G-protein-coupled receptors, ion channels, nuclear receptors, kinases, and other enzymes. Until the 1980s, lead molecules were identified through traditional medicinal chemistry approaches, typically through chemical modification of a known bioactive compound. The molecular biology revolution resulted in a huge increase in the number of putative target proteins for drug discovery. This was accompanied by the development of combinatorial chemistry methods to generate very large chemical libraries, often several million molecules in size, and in turn by technologies for high-throughput screening (HTS) that enabled these libraries to be screened rapidly and cost-effectively against large numbers of drug targets. HTS, sometimes termed diversity screening, rapidly became embedded as a primary method for lead discovery within the pharmaceutical industry, and more recently the platform has transferred to the academic sector through the rapid growth of academic drug discovery centers. Alongside the growth in HTS, advances in biophysical technologies and structural biology have led to methods for screening small-molecule fragments, with structural biology techniques used in a complementary way to guide medicinal chemists in optimizing these fragments.
The establishment of these technology platforms required huge investment in compound stores and distribution systems, screening automation and detection systems, assay technologies, and systems to generate the large quantities of biological reagents needed to support fragment-based drug discovery and diversity screening. This investment led to the generation of novel, potent, and selective lead molecules, with appropriate physicochemical and safety properties, for many drug targets. However, for a significant number of drug targets, identifying novel molecules for use as target validation probes or as starting points for the development of a drug candidate remains a major challenge. Existing compound collections reflect the chemistry history of the field, and while they have been successful at identifying lead molecules for the major target classes, in many cases they have failed to yield hit molecules for novel target classes or for so-called intractable target families. Advances in fragment screening have provided a route to designing novel molecules against protein targets; however, although such methods have recently been extended to the screening of membrane proteins, their implementation for these targets remains in its infancy. As a consequence, there continues to be significant interest both in developing novel chemistries and compounds to enhance the quality of existing compound libraries, with a particular focus on physicochemical properties and lead-likeness, and in novel screening paradigms to improve the overall success of lead discovery.
DNA-encoded library technology involves the creation of huge libraries of molecules covalently attached to DNA tags, using water-based combinatorial chemistry, and the subsequent screening of those libraries against soluble proteins by affinity selection. Although DNA-encoded library technology was first described in the early 1990s, only in recent years has it been considered an attractive approach for lead discovery. This valuable handbook provides a comprehensive review of the history and capabilities of DNA-encoded library technology. I will not attempt to review these here; instead, I would like to highlight the technology developments that have enabled this capability and the potential applications of DNA-encoded library technology as part of a broad portfolio of lead discovery paradigms.
As part of a broad portfolio of lead discovery paradigms, DNA-encoded library technology offers a number of attractions compared to other methods:
- DNA-encoded library selections require only a few micrograms of protein; hence they avoid the investment in reagent generation and scale-up associated with other screening paradigms.
- A DNA-encoded library of 100 million or more molecules can be stored in an Eppendorf tube in a standard laboratory freezer; hence it does not require the investment in compound management and distribution infrastructure associated with existing small-molecule compound libraries.
- A DNA-encoded library selection can be performed on the laboratory bench, again avoiding the infrastructure investments required to support high-throughput screening or fragment discovery.
- Because a DNA-encoded library screen is simple to run, multiple screens can be performed in parallel to identify molecules with a more refined pharmacological profile. For example, selectivity can be engineered into hit molecules by performing parallel screens against the drug target and a selectivity target and then progressing only those molecules with the required pharmacological profile.
- Through affinity-based selection, it is possible to identify molecules that bind to both orthosteric and allosteric sites within the same screen, and thus potentially to identify compounds with a novel mechanism of action.
- Because the libraries are built by combinatorial chemistry, a DNA-encoded library selection typically yields deep insight into the structure–activity relationships of the hit molecules it generates.
- The combinatorial nature of DNA-encoded library chemistry enables the rapid exploration of new chemistries, leading to the tantalizing prospect that the use of such libraries may increase the success of lead identification for novel, and perhaps so-called intractable, target families.
Considering these attractions of DNA-encoded library technology, one may ask why the method has not yet become embedded within the field. The success of DNA-encoded library technology relies upon the quality and diversity of the chemical libraries, the availability of next-generation DNA sequencing methods, and the development of informatics tools to identify high-affinity binding molecules from the library. Initially, the quality of DNA-encoded libraries was relatively poor: the molecules tended to be large and lipophilic, and the libraries were relatively small. To a large extent, this has been addressed through the ongoing development of new water-based synthetic chemistry methods, through improvements in library design, and through the availability of larger numbers of chemical building blocks. The identification of hit molecules in a DNA-encoded library screen depends on the power of DNA sequencing. The revolution in DNA sequencing methodologies has dramatically reduced the cost and time required to analyze the output of DNA-encoded library screens, enabling many hundreds of thousands of hits to be sequenced for a few hundred dollars. Together with improvements in informatics, this has made it possible to analyze screening data rapidly and identify molecules of interest. These developments are described in detail throughout this handbook. A final limitation on the application of this technology arises from its defining feature, the selection paradigm. DNA-encoded library screens identify hit molecules through affinity selection, which requires that selections be performed on purified protein. While there have been some reports of the use of DNA-encoded library technology to screen targets within a membrane or whole-cell environment, the technology has primarily been used to screen soluble protein targets, which limits the application of the platform across all target types.
Looking toward the future, one can anticipate increasing acceptance of the value of DNA-encoded library technology as part of a portfolio of technologies, alongside high-throughput screening, structure-based drug discovery/fragment screening, virtual screening, and other methods for the generation of lead molecules for drug discovery. This handbook will provide an invaluable guide to scientists interested in learning, developing, and applying this technology.
Stephen Rees
2014
Vice President Screening and Sample Management
AstraZeneca, LLC