Creating proteins to fulfill a variety of functional needs has long been a goal of biochemists. This requires a thorough understanding of the relationship between the sequence of a polypeptide chain and the resulting protein structure. In recent years, the field of protein design has finally reached a stage where it is possible to use physical and chemical principles to guide the design of novel protein structures. The goal of designing a protein structure is to produce an amino acid sequence that can fold into a target shape. To compute the sequence, most current methods explicitly model every atom in the system (with implicit solvent) to find a configuration that satisfies all of the interactions that each residue can make in its environment. While we are not yet capable of using these methods to design proteins with any arbitrary function, our ability to create structures that significantly differ from those observed in structural databases has reached new heights. Protein design has become more robust with recent advances in computer processing power and design algorithms and with the decreased cost of DNA synthesis. These breakthroughs have provided the tools to run large-scale simulations, test design hypotheses, and experimentally iterate on and confirm designs. Nonetheless, the word “design” implies the involvement of cognitive activity in determining the outcome. This is arguably the most critical and least tractable element of the approach. Although any new amino acid sequence that can be generated rationally for a protein can be considered a design, in recent years, designing a protein “de novo” has come to refer largely to designs in which both the structure and the sequence are modeled and created from scratch. When both the backbone and sequence are unknown at the outset, a protein designer must creatively choose a topology and construct the proper structural elements to form the backbone.
A number of strategies have been employed to restrict the local backbone geometries to be native-like, for example, by borrowing fragments from actual proteins to initiate the construction or by extensively idealizing the peptide chain according to reliable chemical knowledge or parametric equations. While computer algorithms have largely automated specific steps of protein design, the protein designer still controls the process and makes certain that the resulting structures are coherent. But what decisions do human designers make that today’s automated algorithms do not? This question prompted the development of Foldit, a video game that applies a graphical user interface to the protein modeling suite Rosetta. In addition to serving as an excellent educational tool, Foldit aims to explore the strategies humans use to solve protein structure puzzles, in the hope that these operations can be analyzed to improve or automate design algorithms. Foldit began with puzzles that challenged players to predict the folds of natural amino acid sequences (Figure 1A). Recently, it has been extended to allow players to modify previously designed proteins or design novel proteins from scratch (Figure 1B,C). There are three main components involved in the design of proteins: scoring metrics to guide the moves, strategies to change the structure, and sequence tweaks to improve models (Figure 1D,E). In Foldit, the latter two are controlled by human players. There is little difference between what a player does and what a trained protein designer might do, because their objectives are the same: to follow the score provided by the force field, as it is not possible to mentally track the entire system of thousands of atoms. In a study published in Nature, Koepnick et al. let Foldit players design a folded peptide starting from a linear chain.
Players were exceptionally good at exploring the conformational space, as seen in an early iteration of the game, where the players’ structures were truly novel and expressive. While many of these creative models would be unlikely to fold to their target structures, the real implication of this crowdsourced brilliance is that every aspect of the Rosetta scoring function is now being tested, and exploited, in unintended ways by players seeking a better score. Fixing the scoring deficits identified by players will eventually make the scoring metric more robust. Indeed, in subsequent rounds, Foldit was configured to enforce packing and backbone regularization rules; remarkably, these improvements provided sound guidance, and the citizen scientists were able to design proteins at the same level of accuracy as expert designers who are trained in structural biology. Perhaps not surprisingly, with the imposition of build rules, the models produced in Foldit are no longer shockingly different from designs that trained experts have long been able to produce. However, the fact that nonscientists achieved these novel designs simply by maximizing the game score shows that the scoring scheme (i.e., the Rosetta force field) must be remarkably robust. By specifying the required secondary structure content or other more general rules, the scientists behind Foldit also seem to be able to guide players into creating a wide variety of structures within specific folds. The quality of the models seems only as good as the rules set by the scientists. It will be fascinating to see how this interplay between knowledge-derived rules and human creativity can be harnessed to advance science. Automated computer algorithms today cannot carry out these design tasks as the human Foldit players do; the sampling calculations would take far too long to produce a viable structure.
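To make the three components named earlier concrete (a scoring metric, moves that change the structure, and sequence tweaks), the following is a minimal toy sketch of a score-guided search. It is not Rosetta or Foldit code. The "structure" is reduced to a buried/exposed flag per position, the sequence to hydrophobic ("H") or polar ("P") residues, and the score to a count of mismatches between the two. All function names, the Metropolis acceptance rule, and the parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def score(burial, sequence):
    """Toy score (lower is better), standing in for a real force field.

    Assumption: one penalty point for each exposed hydrophobic ('H')
    or buried polar ('P') position.
    """
    return sum(
        1.0
        for buried, residue in zip(burial, sequence)
        if (residue == "H") != buried
    )


def propose(burial, sequence, rng):
    """Return modified copies after one randomly chosen move."""
    burial, sequence = list(burial), list(sequence)
    i = rng.randrange(len(sequence))
    if rng.random() < 0.5:
        # "Structure" move: flip the burial state of one position.
        burial[i] = not burial[i]
    else:
        # "Sequence" tweak: swap the residue type at one position.
        sequence[i] = "P" if sequence[i] == "H" else "H"
    return burial, sequence


def design(n_positions=20, n_steps=2000, temperature=0.5, seed=0):
    """Score-guided search; returns (best_score, burial, sequence)."""
    rng = random.Random(seed)
    burial = [rng.random() < 0.5 for _ in range(n_positions)]
    sequence = [rng.choice("HP") for _ in range(n_positions)]
    current = score(burial, sequence)
    best = (current, list(burial), list(sequence))
    for _ in range(n_steps):
        new_burial, new_sequence = propose(burial, sequence, rng)
        candidate = score(new_burial, new_sequence)
        # Metropolis rule: always accept improvements, and accept a
        # worse score with probability exp(-(increase) / temperature).
        accept = candidate <= current or rng.random() < math.exp(
            (current - candidate) / temperature
        )
        if accept:
            burial, sequence, current = new_burial, new_sequence, candidate
            if current < best[0]:
                best = (current, list(burial), list(sequence))
    return best
```

In this sketch the random `propose` step plays the role that human players take on in Foldit, while `score` plays the role of the force field they follow. A real design problem differs in scale and kind: the moves act on thousands of atoms, and the search space is vastly larger than this two-state toy.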
               