Selective Labeling of Proteins with Chemical Probes in Living Cells

Michael Z. Lin, Lei Wang


Selective labeling of proteins with small molecules introduces novel chemical and physical properties into proteins, enabling the target protein to be investigated or manipulated with various techniques. Different methods for labeling proteins in living cells have been developed by using protein domains, small peptides, or single amino acids. Their application in cells and in vivo has yielded novel insights into diverse biological processes.

Methods for selective labeling of proteins with chemical probes and their application to various biological processes in living cells.

The fusion of a fluorescent protein to target proteins has revealed a wealth of information on many biological systems over the last decade. Compared with fluorescent proteins, small molecule probes offer a wider range of properties and functionalities, since they can be derivatized with fluorophores, haptens, or reactive chemical groups. To attach small molecule probes to proteins, new methods have emerged in recent years involving fusion of protein or peptide tags followed by either noncovalent or covalent binding of small molecule probes to the tags. These probes are designed with functional groups at positions that will not interfere with binding to the tag. A different set of techniques allows specific protein labeling with functional groups with single amino acid precision. These techniques include chemical modification, native chemical ligation, and genetic incorporation of unnatural amino acids. These new approaches differ from earlier biochemical techniques in allowing greater precision of labeling, either in location on the protein or in time, and in allowing use in or on live cells. In this review, we will describe and compare these new methods for labeling proteins with chemical probes and highlight recent reports applying these methods in live cells.

Protein Domain Tags

Many protein domains, as part of their natural functions, interact with small molecules in a highly specific manner (FIGURE 1). For example, enzymes catalyze reactions by mediating the precise positioning of substrates in their active sites. Similarly, small molecule enzyme inhibitors such as natural toxins and therapeutic drugs exert their effects by binding tightly to active sites. Specialized proteins are also chemically modified in detoxification processes. Each of these types of interactions—enzyme-substrate, target-inhibitor, and scavenger-toxin—has been exploited to attach chemical probes to proteins of interest with high specificity.


Classes of protein domain or peptide tags used for chemical labeling of fusion proteins
A: domain tags self-label either covalently or noncovalently. B: peptide tags either self-label with varying degrees of reversibility or utilize enzyme-mediated covalent attachment. The standard methods of fluorescent protein or epitope tags are shown for comparison. Examples of tags of each class are listed to the right.

Covalent attachment of chemical probes has the advantage of greater irreversibility compared with noncovalent binding. A modified 33-kDa bacterial dehalogenase domain has recently been developed to perform efficient covalent self-labeling. Dehalogenases break down haloalkanes in a two-step process involving alkyl transfer to an active site Asp side chain (and the loss of the halide ion) followed by hydrolysis of the alkyl-enzyme bond. The hydrolysis step requires an active site His residue (59), mutation of which preserves the alkyl-enzyme conjugate. An optimized hydrolysis-deficient dehalogenase (marketed as HaloTag, Promega) can be specifically labeled using cell-permeable haloalkanes bearing fluorophores or biotin (39). Since eukaryotes lack dehalogenases, background staining in cells is very low.

Before the use of dehalogenase, Johnsson and colleagues (32) were the first to react small molecules with a protein domain for labeling purposes, using the 21-kDa O6-alkylguanine-DNA alkyltransferase (AGT). AGT recognizes guanine bases alkylated on the O6 position and transfers the alkyl group to one of its reactive cysteine side chains, regenerating guanine. Cell-permeable O6-benzylguanine derivatives bearing fluorophores, biotin, haptens, and even other protein ligands have been used to label AGT fusions (32). A mutant AGT with increased reactivity for free benzyl-guanine derivatives and decreased binding to DNA has been designed (25) and is available along with a variety of probes from Covalys. The 50-fold faster reactivity of the mutant vs. wild-type AGT allows specific visualization of fusion proteins even in the nuclei of cells expressing endogenous AGT (31). Also, although wild-type AGT changes conformation upon alkylation to allow its degradation (13), mutant AGT fusions have been followed for days without unexpected signal decay, implying that degradation may not occur in fusions or with mutant AGT (31). AGT has proven to be a reliable labeling method in time-lapse studies of centromere and sarcoplasmic reticulum protein movements in living cells (31, 69).

Enzymes and irreversible active-site inhibitors, i.e., suicide substrates, can serve as protein-ligand pairs for covalent labeling without further protein engineering as long as the inhibitors do not react with endogenous enzymes. In the only example of this concept to date, fluorophore-conjugated analogs of the suicide substrate p-nitrophenyl phosphonate (pNPP) were used to label proteins fused to the fungal cell wall enzyme cutinase (5). However, because pNPP is cell-impermeant, this system is only suitable for labeling surface proteins.

Although less desirable for long-term studies, protein domains can also be labeled noncovalently with chemical ligands. This was first demonstrated using fluorescent derivatives of methotrexate to visualize fusions to mammalian dihydrofolate reductase (DHFR) in DHFR-deficient cells (29). Recently, Cornish and colleagues created a variety of cell-permeant derivatives of the antibiotic trimethoprim for visualizing fusions with the 18-kDa E. coli DHFR (7). Trimethoprim shows high affinity to E. coli DHFR (Kd = 1 nM) and minimal (Kd = 3 μM) binding to mammalian DHFR (66) and, unlike methotrexate, is not cytotoxic. A similar method arose from research on the immunosuppressant FK506 and its 12-kDa protein target FKBP12. The FK506 analog SLF’ binds with 94 pM affinity to the FKBP12(F36V) mutant but not endogenous FKBP12, allowing specific labeling of FKBP12(F36V) fusions in cells by SLF’ derivatives (43). A protein domain that allows labeling specifically in vesicles or the cell surface is a single-chain antibody, either selected for binding to a small molecule that can be conjugated to a fluorophore (18) or selected for the ability to increase the quantum yield of fluorescent probes upon binding (71). The latter tag, like fluorescent proteins, is restricted to optical readouts but allows for fluorophores with superior optical characteristics or for membrane-impermeant fluorophores for selective labeling of surface proteins.
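To make the selectivity implied by the two trimethoprim dissociation constants concrete, the following is a minimal sketch. It assumes simple 1:1 equilibrium binding with free ligand in excess over protein, so that the fraction bound is [L]/([L] + Kd). The 10 nM probe concentration and the function name are illustrative assumptions, not values from the cited studies.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of protein occupied by ligand (1:1 binding, ligand in excess)."""
    return ligand_nM / (ligand_nM + kd_nM)

KD_ECOLI_DHFR_NM = 1.0          # trimethoprim vs. E. coli DHFR (from the text)
KD_MAMMALIAN_DHFR_NM = 3000.0   # trimethoprim vs. mammalian DHFR, 3 uM (from the text)

probe_nM = 10.0  # hypothetical free probe concentration
tag = fraction_bound(probe_nM, KD_ECOLI_DHFR_NM)
endo = fraction_bound(probe_nM, KD_MAMMALIAN_DHFR_NM)
print(f"E. coli DHFR tag occupied: {tag:.1%}")
print(f"mammalian DHFR occupied:   {endo:.2%}")
```

Under these assumptions most of the tag is occupied while well under 1% of the endogenous enzyme is, which is the sense in which the pair behaves as orthogonal at low probe concentrations.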

The ever-growing list of enzyme-inhibitor pairs characterized by the pharmaceutical industry may serve as a source for future labeling systems. Suitable pairs are those that exhibit orthogonality and versatility, i.e., neither enzyme nor drug has effects in mammalian cells, and the inhibitors tolerate functional derivatization while retaining reactivity. Although most interactions are noncovalent, some well established drugs such as certain β-lactam antibiotics are suicide substrates and may serve as the basis for future covalent labeling systems. Another route to finding new protein-chemical pairs may be the engineering of antibodies that recognize a desired chemical structure. This provides for covalent as well as noncovalent interactions, since antibodies can be engineered to react with ligands (10, 78). The recent availability of methods for selecting camel antibody single-chain variable domains makes possible the isolation of relatively small (13 kDa) domains, which may retain function in the cytosol in some cases (63).

Small Peptide Tags

Protein domains mediate chemical labeling with high specificity, but their size and folding requirements generally restrict their use to the NH2 and COOH termini of proteins of interest. Smaller tags have less potential to disrupt protein folding or function and in certain cases can be placed in loops or domains within proteins. The inherent challenge with peptide chemical labeling pairs is molecular recognition in a small space, i.e., the designing of an interface that is tight and specific and yet on the protein side is encodable within a few amino acids.

The first and most extensively used system for specific labeling of a peptide by functionalized small molecules is the tetracysteine-biarsenical system. The basic components, rationally designed by Griffin and colleagues (24), are peptide tags with the consensus sequence CCXXCC and chemical probes, such as the green fluorophore FlAsH or the red fluorophore ReAsH (1), with two arsenic atoms positioned so that each arsenic atom bonds with two thiol groups on adjacent cysteines. These arsenic-sulfur bonds are covalent but mediated by outer valence orbitals and are reversible in the presence of excess thiols. The bidentate nature of the complex results in extremely high (~10 pM) affinity vs. interactions that involve only one arsenic atom (1). Recently, sequences surrounding the CCXXCC core have been optimized for improved ReAsH and FlAsH affinity, with their effects presumably being mediated by increased hydrophobic contacts between peptide and ligand outside of the arsenic-sulfur bonds (45). Having been isolated from a peptide library screen for a specific binding activity, these sequences may be considered to have characteristics of aptamers (i.e., a short peptide sequence evolved for a specific binding activity; see below).

The tetracysteine-biarsenical system has been used with a wide array of biarsenical-conjugated functional groups. Aside from their usual use as fluorophores for visualization, FlAsH and in particular ReAsH can generate reactive oxygen species for light-assisted protein inactivation in functional studies (42, 75) or diaminobenzidine precipitation in electron microscopy (21). The tetracysteine-biarsenical system has also been extended to attach chemical cross-linkers (38) and calcium sensors (74) to particular sites on proteins. Recently, a variation of the tetracysteine-biarsenical system has been developed that uses a fluorophore with more widely spaced arsenic atoms and a peptide with sequence CCKAEAACC (8). These components do not interact with those of the traditional tetracysteine-biarsenical systems, and therefore the two systems can be used in the same cell in an orthogonal manner.

The primary drawbacks of the tetracysteine-biarsenical system are background labeling and toxicity. CCXXCC motifs are not found in the genome, but multiple proteins contain motifs that differ from CCXXCC by only one cysteine, and substantial labeling of cytoplasmic proteins can occur to various degrees depending on cell type (23, 41, 70). Typically, high micromolar concentrations of dithiols are used during labeling to reduce background accumulation and afterward as a destaining step (23, 41), although these reagents reduce the effective affinity of the desired interaction to the nanomolar range (1). FlAsH and ReAsH may also bind to cellular components through nonspecific hydrophobic interactions (23). The system is therefore most useful for overexpressed or highly concentrated proteins of interest, such as actin concentrated on polysomes (62) or viral proteins during assembly (64). Furthermore, potential toxicities from residual off-target binding of arsenic or the dithiol antidotes themselves remain a concern (2, 35). These drawbacks, as well as the need for long labeling and destaining times, are likely to make use of the tetracysteine-biarsenical system in thick tissues or animals difficult.
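The background problem described above can be illustrated with a short sequence scan. The sketch below reports windows that match CCXXCC exactly and windows that carry only three of the four core cysteines, i.e., potential off-target sites. The function name and the demonstration sequence are hypothetical, and the scan says nothing about whether a given site is accessible or actually binds a biarsenical.

```python
CORE_POSITIONS = (0, 1, 4, 5)  # cysteine positions within a 6-residue CCXXCC window

def scan_tetracysteine(seq: str):
    """Return (exact, near) lists of 0-based window start indices.

    exact: windows matching CCXXCC.
    near:  windows with exactly three of the four core cysteines,
           i.e., one cysteine short of the full motif.
    """
    seq = seq.upper()
    exact, near = [], []
    for i in range(len(seq) - 5):
        window = seq[i:i + 6]
        n_cys = sum(window[p] == "C" for p in CORE_POSITIONS)
        if n_cys == 4:
            exact.append(i)
        elif n_cys == 3:
            near.append(i)
    return exact, near

# Hypothetical sequence: one engineered tag (CCPGCC) and one near match (CAPGCC)
demo = "MKTAYCCPGCCLLEGSCAPGCCQK"
exact, near = scan_tetracysteine(demo)
print("exact CCXXCC starts:", exact)
print("one-cysteine-short starts:", near)
```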

Following the introduction of the tetracysteine-biarsenical system, other strategies for specifically attaching chemical probes to peptides have been developed. These fall into three categories: rationally designed peptide-probe pairs, peptide aptamers evolved for binding to chemical probes, and peptides from natural biosynthetic pathways that serve as specific sites for covalent attachment of chemical groups by enzymes. In analogy to the tetracysteine system, Tsien and colleagues developed a method for labeling polyhistidine peptides with a complex of two zinc ions and a zinc-chelating dye called HisZiFit (26). This interaction exhibits moderately tight affinity (~40 nM) but is currently limited by suboptimal dye characteristics. In a similar concept, the zinc chelator DpaTyr binds to a string of four Asp residues with 1.4 μM affinity and can be attached to various fluorophores (54). This low-affinity interaction can be converted into a covalent one by placing appropriate reactive chemical groups in the peptide tag and in the chelator probe (51). However, applicability inside eukaryotic cells has yet to be demonstrated for these zinc-chelating probes and may be difficult given their large sizes and therefore poor membrane permeability, especially for DpaTyr derivatives. Furthermore, zinc is required at micromolar levels but is maintained in the cytosol at only picomolar levels (6).

As an alternative to rational design, other groups have turned to in vitro selection of aptamers from randomized libraries. A 14-aa aptamer has been evolved to bind lanthanide atoms (20), which can be visualized by luminescent or spectroscopic methods. One study took advantage of the small size of the lanthanide-binding tag to estimate distances between insertion sites on extracellular regions of proteins by measuring resonance energy transfer (65). A potentially more generalizable aptamer is TR512, recently isolated by Nolan and colleagues with impressively strong (25 pM) affinity to the central three-ring core of the dye Texas Red (44). This structure can be derivatized while retaining binding, as demonstrated inside cells with a cell-permeable form of the Texas Red-based calcium indicator X-Rhod-5F (44). Labeling with other Texas Red derivatives has yet to be performed, and there is a possibility that the rather hydrophobic Texas Red core will exhibit nonspecific binding, but the TR512-Texas Red system does theoretically feature desirable characteristics of high affinity, flexibility, self-labeling, and intracellular compatibility.

In a different approach from self-labeling, a variety of enzymes from microbial biosynthetic pathways have been adapted as tools to conjugate small molecules to specific peptide sequences. Conceptually, this solves the issue of specificity encoding in a peptide by requiring that specificity determinants reside on both the peptide substrate and in the active site of the conjugating enzyme. The best characterized among these enzymes is biotin ligase, which attaches biotin onto the biotin carboxyl carrier subunit of acetyl-CoA carboxylase. A 67-aa mammalian biotin acceptor sequence supports biotinylation by endogenous cytoplasmic biotin ligases (55) and may also allow biotinylation of secreted proteins by endogenous processes despite the lack of known biotin ligases in the secretory pathway (73). The E. coli biotin ligase BirA can also specifically conjugate biotin or biotin analogs onto a lysine residue in the natural 75-aa substrate or in a 15-aa biotin acceptor peptide (BAP) identified by peptide library screening (4, 12). BAP is not an acceptor for eukaryotic biotin ligases, so its biotinylation is completely dependent on BirA coexpression. As expected for its small size, BAP can function as an acceptor when inserted into protein domains, but efficiency is affected by variability in access or recognition by BirA (12). Long-term lack of toxicity has been demonstrated by the generation of transgenic mice expressing BirA and BAP-tagged protein (15). Transplanted cells expressing biotinylated proteins on their surfaces can be detected in living mice after intravenous injection of streptavidin-conjugated imaging reagents (73).

Assuming efficient access of biotin ligase to its small molecule co-substrates ATP and biotin, the kinetics of BirA biotinylation of BAP are acceptably fast for labeling a large proportion of tagged proteins in an acceptable amount of time. Conjugation rates are governed by the equation [enzyme][substrate]kcat/Km when [substrate] << Km, a safe assumption given Km values are in the hundreds of micromolar for the BirA interaction with BAP or intact protein substrates (4). When measured on BirA (preloaded with activated ATP-biotin) and the BAP substrate, kcat/Km is ~1 per micromolar per minute (4). Assuming concentrations of several micromolar for both enzyme and peptide, initial conjugation rates will also be several micromolar per minute. Therefore, this reaction is efficient enough to label a substantial fraction of unreacted peptides over minutes under ideal conditions.
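The estimate above can be worked through numerically. The sketch below uses the kcat/Km of ~1 per micromolar per minute quoted in the text and assumes [substrate] << Km with a constant enzyme concentration, so that unlabeled peptide decays exponentially with rate constant (kcat/Km)[enzyme]. The 3 μM concentrations are a hypothetical stand-in for "several micromolar", and the function names are illustrative.

```python
import math

KCAT_OVER_KM = 1.0  # per uM per min, BirA on BAP (from the text)

def initial_rate(enzyme_uM, substrate_uM, kcat_over_km=KCAT_OVER_KM):
    """Initial conjugation rate in uM/min, valid when [substrate] << Km."""
    return kcat_over_km * enzyme_uM * substrate_uM

def fraction_labeled(enzyme_uM, minutes, kcat_over_km=KCAT_OVER_KM):
    """Fraction of peptide labeled after `minutes`.

    With [substrate] << Km and constant enzyme, d[S]/dt = -(kcat/Km)[E][S],
    so [S] decays exponentially with rate constant (kcat/Km)[E].
    """
    return 1.0 - math.exp(-kcat_over_km * enzyme_uM * minutes)

enzyme = substrate = 3.0  # uM; hypothetical stand-in for "several micromolar"
rate = initial_rate(enzyme, substrate)
labeled_1min = fraction_labeled(enzyme, 1.0)
half_time = math.log(2) / (KCAT_OVER_KM * enzyme)
print(f"initial rate: {rate:.1f} uM/min")
print(f"fraction labeled after 1 min: {labeled_1min:.2f}")
print(f"half-time: {half_time:.2f} min")
```

Because the rate falls as substrate is consumed, the exponential form is a better guide to labeling time than dividing the substrate concentration by the initial rate.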

Biotinylation on cell surfaces is made feasible by the lack of endogenous extracellular biotinylated proteins and by access to streptavidin-conjugated reagents for downstream assays (28, 56). Intracellular use, on the other hand, requires probes that can be conjugated to target peptides preferentially over native biotin. Biotin analogs containing chemically reactive groups such as ketones, azides, and alkynes, which are not present in mammalian cells, may allow specific chemical reaction with a variety of appropriately activated compounds in the cytoplasm. However, several parameters must be improved to make this labeling pathway robust enough for routine use. For one, adequate orthogonality with endogenous biotinylation mechanisms needs to be achieved. Ting and colleagues have characterized an alkyne-containing biotin analog that is not utilized as a substrate by mammalian biotin ligases but is accepted by yeast biotin ligase (yBL) with only a modest loss in catalytic efficiency (68) and therefore can be attached to a yBL-specific acceptor peptide (9). Likewise, an azide analog is only utilized by Pyrococcus horikoshii biotin ligase (PhBL), although this proceeds at a 1,000-fold slower rate than biotin, and no small acceptor peptide is yet available (68). Still, neither the yBL-alkyne nor the PhBL-azide system is currently orthogonal, because endogenous biotin in the cell would still serve as a superior substrate, so further mutation may be desired to reduce cross-utilization of native biotin. It is also not clear to what extent biotin analogs are able to enter cells through the same transport mechanisms utilized by biotin. A final limitation is that the subsequent steps of performing chemistry on azides and alkynes in the cytoplasm of living cells using currently available reagents are rather inefficient (3). Improving and adapting biotinylation for intracellular protein tagging will therefore continue to be an interesting challenge for biochemists and chemists.

A similar system using the E. coli lipoic acid ligase LplA, which is structurally related to biotin ligase and functions analogously to attach a lipoic acid prosthetic group to pyruvate dehydrogenase, has also been developed by Ting and colleagues and may provide an easier path to orthogonality (19). Impressively, both a 22-aa LplA acceptor peptide (LAP) and an azide-containing lipoic acid probe were successfully engineered through rational design, although LAP is not as efficient a substrate as the native LplA substrate protein E2p, and the azide probe shows a relatively high Km for LplA (127 μM vs. 1.7–4.5 μM for lipoic acid). In the presence of high probe concentrations, attachment of the azide probe to extracellularly presented LAP proceeds as quickly as biotinylation. Because mammalian cells confine pyruvate dehydrogenase to the mitochondrial matrix (57), LplA-mediated labeling in the cytoplasm should be possible with minimal background labeling or toxicity. Indeed, some lipoic acid derivatives, including the azide probe, are able to enter cells (A. Y. Ting, personal communication). But intracellular probe concentrations are still lower than can be achieved extracellularly, so efficient labeling inside cells may require further evolution of small peptide substrates and/or the LplA enzyme.

Other transferases capable of attaching chemical groups to peptides are limited to extracellular uses for the foreseeable future. The Bacillus subtilis phosphopantetheinyl transferase (PPTase) Sfp catalyzes the transfer of a phosphopantetheinyl group from the metabolic intermediate CoA to bacterial acyl carrier proteins (ACPs) or the related B. subtilis peptidyl carrier protein (PCP). Another PPTase, E. coli AcpS, functions similarly on a subset of ACPs. Both PPTases tolerate the presence of various functional substituents on the terminal thiol of phosphopantetheinyl and have been successfully used to mediate the covalent transfer of functionalized phosphopantetheinyl ligands from CoA derivatives to ACPs and PCPs, e.g., for multicolor sequential fluorophore labeling or for attaching fluorophores to extracellular regions of proteins (46, 76). Yin and Walsh have identified 12-amino acid sequences that serve as specific substrates for Sfp and AcpS with kcat/Km values of 0.19 and 0.015 per micromolar per minute, respectively, in saturating CoA concentrations (86). The former rate is about 12 times slower than that for the native substrate protein PCP (85) and several times slower than biotinylation by BirA, but the reaction is simpler in not requiring ATP as a co-substrate. It is currently limited to extracellular tagging because CoA is cell impermeant, and intracellular use would require proper conjugation of functional groups to CoA by biosynthetic pathways.
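The catalytic efficiencies quoted in this section can be put side by side as labeling times. The sketch below takes the three kcat/Km values from the text and computes the time needed to label 90% of a peptide tag. It assumes first-order labeling, meaning [peptide] << Km and constant enzyme with saturating co-substrates, and a hypothetical enzyme concentration of 1 μM. The function name is illustrative.

```python
import math

# kcat/Km in per uM per min, all taken from the text
KCAT_OVER_KM = {
    "BirA + BAP": 1.0,
    "Sfp + 12-aa peptide": 0.19,
    "AcpS + 12-aa peptide": 0.015,
}

def minutes_to_label(fraction, enzyme_uM, kcat_over_km):
    """Time to label `fraction` of the peptide, assuming [peptide] << Km and
    constant enzyme, so labeling is first order with rate (kcat/Km)*[E]."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    return -math.log(1.0 - fraction) / (kcat_over_km * enzyme_uM)

ENZYME_UM = 1.0  # hypothetical enzyme concentration
times = {name: minutes_to_label(0.9, ENZYME_UM, k) for name, k in KCAT_OVER_KM.items()}
for name, t in times.items():
    print(f"{name}: {t:.1f} min to 90% labeling")
```

Under these assumptions the ratio of labeling times is simply the inverse ratio of kcat/Km, so the Sfp peptide is roughly fivefold slower than BAP, consistent with "several times slower" above.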

Other systems for extracellular tagging include the bacterial enzyme sortase, which cleaves the Thr-Gly amide bond in LPXTG sequences and ligates the new COOH terminus to pentaglycine peptide derivatives (58), but the large polar surface areas of these peptide analogs prevent cellular entry. Transglutaminase catalyzes amide bond formation between Gln side chains within Gln-rich substrate sequences and primary amines of chemical probes but shows poor specificity and can only function outside cells due to ion requirements (19, 37). Farnesyltransferases can mediate attachment of a variety of farnesyl derivatives to the cysteine residue in the COOH-terminal sequence CVIA, but the presence of endogenous targets again precludes use inside cells (17).

Protein and Peptide Tag Comparisons

The covalent labeling reactions mentioned above have made possible some unique applications. One such application is multicolor pulse-chase labeling of proteins synthesized at different times, as has been demonstrated using tetracysteines, AGT, and dehalogenases (21, 33). Other applications include stoichiometric purification of fusion protein complexes from lysates under stringent conditions (25, 39) and long-term visualization of single protein molecules, either on cell surfaces by biotinylation and visualization with streptavidin-conjugated quantum dots (28), or on cell surfaces by PPTase-mediated Cy5 labeling of an ACP tag (30), or in vitro by dehalogenase-mediated fluorophore labeling (60).

Each of the discussed methods has unique advantages and disadvantages (Table 1). Dehalogenases and AGT mediate specific and irreversible self-labeling with sensitive chemical probes, allowing detection of protein expressed at endogenous levels, but are large. Tetracysteines are capable of stable self-labeling with a large variety of chemical probes at various locations within proteins, but toxicity and background are common issues. Aptamers allow self-labeling and flexible insertion sites in proteins, but they have yet to be evaluated in multiple contexts. Enzymatically mediated conjugation is highly specific and irreversible but requires that the tag be in a location on the protein that is accessible to enzyme, at least transiently. However, requisite co-expression of an enzyme may be a feature rather than a complication depending on the experimental design, e.g., it could allow confinement of labeling to particular times or tissues via restricted enzyme expression. Each system also has a different set of requirements in terms of extracellular or lumenal or intracellular environments and a different set of compatible ligands. Currently, none of the characterized enzyme-peptide conjugation systems functions efficiently in the cytosol, leaving AGT, dehalogenase, and tetracysteine as the only covalent labeling methods suitable for intracellular use. Finally, the fusion of tags requires testing that structure or function is not perturbed. Larger tags are more likely to be a problem, especially when inserted at locations other than the NH2 and COOH termini (5), and there are examples where protein localization is altered by a 30-kDa fluorescent protein but not by the smaller tetracysteine tag (2, 27). However, even COOH-terminal tetracysteine tags may alter protein localization depending on linker sequences (64).

Table 1.

Comparison of protein and peptide tags

Single-Residue Labeling on Reactive Side Chains

To investigate the role of a specific amino acid residue inside target proteins, it is desirable to modify the protein on selected single residues. Residue-specific protein labeling can minimize perturbation by the label on target protein folding and function, and provide greater locational flexibility in attaching biophysical probes, e.g., to report protein conformation changes or protein-protein interactions. Chemical modification, native chemical ligation, and genetic incorporation of unnatural amino acids have been used to label proteins on single residues.

Cysteine is the most frequently used residue for selective chemical modification due to its relatively low abundance in proteins and the increased nucleophilicity of the thiol group relative to other amino acid side chains. Cysteine can be introduced at any desired site or removed from unwanted sites through site-directed mutagenesis, provided such mutations do not affect the protein under study. The residue is then modified through disulfide bond exchange or alkylation reactions. This method has been widely used to study the structure and function of proteins in vitro (80). For intracellular research, the labeling procedure can be carried out in vitro, and the modified protein is then introduced into cells via microinjection. For instance, Hahn et al. labeled the p21-binding domain with the fluorescent dye Alexa 546 and injected it into fibroblasts to image Rac activation (34). Similarly, labeling a WASP domain, which binds only to activated Cdc42, with a polarity-sensitive dye followed by microinjection enabled visualization of Cdc42 activation inside living cells (50). This approach has even been extended to whole animal studies. For instance, an agonist linked through an azobenzene moiety was attached to an introduced Cys on the ligand-binding domain of an ionotropic glutamate receptor directly in HEK293 cells (77). Upon photoisomerization, the agonist is reversibly presented to the binding site to control the channel activity. This strategy was later applied in neurons in zebrafish larvae to manipulate the firing of neurons through light (72).

The Cys side chain can be exploited to attach a variety of tags, biophysical probes, or analogs of posttranslational modifications. A recent study installed methyl-lysine analogs into recombinant histones, so that the site and degree of methylation could be specified (67). These analogs preserve the function of methylated lysines and thus provide a convenient means to study how lysine methylation affects chromatin structure and function. Nonetheless, chemical modification relies on the unique reactivity of an amino acid side chain. The intrinsic selectivity is low unless the target protein contains no Cys or all unwanted Cys residues can be mutated. To label buried residues, the protein may need to be denatured to increase accessibility and then renatured for functional assays. For intracellular and in vivo labeling, a reducing environment is necessary for Cys labeling, and careful controls need to be performed to exclude complications resulting from labeling other proteins. Thus the overall selectivity and efficiency are limited.

Introduction of Chemical Modifications by Native Chemical Ligation

Native chemical ligation and its extension, expressed protein ligation, chemically couple two or more peptide fragments into a full-length protein (14, 47). The COOH-terminal α-thioester of one peptide reacts with the NH2-terminal Cys of another peptide to form a native peptide bond, leaving no chemical artifacts behind (FIGURE 2). Both the thioester peptide and the N-Cys peptide can be prepared either recombinantly or synthetically. Desired modifications can be introduced in the synthetic peptide fragment by using chemical synthesis and incorporated into the target protein after ligation. This method has been successfully used to introduce various probes and modifications for studying protein structure and function in vitro (48). In one elegant intracellular study, a palmitoylated (PalFar) and a hexadecylated (HDFar) Nras protein were prepared using native chemical ligation and microinjected into MDCK cells (61). The hexadecyl group mimics the properties of the palmitoyl group but is not cleavable by enzymes catalyzing de- and re-palmitoylation. Whereas PalFar-Nras rapidly traffics from the plasma membrane to the Golgi, HDFar-Nras accumulates nonspecifically in cellular membranes, suggesting that depalmitoylation is directly involved in Nras trafficking.


Principle of native chemical ligation
Peptide A with a COOH-terminal thioester group reacts with peptide B with an NH2-terminal Cys residue through thiol/thioester exchange, resulting in a thioester-linked initial product. This product then undergoes an intramolecular rearrangement and gives the final polypeptide that is linked by a native peptide bond. Multiple modifications can be introduced into either peptide through chemical synthesis.

The native chemical ligation method has the power to introduce diverse structures that are synthetically accessible, and multiple modifications can be incorporated in one synthetic peptide. Nonetheless, because peptides longer than 50 residues are difficult to synthesize using solid-phase peptide synthesis, appropriate sites for cleavage and ligation must be chosen carefully, and modifying internal sites in large proteins is cumbersome. It is also possible to ligate the peptides in living cells. A synthetic fragment can be injected into cells to react with an endogenously produced protein fragment (22). For cell biology and in vivo studies, microinjection of either the in vitro ligation product or the synthetic fragment for intracellular ligation can be a drawback. New methods for protein delivery in cells are awaited to circumvent this issue (49).

Site-Specific Incorporation of Unnatural Amino Acids

Site-specific incorporation of unnatural amino acids (UAAs) into proteins is another powerful technique to selectively modify and label proteins. Schultz et al. developed a general in vitro biosynthetic method in which an amber suppressor tRNA is chemically acylated with a UAA (52). The tRNA and the target gene containing an amber stop codon at the desired mutation site are then added to cell extracts that support transcription and translation. The tRNA recognizes the amber codon and incorporates the attached UAA. A variety of UAAs have been incorporated into proteins using this approach, regardless of the incorporation site or protein size, and have been applied to a large number of problems in protein chemistry (11). An extension of this method involves the microinjection of the chemically acylated tRNA and amber codon-containing mutant mRNA into Xenopus oocytes (53). The endogenous oocyte protein synthesis machinery supports translation and incorporation of the UAA. This change enables structure-function studies of integral membrane proteins, which are generally not amenable to in vitro expression systems (16, 40). In practice, chemical acylation of the suppressor tRNA is technically demanding. In addition, acylated tRNA is consumed stoichiometrically and cannot be regenerated in cells or cell extracts, which leads to low expression of the target protein.

Genetically encoding an UAA in a manner similar to that of common amino acids would enable site-directed mutagenesis in living cells with UAAs. A general method to expand the genetic code to include UAAs involves the generation of a new tRNA-codon-synthetase set that is specific for the UAA and does not cross-react with the sets for common amino acids (79, 80) (FIGURE 3). The new synthetase is evolved to specifically charge an UAA onto the new tRNA. This tRNA recognizes a codon that does not encode any common amino acid (e.g., a stop codon or an extended codon). When expressed in cells, the new tRNA-synthetase pair enables the UAA to be site-specifically incorporated into proteins at the unique codon with high fidelity and efficiency. Genetically encoded UAAs can directly introduce modifications in proteins to facilitate the investigation of biological processes in native settings (FIGURE 4). For instance, p-carboxymethyl-l-phenylalanine is an analog of phosphotyrosine but is resistant to hydrolysis by protein tyrosine phosphatases. When it is used to replace Tyr701 in human signal transducer and activator of transcription-1 (STAT1), the resulting mutant protein is constitutively active: it dimerizes and binds to the same DNA sequences as phosphotyrosine701 STAT1 (84). In a second example, protein phosphorylation in yeast is controlled by a photocaged serine, 4,5-dimethoxy-2-nitrobenzylserine (DMNB-Ser). This UAA blocks phosphorylation and is incorporated at different phosphoserine sites in the transcription factor Pho4. Upon photodecaging (λ = 405 nm), serine is regenerated and subsequently phosphorylated, triggering the nuclear export of Pho4 (36). In another example, interacting proteins have been photocross-linked in live cells. The UAA p-benzoyl-l-phenylalanine carries the photocross-linker and is incorporated into human GRB2 at positions proximal to the ligand-binding pocket. Upon light exposure (λ = 365 nm), the protein is covalently linked to the epidermal growth factor receptor. Lastly, to probe the inactivation mechanism of the K+ channel Kv1.4, UAAs with extended side chains were incorporated into Kv1.4 expressed in HEK293 cells. By comparing the K+ currents of the mutant channels, it was found that the bulkiness of residues in the inactivation peptide is essential for fast channel inactivation, a finding that had not been accessible using conventional mutagenesis (82).
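The decoding logic behind stop-codon reassignment can be sketched as a toy translation routine. This is an idealized illustration, not a model of any published system: it assumes the amber codon (TAG) is the unique codon, that suppression is complete whenever the orthogonal tRNA-synthetase pair is present, and it writes the UAA as the placeholder symbol "X". The function name `translate` and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, orthogonal_pair=False, uaa_symbol="X"):
    """Translate a DNA coding sequence read in frame from its first base.

    With orthogonal_pair=True, the amber codon TAG is decoded as the UAA
    instead of terminating translation (idealized: 100% suppression).
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon == "TAG" and orthogonal_pair:
            protein.append(uaa_symbol)  # orthogonal tRNA delivers the UAA
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":
            break  # no suppressor available: translation terminates
        protein.append(aa)
    return "".join(protein)


# Hypothetical gene with an amber codon at the third position.
gene = "ATGGCTTAGAAATGGTAA"
print(translate(gene))                        # truncated product
print(translate(gene, orthogonal_pair=True))  # full-length product with UAA
```

In the absence of the orthogonal pair the amber codon acts as a stop and a truncated product results, which is why full-length expression is often used as a readout for UAA incorporation.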


FIGURE 3. Genetically encoding unnatural amino acids (UAAs) in live cells
A new set of components consisting of an orthogonal tRNA, an orthogonal aminoacyl-tRNA synthetase, and a unique codon is required. The synthetase specifically aminoacylates the UAA onto the orthogonal tRNA only. During protein translation, the acylated tRNA delivers the UAA in response to a unique codon that does not encode any of the common 20 amino acids. The UAA cannot be a substrate for the endogenous synthetases and needs to be efficiently transported into the cytoplasm when added to the growth medium or be biosynthesized by the host cell. Various modifications can be site-specifically introduced into proteins in the format of an UAA or by labeling the UAA side chain via an orthogonal reaction.


FIGURE 4. Examples of the application of genetically encoded UAAs
A: p-carboxymethyl-l-phenylalanine is a nonhydrolyzable mimic of phosphorylated tyrosine. B: photocaged serine is used to control when the target serine is regenerated and phosphorylated. C: the photocross-linker p-benzoyl-l-phenylalanine covalently locks interacting proteins. D: O-methyl-l-tyrosine extends the side chain length of tyrosine by 0.9 Å, which impedes the inactivation peptide of Kv1.4 from threading through its side portal, abolishing the fast inactivation of the ion channel.

In addition to introducing modifications directly, UAAs with unique reactive groups can be incorporated as chemical handles for further protein derivatization, providing new labeling reactions that are orthogonal to those of endogenous amino acids. The ketone, azide, acetylene, and thioester groups have all been genetically encoded and used to modify proteins in vitro (83). The ketone group has also been incorporated into the membrane protein LamB in E. coli and directly labeled with membrane-impermeable dyes. However, intracellular labeling remains challenging, because few chemical reactions have been developed that both proceed efficiently under mild cellular conditions and are fully orthogonal to cellular components.

Over 40 UAAs have been genetically incorporated into different proteins in bacteria, yeast, and mammalian cells, and the list is growing (81, 83). These amino acids contain chemically reactive groups, photoreactive groups, biophysical probes, or moieties mimicking posttranslational modifications. It should be possible to generate stable cell lines or transgenic animals capable of inheriting UAA alterations for long-term studies. However, this method cannot be used to incorporate UAAs that are toxic or incompatible with the protein biosynthesis machinery (e.g., d-amino acids). Encoding the UAA with a stop codon may lead to UAA incorporation into other proteins whose genes end with the same stop codon; this may cause complications and needs to be investigated. To fully exploit the power of these UAAs, efforts are required to improve their incorporation efficiency in eukaryotic cells, to develop extended codons or unnatural base codons for the simultaneous incorporation of multiple UAAs, and to expand the method to multicellular organisms.
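As a rough first look at the off-target concern raised above, one could count the coding sequences in a host genome that terminate with the reassigned stop codon. The sketch below assumes the coding sequences are already available as a name-to-DNA mapping; the function name `genes_ending_with` and the toy sequences are hypothetical, and the count says nothing about how efficiently a given natural stop codon would actually be suppressed.

```python
def genes_ending_with(cds_by_gene, stop_codon="TAG"):
    """Return sorted names of genes whose in-frame coding sequence
    terminates with the given stop codon (default: amber, TAG)."""
    stop = stop_codon.upper()
    return sorted(
        name
        for name, cds in cds_by_gene.items()
        # Require a whole number of codons so the last triplet is in frame.
        if len(cds) % 3 == 0 and cds.upper().endswith(stop)
    )


# Toy, made-up coding sequences for illustration only.
toy_genome = {
    "geneA": "ATGAAATAG",  # ends with amber (TAG)
    "geneB": "ATGAAATAA",  # ends with ochre (TAA)
    "geneC": "atgccctag",  # amber, lowercase input
}
print(genes_ending_with(toy_genome))
```

Genes on such a list are candidates for read-through extension when the suppressor tRNA is expressed, and they would be natural starting points for the investigation the text calls for.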


The last few years have seen an explosion in new techniques for site-specific chemical labeling of proteins. These techniques span a wide range in terms of submolecular precision, temporal control, labeling specificity, and ease of implementation. No one system excels in all characteristics, and the choice of system involves finding the best fit for the experimental problem. The techniques with the longest history (AGT, tetracysteines, biotinylation, cysteine modification, native protein ligation, and UAA incorporation) have already demonstrated their utility in cultured cells in addressing a variety of questions. For methods that are completely genetically encodable, we expect it will not be long before some are applied to study proteins in their physiological contexts through the generation of transgenic animals. Meanwhile, ongoing work promises to increase the quality and number of protein chemical labeling techniques.


M. Z. Lin acknowledges the support of the Jane Coffin Childs Foundation. L. Wang acknowledges the support of the Searle Scholar Program and the Beckman Young Investigator Program.

