Crystal structure of prototype foamy virus (PFV) protease-reverse transcriptase fusion (PR-RT) reveals conformational plasticity: implications for function

Proteolytic processing of the retroviral Pol polyprotein precursor produces protease (PR), reverse transcriptase (RT), and integrase (IN), except in foamy viruses (FVs) where only the IN domain is released. Here, we report the 2.9 Å resolution crystal structure of the mature PR-RT from prototype FV (PFV) needed for processing and reverse transcription. The monomeric PFV PR exhibits an architecture similar to that of the HIV-1 PR, but the N- and C-terminal residues are unstructured. A C-terminal extension of the PR folds into two helices that support the RT palm subdomain and anchor the PR next to the RT. The subdomains of RT (fingers, palm, thumb, and connection) and the RNase H domain are connected by flexible linkers and spatially arranged similarly to those in the HIV-1 RT p51 subunit. Significant spatial and conformational domain rearrangements are required for nucleic acid binding. This offers structural insight into retroviral RT conformational maturation and architecture of immature enzymes.
bioRxiv preprint, doi: https://doi.org/10.1101/2020.11.23.395087; this version posted November 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.


Introduction
Foamy viruses (FV), which remain the most ancient retroviruses, share conserved genome organization and architecture of structural and enzymatic proteins with their evolutionarily related retrotransposons (Hutter et al., 2013; E. G. Lee et al., 2013; Linial, 1999; Yu et al., 1999).
Synthesis of polyproteins from polycistronic mRNAs, followed by proteolytic processing by cognate protease(s) into functional entities, is a characteristic feature of many pathogens, including RNA viruses, retroviruses, and retrotransposons (Baldwin & Linial, 1998; Roy & Linial, 2007; Shin et al., 2012; Yu et al., 1996). Among the numerous proteins encoded in the retroviral genome, the architecture of the enzymes reverse transcriptase (RT) and protease (PR) remains perhaps the most conserved (Ding et al., 1998; Nowak et al., 2014; Tozser, 2010), with a common mechanism of nucleic acid synthesis and proteolytic processing even when their sequence identity is low. Lentiviruses such as HIV-1 synthesize their enzymatic proteins as a Pol polyprotein precursor, produced as part of a Gag-Pol fusion from the same mRNA by -1 translational frameshifting, which is subsequently proteolytically processed into mature enzymes. Meanwhile, spumaretroviruses such as the prototype foamy virus (PFV) synthesize their Gag and Pol polyproteins from separate mRNAs alternatively spliced from their genomic RNA (Lochelt & Flugel, 1996; Pfrepper et al., 1998). FV polyproteins undergo only minimal proteolysis during maturation: the integrase (IN) is the only domain processed from Pol, by cleavage between the RNase H domain and IN, while a 3-kDa C-terminal peptide is processed from the Gag polyprotein (Yu et al., 1996). The mature PR-RT therefore contains an N-terminal PR and C-terminal RT, both of which are functional [reviewed in (Wohrl, 2019)].
The limited proteolysis of Gag results in infectious foamy virus (FV) particles that are morphologically indistinguishable from the immature lentiviral capsids, though this minimal processing is absolutely required for infectivity in FVs (E. G. Lee et al., 2013; Yu et al., 1996).
The RT of FVs is monomeric in solution and is responsible for copying the (+) FV ssRNA genome into dsDNA for integration into the host chromosome. This process occurs largely during assembly and budding in producer cells, concomitantly resulting in infectious viral particles containing dsDNA (E. G. Lee et al., 2013; Rethwilm & Bodem, 2013; Yu et al., 1999).
The aspartyl PR, which is responsible for the proteolytic processing of polyproteins from retroviruses, is functional as a homodimer (Katoh et al., 1989). Hence the PR embedded in the PR-RT must dimerize to activate the protease. The mechanism(s) by which this happens remains under investigation, and potentially involves viral RNA (Hartl et al., 2010; Wohrl, 2019). Many structures of retroviral PRs, both alone and in complex with relevant substrates and inhibitors, are available in the literature and help to explain mechanisms of catalysis and inhibition. Similarly, several retroviral RT structures and their complexes with nucleic acids and inhibitors have provided important structural blueprints that elucidate the mechanisms of catalysis, inhibition, and drug resistance. Structural information on the domain organization of Pol polyproteins has so far eluded researchers, resulting in a lack of clear understanding of the mechanism by which retroviral PRs dimerize in the context of the Pol polyprotein, which allows them to cleave other proteins, and themselves, into functional entities.
The asymmetric conformational maturation of heterodimeric HIV-1 RT, with the p66 (catalytic) and p51 (scaffolding) subunits, remains largely an unsolved puzzle in retrovirology. However, there are strong suggestions (albeit with limited structural evidence) that monomeric HIV-1 p66 would adopt a more thermodynamically stable p51-like fold in solution, occasionally sampling the open conformation seen in HIV-1 p66 in the heterodimer, prior to homodimerization and subsequent maturation (Wang et al., 1994; Zheng et al., 2014; Zheng et al., 2015). In an effort to shed light on many of the missing pieces in the retroviral structural biology puzzle, a systematic investigation of the Pol polyprotein of PFV has been carried out, and we report here the crystal structure of the full-length PFV PR-RT fusion polyprotein at 2.9 Å resolution. This structure defines the structural organization of FV PR-RT and offers insight into the potential mechanism of dimerization of PR in the context of a polyprotein, thereby offering a glimpse into the initial events leading up to the proteolytic processing of polyproteins. The PFV RT domains are arranged relative to each other similarly to the structure of HIV-1 RT p51. This compact p51-like conformation is consistent with predictions in the literature and defines the structural organization of a monomeric RT precursor such as monomeric HIV-1 RT p66 subunit prior to isomerization to a p51-like conformation. This structure is, to the best of our knowledge, the first of a functionally relevant retroviral Pol polyprotein, providing a framework for investigating other retroviral polyproteins. This structure will serve as a useful model for investigating the sources of high fidelity and processivity of FV RT, especially as FV-based vectors for gene therapy (Erlwein & McClure, 2010; Lindemann & Rethwilm, 2011; Trobridge, 2009) are becoming increasingly popular.
It is also anticipated that it will be a key platform for the design of new therapeutics against HIV and other retroviruses.

Engineering a proteolytically resistant mutant of PFV PR-RT for bacterial expression
Wild-type PFV PR-RT is highly susceptible to proteolysis when expressed in bacterial strains (Boyer et al., 2004). The breakdown products complicate the development of purification schemes that yield the high-quality pure protein needed for crystallographic studies. The extra steps required to obtain usable material make purification laborious and the final yield poor, even with the large doses of protease inhibitors and reducing agents that are needed. Shared composition between the full-length protein and the proteolytic products further complicates structural and biophysical studies. For crystallization, these impurities could selectively crystallize, crystallize alongside the full-length protein, or be incorporated into the crystal lattice of the full-length protein and create lattice defects and/or structural heterogeneity that could reduce the resolution significantly. Each of these scenarios has deleterious consequences for obtaining crystals suitable for X-ray crystallographic studies.
Instability of proteins due to the composition of their primary sequence has been known for a long time (Guruprasad et al., 1990). In an effort to improve the proteolytic stability of PFV PR-RT [with a protease null mutation (D24A), designated wild-type (WT)], in silico mutagenesis of the primary sequence of WT PFV PR-RT was carried out by replacing the amino acid at each position with each of the other 19 and computing the instability index resulting from each substitution on the ExPASy ProtParam server (http://web.expasy.org/protparam/) using a Python script; see details in Materials and Methods. The instability index is a weighted sum over the dipeptide composition of a protein and is based on a strong correlation found between the stability of proteins and their dipeptide composition. It is therefore used as a qualitative measure of the in vivo stability of proteins. Proteins with an instability index below 40 are considered stable, while those above that threshold are considered unstable (Guruprasad et al., 1990). The instability index for the WT protein was computed to be 39.75, which is very close to the threshold of 40. H507 and S584 were identified as the instability hotspots in the primary sequence, where several other amino acids were predicted to be more stabilizing than the native residues. Two mutations, H507D and S584K, which computationally produced the most stable protein when substituted at these positions, were selected based on protein stability and biophysical considerations, in addition to C280S, which is known to improve the solution behavior of HIV-1 RT. This PR-RT mutant C280S/H507D/S584K, designated CSH, resulted in at least a 4-fold increase in expression compared to the WT in bacterial strains and could be purified in two steps (nickel affinity and heparin) to at least 95% purity as judged from SDS-PAGE analysis without any protease inhibitors or reducing agents (Fig. 1).
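The substitution scan described above can be sketched in a few lines of Python. This is an illustrative sketch only and not the script used in this work: it evaluates the index locally, as (10/L) times the sum of dipeptide weights over the sequence (Guruprasad et al., 1990), instead of querying the ProtParam server. The function names `instability_index` and `scan_substitutions` are hypothetical, and `TOY_WEIGHTS` is a made-up three-entry table used purely so the example runs; a real scan would need the full 400-entry dipeptide weight table.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy weights for illustration only. The published table
# (Guruprasad et al., 1990) assigns a weight to each of the 400 dipeptides.
TOY_WEIGHTS: Dict[str, float] = {"HS": 20.0, "DS": -5.0, "SK": -8.0}


def instability_index(seq: str, weights: Dict[str, float]) -> float:
    """Return (10 / L) * sum of dipeptide weights along the sequence."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    # Dipeptides missing from the table contribute zero in this sketch.
    total = sum(weights.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1))
    return 10.0 / len(seq) * total


def scan_substitutions(
    seq: str, weights: Dict[str, float]
) -> List[Tuple[float, int, str, str]]:
    """Score every single-residue substitution, most stabilizing first.

    Each entry is (change in index, 1-based position, original, replacement).
    """
    base = instability_index(seq, weights)
    results = []
    for pos, original in enumerate(seq):
        for new in AMINO_ACIDS:
            if new == original:
                continue
            mutant = seq[:pos] + new + seq[pos + 1:]
            delta = instability_index(mutant, weights) - base
            results.append((delta, pos + 1, original, new))
    return sorted(results)


if __name__ == "__main__":
    # Toy four-residue sequence, not the PFV PR-RT sequence.
    for delta, pos, old, new in scan_substitutions("AHSA", TOY_WEIGHTS)[:3]:
        print(f"{old}{pos}{new}: {delta:+.2f}")
```

Sorting by the change in index puts the most stabilizing substitutions first, which mirrors how hotspots such as H507 and S584 would surface in a full scan; the final choice of mutations in the text also relied on biophysical considerations that a score alone does not capture.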

Enzymatic assays
Changes to the primary sequence of proteins can have unintended consequences on the protein structure and/or the activity. To verify whether the mutations had any impact on the enzymatic activity of the protein, the polymerase and ribonuclease activities were assayed for the WT protein and the mutant using nucleic acid substrates. Compared to the WT, the CSH mutant showed only a modest reduction in polymerase activity, while the ribonuclease activity remained similar to that of the WT (Supplemental Fig. 1). The processivity, which measures how long polymerases engage their nucleic acid substrates, enabling them to carry out consecutive dNTP incorporation into a growing chain, was also measured. Compared to the WT, the processivity of the mutant had decreased by about 5-fold (Supplemental Fig. 1). This suggests that the mutations decrease the ability of the enzyme to engage the nucleic acid for long periods without significantly affecting its catalytic ability. A possible explanation is that replacement of His507 with Asp, a reversal from positive to negative charge, could result in repulsion of the nucleic acid backbone.

Architecture of PFV PR-RT
The aforementioned higher purity and yield of the PFV PR-RT CSH mutant versus the WT enabled broad robotic crystallization screening. Optimization of a crystallization condition eventually yielded suitable crystals that diffracted X-rays to approximately 3 Å resolution. The condition found for the CSH mutant was successfully applied to crystallize WT PFV PR-RT. Both the PFV PR-RT CSH mutant and WT crystallize as a monomer with a single molecule per asymmetric unit in space group C2; the structures are very similar (RMSD = 0.785 Å), and unless indicated the results described pertain to the WT structure. The structure was solved by using the anomalous signal from the selenomethionine (SeMet)-substituted CSH mutant, and the model obtained from the SeMet data was used as the search model in molecular replacement with the WT data (Table 1). Even though the sequence homology of PFV with other retroviruses is generally low, typically less than 30%, the individual structural elements of retroviral polymerases and proteases are highly conserved (Fig. 2). Notable differences, however, exist in terms of relative domain positioning and arrangements, detailed in the subsequent sections, which offer potential insights into their unique functionalities and subtle differences in substrate binding specificities, processivity and fidelity.

PR and PR-CTE
Seven beta-sheets (β1'-β7'), two short helices (αA' and αB'), and random coils define the structure of the monomeric protease (Figs. 2D-E). The well-folded single unit of the protease, which functions as a homodimer, exhibits the closed barrel-like core domain constituted predominantly by β-sheets, similar to a monomeric protease of HIV-1. Residues 1-4, which form part of the dimerization interface in the mature PR as inferred from mature HIV-1 PR structures, and residues 5-9 remain unstructured, with no defined electron density for the first six residues in these structures. The hairpin formed by these strands forms the tip of the flap (residues 50-57), which opens and closes to enable substrate binding and release, respectively. Consistent with its function, this region is disordered with less well-defined electron density in this structure. β5' (residues 58-68), which traverses almost the entire length of the PR core, is connected by a short loop (residues 69-70) at its C-terminus to β6' (residues 70-80), which defines the exosite that forms the hinge, together with the mid-section of β5', allowing the tip of the flap to open and close to permit substrate binding and release. β6' is, however, twisted away from β5' in a way that creates a gap between the flap region and the active site loop, which forms the substrate binding cavity upon dimerization (Fig. 2). The large P1-loop, which lines the walls of the active site, connects β7' (residues 85-87) and β6', which form the only parallel sheet in the structure.
A short coil and a short helix (half-turn, αB') further link the C-terminus of β7' to a long random coil that spans the C-terminus of the protease domain. The only helix in HIV-1 PR is at the C-terminus. The C-terminus of the PR, which together with the N-terminus forms a major portion of the dimerization interface, is also unstructured without well-defined density in our structure.
Whether these would form beta sheets upon dimerization as observed in HIV-1 is unknown. The first 100 residues of PFV PR-RT are sufficient to define the entire protease as observed in HIV-1 (Ishima et al., 2003).
Additional residues at the protease C-terminus have been observed through multiple sequence alignments (Supplemental Fig. 2). The residues 101-143, which were initially not known to be part of either PR or RT, are hereby designated as the protease C-terminal extension (PR-CTE). Two PR-RT structures positioned in silico by two-fold symmetry generate a dimeric PR in a configuration poised for catalysis without extensive steric clashes with the RTs (Fig. 4), suggesting that this structure could be proteolytically active. While this does not offer insights into distal interactions within the RTs, it offers a glimpse of a plausible mechanism of PR dimerization and how the dimeric PR can be accommodated between the two polyproteins. Extensive interactions between helices F and G of the RT are expected to further stabilize the dimeric protease for catalysis. The substrate-binding groove of the PR generated in silico appears much wider than that of HIV-1 (Fig. 4), which offers a plausible reason why HIV-1 PR inhibitors such as tipranavir, darunavir, and indinavir (Hartl et al., 2010) do not inhibit PFV PR. The possibility of steric clashes between the inhibitor and side chains of residues close to the active site cannot be ruled out.
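The in silico two-fold placement described above amounts to rotating a copy of the monomer by 180° about a chosen axis and then checking the two copies for steric clashes. The sketch below illustrates that geometry only; it is not the procedure used to build Fig. 4. The function names, the 3.0 Å contact cutoff, and the toy coordinates are all assumptions introduced for illustration, and the choice of axis (here supplied by the caller) is the part that would in practice be guided by a reference dimer such as HIV-1 PR.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def twofold_mate(
    coords: Sequence[Vec], axis_point: Vec, axis_dir: Vec
) -> List[Vec]:
    """Rotate each coordinate by 180 degrees about the given axis."""
    norm = math.sqrt(sum(c * c for c in axis_dir))
    if norm == 0.0:
        raise ValueError("axis direction must be non-zero")
    u = [c / norm for c in axis_dir]
    mates: List[Vec] = []
    for atom in coords:
        d = [atom[i] - axis_point[i] for i in range(3)]
        along = sum(d[i] * u[i] for i in range(3))
        # A 180-degree rotation keeps the component along the axis and
        # flips the perpendicular component: x' = p + 2(d.u)u - d
        mates.append((
            axis_point[0] + 2.0 * along * u[0] - d[0],
            axis_point[1] + 2.0 * along * u[1] - d[1],
            axis_point[2] + 2.0 * along * u[2] - d[2],
        ))
    return mates


def count_close_contacts(
    a: Sequence[Vec], b: Sequence[Vec], cutoff: float = 3.0
) -> int:
    """Count atom pairs closer than the cutoff (3.0 A is an assumed value)."""
    return sum(1 for p in a for q in b if math.dist(p, q) < cutoff)


if __name__ == "__main__":
    # Toy coordinates in angstroms, not taken from the PFV PR-RT model.
    monomer = [(5.0, 0.0, 0.0), (6.0, 1.0, 2.0)]
    mate = twofold_mate(monomer, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
    print(mate, count_close_contacts(monomer, mate))
```

Applying the operation twice returns the original coordinates, which is a quick sanity check that the axis was applied as a proper two-fold.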

RT Domain
Retroviral RTs and other polymerase structures solved resemble a right hand and so the domains in these proteins have been historically named after the corresponding segments of a hand. Even though the RT domain of the structure of the PFV PR-RT does not resemble the right hand in its β2 and β3 define a hairpin that points towards the putative nucleic acid binding cleft reminiscent of the β3-β4 hairpin in HIV-1 p66 implicated in dNTP binding (Fig. 3).
The sheets which harbor the conserved catalytic motif YVDD (314-317), together with the primer grip (355-368), constitute the polymerase active site (Fig. 5). The primer grip, observed in other RTs as a short β-hairpin, is a somewhat large loop pointing towards the active site hairpin. It must be pointed out that amino acid residues that form the primer grip hairpin loop are conserved in PFV, MoMLV, XMRV, and HIV-1. We surmise that this flexible primer grip architecture may reduce the constraints imposed by rigid β-sheets in other RTs, which would enable easier primer orientation during nucleic acid synthesis and thus contribute to the higher processivity of PFV compared to other RTs.
Supplemental Figure 2. Secondary structural alignments of some retroviral PRs and RTs with PFV generated using the program Clustal.
It has been observed that the PFV RT mutant V315M, which mimics the active site of HIV-1 RT, leads to 50% loss of polymerase activity in virions, with no observable full-length cDNA detectable in transfected cells and the processivity greatly reduced (Boyer et al., 2004; Rinke et al., 2002). From a structural standpoint, replacement of valine with the bulky side chain of methionine could potentially distort the positioning of the primer grip, hence slowing down the rate of nucleotide incorporation needed to complete cDNA synthesis. The conserved residue D254 is situated at the tip of β4 near the polymerase active site loop formed by β7 and β8 (β9 and β10 of HIV-1), which houses D316 and D317; together these residues constitute the catalytic aspartate triad (Fig. 5). While helix F (E in HIV-1 RT) is oriented ~100° relative to helix G (F in HIV-1 RT) above the active site, without the kinks observed in HIV-1 RT, helix G packs against strands 4 and 8 as in other RTs. However, the thumb (378-449), characterized by a three-helix bundle, is moved away from the palm and positioned next to the connection subdomain (450-590), making extensive contacts with it (Fig. 6).
The connection domain, characterized by five-stranded mixed beta-sheets stabilized by five helices, is nestled between the fingers/palm and the RNase H subdomain/thumb on opposite sides and tilted slightly towards the polymerase active site, sitting almost parallel to the palm of the RT (Fig. 2).
This differs from its position in HIV-1 p66 and other RTs, where it lies orthogonal to the palm. This compact fold allows helix L (K in HIV-1 RT) to pack against H, while helix O (L in HIV-1 RT) tilts towards the active site loop, positioning T552 to form a hydrogen bond with Y314, T556 with H236, and H507 to form a bifurcated hydrogen bond with S237 and N364 (Fig. 7). The H507D mutant employed in initial studies also makes these contacts and crystallizes much faster than the WT, while H507F/I mutants, which lose these hydrogen-bonding interactions, did not crystallize after several attempts. This underscores the importance of these hydrogen bonds in stabilizing this conformation. The C-terminus of helix L also makes extensive contacts with the primer grip and helix C of the palm.
The folding of the connection onto the palm in this way not only results in significant interactions with the palm and thumb but also partially blocks access to the polymerase active site. Interactions between the connection and the fingers appear very remote, with the only relatively close contact occurring between R221 and L541. This is in stark contrast to HIV-1 p51, where the positioning of helix L and β-strand 20 brings the loop between them into close contact with the fingers (Figs. 2 and 8). It must be noted that while some hydrophobic residues are buried in these interfaces between the connection/palm/thumb, the key anchoring interactions are actually hydrogen bonds and van der Waals interactions. Since this machinery must undergo extensive subdomain rearrangement to function as a polymerase, these relatively 'weak' interactions ensure that the energetic barrier to isomerization from a p51-like to a p66-like conformation is lower than in HIV-1 p51, where these interfaces are predominantly hydrophobic (Fig. 2).

RNase H
The RNase H domain (593-751) also comprises an asymmetric arrangement of five-stranded mixed beta-sheets and four α-helices, with the beta-sheets sandwiched between a three-helix bundle. αBR packs across helices αAR and αDR to stabilize them further, while αCR is positioned across the thumb and the connection, largely forming van der Waals contacts and thereby restricting the motion of the thumb in that direction (Fig. 9). A calcium ion from the crystallization buffer of the WT PFV PR-RT is observed chelated to the side chains of residues D599, E646 and D669, together with the carbonyl of G600, which define the RNase H active site (Fig. 9). These residues, even though situated at the center of the domain, are solvent exposed and point away from the putative nucleic acid-binding cleft. This further suggests that this domain must rotate to bring the active site into register with the nucleic acid substrate upon binding.

PR-RT/DNA complexes
To gain insight into the nucleic acid binding kinetics and substrate specificities, the dissociation of PFV PR-RT from a template-primer RNA/DNA hybrid and a double-stranded (ds) DNA was measured and compared to that of HIV-1 RT. PFV PR-RT dissociated much more slowly from dsDNA than from the RNA/DNA hybrid. In contrast, HIV-1 RT dissociated much more slowly from the RNA/DNA hybrid than from dsDNA (Supplemental Figure 3). This suggests that these two enzymes interact differently with these substrates. These differences may relate to the general topological differences between the RNA/DNA hybrid and the dsDNA and how these proteins engage them, or to additional contacts that come into play depending on which substrate is bound. These interactions may be relevant for understanding the differences in processivity between the enzymes. To determine the similarities between the nucleic acid binding clefts of PFV PR-RT and HIV-1 RT, a site-specific DNA cross-linking experiment was carried out. A Q258C substitution in HIV-1 RT enables the covalent trapping of nucleic acid substrates in the binding groove of the RT.
The cross-linking chemistry designed for HIV-1 RT is such that covalent trapping of the dsDNA or RNA/DNA is only possible after a single round of nucleotide incorporation followed by translocation, when a thioalkyl-modified base G (N2) in the primer is positioned in register with the cysteine to enable disulfide bond formation (Boyer et al., 2004; Das et al., 2012). The covalently trapped complex is further purified using tandem nickel-heparin columns. PFV PR-RT Q391C, which is the equivalent of HIV-1 RT Q258C, covalently traps this dsDNA designed for HIV-1 RT (Supplemental Fig. 4). Thus, even though PFV RT Q391 in the thumb is pointing away from the structure, upon rearrangement, the thumb is positioned in a configuration similar to the HIV-1 RT p66, and at a distance from the polymerase active site that is equivalent to that of p66.
This also suggests that the overall configuration of the catalytically active PFV RT is similar to that of HIV-1 RT p66.

Discussion
Retroviral genomes encode aspartyl PRs and RTs that are respectively responsible for the proteolytic processing of polyprotein precursors and the reverse transcription of their RNA genome into dsDNA, which can subsequently be inserted into the host chromosome by IN. In this work, we report the crystal structure of the PFV PR-RT fusion polyprotein. The crystal structure of the PFV PR-RT shows a monomeric PR and an RT. The PFV PR is folded similarly to a single subunit of the mature HIV-1 dimeric PR; however, it exists in a monomeric state. The N- and C-terminal residues involved in the dimerization interface in the mature PR remain unstructured in this structure. In silico placement of two PFV PR-RT molecules related by two-fold symmetry brings two PR monomers together, similar to mature dimers, without significant steric clashes between the RT domains. This offers a glimpse of how this entity might carry out proteolytic processing of peptide substrates.
Supplemental Figure 4. Heparin chromatography trace of PR-RT-dsDNA cross-linked complex with its accompanying SDS-PAGE gel.
The polymerase domain in the structure appears to be in an inactive configuration that would not bind a double-stranded nucleic acid substrate. The subdomains are arranged in a configuration that closely resembles the p51 subunit of HIV-1 RT. Hence a significant rearrangement of the subdomains is envisaged for PFV PR-RT to facilitate the binding of nucleic acid substrates. With the long linkers connecting the palm and the thumb as well as the connection and RNase H, such movements are plausible (see movie). It must be emphasized that while the arrangement of these domains is unusual for monomeric RTs, it may not be unique to PFV. Based on the analysis of buried surface areas in the HIV-1 p66/p51 heterodimer, it was suggested that monomeric forms of these enzymes would adopt a more compact "p51-like" conformation to compensate for the cost of exposing hydrophobic residues (Wang et al., 1994). Zheng et al. (2015), using NMR spectroscopy, confirmed this initial observation, further asserting that the RNase H domain of monomeric HIV-1 p66 has a loose structure with flexible linkers connecting the thumb and RNase H domains to the rest of the structure, without any significant interaction between the thumb and the RNase H.
These observations from limited structural data are consistent with the structure presented here. The interactions observed between the thumb and RNase H of PFV RT are through the C-helix, which is lacking in HIV-1. Furthermore, monomeric HIV-1 p66 has been shown to sample predominantly two conformational states prior to homodimerization and subsequent cleavage of the RNase H domain of the p51 precursor. NMR studies (Zheng et al., 2015) suggest that the predominant population in solution is the "p51-like" conformation. Transient sampling of the catalytically competent p66 open conformer might expose the dimerization interface, leading to homodimerization with a "p51-like" conformer prior to maturation. The asymmetric homodimer subsequently matures to the asymmetric heterodimer in mature HIV-1 RT. Structural data defining the position of the RNase H domain in the asymmetric homodimer prior to maturation have been elusive, since the structure of monomeric HIV-1 p66 is not available. Our PFV PR-RT structure reveals what could be an important isomerization intermediate of RTs that has eluded researchers for decades.
It is worth noting that the energetic penalty of isomerization from the closed p51-like conformation, in which hydrophobic residues are involved in extensive inter-subdomain interactions in this structure, must be compensated for in the open p66-like conformation. In the absence of nucleic acid, this energy barrier is likely insurmountable, since most of these interactions are lost upon isomerization into the open catalytically competent state. It is envisaged therefore that this isomerization may largely be triggered only by nucleic acid substrates, where an extensive interface between the connection subdomain and the RNase H is expected to form, in addition to protein-nucleic acid interactions between the fingers, thumb, connection and RNase H.
Superposition of the current structure with the HIV-1 RT/DNA complex (Fig. 10) suggests that the connection domain acts as the "gatekeeper" by occupying the nucleic acid-binding cleft. For nucleic acid binding, therefore, it is envisaged that this gatekeeper must undergo a rotation of close to 90° followed by a translation perpendicular to the helical axis of helix L of the connection.
This screw movement would disrupt the interaction with the thumb, allowing it to move closer to the fingers/palm, while pushing out the RNase H domain to swivel to a position similar to that observed in other RTs to form a conformation relevant for nucleic acid binding (see movie in SI).
It is possible that the catalytically competent conformation is minimally sampled in solution and is trapped and stabilized by nucleic acid substrates. The sandwiching of the thumb by the RNase H and the connection suggests that any kind of movement has to be well choreographed and that random rearrangement of the domains is unlikely.
Since the PFV PR-RT is the machinery that carries out proteolytic processing as well as reverse transcription, a tighter control of each process is envisaged. Thus, the adoption of an inactive RT conformation with a considerable energetic penalty upon rearrangement will help to decouple the two functions and might ensure that they can be regulated temporally and spatially. Such a high energy barrier may be compensated for by the extensive interactions with nucleic acids. This structure not only captures the imaginative evolutionary genius of the most primitive member of retroviruses but also offers significant insight into the evolutionary landscape of these largely non-pathogenic viruses.

Conclusions
A systematic study of the PR-RT precursor polyprotein of PFV has been carried out. The PFV PR-RT is a functional PR and RT. The mutations in the enzyme which improved the yield of purified protein compared to the WT were identified using a Python script based on the instability indices using inhibitors that target its precursor polyproteins rather than the mature enzymes, and making the precursor polyprotein a legitimate target for drug discovery.
For the PFV PR-RT, the subdomains are arranged similarly to those of the p51 subunit of HIV-1 RT.
Such an inactive conformation of the monomeric RT had been predicted based on buried surface analysis and NMR spectroscopy. The compact fold likely ensures that the entropic cost associated with having an open nucleic acid-binding cleft in a monomeric RT is offset. It is expected that an extensive rearrangement of the domains would be required to enable nucleic acid binding. Crosslinking studies suggest that the PFV RT binds nucleic acid substrates in a mode similar to that of HIV-1 RT. This structure additionally offers insight into the pathway of conformational maturation of retroviral RTs. It is also a clever way of decoupling PR activity, which is needed early in the life cycle of the virus during maturation, from that of the polymerase to ensure they are well regulated.

Cloning, expression and purification
PFV PR-RT, with a protease null mutation (D24A), WT, was cloned into a pET28a vector with an N-terminal hexa-His-tag and an HRV14 3C protease cleavage site. Quick Change site-directed mutagenesis (Liu & Naismith, 2008) Figs. 5A-B). The Hampton additive screen was carried out using the Natrix conditions E8 and F8. Crystals grown in 30 mM glycylglycylglycine (GGG)-C11, and 10 mM EDTA-E1 at 20 °C were of slightly different morphology than the initial crystal hit. Further optimization was therefore carried out using glycylglycine and EDTA as additives by varying their concentrations. In the presence of either 200 mM glycylglycine or 100 mM EDTA, these conditions produced thin plates of less than 5 µm thickness, which diffracted X-rays to about 4.5-5 Å resolution. The choice of glycylglycine (GG) was serendipitous, as it was available in the lab at the time, whereas GGG was not. Crystals grown with GGG, when it was finally obtained, diffracted more poorly.
Even though the resolution of these datasets was sufficient in theory to solve the structure, the diffraction was very anisotropic with very high mosaicity, making it practically impossible to solve the structure using these datasets. A 1:1 mixture of the two conditions containing GG and EDTA was therefore tested, since the options for further optimization had narrowed considerably.
Surprisingly, the crystals grown in these conditions were significantly thicker than those produced by their individual conditions. Thus, GG and EDTA together had a cumulatively positive effect on the crystal size even though they remained plates emanating from a single nucleation site (Supplemental Fig. 5).
The following conditions produced the best crystals suitable for diffraction studies: 50 mM KCl, 50 mM sodium cacodylate trihydrate pH 6.0, 12% PEG 8000, 1.0 mM spermine, 1.0 mM L-argininamide, 200 mM glycylglycine or glycylglycylglycine, and 50 mM EDTA in a sitting-drop vapor diffusion crystallization format between 10-25 °C with drop sizes ranging between 1-10 µL in micro bridges (Hampton). The EDTA could be substituted with 10 mM MgCl2, 10 mM MnCl2, or 100 mM CaCl2 with similar results. The wild-type crystals as well as selenomethionine CSH crystals were grown with 100 mM CaCl2 instead of EDTA and 2-5 mM TCEP supplemented with the protein. The CSH mutant of PFV routinely formed plate-like crystals which diffracted X-rays to at least 3.0 Å resolution, enabling structure solution. Optimized crystallization conditions using this mutant were subsequently used to crystallize and solve the structure of the WT as well.
Crystals were harvested using MiTeGen MicroLoops or MicroMesh mounts and cryo-protected in reservoir supplemented with 25% glycerol before flash freezing.

Data collection
X-ray diffraction data were collected at the Cornell High Energy Synchrotron Source (CHESS) F1 beamline, the Advanced Photon Source (APS) 23-ID-D beamline, and the Stanford Synchrotron Radiation Lightsource (SSRL) 9-2 beamline. Both PFV PR-RT WT and CSH crystallized with C2 space group symmetry. Data processing and scaling were done using iMosFlm and Aimless.
The structure was solved by merging five SeMet-labeled data sets of the CSH mutant using Crank2, and the refinement was carried out using REFMAC, all in the CCP4i suite (Potterton et al., 2018). Model building was done in Coot (Emsley & Cowtan, 2004). The model of the SeMet-labeled CSH mutant was successfully used as the search model for molecular replacement in Phaser within PHENIX (Adams et al., 2011) against the WT data, which were found to be twinned (Xtriage). The WT structure was refined to 2.9 Å resolution using the twin law -h, -k, l to R-work/R-free of 22.4/25.8% in REFMAC (Murshudov et al., 1997) and phenix.refine (Adams et al., 2011). Publication figures were generated using PyMOL Molecular Graphics System, v. 1.7 (Schrödinger, LLC).
Supplemental Figure 5. (A) Initial crystallization hit in Natrix HT E8. (B) Optimization of initial additive screening with (C), and (D) fully optimized crystals used for structure determination
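For readers less familiar with twinned data, the quantities quoted above can be illustrated with a short Python sketch. It is not part of the refinement workflow, which was carried out in REFMAC and phenix.refine; the function names, the twin fraction, and the numerical values are hypothetical. The sketch applies the twin law -h, -k, l to Miller indices, mixes each intensity with that of its twin mate using the standard two-domain model, and computes a conventional R-factor, sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|). R-work and R-free are this same quantity evaluated on the working and held-out reflection sets, respectively.

```python
from typing import Dict, Sequence, Tuple

HKL = Tuple[int, int, int]


def twin_mate(hkl: HKL) -> HKL:
    """Apply the twin law -h, -k, l to a Miller index."""
    h, k, l = hkl
    return (-h, -k, l)


def twinned_intensities(
    true_i: Dict[HKL, float], alpha: float
) -> Dict[HKL, float]:
    """Two-domain twin model: (1 - alpha) * I(hkl) + alpha * I(twin mate).

    Reflections whose twin mate is absent from the input are skipped.
    """
    if not 0.0 <= alpha <= 0.5:
        raise ValueError("twin fraction must lie between 0 and 0.5")
    return {
        hkl: (1.0 - alpha) * i + alpha * true_i[twin_mate(hkl)]
        for hkl, i in true_i.items()
        if twin_mate(hkl) in true_i
    }


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Return sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


if __name__ == "__main__":
    # Hypothetical intensities and twin fraction, for illustration only.
    intensities = {(1, 0, 0): 100.0, (-1, 0, 0): 50.0}
    print(twinned_intensities(intensities, alpha=0.25))
    print(r_factor([10.0, 20.0], [9.0, 22.0]))
```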

Extension and processivity assay
The polymerase and RNase H activity assays were carried out according to published protocols (Boyer et al., 2004; Rinke et al., 2002).

Protein-DNA cross-linking
The cross-linking experiments were conducted as reported previously (Das et al., 2012; Das et al., 2019; Sarafianos et al., 2003), except that NaCl concentrations in the final reactions were changed to 150-200 mM.