###Column descriptions:
#UNLABELED COLUMN: numbers refer to the feature index in the dataframe, which we leave here only for consistency given that the data are sorted by feature_importance
#1: "CD accession" as defined by the conserved domain database in cddid.tbl
#2: "CD short name" as defined by the conserved domain database in cddid.tbl
#3: "CD description" as defined by the conserved domain database in cddid.tbl
#4: "PSSM-Length (number of columns, the size of the search model)" as defined by the conserved domain database in cddid.tbl
#integrase, excisionase, recombinase, etc.: each of these "search term columns" represents whether the particular search string appeared within the CD description field (1 for yes, 0 for no)
#search_hits: A sum across the "search term columns" reflecting how many different search terms produced a hit within the CD description field. All domains should have a value of at least 1 to appear in our dataset
#feature_importance: The importance assigned to each domain (as provided by classifier.feature_importances_) in the final random forest model. Missing values denote domains that were dropped for any one of several reasons mentioned within the main text of our manuscript (e.g., they occur in too few phages or are enriched in lytic phages)
1 2 3 4 integrase excisionase recombinase transposase lysogen temperate parA|ParA|parB|ParB search_hits feature_importance
234 pfam04606 Ogr_Delta "Ogr/Delta-like zinc finger. This is a viral family of phage zinc-binding transcriptional activators, which also contains cryptic members in some bacterial genomes. The P4 phage delta protein contains two such domains attached covalently, while the P2 phage Ogr proteins possess one domain but function as dimers. All the members of this family have the following consensus sequence: C-X(2)-C-X(3)-A-(X)2-R-X(15)-C-X(4)-C-X(3)-F. This family also includes zinc fingers in recombinase proteins."
47 0 0 1 0 0 0 0 1 0.056585479 312 pfam13408 Zn_ribbon_recom Recombinase zinc beta ribbon domain. This short bacterial protein contains a zinc ribbon domain that is likely to be DNA-binding. This domain is found in site specific recombinase proteins. This family appears most closely related to pfam04606. 58 0 0 1 0 0 0 0 1 0.048596649 103 cd00397 DNA_BRE_C "DNA breaking-rejoining enzymes, C-terminal catalytic domain. The DNA breaking-rejoining enzyme superfamily includes type IB topoisomerases and tyrosine based site-specific recombinases (integrases) that share the same fold in their catalytic domain containing conserved active site residues. The best-studied members of this diverse superfamily include Human topoisomerase I, the bacteriophage lambda integrase, the bacteriophage P1 Cre recombinase, the yeast Flp recombinase, and the bacterial XerD/C recombinases. Their overall reaction mechanism is essentially identical and involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA. The enzymes differ in that topoisomerases cleave and then rejoin the same 5' and 3' termini, whereas a site-specific recombinase transfers a 5' hydroxyl generated by recombinase cleavage to a new 3' phosphate partner located in a different duplex region. Many DNA breaking-rejoining enzymes also have N-terminal domains, which show little sequence or structure similarity." 167 1 0 1 0 0 0 0 2 0.046739945 118 cd01182 INT_RitC_C_like "C-terminal catalytic domain of recombinase RitC, a component of the recombinase trio. Recombinases belonging to the RitA (also known as pAE1 due to its presence in the deletion prone region of plasmid pAE1 of Alcaligenes eutrophus H1), RitB, and RitC families are associated in a complex referred to as a Recombinase in Trio (RIT) element. 
These RIT elements consist of three adjacent and unidirectional overlapping genes, one from each family (ritABC in order of transcription). All three integrases contain a catalytic motif, suggesting that they are all active enzymes. However, their specific roles are not yet fully understood. All three families belong to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their catalytic domain and the overall reaction mechanism." 186 1 0 1 0 0 0 0 2 0.037092044 108 cd00796 INT_Rci_Hp1_C "Shufflon-specific DNA recombinase Rci and Bacteriophage Hp1_like integrase, C-terminal catalytic domain. Rci protein is a tyrosine recombinase specifically involved in Shufflon type of DNA rearrangement in bacteria. The shufflon of plasmid R64 consists of four invertible DNA segments which are separated and flanked by seven 19-bp repeat sequences. RCI recombinase facilitates the site-specific recombination between any inverted repeats results in an inversion of the DNA segment(s) either independently or in groups. HP1 integrase promotes site-specific recombination of the HP1 genome into that of Haemophilus influenza. Bacteriophage Hp1_like integrases are tyrosine based site specific recombinases. They belong to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their catalytic domain and the overall reaction mechanism. The catalytic domain contains six conserved active site residues. Their overall reaction mechanism is essentially identical and involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA." 162 1 0 1 0 0 0 0 2 0.03396646 200 pfam00589 Phage_integrase "Phage integrase family. 
Members of this family cleave DNA substrates by a series of staggered cuts, during which the protein becomes covalently linked to the DNA through a catalytic tyrosine residue at the carboxy end of the alignment. The catalytic site residues in CRE recombinase are Arg-173, His-289, Arg-292 and Tyr-324." 167 1 0 1 0 0 0 0 2 0.032174272 344 pfam17463 Gp79 "Gene Product 79. This is a domain of unknown function found in Mycobacterium phage. Family members include the full Gp79 protein found in Mycobacteriophage L5. Mycobacteriophage L5, is a phage isolated from Mycobacterium smegmatis. It forms stable lysogens in M. smegmatis and has a broad host range among the pathogenic mycobacteria. L5 encodes gene products (gp) toxic to the host M. smegmatis. Expression of gp79 interferes with the cell membrane or cell-wall synthesis of M. smegmatis, leading to altered cell morphology. It also has a bactericidal effect on E. coli. The N-terminal segment of gp79 (amino acids 1-41) shares sequence similarity with the signal peptide of the D-alanylD-alanine carboxypeptidase of Bacillus licheniformis. This enzyme removes C-terminal D-alanyl residues from sugarpeptide cell-wall precursors and is also a penicillin-binding protein (PBP). The homology of the hydrophobic N-terminal part of gp79 to a PBP (penicillin-binding protein) signal peptide may indicate an interaction of gp79 with proteins or metabolites involved in the peptidoglycan synthesis of M. smegmatis." 51 0 0 0 0 1 0 0 1 0.031340186 124 cd01189 INT_ICEBs1_C_like "C-terminal catalytic domain of integrases from bacterial phages and conjugate transposons. This family of tyrosine based site-specific integrases is has origins in bacterial phages and conjugate transposons. One member is the integrase from Bacillus subtilis conjugative transposon ICEBs1. ICEBs1 can be excised and transfered to various recipients in response to DNA damage or high concentrations of potential mating partners. 
The family belongs to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their catalytic domain and the overall reaction mechanism. The catalytic domain contains six conserved active site residues. Their overall reaction mechanism involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA." 147 1 0 0 0 0 0 0 1 0.030981784 53 PRK05084 xerS site-specific tyrosine recombinase XerS; Reviewed 357 0 0 1 0 0 0 0 1 0.030885012 144 cd03768 SR_ResInv "Serine Recombinase (SR) family, Resolvase and Invertase subfamily, catalytic domain; members contain a C-terminal DNA binding domain. Serine recombinases catalyze site-specific recombination of DNA molecules by a concerted, four-strand cleavage and rejoining mechanism which involves a transient phosphoserine linkage between DNA and the enzyme. They are functionally versatile and include resolvases, invertases, integrases, and transposases. Resolvases and invertases affect resolution or inversion and comprise a major phylogenic group. Resolvases (e.g. Tn3, gamma-delta, and Tn5044) normally recombine two sites in direct repeat causing deletion of the DNA between the sites. Invertases (e.g. Gin and Hin) recombine sites in inverted repeat to invert the DNA between the sites. Cointegrate resolution with gamma-delta resolvase requires the formation of a synaptosome of three resolvase dimers bound to each of two res sites on the DNA. Also included in this subfamily are some putative integrases including a sequence from bacteriophage phi-FC1." 126 1 0 1 1 0 0 0 3 0.028151756 7 COG1961 PinE "Site-specific DNA recombinase related to the DNA invertase Pin [Replication, recombination and repair]. 
" 222 0 0 1 0 0 0 0 1 0.026172778 128 cd01193 INT_IntI_C "Integron integrase and similar protiens, C-terminal catalytic domain. Integron integrases mediate site-specific DNA recombination between a proximal primary site (attI) and a secondary target site (attC) found within mobile gene cassettes encoding resistance or virulence factors. Unlike other site specific recombinases, the attC sites lack sequence conservation. Integron integrase exhibits broader DNA specificity by recognizing the non-conserved attC sites. The structure shows that DNA target site recognition are not dependent on canonical DNA but on the position of two flipped-out bases that interact in cis and in trans with the integrase. Integron-integrases are present in many natural occurring mobile elements, including transposons and conjugative plasmids. Vibrio, Shewanella, Xanthomonas, and Pseudomonas species harbor chromosomal super-integrons. All integron-integrases carry large inserts unlike the TnpF ermF-like proteins also seen in this group." 176 1 0 1 0 0 0 0 2 0.025724538 1 COG0582 XerC "Integrase [Replication, recombination and repair, Mobilome: prophages, transposons]. " 309 1 0 0 0 0 0 0 1 0.025439777 127 cd01192 INT_C_like_3 "Uncharacterized site-specific tyrosine recombinase, C-terminal catalytic domain. Tyrosine recombinase (integrase) belongs to a DNA breaking-rejoining enzyme superfamily. The catalytic domain contains six conserved active site residues. The recombination reaction involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA. Many DNA breaking-rejoining enzymes also have N-terminal domains, which show little sequence or structure similarity." 178 1 0 1 0 0 0 0 2 0.024722921 31 COG4973 XerC "Site-specific recombinase XerC [Replication, recombination and repair]. 
" 299 0 0 1 0 0 0 0 1 0.022714365 105 cd00569 HTH_Hin_like "Helix-turn-helix domain of Hin and related proteins. This domain model summarizes a family of DNA-binding domains unique to bacteria and represented by the Hin protein of Salmonella. The basic HTH domain is a simple fold comprised of three core helices that form a right-handed helical bundle. The principal DNA-protein interface is formed by the third helix, the recognition helix, inserting itself into the major groove of the DNA. A diverse array of HTH domains participate in a variety of functions that depend on their DNA-binding properties. HTH_Hin represents one of the simplest versions of the HTH domains; the characterization of homologous relationships between various sequence-diverse HTH domain families remains difficult. The Hin recombinase induces the site-specific inversion of a chromosomal DNA segment containing a promoter, which controls the alternate expression of two genes by reversibly switching orientation. The Hin recombinase consists of a single polypeptide chain containing a C-terminal DNA-binding domain (HTH_Hin) and a catalytic domain." 42 0 0 1 0 0 0 0 1 0.019206685 109 cd00797 INT_RitB_C_like "C-terminal catalytic domain of recombinase RitB, a component of the recombinase trio. Recombinases belonging to the RitA (also known as pAE1 due to its presence in the deletion prone region of plasmid pAE1 of Alcaligenes eutrophus H1), RitB, and RitC families are associated in a complex referred to as a Recombinase in Trio (RIT) element. These RIT elements consist of three adjacent and unidirectional overlapping genes, one from each family (ritABC in order of transcription). All three integrases contain a catalytic motif, suggesting that they are all active enzymes. However, their specific roles are not yet fully understood. All three families belong to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their catalytic domain and the overall reaction mechanism." 
198 1 0 1 0 0 0 0 2 0.018246183 143 cd03767 SR_Res_par "Serine recombinase (SR) family, Partitioning (par)-Resolvase subfamily, catalytic domain; Serine recombinases catalyze site-specific recombination of DNA molecules by a concerted, four-strand cleavage and rejoining mechanism which involves a transient phosphoserine linkage between DNA and the enzyme. They are functionally versatile and include resolvases, invertases, integrases, and transposases. This subgroup is composed of proteins similar to the E. coli resolvase found in the par region of the RP4 plasmid, which encodes a highly efficient partitioning system. This protein is part of a complex stabilization system involved in the resolution of plasmid dimers during cell division. Similar to Tn3 and other resolvases, members of this family may contain a C-terminal DNA binding domain." 146 1 0 1 1 0 0 0 3 0.018149622 120 cd01185 INTN1_C_like "Integrase IntN1 of Bacteroides mobilizable transposon NBU1 and similar proteins, C-terminal catalytic domain. IntN1 is a tyrosine recombinase for the integration and excision of Bacteroides mobilizable transposon NBU1 from the host chromosome. IntN1 does not require strict homology between the recombining sites seen with other tyrosine recombinases. This family belongs to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their catalytic domain and the overall reaction mechanism. The catalytic domain contains six conserved active site residues. Their overall reaction mechanism involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA." 
161 1 0 1 0 0 0 0 2 0.018015932 146 cd03770 SR_TndX_transposase "Serine Recombinase (SR) family, TndX-like transposase subfamily, catalytic domain; composed of large serine recombinases similar to Clostridium TndX and TnpX transposases. Serine recombinases catalyze site-specific recombination of DNA molecules by a concerted, four-strand cleavage and rejoining mechanism which involves a transient phosphoserine linkage between DNA and the enzyme. They are functionally versatile and include resolvases, invertases, integrases, and transposases. TndX mediates the excision and circularization of the conjugative transposon Tn5397 from Clostridium difficile. TnpX is responsible for the movement of the nonconjugative chloramphenicol resistance elements of the Tn4451/3 family. Mobile genetic elements such as transposons are important vehicles for the transmission of virulence and antibiotic resistance in many microorganisms." 140 1 0 1 1 0 0 0 3 0.016942714 121 cd01186 INT_tnpA_C_Tn554 "Putative Transposase A from transposon Tn554, C-terminal catalytic domain. This family includes putative Transposase A from transposon Tn554. It belongs to a DNA breaking-rejoining enzyme superfamily. The catalytic domain contains six conserved active site residues. The recombination reaction involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA. Many DNA breaking-rejoining enzymes also have N-terminal domains, which show little sequence or structure similarity." 184 0 0 0 1 0 0 0 1 0.016676233 112 cd00800 INT_Lambda_C "C-terminal catalytic domain of Lambda integrase, a tyrosine-based site-specific recombinase. 
Lambda-type integrases catalyze site-specific integration and excision of temperate bacteriophages and other mobile genetic elements to and from the bacterial host chromosome. They are tyrosine-based site-specific recombinase and belong to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their catalytic domain and the overall reaction mechanism. The phage lambda integrase can bridge two different and well-separated DNA sequences called arm- and core-sites. The C-terminal domain binds, cleaves and re-ligates DNA strands at the core-sites, while the N-terminal domain is largely responsible for high-affinity binding to the arm-type sites." 161 1 0 1 0 0 1 0 3 0.015255101 110 cd00798 INT_XerDC_C "XerD and XerC integrases, C-terminal catalytic domains. XerDC-like integrases are involved in the site-specific integration and excision of lysogenic bacteriophage genomes, transposition of conjugative transposons, termination of chromosomal replication, and stable plasmid inheritance. They share the same fold in their catalytic domain containing six conserved active site residues and the overall reaction mechanism with the DNA breaking-rejoining enzyme superfamily. In Escherichia coli, the Xer site-specific recombination system acts to convert dimeric chromosomes, which are formed by homologous recombination to monomers. Two related recombinases, XerC and XerD, bind cooperatively to a recombination site present in the E. coli chromosome. Each recombinase catalyzes the exchange of one pair of DNA strand in a reaction that proceeds through a Holliday junction intermediate. These enzymes can bridge two different and well-separated DNA sequences called arm- and core-sites. The C-terminal domain binds, cleaves, and re-ligates DNA strands at the core-sites, while the N-terminal domain is largely responsible for high-affinity binding to the arm-type sites." 
172 1 0 1 0 1 0 0 3 0.014944367 48 PRK00236 xerC site-specific tyrosine recombinase XerC; Reviewed 297 0 0 1 0 0 0 0 1 0.014326812 352 pfam18607 HTH_54 "ParA helix turn helix domain. The accurate segregation of DNA is essential for the faithful inheritance of genetic information. Segregation of the prototypical P1 plasmid par system requires two proteins, ParA and ParB, and a centromere. When bound to ATP, ParA mediates segregation by interacting with centromere-bound ParB, but when bound to ADP, ParA fulfills a different function: DNA-binding transcription autoregulation. ParA consists of an elongated N-terminal alpha-helix which mediates dimerization, a winged-HTH and a Walker-box containing C-domain. This entry describes the N-terminal alpha helix domain combined with the winged HTH region." 92 0 0 0 0 0 0 1 1 0.013624489 123 cd01188 INT_RitA_C_like "C-terminal catalytic domain of recombinase RitA, a component of the recombinase trio. Recombinases RitA (also known as pAE1), RitB, and RitC are encoded by three adjacent and overlapping genes. Collectively they are known as the Recombinase in Trio (RIT). This RitA family includes various bacterial integrases and integrases from the deletion-prone region of plasmid pAE1 of Alcaligenes eutrophus H1. All three integrases contain a catalytic motif, suggesting that they are all active enzymes. However, their specific roles are not fully understood. All three families belong to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their catalytic domain and the overall reaction mechanism. The catalytic domain contains six conserved active site residues. Their overall reaction mechanism is essentially identical and involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA." 
179 1 0 1 0 0 0 0 2 0.012965044 67 PRK15417 PRK15417 integron integrase. 337 1 0 0 0 0 0 0 1 0.012457728 211 pfam01656 CbiA "CobQ/CobB/MinD/ParA nucleotide binding domain. This family consists of various cobyrinic acid a,c-diamide synthases. These include CbiA and CbiP from S.typhimurium, and CobQ from R. capsulatus. These amidases catalyze amidations to various side chains of hydrogenobyrinic acid or cobyrinic acid a,c-diamide in the biosynthesis of cobalamin (vitamin B12) from uroporphyrinogen III. Vitamin B12 is an important cofactor and an essential nutrient for many plants and animals and is primarily produced by bacteria. The family also contains dethiobiotin synthetases as well as the plasmid partitioning proteins of the MinD/ParA family." 229 0 0 0 0 0 0 1 1 0.011972549 113 cd00801 INT_P4_C "Bacteriophage P4 integrase, C-terminal catalytic domain. P4-like integrases are found in temperate bacteriophages, integrative plasmids, pathogenicity and symbiosis islands, and other mobile genetic elements. The P4 integrase mediates integrative and excisive site-specific recombination between two sites, called attachment sites, located on the phage genome and the bacterial chromosome. The phage attachment site is often found adjacent to the integrase gene, while the host attachment sites are typically situated near tRNA genes. This family belongs to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their catalytic domain and the overall reaction mechanism. The catalytic domain contains six conserved active site residues. Their overall reaction mechanism involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA." 180 1 0 0 0 0 1 0 2 0.011945008 80 TIGR02225 recomb_XerD "tyrosine recombinase XerD. 
The phage integrase family describes a number of recombinases with tyrosine active sites that transiently bind covalently to DNA. Many are associated with mobile DNA elements, including phage, transposons, and phase variation loci. This model represents XerD, one of two closely related chromosomal proteins along with XerC (TIGR02224). XerC and XerD are site-specific recombinases which help resolve chromosome dimers to monomers for cell division after DNA replication. In species with a large chromosome and with homologs of XerD on other replicons, the chomosomal copy was preferred for building this model. This model does not detect all XerD, as some apparent XerD examples score below the trusted and noise cutoff scores. XerC and XerD interact with cell division protein FtsK. [DNA metabolism, DNA replication, recombination, and repair]" 291 1 0 1 0 0 0 0 2 0.010787692 77 TIGR01969 minD_arch "cell division ATPase MinD, archaeal. This model represents the archaeal branch of the MinD family. MinD, a weak ATPase, works in bacteria with MinC as a generalized cell division inhibitor and, through interaction with MinE, prevents septum placement inappropriate sites. Often several members of this family are found in archaeal genomes, and the function is uncharacterized. More distantly related proteins ParA chromosome partitioning proteins. The exact roles of the various archaeal MinD homologs are unknown." 251 0 0 0 0 0 0 1 1 0.010651989 132 cd01197 INT_FimBE_like "FimB and FimE and related proteins, integrase/recombinases. This CD includes proteins similar to E.coli FimE and FimB and Proteus mirabilis MrpI. FimB and FimE are the regulatory proteins during expression of type 1 fimbriae in Escherichia coli. The fimB and fimE proteins direct the phase switch into the 'on' and 'off' position. MrpI is the regulatory protein of proteus mirabilis fimbriae expression. This family belongs to the integrase/recombinase superfamily." 
181 1 0 1 0 0 0 0 2 0.009560897 266 pfam09140 MipZ ATPase MipZ. MipZ is an ATPase that forms a complex with the chromosome partitioning protein ParB near the chromosomal origin of replication. It is responsible for the temporal and spatial regulation of FtsZ ring formation. 263 0 0 0 0 0 0 1 1 0.009239184 85 TIGR03453 partition_RepA "plasmid partitioning protein RepA. Members of this family are the RepA (or ParA) protein involved in replicon partitioning. All known examples occur in bacterial species with two or more replicons, on a plasmid or the smaller chromosome. Note that an apparent exception may be seen as a pseudomolecule from assembly of an incompletely sequenced genome. Members of this family belong to a larger family that also includes the enzyme cobyrinic acid a,c-diamide synthase, but assignment of that name to members of this family would be in error. [Mobile and extrachromosomal element functions, Plasmid functions]" 387 0 0 0 0 0 0 1 1 0.008599834 95 TIGR03815 CpaE_hom_Actino "helicase/secretion neighborhood CpaE-like protein. Members of this protein family belong to the MinD/ParA family of P-loop NTPases, and in particular show homology to the CpaE family of pilus assembly proteins (see ). Nearly all members are found, not only in a gene context consistent with pilus biogenesis or a pilus-like secretion apparatus, but also near a DEAD/DEAH-box helicase, suggesting an involvement in DNA transfer activity. The model describes a clade restricted to the Actinobacteria." 322 0 0 0 0 0 0 1 1 0.00816534 10 COG2826 Tra8 "Transposase and inactivated derivatives, IS30 family [Mobilome: prophages, transposons]. " 318 0 0 0 1 0 0 0 1 0.007903136 73 TIGR01764 excise "DNA binding domain, excisionase family. An excisionase, or Xis protein, is a small protein that binds and promotes excisive recombination; it is not enzymatically active. 
This model represents a number of putative excisionases and related proteins from temperate phage, plasmids, and transposons, as well as DNA binding domains of other proteins, such as a DNA modification methylase. This model identifies mostly small proteins and N-terminal regions of large proteins, but some proteins appear to have two copies. This domain appears similar, in both sequence and predicted secondary structure (PSIPRED) to the MerR family of transcriptional regulators (pfam00376). [Unknown function, General]" 49 0 1 0 0 0 1 0 2 0.007549456 83 TIGR02249 integrase_gron "integron integrase. Members of this family are integrases associated with integrons (and super-integrons), which are systems for incorporating and expressing cassettes of laterally transferred DNA. Incorporation occurs at an attI site. A super-integron, as in Vibrio sp., may include over 100 cassettes. This family belongs to the phage integrase family (pfam00589) that also includes recombinases XerC (TIGR02224) and XerD (TIGR02225), which are bacterial housekeeping proteins. Within this family of integron integrases, some are designated by class, e.g. IntI4, a class 4 integron integrase from Vibrio cholerae N16961. [DNA metabolism, DNA replication, recombination, and repair, Mobile and extrachromosomal element functions, Other]" 315 1 0 1 0 0 0 0 2 0.005932234 207 pfam01527 HTH_Tnp_1 Transposase. Transposase proteins are necessary for efficient DNA transposition. This family consists of various E. coli insertion elements and other bacterial transposases some of which are members of the IS3 family. 74 0 0 0 1 0 0 0 1 0.005884895 134 cd02042 ParAB_family "partition proteins ParAB family. ParA and ParB of Caulobacter crescentus belong to a conserved family of bacterial proteins implicated in chromosome segregation. ParB binds to DNA sequences adjacent to the origin of replication and localizes to opposite cell poles shortly following the initiation of DNA replication. 
ParB regulates the ParA ATPase activity by promoting nucleotide exchange in a fashion reminiscent of the exchange factors of eukaryotic G proteins. ADP-bound ParA binds single-stranded DNA, whereas the ATP-bound form dissociates ParB from its DNA binding sites. Increasing the fraction of ParA-ADP in the cell inhibits cell division, suggesting that this simple nucleotide switch may regulate cytokinesis. ParA shares sequence similarity to a conserved and widespread family of ATPases which includes the repA protein of the repABC operon in Rhizobium etli symbiotic plasmid. This operon is involved in the plasmid replication and partition." 130 0 0 0 0 0 0 1 1 0.005566779 148 cd04762 HTH_MerR-trunc "Helix-Turn-Helix DNA binding domain of truncated MerR-like proteins. Proteins in this family mostly have a truncated helix-turn-helix (HTH) MerR-like domain. They lack a portion of the C-terminal region, called Wing 2 and the long dimerization helix that is typically present in MerR-like proteins. These truncated domains are found in response regulator receiver (REC) domain proteins (i.e., CheY), cytosine-C5 specific DNA methylases, IS607 transposase-like proteins, and RacA, a bacterial protein that anchors chromosomes to cell poles." 49 0 0 0 1 0 0 0 1 0.005528148 145 cd03769 SR_IS607_transposase_like "Serine Recombinase (SR) family, IS607-like transposase subfamily, catalytic domain; members contain a DNA binding domain with homology to MerR/SoxR located N-terminal to the catalytic domain. Serine recombinases catalyze site-specific recombination of DNA molecules by a concerted, four-strand cleavage and rejoining mechanism which involves a transient phosphoserine linkage between DNA and the enzyme. They are functionally versatile and include resolvases, invertases, integrases, and transposases. 
This subfamily is composed of proteins that catalyze the transposition of insertion sequence (IS) elements such as IS607 from Helicobacter and IS1535 from Mycobacterium, and similar proteins from other bacteria and several archaeal species. IS elements are DNA segments that move to new sites in prokaryotic and eukaryotic genomes causing insertion mutations and gene rearrangements." 134 1 0 1 1 0 0 0 3 0.005316905 218 pfam02316 HTH_Tnp_Mu_1 Mu DNA-binding domain. This family consists of MuA-transposase and repressor protein CI. These proteins contain homologous DNA-binding domains at their N-termini which compete for the same DNA site within the Mu bacteriophage genome. 134 0 0 0 1 0 0 0 1 0.005295385 294 pfam12784 PDDEXK_2 PD-(D/E)XK nuclease family transposase. Members of this family belong to the PD-(D/E)XK nuclease superfamily. These proteins are transposase proteins. 227 0 0 0 1 0 0 0 1 0.004721834 361 pfam18759 Plavaka Plavaka transposase. A transposase with an RNaseH catalytic domain that often has a histone binding BAM/BAH domain at the C-terminus and is sometimes associated with TET/JBP family of dioxygenases in fungi. 326 0 0 0 1 0 0 0 1 0.004711294 102 cd00338 Ser_Recombinase "Serine Recombinase family, catalytic domain; a DNA binding domain may be present either N- or C-terminal to the catalytic domain. These enzymes perform site-specific recombination of DNA molecules by a concerted, four-strand cleavage and rejoining mechanism which involves a transient phosphoserine linkage between DNA and serine recombinase. Serine recombinases demonstrate functional versatility and include resolvases, invertases, integrases, and transposases. Resolvases and invertases (i.e. Tn3, gamma-delta, Tn5044 resolvases, Gin and Hin invertases) in this family contain a C-terminal DNA binding domain and comprise a major phylogenic group. Also included are phage- and bacterial-encoded recombinases such as phiC31 integrase, SpoIVCA excisionase, and Tn4451 TnpX transposase. 
These integrases and transposases have larger C-terminal domains compared to resolvases/invertases and are referred to as large serine recombinases. Also belonging to this family are proteins with N-terminal DNA binding domains similar to IS607- and IS1535-transposases from Helicobacter and Mycobacterium." 137 1 1 1 1 0 0 0 4 0.004530893 122 cd01187 INT_tnpB_C_Tn554 "Putative Transposase B from transposon Tn554, C-terminal catalytic domain. This family includes putative Transposase B from transposon Tn554. It belongs to a DNA breaking-rejoining enzyme superfamily. The catalytic domain contains six conserved active site residues. The recombination reaction involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA. Many DNA breaking-rejoining enzymes also have N-terminal domains, which show little sequence or structure similarity." 142 0 0 0 1 0 0 0 1 0.004379822 255 pfam07508 Recombinase Recombinase. This domain is usually found associated with pfam00239 in putative integrases/recombinases of mobile genetic elements of diverse bacteria and phages. 102 1 0 1 0 0 0 0 2 0.004225174 360 pfam18758 KDZ "Kyakuja-Dileera-Zisupton transposase. A transposase family with an RNaseH catalytic domain, often fused to DNA binding domains such as SAP or cysteine cluster domains. KDZ transposases are widely present in fungi, metazoa, chlorophytes and haptophytes. Fungal versions are often associated with a TET/JBP family of dioxygenases." 219 0 0 0 1 0 0 0 1 0.004039168 50 PRK01287 xerC site-specific tyrosine recombinase XerC; Reviewed 358 0 0 1 0 0 0 0 1 0.003919367 129 cd01194 INT_C_like_4 "Uncharacterized site-specific tyrosine recombinase, C-terminal catalytic domain. Tyrosine recombinase (integrase) belongs to a DNA breaking-rejoining enzyme superfamily. 
The catalytic domain contains six conserved active site residues. The recombination reaction involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA. Many DNA breaking-rejoining enzymes also have N-terminal domains, which show little sequence or structure similarity." 174 1 0 1 0 0 0 0 2 0.003518848 141 cd03408 SPFH_like_u1 "Uncharacterized family; SPFH (stomatin, prohibitin, flotillin, and HflK/C) superfamily. This model summarizes an uncharacterized family of proteins similar to stomatin, prohibitin, flotillin, HflK/C (SPFH) and podocin. The conserved domain common to the SPFH superfamily has also been referred to as the Band 7 domain. Many superfamily members are associated with lipid rafts. Individual proteins of the SPFH superfamily may cluster to form membrane microdomains which may in turn recruit multiprotein complexes. Microdomains formed from flotillin proteins may in addition be dynamic units with their own regulatory functions. Flotillins have been implicated in signal transduction, vesicle trafficking, cytoskeleton rearrangement and are known to interact with a variety of proteins. Stomatin interacts with and regulates members of the degenerin/epithelia Na+ channel family in mechanosensory cells of Caenorhabditis elegans and vertebrate neurons and participates in trafficking of Glut1 glucose transporters. Prohibitin may act as a chaperone for the stabilization of mitochondrial proteins. Prokaryotic HflK/C plays a role in the decision between lysogenic and lytic cycle growth during lambda phage infection. Flotillins have been implicated in the progression of prion disease, in the pathogenesis of neurodegenerative diseases such as Parkinson's and Alzheimer's disease and, in cancer invasion and metastasis. 
Mutations in the podocin gene give rise to autosomal recessive steroid resistant nephritic syndrome." 217 0 0 0 0 1 0 0 1 0.003475096 305 pfam13022 HTH_Tnp_1_2 Helix-turn-helix of insertion element transposase. This is a family of largely phage proteins which are likely to be a helix-turn-helix insertion elements. 123 0 0 0 1 0 0 0 1 0.003198712 44 PHA02601 int integrase; Provisional 333 1 0 0 0 0 0 0 1 0.003175008 115 cd01120 RecA-like_NTPases "RecA-like NTPases. This family includes the NTP binding domain of F1 and V1 H+ATPases, DnaB and related helicases as well as bacterial RecA and related eukaryotic and archaeal recombinases. This group also includes bacterial conjugation proteins and related DNA transfer proteins involved in type II and type IV secretion." 165 0 0 1 0 0 0 0 1 0.00313751 337 pfam14706 Tnp_DNA_bind "Transposase DNA-binding. This domain occurs at the C-terminus of transposases including E. coli tnpA. TnpA encodes a transposase and an inhibitor protein, the inhibitor only differs from the transposase by the absence of the N-terminal 55 amino acids, which includes most of this domain. This domain consists of alpha helices and turns, and functions as a DNA-binding domain." 57 0 0 0 1 0 0 0 1 0.003034451 130 cd01195 INT_C_like_5 "Uncharacterized site-specific tyrosine recombinase, C-terminal catalytic domain. Tyrosine recombinase (integrase) belongs to a DNA breaking-rejoining enzyme superfamily. The catalytic domain contains six conserved active site residues. The recombination reaction involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA. Many DNA breaking-rejoining enzymes also have N-terminal domains, which show little sequence or structure similarity." 
170 1 0 1 0 0 0 0 2 0.002947607 43 PHA02518 PHA02518 ParA-like protein; Provisional 211 0 0 0 0 0 0 1 1 0.002874034 238 pfam04855 SNF5 "SNF5 / SMARCB1 / INI1. SNF5 is a component of the yeast SWI/SNF complex, which is an ATP-dependent nucleosome-remodelling complex that regulates the transcription of a subset of yeast genes. SNF5 is a key component of all SWI/SNF-class complexes characterized so far. This family consists of the conserved region of SNF5, including a direct repeat motif. SNF5 is essential for the assembly promoter targeting and chromatin remodelling activity of the SWI-SNF complex. SNF5 is also known as SMARCB1, for SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily b, member 1, and also INI1 for integrase interactor 1. Loss-of function mutations in SNF5 are thought to contribute to oncogenesis in malignant rhabdoid tumors (MRTs)." 180 1 0 0 0 0 0 0 1 0.002846652 60 PRK09692 PRK09692 integrase; Provisional 413 1 0 0 0 0 0 0 1 0.00281726 119 cd01184 INT_C_like_1 "Uncharacterized site-specific tyrosine recombinase, C-terminal catalytic domain. Tyrosine recombinase (integrase) belongs to a DNA breaking-rejoining enzyme superfamily. The catalytic domain containing six conserved active site residues. The recombination reaction involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA. Many DNA breaking-rejoining enzymes also have N-terminal domains, which show little sequence or structure similarity." 180 1 0 1 0 0 0 0 2 0.002799943 330 pfam13976 gag_pre-integrs GAG-pre-integrase domain. This domain is found associated with retroviral insertion elements and lies just upstream of the integrase region on the polyproteins. 
67 1 0 0 0 0 0 0 1 0.00278842 61 PRK09870 PRK09870 tyrosine recombinase; Provisional 200 0 0 1 0 0 0 0 1 0.002781054 79 TIGR02224 recomb_XerC "tyrosine recombinase XerC. The phage integrase family describes a number of recombinases with tyrosine active sites that transiently bind covalently to DNA. Many are associated with mobile DNA elements, including phage, transposons, and phase variation loci. This model represents XerC, one of two closely related chromosomal proteins along with XerD (TIGR02225). XerC and XerD are site-specific recombinases which help resolve chromosome dimers to monomers for cell division after DNA replication. In species with a large chromosome and homologs of XerC on other replicons, the chromosomal copy was preferred for building this model. This model does not detect all XerC, as some apparent XerC examples score in the gray zone between trusted (450) and noise (410) cutoffs, along with some XerD examples. XerC and XerD interact with cell division protein FtsK. [DNA metabolism, DNA replication, recombination, and repair]" 295 1 0 1 0 0 0 0 2 0.002716992 288 pfam12323 HTH_OrfB_IS605 Helix-turn-helix domain. This is the N terminal helix-turn-helix domain of Transposase_2 pfam01385. 47 0 0 0 1 0 0 0 1 0.002709763 213 pfam01710 HTH_Tnp_IS630 Transposase. Transposase proteins are necessary for efficient DNA transposition. This family includes insertion sequences from Synechocystis PCC 6803 three of which are characterized as homologous to bacterial IS5- and IS4- and to several members of the IS630-Tc1-mariner superfamily. 119 0 0 0 1 0 0 0 1 0.002486067 314 pfam13518 HTH_28 Helix-turn-helix domain. This helix-turn-helix domain is often found in transposases and is likely to be DNA-binding. 52 0 0 0 1 0 0 0 1 0.002451119 208 pfam01548 DEDD_Tnp_IS110 Transposase. Transposase proteins are necessary for efficient DNA transposition. 
This family includes an amino-terminal region of the pilin gene inverting protein (PIVML) and of members of the IS111A/IS1328/IS1533 family of transposases. The C-terminus is represented by family pfam02371. 155 0 0 0 1 0 0 0 1 0.002375442 309 pfam13356 Arm-DNA-bind_3 Arm DNA-binding domain. This DNA-binding domain is found at the N-terminus of a wide variety of phage integrase proteins. 77 1 0 0 0 0 0 0 1 0.002306953 71 TIGR01633 phi3626_gp14_N "putative phage tail component, N-terminal domain. This model represents the best-conserved region of about 125 amino acids, toward the N-terminus, of a family of proteins from temperate phage of a number of Gram-positive bacteria. These phage proteins range in length from 230 to 525 amino acids. [Mobile and extrachromosomal element functions, Prophage functions]" 124 0 0 0 0 0 1 0 1 0.002262556 222 pfam02914 DDE_2 Bacteriophage Mu transposase. 217 0 0 0 1 0 0 0 1 0.002199795 345 pfam17762 HTH_ParB HTH domain found in ParB protein. 52 0 0 0 0 0 0 1 1 0.00212988 327 pfam13843 DDE_Tnp_1_7 Transposase IS4. 352 0 0 0 1 0 0 0 1 0.002038149 364 pfam18803 CxC2 CxC2 like cysteine cluster associated with KDZ transposases. A predicted Zinc chelating domain present N-terminal to the KDZ transposase domain. 107 0 0 0 1 0 0 0 1 0.001879598 8 COG2452 COG2452 "Predicted site-specific integrase-resolvase [Mobilome: prophages, transposons]. " 193 1 0 0 0 0 0 0 1 0.001832967 12 COG2963 InsE "Transposase and inactivated derivatives [Mobilome: prophages, transposons]. " 116 0 0 0 1 0 0 0 1 0.001792404 64 PRK13413 mpi master DNA invertase Mpi family serine-type recombinase. 200 0 0 1 0 0 0 0 1 0.001789561 253 pfam07022 Phage_CI_repr "Bacteriophage CI repressor helix-turn-helix domain. This family consists of several phage CI repressor proteins and related bacterial sequences. The CI repressor is known to function as a transcriptional switch, determining whether transcription is lytic or lysogenic." 
65 0 0 0 0 1 0 0 1 0.001783966 194 cd18974 CD_POL_like "chromodomain of Penicillium solitum protein PENSOL_c198G03123. This subgroup includes the CHROMO (CHRromatin Organization Modifier) domain found in Penicillium solitum protein PENSOL_c198G03123 a putative polyprotein from a Ty3/Gypsy long terminal repeat (LTR) retroelement. The pol gene in TY3/gypsy elements generally encodes domains in the following order: an aspartyl protease, a reverse transcriptase, RNase H, and an integrase, here the chromodomain is found at the C-terminus of the integrase domain. The chromodomain, is a conserved region of about 50 amino acids, found in a variety of chromosomal proteins, and implicated in the binding, of the proteins in which it is found, to methylated histone tails and maybe RNA. A chromodomain may occur as a single instance, in a tandem arrangement, or followed by a related chromo shadow domain." 50 1 0 0 0 0 0 0 1 0.0017606 181 cd16411 ParB_N_like "ParB N-terminal, parA -binding, domain of bacterial and plasmid parABS partitioning systems. This family represents the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. 
The ParB N-terminal domain of Bacillus subtilis and other species contains an arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS]) identified in several species is partially conserved with this family and related families. Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 90 0 0 0 0 0 0 1 1 0.001752753 321 pfam13683 rve_3 Integrase core domain. 67 1 0 0 0 0 0 0 1 0.001698378 334 pfam14657 Arm-DNA-bind_4 Arm DNA-binding domain. This family includes AP2-like domains found in a variety of phage integrase proteins. These domains bind to Arm DNA sites. 45 1 0 0 0 0 0 0 1 0.00165937 150 cd05481 retropepsin_like_LTR_1 "Retropepsins_like_LTR; pepsin-like aspartate protease from retrotransposons with long terminal repeats. Retropepsin of retrotransposons with long terminal repeats are pepsin-like aspartate proteases. While fungal and mammalian pepsins are bilobal proteins with structurally related N and C-terminals, retropepsins are half as long as their fungal and mammalian counterparts. The monomers are structurally related to one lobe of the pepsin molecule and retropepsins function as homodimers. The active site aspartate occurs within a motif (Asp-Thr/Ser-Gly), as it does in pepsin. Retroviral aspartyl protease is synthesized as part of the POL polyprotein that contains an aspartyl protease, a reverse transcriptase, RNase H, and an integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. 
In aspartate peptidases, Asp residues are ligands of an activated water molecule in all examples where catalytic residues have been identified. This group of aspartate peptidases is classified by MEROPS as the peptidase family A2 (retropepsin family, clan AA), subfamily A2A." 93 1 0 0 0 0 0 0 1 0.001630224 336 pfam14690 zf-ISL3 zinc-finger of transposase IS204/IS1001/IS1096/IS1165. 47 0 0 0 1 0 0 0 1 0.00162986 20 COG3415 COG3415 "Transposase [Mobilome: prophages, transposons]. " 138 0 0 0 1 0 0 0 1 0.001620555 265 pfam09121 Tower "Tower. Members of this family adopt a secondary structure consisting of a pair of long, antiparallel alpha-helices (the stem) that support a three-helix bundle (3HB) at their end. The 3HB contains a helix-turn-helix motif and is similar to the DNA binding domains of the bacterial site-specific recombinases, and of eukaryotic Myb and homeodomain transcription factors. The Tower domain has an important role in the tumor suppressor function of BRCA2, and is essential for appropriate binding of BRCA2 to DNA." 42 0 0 1 0 0 0 0 1 0.001617367 335 pfam14659 Phage_int_SAM_3 "Phage integrase, N-terminal SAM-like domain. This domain is found in a variety of phage integrase proteins." 57 1 0 0 0 0 0 0 1 0.001600232 303 pfam13011 LZ_Tnp_IS481 "leucine-zipper of insertion element IS481. This is the upstream region of the conjoined ORF AB of insertion element 481. The significance of IS481 in the detection of Bordetella pertussis is discussed in. The B portion of the ORF AB carries the transposase activity in family rve, pfam00665." 85 0 0 0 1 0 0 0 1 0.001557903 332 pfam14372 DUF4413 "Domain of unknown function (DUF4413). This domain is part of an RNase-H fold section of longer proteins some of which are transposable elements possibly of the Pong type, since some members are putative Tam3 transposases." 100 0 0 0 1 0 0 0 1 0.001505331 116 cd01123 Rad51_DMC1_radA "Rad51_DMC1_radA,B. 
This group of recombinases includes the eukaryotic proteins RAD51, RAD55/57 and the meiosis-specific protein DMC1, and the archaeal proteins radA and radB. They are closely related to the bacterial RecA group. Rad51 proteins catalyze a recombination reaction similar to that of RecA, using ATP-dependent DNA binding activity and a DNA-dependent ATPase. However, this reaction is less efficient and requires accessory proteins such as RAD55/57." 235 0 0 1 0 0 0 0 1 0.001469022 333 pfam14549 P22_Cro DNA-binding transcriptional regulator Cro. Bacteriophage P22 Cro protein represses genes normally expressed in early phage development and is necessary for the late stage of lytic growth. It does this by binding to the OL and OR operator-regions normally used by the repressor protein for lysogenic maintenance. 59 0 0 0 0 1 0 0 1 0.001466596 331 pfam14319 Zn_Tnp_IS91 Transposase zinc-binding domain. This domain is likely to be a zinc-binding domain. It is found at the N-terminus of transposases belonging to the IS91 family. 92 0 0 0 1 0 0 0 1 0.001458343 3 COG1475 Spo0J "Chromosome segregation protein Spo0J, contains ParB-like nuclease domain [Cell cycle control, cell division, chromosome partitioning]. " 240 0 0 0 0 0 0 1 1 0.001417006 320 pfam13613 HTH_Tnp_4 "Helix-turn-helix of DDE superfamily endonuclease. This domain is the probable DNA-binding region of transposase enzymes, necessary for efficient DNA transposition. Most of the members derive from the IS superfamily IS5 and rather fewer from IS4." 53 0 0 0 1 0 0 0 1 0.001392964 227 pfam03050 DDE_Tnp_IS66 Transposase IS66 family. Transposase proteins are necessary for efficient DNA transposition. This family includes IS66 from Agrobacterium tumefaciens. 282 0 0 0 1 0 0 0 1 0.001384287 308 pfam13340 DUF4096 Putative transposase of IS4/5 family (DUF4096). 76 0 0 0 1 0 0 0 1 0.00138333 243 pfam05598 DUF772 Transposase domain (DUF772). This presumed domain is found at the N-terminus of many proteins found in transposons. 
73 0 0 0 1 0 0 0 1 0.001367634 94 TIGR03764 ICE_PFGI_1_parB "integrating conjugative element, PFGI_1 class, ParB family protein. Members of this protein family carry the ParB-type nuclease domain and are found in integrating conjugative elements (ICE) in the same class as PFGI-1 of Pseudomonas fluorescens Pf-5." 258 0 0 0 0 0 0 1 1 0.001351768 356 pfam18718 CxC5 CxC5 like cysteine cluster associated with KDZ transposases. A predicted Zinc chelating domain present N-terminal to the KDZ transposase domain. 117 0 0 0 1 0 0 0 1 0.001336258 210 pfam01610 DDE_Tnp_ISL3 "Transposase. Transposase proteins are necessary for efficient DNA transposition. Contains transposases for IS204, IS1001, IS1096 and IS1165." 238 0 0 0 1 0 0 0 1 0.001297647 278 pfam10743 Phage_Cox "Regulatory phage protein cox. This family of phage Cox proteins is expressed by Enterobacteria phages. The Cox protein is a 79-residue basic protein with a predicted strong helix-turn-helix DNA-binding motif. It inhibits integrative recombination and it activates site-specific excision of the HP1 genome from the Haemophilus influenzae chromosome, Hp1. Cox appears to function as a tetramer. Cox binding sites consist of two direct repeats of the consensus motif 5'-GGTMAWWWWA, one Cox tetramer binding to each motif. Cox binding interferes with the interaction of HP1 integrase with one of its binding sites, IBS5. This competition is central to directional control. Both Cox binding sites are needed for full inhibition of integration and for activating excision, because it plays a positive role in assembling the nucleoprotein complexes that produce excisive recombination, by inducing the formation of a critical conformation in those complexes." 87 1 0 0 0 0 0 0 1 0.001286061 55 PRK08181 PRK08181 transposase; Validated 269 0 0 0 1 0 0 0 1 0.001255004 101 cd00303 retropepsin_like "Retropepsins; pepsin-like aspartate proteases. 
The family includes pepsin-like aspartate proteases from retroviruses, retrotransposons and retroelements, as well as eukaryotic dna-damage-inducible proteins (DDIs), and bacterial aspartate peptidases. While fungal and mammalian pepsins are bilobal proteins with structurally related N and C-terminals, retropepsins are half as long as their fungal and mammalian counterparts. The monomers are structurally related to one lobe of the pepsin molecule and retropepsins function as homodimers. The active site aspartate occurs within a motif (Asp-Thr/Ser-Gly), as it does in pepsin. Retroviral aspartyl protease is synthesized as part of the POL polyprotein that contains an aspartyl protease, a reverse transcriptase, RNase H, and an integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. In aspartate peptidases, Asp residues are ligands of an activated water molecule in all examples where catalytic residues have been identified. This group of aspartate peptidases is classified by MEROPS as the peptidase family A2 (retropepsin family, clan AA), subfamily A2A." 92 1 0 0 0 0 0 0 1 0.001181021 315 pfam13542 HTH_Tnp_ISL3 Helix-turn-helix domain of transposase family ISL3. 51 0 0 0 1 0 0 0 1 0.001124347 49 PRK00283 xerD tyrosine recombinase. 299 0 0 1 0 0 0 0 1 0.001090388 189 cd18728 PIN_N4BP1-like "PRORP-like PIN domain of NEDD4 binding protein 1 and related proteins. NEDD4-binding partner-1 (N4BP1) interacts with and is a substrate of NEDD4 ubiquitin ligase (neural precursor cell expressed, developmentally down-regulated 4, E3 ubiquitin protein ligase). It is also an inhibitor of the E3 ubiquitin-protein ligase ITCH, a NEDD4 structurally related E3. This subfamily additionally includes NYNRIN (NYN domain and retroviral integrase containing, also known as CGIN1/Cousin of GIN1), and KHNYN (KH and NYN domain containing) protein. N4BP1, CGIN1, and KHNYN proteins are probably of retroviral origin. 
This subfamily belongs to the Zc3h12a-N4BP1-like PIN subfamily of the PRORP-Zc3h12a-like PIN family, the latter of which additionally includes human PRORP, also known as proteinaceous RNase P and mitochondrial RNase P protein subunit 3 (MRPP3), and Arabidopsis thaliana PRORP1-3. The PIN (PilT N terminus) domain belongs to a large nuclease superfamily. The structural properties of the PIN domain indicate its active center, consisting of three highly conserved catalytic residues which coordinate metal ions; in some members, additional metal coordinating residues can be found while some others lack several of these key catalytic residues. The PIN active site is geometrically similar in the active center of structure-specific 5' nucleases, PIN-domain ribonucleases of eukaryotic rRNA editing proteins, and bacterial toxins of toxin-antitoxin (TA) operons." 127 1 0 0 0 0 0 0 1 0.00105675 242 pfam05269 Phage_CII Bacteriophage CII protein. This family consists of several phage CII regulatory proteins. CII plays a key role in the lysis-lysogeny decision in bacteriophage lambda and related phages. 91 0 0 0 0 1 0 0 1 0.001051183 42 PHA02517 PHA02517 putative transposase OrfB; Reviewed 277 0 0 0 1 0 0 0 1 0.00098931 322 pfam13700 DUF4158 "Domain of unknown function (DUF4158). The exact function of this domain is not clear, but it frequently occurs as an N-terminal region of transposase 3 or IS3 family of insertion elements." 166 0 0 0 1 0 0 0 1 0.000982046 58 PRK09409 PRK09409 IS2 transposase TnpB; Reviewed 301 0 0 0 1 0 0 0 1 0.000945123 370 smart00674 CENPB "Putative DNA-binding domain in centromere protein B, mouse jerky and transposases. " 66 0 0 0 1 0 0 0 1 0.000943745 313 pfam13495 Phage_int_SAM_4 "Phage integrase, N-terminal SAM-like domain. " 84 1 0 0 0 0 0 0 1 0.000930603 104 cd00447 NusB_Sun "RNA binding domain of NusB (N protein-Utilization Substance B) and Sun (also known as RrmB or Fmu) proteins. 
This family includes two orthologous groups exemplified by the transcription termination factor NusB and the N-terminal domain of the rRNA-specific 5-methylcytidine transferase (m5C-methyltransferase) Sun. The NusB protein plays a key role in the regulation of ribosomal RNA biosynthesis in eubacteria by modulating the efficiency of transcriptional antitermination. NusB along with other Nus factors (NusA, NusE/S10 and NusG) forms the core complex with the boxA element of the nut site of the rRNA operons. These interactions help RNA polymerase to counteract polarity during transcription of rRNA operons and allow stable antitermination. The transcription antitermination system can be appropriated by some bacteriophages such as lambda, which use the system to switch between the lysogenic and lytic modes of phage propagation. The m5C-methyltransferase Sun shares the N-terminal non-catalytic RNA-binding domain with NusB." 129 0 0 0 0 1 0 0 1 0.000916133 66 PRK14702 PRK14702 insertion element IS2 transposase InsD; Provisional 262 0 0 0 1 0 0 0 1 0.000858724 236 pfam04754 Transposase_31 "Putative transposase, YhgA-like. This family of putative transposases includes the YhgA sequence from Escherichia coli and several prokaryotic homologs." 202 0 0 0 1 0 0 0 1 0.000852626 271 pfam09684 Tail_P2_I Phage tail protein (Tail_P2_I). These sequences represent the family of phage P2 protein I and related tail proteins from a number of temperate phage of Gram-negative bacteria. 132 0 0 0 0 0 1 0 1 0.000817296 347 pfam17921 Integrase_H2C2 Integrase zinc binding domain. This zinc binding domain is found in a wide variety of integrase proteins. 58 1 0 0 0 0 0 0 1 0.000817054 355 pfam18717 CxC4 CxC4 like cysteine cluster associated with KDZ transposases. A predicted Zinc chelating domain present N-terminal to the KDZ transposase domain. 133 0 0 0 1 0 0 0 1 0.000798088 221 pfam02899 Phage_int_SAM_1 "Phage integrase, N-terminal SAM-like domain. 
" 84 1 0 0 0 0 0 0 1 0.000761913 69 TIGR00616 rect "recombinase, phage RecT family. All proteins in this family for which functions are known bind ssDNA and are involved in the pairing of homologous DNA. This family is based on the phylogenomic analysis of JA Eisen (1999, Ph.D. Thesis, Stanford University). RecT and homologs are found in prophage regions of bacterial genomes. RecT works with a partner protein, RecE. [DNA metabolism, DNA replication, recombination, and repair, Mobile and extrachromosomal element functions, Prophage functions]" 241 0 0 1 0 0 0 0 1 0.00072543 293 pfam12762 DDE_Tnp_IS1595 "ISXO2-like transposase domain. This domain probably functions as an integrase that is found in a wide variety of transposases, including ISXO2." 151 1 0 0 1 0 0 0 2 0.000708906 349 pfam18090 SoPB_HTH "Centromere-binding protein HTH domain. This domain is found in centromere-binding protein (SopB). SopB displays an intriguing range of DNA-binding properties essential for partition; it binds the centromere to form a partition complex, which recruits NTPase (SopA), and it also inhibits SopA polymerization. The domain has a helix-turn-helix (HTH) structure and is thought to be the specific DNA-binding domain mainly through residues from the recognition helix, alpha 3, of the HTH. The domain has shown structural similarity to the DNA-binding domains of P1 ParB and KorB." 75 0 0 0 0 0 0 1 1 0.000698434 78 TIGR02126 phgtail_TP901_1 "phage major tail protein, TP901-1 family. This family includes the members of pfam06199 but is broader. Characterized members are major tail proteins from various phage, including lactococcal temperate bacteriophage TP901-1. [Mobile and extrachromosomal element functions, Prophage functions]" 136 0 0 0 0 0 1 0 1 0.000697965 299 pfam13005 zf-IS66 zinc-finger binding domain of transposase IS66. This is a zinc-finger region of the N-terminus of the insertion element IS66 transposase. 
46 0 0 0 1 0 0 0 1 0.000695447 27 COG4342 COG4342 "Intergrase/Recombinase [Mobilome: prophages, transposons]. " 291 0 0 1 0 0 0 0 1 0.000681337 269 pfam09588 YqaJ "YqaJ-like viral recombinase domain. This protein family is found in many different bacterial species but is of viral origin. The protein forms an oligomer and functions as a processive alkaline exonuclease that digests linear double-stranded DNA in a Mg(2+)-dependent reaction, It has a preference for 5'-phosphorylated DNA ends. It thus forms part of the two-component SynExo viral recombinase functional unit." 144 0 0 1 0 0 0 0 1 0.00067694 261 pfam08775 ParB "ParB family. ParB is a component of the par system which mediates accurate DNA partition during cell division. It recognizes A-box and B-box DNA motifs. ParB forms an asymmetric dimer with 2 extended helix-turn-helix (HTH) motifs that bind to A-boxes. The HTH motifs emanate from a beta sheet coiled coil DNA binding module. Both DNA binding elements are free to rotate around a flexible linker, this enables them to bind to complex arrays of A- and B-box elements on adjacent DNA arms of the looped partition site." 125 0 0 0 0 0 0 1 1 0.000662649 254 pfam07282 OrfB_Zn_ribbon Putative transposase DNA-binding domain. This putative domain is found at the C-terminus of a large number of transposase proteins. This domain contains four conserved cysteines suggestive of a zinc binding domain. Given the need for transposases to bind DNA as well as the large number of DNA-binding zinc fingers we hypothesize this domain is DNA-binding. 69 0 0 0 1 0 0 0 1 0.000624814 350 pfam18103 SH3_11 "Retroviral integrase C-terminal SH3 domain. This is the carboxy-terminal domain (CTD) found in retroviral integrase, an essential retroviral enzyme that binds both termini of linear viral DNA and inserts them into a host cell chromosome. The CTD adopts an SH3-like fold. 
Each CTD makes contact with the phosphodiester backbone of both viral DNA molecules, essentially crosslinking the structure." 63 1 0 0 0 0 0 0 1 0.000616249 341 pfam16795 Phage_integr_3 Archaeal phage integrase. catalyzes cleavage and ligation of DNA. 162 1 0 0 0 0 0 0 1 0.000587622 154 cd06095 RP_RTVL_H_like "Retropepsin of the RTVL_H family of human endogenous retrovirus-like elements. This family includes aspartate proteases from retroelements with LTR (long terminal repeats) including the RTVL_H family of human endogenous retrovirus-like elements. While fungal and mammalian pepsins are bilobal proteins with structurally related N- and C-termini, retropepsins are half as long as their fungal and mammalian counterparts. The monomers are structurally related to one lobe of the pepsin molecule and retropepsins function as homodimers. The active site aspartate occurs within a motif (Asp-Thr/Ser-Gly), as it does in pepsin. Retroviral aspartyl protease is synthesized as part of the POL polyprotein that contains an aspartyl protease, a reverse transcriptase, RNase H, and an integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. In aspartate peptidases, Asp residues are ligands of an activated water molecule in all examples where catalytic residues have been identified. This group of aspartate peptidases is classified by MEROPS as the peptidase family A2 (retropepsin family, clan AA), subfamily A2A." 86 1 0 0 0 0 0 0 1 0.000577396 131 cd01196 INT_C_like_6 "Uncharacterized site-specific tyrosine recombinase, C-terminal catalytic domain. Tyrosine recombinase (integrase) belongs to a DNA breaking-rejoining enzyme superfamily. The catalytic domain contains six conserved active site residues. The recombination reaction involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. 
In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA. Many DNA breaking-rejoining enzymes also have N-terminal domains, which show little sequence or structure similarity." 183 1 0 1 0 0 0 0 2 0.0005688 72 TIGR01634 tail_P2_I "phage tail protein, P2 protein I family. This model represents the family of phage P2 protein I and related tail proteins from a number of temperate phage of Gram-negative bacteria. This model is built as a fragment model and identifies some phage tail proteins with strong but local similarity to members of the seed alignment. [Mobile and extrachromosomal element functions, Prophage functions]" 139 0 0 0 0 0 1 0 1 0.000548808 153 cd06094 RP_Saci_like "RP_Saci_like, retropepsin family. Retropepsin on retrotransposons with long terminal repeats (LTR) including Saci-1, -2 and -3 of Schistosoma mansoni. Retropepsins are related to fungal and mammalian pepsins. While fungal and mammalian pepsins are bilobal proteins with structurally related N- and C-termini, retropepsins are half as long as their fungal and mammalian counterparts. The monomers are structurally related to one lobe of the pepsin molecule and retropepsins function as homodimers. The active site aspartate occurs within a motif (Asp-Thr/Ser-Gly), as it does in pepsin. Retroviral aspartyl protease is synthesized as part of the POL polyprotein that contains an aspartyl protease, a reverse transcriptase, RNase H, and an integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. In aspartate peptidases, Asp residues are ligands of an activated water molecule in all examples where catalytic residues have been identified. This group of aspartate peptidases is classified by MEROPS as the peptidase family A2 (retropepsin family, clan AA), subfamily A2A." 89 1 0 0 0 0 0 0 1 0.000547708 29 COG4584 COG4584 "Transposase [Mobilome: prophages, transposons]. 
" 278 0 0 0 1 0 0 0 1 0.000527802 275 pfam10539 Dev_Cell_Death "Development and cell death domain. The DCD domain is found in plant proteins involved in development and cell death. The DCD domain is an approximately 130 amino acid long stretch that contains several mostly invariable motifs. These include a FGLP and a LFL motif at the N-terminus and a PAQV and a PLxE motif towards the C-terminus of the domain. The DCD domain is present in proteins with different architectures. Some of these proteins contain additional recognisable motifs, like the KELCH repeats or the ParB domain." 126 0 0 0 0 0 0 1 1 0.000518737 247 pfam06291 Lambda_Bor "Bor protein. This family consists of several Bacteriophage lambda Bor and Escherichia coli Iss proteins. Expression of bor significantly increases the survival of the Escherichia coli host cell in animal serum. This property is a well known bacterial virulence determinant indeed, bor and its adjacent sequences are highly homologous to the iss serum resistance locus of the plasmid ColV2-K94, which confers virulence in animals. It has been suggested that lysogeny may generally have a role in bacterial survival in animal hosts, and perhaps in pathogenesis." 74 0 0 0 0 1 0 0 1 0.000486848 32 COG4974 XerD "Site-specific recombinase XerD [Replication, recombination and repair]. " 300 0 0 1 0 0 0 0 1 0.000483565 162 cd16390 ParB_N_Srx_like "uncharacterized family distantly related to the N-terminal domain of the ParB/Srx superfamily. Uncharacterized proteins distantly related to the N-terminal domain of the ParB superfamily, primarily involved in bacterial and plasmid parABS-related partitioning systems. A small minority of proteins in this family include a C-terminal inorganic pyrophosphatase domain. Also within the ParB superfamily is sulfiredoxin (Srx), which is a reactivator of oxidatively inactivated 2-cys peroxiredoxins. 
Other families includes a putative regulator in the biosynthetic gene cluster and a family containing a Pyrococcus furiosus nuclease and putative transcriptional regulators SbnI (Staphylococcus aureus siderophore biosynthetic gene cluster ) and EdeB (Brevibacillus brevis antimicrobial peptide edeine biosynthetic cluster). Nuclease activity has also been reported in Arabidopsis Srx." 162 0 0 0 0 0 0 1 1 0.000448615 306 pfam13102 Phage_int_SAM_5 Phage integrase SAM-like domain. A family of uncharacterized proteins found by clustering human gut metagenomic sequences. This family appears related to the N-terminal domain of phage integrases. 98 1 0 0 0 0 0 0 1 0.000440776 177 cd16407 ParB_N_like "ParB N-terminal, parA-binding, -like domain of bacterial and plasmid parABS partitioning systems. This family represents the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. 
Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 86 0 0 0 0 0 0 1 1 0.000428905 325 pfam13751 DDE_Tnp_1_6 "Transposase DDE domain. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, which contain three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis." 125 0 0 0 1 0 0 0 1 0.000406758 270 pfam09669 Phage_pRha "Phage regulatory protein Rha (Phage_pRha). Members of this protein family are found in temperate phage and bacterial prophage regions. Members include the product of the rha gene of the lambdoid phage phi-80, a late operon gene. The presence of this gene interferes with infection of bacterial strains that lack integration host factor (IHF), which regulates the rha gene. It is suggested that Rha is a phage regulatory protein." 91 0 0 0 0 0 1 0 1 0.000396809 62 PRK09871 PRK09871 tyrosine recombinase; Provisional 198 0 0 1 0 0 0 0 1 0.000395318 257 pfam07825 Exc "Excisionase-like protein. The phage-encoded excisionase protein (Xis) is involved in excisive recombination by regulating the assembly of the excisive intasome and by inhibiting viral integration. It adopts an unusual 'winged'-helix structure in which two alpha helices are packed against two extended strands. Also present in the structure is a two-stranded anti-parallel beta-sheet, whose strands are connected by a four-residue 'wing'. During interaction with DNA, helix alpha2 is thought to insert into the major groove, while the wing contacts the adjacent minor groove or phosphodiester backbone. 
The C-terminal region of Xis is involved in interaction with phage-encoded integrase (Int), and a putative C-terminal alpha helix may fold upon interaction with Int and/or DNA." 72 1 1 0 0 0 0 0 2 0.00038999 251 pfam06806 DUF1233 Putative excisionase (DUF1233). This family consists of several putative phage excisionase proteins of around 80 residues in length. 70 0 1 0 0 0 0 0 1 0.00036673 13 COG3039 IS5 "Transposase and inactivated derivatives, IS5 family [Mobilome: prophages, transposons]. " 230 0 0 0 1 0 0 0 1 0.000356079 206 pfam01526 DDE_Tnp_Tn3 "Tn3 transposase DDE domain. This family includes transposases of Tn3, Tn21, Tn1721, Tn2501, Tn3926 transposons from E-coli. The specific binding of the Tn3 transposase to DNA has been demonstrated. Sequence analysis has suggested that the invariant triad of Asp689, Asp765, Glu895 (numbering as in Tn3) may correspond to the D-D-35-E motif previously implicated in the catalysis of numerous transposases." 388 0 0 0 1 0 0 0 1 0.000349768 280 pfam11372 DUF3173 "Domain of unknown function (DUF3173). This family of proteins with unknown function appears to be restricted to Firmicutes. These proteins appear to be distantly related to HHH domains and are therefore likely to be DNA-binding. Genomic environment-visualisation confirms the likely function as being DNA-binding, as this short protein lies very closely between an integrase and a replication protein (http://www.microbesonline.org/)." 58 1 0 0 0 0 0 0 1 0.000338652 263 pfam09003 Arm-DNA-bind_1 "Bacteriophage lambda integrase, Arm DNA-binding domain. The amino terminal domain of bacteriophage lambda integrase folds into a three-stranded, antiparallel beta-sheet that packs against a C-terminal alpha-helix, adopting a fold that is structurally related to the three-stranded beta-sheet family of DNA-binding domains (which includes the GCC-box DNA-binding domain and the N-terminal domain of Tn916 integrase). 
This domain is responsible for high-affinity binding to each of the five DNA arm-type sites and is also a context-sensitive modulator of DNA cleavage." 72 1 0 0 0 0 0 0 1 0.000338177 246 pfam06199 Phage_tail_2 "Phage tail tube protein. characterized members are major tail tube proteins from various phages, including lactococcal temperate bacteriophage TP901-1." 134 0 0 0 0 0 1 0 1 0.000335728 18 COG3335 COG3335 "Transposase [Mobilome: prophages, transposons]. " 132 0 0 0 1 0 0 0 1 0.000319108 126 cd01191 INT_C_like_2 "Uncharacterized site-specific tyrosine recombinase, C-terminal catalytic domain. Tyrosine recombinase (integrase) belongs to a DNA breaking-rejoining enzyme superfamily. The catalytic domain contains six conserved active site residues. The recombination reaction involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA. Many DNA breaking-rejoining enzymes also have N-terminal domains, which show little sequence or structure similarity." 176 1 0 1 0 0 0 0 2 0.000313946 199 pfam00552 IN_DBD_C Integrase DNA binding domain. Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains. The amino-terminal domain is a zinc binding domain. The central domain is the catalytic domain pfam00665. This domain is the carboxyl terminal domain that is a non-specific DNA binding domain. 50 1 0 0 0 0 0 0 1 0.000307144 185 cd16844 ParB_N_like_MT "ParB N-terminal-like domain, some attached to C-terminal S-adenosylmethionine-dependent methyltransferase domain. 
This family represents domains related to the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system, fused to a variety of C-terminal domains, including S-adenosylmethionine-dependent methyltransferase-like domains and DUF4417. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 54 0 0 0 0 0 0 1 1 0.000305331 365 pfam18804 CxC3 CxC3 like cysteine cluster associated with KDZ transposases. A predicted Zinc chelating domain present N-terminal to the KDZ transposase domain. 
113 0 0 0 1 0 0 0 1 0.000298324 256 pfam07592 DDE_Tnp_ISAZ013 "Rhodopirellula transposase DDE domain. These transposases are found in the planctomycete Rhodopirellula baltica, the cyanobacterium Nostoc, and the Gram-positive bacterium Streptomyces." 308 0 0 0 1 0 0 0 1 0.000279856 9 COG2801 Tra5 "Transposase InsO and inactivated derivatives [Mobilome: prophages, transposons]. " 232 0 0 0 1 0 0 0 1 0.000257419 23 COG3547 COG3547 "Transposase [Mobilome: prophages, transposons]. " 303 0 0 0 1 0 0 0 1 0.000257051 362 pfam18763 ddrB-ParB ddrB-like ParB superfamily domain. A member of the ParB/sulfiredoxin superfamily of proteins found in polyvalent proteins prototyped by the version in the phage P1 ddRB protein. These proteins are predicted to function as nucleases. 122 0 0 0 0 0 0 1 1 0.000240544 291 pfam12713 DUF3806 "Domain of unknown function (DUF3806). This family represent the C-terminal domain of the structure. In two related Bacteroides species the gene lies immediately upstream from a putative ATP binding component of an ATP transporter and a putative histidinol phosphatase. The structure of this domain is strikingly similar to the N-terminal structure of 1ma7 whose C-terminal domain is a phage integrase, pfam00589." 86 1 0 0 0 0 0 0 1 0.000229884 165 cd16394 sopB_N "N-terminal domain of sopB protein, which promotes proper partitioning of F1 plasmid. Escherichia coli SopB acts in the equitable partitioning of the F plasmid in the SopABC system. SopA binds to the sopAB promoter, while SopB binds SopC and helps stimulate polymerization of SopA in the presence of ATP and Mg(II). Mutation of SopA inhibits proper plasmid segregation. This N-terminal domain is related to the ParB N-terminal domain of bacterial and plasmid parABS partitioning systems, which binds parA." 67 0 0 0 0 0 0 1 1 0.000206699 197 cd18978 CD_DDE_transposase_like "chromodomain of Rhizopus microsporus putative DDE transposases, and similar proteins. 
This subgroup includes the CHROMO (CHRromatin Organization Modifier) domain found in Rhizopus microsporus putative DDE transposases, and similar proteins. The chromodomain, is a conserved region of about 50 amino acids, found in a variety of chromosomal proteins, and implicated in the binding, of the proteins in which it is found, to methylated histone tails and maybe RNA. A chromodomain may occur as a single instance, in a tandem arrangement, or followed by a related chromo shadow domain." 52 0 0 0 1 0 0 0 1 0.000193348 15 COG3316 Rve "Transposase (or an inactivated derivative) [Mobilome: prophages, transposons]. " 215 0 0 0 1 0 0 0 1 0.000189519 111 cd00799 INT_Cre_C "C-terminal catalytic domain of Cre recombinase (also called integrase). Cre-like recombinases are tyrosine based site specific recombinases. They belong to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their catalytic domain and the overall reaction mechanism. The bacteriophage P1 Cre recombinase maintains the circular phage replicon in a monomeric state by catalyzing a site-specific recombination between two loxP sites. The catalytic core domain of Cre recombinase is linked to a more divergent helical N-terminal domain, which interacts primarily with the DNA major groove proximal to the crossover region." 188 1 0 1 0 0 0 0 2 0.000189283 231 pfam03400 DDE_Tnp_IS1 IS1 transposase. Transposase proteins are necessary for efficient DNA transposition. This family represents bacterial IS1 transposases. 131 0 0 0 1 0 0 0 1 0.000177219 99 TIGR04402 mob_CxxC_CxxC "mobilome CxxCx(11)CxxC protein. Members of this family share twin CxxC motifs near the C-terminus, suggesting a DNA- or RNA-binding activity. The spacing between CxxC motifs is variable, from 11 to 16 amino acids. Members in general occur on plasmids or near other markers of lateral gene transfer (transposases, integrases, endonucleases, etc)." 
186 1 0 0 1 0 0 0 2 0.000176011 287 pfam12167 Arm-DNA-bind_2 Arm DNA-binding domain. This domain is found at the N-terminus of various phage integrases. The domain binds to DNA. 64 1 0 0 0 0 0 0 1 0.000172529 166 cd16395 Srx "Sulfiredoxin reactivates peroxiredoxins after oxidative inactivation. Sulfiredoxin reduces and thereby re-activates 2-cys peroxiredoxins. Peroxiredoxins act as molecular switches, inactivating in response to hyperoxidation from hydrogen peroxide and other free radicals. Sulfiredoxin reactivates Prx-SO(2)(-) via ATP-Mg(2+)-dependent reduction. Arabidopsis sulfiredoxin has been described as a dual function enzyme, having nuclease activity in addition to the sulfiredoxin activity. This protein is similar to ParB N-terminal-like domain of bacterial and plasmid parABS partitioning systems." 90 0 0 0 0 0 0 1 1 0.000170541 223 pfam02920 Integrase_DNA DNA binding domain of tn916 integrase. 58 1 0 0 0 0 0 0 1 0.000166507 93 TIGR03742 PRTRC_F "PRTRC system protein F. A novel genetic system characterized by seven (usually) major proteins, including a ParB homolog and a ThiF homolog, is commonly found on plasmids or in bacterial chromosomal regions near phage, plasmid, or transposon markers. It is most common among the beta Proteobacteria. We designate the system PRTRC, or ParB-Related,ThiF-Related Cassette. This protein family is designated protein F. It is the most divergent of the families." 342 0 0 0 0 0 0 1 1 0.000157376 161 cd16389 FIN "fertility inhibition factors, including OSA and FiwA, related to the ParB/Srx superfamily. Osa and FiwA are fertility inhibition factors (FIN), which are employed by plasmids to block import of rival plasmids. Osa (oncogenic suppressive activity) of IncW group plasmid pSa gene inhibits the oncogenic properties of Agrobacterium tumefaciens. Osa is structurally similar to the ParB N-terminal domain/Srx superfamily of proteins: ParB acts in the bacterial and plasmid parABS partitioning systems. 
Osa has been shown to have ATPase and DNAse activities, and can block T-DNA transfer into plants. FiwA is encoded by plasmid RP1 and blocks the transfer of plasmid R388. The gene product of Haemophilus influenzae p1056.10c also blocks T-DNA transfer." 124 0 0 0 0 0 0 1 1 0.000156933 272 pfam09956 DUF2190 "Uncharacterized conserved protein (DUF2190). This domain, found in various hypothetical prokaryotic proteins, as well as in some putative RecA/RadA recombinases, has no known function." 103 0 0 1 0 0 0 0 1 0.000144565 2 COG0675 InsQ "Transposase [Mobilome: prophages, transposons]. " 364 0 0 0 1 0 0 0 1 0.000143185 182 cd16412 dndB "DNA sulfur modification protein DndB. dndB acts in the regulation of DNA modifications, including DNA phosphorothioation. DndB may act by binding near the phosphorothioate modification site and regulating access of the Dnd modification machinery to DNA. These proteins show similarity to the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system, and other members of the ParB/Srx superfamily." 333 0 0 0 0 0 0 1 1 0.000141709 65 PRK13698 PRK13698 ParB/RepB/Spo0J family plasmid partition protein. 323 0 0 0 0 0 0 1 1 0.00013892 125 cd01190 INT_StrepXerD_C_like "Putative XerD in Streptococcus pneumoniae and similar proteins, C-terminal catalytic domain. This family includes a putative XerD recombinase in Streptococcus pneumoniae and similar tyrosine recombinases. However, the members of this family contain unusual active site motifs from the XerD from Escherichia coli. E. coli XerD and homologous enzymes show four conserved amino acids R-H-R-H that are spaced along the C-terminal domain. The putative S. pneumoniae XerD contains three unique replacements at the conserved positions resulting in L-Q-R-L. Severe growth defects in a loss-of-function xerD mutant demonstrate an important in vivo function of the S. pneumoniae XerD protein. 
This family belongs to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their catalytic domain and the overall reaction mechanism. The catalytic domain contains six conserved active site residues. Their overall reaction mechanism involves cleavage of a single strand of a DNA duplex by nucleophilic attack of a conserved tyrosine to give a 3' phosphotyrosyl protein-DNA adduct. In the second rejoining step, a terminal 5' hydroxyl attacks the covalent adduct to release the enzyme and generate duplex DNA." 150 0 0 1 0 0 0 0 1 0.000129509 310 pfam13358 DDE_3 "DDE superfamily endonuclease. This family of proteins are related to pfam00665 and are probably endonucleases of the DDE superfamily. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, which contain three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis. The catalytic activity of this enzyme involves DNA cleavage at a specific site followed by a strand transfer reaction." 145 0 0 0 1 0 0 0 1 0.000105573 84 TIGR02681 phage_pRha "phage regulatory protein, rha family. Members of this protein family are found in temperate phage and bacterial prophage regions. Members include the product of the rha gene of the lambdoid phage phi-80, a late operon gene. The presence of this gene interferes with infection of bacterial strains that lack integration host factor (IHF), which regulates the rha gene. It is suggested that pRha is a phage regulatory protein. [Mobile and extrachromosomal element functions, Prophage functions]" 108 0 0 0 0 0 1 0 1 0.000102441 191 cd18971 CD_POL_like "chromodomain of a Magnaporthe grisea putative retrotransposon polyprotein, and similar proteins. 
This subgroup includes the CHROMO (CHRromatin Organization Modifier) domain found in a Magnaporthe grisea putative retrotransposon polyprotein which includes domains in the following order: an aspartyl protease, a reverse transcriptase, RNase H, and an integrase, here the chromodomain is found at the C-terminus of the integrase domain. The chromodomain, is a conserved region of about 50 amino acids, found in a variety of chromosomal proteins, and implicated in the binding, of the proteins in which it is found, to methylated histone tails and maybe RNA. A chromodomain may occur as a single instance, in a tandem arrangement, or followed by a related chromo shadow domain." 50 1 0 0 0 0 0 0 1 9.50E-05 54 PRK06526 PRK06526 transposase; Provisional 254 0 0 0 1 0 0 0 1 8.99E-05 302 pfam13009 Phage_Integr_2 Putative phage integrase. This family is found in association with IS elements. 323 1 0 0 0 0 0 0 1 8.43E-05 5 COG1662 InsB "Transposase and inactivated derivatives, IS1 family [Mobilome: prophages, transposons]. " 121 0 0 0 1 0 0 0 1 8.29E-05 307 pfam13333 rve_2 Integrase core domain. 52 1 0 0 0 0 0 0 1 7.56E-05 142 cd03714 RT_DIRS1 "RT_DIRS1: Reverse transcriptases (RTs) occurring in the DIRS1 group of retransposons. Members of the subfamily include the Dictyostelium DIRS-1, Volvox carteri kangaroo, and Panagrellus redivivus PAT elements. These elements differ from LTR and conventional non-LTR retrotransposons. They contain split direct repeat (SDR) termini, and have been proposed to integrate via double-stranded closed-circle DNA intermediates assisted by an encoded recombinase which is similar to gamma-site-specific integrase." 119 1 0 1 0 0 0 0 2 7.44E-05 284 pfam11646 DUF3258 Protein of unknown function DUF3258. This viral family are possible phage integrase proteins however this cannot be confirmed. 99 1 0 0 0 0 0 0 1 7.19E-05 342 pfam17241 DUF5314 Family of unknown function (DUF5314). 
This is a family of unknown function usually preceded by the GAG-pre-integrase domain pfam13976. 194 1 0 0 0 0 0 0 1 6.66E-05 248 pfam06316 Ail_Lom "Enterobacterial Ail/Lom protein. This family consists of several bacterial and phage Ail/Lom-like proteins. The Yersinia enterocolitica Ail protein is a known virulence factor. Proteins in this family are predicted to consist of eight transmembrane beta-sheets and four cell surface-exposed loops. It is thought that Ail directly promotes invasion and loop 2 contains an active site, perhaps a receptor-binding domain. The phage protein Lom is expressed during lysogeny, and encode host-cell envelope proteins. Lom is found in the bacterial outer membrane, and is homologous to virulence proteins of two other enterobacterial genera. It has been suggested that lysogeny may generally have a role in bacterial survival in animal hosts, and perhaps in pathogenesis." 199 0 0 0 0 1 0 0 1 6.42E-05 175 cd16405 RepB_like_N "plasmid segregation replication protein B like protein, N-terminal domain. RepB, found on plasmids and secondary chromosomes, works along with repA in directing plasmid segregation, and has been shown in Rhizobium etli to require the parS centromere-like sequence for full transcriptional repression of the repABC operon, inducing plasmid incompatibility. RepA is a Walker-type ATPase that complexes with RepB to form DNA-protein complexes in the presence of ATP/ADP. RepC is an initiator protein for the plasmid. repA and repB are homologous to the parA and ParB genes of the parABS partitioning system found on primary chromosomes." 91 0 0 0 0 0 0 1 1 6.18E-05 37 COG5471 COG5471 "Predicted phage recombinase, RecA/RadA family [Mobilome: prophages, transposons]. " 107 0 0 1 0 0 0 0 1 5.71E-05 184 cd16414 dndB_like "DNA-sulfur modification-associated domain. Family of proteins related to dndB. dndB acts in the regulation of DNA modifications, including DNA phosphorothioation. Both have a conserved DGQHR sequence motif. 
These proteins show similarity to the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system, and other members of the ParB/Srx superfamily" 238 0 0 0 0 0 0 1 1 4.75E-05 318 pfam13610 DDE_Tnp_IS240 "DDE domain. This DDE domain is found in a wide variety of transposases including those found in IS240, IS26, IS6100 and IS26." 139 0 0 0 1 0 0 0 1 3.96E-05 262 pfam08857 ParBc_2 Putative ParB-like nuclease. This domain is probably distantly related to pfam02195. Suggesting these uncharacterized proteins have a nuclease function. 159 0 0 0 0 0 0 1 1 3.08E-05 51 PRK02436 xerD site-specific tyrosine recombinase XerD. 245 0 0 1 0 0 0 0 1 2.64E-05 220 pfam02661 Fic "Fic/DOC family. This family consists of the Fic (filamentation induced by cAMP) protein and doc (death on curing). The Fic protein is involved in cell division and is suggested to be involved in the synthesis of PAB or folate, indicating that the Fic protein and cAMP are involved in a regulatory mechanism of cell division via folate metabolism. This family contains a central conserved motif HPFXXGNG in most members. The exact molecular function of these proteins is uncertain. P1 lysogens of Escherichia coli carry the prophage as a stable low copy number plasmid. The frequency with which viable cells cured of prophage are produced is about 10(-5) per cell per generation. A significant part of this remarkable stability can be attributed to a plasmid-encoded mechanism that causes death of cells that have lost P1. In other words, the lysogenic cells appear to be addicted to the presence of the prophage. The plasmid withdrawal response depends on a gene named doc (death on curing) that is represented by this family. Doc induces a reversible growth arrest of E. coli cells by targetting the protein synthesis machinery. 
Doc hosts the C-terminal domain of its antitoxin partner Phd (prevents host death) through fold complementation, a domain that is intrinsically disordered in solution but that folds into an alpha-helix on binding to Doc. This domain forms complexes with Phd antitoxins containing pfam02604." 95 0 0 0 0 1 0 0 1 1.79E-05 75 TIGR01766 tspaseT_teng_C "transposase, IS605 OrfB family, central region. This model represents a region of a sequence similarity between a family of putative transposases of Thermoanaerobacter tengcongensis, smaller related proteins from Bacillus anthracis, putative transposases described by pfam01385, and other proteins. [Mobile and extrachromosomal element functions, Transposon functions]" 82 0 0 0 1 0 0 0 1 1.18E-05 324 pfam13737 DDE_Tnp_1_5 "Transposase DDE domain. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, which contain three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis." 112 0 0 0 1 0 0 0 1 1.07E-05 169 cd16398 KorB_N_like "ParB-like partition protein of low copy number plasmid RK2, N-terminal domain and related domains. KorB, a member of the ParB like family, is present on the low copy number, broad host range plasmid RK2. KorB encodes a gene product involved in segregation of RK2 and acts as a transcriptional regulator, down-regulating at least 6 RK2 operons. KorB binds RNA polymerase and acts cooperatively with several co-repressors in modulating transcription. KorB is comprised of 3 domains, including a beta-strand C-terminal domain similar to SH3 domains and an alpha helical central domain that interacts with operator DNA. In ParB of P1 and SopB of F, the N-terminal region is responsible for interaction with the parA component. However, korB interaction with the RK2 parA-equivalent IncC has been mapped to the central HTH motif. 
This family is related to the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 91 0 0 0 0 0 0 1 1 1.65E-06 354 pfam18697 MLVIN_C Murine leukemia virus (MLV) integrase (IN) C-terminal domain. This is the C-terminal domain (CTD) which can be found in murine leukemia virus (MLV) integrase (IN) proteins. The MLV IN C-terminal domain interacts with the bromo and extraterminal (BET) proteins through the ET domain. 
This interaction provides a structural basis for global in vivo integration-site preferences and disruption of this interaction through truncation mutations affects the global targeting profile of MLV. The CTD consists of an SH3 fold followed by a long unstructured tail. 79 1 0 0 0 0 0 0 1 1.12E-06 190 cd18970 CD_POL_like "chromodomain of Hypsizygus marmoreus TY3B-I_0 protein, and similar proteins. This subgroup includes the CHROMO (CHRromatin Organization Modifier) domain found in Hypsizygus marmoreus TY3B-I_0 protein, a putative TY3/gypsy retrotransposon polyprotein, and similar proteins. The pol gene in TY3/gypsy elements generally encodes domains in the following order: an aspartyl protease, a reverse transcriptase, RNase H, and an integrase; here the chromodomain is found at the C-terminus of the integrase domain. The chromodomain is a conserved region of about 50 amino acids, found in a variety of chromosomal proteins, and implicated in the binding of the proteins in which it is found to methylated histone tails and maybe RNA. A chromodomain may occur as a single instance, in a tandem arrangement, or followed by a related chromo shadow domain." 49 1 0 0 0 0 0 0 1 6.42E-08 151 cd05482 HIV_retropepsin_like "Retropepsins, pepsin-like aspartate proteases. This is a subfamily of retropepsins. The family includes pepsin-like aspartate proteases from retroviruses, retrotransposons and retroelements. While fungal and mammalian pepsins are bilobal proteins with structurally related N- and C-termini, retropepsins are half as long as their fungal and mammalian counterparts. The monomers are structurally related to one lobe of the pepsin molecule and retropepsins function as homodimers. The active site aspartate occurs within a motif (Asp-Thr/Ser-Gly), as it does in pepsin. Retroviral aspartyl protease is synthesized as part of the POL polyprotein that contains an aspartyl protease, a reverse transcriptase, RNase H, and an integrase.
The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. In aspartate peptidases, Asp residues are ligands of an activated water molecule in all examples where catalytic residues have been identified. This group of aspartate peptidases is classified by MEROPS as the peptidase family A2 (retropepsin family, clan AA), subfamily A2A." 87 1 0 0 0 0 0 0 1 0 296 pfam12835 Integrase_1 Integrase. This is a family of DNA-binding prophage integrases found in Proteobacteria. 149 1 0 0 0 0 0 0 1 0 28 COG4389 COG4389 "Site-specific recombinase [Replication, recombination and repair]. " 677 0 0 1 0 0 0 0 1 0 30 COG4644 COG4644 "Transposase and inactivated derivatives, TnpA family [Mobilome: prophages, transposons]. " 323 0 0 0 1 0 0 0 1 0 366 pfam18866 CxC7 CxC7 like cysteine cluster associated with KDZ transposases. A predicted Zinc chelating domain present N-terminal to the KDZ transposase domain. 64 0 0 0 1 0 0 0 1 0 290 pfam12596 Tnp_P_element_C "87kDa Transposase. This domain family is found in eukaryotes, and is typically between 78 and 110 amino acids in length. The family is found in association with pfam05485. There are two completely conserved residues (D and G) that may be functionally important. This family is an 87kDa transposase protein which catalyzes both the precise and imprecise excision of a nonautonomous P transposable element." 107 0 0 0 1 0 0 0 1 0 0 COG0468 RecA "RecA/RadA recombinase [Replication, recombination and repair]. " 279 0 0 1 0 0 0 0 1 4 COG1479 COG1479 "Uncharacterized conserved protein, contains ParB-like and HNH nuclease domains [Function unknown]. " 409 0 0 0 0 0 0 1 1 6 COG1943 RAYT "REP element-mobilizing transposase RayT [Mobilome: prophages, transposons]. " 136 0 0 0 1 0 0 0 1 11 COG2915 HflD "Regulator of phage lambda lysogenization HflD, binds to CII and stimulates its degradation [Mobilome: prophages, transposons, Signal transduction mechanisms]. 
" 207 0 0 0 0 1 0 0 1 14 COG3293 COG3293 "Transposase [Mobilome: prophages, transposons]. " 124 0 0 0 1 0 0 0 1 16 COG3328 IS285 "Transposase (or an inactivated derivative) [Mobilome: prophages, transposons]. " 379 0 0 0 1 0 0 0 1 17 COG3331 PrfA "Penicillin-binding protein-related factor A, putative recombinase [General function prediction only]. " 177 0 0 1 0 0 0 0 1 19 COG3385 InsG "IS4 transposase [Mobilome: prophages, transposons]. " 292 0 0 0 1 0 0 0 1 21 COG3436 COG3436 "Transposase [Mobilome: prophages, transposons]. " 157 0 0 0 1 0 0 0 1 22 COG3464 COG3464 "Transposase [Mobilome: prophages, transposons]. " 402 0 0 0 1 0 0 0 1 24 COG3666 COG3666 "Transposase [Mobilome: prophages, transposons]. " 161 0 0 0 1 0 0 0 1 25 COG3676 COG3676 "Transposase and inactivated derivatives [Mobilome: prophages, transposons]. " 126 0 0 0 1 0 0 0 1 26 COG3677 InsA "Transposase [Mobilome: prophages, transposons]. " 129 0 0 0 1 0 0 0 1 33 COG5119 COG5119 "Uncharacterized protein, contains ParB-like nuclease domain [General function prediction only]. " 119 0 0 0 0 0 0 1 1 34 COG5421 COG5421 "Transposase [Mobilome: prophages, transposons]. " 480 0 0 0 1 0 0 0 1 35 COG5433 YhhI "Predicted transposase YbfD/YdcC associated with H repeats [Mobilome: prophages, transposons]. " 121 0 0 0 1 0 0 0 1 36 COG5464 YadD "Predicted transposase YdaD [Replication, recombination and repair]. " 289 0 0 0 1 0 0 0 1 38 COG5558 COG5558 "Transposase [Mobilome: prophages, transposons]. " 261 0 0 0 1 0 0 0 1 39 COG5659 COG5659 "SRSO17 transposase [Mobilome: prophages, transposons]. " 385 0 0 0 1 0 0 0 1 40 NF033179 TnsA_like_Actin "TnsA-like heteromeric transposase endonuclease subunit. The transposase of transposon Tn7 contains multiple subunit. Members of this family are largely restricted to the Actinobacteria, resemble the endonuclease subunit TsnA of the multimeric transposase of Tn7 and its relatives, and occur in genomic neighborhoods that suggest a similar role in transposition." 
212 0 0 0 1 0 0 0 1 41 PHA00730 int integrase 337 1 0 0 0 0 0 0 1 45 PHA02731 PHA02731 putative integrase; Provisional 231 1 0 0 0 0 0 0 1 46 PHA02942 PHA02942 putative transposase; Provisional 383 0 0 0 1 0 0 0 1 47 PRK00218 PRK00218 lysogenization regulator HflD. 207 0 0 0 0 1 0 0 1 52 PRK02944 PRK02944 YidC family membrane integrase SpoIIIJ. 255 1 0 0 0 0 0 0 1 56 PRK09183 PRK09183 transposase/IS protein; Provisional 259 0 0 0 1 0 0 0 1 57 PRK09354 recA recombinase A; Provisional 349 0 0 1 0 0 0 0 1 59 PRK09519 recA intein-containing recombinase RecA. 790 0 0 1 0 0 0 0 1 63 PRK09956 PRK09956 ISNCY family transposase. 308 0 0 0 1 0 0 0 1 68 TIGR00180 parB_part "ParB/RepB/Spo0J family partition protein. This model represents the most well-conserved core of a set of chromosomal and plasmid partition proteins related to ParB, including Spo0J, RepB, and SopB. Spo0J has been shown to bind a specific DNA sequence that, when introduced into a plasmid, can serve as partition site. Study of RepB, which has nicking-closing activity, suggests that it forms a transient protein-DNA covalent intermediate during the strand transfer reaction." 187 0 0 0 0 0 0 1 1 70 TIGR01606 holin_BlyA "holin, BlyA family. This family represents a BlyA, a small holin found in Borrelia circular plasmids that prove to be temperate phage. This protein was previously proposed to be an hemolysin. BlyA is small (67 residues) and contains two largely hydrophobic helices and a highly charged C-terminus. [Mobile and extrachromosomal element functions, Prophage functions]" 63 0 0 0 0 0 1 0 1 74 TIGR01765 tspaseT_teng_N "transposase, putative, N-terminal domain. This model represents the N-terminal region of a family of putative transposases found in the largest copy number in Thermoanaerobacter tengcongensis. The three homologs in Bacillus anthracis are each split into two ORFs and this model represents the upstream ORF. 
[Mobile and extrachromosomal element functions, Transposon functions]" 73 0 0 0 1 0 0 0 1 76 TIGR01784 T_den_put_tspse "conserved hypothetical protein (putative transposase or invertase). Several lines of evidence suggest that members of this family (loaded as a fragment mode model to find part-length matches) are associated with transposition, inversion, or recombination. Members are found in small numbers of genomes, but in large copy numbers in many of those species, including over 30 full length and fragmentary members in Treponema denticola. The strongest similarities are usually within rather than between species. PSI-BLAST shows similarity to proteins designated as possible transposases, DNA invertases (resolvases), and recombinases. In the oral pathogenic spirochete Treponema denticola, full-length members are often found near transporters or other membrane proteins. This family includes members of the putative transposase family pfam04754." 270 0 0 1 1 0 0 0 2 81 TIGR02238 recomb_DMC1 "meiotic recombinase Dmc1. This model describes DMC1, a subfamily of a larger family of DNA repair and recombination proteins. It is eukaryotic only and most closely related to eukaryotic RAD51. It also resembles archaeal RadA (TIGR02236) and RadB (TIGR02237) and bacterial RecA (TIGR02012). It has been characterized for human as a recombinase active only in meiosis." 313 0 0 1 0 0 0 0 1 82 TIGR02239 recomb_RAD51 "DNA repair protein RAD51. This eukaryotic sequence family consists of RAD51, a protein involved in DNA homologous recombination and repair. It is similar in sequence the exclusively meiotic recombinase DMC1 (TIGR02238), to archaeal families RadA (TIGR02236) and RadB (TIGR02237), and to bacterial RecA (TIGR02012)." 316 0 0 1 0 0 0 0 1 86 TIGR03734 PRTRC_parB "PRTRC system ParB family protein. A novel genetic system characterized by six major proteins, included a ParB homolog and a ThiF homolog, is designated PRTRC, or ParB-Related,ThiF-Related Cassette. 
It is often found on plasmids. This protein family is the member related to ParB, and is designated PRTRC system ParB family protein." 554 0 0 0 0 0 0 1 1 87 TIGR03735 PRTRC_A "PRTRC system protein A. A novel genetic system characterized by six major proteins, included a ParB homolog and a ThiF homolog, is designated PRTRC, or ParB-Related,ThiF-Related Cassette. It is often found on plasmids. This protein family is designated protein A." 192 0 0 0 0 0 0 1 1 88 TIGR03736 PRTRC_ThiF "PRTRC system ThiF family protein. A novel genetic system characterized by six major proteins, included a ParB homolog and a ThiF homolog, is designated PRTRC, or ParB-Related,ThiF-Related Cassette. This family is the PRTRC system ThiF family protein." 244 0 0 0 0 0 0 1 1 89 TIGR03737 PRTRC_B "PRTRC system protein B. A novel genetic system characterized by six major proteins, included a ParB homolog and a ThiF homolog, is designated PRTRC, or ParB-Related,ThiF-Related Cassette. This protein family is designated protein B." 228 0 0 0 0 0 0 1 1 90 TIGR03738 PRTRC_C "PRTRC system protein C. A novel genetic system characterized by six major proteins, included a ParB homolog and a ThiF homolog, is designated PRTRC, or ParB-Related,ThiF-Related Cassette. It is often found on plasmids. This protein family is designated PRTRC system protein C." 66 0 0 0 0 0 0 1 1 91 TIGR03739 PRTRC_D "PRTRC system protein D. A novel genetic system characterized by six major proteins, included a ParB homolog and a ThiF homolog, is designated PRTRC, or ParB-Related,ThiF-Related Cassette. It is often found on plasmids. This protein family is designated PRTRC system protein D. The gray zone, between trusted and noise, includes proteins found in the same genomes as other proteins of the PRTRC systems, but not in the same contiguous gene region." 320 0 0 0 0 0 0 1 1 92 TIGR03741 PRTRC_E "PRTRC system protein E.
A novel genetic system characterized by six or seven major proteins, included a ParB homolog and a ThiF homolog, is designated PRTRC, or ParB-Related,ThiF-Related Cassette. It is often found on plasmids. This protein family averages about 150 amino acids in length, but the last third contains low-complexity sequence that complicates sequence comparisons. This model does not include the low-complexity region." 104 0 0 0 0 0 0 1 1 96 TIGR04102 SWIM_PBPRA1643 "SWIM/SEC-C metal-binding motif protein, PBPRA1643 family. Members of this protein family have a SWIM, or SEC-C, domain (see pfam02810), a 21-amino acid putative Zn-binding domain that is shared with SecA, plant MuDR transposases, etc. This small protein family of unknown function occurs primarily in marine bacteria." 108 0 0 0 1 0 0 0 1 97 TIGR04141 TIGR04141 "sporadically distributed protein, TIGR04141 family. This model describes a sporadically distributed conserved hypothetical protein in which complete members average over 500 amino acids in length, although matching sequences frequently are truncated or broken into tandem ORFs. Regular co-clustering with known markers of mobility (integrases, transposases, phage proteins, restriction enzymes, etc.) suggests this family also is part of the mobilome. The function is unknown." 516 1 0 0 1 0 0 0 2 98 TIGR04285 nucleoid_noc "nucleoid occlusion protein. This model describes nucleoid occlusion protein, a close homolog to ParB chromosome partitioning proteins including Spo0J in Bacillus subtilis. Its gene often is located near the gene for the Spo0J ortholog. This protein binds a specific DNA sequence and blocks cytokinesis from happening until chromosome segregation is complete." 255 0 0 0 0 0 0 1 1 100 cd00217 INT_Flp_C "Flp Tyrosine-based site-specific recombinases (also called integrases), C-terminal catalytic domain.
Yeast Flp-like recombinases mediate the amplification of the 2 micron circular plasmid copy number by catalyzing the intra-molecular recombination between two inverted repeats during replication. They belong to the DNA breaking-rejoining enzyme superfamily, which also includes prokaryotic tyrosine recombinases and type IB topoisomerases. These enzymes share the same fold in their catalytic domain containing six conserved active site residues and the overall reaction mechanism. Flp-like recombinases are almost exclusively found in yeast and are highly diverged in sequence from the prokaryotic tyrosine recombinases. They cleave their target DNA in trans with a composite active site in which the catalytic tyrosine is provided by a promoter bound to a site other than the one being cleaved. Thus each active site within Flp complexes is assembled by domain swapping and contains catalytic residues from two different monomers. Two DNA segments are synapsed by the tetrameric enzyme, carrying the nucleophilic tyrosine in each active site with only two of the four monomers active at a given time. The catalytic domain is linked through a flexible loop to the N-terminal domain, which is largely responsible for non-specific DNA binding and isomerization. Its overall fold is similar to the SAM domain fold also found in the N-terminal domains of lambda integrase and XerD recombinase." 410 1 0 1 0 0 0 0 2 106 cd00619 Terminator_NusB "Transcription termination factor NusB (N protein-Utilization Substance B). NusB plays a key role in the regulation of ribosomal RNA biosynthesis in eubacteria by modulating the efficiency of transcriptional antitermination. NusB along with other Nus factors (NusA, NusE/S10 and NusG) forms the core complex with the boxA element of the nut site of the rRNA operons. These interactions help RNA polymerase to counteract polarity during transcription of rRNA operons and allow stable antitermination. 
The transcription antitermination system can be appropriated by some bacteriophages such as lambda, which use the system to switch between the lysogenic and lytic modes of phage propagation." 130 0 0 0 0 1 0 0 1 107 cd00659 Topo_IB_C "DNA topoisomerase IB, C-terminal catalytic domain. Topoisomerase I promotes the relaxation of both positive and negative DNA superhelical tension by introducing a transient single-stranded break in duplex DNA. This function is vital for the processes of replication, transcription, and recombination. Unlike Topo IA enzymes, Topo IB enzymes do not require a single-stranded region of DNA or metal ions for their function. The type IB family of DNA topoisomerases includes eukaryotic nuclear topoisomerase I, topoisomerases of poxviruses, and bacterial versions of Topo IB. They belong to the superfamily of DNA breaking-rejoining enzymes, which share the same fold in their C-terminal catalytic domain and the overall reaction mechanism with tyrosine recombinases. The C-terminal catalytic domain in topoisomerases is linked to a divergent N-terminal domain that shows no sequence or structure similarity to the N-terminal domains of tyrosine recombinases." 210 0 0 1 0 0 0 0 1 114 cd01026 TOPRIM_OLD "TOPRIM_OLD: topoisomerase-primase (TOPRIM) nucleotidyl transferase/hydrolase domain of the type found in bacterial and archaeal nucleases of the OLD (overcome lysogenization defect) family. The bacteriophage P2 OLD protein, which has DNase as well as RNase activity, consists of an N-terminal ABC-type ATPase domain and a C-terminal Toprim domain; the nuclease activity of OLD is stimulated by ATP, though the ATPase activity is not DNA-dependent. Functional details on OLD are scant and further experimentation is required to define the relationship between the ATPase and Toprim nuclease domains. The TOPRIM domain has two conserved motifs, one of which centers at a conserved glutamate and the other one at two conserved aspartates (DxD). 
The conserved glutamate may act as a general acid in strand cleavage by nucleases. The DXD motif may co-ordinate Mg2+, a cofactor required for full catalytic function." 97 0 0 0 0 1 0 0 1 117 cd01176 IPT_RBP-Jkappa "IPT domain of the recombination signal Jkappa binding protein (RBP-Jkappa). RBP-J kappa, was initially considered to be involved in V(D)J recombination because of its DNA binding specificity and structural similarity to site-specific recombinases known as the integrase family. Further studies indicated that RBP-J kappa functions as a repressor of transcription, via destabilization of the general transcription factor IID and recruitment of histone deacetylase complexes." 97 1 0 1 0 0 0 0 2 133 cd01436 Dipth_tox_like "Mono-ADP-ribosylating toxins catalyze the transfer of ADP_ribose from NAD+ to eukaryotic Elongation Factor 2, halting protein synthesis. A single molecule of delivered toxin is sufficient to kill a cell. These toxins share mono-ADP-ribosylating activity with a variety of bacterial toxins, such as cholera toxin and pertussis toxin. The structural core is homologous to the poly-ADP ribosylating enzymes such as the PARP enzymes and Tankyrase. Diphtheria toxin is encoded by a lysogenic bacteriophage. Both diphtheria toxin and Pseudomonas aeruginosa exotoxin A are multi-domain proteins. These domains provide a EF2 ADP_ribosylating, receptor-binding, and intracellular trafficking/transmembrane functions ." 147 0 0 0 0 1 0 0 1 135 cd02106 SPFH_like "core domain of the SPFH (stomatin, prohibitin, flotillin, and HflK/C) superfamily. This model summarizes proteins similar to stomatin, prohibitin, flotillin, HflK/C (SPFH) and podocin. The conserved domain common to the SPFH superfamily has also been referred to as the Band 7 domain. Many superfamily members are associated with lipid rafts. Individual proteins of the SPFH superfamily may cluster to form membrane microdomains which may in turn recruit multiprotein complexes. 
Microdomains formed from flotillin proteins may in addition be dynamic units with their own regulatory functions. Flotillins have been implicated in signal transduction, vesicle trafficking, cytoskeleton rearrangement and are known to interact with a variety of proteins. Stomatin interacts with and regulates members of the degenerin/epithelia Na+ channel family in mechanosensory cells of Caenorhabditis elegans and vertebrate neurons, and participates in trafficking of Glut1 glucose transporters. Prohibitin may act as a chaperone for the stabilization of mitochondrial proteins. Prokaryotic HflK/C plays a role in the decision between lysogenic and lytic cycle growth during lambda phage infection. Flotillins have been implicated in the progression of prion disease, in the pathogenesis of neurodegenerative diseases such as Parkinson's and Alzheimer's disease, and in cancer invasion and metastasis. Mutations in the podocin gene give rise to autosomal recessive steroid resistant nephritic syndrome." 110 0 0 0 0 1 0 0 1 136 cd03402 SPFH_like_u2 "Uncharacterized family; SPFH (stomatin, prohibitin, flotillin, and HflK/C) superfamily. This model summarizes an uncharacterized family of proteins similar to stomatin, prohibitin, flotillin, HflK/C (SPFH) and podocin. The conserved domain common to the SPFH superfamily has also been referred to as the Band 7 domain. Many superfamily members are associated with lipid rafts. Individual proteins of the SPFH superfamily may cluster to form membrane microdomains which may in turn recruit multiprotein complexes. Microdomains formed from flotillin proteins may in addition be dynamic units with their own regulatory functions. Flotillins have been implicated in signal transduction, vesicle trafficking, cytoskeleton rearrangement and are known to interact with a variety of proteins. 
Stomatin interacts with and regulates members of the degenerin/epithelia Na+ channel family in mechanosensory cells of Caenorhabditis elegans and vertebrate neurons and participates in trafficking of Glut1 glucose transporters. Prohibitin may act as a chaperone for the stabilization of mitochondrial proteins. Prokaryotic HflK/C plays a role in the decision between lysogenic and lytic cycle growth during lambda phage infection. Flotillins have been implicated in the progression of prion disease, in the pathogenesis of neurodegenerative diseases such as Parkinson's and Alzheimer's disease, and in cancer invasion and metastasis. Mutations in the podocin gene give rise to autosomal recessive steroid resistant nephritic syndrome." 231 0 0 0 0 1 0 0 1 137 cd03404 SPFH_HflK "High frequency of lysogenization K (HflK) family; SPFH (stomatin, prohibitin, flotillin, and HflK/C) superfamily. This model characterizes proteins similar to prokaryotic HflK (High frequency of lysogenization K). Although many members of the SPFH (or band 7) superfamily are lipid raft associated, prokaryote plasma membranes lack cholesterol and are unlikely to have lipid raft domains. Individual proteins of this SPFH domain superfamily may cluster to form membrane microdomains which may in turn recruit multiprotein complexes. Escherichia coli HflK is an integral membrane protein which may localize to the plasma membrane. HflK associates with another SPFH superfamily member (HflC) to form an HflKC complex. HflKC interacts with FtsH in a large complex termed the FtsH holo-enzyme. FtsH is an AAA ATP-dependent protease which exerts progressive proteolysis against membrane-embedded and soluble substrate proteins. HflKC can modulate the activity of FtsH. HflKC plays a role in the decision between lysogenic and lytic cycle growth during lambda phage infection." 
266 0 0 0 0 1 0 0 1 138 cd03405 SPFH_HflC "High frequency of lysogenization C (HflC) family; SPFH (stomatin, prohibitin, flotillin, and HflK/C) superfamily. This model characterizes proteins similar to prokaryotic HflC (High frequency of lysogenization C). Although many members of the SPFH (or band 7) superfamily are lipid raft associated, prokaryote plasma membranes lack cholesterol and are unlikely to have lipid raft domains. Individual proteins of this SPFH domain superfamily may cluster to form membrane microdomains which may in turn recruit multiprotein complexes. Escherichia coli HflC is an integral membrane protein which may localize to the plasma membrane. HflC associates with another SPFH superfamily member (HflK) to form an HflKC complex. HflKC interacts with FtsH in a large complex termed the FtsH holo-enzyme. FtsH is an AAA ATP-dependent protease which exerts progressive proteolysis against membrane-embedded and soluble substrate proteins. HflKC can modulate the activity of FtsH. HflKC plays a role in the decision between lysogenic and lytic cycle growth during lambda phage infection." 249 0 0 0 0 1 0 0 1 139 cd03406 SPFH_like_u3 "Uncharacterized family; SPFH (stomatin, prohibitin, flotillin, and HflK/C) superfamily. This model summarizes an uncharacterized family of proteins similar to stomatin, prohibitin, flotillin, HflK/C (SPFH) and podocin. The conserved domain common to the SPFH superfamily has also been referred to as the Band 7 domain. Many superfamily members are associated with lipid rafts. Individual proteins of the SPFH superfamily may cluster to form membrane microdomains which may in turn recruit multiprotein complexes. Microdomains formed from flotillin proteins may in addition be dynamic units with their own regulatory functions. Flotillins have been implicated in signal transduction, vesicle trafficking, cytoskeleton rearrangement and are known to interact with a variety of proteins. 
Stomatin interacts with and regulates members of the degenerin/epithelia Na+ channel family in mechanosensory cells of Caenorhabditis elegans and vertebrate neurons and participates in trafficking of Glut1 glucose transporters. Prohibitin may act as a chaperone for the stabilization of mitochondrial proteins. Prokaryotic HflK/C plays a role in the decision between lysogenic and lytic cycle growth during lambda phage infection. Flotillins have been implicated in the progression of prion disease, in the pathogenesis of neurodegenerative diseases such as Parkinson's and Alzheimer's disease and, in cancer invasion and metastasis. Mutations in the podocin gene give rise to autosomal recessive steroid resistant nephritic syndrome." 293 0 0 0 0 1 0 0 1 140 cd03407 SPFH_like_u4 "Uncharacterized family; SPFH (stomatin, prohibitin, flotillin, and HflK/C) superfamily. This model summarizes an uncharacterized family of proteins similar to stomatin, prohibitin, flotillin, HflK/C (SPFH) and podocin. The conserved domain common to the SPFH superfamily has also been referred to as the Band 7 domain. Many superfamily members are associated with lipid rafts. Individual proteins of the SPFH superfamily may cluster to form membrane microdomains which may in turn recruit multiprotein complexes. Microdomains formed from flotillin proteins may in addition be dynamic units with their own regulatory functions. Flotillins have been implicated in signal transduction, vesicle trafficking, cytoskeleton rearrangement and are known to interact with a variety of proteins. Stomatin interacts with and regulates members of the degenerin/epithelia Na+ channel family in mechanosensory cells of Caenorhabditis elegans and vertebrate neurons and participates in trafficking of Glut1 glucose transporters. Prohibitin may act as a chaperone for the stabilization of mitochondrial proteins. Prokaryotic HflK/C plays a role in the decision between lysogenic and lytic cycle growth during lambda phage infection. 
Flotillins have been implicated in the progression of prion disease, in the pathogenesis of neurodegenerative diseases such as Parkinson's and Alzheimer's disease and, in cancer invasion and metastasis. Mutations in the podocin gene give rise to autosomal recessive steroid resistant nephritic syndrome." 269 0 0 0 0 1 0 0 1 147 cd04610 CBS_pair_ParBc_assoc "Two tandem repeats of the cystathionine beta-synthase (CBS pair) domains associated with a ParBc (ParB-like nuclease) domain. This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains associated with a ParBc (ParB-like nuclease) domain downstream. The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic generalized epilepsy, hypercalciuric nephrolithiasis, and classic Bartter syndrome (CLC chloride channel family members), Wolff-Parkinson-White syndrome (gamma 2 subunit of AMP-activated protein kinase), retinitis pigmentosa (IMP dehydrogenase-1), and homocystinuria (cystathionine beta-synthase)." 
108 0 0 0 0 0 0 1 1 149 cd05470 pepsin_retropepsin_like "Cellular and retroviral pepsin-like aspartate proteases. This family includes both cellular and retroviral pepsin-like aspartate proteases. The cellular pepsin and pepsin-like enzymes are twice as long as their retroviral counterparts. The cellular pepsin-like aspartic proteases are found in mammals, plants, fungi and bacteria. These well known and extensively characterized enzymes include pepsins, chymosin, rennin, cathepsins, and fungal aspartic proteases. Several have long been known to be medically (rennin, cathepsin D and E, pepsin) or commercially (chymosin) important. The eukaryotic pepsin-like proteases contain two domains possessing similar topological features. The N- and C-terminal domains, although structurally related by a 2-fold axis, have only limited sequence homology except in the vicinity of the active site. This suggests that the enzymes evolved by an ancient duplication event. The eukaryotic pepsin-like proteases have two active site ASP residues with each N- and C-terminal lobe contributing one residue. While the fungal and mammalian pepsins are bilobal proteins, retropepsins function as dimers and the monomer resembles structure of the N- or C-terminal domains of eukaryotic enzyme. The active site motif (Asp-Thr/Ser-Gly-Ser) is conserved between the retroviral and eukaryotic proteases and between the N-and C-terminal of eukaryotic pepsin-like proteases. The retropepsin-like family includes pepsin-like aspartate proteases from retroviruses, retrotransposons and retroelements; as well as eukaryotic DNA-damage-inducible proteins (DDIs), and bacterial aspartate peptidases. Retropepsin is synthesized as part of the POL polyprotein that contains an aspartyl-protease, a reverse transcriptase, RNase H, and an integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. 
This family of aspartate proteases is classified by MEROPS as the peptidase family A1 (pepsin A) and A2 (retropepsin family)." 109 1 0 0 0 0 0 0 1 152 cd05484 retropepsin_like_LTR_2 "Retropepsins_like_LTR, pepsin-like aspartate proteases. Retropepsin of retrotransposons with long terminal repeats are pepsin-like aspartate proteases. While fungal and mammalian pepsins are bilobal proteins with structurally related N- and C-termini, retropepsins are half as long as their fungal and mammalian counterparts. The monomers are structurally related to one lobe of the pepsin molecule and retropepsins function as homodimers. The active site aspartate occurs within a motif (Asp-Thr/Ser-Gly), as it does in pepsin. Retroviral aspartyl protease is synthesized as part of the POL polyprotein that contains an aspartyl protease, a reverse transcriptase, RNase H, and an integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. In aspartate peptidases, Asp residues are ligands of an activated water molecule in all examples where catalytic residues have been identified. This group of aspartate peptidases is classified by MEROPS as the peptidase family A2 (retropepsin family, clan AA), subfamily A2A." 91 1 0 0 0 0 0 0 1 155 cd10544 SET_SETMAR SET domain (including pre-SET and post-SET domains) found in SET domain and mariner transposase fusion protein (SETMAR) and similar proteins. SETMAR (also termed metnase) is a DNA-binding protein that is indirectly recruited to sites of DNA damage through protein-protein interactions. It has a sequence-specific DNA-binding activity recognizing the 19-mer core of the 5'-terminal inverted repeats (TIRs) of the Hsmar1 element and displays a DNA nicking and end joining activity. SETMAR also acts as a histone-lysine N-methyltransferase that methylates 'Lys-4' and 'Lys-36' of histone H3. 
It specifically mediates dimethylation of H3 'Lys-36' at sites of DNA double-strand break and may recruit proteins required for efficient DSB repair through non-homologous end-joining. 254 0 0 0 1 0 0 0 1 156 cd11602 Ndc10 "Ndc10 component of the yeast centromere-binding factor 3. Ndc10 is a multidomain protein conserved in Saccharomycotina that interacts with kinetochore components. This model characterizes the majority of the protein; some family members may have an additional C-terminal domain that is homologous to transcriptional activators (GCR1_C). Ndc10 is part of the centromere-binding factor 3 (CBF3) complex in budding yeast. The CBF3 complex contains four essential proteins, Ndc10, Cep3, Ctf13, and Skp1. CBF3/Ndc10 is essential for the recruitment of the centromeric nucleosome and formation of the kinetochore. The Kinetochore is the large, multiprotein assembly that serves to connect condensed sister chromatids to the mitotic spindle. Ndc10 forms a dimer and it has non-sequence-specific DNA binding activity via the DNA backbone. Ndc10 also plays an important role in the coordination of cell division. It has been noted that the protein bears resemblance to the tyrosine recombinases (type IB topoisomerase/lambda-integrase)." 413 1 0 1 0 0 0 0 2 157 cd15569 PHD_RAG2 "PHD finger found in V(D)J recombination-activating protein 2 (RAG-2) and similar proteins. RAG-2 is an essential component of the lymphoid-specific recombination activating gene RAG1/2 V(D)J recombinase mediating antigen-receptor gene assembly. It contains an acidic hinge region implicated in histone-binding, a non-canonical plant homeodomain (PHD) finger followed by a C-terminal extension of 40 amino acids that is essential for phosphoinositide (PtdIns)-binding. The PHD finger is a chromatin-binding module that specifically recognizes histone H3 trimethylated at lysine 4 (H3K4me3) and influences V(D)J recombination." 
67 0 0 1 0 0 0 0 1 158 cd16382 XisI-like "XisI is FdxN element excision controlling factor protein. This family contains XisI proteins, also known as FdxN element excision controlling factors, and similar proteins. FdxN element is excised from the chromosome during heterocyst differentiation in cyanobacteria. This is accomplished by the large serine recombinase XisF (fdxN element site-specific recombinase). The xisH as well as the xisF and xisI genes are required. XisI may function as recombination directionality factor (RDF), and needs XisH which may function as an endonuclease." 107 0 0 1 0 0 0 0 1 159 cd16387 ParB_N_Srx "ParB N-terminal domain and sulfiredoxin protein-related families. The ParB N-terminal domain/Sulfiredoxin (Srx) superfamily contains proteins with diverse activities. Many of the families are involved in segregation and competition between plasmids and chromosomes. Several families share similar activities with the N-terminal domain of ParB (Spo0J in Bacillus subtilis), a DNA-binding component of the prokaryotic parABS partitioning system. Also within this superfamily is sulfiredoxin (Srx; reactivator of oxidatively inactivated 2-cys peroxiredoxins), RepB N-terminal domain (plasmid segregation replication protein B like protein), nucleoid occlusion protein, KorB N-terminal domain partition protein of low copy number plasmid RK2, irbB (immunoglobulin-binding regulator that activates eib genes), N-terminal domain of sopB protein (promotes proper partitioning of F1 plasmid), fertility inhibition factors OSA and FiwA,DNA sulfur modification protein DndB, and a ParB-like toxin domain. Other activities includes a StrR (regulator in the streptomycin biosynthetic gene cluster), and a family containing a Pyrococcus furiosus nuclease and putative transcriptional regulators sbnI (Staphylococcus aureus siderophore biosynthetic gene cluster ). Nuclease activity has also been reported in Arabidopsis Srx." 
54 0 0 0 0 0 0 1 1 160 cd16388 SbnI_like_N "N-terminal domain of transcriptional regulators similar to SbnI. Siderophore staphylobactin biosynthesis protein SbnI of Staphylococcus aureus is a ParB/Spo0J like protein required for the expression of genes in the sbn operon, which is responsible for staphyloferrin B (SB) biosynthesis. SbnI forms dimers and binds DNA upstream of sbnD. SbnI binds heme, which inhibits DNA binding of SbnI, leading to a suppression of sbn operon expression." 77 0 0 0 0 0 0 1 1 163 cd16392 toxin-ParB "toxin domain of the ParB/Srx superfamily. toxin domain with similarity to the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system and related proteins. Toxin found, for example, at the C-terminus of polymorphic toxin system members." 72 0 0 0 0 0 0 1 1 164 cd16393 SPO0J_N "Thermus thermophilus stage 0 sporulation protein J-like N-terminal domain, ParB family member. Spo0J (stage 0 sporulation protein J) is a ParB family member, a critical component of the ParABS-type bacterial chromosome segregation system. The Spo0J N-terminal region acts in protein-protein interaction and is adjacent to the DNA-binding domain that binds to parS sites. Two Spo0J bind per parS site, and Spo0J interacts with neighbors via the N-terminal domain to form oligomers via an Arginine-rich patch (RRXR). This superfamily represents the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. 
It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 97 0 0 0 0 0 0 1 1 167 cd16396 Noc_N "nucleoid occlusion protein, N-terminal domain, and related domains of the ParB partitioning protein family. Nucleoid occlusion protein has been shown in Bacillus subtilis to bind to specific DNA sequences on the chromosome (Noc-binding DNA sequences, NBS), inhibiting cell division near the nucleoid and thereby protecting the chromosome. This N-terminal domain is related to the N-terminal domain of ParB/repB partitioning system proteins." 95 0 0 0 0 0 0 1 1 168 cd16397 IbrB_like "immunoglobulin-binding regulator IbrB activates eib genes. IbrB (along with IbrA) activates immunoglobulin-binding eib genes in Escherichia coli. IbrB is related to the ParB N-terminal domain family, which includes DNA-binding proteins involved in chromosomal/plasmid segregation and transcriptional regulation, consistent with a possible mechanism for IbrB in DNA binding-related regulation of eib expression. 
The ParB like family is a diverse domain superfamily with structural and sequence similarity to ParB of bacterial chromosomes/plasmid parABS partitioning system and Sulfiredoxin (Srx), which is a reactivator of oxidatively inactivated 2-cys peroxiredoxins. Other families include proteins related to StrR, a putative regulator in the biosynthetic gene cluster and a family containing a Pyrococcus furiosus nuclease and putative transcriptional regulators SbnI (Staphylococcus aureus siderophore biosynthetic gene cluster) and EdeB (Brevibacillus brevis antimicrobial peptide edeine biosynthetic cluster). Nuclease activity has also been reported in Arabidopsis Srx." 100 0 0 0 0 0 0 1 1 170 cd16400 ParB_Srx_like_nuclease "ParB/Srx_like nuclease and putative transcriptional regulators related to SbnI. This family contains a Pyrococcus furiosus enzyme reported to have DNA nuclease activity and resembles the N-terminal domain of ParB proteins of the parABS bacterial chromosome partitioning system. This sub-family also includes siderophore staphylobactin biosynthesis protein SbnI. 60% of the P. furiosus nuclease activity was retained at 90 degrees C, suggesting a physiological role in the organism, which can grow in temperatures as high as 100 degrees Celsius. The protein has endo- and exo-nuclease activity vs. single- and double-stranded DNA, and nuclease activity was lost in methylated proteins prepared for structure solution. This family has a fairly well-conserved DGHHR motif that corresponds to the same structural position as the phosphorylation site (a portion of the ATP-Mg-binding site) of sulfiredoxin and the arginine-rich ParB BoxII of ParB." 72 0 0 0 0 0 0 1 1 171 cd16401 ParB_N_like_MT "ParB N-terminal-like domain, some attached to C-terminal S-adenosylmethionine-dependent methyltransferase domain. 
This family represents domains related to the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system, fused to a variety of C-terminal domains, including S-adenosylmethionine-dependent methyltransferase-like domains. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 85 0 0 0 0 0 0 1 1 172 cd16402 ParB_N_like_MT "ParB N-terminal-like domain, some attached to C-terminal S-adenosylmethionine-dependent methyltransferase domain. 
This family represents domains related to the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system, fused to a variety of C-terminal domains, including S-adenosylmethionine-dependent methyltransferase-like domains and DUF4417. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 87 0 0 0 0 0 0 1 1 173 cd16403 ParB_N_like_MT "ParB N-terminal-like domain, some attached to C-terminal S-adenosylmethionine-dependent methyltransferase. 
This family represents domains related to the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system, fused to a variety of C-terminal domains, including S-adenosylmethionine-dependent methyltransferase-like domains. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 88 0 0 0 0 0 0 1 1 174 cd16404 pNOB8_ParB_N_like "pNOB8 ParB-like N-terminal domain, plasmid partitioning system protein domain. 
archaeal pNOB8 ParB acts in a plasmid partitioning system made up of 3 parts: AspA, ParA motor protein, and ParB, which links ParA to the protein-DNA superhelix. As demonstrated in Sulfolobus, AspA spreads along DNA, which allows ParB binding, and links to the Walker-motif containing ParA motor protein. The Sulfolobus ParB C-terminal domain resembles eukaryotic segregation protein CenpA, and other histones. This family is related to the N-terminal domain of ParB (Spo0J in Bacillus subtilis), a DNA-binding component of the prokaryotic parABS partitioning system and related proteins." 69 0 0 0 0 0 0 1 1 176 cd16406 ParB_N_like "ParB N-terminal, parA-binding, -like domain of bacterial and plasmid parABS partitioning systems. This family represents the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. 
Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 82 0 0 0 0 0 0 1 1 178 cd16408 ParB_N_like "ParB N-terminal, parA -binding, -like domain of bacterial and plasmid parABS partitioning systems. This family represents the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. 
Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 84 0 0 0 0 0 0 1 1 179 cd16409 ParB_N_like "ParB N-terminal-like domain of bacterial and plasmid parABS partitioning systems. This family represents the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 
74 0 0 0 0 0 0 1 1 180 cd16410 ParB_N_like "ParB N-terminal, parA-binding, -like domain of bacterial and plasmid parABS partitioning systems. This family represents the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system. parABS contributes to the efficient segregation of chromosomes and low-copy number plasmids to daughter cells during prokaryotic cell division. The process includes the parA (Walker box) ATPase, the ParB DNA-binding protein and a parS cis-acting DNA sites. Binding of ParB to centromere-like parS sites is followed by non-specific binding to DNA (""spreading"", which has been implicated in gene silencing in plasmid P1) and oligomerization of additional ParB molecules near the parS sites. It has been proposed that ParB-ParB cross-linking compacts the DNA, binds to parA via the N-terminal region, and leads to parA separating the ParB-parS complexes and the recruitment of the SMC (structural maintenance of chromosomes) complexes. The ParB N-terminal domain of Bacillus subtilis and other species contains a Arginine-rich ParB Box II with residues essential for bridging of the ParB-parS complexes. The arginine-rich ParB Box II consensus (I[VIL]AGERR[FYW]RA[AS] identified in several species is partially conserved with this family and related families. Mutations within the basic columns particularly debilitate spreading from the parS sites and impair SMC recruitment. The C-terminal domain contains a HTH DNA-binding motif and is the primary homo-dimerization domain, and binds to parS DNA sites. Additional homo-dimerization contacts are found along the N-terminal domain, but dimerization of the N-terminus may only occur after concentration at ParB-parS foci." 80 0 0 0 0 0 0 1 1 183 cd16413 DGQHR_domain "DGQHR motif containing domain. Uncharacterized diverse domain family with conserved DGQHR motif, in addition to QR and FXXXN motifs. Some proteins have been identified as parts of DNA phosphorothioation systems. 
Related to dndB, which acts in the regulation of DNA modifications, including DNA phosphorothioation. These proteins show similarity to the N-terminal domain of ParB, a DNA-binding component of the prokaryotic parABS partitioning system, and other members of the ParB/Srx superfamily." 229 0 0 0 0 0 0 1 1 186 cd18202 BTB_POZ_ZBTB11 "BTB (Broad-Complex, Tramtrack and Bric a brac)/POZ (poxvirus and zinc finger) domain found in zinc finger and BTB domain-containing protein 11 (ZBTB11). ZBTB11 is a transcriptional repressor of TP53. It is critical for basal and emergency granulopoiesis. It regulates neutrophil development through its integrase-like zinc finger domain. ZBTB11 contains a BTB/POZ domain, a common protein-protein interaction motif of about 100 amino acids." 118 1 0 0 0 0 0 0 1 187 cd18671 PIN_PRORP-Zc3h12a-like "PIN domain of protein-only RNase P (PRORP), ribonuclease Zc3h12a, and related proteins. PRORPs catalyze the maturation of the 5' end of precursor tRNAs in eukaryotes. This family includes human PRORP, also known as proteinaceous RNase P and mitochondrial RNase P protein subunit 3 (MRPP3), and Arabidopsis thaliana PRORP1-3, PRORP1 localizes to the chloroplast and the mitochondria, and PRORP2 and PRORP3 localize to the nucleus. Zc3h12a (zinc finger CCCH-type containing 12A, also known as MCPIP1/MCP induced protein 1 and Regnase-1) is a critical regulator of inflammatory response, with additional roles in defense against viruses and various stresses, cellular differentiation, and apoptosis. This PIN_PRORP-Zc3h12a-like family also includes Caenorhabditis elegans REGE-1 (REGnasE-1), which also functions as a cytoplasmic endonuclease. Additionally, it includes three less-studied mammalian homologs: Zc3h12b-d/Regnase-2-4, as well as N4BP1 (NEDD4-binding partner-1), NYNRIN (NYN domain and retroviral integrase containing, also known as CGIN1/Cousin of GIN1), and KHNYN (KH and NYN domain containing) protein. 
N4BP1, CGIN1, and KHNYN proteins are probably of retroviral origin. The PIN (PilT N terminus) domain belongs to a large nuclease superfamily. The structural properties of the PIN domain indicate its active center, consisting of three highly conserved catalytic residues which coordinate metal ions; in some members, additional metal coordinating residues can be found while some others lack several of these key catalytic residues. The PIN active site is geometrically similar in the active center of structure-specific 5' nucleases, PIN-domain ribonucleases of eukaryotic rRNA editing proteins, and bacterial toxins of toxin-antitoxin (TA) operons." 126 1 0 0 0 0 0 0 1 188 cd18719 PIN_Zc3h12a-N4BP1-like "PRORP-like PIN domain of ribonuclease Zc3h12a, NEDD4-binding partner-1, and related proteins. Zc3h12a (zinc finger CCCH-type containing 12A, also known as MCPIP1/MCP induced protein 1 and Regnase-1) is a critical regulator of inflammatory response, with additional roles in defense against viruses and various stresses, cellular differentiation, and apoptosis. This subfamily also includes Caenorhabditis elegans REGE-1 (REGnasE-1), which also functions as a cytoplasmic endonuclease. Additionally, it includes three less-studied mammalian homologs: Zc3h12b-d/Regnase-2-4, as well as N4BP1 (NEDD4-binding partner-1), NYNRIN (NYN domain and retroviral integrase containing, also known as CGIN1/Cousin of GIN1), and KHNYN (KH and NYN domain containing) protein. N4BP1, CGIN1, and KHNYN proteins are probably of retroviral origin. This subfamily belongs to the PRORP-Zc3h12a-like PIN family which in addition includes human PRORP, also known as proteinaceous RNase P and mitochondrial RNase P protein subunit 3 (MRPP3), and Arabidopsis thaliana PRORP1-3. The PIN (PilT N terminus) domain belongs to a large nuclease superfamily. 
The structural properties of the PIN domain indicate its active center, consisting of three highly conserved catalytic residues which coordinate metal ions; in some members, additional metal coordinating residues can be found while some others lack several of these key catalytic residues. The PIN active site is geometrically similar in the active center of structure-specific 5' nucleases, PIN-domain ribonucleases of eukaryotic rRNA editing proteins, and bacterial toxins of toxin-antitoxin (TA) operons." 127 1 0 0 0 0 0 0 1 192 cd18972 CD_POL_like "chromodomain of a Moniliophthora perniciosa FA553 putative retrotransposon polyprotein, and similar proteins. This subgroup includes the CHROMO (CHRromatin Organization Modifier) domain found in a Moniliophthora perniciosa FA553 putative retrotelement polyprotein, which includes domains in the following order: a reverse transcriptase, RNase H, and an integrase, here the chromodomain is found at the C-terminus of the integrase domain. The chromodomain, is a conserved region of about 50 amino acids, found in a variety of chromosomal proteins, and implicated in the binding, of the proteins in which it is found, to methylated histone tails and maybe RNA. A chromodomain may occur as a single instance, in a tandem arrangement, or followed by a related ""chromo shadow"" domain" 50 1 0 0 0 0 0 0 1 193 cd18973 CD_Tf2-1_POL_like "chromodomain of Rhizoctonia solani AG-1 IB retrotransposable element Tf2 155 kDa protein type 1, and similar proteins. This subgroup includes the CHROMO (CHRromatin Organization Modifier) domain found in Rhizoctonia solani AG-1 IB retrotransposable element Tf2 155 kDa protein type 1 (Tf2-1), and similar proteins. It belongs to the Ty3/gypsy family of long terminal repeat (LTR) retrotransposons. 
The pol gene in TY3/gypsy elements generally encodes domains in the following order: an aspartyl protease, a reverse transcriptase, RNase H, and an integrase, here the chromodomain is found at the C-terminus of the integrase domain. The chromodomain, is a conserved region of about 50 amino acids, found in a variety of chromosomal proteins, and implicated in the binding, of the proteins in which it is found, to methylated histone tails and maybe RNA. A chromodomain may occur as a single instance, in a tandem arrangement, or followed by a related chromo shadow domain." 50 1 0 0 0 0 0 0 1 195 cd18975 CD_MarY1_POL_like "chromodomain of Tricholoma matsutake polyprotein, and similar proteins. This subgroup includes the CHROMO (CHRromatin Organization Modifier) domain found in the polyprotein from the MarY1 Ty3/Gypsy long terminal repeat (LTR) retroelement from the Ectomycorrhizal Basidiomycete Tricholoma matsutake. The pol gene in TY3/gypsy elements generally encodes domains in the following order: prt-reverse transcriptase-RNase H-integrase, in marY1 POL the chromodomain is found at the C-terminus of the integrase domain. The chromodomain, is a conserved region of about 50 amino acids, found in a variety of chromosomal proteins, and implicated in the binding, of the proteins in which it is found, to methylated histone tails and maybe RNA. A chromodomain may occur as a single instance, in a tandem arrangement, or followed by a related chromo shadow domain." 49 1 0 0 0 0 0 0 1 196 cd18977 CD_POL_like "chromodomain of a Rhizoctonia solani AG-3 Rhs1AP polyprotein, and similar proteins. This subgroup includes the CHROMO (CHRromatin Organization Modifier) domain found in a Rhizoctonia solani AG-3 Rhs1AP, a putative Ty3/Gypsy polyprotein/retrotransposon which includes a protease, a reverse transcriptase, a ribonuclease H, and an integrase domain, in that order, with a chromodomain at the C-terminus of the integrase domain. 
The chromodomain, is a conserved region of about 50 amino acids, found in a variety of chromosomal proteins, and implicated in the binding, of the proteins in which it is found, to methylated histone tails and maybe RNA. A chromodomain may occur as a single instance, in a tandem arrangement, or followed by a related chromo shadow domain." 57 1 0 0 0 0 0 0 1 198 cd18979 CD_POL_like "chromodomain of a Zea maize putative metaviridae (gypsy-type) retrotransposon polyproteins (Z195D10.9), and similar proteins. This subgroup includes the CHROMO (CHRromatin Organization Modifier) domain found in Zea maize Z195D10.9 protein, and other putative TY3/gypsy retrotransposon polyproteins. The pol gene in TY3/gypsy elements generally encodes domains in the following order: an aspartyl protease, a reverse transcriptase, RNase H, and an integrase, here the chromodomain is found at the C-terminus of the integrase domain. The chromodomain, is a conserved region of about 50 amino acids, found in a variety of chromosomal proteins, and implicated in the binding, of the proteins in which it is found, to methylated histone tails and maybe RNA. A chromodomain may occur as a single instance, in a tandem arrangement, or followed by a related chromo shadow domain." 48 1 0 0 0 0 0 0 1 201 pfam00665 rve Integrase core domain. Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains. The amino-terminal domain is a zinc binding domain pfam02022. This domain is the central catalytic domain. The carboxyl terminal domain that is a non-specific DNA binding domain pfam00552. The catalytic domain acts as an endonuclease when two nucleotides are removed from the 3' ends of the blunt-ended viral DNA made by reverse transcription. This domain also catalyzes the DNA strand transfer reaction of the 3' ends of the viral DNA to the 5' ends of the integration site. 
114 1 0 0 0 0 0 0 1 202 pfam00872 Transposase_mut "Transposase, Mutator family. " 380 0 0 0 1 0 0 0 1 203 pfam01359 Transposase_1 Transposase (partial DDE domain). This family includes the mariner transposase. 80 0 0 0 1 0 0 0 1 204 pfam01385 OrfB_IS605 "Probable transposase. This family includes IS891, IS1136 and IS1341. DUF1225, pfam06774, has now been merged into this family." 119 0 0 0 1 0 0 0 1 205 pfam01498 HTH_Tnp_Tc3_2 "Transposase. Transposase proteins are necessary for efficient DNA transposition. This family includes the amino-terminal region of Tc1, Tc1A, Tc1B and Tc2B transposases of C.elegans. The region encompasses the specific DNA binding and second DNA recognition domains as well as an amino-terminal region of the catalytic domain of Tc3 as described in. Tc3 is a member of the Tc1/mariner family of transposable elements." 72 0 0 0 1 0 0 0 1 209 pfam01609 DDE_Tnp_1 "Transposase DDE domain. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, which contain three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis. The catalytic activity of this enzyme involves DNA cleavage at a specific site followed by a strand transfer reaction. This family contains transposases for IS4, IS421, IS5377, IS427, IS402, IS1355, IS5, which was original isolated in bacteriophage lambda." 196 0 0 0 1 0 0 0 1 212 pfam01695 IstB_IS21 "IstB-like ATP binding protein. This protein contains an ATP/GTP binding P-loop motif. It is found associated with IS21 family insertion sequences. The function of this protein is unknown, but it may perform a transposase function." 176 0 0 0 1 0 0 0 1 214 pfam01797 Y1_Tnp Transposase IS200 like. Transposases are needed for efficient transposition of the insertion sequence or transposon DNA. This family includes transposases for IS200 from E. coli. 121 0 0 0 1 0 0 0 1 215 pfam02022 Integrase_Zn Integrase Zinc binding domain. 
Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains. This domain is the amino-terminal domain zinc binding domain. The central domain is the catalytic domain pfam00665. The carboxyl terminal domain is a DNA binding domain pfam00552. 37 1 0 0 0 0 0 0 1 216 pfam02195 ParBc ParB-like nuclease domain. 90 0 0 0 0 0 0 1 1 217 pfam02281 Dimer_Tnp_Tn5 "Transposase Tn5 dimerization domain. Transposons are mobile DNA sequences capable of replication and insertion into the chromosome. Typically transposons code for the transposase enzyme, which catalyzes insertion, found between terminal inverted repeats. Tn5 has a unique method of self- regulation in which a truncated version of the transposase enzyme acts as an inhibitor. The catalytic domain of the Tn5 transposon is found in pfam01609. This domain mediates dimerization in the known structure." 106 0 0 0 1 0 0 0 1 219 pfam02371 Transposase_20 "Transposase IS116/IS110/IS902 family. Transposases are needed for efficient transposition of the insertion sequence or transposon DNA. This family includes transposases for IS116, IS110 and IS902. This region is often found with pfam01548. The exact function of this region is uncertain. This family contains a HHH motif suggesting a DNA-binding function." 86 0 0 0 1 0 0 0 1 224 pfam02992 Transposase_21 Transposase family tnp2. 211 0 0 0 1 0 0 0 1 225 pfam03004 Transposase_24 Plant transposase (Ptta/En/Spm family). Transposase proteins are necessary for efficient DNA transposition. This family includes various plant transposases from the Ptta and En/Spm families. 137 0 0 0 1 0 0 0 1 226 pfam03017 Transposase_23 TNP1/EN/SPM transposase. 66 0 0 0 1 0 0 0 1 228 pfam03108 DBD_Tnp_Mut MuDR family transposase. This region is found in plant proteins that are presumed to be the transposases for Mutator transposable elements. These transposons contain two ORFs. The molecular function of this region is unknown. 
65 0 0 0 1 0 0 0 1 229 pfam03184 DDE_1 "DDE superfamily endonuclease. This family of proteins are related to pfam00665 and are probably endonucleases of the DDE superfamily. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, which contain three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis. The catalytic activity of this enzyme involves DNA cleavage at a specific site followed by a strand transfer reaction. Interestingly this family also includes the CENP-B protein. This domain in that protein appears to have lost the metal binding residues and is unlikely to have endonuclease activity. Centromere Protein B (CENP-B) is a DNA-binding protein localized to the centromere." 177 0 0 0 1 0 0 0 1 230 pfam03221 HTH_Tnp_Tc5 Tc5 transposase DNA-binding domain. 66 0 0 0 1 0 0 0 1 232 pfam03930 Flp_N Recombinase Flp protein N-terminus. 82 0 0 1 0 0 0 0 1 233 pfam04236 Transp_Tc5_C Tc5 transposase C-terminal domain. This family corresponds to a C-terminal cysteine rich region that probably binds to a metal ion and could be DNA binding (pers. obs. A Bateman). 63 0 0 0 1 0 0 0 1 235 pfam04693 DDE_Tnp_2 Archaeal putative transposase ISC1217. 327 0 0 0 1 0 0 0 1 237 pfam04827 Plant_tran Plant transposon protein. This family contains plant transposases which are putative members of the PIF / Ping-Pong family. 205 0 0 0 1 0 0 0 1 239 pfam04937 DUF659 Protein of unknown function (DUF 659). Transposase-like protein with no known function. 152 0 0 0 1 0 0 0 1 240 pfam04986 Y2_Tnp Putative transposase. Transposases are needed for efficient transposition of the insertion sequence or transposon DNA. This family includes transposases IS1294 and IS801. This is a rolling-circle transposase. 183 0 0 0 1 0 0 0 1 241 pfam05202 Flp_C Recombinase Flp protein. 243 0 0 1 0 0 0 0 1 244 pfam05699 Dimer_Tnp_hAT hAT family C-terminal dimerization region. 
This dimerization region is found at the C-terminus of the transposases of elements belonging to the Activator superfamily (hAT element superfamily). The isolated dimerization region forms extremely stable dimers in vitro. 84 0 0 0 1 0 0 0 1 245 pfam05946 TcpA "Toxin-coregulated pilus subunit TcpA. This family consists of toxin-coregulated pilus subunit (TcpA) proteins from Vibrio cholerae and related sequences. The major virulence factors of toxigenic Vibrio cholerae are cholera toxin (CT), which is encoded by a lysogenic bacteriophage (CTXPhi), and toxin-coregulated pilus (TCP), an essential colonisation factor which is also the receptor for CTXPhi. The genes for the biosynthesis of TCP are part of a larger genetic element known as the TCP pathogenicity island." 130 0 0 0 0 1 0 0 1 249 pfam06465 DUF1087 "Domain of Unknown Function (DUF1087). Members of this family are found in various chromatin remodelling factors and transposases. Their exact function is, as yet, unknown." 61 0 0 0 1 0 0 0 1 250 pfam06467 zf-FCS "MYM-type Zinc finger with FCS sequence motif. MYM-type zinc fingers were identified in MYM family proteins. Human protein ZMYM3 is involved in a chromosomal translocation and may be responsible for X-linked retardation in XQ13.1. ZMYM2 is also involved in disease. In myeloproliferative disorders it is fused to FGF receptor 1; in atypical myeloproliferative disorders it is rearranged. Members of the family generally are involved in development. This Zn-finger domain functions as a transcriptional trans-activator of late vaccinia viral genes, and orthologues are also found in all nucleocytoplasmic large DNA viruses, NCLDV. This domain is also found fused to the C termini of recombinases from certain prokaryotic transposons." 40 0 0 1 0 0 0 0 1 252 pfam06892 Phage_CP76 Phage regulatory protein CII (CP76). 
This family consists of several phage regulatory protein CII (CP76) sequences which are thought to be DNA binding proteins which are involved in the establishment of lysogeny. 155 0 0 0 0 1 0 0 1 258 pfam07935 SSV1_ORF_D-335 ORF D-335-like protein. The sequences featured in this family are similar to a probable integrase expressed by the SSV1 virus of the archaebacterium Sulfolobus shibatae. This protein may be necessary for the integration of the virus into the host genome by a process of site-specific recombination. 63 1 0 0 0 0 0 0 1 259 pfam08721 Tn7_Tnp_TnsA_C "TnsA endonuclease C terminal. The Tn7 transposase is composed of proteins TnsA and TnsB. DNA breakage at the 5' end of the transposon is carried out by TnsA, and breakage and joining at the 3' end is carried out by TnsB. The C terminal domain of TnsA binds DNA." 83 0 0 0 1 0 0 0 1 260 pfam08722 Tn7_Tnp_TnsA_N "TnsA endonuclease N terminal. The Tn7 transposase is composed of proteins TnsA and TnsB. DNA breakage at the 5' end of the transposon is carried out by TnsA, and breakage and joining at the 3' end is carried out by TnsB. The N terminal domain of TnsA is catalytic." 83 0 0 0 1 0 0 0 1 264 pfam09035 Tn916-Xis "Excisionase from transposon Tn916. The phage-encoded excisionase protein Tn916-Xis adopts a winged-helix structure that consists of a three-stranded anti-parallel beta-sheet that packs against a helix-turn-helix (HTH) motif and a third C-terminal alpha-helix. It is encoded for by Tn916, which also codes for the integrase Tn916-Int. The protein interacts with DNA by the insertion of helix alpha-2 into the major groove and the contact of the hairpin that connects strands beta-2 and beta-3 with the adjacent phosphodiester backbone and/or minor groove. Tn916-Xis stimulates phage excision and inhibits viral integration by stabilizing distorted DNA structures." 62 1 1 0 0 0 0 0 2 267 pfam09299 Mu-transpos_C "Mu transposase, C-terminal. 
Members of this family are found in various prokaryotic integrases and transposases. They adopt a beta-barrel structure with Greek-key topology." 61 1 0 0 1 0 0 0 2 268 pfam09322 DUF1979 Domain of unknown function (DUF1979). Members of this family of functionally uncharacterized domains are found in various Oryza sativa mutator-like transposases. 58 0 0 0 1 0 0 0 1 273 pfam10136 SpecificRecomb Site-specific recombinase. Members of this family of bacterial proteins are found in various putative site-specific recombinase transmembrane proteins. 640 0 0 1 0 0 0 0 1 274 pfam10536 PMD Plant mobile domain. This domain was identified by Babu and colleagues in a variety of transposases. 360 0 0 0 1 0 0 0 1 276 pfam10551 MULE MULE transposase domain. This domain was identified by Babu and colleagues. 96 0 0 0 1 0 0 0 1 277 pfam10683 DBD_Tnp_Hermes Hermes transposase DNA-binding domain. This domain confers specific DNA-binding on Hermes transposase. 68 0 0 0 1 0 0 0 1 279 pfam11358 DUF3158 Protein of unknown function (DUF3158). Some members in this family of proteins are annotated as integrase regulator R however this cannot be confirmed. This family of proteins with unknown function appear to be restricted to Proteobacteria. 152 1 0 0 0 0 0 0 1 281 pfam11426 Tn7_TnsC_Int Tn7 transposition regulator TnsC. TnsC is a molecular switch that regulates transposition and interacts with TnsA which is a component of the transposase. The two proteins interact via the residues 504-555 on TnsC. The TnsA/TnsC interaction is very important in Tn7 transposition. 47 0 0 0 1 0 0 0 1 282 pfam11427 HTH_Tnp_Tc3_1 "Tc3 transposase. Tc3 is transposase with a specific DNA-binding domain which contains three alpha-helices, two of which form a helix-turn-helix motif which makes four base-specific contacts with the major groove. The N-terminus makes contacts with the minor groove. There is a base specific recognition between Tc3 and the transposon DNA. 
The DNA binding domain forms a dimer in which each monomer binds a separate transposon end. This implicates that the dimer has a role in synapsis and is necessary for the simultaneous cleavage of both transposon termini." 50 0 0 0 1 0 0 0 1 283 pfam11467 LEDGF Lens epithelium-derived growth factor (LEDGF). LEDGF is a chromatin-associated protein that protects cells from stress-induced apoptosis. It is the binding partner of HIV-1 integrase in human cells. The integrase binding domain (IBD) of LEDGF is a compact right-handed bundle composed of five alpha-helices. The residues essential for the interaction with the integrase are present in the inter-helical loop regions of the bundle structure. 102 1 0 0 0 0 0 0 1 285 pfam11917 DUF3435 Protein of unknown function (DUF3435). This family of proteins are functionally uncharacterized. This protein is found in eukaryotes. Proteins in this family are typically between 435 to 791 amino acids in length. This family is related to pfam00589 suggesting it may be an integrase enzyme. 418 1 0 0 0 0 0 0 1 286 pfam12017 Tnp_P_element Transposase protein. Protein in this family are transposases found in insects. This region is about 230 amino acids in length and is found associated with pfam05485. 219 0 0 0 1 0 0 0 1 289 pfam12482 DUF3701 "Phage integrase protein. This domain family is found in bacteria, and is approximately 100 amino acids in length. The family is found in association with pfam00589." 88 1 0 0 0 0 0 0 1 292 pfam12760 Zn_Tnp_IS1595 "Transposase zinc-ribbon domain. This zinc binding domain is found in a range of transposase proteins such as ISSPO8, ISSOD11, ISRSSP2 etc. It is likely a zinc-binding beta ribbon domain that could bind the DNA." 46 0 0 0 1 0 0 0 1 295 pfam12834 Phage_int_SAM_2 "Phage integrase, N-terminal. This is a family of DNA-binding prophage integrases. It is found largely in Proteobacteria." 91 1 0 0 0 0 0 0 1 297 pfam12851 Tet_JBP "Oxygenase domain of the 2OGFeDO superfamily. 
A double-stranded beta helix (DSBH) fold domain of the 2-oxoglutarate (2OG)-Fe(II)-dependent dioxygenase (2OGFeDO) superfamily found in various eukaryotes, bacteria and bacteriophages. Members of this family catalyze nucleic acid modifications, such as thymidine hydroxylation during base J synthesis in kinetoplastids, and the conversion of 5 methyl-cytosine (5-mC) to 5-hydroxymethyl-cytosine (hmC), or further oxidation to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). Metazoan TET proteins contain a cysteine-rich region inserted into the core of the DSBH fold. Vertebrate TET proteins are oncogenes that are mutated in various myeloid cancers. Fungal and algal versions of this family are linked to a predicted transposase and show lineage-specific expansions." 166 0 0 0 1 0 0 0 1 298 pfam12940 RAG1 "Recombination-activation protein 1 (RAG1), recombinase. This family is one of the two different components of the RAG1-RAG2 V(D)J recombinase complex. The RAG complex, consisting of two RAG1 and two RAG2 proteins is a multi-protein complex that mediates DNA cleavage during V(D)J (variable-diversity-joining) recombination. RAG1 mediates DNA-binding to the conserved recombination signal sequences (RSS). Many of the proteins in this family are fragments. Solution of the structure of the complex of RAG1 and RAG2 shows that each protein dimerizes with itself and each pair then complexes together to from the RAG1-RAG2 V(D)J recombinase enzyme. The different structural elements in RAG1 for UniProtKB:P15919 are: an N-terminal nonamer-binding domain from residues 391-459; a dimerization and DNA-binding domain from 459-515; an extended pre-RNase H domain from 515-588; the catalytic RNase H domain from 588-719; a ZnC2 domain from 719-791; and ZnH2 domain from 791-962; and a three-helix C-terminal domain from 962-1008." 653 0 0 1 0 0 0 0 1 300 pfam13006 Nterm_IS4 "Insertion element 4 transposase N-terminal. 
This family represents the N-terminal region of proteins carrying the transposase enzyme, DDE_Tnp_1 (that was Transposase_11), pfam01609, at the C-terminus. The full-length members are Insertion Element 4, IS4. Within the collection of E.coli strains, ECOR, the number of IS4 elements varies from zero to 14, with an average of 5 copies/strain." 95 0 0 0 1 0 0 0 1 301 pfam13007 LZ_Tnp_IS66 Transposase C of IS166 homeodomain. This is a leucine-zipper-like or homeodomain-like region of transposase TnpC of insertion element IS66. 67 0 0 0 1 0 0 0 1 304 pfam13017 Maelstrom "piRNA pathway germ-plasm component. Maelstrom is a germ-plasm component protein, that is shown to be functionally involved in the piRNA pathway. It is conserved throughout Eukaryota, though it appears to have been lost from all examined teleost fish species. The domain architecture shows that it is coupled with several DNA- and RNA- related domains such as HMG box, SR-25-like and HDAC_interact domains. Sequence analysis and fold recognition have found a distant similarity between Maelstrom domain and the DnaQ 3'-5' exonuclease family with the RNase H fold (Exonuc_X-T, pfam00929); notably, that the Maelstrom domains from basal eukaryotes contain the conserved 3'-5' exonuclease active site residues (Asp-Glu-Asp-His-Asp, DEDHD). However, the animal and some amoeba maelstrom contain another set of conserved residues (Glu-His-His-Cys-His-Cys, EHHCHC). This evolutionary link together with structural examinations leads to the hypothesis that Maelstrom domains may have a potential nuclease-transposase activity or RNA-binding ability that may be implicated in piRNA biogenesis. A protein function evolution mode, namely ""active site switch"", has been proposed, in which the amoeba Maelstrom domains are the possible evolutionary intermediates due to their harbouring of the specific characteristics of both 3'-5' exonuclease and Maelstrom domains." 
215 0 0 0 1 0 0 0 1 311 pfam13359 DDE_Tnp_4 "DDE superfamily endonuclease. This family of proteins are related to pfam00665 and are probably endonucleases of the DDE superfamily. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, which contain three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis. The catalytic activity of this enzyme involves DNA cleavage at a specific site followed by a strand transfer reaction." 158 0 0 0 1 0 0 0 1 316 pfam13546 DDE_5 "DDE superfamily endonuclease. This family of proteins are related to pfam00665 and are probably endonucleases of the DDE superfamily. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, which contain three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis. The catalytic activity of this enzyme involves DNA cleavage at a specific site followed by a strand transfer reaction." 266 0 0 0 1 0 0 0 1 317 pfam13586 DDE_Tnp_1_2 "Transposase DDE domain. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, which contain three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis." 90 0 0 0 1 0 0 0 1 319 pfam13612 DDE_Tnp_1_3 "Transposase DDE domain. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, which contains three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis. The catalytic activity of this enzyme involves DNA cleavage at a specific site followed by a strand transfer reaction." 154 0 0 0 1 0 0 0 1 323 pfam13701 DDE_Tnp_1_4 "Transposase DDE domain group 1. Transposase proteins are necessary for efficient DNA transposition. 
This domain is a member of the DDE superfamily, which contain three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis." 434 0 0 0 1 0 0 0 1 326 pfam13808 DDE_Tnp_1_assoc "DDE_Tnp_1-associated. This domain is frequently found N-terminal to the transposase, IS family DDE_Tnp_1, pfam01609 and its relatives." 88 0 0 0 1 0 0 0 1 328 pfam13952 DUF4216 "Domain of unknown function (DUF4216). This DUF is sometimes found at the C-terminal end of proteins carrying a Transposase_21 domain, pfam02992." 69 0 0 0 1 0 0 0 1 329 pfam13963 Transpos_assoc Transposase-associated domain. 74 0 0 0 1 0 0 0 1 338 pfam14882 PHINT_rpt Phage-integrase repeat unit. This repeat family is found on phage-integrase proteins in up to 15 copies. The function is not known. 52 1 0 0 0 0 0 0 1 339 pfam15571 Imm44 "Immunity protein 44. A predicted immunity protein with an alpha+beta fold. Proteins containing this domain are present in bacterial polymorphic toxin systems as an immediate gene neighbor of the toxin gene, usually containing a domain of the Tox-URI1, Tox-URI2 or Tox-ParBL1 families. The gene for this toxin is also found in heterogeneous poly-immunity loci that show variations in structure even between closely related strains." 126 0 0 0 0 0 0 1 1 340 pfam15590 Imm27 "Immunity protein 27. A predicted immunity protein with an alpha+beta fold and a conserved aspartate and GGxP motif. Proteins containing this domain are present in bacterial polymorphic toxin systems as an immediate gene neighbor of the toxin gene, usually containing a domain of the Ntox10 or Tox-ParB families." 67 0 0 0 0 0 0 1 1 343 pfam17293 Arm-DNA-bind_5 Arm DNA-binding domain. This domain is the N-terminal Arm DNA-binding domain found in various tyrosine recombinases. 87 0 0 1 0 0 0 0 1 346 pfam17906 HTH_48 HTH domain in Mos1 transposase. The N-terminal domain of the Mos1 Mariner transposase comprises two HTH domains. 
This HTH domain binds in the DNA major groove to the transposons inverted repeats. 50 0 0 0 1 0 0 0 1 348 pfam18064 ParB_C "Centromere-binding protein ParB C-terminal. This is the C-terminal domain found in centromere-binding protein ParB, which is used for stable segregation. The C-terminal domain has a ribbon-helix helix (RHH) motif with a C-terminal loop (residues 119-128) following helix alpha-2. The domain forms a dimer with the C-terminal of the beta chain. The function of the C-terminal domain is to bind to DNA." 47 0 0 0 0 0 0 1 1 351 pfam18231 DUF5603 "Domain of unknown function (DUF5603). This domain is found in the C-terminal region of free serine kinase (SerK) in the hyperthermophilic archaeon Thermococcus kodakarensis. SerK converts ADP and l-serine (Ser) into AMP and O-phospho-l-serine (Sep), which is a precursor of l-cysteine. The domain is not conserved in the ParB/Srx family. The differences between SerK and the other members of the ParB/Srx family is concentrated in the C-terminal region, which may include residues involved in the Sep binding." 105 0 0 0 0 0 0 1 1 353 pfam18644 Phage_int_SAM_6 "Phage integrase SAM-like domain. Xer recombinases are members of the tyrosine site-specific recombinase superfamily, a large group of enzymes that catalyze DNA breakage and rejoining using a conserved tyrosine nucleophile. Tyrosine recombinases promote various programmed DNA rearrangements including the monomerization of phage, plasmid and chromosome multimers, resolution of hairpin telomeres, and the movement of virulence and antibiotic resistance carrying integrative mobile genetic elements. Structural analysis of Helicobacter pylori XerH indicates that this N-terminal domain consisting of six alpha-helices contacts the DNA using a four-helix bundle." 132 1 0 1 0 0 0 0 2 357 pfam18721 CxC6 CxC6 like cysteine cluster associated with KDZ transposases. A predicted Zinc chelating domain inserted into the core of the KDZ transposase domain. 
62 0 0 0 1 0 0 0 1 358 pfam18735 HEPN_RiboL-PSP RiboL-PSP-HEPN. RiboL-PSP-HEPN. Fused to endoRNase L-PSP ; in operon with ParB. 188 0 0 0 0 0 0 1 1 359 pfam18737 HEPN_MAE_28990 "MAE_28990/MAE_18760-like HEPN. HEPN-like nuclease. MAE_28990 In operon with a ParB nuclease and DNA methylase genes. MAE_18760-like HEPN found fused to HEPN/RES-NTD1, HEPN/Toprim-NTD1, Schlafen and a novel beta rich domain. In operon with ParA/Soj ATPase of SIMIBI-type GTPase fold." 207 0 0 0 0 0 0 1 1 363 pfam18802 CxC1 CxC1 like cysteine cluster associated with KDZ transposases. A predicted Zinc chelating domain present N-terminal to the KDZ transposase domain. 104 0 0 0 1 0 0 0 1 367 smart00470 ParB "ParB-like nuclease domain. Plasmid RK2 ParB preferentially cleaves single-stranded DNA. ParB also nicks supercoiled plasmid DNA preferably at sites with potential single-stranded character, like AT-rich regions and sequences that can form cruciform structures. ParB also exhibits 5-->3 exonuclease activity." 89 0 0 0 0 0 0 1 1 368 smart00597 ZnF_TTF zinc finger in transposases and transcription factors. 91 0 0 0 1 0 0 0 1 369 smart00614 ZnF_BED BED zinc finger. DNA-binding domain in chromatin-boundary-element-binding proteins and transposases 50 0 0 0 1 0 0 0 1