assembly_id	genome_id	genome_def	crispr_array_locus_merge	crispr_array_location_merge	crispr_locus_id	crispr_pred_method	array_in_prot	prot_within_array_20000	prot_in_genome	crispr_type_by_cas_prot	consensus_repeat	repeat_length	self-targeting_spacer_number	self-targeting_target_number	spacer_location	protospacer_location	repeat_type	spacer_locus_num	spacer_num	correct_crispr_type	genome_cas_prots	unknown_protein_around_crispr	L10	L10_domain	L9	L9_domain	L8	L8_domain	L7	L7_domain	L6	L6_domain	L5	L5_domain	L4	L4_domain	L3	L3_domain	L2	L2_domain	L1	L1_domain	R1	R1_domain	R2	R2_domain	R3	R3_domain	R4	R4_domain	R5	R5_domain	R6	R6_domain	R7	R7_domain	R8	R8_domain	R9	R9_domain	R10	R10_domain
GCF_000184925.1_ASM18492v1	NC_017304	Hungateiclostridium thermocellum DSM 1313, complete sequence	1	879396-879663	1,1,1	PILER-CR,CRISPRCasFinder,CRT	no	cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	Type I-B,Type III-C,Type III-D,Type III-B,Type III-A	GTTGAAGAGGTACTTCCAGTAAAACAAGGATTGAAACATA,GTTGAAGAGGTACTTCCAGTAAAACAAGGATTGAAAC,GTTGAAGAGGTACTTCCAGTAAAACAAGGATTGAAAC	40,37,37	0	0	NA	NA	?:?:?	2,2,3	3	TypeI-B,TypeIII-C,TypeIII-D,TypeIII-B,TypeIII-A	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	csx1|329aa|up_9|NC_017304.1_867734_868721_+,csx1|330aa|up_8|NC_017304.1_868710_869700_+,csx19|156aa|up_3|NC_017304.1_874863_875331_+,NA|60aa|up_1|NC_017304.1_877413_877593_+,NA|83aa|down_3|NC_017304.1_883053_883302_-,NA|47aa|down_4|NC_017304.1_883412_883553_-,NA|216aa|down_5|NC_017304.1_884378_885026_+,NA|266aa|down_6|NC_017304.1_885162_885960_+,NA|166aa|down_7|NC_017304.1_886001_886499_+	csx1|329aa|up_9|NC_017304.1_867734_868721_+	NA	csx1|330aa|up_8|NC_017304.1_868710_869700_+	NA	cas10|502aa|up_7|NC_017304.1_869719_871225_+	cd09679, Cas10_III, CRISPR/Cas system-associated protein Cas10	csm3gr7|222aa|up_6|NC_017304.1_871225_871891_+	pfam03787, RAMPs, RAMP superfamily	csx10gr5|534aa|up_5|NC_017304.1_871883_873485_+	cd09700, Csx10, CRISPR/Cas system-associated RAMP superfamily protein Csx10	csm3gr7|461aa|up_4|NC_017304.1_873484_874867_+	cd09726, RAMP_I_III, CRISPR/Cas system-associated RAMP superfamily protein	csx19|156aa|up_3|NC_017304.1_874863_875331_+	NA	csm3gr7|693aa|up_2|NC_017304.1_875335_877414_+	TIGR03986, CRISPR-associated_protein, CRISPR-associated protein	NA|60aa|up_1|NC_017304.1_877413_877593_+	NA	csx1|489aa|up_0|NC_017304.1_877636_879103_+	pfam09670, Cas_Cas02710, CRISPR-associated protein (Cas_Cas02710)	cas2|97aa|down_0|NC_017304.1_880662_880953_+	cd09725, Cas2_I_II_III, CRISPR/Cas 
system-associated protein Cas2	cas4|210aa|down_1|NC_017304.1_880927_881557_+	cd09637, Cas4_I-A_I-B_I-C_I-D_II-B, CRISPR/Cas system-associated protein Cas4	NA|408aa|down_2|NC_017304.1_881639_882863_+	pfam00872, Transposase_mut, Transposase, Mutator family	NA|83aa|down_3|NC_017304.1_883053_883302_-	NA	NA|47aa|down_4|NC_017304.1_883412_883553_-	NA	NA|216aa|down_5|NC_017304.1_884378_885026_+	NA	NA|266aa|down_6|NC_017304.1_885162_885960_+	NA	NA|166aa|down_7|NC_017304.1_886001_886499_+	NA	NA|287aa|down_8|NC_017304.1_887008_887869_+	cd00200, WD40, WD40 domain, found in a number of eukaryotic proteins that cover a wide variety of functions including adaptor/regulatory modules in signal transduction, pre-mRNA processing and cytoskeleton assembly; typically contains a GH dipeptide 11-24 residues from its N-terminus and the WD dipeptide at its C-terminus and is 40 residues long, hence the name WD40; between GH and WD lies a conserved core; serves as a stable propeller-like platform to which proteins can bind either stably or reversibly; forms a propeller-like structure with several blades where each blade is composed of a four-stranded anti-parallel b-sheet; instances with few detectable copies are hypothesized to form larger structures by dimerization; each WD40 sequence repeat forms the first three strands of one blade and the last strand in the next blade; the last C-terminal WD40 repeat completes the blade structure of the first WD40 repeat to create the closed ring propeller-structure; residues on the top and bottom surface of the propeller are proposed to coordinate interactions with other proteins and/or small ligands; 7 copies of the repeat are present in this alignment	NA|384aa|down_9|NC_017304.1_888112_889264_-	PHA02517, PHA02517, putative transposase OrfB; Reviewed
GCF_000184925.1_ASM18492v1	NC_017304	Hungateiclostridium thermocellum DSM 1313, complete sequence	2	976887-982276	2,2,2	CRISPRCasFinder,CRT,PILER-CR	no		cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	Orphan	GTTTCAATTCCTCATAGGTACGATAAAAAC,GTTTCAATTCCTCATAGGTACGATAAAAAC,GTTTCAATTCCTCATAGGTACGATAAAAAC	30,30,30	0	0	NA	NA	NA:NA:NA	80,80,79	80	Orphan	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	NA|111aa|up_0|NC_017304.1_976308_976641_-,NA	NA|257aa|up_9|NC_017304.1_965343_966114_+	PRK13111, trpA, tryptophan synthase subunit alpha; Provisional	NA|317aa|up_8|NC_017304.1_966239_967190_+	COG0053, MMT1, Predicted Co/Zn/Cd cation transporters [Inorganic ion transport and metabolism]	NA|233aa|up_7|NC_017304.1_968706_969405_-	COG0745, OmpR, Response regulators consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain [Signal transduction mechanisms / Transcription]	NA|229aa|up_6|NC_017304.1_969580_970267_+	cd07750, PolyPPase_VTC_like, Polyphosphate(polyP) polymerase domain of yeast vacuolar transport chaperone (VTC) proteins VTC-2, -3 and- 4, and similar proteins	NA|229aa|up_5|NC_017304.1_970296_970983_+	pfam16316, DUF4956, Domain of unknown function (DUF4956)	NA|704aa|up_4|NC_017304.1_971021_973133_+	pfam08757, CotH, CotH kinase protein	NA|717aa|up_3|NC_017304.1_973164_975315_-	COG0370, FeoB, Fe2+ transport system protein B [Inorganic ion transport and metabolism]	NA|80aa|up_2|NC_017304.1_975397_975637_-	pfam04023, FeoA, FeoA domain	NA|160aa|up_1|NC_017304.1_975796_976276_+	PRK03902, PRK03902, transcriptional regulator MntR	NA|111aa|up_0|NC_017304.1_976308_976641_-	NA	NA|416aa|down_0|NC_017304.1_982665_983913_+	pfam07745, Glyco_hydro_53, Glycosyl hydrolase family 53	NA|149aa|down_1|NC_017304.1_984024_984471_+	pfam09719, C_GCAxxG_C_C, Putative redox-active protein (C_GCAxxG_C_C)	NA|843aa|down_2|NC_017304.1_984819_987348_+	cd14256, Dockerin_I, 
Type I dockerin repeat domain	NA|273aa|down_3|NC_017304.1_987582_988401_+	cd04194, GT8_A4GalT_like, A4GalT_like proteins catalyze the addition of galactose or glucose residues to the lipooligosaccharide (LOS) or lipopolysaccharide (LPS) of the bacterial cell surface	NA|515aa|down_4|NC_017304.1_988440_989985_+	cd09160, PLDc_SMU_988_like_2, Putative catalytic domain, repeat 2, of Streptococcus mutans uncharacterized protein SMU_988 and similar proteins	NA|64aa|down_5|NC_017304.1_990069_990261_+	COG1117, PstB, ABC-type phosphate transport system, ATPase component [Inorganic ion transport and metabolism]	NA|237aa|down_6|NC_017304.1_990324_991035_+	COG0745, OmpR, Response regulators consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain [Signal transduction mechanisms / Transcription]	NA|590aa|down_7|NC_017304.1_991037_992807_+	NF033092, HK_WalK, cell wall metabolism sensor histidine kinase WalK	NA|512aa|down_8|NC_017304.1_993623_995159_+	PRK00915, PRK00915, 2-isopropylmalate synthase; Validated	NA|311aa|down_9|NC_017304.1_995296_996229_-	pfam12146, Hydrolase_4, Serine aminopeptidase, S33
GCF_000184925.1_ASM18492v1	NC_017304	Hungateiclostridium thermocellum DSM 1313, complete sequence	3	1914892-1916668	3,3,3,4	CRISPRCasFinder,CRT,PILER-CR,PILER-CR	no	cas3	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	Unclear	ATTTCAATTCCTCATAGGTACGATACAAAC,ATTTCAATTCCTCATAGGTACGATACAAAC,ATTTCAATTCCTCATAGGTACGATACAAAC,TTTCAATTCCTCATAGGTACGATACAAAC	30,30,30,29	0	0	NA	NA	NA:NA:NA:NA	26,26,23,23	26	Unclear	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	NA|356aa|up_3|NC_017304.1_1906229_1907297_-,NA|111aa|down_2|NC_017304.1_1918471_1918804_-	NA|357aa|up_9|NC_017304.1_1901269_1902340_-	PRK02615, PRK02615, thiamine phosphate synthase	NA|215aa|up_8|NC_017304.1_1902339_1902984_-	PRK08644, PRK08644, sulfur carrier protein ThiS adenylyltransferase ThiF	NA|370aa|up_7|NC_017304.1_1903060_1904170_-	PRK09240, thiH, 2-iminoacetate synthase ThiH	NA|256aa|up_6|NC_017304.1_1904298_1905066_-	PRK00208, thiG, thiazole synthase; Reviewed	NA|67aa|up_5|NC_017304.1_1905068_1905269_-	cd00565, Ubl_ThiS, ubiquitin-like (Ubl) domain found in sulfur carrier protein ThiS	NA|208aa|up_4|NC_017304.1_1905533_1906157_-	PRK00454, engB, GTP-binding protein YsxC; Reviewed	NA|356aa|up_3|NC_017304.1_1906229_1907297_-	NA	NA|67aa|up_2|NC_017304.1_1907871_1908072_+	PRK10767, PRK10767, chaperone protein DnaJ; Provisional	NA|348aa|up_1|NC_017304.1_1908193_1909237_-	pfam07833, Cu_amine_oxidN1, Copper amine oxidase N-terminal domain	NA|221aa|up_0|NC_017304.1_1909514_1910177_-	pfam13649, Methyltransf_25, Methyltransferase domain	NA|152aa|down_0|NC_017304.1_1916707_1917163_-	PRK00409, PRK00409, recombination and DNA strand exchange inhibitor protein; Reviewed	NA|384aa|down_1|NC_017304.1_1917244_1918396_-	cd00338, Ser_Recombinase, Serine Recombinase family, catalytic domain; a DNA binding domain may be present either N- or C-terminal to the catalytic domain	NA|111aa|down_2|NC_017304.1_1918471_1918804_-	
NA	cas3|713aa|down_3|NC_017304.1_1919052_1921191_-	COG1201, Lhr, Lhr-like helicases [General function prediction only]	NA|615aa|down_4|NC_017304.1_1922529_1924374_-	pfam13208, TerB_N, TerB N-terminal domain	NA|223aa|down_5|NC_017304.1_1924534_1925203_-	PRK13413, mpi, master DNA invertase Mpi family serine-type recombinase	NA|259aa|down_6|NC_017304.1_1925486_1926263_-	COG1196, Smc, Chromosome segregation ATPases [Cell division and chromosome partitioning]	NA|54aa|down_7|NC_017304.1_1926287_1926449_-	pfam12728, HTH_17, Helix-turn-helix domain	NA|384aa|down_8|NC_017304.1_1927599_1928752_+	PHA02517, PHA02517, putative transposase OrfB; Reviewed	NA|274aa|down_9|NC_017304.1_1928940_1929762_-	pfam13730, HTH_36, Helix-turn-helix domain
GCF_000184925.1_ASM18492v1	NC_017304	Hungateiclostridium thermocellum DSM 1313, complete sequence	4	1930002-1933245	4,4,5	CRISPRCasFinder,CRT,PILER-CR	no	cas3	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	Unclear	GTTTCAATTCCTCATAGGTACGATACAAAC,GTTTCAATTCCTCATAGGTACGATACAAAC,GTTTCAATTCCTCATAGGTACGATACAAAC	30,30,30	0	0	NA	NA	NA:NA:NA	48,48,48	48	Unclear	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	NA|111aa|up_7|NC_017304.1_1918471_1918804_-,NA	NA|152aa|up_9|NC_017304.1_1916707_1917163_-	PRK00409, PRK00409, recombination and DNA strand exchange inhibitor protein; Reviewed	NA|384aa|up_8|NC_017304.1_1917244_1918396_-	cd00338, Ser_Recombinase, Serine Recombinase family, catalytic domain; a DNA binding domain may be present either N- or C-terminal to the catalytic domain	NA|111aa|up_7|NC_017304.1_1918471_1918804_-	NA	cas3|713aa|up_6|NC_017304.1_1919052_1921191_-	COG1201, Lhr, Lhr-like helicases [General function prediction only]	NA|615aa|up_5|NC_017304.1_1922529_1924374_-	pfam13208, TerB_N, TerB N-terminal domain	NA|223aa|up_4|NC_017304.1_1924534_1925203_-	PRK13413, mpi, master DNA invertase Mpi family serine-type recombinase	NA|259aa|up_3|NC_017304.1_1925486_1926263_-	COG1196, Smc, Chromosome segregation ATPases [Cell division and chromosome partitioning]	NA|54aa|up_2|NC_017304.1_1926287_1926449_-	pfam12728, HTH_17, Helix-turn-helix domain	NA|384aa|up_1|NC_017304.1_1927599_1928752_+	PHA02517, PHA02517, putative transposase OrfB; Reviewed	NA|274aa|up_0|NC_017304.1_1928940_1929762_-	pfam13730, HTH_36, Helix-turn-helix domain	NA|97aa|down_0|NC_017304.1_1933985_1934276_-	pfam04456, DUF503, Protein of unknown function (DUF503)	NA|395aa|down_1|NC_017304.1_1934405_1935590_+	pfam07228, SpoIIE, Stage II sporulation protein E (SpoIIE)	NA|588aa|down_2|NC_017304.1_1935617_1937381_+	COG4191, COG4191, Signal transduction histidine kinase regulating C4-dicarboxylate 
transport system [Signal transduction mechanisms]	NA|589aa|down_3|NC_017304.1_1937637_1939404_+	pfam05833, FbpA, Fibronectin-binding protein A N-terminus (FbpA)	NA|398aa|down_4|NC_017304.1_1939417_1940611_+	PRK06836, PRK06836, pyridoxal phosphate-dependent aminotransferase	NA|174aa|down_5|NC_017304.1_1940677_1941199_-	cd02151, nitroreductase, nitroreductase family protein	NA|737aa|down_6|NC_017304.1_1941752_1943963_+	pfam00759, Glyco_hydro_9, Glycosyl hydrolase family 9	NA|213aa|down_7|NC_017304.1_1944101_1944740_-	cd07995, TPK, Thiamine pyrophosphokinase	NA|221aa|down_8|NC_017304.1_1944752_1945415_-	cd00429, RPE, Ribulose-5-phosphate 3-epimerase (RPE)	NA|295aa|down_9|NC_017304.1_1945573_1946458_-	PRK00098, PRK00098, GTPase RsgA; Reviewed
GCF_000184925.1_ASM18492v1	NC_017304	Hungateiclostridium thermocellum DSM 1313, complete sequence	5	3480109-3481947	5,5,6	CRISPRCasFinder,CRT,PILER-CR	no	cas2,cas1,cas4,cas3,cas5,cas7,cas8b1,cas6	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	Type I-B	GTTTCAATTCCTCATAGGTACGATAAAAAC,GTTTCAATTCCTCATAGGTACGATAAAAAC,GTTTCAATTCCTCATAGGTACGATAAAAAC	30,30,30	0	0	NA	NA	NA:NA:NA	27,27,26	27	TypeI-B	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	NA|240aa|up_4|NC_017304.1_3476849_3477569_-,NA|88aa|up_2|NC_017304.1_3478237_3478501_-,NA|153aa|up_1|NC_017304.1_3478545_3479004_-,NA|180aa|up_0|NC_017304.1_3479020_3479560_-,cas8b1|559aa|down_6|NC_017304.1_3487829_3489506_-	NA|703aa|up_9|NC_017304.1_3470885_3472994_-	COG0643, CheA, Chemotaxis protein histidine kinase and related kinases [Cell motility and secretion / Signal transduction mechanisms]	NA|142aa|up_8|NC_017304.1_3473008_3473434_-	COG0835, CheW, Chemotaxis signal transduction protein [Cell motility and secretion / Signal transduction mechanisms]	NA|187aa|up_7|NC_017304.1_3474279_3474840_-	cd01192, INT_C_like_3, Uncharacterized site-specific tyrosine recombinase, C-terminal catalytic domain	NA|311aa|up_6|NC_017304.1_3475240_3476173_+	smart00342, HTH_ARAC, helix_turn_helix, arabinose operon control protein	NA|185aa|up_5|NC_017304.1_3476145_3476700_-	pfam01161, PBP, Phosphatidylethanolamine-binding protein	NA|240aa|up_4|NC_017304.1_3476849_3477569_-	NA	NA|227aa|up_3|NC_017304.1_3477547_3478228_-	COG1131, CcmA, ABC-type multidrug transport system, ATPase component [Defense mechanisms]	NA|88aa|up_2|NC_017304.1_3478237_3478501_-	NA	NA|153aa|up_1|NC_017304.1_3478545_3479004_-	NA	NA|180aa|up_0|NC_017304.1_3479020_3479560_-	NA	cas2|88aa|down_0|NC_017304.1_3482113_3482377_-	cd09725, Cas2_I_II_III, CRISPR/Cas system-associated protein Cas2	cas1|331aa|down_1|NC_017304.1_3482390_3483383_-	TIGR03641, cas1_HMARI, 
CRISPR-associated endonuclease Cas1, subtype I-B/HMARI/TNEAP	cas4|169aa|down_2|NC_017304.1_3483396_3483903_-	pfam01930, Cas_Cas4, Domain of unknown function DUF83	cas3|751aa|down_3|NC_017304.1_3483921_3486174_-	cd09639, Cas3_I, CRISPR/Cas system-associated protein Cas3	cas5|242aa|down_4|NC_017304.1_3486195_3486921_-	TIGR01895, conserved_hypothetical_protein, CRISPR-associated protein Cas5, subtype I-B/TNEAP	cas7|295aa|down_5|NC_017304.1_3486939_3487824_-	TIGR02585, conserved_protein, CRISPR-associated protein Cas7/Cst2/DevR, subtype I-B/TNEAP	cas8b1|559aa|down_6|NC_017304.1_3487829_3489506_-	NA	cas6|241aa|down_7|NC_017304.1_3489517_3490240_-	COG1583, COG1583, CRISPR system related protein, RAMP superfamily [Defense mechanisms]	NA|307aa|down_8|NC_017304.1_3490496_3491417_+	cd00537, MTHFR, Methylenetetrahydrofolate reductase (MTHFR)	NA|273aa|down_9|NC_017304.1_3491446_3492265_-	PRK00281, PRK00281, undecaprenyl-diphosphate phosphatase
