assembly_id	genome_id	genome_def	crispr_array_locus_merge	crispr_array_location_merge	crispr_locus_id	crispr_pred_method	array_in_prot	prot_within_array_20000	prot_in_genome	crispr_type_by_cas_prot	consensus_repeat	repeat_length	self-targeting_spacer_number	self-targeting_target_number	spacer_location	protospacer_location	repeat_type	spacer_locus_num	spacer_num	correct_crispr_type	genome_cas_prots	unknown_protein_around_crispr	L10	L10_domain	L9	L9_domain	L8	L8_domain	L7	L7_domain	L6	L6_domain	L5	L5_domain	L4	L4_domain	L3	L3_domain	L2	L2_domain	L1	L1_domain	R1	R1_domain	R2	R2_domain	R3	R3_domain	R4	R4_domain	R5	R5_domain	R6	R6_domain	R7	R7_domain	R8	R8_domain	R9	R9_domain	R10	R10_domain
GCF_001692755.1_ASM169275v1	NZ_CP016502	Hungateiclostridium thermocellum DSM 2360 strain LQRI chromosome, complete genome	1	882282-882549	1,1,1	PILER-CR,CRISPRCasFinder,CRT	no	cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	Type III-A,Type I-B,Type III-B,Type III-C,Type III-D	GTTGAAGAGGTACTTCCAGTAAAACAAGGATTGAAACATA,GTTGAAGAGGTACTTCCAGTAAAACAAGGATTGAAAC,GTTGAAGAGGTACTTCCAGTAAAACAAGGATTGAAAC	40,37,37	0	0	NA	NA	?:?:?	2,2,3	3	TypeIII-A,TypeI-B,TypeIII-B,TypeIII-C,TypeIII-D	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	csx1|329aa|up_9|NZ_CP016502.1_870620_871607_+,csx1|330aa|up_8|NZ_CP016502.1_871596_872586_+,csx19|156aa|up_3|NZ_CP016502.1_877749_878217_+,NA|60aa|up_1|NZ_CP016502.1_880299_880479_+,NA|83aa|down_3|NZ_CP016502.1_885939_886188_-,NA|47aa|down_4|NZ_CP016502.1_886298_886439_-,NA|216aa|down_5|NZ_CP016502.1_887264_887912_+,NA|266aa|down_6|NZ_CP016502.1_888048_888846_+,NA|166aa|down_7|NZ_CP016502.1_888887_889385_+	csx1|329aa|up_9|NZ_CP016502.1_870620_871607_+	NA	csx1|330aa|up_8|NZ_CP016502.1_871596_872586_+	NA	cas10|502aa|up_7|NZ_CP016502.1_872605_874111_+	cd09679, Cas10_III, CRISPR/Cas system-associated protein Cas10	csm3gr7|222aa|up_6|NZ_CP016502.1_874111_874777_+	pfam03787, RAMPs, RAMP superfamily	csx10gr5|534aa|up_5|NZ_CP016502.1_874769_876371_+	cd09700, Csx10, CRISPR/Cas system-associated RAMP superfamily protein Csx10	csm3gr7|461aa|up_4|NZ_CP016502.1_876370_877753_+	cd09726, RAMP_I_III, CRISPR/Cas system-associated RAMP superfamily protein	csx19|156aa|up_3|NZ_CP016502.1_877749_878217_+	NA	csm3gr7|693aa|up_2|NZ_CP016502.1_878221_880300_+	TIGR03986, CRISPR-associated_protein, CRISPR-associated protein	NA|60aa|up_1|NZ_CP016502.1_880299_880479_+	NA	csx1|489aa|up_0|NZ_CP016502.1_880522_881989_+	pfam09670, Cas_Cas02710, CRISPR-associated protein (Cas_Cas02710)	cas2|97aa|down_0|NZ_CP016502.1_883548_883839_+	cd09725, Cas2_I_II_III, CRISPR/Cas system-associated protein Cas2	cas4|210aa|down_1|NZ_CP016502.1_883813_884443_+	cd09637, Cas4_I-A_I-B_I-C_I-D_II-B, CRISPR/Cas system-associated protein Cas4	NA|408aa|down_2|NZ_CP016502.1_884525_885749_+	pfam00872, Transposase_mut, Transposase, Mutator family	NA|83aa|down_3|NZ_CP016502.1_885939_886188_-	NA	NA|47aa|down_4|NZ_CP016502.1_886298_886439_-	NA	NA|216aa|down_5|NZ_CP016502.1_887264_887912_+	NA	NA|266aa|down_6|NZ_CP016502.1_888048_888846_+	NA	NA|166aa|down_7|NZ_CP016502.1_888887_889385_+	NA	NA|287aa|down_8|NZ_CP016502.1_889894_890755_+	cd00200, WD40, WD40 domain, found in a number of eukaryotic proteins that cover a wide variety of functions including adaptor/regulatory modules in signal transduction, pre-mRNA processing and cytoskeleton assembly; typically contains a GH dipeptide 11-24 residues from its N-terminus and the WD dipeptide at its C-terminus and is 40 residues long, hence the name WD40; between GH and WD lies a conserved core; serves as a stable propeller-like platform to which proteins can bind either stably or reversibly; forms a propeller-like structure with several blades where each blade is composed of a four-stranded anti-parallel b-sheet; instances with few detectable copies are hypothesized to form larger structures by dimerization; each WD40 sequence repeat forms the first three strands of one blade and the last strand in the next blade; the last C-terminal WD40 repeat completes the blade structure of the first WD40 repeat to create the closed ring propeller-structure; residues on the top and bottom surface of the propeller are proposed to coordinate interactions with other proteins and/or small ligands; 7 copies of the repeat are present in this alignment	NA|384aa|down_9|NZ_CP016502.1_890998_892151_-	PHA02517, PHA02517, putative transposase OrfB; Reviewed
GCF_001692755.1_ASM169275v1	NZ_CP016502	Hungateiclostridium thermocellum DSM 2360 strain LQRI chromosome, complete genome	2	979773-985162	2,2,2	CRISPRCasFinder,CRT,PILER-CR	no		cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	Orphan	GTTTCAATTCCTCATAGGTACGATAAAAAC,GTTTCAATTCCTCATAGGTACGATAAAAAC,GTTTCAATTCCTCATAGGTACGATAAAAAC	30,30,30	0	0	NA	NA	NA:NA:NA	80,80,79	80	Orphan	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	NA|111aa|up_0|NZ_CP016502.1_979194_979527_-,NA	NA|317aa|up_9|NZ_CP016502.1_969126_970077_+	COG0053, MMT1, Predicted Co/Zn/Cd cation transporters [Inorganic ion transport and metabolism]	NA|490aa|up_8|NZ_CP016502.1_970118_971588_-	COG0642, BaeS, Signal transduction histidine kinase [Signal transduction mechanisms]	NA|233aa|up_7|NZ_CP016502.1_971592_972291_-	COG0745, OmpR, Response regulators consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain [Signal transduction mechanisms / Transcription]	NA|229aa|up_6|NZ_CP016502.1_972466_973153_+	cd07750, PolyPPase_VTC_like, Polyphosphate(polyP) polymerase domain of yeast vacuolar transport chaperone (VTC) proteins VTC-2, -3 and- 4, and similar proteins	NA|229aa|up_5|NZ_CP016502.1_973182_973869_+	pfam16316, DUF4956, Domain of unknown function (DUF4956)	NA|704aa|up_4|NZ_CP016502.1_973907_976019_+	pfam08757, CotH, CotH kinase protein	NA|717aa|up_3|NZ_CP016502.1_976050_978201_-	COG0370, FeoB, Fe2+ transport system protein B [Inorganic ion transport and metabolism]	NA|80aa|up_2|NZ_CP016502.1_978283_978523_-	pfam04023, FeoA, FeoA domain	NA|160aa|up_1|NZ_CP016502.1_978682_979162_+	PRK03902, PRK03902, transcriptional regulator MntR	NA|111aa|up_0|NZ_CP016502.1_979194_979527_-	NA	NA|416aa|down_0|NZ_CP016502.1_985551_986799_+	pfam07745, Glyco_hydro_53, Glycosyl hydrolase family 53	NA|149aa|down_1|NZ_CP016502.1_986910_987357_+	pfam09719, C_GCAxxG_C_C, Putative redox-active protein (C_GCAxxG_C_C)	NA|843aa|down_2|NZ_CP016502.1_987705_990234_+	cd14256, Dockerin_I, Type I dockerin repeat domain	NA|273aa|down_3|NZ_CP016502.1_990468_991287_+	cd04194, GT8_A4GalT_like, A4GalT_like proteins catalyze the addition of galactose or glucose residues to the lipooligosaccharide (LOS) or lipopolysaccharide (LPS) of the bacterial cell surface	NA|515aa|down_4|NZ_CP016502.1_991326_992871_+	cd09160, PLDc_SMU_988_like_2, Putative catalytic domain, repeat 2, of Streptococcus mutans uncharacterized protein SMU_988 and similar proteins	NA|64aa|down_5|NZ_CP016502.1_992955_993147_+	COG1117, PstB, ABC-type phosphate transport system, ATPase component [Inorganic ion transport and metabolism]	NA|237aa|down_6|NZ_CP016502.1_993210_993921_+	COG0745, OmpR, Response regulators consisting of a CheY-like receiver domain and a winged-helix DNA-binding domain [Signal transduction mechanisms / Transcription]	NA|590aa|down_7|NZ_CP016502.1_993923_995693_+	NF033092, HK_WalK, cell wall metabolism sensor histidine kinase WalK	NA|512aa|down_8|NZ_CP016502.1_996509_998045_+	PRK00915, PRK00915, 2-isopropylmalate synthase; Validated	NA|311aa|down_9|NZ_CP016502.1_998182_999115_-	pfam12146, Hydrolase_4, Serine aminopeptidase, S33
GCF_001692755.1_ASM169275v1	NZ_CP016502	Hungateiclostridium thermocellum DSM 2360 strain LQRI chromosome, complete genome	3	1919011-1920787	3,3,3,4	CRISPRCasFinder,CRT,PILER-CR,PILER-CR	no	cas3	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	Unclear	ATTTCAATTCCTCATAGGTACGATACAAAC,ATTTCAATTCCTCATAGGTACGATACAAAC,ATTTCAATTCCTCATAGGTACGATACAAAC,TTTCAATTCCTCATAGGTACGATACAAAC	30,30,30,29	0	0	NA	NA	NA:NA:NA:NA	26,26,23,23	26	Unclear	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	NA|356aa|up_4|NZ_CP016502.1_1910349_1911417_-,NA|111aa|down_2|NZ_CP016502.1_1922590_1922923_-	NA|215aa|up_9|NZ_CP016502.1_1906459_1907104_-	PRK08644, PRK08644, sulfur carrier protein ThiS adenylyltransferase ThiF	NA|370aa|up_8|NZ_CP016502.1_1907180_1908290_-	PRK09240, thiH, 2-iminoacetate synthase ThiH	NA|256aa|up_7|NZ_CP016502.1_1908418_1909186_-	PRK00208, thiG, thiazole synthase; Reviewed	NA|67aa|up_6|NZ_CP016502.1_1909188_1909389_-	cd00565, Ubl_ThiS, ubiquitin-like (Ubl) domain found in sulfur carrier protein ThiS	NA|208aa|up_5|NZ_CP016502.1_1909653_1910277_-	PRK00454, engB, GTP-binding protein YsxC; Reviewed	NA|356aa|up_4|NZ_CP016502.1_1910349_1911417_-	NA	NA|67aa|up_3|NZ_CP016502.1_1911991_1912192_+	PRK10767, PRK10767, chaperone protein DnaJ; Provisional	NA|348aa|up_2|NZ_CP016502.1_1912313_1913357_-	pfam07833, Cu_amine_oxidN1, Copper amine oxidase N-terminal domain	NA|221aa|up_1|NZ_CP016502.1_1913634_1914297_-	pfam13649, Methyltransf_25, Methyltransferase domain	NA|384aa|up_0|NZ_CP016502.1_1916209_1917361_+	PHA02517, PHA02517, putative transposase OrfB; Reviewed	NA|152aa|down_0|NZ_CP016502.1_1920826_1921282_-	PRK00409, PRK00409, recombination and DNA strand exchange inhibitor protein; Reviewed	NA|384aa|down_1|NZ_CP016502.1_1921363_1922515_-	cd00338, Ser_Recombinase, Serine Recombinase family, catalytic domain; a DNA binding domain may be present either N- or C-terminal to the catalytic domain	NA|111aa|down_2|NZ_CP016502.1_1922590_1922923_-	NA	cas3|713aa|down_3|NZ_CP016502.1_1923170_1925309_-	COG1201, Lhr, Lhr-like helicases [General function prediction only]	NA|615aa|down_4|NZ_CP016502.1_1926647_1928492_-	pfam13208, TerB_N, TerB N-terminal domain	NA|223aa|down_5|NZ_CP016502.1_1928652_1929321_-	PRK13413, mpi, master DNA invertase Mpi family serine-type recombinase	NA|259aa|down_6|NZ_CP016502.1_1929604_1930381_-	COG1196, Smc, Chromosome segregation ATPases [Cell division and chromosome partitioning]	NA|54aa|down_7|NZ_CP016502.1_1930405_1930567_-	pfam12728, HTH_17, Helix-turn-helix domain	NA|384aa|down_8|NZ_CP016502.1_1931717_1932870_+	PHA02517, PHA02517, putative transposase OrfB; Reviewed	NA|274aa|down_9|NZ_CP016502.1_1933058_1933880_-	pfam13730, HTH_36, Helix-turn-helix domain
GCF_001692755.1_ASM169275v1	NZ_CP016502	Hungateiclostridium thermocellum DSM 2360 strain LQRI chromosome, complete genome	4	1934120-1937363	4,4,5	CRISPRCasFinder,CRT,PILER-CR	no	cas3	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	Unclear	GTTTCAATTCCTCATAGGTACGATACAAAC,GTTTCAATTCCTCATAGGTACGATACAAAC,GTTTCAATTCCTCATAGGTACGATACAAAC	30,30,30	0	0	NA	NA	NA:NA:NA	48,48,48	48	Unclear	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	NA|111aa|up_7|NZ_CP016502.1_1922590_1922923_-,NA	NA|152aa|up_9|NZ_CP016502.1_1920826_1921282_-	PRK00409, PRK00409, recombination and DNA strand exchange inhibitor protein; Reviewed	NA|384aa|up_8|NZ_CP016502.1_1921363_1922515_-	cd00338, Ser_Recombinase, Serine Recombinase family, catalytic domain; a DNA binding domain may be present either N- or C-terminal to the catalytic domain	NA|111aa|up_7|NZ_CP016502.1_1922590_1922923_-	NA	cas3|713aa|up_6|NZ_CP016502.1_1923170_1925309_-	COG1201, Lhr, Lhr-like helicases [General function prediction only]	NA|615aa|up_5|NZ_CP016502.1_1926647_1928492_-	pfam13208, TerB_N, TerB N-terminal domain	NA|223aa|up_4|NZ_CP016502.1_1928652_1929321_-	PRK13413, mpi, master DNA invertase Mpi family serine-type recombinase	NA|259aa|up_3|NZ_CP016502.1_1929604_1930381_-	COG1196, Smc, Chromosome segregation ATPases [Cell division and chromosome partitioning]	NA|54aa|up_2|NZ_CP016502.1_1930405_1930567_-	pfam12728, HTH_17, Helix-turn-helix domain	NA|384aa|up_1|NZ_CP016502.1_1931717_1932870_+	PHA02517, PHA02517, putative transposase OrfB; Reviewed	NA|274aa|up_0|NZ_CP016502.1_1933058_1933880_-	pfam13730, HTH_36, Helix-turn-helix domain	NA|97aa|down_0|NZ_CP016502.1_1938103_1938394_-	pfam04456, DUF503, Protein of unknown function (DUF503)	NA|395aa|down_1|NZ_CP016502.1_1938523_1939708_+	pfam07228, SpoIIE, Stage II sporulation protein E (SpoIIE)	NA|588aa|down_2|NZ_CP016502.1_1939735_1941499_+	COG4191, COG4191, Signal transduction histidine kinase regulating C4-dicarboxylate transport system [Signal transduction mechanisms]	NA|589aa|down_3|NZ_CP016502.1_1941755_1943522_+	pfam05833, FbpA, Fibronectin-binding protein A N-terminus (FbpA)	NA|398aa|down_4|NZ_CP016502.1_1943535_1944729_+	PRK06836, PRK06836, pyridoxal phosphate-dependent aminotransferase	NA|174aa|down_5|NZ_CP016502.1_1944795_1945317_-	cd02151, nitroreductase, nitroreductase family protein	NA|737aa|down_6|NZ_CP016502.1_1945870_1948081_+	pfam00759, Glyco_hydro_9, Glycosyl hydrolase family 9	NA|213aa|down_7|NZ_CP016502.1_1948219_1948858_-	cd07995, TPK, Thiamine pyrophosphokinase	NA|221aa|down_8|NZ_CP016502.1_1948870_1949533_-	cd00429, RPE, Ribulose-5-phosphate 3-epimerase (RPE)	NA|295aa|down_9|NZ_CP016502.1_1949691_1950576_-	PRK00098, PRK00098, GTPase RsgA; Reviewed
GCF_001692755.1_ASM169275v1	NZ_CP016502	Hungateiclostridium thermocellum DSM 2360 strain LQRI chromosome, complete genome	5	3495792-3497696	5,5,6	CRISPRCasFinder,CRT,PILER-CR	no	cas2,cas1,cas4,cas3,cas5,cas7,cas8b1,cas6	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	Type I-B	GTTTCAATTCCTCATAGGTACGATAAAAAC,GTTTCAATTCCTCATAGGTACGATAAAAAC,GTTTCAATTCCTCATAGGTACGATAAAAAC	30,30,30	0	0	NA	NA	NA:NA:NA	28,28,27	28	TypeI-B	cas3,WYL,cas8b1,cas6,csx1,cas10,csm3gr7,csx10gr5,csx19,cas2,cas4,csa3,DEDDh,DinG,csm2gr11,cas1,cas5,cas7	NA|240aa|up_4|NZ_CP016502.1_3492532_3493252_-,NA|88aa|up_2|NZ_CP016502.1_3493920_3494184_-,NA|153aa|up_1|NZ_CP016502.1_3494228_3494687_-,NA|180aa|up_0|NZ_CP016502.1_3494703_3495243_-,cas8b1|559aa|down_6|NZ_CP016502.1_3503578_3505255_-	NA|703aa|up_9|NZ_CP016502.1_3486568_3488677_-	COG0643, CheA, Chemotaxis protein histidine kinase and related kinases [Cell motility and secretion / Signal transduction mechanisms]	NA|142aa|up_8|NZ_CP016502.1_3488691_3489117_-	COG0835, CheW, Chemotaxis signal transduction protein [Cell motility and secretion / Signal transduction mechanisms]	NA|187aa|up_7|NZ_CP016502.1_3489962_3490523_-	cd01192, INT_C_like_3, Uncharacterized site-specific tyrosine recombinase, C-terminal catalytic domain	NA|311aa|up_6|NZ_CP016502.1_3490923_3491856_+	smart00342, HTH_ARAC, helix_turn_helix, arabinose operon control protein	NA|185aa|up_5|NZ_CP016502.1_3491828_3492383_-	pfam01161, PBP, Phosphatidylethanolamine-binding protein	NA|240aa|up_4|NZ_CP016502.1_3492532_3493252_-	NA	NA|227aa|up_3|NZ_CP016502.1_3493230_3493911_-	COG1131, CcmA, ABC-type multidrug transport system, ATPase component [Defense mechanisms]	NA|88aa|up_2|NZ_CP016502.1_3493920_3494184_-	NA	NA|153aa|up_1|NZ_CP016502.1_3494228_3494687_-	NA	NA|180aa|up_0|NZ_CP016502.1_3494703_3495243_-	NA	cas2|88aa|down_0|NZ_CP016502.1_3497862_3498126_-	cd09725, Cas2_I_II_III, CRISPR/Cas system-associated protein Cas2	cas1|331aa|down_1|NZ_CP016502.1_3498139_3499132_-	TIGR03641, cas1_HMARI, CRISPR-associated endonuclease Cas1, subtype I-B/HMARI/TNEAP	cas4|169aa|down_2|NZ_CP016502.1_3499145_3499652_-	pfam01930, Cas_Cas4, Domain of unknown function DUF83	cas3|751aa|down_3|NZ_CP016502.1_3499670_3501923_-	cd09639, Cas3_I, CRISPR/Cas system-associated protein Cas3	cas5|242aa|down_4|NZ_CP016502.1_3501944_3502670_-	TIGR01895, conserved_hypothetical_protein, CRISPR-associated protein Cas5, subtype I-B/TNEAP	cas7|295aa|down_5|NZ_CP016502.1_3502688_3503573_-	TIGR02585, conserved_protein, CRISPR-associated protein Cas7/Cst2/DevR, subtype I-B/TNEAP	cas8b1|559aa|down_6|NZ_CP016502.1_3503578_3505255_-	NA	cas6|241aa|down_7|NZ_CP016502.1_3505266_3505989_-	COG1583, COG1583, CRISPR system related protein, RAMP superfamily [Defense mechanisms]	NA|307aa|down_8|NZ_CP016502.1_3506245_3507166_+	cd00537, MTHFR, Methylenetetrahydrofolate reductase (MTHFR)	NA|273aa|down_9|NZ_CP016502.1_3507195_3508014_-	PRK00281, PRK00281, undecaprenyl-diphosphate phosphatase
