assembly_id	genome_id	genome_def	crispr_array_locus_merge	crispr_array_location_merge	crispr_locus_id	crispr_pred_method	array_in_prot	prot_within_array_20000	prot_in_genome	crispr_type_by_cas_prot	consensus_repeat	repeat_length	self-targeting_spacer_number	self-targeting_target_number	spacer_location	protospacer_location	repeat_type	spacer_locus_num	spacer_num	correct_crispr_type	genome_cas_prots	unknown_protein_around_crispr	L10	L10_domain	L9	L9_domain	L8	L8_domain	L7	L7_domain	L6	L6_domain	L5	L5_domain	L4	L4_domain	L3	L3_domain	L2	L2_domain	L1	L1_domain	R1	R1_domain	R2	R2_domain	R3	R3_domain	R4	R4_domain	R5	R5_domain	R6	R6_domain	R7	R7_domain	R8	R8_domain	R9	R9_domain	R10	R10_domain
GCF_000317045.1_ASM31704v1	NC_019703	Geitlerinema sp. PCC 7407, complete sequence	1	1919477-1919553	1	CRISPRCasFinder	no		Cas14u_CAS-V,DinG,c2c9_V-U4,csa3,cas14j,csb2gr5,cas7,cas8u1,cas3,DEDDh,RT,c2c5_V-U5	Orphan	CCCTTCATGGACGAGCCGTTTCTGGA	26	0	0	NA	NA	NA	1	1	Orphan	Cas14u_CAS-V,DinG,c2c9_V-U4,csa3,cas14j,csb2gr5,cas7,cas8u1,cas3,DEDDh,RT,c2c5_V-U5	NA|63aa|up_7|NC_019703.1_1905811_1906000_+,NA|619aa|up_5|NC_019703.1_1909776_1911633_+,NA|131aa|up_1|NC_019703.1_1916476_1916869_-,NA|122aa|down_1|NC_019703.1_1922322_1922688_-,NA|139aa|down_2|NC_019703.1_1922729_1923146_-	NA|110aa|up_9|NC_019703.1_1904120_1904450_-	COG4095, COG4095, Uncharacterized conserved protein [Function unknown]	NA|329aa|up_8|NC_019703.1_1904593_1905580_-	COG1131, CcmA, ABC-type multidrug transport system, ATPase component [Defense mechanisms]	NA|63aa|up_7|NC_019703.1_1905811_1906000_+	NA	NA|1014aa|up_6|NC_019703.1_1906009_1909051_-	COG1357, COG1357, Pentapeptide repeats containing protein [Function unknown]	NA|619aa|up_5|NC_019703.1_1909776_1911633_+	NA	NA|568aa|up_4|NC_019703.1_1911782_1913486_+	pfam12452, DUF3685, Protein of unknown function (DUF3685)	NA|743aa|up_3|NC_019703.1_1913736_1915965_+	TIGR02956, sensor_protein_TorS, TMAO reductase sytem sensor TorS	NA|141aa|up_2|NC_019703.1_1916009_1916432_+	cd17546, REC_hyHK_CKI1_RcsC-like, phosphoacceptor receiver (REC) domain of hybrid sensor histidine kinases/response regulators similar to Arabidopsis thaliana CKI1 and Escherichia coli RcsC	NA|131aa|up_1|NC_019703.1_1916476_1916869_-	NA	NA|306aa|up_0|NC_019703.1_1917010_1917928_+	PLN02824, PLN02824, hydrolase, alpha/beta fold family protein	NA|488aa|down_0|NC_019703.1_1920518_1921982_-	COG1316, LytR, Transcriptional regulator [Transcription]	NA|122aa|down_1|NC_019703.1_1922322_1922688_-	NA	NA|139aa|down_2|NC_019703.1_1922729_1923146_-	NA	NA|534aa|down_3|NC_019703.1_1923533_1925135_+	pfam00211, Guanylate_cyc, Adenylate and Guanylate cyclase catalytic domain	NA|500aa|down_4|NC_019703.1_1925443_1926943_+	pfam00990, GGDEF, Diguanylate cyclase, GGDEF domain	NA|208aa|down_5|NC_019703.1_1926932_1927556_-	PRK00116, ruvA, Holliday junction branch migration protein RuvA	NA|253aa|down_6|NC_019703.1_1927552_1928311_-	TIGR01485, putative_sucrose-phosphate_phosphatase, sucrose-6F-phosphate phosphohydrolase	NA|583aa|down_7|NC_019703.1_1928472_1930221_-	COG1132, MdlB, ABC-type multidrug transport system, ATPase and permease components [Defense mechanisms]	NA|377aa|down_8|NC_019703.1_1930287_1931418_-	cd06259, YdcF-like, YdcF-like	NA|316aa|down_9|NC_019703.1_1931904_1932852_+	COG1357, COG1357, Pentapeptide repeats containing protein [Function unknown]
GCF_000317045.1_ASM31704v1	NC_019703	Geitlerinema sp. PCC 7407, complete sequence	2	2173675-2173766	2	CRISPRCasFinder	no		Cas14u_CAS-V,DinG,c2c9_V-U4,csa3,cas14j,csb2gr5,cas7,cas8u1,cas3,DEDDh,RT,c2c5_V-U5	Orphan	TGAGATAATCCGAGAAGGCAGTCGAGAAG	29	0	0	NA	NA	NA	1	1	Orphan	Cas14u_CAS-V,DinG,c2c9_V-U4,csa3,cas14j,csb2gr5,cas7,cas8u1,cas3,DEDDh,RT,c2c5_V-U5	NA|136aa|up_9|NC_019703.1_2148706_2149114_+,NA|66aa|up_4|NC_019703.1_2159442_2159640_+,NA|559aa|down_3|NC_019703.1_2184926_2186603_-,NA|452aa|down_4|NC_019703.1_2186646_2188002_-	NA|136aa|up_9|NC_019703.1_2148706_2149114_+	NA	NA|319aa|up_8|NC_019703.1_2149110_2150067_-	COG0564, RluA, Pseudouridylate synthases, 23S RNA-specific [Translation, ribosomal structure and biogenesis]	NA|323aa|up_7|NC_019703.1_2150383_2151352_+	PRK07453, PRK07453, protochlorophyllide reductase	NA|451aa|up_6|NC_019703.1_2151557_2152910_-	cd01298, ATZ_TRZ_like, TRZ/ATZ family contains enzymes from the atrazine degradation pathway and related hydrolases	NA|1802aa|up_5|NC_019703.1_2153002_2158408_-	COG3899, COG3899, Predicted ATPase [General function prediction only]	NA|66aa|up_4|NC_019703.1_2159442_2159640_+	NA	NA|412aa|up_3|NC_019703.1_2159668_2160904_-	TIGR02037, Probable_periplasmic_serine_protease_do/HhoA-like, periplasmic serine protease, Do/DeqQ family	NA|2145aa|up_2|NC_019703.1_2161904_2168339_+	PRK11107, PRK11107, hybrid sensory histidine kinase BarA; Provisional	NA|351aa|up_1|NC_019703.1_2168341_2169394_+	COG3706, PleD, Response regulator containing a CheY-like receiver domain and a GGDEF domain [Signal transduction mechanisms]	NA|896aa|up_0|NC_019703.1_2169494_2172182_+	pfam12770, CHAT, CHAT domain	NA|316aa|down_0|NC_019703.1_2173989_2174937_+	pfam12770, CHAT, CHAT domain	NA|1559aa|down_1|NC_019703.1_2174999_2179676_-	cd00200, WD40, WD40 domain, found in a number of eukaryotic proteins that cover a wide variety of functions including adaptor/regulatory modules in signal transduction, pre-mRNA processing and cytoskeleton assembly; typically contains a GH dipeptide 11-24 residues from its N-terminus and the WD dipeptide at its C-terminus and is 40 residues long, hence the name WD40; between GH and WD lies a conserved core; serves as a stable propeller-like platform to which proteins can bind either stably or reversibly; forms a propeller-like structure with several blades where each blade is composed of a four-stranded anti-parallel b-sheet; instances with few detectable copies are hypothesized to form larger structures by dimerization; each WD40 sequence repeat forms the first three strands of one blade and the last strand in the next blade; the last C-terminal WD40 repeat completes the blade structure of the first WD40 repeat to create the closed ring propeller-structure; residues on the top and bottom surface of the propeller are proposed to coordinate interactions with other proteins and/or small ligands; 7 copies of the repeat are present in this alignment	NA|1630aa|down_2|NC_019703.1_2179893_2184783_-	cd00200, WD40, WD40 domain, found in a number of eukaryotic proteins that cover a wide variety of functions including adaptor/regulatory modules in signal transduction, pre-mRNA processing and cytoskeleton assembly; typically contains a GH dipeptide 11-24 residues from its N-terminus and the WD dipeptide at its C-terminus and is 40 residues long, hence the name WD40; between GH and WD lies a conserved core; serves as a stable propeller-like platform to which proteins can bind either stably or reversibly; forms a propeller-like structure with several blades where each blade is composed of a four-stranded anti-parallel b-sheet; instances with few detectable copies are hypothesized to form larger structures by dimerization; each WD40 sequence repeat forms the first three strands of one blade and the last strand in the next blade; the last C-terminal WD40 repeat completes the blade structure of the first WD40 repeat to create the closed ring propeller-structure; residues on the top and bottom surface of the propeller are proposed to coordinate interactions with other proteins and/or small ligands; 7 copies of the repeat are present in this alignment	NA|559aa|down_3|NC_019703.1_2184926_2186603_-	NA	NA|452aa|down_4|NC_019703.1_2186646_2188002_-	NA	NA|194aa|down_5|NC_019703.1_2189762_2190344_+	COG0523, COG0523, Putative GTPases (G3E family) [General function prediction only]	NA|287aa|down_6|NC_019703.1_2191323_2192184_-	cd05355, SDR_c1, classical (c) SDR, subgroup 1	NA|342aa|down_7|NC_019703.1_2192258_2193284_-	COG5592, COG5592, Uncharacterized conserved protein [Function unknown]	NA|331aa|down_8|NC_019703.1_2193553_2194546_-	pfam00924, MS_channel, Mechanosensitive ion channel	NA|161aa|down_9|NC_019703.1_2194871_2195354_+	cd07817, SRPBCC_8, Ligand-binding SRPBCC domain of an uncharacterized subfamily of proteins
GCF_000317045.1_ASM31704v1	NC_019703	Geitlerinema sp. PCC 7407, complete sequence	3	2281100-2282806	3,1,1	CRISPRCasFinder,CRT,PILER-CR	no	csb2gr5,cas7,cas8u1,cas3	Cas14u_CAS-V,DinG,c2c9_V-U4,csa3,cas14j,csb2gr5,cas7,cas8u1,cas3,DEDDh,RT,c2c5_V-U5	Unclear	TCTTCAAAGGGGCCGCATCTTGAGAATGCGGTGAGAC,CTTCAAAGGGGCCGCATCTTGAGA,CTTCAAAGGGGCCGCATCTTGAGAATGCGGTGAGAC	37,24,36	0	0	NA	NA	NA:NA:NA	23,23,22	23	Unclear	Cas14u_CAS-V,DinG,c2c9_V-U4,csa3,cas14j,csb2gr5,cas7,cas8u1,cas3,DEDDh,RT,c2c5_V-U5	NA|459aa|up_2|NC_019703.1_2276699_2278076_+,NA|106aa|down_4|NC_019703.1_2290783_2291101_-,NA|83aa|down_9|NC_019703.1_2296203_2296452_+	NA|232aa|up_9|NC_019703.1_2269460_2270156_+	pfam01944, SpoIIM, Stage II sporulation protein M	NA|293aa|up_8|NC_019703.1_2270342_2271221_+	cd05269, TMR_SDR_a, triphenylmethane reductase (TMR)-like proteins, NMRa-like, atypical (a) SDRs	NA|326aa|up_7|NC_019703.1_2271283_2272261_+	pfam13354, Beta-lactamase2, Beta-lactamase enzyme family	NA|134aa|up_6|NC_019703.1_2272523_2272925_+	cd06152, YjgF_YER057c_UK114_like_4, YjgF, YER057c, and UK114 belong to a large family of proteins present in bacteria, archaea, and eukaryotes with no definitive function	NA|140aa|up_5|NC_019703.1_2273049_2273469_+	pfam01471, PG_binding_1, Putative peptidoglycan binding domain	NA|256aa|up_4|NC_019703.1_2273568_2274336_-	PRK00748, PRK00748, 1-(5-phosphoribosyl)-5-[(5-phosphoribosylamino)methylideneamino] imidazole-4-carboxamide isomerase; Validated	NA|446aa|up_3|NC_019703.1_2275001_2276339_+	pfam10011, DUF2254, Predicted membrane protein (DUF2254)	NA|459aa|up_2|NC_019703.1_2276699_2278076_+	NA	NA|263aa|up_1|NC_019703.1_2278182_2278971_+	cd07477, Peptidases_S8_Subtilisin_subset, Peptidase S8 family domain in Subtilisin proteins	NA|709aa|up_0|NC_019703.1_2278967_2281094_+	cd07473, Peptidases_S8_Subtilisin_like, Peptidase S8 family domain in Subtilisin-like proteins	csb2gr5|578aa|down_0|NC_019703.1_2283035_2284769_-	TIGR02165, CRISPR-associated_protein_GSU0054_family, CRISPR-associated protein GSU0054/csb2, Dpsyc system	cas7|333aa|down_1|NC_019703.1_2284765_2285764_-	pfam09617, Cas_GSU0053, CRISPR-associated protein GSU0053 (Cas_GSU0053)	cas8u1|758aa|down_2|NC_019703.1_2285773_2288047_-	TIGR04113, hypothetical_protein_AaLAA1DRAFT_1703, CRISPR-associated protein Csx17, subtype Dpsyc	cas3|831aa|down_3|NC_019703.1_2288051_2290544_-	TIGR02621, CRISPR-associated_helicase_Cas3, CRISPR-associated helicase Cas3, subtype Dpsyc	NA|106aa|down_4|NC_019703.1_2290783_2291101_-	NA	NA|118aa|down_5|NC_019703.1_2291601_2291955_+	cd08026, DUF326, Cysteine-rich 4 helical bundle widely conserved in bacteria	NA|368aa|down_6|NC_019703.1_2292097_2293201_-	COG4638, HcaE, Phenylpropionate dioxygenase and related ring-hydroxylating dioxygenases, large terminal subunit [Inorganic ion transport and metabolism / General function prediction only]	NA|242aa|down_7|NC_019703.1_2293307_2294033_-	pfam12981, DUF3865, Domain of Unknown Function with PDB structure (DUF3865)	NA|347aa|down_8|NC_019703.1_2294861_2295902_-	TIGR00737, Probable_tRNA-dihydrouridine_synthase, putative TIM-barrel protein, nifR3 family	NA|83aa|down_9|NC_019703.1_2296203_2296452_+	NA
GCF_000317045.1_ASM31704v1	NC_019703	Geitlerinema sp. PCC 7407, complete sequence	4	3405789-3405981	4	CRISPRCasFinder	no	c2c5_V-U5	Cas14u_CAS-V,DinG,c2c9_V-U4,csa3,cas14j,csb2gr5,cas7,cas8u1,cas3,DEDDh,RT,c2c5_V-U5	Type V-U5	CTTTCAACCCTTCCAGTACCGGAAGGGCGATCGCAACTC	39	0	0	NA	NA	V-U5	2	2	TypeV-U5	Cas14u_CAS-V,DinG,c2c9_V-U4,csa3,cas14j,csb2gr5,cas7,cas8u1,cas3,DEDDh,RT,c2c5_V-U5	NA|179aa|up_5|NC_019703.1_3399170_3399707_-,NA|110aa|down_1|NC_019703.1_3409024_3409354_+,NA|134aa|down_3|NC_019703.1_3412285_3412687_+,NA|130aa|down_4|NC_019703.1_3412732_3413122_+,NA|73aa|down_5|NC_019703.1_3413518_3413737_+,NA|290aa|down_7|NC_019703.1_3414875_3415745_-,NA|390aa|down_9|NC_019703.1_3418559_3419729_+	NA|333aa|up_9|NC_019703.1_3392503_3393502_-	COG0679, COG0679, Predicted permeases [General function prediction only]	NA|755aa|up_8|NC_019703.1_3393513_3395778_-	pfam13355, DUF4101, Protein of unknown function (DUF4101)	NA|344aa|up_7|NC_019703.1_3396202_3397234_+	CHL00149, odpA, pyruvate dehydrogenase E1 component alpha subunit; Reviewed	NA|601aa|up_6|NC_019703.1_3397317_3399120_-	COG0768, FtsI, Cell division protein FtsI/penicillin-binding protein 2 [Cell envelope biogenesis, outer membrane]	NA|179aa|up_5|NC_019703.1_3399170_3399707_-	NA	NA|97aa|up_4|NC_019703.1_3400114_3400405_+	pfam11691, DUF3288, Protein of unknown function (DUF3288)	NA|455aa|up_3|NC_019703.1_3400444_3401809_-	PLN03094, PLN03094, Substrate binding subunit of ER-derived-lipid transporter; Provisional	NA|261aa|up_2|NC_019703.1_3401939_3402722_-	cd03261, ABC_Org_Solvent_Resistant, ATP-binding cassette transport system involved in resistant to organic solvents	NA|457aa|up_1|NC_019703.1_3403107_3404478_+	pfam01933, UPF0052, Uncharacterized protein family UPF0052	NA|170aa|up_0|NC_019703.1_3404492_3405002_+	pfam02367, TsaE, Threonylcarbamoyl adenosine biosynthesis protein TsaE	c2c5_V-U5|754aa|down_0|NC_019703.1_3406548_3408810_-	TIGR01766, Putative_transposase_MJ0751, transposase, IS605 OrfB family, central region	NA|110aa|down_1|NC_019703.1_3409024_3409354_+	NA	NA|606aa|down_2|NC_019703.1_3410435_3412253_+	COG3472, COG3472, Uncharacterized conserved protein [Function unknown]	NA|134aa|down_3|NC_019703.1_3412285_3412687_+	NA	NA|130aa|down_4|NC_019703.1_3412732_3413122_+	NA	NA|73aa|down_5|NC_019703.1_3413518_3413737_+	NA	NA|364aa|down_6|NC_019703.1_3413828_3414920_+	pfam14072, DndB, DNA-sulfur modification-associated	NA|290aa|down_7|NC_019703.1_3414875_3415745_-	NA	NA|793aa|down_8|NC_019703.1_3416177_3418556_+	COG0210, UvrD, Superfamily I DNA and RNA helicases [DNA replication, recombination, and repair]	NA|390aa|down_9|NC_019703.1_3418559_3419729_+	NA
GCF_000317045.1_ASM31704v1	NC_019703	Geitlerinema sp. PCC 7407, complete sequence	5	4668656-4668752	5	CRISPRCasFinder	no		Cas14u_CAS-V,DinG,c2c9_V-U4,csa3,cas14j,csb2gr5,cas7,cas8u1,cas3,DEDDh,RT,c2c5_V-U5	Orphan	GTCCAGTTCCAAGGGTGTGGAAGTTGCCAAACGGTG	36	0	0	NA	NA	NA	1	1	Orphan	Cas14u_CAS-V,DinG,c2c9_V-U4,csa3,cas14j,csb2gr5,cas7,cas8u1,cas3,DEDDh,RT,c2c5_V-U5	NA|69aa|up_7|NC_019703.1_4663038_4663245_+,NA|78aa|up_5|NC_019703.1_4665008_4665242_+,NA|89aa|up_2|NC_019703.1_4667212_4667479_-,NA|126aa|up_1|NC_019703.1_4667477_4667855_+,NA|46aa|down_1|NC_019703.1_4671415_4671553_+	NA|1010aa|up_9|NC_019703.1_4658759_4661789_-	PRK00349, uvrA, excinuclease ABC subunit UvrA	NA|188aa|up_8|NC_019703.1_4662401_4662965_+	PRK00150, def, peptide deformylase; Reviewed	NA|69aa|up_7|NC_019703.1_4663038_4663245_+	NA	NA|464aa|up_6|NC_019703.1_4663460_4664852_+	COG2339, prsW, Membrane proteinase, regulator of anti-sigma factor [Posttranslational modification, protein turnover, chaperones]	NA|78aa|up_5|NC_019703.1_4665008_4665242_+	NA	NA|448aa|up_4|NC_019703.1_4665238_4666582_-	COG4664, FcbT3, TRAP-type mannitol/chloroaromatic compound transport system, large permease component [Secondary metabolites biosynthesis, transport, and catabolism]	NA|186aa|up_3|NC_019703.1_4666565_4667123_-	COG4665, FcbT2, TRAP-type mannitol/chloroaromatic compound transport system, small permease component [Secondary metabolites biosynthesis, transport, and catabolism]	NA|89aa|up_2|NC_019703.1_4667212_4667479_-	NA	NA|126aa|up_1|NC_019703.1_4667477_4667855_+	NA	NA|234aa|up_0|NC_019703.1_4667834_4668536_-	COG1040, ComFC, Predicted amidophosphoribosyltransferases [General function prediction only]	NA|814aa|down_0|NC_019703.1_4668880_4671322_+	TIGR01418, Phosphoenolpyruvate_synthase, phosphoenolpyruvate synthase	NA|46aa|down_1|NC_019703.1_4671415_4671553_+	NA	NA|636aa|down_2|NC_019703.1_4671549_4673457_+	COG1123, COG1123, ATPase components of various ABC-type transport systems, contain duplicated ATPase [General function prediction only]	NA|640aa|down_3|NC_019703.1_4673705_4675625_+	COG4191, COG4191, Signal transduction histidine kinase regulating C4-dicarboxylate transport system [Signal transduction mechanisms]	NA|283aa|down_4|NC_019703.1_4675980_4676829_+	TIGR01183, Nitrate_transport_permease_protein_NrtB, nitrate ABC transporter, permease protein	NA|477aa|down_5|NC_019703.1_4676971_4678402_+	cd13553, PBP2_NrtA_CpmA_like, Substrate binding domain of ABC-type nitrate/bicarbonate transporters, a member of the type 2 periplasmic binding fold superfamily	NA|281aa|down_6|NC_019703.1_4678476_4679319_+	TIGR01184, Nitrate_transport_ATP-binding_protein_NrtC, nitrate transport ATP-binding subunits C and D	NA|221aa|down_7|NC_019703.1_4679352_4680015_-	COG2884, FtsE, Predicted ATPase involved in cell division [Cell division and chromosome partitioning]	NA|254aa|down_8|NC_019703.1_4680143_4680905_-	COG1922, WecG, Teichoic acid biosynthesis proteins [Cell envelope biogenesis, outer membrane]	NA|NA	NA
