Prediction Discovery of CRISPR knowledge in the bacterial genome

Input

Users can upload a DNA sequence file in FASTA format or paste the sequence on the submission page.

Alternatively, users can provide the RefSeq ID of an annotated sequence in NCBI (for example, NC_008532).

Prediction work-flow

In CRISPRminer, for each bacterial or archaeal genome, the detected CRISPR arrays and nearby Cas clusters (within 10,000 bp) are identified as CRISPR-Cas loci according to their distances using a dynamic programming algorithm. CRISPR arrays lacking nearby Cas genes, and Cas clusters lacking nearby CRISPR arrays, are defined as isolated. An isolated cas locus, as defined in this study, must contain at least three cas genes, at least one of which is either a universal cas gene for CRISPR adaptation (cas1 or cas2) or a main component of the interference module (cas7, cas5, cas8, cas10, csf1, cas9, or cpf1).
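The pairing and isolation rules above can be illustrated with a minimal Python sketch. It is not CRISPRminer's implementation: the dynamic programming step is replaced by a plain distance check, gene names are matched exactly, and the names `Locus`, `gap`, `pair_loci`, and `is_valid_isolated_cas_locus` are hypothetical.

```python
from dataclasses import dataclass

MAX_DISTANCE = 10_000  # bp, the distance threshold stated in the text

# Adaptation genes and main interference-module genes named in the text.
KEY_CAS_GENES = {"cas1", "cas2", "cas7", "cas5", "cas8", "cas10", "csf1", "cas9", "cpf1"}


@dataclass(frozen=True)
class Locus:
    name: str
    start: int  # genomic start coordinate (inclusive)
    end: int    # genomic end coordinate (inclusive)


def gap(a: Locus, b: Locus) -> int:
    """Approximate distance in bp between two intervals (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def pair_loci(arrays, cas_clusters, max_distance=MAX_DISTANCE):
    """Pair each CRISPR array with every Cas cluster within max_distance.

    Returns (pairs, isolated_arrays, isolated_clusters).
    NOTE: a simple distance check stands in for the dynamic programming
    algorithm mentioned in the text.
    """
    pairs = [(a, c) for a in arrays for c in cas_clusters if gap(a, c) <= max_distance]
    paired_arrays = {a for a, _ in pairs}
    paired_clusters = {c for _, c in pairs}
    isolated_arrays = [a for a in arrays if a not in paired_arrays]
    isolated_clusters = [c for c in cas_clusters if c not in paired_clusters]
    return pairs, isolated_arrays, isolated_clusters


def is_valid_isolated_cas_locus(gene_names) -> bool:
    """At least three cas genes, at least one of them a key adaptation/interference gene.

    Assumes gene names are already normalized (e.g. 'cas8', not 'cas8a').
    """
    genes = [g.lower() for g in gene_names]
    return len(genes) >= 3 and any(g in KEY_CAS_GENES for g in genes)
```

For example, `is_valid_isolated_cas_locus(["cas1", "cas3", "cas4"])` is true because cas1 is an adaptation gene, whereas a cluster of only cas3, cas4, and cas6 would not qualify.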

Both the PILER-CR+HmmScan and the CRISPRCasFinder methods are supported for CRISPR-Cas system annotation.

CRISPR-Cas system Look up pre-calculated CRISPR-Cas systems

Introduction of CRISPR system

CRISPR: Clustered regularly interspaced short palindromic repeats

Cas: CRISPR associated proteins

CRISPR-Cas system: CRISPR and associated proteins (Cas) comprise the CRISPR-Cas system

CRISPR locus: locus that contains alternate repeated elements (CRISPR repeats) and variable sequences (CRISPR spacers)

The Cas proteins can be divided into four distinct functional modules: adaptation (spacer acquisition), expression (crRNA processing and target binding), interference (target cleavage), and ancillary (regulatory and other CRISPR-associated functions).

CRISPR system detection work-flow

Browser

Users can browse all bacterial and archaeal organisms with CRISPR systems (containing both CRISPRs and Cas proteins) according to their evolutionary lineage.

Compare

All bacterial organisms with CRISPR systems are organized at the species or genus level. Clicking on a species or genus name displays all CRISPR systems of the organisms in that species or genus, so users can compare them and see which features are shared or distinct among organisms of the same species or genus.

Visualization of predicted CRISPR-Cas systems

Our system provides both global and local views of the predicted CRISPR-Cas system(s). The global view shows predicted bacterial genomic islands (GIs, regions of probable horizontal origin) from IslandViewer and predicted prophage regions from Prophinder and PHASTER, so that users can compare them with the loci of the CRISPR systems.

The local view visualizes the CRISPR array (labeled, for example, x10, where 10 is the copy number of the repeats) together with the Cas proteins (Cas proteins in different functional modules are drawn in different colors).

Detailed information on the Cas genes (locations, HMM profiles, and type/subtype) and the CRISPR array (location, repeats, and spacers) is provided at the bottom of the page.

Classification Types and subtypes of the predicted CRISPR-Cas system

CRISPR types/subtypes decision rules

This classification is based on the cas gene content of each CRISPR-Cas locus. Signature genes and distinctive gene architectures allow these loci to be assigned to types and subtypes. The cas gene content of each subtype (or type) is summarized in the cas protein table.

For details, please refer to Makarova et al. 2015 (Nat Rev Microbiol) and Shmakov et al. 2017 (Nat Rev Microbiol).

The classification system developed by Makarova et al. and Shmakov et al. encompasses two classes and six types. Class 1 CRISPR–Cas systems are defined by the presence of a multisubunit crRNA–effector complex. This class includes type I and type III CRISPR–Cas systems, as well as the putative new type IV. Class 2 CRISPR–Cas systems are defined by the presence of a single-subunit crRNA–effector module. This class includes type II CRISPR–Cas systems, as well as the two putative new types V and VI.

  1. Type I includes I-A, I-B, I-C, I-D, I-E, I-F, and I-U subtypes
  2. Type II includes II-A, II-B, and II-C subtypes
  3. Type III includes III-A, III-B, III-C, and III-U subtypes
  4. Type IV includes two distinct variants: one contains a DinG family helicase, and the other lacks DinG but typically contains a gene encoding a small α-helical protein, which is a putative small subunit.
  5. Type V includes V-A, V-B, V-C, and V-U subtypes
  6. Type VI includes VI-A, VI-B, and VI-U subtypes
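As a rough illustration, the class and type hierarchy listed above can be encoded as a lookup table. This is only a sketch of the listing on this page, not the decision rules CRISPRminer applies to cas gene content; the names `CLASS_BY_TYPE`, `SUBTYPES_BY_TYPE`, and `crispr_class` are hypothetical.

```python
# Class membership of each type, as described in the text.
CLASS_BY_TYPE = {
    "I": 1, "III": 1, "IV": 1,   # multisubunit crRNA-effector complex
    "II": 2, "V": 2, "VI": 2,    # single-subunit crRNA-effector module
}

# Subtypes exactly as listed on this page.
SUBTYPES_BY_TYPE = {
    "I": ["I-A", "I-B", "I-C", "I-D", "I-E", "I-F", "I-U"],
    "II": ["II-A", "II-B", "II-C"],
    "III": ["III-A", "III-B", "III-C", "III-U"],
    "IV": [],  # two variants (with or without DinG); no lettered subtypes listed
    "V": ["V-A", "V-B", "V-C", "V-U"],
    "VI": ["VI-A", "VI-B", "VI-U"],
}


def crispr_class(label: str) -> int:
    """Return the class (1 or 2) for a type or subtype label such as 'I-E' or 'V'."""
    cas_type = label.strip().upper().split("-")[0]
    if cas_type not in CLASS_BY_TYPE:
        raise ValueError(f"unknown CRISPR-Cas type: {label!r}")
    return CLASS_BY_TYPE[cas_type]
```

With this table, `crispr_class("I-E")` gives class 1 and `crispr_class("II-A")` gives class 2.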

Self-targeting Annotation of self-targeting spacers

Self-targeting spacers reported in Stern et al. (Trends Genet)

In the study by Stern et al., 23,550 spacers from 330 CRISPR-encoding organisms were scanned for an exact full match between the spacer and a portion of the endogenous genomic sequence that is not part of a CRISPR array (termed the target, or self proto-spacer). As a result, 100 of the 23,550 spacers (0.4%) were found to be self-targeting.
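The exact-match scan described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in that study: the genome is treated as a linear string, both strands are searched, CRISPR array coordinates are given as 0-based half-open intervals, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_self_targets(spacer: str, genome: str, array_regions):
    """Find exact full-length spacer matches that lie outside CRISPR arrays.

    array_regions: list of (start, end) tuples, 0-based, end exclusive.
    Returns a list of (start, strand) tuples for each self proto-spacer.
    """
    if not spacer:
        raise ValueError("spacer must not be empty")
    spacer, genome = spacer.upper(), genome.upper()
    queries = [("+", spacer)]
    rc = reverse_complement(spacer)
    if rc != spacer:  # avoid reporting a palindromic spacer twice
        queries.append(("-", rc))
    hits = []
    for strand, query in queries:
        pos = genome.find(query)
        while pos != -1:
            end = pos + len(query)
            # Skip matches overlapping a CRISPR array (the spacer itself).
            in_array = any(pos < a_end and end > a_start for a_start, a_end in array_regions)
            if not in_array:
                hits.append((pos, strand))
            pos = genome.find(query, pos + 1)
    return hits
```

A spacer with a non-empty result is self-targeting in the sense used above. Matches spanning the origin of a circular chromosome are not detected by this sketch.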

Self-targeting spacers detected in our study

In our study, 918,168 spacers from 62,343 CRISPR-encoding organisms were tested for self-targeting of the host genome using blastn.

Interaction Bacteria and phage infection network

Bacteria and phage infection network inferred based on spacers

To detect interactions between bacteria and phages, all spacers from the CRISPR loci of the bacterial genomes were searched with BLAST against all phage genomes in our database.

The interactions can be browsed at the species or genus level of the bacterial organisms. Information on the interacting spacers and their target proto-spacers, as well as the phage genome content near the proto-spacers, is provided. If a proto-spacer is located in a protein-coding region, information about the protein is also provided.
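One way to turn spacer-versus-phage BLAST hits into a host-phage network is sketched below. It assumes the hits were saved in BLAST's standard 12-column tabular format (`-outfmt 6`); the `spacer_to_host` lookup, the function name `build_network`, and the identity cutoff are hypothetical and are not CRISPRminer parameters.

```python
from collections import defaultdict

# Column order of BLAST tabular output (-outfmt 6).
BLAST6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def build_network(blast6_lines, spacer_to_host, min_identity=100.0):
    """Map each bacterial host to the sorted list of phages hit by its spacers.

    blast6_lines:   iterable of tab-separated BLAST hit lines (query = spacer,
                    subject = phage genome).
    spacer_to_host: dict from spacer ID to host organism name (assumed lookup).
    min_identity:   illustrative percent-identity cutoff, not a value from the text.
    """
    network = defaultdict(set)
    for line in blast6_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST6_FIELDS, line.split("\t")))
        if float(hit["pident"]) < min_identity:
            continue
        host = spacer_to_host.get(hit["qseqid"])
        if host is not None:
            network[host].add(hit["sseqid"])
    return {host: sorted(phages) for host, phages in network.items()}
```

Grouping the resulting hosts by species or genus name would then give the two browsing levels mentioned above.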

Bacteria and phage infection network extracted from NCBI

For each phage genome, the bacterial host annotated in its GenBank file is retrieved.
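In GenBank flat files, the host is usually recorded in the `/host` qualifier of the source feature. The following minimal sketch extracts it with a regular expression; the function name `extract_hosts` is hypothetical, and a full GenBank parser would be the more robust choice in practice.

```python
import re

# Matches /host="..." but not /lab_host="..."; the value may span several lines.
HOST_QUALIFIER = re.compile(r'/host="([^"]*)"')


def extract_hosts(genbank_text: str):
    """Return the values of all /host qualifiers in a GenBank flat file, in order."""
    hosts = []
    for value in HOST_QUALIFIER.findall(genbank_text):
        # Long qualifier values wrap onto continuation lines; collapse the whitespace.
        hosts.append(" ".join(value.split()))
    return hosts
```

Records without a `/host` qualifier yield an empty list, so such phages would simply contribute no edge to the network.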

Anti-crisprs Annotation of anti-crispr proteins and their homologs

Known anti-crispr proteins

To date, a total of 21 anti-crispr proteins have been detected and validated in CRISPR-bearing bacteria, including four from type I-E, ten from type I-F, four from type II-A, and three from type II-C.

Homologs of anti-crispr proteins

Because many more anti-crisprs are expected to remain undiscovered, we searched for homologs of all known anti-crispr proteins and collected their surrounding gene content (five proteins upstream and five downstream), such as proteins with an HTH domain, so that users can investigate the features of operons encoding anti-crisprs.
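Collecting the neighboring proteins amounts to taking a window around each homolog in the genome's ordered protein list. A minimal sketch follows, assuming the proteins are already sorted by genomic position and ignoring strand, so "upstream" here simply means earlier in the list; the function name `gene_neighborhood` is hypothetical.

```python
def gene_neighborhood(proteins, index, flank=5):
    """Return (upstream, downstream) lists of up to `flank` proteins around proteins[index].

    proteins: protein identifiers ordered by genomic position (assumed input).
    index:    position of the anti-crispr homolog in that list.
    """
    if not 0 <= index < len(proteins):
        raise IndexError("homolog index out of range")
    upstream = proteins[max(0, index - flank):index]
    downstream = proteins[index + 1:index + 1 + flank]
    return upstream, downstream
```

Near the ends of a contig the window is simply truncated, so fewer than five neighbors may be returned on one side.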