Version: v1.0 Release Date: 2025.2.25 Last Update Date: 2025.2.25
Introduction: GECCO (Gene Cluster prediction with Conditional Random Fields) is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
Output:
- {genome}.genes.tsv: The genes file, containing the genes extracted or predicted from the input file, and per-gene BGC probabilities predicted by the CRF.
- {genome}.features.tsv: The features file, containing the identified domains in the input sequences, in tabular format.
- {genome}.clusters.tsv: If any were found, a clusters file, containing the coordinates of the predicted clusters along their putative biosynthetic type, in tabular format.
- {genome}_cluster_{N}.gbk: If any were found, a GenBank file per cluster, containing the cluster sequence annotated with its member proteins and domains.