HBICLOUDSolutions of Biological Data Analysis | Since 2021

Version: v1.0     Release Date: 2025.2.25     Last Update Date: 2025.2.25
Introduction: GECCO (Gene Cluster prediction with Conditional Random Fields) is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
Input:
FASTA or GenBank file with the genomic sequence.
Output:
  • {genome}.genes.tsv: The genes file, containing the genes extracted or predicted from the input file, and per-gene BGC probabilities predicted by the CRF.
  • {genome}.features.tsv: The features file, containing the identified domains in the input sequences, in tabular format.
  • {genome}.clusters.tsv: If any were found, a clusters file, containing the coordinates of the predicted clusters along their putative biosynthetic type, in tabular format.
  • {genome}_cluster_{N}.gbk: If any were found, a GenBank file per cluster, containing the cluster sequence annotated with its member proteins and domains.

Upload data