TropGB: Tropical Crops Genome Database

Sugarcane /甘蔗

Taxonomy: Magnoliopsida / Liliidae / Commelinanae / Poales / Poaceae / Saccharum / Saccharum spontaneum

Introduction

1. Sugarcane is a tall, perennial, solid-stemmed herb. The rhizome is stout and well developed. The stalks are 3-5 (-6) meters tall. It is widely cultivated in the tropical regions of China, including Taiwan, Fujian, Guangdong, Hainan, Guangxi, Sichuan, and Yunnan.

2. Sugarcane is a temperate and tropical crop. It is the raw material for cane sugar and can also be refined into ethanol as an alternative energy source. More than 100 countries around the world produce sugarcane; the largest producers are Brazil, India, and China.

3. Sugarcane is an annual or perennial herb of tropical and subtropical regions and is a C4 crop.

4. Land preparation provides deep, loose, and fertile soil for sugarcane growth, so that the root system can develop fully and absorb water and nutrients effectively. It also reduces diseases, insect pests, and weeds in sugarcane fields.

Genomic Version Information

Saccharum spontaneum AP85-441

Genome Overview

Sugarcane is the fifth most valuable crop. Saccharum spontaneum contributed disease resistance to modern hybrid cultivars. The complexity of polyploid, interspecific hybrid sugarcane genomes is unprecedented. The haploid S. spontaneum AP85-441 genome was sequenced and assembled into 32 chromosomes in 8 homologous groups. The reduction of the basic chromosome number from 10 to 8 in S. spontaneum was caused by fissions of two ancestral chromosomes, each followed by translocations to three chromosomes. Two rounds of whole-genome duplication occurred after the divergence of Saccharum and Miscanthus, and S. spontaneum is autopolyploid. We annotated 35,525 genes with defined alleles, of which 4,289, 9,792, 14,797, and 6,647 genes have 4, 3, 2, and 1 alleles, respectively. The subgenomes displayed no global expression dominance, but more than half of the gene sets show allelic dominance. S. spontaneum has an enhanced NADP-ME C4 pathway for photosynthesis. 78% of the NBS-encoding disease resistance genes are located on the rearranged chromosomes. Resequencing of 64 S. spontaneum genomes revealed balancing selection in the rearranged regions and that octoploidy is the ancestral ploidy. The allele-defined Saccharum genome offers new knowledge and resources to accelerate sugarcane improvement.
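The allele breakdown quoted in the overview can be tabulated in a few lines. The following is a minimal Python sketch that uses only the four per-class gene counts given above; the variable and function names are illustrative and are not part of any TropGB tool.

```python
# Gene counts keyed by the number of annotated alleles per gene,
# taken from the four per-class figures in the overview text.
allele_counts = {4: 4289, 3: 9792, 2: 14797, 1: 6647}

# Number of genes covered by the breakdown.
total_genes = sum(allele_counts.values())

# Number of allele copies implied by the breakdown
# (each gene contributes as many copies as it has alleles).
total_alleles = sum(n * count for n, count in allele_counts.items())


def share(n_alleles):
    """Percentage of genes in the breakdown that have `n_alleles` alleles."""
    return 100.0 * allele_counts[n_alleles] / total_genes


for n in sorted(allele_counts, reverse=True):
    print(f"{n} allele(s): {allele_counts[n]:>6} genes ({share(n):.1f}%)")
print(f"genes in breakdown: {total_genes}")
print(f"allele copies:      {total_alleles}")
```

Summing the four classes is also a quick consistency check of the breakdown against the reported gene total.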

Genome Information

Genome size: 3.1 Gb
Total ungapped length: 3.1 Gb
Number of chromosomes: 32
Number of scaffolds: 15,303
Scaffold N50: 91.4 Mb
Scaffold L50: 15
Number of contigs: 91,522
Contig N50: 45 kb
Contig L50: 21,341
GC percent: 45
Genome coverage: 90.0x
Assembly level: Chromosome
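For readers unfamiliar with the N50 and L50 statistics listed above: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly, and L50 is the number of sequences in that set. The sketch below shows the standard calculation in Python on toy lengths; the function name `n50_l50` and the example values are illustrative assumptions, not the database's own code or data.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2                  # half of the total assembly length
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # `length` is the N50; `count` sequences were needed, which is the L50.
            return length, count


# Toy lengths for illustration only (not real AP85-441 scaffolds).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

Read this way, a scaffold L50 of 15 means that the 15 longest scaffolds already account for at least half of the assembled sequence.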

Sequencing, Assembly, and Annotation

The sugarcane AP85-441 contig-level assembly incorporated data from a mixture of sequencing technologies (Supplementary Fig. 1), including BAC pools sequenced with Illumina HiSeq 2500, whole-genome shotgun sequencing with PacBio RS II, and Hi-C reads, followed by polishing with Illumina short reads. Each BAC pool was independently assembled using ALLPATHS-LG [8], SPAdes [9], and SOAPdenovo2 [10], and the best results were retained. For the PacBio assembly, Canu v1.5 [11] was used, as it is capable of avoiding collapsed repetitive regions and haplotypes. Self-correction was performed with the parameter corOutCoverage=100, which allowed us to correct all of the input PacBio reads. The corrected reads, along with the BAC-assembled contigs, were passed to the assembly step. The chromosomal assembly was constructed by proximity-guided assembly using our newly developed program, ALLHIC, which is designed for polyploid genome scaffolding (see Supplementary Note for details).

We first built a custom de novo repeat library of the genome using RepeatModeler, which automatically runs two de novo repeat-finding programs, RECON [59] and RepeatScout [60]. The resulting consensus transposable element (TE) sequences were passed to RepeatMasker [61] to identify and cluster repetitive elements. Unknown TEs were further classified using TEclass [62]. To identify tandem repeats within the genome, the Tandem Repeats Finder (TRF) package [63] was used with the modified parameters '1 1 2 80 5 200 2000 -d -h' to find high-order repeats. Telomeres and centromeres were identified from the resulting .dat output files. Repeat arrays with more than ten copies of the monomer 'AAACCT' were identified as telomeres. For centromere identification, we used a method similar to that described for the Oropetium thomaeum genome [64]: the largest repeat arrays were identified and clustered as centromeres. To further investigate LTRs, we applied the LTR_retriever pipeline [65], which integrates results from public programs such as LTR_FINDER [66] and LTRharvest [67] and efficiently removes false positives from the initial predictions. The predicted LTRs were further classified into intact and non-intact LTRs, and the insertion time was estimated as T = K/(2μ) using the scripts implemented in the LTR_retriever package [65].
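Two of the rules in the paragraph above are simple enough to sketch in Python: the telomere criterion (more than ten consecutive copies of the monomer) and the insertion-time formula T = K/(2μ), where K is the sequence divergence between the two LTRs of an element and μ is the substitution rate per site per year. This is a simplified illustration, not the pipeline used for the genome. The exact-match regular expression stands in for TRF, which also tolerates mismatches and indels; the function names are hypothetical; and the rate of 1.3e-8 in the usage example is an assumed value, since the text does not state which μ was used.

```python
import re


def longest_tandem_run(sequence, monomer="AAACCT"):
    """Largest number of back-to-back exact copies of `monomer` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(monomer)})+")
    runs = [len(m.group(0)) // len(monomer)
            for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def looks_like_telomere(sequence, monomer="AAACCT", min_copies=10):
    """Apply the 'more than ten monomers' rule (strictly greater than min_copies)."""
    return longest_tandem_run(sequence, monomer) > min_copies


def ltr_insertion_time(k, mu):
    """Insertion time in years, T = K / (2 * mu).

    k  : divergence between the 5' and 3' LTRs (substitutions per site)
    mu : substitution rate per site per year (must be positive)
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return k / (2.0 * mu)


# Usage with made-up inputs.
toy_end = "GG" + "AAACCT" * 12 + "TT"
print(looks_like_telomere(toy_end))            # 12 copies exceeds the threshold
assumed_mu = 1.3e-8                            # ASSUMPTION: illustrative rate only
print(f"{ltr_insertion_time(0.026, assumed_mu):.3g} years")
```

The factor of 2 reflects that both LTRs of an element accumulate substitutions independently after insertion, so their pairwise divergence grows at twice the per-lineage rate.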

Reference Publication(s)

1. Zhang J, Zhang X, Tang H, et al. Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L. [published correction appears in Nat Genet. 2018 Dec;50(12):1754]. Nat Genet. 2018;50(11):1565-1573.