TropGB: Tropical Crops Genome Database

Oil palm / 油棕

Taxonomy: Angiosperms / Monocots / Commelinids / Arecales / Arecaceae / Arecoideae / Cocoseae / Elaeis

Introduction

1. Because palm oil contains more saturated fat than oils made from canola, corn, linseed, soybeans, safflower, and sunflowers, it withstands the high heat of deep-frying and resists oxidation.

2. It contains no trans fat, and its use in food has increased as food-labelling laws have changed to specify trans fat content.

3. Human use of oil palms may date back about 5,000 years in coastal West Africa.

4. Elaeis guineensis is now extensively cultivated in tropical countries outside Africa, particularly Malaysia and Indonesia, which together produce most of the world supply.

5. Palm oil is typically considered the most controversial of the cooking oils, for both health and environmental reasons.

Genomic Version Information

Elaeis guineensis EGV3

Genome Overview

The palm family (Arecaceae), consisting of ∼ 2600 species, is the third most economically important family of plants. The African oil palm (Elaeis guineensis) is one of the most important palms. However, the genome sequences of palms that are currently available are still limited and fragmented. Here, we report a high-quality chromosome-level reference genome of an oil palm, Dura, assembled by integrating long reads with ∼ 150× genome coverage. The assembled genome was 1.7 Gb in size, covering 94.5% of the estimated genome, of which 91.6% was assigned into 16 pseudochromosomes and 73.7% was repetitive sequences. Relying on the conserved synteny with oil palm, the existing draft genome sequences of both date palm and coconut were further assembled into chromosomal level. Transposon burst, particularly long terminal repeat retrotransposons, following the last whole-genome duplication, likely explains the genome size variation across palms. Sequence analysis of the VIRESCENS gene in palms suggests that DNA variations in this gene are related to fruit colors. Recent duplications of highly tandemly repeated pathogenesis-related proteins from the same tandem arrays play an important role in defense responses to Ganoderma. Whole-genome resequencing of both ancestral African and introduced oil palms in Southeast Asia reveals that genes under putative selection are notably associated with stress responses, suggesting adaptation to stresses in the new habitat. The genomic resources and insights gained in this study could be exploited for accelerating genetic improvement and understanding the evolution of palms.

Genome Information

Genome size (bp): 1,701,312,507
GC content: 38.70%
Number of genome sequences: 932
Maximum genome sequence length (bp): 160,148,325
Minimum genome sequence length (bp): 511
Average genome sequence length (bp): 1,825,443
Genome sequence N50 (bp): 111,579,804
Genome sequence N90 (bp): 37,783,746
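
The summary statistics above can be recomputed from a list of sequence lengths. The following is a minimal Python sketch, assuming the lengths have already been read from the assembly FASTA; the function names `nx` and `summarize` and the toy lengths are illustrative and are not part of the database or the original study.

```python
from typing import Dict, List


def nx(lengths: List[int], fraction: float) -> int:
    """Return the Nx length: the shortest sequence in the smallest set of
    longest sequences that together cover `fraction` of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    raise ValueError("lengths must be non-empty")


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Compute the same fields as the Genome Information table."""
    return {
        "genome_size": sum(lengths),
        "num_sequences": len(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_length": round(sum(lengths) / len(lengths)),
        "n50": nx(lengths, 0.5),
        "n90": nx(lengths, 0.9),
    }


# Toy lengths for illustration only; these are not the oil palm sequences.
toy_lengths = [100, 80, 60, 40, 20]
stats = summarize(toy_lengths)
print(stats)
```

As a consistency check on the table itself, dividing the reported genome size by the reported number of sequences (1,701,312,507 / 932) rounds to the reported average length of 1,825,443 bp.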

Sequencing, Assembly, and Annotation

The same Dura tree, previously sequenced on the Illumina platform, was sequenced using Single-Molecule Real-Time technology to improve the genome assembly. Genomic DNA was isolated using the MagAttract HMW DNA Kit (Catalog No. 67563, Qiagen, Düsseldorf, Germany). Two 20-kb libraries were constructed and sequenced to > 150× coverage on a PacBio Sequel II sequencer (Pacific Biosciences, Menlo Park, CA) by BGI (Hong Kong, China). Flye v2.8 was used to assemble the genome (-g 1.8g -m 10000 --asm-coverage 50 -i 3). Cleaned paired-end reads from 300-bp insert libraries, at ∼ 100× coverage from Illumina sequencing, were used to polish the assembly with Pilon.
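
For readers who want to reproduce the assembly step, the reported Flye parameters can be assembled into a command line as sketched below. Flye takes the genome size as a single token (for example `1.8g`) and the minimum overlap as a plain integer. The read-type flag (`--pacbio-raw`), the input file name, the output directory, and the thread count are assumptions, since the text does not state them; the sketch only builds and prints the command and does not run Flye.

```python
import shlex
from typing import List


def flye_command(reads: str, out_dir: str, threads: int = 16) -> List[str]:
    """Build a Flye argument list using the parameters reported in the text."""
    return [
        "flye",
        "--pacbio-raw", reads,   # assumption: read type is not stated in the text
        "-g", "1.8g",            # estimated genome size
        "-m", "10000",           # minimum overlap between reads
        "--asm-coverage", "50",  # reduced coverage for initial disjointig assembly
        "-i", "3",               # number of polishing iterations
        "-o", out_dir,           # hypothetical output directory
        "-t", str(threads),      # hypothetical thread count
    ]


# Hypothetical file names, for illustration only.
cmd = flye_command("dura_pacbio.fastq.gz", "flye_out")
print(shlex.join(cmd))
```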

RepeatModeler was first used to build a custom repeat library for the studied species. RepeatMasker was then employed to identify repetitive sequences based on the custom repeat library and the Repbase database. Tandem repeats were further annotated using Tandem Repeats Finder. Finally, we combined and filtered these repetitive sequences by genomic coordinates to obtain a nonredundant repeat annotation of the genome. Intact LTR retrotransposons were assessed using LTR_retriever. The demographic history of the TEs was inferred by examining the most abundant LTRs. One hundred LTRs were randomly selected from 40 random subfamilies of Copia. Full sequences were extracted and aligned with MUSCLE. The distribution of pairwise sequence similarity within a family was used to estimate the temporal dynamics of TE activity.
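
The pairwise-similarity step can be illustrated with a short Python sketch that takes already-aligned sequences (for example, parsed from a MUSCLE alignment) and returns the identity of every pair. The text does not specify the similarity metric or how gaps were handled, so the definition used here (identical bases over columns where neither sequence has a gap) is an assumption, and the function names and toy sequences are illustrative.

```python
from itertools import combinations
from typing import Dict, List


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical bases over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def identity_distribution(aligned: Dict[str, str]) -> List[float]:
    """Identity for every unordered pair of sequences in one family."""
    names = sorted(aligned)
    return [pairwise_identity(aligned[p], aligned[q]) for p, q in combinations(names, 2)]


# Toy alignment for illustration only; not real LTR sequences.
toy_family = {"ltr1": "ACGTACGT", "ltr2": "ACGTACGA", "ltr3": "AC-TACGA"}
distribution = identity_distribution(toy_family)
print(distribution)
```

A family whose pairs are nearly identical points to recent insertions, while a broad or low-identity distribution points to older activity; converting identity to absolute time would additionally require a substitution rate, which this section does not give.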

The genome was annotated with the MAKER2 pipeline. Genome sequences were first soft-masked using RepeatMasker, based on the aforementioned repeat libraries. Cleaned mRNA sequencing reads of multiple organs from our previous study were assembled with Trinity and used for evidence-based annotation. For ab initio gene model prediction, protein sequences of E. guineensis EG5.1 and EGv2, date palm Barhee BC4, and coconut HainanTall were used as evidence. SNAP and AUGUSTUS were iteratively used to train gene models. Predicted gene models that contained TE domains and were not supported by transcripts were filtered out. Cleaned gene models were then annotated by BLASTP searches against the Non-Redundant Protein Sequence and RefSeq databases (E-value < 1E–10).
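
The E-value cutoff in the final annotation step can be applied to BLASTP tabular output (`-outfmt 6`, whose default columns place the E-value in column 11 and the bit score in column 12). The sketch below is a minimal Python example under that assumption; keeping only the highest-scoring hit per gene is an added assumption, not something the text states, and the toy rows are invented for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List


def best_hits(rows: Iterable[List[str]], max_evalue: float = 1e-10) -> Dict[str, List[str]]:
    """Keep, for each query, the hit with the highest bit score among hits
    whose E-value is strictly below `max_evalue`."""
    best: Dict[str, List[str]] = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query = row[0]
        evalue = float(row[10])
        bitscore = float(row[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > float(best[query][11]):
            best[query] = row
    return best


# Toy tabular BLASTP output for illustration only; not real results.
toy_blast = (
    "gene1\tprotA\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "gene1\tprotB\t60.0\t180\t72\t0\t1\t180\t1\t180\t1e-20\t120\n"
    "gene2\tprotC\t35.0\t90\t58\t1\t1\t90\t1\t90\t1e-5\t40\n"
)
hits = best_hits(csv.reader(io.StringIO(toy_blast), delimiter="\t"))
print(sorted(hits))
```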

Reference Publication(s)

"Oil palm genome sequence reveals divergence of interfertile species in Old and New worlds." Nature, 2013 Aug 15;500(7462):335-9. DOI: 10.1016/j.gpb.2022.11.002