TropGB: Tropical Crops Genome Database

Oil palm

Taxonomy: Angiosperms / Monocots / Commelinids / Arecales / Arecaceae / Arecoideae / Cocoseae / Elaeis

Introduction

1. Because palm oil contains more saturated fat than oils made from canola, corn, linseed, soybeans, safflower, and sunflowers, it withstands the extreme heat of deep frying and resists oxidation.

2. It contains no trans fat, and its use in food has increased as food-labelling laws have changed to specify trans fat content.

3. Human use of oil palms may date back about 5,000 years in coastal West Africa.

4. Elaeis guineensis is now extensively cultivated in tropical countries outside Africa, particularly Malaysia and Indonesia, which together produce most of the world supply.

5. Palm oil is typically considered the most controversial of the cooking oils, for both health and environmental reasons.

Genomic Version Information

Elaeis guineensis EG11

Genome Overview

Elaeis guineensis and E. oleifera are the two species of oil palm. E. guineensis is the most widely cultivated commercial species, and introgression of desirable traits from E. oleifera is ongoing. We report an improved E. guineensis genome assembly with substantially increased continuity and completeness, as well as the first chromosome-scale E. oleifera genome assembly. Each assembly was obtained by integration of long-read sequencing, proximity ligation sequencing, optical mapping, and genetic mapping. High interspecific genome conservation is observed between the two species. The study provides the most extensive gene annotation to date, including 46,697 E. guineensis and 38,658 E. oleifera gene predictions. Analyses of repetitive element families further resolve the DNA repeat architecture of both genomes. Comparative genomic analyses identified experimentally validated small structural variants between the oil palm species and resolved the mechanism of chromosomal fusions responsible for the evolutionary descending dysploidy from 18 to 16 chromosomes.

Genome Information

Genome size: 1.8 Gb
Total ungapped length: 1.7 Gb
Number of chromosomes: 16
Number of scaffolds: 29
Scaffold N50: 128.3 Mb
Scaffold L50: 7
Number of contigs: 37,553
Contig N50: 239.2 kb
Contig L50: 1,923
GC percent: 38.5
Genome coverage: 60.0x
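
For readers unfamiliar with the N50 and L50 statistics listed above, the following is a minimal sketch of how they are conventionally computed from a list of sequence lengths. The function name `n50_l50` and the toy lengths are illustrative assumptions, not part of the TropGB record or the published pipeline.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold or contig lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly.
    L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example (hypothetical lengths, total = 300):
# 80 + 70 = 150 reaches half of the total, so N50 = 70 and L50 = 2.
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```

Read this way, a scaffold L50 of 7 with a scaffold N50 of 128.3 Mb means that the seven longest scaffolds account for at least half of the assembled sequence, and the shortest of those seven is 128.3 Mb long.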

Sequencing, Assembly, and Annotation

The E. guineensis (AVROS pisifera fruit form) and E. oleifera reference genomes were sequenced to 60× and 48× coverage of reads ≥ 10 Kb by PacBio Sequel sequencing, respectively. The specific oil palms used for genome sequencing are the same as those used for the initial E. guineensis reference genome assembly (P5) and E. oleifera draft reference genome (O7) (Singh, Ong-Abdullah, et al. 2013). Both species underwent proximity ligation sequencing (HiC and Chicago) by Dovetail Genomics, as well as Bionano optical mapping (McDonnell Genome Institute at Washington University in St. Louis, MO, USA).

In preparation for the assembly, E. guineensis PacBio reads were filtered to retain only reads of 10 Kb or longer. The ≥ 10 Kb reads were then subsampled to 60× coverage of the expected 1.8-Gb genome size. SMRT Link, the PacBio analysis web application, was then used to run the Falcon pipeline, resulting in the P11a genome build. This assembly was polished using Quiver and then sent to Dovetail Genomics for Chicago and HiC assembly, designated as the P11b and P11c builds, respectively. Bionano optical mapping was conducted using the Saphyr software to correct the assembly a final time, resulting in the P11d2 build (Supplementary Table 1). Bionano can use the physical DNA to estimate the distance between tagged sites and add a proportional number of N's for the gaps.

E. oleifera PacBio reads were loaded into SMRT Link and filtered down to the set of reads that were ≥ 10 Kb. The reads were further subsampled for use in WTDBG2, resulting in 3,078,930 reads with 47× coverage. WTDBG2 was then run according to the default pipeline, which involved aligning the reads to themselves to create a consensus. The consensus was then aligned with 8 Kb Illumina linker libraries, and a new consensus was called. This polishing step was carried out twice. The resulting O12a assembly was then sent to Dovetail Genomics for Chicago (O12b build) and HiC (O12c build) assembly using HiRise. At this point, the original assembly was broken and joined according to the long-distance read information. The O12c assembly was then corrected using Bionano scaffolding, where the scaffolds were broken and joined according to the optical map, resulting in the O12d build. The O12d build was polished with Illumina linker library reads to generate the O12e2 build (Supplementary Table 1).

The resulting assemblies were quality control tested using Merqury (Rhie et al. 2020) and the LTR Assembly Index (LAI) program (Ou et al. 2018). The LAI program uses the output of LTRharvest (Ellinghaus et al. 2008), LTR_FINDER_parallel (Ou and Jiang 2019), and LTR_retriever (Ou and Jiang 2018) to estimate the LTR assembly index. LTR_FINDER_parallel and LTRharvest were run using default values. BUSCO5 (genome mode) (Seppey et al. 2019) analysis was carried out using the Liliopsida profiles.
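
The read-selection step described above for E. guineensis (keep reads of at least 10 Kb, then subsample to 60× of the expected 1.8-Gb genome, i.e. about 108 Gb of sequence) can be sketched as follows. The source does not say how the subsample was drawn, so random selection is an assumption here, and `subsample_to_coverage` and its parameters are hypothetical names used only for illustration; the actual work was done with PacBio tooling, not this code.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, target_coverage,
                          min_length=10_000, seed=0):
    """Pick reads >= min_length until target_coverage x genome_size bases are reached.

    Returns (chosen_indices, total_bases). Selection order is random,
    which is an assumption; the source does not describe the method.
    """
    # Step 1: length filter (reads of at least min_length bases).
    eligible = [i for i, n in enumerate(read_lengths) if n >= min_length]

    # Step 2: shuffle reproducibly, then accumulate reads up to the target.
    rng = random.Random(seed)
    rng.shuffle(eligible)
    target_bases = genome_size * target_coverage

    chosen, total = [], 0
    for i in eligible:
        if total >= target_bases:
            break
        chosen.append(i)
        total += read_lengths[i]
    return chosen, total


# Toy usage with made-up read lengths and a made-up 100-kb "genome".
toy_reads = [5_000, 12_000, 15_000, 8_000, 20_000, 11_000] * 10
picked, bases = subsample_to_coverage(toy_reads, genome_size=100_000,
                                      target_coverage=2)
print(len(picked), bases / 100_000)
```

If the eligible reads do not add up to the target, the function simply returns all of them, so the achieved coverage should always be checked against the requested value.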

Gene ontology, enzyme code, EggNOG orthologs, and KEGG pathway annotations of the transcripts were determined using the following procedure. Protein sequences translated from the CDS of transcripts were searched for protein homology using BLASTP (e-value cutoff: 1e-5) against the GenBank RefSeq protein database, followed by the GenBank non-redundant (nr) protein database for gene models with no significant hit to RefSeq sequences. The BLASTP searches were limited to proteins that were listed under the embryophyte (txid3193) but not Elaeis (txid51952) taxonomy, and not annotated as hypothetical, predicted, uncharacterized, unnamed, unknown, low quality, or partial genes. The filtered RefSeq and nr datasets downloaded on 18-Feb-2022 consisted of 4,339,261 and 8,884,589 proteins, respectively. InterProScan (Jones et al. 2014) was used to search for protein functions in the following databases: CDD-3.18, Coils-2.2.1, Gene3D-4.3.0, Hamap-2020_05, MobiDBLite-2.0, PANTHER-15.0, Pfam-33.1, Phobius-1.01, PIRSF-3.10, PIRSR-2021_02, PRINTS-42.0, ProSitePatterns-2021_01, ProSiteProfiles-2021_01, SFLD-4, SignalP_EUK-4.1, SMART-7.1, SUPERFAMILY-1.75, TIGRFAM-15.0, and TMHMM-2.0c. The BLASTP and InterProScan results in XML format were imported into OmicsBox to determine the final gene annotation.
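
The homology search logic described above (an e-value cutoff of 1e-5, exclusion of uninformative protein descriptions, and a fallback from RefSeq to nr) can be illustrated with a small sketch. This is not the authors' implementation: in the source the description filter was applied when building the databases, whereas here it is applied to hits for simplicity. The hit representation as `(description, evalue)` tuples, the function names, and the use of `<=` for the cutoff are all assumptions.

```python
# Terms the source lists as excluded from the RefSeq and nr search sets.
EXCLUDED_TERMS = ("hypothetical", "predicted", "uncharacterized", "unnamed",
                  "unknown", "low quality", "partial")
EVALUE_CUTOFF = 1e-5  # BLASTP cutoff stated in the source


def is_informative(description):
    """True if the description contains none of the excluded terms."""
    text = description.lower()
    return not any(term in text for term in EXCLUDED_TERMS)


def best_hit(hits):
    """Return the lowest e-value informative hit passing the cutoff, or None.

    hits is a list of (description, evalue) tuples (hypothetical format).
    """
    usable = [h for h in hits
              if h[1] <= EVALUE_CUTOFF and is_informative(h[0])]
    return min(usable, key=lambda h: h[1]) if usable else None


def annotate(refseq_hits, nr_hits):
    """Prefer a RefSeq hit; fall back to nr only when RefSeq gives none."""
    hit = best_hit(refseq_hits)
    if hit is not None:
        return "RefSeq", hit
    hit = best_hit(nr_hits)
    if hit is not None:
        return "nr", hit
    return None


# Toy usage with invented hits: the RefSeq hits fail the filters,
# so the annotation falls back to the nr database.
refseq = [("hypothetical protein", 1e-50), ("beta-ketoacyl-ACP synthase", 1e-3)]
nr = [("acyl-ACP thioesterase", 1e-20), ("PREDICTED: kinase", 1e-80)]
print(annotate(refseq, nr))
```

Substring matching is deliberately crude: it would also reject a description such as "partial-length" or "predicted" used in another sense, so a production filter would need more careful pattern matching.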

Reference Publication(s)

Low EL et al., "Chromosome-scale Elaeis guineensis and E. oleifera assemblies: comparative genomics of oil palm and other Arecaceae." G3 (Bethesda), 2024 Sep 4;14(9). DOI: 10.1093/g3journal/jkae135