HUA Jin-feng ,ZHANG Lei ,HAN Yong-hua ,GOU Xiao-wan ,CHEN Tian-yuan ,HUANG Yong-mei,LI Yan-qing,MA Dai-fu,LI Zong-yun#
1 Institute of Integrative Plant Biology/Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics/School of Life Sciences,Jiangsu Normal University,Xuzhou 221116,P.R.China
2 Sweet Potato Laboratory,Institute of Maize Research,Guangxi Academy of Agricultural Sciences,Nanning 530007,P.R.China
3 Xuzhou Academy of Agricultural Sciences/Institute of Sweet Potato Research,Chinese Academy of Agricultural Sciences,Xuzhou 221131,P.R.China
Abstract Cylas formicarius is one of the most important pests of sweet potato worldwide,causing considerable ecological and economic damage. This study improved the effect of comprehensive management and understanding of genetic mechanisms by examining the functional genomics of C.formicarius. Using Illumina and PacBio sequencing,this study obtained a chromosome-level genome assembly of adult weevils from lines inbred for 15 generations. The high-quality assembly obtained was 338.84 Mb,with contig and scaffold N50 values of 14.97 and 34.23 Mb,respectively. In total,157.51 Mb of repeat sequences and 11 907 protein-coding genes were predicted. A total of 337.06 Mb of genomic sequences was located on the 11 chromosomes,accounting for 99.03% of the total length of the associated chromosome. Comparative genomic analysis showed that C.formicarius was sister to Dendroctonus ponderosae,and C.formicarius diverged from D.ponderosae approximately 138.89 million years ago (Mya). Many important gene families expanded in the C.formicarius genome were involved in the detoxification of pesticides,tolerance to cold stress and chemosensory system. To further study the role of odorant-binding proteins (OBPs) in olfactory recognition of C.formicarius,the binding assay results indicated that CforOBP4-6 had strong binding affinities for sex pheromones and other ligands. The high-quality C.formicarius genome provides a valuable resource to reveal the molecular ecological basis,genetic mechanism,and evolutionary process of major agricultural pests;it also offers new ideas and new technologies for ecologically sustainable pest control.
Keywords: Cylas formicarius,PacBio sequencing,high-through chromosome conformation capture,chromosome-level genome,chemosensory genes,fluorescence competitive binding
Brentidae,members of which are also called primitive weevils,includes over 11 000 described extant species,including many of the world’s insect pest species(Bouchardet al.2011;Gunteret al.2016;Schon and Skuhrovec 2016). Weevils are responsible for substantial damage to food and cash crops,causing severe reductions in crop yields and considerable economic loss.Sweet potato weevil (SPW),CylasformicariusFabricius(Coleoptera: Brentidae),is a major destructive pest that causes drastic yield decline,resulting in a loss of millions of dollars annually (Kyerekoet al.2019). Although olfaction-based strategies have been used to prevent and control infestations of the SPW as part of integrated pest management (IPM) programmes (Coffeltet al.1978;Heathet al.1986),they exhibit a unique ability to damage sweet potato throughout tropical and subtropical regions of the world (Kyerekoet al.2019). To provide a wealth of information to improve the effect of comprehensive management and understand the molecular ecology and evolution of this species,the genetic functions ofC.formicariushave been the subject of intensive study.
Sweet potato (Ipomoeabatatas[L.] Lam),the seventh most important crop in the world and the fourth most significant crop in China,is an important source of calories,proteins,vitamins,and minerals for humans(Yanget al.2017). Sweet potato has immense potential to play a major role in human nutrition,food security,and poverty alleviation in developing countries (Bovell-Benjamin 2007).Cylasformicariusis the major destructive pest of sweet potato throughout Africa,Asia,the Pacific islands,the Caribbean,the USA,Venezuela,and Guyana(Kyerekoet al.2019);it has been found in higher-latitude areas as well (Sutherland 1986a). AlthoughC.formicariusprefers sweet potato,more than 30 species ofIpomoeaand other genera have been recorded as its host plants(Sutherland 1986b;Austinet al.1991). In southern China(Jiangsu,Zhejiang,Jiangxi,Hunan,Fujian,Guizhou,Sichuan,Yunnan,Guangxi,Guangdong,Hainan,and Taiwan),C.formicariuscan produce several generations per year and overwinter in storage or open fields (Maet al.2016). After originating in the Indian subcontinent approximately 80-100 million years ago (Mya) (Wolfe 1991),C.formicariusfirst became associated with sweet potato,which originated in or near northwestern South America at the beginning of the sixteenth century.Cylas formicariuswas first described in 1792-1794 by Fabricius from Trenquebar (India),and it was first reported as an economic pest in 1857 (Sutherland 1986b). Over the course of 150 years of research,many studies onC.formicariusmanagement and control have been carried out,including studies on agricultural measures,chemical and biological control,host plant resistance,insect sterilization techniques,and IPM.
Furthermore,the sex pheromone ofC.formicarius,(Z)-3-dodecen-1-yl(E)-2-butenoate,was first extracted in 1978 (Coffeltet al.1978),and the bioactivity of the synthetic compound was tested in both laboratory and field experiments in 1986 (Heathet al.1986). Olfactionbased approaches,using synthetic sex pheromones to monitor and interfere with the pests’ ability to find suitable mates,have been used successfully in “pushpull” control strategies (Hleremaet al.2017). However,there are no effective control strategies for application in the production of sweet potato becauseC.formicariuspopulations have overlapping generations,are active throughout the year,have unknown larval feeding behaviour,fly short distances,and are mainly nocturnal as adults (Kyerekoet al.2019). Generally,in quarantine areas worldwide,C.formicariuscauses extensive loss of sweet potato (Kyerekoet al.2019). Thus,there is an urgent need to develop alternative pest control methods. So far,the development of different but related gene expression patterns has been reported based on transcriptome analysis (Maet al.2016;Binet al.2017);however,genomic research onC.formicarius,including on the mechanism of environmental adaptation and the molecular mechanism of olfactory recognition,has been very limited.
The present study reported a chromosome-level genome assembly ofC.formicariususing Illumina pairedend (PE) sequencing,Pacific Biosciences (PacBio) long reads and high-throughput chromosome conformation capture technology (Hi-C) chromatin interaction maps.The high-quality genome sequence will provide a strong foundation for the biological study ofC.formicarius,which will advance the knowledge of the mechanisms of molecular evolution,host-plant specialization,ecological adaptation,and innovative pest control.
Cylasformicarius(National Center for Biotechnology Information (NCBI) Taxonomy ID: 2 611 543) (Fig.1) was collected from Nanning,Guangxi,China,followed by 15 generations of single-pair mating with fresh sweet potato roots at a temperature of (27±1)°C under (60±5)% relative humidity (RH) and a 16 h:8 h light-dark (L:D) photoperiod at the School of Life Sciences,Jiangsu Normal University,China.
Fig.1 Life cycle of Cylas formicarius.
For the genome sequencing,weevils were anaesthetized,and midguts were removed to avoid contamination by genomes from gut microbes and host plants ofC.formicarius. High-molecular-weight genomic DNA was isolated from approximately 400 female weevils using a TreliefTM Animal Genomic DNA Kit (Tsingke Biological Technology,China),and the DNA quality and quantity were assessed using a NanoDrop? spectrophotometer(Thermo Fisher Scientific,USA) and a Qubit?3.0 Fluorometer (Invitrogen,USA). The extracted DNA was then used to construct Illumina libraries and PacBio RSII libraries. PE genomic libraries with an insertion length of 270 bp were constructed from fragmented DNA and sequenced on an Illumina HiSeq X Ten platform (Illumina,San Diego,CA,USA) according to the Illumina TruSeq Nano DNA Library Prep Kit. For long-read sequencing,single-molecule real-time (SMRT) cell 20-kb DNA libraries were constructed on a PacBio Sequel sequencer (Pacific Biosciences,Menlo Park,CA,USA) according to the standard PacBio protocol;one movie of the SMRT cells was acquired at Biomarker Technologies Corporation(Beijing,China). The original data were subjected to strict quality control before assembly. For Illumina data,we used the Trimmomatic programv.0.33 (Trimmomatic,RRID:SCR-011848),with default parameters,to remove adaptor sequences and trim low-quality reads. Lowquality bases from the starts or ends of raw reads were trimmed,and reads were scanned using a 5-bp sliding window and trimmed when the average quality per base dropped below 40. The clean reads obtained from this process were used for subsequent analysis.
For the genomic survey,the sequence data from 270-bp PE libraries were employed to analyse thek-mer(k=19) depth frequency distribution map. The size of theC.formicariusgenome was estimated as follows:G=K-num/K-depth,where G represents the genome size,K-num is the total number of 19-mers,and K-depth is the averagek-mer depth. The 19-mer peak was at a depth of 61X. The estimated genome size was used to obtain the subsequent genome assembly results.
The PacBio long-read data were assembled using an overlap-layout-consensus method. First,the longer reads were selected,corrected,and then used to obtain a draft assembly. Second,the draft assembly was polished. PacBio long reads were corrected using the Canu Program (v.1.7) (Korenet al.2017). To obtain their maximum supported range,error-corrected reads were obtained by trimming unsupported bases and hairpin adapters with default parameters,and then the draft assembly was generated. PacBio data were assembled into contigs with the Wtdbg Program (Ruan and Li 2020). To further improve the accuracy of the assembly and evaluate the genomic integrity,two rounds of consensus correction were performed using Illumina reads mapped with Burrows-Wheeler Aligner (BWA v.0.7.10-r789) and Pilon (Pilon,RRID:SCR-014731)Software. The Core Eukaryotic Genes Mapping Approach (CEGMA) analysis with CEGMAv.2.5 (http://korflab.ucdavis.edu/Datasets) (Parraet al.2007) was then conducted. Finally,we performed Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis using BUSCOv.2.0 (http://busco.ezlab.org) (Simaoet al.2015).
The present study constructed the Hi-C fragment library ranging from 300 to 700 bp by following previously described protocols (Raoet al.2014;Naganoet al.2015).Insect tissue was fixed with 2% (v/v) formaldehyde in PBS,and the DNA was digested withHindIII. The sticky ends were biotinylated and proximity-ligated to form chimaeric junctions,which were enriched and further sheared into 300-700 bp fragments by sonication.These chimaeric fragments were sequenced using a PE strategy on the Illumina HiSeq X Ten Platform at Biomarker Technologies Corporation (Beijing,China). To obtain clean data,adapter sequences and low-quality PE reads were removed using the Trimmomatic Program v.0.33 with default parameters. Then,the clean reads were mapped using BWAv.0.7.10-r789 (Li and Durbin 2009). Only uniquely aligned PE reads were considered for subsequent analysis. Identification and filtering of the invalid read pairs,sorting,and quality assessment were performed using HiC-Pro (v.2.11.1) (Servantet al.2015).Lachesis Software was employed to use the verified data to group,cluster,sort and orient the contigs into chromosome-level sequences (Burtonet al.2013).
We used the RepeatScoutv.1.05,PILER-DFv.2.4,LTR-FINDERv.1.05 and MITE-Hunterv.1.0.0 software packages to construct ade novorepeat library with default parameters. Repeat sequences are less well conserved among species and play an important role in genome evolution (Treangen and Salzberg 2012). First,we used TRF (v.4.09),RepeatMasker (v.3.3.0;RepeatMasker,RRID:SCR 012954) and Repeat Protein Mask (v.3.3.0)to detect repeat sequences and classify different types of repetitive sequences by aligning genome sequences to the Repbase library (v.17.01) (Baoet al.2015). Next,a RepeatModeler analysis was conducted on thede novolibrary,and RepeatMasker (v.4.0.6) was used (Tarailo-Graovac and Chen 2009). Then,PASTEClassifier was used to classify the repeat libraries,and the Repbase database was used to merge the libraries. Finally,RepeatMaskerv.4.0.6 was used to identify the repeat regions by aligning sequences against the Repbase andde novorepeat libraries.
After masking the repeat sequence,de novo-based,homologue-based,and RNA sequence-based gene methods were employed for gene prediction in theC.formicariusgenome assembly. We first used the software programs Genscanv.3.1,Augustusv.3.0(Augustus,RRID: SCR_008417),GlimmerHMMv.3.0.4,SNAP v2006-07-28 and GeneIDv.1.4 forde novoprediction. The protein sequences of the coleopteran insectsC.formicarius,Anoplophoraglabripennis,D.ponderosae,Oryctesborbonicus,Triboliumcastaneum,andDrosophilamelanogasterwere downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/genome/),and the homology-based prediction was performed using GeMoMav.1.3.1 (Keilwagenet al.2016) as a reference.
In the process of RNA sequence-based gene prediction,11 RNA samples were extracted from postlarvae and 3-day-old adult tissues (antennae,heads,legs,thoraxes,and abdomens) of males and females using a SV Total RNA Isolation System Kit (Promega Madison,WI,USA). The RNA-seq reads ofC.formicariusfrom Illumina sequences were assembled with the reference genome using Hisatv.2.0.4 and Stringtiev.1.2.3. After filtering,TransDecoderv.2.0,GenemarkS-Tv.5.1,and Program to Assemble Spliced Alignments (PASA,RRID:SCR_014656) were used for RNA-seq-based gene prediction. Finally,the results of gene annotation from the three approaches were integrated by EVidenceModeler(EVM)v.1.1.148 (EVM,RRID:SCR_014659) (Haaset al.2008).
According to the structural characteristics of different non-coding RNAs,different strategies (rRNA,microRNA,and tRNA prediction) were adopted to predict theC.formicariusgenome. The rRNAs were predicted using RNAmmerv.1.2 Software by aligning theC.formicariusgenome to the Rfam database (release 13.0). The tRNAs were identified using tRNAscan-SEv.1.3.1 (tRNAscan-SE,RRID: SCR_010835) Software with default parameters for eukaryotes. Based on the miRBase database,Infinal 1.1 was used to predict microRNAs. The results of non-coding RNA annotation are presented.
For the annotation of pseudogenes,Genblasta v.1.0.4 was used to search for sequences homologous to the known protein-coding genes in theC.formicariusgenome(Sheet al.2009). The premature termination codons or frameshift mutations located in the above sequences were identified,and pseudogenes were obtained using Genewisev.2.4.1 (RRID:SCR 015054) (Birneyet al.2004). A total of 503 pseudogenes were annotated in the genome ofC.formicarius.
Gene functional annotation was performed by alignment to the Eukaryotic Orthologous Groups of proteins (KOG)database,nucleotide collection (nr/nt),TrEMBL database,Kyoto Encyclopedia of Genes and Genomes (KEGG),and Swiss-Prot protein knowledgebase (https://www.uniprot.org/) using Basic Local Alignment Search Tool (BLAST)v.2.2.31 and KAASv.2.1. Furthermore,InterProScan v.5.8-49.0 (RRID:SCR 005829) was used to annotate conserved functional motifs and protein domains,and the functional annotations were aligned to the following databases: PROSITE (RRID: SCR003457),PRINTS(RRID: SCR 003412),SUPERFAMILY,PANTHER(RRID: SCR 004869),TIGRFAMs,SMRT 4.0 (RRID:SCR 005026),PIRSF,ProDom (RRID: SCR 006969),Pfam(RRID: SCR 004726),HAMAP,and CATH-Gene3D.
This study predicted orthologs and inferred a phylogenetic tree using the whole-genome sequence ofC.formicariusand 15 published whole-genome sequences,namely,those ofT.castaneum(Tribolium Genome Sequencing Consortium 2008),Onthophagustaurus(Choiet al.2010),D.ponderosae(Keelinget al.2013),A.glabripennis(McKennaet al.2016),O.borbonicus(Meyeret al.2016),Agrilusplanipennis,Bombyxmori(Xiaet al.2004),Apismellifera(Honeybee Genome Sequencing Consortium 2006),Locustamigratoria(Wanget al.2014),D.melanogaster(Gelbart 1992),Acyrthosiphonpisum(Keith 2010),Pediculushumanus(Pittendrighet al.2006),Cimexlectularius(Rosenfeldet al.2016),Zootermopsisnevadensis(Terraponet al.2014),andCaenorhabditiselegans(Consortium 1998).This study aligned all the protein sequences translated from the longest transcripts of each pairwise gene using BLASTP (E-value cut-off of 1E-5). The BLASTP results were used to cluster gene families and 1:1 orthologous gene sets in OrthoMCL. Multiple alignments were performed for each orthologue group of the coding sequences of the single-copy families using MAFFT.The sequences of 1:1 orthologous genes were aligned using MUSCLE with default parameters. The alignments were cleaned using Gblocks with the parameters:-t=pb5=h-b4=5-b3=15-d=y-n=y. Using the orthologous single-copy genes of the 16 species,the genes in each species were connected to obtain super-sequences for phylogenetic tree building. Maximum-likelihood phylogenetic analysis (MLPA) with 1 000 bootstrap repeats and discrete gamma distribution across sites was performed by PhyML3.0 Software.Caenorhabditis eleganswas used as an outgroup. The CodeML model in MLPA and a branch site model were applied to analyze the selective pressure of single-copy genes in each species. The functional annotation and enrichment analysis of the obtained rapidly evolving genes were carried out using GO and KEGG,respectively. Based on the phylogenetic tree,divergence time was estimated using MCMCTREE in the PAML package. The TimeTree database and divergence times were applied to calibrate the time. The fossil calibration used in the evolutionary trees was as follows: (Tcas,Dpon)’<234>208’ and (Dmel,Cele)’<743>728’. We assessed the convergence of the independent runs by a comparison of likelihood scores and model parameter estimates in TRACERv.1.5.
The most recent common ancestor of the 16 species was used in an analysis of expansion and contraction.OrthoMCL (v.2.0) was used to cluster the homologous groups of the 16 species. Detection for the expansion and contraction of orthologous gene families were performed with Computational Analysis of gene Family Evolution(CAFEv.4.1) (De Bieet al.2006),which uses birth-death models to infer if the gene tree supports gene gain or loss.
To assign functions to the newly annotated genes,thehsps,chemosensory,and insecticide resistance gene families were manually annotated using the NCBI BLAST program withT.castaneumsequences as queries (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The protein sequences ofhspsand insecticide resistance genes ofT.castaneum(Tcas),A.glabripennis(Agla),D.ponderosae(Dpon),A.planipennis(Aplan),andO.taurus(Otau) were downloaded from GenBank (Zhanget al.2020). The protein sequences of olfactory receptors (ORs) and OBPs ofT.castaneum(Tcas),D.ponderosae(Dpon),Anoplophorachinensis(Achi),Sitophiluszeamais(Szea),andA.glabripennis(Agla) were downloaded from GenBank (Mitchellet al.2020). The amino acid sequences were used to construct an maximum-likelihood(ML) tree. The phylogenetic tree was constructed in MEGA X10.0 using the ML method (Tamuraet al.2011)with a suitable evolutionary model. Finally,the tree was annotated using Evoview Software.
The tissue-specific expression profiles ofCforOBP4-36were evaluated by qRT-PCR analysis. cDNA was generated following the instructions of the PrimeScript RT reagent Kit with gDNA Eraser (TaKaRa,Beijing,China).Three reference genes,namely,Cforβ-actin(GenBank accession MH 716465),CforGAPDH(GenBank accession no.MT 512411),andCforUBE4A(GenBank accession no.MT 512412),were used as the controls (Huaet al.2021b).The specific primers are listed in Appendix A. qRT-PCR was performed using TB Green PremixEx TaqII (TaKaRa,Beijing,China) in a CFX96 Real-Time PCR Detection System (Bio-Rad,Richmond,CA,USA). Thermal cycling was performed using the following parameters: initial denaturation at 94°C for 30 s,followed by 40 cycles of 94°C for 5 s and 60°C for 30 s. A standard curve was generated according to the dilution concentration and the corresponding CT value,and the amplification efficiency was calculated by the equation E=10-1/slope. The means and standard errors were calculated for three biological replicates with three technical replicates of each tissue and control. The relative CT values were analyzed using the 2-ΔΔCTmethod (Huaet al.2021a).
In order to obtain purified protein,recombinant CforOBP4-6 proteins were expressed inEscherichia colicells. Prokaryotic expression primers (Appendix A)were designed for CforOBP4-6 (the signal peptide was removed) and contained theBamHI andEcoRI restriction sites according to the methods described in a previous study (Huaet al.2021a). The positive clone was cultured in 5 mL of Luria-Bertani (LB) liquid medium containing kanamycin (50 μg mL-1) and grown overnight at 37°C with shaking (220 r min-1). The culture was then diluted to 500 mL in LB medium and cultured at 37°C. Recombinant protein expression was induced by the addition of IPTG at a final concentration of 1 mmol L-1when the OD (600 nm)of the culture reached 0.6 to 0.8. After 4 h of incubation at 28°C with shaking (180 r min-1),the cells were harvested by centrifugation (12 000 r min-1,4°C,10 min)and sonicated in binding buffer (20 mmol L-1sodium phosphate,0.5 mol L-1NaCl,20 mmol L-1imidazole;pH 7.4). The recombinant CforOBP4-6 proteins were purified using a Ni-ion affinity chromatography column(HisTrap HP;GE Healthcare,Piscataway,NJ,USA).The His-tag was removed using enterokinase solution(2 U mg-1) (Sigma St Louis,MO,USA) at 24°C for 4 h.The purified protein was analyzed by 15% SDS-PAGE and then dialyzed using a HiTrap desalting column (GE Healthcare,Piscataway,NJ,USA). The concentration of the protein was measured with the BCA Assay Kit(Sangong,Shanghai,China). The purified protein was stored at -80°C.
A fluorescence displacement binding assay was performed to determine the affinity of CforOBP4-6 for 102 volatiles according to the methods described in a previous study(Huaet al.2021b). The ligand-binding experiment was carried out on an F-4700 fluorescence spectrophotometer(Hitachi,Japan) using N-phenyl-1-naphthylamine (1-NPN)as a fluorescent probe with excitation at 337 nm,and emission spectra were recorded from 400 to 550 nm. 1-NPN and all the ligands were dissolved in methanol. The binding constants for 1-NPN were measured by adding 1-NPN to 2 μmol L-1CforOBP4-6 in 50 mmol L-1Tris-HCl buffer (pH 7.4) to achieve final concentrations ranging from 2 to 20 μmol L-1. The binding affinities of the ligands were measured using 1-NPN as a fluorescence probe with a stoichiometry of 1:1 (ligand:protein). The binding affinities of ligands were obtained from an average of three independent experiments with a stoichiometry of 1:1 (1-NPN and CforOBPs). The CforOBP4/1-NPN dissociation constant(Kd) and the curves were calculated from the relative Scatchard plots using GraphPad Prism 8 Software(GraphPad,La Jolla,CA,USA). The binding affinity(Ki) of the competitors was calculated based on the IC50values using the equation:
where [1-NPN] is the free concentration of 1-NPN andK1-NPNis the dissociation constant of the CforOBP4/1-NPN complex.
The genome size ofC.formicariuswas estimated by k-mer analyses of the Illumina DNA data (Appendix B). The total number ofk-mers obtained from the sequencing data was 24 314 168 078. After removing abnormalk-mers,a total of 22 494 390 501k-mers were used to evaluate genome size and characteristics,and theC.formicariusgenome size was estimated to be 364.51 Mb (Appendix C).Thek-mer distribution analysis showed that the repetitive sequence content was approximately 28.60%.There was no obvious heterozygous peak. A quality check did not find plant and microbe contamination.These characteristics implied that the isogenic line ofC.formicariusis a simple genome,which is conducive to the assembly of a genome.
A combination of sequencing strategies was used: long reads (PacBio) for a primary assembly with short reads(Illumina) for polishing and chimeric reads from a Hi-C library to a super scaffold (Appendices D and E). A 338.84 Mb assembly was obtained forC.formicariuswith a contig N50 length of 14.97 Mb and the longest contig length of 27.31 Mb (Table 1).
Table 1 Summary of assembly results of different methods for Cylas formicarius
The completeness of the draft genome was evaluated with single-copy orthologous genes using BUSCO and analysis with CEGMAv.2.5. The CEGMAv.2.5 analysis showed that 98.03% of the 458 conserved core genes of eukaryotes in the CEGMA database were completely detected,and 94.76% of the 248 highly conserved core genes of eukaryotes were found in the assembled genome. Furthermore,98.87 and 0.56% of the 1 054highly conserved insect orthologues from BUSCOv.3.0.1 were identified as complete and fragmented,respectively,in the assembly (Appendix F). In total,approximately 98.01 of reads and 96.04% of proper reads were mapped to the assembled genome sequences (Appendix G).
This study obtained 32.83 Gb of clean reads from the Hi-C fragment library after filtering,representing 97-fold coverage of the draft genome (338.84 Mb). In total,80.04 Mb of unique mapped read pairs and 32.23 Mb of valid interaction pairs were generated (Appendix H). After error correction with Illumina PE sequencing,PacBio long-read sequencing and Hi-C interaction maps,this study obtained a final assembly of 337.06 Mb in size,comprising 221 contigs and 154 scaffolds,with a contig N50 of 13.21 Mb and a scaffold N50 of 34.23 Mb. Finally,a total of 337.06 Mb of genome sequences,accounting for 99.42% of the assembled draft genome,was anchored to 11 pseudo-chromosomes. More importantly,78 scaffolds,comprising 99.03% of the total sequence length,were ordered and oriented (Appendices I and J).These results indicated that the assembled draft genome ofC.formicariushad a high degree of continuity and completeness.
In total,theC.formicariusgenome was found to contain 157.51 Mb of repetitive sequences (approximately 46.46% of the assembled genome),of which 41.55%were transposable elements (TEs) (Appendix K). DNA transposons accounted for 21.61% of theC.formicariusgenome,and the most common classification assigned to these repetitive elements was terminal inverted repeat (TIR) (64.78 Mb of length,19.11% of assembly)(Appendix L).
This study used three different methods to predict proteincoding genes in theC.formicariusgenome:ab initio,RNA-seq-based method,and homology-based method.Then,EVM v.1.1.1 Software was used to integrate the prediction results. Finally,1 386 motifs and 24 493 protein domains were annotated in the genome ofC.formicarius.A total of 11 907 protein-coding genes were found in theC.formicariusgenome (Appendix M),divided into 9 291 gene families,75 of which were unique gene families(Appendix N). With the support of RNA-seq-based and homology-based gene prediction methods,the final prediction results showed 11 610 protein-coding genes,accounting for 97.51% of the total protein-coding genes(Appendix O),showing a good gene prediction effect on theC.formicariusgenome. Functional annotation statistics showed that 11 469 genes (96.32% of the predicted genes) were assigned to corresponding putative functions (Appendix P). Compared with other known coleopteran genomes,the number of genes in theC.formicariusgenome was similar to that inT.castaneum(12 841),O.taurus(14 402),D.ponderosae(12 102),A.glabripennis(14 533),O.borbonicus(14 402),andA.planipennis(11 373) (Appendix N). We also identified and annotated various non-coding RNA sequences,including 102 rRNAs,165 tRNAs,and 40 miRNAs(Appendix Q).
The phylogenetic analysis showed that there were fewer species-specific genes inC.formicarius(223) than in the other 15 species of insects,except the Coleopteran insectO.borbonicus(55) and Hymenopteran insectA.mellifera(141) (Fig.2). The 11 907 one-to-one homologous genes from the gene family analysis were used to infer the phylogeny. Finally,223 unigenes fromC.formicariuswere obtained,corresponding to 75 gene families. In total,11 011 orthologues were identified,which could be clustered with the other 15 insects,including 896 unclusters (Appendix N). Among them,the proportion of species-specific genes in the six coleopteran insect genomes ranged from 0.82%(O.borbonicus) to 7.38% (O.taurus). All bootstrap values of the nodes generated were above 90%,the majority being higher than 99%.Cylasformicariusand two other coleopteran insects clustered together.CylasformicariusandD.ponderosaediverged from the common ancestor ofA.glabripennis187.54 Mya,and the divergence time betweenD.ponderosaeandC.formicariuswas approximately 138.89 Mya (Fig.2).The coleopteran insects from the same order were clustered together and formed a clade,clearly sharing a common ancestor. The evolution time of the other insects was consistent with previous studies (McKennaet al.2016),and the differentiation time ofC.eleganswas the most primitive as an outgroup.
Fig.2 Phylogenetic trees and gene orthology of Cylas formicarius and six other coleopteran insects and model insects. Node values show the divergence times from the present. The 16 species used for comparison were Tribolium castaneum,Onthophagus taurus,Dendroctonus ponderosae,Anoplophora glabripennis,Oryctes borbonicus,Agrilus planipennis,Bombyx mori,Apis mellifera,Locusta migratoria,Drosophila melanogaster,Acyrthosiphon pisum,Pediculus humanus,Cimex lectularius,Zootermopsis nevadensis,Caenorhabditis elegans,and Cylas formicarius. The gene set of each species was subdivided into different types of orthology clusters. 1:1:1,universal single-copy gene families across all examined species;N:N:N,multiple-copy orthologous genes in common gene families;Specific,genes from a unique gene family from each species;Other,genes that do not belong to any abovementioned orthologous categories;Uncluster,genes that do not cluster with any families.
To elucidate key genomic changes associated with adaptation,a total of 132 846 gene families in the most recent common ancestor of the 16 species were obtained by analyzing the gene family expansion and contraction.The numbers of expanded and contracted gene families inC.formicariuswere 31 and 28,respectively(Appendices R and S). Compared withD.ponderosaeandA.glabripennis,C.formicariushad 27 expanded and 16 contracted gene families,demonstrating that the number of expanded genes inC.formicariushad increased significantly. This result indicated thatC.formicariusmay have experienced more duplication events thanD.ponderosae. It was found that these genes inC.formicariuswere also the most abundant based on the multicopy homologous gene number(Appendix R). In addition,we performed GO and KEGG enrichment analyses of these expanded and contracted genes in theC.formicariusgenome (Appendix T). We found lineage-specific expansion of genes related to the biosynthesis of chemosensory metabolites,which may affect the biosynthesis of olfaction-related proteins and improve the olfactory sensitivity ofC.formicarius(Appendix U).
Based on the assembled draft genome,comprehensive genomic approaches were used to identifyhspgenes. A total of 30hspgenes were detected,including 9hsp70,16 small heat shock protein (shsp),onehsp90,and onehsp100gene. Compared to the other five coleopteran insects (T.castaneum,A.glabripennis,D.ponderosae,A.planipennis,andO.taurus),thehspgene superfamily ofC.formicariusis the most in number,in particular,shspandhsp70had the mosthsps(Fig.3;Appendices V and W).
Fig.3 Phylogenetic tree of the Hsp family. The Hsp sequences were from Cylas formicarius (Cfor,red),Dendroctonus ponderosae(Dpon,brown),Agrilus planipennis (Apla,violet),Tribolium castaneum (Tcas,green),and Onthophagus taurus (Otau,purple).
This study examined genes important to neuromuscular target site sensitivity,tissue penetration,and prominent gene families;the functions of these genes have been demonstrated in insects. These include the cationgated nicotinic acetylcholine receptors (nAChRs),the γ-amino butyric acid (GABA)-gated anion channels and the histamine-gated chloride channels (HisCls),cuticular proteins,cytochrome P450 monooxygenases (CYPs),and the glutathioneS-transferases (GSTs). The number of P450s (87) was lower than inA.glabripennis(the Asian longhorned beetle,a globally significant invasive species) (McKennaet al.2016),T.castaneum(a model organism and omnivorous beetle),D.ponderosae(the mountain pine-boring beetle,a major forest pest) (Keelinget al.2013),andO.taurus(the horned beetle) (Choiet al.2010);it was higher thanA.planipennis(an invasive phloem-feeding insect pest of ash trees) (Appendix X).The number of COEs (93) was higher than inA.glabripennis,T.castaneum,D.ponderosae,O.taurus,andA.planipennis. The result of GSTs (26) was higher than inA.glabripennis,T.castaneum,O.taurus,andA.planipennisand was lower thanD.ponderosae. A total of 90 putative cuticle protein genes were lower than inA.glabripennis,T.castaneum,D.ponderosae,andO.taurusand were higher thanA.planipennis.
As an obvious comparison for the coleopteran insectC.formicarius,the Coleopteran insectA.glabripennishas 61 genes encoding OBPs,132 encoding odorant receptors (ORs),234 encoding gustatory receptors(GRs),72 encoding ionotropic receptors (IRs),17 encoding chemosensory proteins (CSPs),and four encoding sensory neuron membrane proteins (SNMPs)(McKennaet al.2016). We manually annotated 36 OBPs,154 ORs,46 GRs,39 IRs,13 CSPs,and 4 SNMPs in theC.formicariusgenome (Appendix U).
The notable exceptions are the OR and OBP families,in which a total of 190 components (154 ORs and 36 OBPs) were found,indicating massive gene expansion in theC.formicariusgenome. We compared the OR and OBP gene families involved in chemosensory activity betweenC.formicariusandT.castaneumand betweenD.ponderosaeandA.glabripennis(Appendix W).Cylasformicariushas 153 OR genes in addition to the highly conserved OR coreceptor CforOrco. These include representatives of all seven subfamilies of beetle ORs except group 4/6 and follow the pattern of frequent paralogous radiation typical of insect chemoreceptors. One new lineage of ORs was identified and placed as group 8 inC.formicarius(CforOR91-150). ManyC.formicariusORs are in tandem arrays (Fig.4) and are derived from recent expansions.Cylasformicariusmay thus harbour the larger identified insect OR repertoire because there are 46 ORs inA.planipennis,79 inD.ponderosae,121 inA.glabripennis,and 270 inT.castaneum(Mitchellet al.2020). The majority of identified OBPs comprise a large expansion of the minus-C subfamily,and the remaining genes encode the classic 6-cysteine motif and are placed alone or in a small radiation pattern. Five OBPs(CforOBP10,CforOBP11,CforOBP21,CforOBP22,and CforOBP24) were identified as members of the plus-C group and were identical toT.castaneum(TcasOBP6),D.ponderosae(DponOBP26),andA.glabripennis(AglaOBP15) (Fig.5).
Fig.4 Phylogenetic tree of the olfactory receptor family. The receptor sequences were from Cylas formicarius (Cfor,red),Dendroctonus ponderosae (Dpon,blue),Anoplophora glabripennis (Agla,green),Tribolium castaneum (Tcas,violet),and Onthophagus taurus (Otau,brown). The coloured arcs indicate the seven major coleopteran OR groups. To reduce tree size,the massively expanded coleopteran-specific OR lineages in former OR groups 1-7 are represented here by five ORs each. The sources of sequence data and explanation of receptor suffixes are detailed in Appendix W.
Fig.5 Phylogenetic tree of odorant binding protein family. The odorant binding proteins originated from Cylas formicarius (Cfor),Tribolium castaneum (Tcas),Anoplophora chinensis (Achi),Sitophilus zeamais (Szea),and Dendroctonus ponderosae (Dpon).
To obtain initial insights into expression differences among tissue samples,this study comparatively analyzed the expression ofCforOBPsin the main chemosensory tissue,the antennae,heads (the whole head capsule excluding the antennae),legs,thoraxes (excluding head and legs),and abdomens of males and females.
The expression of theCforOBPs was restricted to the main chemosensory tissues (antennae and heads)(Fig.6). The transcripts of 24 of the 33CforOBPs were significantly enriched in antennae (CforOBP4,CforOBP5,CforOBP7,CforOBP9,CforOBP10,CforOBP11,CforOBP12,CforOBP14,CforOBP15,CforOBP16,CforOBP17,CforOBP19,CforOBP20,CforOBP21,CforOBP22,CforOBP23,CforOBP24,CforOBP27,CforOBP28,CforOBP29,CforOBP32,CforOBP34,CforOBP35,andCforOBP36),whereas three were enriched in the mouthparts (CforOBP4,CforOBP5,andCforOBP11). Statistical analysis of the male and female antennal samples showed no significant difference;13 of the 24CforOBPs were significantly enriched in female antennae (CforOBP5,CforOBP7,CforOBP10,CforOBP14,CforOBP17,CforOBP19,CforOBP20,CforOBP21,CforOBP22,CforOBP29,CforOBP32,CforOBP34,andCforOBP35).Nevertheless,CforOBP16,CforOBP23,andCforOBP28showed more than five-fold overexpression in male antennae compared to female antennae (Fig.6). The fact that major and significant differences between males and females is consistent with anatomical data from the antennal lobe,where sexual dimorphism was found (Sutherland 1986b). Interestingly,seven of theCforOBPs were significantly enriched in the female abdomen (CforOBP10,CforOBP13,CforOBP19,CforOBP25,CforOBP30,CforOBP32,andCforOBP33).Most of theCforOBPs were expressed in the main chemosensory tissue,that is,antennae and heads,and only seven were significantly enriched in female antennae compared to female legs. Thus,these genes were most likely exclusively involved in chemosensory processing.
Fig.6 Expression profile of CforOBPs. Heatmap showing the relative expression level of CforOBP4-36 as log2 in different tissues(adult antennae,heads (missing antennae but including mouth parts),legs,thoraxes,and abdomens). The color change from red to purple indicates that the RNA expression level is from high to low. The asterisks mark statistically significant differentially expressed genes compared to the expression in female legs. The black asterisks represent upregulation,and the red asterisks represent downregulation (*,P<0.05;**,P<0.01;***,P<0.001).
The prokaryotic expression vector pET30a/CforOBP4-6 was successfully expressed inE.coli.Purified recombinant mature proteins were obtained by cleaving the His-Tag using enterokinase. The purified recombinant proteins were examined by SDS-PAGE(Appendix Y).
TheKdvalues for CforOBP4-6 bound to 1-NPN were (3.30±0.15),(3.07±0.19),and (3.49±0.25) μmol L-1,respectively (Fig.7-A). Representative 1-NPN displacement curves are displayed in Appendix Y. Based on these curves,the median inhibitory concentration(IC50,displacement of more than 50% of 1-NPN) and the reciprocal values of the dissociation constant (Ki) were calculated.
The binding test results indicated that theKivalues of recombinant CforOBP4-6 with the sex pheromones were (1.56±0.23),(1.06±0.22),and (1.35±0.09) μmol L-1,respectively (Table 2),indicating strong binding affinities. Among the 43 tested sweet potato volatiles,kaempferol-3-glucoside showed the highest binding affinity for CforOBP5,with aKiof (0.29±0.03) μmol L-1;it was followed by the affinity of kaempferol 3,7,4′-trimethyl ether for CforOBP5 (aKiof (1.37±0.03) μmol L-1) and those of tiliroside,1-aminoanthracene,and rhamnetin for CforOBP6 (Kivalues of (1.69±0.11),(7.05±0.31),and(8.11±0.25) μmol L-1,respectively) (Fig.7).
Table 2 Binding affinities of the CforOBP4-6 proteins to 102 chemical compounds
Fig.7 Competitive binding curves of CforOBP4-6 with various odorant compounds. A,affinity of CforOBP4-6 for the fluorescent probe N-phenyl-1-naphthylamine (1-NPN). B,binding of CforOBP4 to ligands. C,binding of CforOBP5 to ligands. D,binding of CforOBP6 to ligands. E,comparison of the binding ability [calculated as Ki-1 (reciprocals of the dissociation constants) values]of these three proteins with (Z)-3-dodecen-1-yl(E)-2-butenoate and 22 ligands that exhibited significant affinity (mean±SE;n≥3).
Fig.7 (Continued from preceding page)
Cylasformicariusis the most important pest of sweet potato and invades many of the main areas of sweet potato production throughout the tropics and subtropics(Capineraet al.1999). Its morphological,ecological,and physiological characteristics have become clear.The mitochondrial genome (Yang and Li 2019) and the transcriptome ofC.formicarius(Maet al.2016;Binet al.2017) have been previously studied. However,abundant genomic resources and genome-wide molecular markers ofC.formicariusand related species are still lacking.The genome sequence ofC.formicariusprovides a novel resource for Brentidae,many species of which are economically and ecologically important pests in agriculture. At the same time,it provides comparativedata for the genome sequence and provides the basis for evolutionary research on Coleopteran and other insects.
In this study,k-mer analysis was slightly larger than assembly genome size,which may be affected by data quality,software versions,and parameter settings. TheC.formicariusassembly genome size is similar to the mean assembly size of the previously published coleopteran genomes (ranging from 110 to 850 Mb) (Hleremaet al.2017) and is comparable to the genome sizes of elateroid beetles (Plastocerusangulosus) (367 Mb) (Kusyet al.2018)and blister beetles (Hycleusphaleratus) (308 Mb) (Wuet al.2018). These analyses indicated that theC.formicariusgenome obtained was a high-quality assembly,which was similar to ladybird beetle (Propyleajaponica) (Zhanget al.2020) and greenhouse whitefly (Trialeurodes vaporariorum)(Xieet al.2020). The proportion of repetitive sequences in theC.formicariusgenome was higher than that in other coleopteran genomes,such as theHypothenemus hampei(2.7% of assembly) (Vegaet al.2015),Nicrophorus vespilloides(12.85% of assembly) (Cunninghamet al.2015),Leptinotarsadecemlineata(17% of assembly)(Schovilleet al.2018),D.ponderosae(17 and 23% of assembly for males and females,respectively) (Keelinget al.2013),HycleuscichoriiandH.phaleratus(22.73 and 13.47% of assembly,respectively) genomes;it was similar to that in thePyrocoeliapectoralis(44.88% of assembly) (Fuet al.2017) andP.japonica(58.22%) (Zhanget al.2020)genomes.
This study reported the first draft genome sequence ofC.formicariusat the chromosome level using the Illumina and PacBio sequencing platforms and Hi-C technology.With recent developments in SMRT sequencing technology,chromosomal-level genome assembly using long-read sequencing strategies and Hi-C technology was also reported recently in insects (Yanget al.2020). The high-quality genome assembled in the present study could provide an important resource for the molecular ecological development ofC.formicariusand its related species.
In order to understand the functional genomic properties of insecticide resistance,the detoxification pathways ofC.formicariuswere explored. In the natural state,due to the hidden feeding habits ofC.formicariuslarvae and the nocturnal activities of the adults,control of this insect pest usually and heavily relies on traditional chemical insecticides,which increase the potential risk of pesticide residue and resistance (Huaet al.2021b). The large number of detoxification gene families expansion inC.formicariusindicates that these metabolic enzymes play an important role in detoxification (Zhanget al.2020).
An important determinant of insect abundance and distribution is temperature,with extremes of heat and cold eliciting adaptive induction of HSP gene expression required for survival. With global warming,the transportations of sweet potato seedlings and varieties,and the improvement of sweet potato storage conditions,C.formicariustend to migrate to high latitude (Sutherland 1986a). Hsps are essential molecular chaperones involved in numerous normal cellular processes and various kinds of environmental stress (Jinet al.2020). Cold hardening ofLiriomyzasativapupae andD.melanogasterpromotes the sHSP and HSP70 synthesis,suggesting sHSP and HSP70 are involved in the cold tolerance of insects (Huanget al.2009). The results implied that sHSP and HSP70 might be essential for the lower cold tolerance ofC.formicarius.
To better study insect behaviour,the olfactory mechanism ofC.formicariuswas explored. We annotated a complete set of chemosensory genes in the existingC.formicariusgenome. During interactions with the environment,C.formicariusgene families are probably involved in various sensory processes,including searching for food sources,locating mates and spawning sites,avoiding predators,exchanging information between individuals,and socializing among groups (Huaet al.2021b). Studies have shown that the male adults have a high affinity for sex pheromones released by the females(Coffeltet al.1978;Heathet al.1986),but the mechanisms of sex pheromone perception,especially ORs,have not been reported. Some key functional genes that appeared to expand inC.formicariuswere also identified;these genes may play a role in environmental adaptation (Wuet al.2019). These functional genes are associated with ORs,GRs,and IRs,as well as OBPs,CSPs,and SNMPs,revealing their ability to interact with a diverse chemical environment (Keelinget al.2013). The large numbers ofC.formicariusandT.castaneumORs are thought to be due to current or past difficulties in findings hosts and food (Wurmet al.2011). As has been suggested forSolenopsisinvicta(Wurmet al.2011),the large number forC.formicariusmay be due to the importance of chemical communication among weevils. OBPs constitute an essential family of genes also known to play roles in chemosensation inDrosophila(Xuet al.2005). Among these expanded genes,we identified ORs and OBPs that were inferred to mediate taste chemical neurotransmission and are potential targets for biological control (Satoet al.2008;Wurmet al.2011). At the same time,the binding characteristics of CforOBP4-6 were identified. The fluorescent competitive binding assay results indicated that CforOBP4-6 had strong binding affinities for sex pheromones and other ligands. In a previous study,CforOBP1-3 was also enriched in antennae,whereas CforOBP1 was also detected in the abdomen and legs.Functional characteristic analysis results suggested that CforOBP1-3 is involved not only in the reception of sex pheromones but also in the behaviour of searching for host plants (Huaet al.2021b). To further study the role of CforOBPs in olfactory recognition ofC.formicarius,we performed functional analysis of CforOBP4-6in vitro,which was highly expressed in the antennae.
These putative candidates will be particularly important for further research. For example,exploration of the function of ORs will help clarify the odorant recognition mechanism of insects and provide a theoretical basis for screening more effective behavioural interference factors at the molecular level. The findings provide a new strategy for pest management by regulating odorant perception behaviour;therefore,ORs are believed to play an important role in the recognition of volatile compounds by insects (Fleischeret al.2018) and are frequently used to study the relationship among ecological specialization,adaptability,and gene family evolution (Anderssonet al.2019;Mitchellet al.2020). In the future,the gene families that exhibited expansion or contraction will be further studied. The functional verification of these candidate species will help study the mechanism by which some invasive species adapt to other species and environments.
This study constructed the first high-quality chromosomelevel genome ofC.formicarius,performed a comparative genomic analysis between this species and 15 other species,and found that OR and OBP gene families were expanded inC.formicarius. These datasets provide a wealth of information for studying this species’ genetics and evolutionary mechanisms and valuable resources for further studies on the molecular mechanisms of stress resistance,allowing researchers to identify important functional genes and population genetic patterns ofC.formicarius. A complete genome sequence will advance the understanding of the molecular mechanisms underlying the tolerance processes to insecticides and abiotic and biotic stresses and will accelerate studies on population genetics,facilitating the development of IPM ofC.formicarius.
Acknowledgements
This research was supported by the Natural Science Foundation of Guangxi Autonomous Region,China(2022GXNSFAA035558),the Technology Development Foundation of Guangxi Academy of Agricultural Sciences(2021ZX09),the China Agriculture Research System of MOF and MARA (CARS-10-B3 and CARS-10-C19),the Guangxi Innovation Team Construction Project(nycytxgxcxtd-11-03),and the Priority Academic Program Development of Jiangsu Higher Education Institutions(PAPD),China.
Declaration of competing interest
The authors declare that they have no conflict of interest.
Appendicesassociated with this paper are available on http://www.ChinaAgriSci.com/V2/En/appendix.htm
Journal of Integrative Agriculture2023年3期