
    Comparative genomic analyses reveal cis-regulatory divergence after polyploidization in cotton

    2022-12-02 01:00:32
    The Crop Journal, 2022, Issue 6

    Jiaqi You a, Min Lin a, Zhenping Liu a, Liuling Pei a, Yuexuan Long a, Lili Tu a, Xianlong Zhang a, Maojun Wang a,*

    a National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, Hubei, China

    b School of Pharmaceutical Sciences, South-Central University for Nationalities, Wuhan 430074, Hubei, China

    Keywords: Cotton; Polyploidization; Transcriptional regulation; Fiber development

    ABSTRACT Polyploidization has long been recognized as a driver of the evolutionary formation of superior plant traits coupled with gene expression novelty. However, knowledge of the effect of regulatory variation on expression changes following polyploidization remains limited. In this study, we characterized transcriptional regulatory divergence by comparing tetraploid cotton with its putative diploid ancestors. We identified 144,827, 99,609, and 219,379 Tn5 transposase-hypersensitive sites (THSs) in Gossypium arboreum, G. raimondii, and G. hirsutum, respectively, and found that the conservation of promoter THSs was associated with coordinated expression of orthologous genes. This observation was consistent with analysis of transcription-factor binding sites (TFBSs) for 262 known motifs: genes with higher TFBS conservation scores (CS) showed less change in expression levels than genes with lower TFBS CS. TFBSs influenced by genomic variation were involved in novel regulatory networks between transcription factors and target genes in tetraploid cotton. We describe an example showing that the turnover of TFBSs was linked to expression pattern divergence of genes involved in fiber development (fiber-related genes). Our findings reveal the regulatory divergence of the transcriptional network in cotton after polyploidization and characterize the regulatory relationships of genes contributing to desirable traits.

    1. Introduction

    Polyploidization is a common phenomenon in plant evolution, merging two or more genomes into one nucleus and leading to variation in phenotype and gene expression [1-4]. In polyploids, gene expression and regulation are more complex than in diploids. The expression divergence of subgenomic homologous genes is thought to be mostly inherited from the parental donor species [5]. Gene expression in polyploids has characteristics such as "genome dominance" [3,6-8] and "homologous bias" [7,8].

    Wheat (Triticum aestivum), a major food crop and a natural allohexaploid, is used as a model to investigate homologous expression dominance. The asymmetric expression of genes associated with baking quality was identified by comparing the transcriptomes of endosperm cells in seven different tissues or developmental stages [9]. Transcriptome data from 22 tissues showed that 30% of homologous triads (composed of A, B, and D subgenome copies) displayed unbalanced expression [10]. In the F1 hybrid of Triticum turgidum (AABB) and Aegilops tauschii (DD), 32.6% of D-subgenome genes were down-regulated in comparison with their parental counterparts, but only 1.9% and 1.7% of genes were down-regulated in the A and B subgenomes, respectively [11]. Similar findings have been reported for other polyploids [12-16]. Although unbalanced epigenetic modification [17-20], homologous recombination [21], 3D genome architecture [22], reactivation of transposable elements [23], and small interfering RNA (siRNA) expression [24] have been shown to drive gene expression bias or novelty, the complexity of transcriptional regulation of subgenomic genes mediated by cis-regulatory elements in polyploids is still poorly understood.

    Transcription factors (TFs) regulate gene expression by binding cis-acting elements, a complex process because of its spatiotemporal specificity. In the past decade, accessible chromatin regions were found to have regulatory functions, as active regulatory regions can be accessible to TFs. High-throughput technologies, such as formaldehyde-assisted isolation of regulatory elements (FAIRE-seq) [25], micrococcal nuclease digestion of chromatin (MNase-seq) [26], and DNase I digestion followed by high-throughput sequencing (DNase-seq) [27], have been developed to identify accessible chromatin regions. Assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) interrogates regions of accessible chromatin by insertion of the hyperactive transposase Tn5. ATAC-seq peaks are highly consistent with DNase-seq peaks while requiring fewer cells, making the method useful for studies with limited material [28,29]. Unlike these methods, chromatin immunoprecipitation sequencing (ChIP-seq) [30] and DNA affinity purification sequencing (DAP-seq) [31,32] identify transcription factor binding sites via immunoprecipitation or affinity purification of DNA-protein complexes. Application of these high-throughput methods has provided a comprehensive map of TF binding activity [31-33]. Based on these methods, TF binding sequences were collected for de novo motif discovery [31,34-36]. Because of the evolutionary conservation of TF motifs among plants, TF binding sites in other plants can be inferred from motifs in TF motif databases, such as HOMER [37] and TRANSFAC [38].

    Cotton (Gossypium spp.) is a major source of natural textile fiber. Approximately 1-2 million years ago (MYA), new allotetraploid species (genome AADD) were created by an intergenomic hybridization event between A- and D-genome ancestor species [2,39]. The wild allotetraploid G. hirsutum has been domesticated into an annual crop, accounting for 90% of global cotton fiber production. In contrast to the putative donor species G. raimondii and G. arboreum, cultivated G. hirsutum produces long, white fibers [40].

    The unbalanced expression of orthologous genes has been identified in many tissues of diploid and tetraploid cotton species [41-45]. Between the two diploid ancestors, 42.0% to 50.0% of genes were differentially expressed in leaves [7]. In natural allopolyploid or F1 hybrid cotton, 59.4%-70.9% of orthologous gene pairs maintained the same expression bias mode as the pairs between parental diploids, whereas 7.3%-19.1% showed a novel bias mode [7]. In fiber development, more genes showed biased expression towards the D subgenome than towards the A subgenome [46]. The regulatory mechanisms underlying these changes are largely unknown. A previous study [47] indicated that G. hirsutum experienced asymmetric subgenome domestication, which affected cis-regulatory divergence.

    The objective of the present study was to characterize cis-regulatory divergence after polyploidization. We used ATAC-seq to identify accessible chromatin regions in G. arboreum, G. raimondii, and G. hirsutum. We also predicted TFBSs for 262 TF binding motifs in promoter sequences in the three species. By comparing the accessible chromatin regions and TF binding sites between orthologous genes, we found that gene pairs with conserved accessible chromatin regions or regulatory sequences exhibited less expression change after polyploidization. This study provides a resource for decoding the regulatory divergence of key genes associated with the formation of desirable traits in cotton.

    2. Materials and methods

    2.1. Plant materials and growth conditions

    Cotton plants (G. arboreum, G. raimondii, and G. hirsutum) were grown in a climate-controlled greenhouse at 28-32 °C under a 14-h day/10-h night photoperiod with identical management practices. Young leaves were collected and divided into two equal parts for ATAC-seq and RNA-seq. For each experiment, two biological replicates were used.

    2.2. Library construction for ATAC-seq

    For each species, 0.1 g of fresh leaves was collected and ground in liquid nitrogen. The powder was resuspended in 30 mL of precooled NPB buffer (20 mmol L⁻¹ MOPS, 40 mmol L⁻¹ NaCl, 90 mmol L⁻¹ KCl, 2 mmol L⁻¹ EDTA, 0.5 mmol L⁻¹ EGTA, 0.5 mmol L⁻¹ spermidine, 0.2 mmol L⁻¹ spermine) and shaken on ice for 15 min. The suspension was filtered through a double-layer Miracloth Magic Filter cloth, and the effluent was collected in a 50-mL Falcon tube and centrifuged at 3000×g for 10 min at 4 °C. The supernatant was discarded. The sediment was resuspended in 2 mL of precooled EB2 (0.25 mol L⁻¹ sucrose, 10 mmol L⁻¹ Tris-HCl pH 8.0, 10 mmol L⁻¹ MgCl2, 5 mmol L⁻¹ βME, 0.1 mmol L⁻¹ PMSF, protease inhibitor, ddH2O) and filtered through a 20-μm nylon membrane, with gentle and repeated suction and agitation of the homogenate on the membrane to capture as many nuclei as possible. The effluent was collected in a 2-mL centrifuge tube and centrifuged at 3000×g for 10 min at 4 °C. The pellet containing nuclei was resuspended in 0.5 mL of NEBuffer 2 (New England Biolabs, Ipswich, MA, USA). The nuclei were stained with DAPI (Sigma-Aldrich, Saint Louis, MO, USA) and sorted on a BD Aria SORP flow cytometer (Becton, Dickinson and Company, Franklin Lakes, NJ, USA). After centrifugation at 500×g and 4 °C and removal of the supernatant, 100,000 nuclei were collected in a 1.5-mL centrifuge tube and stored on ice. Library construction was performed following a published protocol [48]. The final library was sequenced on an MGISEQ-2000 sequencing system (MGI Tech Co., Ltd., Shenzhen, Guangdong, China) (Table S1).

    2.3. THS identification

    BWA-mem [49] was used to map reads to the reference genomes (G. arboreum [50], G. raimondii [50], and G. hirsutum [51]) with default settings. Reads mapped to mitochondrial and chloroplast DNA were filtered out. To remove PCR duplicates and keep reads of high mapping quality, Samtools (version 1.8) [52] was used with two commands (samtools rmdup; samtools view -b -q 30). As previously described [48], sequencing reads with the same mapping coordinates but different unique molecular identifier (UMI) sequences were kept. To determine the Tn5 insertion positions, all retained reads were adjusted as follows: start sites on the forward strand were shifted by +4 bp, and start sites on the reverse strand by -5 bp. Peak calling on ATAC-seq data was performed using the MACS2 package [53] with parameters "--nomodel --keep-dup all -B --SPMR --call-summits".
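
The strand-specific adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `shift_tn5` and the convention that the input is the 5' end (start site) of the mapped read are assumptions made for the example.

```python
def shift_tn5(five_prime_pos: int, strand: str) -> int:
    """Return the inferred Tn5 insertion position for one read.

    five_prime_pos: coordinate of the read's start site (its 5' end).
    strand: "+" for forward-strand reads, "-" for reverse-strand reads.
    """
    if strand == "+":
        return five_prime_pos + 4  # forward strand: shift start site by +4 bp
    if strand == "-":
        return five_prime_pos - 5  # reverse strand: shift start site by -5 bp
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical reads: (start site, strand)
reads = [(100, "+"), (200, "-")]
insertion_sites = [shift_tn5(pos, strand) for pos, strand in reads]
```

The shifted positions, rather than the raw read starts, would then be passed to peak calling.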

    2.4. RNA-seq data analysis

    For leaf samples, total RNA was extracted using the Spectrum Plant Total RNA Kit (Sigma-Aldrich, Saint Louis, MO, USA). Libraries were constructed and sequenced on an MGISEQ-2000 system. Public RNA-seq data (including petal, ovule, and fiber samples) from different laboratories were also used to calculate gene expression (Table S2). Hisat2 (version 2.1.0) [54] was used to map the sequence data to the reference genome. The mapped data were filtered using Samtools (samtools rmdup and samtools view -b -q 20). StringTie (version 1.3.0) [55] was used to calculate fragments per kilobase of transcript per million mapped reads (FPKM) as gene expression levels.

    2.5. Identification of orthologous genes

    To identify orthologous gene pairs between diploid and tetraploid cotton, we combined MCScanX and BLAST+ to improve accuracy [56,57]. We used the protein sequences of genes to conduct a bidirectional blastp analysis with BLAST+ (version 2.7.1) [57] using parameters "-evalue 1e-20 -outfmt 6". Only reciprocal best gene pairs with E-value < 1e-20 were retained. MCScanX was used to identify collinearity blocks with parameters "-g 3 -s 5". When a gene was located in more than one collinearity block, its orthologous gene in the largest collinearity block was retained. The gene pairs found by both MCScanX and BLAST+, comprising 16,315 orthologous pairs between G. arboreum and the G. hirsutum A subgenome (Ga-GhAt) and 16,861 orthologous pairs between G. raimondii and the G. hirsutum D subgenome (Gr-GhDt), were retained for further analysis.
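
The reciprocal ("two-way") best-hit step can be illustrated as follows. This is a sketch under stated assumptions: hits are supplied as simplified `(query, subject, evalue, bitscore)` tuples rather than full BLAST tabular rows, the best hit is chosen by highest bit score, and the gene identifiers are hypothetical. The subsequent collinearity filter with MCScanX is not shown.

```python
def best_hits(rows, max_evalue=1e-20):
    """Map each query to its best-scoring subject among hits below max_evalue.

    rows: iterable of (query, subject, evalue, bitscore) tuples.
    """
    best = {}
    for query, subject, evalue, bitscore in rows:
        if evalue >= max_evalue:
            continue  # keep only hits with E-value < threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_rows, reverse_rows, max_evalue=1e-20):
    """Return sorted (diploid, tetraploid) pairs that are each other's best hit."""
    forward = best_hits(forward_rows, max_evalue)
    reverse = best_hits(reverse_rows, max_evalue)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)


# Hypothetical hits: diploid -> tetraploid, then tetraploid -> diploid
forward = [("Ga1", "Gh1", 1e-50, 300.0), ("Ga1", "Gh2", 1e-30, 200.0),
           ("Ga2", "Gh2", 1e-10, 60.0)]
reverse = [("Gh1", "Ga1", 1e-48, 295.0), ("Gh2", "Ga1", 1e-30, 200.0)]
pairs = reciprocal_best_hits(forward, reverse)
```

In this toy input only `Ga1`/`Gh1` survive: `Ga2` has no hit below the E-value threshold, and `Gh2` points back to `Ga1`, whose own best hit is `Gh1`.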

    2.6. Mapping diploid THSs to the tetraploid genome

    The collinearity blocks identified in Section 2.5 narrowed the range of sequence alignment between diploid and tetraploid cotton. For each collinearity block, we performed "inside block" alignment with LASTZ (version 1.04.03) [58] using the settings "--notransition --step=20 --strand=both --format=lav --allocate:traceback=200M --hspthresh=8000 --nogapped". We then applied the UCSC Genome Browser tools chainPreNet, chainNet, and netChainSubset in turn (detailed usage: http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best) [59] to optimize the block alignment files. We used bnMapper [60] to extract the positions of regions homologous to diploid THSs from the optimized block alignment files. A diploid THS was defined as a conserved THS if it aligned to exactly one homologous region, as a badly mapping THS if it aligned to more than one homologous region, and as a non-mapping THS if it was not located in a block or was located in a block but had no homologous region.
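
The three mapping categories defined above amount to a simple decision rule. The sketch below assumes that, for each diploid THS, one already knows whether it lies in a collinearity block and how many homologous regions the alignment returned; the function and argument names are hypothetical.

```python
def classify_ths(in_block: bool, n_homologous_regions: int) -> str:
    """Assign a diploid THS to one of the three mapping categories."""
    if not in_block or n_homologous_regions == 0:
        # outside any collinearity block, or inside one without a homologous region
        return "non-mapping"
    if n_homologous_regions == 1:
        return "conserved"
    return "badly mapping"  # aligned to more than one homologous region


# Hypothetical THSs: (located in a block?, number of homologous regions)
examples = [(True, 1), (True, 3), (True, 0), (False, 0)]
labels = [classify_ths(in_block, n) for in_block, n in examples]
```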

    2.7. Identification of TFBSs using a motif database

    Promoter sequences (2 kb upstream of the TSS) were extracted from the G. arboreum [50], G. raimondii [50], and G. hirsutum [51] genomes to identify transcription-factor binding sites. The findMotifs.pl program of HOMER (version 4.10.4) [37] was used. The motif files were derived from ChIP-seq or DAP-seq experiments in Arabidopsis, rice, and other plants. Only TFBSs with a motif score above 10 were retained. The HOMER database contains 506 plant TF motifs; after removing unknown motifs, 262 TF motifs were retained for TFBS prediction.

    2.8. Calculation of TFBS conservation score

    TFBSs located in the promoters of an orthologous gene pair were classified as follows. If TFBSs of a given TF (TF-a) were identified in both the diploid and the tetraploid promoter, the TFBSs of TF-a were defined as conserved TFBSs. If TFBSs of a TF (TF-b) were identified in the diploid but not in the tetraploid promoter, or TFBSs of a TF (TF-c) were identified in the tetraploid but not in the diploid promoter, the TFBSs of TF-b and TF-c were defined as non-conserved TFBSs. For the diploid gene of an orthologous pair, all TFBSs in the promoter were classified as conserved or non-conserved, and a conservation score (CS-D) was calculated as (count of conserved TFBSs/count of total TFBSs) × 100%. The same process was used to calculate the conservation score for the tetraploid gene (CS-T) of the pair. The method was applied to 16,315 Ga-GhAt gene pairs and 16,861 Gr-GhDt gene pairs. The mean conservation score was calculated as (CS-D + CS-T)/2.
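
As a worked illustration of the score, the sketch below computes CS-D, CS-T, and their mean for one gene pair. It assumes each promoter is represented as a list of TF names, one entry per predicted TFBS; returning 0 for a promoter without any TFBS is an assumption of this example, since the text does not cover that case.

```python
def conservation_scores(diploid_tfbs, tetraploid_tfbs):
    """Return (CS-D, CS-T, mean CS) in percent for one orthologous gene pair.

    Each argument is a list of TF names, one entry per predicted TFBS.
    """
    # A TF is conserved if it has at least one TFBS in both promoters.
    conserved_tfs = set(diploid_tfbs) & set(tetraploid_tfbs)

    def score(sites):
        if not sites:
            return 0.0  # assumption: no TFBS at all gives a score of 0
        conserved = sum(1 for tf in sites if tf in conserved_tfs)
        return 100.0 * conserved / len(sites)

    cs_d = score(diploid_tfbs)
    cs_t = score(tetraploid_tfbs)
    return cs_d, cs_t, (cs_d + cs_t) / 2


# Diploid promoter: two TF-a sites and one TF-b site; tetraploid: TF-a and TF-c
cs_d, cs_t, mean_cs = conservation_scores(["TF-a", "TF-a", "TF-b"], ["TF-a", "TF-c"])
```

Here two of three diploid sites and one of two tetraploid sites belong to the shared TF-a, so CS-D is about 66.7% and CS-T is 50%.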

    2.9. Genomic variation analysis

    For each orthologous gene pair, the promoter sequence of the diploid gene was aligned to the promoter of the tetraploid gene with MUSCLE (version 3.8.1551) [61], and sequence variation was identified with an in-house Perl script. Base substitutions were treated as single nucleotide variations (SNVs), deletions or insertions shorter than 50 bp as insertion-deletions (InDels), and deletions or insertions longer than 50 bp as structural variations (SVs).
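
The in-house Perl script is not available here, so the following Python sketch only illustrates how such a classification could be derived from a gapped pairwise alignment. It assumes two equal-length aligned strings with `-` as the gap character, and it treats a gap of exactly 50 bp as an SV, a boundary case the text leaves open.

```python
def call_variants(diploid_aln: str, tetraploid_aln: str):
    """List (type, alignment column, length) for each variant in a pairwise alignment."""
    if len(diploid_aln) != len(tetraploid_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    i, n = 0, len(diploid_aln)
    while i < n:
        a, b = diploid_aln[i], tetraploid_aln[i]
        if a == "-" or b == "-":
            gapped = diploid_aln if a == "-" else tetraploid_aln
            j = i
            while j < n and gapped[j] == "-":
                j += 1  # extend over the whole run of gap columns
            length = j - i
            # assumption: gaps of 50 bp or more are reported as SVs
            variants.append(("InDel" if length < 50 else "SV", i, length))
            i = j
        else:
            if a.upper() != b.upper():
                variants.append(("SNV", i, 1))  # single base substitution
            i += 1
    return variants


small = call_variants("ACGT--A", "ACCTGGA")
large = call_variants("A" + "-" * 60 + "T", "A" + "C" * 60 + "T")
```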

    2.10. Identification of differentially expressed genes

    The prepDE.py script from StringTie (version 1.3.0) was used to obtain the RNA-seq read count matrices. Differentially expressed genes were identified using the edgeR [62] package (false discovery rate (FDR) < 0.05 and |log2(fold change)| > 2).

    2.11. Regulatory network construction

    A TF-gene (fiber-related gene) network was constructed, in which edges were based on predicted TFBSs located in gene promoter regions. A total of 58 known fiber-related genes were incorporated in the network under the following conditions: (1) genes were annotated; (2) genes were differentially expressed between diploid and tetraploid cotton in fiber at 10 or 20 days post anthesis (DPA); (3) genes were known to be functional in other plants. A set of 167 TFs that were predicted to bind the promoters of these genes was also incorporated in the network. Cytoscape [63] was used to visualize the network, in which edges represented TF-DNA interactions.
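
A TF-gene edge list of this kind can be assembled from per-promoter TFBS predictions, for instance as sketched below. The input layout (a dictionary from gene to the TFs with predicted binding sites in its promoter), the gene and TF names, and the use of the `pd` (protein-DNA) label in a tab-separated simple interaction format for Cytoscape import are all assumptions of this example.

```python
def build_tf_gene_edges(promoter_tfbs, target_genes):
    """Return sorted, de-duplicated (TF, gene) edges for the selected genes.

    promoter_tfbs: dict mapping gene -> iterable of TF names with a
                   predicted TFBS in that gene's promoter.
    target_genes: genes to include (for example, fiber-related genes).
    """
    edges = set()
    for gene in target_genes:
        for tf in promoter_tfbs.get(gene, ()):
            edges.add((tf, gene))
    return sorted(edges)


def to_sif(edges, interaction="pd"):
    """Format edges as tab-separated 'source interaction target' lines."""
    return "\n".join(f"{tf}\t{interaction}\t{gene}" for tf, gene in edges)


# Hypothetical predictions
promoter_tfbs = {"geneA": ["TF1", "TF2", "TF1"], "geneB": ["TF2"], "geneC": ["TF3"]}
edges = build_tf_gene_edges(promoter_tfbs, ["geneA", "geneB"])
sif_text = to_sif(edges)
```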

    3. Results

    3.1. Genome-wide identification of THSs in three cotton species

    To document the genome-wide landscape of open chromatin in cotton, we performed Tn5 transposase digestion of chromatin in G. arboreum, G. raimondii, and G. hirsutum. A total of 1611 million ATAC-seq reads were mapped to the respective reference genomes (Table S1). Given the high reproducibility of biological replicates for each species, we combined the replicates for downstream analyses (Fig. S1).

    In total, 144,827, 99,609, and 219,379 Tn5 THSs were identified in G. arboreum, G. raimondii, and G. hirsutum, respectively (Tables S3-S5). Genomic features of THSs in each species were characterized, including their distribution in 3′UTRs, 5′UTRs, exons, introns, promoter regions (2 kb upstream of transcription start sites), and intergenic regions (>2 kb from transcription start sites) (Fig. 1A). The distribution of THSs was similar in the three species: 42.2%-45.6% of THSs were found in intergenic regions and 24.3%-28.7% in promoter regions. The proportion of THSs located in promoter regions decreased with increasing genome size, with 28.7% of THSs in G. raimondii (~0.7 Gb), 25.5% in G. arboreum (~1.6 Gb), and 24.3% in G. hirsutum (~2.3 Gb). Plotting the genomic position of THSs relative to genes showed that THSs in cotton were enriched around genes relative to intergenic regions (Fig. S2).

    To investigate the relationship between THSs and gene expression, we examined the expression level of genes using RNA-seq data in each species (Fig. 1B; Table S2). Genes without observed expression (FPKM = 0) were grouped into bin 1, and the other genes (FPKM > 0) were grouped into five quintiles (bin 2-bin 6) based on expression level. Genes with higher expression showed higher chromatin accessibility. We then divided genes into two categories, with or without THSs in promoter regions. In all three species, the expression of genes with promoter THSs was significantly higher than that of genes without them (Fig. 1C). We also investigated the effect of distance (between the midpoint of the gene body and the midpoint of the nearest THS) on gene expression (Fig. 1D). For each species, all genes were divided into eight groups with distances of 0-2, 2-5, 5-10, 10-20, 20-50, 50-100, 100-200, and ≥200 kb. In G. arboreum, 48.3% (19,786 of 41,000) of genes had THSs within 2 kb, and 0.6% (234 of 41,000) of genes were located >200 kb from the nearest THS. In G. raimondii, 41.4% (16,658 of 40,281) of genes had THSs within 2 kb, and 0.007% (3 of 40,281) of genes were located >200 kb from the nearest THS. In G. hirsutum, 47.1% (32,476 of 68,900) of genes had THSs within 2 kb, and 0.08% (60 of 68,900) of genes were located >200 kb from the nearest THS. Thus, only a few genes were located far from accessible chromatin. Gene expression decreased gradually with increasing distance.
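
The six expression bins described above (bin 1 for FPKM = 0, bins 2-6 for quintiles of expressed genes) can be reproduced with a rank-based split. The sketch below is one possible implementation; how ties at quintile boundaries were handled in the study is not stated, so the rank-based rule here is an assumption, and the gene names are hypothetical.

```python
def assign_expression_bins(fpkm_by_gene):
    """Map each gene to bin 1 (FPKM = 0) or bins 2-6 (quintiles of FPKM > 0)."""
    bins = {gene: 1 for gene, value in fpkm_by_gene.items() if value == 0}
    expressed = sorted(
        (gene for gene, value in fpkm_by_gene.items() if value > 0),
        key=lambda gene: fpkm_by_gene[gene],
    )
    n = len(expressed)
    for rank, gene in enumerate(expressed):
        # rank * 5 // n runs from 0 (lowest fifth) to 4 (highest fifth)
        bins[gene] = 2 + (rank * 5) // n
    return bins


# Hypothetical FPKM values: one silent gene and ten expressed genes
fpkm = {"g0": 0.0}
fpkm.update({f"g{i}": float(i) for i in range(1, 11)})
bins = assign_expression_bins(fpkm)
```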

    Fig. 1. ATAC-seq profiles in G. arboreum, G. raimondii, and G. hirsutum. (A) Distribution of ATAC-seq THSs relative to genomic features in each species. (B) Line charts and heat maps for THSs around genes (-2 to 2 kb). The expressed genes were divided into five quintiles (from low expression [bin 2] to high expression [bin 6]). Inactive genes (FPKM = 0) were classified into bin 1. The y axes in line charts and colors in heat maps show the THS read counts around genes by expression level. (C) Correlation of gene expression with the number of THSs in promoters. ****, P-value < 0.0001. (D) Correlation of gene expression with the distance between genes and the closest THSs. **, P-value < 0.01; ***, P-value < 0.001; ns, P-value > 0.05. (E) Correlation of gene expression with the mean number of THSs located in promoters. The genes were divided into six bins as described in Fig. 1B. The y axis shows the mean number of THSs for genes in each bin. Red dots and line, mean THS number in G. arboreum; blue dots and line, mean THS number in G. raimondii; purple dots and line, mean THS number in G. hirsutum. (F) Correlation of gene expression with the mean length of THSs in promoters. The y axis shows the mean length of THSs for genes, which were divided into bins as described in Fig. 1B. Red dots and line, mean THS length in G. arboreum; blue dots and line, mean THS length in G. raimondii; purple dots and line, mean THS length in G. hirsutum.

    We further investigated whether the number or the length of THSs was correlated with gene expression (Fig. 1E, F). All genes were categorized into six bins based on expression level, as in Fig. 1B. In each species, highly expressed genes contained more THSs, with a mean 3.5 times greater in bin 6 than in bin 2. To prevent the increasing number of THSs from affecting the statistics of THS length, only genes with a single promoter THS in each bin were used to calculate the mean THS length. Highly expressed genes contained longer THSs, with a mean 22 bp longer in bin 6 than in bin 2. This result suggested that both the number and the length of promoter THSs were associated with gene expression.

    3.2. Effect of promoter THSs on expression divergence after polyploidization

    We next investigated the possible role of promoter THSs in driving unbalanced expression of orthologous genes. To evaluate the sequence conservation of THSs between diploids and tetraploid subgenomes, we aligned THS regions across genomes (Figs. 2A, S3). In total, 81.6% (118,129 of 144,827, in G. arboreum) and 80.8% (80,455 of 99,609, in G. raimondii) of diploid THSs were identified as conserved THSs for further analysis. Diploid THSs were grouped into three categories: conserved and shared THSs (conserved and annotated as THSs in both diploid and tetraploid collinear blocks); conserved but species-specific THSs (conserved and annotated as THSs only in diploid collinear blocks); and non-conserved THSs (badly mapping THSs and non-mapping THSs). We found that 21.6% (31,237 of 144,827, in G. arboreum) and 25.3% (25,199 of 99,609, in G. raimondii) of diploid THSs were conserved and shared THSs. Compared with all THSs, conserved and shared THSs were enriched in promoter regions (P-value < 0.001, chi-square test; 25.5% vs. 34.9% in G. arboreum, 28.7% vs. 36.7% in G. raimondii) and 5′UTR regions (P-value < 0.001, chi-square test; 8.8% vs. 13.8% in G. arboreum, 9.1% vs. 13.6% in G. raimondii) (Figs. 1A, 2B). This result suggested that diploid THSs located around promoters tended to be conserved after polyploidization. Following the classification of diploid THSs, tetraploid THSs that overlapped with conserved and shared THSs were defined as conserved and shared THSs. The other tetraploid THSs were broadly defined as non-conserved THSs.

    Fig. 2. Conservation of THSs between diploid and tetraploid cotton. (A) In the top panel, G. arboreum THSs are aligned to the G. hirsutum At subgenome. All G. arboreum THSs are divided into conserved THSs, badly mapping THSs, and non-mapping THSs based on block alignment (top bar). Conserved THSs are further divided into conserved and shared THSs and conserved but species-specific THSs based on chromatin accessibility; badly mapping THSs and non-mapping THSs are merged into non-conserved THSs (bottom bar). In the bottom panel, all G. raimondii THSs are aligned to the G. hirsutum Dt subgenome. The description of the bottom panel is similar to that of the top panel. (B) Distribution of conserved and shared diploid THSs relative to genomic features in tetraploid G. hirsutum. (C) Four classes of orthologous gene pairs with different combinations of THSs. Gray boxes, gene body; blue boxes, promoter region; orange boxes, conserved and shared THSs; green boxes, conserved but species-specific THSs; dark gray boxes, non-conserved THSs. (D) The expression levels of the four classes of gene pairs defined in Fig. 2C. *, P-value < 0.05; ****, P-value < 0.0001; ns, P-value > 0.05. (E) GO enrichment of genes in class 1 and class 2. (F) Expression fold changes of orthologous gene pairs in four classes. Expression fold change: (max FPKM + 0.1)/(min FPKM + 0.1). (G) Four examples representing four classes of orthologous gene pairs. The genome browser tracks show gene annotation (gray), RNA-seq coverage (blue), and ATAC-seq coverage (pink). Conserved and shared THSs (orange) and conserved but species-specific THSs (green) are indicated by boxes.

    To investigate the effect of promoter THSs on gene expression, 16,315 and 16,861 orthologous gene pairs were identified between G. arboreum and the G. hirsutum At subgenome and between G. raimondii and the G. hirsutum Dt subgenome, respectively (Table S6). Orthologous genes were grouped into four classes: class 1 (only with conserved and shared THSs); class 2 (with conserved and shared diploid THSs and other THSs); class 3 (with conserved but species-specific THSs); and class 4 (without conserved THSs) (Fig. 2C). Genes with conserved and shared THSs (class 1 and class 2) showed higher expression than those with conserved but species-specific THSs (class 3). Genes without conserved THSs (class 4) showed the lowest expression (Fig. 2C, D). This result suggested that highly expressed genes frequently contained conserved and shared THSs. Gene Ontology (GO) enrichment analysis [64] showed that class 1 genes were enriched in some stress-associated pathways, including cellular response to stress (GO:0033554), positive regulation of defense response (GO:0031349), and positive regulation of cell death (GO:0010942) (Fig. 2E). Class 2 genes were enriched in some development-associated pathways, including tissue development (GO:0009888), reproductive shoot system development (GO:0090567), and plant organ development (GO:0099402).
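
One reading of the four-class scheme can be written as a small decision function. The sketch assumes that each promoter THS of a gene pair has already been labeled `"shared"` (conserved and shared), `"species_specific"` (conserved but species-specific), or `"non_conserved"`; these label strings and the precedence of the rules are assumptions based on the class descriptions, not the authors' code.

```python
def classify_gene_pair(ths_labels):
    """Return class 1-4 for an orthologous gene pair from its promoter THS labels."""
    labels = set(ths_labels)
    if "shared" in labels:
        # class 1: only conserved and shared THSs; class 2: shared plus other THSs
        return 1 if labels == {"shared"} else 2
    if "species_specific" in labels:
        return 3  # conserved but species-specific THSs, none shared
    return 4  # no conserved THSs (only non-conserved THSs, or none at all)


classes = [
    classify_gene_pair(["shared", "shared"]),
    classify_gene_pair(["shared", "non_conserved"]),
    classify_gene_pair(["species_specific"]),
    classify_gene_pair(["non_conserved"]),
]
```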

    To avoid the influence of overall expression level differences among species, we used expression fold change to quantify the degree of expression variation between orthologous gene pairs. Orthologous genes with conserved and shared THSs (class 1 and class 2) showed smaller expression changes than those with conserved but species-specific THSs (class 3), and orthologous genes with conserved but species-specific THSs (class 3) showed smaller expression changes than those without conserved THSs (class 4) (Fig. 2F). This finding suggested that changes in promoter THSs were associated with expression divergence.
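
The fold-change measure used for Fig. 2F, (max FPKM + 0.1)/(min FPKM + 0.1), is symmetric in the two genes and is always at least 1. A minimal sketch, with hypothetical FPKM values:

```python
def expression_fold_change(fpkm_a: float, fpkm_b: float, pseudocount: float = 0.1) -> float:
    """Return (max FPKM + pseudocount) / (min FPKM + pseudocount) for a gene pair."""
    high, low = max(fpkm_a, fpkm_b), min(fpkm_a, fpkm_b)
    return (high + pseudocount) / (low + pseudocount)


# Hypothetical diploid and tetraploid expression values for one orthologous pair
fc = expression_fold_change(4.9, 9.9)
```

The pseudocount keeps the ratio defined when one or both genes are not expressed.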

    For example, FATB (Ga: Garb_05G011410, GhAt: Ghir_A05G013090; class 1) encodes an acyl-acyl carrier protein thioesterase with a role in fatty acid synthesis (Fig. 2G) [65]. It had a conserved and shared THS in its promoter and showed a 1.83-fold expression change. VIH1 (Ga: Garb_09G002550, GhAt: Ghir_A09G002220; class 2) encodes a functional VIP1/PPIP5K-type ATP-grasp kinase with a role in jasmonate-regulated defenses [66]. A conserved and shared THS, as well as species-specific THSs, was found in its promoter, associated with a 1.97-fold expression change. CED1 (Ga: Garb_11G001840, GhAt: Ghir_A11G001900; class 3) encodes an epidermally expressed extracellular protein that likely functions as an alpha-beta hydrolase and is required for normal cuticle formation [67]. Conserved but species-specific THSs were found in its promoter, with a 3.20-fold expression change. CDC68-like genes (Ga: Garb_12G000810, GhAt: Ghir_A12G000790; class 4) did not have conserved THSs in their promoters and had a 2.30-fold expression change.

    3.3. Characterization of TFBSs in diploid and tetraploid cotton

    Exploiting the evolutionary conservation of transcriptional regulation, we used known motifs from other plant species to predict TF binding sites (TFBSs) in cotton. In total, 262 non-redundant TF binding motifs were used to identify TF binding sites (Table S7). We identified 632,318, 610,632, and 1,064,100 TF binding sites in gene promoters of G. arboreum, G. raimondii, and G. hirsutum, respectively (Tables S3-S5). As with THSs, TF binding sites were enriched around genes relative to intergenic regions (Fig. S4).

    The availability of 262 TF binding motifs provided an opportunity to explore the genome-wide regulatory features mediated by transcription factors. Because different transcription factors may have opposite effects on gene expression, it is difficult to measure the effect of changes in one or several TF binding motifs in promoter regions on expression level. To study the change of regulatory sequences after polyploidization, TF binding sites were classified according to the sequence variation among orthologous gene pairs: TF binding sites that were conserved between orthologous genes were defined as conserved TF binding sites, and the others were called non-conserved binding sites (Fig. 3A). For each orthologous gene pair, two conservation scores (CSs) were calculated from the number of conserved binding sites: CS-D (x-axis in plot) for the gene in the diploid and CS-T (y-axis in plot) for the gene in the tetraploid. If a pair of genes contained the same TF binding sites, both CS-D and CS-T were 100%.

    CSs were calculated for 16,315 and 16,861 orthologous gene pairs, from Ga to GhAt and from Gr to GhDt, respectively (Table S8). When these gene pairs were plotted on a two-dimensional chart, 9790 to 11,358 pairs had CS-D > 66% and CS-T > 66%, 1227 to 1958 pairs had CS-D ≤ 33% or CS-T ≤ 33%, and 3730 to 5113 pairs had other conservation scores (Fig. 3B, C). Thus, regulatory sequences were relatively conserved for the majority of genes after polyploidization.

    We next investigated the effect of regulatory sequence variation on gene expression in eight tissues or developmental stages (Fig. 3D; Table S2). Orthologous gene pairs were sorted in descending order of their mean conservation scores. The top 33% (5438 of 16,315 from Ga to GhAt, 5620 of 16,861 from Gr to GhDt) were defined as pairs with high conservation scores, and the bottom 33% as pairs with low conservation scores. Orthologous gene pairs with low conservation scores showed greater expression change than those with high conservation scores in a few tissues (Fig. 3D). This finding suggested that orthologous genes with low conservation scores were those that had undergone expression divergence after polyploidization.

    3.4. Coordinated regulation of gene expression by THSs and TFBSs

    Both promoter THSs and promoter TFBSs were associated with expression divergence after polyploidization in cotton (Figs. 2F, 3D). We next sought to identify TF binding sites inside open chromatin regions. To illustrate the putative regulatory relationships in cotton, the TFBSs inside THSs were retained for further analyses. Orthologous genes with conserved and shared THSs (class 1 and class 2) gave higher CSs than those with conserved but species-specific THSs (class 3), and orthologous genes with conserved but species-specific THSs (class 3) gave higher CSs than those without conserved THSs (class 4) (Fig. 4A). This result suggested that orthologous genes with conserved THSs yielded higher TFBS CSs.

    We quantified the regulatory variation after polyploidization. For each of the 262 TF binding motifs, we calculated the number of orthologous genes containing TFBSs inside promoter THSs in G. arboreum, the G. hirsutum At subgenome, G. raimondii, and the G. hirsutum Dt subgenome (Fig. 4B, C). Each individual TF binding motif was predicted to regulate a similar number of genes in the diploid and the corresponding subgenome (cor = 0.9802, from Ga to GhAt; cor = 0.9837, from Gr to GhDt). Although some TF binding motifs were involved in regulation gain and some in regulation loss after polyploidization, at the global level TF binding motifs in the At subgenome were predicted to regulate fewer genes than those in G. arboreum, while TF binding motifs in the Dt subgenome were predicted to regulate more genes than those in G. raimondii (simple linear regression; slope = 0.86559, from Ga to GhAt; slope = 1.15910, from Gr to GhDt). This finding suggested that the A and D genomes might have been subject to different evolutionary selection on regulatory sequences after polyploidization.
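
The two summary statistics quoted above, a correlation coefficient and a regression slope, can be computed from paired per-motif gene counts as sketched below. The example assumes Pearson correlation and ordinary least squares with an intercept (the text only says "cor" and "simple linear regression"), and the counts are invented for illustration.

```python
from math import sqrt


def slope_and_correlation(x, y):
    """Return (least-squares slope of y on x, Pearson correlation) for paired values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical numbers of genes regulated by four motifs in a diploid (x)
# and in the corresponding tetraploid subgenome (y)
diploid_counts = [10, 20, 30, 40]
subgenome_counts = [8, 18, 26, 34]
slope, r = slope_and_correlation(diploid_counts, subgenome_counts)
```

A slope below 1 with a correlation close to 1, as in this toy input, corresponds to motifs keeping their relative ranking while regulating fewer genes overall in the subgenome.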

    We next investigated the extent of conservation of the 262 TF binding motifs in the three species. We identified 33 TF binding motifs that were shared among conserved and shared THSs in G. arboreum, G. raimondii, and G. hirsutum (Fig. 4D). For example, GBF3 is a transcription factor associated with drought tolerance in Arabidopsis [68]. In G. arboreum, GBF3 binding sites were detected in 0.74% of conserved and shared THSs, but in 0.38% of all THSs. In the At subgenome, GBF3 binding sites were detected in 0.86% of conserved and shared THSs, but in 0.52% of all THSs. In G. raimondii, GBF3 binding sites were detected in 0.87% of conserved and shared THSs, but in 0.52% of all THSs. In the Dt subgenome, GBF3 binding sites were detected in 1.1% of conserved and shared THSs, but in 0.59% of all THSs. Using all THSs as a control, the GBF3 TFBSs were enriched in conserved and shared THSs (P-value < 0.0001, chi-square test). The 33 TF motifs were clustered by motif similarity, including motif sequences ACGTG, TTGTC, TCTCTCTC, AAATATC, TCCATCA, CACGTG, CCGT, and GTGGTCC (Fig. S5A).
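
An enrichment comparison of this kind can be expressed as a chi-square test on a 2×2 table. The sketch below uses the standard formula without continuity correction and the closed-form p-value for one degree of freedom; the exact table layout used in the study is not given, so the layout (test set versus control set, with and without the motif) and the counts are assumptions.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    a, b: THSs with / without the motif in the test set.
    c, d: THSs with / without the motif in the control set.
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical counts: 20 of 100 test THSs vs. 10 of 100 control THSs carry the motif
chi2, p_value = chi_square_2x2(20, 80, 10, 90)
```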

    We identified five TF binding motifs that were shared among Gr-specific THSs(including conserved and Gr-specific THSs and non-conserved THSs)(Fig.4E).ERF104 is a transcription factor responsive to cold stress in Arabidopsis[69].In G.raimondii,ERF104 binding sites were detected in 1.2% of Gr-specific THSs,but 0.94%of all THSs.Using all THSs as a control,the ERF104 TFBSs were enriched in Gr-specific THSs(P-value<0.0001,chi-square test).The five TF motifs were clustered by motif similarity,including motif sequence CCGCCG(Fig.S5B).There were no significantly enriched TF motifs among other species-specific THSs.

    Fig. 3. Conservation of TFBSs between diploid and tetraploid cotton. (A) A schematic diagram of conservation score (CS) calculation based on transcription-factor binding sites (TFBS). Ellipses represent TFs binding both the diploid and tetraploid promoters (pink), TFs binding only the diploid promoter (blue), and TFs binding only the tetraploid promoter (orange). The short lines in pink, blue or orange represent TFBS. Conserved TFBS (C-TFBS) are circled by red dotted lines and non-conserved TFBS (NonC-TFBS) are circled by blue dotted lines. CS-D, CS-diploid; CS-T, CS-tetraploid. (B) The two-dimensional chart shows orthologous gene pairs based on conservation scores. Yellow bin, high density of gene pairs; black bin, low density of gene pairs. (C) The stacked plot shows the total proportion of gene pairs. Red boxes, conservation scores within [66%-100%] and [66%-100%]; yellow boxes, conservation scores within [33%-66%] or [33%-66%]; green boxes, conservation scores within [0-33%] or [0-33%], in the comparisons of Ga-GhAt and Gr-GhDt, respectively. (D) Expression fold changes of gene pairs with high conservation scores (red boxes) and gene pairs with low conservation scores (green boxes) in eight tissues or developmental stages. Expression fold change: (max FPKM + 0.1)/(min FPKM + 0.1). *, P-value < 0.05; **, P-value < 0.01; ***, P-value < 0.001; ****, P-value < 0.0001; ns, P-value > 0.05.

    Fig.4.Dynamics of TFBSs in promoter THSs between diploid and tetraploid cotton.(A)Mean conservation score based on TFBSs in promoter THSs of orthologous genes in four classes.Red bar,orthologous gene pairs from Ga to GhAt;blue bar,orthologous gene pairs from Gr to GhDt.**,P-value<0.01;****,P-value<0.0001;ns,P-value>0.05.(B)Correlation plot showing the number of genes regulated by TF binding motifs in Ga(x-axis)or GhAt(y-axis).A total of 239 dots,representing 239 TF binding motifs.Rectangle shows where the expanded region is located.Red line,y=x;black line with shading,simple linear regression fitting line with confidence interval.(C)Correlation plot showing the number of genes regulated by TF binding motifs in Gr(x-axis)or GhDt(y-axis).A total of 235 dots,representing 235 TF binding motifs.Rectangle shows where the expanded region is located.Red line,y=x;black line with shading,simple linear regression fitting line with confidence interval.(D)Comparison of TFBS enrichment between all THSs and conserved and shared THSs.(E)Comparison of TFBS enrichment between all THSs and five Gr-specific THSs.

    3.5.Influence of genomic variation on promoter TFBS

    The finding that change in promoter TFBSs can influence gene expression led us to investigate the underlying mechanisms with respect to sequence variation.TF motifs are essential short sequences in regulatory regions such as promoters,which are affected by genomic variation,including SNVs,InDels and SVs.

    The promoter sequences (2 kb upstream of the TSS) of orthologous genes were aligned. We identified 6876 SVs, 357,560 InDels, and 5,173,936 SNVs from Ga to GhAt, and 9310 SVs, 292,878 InDels, and 4,086,309 SNVs from Gr to GhDt (Fig. 5A). Among these genomic variations, 48.1%-49.6% of SVs, 7.5%-7.6% of InDels, and 4.8%-5.0% of SNVs led to the formation of non-conserved TFBSs. This finding indicated that most genomic variation did not lead to TFBS variation, and that an individual SV was more likely than an individual SNV to affect TFBSs in promoter regions. A similar conclusion was obtained by the statistical analysis of cumulative genomic variation length (Fig. 5B). In absolute numbers, however, non-conserved TFBSs were influenced mainly by SNVs, owing to the large number of SNVs (Fig. 5C).

    Based on promoter sequence,we identified 6216-6567 gene pairs with cumulative genomic variation length<0.2 kb,7566-8098 gene pairs with cumulative genomic variation length of 0.2-1 kb,and 2196-2533 gene pairs with cumulative genomic variation length>1 kb(Fig.5D).As expected,the mean conservation score dropped with the increase in genome variation length,suggesting that genomic variation was associated with the evolutionary turnover of TFBSs.
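    The grouping of gene pairs by cumulative variation length can be sketched as follows. This is only an illustration: the function names, the handling of values exactly at 0.2 kb and 1 kb, and the example gene pairs are assumptions and are not taken from this study.

```python
from collections import defaultdict


def bin_label(length_kb):
    """Assign a cumulative variation length (kb) to one of three bins."""
    if length_kb < 0.2:
        return "<0.2 kb"
    if length_kb <= 1.0:  # boundary handling is an assumption
        return "0.2-1 kb"
    return ">1 kb"


def mean_cs_by_bin(gene_pairs):
    """gene_pairs: iterable of (variation_length_kb, conservation_score)."""
    grouped = defaultdict(list)
    for length_kb, cs in gene_pairs:
        grouped[bin_label(length_kb)].append(cs)
    return {label: sum(scores) / len(scores) for label, scores in grouped.items()}


# Hypothetical gene pairs (illustration only)
pairs = [(0.05, 0.9), (0.1, 0.8), (0.5, 0.6), (0.8, 0.5), (1.5, 0.3)]
means = mean_cs_by_bin(pairs)
for label in ("<0.2 kb", "0.2-1 kb", ">1 kb"):
    print(label, round(means[label], 2))
```

    In this toy input the mean score falls from the shortest to the longest bin, which is the trend described for Fig. 5D.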

    3.6.Regulatory divergence of fiber-related genes in diploid and tetraploid cotton

    Cultivated tetraploid cotton has longer fibers than diploid ancestors,probably because the fiber in diploid cotton stops elongating when fiber in tetraploid cotton is still in rapid growth[40].A comparative transcriptome analysis was performed to identify differentially expressed genes between diploid and tetraploid fibers(FDR<0.05 and|log2(fold change)|>2)(Fig.S6).GO enrichment analysis revealed that up-regulated genes were enriched in pathways associated with fiber elongation and down-regulated genes were enriched in secondary cell wall biosynthesis in tetraploid cotton(Fig.6A).Presumably early secondary cell wall biosynthesis impeded fiber elongation in diploids.

    We next aimed to uncover the regulatory divergence contributing to the expression pattern differences of these genes between diploid and tetraploid cotton. In this analysis, 209 (between G. arboreum and G. hirsutum) and 282 (between G. raimondii and G. hirsutum) fiber-related genes were differentially expressed between diploid and tetraploid cotton in at least one comparison (10 DPA fiber, 20 DPA fiber) (Fig. 6B). Among the 491 differentially expressed genes, 58 had known functional roles based on previous studies. The 58 genes were used to draw a fiber-related regulatory network (Fig. 6C). In total, 77.4% (277 of 358) of regulatory relationships between TFs and genes in G. arboreum and 67.6% (227 of 336) of regulatory relationships in G. raimondii were conserved after polyploidization. In addition, 119 and 73 novel regulatory relationships were gained in the At and Dt subgenomes of G. hirsutum, respectively. For example, PLASMA MEMBRANE INTRINSIC PROTEIN 1E (PIP1E), which functions as an aquaporin, had an average CS of 67.0% in the Ga-GhAt comparison (CS-Ga = 67%, CS-GhAt = 67%) and 80.5% in the Gr-GhDt comparison (CS-Gr = 92%, CS-GhDt = 69%) (Fig. 6D). PIP1E in G. arboreum and the At subgenome showed a larger expression difference than PIP1E in G. raimondii and the Dt subgenome, accompanied by the loss of old regulatory sequences and the acquisition of new regulatory sequences from an insertion/deletion variation (Fig. 6E-G). Thus, the identification of TFBSs can facilitate the exploration of regulatory changes of genes associated with fiber development.
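    The counting of conserved, lost, and gained regulatory relationships can be sketched with set operations. This sketch assumes that each relationship is a (TF, target gene) pair predicted from a TFBS in the target promoter. The function name, TF labels, and gene labels are hypothetical.

```python
def compare_networks(diploid_edges, tetraploid_edges):
    """Compare two sets of (TF, target) regulatory relationships."""
    diploid_edges = set(diploid_edges)
    tetraploid_edges = set(tetraploid_edges)
    conserved = diploid_edges & tetraploid_edges  # present in both
    lost = diploid_edges - tetraploid_edges       # only in the diploid
    gained = tetraploid_edges - diploid_edges     # only in the tetraploid
    fraction = len(conserved) / len(diploid_edges) if diploid_edges else 0.0
    return {
        "conserved": conserved,
        "lost": lost,
        "gained": gained,
        "fraction_conserved": fraction,
    }


# Hypothetical relationships (illustration only)
diploid = {("TF1", "geneA"), ("TF2", "geneA"), ("TF3", "geneB")}
tetraploid = {("TF1", "geneA"), ("TF3", "geneB"), ("TF4", "geneB")}

result = compare_networks(diploid, tetraploid)
print(len(result["conserved"]), len(result["lost"]), len(result["gained"]))
print(f"{result['fraction_conserved']:.1%}")
```

    The reported percentages follow the same ratio of conserved to diploid relationships: 277 of 358 gives 77.4%, and 227 of 336 gives 67.6%.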

    4.Discussion

    In this study,we generated genome-wide chromatin accessibility maps from three cotton species:G.arboreum,G.raimondii and G.hirsutum,and achieved an unprecedented view of accessibility conservation across species after polyploidization.THSs were expected to encompass key regulatory elements that regulated gene expression in native cellular conditions.Genome-wide TFBS maps were established to predict the cis-regulation inside accessible chromatin.Overall,we investigated the similarities and differences in accessible chromatin and cis-regulatory landscape among diploid and tetraploid cottons,allowing us to identify and analyze regulatory relationships between TF and genes.These data can be expected to serve as a valuable resource for cis-regulation research in cotton.

    Fig. 5. The effect of genomic variation on regulatory divergence in promoter regions. (A) The counts of different types of genomic variation in the comparison of Ga-GhAt and Gr-GhDt. Green boxes, genomic variation without TFBSs; purple boxes, genomic variation within TFBSs. (B) The cumulative length of different types of genomic variation in the comparison of Ga-GhAt and Gr-GhDt. Green boxes, genomic variation outside TFBSs; purple boxes, genomic variation in TFBSs. (C) Numbers of non-conserved TFBSs affected by different types of genomic variation. (D) Correlation between the total conservation scores of gene pairs and genomic variation length. **, P-value < 0.01; ****, P-value < 0.0001.

    Fig.6.Regulatory divergence of fiber-related genes between diploid and tetraploid cotton.(A)GO enrichment analysis was performed for differentially expressed genes in cotton fiber.(B)Heat maps showing expression fold change of gene pairs between diploids and tetraploid subgenomes in fibers at 10 DPA and 20 DPA.Color indicates higher(red)or lower(green)expression in tetraploid cotton.All fiber-related genes were categorized into eight or seven clusters.The representative genes from nine pathways were annotated as colored dots in each heatmap.The expression fold change was calculated using edgeR(FDR<0.05).(C)The regulatory network of 58 fiber-related genes,which was constructed based on TF binding motifs in target promoters.Fiber-related genes(genes from Ga/GhAt in red,genes from Gr/GhDt in blue)were differentially regulated after polyploidization.Sub-networks of PLASMA MEMBRANE INTRINSIC PROTEIN 1E(PIP1E)were circled in dashed lines as examples.(D)The conservation degree of PIP1E.(E)The expression fold change of PIP1E.(F)TF binding sites in promoters of PIP1E-Ga and PIP1E-GhAt.Landmarks are colored in red;common TFBSs were colored in black;TFBSs gained in GhAt are colored in red;TFBSs lost in GhAt are colored in blue.The genomic variation in TFBSs is highlighted with red circles.(G)TF binding sites in promoters of PIP1E-Gr and PIP1E-GhDt.

    Previous studies[31,70]have found that most regulatory elements were located proximally to target genes.We hypothesized that gene expression was regulated by promoter THSs as in tomato[71].Interestingly,the effect of THSs on gene expression decayed gradually with the increase of distance to genes.Long-distance transcription regulation is mediated by regulatory elements such as enhancers,through higher-order chromatin structures[22].

    In recent years,there have been many studies[70-74]of dynamic transcription regulation in accessible chromatin identified by DNase-seq or ATAC-seq.However,most of the reported dynamic transcription regulation events were from comparisons between different treatments of the same tissue,different development stages of the same tissue,and different tissues of the same species.In closely related species,although they evolved from the same ancestor,the coding sequences and non-coding sequences have changed greatly during independent evolution.One study[74]proposed an approach to analyze THSs among grasses,based on conservation of THS and chromatin accessibility.In the present study,approximately 80%of THSs in G.arboreum and G.raimondii were identified as conserved THSs by comparison with tetraploid G.hirsutum.This value is much higher than the 4.7%-29.6% observed among grasses[74].Given that allotetraploid cotton was formed much later(1-2 MYA)than the divergence of grasses(45-50 MYA),this might be the reason for higher THS conservation in cotton[2,75].Although the majority of THS sequences were conserved,the change of chromatin accessibility expanded the diversity of transcription regulation and led to the difference of gene expression.

    By using TF binding motif models,we present a genome-wide identification of TF binding sites among three cotton species.These sites were used to predict the regulatory relationships between TFs and target genes.Some regulatory relationships were gained and some lost in tetraploid cotton relative to the two diploid cotton species.At this stage,we do not know when these regulatory relationship changes occurred:during polyploidization or during the independent evolution of the tetraploid and the diploids[42,45].These findings provide an insight into the regulatory divergence of homologous gene transcription in cotton.

    Given a well-established transformation platform, many technologies can be used to investigate the regulatory effect of TFs on a particular gene. However, when a variety of TFs are found to regulate target genes, it is challenging to predict the change of gene expression patterns. A previous study [33] suggested that the loss of TF occupancy at target promoters might down-regulate gene expression. In this study, we provided evidence from RNA-seq data in eight different tissues/stages to support the hypothesis that variation of TF binding motifs in promoters could lead to expression divergence. These data represent a resource for evolutionary study of transcriptional regulation in cotton.

    In summary, we investigated accessible chromatin regions and cis-regulatory sequence divergence of orthologous genes in cotton. Future studies should use additional high-throughput methods, such as DAP-seq and ChIP-seq, to uncover a more complete regulatory network mediated by TFs. For key genes associated with agronomic traits, functional study of the evolutionary change of regulatory sequences should be performed with the construction of genome-edited plants.

    Data availability

    The raw sequencing data generated in this study have been submitted to NCBI under the Bioproject accession number PRJNA576032.

    CRediT authorship contribution statement

    Jiaqi You:Conceptualization,Formal analysis,Visualization,Writing-original draft.Min Lin:Data curation,Resources,Writing-review & editing.Zhenping Liu:Investigation.Liuling Pei:Resources.Yuexuan Long:Investigation.Lili Tu:Conceptualization.Xianlong Zhang:Supervision,Writing-review & editing.Maojun Wang:Conceptualization,Methodology,Supervision,Writing-review & editing.

    Declaration of competing interest

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

    Acknowledgments

    This work was supported by the National Natural Science Foundation of China(31922069,32170645)and the Fundamental Research Funds for the Central Universities(2662020ZKPY017).We thank the high-performance computing platform at National Key Laboratory of Crop Genetic Improvement in Huazhong Agricultural University.

    Appendix A.Supplementary data

    Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2022.03.002.
