
    CharPlant: A De Novo Open Chromatin Region Prediction Tool for Plant Genomes

    2021-06-07 07:44:58 Yin Shen, Ling-Ling Chen, Junxiang Gao
    Genomics, Proteomics & Bioinformatics, 2021, Issue 5

    Yin Shen 1, Ling-Ling Chen 1,2, Junxiang Gao 1,*

    1 Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

    2 National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China

    KEYWORDS Open chromatin region; Chromatin accessibility; Convolutional neural network; De novo prediction; Plant genome

    Abstract Chromatin accessibility is a highly informative structural feature for understanding gene transcription regulation, because it indicates the degree to which nuclear macromolecules such as proteins and RNAs can access chromosomal DNA. Studies have shown that chromatin accessibility is highly dynamic during stress response, stimulus response, and developmental transition. Moreover, physical access to chromosomal DNA in eukaryotes is highly cell-specific. Therefore, current technologies such as DNase-seq, ATAC-seq, and FAIRE-seq reveal only a portion of the open chromatin regions (OCRs) present in a given species. Thus, the genome-wide distribution of OCRs remains unknown. In this study, we developed a bioinformatics tool called CharPlant for the de novo prediction of OCRs in plant genomes. To develop this tool, we constructed a three-layer convolutional neural network (CNN) and subsequently trained the CNN using DNase-seq and ATAC-seq datasets of four plant species. The model simultaneously learns the sequence motifs and regulatory logics, which are jointly used to determine DNA accessibility. All of these steps are integrated into CharPlant, which can be run using a simple command line. The results of data analysis using CharPlant in this study demonstrate its prediction power and computational efficiency. To our knowledge, CharPlant is the first de novo prediction tool that can identify potential OCRs in the whole genome. The source code of CharPlant and supporting files are freely available from https://github.com/Yin-Shen/CharPlant.

    Introduction

    In eukaryotic genomes, most of the chromatin regions are tightly coiled in the nucleus, but some regions, known as open chromatin regions (OCRs) or accessible chromatin regions, are loosely formed after chromatin remodeling. Whether the chromatin is loosely or tightly coiled largely determines transcriptional regulation [1,2]. A number of cis-regulatory elements interact with trans-acting factors for transcriptional regulation, and cis-trans elements with regulatory functions participate in the process of transcriptional regulation by binding to OCRs [3,4]. For example, when a transcription factor binds to an OCR, it recruits other proteins to initiate the transcription of nearby genes. Therefore, a complete genome-wide map of potential OCRs is helpful for the investigation of changes in the nucleosome location and for the discovery of genome regulatory elements and gene regulatory mechanisms [5,6]. Chromatin accessibility information has even been proven to be valuable for the early diagnosis and treatment of cancer [7,8].

    The OCRs are easier to excise than other regions. Therefore, researchers often use enzymes, such as nuclease and transposase, or physical methods to digest the chromatin. The cleavage-sensitive sites are then sequenced using various technologies, such as DNase I hypersensitive site sequencing (DNase-seq), assay for transposase accessible chromatin sequencing (ATAC-seq), and formaldehyde-assisted isolation of regulatory element sequencing (FAIRE-seq), to obtain further information. DNase-seq has been used for a long time; however, it requires a large amount of starting material (~1×10⁷ cells). On the other hand, ATAC-seq requires a significantly smaller sample (<1×10⁵ cells) and has the advantage of requiring no antibody. Therefore, ATAC-seq has become the method of choice in recent years [9,10]. However, none of these techniques are able to solve the problem of open chromatin determination. All nuclease-based methods exhibit a preference for specific sequences for cleavage, depending on the nuclease, which is a major flaw. For example, DNase I exhibits a strong preference for specific sequences, and many DNase-seq results reflect cleavage preferences rather than actual protein binding [3,11,12]. Similarly, in the ATAC-seq method, a preference for cleavage sites has been observed with some of the Tn5 enzymes, resulting in “false DNA footprints” [13]. Although these technologies have been commonly used in human and animal studies [14], their application in plants is still in the exploratory stage. This is because of structural differences between plant and animal cells. Unlike animal cells, plant cells possess a cell wall, numerous chloroplasts, mitochondria, and other organelles that contaminate the assay. Consequently, OCR data have been obtained using DNase-seq and ATAC-seq only in a small number of model plant species, including Oryza sativa [15], Arabidopsis thaliana, Medicago truncatula, Solanum lycopersicum [16], and Hordeum vulgare [17].

    Previous studies have shown that chromatin accessibility is highly dynamic rather than static. OCRs usually change during stress response, stimulus response, and developmental transition [18,19]. Moreover, OCRs in different species are significantly cell-specific [4]. More than 40% of the OCRs in human T-cells differ between functional and exhausted cells at different time points [20]. Chromatin accessibility also varies considerably among different cells in Drosophila melanogaster, A. thaliana, and O. sativa [16,21]. Consequently, the current DNase-seq, ATAC-seq, and FAIRE-seq data represent only some of the OCRs and do not present the entire chromatin accessibility information about a given species [22,23]. Thus, a global overview of the distribution of OCRs in genomes is lacking. Moreover, these experimental technologies are generally expensive and time-consuming [3].

    Proteins recognize specific motifs and epigenetic modifications of the DNA sequence that influence its accessibility [24]. After training on a specific dataset, machine-learning algorithms can collect sequence information and predict protein-binding sites, DNA accessibility, histone modifications, and DNA methylation patterns. Many algorithms have been developed to predict regulatory elements, such as Basset [25], Deeperdeepsea [26], DeepBind [27], and DeepCpG [28]. However, these algorithms have a few limitations. First, most of the algorithms are not designed for the prediction of OCRs; instead, they are designed for the prediction of 1) regulatory fragments that bind to transcription factors or RNA-binding proteins, 2) DNase sensitivity, and 3) genomic variants. Second, almost all previous studies using such algorithms are based on human or mouse data; however, the OCRs of plants and animals exhibit significantly different characteristics. For example, approximately 39% of the DNase I hypersensitive sites (DHSs) are associated with introns in the human genome, which is remarkably higher than the proportion of intron-associated DHSs in O. sativa (11%) and A. thaliana (5%) [29]. Third, most of the existing methods are developed as conventional classifiers that classify sequence fragments of a certain length (hundreds of base pairs) as regulatory regions, instead of scanning the whole genome.

    OCRs are usually rich in various elements and specific motif-binding factors. Therefore, it is feasible to scan the genome and predict chromatin accessibility by learning the motifs and their distribution from OCR data. Here, we developed a de novo OCR prediction tool, chromatin accessible regions for plant (CharPlant), based on deep learning to provide a genome-wide overview of OCRs in a given plant species. We constructed training datasets using the DNase-seq and ATAC-seq data of four plant species. CharPlant simultaneously learns the relevant sequence motifs and regulatory logics, which are jointly used to determine DNA accessibility. The trained model accepts DNA sequences or scaffolds as input and generates an outline of OCRs in a “.bed” file as output (Figure 1A). To our knowledge, this is the first tool capable of de novo prediction of OCRs from the DNA sequence.

    Method

    Construction of datasets

    Considering that a dataset has to be constructed using representative plant species that have both OCR assay data and high-quality reference genomes, the DNase-seq data of O. sativa and the ATAC-seq data of four plant species (A. thaliana, S. lycopersicum, M. truncatula, and O. sativa) were downloaded from the PlantDHS database at https://plantdhs.org and the Gene Expression Omnibus (GEO) of NCBI (GEO: GSE101482 and GSE75794) at https://www.ncbi.nlm.nih.gov/geo [15], respectively. Detailed information about the DNase-seq and ATAC-seq data is listed in Table 1, and basic information about the reference genomes of these four plant species is listed in Table 2. Because the DNase-seq and ATAC-seq data used here represent diverse plant species of both dicot and monocot lineages and different cell types of the same species (O. sativa), this model applies to a broad range of plant species with distant evolutionary relationships.

    The use of two mainstream technologies, DNase-seq and ATAC-seq, allowed the inclusion of nuclease-based and transposase-based data. MACS2 software was used for peak calling with default parameters [30]. Because long fragments offer higher statistical power, peaks longer than 200 bp were selected as positive samples. Positive sequences were shuffled to generate the negative dataset with fasta-shuffle-letters in the MEME software [31]. Unlike the random extraction of fragments from the DNA sequence for use as negative samples, shuffling ensures that the negative and positive samples have identical composition of all four bases [27]. Shuffling also maintains a balance between positive and negative samples when constructing the dataset. In an unbalanced dataset, classification algorithms focus on the class containing the most samples, which degrades the classification performance of the class that contains a small number of samples. Most machine-learning algorithms do not work well with unbalanced datasets. Therefore, to construct the dataset in this study, one negative sequence was generated from each positive sequence, i.e., the numbers of positive and negative samples were equal. The samples were divided into three sets (training, validation, and testing), which accounted for 60%, 20%, and 20% of the data, respectively. The sample numbers of the three sets are listed in Table 1.
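    The dataset construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: a plain per-letter shuffle stands in for MEME's fasta-shuffle-letters, and the function names (shuffle_sequence, build_dataset) and the toy peak sequences are hypothetical, not part of CharPlant.

```python
import random

def shuffle_sequence(seq, rng):
    """Return a letter-shuffled copy of seq, so base composition is unchanged.
    (Stand-in for MEME's fasta-shuffle-letters; not its actual implementation.)"""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

def build_dataset(positives, seed=0):
    """Pair every positive with one shuffled negative, then split 60/20/20."""
    rng = random.Random(seed)
    samples = [(seq, 1) for seq in positives]
    samples += [(shuffle_sequence(seq, rng), 0) for seq in positives]
    rng.shuffle(samples)
    # Integer arithmetic avoids floating-point surprises in the split sizes.
    n_train = len(samples) * 6 // 10
    n_val = len(samples) * 2 // 10
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test

if __name__ == "__main__":
    # Toy "peaks"; real positives would be MACS2 peaks longer than 200 bp.
    peaks = ["ACGTACGTAC", "GGGCCCAATT", "TTTTACGCGA", "CATGCATGCA", "AACCGGTTAA"]
    train, val, test = build_dataset(peaks)
    print(len(train), len(val), len(test))
```

    Because each negative is derived from exactly one positive, the two classes are balanced by construction, which is the property the paragraph above relies on.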

    Table 2 Genome-related information on the four plant species used in this study

    Construction of the CharPlant model

    The CharPlant model is trained using ATAC-seq data of four plants or DNase-seq data of O. sativa, and each plant has its own model parameters. The model is based on a multilayer convolutional neural network (CNN) (Figure 1B). The CNN model, which originates from artificial neural networks, contains perceptrons with multiple hidden layers and combines low-level features to form more abstract high-level attributes or features for discovering feature representations of data [32,33]. The motivation is to build a neural network that can simulate the human neuron for analysis and learning, and imitate the mechanism of the human brain to interpret data, such as images, sounds, and texts [34–36]. Unlike traditional methods, in which features are manually selected in the preprocessing stage, the CNN adaptively extracts features from large-scale training datasets. It then maps input data to high-dimensional representations with abundant information by nonlinear transformation, thus simplifying classification or regression. Early applications of the CNN model in DNA sequence analysis surpassed existing mature algorithms, such as support vector machines or random forests, in predicting protein binding and DNA sequence accessibility [25,27].

    Figure 2 Motifs identified by CharPlant in Oryza sativa and other plant species

    To achieve high computational efficiency, our model is designed with only three hidden layers: the first and second layers are convolutional, whereas the third layer is fully connected. The CNN model used in CharPlant requires binary vectors as input. Each input DNA fragment is first converted into a 4×n matrix, where n represents the length of the input fragment. Thus, each base is preprocessed with “one-hot” encoding (A: [1, 0, 0, 0]; C: [0, 1, 0, 0]; G: [0, 0, 1, 0]; T: [0, 0, 0, 1]; N: [0, 0, 0, 0]), and the sequence is converted into a matrix with four rows. The first layer of the CNN model contains convolutional filters for the identification of low-level features in a given DNA sequence. A convolutional filter is essentially a motif probe that scans each input matrix to discover potential patterns. The identification of low-level DNA features involves the following steps. First, each input sequence is fed into the first convolutional layer, and the convolution kernel slides over the sequence fragment to calculate the activation score. If the activation score of the convolution kernel at a certain position is greater than the preset threshold, the sequence segment centered at that position is identified and represented by the position frequency matrix (PFM) of four base frequencies. Then, the PFM is used to calculate the information entropy and is transformed into a position weight matrix (PWM), which is widely used for the representation of motifs. The PWM contains four rows and describes the entropy of the four bases at each position [25,37]. Subsequently, a sequence logo is used to visualize the motif, i.e., the size of each base at a position indicates how likely that base is at the position. To obtain the activation score of the convolutional filter, the rectified linear unit (ReLU) is used as the activation function for all three hidden layers. The ReLU function f(x) is calculated as follows:

    $$f(x) = \max(0, x)$$

    where $x$ is the input.
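    The encoding and scanning steps described above can be illustrated with a small pure-Python sketch. The one-hot table follows the text; the kernel values and the helper names (one_hot_encode, relu, scan) are hypothetical and chosen only for illustration, since the actual filter sizes and weights of CharPlant are learned during training.

```python
# One-hot table as given in the text: rows are A, C, G, T; N maps to all zeros.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
    "N": [0, 0, 0, 0],
}

def one_hot_encode(seq):
    """Return a 4 x n matrix (list of four rows) for a DNA sequence."""
    columns = [ONE_HOT[base] for base in seq.upper()]
    # Transpose the n columns of length 4 into 4 rows of length n.
    return [[col[row] for col in columns] for row in range(4)]

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)

def scan(matrix, kernel):
    """Slide a 4 x k kernel along a 4 x n matrix and return the
    ReLU-activated score at every valid position."""
    n, k = len(matrix[0]), len(kernel[0])
    scores = []
    for start in range(n - k + 1):
        total = sum(kernel[r][j] * matrix[r][start + j]
                    for r in range(4) for j in range(k))
        scores.append(relu(total))
    return scores

if __name__ == "__main__":
    # Hypothetical 4 x 2 kernel that rewards the dinucleotide "AC".
    kernel = [[1, -1], [-1, 1], [-1, -1], [-1, -1]]
    print(scan(one_hot_encode("ACGA"), kernel))
```

    In this toy case only the window that matches the kernel's preferred pattern gets a positive score; all other windows are clipped to zero by ReLU, which is how a filter behaves as a motif probe.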

    The ReLU function is used by neurons just like traditional activation functions such as the sigmoid or hyperbolic tangent. Compared with these conventional activation functions, ReLU has much lower computational complexity when calculating the error gradient in back propagation. Additionally, when a conventional activation function is used in back propagation, its derivative is likely to approach zero, which would make it impossible to complete the training of a deep network. ReLU largely overcomes this shortcoming.

    To overcome the problem of over-fitting, random dropout is set after every hidden layer in the model. Dropout is an optimization method for resolving over-fitting and gradient disappearance in deep neural networks. In the learning process of the neural network, the weight of randomly selected nodes in the hidden layer is set at zero. Because different nodes are reset to zero after each iteration, the importance of each node is balanced. Because of the use of random dropout, each node of the neural network contributes roughly equally to the training, and there is no case where a few high-weight nodes completely dominate the output. In this study, the dropout probability is set at 0.6, i.e., the weights of 60% of the neurons are set to zero in every iteration.
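    A minimal sketch of training-time dropout with the probability of 0.6 mentioned above is given here. The rescaling of surviving activations by 1/(1 - rate) ("inverted dropout") is an assumption made for this illustration; the text only states that the selected nodes are zeroed. The function name dropout is hypothetical.

```python
import random

def dropout(activations, rate=0.6, rng=None):
    """Zero each activation independently with probability `rate`.

    Assumption: survivors are rescaled by 1 / (1 - rate) so that the
    expected sum of activations is unchanged (inverted dropout).
    """
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

if __name__ == "__main__":
    layer_output = [1.0] * 10
    print(dropout(layer_output, rate=0.6, rng=random.Random(0)))
```

    A fresh random mask is drawn on every call, so different nodes are silenced in each iteration, as described above.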

    The architecture of the second layer is the same as that of the first layer and is based on three key technologies: convolutional network, ReLU activation function, and dropout. The second layer combines low-level motif features to form abstract high-level attributes. In the CNN model, a fully connected layer is set behind the two convolutional layers. Each neuron in the fully connected layer is connected with all the neurons in its preceding layer to integrate local sequence information with class discrimination in the convolutional layer. The fully connected layer contains 200 neurons, and it flattens the matrix into a column vector. The weights of links are calculated by a linear regression algorithm, but linear regression can only predict continuous values, which does not by itself solve the classification problem. Therefore, the output layer determines whether the input sequence belongs to the positive or negative class, depending on the calculations. The final output layer uses the sigmoid function to perform a nonlinear transformation and maps the results of the fully connected layer from (-∞, +∞) to (0, 1), which indicates the probability of an open chromatin sequence. The goal of training the model is to minimize the error between predicted and labeled values, i.e., to minimize the cost function. A series of cost functions is available. Here, the binary cross-entropy cost function is used because it can overcome the problem of gradient disappearance when calculating gradient descent, thus showing high learning efficiency.
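    The output transformation and cost function described above can be written out explicitly. The sketch below uses only the standard definitions of the sigmoid and of binary cross-entropy; the function names and the clipping constant eps are illustrative choices, not taken from CharPlant.

```python
import math

def sigmoid(z):
    """Map a real-valued score from (-inf, +inf) to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, this form avoids overflow in exp().
    ez = math.exp(z)
    return ez / (1.0 + ez)

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

if __name__ == "__main__":
    scores = [2.0, -1.5, 0.3]   # hypothetical outputs of the fully connected layer
    labels = [1, 0, 1]          # 1 = open chromatin, 0 = shuffled negative
    probs = [sigmoid(z) for z in scores]
    print(probs, binary_cross_entropy(labels, probs))
```

    With a sigmoid output, the derivative of this loss with respect to the pre-activation score reduces to the difference between the predicted probability and the label, which is one common way to explain why the gradient does not vanish for confidently wrong predictions.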

    Implementation of CharPlant

    CharPlant, implemented in Python, is based on Keras 2.0.1 with TensorFlow 1.2.0, an open source machine-learning platform developed by Google. Additionally, a widely used workflow management system, Snakemake, is employed to combine a series of steps into a single pipeline that can be run by an inexperienced user using simple command line entries [38]. Steps in the workflow are described in terms of the rules defined using the input and output and Shell and Python codes. The workflow determines the steps that need to be performed and produces one or more output files. Dependencies between rules are automatically resolved, and rules are automatically parallelized when possible. A text file titled “Snakefile” is created, which defines the input and how the output is created from the input. CharPlant can learn OCR features from DNase-seq or ATAC-seq data and predict potential chromatin accessible regions in a plant genome de novo. It performs four steps: 1) data pre-processing, 2) model training, 3) motif visualization, and 4) de novo prediction. If all the steps are successful, CharPlant outputs the results of the predicted OCRs in a “.bed” file in the directory CharPlant/peak.

    Snakefile has a number of parameters, such as epoch number, learning rate, batch size, and dropout, which can be adjusted using a configuration file “config.yaml”. This configuration file provides default values and their meaning for each parameter. It is not necessary for the users to modify the default values, except for two directories as follows: “genome:Yourpath/CharPlant/example/oryza_sativa.fa” and “bed:Yourpath/CharPlant/example/oryza_sativa.bed”. For example, the parameter “genome” represents the input genome file in “.fasta” format, and the parameter “bed” represents the output OCR file in “.bed” format. The user would need to replace “Yourpath” with the true path in which CharPlant is installed.

    Installation and execution of CharPlant

    CharPlant is currently available for Linux-based operating systems. To install and run CharPlant, download the package from the GitHub development platform at https://github.com/Yin-Shen/CharPlant and then set “CharPlant” as the current directory. The subdirectory CharPlant/example contains the reference genome (file “oryza_sativa.fa”) and the DNase-seq data of O. sativa as an example (file “ory_whole.bed”). All Python and Shell scripts are in the subdirectory CharPlant/src. Some fundamental Python packages, such as numpy, matplotlib, and keras, will be needed for scientific computing and network construction. File S1 provides a detailed CharPlant manual, installation steps, and parameter settings for the abovementioned Python packages, and complete “config.yaml” and “Snakemake” files. Users can run the program by typing the following command: $ CharPlant.sh

    Results and discussion

    Motifs identified by CharPlant

    The positive dataset was obtained from the peaks of ATAC-seq and DNase-seq data, and the negative samples were generated by shuffling the positive samples, as described above. In the positive samples, motifs are usually clustered for protein binding, whereas the negative samples generally have far fewer motifs. A DNA motif is defined as a short similar recurring pattern of nucleotides, with many biological functions. A previous study has shown that sequence motifs are roughly constant in length, and are often repeated and conserved [39]. Based on the difference between positive and negative datasets, our model learned the sequence motifs and the regulatory logics with which they are combined to determine DNA accessibility. The convolutional layer searched for the motifs along the genome sequence and produced a matrix, with rows representing neurons and columns representing positions. Determination of OCRs was based on the accurate identification of motifs. We compared the motifs learned by the convolution kernels with the known motifs in the JASPAR database [37]. The results showed that many of the motifs predicted by our model were previously known and experimentally validated. For example, JASPAR has eight known motifs in O. sativa, of which six were identified by our model (Figure 2A). We also identified many known motifs of other plants, some of which are shown in Figure 2B–D. Notably, sequence motifs were very small in size (6–19 bp), whereas intergenic regions were very long and highly variable, thus making motif discovery a very difficult task. Therefore, the number of motifs in plant genomes and their positions with respect to target genes remain unclear. Given that JASPAR has a very limited repertoire of only 501 experimentally validated motifs, some of the identified sequences not included in the database could be potential motifs, which might be experimentally validated in the future. Overall, CharPlant can detect motifs of various lengths, which can be subsequently used for the identification of OCRs.

    Performance comparison between CharPlant and other methods

    CharPlant is designed as a de novo OCR prediction tool. By contrast, almost all current methods are developed as conventional classifier algorithms and cannot scan the genome sequence to discover OCRs. Moreover, the architecture and parameters of these models are developed based on human and animal data. Therefore, a strict comparison of these methods with CharPlant is difficult. Because it was impossible to compare de novo prediction with other methods, we compared the learning ability and computational efficiency of CharPlant with two state-of-the-art deep learning algorithms, Basset [25] and Deeperdeepsea [26], using chromatin accessibility data of plants. Basset is an open source package and learns the functional activity of DNA sequences from genomic data. The authors applied Basset to a compendium of accessible genomic sites mapped in 164 cell types by DNase-seq and showed greater predictive accuracy than previous methods [25]. We revised Basset to adapt it to plant data. Deeperdeepsea is a recently published PyTorch-based deep learning library for any biological sequence data. We downloaded the package from https://selene.flatironinstitute.org/. Each method was adjusted to its best state and trained using the dataset constructed in this study, as described above. We calculated the false positive rate vs. true positive rate to plot receiver operating characteristic (ROC) curves and determined the area under the ROC curve (AUROC; Figure 3, Figure S1). In Figure 3A, the curves were obtained using the DNase-seq data of O. sativa. Although the AUROC values of CharPlant were slightly better than those of Basset and Deeperdeepsea on the O. sativa DNase-seq data, no major difference was observed among the three methods. However, the performances of these three methods on the datasets of the other three species (A. thaliana, M. truncatula, and S. lycopersicum) were quite different (Figure 3B–D). Although the ROC curves and AUROC values of Basset on O. sativa and A. thaliana datasets were similar to those of CharPlant, the prediction accuracy of Basset was only ~50% with S. lycopersicum and M. truncatula datasets. Thus, the results of Basset were equivalent to a random guess, indicating that Basset failed to predict OCRs. Similarly, Deeperdeepsea failed on the datasets of all analyzed plant species, except O. sativa. By contrast, our method could be applied to all datasets and achieve consistent performance. Basset and Deeperdeepsea are both excellent methods for the prediction of regulatory elements and have been proven to produce accurate results after training on human data. However, these methods do not work on plant datasets, as shown in this study. This is likely because the structural design and hyperparameter choice of the model are not suitable for plant datasets. The characteristics of DNA sequences differ greatly between plants and animals. To shift an algorithm from the animal to the plant system, replacing the animal training set with a plant dataset is not sufficient; instead, to achieve similar performance, it is often necessary to make substantial changes to the model structure, essentially transforming it into a new model.
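    For readers unfamiliar with the metric, the ROC curve and AUROC used in this comparison can be computed as sketched here. This is a generic stdlib-only illustration with hypothetical function names and toy scores; it is not the evaluation code used in the study.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold
    through the sorted scores. Tied scores are processed together."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auroc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]  # toy predicted probabilities
    print(auroc(labels, scores))
```

    An AUROC near 0.5 corresponds to the random-guess behaviour reported above for some methods on some plant datasets, whereas a value near 1 indicates that positives are almost always ranked above negatives.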

    To further validate the performance of CharPlant, we compared our model with machine-learning methods including Random forest, AdaBoost, GBDT, XGBoost, and CatBoost on all four plant datasets. These machine-learning methods were implemented using the Scikit-learn package, a widely used library that supports supervised and unsupervised learning [40]. We computed the precision and recall ratios of the comparative methods, and plotted the precision-recall (PR) curves. As shown in Figure S2, analysis of the PR curves of CharPlant, Basset, and Deeperdeepsea led us to a conclusion similar to that obtained from the analysis of ROC curves described above (Figure 3). When the other five comparative methods (Random forest, AdaBoost, GBDT, XGBoost, and CatBoost) were used on the four plant datasets, their performances were similar, and their ROC and PR curves were close to each other. However, the performance of each of these algorithms was significantly inferior to that of the neural network methods.

    In some instances, the method of sample partitioning influences model evaluation. To avoid the randomness of a single training set and testing set, we performed 10-fold cross validation. The dataset was divided into 10 parts; each part was used in turn as the testing dataset, and the remaining nine were used as the training dataset. ROC and PR curves were plotted for each test (Figure S3). All samples were used in both training and testing, and each sample was tested exactly once. The majority of PR curves overlapped, indicating that precision and recall were stable when using different dataset partition methods. Similar conclusions could be drawn from the ROC curves.
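    The partitioning scheme of 10-fold cross validation can be sketched as follows. The function name k_fold_splits and the random seed are hypothetical; the sketch only shows how each sample ends up in the testing set exactly once.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

if __name__ == "__main__":
    for fold_id, (train, test) in enumerate(k_fold_splits(103, k=10)):
        print(fold_id, len(train), len(test))
```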

    To compare the computation efficiencies, we trained and tested CharPlant, Basset, and Deeperdeepsea on the central processing unit (CPU) and graphics processing unit (GPU). The manufacturers and models are as follows: Tesla P100-PCIE-16GB (GPU) and Intel(R) Xeon(R) Gold 6140 CPU @ 2.30 GHz (CPU). The comparison was performed on the DNase-seq dataset of O. sativa. The results showed that CharPlant took significantly less time than Basset and Deeperdeepsea on both CPU and GPU (Table 3).

    De novo prediction of OCRs in genomes

    To enable the prediction of OCRs from long DNA sequences or complete genomes using CharPlant, we used the sliding-window method to split the sequence into fragments. The window width was set at 36 bp. Generally, a smaller sliding step is helpful for the accurate prediction of the locations of OCRs; however, the computational complexity with a smaller sliding step is significantly higher than that with a large sliding step. To balance computational efficiency and accuracy, the sliding step was set at 5 bp. The trained model was used to calculate the probability of chromatin accessibility of these fragments, and then the peaks of OCRs in these fragments were called using the MACS2 tool, with default parameters [30]. We scanned the whole genome sequences of four plant species using the CharPlant model and aligned the predicted OCRs with the DNase-seq or ATAC-seq dataset to validate the performance of CharPlant. The training datasets were obtained from a single cell type at a specific time, yet we tried to predict all potential OCRs in different tissues at different times. The results showed that the number of OCRs in the latter was higher than that in the former. Notably, the number of OCRs predicted by CharPlant in all datasets was much larger than that detected by DNase-seq or ATAC-seq assays (Figure 4). CharPlant predicted 153,594 potential OCRs in the DNase-seq dataset of O. sativa seedlings and calli, of which 65,634 overlapped with those detected by the DNase-seq assay (Figure 4A). Although the remaining 87,960 predicted OCRs were not supported by the DNase-seq assay, 21,420 of these were supported by the ATAC-seq assay of O. sativa roots and leaves (ATAC-seq data from GSE101482 and GSE75794) (Figure 4B). Based on the currently available DNase-seq and ATAC-seq data of a few plant species, it is reasonable to speculate that more predicted OCRs could be confirmed if more experimental data were available. Among the OCRs predicted in the O. sativa seedling/callus data, a considerable proportion was supported by the root/leaf data, implying that the OCRs predicted by CharPlant are credible and not false positives.
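    The sliding-window scan with a 36 bp window and a 5 bp step can be sketched as follows. The window and step values come from the text; the function names, the 0-based half-open coordinates, the chromosome label, and the toy GC-content "predictor" standing in for the trained CNN are all assumptions made for illustration.

```python
def sliding_windows(sequence, width=36, step=5):
    """Yield (start, end, fragment) for every full-width window.
    Coordinates are 0-based and half-open, as in BED files."""
    for start in range(0, len(sequence) - width + 1, step):
        yield start, start + width, sequence[start:start + width]

def gc_fraction(fragment):
    """Toy stand-in for the trained model: fraction of G/C bases."""
    return sum(base in "GC" for base in fragment) / len(fragment)

def score_windows(sequence, predict, chrom="chr1"):
    """Return BED-like rows (chrom, start, end, score) for each window,
    where `predict` maps a fragment to an accessibility probability."""
    return [(chrom, start, end, predict(fragment))
            for start, end, fragment in sliding_windows(sequence)]

if __name__ == "__main__":
    toy_sequence = "GC" * 20 + "AT" * 20
    for row in score_windows(toy_sequence, gc_fraction):
        print(*row, sep="\t")
```

    For a sequence of length L of at least 36 bp, this produces (L - 36) // 5 + 1 windows, which makes explicit why a smaller step increases the computational cost of a genome-wide scan.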

    Figure 4 Overlap between OCRs predicted by CharPlant and those detected by DNase-seq or ATAC-seq assays in four plant species

    To provide more evidence, we compared the predicted OCRs with experimental OCRs and three types of histone modifications (H3K4me3, H3K9ac, and H3K27ac) in A. thaliana. Covalent modification of the histone tail plays a key role in regulating chromatin structure and gene transcription. In eukaryotes, H3K4me3 is associated with active chromatin and promotes transcription through interactions with effector proteins [41,42]. H3K9ac and H3K4me3 frequently coexist as markers of active gene promoters. H3K27ac is related to gene activation and is mainly enriched in enhancer and promoter regions [43,44]. Figure S4A and B show two examples of overlap between predicted OCRs and experimental OCRs, indicating that the ATAC-seq data supported the predicted results. Additionally, H3K4me3, H3K9ac, and H3K27ac modifications showed peaks at these sites. However, another scenario is that a predicted OCR does not overlap with DNase-seq or ATAC-seq data. For example, as shown in Figure S4C, the ATAC-seq data showed no peak at the predicted OCR, whereas H3K4me3, H3K9ac, and H3K27ac modifications showed significant peaks. Considering the tissue- and time-specificity of OCRs, it is difficult to definitively conclude that this site is not an OCR.

    To determine whether there is a prediction bias, i.e., whether some regions have higher prediction accuracy than other regions, we calculated the distributions of predicted OCRs in the promoter (≤2 kb) regions, intergenic regions, exons, introns, 5′UTRs, 3′UTRs, and downstream regions (300 bp downstream of transcription termination sites), and compared them with experimental data to show their consistency. The results showed that the distributions of predicted OCRs were consistent with those of the DNase-seq data in O. sativa and the ATAC-seq data in A. thaliana and M. truncatula (Figure 5A–C). Although the distribution of predicted OCRs appeared different from that of the ATAC-seq data in S. lycopersicum, the number of OCRs was highest in the intergenic regions, followed by the promoter regions, and least in the 3′UTRs (Figure 5D).

    Figure 5 Distributions of CharPlant-predicted OCRs and experimental OCRs in four plant species

    Epigenetic modifications provide further evidence for the validation of OCRs predicted by CharPlant. Among the four plant species, A. thaliana has the most abundant data, including an ATAC-seq dataset and various epigenetic datasets. Therefore, we used A. thaliana as an example to compare the difference in the frequency of H3K4me3 modification between the predicted OCRs and ATAC-seq peaks. The distribution of H3K4me3 in the A. thaliana genome was obtained from the Plant Chromatin State Database (PCSD; https://systemsbiology.cau.edu.cn/chromstates) [45]. Our results showed no significant difference in the frequency of H3K4me3 modification between the predicted OCRs and ATAC-seq peaks, and the two boxplots were almost identical (Figure S5A). Furthermore, we investigated the difference in H3K4me3 modification between the predicted OCRs and 10,000 randomly selected inactive chromatin regions (based on ATAC-seq data). We found that the predicted OCRs were significantly more enriched for H3K4me3 modification than the unopened chromatin regions (Figure S5B).

    Notably, the time taken to scan the genomes of the four plant species was closely related to genome size; analyses of the A. thaliana, O. sativa, M. truncatula, and S. lycopersicum genomes took 8 h, 22 h, 24 h, and 49 h, respectively.

    Conclusion

    In summary, experimental technologies can determine only the current status of DNA accessibility, whereas CharPlant is a neural network model that learns sequence motifs and regulatory logic from experimental data and predicts potential OCRs. Compared with existing algorithms, CharPlant has several advantages. First, to our knowledge, CharPlant is the first de novo prediction tool that can identify potential accessible chromatin regions along the genome sequence. Second, CharPlant is specifically designed to predict OCRs of plants, rather than those of humans or animals, as in other algorithms. Third, CharPlant marks all potential OCRs of a given plant species in different tissues and at different times, which is beneficial for the investigation of gene regulation under different conditions. Lastly, CharPlant is significantly faster than other deep learning algorithms because it is designed with a concise and efficient structure.

    Code availability

    The source code of CharPlant and supporting files are freely available from GitHub at https://github.com/Yin-Shen/CharPlant.

    CRediT author statement

    Yin Shen: Methodology, Data curation, Software. Ling-Ling Chen: Conceptualization, Supervision. Junxiang Gao: Conceptualization, Project administration, Validation, Writing - original draft, Writing - review & editing. All authors have read and approved the final manuscript.

    Competing interests

    The authors have declared no competing interests.

    Acknowledgments

    This work was supported by the National Natural Science Foundation of China (Grant No. 31871269), the Hubei Provincial Natural Science Foundation of China (Grant No. 2019CFA014), and the Fundamental Research Funds for the Central Universities, China (Grant No. 2662019PY069).

    Supplementary material

    Supplementary data to this article can be found online at https://doi.org/10.1016/j.gpb.2020.06.021.

    ORCID

    ORCID 0000-0001-7002-4022 (Yin Shen)

    ORCID 0000-0002-3005-526X (Ling-Ling Chen)

    ORCID 0000-0002-8211-9034 (Junxiang Gao)
