
    Predicting Lung Cancers Using Epidemiological Data: A Generative-Discriminative Framework

    IEEE/CAA Journal of Automatica Sinica, May 2021

    Jinpeng Li, Member, IEEE, Yaling Tao, and Ting Cai

    Abstract—Predictive models for assessing the risk of developing lung cancers can help identify high-risk individuals with the aim of recommending further screening and early intervention. To facilitate pre-hospital self-assessments, some studies have exploited predictive models trained on non-clinical data (e.g.,smoking status and family history). The performance of these models is limited due to not considering clinical data (e.g., blood test and medical imaging results). Deep learning has shown the potential in processing complex data that combine both clinical and non-clinical information. However, predicting lung cancers remains difficult due to the severe lack of positive samples among follow-ups. To tackle this problem, this paper presents a generative-discriminative framework for improving the ability of deep learning models to generalize. According to the proposed framework, two nonlinear generative models, one based on the generative adversarial network and another on the variational autoencoder, are used to synthesize auxiliary positive samples for the training set. Then, several discriminative models, including a deep neural network (DNN), are used to assess the lung cancer risk based on a comprehensive list of risk factors. The framework was evaluated on over 55 000 subjects questioned between January 2014 and December 2017, with 699 subjects being clinically diagnosed with lung cancer between January 2014 and August 2019. According to the results, the best performing predictive model built using the proposed framework was based on DNN. It achieved an average sensitivity of 76.54% and an area under the curve of 69.24% in distinguishing between the cases of lung cancer and normal cases on test sets.

    I. INTRODUCTION

    LUNG cancer is a major threat to humankind [1]. In China, the prevalence of lung cancer in men is 52 cases per 100 000 people, which makes it the first among all cancers. In women, the prevalence is 26.7 cases per 100 000 people, which makes it the second cancer after breast cancer, which leads at 30.4 cases per 100 000 people [2]. Since the current understanding of the phenomena associated with lung cancer appearance and evolution is not sufficient for early detection, lung cancer is generally discovered late. As a result, the prognosis is poor [3]. More than 50% of cancers (including lung cancer) could be prevented if the current knowledge of risk factors were utilized in risk assessments [4], [5]. Therefore, moving medical interventions from treatment to prevention can potentially save lives [4], [6].

    Identifying risk factors and formulating prediction rules for diseases have long been used in clinical practice to assist decision-making and counseling [4], [7]. Using epidemiological questionnaires to collect risk factors and making predictions based on them is an economical and convenient approach to popularizing disease prevention among the public. The Harvard cancer risk index (HCRI) research group developed the first comprehensive cancer risk assessment system, covering common cancers that accounted for 80% of all incidences in the United States [4], where lung cancer ranks third among all cancers in terms of morbidity and mortality. On the basis of the HCRI, the Chinese National Cancer Center and Chinese Academy of Medical Sciences released the questionnaire on early diagnosis and treatment of urban cancer (EDTUC) in 2012. The questionnaire involves risk factors identified by experienced epidemiologists and clinical oncologists and is adjusted to suit Chinese people. Fig. 1 demonstrates that the questionnaire involves no clinical input, which ensures the manageability of pre-hospital self-assessment. Risk factors are the foundation of the decision-making process in lung cancer risk evaluation. The decision rules are based on the weighted summation of risk factors, where the weights are assigned according to expert opinions. This process introduces subjectivity. This study considers clinical records (with lung cancer diagnoses) from January 2014 to August 2019 to build a deep learning lung cancer risk model (LCRM). The LCRM predicts risk scores indicating whether a person is at a high risk of lung cancer. Our main contributions are listed as follows:

    1) It is the first study on cancer prediction using EDTUC data covering the population of major cities in China. The study validates the effectiveness of risk factors and demonstrates the effectiveness of deep learning in predicting lung cancers based on non-clinical information.

    2) The proposed model considers significantly more risk factors compared to existing methods.

    3) The Wasserstein generative adversarial network (WGAN) is employed for auxiliary sample generation; it is shown to outperform the synthetic minority oversampling technique (SMOTE).

    Fig. 1. The proposed data-driven lung cancer risk predictive model. The risk factors involve six aspects of epidemiological information.

    4) The performance of the proposed method is validated in ablation experiments on real-life data.

    II. RELATED WORK

    Lung cancer is a common threat to human health. According to a survey in 2016, lung and bronchus cancers account for 14% of new cancer cases and 28% of cancer-related deaths in the United States [5]. In China, the numbers of new cases and deaths from lung cancer rank in the top two for both males and females [6]. The late diagnosis of lung cancer results in an increased death risk and a poor five-year survival rate of less than 20% [5]. Predicting lung cancer at an early stage helps identify the high-risk population with the aim of recommending further screening (e.g., chest low-dose computed tomography (CT)) and prompting lifestyle changes (e.g., quitting smoking). Therefore, predicting the lung cancer risk is of great significance in reducing the lung cancer threat.

    There are mainly two approaches to predicting lung cancer risks. The first approach is building an epidemiological plus clinical assessment model (ECAM) based on clinical measures such as blood tests [8], CT [9], and gene sequencing [10]. The second approach is building an epidemiological model (EM) based only on factors that are easy to access, such as gender, age, and smoking history. The advantage of the ECAM over the EM is the consideration of clinical modalities that can potentially improve the prediction performance. However, the EM has the advantages of convenience and low cost. This study focuses on the EM to develop an effective and convenient risk prediction solution. Clinical measures are reserved for the second stage, conducted on the high-risk population identified by the EM. Such a two-stage mode helps reduce the cost of screening.

    As an early example of the EM, Bach et al. [11] used age, gender, smoking duration, smoking intensity, smoking quit time, and asbestos exposure as risk factors to predict lung cancer risk within one year. Spitz et al. [12] included environmental tobacco smoke, family history of cancer, dust exposure, prior respiratory disease, and smoking history variables to build a multivariable logistic regression model that achieved a sensitivity of 70%. Cronin et al. [13] built a model based on smoking variables for predicting lung cancer risk over the following ten years rather than one year; the model achieved an accuracy of 72% for internal studies and 89% for external studies. We speculate that the accuracy would improve as the predictive window becomes longer; however, a predictive window as long as ten years is too wide for practical application. Screening once over such a long period would allow cancers to develop and progress, whereas screening annually or every three to five years is more appropriate for detecting cancers at their early stages. Therefore, constructing predictive models that evaluate five-year (or shorter) risk is more feasible. The applicability of the three aforementioned models is also uncertain, since the recruited subjects were current or former smokers or had a history of asbestos exposure; therefore, the resulting models are not suitable for predicting lung cancer risk for people without smoking or asbestos exposure history.

    Several studies have included extensive information such as medical history, family cancer history, and living conditions to predict risks among a more general population. In 2011, Tammemagi et al. [14] used age, body mass index (BMI), X-ray history, education level, smoking status, smoking duration, pack years, family lung cancer history, and chronic obstructive pulmonary disease history to build a model for predicting lung cancer risk within nine years. In 2013, Tammemagi's model was modified to compute risks within six years [15]. In 2012, Hoggart et al. [16] identified age, smoking status, smoking start age, smoking duration, and number of cigarettes per day as risk factors to implement a one-year risk predictive model. The age of the population considered in Hoggart's study was between 40 and 65, which was lower than that of the populations recruited for the majority of existing studies. Hoggart's model achieved an area under the curve (AUC) of 84% for smokers, whereas its performance on non-smokers was poor. In 2018, Hart et al. [17] built a lung cancer prediction model based on 13 risk factors among the general population that achieved an AUC of 86% on the test set. The considered risk factors included age, BMI, heart disease, physical exercise, gender, smoking status, emphysema, asthma, diabetes, stroke, hypertension, Hispanic ethnicity, and race.

    While some studies have included all-round information beyond smoking status, the risk factors considered in these studies are still limited. Hoggart's model [16] includes five factors, Bach's model [11] and Etzel's model [18] include six factors, Cassidy's model [19] includes eight factors, Tammemagi's model [14] includes nine factors, and Spitz's model [12] includes 12 factors. These studies adopted statistical analysis to quantify the association between each factor and the diagnostic result, and applied logistic regression (LR) to predict lung cancer incidence. Statistical analysis does not consider the interactions among risk factors during prediction, while LR cannot effectively model the nonlinear relationships between risk factors. Hence, more advanced techniques are required to accurately predict lung cancer.

    III. MATERIALS AND METHODS

    This study focuses on the data collected by the EDTUC project from the urban population of Ningbo, China, considering 84 factors. This number of factors is significantly higher than that considered in existing studies. The factors can be used to describe a broad range of cancers, thus allowing the unified analysis of various common cancers. The EDTUC project evaluates cancer risks according to diagnoses made by epidemiologists. Decisions are made by setting a threshold on the weighted sum of risk factors; the risk is considered high if the sum exceeds the threshold. This approach lacks objectivity since it is not result-oriented and does not use the information from real diagnostic data to formulate decision rules. From the perspective of data-driven modeling, learning a mapping function between risk factors and clinical diagnoses is a practical and objective approach to predicting lung cancer. To the best of our knowledge, there is no other study on the automatic prediction of lung cancer based on the EDTUC data. This study compares the performance of three discriminative models, namely, LR, support vector machine (SVM), and deep neural network (DNN). The models were trained to perform binary classification on the EDTUC samples and answer the question of whether a person would develop lung cancer within the next five years. Since the number of positive samples in this domain is significantly lower than that of negative samples, this study employs two generative models, namely, a WGAN and a variational autoencoder (VAE), to oversample the positive class.
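The threshold rule described above can be sketched in a few lines; the factor names, weights, and threshold below are illustrative assumptions for the example, not actual EDTUC values.

```python
import numpy as np

def rule_based_risk(factors: np.ndarray, weights: np.ndarray, threshold: float) -> bool:
    """Return True (high risk) when the expert-weighted sum of risk
    factors exceeds the threshold. Weights and threshold come from
    expert opinion, which is the source of subjectivity noted above."""
    return float(factors @ weights) > threshold

# Hypothetical example: three encoded factors (age band, smoking-years band,
# family history) with assumed expert weights.
weights = np.array([0.5, 0.8, 0.6])
subject = np.array([2.0, 1.0, 1.0])
print(rule_based_risk(subject, weights, threshold=2.0))  # → True (0.5*2 + 0.8 + 0.6 = 2.4 > 2.0)
```

The data-driven LCRM replaces this fixed weighted sum with a mapping learned from clinical diagnoses.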

    A. Data Retrieval

    The EDTUC project was launched in 2012 and carried out in major cities in China, Ningbo being one of them. Data were collected using an epidemiological questionnaire covering more than one hundred risk factors of common cancers (ethical approval 15-070/997 was granted by the Ethics Committee of the Chinese Academy of Medical Sciences). Participation was voluntary, and the inclusion criteria were as follows:

    1) Ningbo citizen aged between 40 and 74;

    2) No serious organ dysfunction or mental diseases.

    The data were mainly contributed by community and regional central hospitals. The data from each questionnaire were entered by an employee. To ensure data quality, each questionnaire was checked by an independent person. Samples with missing information or obvious errors were discarded. The questionnaires collected across Ningbo were gathered in the Health Commission of Ningbo, China. This study considered all the questions included in the EDTUC questionnaire except those related to the family history of common cancers other than lung cancer. We noticed that there were some logical repetitions in the original version of the questionnaire. To reduce the dimensionality of the data (which was thought to be beneficial for building accurate yet simple machine learning models), we reduced the number of considered questions while preserving all information. In total, 84 questions were selected for this study; they are listed in the Appendix. Furthermore, 55 891 valid and unique responses collected between January 2014 and December 2017 were included in the study. The name and identity card number were used in combination as the key to retrieve medical records from the NHC medical database. Among the 55 891 respondents, 699 were diagnosed with lung cancer from January 2014 to August 2019. Table I shows a summary of the data.

    Since the class imbalance is severe, using the original data for training would make the models tilt excessively toward negative predictions. Class balancing is an important step for obtaining reliable LCRMs [20]. Two advanced generative models were used in this study to synthesize auxiliary positive samples. Fig. 2 summarizes the proposed approach for predicting lung cancer. The LCRM is a discriminative model computing risk scores based on the risk factors. To alleviate the influence of class imbalance during training, we exploit the WGAN and VAE to synthesize auxiliary samples. The optimization principles of the generative models are also illustrated in Fig. 2.

    TABLE I SUMMARY OF THE RETRIEVED DATA FOR LUNG CANCER PREDICTION

    B. Wasserstein Generative Adversarial Network

    Generative adversarial networks (GANs) [21], [22] refer to a class of sophisticated generative models that are able to replicate real-world entities by approximating their underlying data distributions. This study employed a GAN to synthesize additional positive samples by augmenting existing positive samples.

    The GAN scheme is inspired by the zero-sum game from game theory; its basic structure includes a generator (G) and a discriminator (D) (Fig. 2(c)). G is trained to capture the real data distribution so as to synthesize samples that are as realistic as possible from a noise variable z (which can follow a Gaussian, uniform, or any other distribution). D is trained to distinguish whether a sample comes from the real world or is synthesized by G. The two models compete with each other. When the Nash equilibrium is reached, D can no longer distinguish between real and synthesized samples, and G has learned the distribution of the data [21]. The overall training goal of GANs can be formulated as the minimax objective

    min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))].

    Training with this objective is known to be unstable. The WGAN [22] improves stability by minimizing the Wasserstein distance between the real and generated distributions, which amounts to four modifications of the original GAN:

    Fig. 2. The generative-discriminative framework for lung cancer risk prediction. (a) The LCRM, a discriminative model automatically deciding whether to trigger an alarm; (b) generative models synthesizing auxiliary samples to help training; (c) the training scheme of the GAN; (d) the training scheme of the VAE.

    1) Sigmoid removal at the last layer of D;

    2) No logarithm in the loss function of D;

    3) Clipping D weights to a fixed small range (e.g., [-0.1, 0.1]) after each gradient update;

    4) No momentum in the optimizer.
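Item 3 above (weight clipping) can be sketched in isolation; `clip_critic_weights` and the example parameters are hypothetical helpers for illustration, not part of the paper's implementation.

```python
import numpy as np

def clip_critic_weights(weights, c=0.1):
    """WGAN weight clipping: after each gradient update, constrain every
    critic (D) parameter to the interval [-c, c] (item 3 above)."""
    return [np.clip(w, -c, c) for w in weights]

# Hypothetical critic parameters after a gradient step: entries outside
# [-0.1, 0.1] are pulled back to the boundary, the rest are untouched.
layer_w = np.array([[0.25, -0.03], [-0.5, 0.09]])
clipped = clip_critic_weights([layer_w], c=0.1)[0]
print(clipped)  # every entry now lies in [-0.1, 0.1]
```

Clipping keeps the critic (approximately) Lipschitz-bounded, which the Wasserstein formulation requires.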

    WGAN was used in this study to synthesize auxiliary positive samples and improve the training quality of LCRMs.

    C. Variational Autoencoder

    The VAE [24] consists of an encoder that maps inputs to a hidden (latent) space and a decoder that reconstructs inputs from it (Fig. 2(d)); its loss combines a reconstruction term with a regularizer on the hidden space. The regularizer makes the model meaningful by increasing the degree of similarity between similar data and the degree of dissimilarity among different data in the hidden space. Since the hidden space is close to the normal distribution, meaningful samples can be synthesized by sampling from this distribution. A more detailed description of this process from the perspective of probability theory can be found in [24] and [25]. In practice, when updating the model parameters, the mean and standard deviation of the encoder's output are made close to those of the standard normal distribution. Once the model is trained, the decoder can be used to synthesize auxiliary positive samples by sampling the hidden space.
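As a sketch of the regularizer described above, the closed-form KL divergence between the encoder's diagonal Gaussian and the standard normal prior can be written in a few lines; the function names are our own, and this assumes the usual VAE formulation of [24].

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL divergence between the encoder's diagonal Gaussian
    N(mu, exp(log_var)) and the standard normal prior N(0, I); this is
    the regularizer that pulls the hidden space toward N(0, I)."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    which is how the decoder is later fed to synthesize new samples."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# The regularizer vanishes exactly when the encoder output matches the prior
mu, log_var = np.zeros(16), np.zeros(16)
print(kl_to_standard_normal(mu, log_var))  # zero: posterior equals prior
```

After training, drawing z from N(0, I) and decoding it yields the auxiliary positive samples.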

    D. Lung Cancer Risk Model

    The LCRM is a discriminative model that computes the risk score on the basis of risk factors. Three machine learning models, namely, LR, SVM, and DNN, were used in this study to implement the LCRM. Their performance was compared in terms of the AUC, true positive rate (TPR or sensitivity), and true negative rate (TNR or specificity). The LR model feeds the output of a linear function to a nonlinear sigmoid function to conduct binary classification; it is the most commonly used machine learning method in relevant EM studies [5]. Unlike LR, SVM makes decisions based on boundary samples rather than all samples.

    The DNN learns representations of the data in a parameter-intensive manner. The representations are learned using a hierarchical, end-to-end structure. Muhammad et al. [26] used a DNN to predict pancreatic cancer based on 18 features, demonstrating the effectiveness of this method.

    E. Ablation Experiment Based on Risk Factors

    An LCRM makes predictions based on multiple risk factors. To quantitatively measure the contribution of each factor in the prediction process, we set the value of each risk factor to zero one-by-one, retrained the model, and observed the changes in the results. For each risk factor, its ablation score was defined as the drop in the AUC caused by disabling that factor, as formalized in (10).

    IV. EXPERIMENTS AND RESULTS

    Only numeric values were accepted for open-text questions. One-hot encoding was used to represent the answers to multiple-choice questions, resulting in a total of 148 features.
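As an illustration of this encoding step, a hypothetical pair of multiple-choice questions can be expanded with pandas; the question names and answer categories below are invented for the example, not taken from the EDTUC questionnaire.

```python
import pandas as pd

# Hypothetical answers to two multiple-choice questions. Each categorical
# answer becomes its own 0/1 column, which is how 84 questions can expand
# to 148 numeric features.
answers = pd.DataFrame({
    "smoking_status": ["never", "current", "former"],
    "fume_exposure": ["low", "high", "low"],
})
features = pd.get_dummies(answers)
print(features.shape)  # → (3, 5): 3 smoking categories + 2 exposure categories
```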

    A. Baseline Models

    To train the baseline models, 699 negative samples were randomly selected from a total of 55 192 negative samples to match the number of available positive samples. A subset of 1 398 samples was thus obtained, which included equal numbers of positive and negative samples. This dataset was split into a training set (80%) and a test set (20%). The values in each column of the training and test sets were normalized to the interval [0, 1].
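The preparation steps above can be sketched as follows; `make_balanced_split` is our own helper, and fitting the scaler on the training set only is an assumption about the normalization details not stated in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def make_balanced_split(X, y, seed=0):
    """Randomly draw as many negatives as there are positives, split 80/20,
    then scale each column of both sets into [0, 1] (scaler fitted on the
    training set; a sketch of the paper's data preparation)."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = rng.choice(np.flatnonzero(y == 0), size=len(pos), replace=False)
    idx = np.concatenate([pos, neg])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[idx], y[idx], test_size=0.2, stratify=y[idx], random_state=seed)
    scaler = MinMaxScaler().fit(X_tr)
    return scaler.transform(X_tr), scaler.transform(X_te), y_tr, y_te

# Toy data: 50 positives among 1000 samples with 4 features
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 4))
y = np.zeros(1000, dtype=int)
y[:50] = 1
X_tr, X_te, y_tr, y_te = make_balanced_split(X, y)
print(len(y_tr), len(y_te))  # → 80 20
```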

    The RBF kernel function was used for the SVM classifier; the class weights were set to be equal for positive and negative samples. Grid search with three-fold cross-validation was applied to find the C and gamma values that maximized the AUC; the search scope for both parameters was set to [2^-3, 2^3]. Table II summarizes the DNN structure; the number of layers and neurons was set experimentally for better performance. An L2 regularizer (0.01) was applied to the second and third layers to avoid overfitting. The optimizer was set to Adam [27] with default parameters. The batch size was set to 32, the validation split during training was set to 20%, and the number of training epochs was set to 1000. Early stopping was applied with a patience of 10. The SVM and LR models were implemented using the Scikit-learn library [28], while the DNN model was implemented using the TensorFlow library.
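The SVM grid search described above might look like the following scikit-learn sketch; the grid resolution of whole powers of two is an assumption, as the paper only states the search range, and the toy data are ours.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Search C and gamma in [2^-3, 2^3] with 3-fold CV, maximizing the AUC.
param_grid = {
    "C": 2.0 ** np.arange(-3, 4),
    "gamma": 2.0 ** np.arange(-3, 4),
}
search = GridSearchCV(
    SVC(kernel="rbf"),       # equal class weights, as in the paper
    param_grid, scoring="roc_auc", cv=3)

# Toy data just to exercise the search
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 5))
y = (X[:, 0] + 0.2 * rng.standard_normal(120) > 0).astype(int)
search.fit(X, y)
print(search.best_score_ > 0.7)  # → True
```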

    TABLE II SUMMARY OF THE MODEL ARCHITECTURES

    Table III shows the performance of the baseline models. It can be noticed from the table that the baseline models achieved similar and reasonable results when no clinical measures were used. In particular, the AUCs of the LR, SVM, and DNN models were 64.21%, 64.83%, and 64.91%, respectively.

    B. Synthesizing Auxiliary Samples

    To balance the classes, positive samples were first used to train the WGAN and VAE and synthesize auxiliary positive samples. This approach was compared to SMOTE.

    The dimensionality of the noise variable z in the WGAN was 16; this noise variable follows a normal distribution. The details of the generator G and discriminator D are summarized in Table II. In particular, the clip value was set to 0.1, which means that the weights of D were clipped to the range [-0.1, 0.1]. Since D is responsible for providing reliable gradients, its parameters are updated five times in each iteration before the parameters of G are updated. As suggested in [22], the momentum-free optimizer RMSProp with a small learning rate of 0.0005 was used to search for solutions. The VAE details are also summarized in Table II. In particular, the Adam optimizer with default parameters was used to optimize the VAE. Both the WGAN and VAE were implemented using the Keras library running on top of TensorFlow.

    TABLE III COMPARISON OF DIFFERENT OVERSAMPLING AND CLASSIFICATION METHODS

    Fig. 3(a) demonstrates the impact of the number of synthesized samples on the DNN performance. In the figure, the results are averaged over five runs. In each run, the 699 positive samples were split into 560 training and 139 test samples. The positive augmentation fold (PAF) represents the number of times the positive samples were augmented. For example, PAF = 1 means that no synthesized samples were used during training. After the number of positive samples was determined, the same number of negative samples was randomly selected for training. According to the results, WGAN is superior to VAE and SMOTE. When PAF grows from one to two, the AUC increases for all three methods. However, a higher number of synthesized samples does not always bring better results; the curves of all three methods fluctuate when PAF grows from three to ten. This phenomenon indicates that a large number of repetitive (similar) patterns may result in overfitting. Setting PAF to two yielded good performance with the minimal sample size on the considered dataset.

    Several studies have suggested that making the numbers of positive and negative samples strictly equal is not necessarily the best modeling approach [29]. To assess the impact of the number of negative samples on the model performance, the mean AUC of the DNN was evaluated under different numbers of negative samples with PAF = 2. The negative sampling fold (NSF) was used to represent the number of negative samples selected for training (Fig. 3(b)). For example, NSF = 1 means that equal numbers of negative and positive samples were used to train the model. For all three oversampling methods, the optimal AUCs were achieved when NSF = 1, although the results were similar when NSF = 2. When NSF > 4, the AUC values dropped significantly since optimization was dominated by the negative class. These empirical results support using a balanced training set.

    Table III shows the results of different oversampling and classification methods (PAF = 2, NSF = 1). Oversampling methods have the potential of synthesizing noisy samples; hence, two data cleaning methods were applied after oversampling:

    Fig. 3. Mean area under the curves (AUCs) of DNN impacted by (a) positive augmentation fold (PAF) and (b) negative sampling fold (NSF). PAF denotes the number of folds of synthesized positive samples. NSF denotes the number of folds of negative samples sampled with respect to the number of positive samples.

    1) Edited Nearest Neighbors (ENN): For a synthesized positive sample, if the majority of its neighbors are negative samples, it is identified as a noisy sample and therefore is removed.

    2) Tomek Link: For a sample A, if its nearest neighbor B comes from the opposite class, and the nearest neighbor of B is A, then the two samples have a Tomek link. Synthesized Tomek samples in the positive class were removed.
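A minimal detector for Tomek links as defined above can be written directly in NumPy; this is a sketch for illustration, not the imblearn implementation actually used in the paper.

```python
import numpy as np

def tomek_link_mask(X, y):
    """Mark samples that participate in a Tomek link: a pair (A, B) from
    opposite classes that are mutual nearest neighbors under the
    Euclidean distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbor
    nn = d.argmin(axis=1)                # index of each sample's nearest neighbor
    mask = np.zeros(len(X), dtype=bool)
    for a, b in enumerate(nn):
        if nn[b] == a and y[a] != y[b]:  # mutual neighbors, opposite classes
            mask[a] = True
    return mask

# Two tight opposite-class points form a link; the far point does not
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
y = np.array([1, 0, 0])
print(tomek_link_mask(X, y))  # → [ True  True False]
```

In the cleaning step, only the synthesized positive member of each link would be removed.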

    Data cleaning was performed using the imblearn library [30]. The Euclidean distance was used as the metric. The default setting was adopted for ENN (three neighbors). According to the mean AUCs on the test set, the impact of data cleaning on the model performance was unclear; no significant differences could be captured. Therefore, the auxiliary samples synthesized by SMOTE, VAE, and WGAN had reliable class properties.

    C. Comparisons With the State-of-the-Art Algorithms

    We compare the proposed method with state-of-the-art outlier detection algorithms, undersampling algorithms, and generative models. The results are summarized in Table IV. The generative adversarial active learning (GAAL) framework [31] has two versions: single-objective GAAL (SO-GAAL) and multi-objective GAAL (MO-GAAL). SO-GAAL uses a generator to synthesize minority data and a discriminator to distinguish between the real and synthesized minority data. After the Nash equilibrium is reached, the discriminator is used as the outlier detector. MO-GAAL uses multiple generators to learn reasonable reference distributions of the dataset. The configurations of the generator and discriminator are consistent with those shown in Table II, except that the activation function of the discriminator's output layer is replaced by a sigmoid. We set the number of training epochs to 1000 and apply stochastic gradient descent with a momentum of 0.9 to optimize the generator (learning rate: 0.0001) and the discriminator (learning rate: 0.001). For MO-GAAL, we use 10 generators in the experiment. The nearest sample of cluster center (NSCC) [32] is an effective undersampling method that performs K-means clustering on the majority class and retains the sample nearest to each cluster center. EhrGAN [33] uses a GAN to synthesize electronic health records to enhance training; we adopt the GAN structure and learning configuration of SO-GAAL for this experiment. The symbol “*” indicates a statistically significant improvement (p < 0.05 according to the paired t-test) of a model with respect to the baseline. According to the results, the performance of VAE is basically the same as those of NSCC and EhrGAN. The WGAN we apply achieves the highest sensitivity (76.54%), specificity (71.08%), and AUC (69.24%) among all the methods.

    TABLE IV COMPARISON WITH STATE-OF-THE-ART METHODS

    D. Ablation Test Results for Risk Factors

    Ablation tests were conducted to evaluate the effectiveness of each risk factor by measuring the AUC drop when setting each risk factor to a constant (zero) in turn. The computing criterion was set according to (10). The results were scaled to [0, 1]. Fig. 4 shows the mean ablation score of each risk factor. The list of risk factors appearing on the horizontal axis can be found in Table V in the Appendix. When different risk factors were disabled, the changes in the prediction results showed significant differences. The factors played different roles in modeling. While some factors impacted the result directly, others impacted it in a cooperatively coupled manner.
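The ablation procedure can be sketched as follows on synthetic data; logistic regression stands in for the LCRM here, and the exact scaling used in the paper's Eq. (10) is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def ablation_scores(X_train, y_train, X_test, y_test):
    """For each risk factor, zero its column in both sets, retrain the
    model, and record the AUC drop relative to the full-feature model
    (a sketch of the paper's ablation criterion)."""
    base = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    base_auc = roc_auc_score(y_test, base.predict_proba(X_test)[:, 1])
    drops = []
    for j in range(X_train.shape[1]):
        Xtr, Xte = X_train.copy(), X_test.copy()
        Xtr[:, j] = 0.0
        Xte[:, j] = 0.0
        m = LogisticRegression(max_iter=1000).fit(Xtr, y_train)
        drops.append(base_auc - roc_auc_score(y_test, m.predict_proba(Xte)[:, 1]))
    return np.array(drops)

# Synthetic check: only the first of three features is informative
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 3))
y = (X[:, 0] > 0).astype(int)
scores = ablation_scores(X[:300], y[:300], X[300:], y[300:])
print(scores.argmax())  # → 0 (ablating the informative factor hurts most)
```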

    V. DISCUSSION

    The best value of the AUC obtained on the test set was 69.24%, with a sensitivity of 76.54% and a specificity of 71.08%. Many lung cancer prediction models based on different risk factors and target populations have been proposed in the literature. Some studies have reported results obtained on entire datasets, without splitting them into training and test sets. This is the case with Tammemagi's model [14], for example, which achieved an AUC of 77% when considering nine risk factors. Studies evaluating their models on separate test sets have reported lower AUC scores. For example, Spitz's model [12], built using 12 risk factors, achieved AUCs of 57%, 63%, and 58% for never, former, and current smokers, respectively. Etzel's model [18], built using six risk factors, achieved an AUC of 63%. Cassidy's model [19], built using eight risk factors to predict lung cancer, achieved an AUC of 70%. Cassidy's model was built using 579 lung cancer cases, which is similar to the number of cases used to build the model proposed in this paper (699). Both models were built to predict five-year risk among the general population. While the AUC score of Cassidy's model is slightly higher than that of the model proposed in this paper (by 2.11%), the latter is tailored to the lifestyle of Chinese people and, therefore, still has reference value.

    This study employed VAE and WGAN for oversampling and compared them with the commonly used SMOTE. When used in combination with the LR model, SMOTE, VAE, and WGAN achieved AUCs of 64.95%, 66.14%, and 66.99%, respectively. When used in combination with the DNN model, SMOTE, VAE, and WGAN achieved AUCs of 65.54%, 65.82%, and 69.24%, respectively. In both cases, WGAN-based oversampling achieved the best AUC, sensitivity, and specificity. When combined with the DNN, the advantage of WGAN is more significant (WGAN vs. VAE: p < 0.01; WGAN vs. SMOTE: p < 0.01; both evaluated using the paired t-test). At present, the main method for solving the class imbalance problem is SMOTE. However, SMOTE is based on linear interpolation and is insufficient for synthesizing high-quality, high-dimensional samples. Considering that the number of variables included in this study is larger than in previous studies, we propose to use two sophisticated nonlinear generative models, i.e., WGAN and VAE, to generate samples on high-dimensional manifolds. The results validate the superiority of these methods. As nonlinear generative models, WGAN and VAE can be used for oversampling high-dimensional and complex data.
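The linear-interpolation step that limits SMOTE can be made explicit in a short sketch; `smote_like_samples` is our simplified stand-in for illustration, not the reference SMOTE implementation.

```python
import numpy as np

def smote_like_samples(X_pos, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic sample is a random linear
    interpolation between a positive sample and one of its k nearest
    positive neighbors, so every new point stays on the piecewise-linear
    manifold spanned by the originals."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X_pos[:, None] - X_pos[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_pos))
        j = neighbors[i, rng.integers(k)]
        lam = rng.random()                      # interpolation coefficient in [0, 1)
        out.append(X_pos[i] + lam * (X_pos[j] - X_pos[i]))
    return np.array(out)

X_pos = np.random.default_rng(1).standard_normal((20, 8))
synth = smote_like_samples(X_pos, n_new=20, k=5)
print(synth.shape)  # → (20, 8)
```

A nonlinear generator, by contrast, can place synthetic samples off these interpolation segments, which is the advantage argued for WGAN and VAE above.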

    While SMOTE, VAE, and WGAN bring significant performance improvements, unlimited oversampling of the positive class does not bring continuous improvement. The experimental results in Fig. 3 confirm this viewpoint. In this study, the number of positive samples was significantly lower than that of negative samples. Oversampling may result in repetitive or similar positive patterns; these can yield good performance on the training set but not on the test set. In other words, models can become overfitted when an excessive number of synthesized samples is used for training.

    Fig. 4. Ablation test results demonstrating the impact of the considered risk factors. The ablation scores were computed according to (10). The results are represented as means over five runs. The error bars represent the standard deviation.

    Compared to the LR and SVM models, the DNN model is more parameter-intensive. While this characteristic endows the DNN with a strong fitting ability, it requires more training samples for parameter search. Therefore, the performances of the DNN, LR, and SVM are comparable in the baseline experiments. However, when synthesized samples are used to assist training, the performance of the DNN is better than that of LR. We believe that the DNN can outperform LR even further when more positive samples are included in the training set.

    The presented ablation test demonstrates the relative contribution of each risk factor in the data-driven modeling.

    1) Basic Information: Removing A02: Age leads to the largest performance decline. The probability of developing lung cancer rises rapidly with age, as has been demonstrated elsewhere [34]. The elderly should pay more attention to lung cancer prevention. A01: Gender also has a high ablation score. This is not surprising; the prevalence of lung cancer among Chinese men is about twice that among Chinese women [2].

    2) Diet Habits: B01: Fresh vegetable is associated with lung cancer, which is consistent with a cohort study conducted among Chinese people [35], although the reasons have been less investigated. High ablation scores are also found for B07: Salt, B03: Meat, and B09: Pickled food. While B07 has not appeared in the literature, B03 and B09 have been investigated elsewhere. In particular, some recent studies have found convincing associations between large intake of red meat and lung cancer [36]. While some studies have investigated the associations between B09: Pickled food and lung cancer in China, no positive results have been found [37]. A recent study suggests that consuming pickled foods is associated with smoking and alcohol drinking, and the risk of lung cancer increases when these risk factors are present [38].

    3) Living Conditions: Risk factors with high ablation scores include C04: Lampblack, C05: Smoking, C07: Total smoking years, and C09: Regular inhalation of secondhand smoke. All of these factors except C04: Lampblack are related to smoking. Both active smoking [39] and passive smoking [40] have been shown to correlate with lung cancer in many studies. Stopping smoking is important for reducing the risk of lung cancer.

    4) Psychology and Emotion: These factors have been relatively neglected in the literature. The ablation test results show that these “internal factors” are also important.

    5) Medical History: High ablation scores are found for E05: Chronic bronchitis, E15: Duodenal ulcer, and E31: Hypertension. While E05: Chronic bronchitis has been shown to be associated with lung cancer [41], the other two factors require further investigation.

    6) Family Cancer History: F07: Grandparents-in-law and F09: Mother’s brother or sister have relatively high ablation scores. These findings should be confirmed with more rigorous and large-scale epidemiological studies.

    VI. CONCLUSION

    The EDTUC questionnaire provides comprehensive information to facilitate lung cancer risk prediction. The LCRM learns decision rules automatically from clinical diagnoses in a data-driven manner, which avoids the subjectivity of human experience. In application, the LCRM helps identify people at high risk of developing lung cancer within roughly five years; further screening and individualized interventions can then be carried out to prevent lung cancer. The development of machine learning will help improve the prevention-oriented medical system and thereby improve human well-being.

    The main limitation of this study is its focus on employing deep learning for lung cancer prediction, without considering the medical, biological, and oncological perspectives.

    As part of our future work, we plan to update the LCRM regularly as more data are collected. We intend to package the model as software and combine its predictions with medical expertise using Bayesian reasoning. Expert experience can help improve the interpretability of the LCRM and sustain its performance when examples of positive cases are limited. In addition, the LCRM can take clinical test variables, imaging data, and genetic data into consideration, so as to improve prediction performance and support prognostic evaluation.

    APPENDIX

    TABLE V RISK FACTORS OF LUNG CANCER PREDICTION

    TABLE V (CONTINUED)

    ACKNOWLEDGMENT

    The authors would like to thank the Health Commission of Ningbo (HCN), China for providing questionnaire data, as well as providing clinical diagnosis records of lung cancer.
