
    Machine Learning Techniques Applied to Electronic Healthcare Records to Predict Cancer Patient Survivability

Computers, Materials & Continua, 2021, No. 8

    Ornela Bardhi and Begonya Garcia Zapirain

1 eVIDA Lab, University of Deusto, Bilbao, 48007, Spain

2 Success Clinic Oy, Helsinki, 00180, Finland

Abstract: Breast cancer (BCa) and prostate cancer (PCa) are the two most common types of cancer. Various factors play a role in these cancers, and discovering the most important ones might help patients live longer, better lives. This study aims to determine the variables that most affect patient survivability, and how the use of different machine learning algorithms can assist in such predictions. The AURIA database was used, which contains electronic healthcare records (EHRs) of 20,006 individual patients diagnosed with either breast or prostate cancer in a particular region in Finland. In total, there were 178 features for BCa and 143 for PCa. Six feature selection algorithms were used to obtain the 21 most important variables for BCa, and 19 for PCa. These features were then used to predict patient survivability by employing nine different machine learning algorithms. Seventy-five percent of the dataset was used to train the models and 25% for testing. Cross-validation was carried out using the StratifiedKFold technique to test the effectiveness of the machine learning models. The support vector machine classifier yielded the best ROC with an area under the curve (AUC) = 0.83, followed by the KNeighborsClassifier with AUC = 0.82 for the BCa dataset. The two algorithms that yielded the best results for PCa are the random forest classifier and KNeighborsClassifier, both with AUC = 0.82. This study shows that not all variables are decisive when predicting breast or prostate cancer patient survivability. By narrowing down the input variables, healthcare professionals were able to focus on the issues that most impact patients, and hence devise better, more individualized care plans.

Keywords: Machine learning; EHRs; feature selection; breast cancer; prostate cancer; survivability; Finland

    1 Introduction

One in three people in Finland will develop cancer at some point during their lifetime [1]. Every year, about 30,000 people are diagnosed with cancer. However, only two-thirds will recover from the disease [1]. The most common cancer in men in Finland is prostate cancer (PCa) [2]. In 2018, 5,016 new PCa cases were detected in Finland [3]; 28% of all new cancers in men. In the same year, 914 men died of PCa, with age-standardized mortality standing at 11.2 per 100,000. PCa patient mortality has remained relatively constant in recent years. By the age of 80, a Finnish man has an 11.6% risk of developing and a 1.6% risk of dying from prostate cancer. The most substantial identified risk factors are age, ethnic background, hereditary susceptibility and environmental factors. Approximately 2%-5% of prostate cancers relate to hereditary cancers, and about 15%-20% are familial [4-6]. A Scandinavian twin study showed that environmental factors play a more significant role in the development of PCa than hereditary factors [7]. Excessive consumption of fat, meat and multivitamins may be associated with increased PCa risk [8,9]. Exercise has been found to reduce PCa risk [10]. Smoking, on the other hand, appears to increase aggressive PCa risk and may also increase its progression [11].

The relative PCa survival rate one year after diagnosis is 98%, and after five years, 93%. PCa prognosis has remained unchanged over the last ten years [3]. The 10-year survival forecast for men with local, highly differentiated prostate cancer is the same regardless of treatment (90%-94%). Treatments include active monitoring and, if necessary, radical treatments (surgery or radiotherapy), conservative monitoring and, where needed, endocrine therapy [2].

The most common cancer in women in Finland is breast cancer (BCa). In 2018, 4,934 new BCa cases were detected in Finland; 29.8% of all new cancers in women. In the same year, 873 women died of BCa, with age-standardized mortality standing at 12.2 per 100,000 [3]. BCa patient mortality has remained relatively constant in recent years. By the age of 70, a Finnish woman has an 8.52% risk of developing BCa. The relative BCa survival rate one year after diagnosis is 97.6%, and after five years, 91%. BCa prognosis has slightly improved over the last 15 years [3]. Among the identified risk factors are gender, age, family history and hereditary susceptibility, ethnicity, pregnancy and breastfeeding history, weight, alcohol consumption and inactivity. The Scandinavian twin study [7] mentioned above showed that environmental factors play a far more significant role in BCa development than hereditary factors; hereditary factors can explain only 27% of the risk [7]. It is worth noting that male breast cancer accounted for just 0.6% of all Finnish BCa in 2018 [3], and its treatment protocol is mainly based on the principles for female BCa [12].

Different drugs are currently in use to treat BCa and PCa, and new ones are frequently being clinically trialed. Such treatments include chemotherapy, radiotherapy, endocrine therapy, surgery and, more recently, targeted therapy and immunotherapy. These treatments are administered in combination with each other to cure or keep the disease at bay.

Previous studies have been conducted on predicting the risk of developing BCa and PCa. However, they differ substantially with regard to the type of information used to make such predictions. In the case of BCa risk prediction, machine learning (ML) models were developed in [13] using Gail model [14] inputs only, alongside models using both Gail model inputs and additional personal health data relevant to BCa risk. Three of the six ML models performed better when the additional personal health inputs were added, improving five-year BCa risk prediction [13]. Another study assessed ML ensembles of preprocessing methods, improving biomarker performance for early BCa survival prediction [15]. The dataset used in that study consisted of genetic data. It concluded that a voting classifier is one way of improving on single preprocessing methods. In [16], the authors developed an automated Ki67 scoring method to identify and score the tumor regions with the highest proliferative rates. The authors stated that automated Ki67 scores could contribute to models that predict BCa recurrence risk. As in [15], genetic inputs, pathologic data and age were used to make predictions.

In the case of PCa risk predictions, Sapre et al. [17] showed that microRNA profiling of urine and plasma from radical prostatectomy could not predict whether PCa is aggressive or slow-growing. Besides RNA data, clinical and pathological data were used to train and test the ML models. The authors of [18] added the PCa gene 3 biomarker to the Prostate Cancer Prevention Trial risk calculator (PCPTRC) [19], thereby improving PCPTRC accuracy. Reference [20] is an updated version of the PCPTRC calculator. A recent study in the USA on utilizing neighborhood socioeconomic variables to predict time to PCa diagnosis using ML [21] showed that such data could be useful for men with a high risk of developing PCa.

This paper presents the results of a study that included Electronic Healthcare Records (EHRs) of breast and prostate cancer patients in a region in Southwest Finland. EHRs are the systematized collection of electronically-stored patient and population health information in digital format. Information stored in such systems varies from demographic information to all types of treatments and examinations that patients undergo throughout the course of their care. This information usually lacks structure or order, and requires thorough data cleaning prior to conducting any meaningful analysis. The social impact of analyzing such data is enormous. Understanding the most important variables for a particular disease helps hospitals allocate resources, and also helps healthcare professionals individualize care pathways for each patient. Patients thus benefit from a better quality of life. This study aimed to determine the most critical variables impacting BCa and PCa patient survivability, and how the use of ML models can aid prediction.

    2 Materials and Methods

    This paper complies with the GATHER statement [22].

    2.1 Study Design

A retrospective cohort study was conducted using the EHRs of BCa and PCa patients treated at the District of Southwest Finland Hospital, via the Turku Centre for Clinical Informatics (TCCI). TCCI provided the Data Analytics Platform (DAP), a remote server where data was accessed and analyzed via a secure shell (SSH) connection.

No ethical approval was required. Nonetheless, it was necessary to apply for authorization to use the data in compliance with privacy and ethical regulations under Finnish law. This study included anonymized patient data only.

    Success Clinic Oy sponsored the database.

    2.2 Materials

The BCa and PCa data was stored in a PostgreSQL database in 24 separate tables, organized according to treatment or the hospital department where the information was collected. Structured Query Language (SQL) was used to retrieve data for each treatment line (e.g., chemotherapy, radiotherapy, etc.) for each cancer separately, and each file was stored in CSV format. This approach was selected because the data was unstructured, and thorough data cleaning and preprocessing were conducted prior to analysis. In total, there were 20,006 individual patients aged 19-103, of whom 9,998 were female and 10,008 male. Of the 20,006 patients, 9,922 were diagnosed with prostate cancer and 10,113 with breast cancer; 115 were male, 86 of whom were diagnosed with breast cancer only. The database contains information dating from January 2004 (when the regional repository was initially created) until the end of March 2019.

    2.3 Data

The variables collected in this study were primarily based on previous research [23], a mixed-method study aimed at understanding breast and prostate cancer patients’ care journeys from their own perspective. The data in [23] was collected using qualitative methods and EHRs. Hospitals, however, do not collect the kind of data retrieved through qualitative methods in their electronic healthcare systems. An explanation of the type of data available and retrieved from the TCCI is given below.

    2.3.1 Demographic Data

Demographic data included the patient’s current age, age at diagnosis, date of birth, date of death and years suffering from cancer from the first date of diagnosis. Although patient residence details were collected as part of the study, they did not form part of the analysis.

    2.3.2 Medical Data

Medical data included biopsy results: cancer type, grade, Gleason score, progesterone receptor score, estrogen receptor score, HER2 receptor score, tumor size, lymph node involvement, and Prostate-Specific Antigen (PSA). Treatment lines included chemotherapy drugs, number of cycles, chemotherapy start and finish dates; the number of radiotherapy sessions, doses delivered, fractions delivered, radiation treatment start and finish dates; endocrine therapy drugs; targeted therapy drugs; bisphosphonate drugs; and comorbidities at the time of data collection.

The World Health Organization International Classification of Diseases (ICD) version 10 [24] codes were employed for each disease. The main categories of BCa ICD10 codes were used: c50, c50.1, c50.2, c50.3, c50.4, c50.5, c50.6, c50.7, c50.8 and c50.9. This was done because there were some inconsistencies when associating male breast cancer with male patients: some were stored as being diagnosed with female breast cancer. This variable was dropped for PCa as there is only one ICD10 code, c61. Grade categories were grade 1, grade 2 and grade 3, and the Gleason score ranged from 6 to 10. There were 18 separate categories for tumor size and 15 for lymph node involvement. Anatomical Therapeutic Chemical (ATC) Classification System codes were used to code chemotherapy, endocrine therapy, targeted therapy and bisphosphonate drugs.

    2.3.3 Lifestyle Data

Lifestyle data included smoking and alcohol consumption. Other information such as diet, exercise, family history or female nulliparity [25] was not initially collected by hospitals, and is therefore not included in this study. Participant demographic characteristics are shown in Tab. 1, created using tableone [26], a Python library for creating patient population summary statistics.

    2.4 Methods

Machine learning methods were employed for both feature selection and classification. Python (version 3.5.2) [27] was used to preprocess and analyze the data. Besides Python, SQL was used since the data was stored in a PostgreSQL server. The main libraries used during the preprocessing stage were Pandas and NumPy, both of which are open-source libraries providing high-performance, easy-to-use data structures and data analysis tools for scientific computing. The Matplotlib and Seaborn open-source data visualization libraries were also used. The study used the scikit-learn (sklearn) library [28] for machine learning analysis, and was conducted on the server provided by TCCI.

Table 1: Patient characteristics grouped according to gender

Most of the variables were categorical. Hence, one-hot encoding was used to encode and prepare the data for ML analysis, since most machine learning models require numeric rather than categorical inputs.
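As a minimal sketch of this encoding step (the column names below are hypothetical stand-ins for the EHR variables, not the study's actual features):

```python
import pandas as pd

# Hypothetical categorical columns standing in for the EHR variables
df = pd.DataFrame({
    "grade": ["grade 1", "grade 2", "grade 3", "grade 1"],
    "smoking": ["yes", "no", "no", "yes"],
})

# One-hot encoding turns each category into its own 0/1 indicator column
encoded = pd.get_dummies(df, columns=["grade", "smoking"])
```

Each original column is replaced by one indicator column per observed category (e.g., `grade_grade 1`, `smoking_yes`).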

train_test_split(), a pre-defined method in the sklearn library, was employed to split the data for training and testing the models: 75% of the dataset was used for training and 25% for testing. The stratify parameter was set to the survivability class labels so that the split preserved the class proportions.
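This splitting step can be sketched with toy data in place of the real feature matrix and survival labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the encoded feature matrix and survival labels
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)  # imbalanced, as survival labels often are

# 75/25 split, stratified on the label so both sets keep the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
```

With stratify=y, the 80/20 class balance is preserved in both the training and test sets.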

The effectiveness of nine machine learning classifiers was assessed when predicting the probabilities that individuals were likely to survive or die within the first 15 years of diagnosis. The nine classifiers fell into six families: logistic regression (LR), support vector machines (SVM), nearest neighbors, naïve Bayes (NB), decision trees (DT), and random forests (RF). These machine learning models were selected because each has significant advantages that could make it the best model to predict survivability/mortality risk based on the inputs chosen during the feature selection stage.

Logistic regression classifies data by using maximum likelihood functions to predict the probabilities of outcome classes [29], such as alive/dead, healthy/sick, etc. LRs are widely used because they are simple and explicable. In order to model nonlinear relationships between variables with logistic regression, the relationships must be found prior to training, or various transformations of the variables performed [30].

Support vector machines were first introduced by Cortes et al. [31]. Their objective is to find a hyperplane in the N-dimensional feature space that maximizes the distance between training points belonging to the different output classes [32]. SVMs generalize well to different datasets, work well with high-dimensional data [29], and can accurately perform both linear and nonlinear classification. Nonlinear classification is performed using a kernel, which maps inputs into high-dimensional feature spaces. However, SVMs require a lot of parameter tuning [13,29,33].

Nearest neighbor algorithms work by finding a preset number of training samples closest in distance to the new point, and then predicting its label [34]. In k-nearest neighbor (KNN) learning, the number of samples is a user-defined constant. By contrast, in radius-based neighbor learning, the number of samples varies depending on the local density of points [33]. Despite their simplicity, nearest neighbor methods have been successful in many classification and regression problems. As non-parametric methods, they often succeed where the decision boundary is highly irregular.

Naive Bayes models, unlike the previously described classifiers, are probabilistic classifiers [29] based on Bayes’ theorem. NB models generally require less training data and have fewer parameters compared to other models such as SVMs [35]. NB models are good at disregarding noise or irrelevant inputs [35]. However, they assume that the input variables are independent, which does not hold for most classification applications [29]. Despite this assumption, these models have been successful in many complex problems [29].

Decision trees organize knowledge extracted from data in a recursive hierarchical structure composed of nodes and branches [36]. DTs are non-parametric, supervised learning methods used for both classification and regression, whose goal is to create a model that predicts the value of a target feature by learning simple rules inferred from the input features. Besides nodes and branches, DTs are made up of leaves, the terminal nodes found at the bottom of the tree [32]. Some advantages of DTs are that they are simple to understand and interpret (trees can be visualized), require little data preparation (no data normalization is needed), can handle both numerical and categorical data, and the model can be validated using statistical tests [33]. Despite these positive aspects, particular care should be taken when working with DTs, as over-complex trees can be created that generalize poorly [33]. DTs can also be unstable when small variations are introduced into the data, which can be mitigated by using them within an ensemble [33].

Random forest is a meta-model that fits a number of decision tree classifiers on sub-samples of the dataset. RF uses averaging to improve predictive accuracy and control overfitting. The sub-sample size is controlled by the max_samples parameter when bootstrap is set to True (the default); otherwise, each tree uses the whole dataset. Individual DTs generally tend to have high variance and overfit. RFs grow many DTs and average their predictions, which leads to some errors being canceled out. RFs achieve reduced variance by combining diverse trees, sometimes at the cost of a slight increase in bias. In practice, the variance reduction is often significant, hence yielding a better overall model.

The LR, NB, DT, SVM, and KNN models were implemented using the Python scikit-learn package (version 0.23.1) [28,33]. The “linear_model.LogisticRegression” function was used for logistic regression, and “naive_bayes.GaussianNB” and “naive_bayes.BernoulliNB” for naive Bayes. The “tree.DecisionTreeClassifier” function was used to create a decision tree, and “ensemble.RandomForestClassifier” to create a random forest classifier. For the support vector machine, the “svm.SVC” implementation was applied with probability predictions enabled, alongside “svm.LinearSVC”. The “neighbors.KNeighborsClassifier” model was used for nearest neighbor, and a grid search technique was used to extract the best parameters for each function.

Finally, all the features/variables used to train the machine learning models were scaled to be centered around 0 and transformed to unit variance, since the datasets had features on different scales, e.g., height in meters and weight in kilograms. Rescaling is important because many machine learning models, particularly distance-based and gradient-based ones, are sensitive to feature scale; it also helps the models train more quickly and generalize more effectively [37]. StandardScaler was chosen to scale the data since it is one of the most popular rescaling methods [37].
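A minimal sketch of this scaling step, using made-up height/weight values in line with the example above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales, e.g. height (m) and weight (kg)
X = np.array([[1.6, 60.0], [1.7, 72.0], [1.8, 80.0], [1.9, 95.0]])

# StandardScaler centers each column on 0 and scales it to unit variance
X_scaled = StandardScaler().fit_transform(X)
```

After transformation, both columns contribute on the same scale regardless of their original units.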

    3 Results

    This section is structured in two parts.The first explains feature selection, and the second addresses the classification analysis performed in relation to the features selected from part one.

    3.1 Feature Selection

Feature selection is the process of selecting a set of variables that are significant for the analysis to be conducted. The objective of feature selection is manifold: (i) it provides a better understanding of the underlying process generating the data, (ii) it yields faster and more cost-effective predictors, and (iii) it improves prediction performance [38].

There are different techniques for selecting the relevant variables. The first technique employed was recursive feature elimination (RFE), whose goal is to remove features step-by-step using an external estimator that assigns weights to features [33]. The estimator is trained on the initial dataset, which contains all the features. Each feature’s importance is obtained via one of two attributes: (i) coef_; or (ii) feature_importances_ [33]. The least important features are recursively eliminated from the current set of features until the set number of features to be selected is reached. The estimators used to perform RFE were logistic regression, stochastic gradient descent, random forest, linear SVM and perceptron. Tab. 2 shows the estimators used in the analysis and the accuracy for each number of features selected when predicting whether a patient will survive.
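A minimal RFE sketch with synthetic data and a logistic-regression estimator (one of the five listed above); the dataset and the target of five features are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the EHR feature matrix
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# RFE recursively drops the weakest features (by coef_) until the
# requested number remains
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)

# selector.support_ is a boolean mask over the original features
kept = selector.support_
```
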

Table 2: Feature selection algorithms and accuracy score

Besides RFE, SelectFromModel with a Lasso estimator was used. SelectFromModel is a meta-transformer used alongside an estimator. After fitting, the estimator has an attribute stating feature importance, such as coef_ or feature_importances_. In order to make the feature selection algorithms comparable, the same limit was set on the number of features to be selected: n_features_to_select for RFE and max_features for SelectFromModel.
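This selection step might look roughly as follows on synthetic data; the alpha value and feature cap here are illustrative, not the study's settings:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# SelectFromModel keeps features whose fitted Lasso coefficients are
# non-negligible; max_features caps how many may be kept
selector = SelectFromModel(Lasso(alpha=0.01), max_features=10)
selector.fit(X, y)
selected = selector.get_support()
```
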

In order to verify the results obtained from the RFE and SelectFromModel algorithms, the Random Forest Classifier and XGBoost [39] were used. Both of these algorithms have a specific attribute for selecting the best features. The feature_importances_ attribute was used for the Random Forest Classifier, and the plot_importance() [39,40] method for XGBoost, with height set to 0.5. XGBoost was employed on the basis of being an optimized distributed gradient boosting library designed to be flexible, efficient, and portable [39]. It uses machine learning algorithms under the Gradient Boosting framework and provides parallel tree boosting, which has proven highly efficient at solving various problems.
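The Random Forest half of this verification step can be sketched as follows on synthetic data (the XGBoost side follows the same pattern via its separate library and plot_importance()):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ sums to 1; sorting it ranks the features
ranking = np.argsort(clf.feature_importances_)[::-1]
```
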

The XGBoost results with the most important features and scores are shown in Fig. 1. In total, 21 features were selected after running the XGBoost estimator for the BCa data, and 15 features for the PCa data. The results from Random Forest are shown in Tab. 3. All features selected by the algorithms are shown for both BCa and PCa.

Figure 1: Feature selection and importance extracted from XGBoost for (a) breast cancer and (b) prostate cancer. Features for both databases are specific to the diseases, and the indexes for each feature differ; e.g., f0 in the breast cancer dataset represents the feature c50_diag_age, whereas in prostate cancer it represents c61_diag_age, etc.

Apart from the features shown in Tab. 3, six more features (21 in total) were selected but are not shown in the table: her2_neg, alcohol_no, alcohol_yes, L02BG04, tumor_size_1 and lymph_node_0. All the features mentioned above had an F score of at least 1, as also shown in Fig. 2. Feature indexes refer to the features themselves when shown in Tabs. 3 and 4.

The final features selected for analysis are shown in Tab. 5. All features chosen by at least two estimators are included, as shown in the “times” column (how many estimators chose the feature) for each cancer separately.

Table 3: Features selected using different estimators for breast cancer

Figure 2: ROC AUC for breast cancer

Table 4: Features selected using different estimators for prostate cancer

Table 5: Features selected for breast and prostate cancer data analysis

    3.2 Classification Using Machine Learning

Nine different classification algorithms/estimators were selected for analysis, carried out after the features had been chosen via the feature selection process. All estimators have several hyperparameters. A GridSearchCV (an exhaustive search over specified parameter values for an estimator) was performed to obtain the best hyperparameters for each algorithm. The parameters and candidate values for each estimator are as follows.

1. LogisticRegression parameters:

    a. ‘penalty’: [‘l1’, ‘l2’, ‘elasticnet’],

    b. ‘solver’: [‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’],

    c. ‘max_iter’: [1000, 3000, 5000]

2. LinearSVC and SVC parameters:

    a. ‘max_iter’: [1000, 3000, 5000],

    b. ‘C’: [0.001, 0.01, 0.1]

3. SGDClassifier parameters:

    a. ‘loss’: [‘hinge’, ‘log’, ‘squared_hinge’, ‘perceptron’],

    b. ‘alpha’: [0.0001, 0.001, 0.01, 0.1],

    c. ‘penalty’: [‘l1’, ‘l2’, ‘elasticnet’]

4. KNeighborsClassifier parameters:

    a. ‘n_neighbors’: [3, 4, 5, 6],

    b. ‘algorithm’: [‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’]

5. BernoulliNB parameters:

    a. ‘alpha’: [0.1, 0.2, 0.4, 0.6, 0.8, 1]

6. GaussianNB parameters: defaults

7. RandomForestClassifier and DecisionTreeClassifier parameters:

    a. ‘max_depth’: [2, 3, 4, 5],

    b. ‘min_samples_leaf’: [0.1, 0.12, 0.14, 0.16, 0.18]
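A sketch of this tuning procedure for one of the estimators, using synthetic data in place of the study's datasets:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Grid mirroring the KNeighborsClassifier parameters listed above
param_grid = {
    "n_neighbors": [3, 4, 5, 6],
    "algorithm": ["auto", "ball_tree", "kd_tree", "brute"],
}

# Exhaustive cross-validated search over all parameter combinations
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
best = search.best_params_
```

After fitting, best_params_ holds the winning combination and best_estimator_ the refitted model.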

The best value of each hyperparameter is displayed in Tab. 6 below for each estimator and disease:

Table 6: Selected best hyperparameters for each type of cancer

The Receiver Operating Characteristic (ROC) curve and the AUC metric were used to assess classifier quality. The ROC curve plots the true positive rate on the Y-axis against the false positive rate on the X-axis, meaning that the top left corner of the plot is the “ideal” point (a false positive rate of zero and a true positive rate of one) [41]. Although the ideal point is rarely attainable, a larger AUC generally indicates a better classifier. The “steepness” of the ROC curve is also important, since the aim is to maximize the true positive rate while minimizing the false positive rate.

Cross-validation was performed for each estimator using scikit-learn’s StratifiedKFold with the default number of splits set to 5 (5-fold cross-validation). The ROC AUC curve for each estimator with cross-validation is shown in Fig. 2 for breast cancer and in Fig. 3 for prostate cancer.
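A minimal sketch of this cross-validation step on synthetic data (the estimator and data are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 5-fold stratified cross-validation, scored by ROC AUC
cv = StratifiedKFold(n_splits=5)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=cv,
                         scoring="roc_auc")
```

Each entry of scores is the ROC AUC on one held-out fold; the mean and standard deviation give figures like the 0.82 ± 0.01 reported below.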

Figure 3: ROC AUC for prostate cancer

The support vector machine classifier clearly achieved the best ROC AUC curve for the breast cancer dataset, with an area under the curve = 0.83 ± 0.01, followed by KNeighborsClassifier with AUC = 0.82 ± 0.01. For the prostate cancer dataset, the random forest classifier and KNeighborsClassifier had the best ROC, both yielding AUC = 0.82 ± 0.01.

Conversely, the worst performances on the breast cancer dataset were yielded by the following classifiers: Bernoulli Naïve Bayes with ROC AUC = 0.71 ± 0.02, LinearSVC with ROC AUC = 0.72 ± 0.01, and LogisticRegression with ROC AUC = 0.73 ± 0.01. These same classifiers also performed poorly on the prostate cancer dataset, with ROC AUC = 0.64 ± 0.01 for Bernoulli Naïve Bayes, 0.66 ± 0.01 for LinearSVC, and 0.67 ± 0.01 for LogisticRegression. In general, Decision Trees, Random Forest and Nearest Neighbors performed very well on both datasets, with ROC AUC above 0.80.

In addition, ensemble learning was performed using bagging and voting with cross-validation. BaggingClassifier was used for bagging, and VotingClassifier for voting. In the case of BaggingClassifier, the number of estimators was set to 500, and the KFold cross-validator was used for cross-validation. The ROC AUC curve for the breast cancer dataset is shown in Fig. 4, and for the prostate cancer dataset in Fig. 5.
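The bagging setup can be sketched as follows on synthetic data; the estimator count is reduced from the study's 500 so the example runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Bagging around a KNN base estimator: each of the 20 copies is trained
# on a bootstrap sample and their predictions are averaged
bag = BaggingClassifier(KNeighborsClassifier(), n_estimators=20,
                        random_state=0)
scores = cross_val_score(bag, X, y, cv=KFold(n_splits=5),
                         scoring="roc_auc")
```
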

As in the previous cross-validation analysis, the best results for BaggingClassifier on the breast cancer dataset were yielded by KNeighborsClassifier with a ROC AUC score = 0.94, followed by SVC with a ROC AUC score = 0.91. The worst performers were BernoulliNB and DecisionTreeClassifier, both with a ROC AUC score = 0.80. Similarly, in the bagging analysis for the prostate cancer dataset, the best classifiers were KNeighborsClassifier and SVC, with ROC AUC scores = 0.92 and 0.88, respectively. Finally, the worst classifiers were DecisionTree and GaussianNB, with ROC AUC scores = 0.80 and 0.82, respectively.

Figure 4: BaggingClassifier for breast cancer dataset

Figure 5: BaggingClassifier for prostate cancer dataset

    3.3 Comparing Machine Learning Models

The accuracy score, precision, recall and F1 score were computed on the training and test sets in order to compare how each model scored when predicting each patient’s survivability. Since this was a binary classification problem, the results for both classes are presented: the first class, class 0, being patients still alive, and the second, class 1, those who have died. Tab. 7 shows the results for the breast cancer dataset and Tab. 8 for the prostate cancer dataset. These results were obtained using classification_report, imported from the sklearn library’s metrics module.
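A minimal sketch of how such a report is produced, with made-up true/predicted survival labels:

```python
from sklearn.metrics import classification_report

# Made-up true/predicted survival labels (0 = alive, 1 = deceased)
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

# output_dict=True returns per-class precision/recall/F1 plus accuracy
report = classification_report(y_true, y_pred, output_dict=True)
```

The keys "0" and "1" index the per-class metrics used in Tabs. 7 and 8.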

In addition, the selected models were trained and tested using the voting technique, with and without data standardization. When data standardization techniques such as StandardScaler() were employed, better results were obtained on all counts for the BCa dataset. However, this was not the case for the PCa dataset: recall for one class and precision for the other are slightly worse, while the remaining metrics either stay unchanged, such as the accuracy scores and the F1 score for class 1, or are marginally better.

In general, the algorithms performed better on the breast cancer dataset than on the prostate cancer dataset. One reason could be dataset size; the BCa dataset is slightly larger and more balanced than the PCa dataset. Another reason could be the features. Despite using feature selection algorithms to select the most appropriate variables, other features that were omitted might have improved the results.

Table 7: Comparison of machine learning model results for the breast cancer dataset

Table 8: Comparison of machine learning model results for the prostate cancer dataset

    4 Discussions

There are multiple variables for each of these two types of cancer. This study sought to analyze which variables were most important when predicting patient survivability, or mortality risk, within the first 15 years of a cancer diagnosis. In total, 179 features were included in the breast cancer dataset and 144 in the prostate cancer dataset.

Valid results were obtained by selecting only 15 features after running different feature selection algorithms with different numbers of selected features. In other words, the difference in accuracy achieved by including all 179 features or just 15 features was insignificant.

The selected features include some of the main risk factors for these diseases. In both cancers, it is clear that age at diagnosis and years suffering from cancer are two of the main features that predict whether a patient will survive. Among the selected features, there are a few relating to medications and lifestyle (see Tab. 9). Medications for BCa include L02BA03, L02BG04, L01CA04 and L01BC06; and L02AE02 and L02BX02 for PCa.

Table 9: Generic names and ATC codes for medications selected during the feature selection process

When attempting to predict the progression of these cancers, it is difficult to make comparisons between studies. This is due to the lack of large, publicly available datasets, and to the differing numbers of records and variables the datasets contain. Moreover, the hypotheses these studies test vary enormously. This can even be seen in the feature selection algorithms used by various authors. Earlier studies used the F-score to reduce the number of variables [42,43], with more recent studies moving toward more sophisticated algorithms such as random forest [44,45] and genetic algorithms [46].

    5 Limitations and Future Work

    The database is very comprehensive and covers a wealth of data, and this study endeavored to include as much of it as possible in the analysis. Nevertheless, laboratory results were not included, because blood tests are performed routinely and their results vary depending on the treatment the patient is undergoing; analyzing the averages of such results would not yield meaningful findings. However, other ways of incorporating this information into the analysis are being investigated. Another line of work currently under development is to conduct a similar study with different deep learning models and compare those results with the results of the machine learning analysis presented here.

    It should also be noted that these results are specific to this Finnish population. Each country has its own guidelines and approved medications for certain diseases, so training the same models on a different dataset could yield different results.

    Acknowledgement:We wish to thank the Marie Sklodowska-Curie Action for funding the project and Success Clinic Oy for purchasing the dataset and welcoming O.B.to conduct her research.

    Funding Statement: O.B. received funding from the European Union’s Horizon 2020 CATCH ITN project under the Marie Sklodowska-Curie grant agreement no. 722012, website https://www.catchitn.eu/.

    Conflict of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
