ZHOU Qiang, YUE Kaixu, DUAN Yao
School of Management and Engineering, Anhui Polytechnic University, Wuhu 241000, China
Abstract: A comprehensive analysis of the impact of privacy incidents on a company's market value is given. A broad set of instances of the exposure of personal information, together with a summary of some security mechanisms and the corresponding results, is presented. The cumulative effect increases in magnitude over the days following the breach announcement, but then decreases. Besides, a new privacy protection property, p-sensitive k-anonymity, is presented in this paper to protect against identity disclosure. We illustrate the inclusion of the two necessary conditions in the algorithm for computing a p-k-minimal generalization. Algorithms such as k-anonymity and l-diversity leave all sensitive attributes intact and apply generalization and suppression to the quasi-identifiers. This keeps the data "truthful" and provides good utility for data-mining applications, while achieving less than perfect privacy. Based on the prior analysis, we formulate the problem and study the issue of privacy protection from the perspective of the cost-benefit model.
Key words: privacy; security; economics; privacy protection; big data
With the advent of the big data and cloud computing era, the connotation and denotation of privacy are also changing. Privacy in the traditional era, understood as the right to be left undisturbed, is passive and defensive, while the right of privacy in the current era is active and dominant. The idea of privacy by design is to incorporate the concept of privacy protection from the very beginning of product or service design. Its user-centered principle emphasizes that users have more control over their personal information.
With the advent of the era of electronic communication and social media, people urgently need to connect themselves to the outside world. The web world is based on the real world. People's transactions and relationships in the virtual world need to be built on the basis of sharing their data, such as accounts on all kinds of social networking sites, hospital records and shopping histories. On one hand, we need to build a positive reputation for ourselves[1-5], and a well-kept hierarchical record is conducive to unified management by the back-end service staff. To a certain extent we provide our own real information out of trust in the network operators, as legal citizens. On the other hand, a good network environment makes it impossible for those who are not qualified to share the same resources as us.
Furthermore, the privacy right that meant being left undisturbed in the traditional era is passive and defensive, while the right of privacy in the current era is active and dominant. By designing privacy protection into products, we integrate the concept of privacy protection from the beginning of product or service design. The user-centered principle emphasizes that users should be given more control over their personal information[6-8].
The coming of a new era will bring great convenience to people. For example, the discovery of fossil fuels provided people with a continuous source of energy, while at the same time the utilization of fossil fuels also brought environmental pollution, climate change and other drawbacks. In the age of the mobile internet, people are getting more and more concerned about the disclosure of their own information. The risks of this era include the loss of safety, money, valuable items, intellectual property (IP), or a person's electronic identity[9-13]. At the same time, personal matters that people do not want to share are likely to become widely known, such as professional embarrassment, loss of a position or job, loss of friendships, social stigmatization, or marginalization.
To some degree, private information (PI) is similar to private personal property (PP) and IP. People's personal habits and information will become useful goods with a certain value. In addition, deep data mining may bring greater economic benefits and, to some extent, push society forward. Along with the coming of this era, what choices we will make, what changes enterprises will undertake, and how the government will respond are all problems that deserve our thinking. Therefore, it is necessary to establish a mathematical model to analyze the present and predict the future through prior knowledge.
With the popularity of the internet, the number of users continues to grow; by the end of 2011, the number of users in the world had reached 2.1 billion, booming the use of the internet. The cloud industry, in which user data are involved in cloud computing services, has developed rapidly. The cloud industry usually refers to "providing information technology resources to many users through the Internet[14-16]." Since the president of Google first put forward the concept in 2006, the industry has developed rapidly and is expected to grow by 17 billion US dollars a year.
If the "prism" surveillance project in the U.S. is justified in the name of national security and placed above the privacy of citizens, the pursuit of market interests by e-commerce enterprises still cannot violate consumers' privacy rights. In this new research field, foreign scholars have studied the three stakeholders involved in the operation of the cloud industry: consumers, cloud service companies and governments[17].
The cloud industry can be divided into three categories according to service content: platform as a service, software as a service and infrastructure as a service. Besides, it can also be divided into public cloud, private cloud, community cloud and hybrid cloud. Stakeholders in the cloud industry and their interrelationships are shown in Fig. 1.
Fig. 1 Relationship between different roles
Personal financial information refers to the transaction records held by banks and other financial institutions and the personal data provided to them. It reflects personal income, assets, liabilities and spending habits. Under the traditional law of personal financial information sharing, financial institutions may share the consumer personal financial information they hold with other agencies, subject to strict restrictions. In recent years, with the advent of mixed financial operation and the extensive application of network technology, financial institutions have paid more attention to personal financial information sharing. For example, credit systems were established in western developed countries[18-22]. A credit information system provides accurate and reliable information on potential borrowers, strengthening the monitoring of credit risk as well as improving credit efficiency. Some domestic financial institutions have also started to classify their customer information and provide classified information to affiliates or commercial organizations, so as to facilitate the marketing of new products to customers and increase institutional profit.
The so-called financial privacy refers to the information holders' control over their own credit or transaction-related information. It is closely linked with the economic interests or property interests of the information holders and is a kind of mixed commercial right which integrates personality and property rights. The main features of privacy[23-24] are as follows.
(a) Specificity: the subject of privacy and the rights cannot be separated;
(b) Self-control: the right of privacy is an active and autonomous right;
(c) Restrictiveness: the exercise of private rights is restricted by laws and regulations, the interests of the state and the public interest.
Furthermore, there is a certain conflict between personal information sharing and privacy protection. We should note that personal information sharing can reduce related internet cooperation costs and also prevent related risks. However, when the interests of the individual are subordinated to social interests, a considerable portion of individual appeals will be damaged. The protection of privacy is weighed relative to the interests of the individual.
In contrast to statistical databases, which return only aggregate answers, microdata records contain actual, unperturbed data related to individuals. Microdata records contain information about specific individuals: for example, medical records are used in public health research, personal transactions or preferences support the development of new data mining algorithms, and records are released to meet legal requirements. Identifying attributes, such as names and social security numbers, are usually removed before the microdata records are published. The published records may still include "quasi-identifiers", for example demographic attributes such as zip code, age, and sex. Although quasi-identifier attributes do not directly reveal a person's identity, they may appear together with identity in another public database, or it may be easy to reconstruct their values for any given individual. Microdata records may also contain "neutral" attributes that are neither identifying nor sensitive[25-29].
In the past, the association between quasi-identifiers and sensitive attributes in published data was always regarded as the main privacy concern. It is easy to prevent sensitive attributes from being linked by never publishing quasi-identifiers and sensitive attributes together. Trivial sanitization, that is, removing all the quasi-identifiers or removing all sensitive attributes from every record, provides the greatest privacy against an adversary whose knowledge is limited to specific individuals and their quasi-identifiers[30-34].
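To make the linkage risk concrete, the following minimal sketch (all records, names and attribute values are hypothetical) shows how a "de-identified" medical table can be re-identified by joining it with a public table on the quasi-identifiers:

```python
# Minimal sketch of a linkage attack via quasi-identifiers.
# All records, names and attribute values here are hypothetical.

medical = [  # published "de-identified" microdata: identifiers removed
    {"zip": "24100", "age": 34, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "24105", "age": 51, "sex": "M", "diagnosis": "flu"},
]

voter_roll = [  # public database that still carries names
    {"name": "Alice", "zip": "24100", "age": 34, "sex": "F"},
    {"name": "Bob", "zip": "24105", "age": 51, "sex": "M"},
]

# Join the two tables on the quasi-identifier combination (zip, age, sex).
qid = lambda r: (r["zip"], r["age"], r["sex"])
lookup = {qid(r): r["name"] for r in voter_roll}
for record in medical:
    name = lookup.get(qid(record))
    if name is not None:
        print(f"{name} -> {record['diagnosis']}")  # re-identification succeeds
```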
In the following, the problem is described in section 1: the price of privacy consists of the normal price people will pay for privacy and the behavior of the irrational privacy consumer. A cost-benefit model is built to analyze the price of privacy, and the privacy protection issue is also studied. The corresponding models are built in section 2 to solve these problems, with theory and examples. In that section, p-sensitive k-anonymity is also presented, which is the first method a data owner can use to protect the initial microdata against attribute disclosure. Furthermore, algorithms like k-anonymity and l-diversity, which leave all sensitive attributes intact and apply generalization and suppression to the quasi-identifiers, are also utilized in the analysis of privacy protection. Conclusions are drawn in section 3.
1.1.1 Privacy pays
The most obvious way people pay for privacy is to pay for informational self-determination in banking services, so as to avoid providing information to tax authorities, family members or other people. It is estimated that this business gains billions of dollars annually.
We use advertising to represent people's values because advertisers do not include content they think is unhelpful for selling their products or that will be refused by their audience. Besides, in 1997 an estimated 1.8 billion dollars was spent on curtains, covering aesthetics, insulation and other motives (U.S. Census Bureau, 1999). We do not try to factor these numbers into separate motivators. We notice that transparent or lace curtains seem to be rare (concern for family, privacy, or distance from neighbors is often expressed by people in suburbs or the countryside). On January 27, 2003, the New York Times published a story about college dorm rooms and private rooms. The storyline on the site was: "with more students requesting and paying for privacy, roommates are no longer the protagonists of college life." Students at Boston University pay an extra 1 400 dollars per year, or about 4 dollars each day, for a private room.
First and foremost, the numbers are not from the same year. Second, many products are "bundled": privacy is bundled into a complex product rather than being a separately priced option. Some of these bundles may be divisible; for example, in the case of curtains, we could look for the average size of curtains per house, find the cheapest option that shields the view, and treat it as the privacy component. However, this may mislead us: are all the curtains bought for privacy? If privacy were the only issue, would people reuse older curtains? Similarly, at the post office, part of the rent may be for a professional look or to avoid theft of mail. How to distinguish these motives is not clear. Third, we do not have a comprehensive listing of markets in which privacy is a factor. Finally and most importantly, the meaning of these numbers is unknown, and therefore they cannot be used correctly. On that account, we do not add these figures up; we just point out that privacy is an important part of what people pay for, and dismiss the notion that people do not pay for privacy.
1.1.2 Irrational privacy consumer
Austin Hill observes that people will tell you that privacy is important to them, but then give you a DNA sample to get a Big Mac. Although there is clearly a puzzling (or frustrating) exaggeration in this statement, the thrust seems correct. But is it really unreasonable to trade PI for relatively little economic value?
We claim that such conduct is not inherently inconsistent. Assume a burger is worth two dollars, a complete identity theft costs an average of 100 000 dollars, and the probability that giving one's name, address and phone number to the hamburger vendor leads to such identity theft is 10^-10. In this case, it is reasonable to exchange the information for the hamburger: the expected loss of the transaction is negligible compared with the 2-dollar value of the burger.
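The expected-value arithmetic behind this claim can be written out explicitly, using the figures assumed above:

```python
# Expected loss from the burger-for-PI trade, using the figures assumed above.
p_theft = 1e-10          # probability the disclosure leads to identity theft
loss = 100_000           # average cost of a complete identity theft, dollars
burger_value = 2         # value of the burger, dollars

expected_loss = p_theft * loss
print(expected_loss)                  # 1e-05 dollars, a thousandth of a cent
print(expected_loss < burger_value)   # True: the trade is rational in expectation
```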
In fact, a rational decision weighs the time and energy invested in understanding a privacy policy against the expected value of that investment, and the outcome may be privacy-neutral.
Because consumers believe that companies use their information legitimately and responsibly, they go online to free their creativity, engage in political activities, build and maintain friendships, and conduct business transactions. American companies have won a first-mover advantage with these technologies, but their leadership relies on the companies' ability to capture and sustain consumer trust in a globalized market.
From a cost-benefit perspective, although implementing the new enhanced privacy protection framework will surely increase the costs of regulators and businesses, many people still believe that it will bring a win-win situation for consumers and businesses.
When cloud service organizations choose to protect consumer privacy more, business costs inevitably increase and product use is more likely to be limited. However, reports from the business community have repeatedly confirmed the importance of consumer trust for digital commerce, and our findings also prove this point.
We may examine the choice between the two from a cost-benefit perspective, taking the net benefit of the two acts as the standard of choice, assuming E is the net benefit, R is the benefit from sharing and C is the cost. The benefits of personal financial information sharing can be decomposed into micro-benefit and macro-benefit. Micro-benefit refers to financial institutions reducing information collection costs and operating costs through information sharing, while broadening their business. Macro-benefit arises after financial information is shared: financial risks are effectively controlled, resources are fully utilized, and the entire social financial operating environment is improved. As an economic entity, a single financial institution is more concerned with profitability; therefore, only the micro-benefit is considered here. It increases with the amount of shared financial information (Q), with diminishing marginal returns.
Fig. 2 Curves of R(Q) and C(Q)
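A minimal numerical sketch of this trade-off is given below. The concrete functional forms R(Q) = a·ln(1 + Q) (diminishing marginal returns) and C(Q) = c·Q, and the parameter values, are our own assumptions for illustration, not the paper's calibration:

```python
import numpy as np

# Assumed illustrative forms: R has diminishing marginal returns, C is linear.
a, c = 10.0, 0.5                # hypothetical scale parameters
R = lambda Q: a * np.log1p(Q)   # benefit of sharing Q units of information
C = lambda Q: c * Q             # cost of sharing Q units
E = lambda Q: R(Q) - C(Q)       # net benefit E(Q) = R(Q) - C(Q)

Q = np.linspace(0.0, 60.0, 6001)
Q1 = Q[np.argmax(E(Q))]         # optimal sharing amount where R'(Q) = C'(Q)
print(f"optimal Q1 ~ {Q1:.2f}, net benefit E(Q1) ~ {E(Q1):.2f}")
# With these forms, R'(Q) = a/(1+Q) = c gives Q1 = a/c - 1 = 19 analytically.
```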
Financial institutions protect financial privacy at a cost whose size depends on its economic value. The net loss after an infringement mainly depends on the possible loss to the financial institutions after the breach occurs. So the benefit function F for the protection of privacy is a piecewise function:
F = {V(Q) - C1(V), with probability 1 - P(Q); V(Q) - C1(V) - C2(V), with probability P(Q)}, (1)

so that the expected net benefit is V(Q) - C1(V) - C2(V)P(Q),
where Q1 is the optimal point with E(Q) = R(Q) - C(Q); Q0 is the upper limit of Q; V stands for the economic value; C1 is the protection cost; C2 is the loss caused by infringement; and P is the corresponding probability.
First, the data owner determines a set of attributes that do not personally identify a person. However, combining them with other data sources may still result in personal disclosure. These attributes are called quasi-identifiers or key attributes.
To simplify our discussion, we use the following taxonomy, which covers all the possible attributes of any microdata. I1, I2, …, Im are the identifier attributes, such as name and social security number (SSN), which can be used to identify a record. K1, K2, …, Km are the key attributes, such as zip code and age, that may be known by an intruder; the key attributes are assumed to be present in the initial microdata. S1, S2, …, Sm are the confidential attributes, such as principal diagnosis and annual income, that are presumably unknown to an intruder.
To protect the data, the identifier attributes are completely removed and the key attributes are "masked" to avoid disclosure, using disclosure control methods. We assume that the values of confidential attributes cannot be obtained from any external source. This assumption ensures that an intruder cannot use confidential attribute values to increase his/her chances of disclosure, so masking them is not necessary. We now give two definitions.
Definition 1 (k-anonymity property) The k-anonymity property for a masked microdata (MM) is satisfied if every combination of key attribute values in MM occurs k or more times.
Definition 2 (p-sensitive k-anonymity property) The MM satisfies the p-sensitive k-anonymity property if it satisfies the k-anonymity property and, for each group of tuples with the identical combination of key attribute values, the number of distinct values for each confidential attribute is at least p within the group.
Take the 3-anonymity property with respect to age, zipcode and sex in Table 1 as an example. Because all tuples in the first group share the same income value (although they have different illnesses), p = 1. If the first tuple had a different value for income, p would be 2.
Table 1 MM example
Note: F means female; M means male
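The two properties can be checked mechanically. The sketch below (hypothetical masked data, plain Python) groups tuples by their key-attribute combination and verifies Definitions 1 and 2:

```python
from collections import defaultdict

def satisfies_p_sensitive_k_anonymity(records, key_attrs, conf_attrs, k, p):
    """Check Definitions 1 and 2 on a masked microdata given as dicts."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in key_attrs)].append(r)
    for group in groups.values():
        if len(group) < k:                        # k-anonymity violated
            return False
        for attr in conf_attrs:                   # p-sensitivity per group
            if len({r[attr] for r in group}) < p:
                return False
    return True

# Hypothetical masked microdata with key attributes (age, zip, sex).
mm = [
    {"age": "3*", "zip": "241**", "sex": "F", "illness": "flu"},
    {"age": "3*", "zip": "241**", "sex": "F", "illness": "ulcer"},
    {"age": "3*", "zip": "241**", "sex": "F", "illness": "asthma"},
]
print(satisfies_p_sensitive_k_anonymity(
    mm, ["age", "zip", "sex"], ["illness"], k=3, p=2))  # True
```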
Algorithms like k-anonymity and l-diversity keep all sensitive attributes intact and apply generalization and suppression to the quasi-identifiers. Our goal is to maintain the "truthfulness" of the data, which provides good utility for data mining applications while achieving imperfect privacy. We believe that the best measure of utility is the success of data mining algorithms, such as decision tree learning of relationships between attributes. An algorithm that aggregates only statistical information can instead be executed on perturbed or randomized data, giving stronger privacy guarantees against powerful adversaries than k-anonymity, l-diversity, and similar methods.
Our experiments were conducted on the same data sets used to validate the existing microdata sanitization algorithms, and the results for these algorithms are very poor. The accuracy an adversary gains in computing any individual's sensitive attributes from the sanitization parameters and the sanitized data set far outweighs, in terms of accuracy, the gain provided to legitimate machine learning workloads.
An important question for future research is whether there exists any real-world data set on which sanitization can support better data mining than trivial sanitization without accurate inference of sensitive attributes seriously endangering privacy.
Privacy is usually part of some other home decoration or convenience. This makes it very hard to produce a solid figure for the "privacy market," though such attempts can be instructive.
Consumers seem to spend money on understandable threats with understandable solutions, such as using curtains. The attention that people direct through windows is easy to understand, and so is the solution. In newer or less transparent situations, understanding may be harder. An example is the http cookie. It is not trivial to know what an http cookie is, as this requires some knowledge of the protocol, servers, and state. Understanding how cookies enable traceability and linkability is more complicated still, because it requires understanding of web page construction, cookie regeneration, and non-cookie tracking mechanisms. Therefore, understanding the technical nature of threats poses a high threshold. Businesses spend time and energy presenting their activities in the best possible light, sometimes even misleadingly. For example, a warranty card that we must fill in completely to get the "guaranteed best service" also requests demographic information. Understanding how this information is processed may take even more effort.
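As a concrete illustration of the cookie mechanism discussed above, the following minimal sketch uses Python's standard http.cookies module; the identifier name and values are hypothetical:

```python
from http.cookies import SimpleCookie

# A server response can set a persistent identifier (hypothetical values):
response_header = 'visitor_id=abc123; Path=/; Max-Age=31536000'
cookie = SimpleCookie()
cookie.load(response_header)

# The browser then echoes the identifier back on every later request,
# which is what makes cross-visit tracking and linkability possible.
morsel = cookie["visitor_id"]
print(morsel.value)            # abc123
print(morsel["max-age"])       # 31536000 (the identifier persists for a year)
```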
A recurring pattern in the privacy world is the introduction of new technologies and new ways to invade privacy, public anger, research and writing, and then the success or failure of the new technology together with its privacy issues.
We further consider the case in which the client's financial privacy is violated, that is, the net benefit of protecting financial privacy. Differentiating both sides of the equation gives
dF/dQ = dV(Q)/dQ - C2(V)dP(Q)/dQ = 0. (2)
A prerequisite for financial institutions to protect financial privacy is F > 0, that is, V(Q) - C1(V) - C2(V)P(Q) > 0.
Hence, we can get
P(Q) < [V(Q) - C1(V)]/C2(V). (3)
Fig. 3 Net benefit function of privacy protection
From Fig. 3, we can conclude the following cases.
Case 1: Q1 ≤ Q0. Financial institutions should share the amount of information Q1.
Case 2: Q0 < Q1 < Q2. Financial institutions should still share the amount of information Q1.

Case 3: Q1 ≥ Q2. Financial institutions should share the amount of information Q2.

In a word, financial institutions should take net benefit as the trade-off standard and seek the best point for the amount of financial information shared.

2.3 p-sensitive k-anonymity property

2.3.1 Terms, definitions and symbols

In this part, the p-k-minimal generalization that satisfies the p-sensitive k-anonymity property is introduced. Two definitions are given as follows.

Definition 3 (p-k-minimal generalization) A node X that satisfies p-sensitive k-anonymity represents a p-k-minimal generalization when no other node Y satisfying p-sensitive k-anonymity exists such that X is on the path from Y to the upper level of the lattice (X different from Y).

Definition 4 (Frequency set) Given a microdata M (initial or masked) and a set of attributes SA of M, the frequency set of M with respect to SA is a mapping from each unique combination of values of SA to the total number of tuples in M with these values of SA.

The following notations are used for a microdata M: n is the number of tuples in M; q is the number of confidential attributes in M; sj is the number of distinct values of the attribute Sj; and (t) denotes the equivalence class of tuple t.

2.3.2 Privacy protection with p-sensitive k-anonymity property

It is suggested to generalize categorical attributes, such as zipcode and sex. The domain of a generalized attribute is extended to a domain generalization hierarchy, which includes all possible groupings of the specific attribute. For the zipcode attribute, the ground domain contains all the existing zipcodes, and the domain generalization hierarchy contains all (non-repeating) existing prefix values. The domain and value generalization hierarchies are presented in Fig. 4.

Fig. 4 Examples of domain and value generalization hierarchies

In order to apply generalization, the data owner defines the domain and value generalization hierarchies of the attributes to be generalized. Generally, the data owner has a variety of choices for each attribute; the zipcode attribute, for example, admits different generalization hierarchies depending on how many digits are removed at each level. The selection of the domain generalization level (with the value generalization induced by the selected domain level) is an important factor in the success of the masking process. When two or more attributes are generalized, the data owner can create a generalization lattice to visualize all possible combinations of generalized domains, as illustrated in Fig. 5.

Fig. 5 Generalization lattice for zipcode and sex attributes

After the generalization, we can determine the tuples whose key attribute value combinations have small frequencies; if a frequency is lower than the defined threshold, it is suppressed, and these tuples are removed from the masked microdata. Considering a microdata M with two key attributes K1 and K2, three confidential attributes S1, S2 and S3, and 1 000 tuples, Table 2 and Table 3 are listed.

Table 2 Frequency set values

Table 3 Frequency set values

In the process of generalization, the values of confidential attributes do not change; any changes are due only to suppression. The number of distinct values of any Sj cannot increase by eliminating tuples. Therefore, we can get

sj(MM) ≤ sj(IM), j = 1, 2, …, q. (4)

Assuming Sk is the confidential attribute with the smallest number of distinct values in the initial microdata (IM), a p-k-minimal generalization can exist only if

p ≤ sk = min{s1, s2, …, sq}. (5)
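A minimal sketch of this masking pipeline (zipcode prefix generalization, frequency sets per Definition 4, and threshold-based suppression) is given below; the records and the threshold are hypothetical:

```python
from collections import Counter

def generalize_zip(zipcode, level):
    """Replace the last `level` digits with '*' (value generalization)."""
    return zipcode[: len(zipcode) - level] + "*" * level

records = [  # hypothetical initial microdata (key attributes only)
    {"zip": "24100", "sex": "F"}, {"zip": "24101", "sex": "F"},
    {"zip": "24102", "sex": "F"}, {"zip": "24190", "sex": "M"},
]

# Generalize zipcodes by two digits, then build the frequency set
# (Definition 4) with respect to the key attributes (zip, sex).
masked = [{"zip": generalize_zip(r["zip"], 2), "sex": r["sex"]} for r in records]
freq = Counter((r["zip"], r["sex"]) for r in masked)
print(dict(freq))  # {('241**', 'F'): 3, ('241**', 'M'): 1}

# Suppress tuples whose combination is rarer than the threshold k.
k = 3
masked = [r for r in masked if freq[(r["zip"], r["sex"])] >= k]
print(masked)      # the single ('241**', 'M') tuple is removed
```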
It is easy to prevent sensitive attributes from being linked by never publishing quasi-identifiers and sensitive attributes together. Trivial sanitization, the removal of all quasi-identifiers or of all sensitive attributes from every record, provides the greatest privacy against any possible opponent. The study of k-anonymity and l-diversity, which publish sensitive attributes unmodified and apply generalization and suppression to the quasi-identifier attributes, draws wide attention in the field of technical application. Sensitive attribute disclosure occurs when the adversary learns information about an individual's sensitive attributes. This kind of privacy invasion is different from membership disclosure: learning whether a person is in the database at all is the focus of differential privacy. The adversary's baseline knowledge Abase is the minimum information about the sensitive attributes. Sensitive attribute disclosure is the difference between the adversary's posterior knowledge Asan and the baseline knowledge Abase. It can be measured additively or multiplicatively:

δadd = Asan - Abase, δmult = Asan/Abase. (6)

Generally speaking, this captures what the adversary learns by observing the sanitized quasi-identifiers, rather than treating the quasi-identifiers as fully separated from the sensitive attributes of a private database. Intuitively, a table preserves privacy if the distribution of sensitive attribute values within each quasi-identifier class is roughly the same as their distribution in the entire table. This can be quantified by the information gain of the sensitive attribute S given the quasi-identifier Q:

Gain(S, Q) = H(S) - H(S|Q), (7)

and, expanding the conditional entropy over the quasi-identifier classes,

Gain(S, Q) = H(S) - Σq(nq/n)H(S|Q = q), (8)

where nq is the number of tuples in the quasi-identifier class q. Traditional privacy metrics depend on the syntactic features of the sanitized data set: the number of records with the same quasi-identifier (anonymity) or the frequencies of sensitive attribute values in each quasi-identifier class (diversity). Unfortunately, these two metrics are incomparable. We propose a different metric to quantify the attribute disclosure allowed by a sanitized database:

Aacc = acc(T0) - acc(T*), (9)

where Aacc stands for the adversarial accuracy gain, which measures the increase in the adversary's accuracy after observing the sanitized database T0 compared to the baseline accuracy from observing the trivially sanitized database T*.

The utility of any data set, whether sanitized or not, is closely tied to the intended computation. For example, a census data set may support extremely accurate education-based income classification, but not clustering by family size. Without a workload context, it makes no sense to say whether a data set is useful or useless, let alone to quantify its utility. Our goal is to measure the privacy-utility trade-off with semantic definitions in a single framework: privacy against the disclosure of sensitive attributes, and utility for specific machine learning tasks. First, for a given workload w, we measure the workload utility of the trivially sanitized data sets, from which either all quasi-identifiers Q or all sensitive attributes S have been removed. Both of them provide the maximum privacy achievable using generalization and suppression.

Fig. 6 Learning the sensitive attribute in marital dataset

Fig. 7 Learning the sensitive attribute in occupation dataset

Fig. 8 Learning the neutral attribute in marital dataset

Fig. 9 Learning the neutral attribute in occupation dataset
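A small sketch of the information-gain computation in Eqs. (7)-(8), written against a hypothetical two-column table of (quasi-identifier class, sensitive value) pairs:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H of the empirical distribution over `values`."""
    counts, n = Counter(values), len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(pairs):
    """Gain(S, Q) = H(S) - sum_q (n_q / n) * H(S | Q = q), per Eq. (8)."""
    n = len(pairs)
    h_s = entropy([s for _, s in pairs])
    by_class = Counter(q for q, _ in pairs)
    h_s_given_q = sum(
        (nq / n) * entropy([s for q, s in pairs if q == qc])
        for qc, nq in by_class.items())
    return h_s - h_s_given_q

# Hypothetical sanitized table: quasi-identifier class -> sensitive value.
pairs = [("241**", "flu"), ("241**", "ulcer"),
         ("242**", "flu"), ("242**", "flu")]
print(round(gain(pairs), 3))  # > 0 means the classes leak information about S
```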
2.4 Privacy cost solution

A specific example is given below.

Hypothesis 1 A company suffers a loss in market value when a privacy breach is announced.

Hypothesis 2 The magnitude of the negative cumulative abnormal return (CAR) will be larger when a privacy breach is reported in national media rather than in local or industry outlets.

Hypothesis 3 The magnitude of the negative CAR will increase with the number of individuals affected by the privacy breach.

In order to study somewhat homogeneous and comparable data, we focus our attention on data breaches, defined as instances in which consumer or other parties' data were exposed by different subjects, as shown in Table 4.

Table 4 Distribution of privacy subjects

Most of the breaches are due to bad security practices, mishandling of data, or inadequate physical defense against thieves, as shown in Table 4. Gleaning the exact amount of personal data leaked in each breach is also hard, although most breach announcements offer some details.

In the market adjusted model, the event window returns are compared to the expected return of the market over the event period, so the abnormal returns are given as

ARit = Rit - Rmt. (10)

In the mean adjusted model, the returns are compared to the firm's mean return over the estimation period. Abnormal returns are now given as

ARit = Rit - R̄i. (11)

From the statistical point of view, the abnormal returns are accumulated over the event window:

CARi(t1, t2) = Σt=t1..t2 ARit. (12)

Table 5 Distribution of privacy breaches in publicly traded companies

Type of breach / Number
Bad security practices: 24
Hacker: 9
Insider attack: 8
Computer or hardware theft: 18
Lost data: 12
Other: 8

Type of data exposed / Number
SSN: 41
Credit card: 18
Complete credit record: 9
Other personal information: 10
Other: 1

The null hypothesis is that the abnormal returns are not significantly different from zero. Under the null hypothesis, the abnormal returns are independent, identically distributed and normal, with a mean of zero and the variance of the abnormal returns over the estimation period. The results of the event study are presented here. After cleaning the data, we focus on 79 events, with an estimation window from t-100 to t-8 and a forecast window from t-7 to t+10.

Table 6 CARs over different periods

Fig. 10 CAR values from t-5 to t+10

CARs, number of events and share of negative CARs by category:

Category / CARs / No. / Negative(%)
Industry: retail, -0.015 70, 14, 71.43; other, -0.002 06, 24, 66.67; finance, 0.000 48, 26, 53.85; data processing, -0.005 09, 14, 28.57.
Data misuse: attack evidence, -0.008 70, 33, 57.58; no attack evidence, -0.001 04, 41, 65.85.
Data subject: third party, -0.005 97, 17, 64.71; employee, 0.001 86, 10, 60.00; customer, -0.004 59, 51, 54.90.
Responsibility: third party responsible, 0.001 58, 17, 23.52.
Breach cause: laptop, -0.002 86, 29, 65.52; employee, -0.010 98, 17, 47.06; customer, -0.001 12, 24, 62.50.
Affected: less than 100 000, 0.002 02, 31, 54.84; 100 000-500 000, -0.004 58, 22, 54.55; more than 500 000, -0.026 56, 9, 77.78.

In theory, this should produce a net loss of over 140 million dollars. However, looking at each company's abnormal returns and applying that difference to each firm's market value produces an estimate of only 9 953 968 dollars. Clearly, some firms with large market capitalizations do not suffer such strong adverse effects from an announcement. This somewhat surprising finding suggests that the penalty for a privacy breach may reflect an anticipated absolute value of consequences, which would hurt smaller companies more.
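A sketch of the market-adjusted event-study computation in Eqs. (10) and (12); the daily return series below are hypothetical illustrations, not data from the study:

```python
# Market-adjusted abnormal returns (Eq. (10)) accumulated into a CAR (Eq. (12)).
# The daily return series below are hypothetical illustrations.

firm_returns   = [0.004, -0.012, -0.008, 0.001, -0.005]   # R_it, t-2 .. t+2
market_returns = [0.003,  0.002, -0.001, 0.002,  0.001]   # R_mt, t-2 .. t+2

abnormal = [ri - rm for ri, rm in zip(firm_returns, market_returns)]  # AR_it
car = sum(abnormal)                                                   # CAR
print([round(ar, 4) for ar in abnormal])
print(f"CAR over the window: {car:.4f}")   # negative: the breach penalty
```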
PI is similar to private PP and IP. Once lawfully acquired, PI can be sold to other people, who then have rights to or ownership of the information. As human activity details and metadata become increasingly valuable social information, especially in medical research, disease spread, disaster relief, business (e.g. markets, insurance and income), personal behavioral records, beliefs and physical activity, they can become valuable, quantifiable items. When conducting a series of transactions in private data, there may be different risks and benefits across information areas (e.g. purchases, social media, healthcare) and groups (e.g. citizenship, professional profile, and age).

In general, discrimination has a very negative connotation in our society, and its various forms, especially those based on age, gender, race and religion, are illegal. However, price discrimination is an ancient economic technique that is prevalent in China. Although it is often cloaked to avoid negative public reactions, it is often supported by the government as a matter of public policy, sometimes explicitly and often implicitly. The fundamental reason is that, according to the standard economic argument, in real life it is usually necessary to discriminate on prices in order to achieve the optimal allocation of resources. In addition, price discrimination is likely to play an increasing role in the future for two main reasons. One is the cost structure of producing goods and services, with large one-off costs and low marginal costs (for example, developing one software program can cost hundreds of millions of dollars). Another reason is that modern technology makes price discrimination practical. For example, in 2000 Coca-Cola experimented with soda vending machines that charged higher prices at high temperatures. Businesses may have wanted to do this in the past, but the technology was not available.

3 Conclusions

In the era of big data, with the rapid development of artificial intelligence and cloud computing, data have become an important resource in the development process. The use of massive data resources and efficient analytical techniques can make personalized recommendations more precise and bring convenience to people's lives. It may also bring the risk of infringement of user privacy. If there is a crisis of data disclosure, extensive data collection may also bring irreparable harm to the entire society. As a result, the legitimate boundaries of data collection and utilization deserve deep consideration while facilitating data flow and utilization.