Every time you switch on your phone, make a call, search for a location on your phone’s map app or purchase a product online, you are leaving behind valuable bits of data.
All of this information, which reveals your habits, is floating around on servers across the world, part of a new trend called big data. The two words seem simple enough, but few understand their implications.
Through social networks, mobile devices and even embedded sensor chips in fridges and cars, we have so far produced about 2.7 zettabytes1) of data (one zettabyte is the seventh power2) of 1,000, or 10^21, bytes), the equivalent of 36 million HD videos, and that total is set to grow by 40 to 45 per cent a year.
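The arithmetic behind these figures can be checked with a short sketch. It assumes, purely for illustration, that the 40 to 45 per cent growth compounds once a year on the whole 2.7-zettabyte total; the function name `projected_zettabytes` is made up for this example.

```python
# One zettabyte as defined in the text: 1,000 to the seventh power bytes.
ZETTABYTE = 1000 ** 7


def projected_zettabytes(start_zb: float, annual_growth: float, years: int) -> float:
    """Compound a starting volume (in zettabytes) at a fixed annual growth rate.

    Assumption: the growth rate is constant and applies to the whole total.
    """
    return start_zb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # 1000^7 and 10^21 are the same number of bytes.
    assert ZETTABYTE == 10 ** 21
    for rate in (0.40, 0.45):
        for years in (1, 5):
            total = projected_zettabytes(2.7, rate, years)
            print(f"{rate:.0%} growth, {years} year(s): {total:.1f} ZB")
```

Under that assumption, 2.7 zettabytes would reach roughly 14.5 to 17.3 zettabytes after five years.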
According to research from Ericsson3), people will be using 15 times more data by 2017 than they do today. There will be 3 billion smartphone subscriptions, 5 billion mobile broadband subscriptions and 9 billion mobile subscriptions. By 2020 there will be 50 billion connected devices worldwide. Each of these devices will be generating data.
So far, deciphering all this data has not been an easy task. There are data scientists whose job is to create the tools to mine through the deposits of information. These tools are now becoming sophisticated enough that they not only track your habits, but can predict them as well.
“We are experiencing a very interesting era now, where we see that data is coming upon us from various sources. It is no longer just customer data stored in databases within enterprises generated through business systems,” says Ahmed Auda, the business unit executive for IBM’s software group in the Middle East.
Our smartphones now contain more and more personal information. From banking transactions to medical records, our daily lives are being condensed on to single devices.
We may think this information belongs to us, but others have access to it, and it is valuable to the many companies that are trying to better understand consumer habits.
“Knowing a customer’s location, background, the places they like to go to, the times they go to work or lunch will help organizations personalize their services in a better way,” says Mr. Auda.
More and more of this data is considered unstructured. It comes from PDF files, emails, audio files, “likes” on Facebook, shares on YouTube and so on. Clients and organizations, both big and small, are increasingly aware of the importance of data, and there is growing demand to use it to generate stronger business insight and ultimately to turn that insight into a competitive advantage.
Based on your choices and habits, the algorithms in big data tools aim to predict what choices you will make and what you will do.
They have the ability to completely redefine the way business is conducted. Why should a store hold a half-price sale for everyone when it can reliably offer the discount only to those customers it knows will come in only when there is an offer? Other customers, who historically have been willing to pay full price, can continue to do so.
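A minimal sketch of how such targeting might work, assuming a store keeps a simple purchase history per customer. The `Customer` fields, the sample records and the 0.8 threshold are all hypothetical, and a real system would use far richer signals than a single ratio.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    purchases: int             # total purchases on record
    discounted_purchases: int  # how many of those were made during an offer


def discount_only_shoppers(customers, threshold=0.8):
    """Return the names of customers who buy almost only during offers.

    A customer qualifies when the share of their purchases made during
    an offer is at least `threshold` (an arbitrary illustrative cut-off).
    """
    selected = []
    for c in customers:
        if c.purchases == 0:
            continue  # no history, so nothing can be inferred
        if c.discounted_purchases / c.purchases >= threshold:
            selected.append(c.name)
    return selected


# Made-up sample data.
customers = [
    Customer("A", purchases=10, discounted_purchases=9),
    Customer("B", purchases=12, discounted_purchases=1),
    Customer("C", purchases=0, discounted_purchases=0),
]
print(discount_only_shoppers(customers))
```

Here only customer A would receive the offer, while customer B, who has mostly paid full price, would not.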
But it is not just in the field of retail where big data can play a role.
One insurance company in Florida is using social media during major disasters to determine which customers to address first.
Security First Insurance is using software from IBM that analyzes messages sent via email and social media, using text mining4) tools, analytics and natural language processing to pick up on5) words that express distress or convey significant damage to a client’s property.
When the bird flu epidemic took off a couple of years ago, countries across the world took what precautions they could to prevent the disease from crossing their borders. But monitoring and preventing the spread of disease could become far more accurate in the future.
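The general idea of picking up on distress words can be illustrated with a deliberately simple keyword scorer. This is not the IBM software or its method: the word list, the weights and the sample messages are invented for the example, and plain keyword matching is a crude stand-in for real natural language processing.

```python
import re

# Hypothetical distress vocabulary with illustrative weights.
DISTRESS_TERMS = {
    "flooded": 3,
    "collapsed": 3,
    "destroyed": 3,
    "urgent": 2,
    "help": 2,
    "leak": 1,
    "damage": 1,
}


def distress_score(message: str) -> int:
    """Sum the weights of all distress terms found in the message."""
    words = re.findall(r"[a-z']+", message.lower())
    return sum(DISTRESS_TERMS.get(word, 0) for word in words)


def prioritize(messages):
    """Order (customer, message) pairs from most to least distressed."""
    return sorted(messages, key=lambda pair: distress_score(pair[1]), reverse=True)


# Made-up inbox.
inbox = [
    ("customer-1", "Small leak in the garage, no rush."),
    ("customer-2", "Roof collapsed and the house is flooded, urgent help needed!"),
    ("customer-3", "Just checking my policy details."),
]

for customer, text in prioritize(inbox):
    print(customer, distress_score(text))
```

With these sample messages, the customer reporting a collapsed roof and flooding is placed first and the routine policy query last.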
“You have tweets like, ‘I just landed in New Delhi,’ the thinly veiled6) show-off tweets where people are trying to impress us with where they’re travelling to. I call it ‘rich person vanity,’” said Jer Thorp, the co-founder of the Office for Creative Research, while addressing an audience at the EMC conference7) in Las Vegas recently.
“I know where they’re going to because they’re tweeting it and I know where they’re from because it is in their profile. Maybe if we watch Twitter closely we can build an almost live model of human travel, which can be used if there is an outbreak of an epidemic human disease.”
Creating such a tool is simple enough and with tweets being in the public domain, there are no barriers to using and extracting this data.
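As a rough sketch of the travel model Mr. Thorp describes, the snippet below counts journeys from a profile's home location to the place named in a "just landed in" message. It works on hand-written records rather than any real Twitter API, and the pattern, the field layout and the sample data are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches "just landed in <Place>", where the place is one or more
# capitalized words (for example "New Delhi"). Real text is far messier.
LANDED = re.compile(r"[Jj]ust landed in ((?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*)")


def travel_flows(tweets):
    """Count (home, destination) pairs from (profile_location, text) records."""
    flows = Counter()
    for home, text in tweets:
        match = LANDED.search(text)
        if match and home:  # skip messages with no destination or no home
            flows[(home, match.group(1))] += 1
    return flows


# Made-up records: (profile location, message text).
tweets = [
    ("London", "I just landed in New Delhi"),
    ("London", "Just landed in New Delhi, what a flight!"),
    ("Toronto", "just landed in Las Vegas for a conference"),
    ("", "I just landed in Paris"),
    ("Dubai", "Nothing to report today"),
]

flows = travel_flows(tweets)
for (home, destination), count in flows.most_common():
    print(f"{home} -> {destination}: {count}")
```

Aggregated over many messages, counts like these would give the kind of origin-to-destination picture that could be compared against the spread of an outbreak.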
Imagine how much more accurate such a tool would be if it were based on users’ mobile phones with access to their medical records. From there you could essentially pinpoint the one person who caused the spread of a disease from one location to another.
The use of this data and prediction of consumer behavior, however, sits uncomfortably with many, particularly privacy advocates, while some suggest it could take away free choice.
Others are less skeptical and maintain it is convenient and consumers will ultimately benefit by getting tailored services, even if it is intrusive.
Whether the majority of consumers are likely to opt into this new world of business-customer relationships remains to be seen. While corporate attitudes are shifting towards big data, consumers will have to think differently about their data as well.
“Data ownership will be the most important discussion. The problem is lots of people are using our data with our permission, but I don’t think people have a good idea of what is being done with that data,” said Mr. Thorp.
“Going forward, we need to give people a personal relationship with their data.”
One major problem, according to Mr. Thorp, is the complexity of this data and the way it is presented.
“It doesn’t look like anything of value so people are willing to give it up, but we need to think about data in a broader way.”
Few people are aware of this or even bothered by it. How many people actually check the permissions they grant to mobile apps on their smartphones once they have downloaded them? More and more apps are asking for permission to access the information and data stored on our phones, from our pictures and cameras to contact lists, file databases and emails.
“There are different segments in society and their acceptance and readiness to give away this information may vary,” says Mr. Auda.
“Among the younger generation, there is acceptance to share what they do, where they are, what they eat, whether they are in good health or not. But regulation and governance of data is an important component of the overall big-data story and they are slowly being realized.”
Mr. Thorp points out three main factors in the future of big data: ethics, ownership and possibility.
“Ethics is one of the things we will be forced to negotiate with over the next couple of years,” he said.
“Companies that are able to understand data ownership and can broker8) fair negotiation over data ownership with consumers will have the advantage.”
Finally, in terms of data possibility, Mr. Thorp says the technology will change public understanding of data.
Until then, it may be wise to remain wary of putting too much personal data in the hands of those with only commercial purposes in mind.
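1. zettabyte [ˈzetəbait] n. (computing) a unit of data equal to 10^21 bytes
2. power [ˈpaʊə(r)] n. (mathematics) the number of times a quantity is multiplied by itself; an exponent
3. Ericsson: a company founded in 1876 in Stockholm, the capital of Sweden, and one of the world's largest makers of mobile communications equipment.
4. text mining: extracting useful information from large volumes of text
5. pick up on: (US, informal) to notice
6. thinly veiled: barely disguised; scarcely concealed
7. EMC conference: the EMC World Conference. EMC is a US information storage and information technology company that holds a world conference every year; the event is an important channel for the storage industry to learn about EMC's technology, products and solutions.
8. broker [ˈbrəʊkə(r)] vt. to negotiate or arrange (a deal) as an intermediary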