In the second half of season 4 of Unbreakable Kimmy Schmidt, the titular character finds out the company she works for engages in data mining. Given the show is quirky, the geeky adolescent CEO explains, “We dive into phone cameras to look around people’s surroundings and also up inside their mouths to determine their socio-economic status.”
While the episode was being ambitious in its tech industry commentary, the mention is excuse enough for the not-so-tech-savvy folks out there to ask, ‘what is data mining?’
Breaking away from the Kimmy Schmidt definition, data mining is an analytic process designed to explore data (usually large amounts of business or market-related data, or ‘Big Data’) and to seek out consistent patterns and/or systematic relationships between variables, which usually describe individuals, organisations, societies and corporations. The main types of data mining techniques are association, classification, clustering, prediction, sequential patterns and decision trees. In each case, patterns are first detected in a dataset and then validated by applying them to new data to make predictions.
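To make ‘association’ a little less abstract, here is a minimal Python sketch of the idea: count how often items turn up together, then ask how reliably one item predicts another. The shopping baskets, the function names and the 0.5 support threshold are all made up for illustration; real association mining tools are far more sophisticated than this.

```python
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the share that also contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def frequent_pairs(baskets, min_support=0.5):
    """Item pairs that appear together in at least min_support of the baskets."""
    items = sorted(set().union(*baskets))
    return [pair for pair in combinations(items, 2)
            if support(pair, baskets) >= min_support]

if __name__ == "__main__":
    for pair in frequent_pairs(transactions):
        print(pair, support(pair, transactions))
    # How often does a basket with butter also contain bread?
    print(confidence({"butter"}, {"bread"}, transactions))
```

In this toy data, every basket with butter also has bread, so the ‘butter implies bread’ rule looks strong. The validation step described above would mean checking whether that rule still holds on baskets the code has not seen yet.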
Good, bad and illegal
Data mining has gotten a lot of flak for its less-than-honourable uses. But let’s not forget that most tech concepts have a road paved with good intentions.
One of the positive uses includes criminal investigation, in which datasets are used to explore and detect crimes and their relationships with criminals, thus making criminology an appropriate field for applying data mining techniques. Said datasets include criminal databases, geographical data and even policies and other legal issues.
Another beneficial use is in the ‘future healthcare’ industry, where experts use datasets to improve healthcare systems. Notably, researchers use data mining approaches like multi-dimensional databases, machine learning, soft computing, data visualisation and statistics. Data mining has also been useful for predicting the volume of patients in every category. Processes are also developed to make sure that patients receive appropriate care at the right place and at the right time.
Some types of data mining aren’t so cheery, and can be misused. Some government organisations have been known to tap visual data from users through cameras, Big Boss-style. That’s why we see so many people’s laptop cameras taped over! Remarkably, the other day I was singing ‘Summer Nights’ from Grease and a day later, I received an Instagram follow request from an Olivia Newton-John fan page. Hmm, scary.
For historical reference, let’s discuss the Cambridge Analytica (CA) fiasco with Facebook. In an ugly turn in politics, 50 million Facebook profiles were mined for data. Academic Aleksandr Kogan developed an app called ‘thisisyourdigitallife’, which offered a personality quiz. Those who took the quiz had their profiles mined for personal data, and the data of their Facebook friends was extracted as well. “From an initial group of 320,000 quiz-takers, the researchers managed to create records on at least two million people across 11 key US states,” reported The Guardian. Kogan then shared this data with CA, which allowed the firm to build a software solution to help influence choices in the US elections. CA whistleblower Christopher Wylie claimed the data sold to CA was then used to develop ‘psychographic’ profiles of people and deliver pro-Trump material to them online. This year, Facebook turns 15 years old… but the celebration is sadly shrouded by scandal.
Byte-sized play-by-plays of tech concepts