Data Mining

In computing, data is quite literally anything you can record. I don't just mean sound and video, although those count too. In computer science, recording data about events as they happen is known as logging, and if you've never studied the field you might be surprised how much of it goes on. Even if you have studied it, you might still be surprised at how much data is logged without your knowledge.

Information, as opposed to data, is more developed and more concrete. If data is anything that can be recorded, then information is data with context or meaning: the conclusions you can draw from data are what is known as information. We live in a society where almost everything we do generates data, and almost all of that data can be used to generate information. Whilst the world has become increasingly aware of this, and whilst data collection and processing have become part of the Zeitgeist, it is arguably a misconception that data should be the key focus of our concern. The real concern is the information extracted from that data.

How old you are, in and of itself, is of little consequence if someone else knows it. Your age is nothing more than a data point, something that can be charted or stated as an attribute, and the risk to you personally of someone else holding it is negligible. The risk comes when this data is used to generate information. When your other data is added to provide context, information emerges. When people know what you have bought and sold, combined with your age, they can extract information: they can conclude that someone your age might be interested in a given product. A profile is created, and I don't mean a profile like those on social media networks, although the concept is similar. In the context of data mining, a profile is used to create templates, which can be referenced later to predict how someone is likely to behave.
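
To make the jump from data to information concrete, here is a minimal sketch, assuming two separate hypothetical data sets (ages and purchases) that share a user id. Neither set says much on its own; joined together they produce a crude profile per age band, which can then be used as a template to guess what an unseen person of a given age might buy. The decade bands, the `build_profiles` and `predict` helpers, and the "most common product" rule are all illustrative assumptions, not how any particular company builds profiles.

```python
from collections import Counter, defaultdict

# Hypothetical data set 1: ages, keyed by user id.
ages = {"u1": 19, "u2": 23, "u3": 41, "u4": 45, "u5": 22}

# Hypothetical data set 2: purchases, keyed by the same user ids.
purchases = [
    ("u1", "headphones"),
    ("u2", "headphones"),
    ("u5", "game console"),
    ("u3", "lawnmower"),
    ("u4", "lawnmower"),
    ("u2", "game console"),
]

def age_band(age: int) -> str:
    """Bucket an exact age into a decade, e.g. 23 -> '20s'."""
    return f"{(age // 10) * 10}s"

def build_profiles(ages, purchases):
    """Join purchases to ages and count the products bought in each age band."""
    profiles = defaultdict(Counter)
    for user, product in purchases:
        if user in ages:  # only users whose age is known can be profiled
            profiles[age_band(ages[user])][product] += 1
    return profiles

def predict(profiles, age):
    """Use the profiles as a template: return the most common product for that band."""
    counts = profiles.get(age_band(age))
    if not counts:
        return None  # no data for this age band
    return counts.most_common(1)[0][0]

profiles = build_profiles(ages, purchases)
print(predict(profiles, 27))  # game console
```

Note that the prediction for a 27-year-old is made about someone who does not appear in either data set; that is the sense in which a template lets you reach beyond the people you actually collected data from.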

When you start collecting data from many different people, you begin to harvest it. This process can be explicit, by asking individuals for their data, or implicit, by accessing data they have already authorised you to access, whether or not they are fully aware of it. You can also obtain data in other ways: covertly, through tracking that the data subject is unaware of, or through unlawful means such as hacking.

When you have amassed a large enough collection of data, you can pool it into a data set and then dig through it to see what you can find. The process of digging through data sets is known as data mining, and, as with real-world mining, the ultimate goal is to find clusters: groups of data points that lie close together. In traditional mining those clusters would be ores in veins, indicating that more of the resource you are after is nearby. In data mining, clusters show you where your data comes together. Where the data converges, you find multiple data subjects who all conform to the same profile you have created. The larger a cluster becomes, the greater the convergence in the data you have collected.
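
The post does not name a particular clustering method, so the sketch below swaps in k-means, one common and simple technique, applied to a single attribute. The buyer ages are hypothetical, and the two starting centroids (20 and 40) are picked by hand purely for illustration; real tools usually choose them automatically and cluster on many attributes at once.

```python
def assign(values, centroids):
    """Put each value in the cluster whose centroid is nearest to it."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        clusters[nearest].append(v)
    return clusters

def kmeans_1d(values, centroids, max_iterations=100):
    """Plain k-means on one-dimensional data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = assign(values, centroids)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    # Re-assign with the final centroids so clusters and centroids always match.
    return centroids, assign(values, centroids)

# Hypothetical ages of everyone who bought one particular product.
buyer_ages = [18, 19, 21, 22, 24, 47, 50, 52, 55]
centroids, clusters = kmeans_1d(buyer_ages, [20, 40])
print(centroids)  # [20.8, 51.0]
print(clusters)   # [[18, 19, 21, 22, 24], [47, 50, 52, 55]]
```

The buyers split into two groups, one around 21 and one around 51: two clusters, each of which could become its own profile.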

When you see all the purchases made, and the ages of the people who made them, you can see where correlations occur and where specific audiences emerge. Audiences, like those who watch a TV show or a theatre production, are people who share an interest in the same thing. Through data mining you can tap into audiences, identify the people who belong to them and, crucially, reach people whose data you have not collected. Therein lies the ultimate goal of data mining: growth. At its core, data mining, like traditional mining, is done in pursuit of accumulating more of the resource being mined. It seeks to find more data, extract more information and use that information to pursue growth: reaching more people who become data subjects, increasing the size of the data set, and improving the accuracy of the predictions made from the profiles you create.
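
One standard way to measure such a correlation is the Pearson correlation coefficient; the post does not say which measure is used, so treat this as an illustrative choice. The sketch below, using hypothetical data, computes it between customers' ages and whether they bought a product (1) or not (0). A value near -1 means younger customers are much more likely to buy; a value near 0 means age tells you little.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient: +1 perfect positive, -1 perfect negative, 0 none."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: each customer's age and whether they bought the product.
ages   = [18, 22, 25, 31, 38, 44, 52, 60]
bought = [ 1,  1,  1,  1,  0,  0,  0,  0]

r = pearson(ages, bought)
print(round(r, 2))  # -0.88
```

A strong correlation like this one is what lets an audience be identified: the same template can then be pointed at people of a similar age who have never bought anything from you.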

How, then, do you overcome this system? If you do not want to be part of it, or if you want to attack it and protest against it, how do you overcome it? The answer is rather simple: disinformation, that is, deliberately false information. These systems depend entirely on the accuracy of the information they are given, so the first line of defence against them is to provide inaccurate information wherever it is legal and prudent to do so. Those who are more militant in their objection would likely provide disinformation even when it is not legal to do so. I don't personally approve of that, but if you wanted to pursue that path, I would ask you to consider whom it is and is not legal to lie to.

There is a fine line between giving false information and committing fraud. The latter can occur when you give false information to an organisation you are legally bound to tell the truth to in order to receive its services, such as a bank or a public body. As for social networks and similar services, in my view most of their terms and conditions are hard to enforce, as their stipulations are primarily concerned with consent, which you can revoke at any time. Further to that, these terms rely on contract law, and in my opinion click-through terms of service are weak contracts: they are never witnessed or signed in the traditional sense, and if the identity used to accept them is not your real identity, I would argue the contract is not valid.
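
To illustrate why these systems depend on accurate input, here is a deliberately naive sketch: a "profile" that is just the average age of a product's buyers, plus the decade it falls in. The ages are hypothetical, and real profiling is far more elaborate, but the principle is the same: when two of five people give a false age, the profile points at the wrong audience.

```python
def profile(ages):
    """A deliberately naive profile: the average buyer's age and its decade."""
    average = sum(ages) / len(ages)
    return average, f"{int(average) // 10 * 10}s"

# Hypothetical ages of five people who bought the same product.
true_ages = [19, 20, 21, 22, 23]

# The same five people, except two of them gave a false age when signing up.
reported_ages = [19, 20, 21, 64, 71]

print(profile(true_ages))      # (21.0, '20s')
print(profile(reported_ages))  # (39.0, '30s')
```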

For legal reasons, I must state that I am not a lawyer and cannot offer you legal advice. I have studied several areas of law, and the information provided here is based on my experience and opinion; I would ask you to seek legal advice before acting on it.

Disclaimers aside, the desire to prevent profiling is legitimate. The services we use have become increasingly invasive and require us to hand over more and more information as payment. They may be free in the monetary sense, but they certainly cost us all to use. I would urge anyone with concerns to think about the information they give freely to companies online, use software like Adblock Plus to block advertising, enable the Do Not Track feature of their browsers (although many websites ignore this signal), and turn off third-party cookies, which most websites need only for tracking. Going further, you could use a privacy-conscious browser such as Brave to increase privacy and security.

Websites can still track you even if cookies are disabled, for example by using supercookies (tracking identifiers stored outside the browser's normal cookie store, or in some cases inserted into HTTP headers by network providers) or browser fingerprinting, which identifies your browser from the combination of characteristics it reveals, such as its HTTP headers, screen size and installed fonts. Browsers such as Brave offer options to block this behaviour and make it much harder for websites to track you in this way.
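
To show why fingerprinting works without cookies, here is a minimal sketch, assuming a handful of hypothetical attributes a site might observe. Real fingerprinting scripts collect many more signals; the attribute names, the SHA-256 hash and the 16-character truncation are illustrative choices only. Nothing is stored on your machine, yet the same browser yields the same identifier on every visit.

```python
import hashlib

def fingerprint(attributes: dict[str, str]) -> str:
    """Combine a browser's observable attributes into one short identifier."""
    # Sort the keys so the same attributes always produce the same string,
    # regardless of the order in which they were collected.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    # Hash the combined string and keep the first 16 hex digits as an illustrative ID.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attributes seen on two visits with cookies disabled.
monday = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "accept_language": "en-GB,en;q=0.5",
    "screen": "1920x1080",
    "timezone": "Europe/London",
}
friday = dict(monday)  # nothing about the browser has changed

# No cookie was stored, yet both visits map to the same identifier.
print(fingerprint(monday) == fingerprint(friday))  # True
```

Changing any single attribute produces a different identifier, which is why defences that hide or vary these values can disrupt this kind of tracking.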

Delete any old or outdated information on social networks. If you have Facebook, audit your account by looking through your 'Likes' and deciding what's still relevant. If you have Twitter, you can do the same, or consider using a service like tweetdelete.net to delete any tweets older than a specific time frame, e.g. a month. You probably have thousands of tweets on your profile, and the bulk of them will no longer be relevant. Almost none of the people who follow you, or will follow you, will ever see or read those tweets; the only people they benefit are those who scrape data from your public profile to build up a profile of you.
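
The rule behind "delete anything older than a month" is simple to express. The sketch below only selects which posts fall outside the time frame; the post ids and dates are hypothetical, it says nothing about how tweetdelete.net itself works, and actually deleting posts would require the platform's own API, which is not shown.

```python
from datetime import datetime, timedelta, timezone

def posts_to_delete(posts, max_age, now):
    """Return the ids of posts created before the cutoff (now minus max_age)."""
    cutoff = now - max_age
    return [post_id for post_id, created in posts if created < cutoff]

# Hypothetical posts: (id, creation time).
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
posts = [
    ("t1", datetime(2024, 6, 25, tzinfo=timezone.utc)),
    ("t2", datetime(2024, 5, 1, tzinfo=timezone.utc)),
    ("t3", datetime(2019, 1, 15, tzinfo=timezone.utc)),
]

print(posts_to_delete(posts, timedelta(days=30), now))  # ['t2', 't3']
```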

Delete your old accounts on websites you no longer use; this is perhaps the most important step. You can use justdelete.me to find the 'delete account' page for most popular websites, along with instructions for many others. You can also use namechk.com to find which websites a username is in use on, which is handy for finding old sites you forgot you once used.
