When studying phenomena outside the controlled conditions of the lab, there is a whole host of variables that can affect your results. How an animal population behaves from one season to the next, or why one set of patients responds well to a drug while others don’t, can come down to a number of factors that are difficult to control for in experimental design. This is why sample size is such a key factor in research: ensuring that you have enough data points, covering a variety of different scenarios, can help to smooth out background noise and identify the key trends you’re looking for. The era of “Big Data” we are now living in has helped with this problem of sample size, as the vast amount of data available allows us to investigate different problems and ideas using rich and varied datasets. Big Data is exactly that: extremely large datasets. The threshold is a moving target but is generally considered to be in the petabytes. For context, one petabyte’s worth of MP3-encoded music would take 2,000 years to play! The types of data we can now bring together are limited only by the types of sensors we have, from all the different apps on a smartphone to heart monitors, weather stations and CCTV cameras. This vast and varied data requires new ways of thinking and new analytical tools to reveal relationships between variables and to predict what might happen in the future.
The applications of this analysis are endless, from translation (Google Translate is based on Big Data statistical analysis of text) to personalised medicine (exploring healthcare data to see which drugs are most effective in which types of people).
I work for a company called Carbon Credentials and our mission is to use data to drive sustainable practices. By connecting carbon emissions data to operational data from buildings, you can work out what is consuming the most energy and why. Is it a cultural issue: do people just need to get better at switching their equipment off before they leave for the day? Or is it an operational issue with the building: are the air conditioning and heating on at the same time? (This happens more often than you’d think.) We work with large datasets from universities, businesses and hospitals to create tailored sustainability performance programs that optimise the use of buildings, reducing energy wastage and increasing user comfort. This type of data analysis may seem overwhelming, complicated or just dull! But it is revolutionising our lives, for better or worse. We can expand scientific research, improve healthcare and reduce carbon emissions, but our personal information is no longer our own: if you want to use pretty much any app on your phone, you have to give access to your location, photos, contacts and more.
How many of the sites you visit make you accept the “cookies” that personalise your content and ads?
A University of Cambridge research group hit the headlines in 2015 for creating an online tool that can predict your relationship status, intelligence levels, political and religious beliefs and sexual preferences just by analysing your Facebook likes. We offer up a lot of this information readily to Facebook, but this analysis demonstrates how what you like, what you search for and what you buy can generate a profile that determines things such as which news articles appear in your searches. The way this information is being used helps to confirm existing biases and narrows our experience of the online world. In a world that is increasingly polarising to the extremes of left and right, I think it’s important that we consider how data can improve and connect the world rather than isolate and divide it.