Big Data

From TRCCompSci - AQA Computer Science
Revision as of 07:42, 22 May 2017 by Admin

What is Big Data

Big data is a generic term given to datasets that are so large or complicated that they are difficult to store, manipulate and analyse. The three main features of big data are:

  • volume: the sheer amount of data is on a very large scale.
  • variety: the type of data being collected is wide-ranging, varied and may be difficult to classify.
  • velocity: the data changes quickly and may include constantly changing data sources.

Where is Big Data Used

Big data is used for different purposes. In some cases, it is used to record factual data such as banking transactions. However, it is increasingly being used to analyse trends and try to make predictions based on relationships and correlations within the data. Big data is being created all the time in many different areas of life. Examples include:

  • scientific research
  • retail
  • banking
  • government
  • mobile networks
  • security
  • real-time applications
  • the Internet.

Big Data & Latency

Latency is critical here. It can be described as the time delay involved in turning raw data into meaningful information. With big data there may be a large degree of latency because of the time taken to access and manipulate the sheer number of records.
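
To make the idea of latency concrete, here is a minimal Python sketch that times how long it takes to turn raw records into a summary. The record layout (customer number, amount) and the function names are made up for illustration; real big data systems would be processing far more varied data across many machines.

```python
import time


def summarise(records):
    """Turn raw (customer, amount) records into total spend per customer."""
    totals = {}
    for customer, amount in records:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def timed_summarise(records):
    """Return the summary together with the latency in seconds."""
    start = time.perf_counter()
    totals = summarise(records)
    latency = time.perf_counter() - start
    return totals, latency


def make_records(n):
    """Generate n made-up transactions spread over 100 hypothetical customers."""
    return [(i % 100, 1) for i in range(n)]


if __name__ == "__main__":
    # The more records there are, the longer the summary takes to produce.
    for n in (1_000, 100_000, 1_000_000):
        totals, latency = timed_summarise(make_records(n))
        print(f"{n:>9} records: {latency:.4f} s")
```

The exact timings depend on the machine, but the delay should grow as the number of records grows, which is the effect described above.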

Machine Learning

Quantitative data can be stored in standard relational databases, which makes it relatively simple to query the data and produce results. Even on a large database this could be done accurately and relatively quickly.
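
As a rough illustration of how simple such a query can be, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The sales table, its columns and the sample rows are all invented for this example.

```python
import sqlite3


def build_sales_db(rows):
    """Create an in-memory database holding a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (product TEXT, quantity INTEGER, price REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def revenue_by_product(conn):
    """Return (product, revenue) pairs, highest revenue first."""
    cursor = conn.execute(
        "SELECT product, SUM(quantity * price) AS revenue "
        "FROM sales GROUP BY product ORDER BY revenue DESC"
    )
    return cursor.fetchall()


# Made-up sample data: (product, quantity sold, unit price).
SAMPLE_ROWS = [
    ("keyboard", 2, 20.0),
    ("mouse", 5, 10.0),
    ("keyboard", 1, 20.0),
    ("monitor", 1, 150.0),
]

if __name__ == "__main__":
    for product, revenue in revenue_by_product(build_sales_db(SAMPLE_ROWS)):
        print(product, revenue)
```

Because every value is a number in a known column, a single SQL statement is enough to produce an exact answer.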

Qualitative data can be stored in a database, but it is much harder to analyse or query. Qualitative data is also more likely to be unstructured, so you could end up with just a table of possibly incomplete data. For example, if an online retailer asks for feedback in the form of customer comments, they could receive millions of items of data. It would essentially require a team to read each comment and categorize it as positive, negative or neutral. Extracting anything more detailed from the comments would take even more effort.

Machine learning can be used to automate this process. It covers everything from pattern matching to artificial intelligence. At a simple level, the machine could look for patterns of words within the comment to determine the nature of the feedback. It could be programmed with the words and phrases to look for. This could be developed further to include some understanding of how the words are used.
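
The simple, pre-programmed approach could look something like the Python sketch below. The word lists and the function name are assumptions chosen for illustration; a real system would need far larger lists.

```python
import re

# Hypothetical word lists that the programmer supplies in advance.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "fast"}
NEGATIVE_WORDS = {"bad", "poor", "broken", "slow", "late"}


def classify_comment(comment):
    """Label a comment by counting positive and negative words in it."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = 0
    for word in words:
        if word in POSITIVE_WORDS:
            score += 1
        elif word in NEGATIVE_WORDS:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for comment in ("Great product, fast delivery",
                    "Arrived late and the box was broken",
                    "Not good"):
        print(comment, "->", classify_comment(comment))
```

Note that "Not good" is labelled positive, because the program only spots the word "good". This is why some understanding of how the words are used is needed.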

More advanced machine learning is where the computer is able to develop its own knowledge based on the data it is manipulating. This often allows non-obvious patterns and correlations in big data to be identified.
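
As a very small sketch of a program building its own knowledge from data, the Python example below learns which words go with which label from a handful of already-labelled comments, instead of being given word lists. The technique used here is a basic naive Bayes word-count classifier, picked purely for illustration; the training comments and function names are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenise(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_comments):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled_comments:
        label_counts[label] += 1
        word_counts[label].update(tokenise(text))
    return word_counts, label_counts


def predict(model, text):
    """Pick the label whose learned word counts best match the text."""
    word_counts, label_counts = model
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_examples = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Start from how common the label is in the training data.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in tokenise(text):
            # Add-one smoothing so unseen words do not give log(0).
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data: comments that a person has already labelled.
TRAINING = [
    ("great product works well", "positive"),
    ("love it fast delivery", "positive"),
    ("arrived broken very poor", "negative"),
    ("slow delivery and poor quality", "negative"),
]

if __name__ == "__main__":
    model = train(TRAINING)
    print(predict(model, "poor quality product"))
    print(predict(model, "great fast delivery"))
```

Nobody told the program that "poor" is a negative word; it worked that out from the examples. With millions of comments, the same idea can pick up associations a person might not think to look for.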