Big Data Clustering Techniques: Literature Review



2. CRITICAL SURVEY OF LITERATURE

This chapter gives an overview of Big Data clustering techniques and the distance and similarity measures they use. Clustering using the power method and its convergence techniques is described, and the need for Hadoop to handle Big Data is explained.

2.1 MACHINE LEARNING

Machine learning is a method of data analysis that uses algorithms which learn continuously from the data without being explicitly programmed; they adapt as they are exposed to new data. Machine learning is similar to data mining in that both search for patterns in the data [20]. Machine learning uses the data to detect patterns and adjust program actions accordingly. Facebook, web search engines, smartphones and page ranking all use machine learning algorithms in their processing. Machine learning constructs programs that automatically adjust themselves to improve their performance using the existing data. Its goal is to learn automatically, identify patterns and make decisions based on them. Machine learning is divided into three categories [42]:

  • Supervised learning or predictive learning
  • Unsupervised learning or descriptive learning
  • Semi-supervised learning

2.1.1 Supervised Learning

Supervised learning, also called predictive learning, is a method of learning a mapping from inputs to outputs, given a training set

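$a_i = \sum_{j=1}^{p} a_{ij}$ and $a_j$ = the total count of species $j$ = $\sum_{i} a_{ij}$. Then, the chi-square distance between rows $i$ and $h$ is given as

$$\chi^2_{ih} = \sum_{j=1}^{p} \frac{1}{a_j}\left(\frac{a_{hj}}{a_h} - \frac{a_{ij}}{a_i}\right)^2$$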
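To make the chi-square formula concrete, here is a minimal Python sketch, assuming the data are held in a count table where `table[i][j]` stands for $a_{ij}$ (rows indexed by $i$, species by $j$). The function name `chi_square_distance` is hypothetical, every row is assumed to have a nonzero total, and skipping species whose total $a_j$ is zero is an added assumption to avoid division by zero. It returns the squared value $\chi^2_{ih}$, as in the formula.

```python
from typing import Sequence


def chi_square_distance(table: Sequence[Sequence[float]], i: int, h: int) -> float:
    """Squared chi-square distance between rows i and h of a count table.

    table[r][j] plays the role of a_rj in the formula (count of species j
    in row r). Assumes every row total is nonzero.
    """
    n_species = len(table[0])
    a_i = sum(table[i])  # row total a_i = sum over j of a_ij
    a_h = sum(table[h])  # row total a_h = sum over j of a_hj
    total = 0.0
    for j in range(n_species):
        # Column total a_j: count of species j summed over all rows.
        a_j = sum(row[j] for row in table)
        if a_j == 0:
            # Assumption: a species absent from every row contributes nothing
            # (this also avoids dividing by zero).
            continue
        # Difference of the two row profiles for species j, weighted by 1/a_j.
        diff = table[h][j] / a_h - table[i][j] / a_i
        total += diff * diff / a_j
    return total
```

For the table `[[1, 2], [3, 0]]`, both rows total 3 and the species totals are 4 and 2, so the distance is $(2/3)^2/4 + (2/3)^2/2 = 1/3$. Rows with proportional counts, such as `[1, 2]` and `[2, 4]`, are at distance 0, because the measure compares row profiles rather than raw counts.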

Chebyshev Distance (maximum value distance) – It is a metric defined on a vector space where the distance between any two vectors is the greatest of their differences along any coordinate dimension. It is also called the max metric or L∞ metric [50,52].
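The Chebyshev definition can be sketched in a few lines of Python. This is a minimal illustration rather than code from the cited works: the function name `chebyshev_distance` is hypothetical, and it assumes both points are given as equal-length sequences of numbers.

```python
from typing import Sequence


def chebyshev_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Chebyshev (L-infinity) distance: the largest absolute coordinate difference."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Compare the points coordinate by coordinate and keep the largest gap.
    return max(abs(xi - yi) for xi, yi in zip(x, y))
```

For example, `chebyshev_distance([1, 5, 2], [4, 1, 2])` returns 4, since the coordinate differences are 3, 4 and 0. On a grid this equals the number of moves a chess king needs, so `(0, 0)` and `(3, 5)` are at distance 5. The name L∞ reflects that it is the limit of the Minkowski L_p distance as p grows without bound.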


The Chebyshev distance is the maximum distance between the points along any single dimension. For two points $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$, it is computed as

$$D_{\text{cheb}}(X, Y) = \max_{i} \lvert x_i - y_i \rvert$$

where $x_i$ and $y_i$ are the values of the