2. CRITICAL SURVEY OF LITERATURE
This chapter gives an overview of Big Data clustering techniques and the distance and similarity measures they use. Clustering using the power method and its convergence techniques is described. The need for Hadoop to handle Big Data is also explained.
2.1 MACHINE LEARNING
Machine learning is a method of data analysis that uses algorithms which learn continuously from data without being explicitly programmed. These algorithms adapt on their own when they are exposed to new data. Machine learning is similar to data mining in that both search for patterns in data [20]. Machine learning uses the data to detect patterns and adjusts program actions accordingly. Facebook, web search engines, smartphones, page ranking and similar applications use machine learning algorithms for their processing. Machine learning constructs programs that automatically adjust themselves to improve their performance using the existing data. The motivation of machine learning is to learn automatically, identify patterns and take decisions based on them. Machine learning is divided into three methods, namely [42]:
2.1.1 Supervised Learning
Supervised learning, also called predictive learning, is a method of mapping inputs to outputs, given a training set
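$\sum_{j=1}^{p} a_{ij}$ and $a_j$ = total number of species $j$ = $\sum_{i=1}^{p} a_{ij}$. Then, the Chi-square distance is given as

$$X_{ih}^{2} = \sum_{j=1}^{p} \frac{1}{a_j} \left( \frac{a_{hj}}{a_h} - \frac{a_{ij}}{a_i} \right)^{2}$$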
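The chi-square formula above can be sketched in a few lines of pure Python. This is a minimal illustration, not the author's implementation: it assumes the data are a sample-by-species abundance matrix `a` stored as a list of rows, where `a[i][j]` is the count of species $j$ in sample $i$, and the function name `chi_square_distance_sq` is hypothetical. It returns the squared distance $X_{ih}^2$ (take the square root for $X_{ih}$). It also assumes every sample has a non-zero total, and it skips species whose column total $a_j$ is zero to avoid division by zero.

```python
from typing import List


def chi_square_distance_sq(a: List[List[float]], i: int, h: int) -> float:
    """Squared chi-square distance X_ih^2 between samples (rows) i and h.

    Hypothetical helper: a[r][j] is the abundance of species j in sample r.
    Assumes every row total is non-zero.
    """
    p = len(a[0])
    row_totals = [sum(row) for row in a]                      # a_i for each sample
    col_totals = [sum(row[j] for row in a) for j in range(p)]  # a_j for each species
    total = 0.0
    for j in range(p):
        if col_totals[j] == 0:
            # Species absent from every sample: skipped (an assumption).
            continue
        # Difference of relative abundances of species j in samples h and i.
        diff = a[h][j] / row_totals[h] - a[i][j] / row_totals[i]
        # Weight each squared difference by 1 / a_j.
        total += diff * diff / col_totals[j]
    return total


if __name__ == "__main__":
    print(chi_square_distance_sq([[1, 0], [0, 1]], 0, 1))  # → 2.0
```

Because each term is divided by the species total $a_j$, differences in rare species weigh more than equal differences in common species, which is the characteristic behaviour of the chi-square distance.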
Chebyshev Distance (Maximum value distance) – It is a metric defined on a vector space in which the distance between any two vectors is the greatest of their differences along any coordinate dimension. It is also called the max metric or $L_\infty$ metric [50,52].
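As a minimal sketch of this definition (the function name `chebyshev_distance` is hypothetical), the Chebyshev distance takes the largest absolute coordinate difference between two equal-length vectors. The length check and the error on mismatched dimensions are illustrative choices, not part of the original text.

```python
from typing import Sequence


def chebyshev_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """L-infinity distance: the greatest absolute difference along any coordinate."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    # max() raises ValueError for empty input, so zero-length vectors are rejected too.
    return max(abs(xi - yi) for xi, yi in zip(x, y))


if __name__ == "__main__":
    print(chebyshev_distance([1, 2, 3], [4, 0, 3]))  # → 3
```

For example, on a chessboard the Chebyshev distance between squares $(0, 0)$ and $(3, 5)$ is $\max(3, 5) = 5$, the number of moves a king needs.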
It is the maximum absolute difference between the coordinates of the two points in any single dimension. The distance between two points $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$ is

$$D_{\mathrm{Cheb}}(X, Y) = \max_{i} \lvert x_i - y_i \rvert$$

where $x_i$ and $y_i$ are the values of the