Data Clustering Of Numerical And Categorical Datasets Using Harmony Search Based Ensemble Technique


Dr. Muhammed Basheer, Ms. Jaya Khatri, Ms. Preeta Rajiv Sivaraman, Smt. Z. Sunitha Bai


Clustering is a common method for finding patterns in the underlying data in data mining applications. Most conventional clustering methods are restricted to datasets with either numeric or categorical attributes. In real-world data mining applications, however, datasets containing mixed types of attributes are common. To address this issue, we propose a new divide-and-conquer strategy in this article. First, the original mixed dataset is split into two sub-datasets: a purely categorical one and a purely numeric one. Then, existing well-established clustering algorithms designed for each type of data are applied to generate the corresponding clusters. Comparisons with existing clustering algorithms on real-world datasets show the superiority of our method.

Clustering is a well-known data mining method for pattern detection and information retrieval. The data in the original dataset to be clustered may be categorical or numerical, and each type of data has its own clustering methods: the k-means method has been proposed for clustering numeric datasets, and the k-modes technique for categorical datasets. One of the major issues in clustering categorical values is transforming categorical attributes into numeric measurements so that the k-means algorithm can be applied directly instead of the k-modes method. To address this problem, this article tests a method that converts categorical data into numeric values using the relative frequency of each modality (distinct value) within each attribute.
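The relative-frequency idea can be illustrated with a minimal sketch, assuming each categorical value is replaced by the fraction of records in which that value occurs within its attribute, and that plain k-means is then run on the encoded vectors. The function names `relative_frequency_encode` and `kmeans`, the seeding rule (first k distinct points), and the sample data are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def relative_frequency_encode(rows: Sequence[Sequence[str]]) -> List[List[float]]:
    """Map each categorical value to its relative frequency within its column."""
    n = len(rows)
    n_cols = len(rows[0])
    # Count occurrences of every modality, column by column.
    counts = [Counter(row[j] for row in rows) for j in range(n_cols)]
    return [[counts[j][row[j]] / n for j in range(n_cols)] for row in rows]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means, seeded with the first k distinct points (a simplifying assumption)."""
    centroids: List[List[float]] = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer distinct points than clusters")

    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical toy dataset with two categorical attributes (colour, size).
    data = [["red", "small"], ["red", "small"], ["red", "large"],
            ["blue", "large"], ["green", "large"], ["blue", "large"]]
    encoded = relative_frequency_encode(data)
    labels, _ = kmeans(encoded, k=2)
    print(encoded[0])  # "red" occurs in 3/6 rows, "small" in 2/6 rows
    print(labels)
```

Because the encoded values are ordinary floats, any numeric clustering routine can consume them; the trade-off is that two different modalities with the same frequency become indistinguishable after encoding.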
