Using CRISP methodology for data mining

We have two data sets to analyse using SPSS PASW: 1) the Wine Quality Data Set and 2) the Poker Hand Data Set. We can do this using the CRISP methodology. Let us look at what CRISP is, according to Wikipedia: “CRISP-DM stands for Cross Industry Standard Process for Data Mining. It is a data mining process model that describes commonly used approaches that expert data miners use to tackle problems.” PASW Modeler is a data mining workbench that enables you to quickly develop predictive models using business expertise and deploy them into business operations to improve decision making. Designed around the industry-standard CRISP-DM model, IBM SPSS PASW Modeler supports the entire data mining process, from data to better business results.

CRISP-DM, Clementine’s own “lightweight” methodology, has six phases:

Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment.

CRISP Methodology

Business Understanding: Understanding the project requirements and objectives from a business perspective, and then converting this knowledge into a data mining problem definition.

Data understanding

This phase covers the following activities: collecting the initial data, describing the data, exploring the data and, lastly, verifying data quality.

Data preparation

Tasks include table, record and attribute selection, as well as transformation and cleaning of the data for the modeling tools. The data is cleaned using appropriate cleansing strategies and then integrated into a single data set.
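The preparation tasks above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not how PASW Modeler does it; the function names (select_attributes, clean, integrate) and the cleaning rule (drop any record with a missing or non-numeric value) are assumptions made for the example.

```python
def integrate(*sources):
    """Combine records from several sources into a single list."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def select_attributes(records, attributes):
    """Keep only the chosen attributes of each record."""
    return [{name: rec.get(name) for name in attributes} for rec in records]


def clean(records):
    """Drop records with missing or non-numeric values; convert the rest to float.

    Assumed cleaning strategy for this sketch: discard, do not impute.
    """
    cleaned = []
    for rec in records:
        if any(v is None or str(v).strip() == "" for v in rec.values()):
            continue  # missing value
        try:
            cleaned.append({k: float(v) for k, v in rec.items()})
        except ValueError:
            continue  # value that cannot be read as a number
    return cleaned
```

A typical call chains the three steps, for example clean(select_attributes(integrate(source_a, source_b), ["alcohol", "pH"])), where source_a and source_b are lists of dictionaries read from the raw files.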

Modeling

In this phase, various modeling techniques are selected and applied, and their parameters are adjusted to optimal values. Typically, there is more than one technique for the same data mining problem type. Some techniques have specific requirements on the form of the data, so stepping back to the data preparation phase is often needed. The steps consist of generating a test design, building the models and assessing the models.
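As a rough illustration of the three modeling steps, the sketch below uses a holdout split as the test design, a mean-value baseline as the model, and mean absolute error as the assessment. All three choices, and the function names, are assumptions for the example; the models actually built in PASW Modeler would be more capable.

```python
import random
from statistics import mean


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Test design: shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_mean_baseline(train, target):
    """Build the model: always predict the mean target value seen in training."""
    average = mean(row[target] for row in train)
    return lambda row: average


def mean_absolute_error(model, test, target):
    """Assess the model: average absolute gap between prediction and truth."""
    return mean(abs(model(row) - row[target]) for row in test)
```

Any candidate model should beat this baseline on the held-out rows; if it does not, that is a signal to step back to data preparation or to try another technique.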

Evaluation

By this phase, a model (or models) has been built. Before proceeding to final deployment of the model, it is important to evaluate it more thoroughly and to review the steps executed to construct it.

Deployment

In the final stage, the knowledge gained is organized and presented so that an end user can easily use it. Depending on the requirements, this can be a report or a complex data mining process. Normally, the customer carries out the deployment step.

Wine quality data set

Wine quality is modeled under classification and regression approaches, which preserve the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures the response changes when a given input variable is varied through its domain.
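The idea of a sensitivity analysis can be sketched as follows: each input is swept across its domain while the others are held at a baseline, and the spread of the model response is recorded. This is a simplified one-at-a-time version written for illustration; the toy model, its coefficients and the input domains are invented and are not fitted to the wine data.

```python
def sensitivity(model, baseline, domains, levels=5):
    """Return, per input, the range (max - min) of the model response
    when that input takes `levels` evenly spaced values over its domain
    and all other inputs stay at their baseline values."""
    result = {}
    for name, (low, high) in domains.items():
        responses = []
        for i in range(levels):
            point = dict(baseline)
            point[name] = low + (high - low) * i / (levels - 1)
            responses.append(model(point))
        result[name] = max(responses) - min(responses)
    return result


def toy_model(x):
    # Hypothetical stand-in for a fitted quality model (made-up coefficients).
    return 3.0 + 0.3 * x["alcohol"] - 0.01 * x["magnesium"]


# Assumed baseline values and domains, chosen only for the example.
baseline = {"alcohol": 10.0, "magnesium": 100.0}
domains = {"alcohol": (8.0, 14.0), "magnesium": (70.0, 160.0)}
scores = sensitivity(toy_model, baseline, domains)
```

Inputs with a larger response range are the ones the model reacts to most, which is the kind of explanatory knowledge the analysis is meant to provide.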

The red wine data set contains about 1,600 samples, from which I selected 200 at random for the analysis. (“Data mining cannot discover patterns that may be present in the larger body of data if those patterns are not present in the sample being ‘mined’.”) I selected the sample with this caveat in mind, and I have high confidence in it. The data contain measurements of 13 chemical constituents (e.g. alcohol, Mg), and the goal is to find the quality of red and white wine.
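Drawing the 200-record random sample can be sketched with Python's standard library. The helper names and the fixed seed are assumptions for the example; the second function is one simple way to check that the sample does not drift far from the full data on a given attribute, in the spirit of the quotation above.

```python
import random
from statistics import mean


def draw_sample(records, sample_size=200, seed=42):
    """Draw a simple random sample without replacement; the seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(records, sample_size)


def mean_gap(records, sample, attribute):
    """Absolute difference between the sample mean and the full-data mean
    of one attribute; a small gap suggests the sample is representative of it."""
    full = mean(rec[attribute] for rec in records)
    part = mean(rec[attribute] for rec in sample)
    return abs(full - part)
```

Here records is assumed to be a list of dictionaries, one per wine, so that mean_gap(records, draw_sample(records), "alcohol") compares the sample against the whole file.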

Input variables