We have two data sets to analyze using SPSS PASW Modeler: 1) the Wine Quality Data Set and 2) the Poker Hand Data Set. We can do this using the CRISP-DM methodology. Wikipedia describes CRISP-DM as follows: “CRISP-DM stands for Cross Industry Standard Process for Data Mining. It is a data mining process model that describes commonly used approaches that expert data miners use to tackle problems.” PASW Modeler is a data mining workbench that enables you to quickly develop predictive models using business expertise and deploy them into business operations to improve decision making. Designed around the industry-standard CRISP-DM model, IBM SPSS PASW Modeler supports the entire data mining process, from data to better business results.
CRISP-DM, Clementine’s own “lightweight” methodology, has six stages:
Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment.
Business Understanding: Understanding the project requirements and objectives from a business perspective, and then converting this knowledge into a data mining problem definition.
Data Understanding: This step covers collecting the initial data, describing the data, exploring the data and, lastly, verifying data quality.
Data Preparation: Tasks include table, record, and attribute selection as well as transformation and cleaning of data for the modeling tools. The data is cleaned using appropriate cleaning strategies and then integrated into a single data set.
Modeling: Various modeling techniques are selected and applied in this phase, and their parameters are adjusted to optimal values. Typically, there is more than one technique for the same data mining problem type. Some techniques have specific requirements on the form of the data, so stepping back to the data preparation phase is often needed. The steps consist of generating a test design, building the models and assessing the models.
Evaluation: By this phase a model (or models) has been built. Before proceeding to final deployment of the model, it is important to evaluate it more thoroughly and to review the steps executed to construct it.
Deployment: In the final stage, the knowledge gained is organized and presented so that an end user can easily use it. Depending on the requirements, this can be a simple report or a complex data mining process. Normally the customer carries out the deployment step.
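The Data Preparation and Modeling steps described above (selecting attributes, cleaning, generating a test design, building a model and assessing it) can be sketched in plain Python. This is only an illustration under stated assumptions and not the PASW Modeler stream used in the analysis: the records are made up, the function names are hypothetical, dropping incomplete records is just one possible cleaning strategy, and the "model" is a trivial baseline that always predicts the mean quality.

```python
import random

def prepare(rows, keep):
    """Data preparation: keep the chosen attributes and drop incomplete records."""
    cleaned = []
    for row in rows:
        selected = {name: row.get(name) for name in keep}
        if any(value is None for value in selected.values()):
            continue  # assumed cleaning strategy: discard records with gaps
        cleaned.append(selected)
    return cleaned

def split_records(records, test_fraction=0.3, seed=1):
    """Test design: shuffle a copy of the records and hold out a test partition."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def build_mean_model(train):
    """Build step: a trivial baseline that always predicts the mean training quality."""
    mean_quality = sum(r["quality"] for r in train) / len(train)
    return lambda record: mean_quality

def mean_absolute_error(model, test):
    """Assessment step: average absolute gap between predicted and actual quality."""
    return sum(abs(model(r) - r["quality"]) for r in test) / len(test)

# Made-up records for illustration only, not real wine measurements.
raw = [{"alcohol": 9.0 + 0.1 * i, "pH": 3.3, "quality": 5 + i % 3} for i in range(20)]
raw.append({"alcohol": None, "pH": 3.1, "quality": 6})  # one incomplete record

prepared = prepare(raw, keep=["alcohol", "quality"])
train, test = split_records(prepared)
model = build_mean_model(train)
error = mean_absolute_error(model, test)
print(len(prepared), len(train), len(test), error)
```

Any real technique would replace build_mean_model; the baseline is only there so that the split and the assessment have something to measure.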
Wine quality is modeled under both classification and regression approaches; the regression approach preserves the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures how the response changes when a given input variable is varied through its domain.
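A one-at-a-time sensitivity analysis of this kind can be sketched as follows. The sketch rests on several assumptions: toy_model and its coefficients are hypothetical and not fitted to the wine data, the input name volatile_acidity and the value ranges are illustrative, and the spread of the response is measured here as max minus min, which is one simple choice among several.

```python
def sensitivity(model, baseline, ranges, levels=5):
    """Vary each input across its range while the other inputs stay at their
    baseline values, and report the spread (max - min) of the model response."""
    spread = {}
    for name, (low, high) in ranges.items():
        responses = []
        for i in range(levels):
            point = dict(baseline)  # copy, so the baseline itself is not modified
            point[name] = low + (high - low) * i / (levels - 1)
            responses.append(model(point))
        spread[name] = max(responses) - min(responses)
    return spread

# Hypothetical stand-in model, NOT fitted to the wine data.
def toy_model(x):
    return 3.0 + 0.5 * x["alcohol"] - 2.0 * x["volatile_acidity"]

baseline = {"alcohol": 10.0, "volatile_acidity": 0.5}
ranges = {"alcohol": (8.0, 15.0), "volatile_acidity": (0.1, 1.6)}

result = sensitivity(toy_model, baseline, ranges)
for name, value in result.items():
    print(name, round(value, 3))
```

An input with a larger spread moves the predicted quality more across its domain, so it is the more influential input under this measure.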
The red wine data set contains about 1,600 samples, from which I selected 200 at random for the analysis. Sampling has a known limitation: “Data mining cannot discover patterns that may be present in the larger body of data if those patterns are not present in the sample being ‘mined’.” I selected the sample with this in mind, and I have high confidence in it. The data set contains measurements of 13 chemical constituents (e.g. alcohol, Mg), and the goal is to predict the quality of red and white wine.
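Drawing the 200 random samples can be sketched in Python as shown here. This is only an illustration and not the sampling step actually run in PASW Modeler: the function name draw_sample and the fixed seed are assumptions, and the list of row numbers merely stands in for the roughly 1,600 red wine records.

```python
import random

def draw_sample(records, n=200, seed=42):
    """Return n records chosen at random without replacement."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the draw repeatable
    return rng.sample(records, n)

# Stand-in for the red wine records: just row numbers, using the rounded count.
red_wine_rows = list(range(1600))

sample = draw_sample(red_wine_rows)
print(len(sample))  # 200
```

Fixing the seed means the same 200 rows are selected on every run, which makes the later analysis reproducible.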
Input variables