Analysis of Temparament (Arab = Mizaj) by using different Data Mining Techniques


Textbook, 2020

60 Pages


Excerpt

Table of Contents

ABSTRACT

ACKNOWLEDGEMENT

1. Introduction
1.1 Business Application
1.2 Data Structure
1.3 Tasks and methods
1.4 Import and Export of Data and models
1.5 Categorization of Data Mining Software into Different Types
1.6 Mizaj
1.7 Research Methodology

2. An Introduction to the WEKA Data Mining System
2.1 Data Mining
2.2 Data Mining Software
2.3 Weka Data Mining Software
2.4 Data preprocessing and visualization
2.5 Attribute Selection
2.6 Errors Rules Attribute
2.7 Classification - decision tree
2.8 Clustering - k-means
2.9 Association Rules

3. Data Collection
3.1 Brief Description
3.2 Attributes
3.3 Data Processing

4. Result Analysis and Discussions
4.1 Classification
4.2 Association
4.3 Clustering
4.4 Knowledge Flow for NB Tree Model

5. Conclusion
5.1 Future Scope of the Work

6. Reference
Annexure- I
Annexure- II
Annexure- III

Abstract

People's temperaments fall into several classes. In Unani pathy, temperament is called Mizaj (Arabic). We collected data on a range of attributes for male and female persons of various backgrounds and fields from Unani Medical College, Pune. We then applied the data mining tasks of classification, association and clustering, using WEKA as the data mining tool.

We classified the data using various classification models and found that the Naive Bayes (NB) model, evaluated on the training set, classified the data well into four classes, with a lower relative absolute error than the J48 model and the other models. The four classes are Bilious, Phlegmatic, Sanguine and Melancholic. The confusion matrix shows that the Naive Bayes model classified the data correctly. Because Melancholic persons are generally observed very rarely, the Melancholic class accounts for only a small percentage of the data. The J48 model classified the Mizaj into three types only: Bilious, Phlegmatic and Sanguine. The classification depends on attributes such as Sleeping hours, Reaction Moist, Body type, Thorax Shape, occupation and age.

We also looked for relations between the attributes by applying association rules; using the Apriori model we obtained the 10 best rules. These show that there are strong relations between attributes such as reaction strength, movement, reaction speed and fear. A person becomes troubled depending on the level of anger. Thus the temperament is classified into different types based on the different attributes.

Further, we tried to cluster the data into groups using the k-means and EM models. The EM model clustered the data into only two groups, which was not correct.

We therefore applied k-means, which clustered the data further depending on the number of folds. With four folds, the k-means model grouped the data into four clusters of varying percentage, with less variation in the statistical parameters.

We also built a Knowledge Flow experiment for running the model. It was applied to the NB Tree model, and the experiment was loaded to obtain the output as text views or graph views.

ACKNOWLEDGEMENT

First of all, I express my sincere gratitude and thanks to Dr. A. D. More, Professor, Director-MCA, IMED, Bharti Vidyapeeth, Erandvane, Pune, for his continuous encouragement and valuable guidance, without which this work could not have been completed.

I especially thank Prof. Murtaza M. Junaid, Asst. Professor, AIMS, Pune, for his constant support and encouragement, and Dr. Javed Khan, HOD, MCA, AIMS, Pune, for providing the data and arranging the Faculty Development Program on Data Mining, which helped me a lot.

I also take this opportunity to thank our beloved Director, Dr. R. K. Jain, for his continuous support and encouragement during this period, and Prof. Avinash Devasthali, Dy. Director, for his continuous support and encouragement.

I also thank my colleagues for their motivational and academic support during this period.

Finally, it is beyond words to describe the love and affection of my children and wife, who allowed me to work without any trouble throughout the busy schedule of their daily life.

1. Introduction

Data mining has a long history, with strong roots in statistics, artificial intelligence, machine learning, and database research (1, 2). Advances in this field were accompanied by the development of related software tools, starting with mainframe programs for statistical analysis in the early 1950s and leading to today's large variety of stand-alone, client/server and web-based software-as-a-service solutions.

Today, a large number of standard data mining methods are available (3, 4). From a historical perspective, these methods have different roots. There are several different and sometimes overlapping categorizations; for example, fuzzy logic, artificial neural networks, and evolutionary algorithms are summarized as computational intelligence (5).

The life cycle of a new data mining method begins with theoretical papers based on in-house software prototypes, followed by public or on-demand software distribution of successful algorithms as research prototypes. Then, special commercial or open source packages containing a family of similar algorithms are developed, or the algorithms are integrated into existing open source or commercial packages. Many companies have tried to promote their own stand-alone packages, but only a few have reached notable market shares. The life cycle of some data mining tools is remarkably short. This may be due to internal marketing decisions and acquisitions of specialized companies by larger ones, leading to a renaming and integration of product lines.

The largest commercial success stories resulted from the step-wise integration of data mining methods into established commercial statistical tools. These tools were later adapted to personal computers and client/server solutions for larger customers. With the increasing popularity of data mining, algorithms such as artificial neural networks or decision trees, and specialized data mining companies such as Integrated Solutions Ltd., were integrated into the main products. In general, tools from the statistical branch are very popular. The worldwide market for business intelligence is increasing day by day. Open source libraries have also become popular since the 1990s.

As the number of available tools continues to grow, choosing one particular tool becomes increasingly difficult for each potential user. The decision-making process can be supported by criteria for the categorization of data mining tools. Different categorizations are proposed in (6), based on user groups, data structures, data mining tasks and methods, import and export options, and license models. There are many different data mining tools available, which fit the needs of quite different user groups. These are as follows:

1. Business Application
2. Applied Research
3. Algorithm Development
4. Education

1.1 Business Application

This group uses data mining as a tool for solving commercially relevant business applications such as customer relationship management, fraud detection and so on. This field is mainly covered by a variety of commercial tools providing support for databases with large datasets and deep integration into the company's workflow.

1.2 Data Structure

An important criterion is the dimensionality of the underlying raw data in the processed datasets. The first data mining applications focused on handling datasets represented by two-dimensional feature tables. This format is supported by nearly all existing tools. In some cases, the dataset can be sparse, with only a few nonzero features, such as a list of shopping items for different customers. Some structured datasets are characterized by the same dimensionality, for example in text mining (7). The most prominent format with higher dimensionality contains time series as elements, leading to dataset dimensions between one and three, for example in forecasting or the prediction of stock markets. With similar dimensionality, different kinds of structured data exist, such as gene sequences, mass spectrograms and others. A more recent trend is the application of data mining methods to images and videos. Other formats leading to image-like dimensions include graphs.

1.3 Tasks and methods

There are various important tasks in data mining. These are:

1) Supervised learning, with a known output variable in the dataset, including

a) Classification

Data classification is a two-step process. In the first step, a classifier is built that describes a predetermined set of data classes or concepts. This is the learning step, in which a classification algorithm builds the classifier by analyzing, or ‘learning from’, a training set made up of database tuples and their associated class labels. The class label attribute is discrete-valued and unordered. It is categorical in that each value serves as a category or class. The individual tuples making up the training set are referred to as training tuples and are selected from the database under analysis. In the context of classification, data tuples can also be referred to as samples, instances, data points or objects.

Because the class label of each training tuple is provided, this step is also known as supervised learning.

Types of Classification:

Various approaches to classification have become popular over time. We briefly discuss them here.

- Decision Tree Induction

The decision tree induction algorithm works recursively. First, an attribute must be selected as the root node. To create the most efficient tree, the root node must split the data effectively. Each split attempts to pare down a set of instances until they all have the same classification. The best split is the one that provides the most information gain.
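The choice of the splitting attribute can be sketched as follows. This is a minimal illustration of plain information gain on a toy table, not the procedure of any particular tool; the attribute names, values and class labels are hypothetical and are not taken from the collected data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy records; names and values are illustrative only.
rows = [
    {"body_type": "lean", "sleep": "short", "mizaj": "bilious"},
    {"body_type": "lean", "sleep": "long", "mizaj": "bilious"},
    {"body_type": "heavy", "sleep": "short", "mizaj": "phlegmatic"},
    {"body_type": "heavy", "sleep": "long", "mizaj": "phlegmatic"},
]

# The attribute with the highest gain becomes the root of the tree.
best = max(["body_type", "sleep"], key=lambda a: information_gain(rows, a, "mizaj"))
print(best)
```

In this toy table, splitting on body_type separates the two classes completely, while splitting on sleep leaves both subsets as mixed as the whole table, so body_type would be chosen as the root.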

- Bayesian Classification

Bayesian classification theory gives a mathematical calculus of degrees of belief, describing what it means for beliefs to be consistent and how they should change with evidence.
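How beliefs change with evidence can be illustrated with a small naive Bayes sketch for categorical attributes. It assumes that the attributes are independent given the class and uses Laplace (add-one) smoothing; the function names, attributes and records are hypothetical, and the code is not WEKA's implementation.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for row in rows:
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(row[target], attr)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen


def posterior(model, evidence):
    """Return P(class | evidence) for every class, with add-one smoothing."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior belief P(class)
        for attr, value in evidence.items():
            count = value_counts[(cls, attr)][value]
            # Smoothed estimate of P(value | class).
            score *= (count + 1) / (n_cls + len(values_seen[attr]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical toy records; names and values are illustrative only.
rows = [
    {"sleep": "short", "anger": "high", "mizaj": "bilious"},
    {"sleep": "short", "anger": "high", "mizaj": "bilious"},
    {"sleep": "long", "anger": "low", "mizaj": "phlegmatic"},
    {"sleep": "long", "anger": "high", "mizaj": "phlegmatic"},
]
model = train_naive_bayes(rows, "mizaj")
beliefs = posterior(model, {"sleep": "short", "anger": "high"})
print(max(beliefs, key=beliefs.get))
```

Before any evidence is seen, both classes are equally likely; after observing short sleep and high anger, the belief shifts towards the class in which those values were more frequent.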

- Rule Based classification

The rule-based classification process consists of using a training set of labeled objects, from which classification rules are extracted by basic operations, to build a classifier and use it to predict the class label of a given unlabeled object.

- Support Vector Machines

The maximum-margin separator is determined by a subset of the data points. Data points in this subset are called ‘support vectors’. It is computationally useful if only a small fraction of the data points are support vectors, because the support vectors are used to decide which side of the separator a test case is on.

- Associative Classification

In this approach, continuous attributes, if any, are first identified and discretized, and all class association rules are generated. Finally, a classifier is built from the generated class association rules.

- Genetic Algorithms

The algorithm starts with a set of solutions called a population. Solutions from one population are taken and used to form a new population, motivated by the hope that the new population will be better than the old one. The solutions that form new solutions are selected according to their fitness: the more suitable they are, the more chances they have to reproduce. This is repeated until some condition is satisfied.
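The loop described above can be sketched on bit strings. This is only an illustration under several assumptions that the text does not specify: fitness-proportional selection, one-point crossover, bit-flip mutation, keeping the best solution unchanged in each generation, and a fixed number of generations as the stopping condition. All names and parameter values are hypothetical.

```python
import random


def evolve(fitness, length=12, pop_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Evolve bit strings towards higher `fitness`; return the best one found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):  # stopping condition: fixed generation count
        # Fitter solutions get proportionally more chances to reproduce.
        weights = [fitness(ind) + 1e-9 for ind in population]
        elite = max(population, key=fitness)
        children = [elite[:]]  # carry the best solution over unchanged
        while len(children) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each bit with a small probability (mutation).
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy fitness: the number of ones in the bit string.
best = evolve(sum)
print(best, sum(best))
```

Because the best solution of each generation is copied into the next one, the best fitness in the population never decreases from one generation to the next.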

- Rough Set Approach

The rough sets methodology provides definitions and methods for finding which attributes separate one class or classification from another. Since inconsistencies are allowed and membership in a set does not have to be absolute, the potential for handling noise gracefully is large. The result of a training phase using the rough sets approach is usually a set of propositional rules, which may be said to have syntactic and semantic simplicity for a human. How the problems of dynamic databases and of time and memory constraints are dealt with differs for each system using the rough sets approach, but typically the time complexity is high.

b) Fuzzy classification

c) Regression

2) Unsupervised learning, without a known output variable in the dataset, including

a) Clustering

When the class label of each training tuple is not known, the task is known as unsupervised learning; the number or set of classes to be learned may also not be known in advance.

A cluster is a collection of objects that are ‘similar’ to one another and ‘dissimilar’ to the objects belonging to other clusters. Clustering is basically a process of the unsupervised learning category. It deals with finding a structure in a collection of unlabeled data. By a label we mean the identification of the group of similar objects to which an object belongs. There are two ways of clustering: one is distance-based clustering, the other is conceptual clustering. In the conceptual model, two or more objects belong to the same cluster if they share a common concept; for example, people who support a particular social movement could form one cluster, and those who support alternative movements could form other clusters. In other words, objects are grouped according to their fit to descriptive concepts, not according to simple similarity measures.
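Distance-based clustering can be illustrated with a minimal k-means sketch. It assumes squared Euclidean distance and a naive initialisation that uses the first k points as starting centroids; the function names and sample points are hypothetical, and the code is not WEKA's implementation.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Group every point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, k, max_iterations=100):
    """Return (centroids, clusters) after iterating until the centroids settle."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[i]  # keep the centroid of an empty cluster
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical two-dimensional points forming two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

No class labels are used at any point: the groups emerge only from the distances between the objects, which is what distinguishes this from the classification methods discussed earlier.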

b) Association learning

Data mining is the process of extracting interesting, non-trivial, implicit, previously unknown and potentially useful patterns or knowledge from huge amounts of data (3). Association rule mining is one of the data mining techniques used for discovering interesting relations between variables in large databases. Agrawal (8) introduced association rules for discovering regularities between products in large-scale transaction data recorded in supermarkets. It is known as market basket analysis, as it searches for interesting customer habits by observing the items that appear together in the basket. This knowledge can then be used to facilitate the planning of market strategies such as promotional pricing, cross-selling, product placement, catalogue design etc. Recently, the application scope of association rule mining has broadened, and it is used in a variety of situations, including establishing associations between quantitative and categorical attributes.

- Measuring Association Rules

An association rule is an implication of the form A ⇒ B, where A and B are subsets of an attribute set and A ∩ B = ∅. A is referred to as the antecedent and B as the consequent. In the case of market transactions, the rule is interpreted as follows: when item A is purchased by a customer, item B is also purchased. For example, Bread ⇒ Butter indicates that these two items are often purchased together.

In data mining, the interestingness and usefulness of a mined pattern are very important, and a set of measures is needed to establish them. We first define the related terminology and the required set of measures and then discuss various ways of mining these association rules.
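The two most common measures, support and confidence, can be sketched as follows. The sketch uses the standard definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule is the support of antecedent and consequent together divided by the support of the antecedent. The function names and transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
print(support(baskets, {"bread", "butter"}))
print(confidence(baskets, {"bread"}, {"butter"}))
```

In these baskets, bread and butter appear together in two of four transactions, and two of the three transactions containing bread also contain butter.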

- Basic Algorithms

Most algorithms use the support-confidence framework and a two-step process of

1) discovering all frequent or large item sets having at least min_Sup support and
2) generating association rules from the frequent item sets having at least min_con confidence. Step 2 has always remained the same, generating all possible combinations and checking them against the min_con criterion, while there are several algorithms in the literature for finding frequent item sets.

- Apriori Algorithm

The algorithm starts by generating frequent 1-itemsets and then systematically generates frequent (k+1)-itemsets from the already generated frequent k-itemsets. It uses a property of frequent itemsets called the Apriori property, which states that every subset of a frequent itemset is also frequent. Stated the other way, a superset of a non-frequent itemset cannot be frequent.
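The level-wise search for frequent itemsets can be sketched as follows. This is a compact illustration of the frequent-itemset step only, not WEKA's implementation; the function name, the transactions and the threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets."""
    n = len(transactions)

    def frequent(candidates):
        """Keep only the candidates whose support reaches min_support."""
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    items = {item for t in transactions for item in t}
    level = frequent({frozenset([item]) for item in items})  # frequent 1-itemsets
    result = dict(level)
    k = 1
    while level:
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
    return result


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=0.5)
for itemset, sup in sorted(frequent_itemsets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), sup)
```

In this example, the candidate {bread, butter, milk} is discarded in the prune step without being counted, because its subset {butter, milk} is not frequent.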

3) Semi-supervised learning, whereby the output variable is known only for some examples.

Each of these tasks consists of a chain of low-level tasks, for example data cleaning, filtering, feature extraction, feature transformation, evaluation, model validation and optimization. For almost all of these tasks, a large variety of methods is available, including statistical methods such as classifiers using estimated probability density functions and factor analysis, and newer machine learning methods such as artificial neural networks, fuzzy models, rough sets, support vector machines, decision trees and random forests.

Not all data mining methods are available in all software tools. A subjective evaluation of how frequently specific methods are available is given below:

a) Frequently: classifiers using estimated probability density functions, correlation analysis, statistical feature selection, and relevance tests.
b) In many tools: decision trees, clustering, regression, data cleaning, data filtering, feature extraction, principal component analysis, factor analysis, advanced feature evaluation and selection, computation of similarities, artificial neural networks, and model cross-validation.
c) In some tools: fuzzy classification, association learning and mining of frequent item sets, independent component analysis, bootstrapping, complexity measures, model fusion, support vector machines, Bayesian networks, and learning of crisp rules.
d) Infrequently: random forests, learning of fuzzy systems, rough sets, and model optimization by evolutionary algorithms.

1.4 Import and Export of Data and models

The ease with which data and models can be imported and exported among different software tools plays a crucial role in the functionality of data mining tools. First, the data is normally generated and hosted by different sources, such as databases or software associated with measurement devices. In business applications, interfaces to databases such as Oracle or any database supporting the Structured Query Language (SQL) are the most common means of importing data. Since almost all other non-data-mining tools support export as text or Excel files, formats such as CSV (comma-separated values) are frequently used as import formats for data mining tools. In addition, almost all software packages have proprietary binary or textual file and exchange formats for data and models, for example the Attribute-Relation File Format (ARFF) in WEKA.
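The relationship between a CSV table and an ARFF file can be sketched with a small converter. It is a simplified illustration: it assumes a header row, treats a column as numeric only if every value parses as a number and as nominal otherwise, and ignores quoting of names with spaces and missing values. The function names, relation name and sample data are hypothetical.

```python
import csv
import io


def is_number(text):
    """Return True if `text` can be parsed as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        if all(is_number(value) for value in column):
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: list the distinct values in braces.
            values = ",".join(sorted(set(column)))
            lines.append(f"@attribute {name} {{{values}}}")
    lines += ["", "@data"] + [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical sample; the column names are illustrative only.
sample = "age,sleep,mizaj\n25,short,bilious\n40,long,phlegmatic\n"
arff_text = csv_to_arff(sample, "mizaj")
print(arff_text)
```

The ARFF header makes the attribute types explicit, which a plain CSV file leaves to the importing tool to guess.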

1.5 Categorization of Data Mining Software into Different Types

Different types of similar data mining tools can be found.

The most prominent example is the Waikato Environment for Knowledge Analysis (WEKA). WEKA started in 1994 as a C++ library, with its first public release in 1996. In 1999, it was completely rebuilt as a Java package, and since that time it has been regularly updated. Detailed information about the tool is given in chapter 2.

1.6 Mizaj

Unani pathy (8) is a science that deals with health and disease. Today India is one of the leading countries as far as its practice is concerned, with the largest number of Unani educational, research and healthcare institutions. According to Unani pathy, the human body is considered to be composed of different natural components. Temperament (Mizaj) is one of them, and it indicates the properties of an atom (unsur), a molecule, a cell, a tissue, an organ and the organism as a whole. Each and every atom, molecule (murakkab), humour (khilt), cell and organ, and the body as a whole, is furnished with a mizaj (equilibrium) upon which its properties, functions and life depend. In fact, it is a complete mirror of the chemical state of the human body and indicates the environment and homeostasis of the body. Temperament, as defined by Avicenna (Ibn Sina), is the new state of a matter, with a quality different from that present in the elements or compounds before they came into imtizaj (intermixture or chemical combination), which results from the action and reaction among the contrary qualities and powers present in the atoms of different elements when they are combined.

Mizaj indicates the principles of chemical combination of different elements (or compounds) to form a new compound with properties altogether different from those the elements (or compounds) possessed before coming into combination (imtizaj). Mizaj also indicates the state of equilibrium in a compound with respect to the required number of atoms and molecules of different elements and their ratio in that particular compound, and the state of homeostasis in a cell or in the entire body, upon which the life of the cell and of the entire organism depends.

1.7 Research Methodology

To understand the various data mining techniques and tools in detail, we planned to collect temperament data, with different instances and attributes, from people in different fields and regions. We also planned to apply classification, clustering and association rule mining to the same data, using WEKA as the data mining tool, and to find relations between the different attributes.

We therefore analyzed the data to find the various types of temperament observed among people and the types generally observed in the majority of people. This was carried out by applying various classification models and finding the model that correctly classifies the data into four different classes. We also tried to find the relations between the attributes by applying the NB Tree model and the Apriori algorithm.

2. An Introduction to the WEKA Data Mining System

Figure not included in this excerpt

2.1 Data Mining

- "Drowning in Data yet Starving for Knowledge" (9)
- "Computers have promised us a fountain of wisdom but delivered a flood of data" (William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus)
- Data Mining: "The non-trivial extraction of implicit, previously unknown, and potentially useful information from data" (William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus)
- Data mining finds valuable information hidden in large volumes of data.
- Data mining is the analysis of data and the use of software techniques for finding patterns and regularities in sets of data.
- Data Mining is an interdisciplinary field involving:

- Databases
- Statistics
- Machine Learning
- High Performance Computing
- Visualization
- Mathematics

2.2 Data Mining Software

KDnuggets: News: 2005: n13: item2

SIGKDD Service Award is the highest service award in the field of data mining and knowledge discovery. It is given to one individual or one group who has performed significant service to the data mining and knowledge discovery field, including professional volunteer services in disseminating technical information to the field, education, and research funding.

The 2005 ACM SIGKDD Service Award is presented to the Weka team for their development of the freely-available Weka Data Mining Software, including the accompanying book Data Mining: Practical Machine Learning Tools and Techniques (now in second edition) and much other documentation. The Weka team includes Ian H. Witten and Eibe Frank, and the following major contributors (in alphabetical order of last names): Remco R. Bouckaert, John G. Cleary, Sally Jo Cunningham, Andrew Donkin, Dale Fletcher, Steve Garner, Mark A. Hall, Geoffrey Holmes, Matt Humphrey, Lyn Hunt, Stuart Inglis, Ashraf M. Kibriya, Richard Kirkby, Brent Martin, Bob McQueen, Craig G. Nevill-Manning, Bernhard Pfahringer, Peter Reutemann, Gabi Schmidberger, Lloyd A. Smith, Tony C. Smith, Kai Ming Ting, Leonard E. Trigg, Yong Wang, Malcolm Ware, and Xin Xu.

The Weka team has put a tremendous amount of effort into continuously developing and maintaining the system since 1994. The development of Weka was funded by a grant from the New Zealand Government's Foundation for Research, Science and Technology.

The key features responsible for Weka's success are:

- it provides many different algorithms for data mining and machine learning
- it is open source and freely available
- it is platform-independent
- it is easily useable by people who are not data mining specialists
- it provides flexible facilities for scripting experiments
- it has kept up-to-date, with new algorithms being added as they appear in the research literature.

2.3 Weka Data Mining Software

KDnuggets: News: 2005: n13: item2 (cont.)

The Weka Data Mining Software has been downloaded 200,000 times since it was put on Source Forge in April 2000, and is currently downloaded at a rate of 10,000/month. The Weka mailing list has over 1100 subscribers in 50 countries, including subscribers from many major companies. There are 15 well-documented substantial projects that incorporate, wrap or extend Weka, and no doubt many more that have not been reported on Sourceforge. Ian H. Witten and Eibe Frank also wrote a very popular book "Data Mining: Practical Machine Learning Tools and Techniques" (now in the second edition), that seamlessly integrates Weka system into teaching of data mining and machine learning. In addition, they provided excellent teaching material on the book website. This book became one of the most popular textbooks for data mining and machine learning, and is very frequently cited in scientific publications. Weka is a landmark system in the history of the data mining and machine learning research communities, because it is the only toolkit that has gained such widespread adoption and survived for an extended period of time (the first version of Weka was released 11 years ago). Other data mining and machine learning systems that have achieved this are individual systems, such as C4.5, not toolkits. Since Weka is freely available for download and offers many powerful features (sometimes not found in commercial data mining software), it has become one of the most widely used data mining systems. Weka also became one of the favorite vehicles for data mining research and helped to advance it by making many powerful features available to all. In sum, the Weka team has made an outstanding contribution to the data mining field.

Using Weka to teach Machine Learning, Data and Web Mining

http://uhaweb.hartford.edu/compsci/ccli/

Machine Learning, Data and Web Mining by Example

(“learning by doing” approach)

- Data preprocessing and visualization
- Attribute selection
- Classification (OneR, Decision trees)
- Prediction (Nearest neighbor)
- Model evaluation
- Clustering (K-means, Cobweb)
- Association rules

Data preprocessing and visualization: Initial Data Preparation (Weka data input)

- Raw data (Mizaj temperament data)

2.4 Data preprocessing and visualization

The following steps are involved in data preprocessing and visualization in the Weka tool:

1) Open the Mizaj temperament data file (a sample from a Unanipathi Medical College)
2) Mizaj data in CSV format (mijaz.csv): relations, attributes, tuples (instances), or
3) Attribute-Relation File Format (ARFF) - http://www.cs.waikato.ac.nz/~ml/weka/arff.html
4) Download and install Weka - http://www.cs.waikato.ac.nz/~ml/weka/
5) Run Weka and select the Explorer
6) Load data into Weka in ARFF or CSV format (click on “Open file...”)
7) Convert data formats through Weka (click on “Save...”)
8) Edit data in Weka (click on “Edit...”)

The above steps are shown in Fig. 1.

Figure not included in this excerpt

Fig.1 Explorer view of WEKA

Fig. 1 above shows the Explorer view of Weka. The file Mijaz.csv is opened in the Weka Explorer, and further actions are then taken, such as data transformation and the application of different data mining models for classification, clustering and association.

[...]

Excerpt out of 60 pages

Details

Title
Analysis of Temparament (Arab = Mizaj) by using different Data Mining Techniques
Authors
Year
2020
Pages
60
Catalog Number
V981618
ISBN (eBook)
9783346338198
ISBN (Book)
9783346338204
Language
English
Keywords
analysis, temparament, arab, mizaj, data, mining, techniques
Quote paper
Dr. Bapurao Bandgar (Author), Dr. Ajit D. More (Author), 2020, Analysis of Temparament (Arab = Mizaj) by using different Data Mining Techniques, Munich, GRIN Verlag, https://www.grin.com/document/981618
