Agricultural price predictions are an integral component of trade and policy analysis, because the prices of agricultural commodities directly influence the real income of farmers and also affect the foreign currency the nation generates. Sesame is produced in large quantities in some tropical and subtropical rain forest regions of Ethiopia. This thesis builds a model that can predict the market price of the sesame commodity. Given the complexity of sesame price prediction, linear regression, support vector machine and neural network models are used to predict the future price. The data were taken from the ECX website (www.ecx.com.et) for the period January 2013 to March 2019. A total of 5,327 daily price records were selected for the experiments with the proposed models. The experimental results were evaluated using the RMSE, MSE and CC metrics. We follow the six-phase CRISP-DM process model for sesame price prediction. The phases are business understanding, data understanding, data preparation, modeling, evaluation and deployment.
Table of Contents
Acknowledgment
Table of Tables
Table of Figures
List of Abbreviations
Abstract
Chapter One
1. Introduction
1.1. Background
1.2. Statement of the Problem
1.2.1. Research Questions
1.3. Objective of the Study
1.3.1. General Objective
1.3.2. Specific Objective
1.4. Scope of the Study
1.5. Limitation of the Study
1.6. Significance of the Study
1.7. Thesis Organization
Chapter Two
2. Literature Review and Related Work
2.1. Price Prediction
2.2. Why Price Prediction?
2.3. Overview of Machine learning
2.3.1. Supervised Learning
2.3.2. Unsupervised Learning
2.3.3. Reinforcement Learning
2.4. Frameworks for Building Data Mining
2.4.1. Knowledge Discovery in Databases (KDD)
2.4.2. Cross-Industry Standard Process for Data Mining
2.4.3. SEMMA (Sample, Explore, Modify, Model, Assess)
2.5. Related Work
2.5.1. Summary of Related Work
Chapter Three
3. Methodology
3.1. General Framework of Proposed Architecture
3.2. Data Collection
3.3. Data Analysis
3.4. Data Preprocessing
3.5. Data Transformation
3.6. Attributes Selection Method
3.6.1. Correlation-based Feature Selection (CFS)
3.6.2. Relief Attribute Evaluation
3.7. Model Design Methods
3.7.1. Linear Regression
3.7.2. Support Vector Machine
3.7.3. Neural Network
3.8. Performance Evaluation Method
3.8.1. Correlation Coefficient (CC)
3.8.2. Mean Absolute Error (MAE)
3.8.3. Root Mean-Squared Error (RMSE)
Chapter Four
4. Result and Discussion
4.1. Attribute Selection Result
4.2. Experimental Result of Predictive Algorithms
4.2.1. Predicting of Sesame Closing Price Using Linear Regression
4.2.2. Predicting of Sesame Price Using Support Vector Machine
4.2.3. Predicting of Sesame Price using Neural Network
4.3. Performance Evaluation of the Predictive Algorithm
4.3.1. 10 Fold Cross Validation
4.3.2. Percentage Split Validation (70%Training and 30% Testing)
Chapter Five
5. Conclusion and Recommendation
5.1. Conclusion
5.2. Recommendation
6. Reference
7. Appendix
Acknowledgment
First, I would like to thank almighty God and his mother, St. Mary, for the good health and wellbeing that were necessary to complete this research.
Second, I am grateful to my advisor, Dr. Melkamu Beyene, for his continual support, from title selection through to implementation.
Third, I would like to thank my co-advisor, Mr. Assefa Chekole, for his support and suggestions on this paper.
Fourth, I would like to thank my family: Ms. Bosena Bogale (Woseye), Mr. Gashaw Gebiyaw, Ms. Habtam Chekole, Mr. Demoz Gezahagn and my beloved wife Mamey Demoz. I am deeply grateful to my family for supporting me through the master's program from beginning to end.
Finally, I would like to thank my friend Dessie Fikadu and his sister Haymanot Abebaw for their support in giving me free access to the internet.
Dedication
This thesis is dedicated to my family, especially my beloved wife Mamey Demoz.
Table of Tables
TABLE 1: THE NAME OF MARKET AND LOCATION
TABLE 2: THE ATTRIBUTE OF DATASET AND THEIR DATA TYPE
TABLE 3: ANALYSIS OF ATTRIBUTE VALUES
TABLE 4: MISSING VALUES
TABLE 5: ATTRIBUTE NAME TRANSLATION
TABLE 6: RESULT OF ATTRIBUTE SELECTION
TABLE 7: SAMPLE RESULT OF ACTUAL AND PREDICTING PRICE USING LINEAR REGRESSION
TABLE 8: SAMPLE RESULT OF ACTUAL AND PREDICTION PRICE USING SUPPORT VECTOR MACHINE
TABLE 9: ACTUAL AND PREDICTING SESAME PRICE USING NEURAL NETWORK
TABLE 10: RESULT OF 10 FOLDS CROSS VALIDATION USING THE SELECTED ATTRIBUTES
TABLE 11: RESULT OF PERCENTAGE SPLIT VALIDATION USING THE SELECTED ATTRIBUTES
Table of Figures
FIGURE 1: CATEGORY OF MACHINE LEARNING
FIG-2: KDD DATA MINING PROCESS FLOW
FIGURE 3: PROCESS DIAGRAM OF CRISP-DM
FIGURE 4: ARCHITECTURE OF THE PROPOSED M
FIG-5: ETHIOPIA SESAME MARKET AREA
FIG-6: CLOSING PRICE
FIG-7: OPENING PRICE
FIG-8: MAX-PRICE AND MIN-PRICE INFORMATION
FIG-9: QUANTITY INFORMATION
FIG-10: MARKET PLACE WITH FREQUENCY INFORMATION
FIG-11: LINEAR LINE SEPARATING THE DATA TYPES
FIG-12: NODE WITH INPUTS
FIG-13: SIMPLE NODE
FIG-14: THREE-LAYER NEURAL NETWORK
FIG-15: RESULT OF ACTUAL AND PREDICTED PRICE USING LINEAR REGRESSION
FIG-16: ACTUAL AND PREDICTED VALUE SESAME PRICE USING SUPPORT VECTOR MACHINE
FIG-17: ACTUAL AND PREDICT PRICE USING NEURAL NETWORK
FIG-18: DESIGNED NEURAL NETWORK ARCHITECTURE
FIGURE 19: PERFORMANCE COMPARISON USING 10-FOLD CROSS VALIDATION
FIGURE 20: PERFORMANCE COMPARISON USING PERCENTAGE SPLIT VALIDATION
List of Abbreviations
[List of abbreviations not included in this excerpt]
Abstract
Agricultural price predictions are an integral component of trade and policy analysis, because the prices of agricultural commodities directly influence the real income of farmers and also affect the foreign currency the nation generates. Sesame is produced in large quantities in the hot regions of Ethiopia. The objective of this thesis is to analyze existing prices and predict the market price of the sesame commodity. Given the complexity of sesame price prediction, linear regression, support vector machine and neural network models are used to predict the future price. The data cover January 2013 to March 2019, and a total of 5,327 daily price records are used. The researcher selects the neural network model, which is preferable to the linear regression and SVM algorithms: its experimental results show a lower RMSE, a lower MSE and a higher CC. We follow these processing steps to build the prediction model: data collection, data selection, data preprocessing, data transformation, attribute selection, model development, performance evaluation and selection of the best algorithm.
The researcher used two feature selection approaches, correlation-based feature selection and the Relief attribute selection method. Out of the nine attributes in the dataset, we select six (trade date, quantity, production year, min-price, opening price and max-price) as important for prediction.
The performance of the models is evaluated using 10-fold cross-validation and percentage split validation. The empirical results show that the neural network ranks first, the support vector machine second and linear regression third. However, all three algorithms are able to produce satisfactory results, so they may be very useful in long-term price prediction.
Key words: Price Prediction, Linear Regression, Support Vector Machine, Neural Network, ECX
Chapter One
1. Introduction
1.1. Background
Sesame (Sesamum indicum L., 2n=26), grouped under the family Pedaliaceae, is the most ancient oilseed known and used by man. It is called the 'Queen of oil seeds' due to its high-quality, stable polyunsaturated fatty acids, which restrain oxidative rancidity. It is also stable due to the natural antioxidants sesamol and sesamolinol, which reduce the rate of oxidation [1].
The oilseed sector in Ethiopia is one of the fastest growing sectors in the country, both in terms of its foreign exchange earnings and as a source of income for thousands of Ethiopian farmers. Sesame is among the most important export oilseed crops in the country, ranking as the second commercial export commodity after coffee [2].
Ethiopia's economic policy has been agriculture-led. Sesame is one of the agricultural crops that is an important input for the oilseed industries and a source of hard currency. Currently, many governmental organizations collect and present daily, weekly, monthly and yearly price information for selected markets. However, this information is not well analyzed, nor is it used to predict future prices with scientific algorithms and methods.
The main objective of this thesis is to design a predictive model and predict the future sesame price. We used the five-phase design science process model: problem awareness, suggestion, development, evaluation and conclusion.
The input data were collected from the Ethiopia Commodity Exchange Authority. The sample comprises 5,327 records.
Data mining provides a methodology to transform these data into useful information for decision-making [3]. Data mining techniques can be used to develop an innovative model to predict the market price of commodities. Such predictions help farmers plan their sesame cultivation activities so that they can fetch better prices, and they support the planning and implementation of agricultural development programs that stabilize the market price of the commodity. This prediction model is useful not only for farmers and traders but also for agricultural planning, the framing of agricultural policies and schemes, and market planning. Future price prediction takes an existing series of data to predict the future value [4].
There are many kinds of prediction methods based on supervised learning; among them, the researcher used linear regression, support vector machine and neural network algorithms to predict the future price. The performance of the predictive algorithms is evaluated using the correlation coefficient, mean absolute error and root mean square error metrics, and the best model is selected for prediction. The predicted results are valuable to traders deciding when to enter the market and to farmers seeking to gain profit.
1.2. Statement of the Problem
According to a Food and Agriculture Organization (FAO) report, Ethiopia is the second-largest sesame exporting country in the world, with a global market share of around 20% [5]. The crop is cultivated once a year, in the rainy season, in some hot areas of Ethiopia. Sesame prices directly affect the income of farmers and also the national economy, since sesame is the second export crop in the agricultural sector after coffee. The market prices of sesame are collected by governmental and non-governmental organizations, but this information is not used in the decision-making process.
A number of studies have investigated the prediction of agricultural commodity prices, especially for onion, tomato, soya beans, rice, sesame, maize, etc. ([14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24] and [25]). These studies are motivated partly by the dynamic nature of the problem as well as by the need for better results and by the real problems of future price prediction in Ethiopia.
Currently, many governmental organizations collect and present daily, weekly, monthly and yearly price information for selected markets. However, this information is not well analyzed, nor is it used to predict future prices with scientific algorithms and methods.
Sesame prices are unstable and change fast, which makes sesame price prediction a challenging endeavor. Farmers cultivating sesame, traders entering the market and policy makers preparing national policies all rely on traditional prediction. Sesame cultivation differs from that of other crops: it is grown in the rainy season and is easily affected by pesticide, drought and excessive rainfall. Farmers fear for the price of sesame because there are not enough companies processing sesame locally. The price depends on demand in international markets. When international demand for sesame decreases, the price decreases, because Ethiopia is still not a price maker in the international market.
1.2.1. Research Questions
In this research, the following research questions will be answered.
- Which features can be used for price prediction?
- Which predictive model is more accurate in predicting the future market price of sesame?
1.3. Objective of the Study
1.3.1. General Objective
The general objective of this study is to build a model that can predict the future price of sesame crop.
1.3.2. Specific Objective
In order to achieve the general objective, the researcher will undertake the following specific objectives.
- To understand the problem domain by reviewing related literature.
- To prepare the data for analysis.
- To identify the techniques used to predict the price of sesame.
- To build the predictive model for price prediction.
- To evaluate and measure the accuracy of the predicted sesame prices.
- To analyze the outcome of the research and make recommendations based on the findings.
1.4. Scope of the Study
This study focuses on different price forecasting techniques, suggests the best among them and reports their details and results. The study analyzes the dynamic behavior of prices in selected major sesame markets in Ethiopia, namely Metema, Abrhajira, Gondar, Humera, Shiraro, Dessie and Pawi.
1.5. Limitation of the Study
One limitation of this study is that the attribute values recorded in interval form have unequal widths; this type of data representation has some influence on the results. The time available to undertake the research was also a limitation, as it was not sufficient.
This thesis was unable to include attributes for the international market price, environmental conditions (such as the annual rainfall rate) and the production cost of sesame. If those attributes were incorporated, the performance of the model would be expected to increase and the model would be more robust.
1.6. Significance of the Study
Because the prices of agricultural commodities directly influence the real income of farmers, they also affect farmers' livelihoods. For producers, price prediction helps in planning the farm based on expected profits. Hence, it also helps in combating price risk.
Sesame price prediction also helps researchers and policy makers in planning appropriate policies to fight hunger and ensure food security.
This research aims to make real-time, integrated and up-to-date predictive sesame price information available to farmers, traders and decision makers at all levels, and thereby ultimately to enhance the national economy, strengthen the bargaining power of farmers in the market and improve farmers' income from sesame.
1.7. Thesis Organization
The remaining chapters of this thesis are organized as follows. Chapter two describes the concept of price prediction, gives an overview of data mining concepts and reviews related work in the area of price prediction. Chapter three describes the methodology: the proposed model design diagram, the data collection method, the feature selection methods, the performance evaluation metrics and the prediction algorithms are each described briefly. Chapter four is the experimental part, which presents the results of feature selection and compares the predictive models using 10-fold cross-validation and percentage split validation. The last chapter presents the conclusion and recommendations for future work.
Chapter Two
2. Literature Review and Related Work
2.1. Price Prediction
Prediction is one of the main objectives of data analysis; it is the art of saying what will happen in the future rather than why. A predictor can choose a prediction method based on experience and available external information. As the process goes on, this procedure can be modified to meet the conditions and to suit the current situation. Prediction involves using some variables or fields in the database to predict unknown or future values of other variables of interest [6].
2.2. Why Price Prediction?
The ability to predict many types of events seems natural today due to the advent of technological breakthroughs. The trend toward being able to accurately predict more events, particularly those of an economic nature, will continue to provide a better base for planning. Formal forecasting methods are the means by which this improvement will occur.
Considering the price scenario, a proper understanding of the sesame price mechanism and its prediction would help farmers to plan and decide on their production portfolio and marketing for improved farm profit, consumers to plan their budgets, traders to know the market trend and the Government to plan economic development in the nation. Sufficient information about prices would strengthen the weak linkage between production and marketing in the country.
Agricultural price predictions are an integral component of trade and policy analysis. Because the prices of agricultural commodities directly influence the real income of consumers, they also affect consumers' access to food. This effect is more prominent for the poorer section of the population, for whom a major portion of household income goes to expenditure on food, as well as for small and marginal farm households who are net buyers of food grains. Sesame price predictions also help researchers and policy makers in planning appropriate policies to fight hunger and ensure food security. For producers, price forecasts help in planning farm production and marketing based on expected profits. Hence, they also help in combating price risk.
2.3. Overview of Machine learning
At a high level, Machine learning tasks can be categorized into three groups based on the desired output and the kind of input required to produce it.
Figure 1: Category of Machine Learning
2.3.1. Supervised Learning
The machine learning algorithm is provided with a sufficiently large example input dataset together with the respective output or event/class, usually prepared in consultation with the subject matter expert of the respective domain. The goal of the algorithm is to learn patterns in the data and build a general set of rules to map input to the class or event. Broadly, the two commonly used types of supervised learning algorithm are regression and classification [7].
i. Regression
The output to be predicted is a continuous number related to a given input dataset. Example use cases are predictions of retail sales, the number of staff required for each shift, the number of car parking spaces required for a retail store, a customer's credit score, etc.
ii. Classification
The output to be predicted is the actual or the probability of an event/class and the number of classes to be predicted can be two or more. The algorithm should learn the patterns in the relevant input of each class from historical data and be able to predict the unseen class or event in the future considering their input. An example use case is spam email filtering where the output expected is to classify an email into either a “spam” or “not spam.”
Building supervised learning models has three stages:
a. Training: The algorithm will be provided with historical input data with the mapped output. The algorithm will learn the patterns within the input data for each output and represent that as a statistical equation, which is also commonly known as a model.
b. Testing or validation: In this phase the performance of the trained model is evaluated, usually by applying it on a dataset (that was not used as part of the training) to predict the class or event.
c. Prediction: Here we apply the trained model to a data set that was not part of either the training or testing. The prediction will be used to drive business decisions.
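The three stages above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a one-variable least-squares regression and made-up numbers; the function names (`fit_line`, `predict`, `mean_absolute_error`) and the data are hypothetical and are not taken from the thesis experiments.

```python
# Minimal sketch of the train / test / predict stages using
# one-variable least-squares regression. All numbers are made up.

def fit_line(xs, ys):
    """Training: estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Prediction: apply the learned equation to a new input."""
    slope, intercept = model
    return slope * x + intercept

def mean_absolute_error(model, xs, ys):
    """Testing: average absolute gap between predicted and actual values."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data that follows y = 2x + 1 exactly.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
test_x, test_y = [5.0, 6.0], [11.0, 13.0]   # held out from training

model = fit_line(train_x, train_y)                        # a. training
test_error = mean_absolute_error(model, test_x, test_y)   # b. testing
new_prediction = predict(model, 10.0)                     # c. prediction
```

Keeping the test data separate from the training data is what makes the error in stage (b) an honest estimate of how the model behaves on unseen inputs.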
2.3.2. Unsupervised Learning
There are situations where the desired output class/event is unknown for historical data. The objective in such cases would be to study the patterns in the input dataset to get better understanding and identify similar patterns that can be grouped into specific classes or events. As these types of algorithms do not require any intervention from the subject matter experts beforehand, they are called unsupervised learning. Let's look at some examples of unsupervised learning.
Clustering
Assume that the classes are not known beforehand for a given dataset. The goal here is to divide the input dataset into logical groups of related items. Some examples are grouping similar news articles, grouping similar customers based on their profile, etc.
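As an illustration of clustering, the following sketch groups one-dimensional values with a basic k-means loop. It is a simplified example under stated assumptions: the function name `kmeans_1d`, the starting centers and the sample values are all hypothetical, and a fixed number of iterations stands in for a proper convergence check.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around the nearest center, then move each
    center to the mean of its group; repeat a fixed number of times."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical values forming two obvious groups; no labels are given.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
```

No class labels are supplied anywhere; the groups emerge only from the distances between the values, which is what distinguishes this from the supervised case.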
2.3.3. Reinforcement Learning
The basic objective of reinforcement learning algorithms is to map situations to actions that yield the maximum final reward. While mapping the action, the algorithm should not just consider the immediate reward but also next and all subsequent rewards. For example, a program to play a game or drive a car will have to constantly interact with a dynamic environment in which it is expected to perform a certain goal. Examples of reinforcement learning techniques are the following:
- Markov decision process
- Q-learning
- Temporal Difference methods
- Monte-Carlo methods
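To make the idea of weighing immediate and subsequent rewards concrete, here is a minimal sketch of a single tabular Q-learning update. The state names, action names, learning rate `alpha` and discount factor `gamma` are illustrative assumptions, not values from the thesis.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]

# Hypothetical two-state table: moving "right" from s1 is already valued at 1.0.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}

# Taking "right" in s0 gives no immediate reward but leads to s1,
# so the discounted future value flows back into Q(s0, right).
updated = q_update(q_table, "s0", "right", reward=0.0, next_state="s1")
```

Even with an immediate reward of zero, the estimate for the action rises because the next state is valuable, which is the behaviour described in the paragraph above.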
2.4. Frameworks for Building Data Mining
Data mining is the ability to extract insight from huge amounts of data that previously had no use or were underutilized, in order to learn trends/patterns and predict possibilities that help to drive business decisions leading to profit [8].
Machine learning methods emerged in the early 1980s and have seen great growth since. With the emergence of this field, different process frameworks were introduced. These process frameworks guide and carry the machine learning tasks and their applications. Efforts were made to use data mining process frameworks that guide the implementation of data mining on huge amounts of data. Three data mining process frameworks have been the most popular and widely practiced by data mining experts/researchers to build machine learning systems. These models are the following:
A. Knowledge Discovery in Databases (KDD) process model
B. Cross-Industry Standard Process for Data Mining (CRISP-DM)
C. Sample, Explore, Modify, Model and Assess (SEMMA)
2.4.1. Knowledge Discovery in Databases (KDD)
This refers to the overall process of discovering useful knowledge from data, which was presented by [9]. It is an integration of multiple technologies for data management such as data warehousing, statistical machine learning, decision support, visualization and parallel computing. As the name suggests, Knowledge Discovery in Databases centers on the overall process of knowledge discovery from data and covers the entire life cycle of data: how the data are stored, how they are accessed, how algorithms can be scaled to enormous datasets efficiently and how results can be interpreted and visualized [10]. There are five steps in KDD, shown in Figure 2.
Fig-2: KDD Data Mining Process Flow
Selection
In this step, selection and integration of the target data from possibly many different and heterogeneous sources is performed. Then the correct subset of variables and data samples relevant to the analysis task is retrieved from the database.
Preprocessing
Real-world datasets are often incomplete (attribute values are missing), noisy (containing errors and outliers) and inconsistent (there are discrepancies in the collected data). Unclean data can confuse the mining procedures and lead to unreliable and invalid outputs [11]. Also, performing complex analysis and mining on a huge amount of such dirty data may take a very long time. Preprocessing and cleaning should improve the quality of the data and of the mining results by enhancing the actual mining process. The actions to be taken include the following:
- Collecting required data or information to model
- Outlier treatment or removal of noise
- Using prior domain knowledge to remove the inconsistencies and duplicates from the data
- Choice of strategies for handling missing data
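Two of the actions listed above, removing duplicates and handling missing data, can be sketched as follows. This is only one possible strategy (mean imputation), chosen here for brevity; the record layout and the field names `date` and `price` are hypothetical and do not reflect the actual ECX dataset.

```python
def clean_records(records, field):
    """Drop exact duplicate records, then fill missing values of
    `field` with the mean of the known values (mean imputation)."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))   # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(dict(record))       # copy so the input is not modified
    known = [r[field] for r in unique if r[field] is not None]
    mean = sum(known) / len(known)
    for r in unique:
        if r[field] is None:
            r[field] = mean
    return unique

# Hypothetical daily price records with one duplicate and one gap.
raw = [
    {"date": "2019-01-02", "price": 100.0},
    {"date": "2019-01-02", "price": 100.0},   # exact duplicate
    {"date": "2019-01-03", "price": None},    # missing value
    {"date": "2019-01-04", "price": 110.0},
]
cleaned = clean_records(raw, "price")
```

Other strategies, such as dropping incomplete records or using the median, may suit skewed price data better; the choice depends on the domain knowledge mentioned above.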
Transformation
In this step, data are transformed or consolidated into forms appropriate for mining, that is, finding useful features to represent the data depending on the goal of the task. For example, in high-dimensional spaces with a large number of attributes, the distances between objects may become meaningless. So dimensionality reduction and transformation methods can be used to reduce the effective number of variables under consideration or find invariant representations for the data.
There are various data transformation techniques:
- Smoothing (binning, clustering, regression, etc.)
- Aggregation
- Generalization in which a primitive data object can be replaced by higher-level concepts
- Normalization, which involves min-max-scaling or z-score
- Feature construction from the existing attributes (PCA, MDS)
- Data reduction techniques are applied to produce reduced representation of the data (smaller volume that closely maintains the integrity of the original data)
- Compression, for example, wavelets, PCA, clustering, etc.
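The normalization item in the list above names two common rescalings. A minimal sketch of both follows, assuming a plain list of numbers and the population standard deviation for the z-score; the sample values are made up.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center values on their mean and divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

# Hypothetical attribute values on an arbitrary scale.
quantities = [10.0, 20.0, 30.0]
scaled = min_max_scale(quantities)
standardized = z_score(quantities)
```

Both functions assume the values are not all identical; otherwise the denominator would be zero and a guard would be needed.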
Data Mining
In this step, machine learning algorithms are applied to extract data patterns. Exploration/ summarization methods such as mean, median, mode, standard deviation, class/concept description, and graphical techniques of low-dimensional plots can be used to understand the data. Predictive models such as classification or regression can be used to predict the event or future value. Cluster analysis can be used to understand the existence of similar groups. Select the most appropriate methods to be used for the model and pattern search.
Interpretation / Evaluation
This step is focused on interpreting the mined patterns to make them understandable by the user, for example through summarization and visualization. The mined patterns or models are interpreted. A pattern is a local structure that makes statements only about a restricted region of the space spanned by the variables.
2.4.2. Cross-Industry Standard Process for Data Mining
It is generally known by its acronym CRISP-DM. It was established by the European strategic program on research in information technology initiative with an aim to create an unbiased methodology that is not domain dependent [12]. It is an effort to consolidate data mining process best practices followed by experts to tackle data mining problems. This framework is an idealized sequence of activities. It is an iterative process and many of the tasks backtrack to previous tasks and repeat certain actions to bring more clarity. There are six major phases as shown in Figure 3.
Figure 3: Process Diagram of CRISP-DM
Phase 1: Business Understanding
As the name suggests, the focus at this stage is to understand the overall project objectives and expectations from a business perspective. These objectives are converted to a data mining or machine learning problem definition and a plan of action around data requirements, business owners' input and how outcome performance evaluation metrics are designed [13].
Phase 2: Data Understanding
In this phase, the initial data that were identified as requirements in the previous phase are collected. Activities are carried out to understand data gaps, the relevance of the data to the objective in hand and any data quality issues, and to gain first insights into the data that bring out appropriate hypotheses.
The outcome of this phase will be presented to the business iteratively to bring more clarity into the business understanding and project objective.
Phase 3: Data Preparation
This phase is all about cleaning the data so that it's ready to be used for the model building phase. Cleaning data could involve filling the known data gaps from previous steps, missing value treatments, identifying the important features, applying transformations, and creating new relevant features where applicable. This is one of the most important phases as the model's accuracy will depend significantly on the quality of data that is being fed into the algorithm to learn the patterns.
Phase 4: Modeling
There are multiple machine learning algorithms available to solve a given problem. So various appropriate machine learning algorithms are applied onto the clean dataset, and their parameters are tuned to the optimal possible values. Model performance for each of the applied models is recorded.
Phase 5: Evaluation
In this stage a benchmarking exercise is carried out among all the different models that were identified as giving high accuracy. Each model is tested against data that was not used as part of the training to evaluate its performance consistency. The results are verified against the business requirement identified in phase 1. The subject matter experts from the business are involved to ensure that the model results are accurate and usable as required by the project objective.
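Benchmarking regression models typically comes down to computing error metrics on held-out data. The sketch below computes the mean absolute error (MAE), root mean squared error (RMSE) and Pearson correlation coefficient (CC), the three metrics named in this thesis, in plain Python. The actual and predicted values are made-up numbers, not results from the experiments.

```python
def mae(actual, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return mse ** 0.5

def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / (var_a * var_p) ** 0.5

# Hypothetical held-out prices and model predictions.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "CC": correlation(actual, predicted),
}
```

Lower MAE and RMSE and a CC closer to 1 indicate a better fit, which is the basis on which competing models can be ranked.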
Phase 6: Deployment
The key focus in this phase is the usability of the model output. So the final model signed off by the subject matter expert will be implemented, and the consumers of the model output will be trained on how to interpret or use it to take the business decisions defined in the business understanding phase. The implementation could be as simple as generating a prediction report and sharing it with the user. Periodic model training and prediction times will also be scheduled based on the business requirement.
2.4.3. SEMMA (Sample, Explore, Modify, Model, Assess)
SEMMA denotes the sequential steps for building machine learning models that are incorporated in 'SAS Enterprise Miner', a product by SAS Institute Inc., one of the largest producers of commercial statistical and business intelligence software. These sequential steps guide the development of a machine learning system. Let's look at the five sequential steps to understand it better.
Sample
This step is all about selecting a subset of the right size from the large dataset provided for building the model. This helps us to build the model efficiently. It was a common practice when computation power was expensive; however, it is still in use. The selected subset should be a true representation of the entire dataset originally collected, which means it should contain sufficient information to retrieve. The data are also divided for training and validation at this stage.
Explore
In this phase activities are carried out to understand the data gaps and the relationships between variables. Two key activities are univariate and multivariate analysis. In univariate analysis each variable is examined individually to understand its distribution, whereas in multivariate analysis the relationships between variables are explored. Data visualization is heavily used to help understand the data better.
Modify
In this phase variables are cleaned where required. New derived features are created by applying business logic to existing features based on the requirement. Variables are transformed if necessary. The outcome of this phase is a clean dataset that can be passed to the machine learning algorithm to build the model.
Model
In this phase, various data mining techniques are applied on the preprocessed data to benchmark their performance against desired outcomes.
Assess
This is the last phase. Here model performance is evaluated against the test data (not used in model training) to ensure reliability and business usefulness.
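Holding data back for assessment can be done in more than one way. The sketch below shows two common schemes, a percentage split and k-fold cross-validation, using index arithmetic only. The function names and the sizes used are illustrative assumptions; no shuffling is applied, which is a design choice that keeps records in their original order.

```python
def percentage_split(records, train_percent=70):
    """Split records in order: the first train_percent% for training, the rest for testing."""
    cut = len(records) * train_percent // 100
    return records[:cut], records[cut:]

def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) pairs; each record is tested exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

# Hypothetical dataset of 20 records for the split, 25 for the folds.
train_part, test_part = percentage_split(list(range(20)), train_percent=70)
folds = list(k_fold_indices(25, k=10))
```

With k-fold cross-validation every record serves as test data once, so the averaged score depends less on one particular split than a single percentage split does.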
2.5. Related Work
The researcher reviewed various empirical studies on agricultural product price prediction. This review of past work on predictive models is compiled to give a better understanding of the research on prediction and of the methods of analysis applied to the subject.
Qasem et al. [14] built a decision tree model to analyse and predict stock price returns. The researchers used three years of historical data from the Amman Stock Exchange and selected six attributes for the experiment. The model achieved an accuracy of 66% using 10-fold cross-validation and percentage-split validation.
Jha and Sinha [15] developed a user-friendly ANN-based decision support system by integrating linear and nonlinear forecasting methodologies. They compared ARIMA and time-delay neural network (TDNN) models in terms of both modelling and forecasting, using monthly wholesale price data for oilseed crops, namely soybean and rapeseed-mustard, traded in the Indore and Delhi markets of India. In general, the TDNN model provided better forecast accuracy than the ARIMA model in terms of the conventional RMSE and MAD values.
Kaur et al. [16] investigated the problem of agricultural product price prediction. To address it, the researchers evaluated different data mining techniques on different datasets and used the K-Means approach for the prediction.
Luo et al. [17] explored neural network and genetic algorithm approaches for predicting the price of Lentinus edodes at the Beijing Xinfadi wholesale market. A total of 84 records collected between 2003 and 2009 were used to train and test three models: a back propagation (BP) neural network, a neural network based on a genetic algorithm, and an RBF neural network. In the comparison, the BP neural network had the worst predictive ability, and the neural network based on a genetic algorithm was generally more accurate than both the RBF and BP neural networks.
Subhasree and Arun [18] aimed to predict tomato prices. They developed a BP neural network and proposed a genetic-algorithm-based neural network prediction model to achieve this objective. Weekly tomato price data from January 2009 to March 2012 were used, with the prices of the previous three weeks as input and the price of the following week as output. Performance was evaluated using the mean absolute error (MAE). They concluded that the genetic-algorithm-based neural network is more predictive than the BP neural network.
Joangue et al. [19] developed a simple and practical forecasting model for monthly domestic onion prices based on Support Vector Regression (SVR). The data set covered January 2007 to December 2016. Samples from January 2007 to December 2013 were used as the training set and samples from January 2014 to December 2016 as the test set, so as to evaluate out-of-sample forecasting accuracy. They used three variables: monthly shipment amount, monthly input amount, and monthly onion price.
Amiri et al. [20] studied the ability of geostatistical models and inverse distance weighting (IDW), the adaptive neuro-fuzzy inference system (ANFIS), and the Winters method to predict seasonality in the prices of potatoes and onions in Iran over the period 1986 to 2001. The results indicate that the Winters method and ANFIS predicted the prices well, while the geostatistical models were not useful in this respect.
Kung et al. [21] presented an accuracy analysis of yield prediction using an ensemble neural network method. They collected 9,953 data sets from the agriculture and food agency council of Taiwan covering 1997 to 2014. The variables were air temperature, relative humidity, and precipitation; the environmental factors included the planting area, harvested area, harvest, and harvest per unit volume; and the economic factors included the cost of production and the market trading price. They used five randomly generated neural network models and split the data sets into 60% for training and the remainder for testing. The accuracies of the five models were 90.81%, 86.70%, 88.10%, 89.87%, and 93.30%, respectively.
Cenas [22] compared two models, the autoregressive integrated moving average (ARIMA) and the Kalman algorithm, for forecasting rice crops. The data were collected from the National Statistics Office (NSO) and the Philippine Agricultural Statistic Office and cover the period from 1990 to 2014. The in-sample period ran from 2007 to 2011, and the most recent data, from 2012 to 2014, were used as the out-of-sample period to improve accuracy. The researcher evaluated the models using the mean absolute error and the root mean squared error. The Kalman model had the lower MAE and RMSE and was therefore more accurate than the ARIMA model.
Jose [23] presented an analysis and prediction of diamond prices using three data mining techniques, namely neural networks, linear regression, and the M5P regression tree. He collected 53,940 records from the publicly available Kaggle repository; the data set contains 10 unique features. He compared the three models using the MAE and RMSE statistical evaluation measures. The results showed that the M5P model produced better overall results than linear regression and the neural network when evaluated with tenfold cross-validation.
Hemaggeta and Nasira [24] presented the prediction of vegetable crop prices using back propagation algorithms. The data used to simulate the model were obtained from the web site www.tnau.ac.in. Three years (2009-2012) of weekly tomato prices were used for prediction, with the previous three weeks used for training and the following week for testing. The researchers used the MAE metric to evaluate model performance and compared two algorithms, the radial basis function (RBF) and back propagation. The experimental results show that the radial basis function performs better than back propagation.
Shah [25] predicted stock prices using different machine learning algorithms. For the experiments, data were collected from the Yahoo Finance website; in particular, the stock prices of two companies were studied, namely Google Inc. (GOOG) and Yahoo Inc. (YHOO). The input variables were the closing price, maximum price, and minimum price. The researcher compared three algorithms: decision stump, linear regression, and support vector machines. The experimental results show that the support vector machine gives better predictions than the other two algorithms.
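The input-output arrangement described in [18], where three earlier weekly prices are used to predict the next week's price, corresponds to a sliding window over the price series. The sketch below shows one way such training pairs could be built; it is an illustration under that reading, not the authors' code, and the function name `make_windows` and the prices are assumptions.

```python
def make_windows(series, n_inputs=3):
    """Build (inputs, target) pairs: n_inputs past values predict the next one."""
    pairs = []
    for i in range(len(series) - n_inputs):
        inputs = series[i:i + n_inputs]
        target = series[i + n_inputs]
        pairs.append((inputs, target))
    return pairs

# Hypothetical weekly prices.
weekly = [20, 22, 21, 25, 27, 26]
for inputs, target in make_windows(weekly):
    print(inputs, "->", target)
```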
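The chronological split described in [19] can be sketched as below: all months before January 2014 go to the training set and the rest to the test set, so the test data always lie after the training data in time. The record layout, the function name `chronological_split`, and the placeholder price of 0.0 are assumptions made for illustration; no real onion prices are reproduced.

```python
def chronological_split(records, first_test_period):
    """Split (period, value) records by time; periods are (year, month) tuples."""
    train = [r for r in records if r[0] < first_test_period]
    test = [r for r in records if r[0] >= first_test_period]
    return train, test

# One placeholder record per month from January 2007 to December 2016.
records = [((year, month), 0.0)
           for year in range(2007, 2017)
           for month in range(1, 13)]

train, test = chronological_split(records, first_test_period=(2014, 1))
print(len(train), len(test))  # 84 training months and 36 test months
```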
[...]
- Quote paper
- Endalamaw Gashaw (Author), 2019, Sesame Price Prediction Using Artificial Neural Network, Munich, GRIN Verlag, https://www.grin.com/document/536740