Yelp gives users two main ways to evaluate a business: written reviews and star ratings. Traditionally, businesses have focused on their star rating to judge whether customers like their service. Reviews, however, contain a large amount of valuable information that businesses can take advantage of. In addition, a restaurant's Yelp rating does not always accurately reflect the rating it deserves. In this paper, I explore how reviews can be used to predict the rating of a business with different machine learning algorithms. I compare the performance of Naïve Bayes, SVM, and Logistic Regression to identify the best among them.
Table of Contents
- Introduction
- Related Work
- Data Collection
- Procedure Outline
- Data Preparation
- Baseline Performance
- Data Exploration
- Optimization
- Results
- Future Work
Objectives and Key Themes
This research aims to develop a classifier capable of predicting Yelp restaurant star ratings based on user reviews. The study focuses on leveraging the content and style of reviews to predict ratings, minimizing bias from user perceptions of appropriate ratings.
- Predicting Yelp restaurant star ratings using review content and style.
- Analyzing the influence of different features, including parts of speech, on rating prediction.
- Evaluating the performance of various machine learning models, including Naïve Bayes, SVM, and Logistic Regression.
- Optimizing model performance through feature engineering and parameter tuning.
- Exploring the potential of identifying user sub-genres and their rating preferences for future research.
Chapter Summaries
- Introduction: The study introduces the challenge of predicting Yelp star ratings and highlights the importance of analyzing user reviews, which are often overlooked. The focus on restaurant reviews and the need to address bias in user ratings are emphasized.
- Related Work: This section reviews previous research efforts in extracting information from user reviews, specifically focusing on Yelp data. It discusses approaches like topic identification, personalized ratings, and sentiment analysis.
- Data Collection: The data for the project comes from the Yelp Dataset Challenge, with a focus on restaurant data.
- Procedure Outline: This section outlines the research process, including data preparation, feature selection, exploratory data analysis, baseline performance evaluation, model optimization, and results analysis.
- Data Preparation: The data is divided into development, cross-validation, and test sets. Key attributes from the business and review entities are identified and used in the analysis.
- Baseline Performance: This section describes the baseline performance of Naïve Bayes, SVM, and Logistic Regression models on the cross-validation dataset.
- Data Exploration: This section explores the development dataset to identify features with high influence on rating prediction and analyzes the effectiveness of different feature engineering approaches.
- Optimization: This section discusses the optimization of Logistic Regression through L1 and L2 regularization techniques.
- Results: This section presents the key findings, highlighting the effectiveness of POS pairs and adjectives as features, the optimal number of features for model training, and the comparative performance of Logistic Regression and SVM.
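The data preparation and baseline steps summarized above can be illustrated with a minimal sketch. It is not the author's implementation: the 60/20/20 split ratio, the whitespace tokenizer, the toy reviews, and the names `three_way_split` and `NaiveBayes` are all assumptions made for illustration. The sketch uses plain bag-of-words counts and does not reproduce the POS-pair or adjective features the paper evaluates.

```python
import math
import random
from collections import Counter, defaultdict


def three_way_split(items, dev=0.6, cv=0.2, seed=0):
    """Shuffle items and split them into development, cross-validation, and test sets.

    The 60/20/20 ratio is an assumed default, not a value taken from the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev)
    n_cv = int(len(shuffled) * cv)
    return shuffled[:n_dev], shuffled[n_dev:n_dev + n_cv], shuffled[n_dev + n_cv:]


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, samples):
        # samples is a list of (review_text, star_rating) pairs.
        self.class_counts = Counter(stars for _, stars in samples)
        self.word_counts = defaultdict(Counter)
        for text, stars in samples:
            self.word_counts[stars].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the star rating
            for word in text.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy reviews; real input would come from the Yelp dataset.
reviews = [
    ("great food and friendly staff", 5),
    ("delicious pasta great service", 5),
    ("amazing dessert friendly waiter", 5),
    ("terrible food and rude staff", 1),
    ("cold pasta awful service", 1),
    ("rude waiter terrible dessert", 1),
]

model = NaiveBayes().fit(reviews)
print(model.predict("great friendly service"))
print(model.predict("rude staff awful"))
```

In a workflow of the kind the summaries describe, the model would be fitted on the development set only, compared against other classifiers on the cross-validation set, and scored once on the test set at the end.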
Keywords
This research focuses on predicting Yelp restaurant star ratings using user reviews, utilizing machine learning techniques such as Naïve Bayes, SVM, and Logistic Regression. Key themes include feature engineering, model optimization, and the analysis of user review content and style. Specific keywords include Yelp data, restaurant ratings, sentiment analysis, POS pairs, adjectives, feature selection, and model performance.
- Cite this work
- Kartik Lunkad (Author), 2015, The Prediction of Yelp Star Ratings Using Yelp Reviews, Munich, GRIN Verlag, https://www.grin.com/document/303616