Yelp gives users two main ways to evaluate a business: written reviews and star ratings. Businesses have traditionally relied on the star rating to judge whether users like their service, but reviews contain a large amount of valuable information that businesses can also use. Moreover, a restaurant's Yelp rating does not always reflect the rating it deserves. In this paper, I explore how reviews can be used to predict the rating of a business using different machine learning algorithms. I compare the performance of Naive Bayes, SVM and Logistic Regression to identify the best among them.

Excerpt

Table of Contents

Introduction
Related Work
Data Collection
Procedure Outline
Data Preparation
Baseline Performance
Data Exploration
Optimization
Results
Future Work

Objectives and Key Themes

This research aims to develop a classifier capable of predicting Yelp restaurant star ratings based on user reviews. The study focuses on leveraging the content and style of reviews to predict ratings, minimizing bias from user perceptions of appropriate ratings.

Predicting Yelp restaurant star ratings using review content and style.
Analyzing the influence of different features, including parts of speech, on rating prediction.
Evaluating the performance of various machine learning models, including Naïve Bayes, SVM, and Logistic Regression.
Optimizing model performance through feature engineering and parameter tuning.
Exploring the potential of identifying user sub-genres and their rating preferences for future research.
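
The part-of-speech features listed above can be illustrated with a minimal sketch. The excerpt does not define "POS pairs", so the code assumes they are word/tag pairs written as `word_TAG`, and it assumes Penn Treebank tags in which adjectives start with `JJ`. The input is a hypothetical review that has already been tagged by some external tagger; the function names are illustrative and not taken from the paper.

```python
from collections import Counter

# Hypothetical input: a review already tokenized and POS-tagged with
# Penn Treebank tags by an external tagger. This is not data from the paper.
tagged_review = [
    ("the", "DT"), ("pasta", "NN"), ("was", "VBD"),
    ("delicious", "JJ"), ("but", "CC"), ("the", "DT"),
    ("service", "NN"), ("was", "VBD"), ("slow", "JJ"),
]


def pos_pair_features(tagged_tokens):
    """Count word/tag pairs written as 'word_TAG' (assumed meaning of 'POS pairs')."""
    return Counter(f"{word.lower()}_{tag}" for word, tag in tagged_tokens)


def adjective_features(tagged_tokens):
    """Count only adjectives, i.e. tokens tagged JJ, JJR or JJS."""
    return Counter(
        word.lower() for word, tag in tagged_tokens if tag.startswith("JJ")
    )


pair_counts = pos_pair_features(tagged_review)
adjective_counts = adjective_features(tagged_review)
```

Either counter could then be turned into a feature vector for a classifier. Restricting the vocabulary to adjectives keeps far fewer features than using every word/tag pair.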

Chapter Summaries

Introduction: The study introduces the challenge of predicting Yelp star ratings and highlights the importance of analyzing user reviews, which are often overlooked. The focus on restaurant reviews and the need to address bias in user ratings are emphasized.
Related Work: This section reviews previous research efforts in extracting information from user reviews, specifically focusing on Yelp data. It discusses approaches like topic identification, personalized ratings, and sentiment analysis.
Data Collection: The data used for the project was collected from Yelp's Dataset Challenge, focusing on restaurant data.
Procedure Outline: This section outlines the research process, including data preparation, feature selection, exploratory data analysis, baseline performance evaluation, model optimization, and results analysis.
Data Preparation: The data is divided into development, cross-validation, and test sets. Key attributes from the business and review entities are identified and used in the analysis.
Baseline Performance: This section describes the baseline performance of Naïve Bayes, SVM, and Logistic Regression models on the cross-validation dataset.
Data Exploration: This section explores the development dataset to identify features with high influence on rating prediction and analyzes the effectiveness of different feature engineering approaches.
Optimization: This section discusses the optimization of Logistic Regression through L1 and L2 regularization techniques.
Results: This section presents the key findings, highlighting the effectiveness of POS pairs and adjectives as features, the optimal number of features for model training, and the comparative performance of Logistic Regression and SVM.
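
The division into development, cross-validation, and test sets described under Data Preparation could look roughly like the sketch below. The excerpt does not state the split proportions, so the 60/20/20 fractions, the fixed seed, and the `(text, stars)` record layout are all assumptions made for illustration.

```python
import random


def split_dataset(records, dev_fraction=0.6, cv_fraction=0.2, seed=0):
    """Shuffle records and split them into development, cross-validation and test lists.

    The fractions are illustrative assumptions. Whatever remains after the
    development and cross-validation portions becomes the test set.
    """
    shuffled = list(records)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    dev_end = int(len(shuffled) * dev_fraction)
    cv_end = dev_end + int(len(shuffled) * cv_fraction)
    return shuffled[:dev_end], shuffled[dev_end:cv_end], shuffled[cv_end:]


# Hypothetical review records: (review text, star rating)
reviews = [(f"review {i}", 1 + i % 5) for i in range(100)]
development, cross_validation, test = split_dataset(reviews)
```

Keeping the test set untouched until the end, and exploring features only on the development set, helps prevent the reported performance from being inflated by choices tuned to the evaluation data.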

Keywords

This research focuses on predicting Yelp restaurant star ratings using user reviews, utilizing machine learning techniques such as Naïve Bayes, SVM, and Logistic Regression. Key themes include feature engineering, model optimization, and the analysis of user review content and style. Specific keywords include Yelp data, restaurant ratings, sentiment analysis, POS pairs, adjectives, feature selection, and model performance.


Details

Title: The Prediction of Yelp Star Ratings Using Yelp Reviews
University: Carnegie Mellon University
Course: Applied Machine Learning
Grade: 4.00/4.00
Author: Kartik Lunkad (Author)
Year of publication: 2015
Pages: 5
Catalog number: V303616
ISBN (Ebook): 9783668036970
Language: English
Tags: Applied Machine Learning Text Mining Big Data
Product safety: GRIN Publishing Ltd.

Cite this work: Kartik Lunkad (Author), 2015, The Prediction of Yelp Star Ratings Using Yelp Reviews, Munich, GRIN Verlag, https://www.grin.com/document/303616
