The volume of data available on social networks calls for new methodologies for analyzing information. Some methods allow users to combine different types of data in order to extract relevant information. In this context, this paper applies a model, via a platform, to group information generated by Twitter users, thus facilitating the detection of trends and data related to particular pathologies. To implement the model, an analysis tool based on the Levenshtein distance was developed to determine what is required to convert a given text into each of the following terms: "gripa" (flu), "dolor de cabeza" (headache), "dolor de estomago" (stomachache), "fiebre" (fever) and "tos" (cough), in the area of Bogotá.
Table of Contents
- Abstract
- Introduction
- Obtaining the information
- Applying the Levenshtein Distance
- Experimentation
- Conclusions
- References
Objectives and Key Themes
This paper explores the potential of Twitter data for analyzing the behavior of specific pathologies in a given geographic location. By combining data mining techniques with a custom Levenshtein distance analyzer, the authors aim to demonstrate the feasibility of using social media data to identify trends and patterns related to common illnesses.
- Social media data analysis for health research
- Application of the Levenshtein distance for text analysis
- Identifying patterns and trends in Twitter data related to specific pathologies
- Exploring the relationship between sentiment analysis and pathology prevalence
- Potential for utilizing artificial intelligence and data mining for predicting pathology behavior
Chapter Summaries
- Abstract: The paper introduces the concept of using social media data, specifically Twitter, to analyze and understand the behavior of pathologies. The authors describe their methodology, which involves applying a model and a custom Levenshtein distance analyzer to group together Twitter information related to specific illnesses. The study focuses on the "gripa" (flu) pathology in Bogotá, Colombia.
- Introduction: This section emphasizes the importance of social media data in understanding public opinion and its implications for various fields, including psychology and economics. The paper introduces the concepts of web mining and gatekeeping functions, which are key elements in analyzing social network data. The authors highlight the need for a dedicated tool for Levenshtein distance analysis and its connection to sentiment analysis.
- Obtaining the information: This chapter explains the process of collecting Twitter data using a Python script and the Twitter API. The authors detail how they link tweets to a specific city code using GeoPlanet and provide code examples for retrieving tweets related to the pathology "gripa" in Bogotá.
- Applying the Levenshtein Distance: This section focuses on the use of the Levenshtein distance algorithm to analyze the collected Twitter data. The authors explain the algorithm's principles and its application in identifying patterns related to specific pathologies. They provide a snapshot of the Twitter corpus used in their analysis and illustrate the process of cleaning and analyzing the data.
- Experimentation: This chapter describes the experiments conducted using the collected data. The authors applied techniques of clustering, relationship analysis, and sentiment analysis to understand the behavior of the data related to the "gripa" pathology. They present visual representations of their findings, including graphs and diagrams, demonstrating the patterns and relationships identified.
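The cleaning step mentioned in the summaries above could look roughly like the following Python sketch. The paper's actual cleaning rules are not reproduced on this page, so everything here is an assumption made for illustration: the function names `strip_accents` and `clean_tweet` are hypothetical, and the rules (drop URLs and @mentions, remove accents but keep "ñ", lowercase, drop punctuation) are a plausible guess rather than the authors' procedure.

```python
import re
import unicodedata

# Assumed cleaning rules; the paper's own rules are not shown in this summary.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9ñ\s]")


def strip_accents(text: str) -> str:
    """Remove combining accents (á -> a) but keep the Spanish letter ñ."""
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in "ñÑ":
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def clean_tweet(text: str) -> str:
    """Normalize a raw tweet so it can be compared against keywords."""
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = strip_accents(text).lower()  # estómago -> estomago
    text = NON_WORD_RE.sub(" ", text)   # drop '#', punctuation, emoji
    return " ".join(text.split())       # collapse whitespace


print(clean_tweet("RT @ana: Tengo GRIPA y dolor de estómago!! http://t.co/abc #Bogotá"))
# -> rt tengo gripa y dolor de estomago bogota
```

Removing accents makes "estómago" and "estomago" compare as equal, which matches the unaccented spelling of the keyword used in the abstract. Markers such as "rt" are left in place here and would need their own rule if they turn out to matter.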
Keywords
The main keywords and focus topics of this paper include social media analysis, Twitter data, pathology behavior, Levenshtein distance, sentiment analysis, data mining, artificial intelligence, web mining, gatekeeping functions, and public opinion.
Frequently Asked Questions
How can Twitter data be used to analyze health trends?
Twitter provides a massive amount of real-time public data. By analyzing tweets with specific keywords related to illnesses, researchers can identify geographic trends and patterns in how pathologies spread or are discussed.
What is the Levenshtein distance in text analysis?
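The Levenshtein distance is a string metric: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. In this study, it is used to match user-generated text against target terms such as "gripa" (flu) or "dolor de cabeza" (headache), so that tweets can be grouped by pathology.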
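As a rough illustration, the edit distance between two strings can be computed with the standard dynamic-programming recurrence. The Python sketch below is not the authors' analyzer. The helper `closest_keyword`, the `KEYWORDS` list (restricted to the single-word terms from the study) and the threshold `max_distance=1` are hypothetical choices made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# Hypothetical keyword list: the single-word terms named in the study.
KEYWORDS = ["gripa", "fiebre", "tos"]


def closest_keyword(token, keywords=KEYWORDS, max_distance=1):
    """Return the nearest keyword if it is within max_distance edits, else None."""
    best = min(keywords, key=lambda k: levenshtein(token, k))
    return best if levenshtein(token, best) <= max_distance else None


print(levenshtein("gripe", "gripa"))  # -> 1 (one substitution)
print(closest_keyword("fievre"))      # -> fiebre
print(closest_keyword("hola"))        # -> None
```

A fixed threshold is a crude choice. With `max_distance=1`, short keywords such as "tos" also match unrelated words like "los" or "dos", so a real analyzer would likely scale the threshold with keyword length or require an exact match for very short terms.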
Which specific pathologies were studied in Bogotá?
The study focused on common symptoms and illnesses including "gripa" (flu), "dolor de cabeza" (headache), "dolor de estomago" (stomachache), "fiebre" (fever), and "tos" (cough).
What is the role of sentiment analysis in this research?
Sentiment analysis is used to gauge the public's emotional response to diseases, which the authors explore as a possible indicator of the prevalence and severity of a pathology in a given area.
How is geographic information extracted from Twitter?
The research uses the Twitter API along with tools like GeoPlanet to link specific tweets to city codes, allowing for localized data mining in areas like Bogotá.
What are "gatekeeping functions" in social network analysis?
Gatekeeping functions refer to the control and filtering of information within a network. Understanding these helps researchers identify how health information is shared and which trends become dominant.
- Cite this paper
- Dennis Salcedo (Author), Alejandro León (Author), 2015, Behavior of Users Talking about Pathologies and Diseases on Twitter, Munich, GRIN Verlag, https://www.grin.com/document/302920