This text describes a software tool that extracts features from the textual information on the walls of social network users and groups the users by their interests. It also outlines possible applications of this software in advertising and in social network databases.
The text covers:
- Theory;
- Test run;
- Run on a real network user.
Table of Contents
1. Theory
1.1 Introduction to the work
1.2 Approximations and idea of the study
1.3 The source of the texts
1.4 Text preprocessing
1.5 Topic model development
1.6 The clustering algorithm
1.7 The problem of an unknown number of clusters
2. Test run
3. Run on the real network user
Objective and Thematic Focus
The primary objective of this work is to develop a theoretical framework and software implementation for extracting features from the textual content found on social network user walls to perform interest-based categorization of users.
- Development of text analysis pipelines for social media data
- Application of BigARTM topic modeling for feature extraction
- Implementation of hierarchical clustering based on topic distributions
- Testing and validation of algorithms on social network datasets
- Optimization of regularization strategies for interpretable topic models
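BigARTM's topic models generalize PLSA: an ARTM model with no regularizers reduces to PLSA fitted by EM. As a rough, library-free illustration of the feature-extraction step (not the author's code and not the BigARTM API), the sketch below fits PLSA on a user-by-word count matrix. Each column of `theta` is then one user's topic distribution, which is the kind of feature vector the clustering step works with. The function name `plsa`, the random initialization and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np


def plsa(n_dw, num_topics, iters=50, seed=0):
    """Fit a plain (unregularized) PLSA topic model with EM.

    n_dw: (D, W) word-count matrix, one row per user (all wall posts merged).
    Returns phi (W, T) with columns p(w | t) and theta (T, D) with columns p(t | d).
    """
    rng = np.random.default_rng(seed)
    n_dw = np.asarray(n_dw, dtype=float)
    D, W = n_dw.shape
    phi = rng.random((W, num_topics))
    phi /= phi.sum(axis=0, keepdims=True)               # each topic is a distribution over words
    theta = np.full((num_topics, D), 1.0 / num_topics)  # every user starts with uniform topics
    for _ in range(iters):
        p_dw = theta.T @ phi.T                          # (D, W): model probability p(w | d)
        ratio = np.divide(n_dw, p_dw, out=np.zeros_like(p_dw), where=p_dw > 0)
        # E-step folded into the M-step: expected counts n_wt and n_td
        n_wt = phi * (ratio.T @ theta.T)                # (W, T)
        n_td = theta * (phi.T @ ratio.T)                # (T, D)
        phi = n_wt / np.maximum(n_wt.sum(axis=0, keepdims=True), 1e-12)
        theta = n_td / np.maximum(n_td.sum(axis=0, keepdims=True), 1e-12)
    return phi, theta


# Usage: four hypothetical users, four words, two topics.
counts = np.array([[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 1, 4, 5]])
phi, theta = plsa(counts, num_topics=2)
features = theta.T  # one row of topic probabilities per user
```

In BigARTM the same quantities are the Φ and Θ matrices of a fitted model, and regularizers act by modifying the M-step update shown here.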
Excerpt from the Book
Introduction to the work
In today’s world, millions of people get out of bed every morning and, as the first thing they do, open their page and browse the social network. Over time, the page collects a great deal of information: habits, interests, hobbies and political views are all, to one degree or another, deposited on the user’s page. Some people do not even think about the informational trail they leave in social networks.
The idea of this work is to analyze one of these sources of information and, on the basis of its data, to combine users into categories, giving researchers the opportunity to study the characteristics of each group. As the source of information, we take the wall of a social network user.
(This work was created as a tool for studying the environment of a social network user, but it can in fact be used in other applications and for other purposes. The code can be quickly adapted to analyze the content of other social networks or other sources of information.)
Summary of Chapters
Theory: This chapter introduces the motivation for social network analysis, explains the chosen modeling approximations (bag-of-words), details the text preprocessing steps, and outlines the methodology for topic model development and hierarchical clustering.
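The bag-of-words approximation treats a user's wall as an unordered multiset of words. A minimal preprocessing sketch follows, assuming that all posts of one user are merged into a single document and that links, digits, stop words and very short tokens are discarded. The exact filters used in the work are not specified here: `STOP_WORDS` is a hypothetical placeholder list, the function name is illustrative, and steps such as lemmatization are omitted.

```python
import re
from collections import Counter

# Hypothetical placeholder stop-word list (Russian and English); a real run
# would use a complete list for the language of the walls.
STOP_WORDS = {"и", "в", "на", "не", "что", "the", "and", "of", "to", "is"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[^\W\d_]+")  # runs of letters only (Unicode-aware)


def wall_to_bag_of_words(posts, min_len=3):
    """Merge all wall posts of one user into a single bag of words."""
    bag = Counter()
    for post in posts:
        text = URL_RE.sub(" ", post.lower())      # drop links before tokenizing
        for token in TOKEN_RE.findall(text):
            if len(token) >= min_len and token not in STOP_WORDS:
                bag[token] += 1                   # only counts are kept, word order is lost
    return bag


# Usage
bag = wall_to_bag_of_words(["Check https://example.com the new Python release!",
                            "Python and machine learning"])
```

Merging a user's posts before counting also eases data sparsity, since single wall posts are often too short to carry a reliable topic signal.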
Test run: This section demonstrates the effectiveness of the proposed algorithm by applying it to a controlled test account with a predefined set of friends from known professional backgrounds.
Run on the real network user: This final chapter applies the established methodology to a real-world social network account, validates the interpretability of the resulting topics, and discusses findings and potential for further model refinement.
Keywords
Social Network Analysis, VK, Topic Modeling, BigARTM, Hierarchical Clustering, Text Processing, Information Diffusion, Data Sparsity, Regularization, User Interests, Dendrogram, Feature Extraction, Machine Learning, User Categorization, Matrix Factorization
Frequently Asked Questions
What is the core focus of this research?
The research focuses on extracting textual data from social media profiles (specifically wall posts) to analyze user interests and categorize users into distinct groups using topic modeling and clustering techniques.
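Wall posts on VK can be retrieved through the public VK API method `wall.get`, which returns at most 100 posts per request, so a long wall has to be paged with the `offset` parameter. The sketch below only builds the request URLs and pulls post texts, including reposted texts in `copy_history`, out of an already-decoded JSON response. The API version string, the helper names and the choice to keep repost texts are assumptions, not details taken from the work; no network call is made.

```python
from urllib.parse import urlencode

API_URL = "https://api.vk.com/method/wall.get"
PAGE_SIZE = 100        # wall.get returns at most 100 posts per request
API_VERSION = "5.131"  # assumed version string; use whichever version is current


def wall_get_urls(owner_id, total_posts, access_token):
    """Build one wall.get request URL per page of up to PAGE_SIZE posts."""
    urls = []
    for offset in range(0, total_posts, PAGE_SIZE):
        params = {
            "owner_id": owner_id,
            "count": PAGE_SIZE,
            "offset": offset,
            "access_token": access_token,
            "v": API_VERSION,
        }
        urls.append(f"{API_URL}?{urlencode(params)}")
    return urls


def extract_texts(decoded_response):
    """Collect non-empty post texts, including reposted texts in copy_history."""
    texts = []
    for item in decoded_response.get("response", {}).get("items", []):
        if item.get("text"):
            texts.append(item["text"])
        for repost in item.get("copy_history", []):
            if repost.get("text"):
                texts.append(repost["text"])
    return texts
```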
What are the primary thematic areas covered?
The study covers social network data mining, the application of BigARTM for topic extraction, text preprocessing specific to social media, hierarchical clustering algorithms, and the evaluation of model interpretability.
What is the main objective of the study?
The primary objective is to create a reliable methodology and software tool that can group social network users based on their interests derived from textual information found on their profile walls.
Which scientific methods are utilized?
The study utilizes BigARTM for topic modeling, employs Ward's hierarchical clustering for grouping, and uses a uniformity/homogeneity test to determine the optimal number of clusters for the data.
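Ward's method repeatedly merges the two clusters whose union least increases the total within-cluster variance. A minimal SciPy sketch, assuming each user is represented by the row vector of their topic probabilities and the number of clusters is already known (`cluster_users` is a hypothetical helper name, not from the work):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_users(topic_vectors, n_clusters):
    """Group users with Ward's agglomerative clustering.

    topic_vectors: (n_users, n_topics) array, row i = p(topic | user i).
    Returns one integer label (1..n_clusters) per user.
    """
    Z = linkage(np.asarray(topic_vectors, dtype=float), method="ward")  # Euclidean Ward
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Usage: six hypothetical users dominated by one of three topics.
theta = np.array([[0.9, 0.1, 0.0], [0.85, 0.15, 0.0],
                  [0.1, 0.9, 0.0], [0.05, 0.9, 0.05],
                  [0.0, 0.1, 0.9], [0.05, 0.05, 0.9]])
labels = cluster_users(theta, 3)
```

Ward's criterion is defined in terms of Euclidean distance, so the topic vectors are treated here as points in Euclidean space.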
What is addressed in the main body of the work?
The main body covers the theoretical foundation, including topic model development and regularization, the processing of social media text, the application of clustering algorithms, and practical validation through two test cases.
Which keywords characterize this work?
The work is characterized by terms such as social network analysis, topic modeling, BigARTM, clustering, and data sparsity in machine learning contexts.
Why was the VK social network chosen for this study?
VK was selected because it is the largest European social network, providing a rich dataset of user-generated content that is ideal for demonstrating the proposed text analysis and clustering theories.
How does the algorithm handle the "unknown number of clusters" problem?
The researcher builds a dendrogram from the hierarchical clustering process and applies a uniformity test based on multivariate data analysis to decide at which height to cut it, which yields the most plausible number of clusters.
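The details of the uniformity test are not given here, so the sketch below substitutes a simpler, widely used heuristic: cut the dendrogram inside the largest jump between consecutive merge heights. It is a stand-in for illustration only, not the author's test; `cut_at_largest_gap` is a hypothetical name, and the input needs at least three points.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cut_at_largest_gap(points):
    """Choose the number of clusters at the largest jump in Ward merge heights.

    Stand-in heuristic, not the uniformity test used in the work.
    Requires at least three points.
    """
    Z = linkage(np.asarray(points, dtype=float), method="ward")
    heights = Z[:, 2]                      # merge heights, non-decreasing for Ward
    i = int(np.argmax(np.diff(heights)))   # largest jump lies between merge i and i + 1
    threshold = (heights[i] + heights[i + 1]) / 2
    labels = fcluster(Z, t=threshold, criterion="distance")
    return len(np.unique(labels)), labels


# Usage: three well-separated pairs of points.
X = np.array([[0, 0], [0, 0.1], [5, 0], [5, 0.1], [0, 5], [0, 5.1]])
k, labels = cut_at_largest_gap(X)
```

A large jump means the next merge would join groups that are much farther apart than anything merged so far, which is the intuition behind cutting there.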
What role does the "decorrelator" play in the topic model?
The decorrelator is a regularizer that makes the topics more distinct from one another. This reduces topic overlap and makes it significantly easier for the researcher to interpret what each topic, and hence each cluster, represents.
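In the ARTM framework that BigARTM implements, the decorrelator penalizes the pairwise overlap of the topic columns of Φ, R(Φ) = -(τ/2) Σ_t Σ_{s≠t} Σ_w φ_wt·φ_ws, and enters the M-step as φ_wt ∝ max(n_wt - τ·φ_wt·Σ_{s≠t} φ_ws, 0). The NumPy sketch below implements that single update from the formula, not from BigARTM's source (in BigARTM's Python API the corresponding regularizer is `DecorrelatorPhiRegularizer`). The helper names and the toy counts are hypothetical.

```python
import numpy as np


def decorrelated_m_step(n_wt, phi, tau):
    """One regularized M-step: phi_wt ∝ max(n_wt - tau * phi_wt * sum_{s != t} phi_ws, 0).

    n_wt: (W, T) expected word-topic counts from the E-step.
    phi:  (W, T) current p(w | t), columns sum to 1.
    tau:  decorrelation strength (tau = 0 gives the plain PLSA update).
    """
    others = phi.sum(axis=1, keepdims=True) - phi   # sum over topics s != t of phi_ws
    r_wt = np.maximum(n_wt - tau * phi * others, 0.0)
    col = r_wt.sum(axis=0, keepdims=True)
    return np.divide(r_wt, col, out=np.zeros_like(r_wt), where=col > 0)


def topic_correlation(phi):
    """Sum over ordered topic pairs s != t of the dot product of their word distributions."""
    gram = phi.T @ phi
    return float(gram.sum() - np.trace(gram))


# Usage: two overlapping toy topics over three words.
n_wt = np.array([[6.0, 4.0], [3.0, 1.0], [1.0, 5.0]])
phi = n_wt / n_wt.sum(axis=0, keepdims=True)
plain = decorrelated_m_step(n_wt, phi, tau=0.0)
decor = decorrelated_m_step(n_wt, phi, tau=5.0)
```

Words that are probable in several topics at once are penalized most, so after the update the topics share less probability mass on common words and become easier to label.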
Cite this work
- Mihail Novichkov (Author), 2016, Social Networks. An analysis of VK.com, Munich, GRIN Verlag, https://www.grin.com/document/340983