Users rely on websites to complete many tasks online, such as arranging business travel, researching products, and even planning entertainment activities. They usually need to interact with various services and software, such as browsers, search engines, and social networks, to access different kinds of information, make comparisons, and converse with friends.
The most difficult task when visiting a particular website is finding the data or information of interest. For example, if a visitor wants to collect the phone numbers published anywhere on a website, every page of the site has to be visited and read carefully. This requires a lot of time and effort, and even then there is a 70-80% chance of making a mistake when writing a number down.
In the current era of SMS advertising and marketing, marketing and advertising companies require a complete and inexpensive solution to improve their business. This project provides such a solution for their domain by searching an entire website for phone numbers and saving them to a text file.
The system is developed in the .NET Framework and was successfully tested against test cases generated to check its effectiveness. The test cases were designed to unit-test the individual modules of the application.

Excerpt

1. Introduction

1.1. Text Mining

1.2. Information Extraction

1.3. SMS Marketing

1.4. Project description

1.5. Benefits that will come from solution

1.6. Limitations

1.7. Need for Solution

1.8. Outline

2. Basic Concepts

2.1. Web Extraction

2.2. Limitations

3. Literature Review

3.1. Information Extraction

3.2. Learning Extractors from Unlabeled Text using Relevant Databases

3.3. Information Extraction from the Web: Techniques and Applications

3.4. A Survey of Web Information Extraction Systems

3.5. Information Extraction: A Survey

4. System Design

4.1. Proposed Model Design

4.2. Flow Chart

5. Implementation

5.1. System Development tool

5.2. System Requirements

5.3. Software Requirements

5.4. Hardware Requirements

5.5. System Description

5.6. Hybrid Approach

6. Testing

6.1. Test case01

6.2. Test case02

6.3. Test case03

6.4. Test case04

6.5. Results

7. Conclusion & Future Work

7.1. Conclusion

7.2. Future Work

8. References

Objectives & Core Topics

This work aims to develop an efficient, cost-effective software solution for SMS marketing by automating the extraction of targeted mobile phone numbers from websites, thereby reducing manual effort and human error in data gathering.

Development of a web-based data extraction parser
Application of Document Object Model (DOM) for web navigation
Utilisation of Regular Expressions for pattern-based data mining
Hybrid architectural approach for web data harvesting
System testing and performance analysis of extraction accuracy

Excerpt from the Book

5.6. Hybrid Approach

In this project, a hybrid approach that combines pattern mining with regular expressions and Document Object Model (DOM) techniques is applied to mine web links and phone numbers from websites. This hybrid technique consists of the following steps:

a. Retrieve the source code of the page by using the HttpWebRequest and HttpWebResponse classes.

b. Use the DOM to find the links of the website.

c. Use regular expressions to mine phone numbers from the source code of the website.

d. Clean the data by using regular expressions and conditional statements.

e. Save the phone numbers to an external file.
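
The steps above can be sketched as follows. This is a minimal illustration in Python, not the authors' C# implementation: `urllib` stands in for the .NET request/response classes, the standard-library `html.parser` (an event-based parser, not a full DOM) stands in for the DOM traversal, and `PHONE_PATTERN` together with all function names are hypothetical choices made only for this example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical pattern: optional "+", then 10 to 13 digits that may be
# separated by single spaces or hyphens. Real number formats vary by region.
PHONE_PATTERN = re.compile(r"\+?\d(?:[\s-]?\d){9,12}")


class LinkCollector(HTMLParser):
    """Step b: collect the href targets of <a> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def fetch_source(url):
    """Step a: download the page source (needs a network connection)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def find_links(html, base_url):
    """Step b: return every link found in the page source."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def mine_numbers(html):
    """Steps c and d: match candidate numbers, then clean them."""
    cleaned = []
    for match in PHONE_PATTERN.findall(html):
        digits = re.sub(r"[\s-]", "", match)  # strip separators
        if digits not in cleaned:             # drop duplicates
            cleaned.append(digits)
    return cleaned


def save_numbers(numbers, path):
    """Step e: write one number per line to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(numbers))
```

A crawl would call `fetch_source` on a start URL, pass the result to `find_links` to discover further pages, and run `mine_numbers` on each page before saving. Matching against raw source can also pick up digit runs that are not phone numbers, which is one reason a cleaning step is needed.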

Summary of Chapters

1. Introduction: Provides an overview of text mining, information extraction, and the specific needs for automated SMS marketing solutions.

2. Basic Concepts: Outlines the fundamental techniques for web extraction, specifically focusing on web interaction and request/response models.

3. Literature Review: Surveys existing research and systems regarding web information extraction, including various learning extractors and survey studies.

4. System Design: Details the architectural model, process flow, and specific use cases for the proposed software system.

5. Implementation: Describes the development tools, system requirements, and the technical implementation of the hybrid extraction approach.

6. Testing: Analyzes the functionality of the system through defined test cases and evaluates the performance and accuracy of the extraction process.

7. Conclusion & Future Work: Summarizes the achievements of the project and suggests potential future optimizations such as FPGA implementation.

8. References: Lists the academic and technical sources consulted during the research.

Keywords

Text Mining, Information Extraction, SMS Marketing, Web Scraping, .NET Framework, Regular Expressions, Document Object Model, Data Parser, Automated Extraction, Software Testing, Hybrid Approach, Phone Number Mining

Frequently Asked Questions

What is the core purpose of this research?

The research focuses on creating an automated software solution that extracts mobile phone numbers from websites to facilitate SMS marketing for businesses, eliminating the need for manual data entry.

Which domains are covered in this study?

The work covers text mining, information extraction, web scraping, and software development within the context of digital marketing automation.

What is the primary goal of the developed tool?

The primary goal is to provide a reliable, low-cost system that identifies and harvests targeted customer phone numbers from unstructured web pages with high efficiency.

What technology is used for the extraction?

The system uses a hybrid approach combining the Document Object Model (DOM) for navigating website structures and Regular Expressions for identifying and cleaning specific phone number patterns.

What is the focus of the implementation phase?

The implementation focuses on using the .NET Framework and Visual C#.NET to build a functional tool that can navigate URLs, extract links, filter phone numbers, and save the output in text or CSV formats.
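
As a rough illustration of the final export step, the sketch below writes a list of numbers either as plain text or as CSV, depending on the file extension. It is written in Python rather than the C# used by the authors, and the function name `export_numbers` and the `phone_number` column header are assumptions made for this example.

```python
import csv
from pathlib import Path


def export_numbers(numbers, path):
    """Write numbers to `path` as CSV if it ends in .csv, else as text."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["phone_number"])  # hypothetical header
            writer.writerows([number] for number in numbers)
    else:
        # Plain text: one number per line.
        path.write_text("\n".join(numbers) + "\n", encoding="utf-8")
    return path
```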

Which keywords define this work?

The work is defined by terms such as Information Extraction, SMS Marketing, Web Scraping, and automated data processing.

How is the accuracy of the system measured?

The system's accuracy is evaluated through specific test cases, resulting in a reported average accuracy rate of 80% during experimental testing.

What are the limitations of the proposed solution?

Key limitations include the necessity of an active internet connection and the challenge that some websites employ anti-scraping defenses that block automated bots.

Why are Regular Expressions utilized in this software?

Regular Expressions are essential for defining search patterns that accurately capture sequences of digits matching mobile phone number formats in different regions.
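
To make the idea of region-specific patterns concrete, here is a small Python sketch. The two patterns are simplified illustrations chosen for this example (a Pakistani-style and a US-style mobile format); they are not taken from the book and do not encode complete numbering-plan rules. The names `REGION_PATTERNS` and `find_by_region` are likewise hypothetical.

```python
import re

# Simplified, illustrative patterns. The lookarounds (?<!\d) and (?!\d)
# stop a match from starting or ending in the middle of a longer digit run.
REGION_PATTERNS = {
    # "+92" or a leading "0", then "3", two digits, optional separator, 7 digits
    "PK": re.compile(r"(?<!\d)(?:\+92|0)3\d{2}[\s-]?\d{7}(?!\d)"),
    # optional "+1", then 3-3-4 digits with optional parentheses and separators
    "US": re.compile(
        r"(?<!\d)(?:\+1[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}(?!\d)"
    ),
}


def find_by_region(text):
    """Return matches per region, normalized to digits and a leading '+'."""
    found = {}
    for region, pattern in REGION_PATTERNS.items():
        found[region] = [
            re.sub(r"[^\d+]", "", match) for match in pattern.findall(text)
        ]
    return found
```

Keeping one pattern per region makes it straightforward to add or tighten a format without touching the others, at the cost of maintaining several expressions.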

What future improvements are suggested for the project?

The authors suggest optimizing performance, refining data cleaning algorithms, and exploring hardware acceleration via FPGA to increase processing speed.


Details

Title: Web Fetcher: A SMS Marketing Solution
Grade: A
Authors: Maria Khalid (Author), Huma Siddiquie (Author)
Publication Year: 2013
Pages: 77
Catalog Number: V275844
ISBN (eBook): 9783656685654
ISBN (Book): 9783656685685
Language: English
Tags: fetcher marketing solution
Product Safety: GRIN Publishing GmbH

Quote paper: Maria Khalid (Author), Huma Siddiquie (Author), 2013, Web Fetcher: A SMS Marketing Solution, Munich, GRIN Verlag, https://www.grin.com/document/275844
