Keyword Relevance in Search Engine Optimization

Master's Thesis, 2014

96 Pages, Grade: 2.5










1.1. Background of Study
1.2. Objective of Study
1.3. Challenges

2.1. Search Engine History
2.2. Popularity vs. Relevancy
2.3. How Does a Web Search Work
2.4. Accuracy of Search Engines
2.5. Calculating PageRank
2.6. Importance of Keywords
2.7. How Do Search Engines Remain Viable
2.8. What is Search Engine Optimization (SEO)
2.9. How Does Google Work
2.10. Related Research
2.11. Research Questions
2.12. Research Limitations

3.1. Test Platform
3.2. Benchmark Website Creation
3.3. Keywords Creation
3.4. Keywords Counting
3.5. Search Engine Ranking Check
3.6. Inbound Links Check
3.7. Analysis of Domain Age
3.8. Analysis of Recent Updates
3.9. Analysis of Social Interest
3.10. Overall Analysis

4.1. Evaluation of Analytic Tools
4.1.1. Keyword Count Analysis Using Keyword Density Analyzer
4.1.2. Analysis of Inbound Links Using ahrefs
4.1.3. Analysis of Domain Age Using Domain Age Checker from SEOChat
4.1.4. Analysis of Site Freshness from SiteBeam
4.1.5. Analysis of Site Ranking Using Rank Checking Tool from SEOCentro
4.1.6. Summary of Tools Analysis
4.2. Keyword Analysis Test 1
4.3. Keyword Analysis Test 2
4.4. Keyword Analysis Test 3
4.5. Social Interest Test on Google
4.6. Social Interest Test on Bing and Yahoo
4.7. Backlinks Check on Bing and Yahoo
4.8. Keyword vs. Backlinks Test on Bing and Yahoo
4.9. Social Interest vs. Backlinks Test on Google
4.10. Analysis of Results
4.11. Comparison of Results

5.1. Discussion
5.2. Summary of Findings
5.3. Limitations
5.4. Conclusion
5.5. Future Research Recommendations




The world of search engines has long been dominated by Google, and most internet marketers know that they need to get their websites listed on the first page of Google or risk being invisible to their online customers. Almost everyone on the internet uses a search engine to find the information they want and relies almost completely on the information given on the first page of the search engine results. It is unfortunate when a company offers exactly the products its customers want but cannot be found within the first few pages of the retrieved results. This has created a demand for search engine optimization companies, which cater to individuals and companies hoping to get their websites listed on the first page of Google but not knowing how. The work of search engine optimization is also fraught with errors, as search engines like Google keep changing their search algorithms in their quest to perfect their search ability, which means the rules of search engine optimization are always changing too. Since content may remain the same, it is important to find a way to measure the content of a website to determine its relevance when search engines retrieve a desired webpage. One way to measure content is to determine the number of important keywords it contains, and thus the purpose of this research is to determine the relevance of keywords in today's demanding search technology, such as that used by Google and Yahoo. This research also attempts to find out which factors besides keywords help a website rise to the top of a search engine results page.

Keywords: Search engines, search engine optimization, keyword research, keyword relevance, search engine ranking.


I would like to take this opportunity to thank Dr V.P. Thinagaran for his sincere guidance to enable this thesis to be formulated to completion. I would also like to thank the Programme Coordinator Ms Jaspal Kaur for her invaluable help and assistance during the course of my study.


Table 2.1 comScore Search Share Report for July 2013

Table 2.2 Periodic Table of SEO Success Factors

Table 2.3 Page Level Average Keyword Count

Table 2.4 Summary of ranking and hit overlap by the engines

Table 2.5 Summary of relevancy and precision by search engine

Table 2.6 Average precision (AP) across query clusters

Table 4.1 Keyword count results from Keyword Density Analyzer on control website

Table 4.2 Comparison of results from Rank Checker tool and actual search

Table 4.3 Ranking results of Test 1

Table 4.4 Test 1 results on keyword count and other search metrics

Table 4.5 Ranking result for Test 2

Table 4.6 Test 2 result on keyword count and other search metrics

Table 4.7 Ranking result for Test 3

Table 4.8 Test 3 results on keyword count and other search metrics

Table 4.9 Social interest score for Test 4

Table 4.10 Ranking test result for Test 5

Table 4.11 Backlinks test result for Test 6

Table 4.12 Keywords and backlinks test result and comparison for Test 7

Table 4.13 Social interest test result for Test 8

Table 4.14 Social interest score vs. backlinks vs. keyword count

Table 4.15 Summary of results from all the tests

Table 4.16 Summary of search metrics affecting search ranking


Figure 2.1 Creating a Full-text Index

Figure 2.2 Measuring PageRank

Figure 2.3 Weighting of Keywords in Ranking

Figure 2.4 Variance Due to Geo-targeting

Figure 3.1 Keyword Density Tool

Figure 3.2 Rank Checking Tool

Figure 3.4 Domain Age Checking Tool

Figure 3.5 Site Freshness Tool

Figure 3.6 Social Interest Check Tool

Figure 3.7 Flow Chart of Test Process

Figure 4.1 Snapshot of Keyword Density Analyzer on control website

Figure 4.2 Backlinks test results from ahrefs on control website

Figure 4.3 Results from Link Explorer in Bing

Figure 4.4 Result of Domain Age on control website

Figure 4.5 Results from freshness test on control website

Figure 4.6 Result from Site Ranking Test on control website

Figure 4.7 Summary of Result on Page Level Keywords Evaluation

Figure 5.1 Factors affecting Google’s rankings

Figure 5.2 Factors affecting Bing’s rankings

Figure 5.3 Factors affecting Yahoo’s rankings



CHAPTER 1. Introduction

1.1. Background of study

Almost everyone depends on search engines as their first method of finding information on the internet. Search engines exist for the sole purpose of providing searchers with the information they want, yet often the information returned is not relevant or not the intended search material. Search engines have therefore continuously improved in order to model human search behavior, understand the search query and return the best information available. Today information is so abundant and widespread that search engines have a tough time determining which is the most relevant information to provide to the searcher. The algorithm used by a search engine is what determines the quality of the search: it must gauge the information the user intends to find and provide the closest results in descending order, so that the closest result appears first on the Search Engine Results Page (SERP). The user is thereby fully dependent on the search algorithm for the accuracy of the results shown on the SERP. Since these algorithms are closely guarded secrets, the user does not actually know how the search engine decides which results are shown. Because so little is known about the algorithms in use, this research will take one search criterion, namely keywords, and determine how much it affects search results by examining and analyzing the results returned by the search engines. The three major and most popular search engines of today, Google, Yahoo and Bing, will be used for this study.

1.2. Objective of study

The objective of this research is to determine how much keywords affect search engine results. This is a long-standing question, as the importance and relevance of keywords in search engines have been debated for a long time. It is now known that keywords are just one of the factors search engines look for in a document, with the other major factors being the age of the document (search engines presume that the older the document, the more validity it possesses) and the number of inbound links (again, search engines presume that the more incoming links or references made towards a document, the more authority it possesses).

Many people have tried to figure out how search engines work, none more so than the so-called Search Engine Optimization (SEO) organizations, whose work is to figure out for their clients how to make their clients' websites appear at the top of SERPs. There is obviously a real purpose here, as everyone wants their website or document to be visible and easily found, and no one wants their hard work to go unnoticed by netizens. SEO is now at the forefront of most marketing campaigns and, on the negative side, also at the forefront of the efforts of hackers and marketers of dubious products. Search engines are aware of this fact, and it has made the world of search engines even more complicated as they try to omit documents which have dubious content but have made it to the top of SERPs. This method of excluding some websites is of course never perfect, as some valid websites will find themselves omitted or down-ranked by a search engine because of its ever-changing rules and algorithms in its quest to improve search results.

Are keywords the only factor to rank a website?

This research will also be conducted to investigate what are the other factors besides keywords which will affect a search and the ranking of the search results. Factors like link backs, domain age, freshness of content and social media links will also be investigated to see if they affect the overall ranking of a website by a search engine.

1.3. Challenges

There are some challenges to this research, as its results may not accurately reflect the way search engines behave: different search engines have different algorithms, and the geographical location of a user also affects the results a search engine produces because of its geo-targeting behavior. For example, the same search performed on different country domains of Google will give different results due to geo-targeting, as Google tries to provide more Malaysia-related websites to users of its Malaysian domain. Because of this, this study will narrow its scope to comparisons between websites from the same geographical location.

Another challenge which will affect the results is server redundancy. Since Google and other search engines like Yahoo may run many different servers around the world, it is likely that each server may give a slightly different answer due to small differences in their databases. The resources of this research are too limited to control which server is being accessed, so slight differences in answers may be due to different servers being accessed during a test.

CHAPTER 2. Literature Review

2.1. Search Engine History

Search engines have come a long way since the first search engine, Archie, was invented in 1990 (Kim). Before Google came along and dominated the whole search arena, many other search engines were developed after 1990, including the likes of InfoSeek, AltaVista, Excite and Yahoo. Google, founded by Larry Page and Sergey Brin, only came to the fore in 1996 and, as they say, the rest is history. As the other search engines drifted into obscurity, it is interesting to see that Google has remained as strong as ever since its inception. One reason Google differentiated itself from the rest and gained popularity was its simplistic design (only text), while the rest cluttered their search pages with images and advertisements. Another factor was Google's ingenious implementation of PageRank, which ranked websites according to their relevance. PageRank has since seen its importance played down by Google, as its popularity also proved to be its own nemesis: numerous so-called SEOs (search engine optimization organizations) multiplied en masse to take advantage of Google's PageRank technology, tweaking certain websites to make them appear at the top of Google's search listings and thereby making Google's search results seem manipulated.

2.2. Popularity vs. Relevancy

In the world of search engines popularity and relevancy of a website are two different things. A site may be popular but may not be relevant for a search. Conversely, a site may be relevant for a search but may not be that popular.

Popularity is determined by the number of links pointing towards a site: the more links point towards a site, the more popular it is (Dover & Dafforn, 2011). Since the popularity of a site does not mean it is relevant to a search, another metric to gauge relevancy has to be found. Think of a site like Facebook: Facebook is very popular and may turn up in your searches, but that does not mean a search like “buy Christmas stockings” should end up at Facebook, as Facebook does not sell Christmas stockings.

In the world of search engines, relevancy is determined by the theoretical distance between two corresponding items with regard to their relationship (Dover & Dafforn, 2011). Search engines therefore need to determine relevancy by analyzing the content of a webpage, website or document. A webpage with more relevant content is ranked higher than a webpage with less relevant content. This enables the search engine to present the results to the searcher as a list based on the rankings, with the highest-ranked webpage shown first, followed by the other webpages in descending order of rank.

2.3. How Does a Web Search Work

To search documents rapidly, computers use a full-text index, a list of all the words appearing in a text together with a pointer to every occurrence (Witten, Gori, & Numerico, 2007). The size of a word's list of pointers depends on how many times that word recurs in the document, and the index contains as many words as appear in the text. The index will be about the size of the document, while the list of pointers is much smaller; both, however, can still be compressed. Search engines therefore store both the document and the full-text index in their database.

illustration not visible in this excerpt

Figure 2.1 Creating a full-text index (Witten et al., 2007)

Once the index is available, a search can be performed by simply locating the word's ordered list and extracting its associated list of numbers. The search will then find all the documents that contain the particular word being sought. Once an ordered list has been created, it is easy and fast for a computer to scan through it. To search for a phrase the computer performs a proximity search, whereby words which occur near to each other are selected.
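The indexing and proximity-search process described above can be sketched as follows. This is a minimal illustration with invented example documents, not the implementation of any particular search engine; real indexes are compressed and far more elaborate.

```python
# A minimal sketch of a full-text (inverted) index: each word maps to a
# list of (document id, position) pointers, which supports both
# single-word lookup and phrase (proximity) search.
from collections import defaultdict

def build_index(documents):
    """Build an inverted index: word -> list of (doc_id, position)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(documents):
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return index

def search_word(index, word):
    """Return the ids of all documents containing the word."""
    return sorted({doc_id for doc_id, _ in index.get(word.lower(), [])})

def search_phrase(index, phrase):
    """Proximity search: documents where the words occur consecutively."""
    words = phrase.lower().split()
    results = set()
    for doc_id, pos in index.get(words[0], []):
        if all((doc_id, pos + i) in index.get(w, [])
               for i, w in enumerate(words[1:], start=1)):
            results.add(doc_id)
    return sorted(results)

docs = ["search engines index the web",
        "the web is searched with engines",
        "engines crawl and index pages"]
idx = build_index(docs)
print(search_word(idx, "index"))      # [0, 2] -> documents 0 and 2
print(search_phrase(idx, "the web"))  # [0, 1] -> phrase occurs in 0 and 1
```

Because the pointer lists are ordered, the word lookup and the positional check for phrases are both simple scans, which is what makes this representation fast in practice.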

2.4. Accuracy of Search Engines

To determine if the results of a search are relevant to a query, a computer must decide on the following (Witten et al., 2007):

A document will be deemed more relevant:

- If it contains more query terms
- If the query terms occur more often
- If it contains fewer non-query terms

A good indexing system will answer queries quickly and effectively, be able to rebuild new indexes and not require large resources, all of which are easily achieved with the computing power available today. The difficult part is determining the effectiveness of the answer, that is, the relevance of the search results.

The most common way to measure the relevance of a search result is to calculate how many of the relevant documents have been retrieved and how early they occur in the ranked list.

Retrieval performance is measured by three quantities:

- The number of documents that are retrieved
- The number of those that are relevant
- The total number of relevant documents

Precision and recall are then calculated as follows:

precision = (number of relevant documents retrieved) / (number of documents retrieved)

recall = (number of relevant documents retrieved) / (total number of relevant documents in the collection)

Precision assesses the accuracy of the search, i.e. what proportion of the returned results are good, while recall indicates the coverage, i.e. what proportion of the good results are returned.

As an example, if 20 documents are retrieved and 15 of them are relevant, this gives a precision of 15/20 = 75%. If the collection contains 60 relevant documents in total, then the recall is 15/60 = 25%.
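The precision and recall calculation can be expressed directly in code, using the figures from the worked example above:

```python
# Precision: proportion of retrieved documents that are relevant.
# Recall: proportion of all relevant documents that were retrieved.

def precision(retrieved, relevant_retrieved):
    """Relevant retrieved documents as a fraction of all retrieved."""
    return relevant_retrieved / retrieved

def recall(relevant_retrieved, total_relevant):
    """Relevant retrieved documents as a fraction of all relevant."""
    return relevant_retrieved / total_relevant

# Worked example: 20 retrieved, 15 of them relevant,
# 60 relevant documents in the whole collection.
print(precision(20, 15))  # 0.75 -> 75%
print(recall(15, 60))     # 0.25 -> 25%
```

Note the trade-off the two measures capture: retrieving more documents can only raise recall, but it usually lowers precision.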

The world of search engines has been made a lot more challenging by the millions of documents added to the web almost daily. Search engines need to work accurately and quickly to prove that they are worth using.

2.5. Calculating PageRank

As search engines became more efficient, human factors started to crop up, as if there were not enough problems for search engines already. When people realized that search engines were counting often-occurring keywords (words which are key to a document), they knew that putting more keywords into a document would make it easier to find. Hence a person selling cars, for example, could get his website noticed more easily than, say, BMW or Toyota simply by putting more keywords into his website. This presented search engines with a new problem in determining a website's relevance, and one solution was to create a measurement factor called PageRank (Witten et al., 2007).

A website is deemed more important or relevant if other websites or documents point back to it or use it as a reference. However, if two websites each have exactly 20 other websites pointing back to them, how do you determine which is more relevant? Google's founders Larry Page and Sergey Brin developed a metric called PageRank in 1998 which was able to measure a website's importance by measuring the link-backs, or backlinks, to that website (Witten et al., 2007).

illustration not visible in this excerpt

Figure 2.2 Measuring PageRank (Witten et al., 2007)

The PageRank of a page is a number from 0 to 1 which measures the importance of the backlinks towards that website or webpage. Each link towards the page contributes a factor towards its overall PageRank. That factor is the PageRank of the referring site divided by the number of outlinks (the total number of links going out) from that site. It is obviously more prestigious for a site to have inbound links than outbound links, and this is what PageRank aims to measure.

As an example, in Figure 2.2 the PageRank of D can be calculated by adding one-fifth of the PageRank of A (since A has five outlinks) to one-half of the PageRank of C (since C has two outlinks). Using the PageRank method, almost every website can be allotted a PageRank number, and a higher number adds to a site's chances of being ranked above websites with lower PageRank numbers.
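The contribution rule in this example can be sketched as follows. The referrer PageRank values are assumed purely for illustration (the original figure does not state them), and real PageRank additionally iterates this rule over the whole web graph with a damping factor.

```python
# A minimal sketch of the simplified PageRank contribution rule described
# above: each referring page passes its own PageRank, divided evenly among
# its outlinks, to every page it links to.

def pagerank_contribution(ranks, outlink_counts, inbound_links):
    """Sum each referrer's rank divided by its number of outlinks."""
    return sum(ranks[p] / outlink_counts[p] for p in inbound_links)

# Reproducing the worked example for page D: A has five outlinks,
# C has two. The rank values of A and C are illustrative assumptions.
ranks = {"A": 1.0, "C": 0.5}
outlink_counts = {"A": 5, "C": 2}
pr_D = pagerank_contribution(ranks, outlink_counts, ["A", "C"])
print(pr_D)  # 1.0/5 + 0.5/2 = 0.45
```

The division by the outlink count is what makes a link from a selective site worth more than a link from a page that links to everything.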

2.6. Importance of Keywords

Different search engines use different methods and algorithms to determine the relevance of documents in order to list them in order of importance or relevance when a user performs a search. Needless to say, Google has managed to secure the majority of users with its PageRank approach, to the extent that other search engines tried to copy its algorithms; this also contributed to the death of many of the earlier search engines which could not match Google's accuracy and speed.

However, PageRank is not the sole factor Google and other search engines use, and there are many other factors which can determine a document's relevance and importance.

For the search term itself, a search engine will determine whether the word being searched for appears in the document frequently and where it occurs, e.g. in (Witten et al., 2007):

- The anchor text
- The title
- The URL (uniform resource locator)
- Headings
- The meta tags

Each search engine employs its own algorithm to determine the overall importance of a page in terms of the occurrence and location of the searched word or term, and no one knows precisely what algorithm is used by an individual search engine. Furthermore, the algorithm used by Google, for example, changes regularly as Google attempts to improve its search results and relevance (Sullivan, Sep 26, 2013).

It was believed in the past that search engines liked to see keywords in certain locations of the HTML code as indicators of a page's relevance for a query. Nowadays this is no longer really true, as the relevancy and keyword-based algorithms that Google and Bing use to evaluate and rank pages are massively more complex. Gaining a slight benefit in a keyword-placement-based algorithmic element may even harm overall rankings because of how it impacts people's experience with the site (Fishkin, 2013).

In a survey of 130 SEO professionals conducted in 2013 (Fishkin, 2013), the ranking factors that matter most to the professionals were queried, and the results are shown in Figure 2.3.

illustration not visible in this excerpt

Figure 2.3 Weighting of Keywords in Ranking (Fishkin, 2013)

In this survey it was found that less than fifteen percent in the

Search engines are now not just looking for a page filled with keywords but are more interested in an “optimized” page. Basically, an optimized page is one that provides unique content and value. Search engines are more intelligent now, and they seek this unique value where social shares, links, branding and all the other positive associations come together to create the right signals to propel a website to the top (Fishkin, 2013).

At the very basic level, an optimized page is one that is:

- Easy to understand
- Providing intuitive navigation and content consumption
- Loading quickly, even on slower connections (like mobile)
- Rendering properly in any browser size and on any device
- Designed to be visually attractive/pleasing/compelling

While search engines are getting more intelligent, the fact remains that they still use automated bots or crawlers to look for the signs of an optimized page that will give the visitor a good user experience. Bots being what they are, it is important to give them all the help they need to crawl a website for the information you want them to find.

All the important information in a website has to be properly located, and the keywords in particular need to be located in the following areas:

- Page title
- Headline
- Body text
- Image and image ALT attributes
- Internal and external links
- Meta description
- Meta keywords

These are the locations where bots expect to find the keywords that enable a search engine to rank a website based on the keywords used (Fishkin, 2013).

Since it is well known that bots are used to read keywords, some people resort to keyword stuffing, using a large number of keywords in the hope that bots will rank their sites higher. Although this worked in the past, search engines are now smarter, and this is one reason why keyword count is no longer the major factor in ranking a website. There are also ways to measure keyword relevance, whereby the use of particular keywords is judged to see whether they have been used excessively. Google has been known to penalize websites that use keywords excessively, or that hide keywords from view while still inserting them into the code for bots to read.
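The keyword-count idea behind such measurements can be sketched as a simple density calculation, of the kind reported by keyword density analyzers. The page text and the stuffing threshold below are illustrative assumptions, not any search engine's actual rule:

```python
# A minimal sketch of keyword density: the share of a page's words taken
# up by a given keyword. Unusually high values are a classic sign of
# keyword stuffing.

def keyword_density(text, keyword):
    """Occurrences of the keyword as a percentage of all words."""
    words = text.lower().split()
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)

page = "cheap cars cheap cars buy cheap cars online today"
density = keyword_density(page, "cheap")
print(round(density, 1))  # 3 of 9 words -> 33.3 (percent)
# A commonly cited guideline is a low single-digit percentage, so a
# density this high would be flagged as likely stuffing (assumed threshold).
assert density > 5.0
```

A real analyzer would also strip markup and punctuation and handle multi-word phrases, but the ratio itself is this simple.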

As keywords become less relied upon for search rankings, it is also important to note whether the other search engines besides Google are placing less reliance on keywords for ranking. It must be remembered that most search engines in the past placed a lot of importance on keywords, and their viability depended on how well their search results pleased search engine users.

2.7. How Do Search Engines Remain Viable

As most search engines run on their own custom-designed algorithms, these algorithms are highly secretive and remain corporate secrets that protect the business's viability. Search engines like Google earn from advertising discreetly inserted into search engine results pages, or SERPs. If a user cannot find the information he wants, search engines presume that he might click on one of the ads presented to him on the same page. It is indeed a paradox whether search engines prefer the user to find the information he wants or to click on one of the ads (in the event that he did not find suitable information), which then adds revenue to the search engine's operating profit.

Until Google’s appearance in 1998, search engines followed the classic model of information retrieval: to see if a document is a good match for a query, you only need to look inside the document itself. Google changed all that with the introduction of PageRank and created an earthquake in the world of search engines which left few survivors in its wake. In 2005 only five major search engines remained: Google, Yahoo, MSN Search, AOL and Ask (Witten et al., 2007). In 2013 the landscape has remained much the same, with Google, Bing (Microsoft), Yahoo, Ask and AOL in order of preference as rated by comScore, and with Google taking almost 67 percent of the market share in search ("comScore Releases July 2013 U.S. Search Engine Rankings").

Table 2.1 comScore Search Share Report for July 2013 ("comScore Releases July 2013 U.S. Search Engine Rankings")

illustration not visible in this excerpt

2.8. What is Search Engine Optimization (SEO)?

The term Search Engine Optimization (SEO) describes a diverse set of activities performed to increase the number of desired visitors to a website via search engines (Couzin et al., 2008). These include actions such as making changes to the text and HTML code, communicating directly with the search engines and pursuing other sources of traffic for listings or links. SEO is basically setting up a website so that it ranks well for the particular keywords that visitors will search for. Unlike paid search marketing, which requires you to pay for every click sent to your website, SEO is about getting traffic sent to your site from the search engine's organic results for free (Jones, 2008).

SEO started in 1997 through public reports and commentaries provided by search engine experts such as Danny Sullivan and Bruce Clay, among others (Jones, 2008). Early reports about SEO looked at search engine algorithms and how the various search engines ranked search results. Inspired entrepreneurs and website owners began studying these reports and testing strategies for ranking well in the search results. Inevitably, it soon became a business and a profession for some, who engaged in providing SEO advice and services to help others achieve better rankings for their websites.

As the World Wide Web grew at a remarkable pace, the popularity of some search engines like AltaVista and InfoSeek started to diminish, while Google, incorporated in 1998, became stronger and bigger. Google's early success was due to its groundbreaking new algorithm, which made use of its proprietary PageRank formula, and also to the fact that its web interface was less cluttered with advertisements. Google today is much more complex, and it has grown to be the leading search engine with over 60 percent market share (Jones, 2008). This has invariably made it the target of almost every SEO company out there, as they know that everyone wants their website listed on Google to attract as much traffic as possible from organic search.

Although many have practiced SEO unethically to obtain traffic to their websites, it cannot be denied that SEO is a much sought-after service nowadays, as many companies do not have the time to ensure that their websites rank well and may not be familiar with what the various search engines are looking for. Also, most website creators are naturally more concerned with the content and design of their webpages than with whether that content or design will be “liked” by the search engines.

From a website developer's point of view, it cannot be denied that SEO is important, as very few developers build websites for which traffic or visitors are not wanted. If you are looking for visitors, and lots of them, then you have to pay heed to what SEO is all about, or risk being sidelined by the search engines: there are possibly thousands of other websites similar to yours, and search engine visitors hardly look past the first few pages of results. Since Google is the favored search engine of the day, it definitely pays to understand how Google works if it is traffic you are looking for.

2.9. How Does Google Work?

Since Google is now the preferred search engine, it will be used as the search engine for this study. Google goes through a few basic steps to index a website before anyone can search for it in Google's huge database.

The basic steps taken by Google are as follows ("How Does Google Collect and Rank Results," 10 January 2006):

- A webmaster can submit his website URL to Google for it to be indexed by Google's robot, or wait until the robot trawls the whole internet and stumbles upon the website, which of course can take a long time.

- After indexing the site, the robot reads the contents of the website and stores relevant information such as keywords in its database. The keywords are important because they help the robot find the same document again when someone searches with those keywords. The frequency of a keyword in the document, or in the whole domain, is also measured to ascertain how important or relevant the keyword is with respect to the actual search term.

- Keywords are not the only factor deciding which documents Google retrieves during a search; Google also needs to determine how important a document is, and this is where its search algorithm goes into full gear. To rank a document in terms of relevance, Google uses its proprietary PageRank algorithm to determine how many links point back towards the document, the so-called number of backlinks. The more links there are, the better the PageRank; but the PageRank algorithm is more sophisticated still, as it also measures the “quality” of each backlink, i.e. where the link is coming from and whether it is from a reputable site.

- Google today has progressed to more than just PageRank to determine a website's relevance; in its latest incarnation, “Hummingbird”, Google reveals that PageRank is just one of over 200 major ingredients that go into the algorithm (Sullivan, Sep 26, 2013). The other factors Google now measures are divided into the groups presented in Table 2.2 ("The Periodic Table Of SEO Success Factors").

Table 2.2 Periodic Table of SEO Success Factors ("The Periodic Table Of SEO Success Factors")

illustration not visible in this excerpt

Search Engine Land sees Google as now using a group of factors to identify a website's popularity and relevance in order to rank it correctly in its SERP when a user requests particular information.

In the Periodic Table of SEO Success Factors ("The Periodic Table Of SEO Success Factors"), the writers of Search Engine Land propound that there are three main groups of factors affecting search engine rankings, with further sub-groups within them. The three main groups are:

- On The Page SEO
- Off The Page SEO
- Violations

No single factor determines search engine ranking; it is a combination of the correct factors that raises a website's rankings. On The Page ranking factors are those entirely within the publisher's own control, e.g. the HTML code and site architecture. Off The Page ranking factors are those that publishers do not directly control, such as links to the website, social media links and the trustworthiness of the publisher. Violations are tactics used to deceive or manipulate a search engine's understanding of a site's true relevancy and authority, and they actively bring down the ranking of a website.

Explanation of the SEO success factors:

(i) On The Page SEO

(a) Content
Quality – Are pages well written and have quality content?

Research – Have the keywords been properly researched?

Words – Are words and phrases used as visitors would search for them?

Engage – How long do visitors spend on the page?

Fresh – Are pages “fresh” and about “hot” topics?

(b) HTML

Titles – Do title tags contain keywords relevant to topics?

Description – Do meta tags describe the page properly?

Headers – Are headlines/headers using relevant keywords?

Structure – Do pages use structured data?

(c) Architecture
Crawl – Can search engines easily “crawl” the site?

Duplicate – Is there duplicate content on the site?

Speed – Does site load quickly?

URL – Are URLs short and meaningful?

Mobile – Does the site work well for mobile?

(ii) Off The Page SEO

(a) Links

Quality – Are links from trusted and respectable sites?

Text – Do links pointing at pages use the words you hope they will be found for?

Number – Do many links point at your web pages?

(b) Trust
Authority – Do links and shares make site a trusted authority?

History – Has site or domain been around for a long time?

Identity – Does site use means to verify identity of its authors?

(c) Social

Reputation – Do those respected on social networks share your content?

Shares – Do many share your content on social networks?

(d) Personal

Country - What country is someone located in?

Locality – What city or local area is someone located in?

History – Has someone regularly visited the site?

Social – Has someone socially favored the site?

(iii) Violations

(a) Thin or Shallow Content
(b) Ads / Top Heavy Layout
(c) Keywords Stuffing
(d) Hidden Text
(e) Cloaking
(f) Paid Links
(g) Link Spam
(h) Piracy / DMCA Takedowns

Each of the above 33 factors will either contribute towards (blue/unshaded) or pull down (red/shaded) a website’s ranking in Google. Search Engine Land understands that there are 200 main factors in Google’s algorithm (Google does not reveal which ones) and believes that its 33 factors are a good approximation of the algorithm Google is using (Sullivan, Sep 17, 2010).

2.10. Related Research

Geetha, S. and Sathiyakumari, K. have done research on a prediction model for the page ranking of blogs (Geetha & Sathiyakumari, 2012) by studying the effects of link backs, social network links and data structure on how a page is ranked by search engines. Geetha and Sathiyakumari used MozRank as one of the tools in their research to determine the popularity of a domain and gauge the effect of popularity on a page’s ranking. MozRank, which is now incorporated into Open Site Explorer, will also be used in this author’s research. Using Open Site Explorer, the user is able to determine the following important metrics which are used to rank a website:

i. Page Authority – based on an algorithmic combination of all link metrics, predicting a page’s ranking potential.
ii. Domain Authority – based on an algorithmic combination of all metrics, predicting a domain’s ranking potential.
iii. Linking Authority – the number of unique root domains containing at least one link to the URL.
iv. Total Links – the number of all links to the URL, including internal, external, followed and not followed.
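As an illustration, the two count-based metrics above (iii and iv) might be derived from a raw list of inbound link URLs along the following lines. This is a sketch of the idea only, not Moz's actual computation; the URLs are placeholders, and the host name (netloc) is used as a rough stand-in for the root domain.

```python
# Illustrative sketch: derive a Total Links count and a rough unique
# linking-domains count from a hypothetical list of inbound link URLs.
from urllib.parse import urlparse

inbound_links = [
    "http://blog.example.com/post-1",
    "http://blog.example.com/post-2",
    "http://news.sample.org/story",
]

total_links = len(inbound_links)  # every link counted, repeats included
# Unique linking hosts; a real root-domain count would also strip subdomains.
linking_domains = {urlparse(u).netloc for u in inbound_links}

print(total_links)           # 3
print(len(linking_domains))  # 2
```

The distinction matters because, as noted above, many links from one domain carry less weight than links spread across many unique root domains.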

In their research, Geetha, S. and Sathiyakumari, K. found Open Site Explorer useful as it enables the user to compare the measured metrics for up to five URLs at the same time.

Sharma, D.K. and Sharma, A.K. have also done research comparing the different algorithms used to rank a webpage and studying the various parameters affecting search engine ranking (Sharma & Sharma, 2010). The purpose of their research was to analyze the current algorithms used for ranking web pages and to find their relative strengths and limitations, in order to find ways to improve the ranking of web pages. The algorithms they analyzed were:

i. Page Rank algorithm
ii. HITS algorithm
iii. Weighted Page algorithm
iv. Weighted Links Rank algorithm
v. EigenRumor algorithm
vi. Distance Rank algorithm
vii. Time Rank algorithm
viii. TagRank algorithm
ix. Relation Based algorithm
x. Query Dependent Ranking algorithm
xi. Ranking and Suggestive algorithm
xii. Comparison and Score Based algorithm
xiii. Algorithm for Query Processing in Uncertain Databases
xiv. Ranking of Journal based on Page Rank and HITS algorithm

Based on their research, it was found that content, backlinks and keywords were the main parameters observed to provide relevancy in a search, although it was not conclusive whether other factors were also affecting the search results.

Shika Goel and Sunita Yadav did research on search engine evaluation based on page-level keywords (Goel, 2013), using keyword counts to judge the performance of three search engines, namely Google, Bing and Yahoo, on educational queries. Page-level keywords are keywords found on the individual pages of a website, such as in the title, header tags, anchor text, meta tags, ALT tags, the URL string and, finally, the content.

Based on their finding that page-level keywords are one of the most critical factors in determining search engine ranking, they conducted their research by submitting various queries to the three search engines and creating a database of the results. The page-level keywords were then calculated from the search results of forty educational queries randomly selected from a set of the most searched queries. Only single-word keywords were used instead of phrases, and the top ten pages retrieved from each search engine were saved in a page repository.

The retrieved pages were then parsed to match the input keyword, and the total keyword count and the average keyword count were calculated. The average keyword count over the first ten pages was then used as the score for the search engine.
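The parse-count-average procedure described above can be sketched as follows. This is an illustrative reconstruction of the method, not Goel and Yadav's actual code; the page texts are placeholders.

```python
# Sketch of page-level keyword counting: count whole-word, case-insensitive
# occurrences of the query keyword in each retrieved page, then average
# over the set of pages (the top ten results, in the study above).
import re

def keyword_count(text, keyword):
    # \b anchors ensure whole-word matches only (e.g. not "educational").
    pattern = r"\b%s\b" % re.escape(keyword)
    return len(re.findall(pattern, text, re.IGNORECASE))

def average_keyword_count(pages, keyword):
    counts = [keyword_count(text, keyword) for text in pages]
    return sum(counts) / len(counts)

pages = [
    "Education news and education policy updates.",
    "A portal for higher education.",
]
print(average_keyword_count(pages, "education"))  # 1.5
```

In the study this average, computed over the first ten results for each query, served as the engine's score for that query.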

Their results showed that Yahoo weighted keywords in its rankings considerably more heavily, followed by Bing and Google respectively. Goel and Yadav also found that Yahoo showed more interest in keywords present in the title tag than in the content, using it as the biggest factor to rank web pages. To verify their results, the researchers also used human ranking to determine the number of relevant pages, using a precision measurement in which the number of relevant results was divided by 10 and then calculated for the forty keywords tested. The human ranking results showed a similar trend, whereby Yahoo was measured as the most relevant and Google the least, as shown in Table 2.3.

Table 2.3 Page Level Average Keyword Count (Goel, 2013)

illustration not visible in this excerpt

Based on their research, Goel and Yadav concluded that Yahoo provided the most relevant results for the educational keywords submitted, but it must be pointed out here that relevance based on keywords alone is very subjective to interpretation (more so when done by human ranking). As it is, many search engines nowadays do not depend on keywords alone to measure the relevancy of their results.

Dania Bilal (Bilal, 2012) conducted research in 2012 to evaluate Google, Yahoo, Bing, Yahoo Kids and Ask Kids on their retrieval performance for queries related to children. Two categories of research questions were addressed in the study: (a) benchmarking retrieved hits and their ranking positions and (b) judging the relevancy of hits to calculate recall and precision. To address these, two quantitative research designs were employed: (a) benchmarking the ranked output of the search engines and (b) intellectual relevance judgment. The purpose of Bilal’s research on information retrieval for children was to gain insight into the performance of the search engines studied by reducing the complexities of an otherwise adult search environment.

To calculate recall and precision Bilal used a variation of the precision metric that focused on the first results page with a cutoff at 10 hits per page.
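The first-page precision metric with a cutoff at 10 hits can be expressed directly. This sketch follows the description above; the relevance judgments in the example are hypothetical.

```python
# Precision at a cutoff of 10: the share of the first ten hits on the
# results page that were judged relevant.
def precision_at_10(relevance_judgments):
    """relevance_judgments: booleans for the hits on the first results page."""
    top = relevance_judgments[:10]
    return sum(top) / 10

# e.g. 7 of the first 10 hits judged relevant:
judged = [True, True, False, True, True, True, False, True, False, True]
print(precision_at_10(judged))  # 0.7
```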

For the first method, Bilal selected Google as the benchmark against which the results from the other four search engines were compared, since it was the leading search engine.

Two sets of queries were formulated for the research. In the first set, a corpus of 130 queries was extracted from studying children’s queries over a period of time, covering different topic domains (e.g. medicine, health, history and social studies). The second set of 25 queries was compiled by a graduate student from studying children in a middle school library. Fifteen unique queries were then selected from the first set, comprising 5 one-word, 5 two-word and 5 phrase or natural language queries. Similarly, 15 unique queries were selected from the second set, giving a total of 30 queries overall.

Each query from the first set was then submitted to a given search engine, and the first results page was recorded for analysis on January 31, 2011 between 10.00 pm and 11.15 pm to avoid any possible changes in results due to search engine updates. The second set was similarly submitted on April 18, 2011 between 9.05 pm and 11.07 pm from the same geographical location. The submissions resulted in 150 first-page results across the five search engines, and since the first ten results on each page were recorded, a total of 1,500 results were available for the study. Of the 1,500 results, 140 were found to be broken links and were removed, leaving a total of 1,360 results to be judged.

From the 1,360 hits, precision and recall were calculated, including total precision (TP) and average precision (AP). Since Google was used as the benchmark, the retrieved hits from Yahoo, Bing, Yahoo Kids and Ask Kids were compared with the top five hits from Google and the overlap in hits was calculated.
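One plausible reading of this overlap measure can be sketched as below: the percentage of the benchmark's top five hits that also appear in another engine's first page. This is an assumption about the exact computation, not Bilal's published code, and the URLs are placeholders.

```python
# Sketch of a hit-overlap comparison against a benchmark engine's top five.
def overlap_with_benchmark(engine_hits, benchmark_top5):
    shared = set(engine_hits) & set(benchmark_top5)
    return len(shared) / len(benchmark_top5) * 100  # percentage overlap

google_top5 = ["u1", "u2", "u3", "u4", "u5"]
# A hypothetical first results page (ten hits) from another engine:
other_engine = ["u2", "u9", "u1", "u8", "u7", "u6", "u5", "u0", "ux", "uy"]
print(overlap_with_benchmark(other_engine, google_top5))  # 60.0
```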

For the one-word query cluster, 44% of Yahoo’s results had the same ranking as Google’s, as did 28.3% of Bing’s, 18.7% of Ask Kids’ and 0% of Yahoo Kids’.

For the two-word query cluster, 30% of Yahoo’s results had the same ranking as Google’s, as did 24% of Bing’s, 33% of Yahoo Kids’ and 22% of Ask Kids’.

Finally, for the natural language query cluster, 8% of Yahoo’s results had the same ranking as Google’s, as did 20% of Bing’s, 0% of Yahoo Kids’ and 37.5% of Ask Kids’.

In summary, the results showed that the engines retrieved the highest percentage of hit overlaps matching Google’s top five hits on the one-word queries (35.7%), followed by the two-word queries (26.7%) and the natural language queries (16.4%), as shown in Table 2.4.


Keyword Relevance in Search Engine Optimization
Open University Malaysia (Faculty of Information Technology & Multimedia Communication)
Master of Information Technology
Tze Ping Khor (Author), 2014, Keyword Relevance in Search Engine Optimization, Munich, GRIN Verlag