Using Social Semantic Web Data for Privacy Policies

Bachelor Thesis, 2009

71 Pages, Grade: 1,7

Antonia Feserer (Author)



1 Introduction

2 Motivating Scenario and Problem Statement
2.1 Motivating Scenario
2.2 Problem Statement

3 Background
3.1 The Social Web
3.1.1 Social Networking Sites
3.2 The Semantic Web
3.2.1 The Resource Description Framework (RDF)
3.2.2 The SPARQL Protocol And Query Language
3.2.3 SPARQL Endpoints
3.2.4 The Social Semantic Web
3.3 Privacy Policies
3.3.1 Policy Languages
3.3.2 Policy Frameworks
3.3.3 The Protune Framework

4 The Social Web from a Privacy Perspective
4.1 Data Disclosure - Why Better Control is Needed
4.1.1 Privacy Issues
4.1.2 Information Overload
4.2 Privacy Protection on the Social Web - a State of Art
4.2.1 Twitter and its Privacy Options
4.2.2 Facebook and its Privacy Options
4.2.3 Flickr and its Privacy Options
4.3 Comparing Privacy Preferences on Social Platforms
4.3.1 Levels of Trust for Data Disclosure
4.3.2 Network Features and their Protection
4.4 Summary

5 Policy Reasoning Based on Social and Semantic Web Data
5.1 Requirements for Policies on the Social Web
5.2 Social and Semantic Web Data for Policy Specification and Evaluation
5.2.1 Types of Social Data and their Availability
5.2.2 Using Social Data to Define New Concepts
5.2.3 Enforcing Policies upon an Application
5.3 Taking up the Motivating Scenario

6 Implementation
6.1 Retrieving Heterogeneous Information
6.1.1 Retrieving Social Web data
6.1.2 Retrieving Social Semantic Web Data
6.2 Wrappers for External Information Sources
6.2.1 IN-Predicate
6.2.2 SPARQL Endpoint Wrapper
6.2.3 DBpedia Wrapper
6.2.4 DBLP Wrapper
6.2.5 RDF Wrapper
6.2.6 Flickr Wrapper
6.2.7 Twitter Wrapper
6.3 SPoX - A Use Case

7 Related Work

8 Conclusions and Outlook



Social Web applications are steadily gaining popularity. At the same time, the open nature of such services leads to the exposure of an immense amount of personal data. Because access control on today’s Social Web applications is insufficient, privacy problems arise. This thesis focuses on the need for more flexible and fine-grained privacy restrictions. It analyses privacy problems of current Social Web applications and compares the privacy preferences such applications offer. Based on this analysis, this thesis extends the well-known principle of policy-based access control, which is a flexible and dynamic way to define who can get access to what content based on user preferences. The presented extension adapts policies to the requirements of the Social Web. In particular, it describes how to exploit Social Semantic Web data for privacy reasoning. This includes the retrieval of Social and Semantic Web data from various information sources on the Web, the use of this data in the definition of privacy policies, and its consideration during policy evaluation. Consequently, using Social Semantic Web data for policy reasoning allows users to define exactly which social relationships and properties a requester must have in order to access a particular resource. These conditions can cross the boundaries of a single Social Web application. Hence, a user can, for example, state that a friend on one application can access pictures stored on another application, thus bridging the walled gardens of today’s Social Web applications.

1 Introduction

In recent years the web has undergone a drastic shift from a static, centralised information system to a dynamic, user-generated, distributed and open platform, and users have changed from passive consumers to active participants who interact, create and share content. [1] This ‘new’ web is called Web 2.0. In the course of this movement, new Social Web applications emerged, creating an environment for people to publish, share and discuss content, to create descriptive profiles of themselves for self-expression, and to build social networks of relationships with others for the purpose of interaction and communication. With the increasing popularity of such social networking applications the number of users has scaled up and is still growing. Not only the number of users but also the web traffic indicates the growing importance of social networking platforms, which are now among the most visited websites.

With over 100 million unique visitors worldwide, Facebook is one of the most popular networking sites on the web[1]; moreover, according to Alexa[2], the site ranks third among the most visited sites on the web, surpassed only by Google and Yahoo!. YouTube (with over 80 million unique visitors), MySpace (with about 60 million unique visitors) and Flickr (about 30 million unique visitors)[3] are other examples of prominent social networking platforms.

However, the availability of such a huge amount of information within the social networking sites and the open nature of the services and their usage also attract the attention of parties with marketing purposes or malicious intent. Users are thereby put at risk of online stalking, phishing[4], identity theft, spamming, the passing on of data to third parties, and privacy issues related to personal data exposure due to insufficient access control.

By maintaining social networks and actively participating in Social Web activities such as interacting with others, users unwittingly expose sensitive, personal or inappropriate, even reputation-damaging data not only to friends but to an audience that mostly remains invisible and consists of strangers or acquaintances who are potentially not supposed to see such information. The revealed information can thus have major consequences if it is read out of context or by parties, such as authorities or job recruiters, for whom it was not intended. Some real-world problems caused by such privacy issues are described in [2, 3, 4, 5]. These are just a few examples that received major public attention. The reputation of social networking sites has been somewhat diminished by several such incidents, which often reach the attention of the media. Since users of social networking sites are becoming more aware of such privacy risks and continue to express their concerns[5], privacy settings need to be acknowledged as an important part of Social Web applications, and extensive research is needed on improving privacy preferences and giving users control over who can see which parts of their social data.

Although social networking sites have already realised the need for privacy protection, and some Social Web applications like Facebook have even introduced more complex access restrictions, the privacy preferences are still not fine-grained and flexible enough and are far from satisfactory. Furthermore, such privacy settings confine themselves to the properties of their own website without making use of data created beyond the boundaries of that application.

Policy-based access control is an approach to protect privacy in open systems like the Social Web applications and can further help to control the information overload which users are facing on the Social Web. With policies being formal, well-defined statements [7] the process of defining who can get access to what content based on user preferences can be realized in a flexible and dynamic way. Nevertheless, the current policy-based control of the behaviour of complex systems does not offer solutions for the movement towards Social Web where information about users, their content and their relationships is not confined to one application only but is spread out across the whole Social Web. This is also the problem of privacy settings of the current Social Web applications which offer preferences only according to the established relationships and other attributes within the own website.

The contribution of this thesis is therefore to enhance privacy policies by integrating data from various information sources, such as Social Web applications, into the policy specification and reasoning process. Such policies can save people the trouble of creating the same social data[6] on each Social Web application they are members of. These privacy policies are likely to prove beneficial, as numerous social networking sites are emerging that offer various functionalities and site-specific features, and people spend an increasing amount of time maintaining all this data distributed across the many services.

Social Web applications provide their information in proprietary formats via their own site-specific application programming interfaces. The approach presented in this thesis collects this arbitrary, heterogeneous data and provides it in a homogeneous format so that it can be integrated into policies and exploited for policy reasoning. In addition to Social Web data, Semantic Web data can be included in policy decisions to extend the variety of policy specification. Such semantic information is available in non-application-specific standard formats which can be easily transported and reused. Additionally, information provided by Social Web applications can be retrieved as Social Semantic data, which allows Social data to be converted into a uniform format using Semantic Web technologies. This thesis explains the process of retrieving all these information types, transforming the extracted data into a format appropriate for policy-based access control and combining them to create fine-grained privacy policies, and demonstrates it using selected information sources from the Web. To implement the presented solution, the policy framework Protune is used to automate the evaluation and decision process based on the conclusions drawn from privacy policies.

The remainder of the thesis is organised as follows. Section 2 presents a scenario showing how privacy policies can be integrated into a web application and enhance the user experience on such sites. It then identifies and describes the problem statement. Section 3 provides the background information necessary for understanding this thesis. Section 4 analyses privacy problems of current Social Web applications and compares the privacy preferences such applications offer. Section 5 presents the extension of policy-based access control that adapts policies to the requirements of the Social Web, and revisits the motivating scenario. Subsequently, Section 6 describes how Social and Semantic Web data can be retrieved and included in Protune, covering the actual implementation of the Protune extension for Social Semantic Web data. A prototype implementation called SPoX [8] is also presented, which demonstrates the usage of policy-based behaviour control on the Social application Skype. After a presentation of related work in Section 7, Section 8 concludes this thesis and provides an outlook on future research.

2 Motivating Scenario and Problem Statement

2.1 Motivating Scenario

In order to explain the benefits of using a policy-driven approach to protect privacy in open systems, a fictional scenario is presented that serves as a use case to demonstrate how privacy policies based on various Social and Semantic Web data can be used to control one’s own data on a Social Web application. Parts of the scenario are used throughout this thesis to explain the syntax of policies in general and Protune policies in particular, and to show how external data can be retrieved and integrated into the Protune framework.

Bob is a scientist who works in a company, takes part in research projects and holds a seminar for students at the local university. Bob has a very active web presence; besides having profiles on numerous social networking sites like Facebook, Flickr and Twitter, he also manages his own website, where he uploads information about different aspects of his life, such as his business, interests and family-related information. Furthermore, being a supporter of the Semantic Web movement, he has uploaded a FOAF file to express his network of acquaintances.

(S1) To be easily reached by his colleagues, his students at the university and his friends, Bob posts his contact information on his website. Because his email address is not too private, he is happy for any of his friends and colleagues to see it.

(S2) Nevertheless, some contact data is very sensitive, so Bob does not want everybody to see all his contact information; for example, Bob only wants to disclose his phone number to his family and close friends, whom he has added to his ‘close-friend’ list on Facebook.

(S3) Bob uses his website for blogging as well. He publishes findings from some of his research and includes links to other interesting websites which are relevant for his profession. He also talks about his personal life, discusses stories and posts pictures of family and friends as well as of his interests beyond his profession. As some of the information he posts is private, such as pictures of his last vacation trip or a wish list he compiled for his next birthday, he only wants his Flickr and Facebook friends in addition to his family to see it. Moreover, everybody who is in the Flickr group about the landscape of Southern France can see the pictures of his holiday trip that are tagged with the keyword ‘Southern France’, but not the ones tagged ‘private’, as these are only intended for family and close friends.

(S4) Any work-specific information Bob posts or files he uploads should only be visible to his colleagues who belong to the work network of his company on Facebook. Additionally, Bob wants the co-authors of his publications to have access to updates about any research projects.

(S5) Bob has many different interests and hobbies, which he likes to write about. One of them is his passion for baseball. Unfortunately, not all his friends share this hobby, which is why Bob would like to disclose his updates concerning that sport only to people who appreciate reading them. If some of Bob’s friends are in the group about baseball on Facebook, they are more likely to prefer reading Bob’s thoughts on the next baseball championship than his presentation slides about the newest Semantic Web technologies, which in turn would go down well with his FOAF friends. That is why Bob wants his friends who are either also in a baseball group or have a blog about baseball to be able to see baseball-related information.

(S6) His presentation slides are only intended for his acquaintances interested in the Semantic Web. Bob also wants to use these presentation slides in the next lecture he is giving. To get relevant feedback on the quality of the slides, he hopes his colleagues and friends who are skilled in that field will read them and give constructive criticism. He knows that all friends he added to his FOAF profile, his colleagues in his company network on Facebook and any friends who are also co-authors of his publications are skilled enough to help him. As these slides are in German, he only wants people who have mastered this language to read them.

2.2 Problem Statement

At present, the possibilities to define and enforce privacy preferences are too restrictive and cannot be adapted to the user’s needs. “The internet lacks walls”, as danah boyd pointed out [9], which is why the ability to define expressive privacy preferences and improve the user’s control is essential, especially when it comes to sensitive and personal data.

When deciding which part of the personal data can be disclosed to whom, each user may have his own ideas about the right privacy preferences according to his personal situation and purposes on the Social Web; for example, a user may be searching for new business contacts, may want to meet new people or may want to keep in touch with his current friends, among many others. As these situations are quite complex and also individual for each user, it is difficult to provide static, predefined privacy options that are sufficient to accommodate the individual needs of the thousands of users of a Social Web application. Users therefore need privacy settings that go beyond a few predefined checkboxes with selected options, which in most cases are also binary: a profile is either private or public, a user is either a friend or a stranger, and so on.

Another shortcoming of Social Web applications that needs to be overcome is that users can define privacy preferences based only on information within the borders of the application itself. Each Social Web application is like an island, collecting social data of its users and providing it in a site-specific, proprietary way. Maintaining such distributed data is increasingly difficult, and people spend a growing amount of time on the web managing their identities and social networks and re-entering their information every time they use a new website. It is therefore likely to prove beneficial to incorporate social data from any of the otherwise isolated Social Web applications into the privacy preferences of an application.

Another problem is the perpetual change in the structure of individual social networks as new relationships are built and already established relationships change their status; for example, an acquaintance can become a close friend. All these changes, within but also beyond the borders of one application, need to flow into the privacy preferences without the user having to adjust the settings manually. If, say, a new colleague wants to access a user’s data meant for the employees of his workplace, the privacy settings should automatically recognise and include her as a new colleague.

Of course, flexibility and great variety go along with usability challenges, as privacy settings become difficult to understand and hard to adjust for normal users. This problem leads to users mostly keeping the default settings, as is already often the case in today’s Social Web applications. [10]

Bearing all arising problems with regard to privacy preferences in mind, this thesis seizes on the need for a fine-grained privacy management and exploits policies for defining the behaviour of a system based on certain conditions. These policies need to be expanded in a way that enables a policy-based access control that can be applied to the happenings of the Social Web so that a scenario as presented in the previous section can become reality. To achieve this goal several requirements need to be met:

- A dynamic and well-defined policy language is needed for specifying the policies.

- This language must be able to incorporate data from any possible source on the Web beyond the border of the own application, which also supports the high level of flexibility.

- This data can be Social Data or Semantic Web data of any kind, such as attributes of a user or a group, information about relationships, user-generated data or activities, general information like publications and their authors, information about countries, languages and many others.

- Further qualities policies should have are:

- Fine-grained policies: policies need to be detailed enough to be applied to the complex scenarios on the Web, so that any arbitrary combinations of a user’s needs can be realized.
- Dynamic and automatically adapting to changes: Policies should adjust to changes occurring on the Social Web; for example, if a new colleague requires access, the policies adapt dynamically, recognise the colleague as such and include him in the appropriate concepts.
- Usability: The specification process of policies should be intuitive, simple and fast. People who are not skilled in the formal syntax of policy languages should have no problem defining policies.
- Lucidity: Users should clearly understand what a policy they have created does, without room for interpretation.
- When access to a resource is denied, an explanation is needed to help the user understand why his request failed.

- To automate the evaluation and decision process of the defined policies a policy framework is needed. This policy framework automatically queries the respective information sources and selects the information a user wants to use for his policies, independently of the format in which the data is provided.

- The extracted information is unified and combined to create policies, meaning data from different sources can be incorporated into one policy.

- Policies should be enforced upon an application; that is, if a resource which is protected by such a policy is requested, the framework evaluates the policy and, according to the result, either allows or denies access.

- To be able to reason correctly over a policy, the framework needs to categorise the requester according to some provided identification properties. The requester is added either to the group of people who are allowed to access the resource or to the group for whom access is denied.

- Analogously, the information overload caused by the various communication techniques on Social Web applications needs to be controlled with the policies as well.
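
The requirement that data from differently formatted sources be unified and combined into one policy can be illustrated with a minimal Python sketch. It is not the Protune implementation; the payloads, the predicate names `friend` and `memberOf`, and all function names are assumptions made up for illustration. Real site APIs return their own, different formats.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads standing in for two site-specific API responses.
FRIENDS_JSON = '{"user": "bob", "friends": ["alice", "carol"]}'
GROUP_XML = """<group name="baseball">
  <member>alice</member>
  <member>dave</member>
</group>"""


def facts_from_json(payload):
    """Turn a JSON friend list into (subject, predicate, object) facts."""
    data = json.loads(payload)
    return {(data["user"], "friend", friend) for friend in data["friends"]}


def facts_from_xml(payload):
    """Turn an XML group document into facts of the same shape."""
    root = ET.fromstring(payload)
    group = root.get("name")
    return {(member.text, "memberOf", group) for member in root.findall("member")}


def collect_facts():
    """Merge the facts of all sources into one homogeneous set."""
    return facts_from_json(FRIENDS_JSON) | facts_from_xml(GROUP_XML)


def baseball_friends(facts, owner):
    """Friends of `owner` who are also members of the baseball group."""
    friends = {o for s, p, o in facts if s == owner and p == "friend"}
    members = {s for s, p, o in facts if p == "memberOf" and o == "baseball"}
    return sorted(friends & members)


FACTS = collect_facts()
print(baseball_friends(FACTS, "bob"))
```

Once both sources are reduced to the same triple-like shape, a condition such as "a friend who is also in the baseball group" no longer needs to know which application each fact came from.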

3 Background

This section provides background information which is relevant for understanding the subsequent elaborations in this thesis.

3.1 The Social Web

The Web has undergone a change in the way content is created and used as well as in the roles of providers (authors) and consumers (readers), which are no longer mutually exclusive. The concept describing the shift from the static, centralised Web to a dynamic, user-generated, open platform is often referred to as Web 2.0, a term popularised by Tim O’Reilly in his article “What is Web 2.0?”. Web 2.0 denotes new Web technologies which enable rich user interfaces, provide services open for others to use, combine various sources and enable user participation and interactivity. [11]

Some of the most distinctive principles of Web 2.0 are the linking of data throughout the web and the provision of services that enable users not only to socialise online but also to publish, share, reuse and generally participate in creating content. This concept is also often referred to as the Social Web. In the course of this movement towards interoperability, information sharing and interaction, the so-called Social Web applications have emerged, providing an environment where people can link with each other to create personal social networks of relationships and collaboratively create content. Some of the most typical application categories that build the Social Web are wikis, blogs, podcasts, social bookmarking web services, social networking sites and content sharing sites.

Social networking sites and content sharing sites both have the purpose of building social networks among their members, be it as the primary goal, as is the case with social networking sites, or as a secondary objective, as with content sharing sites. These sites are the main focus here because the aspect of social interaction is important for this thesis. Content sharing sites are not presented separately in this section, as their community aspect is similar to that of social networking sites.

3.1.1 Social Networking Sites

This subsection briefly introduces social networking sites and the common features that are found in most of them. A more detailed description of the social networking sites’ specific functionalities with regard to privacy is presented in Section 4.

Social networking sites give people the opportunity to maintain their real-world social connections such as friends, family and colleagues online. Users can also build new relationships based on common ground, such as shared interests and activities or other affiliations like a common geographical location or business contacts, in order to communicate and to expand their online social network. As a result, users create links to other users, producing the so-called social graph, which represents a web of connections with direct ties and indirect ties (Friends of Friends). [12] Social networking sites enable their users to create self-portraying profiles which can contain all kinds of personal information about a user. Users also post self-generated content about their lives, activities and interests. The purpose of publishing data is mostly to share and discuss it with other community members. According to [13], the information a user can provide about himself in his profile can be divided into three categories:

Contact information: such as name, address, e-mail address, telephone and mobile phone number,

Individual information: including relationship status, personality attributes, sexual preferences, physical attributes, birthday, education and occupation, religious and political affiliation,

Interest information: subjects of interest, hobbies, favourite books, movies, etc., and any association affiliations.

Furthermore, data about users emerges from their being active on the social networking sites, for instance by joining groups or befriending someone.

Social networking sites provide various functionalities to support social interaction and the communication of data; chat, messaging, blogging, discussion groups, tagging[7] and linking content, and writing on one another’s ‘Walls’[8] are the most common functionalities.

All this published information can be intended either for a selected group of people, like friends or family, or for general consumption. For this purpose, one of the main characteristics of social networking sites is the ability to define who belongs to the trusted set of users, referred to as friends on most social networking sites, and who belongs to the non-trusted set of users, labelled as strangers. [14] Most social networking sites further divide the non-trusted group into members and non-members of the site. The term friend, which is commonly applied to the trusted group of people, represents a consensual connection between two users and does not necessarily have the same meaning as it does offline, as people regarded as acquaintances in the real world are often treated as friends in the online world. Although on most sites befriending someone is a bi-directional process that requires confirmation by both parties to establish a link between the profiles, there are also sites where one-directional ties are common. Access to information is mostly based on the established status of the user.
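
The notions of direct ties, indirect ties (Friends of Friends), mutual confirmation and the friend / member / non-member distinction can be made concrete with a small Python sketch. The class and its method names are illustrative assumptions and do not correspond to the data model of any particular site.

```python
from collections import defaultdict


class SocialGraph:
    """Toy social graph with bi-directional, mutually confirmed friendships."""

    def __init__(self):
        self.members = set()
        self.requests = set()            # pending (sender, receiver) pairs
        self.friends = defaultdict(set)  # confirmed, symmetric ties

    def join(self, user):
        self.members.add(user)

    def request(self, sender, receiver):
        """A tie is established only once both parties have asked for it."""
        if (receiver, sender) in self.requests:
            self.requests.discard((receiver, sender))
            self.friends[sender].add(receiver)
            self.friends[receiver].add(sender)
        else:
            self.requests.add((sender, receiver))

    def friends_of_friends(self, user):
        """Indirect ties: two steps away, excluding direct friends and self."""
        direct = self.friends[user]
        indirect = set()
        for friend in direct:
            indirect |= self.friends[friend]
        return indirect - direct - {user}

    def trust_level(self, owner, requester):
        """Classify a requester from the owner's point of view."""
        if requester in self.friends[owner]:
            return "friend"
        if requester in self.members:
            return "member"
        return "non-member"


g = SocialGraph()
for name in ("bob", "alice", "carol", "dave"):
    g.join(name)
g.request("bob", "alice")
g.request("alice", "bob")      # confirmed: bob and alice are friends
g.request("alice", "carol")
g.request("carol", "alice")    # confirmed: alice and carol are friends
g.request("dave", "bob")       # unconfirmed: dave stays a plain member

print(g.trust_level("bob", "alice"), g.friends_of_friends("bob"))
```

A site with one-directional ties would simply add the link in `request` without waiting for the reverse request; the sketch models only the confirmed, bi-directional case.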

3.2 The Semantic Web

Information on the internet is mostly provided either for human consumption or for machines to process. Furthermore, data provided by one application may not be understood by another, which makes data exchange and integration very hard. The Semantic Web aims to represent and structure the data of the current Web in a way that keeps it understandable for humans and also enables machines to understand the semantics of the content. Data therefore needs to be well-defined, interlinked and annotated with metadata so that it can be read, understood and processed by software agents and exchanged across various applications. [15]

The Semantic Web is based on various technologies that make data available in a semantically structured format, as illustrated by the Semantic Web Stack shown in [16].

3.2.1 The Resource Description Framework (RDF)

One of the most widely adopted open standard formats for providing machine-readable information is the Resource Description Framework (RDF), which was developed by the W3C[9]. With RDF, Web resources can be described in a common way so that they can be read and understood by different applications. Information about web pages and personal information about people, among many other things, can be modelled using RDF. RDF is based on two technologies, XML[10] and URI[11], whereby URIs are used to identify resources on the Web and XML is used to exchange information between different systems. [17]

Information in RDF is written in statements like “The location of the company Daimler AG is Germany”, which consist of triples of {Subject, Predicate, Object}, for instance “Daimler_AG hasLocation Germany”. An RDF statement describes a

- resource (the Subject) identified by a URI, such as http ://
- the property of a resource (the Predicate) such as ‘location’ and
- the value of a property (the Object), which can be another resource such as or a literal like ‘true’ or ‘25’.

Several such triples form an RDF graph. [18]
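
As a rough illustration of the triple model, the following Python sketch represents resources, literals and statements as small data classes and prints them in an N-Triples-like notation (without any escaping). The namespace `http://example.org/` and the property names are placeholders chosen for this example; they are not taken from the thesis or from a real vocabulary.

```python
from dataclasses import dataclass

EX = "http://example.org/"  # hypothetical namespace for illustration


@dataclass(frozen=True)
class Resource:
    uri: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: Resource
    predicate: Resource
    obj: object  # either a Resource or a Literal


def term(t):
    """Render a resource as <uri> and a literal as a quoted string."""
    return f"<{t.uri}>" if isinstance(t, Resource) else f'"{t.value}"'


def to_ntriples(triple):
    """Render one statement in an N-Triples-like line (no escaping)."""
    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


daimler = Resource(EX + "Daimler_AG")
LOCATION = Triple(daimler, Resource(EX + "hasLocation"), Resource(EX + "Germany"))
LABEL = Triple(daimler, Resource(EX + "label"), Literal("Daimler AG"))

# An RDF graph is simply a set of such triples.
GRAPH = {LOCATION, LABEL}

for statement in sorted(to_ntriples(t) for t in GRAPH):
    print(statement)
```

The first triple has a resource as its object and the second a literal, which mirrors the two kinds of property values described above.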

3.2.2 The SPARQL Protocol And Query Language

The triple data provided by an RDF graph, as described above, can be queried with the RDF query language SPARQL (SPARQL Protocol And Query Language). The syntax of SPARQL is similar to SQL, using SELECT[12], FROM and WHERE to form a query. But as RDF is written in triples, the SPARQL syntax uses the same pattern of {Subject, Predicate, Object} statements, as one can see in the following example query. Here the query returns all companies in the RDF graph with the location ‘Germany’:

illustration not visible in this excerpt
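
Since the original listing is not reproduced here, the following Python sketch only approximates what a query of the described kind might look like; it is not the thesis's own example. The prefix `ex:`, the property `hasLocation` and the sample data (including the made-up companies `CompanyA` and `CompanyB`) are assumptions. The small `match` function mimics how a single triple pattern with variables is evaluated against a graph.

```python
# A SPARQL query of the described kind, using a hypothetical vocabulary.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?company
WHERE { ?company ex:hasLocation ex:Germany . }
"""

EX = "http://example.org/"

# Hypothetical sample graph as (subject, predicate, object) tuples.
GRAPH = [
    (EX + "Daimler_AG", EX + "hasLocation", EX + "Germany"),
    (EX + "CompanyA", EX + "hasLocation", EX + "Germany"),
    (EX + "CompanyB", EX + "hasLocation", EX + "France"),
]


def match(graph, pattern):
    """Return one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must be equal
    to the corresponding term of the triple.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


PATTERN = ("?company", EX + "hasLocation", EX + "Germany")
COMPANIES = sorted(b["?company"] for b in match(GRAPH, PATTERN))
print(COMPANIES)
```

The tuple `PATTERN` corresponds to the single triple pattern in the WHERE clause of `QUERY`, and the bindings of `?company` correspond to the rows a SPARQL engine would return.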

3.2.3 SPARQL Endpoints

With the evolution of the Semantic Web, a lot of information is made available on the web in the RDF format, and third parties can retrieve this data via so-called SPARQL endpoints. The endpoints are web services that make it possible to address and query RDF data sources using SPARQL. The query results are then presented in an XML format.
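
How a client might talk to such an endpoint can be sketched in Python as follows. The sketch only builds the request URL (the SPARQL protocol passes the query text in a `query` parameter) and parses a result document in the SPARQL Query Results XML Format; it performs no network access. The endpoint address and the sample result document are assumptions for illustration.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ENDPOINT = "https://dbpedia.org/sparql"  # assumed endpoint address

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hypothetical response for a query selecting ?company.
SAMPLE = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="company"/></head>
  <results>
    <result>
      <binding name="company"><uri>http://example.org/Daimler_AG</uri></binding>
    </result>
  </results>
</sparql>"""


def build_request_url(endpoint, query):
    """Encode the query text into the 'query' URL parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_results(xml_text):
    """Return one dict per result row, mapping variable name to value."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


URL = build_request_url(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o }")
ROWS = parse_results(SAMPLE)
print(URL)
print(ROWS)
```

In a real wrapper, the URL would be fetched over HTTP and the response body passed to `parse_results`; error handling and typed literals are omitted here.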

Popular endpoints exist for services like DBpedia and DBLP, both of which offer large knowledge bases for external applications to use. DBpedia is a semantic database which extracts structured information from Wikipedia and provides it in a form that conforms to the Semantic Web; strictly speaking, the information is presented in the RDF format.

The DBpedia datasets comprise information on numerous subjects, from geographical locations and well-known people such as athletes or artists to various organisations including companies, educational institutions and sports teams[13]. Using the dataset, queries like “People who were born in Berlin before 1900” [19] can be answered. The DBpedia project also follows the Linked Data principles [20]: all concepts are identified using URI references, which allows interlinking with other datasets like DBLP[14] or Geonames[15].

DBLP is a bibliographic database centred on computer science which collects information about conferences, authors and their publications, papers and journals, among others. The DBLP D2R Server provides all this information in a Semantic Web format and makes the data accessible via an endpoint.

3.2.4 The Social Semantic Web

In recent years Semantic Web technologies have been used more frequently to represent Social Data provided by Social Web applications. These technologies make it possible to interlink different datasets from various sources with each other, thereby helping the data portability movement to create the Web of Data [15]. The resulting Social Semantic Web data is represented in a reusable, machine-readable and non-application-specific standard format. The data becomes more accessible and can easily be integrated into other applications and exchanged among them.[16]

Semantically interlinked Social data needs representation mechanisms to model specific social information: the user, his social network and the user’s generated content. The FOAF vocabulary is used to describe people and their connections to each other. FOAF (Friend of a Friend) provides a standard format based on RDF which makes it possible to state personal information, like name and e-mail address, as well as information about the people a person knows using the foaf:knows property. It is also possible to include links to other people’s FOAF files. [21] The SIOC project[17] was created to represent social activities in online communities, also called social media contributions [22], such as blog posts, wiki pages, bookmarks and comments.

All this user-generated content is modelled by the Semantically-Interlinked Online Communities (SIOC) in an open standard format based on RDF. Both FOAF and SIOC contribute to data portability, which has the goal to have all data connected to one person modelled in a single global graph of RDF data which can be reused across interoperable applications. Whereby the data is collected from various applications, where the person has user profiles.
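The idea of a single graph assembled from several applications can be illustrated with a minimal Python sketch. It is not part of the thesis implementation: the example.org URIs, the two source sets and the helper names merge_graphs and known_by are assumptions made for illustration; only the FOAF namespace and the foaf:knows and foaf:name properties come from the FOAF vocabulary.

```python
# FOAF property URIs (from the FOAF vocabulary).
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical RDF-style triples (subject, predicate, object) exported by two
# different Social Web applications about the same person.
site_a = {
    ("http://example.org/bob#me", FOAF_NAME, "Bob"),
    ("http://example.org/bob#me", FOAF_KNOWS, "http://example.org/tom#me"),
}
site_b = {
    ("http://example.org/bob#me", FOAF_KNOWS, "http://example.org/alice#me"),
    ("http://example.org/alice#me", FOAF_NAME, "Alice"),
}


def merge_graphs(*graphs):
    """Union several triple sets into one graph; identical triples collapse."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def known_by(graph, person):
    """Return the sorted URIs of everyone `person` states to know via foaf:knows."""
    return sorted(o for s, p, o in graph if s == person and p == FOAF_KNOWS)


graph = merge_graphs(site_a, site_b)
print(known_by(graph, "http://example.org/bob#me"))
```

Because both applications use the same URI for Bob, his contacts from either source end up attached to the same node once the triple sets are merged.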

3.3 Privacy Policies

Policies are well-defined statements that specify and regulate a system’s behaviour under certain conditions. Typically they are declarative rules with a formal syntax so that they are machine-understandable. [23] Policies are pervasive on the Web and are used in different areas of web services. The most common are security and privacy, business rules and quality of service [24]. An example of each of these policy types is given below:

Security and privacy rule: “Disclose the personnel master data only to employers with a respective access authority”

Business rule: “A purchase which exceeds a specific value is dispatched free of charge”

Quality of service: “WEB traffic should receive at least 50% of the available band­width resources or more, when more is available” [25]

Policies can be used for access control decisions, for example to allow [deny] access to a resource when the set of conditions of the policy is fulfilled [not fulfilled]. Conditions can include, for example, verifications concerning properties provided by the requester, who in turn has to prove that he is authorized to access the requested resource. This type of policy is called a privacy policy. An example policy of Bob from the motivating scenario can look like the following:

“Give access to presentation slides only to people who speak German.”

The request for access triggers an evaluation process of the policies which are responsible for the requested resource. If the requester fulfils the conditions stated in the policies, access to the resource is granted. For the example policy above the evaluation process is more complex, as it needs to extract information about the requester from external sources (in this case from another Social Web application) in order to establish whether or not the requester speaks German; more on this in the later sections.
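The evaluation step for Bob’s example policy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism developed later in the thesis: EXTERNAL_PROFILES stands in for another Social Web application, and the names lookup_languages and may_access_slides are hypothetical.

```python
from typing import Callable, Set

# Hypothetical stand-in for an external Social Web application that knows
# which languages a user speaks. A real system would query a remote source.
EXTERNAL_PROFILES = {
    "alice": {"German", "English"},
    "tom": {"English"},
}


def lookup_languages(requester: str) -> Set[str]:
    """Return the languages recorded for the requester (empty set if unknown)."""
    return EXTERNAL_PROFILES.get(requester, set())


def may_access_slides(
    requester: str,
    lookup: Callable[[str], Set[str]] = lookup_languages,
) -> bool:
    """Bob's policy: give access only to people who speak German."""
    return "German" in lookup(requester)


print(may_access_slides("alice"))  # requester with German in the profile
print(may_access_slides("tom"))    # requester without German in the profile
```

Passing the lookup as a parameter keeps the policy condition separate from the information source, so a different source can be plugged in without touching the policy itself. A requester about whom nothing is known is denied access in this sketch.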

3.3.1 Policy Languages

Policy languages are languages for expressing policies; they define the syntax of policies. This thesis concentrates on languages with a formal syntax only. Such languages have many positive characteristics, listed below, that prove beneficial in today’s complex and dynamic web-related systems:

- a policy language is dynamic and extensible so that it can adapt to changes in the environment and in the requirements for the behaviour of a system,
- a language is declarative, as policies are only in charge of specifying requirements; they do not explain how these requirements are to be fulfilled,
- the language’s semantics is well-defined in the sense that policies are not ambiguous and are understood by different systems in the same way,
- languages are typically logic-based in order to provide such well-defined semantics,
- languages can be reasoned over, meaning that new information can be inferred from the explicitly available statements, thus enhancing the knowledge,
- different types of policies (security, business, etc.) should be expressible in the language,
- as many scenarios are nowadays Semantic Web based, which requires reasoning and an ontology-based policy specification for interoperability [23], policy languages also need to be expressive enough and use Semantic Web techniques in order to be exchanged between peers on the Semantic Web. [26]

For a more detailed analysis of these characteristics see [27].

3.3.2 Policy Frameworks

Policy frameworks like KAoS [28], Rei [29] or Protune [30] automate the evaluation and decision process: they check whether a policy is satisfied and base the decision on the conclusions drawn from the privacy policies. These frameworks use Semantic Web languages with the characteristics described above.

A policy framework communicates with the environment it is applied to and is invoked when an event happens, such as someone requesting access to a resource protected by a policy. In this case the framework automatically enforces the appropriate policies, asks the other party for the required identification properties (such as an ID, or a credit card in the case of business transactions) and performs any actions needed for the policy evaluation, such as checking the validity of the received credential. Once the reasoning over the policies is finished and a decision is made, the framework, in the case of an access request, either allows or denies access according to the outcome of the decision process.

3.3.3 The Protune Framework

Protune (PRovisional TrUst NEgotiation) is a framework with the goal to “combin[e] distributed trust management policies with provisional-style business rules and access control-related actions. Protune features an advanced policy language for policy-driven negotiation and supports distributed credentials management and flexible policy protection mechanisms.” [7].

The Protune policy language is based on logic programming and therefore uses a similar notation for its policies. An example Protune policy for access control looks like the following[18]:

illustration not visible in this excerpt

The left-hand side of the policy rule is called the head and the right-hand side the body of the rule. Expressed in natural language, the policy states that if a requester, in this case represented by the variable User[19], is a friend of the resource owner and the resource, represented by the variable File, is a research paper, then the requester can access the resource. Adding facts like “isFriend(‘Tom’).” and “researchPaper(‘SemanticWeb.pdf’).” will allow Tom to access the file SemanticWeb.pdf.

In general a variable like User can refer to any resource which can be found in the set of given facts. Once a matching fact is found, the variable is replaced by a constant like ‘Tom’ through unification. This makes it possible to extend the knowledge base without having to add a new policy; for example, in case Bob gets a new friend, all he has to do is add the fact “isFriend(‘Alice’).”, thus allowing Alice to access the file SemanticWeb.pdf as well. Additionally, the meaning of a condition such as isFriend(User) can be further defined with rules like

illustration not visible in this excerpt

The evaluation of a policy such as Policy (1) means that the policy is queried with the goal allow(action[20]), which requires the conditions in the body to be evaluated; both predicates in the body need to be fulfilled, as the operator between them represents a conjunction in the Protune language. For each of the predicates in the body of the policy all applicable rules are checked until either the evaluation is successful, returning true for all predicates, whereupon the action access() is executed, or the evaluation is unsuccessful, returning false for one of the predicates, whereupon access is denied.

In summary, the head of a rule holds, that is, the action is allowed to be performed, if all conditions condition_i (predicates) in the body hold, and condition_i holds if all conditions condition_ij hold. [23]
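The evaluation just described can be mimicked with a small Python sketch. It is not the Protune engine and does not use Protune syntax; it only mirrors the logic of the example rule, assuming that the body consists of the two conditions isFriend(User) and researchPaper(File) named in the text. The function name allow_access and the pair-based fact encoding are assumptions made for illustration.

```python
# Knowledge base: facts written as (predicate, argument) pairs,
# mirroring isFriend('Tom') and researchPaper('SemanticWeb.pdf').
facts = {
    ("isFriend", "Tom"),
    ("researchPaper", "SemanticWeb.pdf"),
}

# Body of the example rule: every condition must hold (conjunction).
# Each entry names a predicate and the rule variable it is applied to.
RULE_BODY = (
    ("isFriend", "User"),
    ("researchPaper", "File"),
)


def allow_access(user, file, kb):
    """Bind the rule variables to constants and check every body condition."""
    binding = {"User": user, "File": file}
    return all((predicate, binding[var]) in kb for predicate, var in RULE_BODY)


print(allow_access("Tom", "SemanticWeb.pdf", facts))    # both conditions hold
print(allow_access("Alice", "SemanticWeb.pdf", facts))  # isFriend fails

# Extending the knowledge base with one fact is enough to admit Alice;
# the rule itself stays unchanged.
facts.add(("isFriend", "Alice"))
print(allow_access("Alice", "SemanticWeb.pdf", facts))
```

The sketch binds variables directly instead of performing general unification, which is sufficient here because the request already supplies both constants.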

illustration not visible in this excerpt

Figure 1: Example Policy including rules and metarules.

To explain further features of the Protune language an excerpt of a policy is presented in Figure 1, following the example displayed in [31].

Provisional and Logical Predicates A provisional predicate represents actions in Protune. An action can be specified either in the head of a rule, like access(resource) (1.1) in rule (1), see Figure 1, or in the body of the rule, as is the case with declaration(Uid, Pwd) (1.2) in Figure 1, which is an action as well. Actions may need to be executed during the decision process, which would be the case in a negotiation process where one peer may ask the other peer to send his credentials in order to gain access to a resource. [32] Generally speaking, action_1 can be executed only if action_2 has been executed. [31]

illustration not visible in this excerpt

Besides provisional predicates there are also logical predicates like password(Uid, Pwd) (1.3) in the body of rule (1), Figure 1.

Metarules The Policy in Figure 1 contains different types of rules; rule (1) contains actions to be executed and rules (2) and (3) are facts. The remaining rules are metarules. Metarules are used in Protune, among other reasons, to differentiate between a logical predicate and an action. In general, metarules describe the properties of other predicates.

For example the rule “access(_) → type : provisional” (4) is a metarule stating that access() is associated with an action. Another metarule is the one including sensitivity:public (10) which means that the predicate password() is not disclosed to the other actor as it is declared private.

It is beyond the scope of this thesis to describe all constructs which are provided by Protune. More details can be found in [31]. Furthermore, in the course of this thesis metarules will be left out, concentrating only on the rules which are needed for the requester to get access to a resource.

4 The Social Web from a Privacy Perspective

The following section deals with uncontrolled information disclosure and the consequences people have to live with due to major shortcomings in today’s Social Web applications.[21] The privacy preferences of three selected applications are presented afterwards, followed by an analysis of the shortcomings of Social Web applications, with a specific focus on the applications presented, together with proposals for additional functionality that would be beneficial.

4.1 Data Disclosure - Why Better Control is Needed

Because of the open nature of today’s Social Web applications, users are willing to expose a lot of private and sensitive information, which otherwise only a limited number of people would know, to an invisible audience. Once the information is online, it is difficult to oversee the data flow and usage, as information is often broadcast to other users, linked to other content, indexed by search engines or collected by third-party applications. Furthermore, the new possibilities to communicate and share information not only raise privacy issues; the amount of undesired messages and updates from others also leads to information overload.

All things considered, two categories are especially noteworthy in which fine-grained control of information disclosure can be of benefit:

- Security/privacy protection and access control
- Information overload

Within the second category a distinction needs to be made between information consumers who actively seek out information and consumers who passively receive information.

4.1.1 Privacy Issues

The first category, access control, is the more important one, because weak protection can lead to severe consequences not only online but also in the real world. Just as the Social Web is multifaceted, so are the types of privacy threats arising on Social Web applications. Some of the most common threats are listed below; for further reading see [33]:

Social Network Spam Automated friend invitations, if accepted, give the spammers access to private data like e-mail addresses, which are then used for personalised marketing and for advertising products tailored to the profile information. [34]

Viruses and Worms Similar to the problems with e-mails, worms are now also targeting Social Web applications, as was the case with MySpace and Facebook. [35] Worms use the friends list of a victim to send clones of themselves to the other profiles, exploiting the established trust between friends on such platforms.

Identity Theft An identity theft is launched by cloning a profile of a user and send­ing friend requests to the contacts extracted from the original profile. Being friends with the contacts gives the attacker access to any personal information, which then can also lead to spamming or other misuse of the information. [34]

Stalking, Bullying Stalking on the web means that a victim is pursued not physically but by means of communication techniques such as instant messaging or posting on the profile. Bullying, in turn, means harassing the victim by means of the same communication techniques.

All the above-mentioned threats are based on easily accessible private data and the ease of befriending someone, as the tendency to accept any friend request in Social Web applications is very high, regardless of whether the ‘new’ friend is known or trusted by the user accepting the request [36].

Nevertheless, a more dominant privacy issue concerning personal data is caused by authorities and by people belonging to the social environment of a user. The former primarily consists of job recruiters, employers, police or members of educational establishments, among other authorities. The latter includes colleagues, friends and family.

Context of Data and Privacy Issues Privacy issues arise with both groups stated above because of the vast amount of content provided by the users as well as their personal but open social networks with links to other people on the platform. In addition to the personal data, information which is read or seen out of context, data revealed through linkage such as tagging photos with a profile, and any other activity on a Social Web application such as taking surveys or joining groups create information that reflects on the image of the user and can be wrongly interpreted. Say a student publishes a photo of himself drinking on his birthday. The other people who attended the party know that this is not a regular occurrence for the student, but when the family or, even worse, (future) employers see the photo without knowing the context, they might wonder whether this is normal behaviour for the student in question. Such problems are increasing as the separation of the different areas of life (family, friends, colleagues) dissolves in the online social space, although offline they are mostly kept separate.

Network Relationships and Privacy Issues Social network relations can cause problems insofar as befriending a person adds a link to that person’s profile to the friends list of the user, which is public in most cases. Say Bob added Tom as a friend; thus Tom’s profile becomes part of Bob’s web presence. Any embarrassing photos, inappropriate statements and comments about sensitive topics which Tom posts on Bob’s profile space can cause Bob problems without him actually doing anything. A possible situation: Bob is looking for a job and gets a post on his profile from Tom telling him to be punctual for their next meeting, as Tom knows about Bob’s tendency to always be late. Such a comment would not go down well with job recruiters, who would want to check Bob’s profile before inviting him to a job interview. Furthermore, hiring decisions about prospective employees can be based on photos the user uploads or is tagged in, and on information he provides about himself such as political views, religious affiliation, sexual orientation or, in the case of women, the desire to have children, as well as any other information which can lead to discrimination.

4.1.2 Information Overload

When speaking of information overload a differentiation needs to be made:

- Avoid social software annoyance; people passively receiving information from others and
- personalisation of information supply; information consumers who actively seek out information by visiting the profile of someone.

Social Software Annoyance [37] encompasses all the information which a person receives via the various techniques that are by now used by most Social Web users. These techniques include chat messages, profile space comments like Wall posts on Facebook, blog posts, private messages and posts within a group, among many others. On top of that, the vast number of friends on Social Web networks leads to information overload, which makes it harder for a user to find the information he is actually interested in. Additionally, messages often arrive at an inappropriate time. Say Bob is chatting with his colleague about a project they are working on and he gets chat requests from friends all the time, distracting him from the actual topic.

Another example would be the News Feed[22] feature of Facebook; by having many active Facebook friends, who are mostly only acquaintances, it is difficult to find the updates of your close friends and family in the flood of information on one’s main profile page.

Personalisation The visitors of a profile, homepage or blog face a similar problem. They suffer from the same information overload, as they are mostly interested in only a specific type of information. Filtering the data according to their preferences could save people a lot of time. For example, a colleague of Bob might not be interested in Bob’s posts about his hobbies but would rather prefer updates on the progress of their joint project, whereas Bob’s friends would get bored having to read through all of Bob’s updates about the progress of the project and new ideas for the next research topic until they find Bob’s comment on a hobby they share.

4.2 Privacy Protection on the Social Web - a State of Art

This section reviews some of the most prominent Social Web applications in detail with regard to the functionality they offer to protect personal data against privacy violations. Facebook is one of the biggest social networking sites, whereas Flickr is a popular representative of the content sharing sites and Twitter is “one of the fastest-growing phenomena on the Internet” according to the New York Times [38]. Twitter belongs to the type of Social Web applications with a major focus on micro-blogging[23], which is a relatively new form of web-based communication platform but no less popular than the other applications.

4.2.1 Twitter and its Privacy Options

Twitter is a micro-blogging service where users can publish short text messages called updates or tweets and can also subscribe to messages from other users whose updates they are interested in. These messages as well as all replies to the updates can be seen in real time on the front page, sorted like a personal timeline.

Twitter enables people “to follow” other users, whose updates will be displayed on the aforementioned timeline; in other words, a user can subscribe to all the messages of interest. Following someone corresponds to what other networks call adding someone as a friend. The relationship is one-way: a user can follow a specific person and get that person’s updates on his own homepage without that person following the user back, which makes the links of the Twitter network directed, unlike those of most other networks.

Analogously, other people can keep track of a user’s own updates by becoming what is called a follower on Twitter, in order to receive his updates on their own timeline as soon as he posts them to his own account. A list of all the followers and followees of a user is provided on the user’s homepage, with links to the profiles of each user mentioned.

Managing a Twitter Profile and its Privacy

A profile on Twitter has a much more limited provision of personal data in com­parison to other social networks, allowing users to give only little information about themselves. When a user signs up for Twitter, his profile is public by default, which means that all the information he has filled in for the profile, including the real name (which is optional as the screen name is used to identify a person), location, short personal description or any links to external homepages and blogs will be displayed for everybody to see.

Furthermore, all of a user’s ‘tweets’ can be followed by any other Twitter member without the user’s consent, and they are published on a public timeline of recent updates. Moreover, people can search not only for a username but also for a real name or any other information given in a profile with the help of the advanced search. It is important to bear in mind that Twitter is not a service closed off from the rest of the web; in other words, public updates and profiles are disclosed not only to non-members but also to search robots which index information for external search engines like Google.

To keep personal information private, Twitter offers the option to protect a profile, making the updates visible only to approved users. With a protected account the owner can decide who is allowed to see his profile and receive his updates by approving the follow requests of each user. Although the privacy setting hides all the updates from the profile page, all the other information about the user, for example his photo, real name and location, stays visible. Unapproved users are still able to see whom the owner of the private profile is following and all of his followers as well. Protecting a profile also means that the Twitter updates posted after the protection are excluded not only from the public timeline but also from the Twitter search and external search engines. However, updates posted before the protection need to be deleted manually. After changing the public account to private, one can remove existing followers, who otherwise are not affected by the change of privacy settings.

If a user needs to exclude a particular person from getting his updates without changing the privacy settings to private, he can block that person. The blocked person will then neither be able to follow him nor be able to send a follow request, although the blocked user will still see the profile page and the updates and can even reply to the updates in case the account is kept public.

Linking Users to Updates Twitter offers the possibility to mention someone else in one’s own updates by adding ‘@username’ to the update. Moreover, whenever a user is mentioned, a link to his profile is added, regardless of the relationship between the users or the privacy settings, so that anybody reading those updates is able to access the mentioned user’s profile via the link even if the account is private. Additionally, unless the profile is protected this feature is public and searchable. The so-called ‘mentions’, a distinctive feature of Twitter, are widely used among Twitter members, which means a profile can be linked to any context, whether or not the profile owner wants to be connected to it, thereby revealing a flaw in the privacy protection.

In summary, if a profile is public or a user is a follower, he can see updates, reply to them and follow them. If the profile is private and the user is not a follower, he cannot see the profile’s updates and thus cannot reply to them either. To follow the profile the user needs an approval. All the other information provided by the user is seen by any other user of Twitter, regardless of whether the account is private or public.
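The rules summarised above can be written down as a small Python sketch. It only restates the behaviour described in this section and has nothing to do with Twitter’s actual implementation or API; the Profile class and all function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical model of a Twitter account as described in this section."""
    name: str
    protected: bool = False
    approved_followers: set = field(default_factory=set)
    blocked: set = field(default_factory=set)


def can_see_updates(profile: Profile, viewer: str) -> bool:
    """Updates are visible if the profile is public or the viewer is approved."""
    return (not profile.protected) or viewer in profile.approved_followers


def can_reply(profile: Profile, viewer: str) -> bool:
    """Only users who can see the updates can reply to them."""
    return can_see_updates(profile, viewer)


def can_follow_without_approval(profile: Profile, viewer: str) -> bool:
    """Public profiles can be followed freely, except by blocked users."""
    return (not profile.protected) and viewer not in profile.blocked


def can_see_profile_info(profile: Profile, viewer: str) -> bool:
    """Name, photo and location stay visible regardless of the setting."""
    return True


bob = Profile("bob", protected=True, approved_followers={"alice"})
print(can_see_updates(bob, "alice"), can_see_updates(bob, "tom"))
```

The sketch makes the gap discussed above explicit: can_see_profile_info never consults the privacy setting, and a blocked user of a public profile still passes can_see_updates.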

4.2.2 Facebook and its Privacy Options

General Functionalities on Facebook

Facebook is one of the most popular social networking websites that allows people to create profiles and build social communities. The core of each social networking service is the ability to create and maintain a personal profile, which on Facebook consists of several information sections that can contain all kinds of personal data about a person. These sections are called basic, personal and contact information as well as education and work. In each one a user can disclose as much about himself as he likes, ranging from information such as birthday, hometown, relationship status, sexual orientation, political and religious views and interests, via high school, university and workplace, through to e-mail addresses, postal addresses and phone numbers. Besides all the information specified by the user, the profile also displays a list of friends, uploaded photos and videos and any activity during the time the user spends on Facebook, such as newly made friendships, groups joined, applications added, notes written by the user or the user’s friends and so on.

illustration not visible in this excerpt

Figure 2: An example for dividing friends into categories

One of the most essential features, which is key to building one’s own social network, is the ability to add others as friends. As on most social platforms, relationships are bidirectional, meaning a user has to send a request and wait for confirmation to become a friend of someone. To separate different types of relationships, such as family, friends and colleagues, or an even finer subdivision such as friends from school and friends from the sports team, Facebook offers the option to create friend lists according to a user’s preferred partition; see Figure 2 for an example. The purpose of those lists is not only to help categorise one’s own connections but also to provide flexibility in adjusting the profile privacy, of which more later.

Besides adding people to the friends list, one can also mention family members in the profile. If those members are registered users of Facebook, a link to their profile will be created once they confirm the relationship.

A distinctive feature of Facebook is that the platform consists of many different networks which are based on areas of life, such as school, college, workplace or a geographic location. Joining a network means a user will be able to see the profiles not only of his friends but of all members of the network unless their privacy settings forbid it. One can easily join a regional network, but to join a workplace, school or college network one needs a valid e-mail address provided by the respective establishment to prove the affiliation.

The many built-in applications offered to users also play an important part on the Facebook network. Photos, videos, groups and notes are just a few of them; besides those, which are set up for a profile by default, there are many more applications that can be individually added and displayed on the profile, or deleted if they are not required anymore. Some of the more dominant applications, including the “Wall” and the “News Feed”, will be analysed in more detail; because of their widespread usage they have a significant impact on the privacy of personal data and can pose a big threat to privacy if used wrongly.

Privacy Control of Profile Data

Facebook offers a wide range of possibilities for adjusting the visibility of the content of a profile, with several restriction levels. By default, a user’s profile content is visible only to his network members and friends. If this level of privacy is not restrictive enough, he can forbid access to anyone who is not a friend to get the most private profile possible. Conversely, he can lift the privacy settings in order to make the profile public for everybody on the Internet to see. The visibility of the following sections of a profile can be adjusted on an individual basis:

- Profile in general or the individual sections mentioned above
- Status Updates (posts a user publishes on his Wall)
- Photos/videos in which the user is tagged
- A friends list
- Wall Posts (concerning posts on the Wall made by friends)
- Contact Information; the privacy settings can be applied to each data in the contact section on an individual basis, such as mobile phone, email address and others.

Figure 3 shows the possible settings for “Wall posts” as an example. The settings for the other sections listed above have the same structure except for the restriction level “Only me”, which is an option solely for the Wall posts and the contact information. The contact information is very sensitive data, which a user may want to upload for his business contacts or close friends to see but hide from everyone else; Facebook has therefore also disabled the “Everyone” restriction level for it as a precaution.

For more granular control one can either restrict the visibility of certain information to specific people or deny visibility to some people, in both cases either by naming them individually or by using one of the aforementioned friend lists which includes all users in question. A typical example would be to restrict photos of a family gathering to the family list only and hide them from the other contacts.
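The combination of individually named people and friend lists can be sketched in Python as follows. The sketch is an illustration only and does not reflect how Facebook implements these settings; the function visible_to is hypothetical, and two assumptions are made: a denial takes precedence over a permission, and list names never collide with user names.

```python
def visible_to(viewer, allowed, denied, friend_lists):
    """Decide whether `viewer` may see an item.

    `allowed` and `denied` may contain individual user names or names of
    friend lists; `friend_lists` maps each list name to its members.
    Assumption: a denial overrides a permission.
    """

    def expand(entries):
        # Replace every friend-list name by its members; an entry that is
        # not a list name is treated as an individual user.
        people = set()
        for entry in entries:
            people |= friend_lists.get(entry, {entry})
        return people

    return viewer in expand(allowed) and viewer not in expand(denied)


friend_lists = {"family": {"mum", "dad"}, "colleagues": {"tom"}}

# Photos of a family gathering: restricted to the family list.
print(visible_to("mum", ["family"], [], friend_lists))
print(visible_to("tom", ["family"], [], friend_lists))
```

Expanding list names at decision time means that adding a person to the family list later automatically extends the visibility of everything restricted to that list.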

As the friends section and the networks section work independently from one another, both have to be considered while customizing one’s profile and any other features one wants to close off from public viewing. Furthermore a user can decide whether a section appears on his profile, listing all the networks he has joined and the networks his friends belong to.

In addition to the settings described above, a user can opt to change the visibility of each application on his profile and customize the search privacy, which is needed in addition to the privacy settings; for example, if a user wants to hide his friends list from anybody who is not a friend, he needs to adjust the search settings as well, because otherwise the friends list would still appear in search results. As with most social networks, Facebook offers the ability to block another user, meaning not only that the friendship is broken off but also that the blocked person will neither be able to see the profile nor find it in search results.

illustration not visible in this excerpt

Figure 3: Defining on Facebook who can see posts on the Wall made by friends

Privacy of the Search Functionality

As already mentioned, in addition to controlling the visibility of a profile, a user can also adjust his search privacy; to be precise, who can search for him and what they see about him in the search results. By default everybody on Facebook is able to find a profile. This search result visibility can be restricted to specific types of people only: people in company, school, college or regional networks, and friends and/or friends of friends. There are many reasons why one would not want to be found by specific people, for example to prevent the awkward situation of turning down friend requests from people with whom one would not like to share private information, like one’s own employer.

Once the profile has been found by search, people who are not allowed to see it due to the privacy settings can see a limited version of that profile, which can be customized as well; namely, the profile picture, the friends list, the pages one is a fan of,[24] the option to send a message and the option to add the user as a friend can each be either permitted or not. If all options are disallowed, people can only see the name and the networks.

If not disallowed by the user, a public search listing is created as well, to allow people outside Facebook to search for the user and search engines like Google to index the listing, provided that everyone on Facebook is allowed to search for the user. This listing corresponds to the result view of the profile which was established for the Facebook internal search, meaning that people can also see a user’s Facebook friends, if he has allowed that option, and reach the friends’ own public listings.

illustration not visible in this excerpt

Figure 4: An example notification displayed on a profile homepage, when adding an application.

The Wall and its Privacy Possibilities

The afore-mentioned Wall-applicai ion is part of the profile where a user can publish photos, videos, and links to external websites or textual posts. A user can also allow friends to comment on his updates and send their own updates of any kind to his profile. To have a better control of the Wall a user can delete any entry made on his profile, regardless of who the originator is. Furthermore all the Faeebook activities like using applications (for example participating in surveys or games), befriending other users, joining any groups or networks or any other modifications made about the profile information will be posted on the Wall. Figure 4 shows a possible notification about becoming fan of a page, which can cause real-life problems, when seen by the wrong people.

Facebook offers several restriction levels which give the user the possibility to control the access to his Wall, as seen in Figure 3. This is a crucial privacy setting if a user uses Facebook for business and wants to avoid friends writing embarrassing comments on his Wall. By default the Wall settings correspond to the profile settings, meaning the content of the Wall is visible to anybody who can also see the profile, but one can exclude others from viewing friends’ posts on the Wall. To guard against adverse effects one can also forbid others from posting to one’s own Wall at all.

News Feed and its Privacy Possibilities

On the Facebook homepage one can see the already mentioned “News Feed” section, a feature which keeps a user up to date on recent actions of his friends: newly uploaded photos and videos, groups they joined or comments written on their Walls. All those activities by his friends are automatically posted to his homepage in real time, without him having to visit all the friends’ pages for their updates. Nevertheless, the information is only collected as long as the privacy settings of the friends do not restrict the access. Basically this means that publishing any kind of content on one’s own Wall will report these updates to the friends’ News Feeds, whereas publishing content to a friend’s Wall will display said content on the homepages of all mutual friends who are allowed to view the friend’s profile. Content which is hidden from friends will not be published on those friends’ News Feeds. If, say, a photo album is closed off from a group of friends, any comments on these photos will not be displayed on the homepages of the restricted group of friends either.

illustration not visible in this excerpt

Figure 5: An example notification displayed on a profile homepage when changing part of the profile.

Next to the News Feed there is the “Highlights” section, where a user is provided with recent happenings and events which are of relevance according to Facebook. Among other things those highlights include recently uploaded photos and tags of the friends, comments these friends made about some content, or changes of relationship status as seen in Figure 5, which can have an unpleasant outcome or lead to awkward situations.

To protect this information from being published on the friends’ homepages, a user can decide which activities he allows to be shown in the Highlights section of his friends. Information which will be posted neither to the News Feed nor to the Wall is mostly of a negative nature, namely deleting an entry or declining friendships or invitations; in addition, any visits or readings a user makes on Facebook are kept anonymous.

The only activities which can be individually disallowed from being posted to the Wall and thereby to the News Feed are:

- befriending others
- deleting any profile information
- writing on discussion boards in groups.

A user can also prevent posts he made to someone’s Wall from being published to any News Feeds of their mutual friends. Additionally one can disallow any application, such as groups, photos, events, etc. from publishing news to the Wall and thereby to the friends’ News Feed, such as notifications about newly uploaded photos posted to the Wall.

Facebook’s Applications

In general each application can be individually closed off from public viewing or added to the profile for others to see. As there are many applications whose posted notifications can not only be embarrassing but can also cause severe problems, such as losing a job if seen by professional contacts, it is important to review the profile after using an application. Figure 4 shows an example which is quite self-explanatory.

The restriction settings for applications’ visibility include the following levels of privacy: “Everyone”, “My Networks and Friends”, “Friends of Friends”, “Only Friends”, and “Only Me”. For a finer granularity of the settings one can exclude particular people from viewing the content as well. In addition a user can decide if people belonging to some or none of his networks are allowed to view the application in question; the settings are similar to the ones seen in Figure 3. Facebook sends a notification by email as soon as an action is made concerning the user such as

- mentioning a user in a post,
- posting on a user’s Wall,
- tagging a user in a photo or a video, or writing a comment on the tagged photo,
- tagging or commenting on one of the user’s photos, videos or an update on his Wall.

Pictures and Privacy

On Facebook one can create photo albums with visibility restrictions which can be adjusted for each album individually or for the photos application as a whole. Once photos are uploaded, people can be tagged in them to identify them on the photos; tagging oneself or any of one’s friends on Facebook will make those photos appear in the photo section of the respective profiles, creating a link between image and profile. This photo section, which is published in a profile in addition to the user’s own albums, contains all photos a user has been tagged in. Friends will be able to see a photo in which the user has been tagged on the user’s homepage only when the publisher’s settings for that photo allow it. Moreover, Facebook notifies all of the friends of the tagging process together with the tagged photo by displaying the activity on their News Feeds.

Tagging Photos can be tagged not only by their owner but by any other person who is able to see them. This functionality is the cause of many privacy issues which can have grievous consequences, as already described in the beginning of this section. The privacy settings apply only to a user’s own albums, so a user cannot intervene in the visibility restrictions of photos belonging to someone else’s album. The same goes for tagged photos, which abide by the privacy settings of their respective albums. Therefore, to protect one’s privacy, Facebook allows a user to remove not only any tags on his own uploaded photos but also any tags of his person made by other people, regardless of the ownership of the photos involved, so that the link from the photo to his profile is deleted. If a user does not want to delete any tags but still wants to take precautions, he can simply change the visibility of the photos he is tagged in, so that only some specific friends or nobody but himself can see them. In this case the visibility of the photo albums should be adjusted as well, because hiding one’s own tagged photos will not hide them if the album is marked as public. Once a user has tagged someone else in a photo which was not published by him, he will not be able to undo the link.

So if a user disallowed certain people from seeing his tagged photos, photos will not be published on their homepages once tagged to said user. Besides uploading photos in the albums a user can post photos to his Wall or to his friends’ Wall and comment on the friends’ photos if he has permission to do so as well as on photos tagged to him. Those comments will appear next to the image for other people to read.

All in all, one can adjust the privacy settings of photos in which a user is tagged or which were uploaded by him. Once a user has been tagged by another user, he gets notified by Facebook so that he can decide whether he wants to keep that link or not. Since its functionality is similar, the video application will not be discussed in this thesis; all its feature settings are equivalent to the ones for the photo application.

Publishing Events

The events application is another feature provided by Facebook which is not quite common for social networks. For an upcoming event a page can be created on Facebook which describes the event, its location and time. An event administrator can invite any of his friends, who will then be added to the guest list. Depending on the privacy settings of the event, which are the same as for the group application described below, people without invitations may see the contents as well if the event is not closed off. Once a user has been added to the guest list, his friends can see his participation on their homepages. In order to remove the connection to an event a user can simply remove himself from the guest list.

Groups and Privacy

Another feature similar to the events application is the group application, which allows people with similar interests to share content on specific topics, for example by participating in discussions or publishing photos or videos. When a group is created, it is either set to be “global” or restricted to a specific network. Moreover, a group can have one of three privacy levels. An “open” group with a global setting allows everybody on Facebook to see the content and join the group, whereas an open network-specific group can be viewed only by the members of the same network. A “closed” group hides its content from non-members and can be joined only by invitation or by an accepted request. Lastly, the most restricted level is a “secret” group: the only way to join the group is by invitation, and above all the group appears neither in search results nor on the members’ profiles. Members can leave a group at any time, which removes the group name from the list of groups on their profiles.

An important annotation is that the privacy settings of an account dominate over the group preferences, which implies that when joining a group, the profile information will not be exposed to its members who would otherwise not be allowed to view that information.
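The group rules described above can be sketched in a few lines of Python. This is only an illustrative model of the behaviour as described in this section, not Facebook’s actual implementation; the names `GroupPrivacy`, `Group`, `User`, `can_view_content` and `appears_in_search` are hypothetical and introduced solely for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class GroupPrivacy(Enum):
    OPEN = "open"
    CLOSED = "closed"
    SECRET = "secret"


@dataclass
class Group:
    name: str
    privacy: GroupPrivacy
    network: Optional[str] = None  # None models a "global" group (assumption)
    members: Set[str] = field(default_factory=set)


@dataclass
class User:
    name: str
    networks: Set[str] = field(default_factory=set)


def can_view_content(user: User, group: Group) -> bool:
    """Members always see the content; non-members only in open groups."""
    if user.name in group.members:
        return True
    if group.privacy is not GroupPrivacy.OPEN:
        return False
    # Open group: a global group is visible to everybody, a network-specific
    # group only to users who belong to the same network.
    return group.network is None or group.network in user.networks


def appears_in_search(group: Group) -> bool:
    """Only secret groups are hidden from search results and profiles."""
    return group.privacy is not GroupPrivacy.SECRET
```

In this sketch a closed group still appears in search results, while its content stays hidden from non-members, which mirrors the distinction between the closed and the secret level made above.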

Further Features on Facebook

There are many more features offered by Facebook which are not analysed in this thesis because of the wide range of possibilities, but which also have to be taken into account when talking about privacy. Those features include the chat functionality, using a mobile phone to communicate with Facebook, the notes application where one can either write a note or import an external blog, the link application that allows posting links to external websites on one’s profile, and the Beacon advertising system, among many others.

4.2.3 Flickr and its Privacy Options

The idea of Flickr is to provide a video and photo sharing website whilst at the same time facilitate social activities by maintaining many features similar to social networking sites, meaning members can connect to other people by building their own social networks. But the main focus still lies on publishing and discussing photos.

Nevertheless, this analysis concentrates more on the social networking aspect of the web application. Flickr provides functionalities that enable members to create their own photo albums for their own usage or to share them with other people. Similar to social communities, a user can add someone to his contact list, allowing them access to more private images. Uploaded photos can be annotated with descriptive keywords to ease the search for relevant images, commented on, marked as favourites and posted to groups. In these so-called ‘group pools’, photos which are related to a particular topic are shared and discussed. Similar features are provided for the video sharing service, but because it is of minor relevance to the Flickr community compared to the photo service, it will not be dealt with further.

Managing a Flickr Profile and its Privacy

Profile By creating a profile on Flickr, a user fills in certain information, which contains

- the real name, gender, relationship status,
- a personal description,
- personal homepage or any instant messaging numbers,
- occupation, hometown, current residence,
- interests and other information.

Although this information is optional, once it is posted it is public for everybody to see. An important characteristic of Flickr is that the real name is optional as well, because a Flickr user is identified by a distinct screen name. If the real name is specified, it will appear on the profile and in search results unless stated otherwise.

Contact List A user can add any Flickr member to his ‘Contact List’ and remove him again without any consent on that person’s part. Once designated as a contact, the Flickr member will receive all the user’s latest updates along with the other contacts’ updates on his homepage, as long as the user’s settings permit it. Furthermore, a user can mark contacts as friends and/or family to create different sets of users for better privacy control of the account; he can decide, for example, to exclude friends from seeing his personal data but not his family, and vice versa.

Protection of Private Data In general, even people without a Flickr account can view and search for people, photos or groups if they are public. To protect a user’s private information, some parts of the profile can be restricted so that only a specific group of people can view them. This setting can be applied to the email address, the instant messaging name, the real name and the current city, each of which can be individually disclosed to one of several fixed groups of people. These groups are:

- anybody,
- any Flickr member,
- any contacts,
- friends and family.

Additionally, if the email address is not intended for public viewing at all, a user can hide it from everybody but himself.

Protection of the Contact List Unlike the above-mentioned personal information, the contact list cannot be hidden from public view. What is more, any profile visitor gets to see the contacts in their respective category, i.e. one can see all people belonging to the family list of a user as well as all people belonging to the friends list and the ones remaining in the contact list. Since relationships on Flickr are one-directional, as is the case on Twitter, there is a feature which allows seeing everybody who has added a user as a contact. In order to be removed from a contact list, or to prevent a user from adding oneself to such a list, said user needs to be blocked.

Photos, Groups and Other Features

Photos Photos can be uploaded on Flickr for public sharing or for private storage. A user can control the privacy level of each of his photos individually or change the default privacy settings for all newly uploaded images as a whole. If a photo is designated as private, he can decide whether it is viewable by his friends, family or only by himself. Otherwise a photo is marked public, meaning everybody, including non-members of Flickr, has access to it. Flickr provides many different features which members can apply to any image they have permission to view, such as commenting, tagging or adding notes to a photo. All these features can also be restricted to the user’s liking. The available settings include: oneself, friends and/or family, contacts and all Flickr members.

There are many other settings on Flickr which allow a user to protect his images according to his own preferences: he can allow people to download or share his photos and videos, allow others to see where a photo was taken on a world map, or allow the EXIF[25] data of his photos to be displayed next to the images.

Search Feature Flickr permits hiding the profile as well as the images from searches. One can decide whether the photos should be excluded from being found by the Flickr search or by third-party applications which can access Flickr and its data. Nevertheless, as long as someone knows the email address of a user, he can search for that user regardless of his settings. Additionally, if a user hides his profile from searches, neither the username nor the email address will appear in any search result lists. Furthermore, in such a case the link to the user’s profile will not be visible on any profiles of his contacts.

Groups Lastly, groups and their influence on privacy settings have to be mentioned. There are three types of groups: public groups which can be joined by everybody, public groups which can only be joined by invitation but can be seen by everybody, and private groups. Public groups appear on a member’s profile without the user being able to hide them. Administrators of these groups can choose to hide photo pools and discussions from non-members. Private groups can only be joined by invitation and viewed only by group members. Thus they are not displayed on profile pages of members and cannot be found by group search. Adding photos to the pool of a group, of which one has to be a member, allows any member not only to view the photos but also to add notes, tags and comments, regardless of the previous privacy settings for these photos. In other words, the settings a user made for these photos remain in effect only for non-members of the group and have no influence on group members. Additionally, if the group is private only members of the group can see the added photos, but if the group is public the photos become public as well.
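The effect of a group pool on a photo’s visibility, as described above, can be illustrated with a small Python sketch. It is a simplified model under the assumptions stated in the text (pool members always see pooled photos, and a public group makes its pooled photos public); the names `GroupPool`, `Photo` and `can_view` are hypothetical and do not correspond to the Flickr API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class GroupPool:
    is_public: bool
    members: Set[str] = field(default_factory=set)


@dataclass
class Photo:
    owner: str
    is_public: bool = False
    # People the owner explicitly allowed (e.g. his friends or family).
    allowed_viewers: Set[str] = field(default_factory=set)
    # Group pools the photo has been added to.
    pools: List[GroupPool] = field(default_factory=list)


def can_view(requester: str, photo: Photo) -> bool:
    """Combine the owner's own settings with the pools the photo is in."""
    if requester == photo.owner or photo.is_public:
        return True
    if requester in photo.allowed_viewers:
        return True
    # Pool membership overrides the owner's settings; a public group
    # exposes the photo to everybody.
    return any(pool.is_public or requester in pool.members
               for pool in photo.pools)
```

The sketch makes the privacy consequence explicit: once a private photo is added to a public group’s pool, the owner’s `allowed_viewers` setting no longer limits who can see it.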

4.3 Comparing Privacy Preferences on Social Platforms

As one can see, there is a large variety of privacy settings concerning the visibility of and access to personal content, which is a key aspect for Social Web applications to differentiate themselves from each other. In the following, the different privacy settings are analysed and compared.

4.3.1 Levels of Trust for Data Disclosure

To control the visibility and searchability of content and the access to it, the world of a user is divided into different sets of people based on different levels of trust and the established relationship towards other users. Most privacy settings apply access rules to such previously defined sets of users. The different levels of trust for disclosure are: the profile owner, ‘friends’ groups, non-friends who belong to the same network, friends of friends, logged-in users and non-members.
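To make the idea of trust levels concrete, the following Python sketch classifies a requester into one of the six levels listed above and checks him against a required level. It is a deliberate simplification: the strictly linear ordering of the levels is an assumption made for this example (on real platforms the sets are not always nested), and the names `Trust`, `trust_level` and `may_access` are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Optional, Set


class Trust(IntEnum):
    # Assumed linear ordering: a higher value means more trust.
    NON_MEMBER = 0
    LOGGED_IN = 1
    FRIEND_OF_FRIEND = 2
    SAME_NETWORK = 3
    FRIEND = 4
    OWNER = 5


def trust_level(owner: str,
                requester: Optional[str],
                friends: Dict[str, Set[str]],
                networks: Dict[str, Set[str]]) -> Trust:
    """Return the most trusted level that applies to the requester."""
    if requester is None:  # not logged in
        return Trust.NON_MEMBER
    if requester == owner:
        return Trust.OWNER
    owner_friends = friends.get(owner, set())
    if requester in owner_friends:
        return Trust.FRIEND
    if networks.get(owner, set()) & networks.get(requester, set()):
        return Trust.SAME_NETWORK
    if any(requester in friends.get(f, set()) for f in owner_friends):
        return Trust.FRIEND_OF_FRIEND
    return Trust.LOGGED_IN


def may_access(level: Trust, required: Trust) -> bool:
    """Access is granted if the requester is at least as trusted as required."""
    return level >= required
```

A privacy setting such as “Only Friends” then corresponds to `required=Trust.FRIEND` in this model.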

Level of Disclosure: Profile Owner

The highest restriction level is to conceal information from everybody but oneself. This is mostly the case when dealing with contact information, as it is on Facebook. On Flickr only the email address can be hidden from everyone, as it is the only mandatory information. Important for privacy preservation is the ability to disable the tagging and commenting features for everybody but the user himself, to avoid inappropriate or even reputation-damaging comments and photos. This is the case on Flickr, whereas Facebook, despite the ability to hide any photos in which the user is tagged, does not allow a user to prevent others from tagging him. The only possibility on Facebook to avoid unpleasant tags is to hide all tagged photos from the public.

Level of Disclosure: Friends

The next level is to restrict the visibility of some content to a specific group of people, often called friends, whom the user can select on his own. The concept of ‘friends’ which is often used in this context has a different meaning than in real life. It simply implies some sort of connection to the person who is added as a friend; this connection can be a close real-life friendship or just someone whose updates are of interest to the user. Because social network friendship is a very loosely defined concept, a partition of this set of people is of utmost significance for privacy, but also to prevent information overload. Situations like having to add one’s boss to the friends list cannot be avoided, but privacy problems which can arise when different social parts of life are mixed can be evaded if one can divide the friends list into different subgroups and adjust the privacy settings accordingly.

Twitter On Twitter only binary relations are possible, that is, friend or not friend. Furthermore, as friendships do not need to be mutual or accepted, everybody on the platform can simply decide to make a user his ‘friend’ by adding him to the ‘following’ list. The only possibility to avoid this is to make the profile private, but the problem of a user’s followers reading all his updates, no matter for whom they are intended, still remains. Another possibility is to remove a follower from the list, but this can also lead to awkward situations if the removed person, such as the boss, is known in real life.

Flickr Flickr offers a more fine-grained way to deal with ‘friends’. Although friendship is also unreciprocated and no consent is needed, the friendship does not offer any privileges unless reciprocated: if, say, user Bob is added as a contact by user Tom, this does not give Tom any advantages unless Bob reciprocates the connection. The only problem that can arise when Tom adds Bob as a friend is the link to Bob’s profile that appears on Tom’s homepage. Such a link indirectly connects Tom’s content to Bob’s social identity, which can have undesired consequences in case Tom publishes inappropriate photos, for instance. Moreover, such a link cannot be removed on Bob’s part.

Once Bob adds Tom as a contact he can further distinguish the relationship by adding Tom to his friends list and/or family list, or just keep him as a contact. No further distinctions are possible, which makes it hard or even impossible to separate close friends from colleagues, close family members from distant relatives or baseball friends from art workshop friends. Nevertheless, it is still possible to create a set of photos which is designated for only family members to see.

Facebook Finally, Facebook offers one of the most flexible features for creating subgroups of friends lists among today’s Social Web applications with reciprocated relations. A user can define his own sets of people such as “Friends from High School”, “Baseball friends”, “Close Friends and Siblings” and “Family members”. It is also important that one user can belong to several such groups, like a sister who can be added to the set of “Close Friends and Siblings” as well as to the set of “Family members”. This flexibility allows adjusting the settings so that, for example, party photos are disclosed to friends and siblings but not to other family members.
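The party photo example can be expressed as a short Python sketch with overlapping friend lists. The list names are taken from the text, while the people in them and the helper `can_view` are hypothetical and serve only as an illustration of the principle, not of Facebook’s implementation.

```python
from typing import Dict, Iterable, Set

# User-defined, possibly overlapping friend lists (members are made up).
friend_lists: Dict[str, Set[str]] = {
    "Close Friends and Siblings": {"anna", "sister"},
    "Family members": {"sister", "uncle"},
}


def can_view(person: str,
             allowed_lists: Iterable[str],
             lists: Dict[str, Set[str]]) -> bool:
    """A person may view the content if he is in at least one allowed list."""
    return any(person in lists.get(name, set()) for name in allowed_lists)


# Party photos are disclosed only to close friends and siblings.
party_photo_audience = ["Close Friends and Siblings"]
```

Here the sister, who belongs to both lists, can see the party photos, whereas the uncle, who is only a family member, cannot.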

Level of Disclosure: Non-Friends on the Same Network

The next level of disclosure, concerning non-friends in the same network, is Facebook-specific and allows the partition of users by geographical location or by an establishment such as a university or workplace. Being a member of a network allows any other members to see the content if not otherwise restricted. This network feature is useful when dealing with “Social Software annoyance” and “personalisation of information supply”. For example, a user can restrict his Wall posts to the German network because the updates are written in German. Nevertheless, this feature is not detailed enough to be useful for privacy issues. Other Social Web applications do not provide such a feature.

Level of Disclosure: Friends of Friends

Another category of people is the “friends of friends” category, which is used neither on Flickr nor on Twitter for privacy settings. Facebook, on the other hand, belongs to the applications that use this level of trust for privacy decisions about disclosing content. Nevertheless, as people sometimes have over a hundred friends, and if all their friends are also taken into account, the number of indirect second-degree relationships becomes immense and uncontrollable, and this category is therefore not particularly suitable for privacy restrictions.

Level of Disclosure: Logged-in Members

The second-to-last category includes all registered and logged-in people. It allows a user who wants to keep his content public to restrict its visibility to known and registered users. On Twitter this setting is fixed by default and cannot be changed: non-members can always read public profiles. On Flickr a user can decide whether his private information as well as his photos are disclosed to non-members on the internet. The same holds for Facebook, where a user can explicitly create a public view of his profile which can be seen by non-members and indexed by search engines.

Level of Disclosure: Features not Used for Privacy

Other features like groups, which are available on most Social Web applications, or events on Facebook are not used for privacy restrictions on any Social Web application, although such distinctions based on interests or other affiliations can be a powerful technique, especially to avoid information overload but also for access control. Furthermore, there is another set of people which is not taken into account on any application. This set includes users who are not friends but with whom a user has had some sort of contact, be it chatting, tagging the user or being tagged by the user, sending a message or poking[26]. Other data could be included in the privacy settings as well, such as the location of a user, time zone, language, interests stated in the profile, or comments written on someone’s Wall or blog. All these different data which characterise people and their connections can be an important part of the access control mechanism of a Social Web application and are therefore exploited in the approach of this thesis, as described in the next section.

4.3.2 Network Features and their Protection

After analysing the different levels of trust on a Social Web application, it is also important to have a closer look at the different features offered and how they are protected from privacy violations.

Protection of the Profile

First and foremost, the profile itself is the fundamental feature, which includes all the private information a user is willing to provide. A profile on Twitter contains far less personal data than profiles on other social networks, whereas Facebook profiles offer a lot of personal and identifiable information which is rarely available on other networks. The identification of a real person on the Social Web also plays an important role, in other words the ability to find the connection between a user’s profile and the real-life person. Both Twitter and Flickr identify a user by a distinct screen name, giving the user the choice between staying anonymous and providing a real name. On Facebook, in contrast, providing a real name is not only encouraged but mandatory, because the name is used as the primary identification on the platform. Therefore there is no pseudonymity on Facebook, as is partly the case on Twitter and Flickr. Consequently, privacy plays an essential role on Facebook to protect all the personal information, which is often provided in detail. On Twitter and Flickr all the personal information remains public and cannot be hidden from specific people, with the exception of the email address on Flickr. Although these platforms have an open nature and are partly anonymous, it should be possible to hide information like geographical location, relationship status or any other information provided from public viewing if desired.

Protection of the Friends List

Another central feature on which social interactions are based is the friends list. The disclosure of friends lists is a predefined setting in all Social Web applications, and only a few of them allow hiding the list. On Twitter and Flickr such an option is not provided, but at least on Flickr only the contact list of a user is disclosed on the profile; the list of people who have added the user as a contact is intended solely for the user. On Twitter, in contrast, both lists, the ‘follower list’ and the ‘following list’, are viewable by everybody. Facebook uses a different approach: there a user can decide who is allowed to see his friends list. A more fine-grained feature, one that would allow deciding about the visibility of the subgroups a user has created or of each friend individually, is not offered. It would definitely be useful to be able to keep a certain relationship private or, for instance, to publicly display the friends group but protect the family group from others.

Protection of Implicit Information

The following features have one thing in common; that is the disclosure of information about a user or related to a user provided by others. How much control a user has over this information is discussed in the following.

Protection against Uncontrolled Tagging The photo sharing component is installed in most Social Web applications together with the feature of tagging. On Twitter neither photo sharing nor tagging is provided, but a similar technique is the mentioning of a person in the updates. Thereby any person on Twitter can add a link to another user in any context, which the mentioned user might not approve of but has no means to control. A possible way to gain control could be to install restrictions on who is allowed to mention a person, or to require prior consent. Tagging on Flickr is used in a different way than on social platforms like Facebook. Because the platform is built primarily around photos and not social networking, the tags are mostly keywords describing a photo’s motif, but they can also include people’s names. Yet such tags do not place a link to the mentioned user’s profile. On Facebook such tags are solely there for identifying the subjects in the photos. Although tagging a non-friend will not create a link to his profile, tagging a friend will. The tagged person is only notified afterwards, when deleting the tag may already be too late, because once a link on a photo is created, this activity is distributed via the News Feed feature to all friends who are allowed to see the photo. Although there is an option to hide all photos in which a user is tagged, most users would probably prefer to have some of these photos available to others; therefore a more fine-grained privacy control would be of benefit. For instance, a user should be able to define which types of people are allowed to tag him: only close friends and family but not other friends whom he barely knows, or, say, the ‘Semantic Web’ group but not his ‘Party all night’ group, which has a higher potential to post embarrassing photos of the user. But the most desirable feature is to require the subject’s prior consent before the creation of a tag. The same should apply to non-members of Facebook, who can be tagged via their email address.

Protection of the Group Feature The group feature, which is a common component in most Social Web applications, contains information about its members and can also have a negative impact on a user’s reputation, as it discloses interests, preferences and other personal information. Not only the information posted within the group, which can be modified and extended by any member, but also the group name itself can be explicit enough to cause problems, such as an ‘I hate my job’ group on Facebook. On both Flickr and Facebook there are different types of groups: groups that are public and groups that are private. Private groups, or secret groups in the case of Facebook, do not appear on the profile pages of the members, but such a restriction level is set up by the administrator of the group and cannot be changed by other members of the group. The same goes for the visibility of the content, which can only be determined by administrators. Additionally, Facebook allows suppressing the notification published on the Wall when a group is joined, but this privacy setting applies to all groups in general and cannot be changed individually for specific groups.

Protection of the Information-Distributing Feature The public messaging and distribution feature often leads to undesirable spreading of information and causes not only privacy problems but also information overload. On Flickr only recent photo uploads of a user’s contacts are published on his profile, provided the user is allowed to see them. On Twitter, where posting updates and subscribing to them is the major functionality, every post is distributed to all followers of the user. Twitter offers no filtering options such as who can see which of the user’s updates. The Wall on Facebook collects all activities and uploads of a user, who can restrict the view of the Wall to specific people. Additionally, some actions on Facebook, such as joining a group, can be prevented from being posted to the Wall. Moreover, one can decide whether the user’s friends are allowed to post anything to the user’s Wall or not. The News Feed feature, which caused a great stir when it was first launched, takes this a step further and distributes all this aggregated information to the friends of the user in the typical manner of Twitter. People have to be careful about what they post, which information in the profile they change and which built-in applications they use. If this information has not been prevented from being posted, a notification will not only appear on the Wall, where a few of the friends might see it if they decide to visit the user’s profile, but it will also be spread to all friends who are allowed to see the Wall and hence draw their attention to this notification. As for the information overload, Facebook offers filtering options for the News Feed, so that a user can filter out posts from everybody but a specific group of friends or filter according to the type of information; for example, a user can hide anything but photos. Further filtering according to attributes like a specific time or a particular topic of the content is not possible.

4.4 Summary

In summary, some Social Web applications, like Facebook, provide privacy settings with a high level of granularity, whereas other platforms provide very limited privacy options that only allow the profile to be either private or public, as is the case on Twitter. In general, Twitter is among the most open of the Social Web applications. Flickr does have more options but is still very public and open in comparison to Facebook, which offers protection mechanisms no other major competitor provides.

Nevertheless, all these privacy preferences remain a limited, fixed set of options which are only partly extendable and mostly not adjustable to the users’ needs. The same goes for the filtering options against information overload, which are very simple if provided at all. Moreover, no external sources of information can be included in the privacy decisions, as all applications confine themselves to their internal data, even though people’s web presence nowadays goes far beyond a single Social Web application. This lack of external data integration diminishes the possibility of flexible and user-adaptable privacy preferences.

5 Policy Reasoning Based on Social and Semantic Web Data

As shown in the previous sections, privacy preferences are important for the Social Web, but the settings provided by the Social Web applications are not enough to accommodate the users’ needs. Using privacy policies to preserve some sense of control over a person’s online identity, social network and content is one approach to solving today’s shortcomings in terms of privacy.

5.1 Requirements for Policies on the Social Web

As the Social Web is very complex and dynamic, so are the privacy policy needs of individuals like Bob from the motivating scenario, who may face all kinds of situations in which policies are needed. For the proposed solution of privacy control on the Social Web, the Protune framework and its corresponding logic-based policy language are used for automated policy evaluation and for a reasoning process based on the conclusions drawn from privacy policies.

The following shows that the Protune language is a suitable choice for privacy preservation that users and applications on the Social Web can adopt, because it offers the features needed to meet the requirements for policies on the Social Web.

- The Protune language has a well-defined semantics to avoid ambiguity and is dynamic to easily accommodate any changes and updates of the behaviour specification of the framework.
- The key to privacy-based access control on the Social Web is the ability to incorporate data from various information sources on the Web. Protune allows such data from external sources to be queried, included and combined in its policy specification and evaluation process.
- The data incorporated into the framework can be any kind of Social and Semantic Web data. Which data is useful for Social Web applications is analysed further in the next subsection.
- The Protune framework further offers some advanced features, which are particularly important for the Social Web, to ease the policy-based access control; these features include:

Advanced Policy Explanation Mechanisms where policy and system decisions are explained, especially if access has been denied. For better usability it is preferable to explain why access was denied instead of just stating “access denied”. For example, Bob’s friends would prefer a message like “The slides are only for German speakers” to a simple “Access denied”. Users can thereby understand policy decisions and know how to acquire permission to access a service [26].

Policy-driven Negotiation In some situations policies need to be private themselves and should only be disclosed to the other party if some specific condition has been fulfilled. This is often the case for sensitive business rules, but such scenarios can also occur in Social Web applications. An example would be that Bob wants his pictures to be disclosed only to people who are his close friends. If Alice tries to access the pictures and does not belong to the group of close friends defined by Bob, she may be offended if the policy is publicly available. To avoid such situations a negotiation of several steps is needed, in which the two parties disclose information stepwise until they reach a common agreement [26].

Controlled Natural Language To ease the adoption of a policy language, the language should be understandable for common users, so that they can specify or personalise the policies to their liking. The Protune language is based on logic programming, which is not easy to understand for people who are not familiar with it. To solve this usability problem, Protune offers a controlled natural language [23] for policy rules. Users can therefore define policies in English (using controlled natural language to avoid ambiguity), which are then mapped into the Protune syntax [39].

All in all, Protune is one of the most complete languages available [40, 41] and can therefore serve as a flexible and dynamic control of a Social Web application’s behaviour, allowing users to specify their own policies according to self-defined conditions. In the following, policy-based access control for the Social Web is presented with emphasis on which data can be incorporated into the framework and how.

5.2 Social and Semantic Web Data for Policy Specification and Evaluation

The following subsection describes what types of data are useful for policy decisions and how these data can be combined and incorporated into policies.

5.2.1 Types of Social Data and their Availability

Proprietary Social Web Data

- Social Web data encompasses any data found on a Social Web application.

- It can be personal information provided by a user.
- User’s social network: relationships established towards other users.
- User-generated content like public messages, comments, photos etc.
- Data produced by being active on the Social Web application, like tagging a photo or becoming member of a group.

- Some of the data is publicly available and can be accessed by an external application using the open interfaces, which are made available by many Social Web applications.

- Data which is private can only be available via an open interface, if a user gives his consent to access his data.

- For this thesis in particular, the Protune framework is extended with a wrapper to access the Twitter API as a demonstration of querying a proprietary API of a Social Web application, see Section 6.

Semantic Web Data

- There are several databases on the Web providing a huge amount of semantically annotated data.
- This data can be of various content, such as information about publications and their authors, countries, popular places and people, music, films, companies, sports and many more.
- This data is freely accessible and can be queried via SPARQL endpoints.
- To cover that kind of data in this thesis’ solution another wrapper is implemented to query SPARQL endpoints; in particular, the DBpedia and the DBLP endpoint, see Section 6.

Social Semantic Web Data

- Recently a lot of Social Web data is being provided in Semantic Web standard format, for example using FOAF and SIOC to model the data.
- Besides creating FOAF files that describe oneself and the people one knows independently of any Social Web application, there are also so-called exporters that create FOAF files about a user of a Social Web application from his profile.
- An example exporter is the Flickr exporter generating an RDF file. This file contains public data of a Flickr profile including relationships towards other Flickr users as well as memberships in groups.
- A constraint of such exporters is that only public data can be accessed and used, and therefore some information cannot be exported.
- For this thesis a wrapper is implemented for RDF files in general and for the FOAF files generated via the Flickr exporter in particular, see Section 6.

5.2.2 Using Social Data to Define New Concepts

Once the sources are queried and the desired data is selected, it can be combined to specify fine-grained policies to the user’s liking. To ease the process of policy specification, a user can define concepts describing, for instance, the conditions a requester needs to fulfil to get access to a resource. These concepts can contain arbitrary conditions defined by a user according to his wishes and needs. This is a much better alternative to selecting from static, pre-defined sets of people or specifying each person individually, as is the case with today’s privacy settings [23].

Definition of Concepts As the audience, for example the visitors of a profile, is either unknown or very multifaceted, general groups need to be defined in order to categorize people automatically. Generally speaking a concept can look like the following: “Conditions the Requester needs to fulfil”.

To formalize the concept and implement it in a Protune policy, one can define the predicate ConditionsToFulfil() with additional predicates that describe what is meant by the concept. To do so, the following rules can be defined:

illustration not visible in this excerpt

ConditionsToFulfil(Person) is true if there is at least one i ∈ {1, ..., m} such that condition_ij(Person) holds for every j = 1, ..., n. In other words, the predicate ConditionsToFulfil(Person) is true if the body of at least one of the rules is true. Additionally, any condition_kl(Person) with k, l ≥ 1 can itself be another concept defined in the same way, following the standard logic programming semantics. An example of a concept definition is given in Policy (6):

illustration not visible in this excerpt
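The reading of a concept as a disjunction of rules, each of which is a conjunction of conditions, can be sketched in a few lines of Python. This is not Protune syntax: the function `holds`, the example predicates and the sample sets are hypothetical and merely stand in for facts that would be fetched from external sources.

```python
from typing import Callable, List

Condition = Callable[[str], bool]
Rule = List[Condition]  # one rule body: all of its conditions must hold


def holds(rules: List[Rule], person: str) -> bool:
    """True if the body of at least one rule is satisfied completely."""
    return any(all(cond(person) for cond in body) for body in rules)


# Hypothetical facts standing in for data retrieved from the Web.
COLLEAGUES = {"alice"}
FLICKR_FRIENDS = {"alice", "tom"}
GROUP_MEMBERS = {"tom"}

conditions_to_fulfil: List[Rule] = [
    # Rule 1: the requester is a colleague.
    [lambda p: p in COLLEAGUES],
    # Rule 2: the requester is a Flickr friend AND a member of the group.
    [lambda p: p in FLICKR_FRIENDS, lambda p: p in GROUP_MEMBERS],
]

for requester in ("alice", "tom", "eve"):
    print(requester, holds(conditions_to_fulfil, requester))
```

Because the sets are looked up at evaluation time, adding a person to one of them changes the outcome without touching the concept itself, which mirrors the behaviour described for concepts.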

Such concepts incorporate any changes on the web without the user needing to change the concepts explicitly. The changes that concepts can identify automatically are listed below:

- A new person enters a set of people, such as a new colleague being added to the colleague list on Facebook.
- A person is deleted from a set of people.
- Adding new data to a profile can lead to the profile owner, who is also the requester, being given access if the added information is part of the required conditions.
- Activities which are registered by the application after the concept was created. An example would be if a person, after entering a group which has an exclusive permission to access specific data, is allowed to view the data as well.
- Generally speaking any information which changes, is being added or deleted is factored into the decision process of Protune, if it is used in any of the conditions of the policies which are being evaluated.

Once all the required concepts are defined, they can be used for specifying policies. Concepts can be used in the bodies of policies with the same standard logic programming semantics as in concept definitions, as shown in Policy (7).

illustration not visible in this excerpt

5.2.3 Enforcing Policies upon an Application

Once policies are specified they can be applied to a resource or any other data of a user on the Social Web application to control what types of people can see which kind of information provided by the resource holder.

To evaluate policies correctly some type of identification of the requester is needed, depending on the condition part of the policies in question. In the proposed solution of this thesis, the types of identification properties which might be needed are listed below:

- The FOAF URL or real name of a requester to find out whether he is added as one of the contacts in the personal FOAF file.
- The Flickr user id of the requester, in case the Flickr exporter is used.
- The real name, the DBLP URI or the homepage of the requester if the DBLP database is used.
- The Twitter username of the requester in case Twitter is used.

In some cases it is sensible to identify a user of one application as being the same person as a user of another application. Merging identities can be tricky if no definite properties, such as the real name, are provided on both sites to identify the person. FOAF profiles are intended to remedy this problem. FOAF exporters, as they already exist for Flickr, Facebook and Twitter, create files that represent the respective profile. A main FOAF file can link to the other generated profiles using the property owl:sameAs and thereby interlink different Social Web applications and the identities created on them. The example in Figure 6 is an extract from a FOAF file merging the different identities on the web, as presented in [22], where this aspect is discussed in more detail.

illustration not visible in this excerpt

Figure 6: An extract from a FOAF file merging different Social Web identities.
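How owl:sameAs links collapse several account identifiers into one person can be sketched with a small union-find structure. The sketch below is only an illustration: the function `merge_identities` and all identifiers in the example are hypothetical, and the links are given as plain pairs instead of being read from a FOAF file.

```python
from typing import Dict, Iterable, Tuple


def merge_identities(same_as: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map every identifier to one representative of its sameAs group."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in same_as:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups
    return {x: find(x) for x in list(parent)}


# Hypothetical identifiers; a real FOAF file would use full URIs.
links = [
    ("http://example.org/bob/foaf#me", "flickr:bob"),
    ("http://example.org/bob/foaf#me", "twitter:bob"),
    ("flickr:tom", "twitter:tom"),
]
person_of = merge_identities(links)
print(person_of["flickr:bob"] == person_of["twitter:bob"])
print(person_of["flickr:bob"] == person_of["flickr:tom"])
```

Two accounts belong to the same person exactly when they map to the same representative, so a policy condition about a Flickr friend can be checked against that person's account on another platform.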

Having presented the benefits of the Protune framework, the data available on the Web and how the framework can include this data in the policy specification process, the next subsection shows how this approach can help Bob from the motivating scenario to control access to the information he provides on the Web.

5.3 Taking up the Motivating Scenario

Getting back to the motivating scenario presented in the beginning of the thesis, Bob uses various policies on his own website to restrict access to his resources as well as to offer personalised disclosure according to the people’s interests.

As Bob’s website can be visited by anybody on the internet, some concepts need to be defined which categorize people according to specific properties, in order to enforce policies and disclose information only to the people it is intended for.

(S1) For example, Bob provides his contact information. He defines in a policy that any of his colleagues, his students and also his friends can see this kind of information. A colleague can be recognised as such by providing some digital proof, like membership in the workplace network on Facebook, which must be the same as Bob’s. If another colleague joins this network, the policy does not need to be adapted, as it automatically recognizes the new colleague. The same applies to the friends who are allowed to see the contact information. Bob decides that the concept of friendship can remain relatively unrestrictive, including all his Facebook, Flickr and Twitter friends.

illustration not visible in this excerpt

Additionally, predicates like isMyFriend(User) need to be defined further, as such friends can include Flickr, Facebook and Twitter friends, for example.

(S2) To see Bob’s phone number, being just a friend on a Social Web application is not a sufficient trust level; therefore Bob entitles only “family and close friends” to view the phone number. Here the more fine-grained differentiation on Facebook is of benefit, as any user who is added to Bob’s “close friends” list has the permission to see the information, in addition to people on the family list on Flickr. Furthermore, the people Bob named in his FOAF profile with the foaf:knows property enjoy a relatively high level of trust and therefore also have the right to see private data such as the phone number. To implement the concept of “family and close friends” in Protune, one can define the predicate familyAndCloseFriends() with additional predicates that describe what is meant by the concept, as seen in Policy (9).

illustration not visible in this excerpt

Note that only one of the condition parts needs to be fulfilled; that is, a person can be a close friend on Facebook or a friend in the FOAF profile, but not necessarily both. The policy which uses the new concept looks like the following:

illustration not visible in this excerpt

(S3) Once the concept of “family and close friends” is defined, Bob can use it in any situation where he might need it. When Bob uploads his holiday pictures, he can therefore also use this concept to define who is allowed to see the photos. Some of his holiday pictures which are less private, the scenery images to be precise, can be seen by the group “landscape of Southern France” on Flickr. Therefore a more fine-grained policy is needed, as seen in Policy (11).

illustration not visible in this excerpt

This policy consists of several rules, which have the same goal to allow access to a photo. For the evaluation of the policy to be successful, at least in one of the rules the conditions in the body need to be fulfilled, as the rules represent a disjunction similar to the Prolog language.

(S4) Furthermore, Bob discloses his work-specific information only to his colleagues, who can be identified by their membership in the workplace network on Facebook, and updates about research projects can additionally be seen by his co-authors extracted from the DBLP database, see Policy (12).

illustration not visible in this excerpt

(S5) Besides using such concepts for privacy Bob can apply them to personalise his content representation according to his readers’ interests.

Bob’s frequent posts about baseball interest only a small number of his friends, which is why he defines the concept of “friends who are interested in baseball” to save other people the time of picking out the updates they are actually interested in. Bob therefore decides to show his baseball updates only to those of his friends on Flickr and Facebook who are also in the baseball group on Facebook. Allowing anybody in the baseball group on Facebook to access his content would not be restrictive enough, as Bob has no relation to many members of the group. That is why the policy for this concept needs to merge the different social network identities in order to recognise friends with an interest in baseball. This part of the policy can be defined as follows:

illustration not visible in this excerpt

This policy involves the difficulty of recognising a Flickr friend as a person who also has a Facebook profile and is in the baseball group. In such cases the person in question needs to provide enough personal information, like his correct real name on both Social Web applications, to be recognised as the same person.

(S6) The concepts explained above are all about categorizing people according to interests they share, but Bob also uses policies for research and job purposes. Bob, being a lecturer, has prepared some presentation slides for his seminar. As he needs some professional opinion on the slides, he defines a concept “experts on the Semantic Web” that includes all people mentioned in his FOAF profile, all his colleagues, people with whom Bob has collaboratively written publications and people having blogs about ‘Semantic Web’. The policy defined for this situation is presented in the following:

illustration not visible in this excerpt

Furthermore, as these slides are in German, he only wants people who master this language to read them, which is represented by the predicate isGermanSpeaker() in Policy (14). Such information can be extracted in different ways. Either the person has provided a location or hometown in his profile on a Social Web application such as Twitter, Flickr or Facebook, or the location is included in the person’s FOAF file using the property vCard:Country of the standard format vCard [27], which is designed to model personal information. Once the home location is known, the DBpedia database can be used to extract the locations where German is spoken; to be exact, the category ‘German-speaking countries’ with the property skos:subject. Of course such techniques do not cover people who have learnt the language in school, for example, but because of the importance of the slides, Bob needs only native speakers’ help.
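The two-step lookup behind isGermanSpeaker() (first the home country of the requester, then membership of that country in a set of German-speaking countries) can be sketched as follows. The function name `is_german_speaker` and both data sets are assumptions made for illustration; in the setting described here the first mapping would come from a profile or FOAF file and the second from a DBpedia query.

```python
from typing import Mapping, Set

# Hypothetical sample data, hard-coded only for this sketch.
HOME_COUNTRY = {"alice": "Austria", "tom": "France"}
GERMAN_SPEAKING = {"Germany", "Austria", "Switzerland"}


def is_german_speaker(person: str,
                      home_country: Mapping[str, str] = HOME_COUNTRY,
                      german_speaking: Set[str] = GERMAN_SPEAKING) -> bool:
    """True only if a home country is known and it is in the given set."""
    return home_country.get(person) in german_speaking


for requester in ("alice", "tom", "eve"):
    print(requester, is_german_speaker(requester))
```

A requester without any known location is treated like a non-speaker, which matches the conservative reading that access is granted only when the condition can be proven.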

In this manner Bob proceeds defining concepts and creating policies for any situation and combination of data.

When Protune is used for the privacy protection of Bob’s website, access control and the filtered display of information are handled automatically by the framework. Bob only has to prepare a set of policies, the resources which need to be protected and possibly some other facts to be used in the reasoning and evaluation process, such as his own usernames on the various Social Web applications he uses. A number of metarules also have to be added, which state the types of predicates and which predicates are private or public, that is, which predicates are allowed to be disclosed to the other party.

If Tom, who also needs to have prepared a set of policies of his own, wants to access Bob’s resources, a negotiation is initiated in which the policies provided by both Bob and Tom are evaluated until either Tom’s policies can prove that Tom is authorized to access the requested resource, or the request fails and access is denied. In that case Tom gets an explanation of why access was denied.

Overall, Bob is not only able to protect his resources from being seen by unwanted parties, he is also able to make it easier for the visitors of his website to find the information they are seeking.

6 Implementation

6.1 Retrieving Heterogeneous Information

This section describes how Social and Semantic Web data are retrieved in this thesis and how they are included into the policy evaluation process of Protune.

6.1.1 Retrieving Social Web data

In the era of Web 2.0 a number of Social Web services started to open their sites’ technological interfaces. These application programming interfaces (APIs) allow external developers to access, export and reuse the data. An application which wants to use the API of a Social Web platform needs the permission of a user to access his data, to use the platform and to perform actions in the name of the user. The most important technologies and formats used by APIs include HTTP, REST, XML, JSON, RSS and Atom.

Hypertext Transfer Protocol (HTTP) The Hypertext Transfer Protocol (HTTP) is a protocol for information exchange on the web and is used to manipulate or retrieve data from an API via the so-called request methods. The methods used for this are GET (to retrieve data), POST (to submit or change data) and DELETE (to delete data).

Representational State Transfer (REST) Representational State Transfer (REST) [28] is an architectural style for web services, typically found in Web 2.0 applications. The communication of such services is HTTP-based, using the HTTP methods presented above.

Data Exchange Formats The results of a method are mostly available in one or more of the following data exchange formats:

- XML: Extensible Markup Language (XML) is used to transport and store data, in a hierarchically structured format similar to HTML. Data described in XML can not only be easily read by people but also by machines enabling an easy way to share information between different incompatible applications.
- JavaScript Object Notation (JSON)[29] is an alternative to XML which is based on a subset of the JavaScript programming language and is also used for data exchange.
- RSS and Atom are syndication formats often used for syndicating information such as news or changes on websites.
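To illustrate that XML and JSON can carry the same information, the following sketch parses one small record in both formats using only the Python standard library. The record layout (`id`, `screen_name`) is made up for this example and is not taken from any real API response.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical user record in two exchange formats.
xml_doc = "<user><id>42</id><screen_name>bob</screen_name></user>"
json_doc = '{"id": 42, "screen_name": "bob"}'

# XML: navigate the element tree and convert text nodes explicitly.
root = ET.fromstring(xml_doc)
from_xml = {
    "id": int(root.findtext("id")),
    "screen_name": root.findtext("screen_name"),
}

# JSON: numbers and strings are typed by the format itself.
from_json = json.loads(json_doc)

print(from_xml == from_json)
```

The XML variant needs an explicit conversion of the identifier to an integer, whereas JSON already distinguishes numbers from strings.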

Each Social Web application provides a site-specific proprietary API. In the following the API of the previously presented Social Web application Twitter is introduced. This API has been used to implement a Twitter Wrapper in Protune for this thesis to demonstrate how Social Web data can be integrated into the Protune framework and used for policy reasoning.

Twitter API

The Twitter API conforms to the principles of RESTful systems, and the communication with the API is HTTP-based. The API supports the three HTTP request methods GET, POST and DELETE, and the response is returned in one of the following structured formats: XML, JSON, RSS and Atom. As my approach to policy-based privacy control only needs to extract specific data, the methods POST and DELETE, which are used for manipulating data, are not needed.

Search API and REST API According to the Twitter Wiki [30], the Twitter API is separated into the Search API and the REST API, which differ from each other in the URL structure of their method calls. Nevertheless, methods from both can be mixed and combined to produce new applications. The Search API allows applications to use the Twitter search and thus to search for public updates based on their author, a user mentioned in a specific update, keywords, dates and other characteristics. The URL structure looks like the following:

http ://

The REST API allows an application to do anything a registered user can do on Twitter, provided the application can authenticate itself on Twitter with valid credentials. In most cases the REST API methods have the following URL structure [42]:

- The method_category states the category the requested method belongs to, as the Twitter API Documentation [43] divides the methods into groups such as user methods and friendship methods.

- The method_name is the name of the actual method, such as friends or followers. A few example API methods which require a GET request are listed in the following:

- https ://
- https ://
- https ://

Accessing API Methods There are methods that do not need authentication because they use publicly available data, such as getting the public timeline, a specific user’s timeline if it is not private, a list of a user’s friends and followers and so on. If private information of a user is needed, or any other action that requires a login, the application needs a user authentication to prove its right to access the private data. If the credentials are missing or not valid, the response is an error message along with the 401 HTTP status code: Not authorized.

Authentication Methods The easier but also less secure way is HTTP basic authentication, which requires a valid username and password that have to be sent as part of the HTTP request. The other, more secure way to gain access to private data is OAuth authentication, which was integrated into the API only a few months prior to this writing. OAuth is an open protocol that allows an application to access private data of a user with the user’s prior consent but without the user having to expose his password.
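Why HTTP basic authentication is considered the less secure option becomes visible when the header is built by hand: the credentials are only Base64-encoded, not encrypted. The helper name `basic_auth_header` and the credentials below are illustrative.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the HTTP 'Authorization' header for basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


header = basic_auth_header("bob", "s3cret")  # made-up credentials
print(header)

# Anyone who sees the header can recover the password without any key.
encoded = header.split(" ", 1)[1]
print(base64.b64decode(encoded).decode("utf-8"))
```

Since decoding requires no secret, every request exposes the password to anyone who can read the traffic unless the connection itself is encrypted. With OAuth the password never reaches the application at all.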

Registration and Authentication Process with OAuth To be able to use OAuth, an application first needs to be registered on Twitter with a name and a description of the application. Additionally, a developer has to choose between a desktop and a web application, which leads to minor differences. Last but not least, the developer has to choose between read-only and read-write access. To use Twitter for the Protune framework, Protune is registered as a desktop application with read-only rights under the application name ‘Querying_With_Protune’.

Once the registration process is finished, a Consumer Key and a Consumer Secret are generated. Both are important for the authentication process and are unique for each application. The next phase is to let a user give the application his permission to use his account. For a desktop application this process is described in the following:

1. First the application asks Twitter for a request token by sending the Consumer Key and the Consumer Secret to Twitter, which responds by generating a Request Token and Token Secret and sending them back to the application. These keys need to be stored for later use.
2. The next step is to direct the user to the URL authorize?oauth_token=generated_oauth_token where the user, who has to log in to Twitter first, approves the application’s request for access. After the approval a 7-digit PIN is displayed for the user to copy and pass to the application.
3. Lastly the application sends the Consumer Key, Consumer Secret, the Request Token, Token Secret and the PIN to Twitter and gets an Access Token and Token Secret in return. These values need to be saved for later API calls on behalf of the user who has granted access to his profile; they replace the otherwise needed username and password.

illustration not visible in this excerpt

Figure 7: Consumer Key and Consumer Secret for the Protune application on Twitter.

An Access Token does not expire and can be used as long as the user does not revoke the permission to use his profile, which he can do in his settings on Twitter. Figure 7 shows an example Consumer Key and Consumer Secret for the Protune application together with all the important URLs needed for the authentication process. Once the authentication process has been completed, the application can use Twitter on the authenticated user’s behalf without having to ask the user for credentials every time.

Identification of a User To get data about a specific user, an identifying property is needed, which is then included in the request message together with the method name. The obvious identification parameter is the e-mail address, but this information cannot be fetched through the API, as it is also not provided in a Twitter profile. Therefore one can use the unique username associated with the Twitter account of the user in question or the corresponding integer ID. Such parameters are needed if one wants to use methods such as ‘get the followers of a user’ or ‘has a user blocked another user’.
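The order of the three steps of the PIN-based flow can be made concrete with a small in-memory simulation. Everything in it is hypothetical: `FakeProvider` is a stand-in written for this sketch and does not talk to Twitter, and it passes the secrets directly for readability, whereas a real OAuth 1.0 client uses the secrets to sign its requests instead of transmitting them.

```python
import secrets
from typing import Dict, Tuple


class FakeProvider:
    """In-memory stand-in for the service; not Twitter's real API."""

    def __init__(self, consumer_key: str, consumer_secret: str) -> None:
        self._consumer = (consumer_key, consumer_secret)
        self._request_tokens: Dict[str, str] = {}  # request token -> secret
        self._pins: Dict[str, str] = {}            # request token -> PIN

    def _check_consumer(self, key: str, secret: str) -> None:
        if (key, secret) != self._consumer:
            raise PermissionError("unknown consumer")

    def request_token(self, key: str, secret: str) -> Tuple[str, str]:
        """Step 1: issue a request token and its token secret."""
        self._check_consumer(key, secret)
        token, token_secret = secrets.token_hex(8), secrets.token_hex(8)
        self._request_tokens[token] = token_secret
        return token, token_secret

    def user_approves(self, token: str) -> str:
        """Step 2: the logged-in user approves and is shown a 7-digit PIN."""
        if token not in self._request_tokens:
            raise PermissionError("unknown request token")
        pin = f"{secrets.randbelow(10 ** 7):07d}"
        self._pins[token] = pin
        return pin

    def access_token(self, key: str, secret: str, token: str,
                     token_secret: str, pin: str) -> Tuple[str, str]:
        """Step 3: exchange request token and PIN for an access token."""
        self._check_consumer(key, secret)
        if (self._request_tokens.get(token) != token_secret
                or self._pins.get(token) != pin):
            raise PermissionError("request token or PIN rejected")
        del self._request_tokens[token], self._pins[token]  # single use
        return secrets.token_hex(8), secrets.token_hex(8)


provider = FakeProvider("consumer-key", "consumer-secret")
req_token, req_secret = provider.request_token("consumer-key", "consumer-secret")
pin = provider.user_approves(req_token)  # the user copies this PIN
access = provider.access_token("consumer-key", "consumer-secret",
                               req_token, req_secret, pin)
print(len(pin), len(access))
```

The application never sees the user's password: it only handles tokens and the PIN, and the resulting access token is what gets stored for later calls.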

6.1.2 Retrieving Social Semantic Web Data

Flickr Exporter

As already announced at the beginning of this section, Flickr information is not retrieved in Protune via the Flickr API but via the RDF file of a user generated by the application called ‘Flickr exporter’ [44] which can be called via the URL http ://

The Flickr exporter allows Flickr profiles to be exported using the FOAF and the SIOC specification [20, 45], but only public data can be exported. The resulting RDF file contains personal information about the user provided in FOAF format:

- the name,
- the email address represented as foaf:mbox_sha1sum of a foaf:Person, which allows the email address to be exposed in a hashed form,
- contacts of the user on Flickr using the foaf:knows property and including the URL of their own RDF file for Flickr, so that one can traverse the social network of a user as is also possible on Flickr itself.
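In the FOAF vocabulary the value of foaf:mbox_sha1sum is the SHA-1 hash of the mailbox URI, that is, of the address including its `mailto:` prefix. A minimal sketch of how such a value is computed and compared (the helper name and the address are illustrative):

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """SHA-1 hex digest of the 'mailto:' URI, as used by foaf:mbox_sha1sum."""
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


# A requester can be matched against an exported profile without the
# profile ever containing the plain address.
value_in_profile = mbox_sha1sum("bob@example.org")
print(value_in_profile == mbox_sha1sum("bob@example.org"))
print(value_in_profile == mbox_sha1sum("tom@example.org"))
```

The hash lets two descriptions of the same person be matched when both know the address, while the address itself cannot be read directly from the file.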

Furthermore, the SIOC vocabulary is used to express the community information found on Flickr, including the URL of the user’s Flickr account, the URL of the user’s Flickr image gallery and all the groups the user is a member of. The general URL structure for generating a user RDF file looks like the following:

The user_id needs to be replaced with the actual user ID on Flickr (which is not the screen name). Once an RDF file is created, Protune uses SPARQL to query the file for the particular information which is needed in the policy body.

SPARQL Endpoints

To query a dataset such as DBpedia, open SPARQL endpoints are provided, which can be used to query the database via the SPARQL language, as the information is provided in RDF. Any query can thereby be constructed based on the ontology of the database used. The endpoint for DBpedia is and for DBLP it is http ://

Besides the general SPARQL endpoint Wrapper there are two Wrappers implemented in Protune; for DBpedia and for DBLP.
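In the SPARQL protocol a query can be sent to an endpoint as the `query` parameter of an HTTP GET request. The sketch below only builds such a request URL and does not contact any server; the endpoint address is a placeholder and the query is a generic FOAF example, neither is taken from the thesis.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # placeholder, not a real endpoint

QUERY = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/> "
    "SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 5"
)


def build_query_url(endpoint: str, query: str) -> str:
    """Return the GET URL that carries the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again yields the original query text unchanged.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

Percent-encoding matters here because SPARQL queries contain characters such as `?`, `{` and `<` that are not allowed unescaped in a URL.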

6.2 Wrappers for External Information Sources

The diagram in Figure 8 gives an overview of all the Wrappers implemented for this thesis and how they are related to each other. Only the most important methods of the classes are shown in the diagram, and the return values as well as the parameters are omitted so that the overview does not become overloaded.

Execution Handler

The Protune framework has a complex architecture consisting of several components, each responsible for a different task in the policy specification and evaluation process. For this thesis the module called Execution Handler is important, as it is in charge of executing actions and package calls which are specified in the policies [46]. There are already a number of implemented packages, such as the RDBMS package, which retrieves data from a relational database by executing database queries; others provide access to RDF stores and file system requests, and there are also time-aware and location-aware packages [47].

Adding a New Wrapper into the Protune Framework

One of the outcomes of this thesis is a set of new packages that execute queries against remote information sources on the Web. These packages, also called Wrappers, extend the abstract class AbstractExecutorWrapper, implementing the method localExecuteAction among others. Each Wrapper has a package name and provides a number of functions; both are needed when making a call from a policy in the form of

package_Name:function(argument list).

illustration not visible in this excerpt

Figure 8: UML diagram of the Wrappers for external data.

The localExecuteAction method contains the execution code of the Wrapper. Its input parameter is of type ActionRequest, which contains the name of a function implemented in the Wrapper, an array of arguments needed by the invoked function and an array of variables to which the return values are bound after the action has been executed.

After execution, the method returns a result of type ActionResult. The ActionResult contains a Boolean value, the ExecutionResult, and the ResultSet. The ExecutionResult specifies whether the execution was successful, regardless of whether a variable binding has been returned. If the execution of the action produces a set of results as answers to the query, the ResultSet is constructed containing the variable bindings; otherwise an empty ResultSet is returned. During the execution the following exceptions may arise:

- NoSuchFunctionException, if the requested function is not available in the package,
- IllegalArgumentException, if any of the arguments are not valid,
- FunctionFailureException, if an internal error occurred during the execution of the action.

Additionally, each Wrapper needs the init() method for its initialisation. After the Wrapper has been created, it needs to be added to the configuration file with some additional information; see [48] for further reading on this part.
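The contract described above can be summarised in a small, self-contained Java sketch. All types below (ActionRequestSketch, ActionResultSketch, WrapperSketch, the exception class and the toy UpperCaseWrapperSketch) are simplified stand-ins written for this illustration; they mirror only the prose description, and the actual Protune classes and signatures may differ. The exception is modelled as an unchecked exception purely to keep the example short, and the failure exception for internal errors is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// All types here are simplified stand-ins, NOT the real Protune classes.
class NoSuchFunctionSketchException extends RuntimeException {
    NoSuchFunctionSketchException(String function) {
        super("unknown function: " + function);
    }
}

final class ActionRequestSketch {
    final String function;    // name of the function to invoke
    final String[] arguments; // arguments of the invoked function
    final String[] variables; // variables to bind to the return values

    ActionRequestSketch(String function, String[] arguments, String[] variables) {
        this.function = function;
        this.arguments = arguments;
        this.variables = variables;
    }
}

final class ActionResultSketch {
    final boolean executionResult;  // true if the execution succeeded
    final List<String[]> resultSet; // one row of bindings per answer, may be empty

    ActionResultSketch(boolean executionResult, List<String[]> resultSet) {
        this.executionResult = executionResult;
        this.resultSet = resultSet;
    }
}

abstract class WrapperSketch {
    abstract void init();

    abstract ActionResultSketch localExecuteAction(ActionRequestSketch request);
}

// Toy Wrapper with one function, "upper", that upper-cases its argument.
final class UpperCaseWrapperSketch extends WrapperSketch {
    @Override
    void init() {
        // Nothing to initialise in this toy example.
    }

    @Override
    ActionResultSketch localExecuteAction(ActionRequestSketch request) {
        if (!"upper".equals(request.function)) {
            throw new NoSuchFunctionSketchException(request.function);
        }
        if (request.arguments.length != 1 || request.variables.length > 1) {
            throw new IllegalArgumentException("upper takes one argument and at most one variable");
        }
        List<String[]> rows = new ArrayList<>();
        if (request.variables.length == 1) {
            // One answer, bound to the single requested variable.
            rows.add(new String[] {request.arguments[0].toUpperCase(Locale.ROOT)});
        }
        // Success is reported even when no variable binding is returned.
        return new ActionResultSketch(true, rows);
    }
}
```

The toy Wrapper shows the two cases distinguished in the text: a call with one variable returns a ResultSet with one binding, while a call without variables still reports a successful execution but returns an empty ResultSet.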

Using RDF and SPARQL in Java

As Protune is written in Java, additional libraries are needed to use RDF graphs and query them with SPARQL. The Jena[31] library is a well-known Java API for RDF and is therefore used in Protune. With Jena, RDF models (the Jena term for RDF graphs) can be created and manipulated. To query Jena RDF models the library ARQ[32] is used. ARQ implements SPARQL, with which information can be selected not only locally but also via remote requests.

6.2.1 IN-Predicate

The key feature of Protune for this thesis is the ability to query external data and include it in the policies. For this purpose the provisional in-predicate is used, which allows information to be retrieved from external packages, i.e. the Wrappers presented above. The Protune framework can then automatically query the external system to receive the required information. An example of the in-predicate that uses a database as the information source to decide whether a File is research-based can look like the following:

illustration not visible in this excerpt

The external package in Policy (15) is called rdbms. This package retrieves data via a SELECT statement. It provides the function query(), which needs two arguments: the first contains the SELECT statement and the second names the database, here called “db_users”. The results are bound to the return variable File, and for each binding a new fact is added to the knowledge base, such as “in([‘SemanticWeb.pdf’], rdbms:query(“SELECT file FROM research”, “db_users”))”, in case ‘SemanticWeb.pdf’ belongs to the set of result variable bindings. This fact can be used for further evaluation and reasoning processes. The negation of the in-predicate is also possible in Protune; the syntax for negation is “not in(...)”.
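The step from result bindings to knowledge-base facts can be pictured with a short Java sketch. It only renders facts as text in the notation of the example above; how Protune represents facts internally is not shown in this thesis, so the class InPredicateFactsSketch and its method are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; Protune's internal fact representation may differ.
final class InPredicateFactsSketch {
    private InPredicateFactsSketch() {
    }

    // Renders one "in([...], call)" fact per binding returned by the package call.
    static List<String> factsFor(String call, List<String> bindings) {
        List<String> facts = new ArrayList<>();
        for (String value : bindings) {
            facts.add("in(['" + value + "'], " + call + ")");
        }
        return facts;
    }
}
```

If the query returns two file names, two facts are produced, one per binding; an empty result leaves the knowledge base unchanged.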

6.2.2 SPARQL Endpoint Wrapper

The SPARQL Wrapper is one of the newly implemented packages; it enables the import of semantic data in RDF format via the SPARQL language. To use the SPARQL endpoints of Semantic Web databases, the ARQ library is added, as it allows remote queries.

The SPARQLEndpointWrapper is the Wrapper that allows any given endpoint to be queried; the endpoint needs to be passed as an argument when creating a policy. The query method implemented in the Wrapper processes only SELECT queries, as they are the ones needed for privacy policies. The query method expects three input parameters: the SELECT statement, the endpoint URL and the number of output variables to which the results are bound. There cannot be more output variables than SELECT variables in the query.
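The constraint on the number of output variables can be checked before a query is sent. The following Java sketch extracts the projected variables of a SELECT query with a regular expression and rejects a request that asks for more output variables than the query selects. It is a deliberately simplified assumption of how such a check could look: the class SelectVariableCheck is hypothetical, and the pattern does not handle SELECT * or projection expressions such as (COUNT(?x) AS ?c).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-check; not taken from the Protune sources.
final class SelectVariableCheck {
    // Captures the projection between SELECT and WHERE (or the first brace).
    private static final Pattern PROJECTION =
            Pattern.compile("(?is)\\bSELECT\\b(.*?)(\\bWHERE\\b|\\{)");
    // SPARQL variables start with '?' or '$'.
    private static final Pattern VARIABLE = Pattern.compile("[?$](\\w+)");

    private SelectVariableCheck() {
    }

    // Returns the names of the variables listed in the SELECT clause.
    static List<String> selectVariables(String query) {
        Matcher projection = PROJECTION.matcher(query);
        if (!projection.find()) {
            throw new IllegalArgumentException("not a SELECT query: " + query);
        }
        List<String> names = new ArrayList<>();
        Matcher variable = VARIABLE.matcher(projection.group(1));
        while (variable.find()) {
            names.add(variable.group(1));
        }
        return names;
    }

    // Rejects a request for more output variables than the query selects.
    static void requireAtMost(String query, int outputVariables) {
        int available = selectVariables(query).size();
        if (outputVariables > available) {
            throw new IllegalArgumentException(
                    outputVariables + " output variables requested, but the query selects " + available);
        }
    }
}
```

A production implementation would more plausibly rely on the query parser of the SPARQL library in use instead of a regular expression; the sketch only makes the rule itself explicit.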

When creating a policy, the in-predicate introduced earlier in this section is used. The package name for the Wrapper is sparqlEndpoint and the function name is query. A possible policy using the Wrapper is shown in Policy (16), which calls the package sparqlEndpoint. The function query() selects people who were born in Berlin from the DBpedia database, as stated in the second argument, and binds the results to the variable Name.

illustration not visible in this excerpt

6.2.3 DBpedia Wrapper

The Wrapper for querying the DBpedia database via an endpoint is called DBpedia Wrapper, and it is a subclass of the SPARQLEndpointWrapper. Like the parent class, it lets the user construct their own SELECT query and execute it against the DBpedia endpoint using the method query of the parent class. Furthermore, a method is provided that checks whether a given URL belongs to a company located in Germany. This is just an example that serves as a model for further methods to be implemented in the future. The package name for the Wrapper is dbpediaEndpoint and the function name for general queries is getQueryResult.

The function isCompanyFromGermany() expects only one argument, namely the URL of the homepage in question, as seen in the following example. The function constructs a corresponding query and passes it to the general method for executing the query:

illustration not visible in this excerpt

6.2.4 DBLP Wrapper

The Wrapper for the DBLP endpoint is also a subclass of the SPARQLEndpointWrapper and offers several methods to query the DBLP database. The package name of the Wrapper is dblpEndpoint. This Wrapper also offers a general SELECT-query method using the implemented query function of the parent class.

Additionally, there are several methods to determine whether a person is a co-author of another person. To test whether the requester is a co-author of the resource provider, some information about the requester is needed to identify him in the DBLP database. Therefore there are methods that check the database with the help of either the homepage of the requester, the DBLP URI or the real name of the requester. The methods also require the real name or the DBLP URI of the resource provider to execute a query. An example that tests whether two people are co-authors using both their real names is provided in the following policy:

illustration not visible in this excerpt

The policy first calls the package dblpEndpoint using the function areCoAuthorsByRealName() with the two names as arguments. The function constructs a query including both names and sends the query to the general function query() of the parent class SPARQLEndpointWrapper, where the query is executed against the DBLP database. If the result is not empty, then there is at least one publication written by both people, and the function returns true as a result. It is important to note that no variables are bound and the result set is empty, because the function only checks whether two people are co-authors or not.

6.2.5 RDF Wrapper

To read and query the RDF file, both libraries Jena and ARQ are used. The Wrapper for FOAF files is called FOAFWrapper, with the package name foafQuery and several methods querying a given RDF file. Like the previously presented Wrappers, this one also permits general RDF queries using the provided queryRDF() function. The function expects the URL of the RDF file of the person and the query.

Additionally, the function isPersonFriend() checks whether the requester is a friend of the resource owner using the name of the requester and the RDF file of the resource owner. Another function called isWebSiteOfAFOAFFriend() finds out whether the requester is a friend or not by using a homepage of the requester.

The following policy takes the name of a person and the URL of an RDF file as arguments, creates a query accordingly and, after the query has been executed against the file, returns true in case the name is listed as a friend in the RDF file, or false otherwise.

illustration not visible in this excerpt
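The query constructed for such a friend check is not shown in this excerpt. A plausible form, assuming the RDF file describes friendships with the FOAF properties foaf:knows and foaf:name, is sketched below in Java. The class FoafFriendQuerySketch, the variable names and the exact graph pattern are assumptions for illustration; the FOAFWrapper may build its query differently.

```java
// Hypothetical query builder; the FOAFWrapper's actual query is not shown in the thesis.
final class FoafFriendQuerySketch {
    static final String FOAF_NAMESPACE = "http://xmlns.com/foaf/0.1/";

    private FoafFriendQuerySketch() {
    }

    // Escapes backslashes, quotes and line breaks so the name is a valid SPARQL string literal.
    static String escapeLiteral(String text) {
        return text.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    // Selects every person who is known by someone in the file and carries the given name.
    static String forName(String requesterName) {
        return "PREFIX foaf: <" + FOAF_NAMESPACE + ">\n"
                + "SELECT ?friend WHERE {\n"
                + "  ?owner foaf:knows ?friend .\n"
                + "  ?friend foaf:name \"" + escapeLiteral(requesterName) + "\" .\n"
                + "}";
    }
}
```

A non-empty result for the generated query corresponds to the case in which the function returns true. Escaping the name matters because it comes from the requester and is embedded in the query text.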

6.2.6 Flickr Wrapper

The Wrapper to query RDF files generated by the Flickr exporter is called FlickrWrapper and it is a subclass of the FOAFWrapper. The package name is flickrquery and the functions the Wrapper offers are getGroupsUserBelongTo() and getAllUserContacts(). Both functions need a Flickr user id to generate the RDF file of the user and query it for the desired information. The Flickr Wrapper also offers a general function called queryRDF(), which expects the Flickr user id and a query as arguments. The function to get all contacts of a person on Flickr can be called as shown in the following:

illustration not visible in this excerpt

The function fetches all contacts of a user and binds the results, meaning each contact, to the variable User.

6.2.7 Twitter Wrapper

To use the Twitter API the open-source library Twitter4J[36] is used. It enables Java applications to easily access the API and also supports the OAuth authentication method, see Section 6.1.1.

The Twitter Wrapper (TwitterQueryWrapper) works differently than the previously presented Wrappers because it expects the person who wants to use the provided methods to grant Protune access to his Twitter account first.

The order of events of the authentication process is shown step by step in Figure 10, and the process of using the Wrapper is listed in the following.

1. When the user wants to include a function of the Twitter Wrapper into a policy, the program first checks whether an XML file exists in the home directory of the user that contains the user’s name and an Access Token. If such a file is found, meaning that the user has already granted access to his Twitter account, the methods of the Twitter Wrapper can be used.

2. Otherwise, if such a file is not found, the authentication process is triggered.

3. Once the user grants access to his account and the Access Token is generated, an XML file is created and stored in the home directory of the user so that the user can apply Protune without repeating the authentication process in the future. The produced XML file contains: username, user id, access token and access token secret.

4. After the registration all Twitter Wrapper functions are at the user’s disposal. These functions include:

- getting the real name of a user
- getting the time zone of the user
- getting the location of a user
- getting the last update time of a user
- getting the homepage of a user
- getting the number of followers of a user
- checking whether the user is a member of Twitter
- checking whether a user’s profile is private
- checking whether a user is followed by the policy creator, meaning whether the user is a member of the friends list of the policy creator
- checking whether a user is following the updates of the policy creator
- checking whether a user is blocked by the policy creator

All of the above listed functions need the screen name of the user in question to be able to perform the queries. When making the request to the Twitter API, the first six functions return a result which can be bound to a variable. The last five functions only return a boolean with an empty result set. An example to test whether a user is on the friends list of the policy creator is shown in Policy (21).

illustration not visible in this excerpt

Variable binding is illustrated by the following example, which uses the function getRealNameOfUser():

illustration not visible in this excerpt

The name of the package to call from a policy is twitterquery. After the call, the Twitter API is queried and the results are bound to the return variable Name, and for each binding a new fact is added to the knowledge base, such as “in([‘John Smith’], TwitterWrapper:getRealNameOfUser(‘tom_k’))”. In this case only one binding should be possible because a person usually has only one account.
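Steps 1 to 3 of the authentication procedure listed above can be sketched in plain Java as follows. The sketch stores the four values in the XML format of java.util.Properties; the file name, the property keys and the class TwitterTokenStoreSketch are assumptions made for this example, since the actual file layout is not shown in this thesis. The OAuth exchange itself, which the implementation performs via Twitter4J, is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.Properties;

// Hypothetical token store; file name and keys are assumptions.
final class TwitterTokenStoreSketch {
    static final String FILE_NAME = "protune-twitter-token.xml";

    private final Path file;

    // The directory would be the user's home directory in the described setup.
    TwitterTokenStoreSketch(Path directory) {
        this.file = directory.resolve(FILE_NAME);
    }

    // Step 1: returns the stored credentials, or empty if access was never granted.
    Optional<Properties> load() {
        if (!Files.isRegularFile(file)) {
            return Optional.empty(); // step 2: the caller triggers the authentication
        }
        Properties stored = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            stored.loadFromXML(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Optional.of(stored);
    }

    // Step 3: persists the four values after a successful authentication.
    void save(String username, String userId, String accessToken, String accessTokenSecret) {
        Properties values = new Properties();
        values.setProperty("username", username);
        values.setProperty("userId", userId);
        values.setProperty("accessToken", accessToken);
        values.setProperty("accessTokenSecret", accessTokenSecret);
        try (OutputStream out = Files.newOutputStream(file)) {
            values.storeToXML(out, "Twitter access granted to Protune (sketch)");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the home directory as argument, for example new TwitterTokenStoreSketch(Paths.get(System.getProperty("user.home"))), an empty result of load() corresponds to step 2, in which the authentication process has to be triggered.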

6.3 SPoX- A Use Case

The prototype SPoX (Skype Policy Extension)[34] is a reactive policy engine [8] demonstrating the usage of policy-driven behaviour control on the Social Web application Skype. SPoX uses Protune and the integrated Social and Semantic Web data from various information sources presented in this thesis to allow users to define policies which are then enforced upon Skype. Skype’s privacy preferences are very limited; therefore the integration of Protune is a beneficial way to enhance the privacy settings according to data beyond the boundaries of Skype. For example, using the two implemented Wrappers for Twitter and Flickr, a Skype user can define a policy stating that only his Flickr and Twitter friends are allowed to call him on weekends.

illustration not visible in this excerpt

Figure 9: SPoX and its environment.

Furthermore, SPoX allows the definition of reactive policies (event-condition-action rules), which extend the common policies presented in this thesis: if an event happens and the conditions of the policy are true, the policy reacts by executing some actions. For example, a user can state that if a person calling via Skype is not a friend on any Social Web application, the call will be cancelled and turned into a chat message instead. Figure 9 gives a graphical overview of SPoX and how it operates with its environment. [7]
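The event-condition-action pattern behind such reactive policies can be sketched in a few lines of Java. The class ReactiveRuleSketch is a hypothetical illustration and not the SPoX or Protune implementation; the event type is left generic, and in the usage described below an event is simply the name of the caller.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical event-condition-action rule; not taken from SPoX.
final class ReactiveRuleSketch<E> {
    private final Predicate<E> condition; // must hold for the rule to fire
    private final Consumer<E> action;     // executed when the rule fires

    ReactiveRuleSketch(Predicate<E> condition, Consumer<E> action) {
        this.condition = condition;
        this.action = action;
    }

    // Called when the event occurs; runs the action only if the condition holds.
    // Returns true if the action was executed.
    boolean onEvent(E event) {
        if (!condition.test(event)) {
            return false;
        }
        action.accept(event);
        return true;
    }
}
```

The example from the text then becomes a rule whose condition tests that the caller is not contained in a set of known friends and whose action cancels the call and sends a chat message. In SPoX the condition would be evaluated by Protune against the Social Web data sources rather than against a local set.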

illustration not visible in this excerpt

Figure 10: Protune’s Authentication on Twitter.

7 Related Work

In [49] an approach for access control on the Social Web is introduced, describing the access control scheme Lockr. Lockr can be used to control the sharing of personal content on a Social Web application without the restriction of having to base access control on the application’s social network. Lockr provides an address book for managing a global social network which can be applied to a Social Web application for privacy settings. To restrict access to some data, a so-called social access control list (ACL) is created, which describes what relationship the requester and the content owner need to have for the data to be accessed. A relationship is created by sending a person a so-called ‘social attestation’ containing the relationship, such as ‘family’. Using this attestation the person can gain access to the respective data. In contrast to the approach presented in this thesis, the solution in [49] allows privacy settings to be based only on social networks, i.e. the relationships between users, and does not allow other information and attributes to be included in the privacy decision process, such as “only German speaking friends” or “people on whose blog I commented” can access data. Additionally, no Semantic Web data can be integrated into the privacy decisions. Furthermore, to use Lockr for privacy preferences a global social network has to be created, meaning all relationships have to be specified first and appropriate attestations need to be sent. If, for example, a relationship ‘colleagues’ is created, every new colleague needs to get an attestation first before he can access the protected data. The policy-based approach presented in this thesis, by contrast, uses the already established relationships of the various Social Web applications and adjusts dynamically to the changes that occur without having to send access rights to each person individually.

There are other approaches dealing with privacy and interoperability on the Social Web, presented in [50, 51, 14]. The approaches [50, 51] share the basic idea of bridging the walled garden [52] of today’s Social Web applications. In [50] the introduction of OpenID allows users to manage an online identity which can be used to identify oneself on various Web Services (Social Web applications included) without having to undergo the registration process of each service individually. The OpenSocial project [51] provides a common API for various Social Web applications. By using this API the features and data of the Social Web applications which support OpenSocial can be used and combined.

An approach to preserving the privacy of personal data is introduced in [14], but with an emphasis on third-party applications’ access to personal data. An access control framework for applications is presented that allows users to specify what data an external application is allowed to retrieve. [53] underlines the same problem of privacy risks in terms of personal data being exposed to external applications. In [53] the problem is first described using Facebook and its many third-party applications, and afterwards a solution is presented by introducing the ‘privacy-by-proxy’ design for the privacy of personal data against external applications.

Moreover, privacy on social platforms is the focus of a large body of research; see [54, 36, 55] for examples. However, although privacy issues on Social Web applications have been acknowledged and their importance pointed out, to my knowledge there are no approaches for privacy preservation which allow privacy preferences to be defined based on data from various information sources, crossing the borders of Social Web applications and incorporating Social Semantic Web data.

8 Conclusions and Outlook

This thesis takes up the privacy issues of today’s Social Web applications and offers a flexible solution by introducing policy-based access control. The Protune language used to implement the presented approach allows users to define exactly which conditions a requester has to fulfil to access their resources by specifying respective policies.

With Protune’s ability to include external information sources, the automated policy specification and evaluation process has been extended towards privacy policies based on information beyond the boundaries of one Social Web application. The data from various Social and Semantic Web sources is extracted, combined and incorporated into the policy-based privacy control of personal data. Thereby the user’s experience in using Social Web applications is improved, and arbitrary and fine-grained privacy preferences that cover various situations of Social Web applications can be realized. The feasibility of this approach is further demonstrated by describing the tool SPoX, which integrates the Protune framework into the Social Web application Skype and controls Skype’s behaviour by using non-reactive Protune policies and the reactive extension of these classical policies. [7]

As future work, there are several research directions for improving and extending the work presented in this thesis; they are listed below:

Integration of further External Data The integration of further Social and Semantic Web data from various information sources is an important step for future work. There are several data collections based on RDF that provide SPARQL endpoints which can be easily included into Protune. Project Gutenberg[35], for example, offers information about authors and literary works, and the CIA Factbook[36] is a comprehensive collection of RDF data about countries, their history, people, government, economy, geography and so on. An especially interesting information source is GeoNames[37], a geographical database that provides a web service to extract data. The GeoNames ontology allows the data to be used with Semantic Web technologies and linked to a FOAF file, for example to describe the location of a person. Furthermore, data from Social Web applications can be included into the Protune framework as well. Twitter and Flickr are just two representatives of such applications, but the Social Web offers a huge variety of applications which contain different types of information. Some examples of Social Web applications that could be included are: XING and LinkedIn, which concentrate on business social networks, content sharing sites like Delicious[38] and Digg[39], or blog publishing applications like Blogger[40] and WordPress[41]. OpenSocial[42] is an initiative that provides a common API for several Social Web applications, which could be integrated into Protune as well.

Usability An important challenge for policy languages such as Protune is their usability. For end-users to adopt and use the Protune framework, the policy specification process needs to be simple and should not be too time-consuming, as people tend to avoid complicated privacy options and keep the default ones [10]. To avoid this, a natural language interface should be integrated so that users do not have to specify policies in a formal syntax but can understand and personalise policies using natural language features and thus benefit from policy-based access control. Furthermore, user studies can help to solve major usability issues when defining policies, and one can see whether users keep using default settings or prefer to use the fine-grained privacy preferences. Another interesting issue to observe during user studies is what types of policies are created. Questions arise such as which concepts are used most, whether there are similarities between users, or what types of users prefer which concepts.

Reactive Policies Furthermore, reactive policies, as already implemented with SPoX [8], can be adjusted and integrated into other existing Social Web applications such as Twitter. To integrate a tool similar to SPoX into another Social Web application, one would need to study whether and how the tool can perform the actions triggered by a reactive policy on the application. On Twitter, for example, the Protune application would need to extend the access rights for a profile to ‘read-write’ in order to be able to perform changes in the name of a user. Furthermore, one would need to analyse what types of reactive policies are possible on which applications, as each application has limited possibilities in terms of the features offered. For example, a policy that triggers a notification when a co-author of a user comments on the user’s photo on Flickr is specified for the Flickr application. This policy cannot be applied in the same way to Twitter, as Twitter does not offer a photo feature.

Mobile Web As the popularity of the Mobile Web is increasing, many Social Web applications offer features for mobile phones. An analysis of privacy issues and of how policy-based access control can be applied to the mobile community can be useful for future research. One of the main research challenges is the adaptation of the policy framework to small devices, which have many limitations in comparison to computers. Usability is an even bigger issue on the Mobile Web, as for example an application needs to be adjusted to the small displays of mobile phones.


[1] Graham Cormode and Balachander Krishnamurthy. Key differences between web 1.0 and web 2.0. First Monday, 13(6), June 2008.
[2] Janet Kornblum and Mary Beth Marklein. What you say online could haunt you. Website, 2006. Available online at internetprivacy/2006-03-08-facebook-myspace_x.htm; visited on September 22nd 2009.
[3] Samantha Rose Hunt. How to use technology wrong. Website, 2009. Available online at; visited on September 22nd 2009.
[4] Erik Brady and Daniel Libit. Alarms sound over athletes’ facebook time. Website, 2006. Available online at other/2006-03-08-athletes-websites_x.htm; visited on September 22nd 2009.
[5] Belinda Luscombe. Facebook and divorce: Airing the dirty laundry. Website, 2009. Available online at http://www.time.com/time/magazine/article/0,9171,1904147,00.html; visited on September 22nd 2009.
[6] Danah Boyd. Facebook’s privacy trainwreck: Exposure, invasion, and social convergence. Convergence: The International Journal of Research into New Media Technologies, 14(1):13-20, February 2008.
[7] Philipp Kärger, Emily Kigel, and Daniel Olmedilla. Reactivity and social data: Keys to drive decisions in social network applications. In Second ISWC Workshop on Social Data on the Web (SDoW2009), 2009.
[8] Philipp Kärger, Emily Kigel, and VenkatRam Yadav Jaltar. Spox: combining reactive semantic web policies and social semantic data to control the behaviour of skype. In ISWC, Poster and Demo Session, Washington, DC, USA, October 2009.
[9] Danah Boyd. Social network sites: Public, private, or what? Web, May 2007. Knowledge Tree 13, May.
[10] Joseph Bonneau, Jonathan Anderson, and Luke Church. Privacy suites: shared privacy for social networks. In Lorrie Faith Cranor, editor, SOUPS, ACM International Conference Proceeding Series. ACM, 2009.
[11] Tim O’Reilly. What is web 2.0. Website, September 2005. Available online at; visited on September 22nd 2009.
[12] John Scott. Social networks: critical concepts in sociology VI. Routledge, 2002.
[13] Cathy De Rosa, Joanne Cantrell, Andy Havens, Janet Hawk, and Lillie Jenkins. Sharing, privacy and trust in our networked world. Website, 2007. Available online at; visited on September 22nd 2009.
[14] Mohamed Shehab, Anna Cinzia Squicciarini, and Gail-Joon Ahn. Beyond user-to-user access control for online social networks. In Liqun Chen, Mark Dermot Ryan, and Guilin Wang, editors, ICICS, volume 5308 of Lecture Notes in Computer Science, pages 174-189. Springer, 2008.
[15] W3C. W3c semantic web activity, 2009. Reference format TBD. Available online at; visited on September 22nd 2009.
[16] Ossi Nykänen. Semantic web: Protocol stack. Website, 2003. Available online at; visited on September 22nd 2009.
[17] Graham Klyne and Jeremy J. Carroll. Resource description framework (RDF): Concepts and abstract syntax. World Wide Web Consortium, Recommendation REC-rdf-concepts-20040210, Feb 2004.
[18] Joshua Tauberer. What is rdf, 2006. Available online at pub/a/2001/01/24/rdf.html; visited on September 22nd 2009.
[19] Accessing the dbpedia data set over the web. Website. Available online at; visited on September 22nd 2009.
[20] Christian Bizer, R. Cyganiak, S. Auer, and G. Kobilarov. querying Wikipedia like a database. In Proc. Int. Conf. on World Wide Web, 2007.
[21] Dan Brickley and Libby Miller. Foaf vocabulary specification 0.9. Namespace document, FOAF Project, 2007.
[22] Uldis Bojars, Alexandre Passant, John G. Breslin, and Stefan Decker. Social network and data portability using semantic web technologies. In 2nd Workshop on Social Aspects of the Web (SAW 2008) at BIS2008, pages 5-19, 2008.
[23] Juri Luca De Coi, Philipp Kärger, Daniel Olmedilla, and Sergej Zerr. Using Natural Language Policies for Privacy Control in Social Platforms. Heraklion, Greece, Jun 2009.
[24] Piero A. Bonatti and Daniel Olmedilla. Semantic web policies: Where are we and what is still missing? Tutorial at the European Semantic Web Conference (ESWC), June 2006.
[25] Y. Snir, Y. Ramberg, J. Strassner, R. Cohen, and B. Moore. Policy quality of service (qos) information model. Website, 2003. Available online at; visited on September 22nd 2009.
[26] Piero Bonatti and Daniel Olmedilla. Rule-based policy representation and reasoning for the semantic web. Reasoning Web, pages 240-268, 2007.
[27] Juri L. De Coi, Philipp Kärger, Arne W. Koesling, and Daniel Olmedilla. Control your elearning environment: Exploiting policies in an open infrastructure for lifelong learning. IEEE Transactions on Learning Technologies, 1(1), 2008.
[28] A. Uszok, J. M. Bradshaw, R. Jeffers, N. Suri, P. Hayes, M. R. Breedy, L. Bunch, M. Johnson, S. Kulkarni, and J. Lott. Kaos policy and domain services: Toward a description-logic approach to policy representation, deconfliction, and enforcement. In Proceedings of Policy, Como, Italy, June 2003. AAAI.
[29] Lalana Kagal, Timothy W. Finin, and Anupam Joshi. A policy language for a pervasive computing environment. In POLICY, pages 63 . IEEE Computer Society, 2003.
[30] Piero A. Bonatti and Daniel Olmedilla. Driving and monitoring provisional trust negotiation with metapolicies. In POLICY, pages 14-23. IEEE Computer Society, 2005.
[31] Protune in a nutshell. Website. Available online at http://skydev.13s. e ct/protune/wiki/?pagename=Protune+in+a+ nutshell; visited on September 22nd 2009.
[32] Piero A. Bonatti, Claudiu Duma, Norbert Fuchs, Wolfgang Nejdl, Daniel Olmedilla, Joachim Peer, and Nahid Shahmehri. Semantic web policies - a discussion of requirements and research issues. In ESWC, pages 712-724, 2006.
[33] Giles Hogben ENISA. Security issues and recommendations for online social networks. Technical report, European Network and Security Agency, 2007.
[34] Leyla Bilge, Thorsten Strufe, Davide Balzarotti, and Engin Kirda. All your contacts are belong to us: automated identity theft attacks on social networks. In Juan Quemada, Gonzalo León, Yoelle S. Maarek, and Wolfgang Nejdl, editors, WWW, pages 551-560. ACM, 2009.
[35] Elinor Mills. New worm targets facebook, myspace. Website, 2008. Available online at; visited on September 22nd 2009.
[36] Ralph Gross and Alessandro Acquisti. Information revelation and privacy in online social networks (the facebook case). In ACM Workshop on Privacy in the Electronic Society (WPES), pages 71-80, Alexandria, November 2005.
[37] Alexandre Passant, Philipp Kärger, Michael Hausenblas, Daniel Olmedilla, Axel Polleres, and Stefan Decker. Enabling trust and privacy on the social web. In W3C Workshop on the Future of Social Networking, Barcelona, Spain, January 2009.
[38] J. Pontin. From many tweets, one loud voice on the Internet. New York Times Online [web site]. Retrieved May, 8, 2007.
[39] Juri L. De Coi, Philipp Kärger, Daniel Olmedilla, and Sergej Zerr. Semantic web policies for security, trust management and privacy in social networks. In Workshop on Privacy and Protection in Web-based Social Networks in conjunction with the 12th International Conference on Artificial Intelligence & Law (ICAIL), Barcelona, Spain, June 2009.
[40] Claudiu Duma, Almut Herzog, and Nahid Shahmehri. Privacy in the semantic web: What policy languages have to offer. In POLICY, pages 109-118. IEEE Computer Society, 2007.
[41] Juri L. De Coi and Daniel Olmedilla. A review of trust management, security and privacy policy languages. In International Conference on Security and Cryptography (SECRYPT 2008). INSTICC Press, July 2008.
[42] Bob DuCharme. Getting started with the twitter api. Website, 2008. Available online at\_LATEST; visited on September 22nd 2009.
[43] Twitter api documentation. Website. Available online at; visited on September 22nd 2009.
[44] Alexandre Passant. Rdf export of flickr profiles with foaf and sioc. Website, 2007. Available online at rdf-export-flickr-profiles-foaf-and-sioc/; visited on September 22nd 2009.
[45] Diego Bcrructa, Dan Brieklcv, Stefan Decker, Sergio Fernández, Christoph Corn, Andreas Harth, Tom Heath, Kingsley Idehen, Kjetil Kjcrnsmo, Alistair Miles, Alexandre Passant, Axel Polleros, Luis Polo, and Michael Sintek. Sioe core ontology specification. W3c member submission, W3C, June 2007.
[46] Protune architecture. Website. Available online at http://skydev. ect/protune/wiki/?pagename=Protune+ architecture; visited on September 22nd 2009.
[47] Daniel Olmcdilla. Semantic web policies for security, trust management and privacy in social networks. Invited Talk at the Workshop on Privacy and Pro­tection in Web-based Social Networks in conjunction with the 12th International Conference on Artificial Intelligence L· Law (ICAIL), .June 2009.
[48] How to use the java configurator. Website. Available online at http :// ect/protune/wiki/ ?pagename=How+to+use+the+java+conf igurator; visited on September 22nd 2009.
[49] Amin Tootoonchian, Kiran Kumar Gollu, Stefan Saroiu, Yashar Ganjali, and Alec Wolman. Lockr: social access control for web 2.0. In Proceedings of the first workshop on Online Social Networks (WOSN), pages 43-48, New York, NY, USA, 2008. ACM.
[50] Website, the homepage for the OpenID:; visited on September 22nd 2009.
[51] Website, the homepage for the OpenSocial project: http://www.opensocial.org/; visited on September 22nd 2009.
[52] Harry Halpin. Beyond walled gardens: Open standards for the social web. In Proceedings of the First Social Data on the Web Workshop (SDoW2008), CEUR Workshop Proceedings, ISSN 1613-0073, online. 405/paperi.pdf 2008.
[53] Adrienne Felt and David Evans. Privacy protection for social networking apis. Web (, 2008.
[54] Catherine Dwyer, Starr Roxanne Hiltz, and Katia Passerini. Trust and privacy concern within social networking sites: A comparison of facebook and myspace. In Proceedings of the Thirteenth Americas Conference on Information Systems (AMCIS 2007), 2007. Paper 339.
[55] (Under)mining Privacy in Social Networks, 2008. Google Inc. Web (http://


[1] retrieved August 14, 2009

[2] A web service that measures the traffic of websites and creates a ranking list accordingly: Alexa traffic rankings, retrieved August 14, 2009 from


[4] Phishing is an attempt to get personal data of users by using forged websites.

[5] The introduction of the ‘News Feed’ on Facebook, which informed users about the latest activities of their friends, led to an online petition signed by over 700,000 users demanding the abolition of the feature. [6]

[7] Social data includes user-generated content (bookmarks, tags, reviews, photos, blog posts etc.), personal data of users and their social networks.

[7] Tagging is used for organizing and interlinking content. To tag content means to assign a descriptive keyword to it, which makes the content easier to find.

[8] The term Wall is used in Facebook to describe the reserved space on a user’s profile where the user himself or his friends can write public messages.

[9] the World Wide Web Consortium

[10] eXtensible Markup Language

[11] Uniform Resource Identifier

[12] The SPARQL language also enables other query forms: DESCRIBE and ASK, but as they are of no relevance for this thesis, they will not be explained here.

[13] the entire ontology of DBpedia is available at:; visited on September 22nd 2009

[14] http://dblp.uni-trier.de/

[15] www.geonames.org/

[16] see data portability project for further reading:

[17] http://sioc-project.org/

[18] Throughout this thesis, the simplified version of the policy syntax is used for clearer understanding.

[19] Variables start with a capital letter, as is the case in Logic Programming.

[20] Here, ‘action’ refers to the predicate access().

[21] Here, not only social networking sites are considered, but any web application where people expose information about themselves, including personal websites, blogs and so on.

[22] Collects information about the activities of a user’s friends and posts it to the front page of the user’s profile.

[23] A form of blogging in which users can publish short text messages, similar to SMS, to communicate with others.

[24] A page is a public profile which can be created to represent a business, a public figure or a brand. If a user finds a page of interest, he can “become a fan” of it, which will lead to the page link being displayed on his profile.

[25] EXIF data shows the model of the camera used for the photo as well as other details such as the place and the time the photo was taken:

[26] Pokes! is a Facebook-specific feature that corresponds to simply saying hello: the other user gets a notification on his homepage that he was poked.

Using Social Semantic Web Data for Privacy Policies
University of Hannover  (Knowledge Based Systems Institute)
Quote paper
Antonia Feserer (Author), 2009, Using Social Semantic Web Data for Privacy Policies, Munich, GRIN Verlag,

