Artificial Intelligence in Recruiting. A Literature Review on Artificial Intelligence Technologies, Ethical Implications and the Resulting Chances and Risks


Master's Thesis, 2020

130 Pages, Grade: 1,3


Excerpt

Content

Figures

Tables

Abbreviations

Symbols

1 Introduction
1.1 Problem statement
1.2 Objective and structure of the thesis

2 Theoretical Background
2.1 Recruiting as a part of Human Resources
2.1.1 Recruiting and its subprocesses
2.1.1.1 Reaching out subprocess
2.1.1.2 Preselecting subprocess
2.1.1.3 Assessing subprocess
2.1.2 Evolution of digitalization in recruiting
2.2 Machine learning as a part of artificial intelligence (AI)
2.2.1 Central framework used to create machine learning models
2.2.2 Supervised learning algorithms used in AI-based recruiting tools
2.2.2.1 Machine learning using statistical models
2.2.2.2 Machine learning using instance-based models
2.2.2.3 Machine learning using decision tree models
2.2.2.4 Machine learning using neural network models
2.3 Relevant ethical principles affected by AI-based recruiting tools
2.3.1 Ethical principle fairness
2.3.2 Ethical principle transparency

3 Method
3.1 Phase 1 - Definition of review scope
3.2 Phase 2 - Conceptualization of the topic
3.3 Phase 3 - Literature search
3.3.1 Phase 3.1 - Identification of relevant journals and conferences
3.3.2 Phase 3.2 - Identification of search databases
3.3.3 Phase 3.3 - Keyword search
3.3.4 Phase 3.4 - Forward and backward search

4 Results
4.1 Analysis of AI-based recruiting tools
4.1.1 Enhancing the reaching out subprocess
4.1.1.1 Challenges in the reaching out subprocess
4.1.1.2 Need and approaches for identifying high potential passive candidates
4.1.1.3 Analysis of AI-based recruiting tools in the reaching out subprocess
4.1.2 Enhancing the preselecting subprocess
4.1.2.1 Challenges in the preselecting subprocess
4.1.2.2 Need and approaches for preselecting candidates
4.1.2.3 Need and approach for creating personalized questions for questionnaires
4.1.2.4 Analysis of AI-based recruiting tools in the preselecting subprocess
4.1.3 Enhancing the assessing subprocess
4.1.3.1 Challenges in the assessing subprocess
4.1.3.2 Need and approaches for objectivizing audio-visual input
4.1.3.3 Need and approaches for objectivizing the derivation of personal characteristics
4.1.3.4 Need and approaches for predicting job performance and working habits
4.1.3.5 Need and approaches for predicting the salary of candidates
4.1.3.6 Analysis of AI-based recruiting tools in the assessing subprocess
4.1.4 Concluding assessment of AI-based recruiting tools
4.2 Analysis of the addressing of ethical challenges arising from AI-based recruiting tools
4.2.1 Addressing of fairness
4.2.1.1 Challenge of using data of poor quality in AI-based recruiting tools
4.2.1.2 Need for addressing fairness in AI-based recruiting
4.2.1.3 Analysis of the addressing of fairness in AI-based recruiting tools
4.2.2 Addressing of transparency
4.2.2.1 Challenge of using black box models in AI-based recruiting tools
4.2.2.2 Need for addressing transparency in AI-based recruiting
4.2.2.3 Analysis of the addressing of transparency in AI-based recruiting tools
4.2.3 Concluding assessment of the addressing of ethical challenges

5 Discussion
5.1 Implications for research
5.2 Implications for practice

6 Conclusion
6.1 Summary
6.2 Limitations and further research needs

References

Appendix

Figures

Figure 2-1: Key areas in Human Resources management and their interdependencies

Figure 2-2: Subprocesses of recruiting

Figure 2-3: Cross-Industry Standard Process for Data Mining framework

Figure 2-4: Interrelation of under- and overfitting, variance and model bias, and their impact on generalizability

Figure 2-5: Common evaluation metrics used in the field of machine learning

Figure 2-6: Categorization of relevant supervised learning algorithms

Figure 2-7: Functionality of a neuron and structure of a fully connected feed-forward neural network

Figure 2-8: Most important ethical principles affected by AI-based tools

Figure 2-9: Machine learning algorithms classified according to their transparency

Figure 3-1: Framework for conducting the systematic literature review

Figure 3-2: Literature search process

Figure 3-3: Elimination process resulting in the literature set

Figure 4-1: Technology evolution in the reaching out subprocess

Figure 4-2: Technology evolution in the preselecting subprocess

Figure 4-3: Technology evolution in the assessing subprocess

Figure 4-4: Aggregated view on the technology evolution in AI-based recruiting

Figure 4-5: Temporal evolution of the addressing of fairness

Figure 4-6: Temporal evolution of the addressing of transparency

Figure 4-7: Aggregated view on the addressing of fairness and transparency in AI-based recruiting

Tables

Table 2-1: Confusion matrix

Table 3-1: Classification of the scope of the systematic literature review

Table 4-1: Coding for evaluating the addressing of fairness in the literature set

Table 4-2: Coding for evaluating the addressing of transparency in the literature set

Table 4-3: Synthesized literature table

Abbreviations

Illustration not included in this excerpt

1 Introduction

1.1 Problem statement

In 2020, approximately 54% of companies experience shortages of high-quality employees according to a ManpowerGroup survey (2020, p. 2) covering half a million organizations worldwide. Beyond the difficulty of recruiting well-fitting candidates, the hiring process for one employee takes approximately 42 to 52 days and costs an organization around USD 4,000. At the same time, the recruiting process is considered to be highly subjective (Black and van Esch 2020, pp. 216-217; Zhu et al. 2018, p. 2). To increase the efficiency and objectivity of the recruiting process, organizations have started using AI-based tools. While early adopters stated that such tools allowed them to reduce cost and time-to-hire and to enhance the quality of the recruiting process, the technology also brought new risks. For example, in 2015, one year after the launch of its AI-based recruiting tool, Amazon became aware that the tool strongly preferred male candidates for job openings and, in turn, discriminated against women. In addition to the tool’s discriminatory tendencies, the lack of transparency in its decision-making process led Amazon to conclude that it could not guarantee “that the machines would not devise other ways of sorting candidates” than intended. Thus, the project was stopped in 2017 (Dastin 2018, p. 2). Similarly, LinkedIn Talent Solutions’ vice president John Jersin, who is responsible for offering AI-based tools to recruiters, stated that he “would not trust any AI system today to make a hiring decision on its own” (Dastin 2018, p. 5). On the one hand, AI-based recruiting tools appear able to unlock potential in the field of recruiting. On the other hand, as the examples from practice underscore, the success of AI-based recruiting tools depends not only on the opportunities an implementation would create, but also on how ethical questions such as fairness and transparency are addressed.

1.2 Objective and structure of the thesis

The thesis at hand aims to analyze this field of tension between the benefits and risks caused by the introduction of AI into recruiting based on a systematic analysis of academic publications. Several academic initiatives, such as IEEE’s Ethically Aligned Design series (IEEE 2020), underscore the importance of addressing ethical challenges as a factor in the implementation of AI-based tools. Furthermore, influential journals such as MISQ have called for papers assessing ethical risks resulting from the implementation of AI-based tools (Berente et al., pp. 2-3). In contrast to the general interest of academia in ethical questions regarding AI-based tools, previous literature reviews in the field of AI-based recruiting have focused solely on application fields of AI-based tools (see, for example, Jatobá et al. (2019), Geetha and Bhanu Sree Reddy (2018), or Nawaz (2019)). To the knowledge of the author, neither a literature review on ethical challenges of AI-based recruiting tools nor an overview addressing application fields and the resulting ethical risks has been published so far. Consequently, this thesis aims to close this research gap by disclosing how AI technologies affect the recruiting process and how ethical challenges arising from the implementation of AI-based tools are addressed in the same publications. From this aim, research question (RQ) one (RQ1) and RQ two (RQ2) are derived:

1. Which AI technologies are applied in the field of recruiting, and how do they influence the recruiting process?
2. Which major ethical challenges arise from the introduction of AI into recruiting, and how are these challenges addressed by the proposed AI-based tools?

Addressing these two RQs, the remainder of this thesis is structured as follows. Chapter 2 classifies recruiting and its subprocesses as a part of Human Resources (HR), establishes a common understanding of AI and machine learning (ML) algorithms relevant in AI-based recruiting tools, and derives major ethical challenges with a focus on the ethical principles fairness and transparency. In chapter 3, the methodical approach used for identifying and selecting relevant literature is described. Chapter 4 answers both RQs, based on the AI-based recruiting tools included in the literature set. The first part of the analysis focuses on how the needs raised through traditional recruiting means are addressed by AI-based recruiting tools, also touching on the underlying technologies. The second part addresses if and, where applicable, how these publications incorporate fairness and transparency. Subsequently, chapter 5 discusses the main findings of the analysis of the literature set and provides implications for theory and practice, followed by a brief conclusion and an outline of limitations and future research fields in chapter 6.

2 Theoretical Background

In this chapter, relevant terminology and theoretical concepts are covered to provide a common understanding for the remainder of the thesis at hand. First, the field of recruiting is placed in its business context as a central part of HR. Subsequently, AI is defined and put into the context of recruiting. Thereafter, relevant ethical questions in the field of AI are outlined, and the most relevant ethical principles are set into the context of AI-based recruiting.

2.1 Recruiting as a part of Human Resources

In the following subchapter, an overview of the tasks involved in HR is given, including the responsibilities and the interfaces between recruiting and bordering HR tasks. Subsequently, recruiting, its subprocesses, and relevant terminology are defined, followed by a brief overview of the evolution of digitalization in recruiting.

Strohmeier and Piazza (2015, p. 151) define HR management as a “subset of management tasks that are related to potential or current employees to obtain contributions that directly or indirectly support the strategy and performance of an organization.” According to Wirtky et al.’s (2016, p. 25) broad screening of academic definitions of HR management, the HR tasks can be split into six key areas that have temporal dependencies. An illustration can be found in Figure 2-1, which accounts for temporal dependencies through directed and double-sided arrows. The first task is termed planning, which focuses on deriving the demand and supply of employees on a company-wide level. This task also includes specifying job requirements in collaboration with the corresponding department (Bizer et al. 2005, p. 1371; Wirtky et al. 2016, pp. 27-28). Internal staffing and external recruiting, the second and third tasks, aim to provide employees in the right quantity and quality to the organization (Strohmeier and Piazza 2015, p. 151). Internal staffing includes matching employees to jobs and forming internal teams (Wirtky et al. 2016, pp. 27-28). External recruiting, which will be referred to as recruiting in the following, is often carried out in parallel to the second task and is responsible for matching external candidates to jobs (Halutzky 2016, p. 23; Strohmeier and Piazza 2015, p. 151; Wirtky et al. 2016, p. 40). Fourth, development tasks address the performance management of employees, which includes formal and informal development measures such as coaching as well as technical and non-technical training (Strohmeier and Piazza 2015, p. 152; Wirtky et al. 2016, p. 42). Fifth, motivation tasks in HR include improving the morale and satisfaction of the workforce, for example, through incentives, rewards, or recognition systems (Strohmeier and Piazza 2015, p. 152; Wirtky et al. 2016, p. 45). Sixth, administration tasks include HR management, controlling, vacation planning, and payroll (Wirtky et al. 2016, pp. 27-28).

Figure not included in this excerpt

2.1.1 Recruiting and its subprocesses

In this section, recruiting and its subprocesses are explained, and the corresponding tasks associated with each subprocess are outlined. Also, relevant terminology is defined and put into the context of this thesis.

Bizer et al. (2005, p. 1370) divide recruiting into requirement analysis, the publication of job posts, the preselection of candidates, and the recruitment decision. Wirtky et al. (2016, p. 28) include the tasks candidate attraction, selection, and candidate management among the responsibilities of recruiting. Jantan et al. (2010, p. 267), van Esch et al. (2018, p. 915), as well as Maree et al. (2019, p. 715) go into more detail and define recruiting as an organizational process which aims to identify, attract, and sign a qualified individual, subdividing recruitment into the subprocesses candidate attraction and collection, candidate pre-selection, and candidate assessment. The thesis at hand follows Jantan et al.’s (2010, p. 267), van Esch et al.’s (2018, p. 915), and Maree et al.’s (2019, p. 715) definition and subdivides the recruiting process into the three subprocesses reaching out, preselecting, and assessing, which are also depicted in Figure 2-2 and structure the following subsections. As the focus of the thesis at hand lies mainly on the challenges, needs, and chances AI-based tools create for recruiters, the tasks associated with each subprocess are also explained from the recruiter’s perspective. In the context of this thesis, a recruiter is defined as an HR expert who represents the interests of the hiring organization in each of the following subprocesses and participates in the corresponding tasks (Ullah and Witt 2018, pp. 344-345).

Figure not included in this excerpt

2.1.1.1 Reaching out subprocess

The main task of the reaching out subprocess is to create a pool of promising candidates. Consequently, the potential target group for job vacancies needs to be identified and provided with relevant information in an extensive and timely manner, and applications need to be collected (Black and van Esch 2020, pp. 219-220; Llorens 2011, p. 412; Singh and Finn 2003, p. 396). Before the reaching out subprocess is defined in more detail, the characteristics of a candidate need to be specified. A candidate is an individual who is potentially interested in a job opportunity and seeks employment that fits their capabilities, training, remuneration, and social connections (van Esch et al. 2019, p. 216). Candidates can be subdivided into active and passive candidates. In the context of this thesis, the term candidate refers to both. Active candidates are individuals who take actions to identify job opportunities, while passive candidates are not actively searching for a job but are open to offers if presented with an opportunity. In turn, passive candidates are characterized by having a higher likelihood of turnover. Likelihood of turnover is defined as the probability that candidates will quit their job within the next year of employment (Chien and Chen 2008, p. 285; Cho and Ngai 2003, pp. 126-127). Passive candidates outnumber active candidates by a ratio of approximately three to one (Black and van Esch 2020, p. 219; van Esch and Black 2019, p. 736). Consequently, one of the central aims of the reaching out subprocess is to identify and approach passive candidates.

The most relevant information for potential candidates is the job posting. A job posting comprises information regarding the organization itself and the job profile, which includes the placement of the job inside the organization, the aims and responsibilities associated with the position, and further cornerstones such as working hours, potential salary, and entry date. Also, the job requirements for a successful application are stated, including expected work experience, knowledge and skills, educational background, and behavioral characteristics (Bröckermann 2016, p. 54).

To raise the interest of candidates in the job posting, recruiters need to determine a medium or a mix of media for reaching out to them (Breaugh 2012, p. 391). In general, the means for attracting candidates can be split into offline and online channels. Offline channels comprise advertisements in newspapers and print media, job fairs, and employment referrals, among others (Breaugh 2012, p. 397; Bröckermann 2016, p. 52; Singh and Finn 2003, p. 396). Online channels include job offer adverts in online media, job boards, and online social networks (OSNs), as well as search engine marketing and posting job offerings on the website of the hiring organization (Bizer et al. 2005, pp. 1371-1372; Breaugh 2012, p. 397; Maree et al. 2019, p. 715; Sivabalan et al. 2014, p. 178). In the context of this thesis, OSNs are especially relevant. In general, OSNs can be divided into professional OSNs such as LinkedIn or XING and non-professional OSNs such as Facebook, Instagram, or Twitter. On both types of OSN, job postings can either be published on a company-owned profile or be advertised to users for compensation (Black and van Esch 2020, p. 218; Josimovski et al. 2019, p. 39; McCarthy et al. 2017, p. 1704; Wirtky et al. 2016, p. 43). In the context of this thesis, only professional OSNs are relevant, and thus, the term OSN will refer solely to professional OSNs.

In the case of a successful reaching out subprocess, candidates apply for the advertised job. Applications can be transmitted either electronically or in non-digital form. Both consist of the candidate’s resume and, optionally, additional information such as cover letters, certificates, or letters of recommendation (Black and van Esch 2020, p. 219; Harris 2018, p. 16; Kessler et al. 2012, p. 1125). Although in the field of HR, resume and Curriculum Vitae sometimes refer to textual documents that differ with regard to the included information, in the literature set underlying this thesis, both terms are used interchangeably. To enhance comprehensibility, this work only uses the term resume. Traditionally, resumes are textual overviews of a candidate’s historical and current professional attributes, qualifications, and experiences next to educational background, personal information, and interests (Abdessalem and Amdouni 2011, p. 220; Kessler et al. 2008, p. 629; Zhang and Wang 2018, p. 2). Lately, video resumes, which are “video-recorded messages where job candidates present themselves to potential employers” (Nguyen and Gatica-Perez 2016, p. 1422), have been introduced as an alternative form of resumes and have gained popularity in the recruiting community (Chen et al. 2016, p. 32). In comparison to textual resumes, video resumes give recruiters a more in-depth insight into a candidate’s communication skills and personality (Nguyen and Gatica-Perez 2016, p. 1422). Resumes, independent of their form, are considered to be a major predictor of future job performance (Zhang and Wang 2018, p. 2). In the context of this thesis, the term application refers to an electronic application, and candidate profile is used as the umbrella term for all information provided by the candidate. All received applications together form the candidate pool. The quality of this candidate pool is determined by its breadth, which is the number of included candidates, and its depth, which is the extent of information recruiters have about the candidates (Black and van Esch 2020, p. 219).

2.1.1.2 Preselecting subprocess

In the preselecting subprocess, an organization reduces the candidate pool breadth by 50 to 80% by screening out unsuitable candidates (Black and van Esch 2020, p. 220). The preselecting and the downstream assessing subprocess focus on determining the candidate or candidates with the most significant potential to contribute to the organization’s success (Black and van Esch 2020, p. 220; Wirtky et al. 2016, pp. 40-41). Both subprocesses aim to assign candidates a hirability or a person-job fit score. Hirability, following Nguyen et al. (2014, p. 1021), is defined in the context of this thesis as the classification of a candidate with the binary variable hire or reject. Person-job fit, on the other hand, refers to the degree to which a candidate’s characteristics match the corresponding job profile and can be depicted on a non-binary scale (Bizer et al. 2005, p. 1372; Qin et al. 2020, p. 2; Shen et al. 2018, p. 3545; Wirtky et al. 2016, p. 40; Zhu et al. 2018, p. 5). In the following, the candidate characteristics evaluated in preselecting or assessing are explained in the section on the corresponding recruiting subprocess.

During preselection, recruiters focus on evaluating the knowledge and skills of candidates, which are mainly derived from characteristics such as work experience and educational background (Harris 2018, p. 16; Li et al. 2011, p. 75; van Esch et al. 2018, p. 916). The basis for preselecting candidates is information extracted from the submitted application and from additional information sources, such as blogs, personal websites, OSNs, or preliminary e-assessments (Laumer et al. 2009, p. 263; McCarthy et al. 2017, p. 1704). OSNs in particular provide in-depth insights about candidates, as network participants host information on their OSN profile, including personal values, interests, career development, current workplace, and place of living. Furthermore, publicly available interpersonal connections can be leveraged in the recruiting context (Black and van Esch 2020, p. 219; Hugl 2011, p. 385). E-assessments relevant in the context of this thesis are questionnaires aiming to derive information useful for anticipating the future performance of a candidate. These are referred to as questionnaires in the following (Laumer et al. 2009, pp. 263-264; Qin et al. 2019, p. 2171). All extracted information is then used to preselect candidates with a positive hirability score or the highest person-job fit, which results in a ranking of candidates.
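The difference between a binary hirability label and a non-binary person-job fit score, and the way a ranking narrows the candidate pool, can be illustrated with a minimal sketch. It assumes, purely for illustration, that person-job fit is the share of required skills a candidate covers and that hirability is obtained by thresholding this score. The scoring rule, the threshold, the sample data, and the names `person_job_fit`, `hirability`, and `preselect` are hypothetical and do not stem from the reviewed literature.

```python
from typing import Dict, List, Set, Tuple


def person_job_fit(candidate_skills: Set[str], required_skills: Set[str]) -> float:
    """Non-binary score in [0, 1]: share of required skills the candidate covers.

    Assumption: a simple overlap ratio stands in for a real matching model.
    """
    if not required_skills:
        return 0.0
    return len(candidate_skills & required_skills) / len(required_skills)


def hirability(fit: float, threshold: float = 0.5) -> str:
    """Binary label derived from the fit score (the threshold is hypothetical)."""
    return "hire" if fit >= threshold else "reject"


def preselect(
    pool: Dict[str, Set[str]], required: Set[str], keep_share: float
) -> List[Tuple[str, float]]:
    """Rank candidates by descending fit and keep the top share of the pool."""
    ranked = sorted(
        ((name, person_job_fit(skills, required)) for name, skills in pool.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    keep = max(1, round(len(ranked) * keep_share))
    return ranked[:keep]


# Hypothetical job profile and candidate pool.
required = {"python", "sql", "statistics", "communication"}
pool = {
    "A": {"python", "sql", "statistics"},
    "B": {"sql"},
    "C": {"python", "communication", "java"},
    "D": {"excel"},
}

# Keeping half of the pool corresponds to a 50% reduction in breadth.
shortlist = preselect(pool, required, keep_share=0.5)
for name, fit in shortlist:
    print(name, fit, hirability(fit))
```

In this sketch, the ranking uses the continuous score, while the hire or reject label discards the information on how close a candidate is to the threshold.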

2.1.1.3 Assessing subprocess

Assessing candidates is the final subprocess of recruiting, in which the candidates remaining in the narrowed-down candidate pool are mainly evaluated based on personal characteristics and anticipated working habits (Bizer et al. 2005, p. 1372; Maree et al. 2019, p. 715; Singh and Finn 2003, p. 396; Sivabalan et al. 2014, p. 178; Wirtky et al. 2016, p. 40). In general, desired personal characteristics strongly differ depending on the job (Güclütürk et al. 2018, p. 319). Personal characteristics addressable by AI-based recruiting tools are a candidate’s personality and cultural fit. The personality of a candidate is usually evaluated using the Five-Factor Model, which is also referred to as the Big Five Model in research (Escalante et al. 2017, p. 3691; Güclütürk et al. 2018, p. 318; Li et al. 2011, p. 76; Mairesse et al. 2007, p. 458; Nguyen et al. 2014, p. 1019; Nguyen and Gatica-Perez 2016, p. 1423; van Esch et al. 2018, p. 916). The Big Five Model characterizes human personality traits on an abstract level according to the five dimensions agreeableness, conscientiousness, extraversion, neuroticism, and openness (Güclütürk et al. 2018, p. 318; Mairesse et al. 2007, p. 458; Nguyen et al. 2014, p. 1027). Based on a quantification of these characteristics, the matching degree between job requirements and candidate personality can be derived as the deviation between the two. Interrelated with the Big Five is the assessment of a candidate’s potential cultural fit with the organizational environment. Following Bye et al. (2014, p. 9), cultural fit is defined as how well the self-presentation of a candidate matches the recruiter’s “culturally bound normative expectations,” which are representative of the department’s and organization’s normative expectations. Ergo, cultural fit is the matching degree between the personality of a candidate and the working environment of the corresponding team or department, which is determined by the personalities of the employees working there (Bye et al. 2014, p. 9).
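One way to read the matching degree as a deviation is as a distance between two five-dimensional trait vectors. The following minimal sketch assumes trait scores on a scale from 0 to 1 and uses the mean absolute deviation. The scale, the choice of distance, the sample values, and the names `personality_deviation` and `matching_degree` are illustrative assumptions, not the method of the cited studies.

```python
from typing import Dict

BIG_FIVE = (
    "agreeableness",
    "conscientiousness",
    "extraversion",
    "neuroticism",
    "openness",
)


def personality_deviation(candidate: Dict[str, float], job_profile: Dict[str, float]) -> float:
    """Mean absolute deviation across the five traits (0 means identical profiles)."""
    return sum(abs(candidate[t] - job_profile[t]) for t in BIG_FIVE) / len(BIG_FIVE)


def matching_degree(candidate: Dict[str, float], job_profile: Dict[str, float]) -> float:
    """Assumed convention: 1 minus the deviation, so higher values mean a better match."""
    return 1.0 - personality_deviation(candidate, job_profile)


# Hypothetical desired profile for a job and a hypothetical candidate profile.
job_profile = {
    "agreeableness": 0.5,
    "conscientiousness": 0.75,
    "extraversion": 0.5,
    "neuroticism": 0.25,
    "openness": 0.75,
}
candidate = {
    "agreeableness": 0.5,
    "conscientiousness": 0.5,
    "extraversion": 0.5,
    "neuroticism": 0.5,
    "openness": 0.75,
}

deviation = personality_deviation(candidate, job_profile)
degree = matching_degree(candidate, job_profile)
print(f"deviation={deviation:.2f}, matching degree={degree:.2f}")
```

Other distances, such as the Euclidean distance or a weighted deviation that emphasizes the traits most relevant for a job, would fit the same description; the mean absolute deviation is chosen here only for readability.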

In general, recruiters try to anticipate the working habits of candidates as these are interrelated with job performance and hence determine whether the recruitment of a candidate leads to productivity and revenue increases or to higher costs and adverse working environments (Ali Shah et al. 2020, pp. 1-2; Benton 2016, p. 70; Chien and Chen 2008, p. 281; Cho and Ngai 2003, p. 124; Strohmeier and Piazza 2015, pp. 156-157). Relevant factors in the context of this thesis are a candidate’s job attitude and likelihood of absenteeism. Following Tung et al. (2005, pp. 784-785), job attitude is defined as the level of satisfaction an individual receives by performing a job, the perceived impact of a job, as well as the individual’s commitment towards an organization. Job attitude is an indicator for predicting job performance (Tung et al. 2005, p. 785). In line with Ali Shah et al. (2020, p. 1), absenteeism is defined in the context of this thesis as an individual’s habit of being late, absent from duty, or regularly engaging in activities not related to work during working hours. Absenteeism is an indicator of poor performance. Detecting candidates who are likely to be frequently absent can therefore save organizations the costs of further recruiting and can prevent harmful and unproductive working environments (Ali Shah et al. 2020, pp. 1-2).

All or a subset of these characteristics and working habits are assessed using different methods, of which graphological analysis and interviews are the relevant approaches in the context of this thesis. Following Coll and Fornés (2009, p. 1081), graphological analysis is defined as the derivation of personality traits from a candidate’s handwriting sample, which includes features such as the slope, the shape of strokes, and the spacing between lines, among others. Interviews are characterized as interpersonal face-to-face interactions between a candidate and at least one interviewer, conducted either on- or offline. The interviewing party, which comprises at least one recruiter and is referred to in the singular in the following, evaluates the behavior of candidates and confirms their abilities, skills, and knowledge based on a composed set of interview questions (Chen et al. 2016, p. 32; McCarthy et al. 2017, p. 1704; Nguyen et al. 2014, p. 1018; Qin et al. 2019, p. 2165; Shen et al. 2018, p. 3544). The resulting evaluation of the interviewer is based on the previously screened candidate profile and the candidate’s verbal and nonverbal behavior (Nguyen et al. 2014, p. 1018). Verbal behavior in the context of employment interviews is defined as the spoken word and is the primary mode of communication in interview situations (Nguyen et al. 2014, p. 1018). Nonverbal behavior consists of signals that can be perceived either aurally or visually. Aural signals include the intonation, the speaking time, and the tone of voice, while visual signals are facial expressions, gaze, gestures of head or extremities, body posture, clothing, and overall attractiveness (Nguyen et al. 2014, pp. 1018-1021; Nguyen and Gatica-Perez 2016, p. 1423). The perception of aural and visual signals is especially influential on the assessment during the first 100 milliseconds, which will be referred to as a candidate’s first impression (Escalante et al. 2017, p. 3689; Güclütürk et al. 2018, p. 316).

By assessing the aforementioned characteristics of candidates through the described methods, the best fitting candidate or candidates are determined. This candidate, in turn, enters salary negotiations, which will also be included in the assessing subprocess in this thesis. Subsequently, the candidate is offered employment, followed by the contract signing (Black and van Esch 2020, p. 219; van Esch et al. 2018, p. 915).

2.1.2 Evolution of digitalization in recruiting

In the following, a brief temporal overview of the evolution of recruiting and its intertwined relationship with information technology (IT) is given to provide a common understanding of what traditional recruiting is. Additionally, relevant stakeholders of AI-based recruiting are defined.

Before the 1990s, recruiting relied mainly on low-tech means such as printed ads and employee referrals (Black and van Esch 2020, p. 216; Llorens 2011, p. 412; Singh and Finn 2003, p. 395). Due to its offline and interpersonal character, recruitment was limited in its capacities by a trade-off between information richness, which stands for the costs of providing information, and information reach, which corresponds to the costs of reaching out to candidates with relevant job information (Black and van Esch 2020, p. 217). In the context of this thesis, information richness is equated with the quality of the recruiting process. In the 1990s, the internet emerged, which resulted in the digitization of candidate and job profiles. This technology marks the beginning of digitalization in recruiting. By using corporate websites and digital job boards for reaching out to candidates and by digitalizing the whole application process, the costs for information richness and information reach, as well as the administrative workload, decreased (Black and van Esch 2020, p. 218; Holm 2014, p. 443; Llorens 2011, p. 412; Singh and Finn 2003, p. 401). At the same time, the costs of searching and applying for jobs were eliminated (Black and van Esch 2020, p. 218; Llorens 2011, p. 412). Between the years 2000 and 2010, Web 2.0 was introduced to the recruiting community, leading to the creation of professional OSNs that, in turn, further increased the efficiency of job posting and searching (Black and van Esch 2020, p. 218; Llorens 2011, p. 412).

Up to the year 2010, efforts in the field of recruiting concentrated on increasing the efficiency of the process in terms of costs and time-to-hire. Still, the efficiency potential is not exhausted. Moreover, the output of each subprocess is highly subjective and dependent on the recruiter, and not all available information is leveraged (Black and van Esch 2020, pp. 216-217; Breaugh 2012, pp. 403-406; Harris 2018, p. 16; Kessler et al. 2012, p. 1133). Since 2010, AI-based tools have been partly introduced into recruiting to address these issues of process efficiency and quality, and they are considered to be the next technological milestone of the recruitment process (Black and van Esch 2020, p. 218; Kaplan and Haenlein 2019, p. 17; van Esch et al. 2019, p. 217). Published AI-based recruiting tools are detailed in chapter 4.

Given the historically unmatched interdependence between recruitment and IT, the spectrum of stakeholders needs to extend beyond recruiters and candidates. Additional stakeholders of AI-based recruiting tools relevant in the context of this thesis include data scientists who build the tools and managers who decide on their use. Together with recruiters, these are summarized under the term organizational stakeholders. Furthermore, the increasing impact of AI-based recruiting tools on humans has led to a renewed interest of regulators in the field. Regulators evaluate the risks of such tools and adapt legislation accordingly (Arrieta et al. 2020, p. 88). In the context of this thesis, the term traditional recruiting refers to technologies used prior to AI technologies, and the term stakeholders refers to all of the aforementioned groups.

2.2 Machine learning as a part of artificial intelligence (AI)

In this subchapter, first, the term AI is defined, followed by a derivation of ML as a central component of AI-based recruiting tools. Second, to fully understand the capabilities and limitations of AI-based tools, this subchapter clarifies how such tools are created by using the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework. Finally, the algorithms relevant in AI-based recruiting tools and hence in the context of this thesis are explained.

To understand AI-based recruiting tools, first, a profound understanding of AI needs to be established. Although the theoretical foundations of AI date back to a conference in Dartmouth in 1956, the technology has only gained popularity in the past decade (Hagras 2018, p. 28; Ongsulee 2017, p. 1). The main prerequisites for AI-based tools are large quantities of high-quality data and the capabilities to process them. Four factors can be derived that assisted the rise of AI (Hagras 2018, p. 28; Kreutzer and Sirrenberg 2019, p. 6). First, the digitalization and interconnection of products, services, and processes resulted in an unprecedented generation of data, culminating in approximately 40,000 exabytes in the year 2020 alone (Fogel 2006, p. 16; Kreutzer and Sirrenberg 2019, pp. 75-81; Russell and Norvig 2016, pp. 27-28). Second, the generated data can be stored cheaply (Ongsulee 2017, p. 1). Third, the processing power of IT systems has increased exponentially for decades. Moore’s law, formulated in 1965 and commonly cited as predicting a doubling in the performance of IT systems every 18 months, prevailed at least until 2019. In turn, computers became capable of handling vast amounts of data and solving complex equations (Fogel 2006, p. 15; Hagras 2018, p. 28; Kreutzer and Sirrenberg 2019, p. 74; Rotman 2020, p. 4). Fourth, the technological advancements resulted in an increased interest of businesses and organizations in AI-based technologies, which led to a substantial rise in the investment volume. This accelerated the development of AI even further (Domingos 2012, p. 84; Kreutzer and Sirrenberg 2019, p. 100). Together, these four trends resulted in AI becoming one of the most influential technologies of the 21st century.

Although the term AI was first published over 60 years ago and tools based on this technology set are by now omnipresent in daily life, academia still lacks a precise, commonly accepted definition (Bolander 2019, p. 850; Jantan et al. 2010, p. 262; Ongsulee 2017, p. 1; Strohmeier and Piazza 2015, p. 153). This is mainly due to two reasons. First, AI is a vast field in science with a variety of research areas such as computer science, mathematics, artificial psychology, philosophy, neuroscience, and linguistics, among others (Ongsulee 2017, p. 1). Second, the speed of evolution of the technology makes it hard to reach an agreement regarding what is considered AI and what is basic statistics. Görz (2011, p. 313) phrased this issue as “if it works, it’s no longer AI.” This struggle to determine what makes an IT system intelligent is further reinforced by academia’s difficulty in understanding and defining human intelligence itself (Kreutzer and Sirrenberg 2019, pp. 2-3). Thus, to define AI, first, the term intelligence needs to be specified. In the dictionary of Oxford University Press (2020e), intelligence is characterized as “the ability to learn, understand, and think in a logical way about things.” According to Negnevitsky (2005, pp. 1-2), this translates into the characteristic that someone or something can learn, understand, solve problems, and make decisions accordingly. Closely tied to this definition of human intelligence, Strohmeier and Piazza (2015, p. 153), Bolander (2019, pp. 850-851), and Tambe et al. (2019, p. 15) define AI as IT programs that can perform tasks which only humans were able to perform in the past. Jantan et al. (2010, p. 262), van Esch et al. (2019, p. 215), and Russell and Norvig (2016, p. 34) explicitly include the ability of an AI-based tool to perceive different environments and to adapt the course of action accordingly in their definition of AI. Kaplan and Haenlein (2019, p. 17) define AI even more broadly and with a stronger focus on data as IT programs that possess the “ability to interpret external data correctly, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation.” In the context of this thesis, Kaplan and Haenlein’s (2019) definition is used as it best combines the characteristics of intelligence with the data focus of AI-based tools.

On a less abstract level, AI comprises techniques such as “search, symbolic and logical reasoning, statistical techniques, and behavior-based approaches” next to methods like ML (Hagras 2018, p. 28). As AI-based recruiting tools in research mainly use ML, the thesis at hand focuses solely on ML. As with the term AI, there is no clear definition of ML. Bolander (2019, p. 854), for example, defines ML as “any AI algorithm that does not have a static behavior but can learn from experience,” while Ongsulee (2017, p. 2) sees ML algorithms as computer systems that can learn without explicitly being programmed. Goodfellow (2016, pp. 2-3), in accordance with Kaplan and Haenlein (2019, p. 18) and Kleinberg et al. (2018, p. 238), characterizes ML algorithms as a set of technologies that can extract patterns from raw data, turn these into knowledge, and propose decisions based on this knowledge. In the context of this thesis, the more detailed definition of Goodfellow (2016), Kaplan and Haenlein (2019), and Kleinberg et al. (2018) is used.

In general, ML algorithms can be classified into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning algorithms (Ayodele 2010, p. 19; Ongsulee 2017, p. 2). Supervised learning is the most applied form of ML in current research and accounts for about 70% of tools (LeCun et al. 2015, p. 436; Ongsulee 2017, p. 2). This focus on supervised learning algorithms is also reflected in the field of AI-based recruiting tools. In the context of AI-based recruiting, unsupervised learning algorithms are only a peripheral phenomenon, and no publications were identified which use semi-supervised learning algorithms or reinforcement learning algorithms. Consequently, the field of AI-based recruiting is limited to supervised learning algorithms in the context of this thesis, and the following definitions, as well as the term ML algorithm, refer solely to supervised learning algorithms.

Within the field of supervised learning algorithms, academia still lacks a clear definition of which algorithms classify as ML and which do not, as the borderline between ML and computational statistics is rather vague and the two fields overlap (Ongsulee 2017, p. 2). This becomes apparent when comparing different textbooks on the matter. In Kubat’s (2017) textbook, models that are solely based on Logistic Regression (LogReg) algorithms do not classify as ML models, while Goodfellow (2016), Larranaga (2019), Alpaydin (2008), and Forsyth (2019) define LogReg as an ML algorithm. Genetic algorithms, on the other hand, are included in Kubat’s (2017) ML definition, while the majority of the other consulted textbooks do not classify genetic algorithms as ML. To draw a stringent line between algorithms that are and are not considered ML algorithms, algorithms classify as ML in the context of this thesis only if they are included in the majority of the following textbooks representing the common ground in academic literature: Goodfellow (2016), Larranaga (2019), Alpaydin (2008), and Forsyth (2019). Ergo, linear regression including ridge, LASSO, and Ordinary Least Squares regression, greedy algorithms, linear discriminant analysis, as well as genetic algorithms do not classify as ML. In turn, ML and AI can be differentiated as follows. ML algorithms are methods adapted to and used on datasets for knowledge derivation, which results in a specific ML model. The representation of such models will be referred to as AI-based tools in accordance with the formerly introduced limitation that AI is restricted to supervised ML in the context of this thesis (Kaplan and Haenlein 2019, p. 18).

2.2.1 Central framework used to create machine learning models

This section defines further terminology relevant in the field of AI-based recruiting. Based on a framework common in ML, the procedure of how ML models are composed and evaluated is outlined in the following.

The framework used in the thesis at hand is the CRISP-DM framework, which can be applied for the conceptualization process of ML models regardless of the application field and underlying ML algorithm (Wirth and Hipp 2000, p. 29). The framework was developed by leading industry companies under the patronage of the European Commission and is widely accepted among researchers and practitioners (Europäische Kommission 1998; Wirth and Hipp 2000, pp. 29-30). In general, CRISP-DM can be seen as a generic checklist, which proposes central steps that should be included in model conceptualization for ensuring reliable and repeatable results (Wirth and Hipp 2000, pp. 29-30). In the following, each of the six steps of the framework, which are depicted in Figure 2-3, is summarized from a data scientist's perspective, touching on the corresponding tasks performed in each step of the model conceptualization.

[Figure not included in this excerpt]

In the first step of the CRISP-DM framework, the data scientist should aim to understand business requirements and objectives in order to transform this knowledge into an ML task (Wirth and Hipp 2000, p. 34). The ensuing step two, which is termed data understanding, comprises generating or collecting data, identifying data quality issues, and deriving first insights from the dataset (Tung et al. 2005, p. 786; Wirth and Hipp 2000, p. 34). In general, a dataset consists of a collection of data points, which will be referred to as instances or data instances in the context of this thesis. Data instances have several characteristics, which are called features and make up the so-called feature space (Kononenko 2010, p. 4). Datasets used in supervised learning are also characterized by having a predetermined output feature, which will be referred to as output following Emerson (2018, pp. 219-220) and Henelius et al. (2014, p. 1508). Steps one and two need to be repeated until the data scientist fully understands the business and data perspective.

Subsequently, data pre-processing needs to be performed, which corresponds to step three in the CRISP-DM framework. Data pre-processing includes all activities performed on the raw data to create the final dataset that is fed to the ML algorithm. In turn, this step forms the foundation for meaningful results. Pre-processing comprises data cleaning, including noise detection and disregarding redundant features, feature extraction, data integration, and transformation, as well as the creation of new features (Domingos 2012, p. 84; Russell and Norvig 2016, p. 713; Tung et al. 2005, p. 786; Wirth and Hipp 2000, pp. 34-35). Enhancements of step three relevant in the context of this thesis are the inclusion of ontologies and frameworks based on statistical co-occurrence (Caliskan et al. 2017, p. 183; Faliagka et al. 2014, p. 516). Both are abstractions of reality, structuring a specific information space and making it accessible to other systems (Dadzie et al. 2018, p. 51; García-Sánchez et al. 2006, p. 249; Strohmeier and Piazza 2015, p. 158). Ontologies can be implemented in ML models, for example, as the foundation for calculating the semantic meaning or contextual information of a word, which is also referred to as word embedding. Word embeddings are vector representations of a word in a high-dimensional vector space, which in turn are the foundation for calculating the semantic similarity of a set of concepts from different input sources (Caliskan et al. 2017, p. 183; Faliagka et al. 2014, p. 516; Montuschi et al. 2013, p. 43). A widely used framework for calculating word embeddings in HR is, for example, Global Vectors for Word Representation (GloVe) (Caliskan et al. 2017, p. 183; Luo et al. 2019, p. 1103). One central aim of data pre-processing is to reduce the dimensionality of the dataset, which refers to the number of features fed to the ML algorithm, while keeping or raising the information quality (Ayodele 2010, p. 22). 
In general, the performance of ML algorithms strongly depends on the quality of the dataset and the coverage of data instances, making step two and three essential for ensuring the predictive performance and validity of the resulting AI-based tool (Faliagka et al. 2014, p. 523; Goodfellow 2016, p. 3).
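How the semantic similarity of two word embeddings can be calculated is illustrated by the following minimal sketch. It assumes cosine similarity as the similarity measure, which is a common choice but is not prescribed by the cited sources. The three-dimensional vectors are hypothetical toy values and are not taken from GloVe; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption, for illustration only).
embeddings = {
    "programmer": [0.9, 0.8, 0.1],
    "developer": [0.85, 0.75, 0.2],
    "nurse": [0.1, 0.3, 0.9],
}

sim_close = cosine_similarity(embeddings["programmer"], embeddings["developer"])
sim_far = cosine_similarity(embeddings["programmer"], embeddings["nurse"])
```

With these toy values, the similarity between "programmer" and "developer" is higher than between "programmer" and "nurse", which is the kind of signal a matching tool could use to relate terms in a résumé to terms in a job description.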

During modeling, which is step four in the CRISP-DM framework, the data scientist needs to select the ML algorithm or algorithms that presumably perform best for the problem statement at hand and adapt these to the dataset, resulting in ML models (Goodfellow 2016, pp. 114-116; Wirth and Hipp 2000, p. 34). In general, ML models can be categorized into classification, clustering, and regression models according to the task they aim to solve. In the context of this thesis, only classification tasks are analyzed, which are approached by supervised learning models. More information regarding clustering or regression approaches can be found in Goodfellow’s (2016) or Russell and Norvig’s (2016) textbooks. Classification tasks refer to the concept of relating the features of input instances to a limited number of output features in the form of a discrete-valued function. This function, in turn, predicts the output for unlabeled instances based on their respective features (Chien and Chen 2008, p. 281; Domingos 2012, p. 79; Goodfellow 2016, p. 98; Henelius et al. 2014, p. 1508; Kononenko 2010, p. 4; Russell and Norvig 2016, p. 696; Urbanowicz and Browne 2017, p. 24). Before feeding data to the ML algorithm, the pre-processed dataset needs to be split into a training set, a test set, and an optional validation set. The training set should “span the complete variability of each feature” (Harris 2018, p. 22) and should account for the largest proportion of the data, followed by the test and validation set in descending order of data subset size. Variability is referred to as data diversity in the context of this thesis. The validity of an ML model is directly interrelated with dataset size and data diversity, although what can be considered sufficient differs with regard to the decision space. For simplicity reasons, datasets that comprise fewer than 400 instances are assumed to have difficulties covering the whole diversity of the dataset in the context of this thesis. 
This assumption is based on the factors that the problems ML models aim to solve in recruiting are complex and non-trivial, and at the same time, most of the analyzed ML models are based on complex learning algorithms (see for instance Kessler et al. (2012, p. 1133), Zhou et al. (2019, p. 3), or Zhang and Wang (2018, p. 7)). Both factors can only be addressed with sufficient dataset size, and using 400 instances as a differentiator can be seen as a reasonable and comparatively restrictive compromise. ML algorithms learn on the training set, while hyperparameter tuning, if applicable, is done with the validation set through grid search or random search (Ali Shah et al. 2020, p. 7; Bellamy et al. 2019, p. 4; LeCun et al. 2015, p. 436). Simply splitting the dataset into these three subsets is called holdout cross-validation (Russell and Norvig 2016, p. 708). When the amount of HR data is limited, k-fold cross-validation is often performed as an alternative to holdout cross-validation in the field of AI-based recruiting.
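The holdout split described above can be sketched as follows. The proportions of 70% training, 20% test, and 10% validation data are assumptions chosen only to respect the descending order of subset sizes; the function name and the fixed random seed are likewise hypothetical.

```python
import random


def holdout_split(instances, train_frac=0.7, test_frac=0.2, seed=0):
    """Shuffle the instances and split them into training, test, and validation sets.

    The fractions are illustrative assumptions; the remainder after the
    training and test set forms the validation set.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


data = list(range(400))  # 400 placeholder instances
train, test, validation = holdout_split(data)
```

Each instance ends up in exactly one subset, so performance measured on the test set is not influenced by instances seen during training.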

During k-fold cross-validation, the whole dataset is split into k subsets, and one subset is held out while the model trains on the rest of the dataset. This process is repeated until all subsets have been held out once. Consequently, the model is trained on the maximum possible variety, and the model stays interpretable, as the results of each training run can be averaged to estimate the overall performance of the model (Domingos 2012, p. 80; Goodfellow 2016, p. 120; Harris 2018, pp. 22-23; Li et al. 2011, p. 78; Russell and Norvig 2016, p. 708).
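The procedure of holding out each of the k subsets once and averaging the results can be sketched as below. The function names and the `evaluate` callback, which stands for training a model on the training indices and scoring it on the held-out indices, are hypothetical placeholders.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, held_out_indices); each instance is held out exactly once."""
    indices = list(range(n_instances))
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, held_out
        start += size


def cross_validate(n_instances, k, evaluate):
    """Average the score returned by `evaluate(train, held_out)` over all k folds."""
    scores = [evaluate(train, held_out) for train, held_out in k_fold_indices(n_instances, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, 3))
```

For ten instances and k = 3, the held-out subsets contain four, three, and three instances and together cover the whole dataset.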

In general, the aim of training a model is to achieve generalizability, which translates to how well the learned concepts fit when the trained ML model is provided with unseen data instances. In the following, determinants of generalizability are outlined, which correspond to the schematic learning process of an ML model. Generalizability is mainly determined by how under- or overfitted a model is and by its bias and variance. A depiction of the interrelation of these concepts can be found in Figure 2-4. Under- and overfitting can be determined by using the learning curves of training loss or through error rates. For simplicity reasons, both terms are defined based on error rates. During training, ML models start with a high error rate for training and test set and learn the underlying patterns to reduce this error rate. If a model performs poorly on the training set, it is termed underfitting, which can be caused, for example, by not training enough in terms of time and data, or by using a model incapable of deriving the underlying patterns (Goodfellow 2016, pp. 109-110). The interrelation of the prediction, symbolized by the line, and the underlying data instances is depicted in the top-left graph of Figure 2-4. If the model continues its learning process, learns the patterns in too great depth, and includes irrelevant details and noise in its classification function, the error rate further decreases on the training set but increases on the test set. Such models, which are characterized by a considerable deviation of predictive performance between the training and the test set, are called overfitting and are depicted in the top-right graph of Figure 2-4 (Goodfellow 2016, pp. 109-110). Hence, during training, the ML model is optimized to have limited under- and overfitting in order to generalize well, which can be seen in Figure 2-4. Correlated with over- and underfitting are the terms bias and variance. 
The bias of an ML model translates to the deviation of a model’s output from the true output of the function and will be referred to as model bias. Simply put, model bias measures the model’s tendency towards learning the same wrong thing, which results in difficulties fitting the model to the training set (Domingos 2012, pp. 81-82; Goodfellow 2016, p. 127). Variance shows to which extent an ML model learns random things regardless of the original input. In turn, this translates to a model’s failure to fit the test set (Domingos 2012, pp. 81-82). To sum up, a data scientist needs to balance under- and overfitting, as well as variance and model bias, in order to identify an appropriate level of complexity, which translates to a model’s generalizability as depicted in the bottom graph in Figure 2-4. Over- and underfitting, as well as variance and model bias issues, can be partially addressed through cross-validation, which is a central approach in the field of AI-based recruiting (Domingos 2012, pp. 81-82).

[Figure not included in this excerpt]

After modeling and training, the performance of the resulting ML model is evaluated regarding its generalizability on unseen data in step five of the CRISP-DM framework. For a first evaluation, the data scientist usually uses the aforementioned test set to simulate unseen data instances (Bellamy et al. 2019, p. 4; Domingos 2012, p. 80; LeCun et al. 2015, p. 436). Standard evaluation metrics used in ML include Accuracy, Error Rate, and metrics based on the so-called confusion matrix (Ali Shah et al. 2020, p. 7). Accuracy is defined as the proportion of correctly predicted outputs, while, conversely, the Error Rate is the proportion of incorrectly predicted outputs (Goodfellow 2016, p. 102; Powers 2011, p. 38). A confusion matrix, of which a simplified binary version is depicted in Table 2-1, is a visualization method to show how well an ML model classifies. For this matrix, the prediction results are split according to what has been classified correctly (True Positives and True Negatives) and what has been mispredicted (False Positives and False Negatives) (Powers 2011, pp. 38-39).
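How the four cells of a binary confusion matrix, Accuracy, and Error Rate follow from a list of true and predicted outputs can be sketched as follows. The function names and the toy labels are hypothetical and serve only as illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count True/False Positives and Negatives for a binary classification."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """Proportion of correctly predicted outputs."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


def error_rate(counts):
    """Proportion of incorrectly predicted outputs."""
    return (counts["FP"] + counts["FN"]) / sum(counts.values())


# Toy labels (assumption): 1 = suitable candidate, 0 = unsuitable candidate.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
```

For these toy labels, six of eight predictions are correct, so Accuracy is 0.75 and Error Rate is 0.25; the two always sum to one.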

Table 2-1: Confusion matrix

[Table not included in this excerpt]

Source: Own depiction adapted from Powers (2011, p. 38)

Based on the confusion matrix, the following evaluation metrics can be derived, of which the corresponding formulas can be found in Figure 2-5. Precision measures “the proportion of predicted positive cases that are correctly real positives,” while Recall depicts a model’s “proportion of real positive cases that are correctly predicted positive” (Powers 2011, p. 38). The F1 score includes a model’s Recall and Precision in one metric, demonstrating how well a model performs when both metrics are taken into account as equally important (Powers 2011, p. 41). Furthermore, the Receiver Operating Characteristic (ROC) curve and the derivable Area Under the Curve (AUC) can be used to evaluate how well a model predicts with regard to different thresholds. This curve plots a model’s sensitivity, which is its Recall, against its false positive rate, which equals one minus its specificity; specificity, in turn, is the ratio of True Negatives to the sum of True Negatives and False Positives (Powers 2011, pp. 39-40). The better an ML model, the closer its curve runs to the upper left corner of the plot (Ali Shah et al. 2020, p. 7; Kessler et al. 2008, p. 631; Powers 2011, p. 41). In the context of this thesis, Accuracy is used as the primary evaluation metric as it is the most broadly used metric in the analyzed literature set and hence provides the best foundation for model comparisons. If an ML model’s Accuracy is not stated, AUC, F1, Precision, and Recall are used in the indicated order for evaluating the performance of models in the context of this thesis. This order is proposed as AUC and F1 scores include either Precision or Recall or both in their metric, and hence information regarding the latter can also be directly derived.

Figure 2-5: Common evaluation metrics used in the field of machine learning

[Figure not included in this excerpt]

Source: Own depiction following Powers (2011, pp. 38-41)
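For orientation, the metrics described verbally above can be written in terms of the confusion matrix cells as follows. These are the standard textbook definitions, with TP, TN, FP, and FN denoting True Positives, True Negatives, False Positives, and False Negatives; the notation is an assumption and does not necessarily match that of the original figure.

```latex
\begin{align*}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Error Rate}  &= \frac{FP + FN}{TP + TN + FP + FN} = 1 - \text{Accuracy} \\
\text{Precision}   &= \frac{TP}{TP + FP} \\
\text{Recall}      &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
F_1                &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}
```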

After assessing the ML model based on the presented evaluation metrics, the data scientist needs to determine in step six whether the conceptualized ML model is capable of solving the task at hand. If the data scientist evaluates the performance as insufficient, the CRISP-DM process is started again in step one. If, on the other hand, the model is seen as well-fitting based on the preceding evaluation metrics, the ML model or the results are deployed (Wirth and Hipp 2000, p. 36).

2.2.2 Supervised learning algorithms used in AI-based recruiting tools

In accordance with the definition of ML algorithms at the beginning of this subchapter, this section explains the functioning of all supervised learning algorithms used by publications in the literature set on which this thesis is based. Furthermore, the advantages of each algorithm are stated.

Relevant ML algorithms include Naive Bayes (NBs), LogRegs, Support Vector Machines (SVMs), K-Nearest Neighbors (KNNs), Decision Trees (DTs) and Random Forests (RFs), as well as Artificial Neural Networks (ANNs) and Deep Neural Networks (DNNs). In the remainder of this thesis, an ML model that is based on the NB algorithm is termed an NB model; the same holds for all other algorithms. The algorithms outlined in the following are structured according to Hagras’s (2018, p. 31) and Gunning and Aha’s (2019, p. 47) categorization into statistical models, instance-based models, DT models, and neural network models, as is depicted in Figure 2-6.

[Figure not included in this excerpt]

2.2.2.1 Machine learning using statistical models

Relevant statistical models include ML models based on NB and LogReg. The NB learning algorithm classifies input instances based on Bayes’ theorem and the assumption that all features are independent of each other and equally important. In order to predict the classification of a data instance, the probability of each potential output is estimated through Bayes’ theorem, and the output with the highest probability is assigned as the classification to the input instance (Feng et al. 2016, p. 3; Forsyth 2019, pp. 10-12).
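The NB classification rule can be sketched for categorical features as follows. The sketch multiplies the prior probability of each output with the per-feature likelihoods and returns the output with the highest score. Laplace smoothing is added as an assumption to avoid zero probabilities; the function names and the toy recruiting data are hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(instances, outputs):
    """Count how often each output and each (feature value, output) pair occurs."""
    class_counts = Counter(outputs)
    feature_counts = defaultdict(Counter)  # (feature index, output) -> value counts
    feature_values = defaultdict(set)      # feature index -> observed values
    for features, output in zip(instances, outputs):
        for i, value in enumerate(features):
            feature_counts[(i, output)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict_nb(model, features):
    """Return the output with the highest (unnormalized) posterior probability."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_output, best_score = None, -1.0
    for output, count in class_counts.items():
        score = count / total  # prior P(output)
        for i, value in enumerate(features):
            # Laplace-smoothed likelihood P(value | output); smoothing is an assumption.
            numerator = feature_counts[(i, output)][value] + 1
            denominator = count + len(feature_values[i])
            score *= numerator / denominator
        if score > best_score:
            best_output, best_score = output, score
    return best_output


# Toy data (assumption): features are (degree, seniority).
instances = [("msc", "senior"), ("msc", "junior"), ("bsc", "senior"),
             ("bsc", "junior"), ("msc", "senior"), ("bsc", "junior")]
outputs = ["invite", "invite", "invite", "reject", "invite", "reject"]
model = train_nb(instances, outputs)
```

Multiplying the per-feature likelihoods is exactly where the independence assumption enters: each feature contributes to the score without regard to the values of the others.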

LogReg-based ML models learn the LogReg function of the corresponding training set by estimating the regression coefficients through methods such as maximum-likelihood estimation or gradient descent (Forsyth 2019, pp. 258-260; James et al. 2013, pp. 133-137). In turn, the impact of each feature can be derived (James et al. 2013, p. 136). By introducing a threshold, the classification process becomes a linear task in which the output is assigned based on the derived impact of the input features (James et al. 2013, p. 133).
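A minimal sketch of estimating the regression coefficients by batch gradient descent and classifying with a threshold is given below. The learning rate, the number of epochs, the threshold of 0.5, and the toy data are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(xs, ys, learning_rate=0.5, epochs=2000):
    """Estimate weights and bias by batch gradient descent on the log-loss."""
    n_features = len(xs[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Prediction error of the current model for this instance.
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / len(xs) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(xs)
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Assign output 1 if the estimated probability reaches the threshold."""
    p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return 1 if p >= threshold else 0


# Toy data (assumption): one feature, linearly separable outputs.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 1, 1]
weights, bias = train_logreg(xs, ys)
predictions = [predict(weights, bias, x) for x in xs]
```

The sign and magnitude of each learned weight indicate the direction and strength of the corresponding feature's impact on the predicted output.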

The advantages of statistical models are the following. NB models are well suited when features are also in reality independent of each other and only limited processing power is available (Forsyth 2019, p. 12). LogReg algorithms perform well when the dataset contains noise and instances are linearly separable (Russell and Norvig 2016, p. 727). Both algorithms and the resulting AI-based tools are considered to be comparably easy to interpret (Hagras 2018, p. 30).

2.2.2.2 Machine learning using instance-based models

For classifying new input instances, instance-based models, which are also called non-parametric models, compare the features of new instances with data instances introduced during the training process (Russell and Norvig 2016, p. 737). Relevant in the context of this thesis are KNN- and SVM-based models. The KNN algorithm classifies unseen instances by identifying the k most similar data instances in the training set and assigning to the input instance the output that occurs most often among the k selected data instances (Domingos 2012, p. 78; Russell and Norvig 2016, p. 738). The similarity between data instances is determined based on the distance in a corresponding vector space, which can be calculated by using distance measures such as the Euclidean distance or Minkowski distance, for example. Optimization, in turn, is done by minimizing the distance between similar pairs and maximizing the distance of dissimilar points (Hu et al. 2018, p. 60390; Kessler et al. 2008, p. 628; Russell and Norvig 2016, p. 738). To simplify the terminology, ML models that refer to their algorithm as supervised distance learning models are clustered as KNN algorithms in this thesis.
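The KNN classification rule can be sketched as follows, assuming the Euclidean distance and a simple majority vote among the k nearest training instances. The function names and the two-dimensional toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimensionality."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Assign the output that occurs most often among the k nearest training instances."""
    neighbors = sorted(zip(train_x, train_y),
                       key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data (assumption): two well-separated groups in a two-dimensional feature space.
train_x = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["A", "A", "A", "B", "B", "B"]
```

No model parameters are fitted in this sketch; the training instances themselves are the model, which is what makes the approach instance-based.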

SVM-based models use so-called hyperplanes, which are n-1 dimensional separators, to divide the n-dimensional feature space into subspaces, separating the training instances (James et al. 2013, p. 351; Russell and Norvig 2016, pp. 744-746). For non-linear classification, training data is transformed into higher dimensional representations through so-called kernel methods until the data is linearly separable. The separator is determined by maximizing the margin between the separator and the closest data instances, the so-called support vectors, through quadratic programming (Ayodele 2010, pp. 25-27; Russell and Norvig 2016, pp. 744-746). The classification Accuracy increases with the size of the margin between the hyperplane and the corresponding data instances (Li et al. 2011, p. 77).
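The idea of making data linearly separable in a higher dimensional representation can be illustrated with a deliberately simplified sketch. It uses an explicit feature map instead of a kernel function and a hand-picked separator instead of margin maximization through quadratic programming, so it is not an SVM implementation; the data, the mapping, and the threshold are assumptions.

```python
def feature_map(x):
    """Map a one-dimensional instance x to the two-dimensional point (x, x^2)."""
    return (x, x * x)


# Toy data (assumption): the outer instances belong to class 1, the inner ones to class 0.
# No single threshold on x separates the classes in the original one-dimensional space.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, 0, 0, 1]


def classify(x, threshold=2.5):
    """Separate the mapped instances with the horizontal line x^2 = threshold."""
    _, squared = feature_map(x)
    return 1 if squared > threshold else 0


predictions = [classify(x) for x in xs]
```

In the mapped space, a straight line suffices to separate the two classes, which corresponds to a non-linear decision boundary in the original feature space.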

The advantages of instance-based ML models are the following. ML models based on KNNs are especially well suited when only a few training instances are available and the decision space is considered low dimensional (Russell and Norvig 2016, p. 739). In comparison, SVMs are also well suited for learning from few training instances but can also produce acceptable results for high-dimensional decision spaces (Li et al. 2011, p. 77; Martinez-Gil et al. 2018, p. 24). While KNN models are considered to be easily interpretable by humans, SVM-based ML models become non-transparent if the decision space gets too high dimensional due to the complexity of the necessary kernel functions (Hagras 2018, p. 30).

2.2.2.3 Machine learning using decision tree models

In general, a DT is a graph that consists of a root node, branches, internal nodes, and leaves. Similar to SVMs, DTs recursively split the input space into subsets until an appropriate level of granularity is reached (Chien and Chen 2008, p. 282; Goodfellow 2016, p. 142). A DT is usually generated by using heuristic algorithms such as C4.5, for example, which recursively partition data instances until an overfitting tree is constructed (Chien and Chen 2008, p. 281; Cho and Ngai 2003, p. 126; Tung et al. 2005, pp. 789-790). Subsequently, the tree is pruned until the desired generalizability is reached (Cho and Ngai 2003, p. 126; Tung et al. 2005, pp. 789-790). In the resulting tree, every internal node represents a splitting criterion for one feature, and the resulting branches each stand for one feature value or a set of feature values.

Internal nodes are arranged according to their relative impact on the prediction output, which is also referred to as information gain (Domingos 2012, p. 79; Forsyth 2019, p. 41; Min and Emam 2003, p. 153). The leaves of a DT directly reveal the output class, and the path towards these leaves can be interpreted as IF-THEN rules (Chien and Chen 2008, p. 282; Cho and Ngai 2003, p. 126; Domingos 2012, p. 79; Goodfellow 2016, p. 142; Tung et al. 2005, p. 787). Hence, for classification tasks, the resulting model assigns outputs to new input instances based on the leaf in which the instance ends up after applying the splitting criterion of each node to the features of the instance (James et al. 2013, p. 314).
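How information gain can be used to rank features as splitting criteria is sketched below. The sketch assumes the common entropy-based definition of information gain; individual algorithms such as C4.5 use refinements of this measure. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of output labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Reduction in entropy achieved by splitting on the given feature."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Toy data (assumption): features are (degree, has_driving_license).
rows = [("msc", "yes"), ("msc", "no"), ("bsc", "yes"), ("bsc", "no")]
labels = ["invite", "invite", "reject", "reject"]
gain_degree = information_gain(rows, labels, 0)
gain_license = information_gain(rows, labels, 1)
```

In this toy dataset, the degree separates the outputs perfectly and would therefore be placed at the top of the tree, while the driving license carries no information about the output.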

RF models consist of a set of DTs, each of which is built using only a subset of the predictors in order to generate strongly different DTs. For classification tasks, each of these trees votes for a particular classification, and the result of the RF is the classification with the most votes (James et al. 2013, pp. 324-325).
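The voting step of an RF can be sketched as follows. The three "trees" are hand-written rules that each consider only one feature; in a real RF they would be learned from data, so the rules, the feature names, and the candidate are purely hypothetical.

```python
from collections import Counter


def forest_predict(trees, instance):
    """Return the classification that receives the most votes from the trees."""
    votes = Counter(tree(instance) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for learned trees, each restricted to a subset of the features.
trees = [
    lambda inst: "invite" if inst["experience_years"] >= 3 else "reject",
    lambda inst: "invite" if inst["degree"] == "msc" else "reject",
    lambda inst: "invite" if inst["skills_matched"] >= 4 else "reject",
]

candidate = {"experience_years": 5, "degree": "bsc", "skills_matched": 6}
decision = forest_predict(trees, candidate)
```

Here two of the three trees vote "invite", so the forest outputs "invite" even though one tree disagrees.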

The advantages of DTs are their fast learning speed and their ability to cope with noisy data (Chien and Chen 2008, p. 281; Cho and Ngai 2003, p. 126; Goodfellow 2016, p. 142). In comparison to DTs, RFs produce results that are more complex to interpret due to the number of underlying trees. At the same time, results are more decorrelated from specific features, making prediction results less variable and, in turn, more reliable due to the avoidance of overfitting (James et al. 2013, p. 325). Consequently, while DT models are considered to be interpretable for humans based on their visual output and the straightforward derivation of useful rules, RF models are non-transparent due to the increased complexity caused by the introduction of several DTs (Hagras 2018, p. 30).

2.2.2.4 Machine learning using neural network models

In the following, the basic functioning of neural networks is explained based on a fully connected, single-layered feed-forward neural network. Key components of supervised learning neural networks are neurons, activation functions, weights, input and output layer, hidden layer(s), and a learning algorithm. Each layer of a neural network consists of n neurons, which each have input (Xj) and output links (Yi) and an individual activation threshold. These links, in turn, have a corresponding weight (Wij) between neuron i and j. During the learning process, every neuron functions as a threshold device that feeds the aggregation of all weighted inputs of its connections into an activation function, which determines the neuron’s output as depicted at the top of Figure 2-7 (Ali Shah et al. 2020, p. 4; Cho and Ngai 2003, p. 125; Fogel 2006, pp. 13-14; Russell and Norvig 2016, pp. 727-728). Typical activation functions are, for example, ReLU or tanh (Ali Shah et al. 2020, p. 4). If the network only has one-directed connections, it classifies as a feed-forward network, while if the output of neurons is fed back into them as part of the input, it is a recurrent network (Russell and Norvig 2016, p. 729). Data is fed into the model through the input layer, while the output layer forwards the prediction, giving a neural network a structure as depicted in Figure 2-7.
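The computation performed by a single neuron and by a fully connected layer can be sketched as follows. The activation threshold is modeled as an additive bias term, which is a common formulation but an assumption here; the function names and the numeric values are hypothetical.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, blocks negative ones."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Aggregate the weighted inputs, add the bias, and apply the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def layer_output(inputs, weight_rows, biases, activation=math.tanh):
    """Fully connected layer: every neuron receives all inputs."""
    return [neuron_output(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]


inputs = [1.0, 2.0]
out_tanh = neuron_output(inputs, [0.5, -0.25], 0.0)         # weighted sum is 0.0
out_relu = neuron_output(inputs, [1.0, 1.0], -1.0, relu)    # weighted sum is 2.0
hidden = layer_output(inputs, [[0.5, -0.25], [1.0, 1.0]], [0.0, -1.0], relu)
```

Feeding the outputs of one layer as inputs into the next layer yields the forward pass of a feed-forward network.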

In general, neural networks can be categorized regarding their depth and width. The depth of a neural network corresponds to the number of hidden layers, and its width describes the number of hidden neurons, as well as the input-output connection of the former. The number of hidden layers can vary between one and several tens of thousands of layers. In general, the more hidden layers a neural network has, the more complexity it can represent (Kreutzer and Sirrenberg 2019, pp. 5-6; LeCun et al. 2015, p. 436; Russell and Norvig 2016, p. 732). Ongsulee (2017, p. 3), in accordance with Goodfellow (2016, pp. 164-165), defines neural networks with more than one hidden layer as DNNs. In the context of this thesis, Goodfellow’s (2016) and Ongsulee’s (2017) definition is used to differentiate between neural networks because it allows trends regarding the number of hidden layers used to be derived especially well. Hence, neural networks with one hidden layer are referred to as ANNs, while networks with more than one hidden layer are termed DNNs. The term neural network includes both ANNs and DNNs. The differentiation is also depicted in Figure 2-7, in which the greyed-out layer is the precondition for classification as a DNN. For simplicity, the terms ANN and DNN cover both feed-forward and recurrent networks, while DNNs further include deep convolutional and deep residual networks. A more in-depth elaboration on these can be found in LeCun et al.’s (2015, pp. 439-442) paper on deep learning.
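The distinction used in this thesis can be expressed as a simple rule on the layer structure. The helper below is a hypothetical sketch; the function name and the representation of a network as a list of layer sizes are assumptions for illustration.

```python
def classify_network(layer_sizes):
    """Classify a network by its number of hidden layers.

    layer_sizes lists the neurons per layer, input layer first and output layer last.
    One hidden layer is labeled "ANN", more than one "DNN".
    """
    if len(layer_sizes) < 3:
        raise ValueError("expected an input layer, at least one hidden layer, and an output layer")
    hidden = layer_sizes[1:-1]
    depth = len(hidden)
    return {
        "depth": depth,                  # number of hidden layers
        "hidden_neurons": sum(hidden),   # total number of hidden neurons
        "type": "ANN" if depth == 1 else "DNN",
    }

shallow = classify_network([4, 8, 1])
deep = classify_network([4, 8, 8, 1])
```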

For training, which corresponds to updating the connection weights, gradient descent is usually used (Russell and Norvig 2016, p. 730). In the forward pass, the model’s prediction is calculated by forwarding the data through the network. The loss function to be minimized is the deviation between the model’s prediction and the desired output. To calculate the gradients in a multi-layer network, the backpropagation algorithm recursively applies the chain rule by propagating the loss from the output through the hidden layers back to the input in the backward pass. The connection weights can thus be updated efficiently on the fly. This process of iterating forward and backward through the neural network continues until the deviation converges, resulting in stabilized weights (Ali Shah et al. 2020, p. 4; Cho and Ngai 2003, p. 125; LeCun et al. 2015, p. 436; Russell and Norvig 2016, pp. 733-735).
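The interplay of forward pass, backward pass, and weight update can be sketched for a very small network. The following is a minimal sketch under stated assumptions, not a reference implementation: it uses one hidden layer with tanh activations, a linear output neuron, a squared-error loss, per-example updates, and a toy dataset (logical OR). All names, the learning rate, and the data are chosen for illustration only.

```python
import math
import random

def train_step(x, target, w_hidden, w_out, lr=0.1):
    """One forward and one backward pass; updates the weights in place.

    Returns the squared-error loss measured before the update.
    """
    # Forward pass: hidden activations, then a linear output neuron.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = sum(w * hi for w, hi in zip(w_out, h))
    error = y - target
    loss = 0.5 * error ** 2

    # Backward pass: apply the chain rule from the output back to the hidden layer.
    grad_out = [error * hi for hi in h]  # dLoss/dw_out[j]
    delta_h = [error * w_out[j] * (1.0 - h[j] ** 2) for j in range(len(h))]  # dLoss/dz_j

    # Gradient descent update of all connection weights.
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] -= lr * delta_h[j] * x[i]
    for j in range(len(w_out)):
        w_out[j] -= lr * grad_out[j]
    return loss

def train(data, n_hidden=3, epochs=200, lr=0.1, seed=0):
    """Iterate forward and backward over the data; returns the summed loss per epoch."""
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    history = []
    for _ in range(epochs):
        history.append(sum(train_step(x, t, w_hidden, w_out, lr) for x, t in data))
    return history

# Toy data: logical OR; the constant third input acts as a bias term.
OR_DATA = [
    ([0.0, 0.0, 1.0], 0.0),
    ([0.0, 1.0, 1.0], 1.0),
    ([1.0, 0.0, 1.0], 1.0),
    ([1.0, 1.0, 1.0], 1.0),
]
loss_history = train(OR_DATA)
```

In this sketch, training simply runs for a fixed number of epochs. A convergence criterion as described above would instead stop once the change in `loss_history` falls below a chosen tolerance.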

Figure not included in this excerpt

The advantages of neural networks are the following. In contrast to other ML approaches, the learning process of neural networks is based solely on data instances, and thus, the required domain knowledge is limited. Furthermore, neural networks are especially well suited for learning complex non-linear patterns from high-dimensional and incomplete, noisy, or contradictory datasets (Kreutzer and Sirrenberg 2019, p. 4; LeCun et al. 2015, p. 436; Livingstone et al. 1997, p. 142). Taking the differences between DNNs and ANNs into account, the more complex the data patterns are, the better a DNN usually performs in comparison to an ANN (Kreutzer and Sirrenberg 2019, pp. 5-6; LeCun et al. 2015, p. 436; Livingstone et al. 1997, p. 142; Russell and Norvig 2016, p. 732). Both ANNs and DNNs are hard for humans to interpret (Hagras 2018, pp. 30-31).

2.3 Relevant ethical principles affected by AI-based recruiting tools

This subchapter first derives which ethical principles are the most important when implementing AI-based tools in general and in the context of recruiting. Subsequently, the most relevant ethical principles are outlined in detail and put into the context of AI.

If tools perform complex, cognitive work, the social requirements and ethical implications of the task are inherited (Bostrom and Yudkowsky 2014, p. 317; Verdiesen 2018, p. 388). As previously derived, and further illustrated with the example from practice in chapter 1, recruiting tasks are complex. Thus, ethical implications need to be taken into account. Still, there is no clear consensus in academia on which ethical questions need to be addressed when adopting AI-based tools. Currently, the discussion in academia and practice focuses on 11 ethical values: fairness, transparency, non-maleficence, responsibility, privacy, beneficence, freedom, trust, sustainability, dignity, and solidarity (Jobin et al. 2019, p. 395). An overview of these values can be found in Figure 2-8, in which the most important ethical principles in the context of this thesis are highlighted in dark grey. In accordance with the objective of the thesis at hand to analyze how major ethical challenges are addressed, the most relevant ethical questions are identified in the following using Jobin et al.’s (2019, pp. 389-397) extensive web-search-based overview of ethical principles. This overview is used for two reasons. First, the ethical principles identified in this paper directly represent ethical values, have a major impact on the development and perception of AI-based tools, and greatly influence legislation. Second, the overview is extensive and includes ethical guidelines published by all of the previously defined stakeholders as well as by non-profit organizations, academic institutions, science foundations, federations of worker unions, political parties, and supranational organizations (Jobin et al. 2019, pp. 389-391). Hence, Jobin et al.’s (2019) overview of ethical principles can be seen as an aggregation representing the most extensive consensus of the AI ecosystem. Thus, it can be considered a suitable indicator to determine which ethical challenges are the most crucial in the field of AI and, consequently, also in the field of AI-based recruiting. In total, Jobin et al. (2019, p. 395) identify 84 guidelines that state ethical principles relevant when implementing AI-based tools. Within these 84 guidelines, the ethical principles transparency and fairness are the most prominent, as they are included in 73 and 68 of the guidelines, respectively. The importance of these two principles is further supported by several conceptual studies in the field of AI-based recruiting (see, for example, Cortez and Embrechts (2013, p. 30), El Ouirdi et al. (2016, p. 431), Gade et al. (2019, p. 3203), Gonzalez et al. (2019, pp. 36-37), Hagras (2018, p. 29), or Rudin (2019, pp. 206-209)). Based on the lines of argument presented above, especially with regard to the broad inclusion of stakeholders, and in accordance with AI-based recruiting literature, fairness and transparency are identified as the most relevant ethical values and thus principles in the field of AI. Consequently, the thesis at hand focuses on these two ethical principles. In the context of this thesis, the term ethical principles refers to ethical guidelines and also represents the eponymous ethical values, following the reasoning of Jobin et al. (2019, p. 391).

Figure not included in this excerpt

2.3.1 Ethical principle fairness

In this section, the ethical principle fairness is defined and put into the context of AI. In general, the concept of fairness is highly interwoven with the concepts of bias and discrimination (Arrieta et al. 2020, p. 113; Bantilan 2018, p. 15; Jobin et al. 2019, p. 394). Thus, in order to explain the term fairness in the context of this thesis, bias and discrimination, as well as their interdependence with each other and with fairness, need to be explained first. Subsequently, the concept of fairness is defined, and approaches to address it are outlined.

The Oxford University Press (2020a) defines bias as “a strong feeling in favor of or against one group of people, or one side in an argument, often not based on fair judgment.” In accordance, bias is defined in the context of this thesis as a subgroup being systematically privileged at the cost of another based on a set of features (Bantilan 2018, p. 15; Bellamy et al. 2019, p. 2). The term discrimination is defined by the Oxford University Press (2020b) as “the practice of treating somebody or a particular group in society less fairly than others” based on age, ethnicity, or gender. Weichselbaumer (2003, p. 629) and Arrieta et al. (2020, p. 115) extend this definition by including health status, comprising Body-Mass-Index and height, religion, sexual orientation, address, and age as causes for discrimination. In the context of this thesis, the aforementioned features, which can potentially introduce discrimination into an AI-based tool if they are included in the dataset the ML model trains on, are termed discriminatory features. Bantilan (2018, p. 15), in accordance with Fukuchi et al. (2015, p. 1503) and Hajian et al. (2016, p. 2125), states that discrimination occurs when “an action is based on biases resulting in the unfair treatment of people.” Following this definition, biases are defined as the foundation of discrimination in the context of this thesis, and discrimination arises when the underlying biases are made permanent in a decision-making process. Ergo, an ML model and the corresponding AI-based tool are considered discriminatory if, with regard to a set of features, the likelihood of predicting an adverse outcome is higher than the distribution in the dataset would suggest (Tambe et al. 2019, p. 35).
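The last criterion can be made concrete with a small calculation. The following is a minimal sketch of one possible reading of it, assuming that an adverse outcome is a binary flag and that the comparison is made per group of a single discriminatory feature. The function name, the record layout, and the example data are hypothetical.

```python
from collections import defaultdict

def adverse_rate_gaps(records):
    """Compare predicted and observed adverse-outcome rates per group.

    records: iterable of (group, actual_adverse, predicted_adverse) tuples,
    where the two flags are booleans.
    Returns {group: predicted adverse rate minus adverse rate in the data}.
    A clearly positive gap for one group hints at discrimination in the sense above.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [records, actual adverse, predicted adverse]
    for group, actual, predicted in records:
        entry = counts[group]
        entry[0] += 1
        entry[1] += int(actual)
        entry[2] += int(predicted)
    return {group: (p - a) / n for group, (n, a, p) in counts.items()}

# Hypothetical example: four candidates per group.
records = [
    ("A", True, True), ("A", True, True), ("A", False, False), ("A", False, False),
    ("B", True, True), ("B", False, True), ("B", False, True), ("B", False, False),
]
gaps = adverse_rate_gaps(records)
```

In this example, the model rejects group A exactly as often as the data suggests, while group B is rejected far more often than its share of adverse outcomes in the dataset would justify.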

In the Oxford University Press (2020d) dictionary, fairness is defined as “the quality of treating people equally.” In line with this broad definition, Bantilan (2018, p. 15) defines fairness as the “inverse of discrimination.” In alignment with the previously derived definitions of discrimination and bias, Arrieta et al. (2020, p. 113) define the output of an AI-based tool as fair if it does not discriminate against an underrepresented group, which is encoded in the dataset as bias. From a more technical perspective, fairness in the context of AI-based tools can be defined as fairness through unawareness, fairness through equality of opportunity, or counterfactual fairness. Fairness through unawareness is met if an ML model explicitly excludes protected features from the dataset or training set (Kusner et al. 2017, pp. 4067-4068). Equality of opportunity is fulfilled if the output of a classifier is entirely independent of the set of protected features (Kusner et al. 2017, pp. 4067-4068; Quadrianto et al. 2019, p. 8228). The concept of counterfactual fairness evaluates whether a model can guarantee that the prediction outcome of a system would be exactly the same if the individual were in a counterfactual world in which they belonged to the opposite group regarding the discriminatory feature (Arrieta et al. 2020, p. 115; Kusner et al. 2017, p. 4068). In this thesis, the ethical principle fairness is defined by the concept of fairness through unawareness because, out of the three concepts, it is the only one that can be examined without evaluating the underlying dataset or code, which is mostly not available.
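Two of these concepts can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the feature names, the set of protected features, and both toy models are hypothetical. The attribute-flip check is only a simplified proxy for counterfactual fairness, since it changes one attribute while leaving all other features untouched.

```python
PROTECTED = {"age", "gender", "ethnicity"}  # hypothetical set of protected features

def drop_protected(record, protected=PROTECTED):
    """Fairness through unawareness: remove protected features before training or prediction."""
    return {key: value for key, value in record.items() if key not in protected}

def flip_changes_prediction(predict, record, feature, alternative):
    """Simplified attribute-flip check: does changing one protected feature change the output?"""
    flipped = dict(record)
    flipped[feature] = alternative
    return predict(record) != predict(flipped)

def unaware_model(record):
    """Toy model that only sees the non-protected features."""
    return drop_protected(record)["years_experience"] >= 3

def biased_model(record):
    """Toy model that uses a protected feature directly."""
    return record["years_experience"] >= 3 and record["gender"] == "m"

candidate = {"gender": "f", "age": 41, "years_experience": 5}
unaware_flips = flip_changes_prediction(unaware_model, candidate, "gender", "m")
biased_flips = flip_changes_prediction(biased_model, candidate, "gender", "m")
```

Note that excluding protected features does not rule out indirect effects through correlated features such as the address, which is one reason why the other two concepts require access to the dataset or the model.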

[...]

Excerpt out of 130 pages

Details

Title: Artificial Intelligence in Recruiting. A Literature Review on Artificial Intelligence Technologies, Ethical Implications and the Resulting Chances and Risks
College: Friedrich-Alexander University Erlangen-Nuremberg (Wirtschaftsinformatik)
Grade: 1,3
Author: Matthias Rudolph
Year: 2020
Pages: 130
Catalog Number: V978174
ISBN (eBook): 9783346324214
ISBN (Book): 9783346324221
Language: English
Tags: Artificial Intelligence, Explainable AI, Künstliche Intelligenz, Ethics in AI
Quote paper: Matthias Rudolph (Author), 2020, Artificial Intelligence in Recruiting. A Literature Review on Artificial Intelligence Technologies, Ethical Implications and the Resulting Chances and Risks, Munich, GRIN Verlag, https://www.grin.com/document/978174
