Diploma Thesis, 2018
2. Literature Review
3.1. Data and Methodology Approach
3.2 Empirical Analysis
5. Dutch Professional Football
5.1. Youth Categories
5.2 Elite Team
5.3 International Players
7.1 Web Source:
The Relative Age Effect (RAE) is a phenomenon that impacts an individual’s outcome. In particular, the RAE in sports favors athletes who were born shortly after a cut-off date and is detrimental to those who were born later in the same period. The grouping of soccer players in academies benefits the early born players in the selection for the elite team because of physical advantages that are based on biological maturation. Therefore, evidence suggests that the effect is more likely to be observed in the lower youth categories. This study examines the RAE in Dutch football by investigating data from the U-15 and U-19 national teams. A Pearson test provides evidence that the observed birth date distribution deviates from the expected one, which indicates a skewed birth date distribution. The findings suggest that the early born football players indeed have an age advantage in the selection system relative to their late born peers. Moreover, there is a higher probability that they will become professionals in the future. The RAE also seems to decrease over time, but it still exists, even in the elite category.
Keywords: relative age effect; overrepresentation; soccer; cut-off date; physical advantages; elite team; youth categories
Undoubtedly, the main purpose of each organization is to find the most talented and highly skilled people for its crucial positions. Nevertheless, the selection system might deviate from what optimality suggests. In fields such as education and sports, it is common to group people according to their age. In most countries, schools group students of the same academic year together, based on their date of birth (Gonzalez-Villoral et al., 2015). Students who are born shortly after the cut-off date, which is usually January 1, show a relatively better educational performance compared to their younger classmates (Thompson et al., 1991; Navarro et al., 2015). The Relative Age Effect (RAE) can explain those differences in performance, which mainly depend on the difference in maturation between those born early and those born late.
The RAE in sports is related to asymmetries in the birth distribution of athletes (Gonzalez-Villoral et al., 2015). Specifically, the existence of the RAE suggests an overrepresentation of the early born players in the elite teams, relative to their age-disadvantaged peers in the same age category (Besters, 2018; Gonzalez-Villoral et al., 2015; Helsen et al., 2005). This implies a skewed birth date distribution that favors those who were born early. Commonly used cut-off dates in sports are January 1 and August 1. The birth date of an athlete seems to be crucial and associated with his/her future career. The physical and psychological attributes of the early born athletes make their selection by coaches and scouts for the top teams more probable (Vincent and Glamser, 2016). Age grouping often induces coaches to show a preference for the best players in a given period. Thus, football clubs, by making normative decisions, tend to select the older players for the elite team as they look for immediate results in various competitions (Besters, 2018; Sierra-Diaz et al., 2017). The internal selection is usually based on the physical development and the biological maturation of the athletes. These anthropometric characteristics work in favor of those who were born early. Their age-disadvantaged peers, who could possibly be more talented, are likely not to receive the same opportunities (Ashworth and Heyndels, 2007). Consequently, people who were born in the first months of the year have a relative age advantage in being selected for the elite team (Hill and Soteriadou, 2016).
The biased selection and the discrimination against athletes who were born late may result in abandonment of the sport at an early age because of the discouragement that comes with this discrimination (Thompson et al., 1991; Helsen et al., 2005). Hence, the RAE can either enhance or diminish the chance for a player to access the elite team (Delorme et al., 2009). The selection system should be structured in such a way as to mitigate the negative impact of the biased selection that arises from relative age differences (Rees et al., 2016). Even though the RAE is more persistent at younger ages (Navarro et al., 2015), it seems to decrease after the adolescent period, as maturation differences become less prominent and the skills of each player matter more (Sierra-Diaz et al., 2017; Besters, 2018). However, in sports like soccer it seems that the reduction in the RAE, as time passes, does not create a significant bias in the selection system (Sierra-Diaz et al., 2017; Besters, 2018). The relevant literature will be discussed in the next section.
By examining the impact and the presence of the RAE in the national lower youth categories, under 15 and under 19 years old, in Dutch soccer, this MSc thesis attempts to identify the persistence of the overrepresentation of players who were born early. Specifically, the main research question that this thesis seeks to answer is how the birth date of a soccer player affects his performance and his selection by the scouts and/or the coaches for the elite team. In other words, how does the birth date affect the career of an athlete? For that reason, I also investigate those players who have become professionals.
Talent is assumed to be uniformly distributed across the birth dates of soccer players. The RAE, which contradicts this hypothesis, implies a non-uniform birth date distribution and therefore a loss of talent. In my thesis, I find an overrepresentation of the early born athletes, especially at younger ages. I also take into account that the overrepresentation changes in line with the change in the cut-off date in the Netherlands. More precisely, players in Dutch football have been grouped by calendar year since the cut-off date changed from October 1 to January 1 around 1999-2000.
The data that I use in this study have been collected directly from the websites https://www.onsoranje.nl, http://www.voetbal.com/wedstrijd/ned-playoffs-eredivisie/, http://www.voetbalstats.nl/alledebnedxi.php and https://eu-football.info/. The main dataset contains all the players in the youth categories U-15 and U-19 of the Dutch national team who were born between 1965 and 2000. The dataset consists of 1,300 Dutch football players. More precisely, 425 players stem from the U-15 category and 875 players stem from the U-19 category. In my analysis, I use birth semesters and birth quarters for each player. A ratio S1/S2 was created that takes the value one (1) if the team contains an equal number of players born in the first and in the second semester. Furthermore, through a Pearson χ² goodness-of-fit test I show that the observed birth distribution is considerably different from the expected birth distribution. In addition, through several graphs, I provide evidence that supports the overrepresentation of players who were born in the first semester of the year. Finally, I find that the RAE indeed shows a decline after the adolescent period.
Apart from the methods mentioned above, linear models are used to examine whether the differences among the early bloomers and the late bloomers are statistically significant. The results of the parameter estimates suggest an overrepresentation of those athletes that have been born early. A player who was born in the first semester has overall more appearances in the team in both categories.
Furthermore, I investigate those football players who were born during the period 1965-2000 and became professionals, focusing on ages 18 to 23. Linear probability models are used in this part, and the findings indicate that the early born players are more likely to become professionals, except at the ages of 19, 21 and 23. Moreover, an analysis of the elite team from a sample of 38 Dutch elite players suggests that the RAE still exists, even in the professional top category, but to a lesser extent. Finally, I examine the presence of the RAE in a sample of 68 Dutch international players who participated in a UEFA European Championship during the period 1990-2018 without ever having played a match in any youth category of the national team. The evidence suggests an inverse RAE in that small sample of players.
This dissertation is structured as follows. Section 2 provides a review of the related literature. In section 3, the data, the methodological approach and the empirical analysis are described, respectively. Section 4 presents the results of the model estimation. In section 5, I present an analysis of Dutch professional football through the youth categories U-15 and U-19, the elite team and the international players who never appeared at the national youth level. Finally, in section 6, I provide some concluding remarks and solutions to mitigate or even avoid the RAE, based on the existing literature.
There is broad but mixed evidence of the existence of the RAE in sports. This topic has drawn the attention of economists and generated a large body of studies that investigate the presence of the RAE in different kinds of sports. The leading paper on the impact of the RAE in the UEFA soccer championship is the one by Gonzalez-Villoral et al. (2015). The study examines the effect in European soccer by analyzing the results of 16 teams (Czech Republic, Denmark, England, France, Germany, Greece, Croatia, Holland, Ireland, Italy, Poland, Portugal, Russia, Spain, Sweden and Ukraine), for the professional teams as well as the youth categories U-21, U-19 and U-17, that participated in the UEFA European Championship of 2012.
The data have been collected directly from the official website of the Union of European Football Associations (UEFA). Birth quarters and birth semesters have been used and the results from the t-tests (Chi-square distribution) do not suggest any specific direction for the effect. Also, the findings suggest that only the birth date distribution differences among the youth categories U-21, U-19 and U-17 are statistically significant. Moreover, by examining the semester in which the players were born, the effect is clearly more evident in the U-17 category than in the other youth categories. The authors suggest that the discrimination against those who were born late diminishes as age increases. Hence, the results do not indicate a presence of the RAE at higher ages in the professional teams.
A major paper regarding the relationship between the soccer education programs and the age grouping in the youth academies, and how it affects the earnings of German professional soccer players, is the one by Ashworth and Heyndels (2007). The article estimates a wage function for all the players in the Bundesliga for the seasons 1997/98 and 1998/99. The authors find that the professional players who were born close to the cut-off date tend to earn relatively higher wages than those who were born further from the cut-off date. Furthermore, the impact on earnings is related to the position of each player. Specifically, the effect is larger for defenders and goalkeepers than for other positions such as forwards.
One mechanism that explains the results is the existence of a selection bias in the talent scouting system. The scouts tend to promote the oldest players for the top teams, due to their physical development and appearance, who are not necessarily the most talented ones. A simple model is explained in the article in which the marginal productivity of each player increases as time passes. Hence, because those players that were born close after the cut-off date have a relatively better physical development and better skills, they also have relatively higher marginal productivity.
Furthermore, this is the first paper that examines the peer effects in sports regarding the soccer education programs of the academies. In particular, the authors argue that the late bloomers can receive higher quality education. The rationale given for this result was that these athletes have the opportunity to train with their age-advantaged peers who have a critical developmental advantage. Consequently, this interaction leads to more developed technical skills for the late bloomers.
The paper by Vincent and Glamser (2016) analyzes the differences in the impact of the RAE by examining the birth distribution of 804 female and 540 male soccer players, all 17 years old, in the US Olympic Development Program (ODP). The main aim of the ODP is to support the most highly skilled youth soccer players, of both sexes, with better training sessions. The descriptive statistics of the study show that the highest numbers of births in the US occur in August, September and July. On the other hand, the lowest numbers of births occur in February, April and January. However, it seems that the early born athletes of that category can still benefit from the RAE even though births are fewer in the first month of the year. Moreover, male athletes who were born in the first seven months of the calendar year seem to have an age advantage over the rest of the players in their team.
Moreover, the RAE in the paper mentioned above is explained as a self-fulfilling prophecy. More precisely, the coaches select the early born athletes because they expect them to do better in the current period. Hence, it is not unlikely that the overall quality of the team will decline in the longer run.
The results indicate that the effect is stronger for male than for female athletes. This can be explained by the following reasons. First, biological maturation differs between the two genders. In particular, women mature considerably earlier than men. Second, it is possible that the competition for a place in a team differs in intensity between the two genders in many sports.
Besters (2018) distinguishes between internal selection and external selection. External selection refers to the recruitment of players from outside the team, based on the reports by the scouts, and depends only on a few observations. Internal selection, on the other hand, is essentially the decision by the academy on whether a player must leave the academy or not, and depends on many observations. The author uses data from the youth academy of PSV Eindhoven. By constructing two ratios, one for those who enter the academy and another for those who leave the academy, the author finds that the external selection works in favor of the athletes who are born early. In contrast, the internal selection can diminish the presence of the RAE, but the reduction is not enough to offset the consequences of the biased external selection system and thus to achieve equality among the players.
Besides, the football academies might fail to identify a high-skilled player who was born late, resulting in a wrong decision concerning whether to keep that player in the academy or not, especially when the player is at a crucial age where he/she still develops his/her technical and tactical skills.
Several studies have shown that the RAE is present in many sports, such as ice hockey (Nollan and Howell, 2010), basketball (Delorme et al., 2008; Werneck et al., 2016), rugby (Lewis et al., 2015), baseball (Beals et al., 2017; Sims et al., 2016; Thompson et al., 1991), tennis (Moreira et al., 2017; Baxter-Jones et al., 1995) and football (Ashworth and Heyndels, 2007; Helsen et al., 2000; Salinero et al., 2013; Hill and Soteriadou, 2016). Furthermore, other studies show that in sports in which technical skills are more relevant than maturation, such as dancing and gymnastics, the effect is limited (Rossum, 2006; Baxter-Jones et al., 1995). In addition, the effect is more likely to be observed in sports with stronger competition, such as football (Musch and Grondin, 2001).
Edginton et al. (2014) investigate the birth date distribution of 388 male medal-winning boxers from tournaments during the period 2000-2012. Their findings indicate that the observed birth date distribution deviates to a significant degree from the expected values. Specifically, it seems that the majority of the participants were born in the first half of the year, especially in the first quarter.
Furthermore, the existence of the RAE in sports with female participants is a topic that only a few studies have addressed. As already discussed, the variability in maturation is larger among male than among female athletes, and the competition is expected to be stronger in a team with only male participants than in a group consisting only of women (Vincent and Glamser, 2016). This implies a greater effect in male youth sports (Musch and Grondin, 2001). In general, the selection system in female sports seems to evaluate the performance of the athletes with higher accuracy. However, the effect varies across individual sports for women, and the direction of the effect is not clear (Romann and Fuchslocher, 2014; Besters, 2018). Hancock et al. (2015), by dividing a group of 921 female athletes into two smaller groups, find that for athletes younger than 15 years, relative age differences between the two groups of interest cannot be observed, but above 15 years of age a reverse RAE is visible. In addition, there is an overrepresentation of the female gymnasts who were born in the second and the third quarter of the year.
Evidence indicates that culture is an important factor that is deeply related to the RAE. Sierra-Diaz et al. (2017) state that the selection system in western countries focuses more on the performance of the athletes at older ages, compared to non-western countries, where the selection system depends heavily on biological maturation. Additionally, the African countries seem to have an inverse RAE. More precisely, the birth date distribution suggests an overrepresentation of the athletes who were born in the last months of the calendar year. The authors also argue that a crucial element is the size of the country. For example, in small countries the RAE cannot be easily identified because the competition is lower and many people are not interested in a specific sport.
Moreover, most of the research papers about the existence of the RAE in sports suggest that the effect diminishes at older ages because after the adolescent period the maturation differences become less important, and the age advantage of being born early ceases to be visible. The research by Bjerke et al. (2017) investigates the performance of 50 skiers of both genders in the alpine World Cup over a twenty-year period. The authors argue that in many sports, such as male elite alpine skiing, an inverse RAE can exist.
Besides, evidence from several studies has shown that not only the genetic traits affect the overall performance of a player, but also practice and motivation have a key role in the development of expertise (Besters, 2018; Helsen et al., 2000; Rees et al., 2016; Abbott and Collins, 2004). Williams and Reilly (2000) argue that factors such as sociological and physical attributes can be used to predict the talent and the future of the athlete, and therefore his or her overall performance as an athlete.
Finally, Bandura (1977) states that the advantage of being born early encourages an athlete to further develop his/her skills, which may result in participation in the top team and in a higher level of self-confidence as well.
In the following two sub-sections I provide more details about the data used for the analysis, the strategy of the analysis, followed by the main empirical analysis of my thesis. The methodology is based on the study by Besters (2018).
The dataset for the U-15 category contains all the players who were born during the period 1970-1994 (for the period 1995-2000 the data are missing). Additionally, for the U-19 category, the sample contains data for the players who were born from 1965 until 2000. Specifically, the dataset consists of 1,300 Dutch football players (425 stem from the U-15 and 875 stem from the U-19). However, 176 players are present in both categories. The time interval from which I collect the observations for the sample is considered large enough to support the analysis.
By constructing two birth semesters (S1 = January to June; S2 = July to December) I show the overrepresentation of the age-advantaged players who were born early. Furthermore, by constructing four birth quarters (Q1 = January to March; Q2 = April to June; Q3 = July to September; Q4 = October to December) I determine in which quarter the effect is more intense. An overview of the number of players by category is provided in Table 1. The percentages of the totals are presented in parentheses. Table 1 provides evidence that the percentage of the early born football players is relatively higher than the percentage of those who were born late.
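The semester and quarter definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample birth months are hypothetical and are not taken from the thesis data, and the calendar-year grouping (January 1 cut-off) is assumed, as in the definitions of S1, S2 and Q1 to Q4.

```python
from collections import Counter


def birth_semester(month: int) -> str:
    """Map a birth month (1-12) to S1 (January-June) or S2 (July-December)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return "S1" if month <= 6 else "S2"


def birth_quarter(month: int) -> str:
    """Map a birth month (1-12) to Q1, Q2, Q3 or Q4."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"Q{(month - 1) // 3 + 1}"


# Hypothetical birth months, for illustration only (not thesis data).
months = [1, 2, 2, 5, 7, 11]

semester_counts = Counter(birth_semester(m) for m in months)
quarter_counts = Counter(birth_quarter(m) for m in months)

print(dict(semester_counts))  # counts per semester
print(dict(quarter_counts))   # counts per quarter
```

Counts of this kind, tabulated per birth year and category, are the inputs for the distribution comparisons described in this section.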
Table 1: Overview of the number of the players by category
illustration not visible in this excerpt
Furthermore, the two birth date distributions, by semester and quarter, are depicted in Figures 1.A-1.D for each of the two categories. All figures are in line with what the RAE suggests: players who were born close to the cut-off date are more likely to be chosen by the scouts and/or the coaches to play for their country. The overall picture suggests that the birth dates are not uniformly distributed. Additionally, it seems that the effect becomes less intense over time, as expected.
illustration not visible in this excerpt
Figure 1.A: Birth date distribution by semester for the U-15 category
illustration not visible in this excerpt
Figure 1.B: Birth date distribution by quarter for the U-15 category
illustration not visible in this excerpt
Figure 1.C: Birth date distribution by semester for the U-19 category
illustration not visible in this excerpt
Figure 1.D: Birth date distribution by quarter for the U-19 category
Moreover, a ratio S1/S2 is constructed to determine whether the early born players are preferred over their late born peers, or vice versa. More precisely, a value of this ratio equal to one (1) indicates that the number of selected early born athletes is equal to the number of selected late born athletes. I created this ratio for every birth semester and birth quarter for each category. Additionally, a Pearson χ² goodness-of-fit test was used to examine how well the model reflects the data. Specifically, the Pearson χ² goodness-of-fit test statistic examines whether there are statistically significant differences between the observed values and the expected values of the birth semester and birth quarter distributions, that is, whether the observed values of the birth distributions deviate significantly from the expected values. The distribution of birth quarters and birth semesters by birth year is presented in Table 2.A and Table 2.B for each category.
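As a minimal sketch of the S1/S2 ratio, the following Python snippet computes it from two semester counts. The function name is hypothetical; the counts 20 and 8 are the U-19 semester values for birth year 1987 quoted in the text from Table 2.B.

```python
def semester_ratio(s1: int, s2: int) -> float:
    """Return S1/S2; a value of 1 means both semesters are equally represented."""
    if s2 == 0:
        raise ValueError("S2 count is zero, so the ratio is undefined")
    return s1 / s2


# U-19, birth year 1987: 20 players born in S1 and 8 born in S2.
ratio_1987_u19 = semester_ratio(20, 8)
print(ratio_1987_u19)  # 2.5
```

A ratio above one points to an overrepresentation of players born in the first semester, and a ratio below one to the opposite.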
For instance, in 1988, for the U-15 category (Table 2.A), the values of the birth quarters are 13, 6, 5 and 1, which sum to 25 and show the number of athletes born in each quarter. Under a uniform distribution, we expect to observe an equal number of athletes in each quarter. In the case of 1988, that number is 6.25 (25/4) persons per quarter. By subtracting the expected value from the observed values, we get 6.75, -0.25, -1.25 and -5.25. Additionally, in 1987, for the U-19 category (Table 2.B), the observed values for the birth semesters are 20 and 8, and the sum is 28. Therefore, the expected value is 14 (28/2) persons per semester. By taking the difference, again, between the observed values and the expected value we get 6 and -6. These results are in line with Figures 1.A-1.D. Athletes who were born early indeed have an age advantage in the internal selection system relative to their late born peers.
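The two worked examples above can be reproduced with a short Python sketch that computes the deviations from the uniform expectation and the Pearson χ² goodness-of-fit statistic. The function names are hypothetical, the uniform expected counts follow the assumption stated in the text, and the p-value helper uses the standard closed-form series for integer degrees of freedom. The statistics reported in the thesis itself are those in Tables 2.A and 2.B, which are not reproduced here.

```python
import math


def deviations(observed):
    """Observed minus expected counts under a uniform distribution."""
    expected = sum(observed) / len(observed)
    return [o - expected for o in observed]


def pearson_chi2(observed):
    """Pearson chi-square statistic and degrees of freedom (uniform expectation)."""
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, len(observed) - 1


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    term, total = math.sqrt(x), 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


quarters_1988_u15 = [13, 6, 5, 1]   # U-15, birth year 1988 (from the text)
semesters_1987_u19 = [20, 8]        # U-19, birth year 1987 (from the text)

print(deviations(quarters_1988_u15))   # [6.75, -0.25, -1.25, -5.25]
print(deviations(semesters_1987_u19))  # [6.0, -6.0]

stat_q, df_q = pearson_chi2(quarters_1988_u15)   # statistic of about 11.96, 3 df
stat_s, df_s = pearson_chi2(semesters_1987_u19)  # statistic of about 5.14, 1 df
print(stat_q, df_q, chi2_sf(stat_q, df_q))
print(stat_s, df_s, chi2_sf(stat_s, df_s))
```

Small p-values from `chi2_sf` indicate that the observed counts deviate from the uniform expectation by more than chance alone would suggest.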
Finally, the Pearson χ² test statistic values and their p-values show that, in general, the differences are statistically significant for both categories, especially for the semester values rather than the quarter values. Thus, it can be concluded that the observed values of the birth date distributions by quarters and semesters deviate from the expected values.
 Available on the internet via www.psv.nl/jeugd.
 Retrieved from https://www.scienceforsport.com/relative-age-effect/ [14/05/2018]
 A 0.25 person does not make any sense, but this result is justified for the purpose of the statistical inferences and interpretations.