Long-Run Human Capital Development in Italy and the Influences of the Mafia

Bachelor Thesis, 2019

40 Pages, Grade: 1,3


Table of Contents

I. List of Figures

II. List of Tables

IV. Abstract

1. Introduction

2. Data
2.1 Age Data and Data Structure
2.2 Independent Variables and Investigation of Sources
2.3 Problems and Difficulties

3. Human Capital Analysis Italy
3.1 Methodology
3.2 Human Capital Development in the Long-Run
3.3 Gender based Human Capital Development
3.4 Regional Human Capital Development

4. Empirical Analysis
4.1 Regional Human Capital Development
4.2 Difficulties in Regression Analysis
4.3 Endogeneity Problems and Multicollinearity
4.4 Main Results

5. Development of Mafia
5.1 Origin of the Mafia
5.2 Human Capital Influence

6. Conclusion

V. Bibliography

VI. Data Sources

VII. Appendix

List of Figures

Figure 1: Census Distribution Rome 1600-1800

Figure 2: Marriage Register Distribution Vicenza 1700

Figure 3: Death Records Distribution Turin 1600

Figure 4: Original ABCC Distribution

Figure 5: ABCC Distribution starting 1340

Figure 6: Adjusted ABCC Distribution Italy

Figure 7: Gender based ABCC Distribution Italy

Figure 8: Gender Based ABCC Northern Regions

Figure 9: Gender Based ABCC Southern Regions

Figure 10: ABCC Northern Regions

Figure 11: ABCC Southern Regions

Figure 12: ABCC North-South Italy

Figure 13: ABCC Values 1650

Figure 14: ABCC Values 1800

Figure 15: ABCC Provinces of Sicily

Figure 16: ABCC Distribution starting 1330

Figure 17: ABCC Distribution starting 1350

Figure 18: ABCC Distribution without Death Records

Figure 19: Adjusted ABCC Distribution

Figure 20: ABCC Distribution Sicily-Lombardy

List of Tables

Table 1: Sources Composition 1740-1790

Table 2: Empirical Analysis ABCC

Table 3: Empirical Analysis South Dummy

Table 4: Empirical Analysis Mafia-ABCC

Table 5: Sources Composition

Table 6: Sources Distribution 1730-1780

Table 7: Original ABCC

Table 8: Number of Cases per Birth Decade

Table 9: Pearson's Correlation


This bachelor thesis aims to offer a deeper insight into Italian human capital development in the long run. The thesis also investigates differences in this development. The investigation of economic indicators and their connection to regional numeracy takes up ideas and results of other scholars and subsequently implements them in the empirical analysis. The presence of an actual North-South gap in numeracy was confirmed, especially for more recent periods. The empirical analysis, however, could not confirm a significant relation between regional numeracy in Sicily and the presence of criminal organizations.

1. Introduction

The human capital of a country is more than just the sum of the number of school years of its population. Human capital is a summary of education, skill and talent. For skill and talent it is obviously hard to find an objective measure that, in addition, does not cause endogeneity problems when running a regression. Researchers are therefore left to concentrate on people's education to measure human capital more specifically. The generic term "education" covers not only years of schooling but also numeracy and literacy. Due to the lack of records concerning schooling or literacy, researchers use other methods to estimate human capital. This has proved to be extremely important for long-run studies.

Human capital is seen as one of the key factors for the development and growth of an economy. Hippe and Baten (2011) show that human capital is of major importance and a big driving factor for economic development in every country today, just as it was centuries ago. Regional economic development within one country could hardly differ more than it does in Italy. The Mezzogiorno, as the southern part of Italy is also known, had and partly still has to contend with many challenging circumstances that made people's economic situation less advantageous than in Northern and Central Italy. The environmental conditions as well as the state of the infrastructure complicated economic growth and development (Dickie, 1999). Furthermore, the South could not benefit from the relatively sophisticated economy that was already established in northern regions such as Milan and Venice, as there were not enough connections within the domestic economy.

My thesis is dedicated to researching the long-run development of human capital in Italy, both for the country as a whole and with regard to regional differences. In doing so, I want to take a closer look at the North-South gap, which has been addressed in some papers before (Davis (2012), Federico, Nuvolari and Vasta (2019)). First, I will describe the age data sources and their structure, and subsequently the sources and collection of the independent variables in the dataset. These independent variables will later be included in an empirical analysis of the connection between human capital and economic development. I will then elaborate on human capital development, especially regarding gender and region. This section differentiates between larger territories as well as the specific provinces of Italy. Then, the impact of different independent economic indicators on the numeracy value will be investigated by regression analysis. In the context of historical and regional differences in human capital, the period from 1800 to 1900 will be considered more closely because of the presence of mafia and criminal activities in the south of Italy. An empirical analysis shall show whether the increased presence of mafia and criminality is influenced by the human capital value in the specific provinces of Sicily, where the most famous mafia organization had its origin. The last part concludes with the most important results.

2. Data

2.1 Age Data and Data Structure

The age data panel consists of 64 different data sets from different sources. I divided the sources into five categories: census, death registers, marriage registers, inquisition and hospital records (ordered by total number of observations). The division into the first three categories is of major importance because of the later methodology and the categories' influence on its results. The categories inquisition and hospital records and their possible influence can be neglected because of their small sample size relative to the total number of observations. The online library of ISTAT published some important census data that are included in this thesis, for example the Italian census of 1871, which covers most of the Italian regions. Furthermore, the website familysearch.org was helpful with data for all three main categories. Specific examples of death registers are those of Naples 1772-1801 and Turin 1750-1801. The hospital records are limited to specific years, such as 1687 and 1708 in Roman hospitals. For some of the data made available by the Chair of Economic History, the source was not clearly identified.1 To assign them to one of the categories despite the missing information, it proved very helpful to look at the age distribution of the individual data set.

Figure not included in this excerpt

Figure 1: Census Distribution Rome 1600-1800

Figure not included in this excerpt

Figure 2: Marriage Register Distribution Vicenza 1700

Figure not included in this excerpt

Figure 3: Death Records Distribution Turin 1600

The first figure shows a typical census age distribution, where the number of cases decreases with increasing age. The second one is typical for a marriage register: the number of cases peaks in the mid-twenties, the most common age for marrying, especially in former times. The last age histogram shows a death record, where the highest numbers occur either before the age of ten or after the age of 60. It should be noted that this distribution dates from the 17th century, when the infant mortality rate was still high. The high variability of the data supports the randomness of the observations. In addition, I checked each dataset individually for possible counterchecking, which would distort or threaten to distort the results, depending on the sample size relative to the total number of observations. I also complemented the available data with province and region variables to guarantee the greatest possible coverage of all regions for later comparison. In the major part of the data the gender is identified, which makes it possible to study gender gaps in human capital over time and across regions. The appended data set now includes about 11 million observations and extends over six centuries, with birth decades from 1330 to 1910.

2.2 Independent Variables and Investigation of Sources

The thesis is not only concerned with the development of human capital but also examines the different economic development of regions and the possible role of human capital in it. This, however, is rather difficult for most countries, especially for the time before 1800.

The simplest method for measuring economic productivity would, of course, be to take the GDP of the country as the dependent variable. In order to investigate time periods when measuring GDP was not yet possible, it is necessary to be more creative in finding indicators of economic productivity. One possible indicator is the urbanisation rate and the growth rate of urban inhabitants over time. Malanima (2005) identifies a connection between the productivity of the Italian economy and urbanisation. The implication is that an economy had to display progressiveness and a certain level of productivity to be able to provide enough food for a large city's population. A city's inhabitants tended to work in the secondary or tertiary sector. As we know, in the medieval and early modern period craftsmen and [...] contributed to the formation of large trading companies in cities. This part of the population could not, as in earlier times, grow and provide food for themselves. Consequently, the agricultural sector had to be managed more efficiently to achieve higher productivity and output. Wrigley (2016) studied the limiting nature of the agricultural sector but also the incentive character of the urban sector. Without the demand for food from urban regions, farmers in rural parts had no reason to produce more than was needed locally. Chandler (1987) collected in his book data on the population of cities and suburbs with more than 5000 inhabitants over a period of four thousand years. I extracted the data on Italian cities for the required period and added the related provinces and regions to the listed cities. Next, I calculated the mean urban population of all cities in a region. In order to capture changes in urban population and be more independent of total numbers, I calculated a growth rate based on the previous mean urban population of the region.
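The two steps described above, the regional mean of city populations and its growth rate, can be sketched in Python. This is only an illustration under assumptions: the growth rate is taken here as the relative change against the previous available observation, the function name and record layout are hypothetical, and the sample numbers are made up rather than taken from Chandler (1987).

```python
from collections import defaultdict


def regional_urban_growth(records):
    """records: iterable of (region, year, city, population) tuples.

    Returns {region: {year: growth_rate}}, where growth_rate is the relative
    change of the mean city population against the previous available year.
    """
    # Collect the city populations per region and year.
    by_region_year = defaultdict(lambda: defaultdict(list))
    for region, year, _city, population in records:
        by_region_year[region][year].append(population)

    growth = {}
    for region, years in by_region_year.items():
        # Mean urban population of all listed cities of the region per year.
        means = {y: sum(pops) / len(pops) for y, pops in years.items()}
        ordered = sorted(means)
        # Assumed definition: (current mean - previous mean) / previous mean.
        growth[region] = {
            cur: (means[cur] - means[prev]) / means[prev]
            for prev, cur in zip(ordered, ordered[1:])
        }
    return growth


# Made-up illustrative numbers, not taken from Chandler (1987).
sample = [
    ("Lombardy", 1500, "Milan", 100_000),
    ("Lombardy", 1500, "Brescia", 50_000),
    ("Lombardy", 1600, "Milan", 120_000),
    ("Lombardy", 1600, "Brescia", 45_000),
]
print(regional_urban_growth(sample))
```

Using a rate instead of the level keeps large and small regions comparable, which is the motivation given in the text.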

The settling of craftsmen in cities leads to the next point: the formation of guilds. On the website of "Institutions for collective action", data were available on the number of guilds in some parts of Italy, such as Venice, Rome, Turin and Milan. The records run from the 13th century until the abolition of the guilds at the beginning of the 19th century. Unfortunately, the year of abolition was not mentioned for all guilds. One can reasonably assume that a higher number of guilds in a region, and therefore a higher number of craftsmen, is related to the economic productivity of this region.

Returning to the importance of the agricultural sector, we can take a closer look at the important factors for agricultural prosperity. Apart from rainfall, the most important factor is, of course, temperature. Guiot and Corona (2010) evaluated trends and deviations of European spring and summer temperatures by means of temperature reconstructions starting in the 15th century. The data cover the region from 27.5°N to 72.5°N and from 7.5°W to 57.5°E. The temperature deviations in °C are relative to the average temperature of 1961-1990. The observations are recorded in 5° steps. With the help of a latitude map, I could determine the most relevant observations for Italy, from 47.5°N to 37.5°N and 7.5°E to 17.5°E. As a next step, I assigned the observations to the regions that are geographically closest to the longitude and latitude values. Because of the 5° steps, this could only be done roughly. For the regions located between those observations, e.g. Campania and Basilicata, I calculated the mean of the temperature deviations at 42.5°N and 37.5°N, which should be applicable because of the relatively homogeneous topography from north to south. Finally, I computed the mean temperature deviation per decade for each region, both to make the data comparable to the age data, which are measured in observations per birth decade, and to avoid extreme values of a single year.
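The averaging between two grid points and the aggregation to decades can be sketched as follows. This is a minimal Python sketch under assumptions: the function and variable names are hypothetical, the yearly deviations are made-up values and not data from Guiot and Corona (2010), and a decade is taken to be the ten years starting with a year ending in 0.

```python
from collections import defaultdict


def between_grid_points(series_a, series_b):
    """Average two yearly series for the years both of them cover."""
    shared_years = series_a.keys() & series_b.keys()
    return {year: (series_a[year] + series_b[year]) / 2 for year in shared_years}


def decadal_means(yearly_deviations):
    """Collapse {year: deviation in °C} into {decade: mean deviation}."""
    buckets = defaultdict(list)
    for year, deviation in yearly_deviations.items():
        buckets[year // 10 * 10].append(deviation)
    return {decade: sum(vals) / len(vals) for decade, vals in sorted(buckets.items())}


# Made-up deviations for two grid points, for illustration only.
grid_42_5n = {1500: 0.5, 1501: -0.5, 1510: 1.0}
grid_37_5n = {1500: 1.5, 1501: 0.5, 1510: 0.0}

# A region lying between the two grid points, e.g. Campania.
campania = decadal_means(between_grid_points(grid_42_5n, grid_37_5n))
print(campania)
```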

In recent works, different scholars concluded that the European Marriage Pattern (EMP) indeed had an effect on education (De Moor and Van Zanden 2010, Foreman-Peck 2011). A higher average age at first marriage, for example, proves to have an influence, as women had a chance to increase their human capital when marrying later. This knowledge and education could eventually be passed on to the next generation, too.

In order to investigate whether there is a connection between the regional numeracy level and the presence of mafia and criminal activities, I collected mafia data on a regional level. Catanzaro (1992) and Dickie (2004) give some indications of mafia presence by region and period. I follow Buonanno et al. (2015) and use the mafia data compiled by a former police officer named Cutrera in 1900 and those of a parliamentary inquiry by Damiani in 1885. These sources were also used in the paper of Acemoglu, De Feo and De Luca (2017).

2.3 Problems and Difficulties

After merging the different data sets, I had to realize that there were almost no observations for the relevant period of origin of the mafia in Sicily, starting in 1860. As mentioned before, the online library of ISTAT published the original documents of the different censuses starting in 1861. Therefore, I picked out the relevant Sicilian provinces in the census data of 1881 in order to compare them later with the mafia presence described by Damiani in 1885. To provide an overview of changing circumstances, I intended to use the same approach with the census of 1901 and compare it with mafia presence in 1900. Starting with the 1901 census, however, the data compilers used a different method of collecting the census data: they used five-year age groups starting at age 21 and aggregated the number of cases in these groups. Consequently, this census was useless for my research because the exact age statements cannot be recovered.

As mentioned above, for some age data sets the source could not be traced back, or there were simply no clues given. In most cases I could deal with this with the help of the age distribution in the data. Furthermore, I checked each dataset for possible counterchecking and for the possibility that the source might be a transcript made some years after the original recordings. In that case, a group of people who stated their age as 40 or 45 would later be listed with ages 43 and 48 in a transcript made three years afterwards. This would bias the outcomes and overestimate numeracy. However, the total numbers for the individual ages showed that, with a few exceptions, multiples of five were the ages with the highest numbers of observations.

For the few exceptions where no clear pattern for the multiples of five was visible, the reason could be that most of these data's observations were ages from zero to 35. This relatively younger part of the population proved to be more aware of its age and could state it more precisely in records (Crayen and Baten 2010).
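One simple way to look for a shifted transcript is to count how often each terminal digit occurs. The following Python sketch is an illustration under assumptions, not the procedure actually used for the thesis: the function names are hypothetical and the ages are made up. Heaping on the digits 0 and 5 points to original age statements, whereas heaping on, say, 3 and 8 would hint at a transcript made three years later.

```python
from collections import Counter


def terminal_digit_shares(ages):
    """Share of observations per terminal digit (0-9) of the stated age."""
    if not ages:
        raise ValueError("no ages given")
    counts = Counter(age % 10 for age in ages)
    return {digit: counts.get(digit, 0) / len(ages) for digit in range(10)}


def most_heaped_digits(ages, k=2):
    """The k terminal digits with the largest shares, in ascending order."""
    shares = terminal_digit_shares(ages)
    return sorted(sorted(shares, key=shares.get, reverse=True)[:k])


# Made-up ages: strong heaping on 40 and 45 plus a few precise statements.
original = [40] * 30 + [45] * 20 + [41, 42, 47]
# The same people as they would appear in a transcript made three years later.
transcript = [age + 3 for age in original]

print(most_heaped_digits(original))    # heaping on terminal digits 0 and 5
print(most_heaped_digits(transcript))  # heaping shifted to 3 and 8
```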

Some cities in the urban population database of Chandler (1987) could not be matched with provinces and regions because the listed city names were too imprecise. For example, the listed "Montemaggiore" could refer to two different towns, Montemaggiore Belsito or Montemaggiore al Metauro. This observation could therefore belong either to Pesaro and Urbino in Marche or to the province of Palermo in Sicily. Cities that could not be matched unambiguously were left out of the analysis.

For the ISTAT census data of 1881 for Sicily, it proved difficult to include the data on a province level. First, today's province of Enna in the heart of the island was only founded in 1927. Before that, the area of Enna was divided into the districts of Nicosia and Piazza Armerina, which both belonged to the province of Caltanissetta. Furthermore, today's province of Ragusa was in 1817 part of the province of Noto, which was renamed the province of Syracuse in 1865, after the Kingdom of Italy had been founded. The province of Ragusa, as we know it today, was only formed in 2015.

Furthermore, because many variables, such as the urbanisation data and the number of guilds, are only available for specific decades, a regression analysis taking all variables into account would not provide significant results. The reason lies in the small number of decades for which all of the variables are available. In order to get meaningful results, I decided to split the regression analysis into smaller regressions that consider fewer variables at the same time.

3. Human Capital Analysis Italy

3.1 Methodology

The age-heaping method is already firmly established among scholars researching human capital development (Crayen and Baten 2010). The idea behind age-heaping strategies is that we have only limited records of numeracy and literacy rates. Schooling records and enrolment rates are only available from the 19th century for some countries and from the 20th century for most. In order to estimate literacy rates for even earlier periods, one popular method has been people's signing ability. The reliability of this measure, however, has been debated by many historians (Houston 1885). The age-heaping method is one way to complement this measure. Chrisomalis (2009) also scrutinises the co-evolution of, and therefore the connection between, numeracy and literacy. This gives us the possibility to extrapolate from numeracy to literacy rates in a certain period and region. These two measures are indicative of human capital development.

To calculate the ABCC value, the observations are subdivided into age groups from age 23 up to age 72, in steps of ten years. The ages before and after are omitted because below the age of 23 there is a probability that the parents still stated the age, and above 72 the so-called "survivor effect" must be taken into account. Put simply, more educated people tend to receive a higher income and therefore have a higher chance of living longer than less educated people. The groups are formed in this way so that the round numbers, i.e. those ending in 0, lie in the middle of each group to counteract possible distortions, e.g. because there will be more respondents at age 60 than at age 69. The ABCC value is calculated by means of the Whipple index, which measures on a scale of 100 to 500 how strongly people had to estimate their own age in a census. The number of reported ages ending in 0 or 5 is divided by one fifth of all reported ages and multiplied by 100, assuming that one fifth of the respondents really have an age ending in 0 or 5. At 100, there is no age heaping, i.e. the ages are statistically evenly distributed and exactly one fifth of the respondents have given an age that ends in 0 or 5. At 500, all ages end in 5 or 0.
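The calculation can be illustrated with a short Python sketch. The Whipple index follows the description above. The conversion into the ABCC value is not spelled out in this excerpt, so the sketch assumes the definition that is standard in the age-heaping literature, ABCC = (1 - (W - 100) / 400) * 100 for W >= 100 and 100 otherwise. The function names are hypothetical and the example ages are made up.

```python
def whipple_index(ages, low=23, high=72):
    """Whipple index (100 to 500) for the stated ages within [low, high]."""
    in_range = [age for age in ages if low <= age <= high]
    if not in_range:
        raise ValueError("no ages in the requested range")
    # Ages ending in 0 or 5 are exactly the multiples of five.
    heaped = sum(1 for age in in_range if age % 5 == 0)
    # Under no heaping, one fifth of all ages would end in 0 or 5.
    return heaped / (len(in_range) / 5) * 100


def abcc(ages, low=23, high=72):
    """ABCC value: estimated share (in %) of people stating an exact age.

    Assumed standard rescaling of the Whipple index to the range 0 to 100.
    """
    w = whipple_index(ages, low, high)
    if w < 100:
        return 100.0
    return (1 - (w - 100) / 400) * 100


# Made-up examples: every age from 23 to 72 once, and everybody stating 30.
print(abcc(list(range(23, 73))))  # no heaping
print(abcc([30] * 50))            # complete heaping
```

Passing other bounds, for example `abcc(ages, 23, 32)`, gives the value for a single ten-year age group.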

3.2 Human Capital Development in the Long-Run

The first observations start with the birth decade of 1330. Looking at the first ABCC values in 1330, it seems rather unlikely that numeracy rates in the first half of the 14th century were around 100%. The reason is that the number of cases for the first two birth decades is low, with only 351 cases, in comparison to 1350, with 15600 observations. Therefore, the sample for these birth decades is not representative and will be excluded from further analysis.2

Figure not included in this excerpt

Figure 4: Original ABCC Distribution

Figure not included in this excerpt

Figure 5: ABCC Distribution starting 1340

This leaves two other unexpected developments: first, the high ABCC values from 1380 to 1430 and second, the low ABCC values from 1740 to 1790. The second anomaly can be explained by looking at the composition of sources during this period.

Table not included in this excerpt

Table 1: Sources Composition 1740-1790

More than half of all observations taken from death records lie within this period. They amount to double the number of observations from censuses. Death registers are known for having a downward bias in numeracy. In some cases the decedent had not been able to state his age before dying, so the priest had to estimate it. In order to get an adjusted ABCC value, the formula ABCC_adj = 19.38 + 1.202 * ABCC_obs is used (Bassino and Baten 2016). This should only be applied to ABCC values that are substantially below 100.

From 1740 until 1780 the share of observations from death registers is substantially higher than the share of regular census observations. Dropping them would mean losing a high number of cases, which would in turn lead to imprecise estimates. I checked the other birth decades, and none has as high a share of death records as the period mentioned above. One possible solution is a proportional adjustment that considers how large the share of death register observations is in the specific birth decade and adjusts the ABCC index accordingly.3 This can be done for the period 1750-1780. The other values, such as those for 1740 and 1790, proved to be already too high for further adjustment. Normally, marriage registers also need a downward adjustment because people marrying are generally in their twenties to mid-thirties, and this share of the population can state its age more precisely than others. This fact is ignored in this thesis because the marriage observations represent only 0.02% of the total number of observations. The adjustments in the period 1750-1780 lead to a steadier development of the ABCC value, as can be seen in the adjusted figure.
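The proportional adjustment can be written out as a short Python sketch. It combines the death register formula quoted above with the source shares of a birth decade and reuses the numbers given in footnote 3 for the birth decade of 1750. The function names are hypothetical, and the sketch assumes that only census and death register observations enter the weighting.

```python
def adjust_death_register(abcc_obs):
    """Adjustment for death register data quoted above (Bassino and Baten 2016)."""
    return 19.38 + 1.202 * abcc_obs


def proportional_adjustment(abcc_obs, n_census, n_death):
    """Weight the unadjusted and the adjusted ABCC by the source shares.

    The census share keeps the observed value; the death register share
    receives the adjusted value.
    """
    total = n_census + n_death
    census_share = n_census / total
    death_share = n_death / total
    return census_share * abcc_obs + death_share * adjust_death_register(abcc_obs)


# Values from footnote 3 (birth decade of 1750), which reports 88.48116.
adjusted_1750 = proportional_adjustment(61.07869, n_census=3847, n_death=24428)
print(round(adjusted_1750, 3))
```

With a death register share of zero the function returns the observed value unchanged, and with a share of one it returns the fully adjusted value.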

Figure not included in this excerpt

Figure 6: Adjusted ABCC Distribution Italy

The other irregularity, in the period from 1380 to 1430, is more unusual because the high ABCC values cannot be explained by simply looking at the sources. One possible reason could be the low number of cases for 1420 and 1430, with only about 800 observations. This could also be the cause of the relatively low ABCC values in 1480-1500. The high values of 1400 are particularly contradictory, as the Black Death had reached the peninsula and cost the lives of approximately half the population (Acemoglu and Robinson 2013). This should obviously also have led to a terrible economic and humanitarian situation. Even if the rather extreme values of this period are due to the lower number of observations, the overall development of the ABCC value is fully consistent with Malanima's (2010) findings on economic development in Italy. He mentions a promising industry in northern and central Italy in the 14th century which declined during the 15th century. Trade and industry recovered in the 16th century and declined once again at the beginning of the 17th century. Finally, economic productivity increased again in the 18th and 19th centuries.


1 For specific source information see Appendix A.

2 See Appendix B for the differences in ABCC development and the detailed influence of death registers on the result.

3 Example of the partial adjustment for the birth decade of 1750: unadjusted ABCC value: 61.07869; census share: 3847/28275; death register share: 24428/28275; adjusted ABCC: (3847/28275) * 61.07869 + (19.38 + 1.202 * 61.07869) * (24428/28275) = 88.48116. See Appendix C for the partial adjustment calculations and the structure of sources during the critical period.

University of Tubingen
Quote paper
Victoria Lindner (Author), 2019, Long-Run Human Capital Development in Italy and the Influences of the Mafia, Munich, GRIN Verlag, https://www.grin.com/document/506776

