Table of contents
1 Introduction
2 Industry dynamics and new industry emergence
2.1 Industry definition
2.2 The industry lifecycle
2.3 New industry emergence
3 Big data analytics
3.1 Big data
3.2 Data analytics
3.3 Selected big data analytics techniques
4 Big data analytics and industry emergence
4.1 Big data analytics as an industry
4.2 Big data analytics in the pre-emergence phase (1980 to 2004)
4.3 Big data analytics in the emergence phase (2005 to 2010)
4.4 Big data analytics in the growth phase (2011 to today)
5 Conclusion
References
1 Introduction
“If you can’t measure it, you can’t manage it.” (Peter Drucker)
Companies have used statistical and analytical methods for hundreds of years as a basis for decision making. However, recent technological developments such as the internet, cloud computing and big data analytics open up entirely new dimensions of data analysis that businesses can use to improve decision making.
But what is data? In its purest form, data is simply characters and numbers strung together to represent something in the real world. Raw data on its own is not particularly useful. So why the hype? When processed with suitable methods, data can be turned into information, and by adding context, information can be turned into knowledge.
Big data is a buzzword in today’s world, but there is little consensus as to what it actually means. This research paper takes a closer look at the big data analytics industry and how it developed.
It starts with an overview of current theories of industry emergence. Then, big data analytics is defined. This serves as a basis for examining whether big data analytics can truly be considered an industry and whether its development fits the concept of industry emergence as currently understood by scholars. The paper ends with a short conclusion, a description of the challenges encountered during this research and an alternative concept to industry evolution.
Because the term big data analytics appears frequently, it is sometimes abbreviated as BDA throughout this paper to avoid unnecessary repetition.
2 Industry dynamics and new industry emergence
2.1 Industry definition
Although the term ‘industry’ is frequently used in the media, by professionals and by scholars, there is no consensus on how to define it. Therefore, two definitions form the foundation for the following discussion.
Michael Porter (1980) defines an industry as a group of companies offering products or services that are close substitutes for each other. He was one of the earliest scholars to address industry structures and industry evolution, and his definition is used in several works on the industry lifecycle.
Gregory Theyel (2017) describes an industry as a group of firms aiming to meet the needs of a target group of customers with similar products and/or services. His book on industry emergence was published in 2017, and he derives his own definition by combining commonly used ones.
The two definitions are similar but not identical. Theyel adds the existence of a target group of customers to his definition. In addition, he does not require the products to be substitutes, which allows a wider range of products to be included without having to demonstrate a quantitative relationship between them, such as a positive cross-price elasticity.
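The cross-price elasticity mentioned above measures how the quantity demanded of one product responds to a change in the price of another: a positive value indicates substitutes, a negative value complements. The Python sketch below computes it with the arc (midpoint) formula. The function names and the example figures are hypothetical and do not come from Porter or Theyel.

```python
def arc_cross_price_elasticity(q_a_old, q_a_new, p_b_old, p_b_new):
    """Cross-price elasticity of demand for product A with respect to the
    price of product B, using the midpoint (arc) formula so that the result
    does not depend on the direction of the change."""
    pct_change_q_a = (q_a_new - q_a_old) / ((q_a_new + q_a_old) / 2)
    pct_change_p_b = (p_b_new - p_b_old) / ((p_b_new + p_b_old) / 2)
    if pct_change_p_b == 0:
        raise ValueError("the price of B must change to compute an elasticity")
    return pct_change_q_a / pct_change_p_b


def relationship(elasticity):
    """Classify two products by the sign of their cross-price elasticity."""
    if elasticity > 0:
        return "substitutes"
    if elasticity < 0:
        return "complements"
    return "independent"


if __name__ == "__main__":
    # Hypothetical figures: B's price rises from 10 to 12,
    # and demand for A rises from 100 to 120 units.
    e = arc_cross_price_elasticity(100, 120, 10, 12)
    print(round(e, 2), relationship(e))
```

Under Porter's definition, two products would belong to the same industry only if a test like this indicated close substitutes; Theyel's definition does not require such a calculation.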
2.2 The industry lifecycle
Over time, many scholars have become interested in the evolution of industries. However, the literature on this subject is highly fragmented and lacks a unifying framework. The terms industry lifecycle, product lifecycle, industry development and industry evolution are used synonymously to describe similar phenomena. The following paragraphs reference some influential works on industry development.
Gort and Klepper (1982) divide the industry lifecycle into five stages by looking at the number of producers of a product over time, but they do not name the stages. Eight years later, Klepper and Graddy (1990) used the same empirical data as the 1982 study to identify three stages in the lifecycle based on the number of firms, the output produced and the price over time. Low and Abrahamson (1997) also divide the industry lifecycle into three stages. They are the first scholars to introduce names for the three phases: emergent, growth and mature. Anita McGahan (2004) adds to this “traditional model” a decline stage in which sales drop and the profitability of firms shrinks. This short overview is not a comprehensive list of works; other influential authors include Agarwal and Gort (1996), Klepper and Simons (1997) and Utterback (1994).
Most authors do not specify how the transition between phases can be identified, which makes it difficult to pinpoint industries that are currently moving from one stage to the next. Additionally, none of these papers focuses specifically on any single stage. Forbes and Kirsch (2011) go as far as to speak of a “consequent lack of research attention”. In the course of this research, several works focusing on the process of industry emergence were identified. They are discussed in the following section.
2.3 New industry emergence
Porter (1980) defines emerging industries as “newly formed or re-formed industries that have been created by technological innovations, shifts in relative cost relationships, emergence of new consumer needs, or other economic and sociological changes that elevate a new product or service to the level of a potentially viable business opportunity”.
Gregory Theyel (2017) describes industry emergence as the birth and early growth of new industries.
According to Roderick MacDonald (2010), there are two types of literature on emerging industries: some scholars focus on a single industry, while others try to build models of the industry lifecycle and industry evolution.
Recent studies on emerging industries include the solar photovoltaic industry in Germany and China (Liu & Starik, 2018), human suborbital space transportation (Davidian, 2018) and floating offshore wind energy (Bento & Fontes, 2019). These types of studies describe the development of the chosen industry without specific regard to theories of new industry emergence.
Conceptual models of new industry emergence were developed by Agarwal and Bayus (2004), Phaal et al. (2011) and Gregory Theyel (2017). These works are used for the further analysis. Their models use similar approaches to categorizing the stages of the emergence process.
Agarwal and Bayus (2004) use changes in the number of firms and in sales to distinguish the phases of the emergence process. The work of Phaal et al. (2011) is especially interesting because their framework was developed for science- and technology-intensive sectors and therefore focuses only on emergence driven by new scientific and technological discoveries. The authors also place particular emphasis on the transitions between the identified emergence phases. Gregory Theyel (2017) emphasizes strategic synchronization between different elements as a key condition for industries to emerge successfully.
2.3.1 Pre-emergence stage
The first stage that scholars identify in the emergence process is a pre-industry phase. Phaal et al. (2011) divide this stage into two sub-phases, the precursor phase and the embryonic phase. The precursor phase begins with a scientific discovery or breakthrough that serves as the starting point for further research and ends with the first translation of this discovery into a new technology. The embryonic phase covers the development of this new technology up to the point where it can be used for commercial purposes.
Agarwal and Bayus (2004) term the first stage pre-firm take-off and define it as the time between the scientific discovery and an increase in the number of firms. In this stage, there are only a few companies, which they call “creators”.
Gregory Theyel (2017) characterizes the concept phase by the number of firms, the state of technology, the investment used and the target market. In this phase, few firms are in the market, and they focus on innovating and securing property rights to their innovations. The technology is still at an early stage, with basic prototypes being developed. Outside investment is limited; entrepreneurs often finance themselves with personal funds. There is no dominant business model, and a wide variety of products is available. Finally, the market is not yet developed, and only customers termed “visionaries” are interested in the products produced at this stage.
2.3.2 Emergence stage
According to Forbes and Kirsch (2011), the emergent stage is the point at which the new industry begins to exist formally. It is the stage of the industry lifecycle in which many new firms enter the newly created industry. Phaal et al. (2011) describe the nurture phase as an incremental, step-by-step process in which the new technology improves in performance and price. It ends when the new technology can be introduced to the mass market. Basic science loses importance in this stage, while the focus shifts to applying existing knowledge. Gregory Theyel (2017) identifies competition among new entrants over standard setting as the most important factor in the validation stage. The viability of the new technology has been proven, and external investors begin to take an interest. Firms distribute pilot products, and “early adopters” begin to adopt the new technology. Business models converge toward a proven model, and companies try to establish themselves as “standard setters”.
2.3.3 Growth stage
The growth stage is characterized by rapid industrial growth accompanied by increasing marketing, business development and commercial development activities (Phaal et al., 2011). Agarwal and Bayus (2004) similarly mark the transition into the post-sales take-off stage by a sharp increase in sales. Gregory Theyel (2017) identifies the shift from product innovation to process innovation as an important step in the diffusion phase. The technology is mature and ready for deployment. Initial public offerings (IPOs), acquisitions and corporate investment are common types of investment. The mass market becomes interested as the technology or product becomes more “fashionable”.
3 Big data analytics
Big data analytics (BDA) can be defined as the use of advanced analytic techniques on Big data sets (Russom, 2011). The research company Gartner defines advanced analytics as the autonomous or semi-autonomous examination of data or content using sophisticated techniques and tools (Gartner, 2018). Often, those methods are used to examine complex relationships between different variables.
The use of analytics in a Big data environment poses new challenges for existing methods because of the properties of Big data sets, which are explained in the following sections.
3.1 Big data
Before BDA can be discussed, a definition of Big data needs to be established. Most scholars characterize Big data by the “3 V’s”: Volume, Variety and Velocity (e.g. Shah, Rabhi, & Ray, 2015). In fact, over 59 papers explicitly mention these three terms to define Big data (Sivarajah, Kamal, Irani & Weerakkody, 2017). Other scholars extend this to the “6 V’s” (e.g. Chen & Zhang, 2014), which additionally include Veracity, Variability and Value.
3.1.1 Volume
Volume is the most commonly understood property of Big data sets. The amount of data created worldwide keeps growing rapidly and is expected to reach 160 zettabytes in 2025 (statista.de, 2017). One zettabyte equals one billion terabytes. Currently, the amount of data worldwide is approximately 30 zettabytes.
The invention of the internet and the growing use of personal computers and mobile devices are the main reasons for this increase. Another reason is the growing number of Internet of Things (IoT) devices: many objects that were originally “unsmart”, for example regular home appliances, are being equipped with sensors. The number of devices connected to the IoT is predicted to triple from about seven billion today to twenty-one billion in 2020 (statista.de, 2017).
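To relate the volume figures, the Python sketch below converts zettabytes to terabytes using decimal (SI) prefixes and computes the compound annual growth rate implied by growing from about 30 to 160 zettabytes. It assumes that “currently” refers to 2018, which the text does not state explicitly; the function and constant names are hypothetical.

```python
import math

# Decimal (SI) prefixes: 1 TB = 10**12 bytes, 1 ZB = 10**21 bytes.
BYTES_PER_TERABYTE = 10**12
BYTES_PER_ZETTABYTE = 10**21

# One zettabyte in terabytes: 10**21 / 10**12 = 10**9, i.e. one billion.
TB_PER_ZB = BYTES_PER_ZETTABYTE // BYTES_PER_TERABYTE


def implied_annual_growth(start, end, years):
    """Constant yearly growth rate needed to go from `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


def doubling_time(rate):
    """Years needed for a quantity to double at a constant annual growth `rate`."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    # Assumption: the base year is 2018, so 30 ZB grows to 160 ZB over 7 years.
    rate = implied_annual_growth(30, 160, 2025 - 2018)
    print(f"1 ZB = {TB_PER_ZB:,} TB")
    print(f"implied growth: {rate:.1%} per year, "
          f"doubling every {doubling_time(rate):.1f} years")
```

Under the 2018 assumption, the implied growth is roughly 27 percent per year; choosing a different base year changes the result.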
3.1.2 Variety
A second property that is unique to Big data is the variety of data types. Data can be classified into three broad categories.
Structured data is all data that can be represented in a data model. This means that data types and data relationships must be clear and easily definable. The most common sources of structured data are relational databases. A relational database is a set of tables with predefined relationships between them. Enterprises commonly use relational databases to manage transactional data (such as orders and payments), supplier information and customer information. It is estimated that only about five percent of enterprise data is structured, with the rest being semi-structured or unstructured. Structured data can be easily stored, entered, changed, deleted and analyzed, most commonly by using Structured Query Language (SQL) to query a database management system (DBMS).
Semi-structured data is essentially a form of structured data that does not conform to the formal structure of the data models associated with relational databases or other forms of data tables. There is no clear consensus on what exactly this data type covers. Typically, it is described as data that contains unstructured parts but still follows a certain structure through the use of tags. Examples of semi-structured data are Extensible Markup Language (XML) documents and IoT sensor logs. Many IT professionals do not use this classification when describing Big data.
Unstructured data includes all data that does not fit into the two categories above. Examples include videos, social media posts, pictures, audio data, clickstreams and unstructured text such as blogs and websites. Big data methods often use mechanisms to transform unstructured data into structured data. Unstructured data gives analysts more flexibility in using the collected data, because it does not have to be transformed and standardized into one specific format in advance; instead, it can be transformed in whatever way a particular analysis requires. The amount of unstructured data is increasing at a much higher rate than the amount of structured data.
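As an illustration of structured data, the following Python sketch uses the standard-library sqlite3 module to create two tables with a predefined relationship, insert a few rows and answer a question with a single SQL query. The table names, columns and values are hypothetical, and SQLite merely stands in for any relational DBMS.

```python
import sqlite3

# In-memory database; the schema and data below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the predefined relationship

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")

conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)])

# Because the schema is fixed in advance, one SQL query can combine both
# tables: number of orders and total order value per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.order_id), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for name, n_orders, total in rows:
    print(name, n_orders, total)
```

The foreign key on `orders.customer_id` is what makes the relationship between the tables explicit; with the pragma enabled, SQLite rejects orders that reference a non-existent customer.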
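The following sketch shows how tags give semi-structured data enough structure to be processed automatically even though its records do not share a fixed schema. It parses a small, hypothetical XML sensor log with Python's standard xml.etree.ElementTree module and flattens each reading into a dictionary; the element and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical IoT sensor log: tags provide structure,
# but individual readings contain different fields.
LOG = """
<log device="thermostat-01">
  <reading time="2019-01-01T10:00:00"><temperature unit="C">21.5</temperature></reading>
  <reading time="2019-01-01T10:05:00"><temperature unit="C">21.7</temperature><humidity>40</humidity></reading>
  <reading time="2019-01-01T10:10:00"><note>sensor restarted</note></reading>
</log>
"""


def to_rows(xml_text):
    """Flatten each <reading> element into a dict; absent fields stay absent."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for reading in root.iter("reading"):
        row = {"device": root.get("device"), "time": reading.get("time")}
        for child in reading:
            row[child.tag] = child.text  # tag name becomes the field name
        rows.append(row)
    return rows


for row in to_rows(LOG):
    print(row)
```

Unlike the rows of a relational table, the resulting dictionaries have different keys, which is typical of semi-structured data.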
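One simple way to transform unstructured text into structured data is rule-based extraction. The Python sketch below, assuming a few hypothetical social media posts, uses regular expressions to pull hashtags, mentions and a word count into fixed fields, which can then be aggregated like structured data. It is only one basic technique chosen for illustration; Big data applications often use more sophisticated methods such as natural language processing.

```python
import re
from collections import Counter

# Hypothetical social media posts: free text without a predefined schema.
POSTS = [
    "Loving the new #BigData course by @uni_lab! #analytics",
    "Traffic is terrible today... #commute",
    "@uni_lab when is the next #analytics meetup?",
]

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def structure(post):
    """Extract a fixed set of fields from one free-text post."""
    return {
        "hashtags": [tag.lower() for tag in HASHTAG.findall(post)],
        "mentions": MENTION.findall(post),
        "word_count": len(post.split()),
    }


# Once every post has the same fields, it can be aggregated like a table.
records = [structure(p) for p in POSTS]
tag_counts = Counter(tag for r in records for tag in r["hashtags"])

print(records)
print(tag_counts.most_common(2))
```

The choice of fields here is arbitrary: the same raw posts could be transformed into a different structure, such as sentiment scores, if a particular analysis required it.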
[...]