
Societal Impact Factors and Major Challenges for Natural Language Processing


Academic Paper, 2019, 14 Pages, Grade: 1.3

Author: Szahel Kumke

English Studies - Culture and Area Studies
Abstract

What is math about? Put simply, it is about uniting separate parts and creating something new. To read and understand the new creation, one has to understand the meaning of each part in order to derive the meaning of the whole. How does this differ from language? To understand a language properly, it is important to split it into smaller units, and to understand the meaning of each unit, it is necessary and helpful to understand the language as a whole. Expressed as a mathematical formula: 1 + 1 = language.

At first this may seem confusing, but on closer inspection it is logical. Language does consist of smaller units which, once brought together, build up a new system: a whole language. To understand a language, it is therefore indispensable to understand logical connections, but it is not necessary to be a math genius. Today's most striking field of linguistics, Natural Language Processing (NLP), combines the ability to think logically with the ability to analyse language comprehensively.

The aim of this paper is to give a brief introduction to Natural Language Processing (NLP) and, further, to define several challenges NLP faces due to bias in online data. These challenges, which concern the field of technology as well as its social impact, form the framework for the overarching field of ethical challenges in online data, which this paper presents. Not only existing challenges but also possible future solutions will be discussed.

Reading Sample


Table of Contents

Introduction

Part One - What is NLP -

NLP tasks before 1960

NLP tasks after 1960

Part Two - The societal impact factors of NLP -

Avoiding exclusion

Avoiding overgeneralisation

Avoiding overexposure

Underexposure and its negative impact on balanced data

Part Three - Major challenges for online research ethics -

Conclusion

Research Objectives & Key Themes

This paper aims to provide a comprehensive introduction to Natural Language Processing (NLP) while critically examining the ethical challenges arising from online data bias and the societal impact of these technologies.

  • The historical development and technical foundations of NLP.
  • Social impact factors including exclusion, overgeneralisation, and overexposure.
  • The relationship between data bias and ethical research practices.
  • Major challenges for online research ethics within the context of social computing.
  • Future-oriented solutions for maintaining ethical standards in data-driven research.

Excerpt from the Book

Part Two - The societal impact factors of NLP -

Every data set carries demographic information about its users, and with it the so-called demographic bias. For the following discussion, the term 'overfitting' needs to be explained more precisely: "Overfitting refers to a model that models the training data too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data."10 Applied to demographic bias, this corresponds to the assumption that human nature is so universal that findings on one group can be applied to another group as well. The demographic bias carried in the data presupposes that the demographic profile it reflects is universal and can be taken for granted. Typically, this profile rests on attributes of Western countries: the research participants are Western, educated, industrialised, rich, and democratic (WEIRD11). For NLP, this means that a model implicitly assumes all language to be identical to the language it was trained on. This can have serious consequences, such as exclusion or demographic misinterpretation. The distorted data lead to wrong assumptions, which is an ethical problem in itself. If the universality and objectivity of scientific knowledge12 are threatened in this way, society's increasing demand for validated methods as a basis for important decisions cannot be met.
Imagine an NLP application trained solely on standard English as spoken mainly by white Americans. What if, for instance, a citizen of Latino descent wants to use the application, but it fails to meet his linguistic needs? The hidden bias creates demographic differences instead of eliminating them. The supposedly user-friendly technology would then determine, through its algorithm, which users are able to benefit from it. This is unfair; a truly user-friendly technology therefore needs to be designed with all types of demographic differences in mind.
As shown above, a lack of awareness can lead to the exclusion of people. It should be countered by raising awareness of this mechanism in NLP research and development. Concretely, the over-represented group in the training data needs to be downsampled, and further counter-measures against demographic bias should be applied.
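Downsampling the over-represented group, as recommended in the paragraph above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's method: it assumes every training record already carries a demographic group label (the field name `dialect` and the labels `SAE` and `other` are hypothetical), and it simply reduces each group at random to the size of the smallest one. The helper names `downsample_by_group` and `group_shares` are illustrative and not part of any library.

```python
import random
from collections import Counter, defaultdict


def downsample_by_group(records, group_key, seed=0):
    """Randomly reduce every group to the size of the smallest group.

    records   -- list of dicts, one per training example
    group_key -- name of the (hypothetical) field holding the group label
    seed      -- fixed seed so the same subset is drawn on every run
    """
    if not records:
        return []
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec[group_key]].append(rec)
    # The smallest group sets the target size for all groups.
    target = min(len(items) for items in by_group.values())
    rng = random.Random(seed)
    balanced = []
    for group in sorted(by_group):  # sorted for a deterministic order
        # sample() draws without replacement, so no record is duplicated.
        balanced.extend(rng.sample(by_group[group], target))
    return balanced


def group_shares(records, group_key):
    """Return each group's share of the records, e.g. {'A': 0.75, 'B': 0.25}."""
    counts = Counter(rec[group_key] for rec in records)
    total = sum(counts.values())
    return {group: counts[group] / total for group in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical toy corpus: 900 "SAE" examples, 100 "other" examples.
    corpus = (
        [{"text": f"sae_{i}", "dialect": "SAE"} for i in range(900)]
        + [{"text": f"other_{i}", "dialect": "other"} for i in range(100)]
    )
    print(group_shares(corpus, "dialect"))    # {'SAE': 0.9, 'other': 0.1}
    balanced = downsample_by_group(corpus, "dialect")
    print(group_shares(balanced, "dialect"))  # {'SAE': 0.5, 'other': 0.5}
    print(len(balanced))                      # 200
```

Equal group sizes come at the cost of discarding most of the majority data. Reweighting examples or collecting more data from under-represented groups are common alternatives, and balanced counts alone do not guarantee that a model serves every group equally well.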

Summary of Chapters

Introduction: This chapter defines the basic concepts of NLP through a relatable student narrative and establishes the research scope regarding ethical challenges in online data.

Part One - What is NLP -: This section provides a historical overview of NLP, tracing its evolution from early computational linguistics to modern machine-learning approaches.

Part Two - The societal impact factors of NLP -: This section analyzes how demographic bias, overfitting, and research design influence the fairness and societal outcomes of NLP applications.

Part Three - Major challenges for online research ethics -: This section explores ethical frameworks, such as the Belmont Report, and applies them to the complexities of online social computing research.

Conclusion: This final chapter synthesizes the findings, calling for a re-evaluation of ethical guidelines and increased transparency in future NLP research.

Keywords

Natural Language Processing, NLP, Societal Impact, Online Research Ethics, Data Bias, Demographic Bias, Overfitting, Belmont Report, Machine Learning, Computational Linguistics, Artificial Intelligence, Digital Trace Data, Transparency, Research Fairness, Sentiment Analysis.

Frequently Asked Questions

What is the primary focus of this work?

The paper examines the intersection of Natural Language Processing technology and the ethical implications arising from its use, particularly concerning online data bias and societal impact.

Which thematic areas does the author cover?

The work covers the technical evolution of NLP, the societal risks of algorithmic bias, and the urgent need for updated ethical codes in online research.

What is the core research goal?

The goal is to define the challenges NLP faces regarding data bias and to propose an ethical framework that balances technological utility with the protection of human subjects.

What scientific methods are applied in the study?

The study utilizes a review of established literature, analysis of linguistic theory, and an examination of ethical guidelines like the Belmont Report to address modern digital research challenges.

What topics are discussed in the main body?

The main body details the history of NLP tasks, specific societal impact factors like exclusion and overgeneralization, and the complexities of enforcing ethical transparency in online data collections.

Which keywords best characterize this research?

Key terms include Natural Language Processing, data bias, societal impact, online research ethics, and transparency in algorithmic development.

How does the author define the relationship between NLP and the Belmont Report?

The author argues that while the Belmont Report provides foundational ethical principles, these must be re-elaborated to account for the unique, large-scale nature of modern online data and social media platforms.

Why is the concept of "WEIRD" participants significant in this paper?

The author uses the "WEIRD" (Western, Educated, Industrialized, Rich, Democratic) acronym to illustrate how demographic bias in training data leads to models that do not accurately represent or serve global, diverse populations.

End of the reading sample (14 pages)

Details

Title
Societal Impact Factors and Major Challenges for Natural Language Processing
University
Universität Kassel
Grade
1.3
Author
Szahel Kumke
Year of Publication
2019
Pages
14
Catalog Number
V1112090
ISBN (eBook)
9783346485274
ISBN (Book)
9783346485281
Language
English
Keywords
societal impact factors major challenges natural language processing
Product Safety
GRIN Publishing GmbH
Cite This Work
Szahel Kumke (Author), 2019, Societal Impact Factors and Major Challenges for Natural Language Processing, München, GRIN Verlag, https://www.grin.com/document/1112090