Word prediction and word probability exemplified in searches over a pharmaceutical database

Bachelor's Thesis, 2009, 63 Pages, Grade: 1.0

Author: B.A. Marc Bohnes

English Philology - Other

This Bachelor of Arts thesis contributes to the CREAM project, a collaboration between Novartis Pharma AG and Bielefeld University. It examines n-gram modeling, a method that provides information about how frequently words are used. This information is needed to improve a database that the CREAM project works on; the improvement concerns the calculation of probabilities for search queries sent to the database. The thesis consists of five chapters. The first chapter introduces the CREAM project and the database. The second chapter reviews the current state of n-gram modeling and where it appears in contemporary literature. The third chapter deals extensively with how corpora must be prepared for analysis and how n-gram models are computed from the frequency distribution of words. Chapter four introduces a program that uses a corpus to obtain certain n-grams. Finally, chapter five evaluates the information retrieved by the program(s) and outlines future work.
Due to copyright-protected material, the appendix is not part of the thesis.
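
The probability calculation mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's own code: it assumes a bigram model estimated by relative frequency (maximum likelihood), and the function names `bigram_model` and `predict_next` as well as the toy corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def bigram_model(tokens):
    """Estimate P(next | previous) by relative frequency (assumed MLE)."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev][nxt] += 1
    model = {}
    for prev, counts in followers.items():
        total = sum(counts.values())
        model[prev] = {word: count / total for word, count in counts.items()}
    return model


def predict_next(model, word, k=3):
    """Return up to k continuations of `word`, most probable first."""
    candidates = model.get(word, {})
    # Sort by descending probability, then alphabetically for stable output.
    return sorted(candidates.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy corpus invented for this example; it is not taken from the database.
toy_corpus = (
    "side effects of the drug include mild side effects and rare side reactions"
).split()

model = bigram_model(toy_corpus)
print(predict_next(model, "side"))
```

In this toy corpus "side" is followed by "effects" twice and by "reactions" once, so the sketch ranks "effects" first. A real query log would also need smoothing for unseen word pairs, which is left out here.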

Excerpt


Table of Contents

  • Introduction to CREAM
    • The Corpus Research for Exploitation of Annotated Metadata project (CREAM)
      • The main topic
      • The Corpus eNova Database
        • eNova Application and Sample Walk Through
      • The main goal
  • State of the Art
    • Contributions to Word Prediction and Word Probability
      • Current Contributions to n-Gram Analysis
    • Applied n-Gram Analysis
      • National Security
      • Spelling Correction
        • Other Areas Related to Spelling Correction
  • n-Grams - Word Count and n-Gram Modeling
    • Introduction to Word Count in Corpora
    • Tokenization - Word Segmentation in Corpora
      • Word Types vs. Word Tokens
      • Stemming and Lemmatization
      • Non-word Characters
    • Parameters of Tokenization
      • Compounding and Words Separated by Whitespace
      • Hyphens
      • Case-(In-)Sensitivity

Objectives and Key Themes

This Bachelor of Arts thesis aims to contribute to the CREAM project, a collaborative effort between Novartis Pharma AG and Bielefeld University, by exploring the use of n-gram modeling for improving a database used within the project. The thesis examines the potential of n-gram modeling to calculate probabilities in search queries submitted to the database, ultimately enhancing search efficiency and effectiveness. Key themes explored in the thesis include:
  • N-gram modeling and its application in word prediction and probability calculation
  • The role of corpora in n-gram analysis, particularly in preparing and analyzing data
  • Practical applications of n-gram modeling in areas such as national security and spelling correction
  • The methodology of tokenization and its importance in processing corpora for n-gram analysis
  • The development and implementation of computer code to extract n-grams from corpora
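
The last theme, extracting n-grams from a corpus, can be sketched as follows. This is an assumption-laden illustration rather than the program described in the thesis: the helper names `ngrams` and `ngram_counts` are hypothetical, and the token list is an invented example that has already been split on whitespace.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous window of n tokens as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for start in range(len(tokens) - n + 1):
        yield tuple(tokens[start:start + n])


def ngram_counts(tokens, n):
    """Count how often each n-gram occurs in the token sequence."""
    return Counter(ngrams(tokens, n))


# Invented example sentence, pre-tokenized on whitespace.
tokens = "the patient took the drug and the patient recovered".split()

bigram_counts = ngram_counts(tokens, 2)
trigram_counts = ngram_counts(tokens, 3)
print(bigram_counts[("the", "patient")])
```

A sequence of 9 tokens yields 8 bigrams and 7 trigrams; sequences shorter than n yield none.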

Chapter Summaries

Chapter 1 introduces the CREAM project and its primary goals. It focuses on the Corpus eNova database, describing the database and its application within the project. This chapter also highlights the importance of calculating probabilities in search queries submitted to the database.

Chapter 2 reviews the current state of the art in n-gram modeling and its contributions to word prediction and probability calculation. It examines existing research on n-gram analysis and discusses its practical applications in areas such as national security and spelling correction.

Chapter 3 explores the process of preparing corpora for n-gram analysis. It covers aspects of tokenization, including word segmentation, word types vs. word tokens, stemming, lemmatization, and the handling of non-word characters. The chapter also discusses the parameters involved in tokenization, such as compounding, hyphens, and case sensitivity.

Chapter 4 introduces a program designed to extract n-grams from corpora. This chapter explains the functionality and usage of the program and demonstrates how it retrieves n-grams from a given corpus.
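
The tokenization parameters named for Chapter 3 can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names `tokenize` and `type_token_counts`, the two flags, and the sample sentence are hypothetical and do not come from the thesis. Stemming and lemmatization are not covered.

```python
import re


def tokenize(text, lowercase=True, split_hyphens=False):
    """Split text into word tokens, dropping non-word characters.

    lowercase:     fold case so that "Drug" and "drug" become one type.
    split_hyphens: treat "anti-inflammatory" as two tokens instead of one.
    """
    if lowercase:
        text = text.lower()
    if split_hyphens:
        pattern = r"[^\W_]+"                 # runs of letters and digits
    else:
        pattern = r"[^\W_]+(?:-[^\W_]+)*"    # allow internal hyphens
    return re.findall(pattern, text)


def type_token_counts(tokens):
    """Return (number of tokens, number of distinct word types)."""
    return len(tokens), len(set(tokens))


# Invented sample sentence for illustration.
sample = "Anti-inflammatory drugs: the Drug, the dose."

default_tokens = tokenize(sample)
strict_tokens = tokenize(sample, lowercase=False, split_hyphens=True)
print(default_tokens, type_token_counts(default_tokens))
print(strict_tokens, type_token_counts(strict_tokens))
```

With the default settings the sample yields 6 tokens and 5 types; keeping case and splitting hyphens yields 7 tokens and 6 types. Choices like these change every n-gram count computed afterwards.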

Keywords

The primary keywords and focus topics of the thesis include n-gram modeling, word prediction, word probability, corpora, tokenization, spelling correction, national security, and the CREAM project. This thesis examines the potential of n-gram analysis in improving search efficiency and effectiveness within the CREAM project's database. The research focuses on applying n-gram modeling techniques to enhance user experience and optimize search results.

Details

Title: Word prediction and word probability exemplified in searches over a pharmaceutical database
University: Bielefeld University
Grade: 1.0
Author: B.A. Marc Bohnes
Publication year: 2009
Pages: 63
Catalog number: V199178
ISBN (eBook): 9783656256755
ISBN (book): 9783656258216
Language: English
Tags: Linguistics, English, n-grams, NLP, computational linguistics, programming
Product safety: GRIN Publishing Ltd.
Cite this work: B.A. Marc Bohnes (Author), 2009, Word prediction and word probability exemplified in searches over a pharmaceutical database, Munich, GRIN Verlag, https://www.grin.com/document/199178