Word prediction and word probability exemplified in searches over a pharmaceutical database

Bachelor Thesis, 2009, 63 Pages, Grade: 1,0

Author: B.A. Marc Bohnes

English Language and Literature Studies - Other

This Bachelor of Arts thesis contributes to the CREAM project, a collaboration between Novartis Pharma AG and Bielefeld University. It examines n-gram modeling, a method that provides information about how frequently words are used. This information is needed to improve a database the CREAM project works on; specifically, the improvement concerns the calculation of probabilities for search queries sent to the database. The thesis consists of five chapters. The first chapter introduces the CREAM project and the database. The second chapter surveys the current state of n-gram modeling and its treatment in contemporary literature. The third chapter deals extensively with how corpora have to be prepared for analysis and how n-gram models can be computed from the frequency distribution of words. Chapter four introduces computer code that uses a corpus to obtain certain n-grams. Finally, chapter five evaluates the information retrieved by the code and outlines future work.
Due to copyright-protected material, the appendix is not part of the thesis.

Excerpt


Table of Contents

  • Introduction to CREAM
    • The Corpus Research for Exploitation of Annotated Metadata project (CREAM)
      • The main topic
      • The Corpus eNova Database
        • eNova Application and Sample Walk Through
      • The main goal
  • State of the Art
    • Contributions to Word Prediction and Word Probability
      • Current Contributions to n-Gram Analysis
    • Applied n-Gram Analysis
      • National Security
      • Spelling Correction
        • Other Areas Related to Spelling Correction
  • n-Grams - Word Count and n-Gram Modeling
    • Introduction to Word Count in Corpora
    • Tokenization - Word Segmentation in Corpora
      • Word Types vs. Word Tokens
      • Stemming and Lemmatization
      • Non-word Characters
    • Parameters of Tokenization
      • Compounding and Words Separated by Whitespace
      • Hyphens
      • Case-(In-)Sensitivity

Objectives and Key Themes

This Bachelor of Arts thesis aims to contribute to the CREAM project, a collaborative effort between Novartis Pharma AG and Bielefeld University, by exploring the use of n-gram modeling for improving a database used within the project. The thesis examines the potential of n-gram modeling to calculate probabilities in search queries submitted to the database, ultimately enhancing search efficiency and effectiveness. Key themes explored in the thesis include:
  • N-gram modeling and its application in word prediction and probability calculation
  • The role of corpora in n-gram analysis, particularly in preparing and analyzing data
  • Practical applications of n-gram modeling in areas such as national security and spelling correction
  • The methodology of tokenization and its importance in processing corpora for n-gram analysis
  • The development and implementation of computer code to extract n-grams from corpora
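The idea behind these themes, estimating how probable a word is given the word before it, can be sketched in a few lines of Python. This is only an illustration and not the code from the thesis: the function names, the toy corpus, and the choice of a plain maximum-likelihood bigram estimate (no smoothing) are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, first, second):
    """Maximum-likelihood estimate of P(second | first).

    Computed as count(first, second) / count(first), where count(first)
    only considers positions that are followed by another token.
    Unseen contexts get probability 0.0 (no smoothing is applied).
    """
    bigram_counts = Counter(ngrams(tokens, 2))
    context_counts = Counter(tokens[:-1])
    if context_counts[first] == 0:
        return 0.0
    return bigram_counts[(first, second)] / context_counts[first]


def predict_next(tokens, first):
    """Return the word that most often follows `first`, or None if unseen."""
    followers = Counter(
        second for (head, second) in ngrams(tokens, 2) if head == first
    )
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, invented for this sketch.
corpus = "the drug reduces pain the drug reduces fever the dose reduces pain".split()

print(round(bigram_probability(corpus, "the", "drug"), 3))  # 0.667
print(predict_next(corpus, "reduces"))                       # pain
```

On real data an unsmoothed estimate assigns zero probability to every word pair that does not occur in the corpus, so a practical system would normally add some form of smoothing.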

Chapter Summaries

Chapter 1 introduces the CREAM project and its primary goals. It focuses on the Corpus eNova database, describing the database and its application within the project. This chapter also highlights the importance of calculating probabilities in search queries submitted to the database.
Chapter 2 reviews the current state of the art in n-gram modeling and its contributions to word prediction and probability calculation. It examines existing research on n-gram analysis and discusses its practical applications in areas such as national security and spelling correction.
Chapter 3 explores the process of preparing corpora for n-gram analysis. It covers aspects of tokenization, including word segmentation, word types vs. word tokens, stemming, lemmatization, and the handling of non-word characters. The chapter also discusses the parameters involved in tokenization, such as compounding, hyphens, and case sensitivity.
Chapter 4 introduces computer code designed to extract n-grams from corpora. This chapter explains the functionality and usage of the code and demonstrates how it retrieves n-grams from a given corpus.
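To make the tokenization parameters named for Chapter 3 more concrete, the following minimal Python sketch shows how case sensitivity and hyphen handling change the number of word types and word tokens. It is not taken from the thesis: the function names, the flags `lowercase` and `split_hyphens`, and the sample sentence are hypothetical, and the pattern only recognizes ASCII letters and digits.

```python
import re


def tokenize(text, lowercase=True, split_hyphens=False):
    """Split text into word tokens, dropping non-word characters.

    lowercase:     fold case so that 'The' and 'the' count as one type.
    split_hyphens: treat 'anti-inflammatory' as two tokens instead of one.
    Only ASCII letters and digits are recognized (a simplification).
    """
    if lowercase:
        text = text.lower()
    if split_hyphens:
        pattern = r"[A-Za-z0-9]+"
    else:
        # Keep internal hyphens: one or more word parts joined by '-'.
        pattern = r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*"
    return re.findall(pattern, text)


def type_token_counts(tokens):
    """Return (number of distinct word types, number of word tokens)."""
    return len(set(tokens)), len(tokens)


# Hypothetical sample sentence, invented for this sketch.
sample = "The anti-inflammatory drug works; the drug helps."

print(type_token_counts(tokenize(sample)))                      # (5, 7)
print(type_token_counts(tokenize(sample, lowercase=False)))     # (6, 7)
print(type_token_counts(tokenize(sample, split_hyphens=True)))  # (6, 8)
```

Stemming, lemmatization, and compounds separated by whitespace are left out here; each would need its own rule or an external resource.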

Keywords

The primary keywords and focus topics of the thesis include n-gram modeling, word prediction, word probability, corpora, tokenization, spelling correction, national security, and the CREAM project. This thesis examines the potential of n-gram analysis in improving search efficiency and effectiveness within the CREAM project's database. The research focuses on applying n-gram modeling techniques to enhance user experience and optimize search results.

Details

Title: Word prediction and word probability exemplified in searches over a pharmaceutical database
College: Bielefeld University
Grade: 1,0
Author: B.A. Marc Bohnes
Publication Year: 2009
Pages: 63
Catalog Number: V199178
ISBN (eBook): 9783656256755
ISBN (Book): 9783656258216
Language: English
Tags: linguistics, English, n-grams, NLP, computational linguistics, programming
Product Safety: GRIN Publishing GmbH
Quote paper: B.A. Marc Bohnes (Author), 2009, Word prediction and word probability exemplified in searches over a pharmaceutical database, Munich, GRIN Verlag, https://www.grin.com/document/199178