
Effective Data Mining Techniques for Unstructured Data in Big Data

Master's Thesis, 2017, 40 Pages, Grade: 10

Authors: Dnyandeo Khemnar (Author), Nilesh Thorat (Author)

Computer Science - Applied

In this paper I collect healthcare data comprising patient details such as symptoms and diseases. The collected data is then pre-processed, because the analysis requires only filtered data. The filtered data is stored in Hadoop, where a user can retrieve records by symptom, disease, and similar fields.
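
The pipeline described above (collect, pre-process, store, retrieve by symptom or disease) can be illustrated with a minimal sketch. It is not the thesis implementation: the record fields, the filtering rule, and the in-memory dictionaries standing in for Hadoop storage are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical raw records; the field names are illustrative, not from the thesis.
RAW_RECORDS = [
    {"patient_id": "p1", "symptoms": "Fever, Cough ", "disease": "Influenza"},
    {"patient_id": "p2", "symptoms": "cough", "disease": "bronchitis"},
    {"patient_id": "p3", "symptoms": "", "disease": ""},  # incomplete, filtered out
    {"patient_id": "p4", "symptoms": "Fever; Rash", "disease": "Measles"},
]


def preprocess(record):
    """Normalize one record; return None if it lacks symptoms or a disease."""
    raw = record.get("symptoms", "").replace(";", ",")
    symptoms = sorted({s.strip().lower() for s in raw.split(",") if s.strip()})
    disease = record.get("disease", "").strip().lower()
    if not symptoms or not disease:
        return None
    return {
        "patient_id": record["patient_id"],
        "symptoms": symptoms,
        "disease": disease,
    }


def build_index(records):
    """Build inverted indexes mapping symptom and disease to patient ids.

    The dictionaries are an in-memory stand-in for the Hadoop store (assumption).
    """
    by_symptom, by_disease = defaultdict(set), defaultdict(set)
    for record in records:
        clean = preprocess(record)
        if clean is None:
            continue
        by_disease[clean["disease"]].add(clean["patient_id"])
        for symptom in clean["symptoms"]:
            by_symptom[symptom].add(clean["patient_id"])
    return by_symptom, by_disease


by_symptom, by_disease = build_index(RAW_RECORDS)
print(sorted(by_symptom["fever"]))
print(sorted(by_disease["bronchitis"]))
```

Filtering before indexing keeps incomplete records out of every later query, which is the reason the abstract gives for pre-processing.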

Big Data is a collection of large and complex data sets comprising structured, semi-structured, and unstructured data. It is generated by various sources across different fields, and in today's era it is being generated in huge amounts as the whole world moves towards digitalization: social media sites, digital pictures and videos, and many other sources all contribute. Data of this kind is known as big data. Data mining is a useful technique for extracting patterns from large-scale data sets. By processing big data with data mining, useful and meaningful information can be extracted from it.

Excerpt


Table of Contents

  • INTRODUCTION
    • IDEA AND MOTIVATION
    • LITERATURE SURVEY
  • PROBLEM DEFINITION AND SCOPE
    • SCOPE
    • SOFTWARE CONTEXT
    • SOFTWARE CONSTRAINTS
    • OUTCOMES
    • HARDWARE SPECIFICATION
    • S/W SPECIFICATION
    • AREA OF DISSERTATION
  • DISSERTATION PLAN
    • PROJECT PLAN
    • TIMELINE OF PROJECT
    • FEASIBILITY STUDY
      • Economical Feasibility
      • Technical Feasibility
      • Operational feasibility
      • Time Feasibility
    • RISK MANAGEMENT
      • Project Risk
      • Risk Assessment
    • EFFORT AND COST ESTIMATION
      • Lines of code (LOC)
      • Effort
      • Development Time
      • Number of People
  • SOFTWARE REQUIREMENT SPECIFICATION
    • INTRODUCTION
      • Purpose
      • Scope of Document
      • Overview of responsibilities of developer
    • PRODUCT OVERVIEW
      • Block diagram
    • FUNCTIONAL MODEL
      • Flow diagram
      • Data Flow Diagram
      • UML Diagrams
        • Sequence diagram
        • Class diagram
      • Non-Functional Requirements
    • BEHAVIORAL MODEL AND DESCRIPTION
      • Description of software behavior
      • Use case diagram
  • DETAILED DESIGN
    • ARCHITECTURE DESIGN
      • Algorithms
    • INTERFACES
      • Human Interface
      • Database interface
  • TESTING
    • INTRODUCTION
      • Goals and Objective
    • TESTING STRATEGY
      • White Box Testing
      • Black Box Testing
      • System testing
      • Performance testing
  • DATA TABLE AND DISCUSSION
    • INPUT TO THE SYSTEM
    • OUTPUT
    • PERFORMANCE OF PROPOSED SYSTEM
      • Performance of proposed system with respect to baseline algorithm
      • Performance of proposed system with respect to blowfish encryption algorithm
    • RESULT
      • Difference between proposed algorithm and base algorithm, i.e., provider-aware algorithm
  • SUMMARY AND CONCLUSION
    • FUTURE ENHANCEMENT
  • REFERENCES

Objectives and Key Themes

The dissertation aims to develop an effective data mining technique for both structured and unstructured big data, focusing on privacy preservation during data sharing from distributed databases. The work explores the challenges of anonymizing data while maintaining privacy and examines existing techniques to address this issue.

  • Privacy-preserving data analysis and publishing
  • Data anonymization techniques
  • Collaborative data publishing
  • Trusted third-party (TTP) role in data sharing
  • Insider attacks and their mitigation
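
As a rough illustration of what data anonymization can look like, the sketch below generalizes quasi-identifiers and checks k-anonymity. This is a standard technique used here as a stand-in: the excerpt does not state which anonymization algorithm the dissertation develops, and the rows, field names, and generalization rules are hypothetical.

```python
from collections import Counter

# Hypothetical patient rows; age and zip are treated as quasi-identifiers.
ROWS = [
    {"age": 23, "zip": "41101", "disease": "flu"},
    {"age": 27, "zip": "41105", "disease": "asthma"},
    {"age": 34, "zip": "41207", "disease": "flu"},
    {"age": 38, "zip": "41209", "disease": "diabetes"},
]


def generalize(row):
    """Coarsen quasi-identifiers: 10-year age band and 3-digit zip prefix."""
    low = (row["age"] // 10) * 10
    return {
        "age": f"{low}-{low + 9}",
        "zip": row["zip"][:3] + "**",
        "disease": row["disease"],
    }


def is_k_anonymous(rows, k, quasi_identifiers=("age", "zip")):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


anonymized = [generalize(row) for row in ROWS]
print(is_k_anonymous(ROWS, 2))
print(is_k_anonymous(anonymized, 2))
```

Generalization trades precision for privacy: the coarser the bands, the larger the groups of indistinguishable records, and the less detail remains for analysis.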

Chapter Summaries

The dissertation begins by introducing the idea and motivation behind developing a new data mining technique for big data, with a focus on privacy preservation. It then defines the problem and scope of the dissertation, outlining the software context, constraints, and expected outcomes. Chapter 3 details the project plan, timeline, and feasibility study, including economic, technical, operational, and time feasibility aspects. Chapter 4 focuses on the software requirement specification, outlining the purpose, scope of the document, and responsibilities of the developer. It also includes a product overview with block diagrams, functional models with flow diagrams and data flow diagrams, and a detailed analysis of UML diagrams such as sequence diagrams and class diagrams. Finally, Chapter 5 dives into the detailed design, examining the architecture design and algorithms used, as well as interface details, including human and database interfaces.

Keywords

The primary focus of the dissertation lies in the intersection of big data, data mining, privacy preservation, and data anonymization. It investigates techniques for collaborative data publishing and the role of a trusted third party in ensuring data privacy while facilitating data sharing from distributed databases. Key concepts include privacy-preserving data analysis, insider attacks, and the development of a new algorithm for data anonymization, addressing the challenges of data sharing while maintaining privacy for individuals and sensitive information.

End of excerpt from 40 pages

Details

Title
Effective Data Mining Techniques for Unstructured Data in Big Data
University
Rajiv Gandhi University  (PATEL COLLEGE OF SCIENCE AND TECHNOLOGY)
Course
COMPUTER SCIENCE
Grade
10
Authors
Dnyandeo Khemnar (Author), Nilesh Thorat (Author)
Publication Year
2017
Pages
40
Catalog Number
V1307474
ISBN (eBook)
9783346783394
ISBN (Book)
9783346783400
Language
English
Tags
Big data, Data mining, Hace theorem, Map Reducer, Privacy Preservation Mechanism
Product Safety
GRIN Publishing Ltd.
Cite this work
Dnyandeo Khemnar (Author), Nilesh Thorat (Author), 2017, Effective Data Mining Techniques for Unstructured Data in Big Data, Munich, GRIN Verlag, https://www.grin.com/document/1307474