In this paper, healthcare data comprising patients' symptoms, diseases, and related details is collected. The collected data is then pre-processed, since only filtered data is required for the analysis, and stored in Hadoop. Users can retrieve records by symptom, disease, and other attributes.
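The collect, pre-process, and query flow described above can be sketched as follows. This is a minimal illustration only: Hadoop/HDFS storage is mocked with an in-memory list, and the field names (`symptom`, `disease`, `patient_id`) are assumptions for demonstration, not the paper's actual schema.

```python
# Hypothetical sketch of the collect -> pre-process -> query flow.
# Hadoop/HDFS storage is mocked with an in-memory list; field names
# are illustrative assumptions, not the paper's schema.

records = [
    {"patient_id": 1, "symptom": "fever", "disease": "flu"},
    {"patient_id": 2, "symptom": "cough", "disease": "bronchitis"},
    {"patient_id": 3, "symptom": "", "disease": "flu"},  # incomplete record
]

def preprocess(rows):
    """Drop incomplete records -- the 'filtered data' step."""
    return [r for r in rows if all(r.values())]

def query(rows, field, value):
    """Retrieve stored records by symptom, disease, etc."""
    return [r for r in rows if r.get(field) == value]

clean = preprocess(records)           # record 3 is filtered out
flu_cases = query(clean, "disease", "flu")
```

In a real deployment the `records` list would be replaced by reads from HDFS, but the filter-then-query structure stays the same.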
Big Data refers to collections of large and complex data, comprising structured, semi-structured, and unstructured types. Such data is generated from many sources and fields, and in today's era it is produced in huge amounts as the whole world moves toward digitalization: social media sites, digital pictures and videos, and much more. All data of this kind is known as big data. Data mining is a technique for extracting patterns from large-scale data sets; with its help, useful and meaningful information can be extracted from big data by processing that data.
Table of Contents
- INTRODUCTION
- IDEA AND MOTIVATION
- LITERATURE SURVEY
- PROBLEM DEFINITION AND SCOPE
- SCOPE
- SOFTWARE CONTEXT
- SOFTWARE CONSTRAINTS
- OUTCOMES
- HARDWARE SPECIFICATION
- SOFTWARE SPECIFICATION
- AREA OF DISSERTATION
- DISSERTATION PLAN
- PROJECT PLAN
- TIMELINE OF PROJECT
- FEASIBILITY STUDY
- Economic Feasibility
- Technical Feasibility
- Operational Feasibility
- Time Feasibility
- RISK MANAGEMENT
- Project Risk
- Risk Assessment
- EFFORT AND COST ESTIMATION
- Lines of code (LOC)
- Effort
- Development Time
- Number of People
- SOFTWARE REQUIREMENT SPECIFICATION
- INTRODUCTION
- Purpose
- Scope of Document
- Overview of responsibilities of developer
- PRODUCT OVERVIEW
- Block diagram
- FUNCTIONAL MODEL
- Flow diagram
- Data Flow Diagram
- UML Diagrams
- Sequence diagram
- Class diagram
- Non-Functional Requirements
- BEHAVIORAL MODEL AND DESCRIPTION
- Description of software behavior
- Use case diagram
- DETAILED DESIGN
- ARCHITECTURE DESIGN
- Algorithms
- INTERFACES
- Human Interface
- Database interface
- TESTING
- INTRODUCTION
- Goals and Objectives
- TESTING STRATEGY
- White Box Testing
- Black Box Testing
- System Testing
- Performance Testing
- DATA TABLE AND DISCUSSION
- INPUT TO THE SYSTEM
- OUTPUT
- PERFORMANCE OF PROPOSED SYSTEM
- Performance of proposed system with respect to baseline algorithm
- Performance of proposed system with respect to blowfish encryption algorithm
- RESULT
- Difference between the proposed algorithm and the base algorithm, i.e. the provider-aware algorithm
- SUMMARY AND CONCLUSION
- FUTURE ENHANCEMENT
- REFERENCES
Objectives and Key Themes
The dissertation aims to develop an effective data mining technique for both structured and unstructured big data, focusing on privacy preservation during data sharing from distributed databases. The work explores the challenges of anonymizing data while maintaining privacy and examines existing techniques to address this issue.
- Privacy-preserving data analysis and publishing
- Data anonymization techniques
- Collaborative data publishing
- Trusted third-party (TTP) role in data sharing
- Insider attacks and their mitigation
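To make the anonymization theme above concrete, the sketch below illustrates one common technique, k-anonymity-style generalization, in which quasi-identifiers (here, age and ZIP code) are coarsened so that each combination appears at least k times. The field names and generalization rules (age banded into decades, ZIP truncated to a prefix) are illustrative assumptions, not the algorithm developed in the dissertation.

```python
# Hedged sketch of k-anonymity-style generalization, one standard
# data-anonymization technique. Field names and generalization rules
# are illustrative assumptions, not the dissertation's proposed algorithm.
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers; keep the sensitive attribute as-is."""
    decade = (record["age"] // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",      # e.g. 23 -> "20-29"
        "zip_prefix": record["zip"][:3] + "**",    # e.g. "41101" -> "411**"
        "disease": record["disease"],              # sensitive attribute
    }

def is_k_anonymous(rows, k):
    """Every quasi-identifier combination must occur at least k times."""
    counts = Counter((r["age_band"], r["zip_prefix"]) for r in rows)
    return all(c >= k for c in counts.values())

data = [
    {"age": 23, "zip": "41101", "disease": "flu"},
    {"age": 27, "zip": "41102", "disease": "asthma"},
    {"age": 24, "zip": "41105", "disease": "flu"},
]
anon = [generalize(r) for r in data]
```

After generalization all three records share the quasi-identifier pair `("20-29", "411**")`, so the released table is 3-anonymous: an insider who knows a patient's exact age and ZIP code cannot single out that patient's row.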
Chapter Summaries
The dissertation begins by introducing the idea and motivation behind developing a new data mining technique for big data, with a focus on privacy preservation. It then defines the problem and scope of the dissertation, outlining the software context, constraints, and expected outcomes. Chapter 3 details the project plan, timeline, and feasibility study, including economic, technical, operational, and time feasibility aspects. Chapter 4 focuses on the software requirement specification, outlining the purpose, scope of the document, and responsibilities of the developer. It also includes a product overview with block diagrams, functional models with flow diagrams and data flow diagrams, and a detailed analysis of UML diagrams such as sequence diagrams and class diagrams. Finally, Chapter 5 dives into the detailed design, examining the architecture design and algorithms used, as well as interface details, including human and database interfaces.
Keywords
The primary focus of the dissertation lies in the intersection of big data, data mining, privacy preservation, and data anonymization. It investigates techniques for collaborative data publishing and the role of a trusted third party in ensuring data privacy while facilitating data sharing from distributed databases. Key concepts include privacy-preserving data analysis, insider attacks, and the development of a new algorithm for data anonymization, addressing the challenges of data sharing while maintaining privacy for individuals and sensitive information.
- Quote paper
- Dnyandeo Khemnar (Author), Nilesh Thorat (Author), 2017, Effective Data Mining Techniques for Unstructured Data in Big Data, Munich, GRIN Verlag, https://www.grin.com/document/1307474