In this paper, healthcare data comprising patients' symptoms, diseases, and related details is collected. The collected data is then pre-processed, since only filtered data is required for the analysis, and stored in Hadoop. Users can retrieve records by symptom, disease, and other attributes.
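The collect, pre-process, and query flow described above can be sketched as follows. This is a minimal illustration only: Hadoop/HDFS storage is mocked with an in-memory list, and the field names (`symptom`, `disease`, `patient_id`) are assumptions for demonstration, not the paper's actual schema.

```python
# Hypothetical sketch of the collect -> pre-process -> query flow.
# Hadoop/HDFS storage is mocked with an in-memory list; field names
# are illustrative assumptions, not the paper's schema.

records = [
    {"patient_id": 1, "symptom": "fever", "disease": "flu"},
    {"patient_id": 2, "symptom": "cough", "disease": "bronchitis"},
    {"patient_id": 3, "symptom": "", "disease": "flu"},  # incomplete record
]

def preprocess(rows):
    """Drop incomplete records -- the 'filtered data' step."""
    return [r for r in rows if all(r.values())]

def query(rows, field, value):
    """Retrieve stored records by symptom, disease, etc."""
    return [r for r in rows if r.get(field) == value]

clean = preprocess(records)           # record 3 is filtered out
flu_cases = query(clean, "disease", "flu")
```

In a real deployment the `records` list would be replaced by reads from HDFS, but the filter-then-query structure stays the same.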
Big Data refers to collections of large and complex data, comprising structured, semi-structured, and unstructured types. Such data is generated from many sources and fields, and in today's era it is produced in huge amounts as the whole world moves toward digitalization: social media sites, digital pictures and videos, and much more. All data of this kind is known as big data. Data mining is a technique for extracting patterns from large-scale data sets; with its help, useful and meaningful information can be extracted from big data by processing that data.
Table of Contents
- INTRODUCTION
- IDEA AND MOTIVATION
- LITERATURE SURVEY
- PROBLEM DEFINITION AND SCOPE
- SCOPE
- SOFTWARE CONTEXT
- SOFTWARE CONSTRAINTS
- OUTCOMES
- HARDWARE SPECIFICATION
- SOFTWARE SPECIFICATION
- AREA OF DISSERTATION
- DISSERTATION PLAN
- PROJECT PLAN
- TIMELINE OF PROJECT
- FEASIBILITY STUDY
- Economic Feasibility
- Technical Feasibility
- Operational Feasibility
- Time Feasibility
- RISK MANAGEMENT
- Project Risk
- Risk Assessment
- EFFORT AND COST ESTIMATION
- Lines of code (LOC)
- Effort
- Development Time
- Number of People
- SOFTWARE REQUIREMENT SPECIFICATION
- INTRODUCTION
- Purpose
- Scope of Document
- Overview of responsibilities of developer
- PRODUCT OVERVIEW
- Block diagram
- FUNCTIONAL MODEL
- Flow diagram
- Data Flow Diagram
- UML Diagrams
- Sequence diagram
- Class diagram
- Non-Functional Requirements
- BEHAVIORAL MODEL AND DESCRIPTION
- Description of software behavior
- Use case diagram
- DETAILED DESIGN
- ARCHITECTURE DESIGN
- Algorithms
- INTERFACES
- Human Interface
- Database interface
- TESTING
- INTRODUCTION
- Goals and Objectives
- TESTING STRATEGY
- White Box Testing
- Black Box Testing
- System Testing
- Performance Testing
- DATA TABLE AND DISCUSSION
- INPUT TO THE SYSTEM
- OUTPUT
- PERFORMANCE OF PROPOSED SYSTEM
- Performance of proposed system with respect to baseline algorithm
- Performance of proposed system with respect to blowfish encryption algorithm
- RESULT
- Difference between the proposed algorithm and the base algorithm, i.e. the provider-aware algorithm
- SUMMARY AND CONCLUSION
- FUTURE ENHANCEMENT
- REFERENCES
Objectives and Key Themes
The dissertation aims to develop an effective data mining technique for both structured and unstructured big data, focusing on privacy preservation during data sharing from distributed databases. The work explores the challenges of anonymizing data while maintaining privacy and examines existing techniques to address this issue.
- Privacy-preserving data analysis and publishing
- Data anonymization techniques
- Collaborative data publishing
- Trusted third-party (TTP) role in data sharing
- Insider attacks and their mitigation
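To make the anonymization theme above concrete, the sketch below illustrates one common technique, k-anonymity-style generalization, in which quasi-identifiers (here, age and ZIP code) are coarsened so that each combination appears at least k times. The field names and generalization rules (age banded into decades, ZIP truncated to a prefix) are illustrative assumptions, not the algorithm developed in the dissertation.

```python
# Hedged sketch of k-anonymity-style generalization, one standard
# data-anonymization technique. Field names and generalization rules
# are illustrative assumptions, not the dissertation's proposed algorithm.
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers; keep the sensitive attribute as-is."""
    decade = (record["age"] // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",      # e.g. 23 -> "20-29"
        "zip_prefix": record["zip"][:3] + "**",    # e.g. "41101" -> "411**"
        "disease": record["disease"],              # sensitive attribute
    }

def is_k_anonymous(rows, k):
    """Every quasi-identifier combination must occur at least k times."""
    counts = Counter((r["age_band"], r["zip_prefix"]) for r in rows)
    return all(c >= k for c in counts.values())

data = [
    {"age": 23, "zip": "41101", "disease": "flu"},
    {"age": 27, "zip": "41102", "disease": "asthma"},
    {"age": 24, "zip": "41105", "disease": "flu"},
]
anon = [generalize(r) for r in data]
```

After generalization all three records share the quasi-identifier pair `("20-29", "411**")`, so the released table is 3-anonymous: an insider who knows a patient's exact age and ZIP code cannot single out that patient's row.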
Chapter Summaries
The dissertation begins by introducing the idea and motivation behind developing a new data mining technique for big data, with a focus on privacy preservation. It then defines the problem and scope of the dissertation, outlining the software context, constraints, and expected outcomes. Chapter 3 details the project plan, timeline, and feasibility study, including economic, technical, operational, and time feasibility aspects. Chapter 4 focuses on the software requirement specification, outlining the purpose, scope of the document, and responsibilities of the developer. It also includes a product overview with block diagrams, functional models with flow diagrams and data flow diagrams, and a detailed analysis of UML diagrams such as sequence diagrams and class diagrams. Finally, Chapter 5 dives into the detailed design, examining the architecture design and algorithms used, as well as interface details, including human and database interfaces.
Keywords
The primary focus of the dissertation lies in the intersection of big data, data mining, privacy preservation, and data anonymization. It investigates techniques for collaborative data publishing and the role of a trusted third party in ensuring data privacy while facilitating data sharing from distributed databases. Key concepts include privacy-preserving data analysis, insider attacks, and the development of a new algorithm for data anonymization, addressing the challenges of data sharing while maintaining privacy for individuals and sensitive information.
- Quote paper
- Dnyandeo Khemnar (Author), Nilesh Thorat (Author), 2017, Effective Data Mining Techniques for Unstructured Data in Big Data, Munich, GRIN Verlag, https://www.grin.com/document/1307474