Data management needs and their web-based applications have changed rapidly in the last few years. Relational databases provide strict data consistency and a wide variety of features. NoSQL databases were developed because of the massive cost of storing and manipulating data in classical relational database systems, and they provide more scalability and heterogeneity than an RDBMS. MongoDB is a document-based NoSQL database, designed for Internet and web-based applications, that provides high scalability, performance and availability. Its documents are encoded in a JSON-like format called BSON. Its data model is easy to build on because of its inherent support for unstructured data, and, unlike an RDBMS, it does not require costly and time-consuming migrations when application requirements change. This paper describes the advantages of MongoDB when compared to other NoSQL databases and its applications in sentiment analysis.
Database sizes have grown enormously during the last decade, and monolithic database systems struggle to keep up with today’s requirements as data volumes increase day by day. Well-known products with RDBMS traits are available today. However, applications that require data or functional partitioning, whether because of the sheer size of the data or for the purpose of load balancing, have to rely on custom-built solutions or use alternative database systems. With the high demand for multi-machine databases, implementing distributed partitioning has become a real challenge. A whole set of NoSQL (not only SQL) databases 1 has now emerged to fill this gap left by RDBMS. The common features of NoSQL products are divergence from the relational data model, simplification of the transactional model and transaction processing and, most importantly, the shift from the declarative style of the SQL language to an imperative programming model. For reasons of scalability and heterogeneity, viable alternatives such as NoSQL databases 2 are being used in place of relational databases. MongoDB, a NoSQL database, is known as an agile database built for scalability, performance and high availability. It can be deployed both in a single-server environment 3 and in complex multi-site architectures. The application of MongoDB in Twitter sentiment analysis is also noteworthy. In this paper we compare MongoDB with other NoSQL databases based on different factors. Some of its applications in sentiment analysis are also listed.
NoSQL data models are designed to be flexible in order to support the storage needs of applications dealing with highly heterogeneous data. Also, the widespread use of dynamically typed scripting languages has made less strictly structured background storage systems favourable. While a highly generic data model looks reasonable from the point of view of the client, efficient server-side processing makes certain restrictions on the data model necessary.
As a result, many NoSQL systems offer semi-structured models and list-like data types. A primary objective of NoSQL database systems is to distribute data evenly among shards.
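One common way to spread data evenly among shards is to hash each record's key and take the result modulo the number of shards. The sketch below illustrates that idea only; it is not the partitioning scheme of any particular NoSQL product, and the names `shard_for`, `NUM_SHARDS` and the `user:<n>` keys are assumptions made for the example.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4  # hypothetical cluster size
keys = [f"user:{i}" for i in range(10_000)]  # hypothetical record keys

# Count how many keys land on each shard.
placement = Counter(shard_for(key, NUM_SHARDS) for key in keys)

for shard, count in sorted(placement.items()):
    print(f"shard {shard}: {count} keys")
```

Because the hash depends only on the key, the same key is always routed to the same shard. A drawback of plain modulo hashing is that changing the number of shards moves most keys, which is one reason real systems tend to use more elaborate schemes.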
NoSQL Data Models
- Key-Value Stores
The main idea here is a hash table in which a unique key points to a particular item of data. The key/value model is the simplest and easiest to implement. Among other disadvantages, it is inefficient when you are only interested in querying or updating part of a value.
Examples: Redis, Voldemort, Oracle BDB, Amazon SimpleDB, Riak 4
- Column Family Stores
Column family stores were created to store and process very large amounts of data distributed over many machines. There are still keys, but they point to multiple columns. The columns are arranged by column family. Examples: Cassandra, HBase
- Document Databases
These were inspired by Lotus Notes and are similar to key-value stores. The model is basically versioned documents, which are collections of other key-value collections. The semi-structured documents are stored in formats such as JSON. A database is a physical container for collections. Every database gets its own files on the file system, and a single MongoDB server typically has multiple databases.
A collection is a group of MongoDB documents and is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema, so documents within a collection might have different fields. Usually, all documents in a collection serve a similar or related purpose. 5
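The three data models above can be contrasted in a small sketch. It uses plain Python dictionaries and lists rather than any real database, and all keys, names and values (`customer:11`, `Asha`, and so on) are invented for illustration.

```python
import json

# Key-value store: one opaque value per unique key.
kv_store = {}
kv_store["customer:11"] = json.dumps({"name": "Asha", "city": "Delhi"})

# Updating part of a value means reading, decoding and rewriting all of it.
record = json.loads(kv_store["customer:11"])
record["city"] = "Mumbai"
kv_store["customer:11"] = json.dumps(record)

# Column family store: row key -> column family -> columns.
cf_store = {
    "customer:11": {
        "profile": {"name": "Asha", "city": "Mumbai"},
        "orders": {"2014-11-02": "book", "2014-11-09": "pen"},
    }
}
# A single column can be addressed without touching the other family.
cf_store["customer:11"]["profile"]["city"] = "Pune"

# Document database: a collection holds documents with differing fields,
# because no schema is enforced.
customers = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
    {"_id": 2, "name": "Ravi", "emails": ["ravi@example.com"], "vip": True},
]
fields_per_document = [sorted(doc.keys()) for doc in customers]
print(fields_per_document)
```

The key-value update shows the inefficiency mentioned above: the store cannot see inside the value, so the client has to rewrite it in full.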
MongoDB 8 is a database management system designed for Internet and web-based applications. It is a NoSQL database with a document-based data model, and its persistence strategies are built for high read and write throughput and for the ability to scale easily with automatic failover. MongoDB’s document data model is easy to build on because it has inherent support for unstructured data and does not require costly and time-consuming migrations when application requirements change. Documents are stored in a JSON-like format called BSON, which is a natural fit for modern object-oriented programming methodologies and is also lightweight, fast and traversable. MongoDB also uses BSON as the network transfer format for documents. At first BSON seems BLOB-like, but there is an important difference: the MongoDB database understands BSON internals. This means that it can reach inside BSON objects, even nested ones, by using dot notation. This allows MongoDB to build indexes and match objects against query expressions on both top-level and nested BSON keys. MongoDB also supports rich queries and full indexes. This distinguishes it from other document databases, in which a separate server layer is used to handle complex queries. Its other features include automatic sharding, replication and easy storage. The increasing popularity of MongoDB and the large amounts of user-related sensitive information stored in these databases raise concerns about the confidentiality and privacy of the data and about the security these systems provide. Security was not a primary concern of its designers when MongoDB was initially designed. 6
[Figure not included in this excerpt]
Fig 1. Oracle Cloud Infrastructure
Components of MongoDB
MongoDB is a cross-platform, document-oriented database that provides high performance, high availability and easy scalability. MongoDB is built on the concepts of collections and documents.
1. Each database contains collections, which in turn contain documents. Each document can be different, with a varying number of fields. The size and content of each document can also differ from those of the others.
2. The document structure is more in line with how developers construct their classes and objects in their respective programming languages. Developers will often say that their classes are not rows and columns but have a clear structure with key-value pairs.
3. The rows (or documents, as they are called in MongoDB) do not need to have a schema defined beforehand. Instead, the fields can be created on the fly.
4. The data model available within MongoDB makes it easier to represent hierarchical relationships and to store arrays and other more complex structures.
5. Scalability – MongoDB environments are very scalable. Companies across the world have defined clusters, some of them running 100+ nodes, with millions of documents within the database.
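Points 3 and 4 of the list above can be illustrated with a single document. The sketch below uses a plain Python dictionary serialized with the standard `json` module as a stand-in for a stored document; the order, its field names and its values are invented for the example.

```python
import json

# One document holding a nested sub-document and an array of line items.
order = {
    "customer": {"name": "Asha", "address": {"city": "Delhi", "pin": "110001"}},
    "items": [
        {"sku": "B-101", "qty": 2},
        {"sku": "P-007", "qty": 1},
    ],
}

# Fields are created on the fly; nothing has to be declared beforehand.
order["status"] = "shipped"
order["items"].append({"sku": "N-555", "qty": 3, "gift": True})

total_qty = sum(item["qty"] for item in order["items"])
as_json = json.dumps(order, indent=2)

print(total_qty)
print(as_json)
```

In a relational design the same order would typically be split across several tables joined by keys, whereas here the hierarchy is kept in one record.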
[Figure not included in this excerpt]
Fig 2. MongoDB Architecture
Features of MongoDB
MongoDB is known for its aggregation framework, which is particularly useful for database managers today, and the BSON format makes this framework more useful and effective. Ad hoc queries enable seamless data management, file storage and indexing are also highlighted features, and replicating data for backup is easy with MongoDB.
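An aggregation framework processes documents through a sequence of stages, for example filtering followed by grouping. The sketch below imitates a two-stage "match, then group and sum" pipeline in plain Python to show the idea; it does not call MongoDB, and the functions `match_stage` and `group_sum_stage` and the sample orders are hypothetical.

```python
from collections import defaultdict


def match_stage(documents, field, value):
    """Filtering stage: keep only documents whose field equals the value."""
    return [doc for doc in documents if doc.get(field) == value]


def group_sum_stage(documents, key_field, sum_field):
    """Grouping stage: total sum_field for each distinct key_field value."""
    totals = defaultdict(int)
    for doc in documents:
        totals[doc[key_field]] += doc[sum_field]
    return dict(totals)


orders = [
    {"city": "Delhi", "status": "paid", "amount": 120},
    {"city": "Delhi", "status": "paid", "amount": 80},
    {"city": "Pune", "status": "paid", "amount": 50},
    {"city": "Pune", "status": "open", "amount": 999},
]

# Stage 1 filters, stage 2 groups; the output of one feeds the next.
paid_orders = match_stage(orders, "status", "paid")
totals = group_sum_stage(paid_orders, "city", "amount")
print(totals)
```

The open order is dropped by the first stage, so it does not contribute to the total for its city.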
1. _id – This is a field required in every MongoDB document. The _id field holds a value that is unique within the collection and acts like the document’s primary key. If you create a new document without an _id field, MongoDB will automatically create the field. For example, in a customer table, MongoDB will add a unique identifier, written as 24 hexadecimal digits, to each document in the collection.
2. Collection – This is a grouping of MongoDB documents. A collection is the equivalent of a table created in an RDBMS such as Oracle or MS SQL. A collection exists within a single database. As seen in the introduction, collections do not enforce any sort of structure.
3. Cursor – This is a pointer to the result set of a query. Clients can iterate through a cursor to retrieve results.
4. Database – This is a container for collections, just as a database in an RDBMS is a container for tables. Each database gets its own set of files on the file system. A MongoDB server can store multiple databases.
5. Document – A record in a MongoDB collection is called a document. The document, in turn, consists of field names and values.
6. Field – A name-value pair in a document. A document has zero or more fields. Fields are analogous to columns in relational databases. The following diagram shows an example of fields with key-value pairs. In the example below, CustomerID and its value form one of the key-value pairs defined in the document.
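The terms above (document, field, collection, _id and cursor) can be tied together in a toy in-memory model. This is a minimal sketch under stated assumptions, not MongoDB's API: the `Collection` class is hypothetical, and the generated `_id` is simply 12 random bytes written as 24 hexadecimal digits, which matches the length of a real identifier but not its internal structure.

```python
import secrets
from typing import Iterator


class Collection:
    """Toy collection: stores documents keyed by their _id field."""

    def __init__(self) -> None:
        self._documents: dict = {}

    def insert_one(self, document: dict):
        doc = dict(document)  # copy so the caller's dict is left untouched
        if "_id" not in doc:
            # Assumption: random bytes stand in for a real generated id.
            doc["_id"] = secrets.token_hex(12)
        if doc["_id"] in self._documents:
            raise ValueError(f"duplicate _id: {doc['_id']!r}")
        self._documents[doc["_id"]] = doc
        return doc["_id"]

    def find(self, **equals) -> Iterator[dict]:
        """Return a cursor-like iterator over documents matching all fields."""
        for doc in self._documents.values():
            if all(doc.get(field) == value for field, value in equals.items()):
                yield doc


customers = Collection()
auto_id = customers.insert_one({"CustomerName": "Asha", "City": "Delhi"})
customers.insert_one({"_id": "c-2", "CustomerName": "Ravi", "City": "Delhi"})
customers.insert_one({"CustomerName": "Meera"})  # fewer fields is allowed

cursor = customers.find(City="Delhi")  # nothing is fetched until iteration
names = [doc["CustomerName"] for doc in cursor]
print(auto_id, names)
```

The generator returned by `find` behaves like a cursor in one respect: results are produced one at a time as the client iterates, rather than being materialized up front.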
For any software development project, the structure and arrangement of files and folders provide a foundation for scalability and maintainability. It is a key yet sometimes overlooked aspect that should be prioritized during the software development phases. A well laid out project structure helps anyone new to the project to become familiar with it more easily and quickly. As the application grows, it keeps the project files in logical order so that the components, utility methods, styles, etc. grow consistently and remain manageable. A modular and scalable project structure also helps greatly in finding bugs and debugging the application. Numerous approaches are widely practised in various application development standards, yet there is no single optimal solution; the choice also depends on the type of project and its requirements. Here is a table with a screenshot and the role of each folder in the project structure of this thesis project. The files are arranged in folders whose names describe their purpose and contents.
[Figure not included in this excerpt]
Fig 1. Project Structure
We would like to express our very great appreciation to Dr. P.S. Bedi for his valuable, constructive suggestions and his assistance in keeping our progress on schedule during the planning and development of this research work. We would also like to express our deep gratitude to Professor Gurpreet Kaur and Professor Amandeep Kaur, our research supervisors, for their patient guidance, enthusiastic encouragement and useful critiques of this research work. Lastly, we take this opportunity to thank all others who supported us in the project or in other aspects of our study at Guru Tegh Bahadur Institute of Technology.
NoSQL systems offer much less functionality than traditional relational database management systems 7, especially in transaction isolation and scan operations. However, they can be used successfully when complex database logic is not required and large-scale, distributed operation is the objective. MongoDB is an effective document-oriented database which can be used for tweet analysis and other applications. The JSON format of the data stored in MongoDB makes the data easy to analyse for further processing. In future work, more features of MongoDB can be analysed and compared with other NoSQL databases 8
1 Laszlo Dobos, Balazs Pinczel et al., “A Comparative evaluation of NoSQL systems”, in Annales Univ. Sci. Budapest., Sect. Comp. 42 (2014) 173-198.
2 Peng Wang, Yan Qi, Hua-min Yang, “Analysis and study on the performance of query based on NOSQL”, in Computer Modelling & New Technologies 2014 18(9) 153-159.
3 Prabhakaran Murugesan, Indrakshi Ray, “Audit Log Management in MongoDB”, in 2014 IEEE 10th World Congress on Services.
4 D.R. Merlin Shalini, S. Dhamodharan, “Performance and Scaling Comparison Study of RDBMS and NoSQL (MongoDB)”, in COMPUSOFT, An international journal of advanced computer technology, 3 (11), November 2014 (Volume III, Issue XI).
5 Anju Abraham, “A Dynamic Query Form System for Mongodb”, in SSRG International Journal of Computer Science and Engineering (SSRG-IJCSE), Volume 1, Issue 9, November 2014.
6 Sanobar Khan, Vanita Mane, “SQL Support over MongoDB using Metadata”, in International Journal of Scientific and Research Publications, Volume 3, Issue 10, October 2013.
7 Yong-Lak Choi, Woo-Seong Jeon, Seok-Hwan Yoon, “Improving Database System Performance by Applying NoSQL”, in J Inf Process Syst, Vol. 10, No. 3, pp. 355-364, September 2014.