Cassandra is a column store
How is columnar NoSQL different from document-centric?
The three types of NoSQL databases I've read about are key-value, column, and document-oriented.
Key-value is pretty simple - a key with a simple value.
I've seen document-oriented databases described as being like key-value, except that the value can be a structure, like a JSON object. Each "document" can have all, some, or none of the same keys as another.
Column-oriented seems very similar to document-oriented in that you don't specify a structure.
What's the difference between these two and why should you use one over the other?
I have looked specifically at MongoDB and Cassandra. Basically, I need a dynamic structure that can change without affecting other values. At the same time, I need to be able to search / filter on certain keys and run reports. In CAP terms, AP is the most important thing for me. The data can be "eventually" synchronized across nodes, as long as there is no conflict or data loss. Each user would have their own "table".
In Cassandra, each row (addressed by a key) contains one or more "columns". Columns are themselves key-value pairs. The column names do not have to be predefined, i.e. the structure is not fixed. Columns in a row are stored in sorted order according to their keys (names).
In some cases, you might have a very large number of columns in a row (for example, as an index to enable certain types of queries). Cassandra can handle such large structures efficiently and allows you to get specific ranges of columns.
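As a rough illustration of why sorted column names make range queries cheap, here is a minimal Python sketch. It is not Cassandra's API: the `slice_columns` helper and the sample row are made up for this example, and a real store would keep the columns sorted on disk rather than sorting on every call.

```python
from bisect import bisect_left, bisect_right


def slice_columns(row, start, end):
    """Return the (name, value) pairs whose names fall in [start, end]."""
    # Assumption: we sort here for simplicity; a real store keeps columns sorted.
    names = sorted(row)
    lo = bisect_left(names, start)
    hi = bisect_right(names, end)
    return [(name, row[name]) for name in names[lo:hi]]


# Hypothetical wide row: one column per event, named by date.
row = {
    "2011-01-03": "login",
    "2011-01-01": "signup",
    "2011-01-02": "purchase",
    "2011-01-05": "logout",
}

first_three_days = slice_columns(row, "2011-01-01", "2011-01-03")
```

Because the names are ordered, the slice is found with two binary searches instead of a scan over every column.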
There is another level of structure (not that commonly used) called supercolumns, where one column contains nested (sub) columns.
You can think of the whole structure as a nested hash table / dictionary with 2 or 3 levels of keys.
Normal column family: row key -> column name -> value
Super column family: row key -> supercolumn name -> column name -> value
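The two layouts can be sketched as nested Python dictionaries. This is only a mental model with made-up names (`users`, `addresses`, and the sample keys are assumptions), not how Cassandra stores or exposes the data.

```python
# Normal column family: row key -> column name -> value (2 levels of keys)
users = {
    "alice": {"email": "alice@example.com", "city": "Paris"},
    "bob": {"email": "bob@example.com"},  # rows need not share the same columns
}

# Super column family: row key -> supercolumn name -> column name -> value (3 levels)
addresses = {
    "alice": {
        "home": {"city": "Paris", "zip": "75001"},
        "work": {"city": "Lyon"},
    },
}

# Addressing a single value means walking one key per level.
alice_city = users["alice"]["city"]
alice_home_city = addresses["alice"]["home"]["city"]
```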
There are also higher-level structures - column families and keyspaces - that you can use to divide or group your data.
See also this question: Cassandra: What is a sub-column?
Or the data modeling links from http://wiki.apache.org/cassandra/ArticlesAndPresentations
Re: comparison with document-oriented databases - the latter usually insert whole documents (typically JSON), while in Cassandra you can address individual columns or supercolumns and update them individually, i.e. they work at a different level of granularity. Each column has its own timestamp / version (used to reconcile updates across the distributed cluster).
The column values in Cassandra are just bytes, but they can be typed as ASCII, UTF8 text, numbers, dates, etc.
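To make the per-column timestamp idea concrete, here is a simplified last-write-wins merge in Python. It is a sketch under stated assumptions: `merge_rows` and the `(value, timestamp)` tuples are invented for illustration, and real reconciliation also has to deal with deletions and timestamp ties, which are ignored here.

```python
def merge_rows(replica_a, replica_b):
    """Merge two copies of a row, keeping the newest version of each column."""
    merged = dict(replica_a)
    for name, (value, timestamp) in replica_b.items():
        # Keep whichever copy of the column carries the later timestamp.
        if name not in merged or timestamp > merged[name][1]:
            merged[name] = (value, timestamp)
    return merged


# Each column is stored as (value, timestamp); the two replicas have diverged.
replica_a = {"email": ("old@example.com", 100), "city": ("Paris", 120)}
replica_b = {"email": ("new@example.com", 150), "phone": ("555-0100", 110)}

merged = merge_rows(replica_a, replica_b)
```

Since every column is versioned on its own, two writers updating different columns of the same row do not overwrite each other.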
Of course, you could use Cassandra as a primitive document store by including columns that contain JSON - but you wouldn't get all of the functionality of a true document-oriented store.
The main difference is that document stores (e.g. MongoDB and CouchDB) allow documents of any complexity, i.e. sub-documents within sub-documents, lists of documents, etc., while column stores (e.g. Cassandra and HBase) only allow a fixed format, e.g. strict one-level or two-level dictionaries.
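One way to see the difference is to measure how deeply each shape nests. The helper and sample data below are hypothetical and exist only to illustrate the point in plain Python.

```python
def nesting_depth(value):
    """Count levels of dictionary nesting; lists pass through to their items."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return max((nesting_depth(v) for v in value), default=0)
    return 0


# A document can nest sub-documents and lists of documents to any depth.
document = {
    "name": "alice",
    "orders": [{"id": 1, "items": [{"sku": "a1", "qty": 2}]}],
}

# A column family row is a flat dictionary; a super column row adds one level.
column_row = {"name": "alice", "city": "Paris"}
super_column_row = {"home": {"city": "Paris"}, "work": {"city": "Lyon"}}

depths = (
    nesting_depth(document),
    nesting_depth(column_row),
    nesting_depth(super_column_row),
)
```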
For "inserts", to use RDBMS terms, document-based stores are more consistent and straightforward. Note that Cassandra lets you achieve consistency through the notion of quorum, but this does not apply to all column-based systems, and it reduces availability. On a write-once, read-often system, choose MongoDB. Also consider it if you always plan to read the whole structure of the object. A document-based system is designed to return the whole document when you fetch it, and is not very strong at returning parts of the whole row.
Column-based systems like Cassandra are far better at "updates" than document-based ones. You can change the value of a column without even reading the row that contains it. The write does not actually need to be done on the same server; a row may be contained in multiple files on multiple servers. On a huge, rapidly evolving data system, choose Cassandra. Also consider it if you plan to have a very large chunk of data per key and won't need to load all of it with every query. For "selects", Cassandra lets you load only the columns you need.
Also keep in mind that MongoDB is written in C++ and is at its second major version, while Cassandra has to run on a JVM and its first major version has only been in release candidate since yesterday (although the 0.x versions have already been run in production by large companies).
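The "write without reading" behaviour can be modelled with a toy append-only store. This is a deliberately simplified sketch, not Cassandra's storage engine: the `ColumnStore` class, its `segments` list (standing in for separate files), and the integer timestamps are all assumptions made for illustration.

```python
class ColumnStore:
    """Toy append-only store: writes never read, reads pick the newest version."""

    def __init__(self):
        # Each inner list stands in for one file of appended writes.
        self.segments = [[]]

    def write(self, row_key, column, value, timestamp):
        # Append only: the existing row is never loaded or rewritten.
        self.segments[-1].append((row_key, column, value, timestamp))

    def flush(self):
        # Start a new "file"; older versions of a row stay where they are.
        self.segments.append([])

    def read_column(self, row_key, column):
        # Scan every segment and keep the value with the latest timestamp.
        newest = None
        for segment in self.segments:
            for key, name, value, timestamp in segment:
                if key == row_key and name == column:
                    if newest is None or timestamp > newest[1]:
                        newest = (value, timestamp)
        return None if newest is None else newest[0]


store = ColumnStore()
store.write("alice", "email", "old@example.com", 1)
store.write("alice", "city", "Paris", 2)
store.flush()
store.write("alice", "email", "new@example.com", 3)  # update without a read

email = store.read_column("alice", "email")
city = store.read_column("alice", "city")
```

The row for "alice" now lives in two segments, and only the requested column is resolved at read time, which mirrors the trade-off described above: cheap updates, with the merging work moved to reads.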
On the other hand, Cassandra's design was partly based on Amazon Dynamo and is essentially designed as a high availability solution, which, however, has nothing to do with the column-based format. MongoDB also scales, but not as gracefully as Cassandra.
I would say the main difference is how each of these DB types physically stores the data.
For column types, the data is stored in columns that enable efficient aggregation operations / queries for a particular column.
For document types, the entire document is logically stored in one place and generally retrieved as a whole (no efficient aggregation for "columns" / "fields" possible).
The confusing bit is that a wide-column "row" can easily be represented as a document, but as mentioned it is stored differently and optimized for different purposes.
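The storage difference described in this answer can be sketched in a few lines of Python. The sample records and the `columns` layout are assumptions for illustration; they show the same data held document-wise and column-wise, not any particular database's format.

```python
# Document layout: each record is stored and fetched as a whole.
documents = [
    {"user": "alice", "amount": 30, "city": "Paris"},
    {"user": "bob", "amount": 12},
    {"user": "carol", "amount": 8, "city": "Lyon"},
]

# Column layout: one list per field, aligned by record position (None if absent).
columns = {}
for index, doc in enumerate(documents):
    for field, value in doc.items():
        columns.setdefault(field, [None] * len(documents))[index] = value

# Aggregating over documents touches every whole record...
total_from_documents = sum(doc["amount"] for doc in documents)
# ...while the column layout only needs the one field being aggregated.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; what differs is how much unrelated data has to be read to compute them.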