Cassandra is a column store

How does column-oriented NoSQL differ from document-oriented?


The three types of NoSQL databases I've read about are key-value, column, and document-oriented.

Key-value is pretty simple: a key with a simple value.

I've seen document-oriented databases described as key-value stores where the value can be a structure, like a JSON object. Each "document" can have all, some, or none of the same keys as another.

Column-oriented seems very similar to document-oriented in that you don't specify a structure.

What's the difference between these two and why should you use one over the other?

I've looked specifically at MongoDB and Cassandra. Basically, I need a dynamic structure that can change without affecting other values. At the same time, I need to be able to search / filter on specific keys and run reports. In terms of CAP, AP matters most to me. The data can "eventually" be synchronized across nodes, as long as there is no conflict or data loss. Each user would have their own "table".

Reply:


In Cassandra, each row (addressed by a key) contains one or more "columns". Columns are themselves key-value pairs. The column names do not have to be predefined, i.e. the structure is not fixed. Columns in a row are stored in sorted order according to their keys (names).

In some cases, you might have a very large number of columns in a row (for example, as an index to enable certain types of queries). Cassandra can handle such large structures efficiently and allows you to get specific ranges of columns.
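As a rough illustration of fetching a range of columns from a wide row, here is a minimal in-memory sketch. It is not Cassandra client code: the `slice_columns` helper and the `events` row are hypothetical, and the only idea borrowed from the answer is that columns are kept sorted by name, which makes a range lookup cheap.

```python
from bisect import bisect_left, bisect_right


def slice_columns(row, start, end):
    """Return the (name, value) pairs whose names fall in [start, end].

    `row` is a list of (name, value) tuples already sorted by name,
    mimicking how columns within a row are kept in sorted order.
    """
    names = [name for name, _ in row]
    lo = bisect_left(names, start)   # first column >= start
    hi = bisect_right(names, end)    # one past the last column <= end
    return row[lo:hi]


# Hypothetical wide row used as an index: column names are dates.
events = sorted({
    "2011-01-03": "login",
    "2011-01-07": "purchase",
    "2011-02-01": "logout",
    "2011-02-15": "login",
}.items())

january = slice_columns(events, "2011-01-01", "2011-01-31")
print(january)
```

Because the names are sorted, the slice is found with two binary searches rather than a scan over every column in the row.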

There is another level of structure (not that commonly used) called supercolumns, where one column contains nested (sub) columns.

You can think of the whole thing as a nested hash table / dictionary with 2 or 3 levels of keys.

Normal column family: row key -> column name -> value

Super column family: row key -> supercolumn name -> (sub)column name -> value
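The nested-dictionary picture can be sketched with plain Python dicts. This is only a mental model, not how Cassandra is actually accessed; the `users` and `addresses` data and the `get_column` helper are made up for illustration.

```python
# Normal column family: row key -> column name -> value
users = {
    "alice": {"email": "alice@example.com", "city": "Oslo"},
    "bob": {"email": "bob@example.com"},  # rows need not share columns
}

# Super column family: row key -> supercolumn name -> (sub)column name -> value
addresses = {
    "alice": {
        "home": {"street": "1 Main St", "city": "Oslo"},
        "work": {"city": "Bergen"},
    },
}


def get_column(family, row_key, *path):
    """Walk 1 key (normal) or 2 keys (super) below the row key."""
    node = family[row_key]
    for key in path:
        node = node[key]
    return node


print(get_column(users, "alice", "city"))              # two key levels in total
print(get_column(addresses, "alice", "work", "city"))  # three key levels in total
```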

There are also higher-level structures, column families and keyspaces, which you can use to divide or group your data.

See also this question: Cassandra: What is a sub-column?

Or the data modeling links from http://wiki.apache.org/cassandra/ArticlesAndPresentations

Re: comparison with document-oriented databases: the latter usually insert whole documents (typically JSON), whereas in Cassandra you can address individual columns or supercolumns and update them individually, i.e. they work at a different level of granularity. Each column has its own timestamp / version (used to reconcile updates across the distributed cluster).

Cassandra column values are just bytes, but they can be typed as ASCII, UTF8 text, numbers, dates, etc.
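To make the per-column timestamp idea concrete, here is a simplified sketch of "newest timestamp wins" reconciliation between two replicas of one row. The function names and replica data are hypothetical, and tie-breaking on equal timestamps is deliberately left out, so treat it as an illustration of the granularity rather than a description of Cassandra's internals.

```python
def write_column(row, name, value, timestamp):
    """Apply a write to one column only; the newest timestamp wins.

    `row` maps column name -> (value, timestamp). Other columns in the
    row are never touched, which is the point of column-level updates.
    """
    current = row.get(name)
    if current is None or timestamp > current[1]:
        row[name] = (value, timestamp)


def reconcile(replica_a, replica_b):
    """Merge two replicas of the same row, column by column."""
    merged = {}
    for replica in (replica_a, replica_b):
        for name, (value, ts) in replica.items():
            write_column(merged, name, value, ts)
    return merged


# Two replicas that each missed one of the other's updates.
replica_a = {"email": ("old@example.com", 1), "city": ("Oslo", 5)}
replica_b = {"email": ("new@example.com", 3)}

merged = reconcile(replica_a, replica_b)
print(merged)
```

A document store that replaces whole documents would have to pick one replica's document or the other; here each column is resolved on its own.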

Of course, you could use Cassandra as a primitive document store by including columns that contain JSON - but you wouldn't get all of the functionality of a true document-oriented store.







The main difference is that document stores (e.g. MongoDB and CouchDB) allow documents of arbitrary complexity, i.e. sub-documents within sub-documents, lists of documents, etc., while column stores (e.g. Cassandra and HBase) only allow a fixed format, e.g. strict one-level or two-level dictionaries.


For inserts, to borrow RDBMS terms, document-based stores are more consistent and straightforward. Note that Cassandra lets you achieve consistency through the notion of quorum, but that does not apply to all column-based systems, and it reduces availability. For a write-once, read-often system, go for MongoDB. Also consider it if you always plan to read the whole structure of the object: a document-based system is designed to return the whole document when you fetch it, and is not very strong at returning only parts of it.
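The quorum trade-off mentioned above can be put into a little arithmetic. The sketch below uses the usual rule of thumb that reads see the latest write when the read and write replica counts together exceed the replication factor; the function names are made up for illustration and are not part of any client API.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n, w, r):
    """True when every read set overlaps every write set (r + w > n)."""
    return r + w > n


def tolerated_failures(n, required):
    """How many replicas may be down while `required` can still respond."""
    return n - required


n = 3
q = quorum(n)
print(q)                                  # replicas needed per quorum operation
print(reads_see_latest_write(n, q, q))    # quorum writes + quorum reads
print(reads_see_latest_write(n, 1, 1))    # single-replica writes and reads
print(tolerated_failures(n, q), tolerated_failures(n, 1))
```

With three replicas, quorum operations survive one replica being down, while single-replica operations survive two: that difference is the availability cost of the stronger consistency.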

Column-based systems like Cassandra are far better than document-based ones at updates. You can change the value of a column without even reading the row that contains it. The write does not even have to happen on the same server, since a row can be spread across multiple files on multiple servers. For a huge, fast-evolving data system, go for Cassandra. Also consider it if you plan to have a very large chunk of data per key and won't need to load all of it on every query. For selects, Cassandra lets you load only the columns you need.

Also keep in mind that MongoDB is written in C++ and is at its second major release, while Cassandra has to run on a JVM and its first major release has only been at release-candidate stage since yesterday (although the 0.x releases are already running in production at major companies).

On the other hand, Cassandra's design was partly based on Amazon Dynamo, and it is built at its core to be a high-availability solution, although that has nothing to do with the column-based format. MongoDB scales out too, but not as gracefully as Cassandra.




I would say the main difference is how each of these DB types physically stores the data.
For column types, the data is stored in columns that enable efficient aggregation operations / queries for a particular column.
For document types, the entire document is logically stored in one place and generally retrieved as a whole (no efficient aggregation for "columns" / "fields" possible).
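The storage distinction this answer describes can be sketched in memory as follows. The `documents` data is invented, and the layout shown is a generic column-oriented arrangement used to illustrate the argument, not a claim about how any particular database lays out its files.

```python
# Document-style layout: each record is stored (and fetched) as a whole.
documents = [
    {"user": "alice", "amount": 30, "note": "books"},
    {"user": "bob", "amount": 12},
    {"user": "carol", "amount": 8, "note": "coffee"},
]

# Column-style layout: one list per field, aligned by position.
fields = sorted({key for doc in documents for key in doc})
columns = {field: [doc.get(field) for doc in documents] for field in fields}

# Aggregating over documents touches every whole document...
total_from_documents = sum(doc["amount"] for doc in documents)

# ...while the column layout only needs the one list being aggregated.
total_from_column = sum(columns["amount"])

print(columns["amount"])
print(total_from_documents, total_from_column)
```

Both totals agree; the difference is how much unrelated data has to be read to compute them.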

The confusing bit is that a wide-column "row" can easily be represented as a document, but as mentioned it is stored differently and optimized for different purposes.
