Saturday, January 03, 2009

Ten Reasons why CouchDB is better than MySQL

  1. No schema - schemaless db, which means you can develop at the speed of thought; you don't need to do a db update every time you add a column
  2. Everything is over HTTP: simple GET, POST, PUT and DELETE requests, which means it works with varnish/squid out of the box (I prefer varnish because it is clean and simple)
  3. Attachments - you can store file attachments, e.g. your greeting card to grandma can carry images, music, flash
  4. Map Reduce - no more SQL queries, use amazingly scalable map-reduce based views. Views once saved are lightning fast
  5. Sexy Futon javascript interface, comes with a cool js interface for displaying and editing data
  6. javascript server using mozilla spidermonkey to construct views - means no need php in flash
  7. zero config replication - work from home with no internet
  8. python couchdb library
  9. bulk updates, deletes - you can store 100000 docs in one post request
  10. Each couchdb document is just a simple JSON compatible doc, no cruft just simple
  11. (Bonus) Uses Erlang, which means it is scalable for multicore multiprocessor machines
  12. (Extra Bonus) Low memory requirement - takes 150MB compared to 8GB taken by MySQL for a similar db setup
  13. Similar to zodb, but much cleaner and more intuitive
  14. Extremely friendly community and developers - Damien, Jchris, paul davis, noah slater, chris anderson, Jan
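
To make points 2, 7, 9 and 10 a bit more concrete, here is a minimal Python sketch of what "everything is over HTTP" looks like. It only builds the requests and does not send them, so it runs without a server. The helper names (`json_request`, `put_document`, `bulk_docs`, `replicate`) and the `cards` database are made up for illustration, and `BASE_URL` assumes a local CouchDB on its default port 5984.

```python
import json
from urllib.request import Request

# Assumption: a CouchDB instance listening locally on the default port.
BASE_URL = "http://localhost:5984"


def json_request(method, url, payload=None):
    """Build (but do not send) an HTTP request carrying a JSON body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return Request(url, data=data, method=method,
                   headers={"Content-Type": "application/json"})


def put_document(db, doc_id, doc):
    # A document is just a JSON object stored at /<db>/<doc_id>.
    return json_request("PUT", f"{BASE_URL}/{db}/{doc_id}", doc)


def bulk_docs(db, docs):
    # Many documents travel in a single POST to /<db>/_bulk_docs.
    return json_request("POST", f"{BASE_URL}/{db}/_bulk_docs", {"docs": docs})


def replicate(source, target):
    # Replication is triggered by a POST to /_replicate.
    return json_request("POST", f"{BASE_URL}/_replicate",
                        {"source": source, "target": target})


if __name__ == "__main__":
    req = put_document("cards", "grandma", {"to": "grandma", "music": True})
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With a server actually running, each request could be sent with `urllib.request.urlopen(req)`; the same requests could just as well be issued from curl or a browser, which is why HTTP caches can sit in front of the database.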


NextGenSearch said...

> Is anyone using couchdb in production? Not sure if a database running via
> an interpreted language is the way to sort through a lot of records.

First of all, Erlang is more than just an "interpreted" language. It's more
like Java in that it compiles to bytecode. So if Java fits in your idea of
"interpreted" languages then maybe Erlang would too. But Java is pretty fast
these days and I doubt you are gonna find much problem with Erlang in that
respect. Plus it's easy to do distributed in Erlang, as opposed to Java or
C, and that has some definite speed gains also.

> I guess I like the idea of couchdb. How do you relate data, do joins,
> explain queries with it?

That is a subject far larger than an email can really cover. Take a look at
the wiki page on views for the background:

Additionally the how to guides cover some other common questions:

Like for instance hierarchical data in views.

and finally some examples in the view snippets section:
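
As a rough illustration of how views can relate data without SQL joins, here is a plain-Python emulation of the idea. It is not CouchDB's view server (which runs JavaScript map functions) and it ignores CouchDB's real collation rules; the document shapes (`type`, `post_id`, `title`, `text`) and all function names are assumptions made up for this sketch. The trick shown is emitting compound keys so that a post sorts immediately before its own comments.

```python
def map_post_and_comments(doc):
    """Hypothetical map function: emit [post id, 0] for a post and
    [post id, 1] for each comment, so related rows sort together."""
    if doc.get("type") == "post":
        yield [doc["_id"], 0], doc["title"]
    elif doc.get("type") == "comment":
        yield [doc["post_id"], 1], doc["text"]


def build_view(docs, map_fn):
    """Run the map function over every document and sort rows by key."""
    rows = [(key, value) for doc in docs for key, value in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


def query_range(rows, startkey, endkey):
    """Return the rows whose key lies between startkey and endkey, inclusive."""
    return [row for row in rows if startkey <= row[0] <= endkey]


def reduce_count(rows):
    """A trivial reduce step: count the rows it is given."""
    return len(rows)


DOCS = [
    {"_id": "p1", "type": "post", "title": "Hello"},
    {"_id": "c1", "type": "comment", "post_id": "p1", "text": "first"},
    {"_id": "p2", "type": "post", "title": "World"},
    {"_id": "c2", "type": "comment", "post_id": "p2", "text": "second"},
    {"_id": "c3", "type": "comment", "post_id": "p1", "text": "third"},
]

if __name__ == "__main__":
    view = build_view(DOCS, map_post_and_comments)
    # The "join": one key range returns post p1 followed by its comments.
    for key, value in query_range(view, ["p1", 0], ["p1", 1]):
        print(key, value)
```

The design choice is that the relationship lives in the emitted keys rather than in a schema, so one range query over the sorted view plays the role a join would play in SQL.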

puzza007 said...

Also check this paper on Google's Sawzall language used for MapReduce tasks.

Bob said...

These CouchDB features you mention are very nice and are part of what makes it a great up-and-coming technology. But something that needs to be highlighted is that CouchDB is *not* ready for production. There has been zero optimization to date, initial construction of views for large amounts of data is incredibly slow, critical functionality that would allow it to be truthfully described as "distributed" has yet to be implemented, and various other critical parts of its proposed functionality haven't even been planned yet. All of this while its lead developers (with the exception of Damien) merrily try to push CouchDB tangentially away from its defined core competencies in pursuit of e.g. tacking on application server functionality, and the like.

Now, if "production" means an installation that you don't expect to need to scale (at least not anytime soon), then never mind.

Matthew Nielsen said...

I have some observations...

1. -- this is not always good. Writing code without considering and implementing the design of your data storage is a good way to paint yourself into a corner. Building schemas forces you to consider a design from the data-centric point of view too, not just the code point of view.

2. Everything over HTTP? HTTP is actually a very heavy protocol, so for hard-hit installations this could prove to be a severe bottleneck.

3. Most RDBMSs can do this too -- MySQL, specifically, does this in the form of 'blob' columns.

9. MySQL, Postgres and most others can do this too. mysqldump creates files based on that.

10. sqlite is the same way and they would then likely share the same issue -- once you get 8 million rows it gets very, very slow.

12. That's a pretty sweeping generalization: you're not specifying the size of the db, the data in it, how it's used and how much. It's like saying my Kia is better than a Ferrari because the Kia has better gas mileage. It's all about context.