CouchDB

From Cybagora
Revision as of 19:32, 3 January 2015 by Sysop (Talk | contribs)

Jump to: navigation, search

Meet CouchDB

CouchDB is often categorized as a “NoSQL” database, a term that became increasingly popular in late 2009, and early 2010. While this term is a rather generic characterization of a database, or data store, it does clearly define a break from traditional SQL-based databases. A CouchDB database lacks a schema, or rigid pre-defined data structures such as tables. Data stored in CouchDB is a JSON document(s). The structure of the data, or document(s), can change dynamically to accommodate evolving needs.

What it is Not

To better understand what CouchDB is, it may be helpful to understand a few things that CouchDB isn't:

  • A relational database. These differences are articulated above in the Meet CouchDB section, and other portions of this Wiki.
  • A replacement for all databases. When developing and designing a good information system you should select the best tool for the job. While CouchDB can be used in a wide variety of applications you may find that another data store is a better fit for your problem. If you are new to CouchDB, and aren't sure if it's a good fit for your data management problem, please ask others on the mailing list and the #couchdb IRC channel for advice.
  • An object-oriented database. While CouchDB stores JSON objects, it isn't meant to function as a seamless persistence layer for an object-oriented programming language.

Key Characteristics

Let's review some of the basic elements of CouchDB.

Documents

A CouchDB document is a JSON object that consists of named fields. Field values may be strings, numbers, dates, or even ordered lists and associative maps. An example of a document would be a blog post:

{
    "Subject": "I like Plankton",
    "Author": "Rusty",
    "PostedDate": "5/23/2006",
    "Tags": ["plankton", "baseball", "decisions"],
    "Body": "I decided today that I don't like baseball. I like plankton."
}

In the above example document, Subject is a field that contains a single string value "I like plankton". Tags is a field containing the list of values "plankton", "baseball", and "decisions".

A CouchDB database is a flat collection of these documents. Each document is identified by a unique ID.

Views

To address this problem of adding structure back to semi-structured data, CouchDB integrates a view model using JavaScript for description. Views are the method of aggregating and reporting on the documents in a database, and are built on-demand to aggregate, join and report on database documents. Views are built dynamically and don’t affect the underlying document; you can have as many different view representations of the same data as you like. Incremental updates to documents do not require full re-indexing of views.

Schema-Free

Unlike SQL databases, which are designed to store and report on highly structured, interrelated data, CouchDB is designed to store and report on large amounts of semi-structured, document oriented data. CouchDB greatly simplifies the development of document oriented applications, such as collaborative web applications.

In an SQL database, the schema and storage of the existing data must be updated as needs evolve. With CouchDB, no schema is required, so new document types with new meaning can be safely added alongside the old. However, for applications requiring robust validation of new documents custom validation functions are possible. The view engine is designed to easily handle new document types and disparate but similar documents.

Distributed

CouchDB is a peer based distributed database system. Any number of CouchDB hosts (servers and offline-clients) can have independent "replica copies" of the same database, where applications have full database interactivity (query, add, edit, delete). When back online or on a schedule, database changes can be replicated bi-directionally.

CouchDB has built-in conflict detection and management and the replication process is incremental and fast, copying only documents changed since the previous replication. Most applications require no special planning to take advantage of distributed updates and replication.

Unlike cumbersome attempts to bolt distributed features on top of the same legacy models and databases, replication in CouchDB is the result of careful ground-up design, engineering and integration. This replication framework provides a comprehensive set of features:

  • Master → Slave replication
  • Master ↔ Master replication
  • Filtered Replication
  • Incremental and bi-directional replication
  • Conflict management

These replication features can be used in combination to create powerful solutions to many problems in the IT industry. In addition to the fantastic replication features, CouchDB's reliability and scalability is further enhanced by being implemented in the Erlang programming language. Erlang has built-in support for concurrency, distribution, fault tolerance, and has been used for years to build reliable systems in the telecommunications industry. By design, the Erlang language and runtime are able to take advantage of newer hardware with multiple CPU cores.

1. Introduction

CouchDB is a database that completely embraces the web. Store your data with JSON documents. Access your documents with your web browser, via HTTP. Query, combine, and transform your documents with JavaScript. CouchDB works well with modern web and mobile apps. You can even serve web apps directly out of CouchDB. And you can distribute your data, or your apps, efficiently using CouchDB’s incremental replication. CouchDB supports master-master setups with automatic conflict detection.

CouchDB comes with a suite of features, such as on-the-fly document transformation and real-time change notifications, that makes web app development a breeze. It even comes with an easy to use web administration console. You guessed it, served up directly out of CouchDB! We care a lot about distributed scaling. CouchDB is highly available and partition tolerant, but is also eventually consistent. And we care a lot about your data. CouchDB has a fault-tolerant storage engine that puts the safety of your data first.

In this section you’ll learn about every basic bit of CouchDB, see upon what conceptions and technologies it built and walk through short tutorial that teach how to use CouchDB.

1.1. Technical Overview

1.1.1. Document Storage

A CouchDB server hosts named databases, which store documents. Each document is uniquely named in the database, and CouchDB provides a RESTful HTTP API for reading and updating (add, edit, delete) database documents.

Documents are the primary unit of data in CouchDB and consist of any number of fields and attachments. Documents also include metadata that’s maintained by the database system. Document fields are uniquely named and contain values of varying types (text, number, boolean, lists, etc), and there is no set limit to text size or element count.

The CouchDB document update model is lockless and optimistic. Document edits are made by client applications loading documents, applying changes, and saving them back to the database. If another client editing the same document saves their changes first, the client gets an edit conflict error on save. To resolve the update conflict, the latest document version can be opened, the edits reapplied and the update tried again.

Document updates (add, edit, delete) are all or nothing, either succeeding entirely or failing completely. The database never contains partially saved or edited documents.

1.1.2. ACID Properties

The CouchDB file layout and commitment system features all Atomic Consistent Isolated Durable (ACID) properties. On-disk, CouchDB never overwrites committed data or associated structures, ensuring the database file is always in a consistent state. This is a “crash-only” design where the CouchDB server does not go through a shut down process, it’s simply terminated.

Document updates (add, edit, delete) are serialized, except for binary blobs which are written concurrently. Database readers are never locked out and never have to wait on writers or other readers. Any number of clients can be reading documents without being locked out or interrupted by concurrent updates, even on the same document. CouchDB read operations use a Multi-Version Concurrency Control (MVCC) model where each client sees a consistent snapshot of the database from the beginning to the end of the read operation.

Documents are indexed in B-trees by their name (DocID) and a Sequence ID. Each update to a database instance generates a new sequential number. Sequence IDs are used later for incrementally finding changes in a database. These B-tree indexes are updated simultaneously when documents are saved or deleted. The index updates always occur at the end of the file (append-only updates).

Documents have the advantage of data being already conveniently packaged for storage rather than split out across numerous tables and rows in most database systems. When documents are committed to disk, the document fields and metadata are packed into buffers, sequentially one document after another (helpful later for efficient building of views).

When CouchDB documents are updated, all data and associated indexes are flushed to disk and the transactional commit always leaves the database in a completely consistent state. Commits occur in two steps:

  • All document data and associated index updates are synchronously flushed to disk.
  • The updated database header is written in two consecutive, identical chunks to make up the first 4k of the file, and then synchronously flushed to disk.

In the event of an OS crash or power failure during step 1, the partially flushed updates are simply forgotten on restart. If such a crash happens during step 2 (committing the header), a surviving copy of the previous identical headers will remain, ensuring coherency of all previously committed data. Excepting the header area, consistency checks or fix-ups after a crash or a power failure are never necessary.

1.1.3. Compaction

Wasted space is recovered by occasional compaction. On schedule, or when the database file exceeds a certain amount of wasted space, the compaction process clones all the active data to a new file and then discards the old file. The database remains completely online the entire time and all updates and reads are allowed to complete successfully. The old database file is deleted only when all the data has been copied and all users transitioned to the new file.

1.1.4. Views

ACID properties only deal with storage and updates, but we also need the ability to show our data in interesting and useful ways. Unlike SQL databases where data must be carefully decomposed into tables, data in CouchDB is stored in semi-structured documents. CouchDB documents are flexible and each has its own implicit structure, which alleviates the most difficult problems and pitfalls of bi-directionally replicating table schemas and their contained data.

But beyond acting as a fancy file server, a simple document model for data storage and sharing is too simple to build real applications on – it simply doesn’t do enough of the things we want and expect. We want to slice and dice and see our data in many different ways. What is needed is a way to filter, organize and report on data that hasn’t been decomposed into tables.

See also Guide to Views

View Model

To address this problem of adding structure back to unstructured and semi-structured data, CouchDB integrates a view model. Views are the method of aggregating and reporting on the documents in a database, and are built on-demand to aggregate, join and report on database documents. Because views are built dynamically and don’t affect the underlying document, you can have as many different view representations of the same data as you like.

View definitions are strictly virtual and only display the documents from the current database instance, making them separate from the data they display and compatible with replication. CouchDB views are defined inside special design documents and can replicate across database instances like regular documents, so that not only data replicates in CouchDB, but entire application designs replicate too. Javascript View Functions

Views are defined using Javascript functions acting as the map part in a map-reduce system. A view function takes a CouchDB document as an argument and then does whatever computation it needs to do to determine the data that is to be made available through the view, if any. It can add multiple rows to the view based on a single document, or it can add no rows at all.

See also View functions

View Indexes

Views are a dynamic representation of the actual document contents of a database, and CouchDB makes it easy to create useful views of data. But generating a view of a database with hundreds of thousands or millions of documents is time and resource consuming, it’s not something the system should do from scratch each time.

To keep view querying fast, the view engine maintains indexes of its views, and incrementally updates them to reflect changes in the database. CouchDB’s core design is largely optimized around the need for efficient, incremental creation of views and their indexes.

Views and their functions are defined inside special “design” documents, and a design document may contain any number of uniquely named view functions. When a user opens a view and its index is automatically updated, all the views in the same design document are indexed as a single group.

The view builder uses the database sequence ID to determine if the view group is fully up-to-date with the database. If not, the view engine examines the all database documents (in packed sequential order) changed since the last refresh. Documents are read in the order they occur in the disk file, reducing the frequency and cost of disk head seeks.

The views can be read and queried simultaneously while also being refreshed. If a client is slowly streaming out the contents of a large view, the same view can be concurrently opened and refreshed for another client without blocking the first client. This is true for any number of simultaneous client readers, who can read and query the view while the index is concurrently being refreshed for other clients without causing problems for the readers.

As documents are processed by the view engine through your ‘map’ and ‘reduce’ functions, their previous row values are removed from the view indexes, if they exist. If the document is selected by a view function, the function results are inserted into the view as a new row.

When view index changes are written to disk, the updates are always appended at the end of the file, serving to both reduce disk head seek times during disk commits and to ensure crashes and power failures can not cause corruption of indexes. If a crash occurs while updating a view index, the incomplete index updates are simply lost and rebuilt incrementally from its previously committed state.

1.1.5. Security and Validation

To protect who can read and update documents, CouchDB has a simple reader access and update validation model that can be extended to implement custom security models.

See also /db/_security

Administrator Access

CouchDB database instances have administrator accounts. Administrator accounts can create other administrator accounts and update design documents. Design documents are special documents containing view definitions and other special formulas, as well as regular fields and blobs. Update Validation

As documents written to disk, they can be validated dynamically by javascript functions for both security and data validation. When the document passes all the formula validation criteria, the update is allowed to continue. If the validation fails, the update is aborted and the user client gets an error response.

Both the user’s credentials and the updated document are given as inputs to the validation formula, and can be used to implement custom security models by validating a user’s permissions to update a document.

A basic “author only” update document model is trivial to implement, where document updates are validated to check if the user is listed in an “author” field in the existing document. More dynamic models are also possible, like checking a separate user account profile for permission settings.

The update validations are enforced for both live usage and replicated updates, ensuring security and data validation in a shared, distributed system.

See also Validate document update functions

1.1.6. Distributed Updates and Replication

CouchDB is a peer-based distributed database system. It allows users and servers to access and update the same shared data while disconnected. Those changes can then be replicated bi-directionally later.

The CouchDB document storage, view and security models are designed to work together to make true bi-directional replication efficient and reliable. Both documents and designs can replicate, allowing full database applications (including application design, logic and data) to be replicated to laptops for offline use, or replicated to servers in remote offices where slow or unreliable connections make sharing data difficult.

The replication process is incremental. At the database level, replication only examines documents updated since the last replication. Then for each updated document, only fields and blobs that have changed are replicated across the network. If replication fails at any step, due to network problems or crash for example, the next replication restarts at the same document where it left off.

Partial replicas can be created and maintained. Replication can be filtered by a javascript function, so that only particular documents or those meeting specific criteria are replicated. This can allow users to take subsets of a large shared database application offline for their own use, while maintaining normal interaction with the application and that subset of data. Conflicts

Conflict detection and management are key issues for any distributed edit system. The CouchDB storage system treats edit conflicts as a common state, not an exceptional one. The conflict handling model is simple and “non-destructive” while preserving single document semantics and allowing for decentralized conflict resolution.

CouchDB allows for any number of conflicting documents to exist simultaneously in the database, with each database instance deterministically deciding which document is the “winner” and which are conflicts. Only the winning document can appear in views, while “losing” conflicts are still accessible and remain in the database until deleted or purged during database compaction. Because conflict documents are still regular documents, they replicate just like regular documents and are subject to the same security and validation rules.

When distributed edit conflicts occur, every database replica sees the same winning revision and each has the opportunity to resolve the conflict. Resolving conflicts can be done manually or, depending on the nature of the data and the conflict, by automated agents. The system makes decentralized conflict resolution possible while maintaining single document database semantics.

Conflict management continues to work even if multiple disconnected users or agents attempt to resolve the same conflicts. If resolved conflicts result in more conflicts, the system accommodates them in the same manner, determining the same winner on each machine and maintaining single document semantics.

See also Replication and conflict model

Applications

Using just the basic replication model, many traditionally single server database applications can be made distributed with almost no extra work. CouchDB replication is designed to be immediately useful for basic database applications, while also being extendable for more elaborate and full-featured uses.

With very little database work, it is possible to build a distributed document management application with granular security and full revision histories. Updates to documents can be implemented to exploit incremental field and blob replication, where replicated updates are nearly as efficient and incremental as the actual edit differences (“diffs”).

The CouchDB replication model can be modified for other distributed update models. If the storage engine is enhanced to allow multi-document update transactions, it is possible to perform Subversion-like “all or nothing” atomic commits when replicating with an upstream server, such that any single document conflict or validation failure will cause the entire update to fail. Like Subversion, conflicts would be resolved by doing a “pull” replication to force the conflicts locally, then merging and re-replicating to the upstream server.

1.1.7. Implementation

CouchDB is built on the Erlang OTP platform, a functional, concurrent programming language and development platform. Erlang was developed for real-time telecom applications with an extreme emphasis on reliability and availability.

Both in syntax and semantics, Erlang is very different from conventional programming languages like C or Java. Erlang uses lightweight “processes” and message passing for concurrency, it has no shared state threading and all data is immutable. The robust, concurrent nature of Erlang is ideal for a database server.

CouchDB is designed for lock-free concurrency, in the conceptual model and the actual Erlang implementation. Reducing bottlenecks and avoiding locks keeps the entire system working predictably under heavy loads. CouchDB can accommodate many clients replicating changes, opening and updating documents, and querying views whose indexes are simultaneously being refreshed for other clients, without needing locks.

For higher availability and more concurrent users, CouchDB is designed for “shared nothing” clustering. In a “shared nothing” cluster, each machine is independent and replicates data with its cluster mates, allowing individual server failures with zero downtime. And because consistency scans and fix-ups aren’t needed on restart, if the entire cluster fails – due to a power outage in a datacenter, for example – the entire CouchDB distributed system becomes immediately available after a restart.

CouchDB is built from the start with a consistent vision of a distributed document database system. Unlike cumbersome attempts to bolt distributed features on top of the same legacy models and databases, it is the result of careful ground-up design, engineering and integration. The document, view, security and replication models, the special purpose query language, the efficient and robust disk layout and the concurrent and reliable nature of the Erlang platform are all carefully integrated for a reliable and efficient system.

1.2. Why CouchDB?

Apache CouchDB is one of a new breed of database management systems. This topic explains why there’s a need for new systems as well as the motivations behind building CouchDB.

As CouchDB developers, we’re naturally very excited to be using CouchDB. In this topic we’ll share with you the reasons for our enthusiasm. We’ll show you how CouchDB’s schema-free document model is a better fit for common applications, how the built-in query engine is a powerful way to use and process your data, and how CouchDB’s design lends itself to modularization and scalability. 1.2.1. Relax

If there’s one word to describe CouchDB, it is relax. It is the byline to CouchDB’s official logo and when you start CouchDB, you see:

Apache CouchDB has started. Time to relax.

Why is relaxation important? Developer productivity roughly doubled in the last five years. The chief reason for the boost is more powerful tools that are easier to use. Take Ruby on Rails as an example. It is an infinitely complex framework, but it’s easy to get started with. Rails is a success story because of the core design focus on ease of use. This is one reason why CouchDB is relaxing: learning CouchDB and understanding its core concepts should feel natural to most everybody who has been doing any work on the Web. And it is still pretty easy to explain to non-technical people.

Getting out of the way when creative people try to build specialized solutions is in itself a core feature and one thing that CouchDB aims to get right. We found existing tools too cumbersome to work with during development or in production, and decided to focus on making CouchDB easy, even a pleasure, to use.

Another area of relaxation for CouchDB users is the production setting. If you have a live running application, CouchDB again goes out of its way to avoid troubling you. Its internal architecture is fault-tolerant, and failures occur in a controlled environment and are dealt with gracefully. Single problems do not cascade through an entire server system but stay isolated in single requests.

CouchDB’s core concepts are simple (yet powerful) and well understood. Operations teams (if you have a team; otherwise, that’s you) do not have to fear random behavior and untraceable errors. If anything should go wrong, you can easily find out what the problem is, but these situations are rare.

CouchDB is also designed to handle varying traffic gracefully. For instance, if a website is experiencing a sudden spike in traffic, CouchDB will generally absorb a lot of concurrent requests without falling over. It may take a little more time for each request, but they all get answered. When the spike is over, CouchDB will work with regular speed again.

The third area of relaxation is growing and shrinking the underlying hardware of your application. This is commonly referred to as scaling. CouchDB enforces a set of limits on the programmer. On first look, CouchDB might seem inflexible, but some features are left out by design for the simple reason that if CouchDB supported them, it would allow a programmer to create applications that couldn’t deal with scaling up or down.

Note

CouchDB doesn’t let you do things that would get you in trouble later on. This sometimes means you’ll have to unlearn best practices you might have picked up in your current or past work. 1.2.2. A Different Way to Model Your Data

We believe that CouchDB will drastically change the way you build document-based applications. CouchDB combines an intuitive document storage model with a powerful query engine in a way that’s so simple you’ll probably be tempted to ask, “Why has no one built something like this before?”

   Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. CouchDB makes Django look old-school in the same way that Django makes ASP look outdated.
   —Jacob Kaplan-Moss, Django developer

CouchDB’s design borrows heavily from web architecture and the concepts of resources, methods, and representations. It augments this with powerful ways to query, map, combine, and filter your data. Add fault tolerance, extreme scalability, and incremental replication, and CouchDB defines a sweet spot for document databases. 1.2.3. A Better Fit for Common Applications

We write software to improve our lives and the lives of others. Usually this involves taking some mundane information such as contacts, invoices, or receipts and manipulating it using a computer application. CouchDB is a great fit for common applications like this because it embraces the natural idea of evolving, self-contained documents as the very core of its data model. Self-Contained Data

An invoice contains all the pertinent information about a single transaction the seller, the buyer, the date, and a list of the items or services sold. As shown in Figure 1. Self-contained documents, there’s no abstract reference on this piece of paper that points to some other piece of paper with the seller’s name and address. Accountants appreciate the simplicity of having everything in one place. And given the choice, programmers appreciate that, too. Self-contained documents

Figure 1. Self-contained documents

Yet using references is exactly how we model our data in a relational database! Each invoice is stored in a table as a row that refers to other rows in other tables one row for seller information, one for the buyer, one row for each item billed, and more rows still to describe the item details, manufacturer details, and so on and so forth.

This isn’t meant as a detraction of the relational model, which is widely applicable and extremely useful for a number of reasons. Hopefully, though, it illustrates the point that sometimes your model may not “fit” your data in the way it occurs in the real world.

Let’s take a look at the humble contact database to illustrate a different way of modeling data, one that more closely “fits” its real-world counterpart – a pile of business cards. Much like our invoice example, a business card contains all the important information, right there on the cardstock. We call this “self-contained” data, and it’s an important concept in understanding document databases like CouchDB. Syntax and Semantics

Most business cards contain roughly the same information – someone’s identity, an affiliation, and some contact information. While the exact form of this information can vary between business cards, the general information being conveyed remains the same, and we’re easily able to recognize it as a business card. In this sense, we can describe a business card as a real-world document.

Jan’s business card might contain a phone number but no fax number, whereas J. Chris’s business card contains both a phone and a fax number. Jan does not have to make his lack of a fax machine explicit by writing something as ridiculous as “Fax: None” on the business card. Instead, simply omitting a fax number implies that he doesn’t have one.

We can see that real-world documents of the same type, such as business cards, tend to be very similar in semantics – the sort of information they carry, but can vary hugely in syntax, or how that information is structured. As human beings, we’re naturally comfortable dealing with this kind of variation.

While a traditional relational database requires you to model your data up front, CouchDB’s schema-free design unburdens you with a powerful way to aggregate your data after the fact, just like we do with real-world documents. We’ll look in depth at how to design applications with this underlying storage paradigm. 1.2.4. Building Blocks for Larger Systems

CouchDB is a storage system useful on its own. You can build many applications with the tools CouchDB gives you. But CouchDB is designed with a bigger picture in mind. Its components can be used as building blocks that solve storage problems in slightly different ways for larger and more complex systems.

Whether you need a system that’s crazy fast but isn’t too concerned with reliability (think logging), or one that guarantees storage in two or more physically separated locations for reliability, but you’re willing to take a performance hit, CouchDB lets you build these systems.

There are a multitude of knobs you could turn to make a system work better in one area, but you’ll affect another area when doing so. One example would be the CAP theorem discussed in Eventual Consistency. To give you an idea of other things that affect storage systems, see Figure 2 and Figure 3.

By reducing latency for a given system (and that is true not only for storage systems), you affect concurrency and throughput capabilities. Throughput, latency, or concurrency

Figure 2. Throughput, latency, or concurrency Scaling: read requests, write requests, or data

Figure 3. Scaling: read requests, write requests, or data

When you want to scale out, there are three distinct issues to deal with: scaling read requests, write requests, and data. Orthogonal to all three and to the items shown in Figure 2 and Figure 3 are many more attributes like reliability or simplicity. You can draw many of these graphs that show how different features or attributes pull into different directions and thus shape the system they describe.

CouchDB is very flexible and gives you enough building blocks to create a system shaped to suit your exact problem. That’s not saying that CouchDB can be bent to solve any problem – CouchDB is no silver bullet – but in the area of data storage, it can get you a long way. 1.2.5. CouchDB Replication

CouchDB replication is one of these building blocks. Its fundamental function is to synchronize two or more CouchDB databases. This may sound simple, but the simplicity is key to allowing replication to solve a number of problems: reliably synchronize databases between multiple machines for redundant data storage; distribute data to a cluster of CouchDB instances that share a subset of the total number of requests that hit the cluster (load balancing); and distribute data between physically distant locations, such as one office in New York and another in Tokyo.

CouchDB replication uses the same REST API all clients use. HTTP is ubiquitous and well understood. Replication works incrementally; that is, if during replication anything goes wrong, like dropping your network connection, it will pick up where it left off the next time it runs. It also only transfers data that is needed to synchronize databases.

A core assumption CouchDB makes is that things can go wrong, like network connection troubles, and it is designed for graceful error recovery instead of assuming all will be well. The replication system’s incremental design shows that best. The ideas behind “things that can go wrong” are embodied in the Fallacies of Distributed Computing:

   The network is reliable.
   Latency is zero.
   Bandwidth is infinite.
   The network is secure.
   Topology doesn’t change.
   There is one administrator.
   Transport cost is zero.
   The network is homogeneous.

Existing tools often try to hide the fact that there is a network and that any or all of the previous conditions don’t exist for a particular system. This usually results in fatal error scenarios when something finally goes wrong. In contrast, CouchDB doesn’t try to hide the network; it just handles errors gracefully and lets you know when actions on your end are required. 1.2.6. Local Data Is King

CouchDB takes quite a few lessons learned from the Web, but there is one thing that could be improved about the Web: latency. Whenever you have to wait for an application to respond or a website to render, you almost always wait for a network connection that isn’t as fast as you want it at that point. Waiting a few seconds instead of milliseconds greatly affects user experience and thus user satisfaction.

What do you do when you are offline? This happens all the time – your DSL or cable provider has issues, or your iPhone, G1, or Blackberry has no bars, and no connectivity means no way to get to your data.

CouchDB can solve this scenario as well, and this is where scaling is important again. This time it is scaling down. Imagine CouchDB installed on phones and other mobile devices that can synchronize data with centrally hosted CouchDBs when they are on a network. The synchronization is not bound by user interface constraints like subsecond response times. It is easier to tune for high bandwidth and higher latency than for low bandwidth and very low latency. Mobile applications can then use the local CouchDB to fetch data, and since no remote networking is required for that, latency is low by default.

Can you really use CouchDB on a phone? Erlang, CouchDB’s implementation language has been designed to run on embedded devices magnitudes smaller and less powerful than today’s phones. 1.2.7. Wrapping Up

The next document Eventual Consistency further explores the distributed nature of CouchDB. We should have given you enough bites to whet your interest. Let’s go!

1.3. Eventual Consistency

In the previous document Why CouchDB?, we saw that CouchDB’s flexibility allows us to evolve our data as our applications grow and change. In this topic, we’ll explore how working “with the grain” of CouchDB promotes simplicity in our applications and helps us naturally build scalable, distributed systems. 1.3.1. Working with the Grain

A distributed system is a system that operates robustly over a wide network. A particular feature of network computing is that network links can potentially disappear, and there are plenty of strategies for managing this type of network segmentation. CouchDB differs from others by accepting eventual consistency, as opposed to putting absolute consistency ahead of raw availability, like RDBMS or Paxos. What these systems have in common is an awareness that data acts differently when many people are accessing it simultaneously. Their approaches differ when it comes to which aspects of consistency, availability, or partition tolerance they prioritize.

Engineering distributed systems is tricky. Many of the caveats and “gotchas” you will face over time aren’t immediately obvious. We don’t have all the solutions, and CouchDB isn’t a panacea, but when you work with CouchDB’s grain rather than against it, the path of least resistance leads you to naturally scalable applications.

Of course, building a distributed system is only the beginning. A website with a database that is available only half the time is next to worthless. Unfortunately, the traditional relational database approach to consistency makes it very easy for application programmers to rely on global state, global clocks, and other high availability no-nos, without even realizing that they’re doing so. Before examining how CouchDB promotes scalability, we’ll look at the constraints faced by a distributed system. After we’ve seen the problems that arise when parts of your application can’t rely on being in constant contact with each other, we’ll see that CouchDB provides an intuitive and useful way for modeling applications around high availability. 1.3.2. The CAP Theorem

The CAP theorem describes a few different strategies for distributing application logic across networks. CouchDB’s solution uses replication to propagate application changes across participating nodes. This is a fundamentally different approach from consensus algorithms and relational databases, which operate at different intersections of consistency, availability, and partition tolerance.

The CAP theorem, shown in Figure 1. The CAP theorem, identifies three distinct concerns:

   Consistency: All database clients see the same data, even with concurrent updates.
   Availability: All database clients are able to access some version of the data.
   Partition tolerance: The database can be split over multiple servers.

Pick two. The CAP theorem

Figure 1. The CAP theorem

When a system grows large enough that a single database node is unable to handle the load placed on it, a sensible solution is to add more servers. When we add nodes, we have to start thinking about how to partition data between them. Do we have a few databases that share exactly the same data? Do we put different sets of data on different database servers? Do we let only certain database servers write data and let others handle the reads?

Regardless of which approach we take, the one problem we’ll keep bumping into is that of keeping all these database servers in sync. If you write some information to one node, how are you going to make sure that a read request to another database server reflects this newest information? These events might be milliseconds apart. Even with a modest collection of database servers, this problem can become extremely complex.

When it’s absolutely critical that all clients see a consistent view of the database, the users of one node will have to wait for any other nodes to come into agreement before being able to read or write to the database. In this instance, we see that availability takes a backseat to consistency. However, there are situations where availability trumps consistency:

   Each node in a system should be able to make decisions purely based on local state. If you need to do something under high load with failures occurring and you need to reach agreement, you’re lost. If you’re concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck. Take that as a given.
       – Werner Vogels, Amazon CTO and Vice President

If availability is a priority, we can let clients write data to one node of the database without waiting for other nodes to come into agreement. If the database knows how to take care of reconciling these operations between nodes, we achieve a sort of “eventual consistency” in exchange for high availability. This is a surprisingly applicable trade-off for many applications.

Unlike traditional relational databases, where each action performed is necessarily subject to database-wide consistency checks, CouchDB makes it really simple to build applications that sacrifice immediate consistency for the huge performance improvements that come with simple distribution. 1.3.3. Local Consistency

Before we attempt to understand how CouchDB operates in a cluster, it’s important that we understand the inner workings of a single CouchDB node. The CouchDB API is designed to provide a convenient but thin wrapper around the database core. By taking a closer look at the structure of the database core, we’ll have a better understanding of the API that surrounds it. The Key to Your Data

At the heart of CouchDB is a powerful B-tree storage engine. A B-tree is a sorted data structure that allows for searches, insertions, and deletions in logarithmic time. As Figure 2. Anatomy of a view request illustrates, CouchDB uses this B-tree storage engine for all internal data, documents, and views. If we understand one, we will understand them all. Anatomy of a view request

Figure 2. Anatomy of a view request

CouchDB uses MapReduce to compute the results of a view. MapReduce makes use of two functions, “map” and “reduce”, which are applied to each document in isolation. Being able to isolate these operations means that view computation lends itself to parallel and incremental computation. More important, because these functions produce key/value pairs, CouchDB is able to insert them into the B-tree storage engine, sorted by key. Lookups by key, or key range, are extremely efficient operations with a B-tree, described in big O notation as O(log N) and O(log N + K), respectively.

In CouchDB, we access documents and view results by key or key range. This is a direct mapping to the underlying operations performed on CouchDB’s B-tree storage engine. Along with document inserts and updates, this direct mapping is the reason we describe CouchDB’s API as being a thin wrapper around the database core.

Being able to access results by key alone is a very important restriction because it allows us to make huge performance gains. As well as the massive speed improvements, we can partition our data over multiple nodes, without affecting our ability to query each node in isolation. BigTable, Hadoop, SimpleDB, and memcached restrict object lookups by key for exactly these reasons. No Locking

A table in a relational database is a single data structure. If you want to modify a table – say, update a row – the database system must ensure that nobody else is trying to update that row and that nobody can read from that row while it is being updated. The common way to handle this uses what’s known as a lock. If multiple clients want to access a table, the first client gets the lock, making everybody else wait. When the first client’s request is processed, the next client is given access while everybody else waits, and so on. This serial execution of requests, even when they arrived in parallel, wastes a significant amount of your server’s processing power. Under high load, a relational database can spend more time figuring out who is allowed to do what, and in which order, than it does doing any actual work.

Note

Modern relational databases avoid locks by implementing MVCC under the hood, but hide it from the end user, requiring them to coordinate concurrent changes of single rows or fields.

Instead of locks, CouchDB uses Multi-Version Concurrency Control (MVCC) to manage concurrent access to the database. Figure 3. MVCC means no locking illustrates the differences between MVCC and traditional locking mechanisms. MVCC means that CouchDB can run at full speed, all the time, even under high load. Requests are run in parallel, making excellent use of every last drop of processing power your server has to offer. MVCC means no locking

Figure 3. MVCC means no locking

Documents in CouchDB are versioned, much like they would be in a regular version control system such as Subversion. If you want to change a value in a document, you create an entire new version of that document and save it over the old one. After doing this, you end up with two versions of the same document, one old and one new.

How does this offer an improvement over locks? Consider a set of requests wanting to access a document. The first request reads the document. While this is being processed, a second request changes the document. Since the second request includes a completely new version of the document, CouchDB can simply append it to the database without having to wait for the read request to finish.

When a third request wants to read the same document, CouchDB will point it to the new version that has just been written. During this whole process, the first request could still be reading the original version.

A read request will always see the most recent snapshot of your database at the time of the beginning of the request. 1.3.4. Validation

As application developers, we have to think about what sort of input we should accept and what we should reject. The expressive power to do this type of validation over complex data within a traditional relational database leaves a lot to be desired. Fortunately, CouchDB provides a powerful way to perform per-document validation from within the database.

CouchDB can validate documents using JavaScript functions similar to those used for MapReduce. Each time you try to modify a document, CouchDB will pass the validation function a copy of the existing document, a copy of the new document, and a collection of additional information, such as user authentication details. The validation function now has the opportunity to approve or deny the update.

By working with the grain and letting CouchDB do this for us, we save ourselves a tremendous amount of CPU cycles that would otherwise have been spent serializing object graphs from SQL, converting them into domain objects, and using those objects to do application-level validation. 1.3.5. Distributed Consistency

Maintaining consistency within a single database node is relatively easy for most databases. The real problems start to surface when you try to maintain consistency between multiple database servers. If a client makes a write operation on server A, how do we make sure that this is consistent with server B, or C, or D? For relational databases, this is a very complex problem with entire books devoted to its solution. You could use multi-master, master/slave, partitioning, sharding, write-through caches, and all sorts of other complex techniques. 1.3.6. Incremental Replication

CouchDB’s operations take place within the context of a single document. As CouchDB achieves eventual consistency between multiple databases by using incremental replication you no longer have to worry about your database servers being able to stay in constant communication. Incremental replication is a process where document changes are periodically copied between servers. We are able to build what’s known as a shared nothing cluster of databases where each node is independent and self-sufficient, leaving no single point of contention across the system.

Need to scale out your CouchDB database cluster? Just throw in another server.

As illustrated in Figure 4. Incremental replication between CouchDB nodes, with CouchDB’s incremental replication, you can synchronize your data between any two databases however you like and whenever you like. After replication, each database is able to work independently.

You could use this feature to synchronize database servers within a cluster or between data centers using a job scheduler such as cron, or you could use it to synchronize data with your laptop for offline work as you travel. Each database can be used in the usual fashion, and changes between databases can be synchronized later in both directions. Incremental replication between CouchDB nodes

Figure 4. Incremental replication between CouchDB nodes

What happens when you change the same document in two different databases and want to synchronize these with each other? CouchDB’s replication system comes with automatic conflict detection and resolution. When CouchDB detects that a document has been changed in both databases, it flags this document as being in conflict, much like they would be in a regular version control system.

This isn’t as troublesome as it might first sound. When two versions of a document conflict during replication, the winning version is saved as the most recent version in the document’s history. Instead of throwing the losing version away, as you might expect, CouchDB saves this as a previous version in the document’s history, so that you can access it if you need to. This happens automatically and consistently, so both databases will make exactly the same choice.

It is up to you to handle conflicts in a way that makes sense for your application. You can leave the chosen document versions in place, revert to the older version, or try to merge the two versions and save the result. 1.3.7. Case Study

Greg Borenstein, a friend and coworker, built a small library for converting Songbird playlists to JSON objects and decided to store these in CouchDB as part of a backup application. The completed software uses CouchDB’s MVCC and document revisions to ensure that Songbird playlists are backed up robustly between nodes.

Note

Songbird is a free software media player with an integrated web browser, based on the Mozilla XULRunner platform. Songbird is available for Microsoft Windows, Apple Mac OS X, Solaris, and Linux.

Let’s examine the workflow of the Songbird backup application, first as a user backing up from a single computer, and then using Songbird to synchronize playlists between multiple computers. We’ll see how document revisions turn what could have been a hairy problem into something that just works.

The first time we use this backup application, we feed our playlists to the application and initiate a backup. Each playlist is converted to a JSON object and handed to a CouchDB database. As illustrated in Figure 5. Backing up to a single database, CouchDB hands back the document ID and revision of each playlist as it’s saved to the database. Backing up to a single database

Figure 5. Backing up to a single database

After a few days, we find that our playlists have been updated and we want to back up our changes. After we have fed our playlists to the backup application, it fetches the latest versions from CouchDB, along with the corresponding document revisions. When the application hands back the new playlist document, CouchDB requires that the document revision is included in the request.

CouchDB then makes sure that the document revision handed to it in the request matches the current revision held in the database. Because CouchDB updates the revision with every modification, if these two are out of sync it suggests that someone else has made changes to the document between the time we requested it from the database and the time we sent our updates. Making changes to a document after someone else has modified it without first inspecting those changes is usually a bad idea.

Forcing clients to hand back the correct document revision is the heart of CouchDB’s optimistic concurrency.

We have a laptop we want to keep synchronized with our desktop computer. With all our playlists on our desktop, the first step is to “restore from backup” onto our laptop. This is the first time we’ve done this, so afterward our laptop should hold an exact replica of our desktop playlist collection.

After editing our Argentine Tango playlist on our laptop to add a few new songs we’ve purchased, we want to save our changes. The backup application replaces the playlist document in our laptop CouchDB database and a new document revision is generated. A few days later, we remember our new songs and want to copy the playlist across to our desktop computer. As illustrated in Figure 6. Synchronizing between two databases, the backup application copies the new document and the new revision to the desktop CouchDB database. Both CouchDB databases now have the same document revision. Synchronizing between two databases

Figure 6. Synchronizing between two databases

Because CouchDB tracks document revisions, it ensures that updates like these will work only if they are based on current information. If we had made modifications to the playlist backups between synchronization, things wouldn’t go as smoothly.

We back up some changes on our laptop and forget to synchronize. A few days later, we’re editing playlists on our desktop computer, make a backup, and want to synchronize this to our laptop. As illustrated in Figure 7. Synchronization conflicts between two databases, when our backup application tries to replicate between the two databases, CouchDB sees that the changes being sent from our desktop computer are modifications of out-of-date documents and helpfully informs us that there has been a conflict.

Recovering from this error is easy to accomplish from an application perspective. Just download CouchDB’s version of the playlist and provide an opportunity to merge the changes or save local modifications into a new playlist. Synchronization conflicts between two databases

Figure 7. Synchronization conflicts between two databases 1.3.8. Wrapping Up

CouchDB’s design borrows heavily from web architecture and the lessons learned deploying massively distributed systems on that architecture. By understanding why this architecture works the way it does, and by learning to spot which parts of your application can be easily distributed and which parts cannot, you’ll enhance your ability to design distributed and scalable applications, with CouchDB or without it.

We’ve covered the main issues surrounding CouchDB’s consistency model and hinted at some of the benefits to be had when you work with CouchDB and not against it. But enough theory – let’s get up and running and see what all the fuss is about!

1.4. Getting Started

In this document, we’ll take a quick tour of CouchDB’s features, familiarizing ourselves with Futon, the built-in administration interface. We’ll create our first document and experiment with CouchDB views. 1.4.1. All Systems Are Go!

We’ll have a very quick look at CouchDB’s bare-bones Application Programming Interface (API) by using the command-line utility curl. Please note that this is not the only way of talking to CouchDB. We will show you plenty more throughout the rest of the documents. What’s interesting about curl is that it gives you control over raw HTTP requests, and you can see exactly what is going on “underneath the hood” of your database.

Make sure CouchDB is still running, and then do:

curl http://127.0.0.1:5984/

This issues a GET request to your newly installed CouchDB instance.

The reply should look something like:

{

 "couchdb": "Welcome",
 "uuid": "85fb71bf700c17267fef77535820e371",
 "version": "1.4.0",
 "vendor": {
     "version": "1.4.0",
     "name": "The Apache Software Foundation"
 }

}

Not all that spectacular. CouchDB is saying “hello” with the running version number.

Next, we can get a list of databases:

curl -X GET http://127.0.0.1:5984/_all_dbs

All we added to the previous request is the _all_dbs string.

The response should look like:

["_replicator","_users"]

Oh, that’s right, we didn’t create any databases yet! All we see is an empty list.

Note

The curl command issues GET requests by default. You can issue POST requests using curl -X POST. To make it easy to work with our terminal history, we usually use the -X option even when issuing GET requests. If we want to send a POST next time, all we have to change is the method.

HTTP does a bit more under the hood than you can see in the examples here. If you’re interested in every last detail that goes over the wire, pass in the -v option (e.g., curl -vX GET), which will show you the server curl tries to connect to, the request headers it sends, and response headers it receives back. Great for debugging!

Let’s create a database:

curl -X PUT http://127.0.0.1:5984/baseball

CouchDB will reply with:

{"ok":true}

Retrieving the list of databases again shows some useful results this time:

curl -X GET http://127.0.0.1:5984/_all_dbs

["baseball"]

Note

We should mention JavaScript Object Notation (JSON) here, the data format CouchDB speaks. JSON is a lightweight data interchange format based on JavaScript syntax. Because JSON is natively compatible with JavaScript, your web browser is an ideal client for CouchDB.

Brackets ([]) represent ordered lists, and curly braces ({}) represent key/value dictionaries. Keys must be strings, delimited by quotes ("), and values can be strings, numbers, booleans, lists, or key/value dictionaries. For a more detailed description of JSON, see Appendix E, JSON Primer.

Let’s create another database:

curl -X PUT http://127.0.0.1:5984/baseball

CouchDB will reply with:

{"error":"file_exists","reason":"The database could not be created, the file already exists."}

We already have a database with that name, so CouchDB will respond with an error. Let’s try again with a different database name:

curl -X PUT http://127.0.0.1:5984/plankton

CouchDB will reply with:

{"ok":true}

Retrieving the list of databases yet again shows some useful results:

curl -X GET http://127.0.0.1:5984/_all_dbs

CouchDB will respond with:

["baseball", "plankton"]

To round things off, let’s delete the second database:

curl -X DELETE http://127.0.0.1:5984/plankton

CouchDB will reply with:

{"ok":true}

The list of databases is now the same as it was before:

curl -X GET http://127.0.0.1:5984/_all_dbs

CouchDB will respond with:

["baseball"]

For brevity, we’ll skip working with documents, as the next section covers a different and potentially easier way of working with CouchDB that should provide experience with this. As we work through the example, keep in mind that “under the hood” everything is being done by the application exactly as you have been doing here manually. Everything is done using GET, PUT, POST, and DELETE with a URI. 1.4.2. Welcome to Futon

After having seen CouchDB’s raw API, let’s get our feet wet by playing with Futon, the built-in administration interface. Futon provides full access to all of CouchDB’s features and makes it easy to work with some of the more complex ideas involved. With Futon we can create and destroy databases; view and edit documents; compose and run MapReduce views; and trigger replication between databases.

To load Futon in your browser, visit:

http://127.0.0.1:5984/_utils/

If you’re running version 0.9 or later, you should see something similar to Figure 1. The Futon welcome screen. In later documents, we’ll focus on using CouchDB from server-side languages such as Ruby and Python. As such, this document is a great opportunity to showcase an example of natively serving up a dynamic web application using nothing more than CouchDB’s integrated web server, something you may wish to do with your own applications.

The first thing we should do with a fresh installation of CouchDB is run the test suite to verify that everything is working properly. This assures us that any problems we may run into aren’t due to bothersome issues with our setup. By the same token, failures in the Futon test suite are a red flag, telling us to double-check our installation before attempting to use a potentially broken database server, saving us the confusion when nothing seems to be working quite like we expect! The Futon welcome screen

Figure 1. The Futon welcome screen

Some common network configurations cause the replication test to fail when accessed via the localhost address. You can fix this by accessing CouchDB via 127.0.0.1, e.g. http://127.0.0.1:5984/_utils/.

Navigate to the test suite by clicking “Test Suite” on the Futon sidebar, then click “run all” at the top to kick things off. Figure 2. The Futon test suite running some tests shows the Futon test suite running some tests. The Futon test suite running some tests

Figure 2. The Futon test suite running some tests

Because the test suite is run from the browser, not only does it test that CouchDB is functioning properly, it also verifies that your browser’s connection to the database is properly configured, which can be very handy for diagnosing misbehaving proxies or other HTTP middleware.

If the test suite has an inordinate number of failures, you’ll need to see the troubleshooting section in Appendix D, Installing from Source for the next steps to fix your installation.

Now that the test suite is finished, you’ve verified that your CouchDB installation is successful and you’re ready to see what else Futon has to offer. 1.4.3. Your First Database and Document

Creating a database in Futon is simple. From the overview page, click “Create Database.” When asked for a name, enter hello-world and click the Create button.

After your database has been created, Futon will display a list of all its documents. This list will start out empty (Figure 3. An empty database in Futon), so let’s create our first document. Click the “New Document” link and then the Create button in the pop up. Make sure to leave the document ID blank, and CouchDB will generate a UUID for you.

For demoing purposes, having CouchDB assign a UUID is fine. When you write your first programs, we recommend assigning your own UUIDs. If your rely on the server to generate the UUID and you end up making two POST requests because the first POST request bombed out, you might generate two docs and never find out about the first one because only the second one will be reported back. Generating your own UUIDs makes sure that you’ll never end up with duplicate documents.

Futon will display the newly created document, with its _id and _rev as the only fields. To create a new field, click the “Add Field” button. We’ll call the new field hello. Click the green check icon (or hit the Enter key) to finalize creating the hello field. Double-click the hello field’s value (default null) to edit it.

You can experiment with other JSON values; e.g., [1, 2, "c"] or {"foo": "bar"}. Once you’ve entered your values into the document, make a note of its _rev attribute and click “Save Document.” The result should look like Figure 4. A “hello world” document in Futon. An empty database in Futon

Figure 3. An empty database in Futon A "hello world" document in Futon

Figure 4. A “hello world” document in Futon

You’ll notice that the document’s _rev has changed. We’ll go into more detail about this in later documents, but for now, the important thing to note is that _rev acts like a safety feature when saving a document. As long as you and CouchDB agree on the most recent _rev of a document, you can successfully save your changes.

Futon also provides a way to display the underlying JSON data, which can be more compact and easier to read, depending on what sort of data you are dealing with. To see the JSON version of our “hello world” document, click the Source tab. The result should look like Figure 5. The JSON source of a “hello world” document in Futon. The JSON source of a "hello world" document in Futon

Figure 5. The JSON source of a “hello world” document in Futon 1.4.4. Running a Query Using MapReduce

Traditional relational databases allow you to run any queries you like as long as your data is structured correctly. In contrast, CouchDB uses predefined map and reduce functions in a style known as MapReduce. These functions provide great flexibility because they can adapt to variations in document structure, and indexes for each document can be computed independently and in parallel. The combination of a map and a reduce function is called a view in CouchDB terminology.

For experienced relational database programmers, MapReduce can take some getting used to. Rather than declaring which rows from which tables to include in a result set and depending on the database to determine the most efficient way to run the query, reduce queries are based on simple range requests against the indexes generated by your map functions.

Map functions are called once with each document as the argument. The function can choose to skip the document altogether or emit one or more view rows as key/value pairs. Map functions may not depend on any information outside of the document. This independence is what allows CouchDB views to be generated incrementally and in parallel.

CouchDB views are stored as rows that are kept sorted by key. This makes retrieving data from a range of keys efficient even when there are thousands or millions of rows. When writing CouchDB map functions, your primary goal is to build an index that stores related data under nearby keys.

Before we can run an example MapReduce view, we’ll need some data to run it on. We’ll create documents carrying the price of various supermarket items as found at different shops. Let’s create documents for apples, oranges, and bananas. (Allow CouchDB to generate the _id and _rev fields.) Use Futon to create documents that have a final JSON structure that looks like this:

{

"_id": "00a271787f89c0ef2e10e88a0c0001f4",
"_rev": "1-2628a75ac8c3abfffc8f6e30c9949fd6",
"item": "apple",
"prices": {
    "Fresh Mart": 1.59,
    "Price Max": 5.99,
    "Apples Express": 0.79
}

}

This document should look like Figure 6. An example document with apple prices in Futon when entered into Futon. An example document with apple prices in Futon

Figure 6. An example document with apple prices in Futon

OK, now that that’s done, let’s create the document for oranges:

{

"_id": "00a271787f89c0ef2e10e88a0c0003f0",
"_rev": "1-e9680c5d9a688b4ff8dd68549e8e072c",
"item": "orange",
"prices": {
    "Fresh Mart": 1.99,
    "Price Max": 3.19,
    "Citrus Circus": 1.09
}

}

And finally, the document for bananas:

{

"_id": "00a271787f89c0ef2e10e88a0c00048b",
"_rev": "1-60e25d93dc12884676d037400a6fa189",
"item": "banana",
"prices": {
    "Fresh Mart": 1.99,
    "Price Max": 0.79,
    "Banana Montana": 4.22
}

}

Imagine we’re catering a big luncheon, but the client is very price-sensitive. To find the lowest prices, we’re going to create our first view, which shows each fruit sorted by price. Click “hello-world” to return to the hello-world overview, and then from the “select view” menu choose “Temporary view…” to create a new view. A temporary view in Futon

Figure 7. A temporary view in Futon

Edit the map function, on the left, so that it looks like the following:

function(doc) {

 var shop, price, value;
 if (doc.item && doc.prices) {
     for (shop in doc.prices) {
         price = doc.prices[shop];
         value = [doc.item, shop];
         emit(price, value);
     }
 }

}

This is a JavaScript function that CouchDB runs for each of our documents as it computes the view. We’ll leave the reduce function blank for the time being.

Click “Run” and you should see result rows like in Figure 8. The results of running a view in Futon, with the various items sorted by price. This map function could be even more useful if it grouped the items by type so that all the prices for bananas were next to each other in the result set. CouchDB’s key sorting system allows any valid JSON object as a key. In this case, we’ll emit an array of [item, price] so that CouchDB groups by item type and price. The results of running a view in Futon

Figure 8. The results of running a view in Futon

Let’s modify the view function so that it looks like this:

function(doc) {

 var shop, price, key;
 if (doc.item && doc.prices) {
     for (shop in doc.prices) {
         price = doc.prices[shop];
         key = [doc.item, price];
         emit(key, shop);
     }
 }

}

Here, we first check that the document has the fields we want to use. CouchDB recovers gracefully from a few isolated map function failures, but when a map function fails regularly (due to a missing required field or other JavaScript exception), CouchDB shuts off its indexing to prevent any further resource usage. For this reason, it’s important to check for the existence of any fields before you use them. In this case, our map function will skip the first “hello world” document we created without emitting any rows or encountering any errors. The result of this query should look like Figure 9. The results of running a view after grouping by item type and price. The results of running a view after grouping by item type and price

Figure 9. The results of running a view after grouping by item type and price

Once we know we’ve got a document with an item type and some prices, we iterate over the item’s prices and emit key/values pairs. The key is an array of the item and the price, and forms the basis for CouchDB’s sorted index. In this case, the value is the name of the shop where the item can be found for the listed price.

View rows are sorted by their keys – in this example, first by item, then by price. This method of complex sorting is at the heart of creating useful indexes with CouchDB.

MapReduce can be challenging, especially if you’ve spent years working with relational databases. The important things to keep in mind are that map functions give you an opportunity to sort your data using any key you choose, and that CouchDB’s design is focused on providing fast, efficient access to data within a range of keys. 1.4.5. Triggering Replication

Futon can trigger replication between two local databases, between a local and remote database, or even between two remote databases. We’ll show you how to replicate data from one local database to another, which is a simple way of making backups of your databases as we’re working through the examples.

First we’ll need to create an empty database to be the target of replication. Return to the overview and create a database called hello-replication. Now click “Replicator” in the sidebar and choose hello-world as the source and hello-replication as the target. Click “Replicate” to replicate your database. The result should look something like Figure 10. Running database replication in Futon. Running database replication in Futon

Figure 10. Running database replication in Futon

Note

For larger databases, replication can take much longer. It is important to leave the browser window open while replication is taking place. As an alternative, you can trigger replication via curl or some other HTTP client that can handle long-running connections. If your client closes the connection before replication finishes, you’ll have to retrigger it. Luckily, CouchDB’s replication can take over from where it left off instead of starting from scratch. 1.4.6. Wrapping Up

Now that you’ve seen most of Futon’s features, you’ll be prepared to dive in and inspect your data as we build our example application in the next few documents. Futon’s pure JavaScript approach to managing CouchDB shows how it’s possible to build a fully featured web application using only CouchDB’s HTTP API and integrated web server.

But before we get there, we’ll have another look at CouchDB’s HTTP API – now with a magnifying glass. Let’s curl up on the couch and relax.

1.5. The Core API

This document explores the CouchDB in minute detail. It shows all the nitty-gritty and clever bits. We show you best practices and guide you around common pitfalls.

We start out by revisiting the basic operations we ran in the previous document Getting Started, looking behind the scenes. We also show what Futon needs to do behind its user interface to give us the nice features we saw earlier.

This document is both an introduction to the core CouchDB API as well as a reference. If you can’t remember how to run a particular request or why some parameters are needed, you can always come back here and look things up (we are probably the heaviest users of this document).

While explaining the API bits and pieces, we sometimes need to take a larger detour to explain the reasoning for a particular request. This is a good opportunity for us to tell you why CouchDB works the way it does.

The API can be subdivided into the following sections. We’ll explore them individually:

   Server
   Databases
   Documents
   Replication
   Wrapping Up

1.5.1. Server

This one is basic and simple. It can serve as a sanity check to see if CouchDB is running at all. It can also act as a safety guard for libraries that require a certain version of CouchDB. We’re using the curl utility again:

curl http://127.0.0.1:5984/

CouchDB replies, all excited to get going:

{

 "couchdb": "Welcome",
 "uuid": "85fb71bf700c17267fef77535820e371",
 "vendor": {
     "name": "The Apache Software Foundation",
     "version": "1.5.0"
 },
 "version": "1.5.0"

}

You get back a JSON string, that, if parsed into a native object or data structure of your programming language, gives you access to the welcome string and version information.

This is not terribly useful, but it illustrates nicely the way CouchDB behaves. You send an HTTP request and you receive a JSON string in the HTTP response as a result. 1.5.2. Databases

Now let’s do something a little more useful: create databases. For the strict, CouchDB is a database management system (DMS). That means it can hold multiple databases. A database is a bucket that holds “related data”. We’ll explore later what that means exactly. In practice, the terminology is overlapping – often people refer to a DMS as “a database” and also a database within the DMS as “a database.” We might follow that slight oddity, so don’t get confused by it. In general, it should be clear from the context if we are talking about the whole of CouchDB or a single database within CouchDB.

Now let’s make one! We want to store our favorite music albums, and we creatively give our database the name albums. Note that we’re now using the -X option again to tell curl to send a PUT request instead of the default GET request:

curl -X PUT http://127.0.0.1:5984/albums

CouchDB replies:

{"ok":true}

That’s it. You created a database and CouchDB told you that all went well. What happens if you try to create a database that already exists? Let’s try to create that database again:

curl -X PUT http://127.0.0.1:5984/albums

CouchDB replies:

{"error":"file_exists","reason":"The database could not be created, the file already exists."}

We get back an error. This is pretty convenient. We also learn a little bit about how CouchDB works. CouchDB stores each database in a single file. Very simple.

Let’s create another database, this time with curl’s -v (for “verbose”) option. The verbose option tells curl to show us not only the essentials – the HTTP response body – but all the underlying request and response details:

curl -vX PUT http://127.0.0.1:5984/albums-backup

curl elaborates:

  • About to connect() to 127.0.0.1 port 5984 (#0)
  • Trying 127.0.0.1... connected
  • Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)

> PUT /albums-backup HTTP/1.1 > User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3 > Host: 127.0.0.1:5984 > Accept: */* > < HTTP/1.1 201 Created < Server: CouchDB (Erlang/OTP) < Date: Sun, 05 Jul 2009 22:48:28 GMT < Content-Type: text/plain;charset=utf-8 < Content-Length: 12 < Cache-Control: must-revalidate < {"ok":true}

  • Connection #0 to host 127.0.0.1 left intact
  • Closing connection #0

What a mouthful. Let’s step through this line by line to understand what’s going on and find out what’s important. Once you’ve seen this output a few times, you’ll be able to spot the important bits more easily.

  • About to connect() to 127.0.0.1 port 5984 (#0)

This is curl telling us that it is going to establish a TCP connection to the CouchDB server we specified in our request URI. Not at all important, except when debugging networking issues.

  • Trying 127.0.0.1... connected
  • Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)

curl tells us it successfully connected to CouchDB. Again, not important if you aren’t trying to find problems with your network.

The following lines are prefixed with > and < characters. The > means the line was sent to CouchDB verbatim (without the actual >). The < means the line was sent back to curl by CouchDB.

> PUT /albums-backup HTTP/1.1

This initiates an HTTP request. Its method is PUT, the URI is /albums-backup, and the HTTP version is HTTP/1.1. There is also HTTP/1.0, which is simpler in some cases, but for all practical reasons you should be using HTTP/1.1.

Next, we see a number of request headers. These are used to provide additional details about the request to CouchDB.

> User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3

The User-Agent header tells CouchDB which piece of client software is doing the HTTP request. We don’t learn anything new: it’s curl. This header is often useful in web development when there are known errors in client implementations that a server might want to prepare the response for. It also helps to determine which platform a user is on. This information can be used for technical and statistical reasons. For CouchDB, the User-Agent header is irrelevant.

> Host: 127.0.0.1:5984

The Host header is required by HTTP 1.1. It tells the server the hostname that came with the request.

> Accept: */*

The Accept header tells CouchDB that curl accepts any media type. We’ll look into why this is useful a little later.

>

An empty line denotes that the request headers are now finished and the rest of the request contains data we’re sending to the server. In this case, we’re not sending any data, so the rest of the curl output is dedicated to the HTTP response.

< HTTP/1.1 201 Created

The first line of CouchDB’s HTTP response includes the HTTP version information (again, to acknowledge that the requested version could be processed), an HTTP status code, and a status code message. Different requests trigger different response codes. There’s a whole range of them telling the client (curl in our case) what effect the request had on the server. Or, if an error occurred, what kind of error. RFC 2616 (the HTTP 1.1 specification) defines clear behavior for response codes. CouchDB fully follows the RFC.

The 201 Created status code tells the client that the resource the request was made against was successfully created. No surprise here, but if you remember that we got an error message when we tried to create this database twice, you now know that this response could include a different response code. Acting upon responses based on response codes is a common practice. For example, all response codes of 400 Bad Request or larger tell you that some error occurred. If you want to shortcut your logic and immediately deal with the error, you could just check a >= 400 response code.

< Server: CouchDB (Erlang/OTP)

The Server header is good for diagnostics. It tells us which CouchDB version and which underlying Erlang version we are talking to. In general, you can ignore this header, but it is good to know it’s there if you need it.

< Date: Sun, 05 Jul 2009 22:48:28 GMT

The Date header tells you the time of the server. Since client and server time are not necessarily synchronized, this header is purely informational. You shouldn’t build any critical application logic on top of this!

< Content-Type: text/plain;charset=utf-8

The Content-Type header tells you which MIME type the HTTP response body is and its encoding. We already know CouchDB returns JSON strings. The appropriate Content-Type header is application/json. Why do we see text/plain? This is where pragmatism wins over purity. Sending an application/json Content-Type header will make a browser offer you the returned JSON for download instead of just displaying it. Since it is extremely useful to be able to test CouchDB from a browser, CouchDB sends a text/plain content type, so all browsers will display the JSON as text.

Note

There are some extensions that make your browser JSON-aware, but they are not installed by default. For more information, look at the popular JSONView extension, available for both Firefox and Chrome.

Do you remember the Accept request header and how it is set to \*/\* -> */* to express interest in any MIME type? If you send Accept: application/json in your request, CouchDB knows that you can deal with a pure JSON response with the proper Content-Type header and will use it instead of text/plain.

< Content-Length: 12

The Content-Length header simply tells us how many bytes the response body has.

< Cache-Control: must-revalidate

This Cache-Control header tells you, or any proxy server between CouchDB and you, not to cache this response.

<

This empty line tells us we’re done with the response headers and what follows now is the response body.

{"ok":true}

We’ve seen this before.

  • Connection #0 to host 127.0.0.1 left intact
  • Closing connection #0

The last two lines are curl telling us that it kept the TCP connection it opened in the beginning open for a moment, but then closed it after it received the entire response.

Throughout the documents, we’ll show more requests with the -v option, but we’ll omit some of the headers we’ve seen here and include only those that are important for the particular request.

Creating databases is all fine, but how do we get rid of one? Easy – just change the HTTP method:

> curl -vX DELETE http://127.0.0.1:5984/albums-backup

This deletes a CouchDB database. The request will remove the file that the database contents are stored in. There is no “Are you sure?” safety net or any “Empty the trash” magic you’ve got to do to delete a database. Use this command with care. Your data will be deleted without a chance to bring it back easily if you don’t have a backup copy.

This section went knee-deep into HTTP and set the stage for discussing the rest of the core CouchDB API. Next stop: documents. 1.5.3. Documents

Documents are CouchDB’s central data structure. The idea behind a document is, unsurprisingly, that of a real-world document – a sheet of paper such as an invoice, a recipe, or a business card. We already learned that CouchDB uses the JSON format to store documents. Let’s see how this storing works at the lowest level.

Each document in CouchDB has an ID. This ID is unique per database. You are free to choose any string to be the ID, but for best results we recommend a UUID (or GUID), i.e., a Universally (or Globally) Unique IDentifier. UUIDs are random numbers that have such a low collision probability that everybody can make thousands of UUIDs a minute for millions of years without ever creating a duplicate. This is a great way to ensure two independent people cannot create two different documents with the same ID. Why should you care what somebody else is doing? For one, that somebody else could be you at a later time or on a different computer; secondly, CouchDB replication lets you share documents with others and using UUIDs ensures that it all works. But more on that later; let’s make some documents:

curl -X PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af -d '{"title":"There is Nothing Left to Lose","artist":"Foo Fighters"}'

CouchDB replies:

{"ok":true,"id":"6e1295ed6c29495e54cc05947f18c8af","rev":"1-2902191555"}

The curl command appears complex, but let’s break it down. First, -X PUT tells curl to make a PUT request. It is followed by the URL that specifies your CouchDB IP address and port. The resource part of the URL /albums/6e1295ed6c29495e54cc05947f18c8af specifies the location of a document inside our albums database. The wild collection of numbers and characters is a UUID. This UUID is your document’s ID. Finally, the -d flag tells curl to use the following string as the body for the PUT request. The string is a simple JSON structure including title and artist attributes with their respective values.

Note

If you don’t have a UUID handy, you can ask CouchDB to give you one (in fact, that is what we did just now without showing you). Simply send a GET /_uuids request:

curl -X GET http://127.0.0.1:5984/_uuids

CouchDB replies:

{"uuids":["6e1295ed6c29495e54cc05947f18c8af"]}

Voilà, a UUID. If you need more than one, you can pass in the ?count=10 HTTP parameter to request 10 UUIDs, or really, any number you need.

To double-check that CouchDB isn’t lying about having saved your document (it usually doesn’t), try to retrieve it by sending a GET request:

curl -X GET http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af

We hope you see a pattern here. Everything in CouchDB has an address, a URI, and you use the different HTTP methods to operate on these URIs.

CouchDB replies:

{"_id":"6e1295ed6c29495e54cc05947f18c8af","_rev":"1-2902191555","title":"There is Nothing Left to Lose","artist":"Foo Fighters"}

This looks a lot like the document you asked CouchDB to save, which is good. But you should notice that CouchDB added two fields to your JSON structure. The first is _id, which holds the UUID we asked CouchDB to save our document under. We always know the ID of a document if it is included, which is very convenient.

The second field is _rev. It stands for revision. Revisions

If you want to change a document in CouchDB, you don’t tell it to go and find a field in a specific document and insert a new value. Instead, you load the full document out of CouchDB, make your changes in the JSON structure (or object, when you are doing actual programming), and save the entire new revision (or version) of that document back into CouchDB. Each revision is identified by a new _rev value.

If you want to update or delete a document, CouchDB expects you to include the _rev field of the revision you wish to change. When CouchDB accepts the change, it will generate a new revision number. This mechanism ensures that, in case somebody else made a change without you knowing before you got to request the document update, CouchDB will not accept your update because you are likely to overwrite data you didn’t know existed. Or simplified: whoever saves a change to a document first, wins. Let’s see what happens if we don’t provide a _rev field (which is equivalent to providing a outdated value):

curl -X PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af \

    -d '{"title":"There is Nothing Left to Lose","artist":"Foo Fighters","year":"1997"}'

CouchDB replies:

{"error":"conflict","reason":"Document update conflict."}

If you see this, add the latest revision number of your document to the JSON structure:

curl -X PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af \

    -d '{"_rev":"1-2902191555","title":"There is Nothing Left to Lose","artist":"Foo Fighters","year":"1997"}'

Now you see why it was handy that CouchDB returned that _rev when we made the initial request. CouchDB replies:

{"ok":true,"id":"6e1295ed6c29495e54cc05947f18c8af","rev":"2-8aff9ee9d06671fa89c99d20a4b3ae"}

CouchDB accepted your write and also generated a new revision number. The revision number is the MD5 hash of the transport representation of a document with an N- prefix denoting the number of times a document got updated. This is useful for replication. See Replication and conflict model for more information.

There are multiple reasons why CouchDB uses this revision system, which is also called Multi-Version Concurrency Control (MVCC). They all work hand-in-hand, and this is a good opportunity to explain some of them.

One of the aspects of the HTTP protocol that CouchDB uses is that it is stateless. What does that mean? When talking to CouchDB you need to make requests. Making a request includes opening a network connection to CouchDB, exchanging bytes, and closing the connection. This is done every time you make a request. Other protocols allow you to open a connection, exchange bytes, keep the connection open, exchange more bytes later – maybe depending on the bytes you exchanged at the beginning – and eventually close the connection. Holding a connection open for later use requires the server to do extra work. One common pattern is that for the lifetime of a connection, the client has a consistent and static view of the data on the server. Managing huge amounts of parallel connections is a significant amount of work. HTTP connections are usually short-lived, and making the same guarantees is a lot easier. As a result, CouchDB can handle many more concurrent connections.

Another reason CouchDB uses MVCC is that this model is simpler conceptually and, as a consequence, easier to program. CouchDB uses less code to make this work, and less code is always good because the ratio of defects per lines of code is static.

The revision system also has positive effects on replication and storage mechanisms, but we’ll explore these later in the documents.

Warning

The terms version and revision might sound familiar (if you are programming without version control, stop reading this guide right now and start learning one of the popular systems). Using new versions for document changes works a lot like version control, but there’s an important difference: CouchDB does not guarantee that older versions are kept around. Documents in Detail

Now let’s have a closer look at our document creation requests with the curl -v flag that was helpful when we explored the database API earlier. This is also a good opportunity to create more documents that we can use in later examples.

We’ll add some more of our favorite music albums. Get a fresh UUID from the /_uuids resource. If you don’t remember how that works, you can look it up a few pages back.

curl -vX PUT http://127.0.0.1:5984/albums/70b50bfa0a4b3aed1f8aff9e92dc16a0 \

    -d '{"title":"Blackened Sky","artist":"Biffy Clyro","year":2002}'

Note

By the way, if you happen to know more information about your favorite albums, don’t hesitate to add more properties. And don’t worry about not knowing all the information for all the albums. CouchDB’s schema-less documents can contain whatever you know. After all, you should relax and not worry about data.

Now with the -v option, CouchDB’s reply (with only the important bits shown) looks like this:

> PUT /albums/70b50bfa0a4b3aed1f8aff9e92dc16a0 HTTP/1.1 > < HTTP/1.1 201 Created < Location: http://127.0.0.1:5984/albums/70b50bfa0a4b3aed1f8aff9e92dc16a0 < ETag: "1-e89c99d29d06671fa0a4b3ae8aff9e" < {"ok":true,"id":"70b50bfa0a4b3aed1f8aff9e92dc16a0","rev":"1-e89c99d29d06671fa0a4b3ae8aff9e"}

We’re getting back the 201 Created HTTP status code in the response headers, as we saw earlier when we created a database. The Location header gives us a full URL to our newly created document. And there’s a new header. An ETag in HTTP-speak identifies a specific version of a resource. In this case, it identifies a specific version (the first one) of our new document. Sound familiar? Yes, conceptually, an ETag is the same as a CouchDB document revision number, and it shouldn’t come as a surprise that CouchDB uses revision numbers for ETags. ETags are useful for caching infrastructures. Attachments

CouchDB documents can have attachments just like an email message can have attachments. An attachment is identified by a name and includes its MIME type (or Content-Type) and the number of bytes the attachment contains. Attachments can be any data. It is easiest to think about attachments as files attached to a document. These files can be text, images, Word documents, music, or movie files. Let’s make one.

Attachments get their own URL where you can upload data. Say we want to add the album artwork to the 6e1295ed6c29495e54cc05947f18c8af document (“There is Nothing Left to Lose”), and let’s also say the artwork is in a file artwork.jpg in the current directory:

curl -vX PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg?rev=2-2739352689 \

    --data-binary @artwork.jpg -H "Content-Type:image/jpg"

Note

The --data-binary @ option tells curl to read a file’s contents into the HTTP request body. We’re using the -H option to tell CouchDB that we’re uploading a JPEG file. CouchDB will keep this information around and will send the appropriate header when requesting this attachment; in case of an image like this, a browser will render the image instead of offering you the data for download. This will come in handy later. Note that you need to provide the current revision number of the document you’re attaching the artwork to, just as if you would update the document. Because, after all, attaching some data is changing the document.

You should now see your artwork image if you point your browser to http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg

If you request the document again, you’ll see a new member:

curl http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af

CouchDB replies:

{

 "_id": "6e1295ed6c29495e54cc05947f18c8af",
 "_rev": "3-131533518",
 "title": "There is Nothing Left to Lose",
 "artist": "Foo Fighters",
 "year": "1997",
 "_attachments": {
     "artwork.jpg": {
         "stub": true,
         "content_type": "image/jpg",
         "length": 52450
     }
 }

}

_attachments is a list of keys and values where the values are JSON objects containing the attachment metadata. stub=true tells us that this entry is just the metadata. If we use the ?attachments=true HTTP option when requesting this document, we’d get a Base64 encoded string containing the attachment data.

We’ll have a look at more document request options later as we explore more features of CouchDB, such as replication, which is the next topic. 1.5.4. Replication

CouchDB replication is a mechanism to synchronize databases. Much like rsync synchronizes two directories locally or over a network, replication synchronizes two databases locally or remotely.

In a simple POST request, you tell CouchDB the source and the target of a replication and CouchDB will figure out which documents and new document revisions are on source that are not yet on target, and will proceed to move the missing documents and revisions over.

We’ll take an in-depth look at replication in the document Introduction Into Replications; in this document, we’ll just show you how to use it.

First, we’ll create a target database. Note that CouchDB won’t automatically create a target database for you, and will return a replication failure if the target doesn’t exist (likewise for the source, but that mistake isn’t as easy to make):

curl -X PUT http://127.0.0.1:5984/albums-replica

Now we can use the database albums-replica as a replication target:

curl -vX POST http://127.0.0.1:5984/_replicate \

    -d '{"source":"albums","target":"albums-replica"}' \
    -H "Content-Type: application/json"

Note

CouchDB supports the option "create_target":true placed in the JSON POSTed to the _replicate URL. It implicitly creates the target database if it doesn’t exist.

CouchDB replies (this time we formatted the output so you can read it more easily):

{

 "history": [
   {
     "start_last_seq": 0,
     "missing_found": 2,
     "docs_read": 2,
     "end_last_seq": 5,
     "missing_checked": 2,
     "docs_written": 2,
     "doc_write_failures": 0,
     "end_time": "Sat, 11 Jul 2009 17:36:21 GMT",
     "start_time": "Sat, 11 Jul 2009 17:36:20 GMT"
   }
 ],
 "source_last_seq": 5,
 "session_id": "924e75e914392343de89c99d29d06671",
 "ok": true

}

CouchDB maintains a session history of replications. The response for a replication request contains the history entry for this replication session. It is also worth noting that the request for replication will stay open until replication closes. If you have a lot of documents, it’ll take a while until they are all replicated and you won’t get back the replication response until all documents are replicated. It is important to note that replication replicates the database only as it was at the point in time when replication was started. So, any additions, modifications, or deletions subsequent to the start of replication will not be replicated.

We’ll punt on the details again – the "ok": true at the end tells us all went well. If you now have a look at the albums-replica database, you should see all the documents that you created in the albums database. Neat, eh?

What you just did is called local replication in CouchDB terms. You created a local copy of a database. This is useful for backups or to keep snapshots of a specific state of your data around for later. You might want to do this if you are developing your applications but want to be able to roll back to a stable version of your code and data.

There are more types of replication useful in other situations. The source and target members of our replication request are actually links (like in HTML) and so far we’ve seen links relative to the server we’re working on (hence local). You can also specify a remote database as the target:

curl -vX POST http://127.0.0.1:5984/_replicate \

    -d '{"source":"albums","target":"http://example.org:5984/albums-replica"}' \
    -H "Content-Type:application/json"

Using a local source and a remote target database is called push replication. We’re pushing changes to a remote server.

Note

Since we don’t have a second CouchDB server around just yet, we’ll just use the absolute address of our single server, but you should be able to infer from this that you can put any remote server in there.

This is great for sharing local changes with remote servers or buddies next door.

You can also use a remote source and a local target to do a pull replication. This is great for getting the latest changes from a server that is used by others:

curl -vX POST http://127.0.0.1:5984/_replicate \

    -d '{"source":"http://example.org:5984/albums-replica","target":"albums"}' \
    -H "Content-Type:application/json"

Finally, you can run remote replication, which is mostly useful for management operations:

curl -vX POST http://127.0.0.1:5984/_replicate \

    -d '{"source":"http://example.org:5984/albums","target":"http://example.org:5984/albums-replica"}' \
    -H"Content-Type: application/json"

Note

CouchDB and REST

CouchDB prides itself on having a RESTful API, but these replication requests don’t look very RESTy to the trained eye. What’s up with that? While CouchDB’s core database, document, and attachment API are RESTful, not all of CouchDB’s API is. The replication API is one example. There are more, as we’ll see later in the documents.

Why are there RESTful and non-RESTful APIs mixed up here? Have the developers been too lazy to go REST all the way? Remember, REST is an architectural style that lends itself to certain architectures (such as the CouchDB document API). But it is not a one-size-fits-all. Triggering an event like replication does not make a whole lot of sense in the REST world. It is more like a traditional remote procedure call. And there is nothing wrong with this.

We very much believe in the “use the right tool for the job” philosophy, and REST does not fit every job. For support, we refer to Leonard Richardson and Sam Ruby who wrote RESTful Web Services (O’Reilly), as they share our view. 1.5.5. Wrapping Up

This is still not the full CouchDB API, but we discussed the essentials in great detail. We’re going to fill in the blanks as we go. For now, we believe you’re ready to start building CouchDB applications.

See also

Complete HTTP API Reference:

   Server API Reference
   Database API Reference
   Document API Reference
   Replication API

1.6. Security

In this document, we’ll look at the basic security mechanisms in CouchDB: the Admin Party, Basic Authentication, Cookie Authentication; how CouchDB handles users and protects their credentials. 1.6.1. Authentication The Admin Party

When you start out fresh, CouchDB allows any request to be made by anyone. Create a database? No problem, here you go. Delete some documents? Same deal. CouchDB calls this the Admin Party. Everybody has privileges to do anything. Neat.

While it is incredibly easy to get started with CouchDB that way, it should be obvious that putting a default installation into the wild is adventurous. Any rogue client could come along and delete a database.

A note of relief: by default, CouchDB will listen only on your loopback network interface (127.0.0.1 or localhost) and thus only you will be able to make requests to CouchDB, nobody else. But when you start to open up your CouchDB to the public (that is, by telling it to bind to your machine’s public IP address), you will want to think about restricting access so that the next bad guy doesn’t ruin your admin party.

In our previous discussions, we dropped some keywords about how things without the Admin Party work. First, there’s admin itself, which implies some sort of super user. Then there are privileges. Let’s explore these terms a little more.

CouchDB has the idea of an admin user (e.g. an administrator, a super user, or root) that is allowed to do anything to a CouchDB installation. By default, everybody is an admin. If you don’t like that, you can create specific admin users with a username and password as their credentials.

CouchDB also defines a set of requests that only admin users are allowed to do. If you have defined one or more specific admin users, CouchDB will ask for identification for certain requests:

   Creating a database (PUT /database)
   Deleting a database (DELETE /database)
   Setup a database security (PUT /database/_security)
   Creating a design document (PUT /database/_design/app)
   Updating a design document (PUT /database/_design/app?rev=1-4E2)
   Deleting a design document (DELETE /database/_design/app?rev=2-6A7)
   Execute a temporary view (POST /database/_temp_view)
   Triggering compaction (POST /database/_compact)
   Reading the task status list (GET /_active_tasks)
   Restarting the server (POST /_restart)
   Reading the active configuration (GET /_config)
   Updating the active configuration (PUT /_config/section/key)

Creating New Admin User

Let’s do another walk through the API using curl to see how CouchDB behaves when you add admin users.

> HOST="http://127.0.0.1:5984" > curl -X PUT $HOST/database {"ok":true}

When starting out fresh, we can add a database. Nothing unexpected. Now let’s create an admin user. We’ll call her anna, and her password is secret. Note the double quotes in the following code; they are needed to denote a string value for the configuration API:

> curl -X PUT $HOST/_config/admins/anna -d '"secret"' ""

As per the _config API’s behavior, we’re getting the previous value for the config item we just wrote. Since our admin user didn’t exist, we get an empty string. Hashing Passwords

Seeing the plain-text password is scary, isn’t it? No worries, CouchDB doesn’t show up the plain-text password anywhere. It gets hashed right away. The hash is that big, ugly, long string that starts out with -hashed-. How does that work?

   Creates a new 128-bit UUID. This is our salt.
   Creates a sha1 hash of the concatenation of the bytes of the plain-text password and the salt (sha1(password + salt)).
   Prefixes the result with -hashed- and appends ,salt.

To compare a plain-text password during authentication with the stored hash, the same procedure is run and the resulting hash is compared to the stored hash. The probability of two identical hashes for different passwords is too insignificant to mention (c.f. Bruce Schneier). Should the stored hash fall into the hands of an attacker, it is, by current standards, way too inconvenient (i.e., it’d take a lot of money and time) to find the plain-text password from the hash.

But what’s with the -hashed- prefix? When CouchDB starts up, it reads a set of .ini files with config settings. It loads these settings into an internal data store (not a database). The config API lets you read the current configuration as well as change it and create new entries. CouchDB is writing any changes back to the .ini files.

The .ini files can also be edited by hand when CouchDB is not running. Instead of creating the admin user as we showed previously, you could have stopped CouchDB, opened your local.ini, added anna = secret to the admins, and restarted CouchDB. Upon reading the new line from local.ini, CouchDB would run the hashing algorithm and write back the hash to local.ini, replacing the plain-text password. To make sure CouchDB only hashes plain-text passwords and not an existing hash a second time, it prefixes the hash with -hashed-, to distinguish between plain-text passwords and hashed passwords. This means your plain-text password can’t start with the characters -hashed-, but that’s pretty unlikely to begin with.

Note

Since 1.3.0 release CouchDB uses -pbkdf2- prefix by default to sign about using PBKDF2 hashing algorithm instead of SHA1. Basic Authentication

Now that we have defined an admin, CouchDB will not allow us to create new databases unless we give the correct admin user credentials. Let’s verify:

> curl -X PUT $HOST/somedatabase {"error":"unauthorized","reason":"You are not a server admin."}

That looks about right. Now we try again with the correct credentials:

> HOST="http://anna:secret@127.0.0.1:5984" > curl -X PUT $HOST/somedatabase {"ok":true}

If you have ever accessed a website or FTP server that was password-protected, the username:password@ URL variant should look familiar.

If you are security conscious, the missing s in http:// will make you nervous. We’re sending our password to CouchDB in plain text. This is a bad thing, right? Yes, but consider our scenario: CouchDB listens on 127.0.0.1 on a development box that we’re the sole user of. Who could possibly sniff our password?

If you are in a production environment, however, you need to reconsider. Will your CouchDB instance communicate over a public network? Even a LAN shared with other collocation customers is public. There are multiple ways to secure communication between you or your application and CouchDB that exceed the scope of this documentation. CouchDB as of version 1.1.0 comes with SSL built in.

See also

Basic Authentication API Reference Cookie Authentication

Basic authentication that uses plain-text passwords is nice and convenient, but not very secure if no extra measures are taken. It is also a very poor user experience. If you use basic authentication to identify admins, your application’s users need to deal with an ugly, unstylable browser modal dialog that says non-professional at work more than anything else.

To remedy some of these concerns, CouchDB supports cookie authentication. With cookie authentication your application doesn’t have to include the ugly login dialog that the users’ browsers come with. You can use a regular HTML form to submit logins to CouchDB. Upon receipt, CouchDB will generate a one-time token that the client can use in its next request to CouchDB. When CouchDB sees the token in a subsequent request, it will authenticate the user based on the token without the need to see the password again. By default, a token is valid for 10 minutes.

To obtain the first token and thus authenticate a user for the first time, the username and password must be sent to the _session API. The API is smart enough to decode HTML form submissions, so you don’t have to resort to any smarts in your application.

If you are not using HTML forms to log in, you need to send an HTTP request that looks as if an HTML form generated it. Luckily, this is super simple:

> HOST="http://127.0.0.1:5984" > curl -vX POST $HOST/_session \

      -H 'Content-Type:application/x-www-form-urlencoded' \
      -d 'name=anna&password=secret'

CouchDB replies, and we’ll give you some more detail:

< HTTP/1.1 200 OK < Set-Cookie: AuthSession=YW5uYTo0QUIzOTdFQjrC4ipN-D-53hw1sJepVzcVxnriEw; < Version=1; Path=/; HttpOnly > ... < {"ok":true}

A 200 OK response code tells us all is well, a Set-Cookie header includes the token we can use for the next request, and the standard JSON response tells us again that the request was successful.

Now we can use this token to make another request as the same user without sending the username and password again:

> curl -vX PUT $HOST/mydatabase \

      --cookie AuthSession=YW5uYTo0QUIzOTdFQjrC4ipN-D-53hw1sJepVzcVxnriEw \
      -H "X-CouchDB-WWW-Authenticate: Cookie" \
      -H "Content-Type:application/x-www-form-urlencoded"

{"ok":true}

You can keep using this token for 10 minutes by default. After 10 minutes you need to authenticate your user again. The token lifetime can be configured with the timeout (in seconds) setting in the couch_httpd_auth configuration section.

See also

Cookie Authentication API Reference 1.6.2. Authentication Database

You may already note, that CouchDB administrators are defined within config file and you now wondering does regular users are also stored there. No, they don’t. CouchDB has special authentication database – _users by default – that stores all registered users as JSON documents.

CouchDB uses special database (called _users by default) to store information about registered users. This is a system database – this means that while it shares common database API, there are some special security-related constraints applied and used agreements on documents structure. So how authentication database is different from others?

   Only administrators may browse list of all documents (GET /_users/_all_docs)
   Only administrators may listen changes feed (GET /_users/_changes)
   Only administrators may execute design functions like views, shows and others
   Only administrators may GET, PUT or DELETE any document (to be honest, that they always can do)
   There is special design document _auth that cannot be modified
   Every document (of course, except design documents) represents registered CouchDB users and belong to them
   Users may only access (GET /_users/org.couchdb.user:Jan) or modify (PUT /_users/org.couchdb.user:Jan) documents that they owns

These draconian rules are reasonable: CouchDB cares about user’s personal information and doesn’t discloses it for everyone. Often, users documents are contains not only system information like login, password hash and roles, but also sensitive personal information like: real name, email, phone, special internal identifications and more - this is not right information that you want to share with the World. Users Documents

Each CouchDB user is stored in document format. These documents are contains several mandatory fields, that CouchDB handles for correct authentication process:

   _id (string): Document ID. Contains user’s login with special prefix Why org.couchdb.user: prefix?
   derived_key (string): PBKDF2 key
   name (string): User’s name aka login. Immutable e.g. you cannot rename existed user - you have to create new one
   roles (array of string): List of user roles. CouchDB doesn’t provides any builtin roles, so you’re free to define your own depending on your needs. However, you cannot set system roles like _admin there. Also, only administrators may assign roles to users - by default all users have no roles
   password_sha (string): Hashed password with salt. Used for simple password_scheme
   password_scheme (string): Password hashing scheme. May be simple or pbkdf2
   salt (string): Hash salt. Used for simple password_scheme
   type (string): Document type. Constantly have value user

Additionally, you may specify any custom fields that are relates to the target user. This is good place to store user’s private information because only the target user and CouchDB administrators may browse it. Why org.couchdb.user: prefix?

The reason to have special prefix before user’s login name is to have namespaces which users are belongs to. This prefix is designed to prevent replication conflicts when you’ll try to merge two _user databases or more.

For current CouchDB releases, all users are belongs to the same org.couchdb.user namespace and this cannot be changed, but we’d made such design decision for future releases. Creating New User

Creating new user is a very trivial operation. You need just to send single PUT request with user’s data to CouchDB. Let’s create user with login jan and password apple:

curl -X PUT http://localhost:5984/_users/org.couchdb.user:jan \

    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    -d '{"name": "jan", "password": "apple", "roles": [], "type": "user"}'

This curl command will produce next HTTP request:

PUT /_users/org.couchdb.user:jan HTTP/1.1 Accept: application/json Content-Length: 62 Content-Type: application/json Host: localhost:5984 User-Agent: curl/7.31.0

And CouchDB responds with:

HTTP/1.1 201 Created Cache-Control: must-revalidate Content-Length: 83 Content-Type: application/json Date: Fri, 27 Sep 2013 07:33:28 GMT ETag: "1-e0ebfb84005b920488fc7a8cc5470cc0" Location: http://localhost:5984/_users/org.couchdb.user:jan Server: CouchDB (Erlang OTP)

{"ok":true,"id":"org.couchdb.user:jan","rev":"1-e0ebfb84005b920488fc7a8cc5470cc0"}

Document successfully created what also means that user jan have created too! Let’s check is this true:

curl -X POST http://localhost:5984/_session -d 'name=jan&password=apple'

CouchDB should respond with:

{"ok":true,"name":"jan","roles":[]}

Which means that username was recognized and password’s hash matches with stored one. If we specify wrong login and/or password, CouchDB will notify us with the next error message:

{"error":"unauthorized","reason":"Name or password is incorrect."}

Password Changing

This is quite common situation: user had forgot their password, it was leaked somehow (via copy-paste, screenshot, or by typing in wrong chat window) or something else. Let’s change password for our user jan.

First of all, let’s define what is the password changing from the point of CouchDB and the authentication database. Since “users” are “documents”, this operation is nothing, but updating the document with special field password which contains the plain text password. Scared? No need to: the authentication database has special internal hook on document update which looks for this field and replaces it with the secured hash, depending on chosen password_scheme.

Summarizing above, we need to get document content, add password field with new plain text password value and store JSON result to the authentication database.

curl -X GET http://localhost:5984/_users/org.couchdb.user:jan

{

 "_id": "org.couchdb.user:jan",
 "_rev": "1-e0ebfb84005b920488fc7a8cc5470cc0",
 "derived_key": "e579375db0e0c6a6fc79cd9e36a36859f71575c3",
 "iterations": 10,
 "name": "jan",
 "password_scheme": "pbkdf2",
 "roles": [],
 "salt": "1112283cf988a34f124200a050d308a1",
 "type": "user"

}

Here is our user’s document. We may strip hashes from stored document to reduce amount of posted data:

curl -X PUT http://localhost:5984/_users/org.couchdb.user:jan \

    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    -H "If-Match: 1-e0ebfb84005b920488fc7a8cc5470cc0" \
    -d '{"name":"jan", "roles":[], "type":"user", "password":"orange"}'

{"ok":true,"id":"org.couchdb.user:jan","rev":"2-ed293d3a0ae09f0c624f10538ef33c6f"}

Updated! Now let’s check that password was really changed:

curl -X POST http://localhost:5984/_session -d 'name=jan&password=apple'

CouchDB should respond with:

{"error":"unauthorized","reason":"Name or password is incorrect."}

Looks like the password apple is wrong, what about orange?

curl -X POST http://localhost:5984/_session -d 'name=jan&password=orange'

CouchDB should respond with:

{"ok":true,"name":"jan","roles":[]}

Hooray! You may wonder why so complex: need to retrieve user’s document, add special field to it, post it back - where is one big button that changes the password without worry about document’s content? Actually, Futon has such at the right bottom corner if you have logged in - all implementation details are hidden from your sight.

Note

There is no password confirmation for API request: you should implement it on your application layer like Futon does. Users Public Information

New in version 1.4.

Sometimes users wants to share some information with the World. For instance, their contact email to let other users get in touch with them. To solve this problem, but still keep sensitive and private information secured there is special configuration option public_fields. In this options you may define comma separated list of users document fields that will be publicity available.

Normally, if you request any user’s document and you’re not administrator or this document owner, CouchDB will respond with 404 Not Found:

curl http://localhost:5984/_users/org.couchdb.user:robert

{"error":"not_found","reason":"missing"}

This response is constant for both cases when user exists or not exists - by security reasons.

Now let’s share field name. First, setup the public_fields configuration option. Remember, that this action requires administrator’s privileges and the next command will ask for password for user admin, assuming that they are the server administrator:

curl -X PUT http://localhost:5984/_config/couch_http_auth/public_fields \

    -H "Content-Type: application/json" \
    -d '"name"' \
    -u admin

What have changed? Let’s check Robert’s document once again:

curl http://localhost:5984/_users/org.couchdb.user:robert

{"_id":"org.couchdb.user:robert","_rev":"6-869e2d3cbd8b081f9419f190438ecbe7","name":"robert"}

Good news! Now we may read field name from every user’s document without need to be an administrator. That’s important note: don’t publish sensitive information, especially without user’s acknowledge - they may not like such actions from your side.

1.7. Futon: Web GUI Administration Panel

Futon is a native web-based interface built into CouchDB. It provides a basic interface to the majority of the functionality, including the ability to create, update, delete and view documents and views, provides access to the configuration parameters, and an interface for initiating replication.

The default view is the Overview page which provides you with a list of the databases. The basic structure of the page is consistent regardless of the section you are in. The main panel on the left provides the main interface to the databases, configuration or replication systems. The side panel on the right provides navigation to the main areas of Futon interface: Futon Overview

Futon Overview

The main sections are:

   Overview
   The main overview page, which provides a list of the databases and provides the interface for querying the database and creating and updating documents. See Managing Databases and Documents.
   Configuration
   An interface into the configuration of your CouchDB installation. The interface allows you to edit the different configurable parameters. For more details on configuration, see Configuring CouchDB section.
   Replicator
   An interface to the replication system, enabling you to initiate replication between local and remote databases. See Configuring Replication.
   Status
   Displays a list of the running background tasks on the server. Background tasks include view index building, compaction and replication. The Status page is an interface to the Active Tasks API call.
   Verify Installation
   The Verify Installation allows you to check whether all of the components of your CouchDB installation are correctly installed.
   Test Suite
   The Test Suite section allows you to run the built-in test suite. This executes a number of test routines entirely within your browser to test the API and functionality of your CouchDB installation. If you select this page, you can run the tests by using the Run All button. This will execute all the tests, which may take some time.

1.7.1. Managing Databases and Documents

You can manage databases and documents within Futon using the main Overview section of the Futon interface.

To create a new database, click the Create Database ELLIPSIS button. You will be prompted for the database name, as shown in the figure below. Creating a Database

Creating a Database

Once you have created the database (or selected an existing one), you will be shown a list of the current documents. If you create a new document, or select an existing document, you will be presented with the edit document display.

Editing documents within Futon requires selecting the document and then editing (and setting) the fields for the document individually before saving the document back into the database.

For example, the figure below shows the editor for a single document, a newly created document with a single ID, the document _id field. Editing a Document

Editing a Document

To add a field to the document:

   Click Add Field.
   In the fieldname box, enter the name of the field you want to create. For example, “company”.
   Click the green tick next to the field name to confirm the field name change.
   Double-click the corresponding Value cell.
   Enter a company name, for example “Example”.
   Click the green tick next to the field value to confirm the field value.
   The document is still not saved as this point. You must explicitly save the document by clicking the Save Document button at the top of the page. This will save the document, and then display the new document with the saved revision information (the _rev field).
   Edited Document
   Edited Document

The same basic interface is used for all editing operations within Futon. You must remember to save the individual element (fieldname, value) using the green tick button, before then saving the document. 1.7.2. Configuring Replication

When you click the Replicator option within the Tools menu you are presented with the Replicator screen. This allows you to start replication between two databases by filling in or select the appropriate options within the form provided. Replication Form

Replication Form

To start a replication process, either the select the local database or enter a remote database name into the corresponding areas of the form. Replication occurs from the database on the left to the database on the right.

If you are specifying a remote database name, you must specify the full URL of the remote database (including the host, port number and database name). If the remote instance requires authentication, you can specify the username and password as part of the URL, for example http://username:pass@remotehost:5984/demo.

To enable continuous replication, click the Continuous checkbox.

To start the replication process, click the Replicate button. The replication process should start and will continue in the background. If the replication process will take a long time, you can monitor the status of the replication using the Status option under the Tools menu.

Once replication has been completed, the page will show the information returned when the replication process completes by the API.

The Replicator tool is an interface to the underlying replication API. For more information, see /_replicate. For more information on replication, see Replication.

1.8. cURL: Your Command Line Friend

The curl utility is a command line tool available on Unix, Linux, Mac OS X and Windows and many other platforms. curl provides easy access to the HTTP protocol (among others) directly from the command-line and is therefore an ideal way of interacting with CouchDB over the HTTP REST API.

For simple GET requests you can supply the URL of the request. For example, to get the database information:

shell> curl http://127.0.0.1:5984

This returns the database information (formatted in the output below for clarity):

{

   "couchdb": "Welcome",
   "uuid": "85fb71bf700c17267fef77535820e371",
   "vendor": {
       "name": "The Apache Software Foundation",
       "version": "1.4.0"
   },
   "version": "1.4.0"

}

Note

For some URLs, especially those that include special characters such as ampersand, exclamation mark, or question mark, you should quote the URL you are specifying on the command line. For example:

shell> curl 'http://couchdb:5984/_uuids?count=5'

You can explicitly set the HTTP command using the -X command line option. For example, when creating a database, you set the name of the database in the URL you send using a PUT request:

shell> curl -X PUT http://127.0.0.1:5984/demo {"ok":true}

But to obtain the database information you use a GET request (with the return information formatted for clarity):

shell> curl -X GET http://127.0.0.1:5984/demo {

  "compact_running" : false,
  "doc_count" : 0,
  "db_name" : "demo",
  "purge_seq" : 0,
  "committed_update_seq" : 0,
  "doc_del_count" : 0,
  "disk_format_version" : 5,
  "update_seq" : 0,
  "instance_start_time" : "1306421773496000",
  "disk_size" : 79

}

For certain operations, you must specify the content type of request, which you do by specifying the Content-Type header using the -H command-line option:

shell> curl -H 'Content-Type: application/json' http://127.0.0.1:5984/_uuids

You can also submit ‘payload’ data, that is, data in the body of the HTTP request using the -d option. This is useful if you need to submit JSON structures, for example document data, as part of the request. For example, to submit a simple document to the demo database:

shell> curl -H 'Content-Type: application/json' \

           -X POST http://127.0.0.1:5984/demo \
           -d '{"company": "Example, Inc."}'

{"ok":true,"id":"8843faaf0b831d364278331bc3001bd8",

"rev":"1-33b9fbce46930280dab37d672bbc8bb9"}

In the above example, the argument after the -d option is the JSON of the document we want to submit.

The document can be accessed by using the automatically generated document ID that was returned:

shell> curl -X GET http://127.0.0.1:5984/demo/8843faaf0b831d364278331bc3001bd8 {"_id":"8843faaf0b831d364278331bc3001bd8",

"_rev":"1-33b9fbce46930280dab37d672bbc8bb9",
"company":"Example, Inc."}

The API samples in the API Basics show the HTTP command, URL and any payload information that needs to be submitted (and the expected return value). All of these examples can be reproduced using curl with the command-line examples shown above.

2.1. Installation on Unix-like systems

A high-level guide to Unix-like systems, inc. Mac OS X and Ubuntu.

This document is the canonical source of installation information. However, many systems have gotchas that you need to be aware of. In addition, dependencies frequently change as distributions update their archives. If you’re running into trouble, be sure to check out the wiki. If you have any tips to share, please also update the wiki so that others can benefit from your experience.

See also

Community installation guides 2.1.1. Troubleshooting

   There is a troubleshooting guide.
   There is a wiki for general documentation.
   There are collection of friendly mailing lists.

Please work through these in order if you experience any problems. 2.1.2. Dependencies

You should have the following installed:

   Erlang OTP (>=R14B01, =<R17)
   ICU
   OpenSSL
   Mozilla SpiderMonkey (1.8.5)
   GNU Make
   GNU Compiler Collection
   libcurl
   help2man
   Python (>=2.7) for docs
   Python Sphinx (>=1.1.3)

It is recommended that you install Erlang OTP R13B-4 or above where possible. You will only need libcurl if you plan to run the JavaScript test suite. And help2man is only need if you plan on installing the CouchDB man pages. Python and Sphinx are only required for building the online documentation. Debian-based Systems

You can install the dependencies by running:

sudo apt-get install build-essential sudo apt-get install erlang-base-hipe sudo apt-get install erlang-dev sudo apt-get install erlang-manpages sudo apt-get install erlang-eunit sudo apt-get install erlang-nox sudo apt-get install libicu-dev sudo apt-get install libmozjs-dev sudo apt-get install libcurl4-openssl-dev

There are lots of Erlang packages. If there is a problem with your install, try a different mix. There is more information on the wiki. Additionally, you might want to install some of the optional Erlang tools which may also be useful.

Be sure to update the version numbers to match your system’s available packages.

Unfortunately, it seems that installing dependencies on Ubuntu is troublesome.

See also

   Installing on Debian
   Installing on Ubuntu

RedHat-based (Fedora, Centos, RHEL) Systems

You can install the dependencies by running:

sudo yum install autoconf sudo yum install autoconf-archive sudo yum install automake sudo yum install curl-devel sudo yum install erlang-asn1 sudo yum install erlang-erts sudo yum install erlang-eunit sudo yum install erlang-os_mon sudo yum install erlang-xmerl sudo yum install help2man sudo yum install js-devel sudo yum install libicu-devel sudo yum install libtool sudo yum install perl-Test-Harness

While CouchDB builds against the default js-devel-1.7.0 included in some distributions, it’s recommended to use a more recent js-devel-1.8.5. Mac OS X

Follow Installation with HomeBrew reference till brew install couchdb step. 2.1.3. Installing

Once you have satisfied the dependencies you should run:

./configure

This script will configure CouchDB to be installed into /usr/local by default.

If you wish to customise the installation, pass –help to this script.

If everything was successful you should see the following message:

You have configured Apache CouchDB, time to relax.

Relax.

To install CouchDB you should run:

make && sudo make install

You only need to use sudo if you’re installing into a system directory.

Try gmake if make is giving you any problems.

If everything was successful you should see the following message:

You have installed Apache CouchDB, time to relax.

Relax. 2.1.4. First Run

You can start the CouchDB server by running:

sudo -i -u couchdb couchdb

This uses the sudo command to run the couchdb command as the couchdb user.

When CouchDB starts it should eventually display the following message:

Apache CouchDB has started, time to relax.

Relax.

To check that everything has worked, point your web browser to:

http://127.0.0.1:5984/_utils/index.html

From here you should verify your installation by pointing your web browser to:

http://localhost:5984/_utils/verify_install.html

2.1.5. Security Considerations

You should create a special couchdb user for CouchDB.

On many Unix-like systems you can run:

adduser --system \

       --home /usr/local/var/lib/couchdb \
       --no-create-home \
       --shell /bin/bash \
       --group --gecos \
       "CouchDB Administrator" couchdb

On Mac OS X you can use the Workgroup Manager to create users.

You must make sure that:

   The user has a working POSIX shell
   The user’s home directory is /usr/local/var/lib/couchdb

You can test this by:

   Trying to log in as the couchdb user
   Running pwd and checking the present working directory

Change the ownership of the CouchDB directories by running:

chown -R couchdb:couchdb /usr/local/etc/couchdb chown -R couchdb:couchdb /usr/local/var/lib/couchdb chown -R couchdb:couchdb /usr/local/var/log/couchdb chown -R couchdb:couchdb /usr/local/var/run/couchdb

Change the permission of the CouchDB directories by running:

chmod 0770 /usr/local/etc/couchdb chmod 0770 /usr/local/var/lib/couchdb chmod 0770 /usr/local/var/log/couchdb chmod 0770 /usr/local/var/run/couchdb

2.1.6. Running as a Daemon SysV/BSD-style Systems

You can use the couchdb init script to control the CouchDB daemon.

On SysV-style systems, the init script will be installed into:

/usr/local/etc/init.d

On BSD-style systems, the init script will be installed into:

/usr/local/etc/rc.d

We use the [init.d|rc.d] notation to refer to both of these directories.

You can control the CouchDB daemon by running:

/usr/local/etc/[init.d|rc.d]/couchdb [start|stop|restart|status]

If you wish to configure how the init script works, you can edit:

/usr/local/etc/default/couchdb

Comment out the COUCHDB_USER setting if you’re running as a non-superuser.

To start the daemon on boot, copy the init script to:

/etc/[init.d|rc.d]

You should then configure your system to run the init script automatically.

You may be able to run:

sudo update-rc.d couchdb defaults

If this fails, consult your system documentation for more information.

A logrotate configuration is installed into:

/usr/local/etc/logrotate.d/couchdb

Consult your logrotate documentation for more information.

It is critical that the CouchDB logs are rotated so as not to fill your disk.