Like any large scale web site, Shopzilla suffers from the same dilemma – how to scale large amounts of data, and do it fast. There are many different ways of doing this floating around the Internet nowadays. Cassandra, memcached, Hadoop, Oracle Coherence, etc. We chose the Oracle Coherence route at Shopzilla.
A great example of our Coherence implementation is our document server for our search engine. We have a fairly simple schema for these documents – it’s an Integer representing a document identifier, and a String representing the document. This sounds easy, but out of each colocation, we need to serve over 120 million documents with near-immediate response time. Our search engine issues requests for documents in batches in parallel, and our document server responds to each batch within 2.5 – 3.5 milliseconds, with the total document requests for a single search taking no more than 25 milliseconds, including parsing by the search engine. This includes a Java-based web service running Apache Tomcat, Oracle Coherence storing our documents and a C-based packed data structure for the serialization protocol.

We use 14 dual processor, dual core servers for this Coherence grid. Each server runs 4 JVMs, one per core, maximizing the use of the cores and RAM in the servers. This grid keeps a “backup count” to stay failsafe, and has extra capacity to allow 2 machines to fail without a data loss. You obviously incur a small performance penalty if you lose a machine while the grid recovers, but you continue to serve traffic, and the grid recovers automatically.
We build search indexes several times a day. It used to require taking a search cluster offline to update the document server’s indexes. Now that we use Coherence, we push data in to the grid without interruption, and without a performance penalty.
With our previous system, you can see obvious spikes in response times during our index refreshes:

Response times during index refreshes
With our new system, we can load and serve data without any performance penalties. That is evident by the response times in this graph:

Response times during data load
Coherence allowed us to replace our legacy SSD-based SAN and legacy document server code with a new architecture, utilizing our web service archetype, in a short period of time. We had a complete replacement online, serving production traffic, in one 2 week iteration.
Oracle’s support staff has been remarkably responsive to our stumbling blocks as we embrace this emerging technology and has made a great partner for our continued expansion and future.
As we continue to invest in our technology platform, you can be rest assured that Oracle Coherence will continue to be a major player in our designs and architecture.
Comments (5)
Comment RSS | Trackback URI
I’d be really interested in hearing what other technologies (especially open source ones) you investigated before choosing to use Oracle Coherence.
Many projects and companies don’t have enough spare cash to purchase Oracle software and support, so knowing what open source alternatives exist would be really helpful.
Learning why your engineers decided that the open sources technologies they researched were not sufficient enough compared to Oracle’s offering would be enlightening.
Coherence has caching semantics, including automatic redistribution and partitioning of data when nodes are added or removed. It has built-in support for implementing “CacheStores” that can perform read-through for cache misses and write-through or write-behind to any data source for backup. The query functionality (tied to their powerful serialization implementation) offers fast, compelling query access, more than just a simple “get” of a key in the map.
Coherence also supports calculation within the grid (distributed to the nodes), along with a good API for use in our existing C/C++ clients.
Unfortunately, none of the open source alternatives were as feature rich as Coherence, and none could match the response time of Oracle’s support staff for issues.
There are several viable open source alternatives out there – but chief among them is Memcached. For us, it came down the the fact that as a mature marketplace with fairly significant scale, we had a number of requirements that weren’t available out-of-the-box with the OS products. While we could have adapted the product for our needs, the trade-off was having to develop a significant engineering competency in the (memcached) solution itself, rather than focus on delivery of our core value proposition (shopping).
In our case, Coherence did what we needed. It isn’t free, but given the trade between dollars and engineers, we were fortunate to be able to choose dollars this time.
For a company or product with less specific requirements, or more appetite for this type of systems engineering, the open source alternatives including memcached are excellent and should be closely considered. Memcached does run Facebook after all…
Rob, Phil, thank you both for your replies
Terrific work! This is the type of information that should be shared around the web. Shame on the search engines for not positioning this post higher!
Related Posts (5)
Leave a Comment