distributed replicated blob server 20040804 review



License:	GPL (GNU General Public License)
File size:	0K
Developer:	Joerg Beyer

The Distributed Replicated Blob Server Project (drbs) is a young project, not mature enough to handle production data.

You might still take a look and send feedback, whether that is build problems, bugs, or ideas about what problems drbs might solve for you.

The problem drbs addresses is how to keep a large set of blobs available under the following circumstances:

Requirements:
the blobs are immutable once they are written.
a blob is always retrieved as a whole (there is no seeking within the blob).
each blob is identified by a simple number, the blobid, which is chosen by the server and not influenced by the client.
failure of storage components is expected.
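
The requirements above can be illustrated with a minimal sketch. It is a single-process stand-in, not the drbs API: the class name `InMemoryBlobStore` and its `put`/`get` methods are hypothetical and only show write-once blobs, whole-blob reads, and server-chosen blobids.

```python
import itertools


class InMemoryBlobStore:
    """Hypothetical stand-in showing the blob contract, not the real drbs API."""

    def __init__(self):
        self._next_id = itertools.count(1)  # blobids are chosen by the server
        self._blobs = {}

    def put(self, data: bytes) -> int:
        """Store a blob once and return the blobid the server picked."""
        blob_id = next(self._next_id)
        self._blobs[blob_id] = bytes(data)  # immutable copy; there is no update call
        return blob_id

    def get(self, blob_id: int) -> bytes:
        """Return the blob as a whole; there is no seek or partial read."""
        return self._blobs[blob_id]


store = InMemoryBlobStore()
first = store.put(b"hello")
second = store.put(b"world")
```

The client never proposes an identifier; it only learns the blobid from the return value of `put`.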

drbs introduces 3 components:

the blobclient. It is the client library used to access the blobs.
a number of blobservers. They actually store the blobs in a file system; blobs are uploaded to and downloaded from them. Each blob is stored on several blobservers (e.g. 3), so the failure of one blobserver can be compensated: the remaining blobservers can replicate the blob again until the degree of redundancy you want is restored. A sensible setup needs at least 10 blobservers, but they can all run on the same host. For more redundancy I would spread them across more hardware, but for a test a single machine works well. The Google people speak of hundreds of these server processes and machines.
a single blobmaster. It coordinates where the blobs are stored and, on a blob lookup, tells the blobclient where it can get each blob. The blobmaster never sees the actual blob, only the meta information.
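
How the blobmaster might track placement can be sketched as follows. This is an illustration under assumptions, not the drbs implementation: the class `BlobMaster`, its method names, the random placement policy, and the shape of the re-replication plan are all hypothetical. Only the ideas (metadata only, several replicas per blob, re-replication after a failure) come from the description above.

```python
import random


class BlobMaster:
    """Hypothetical metadata-only coordinator; it never touches blob contents."""

    def __init__(self, servers, replicas=3, seed=None):
        self.servers = set(servers)      # names of live blobservers
        self.replicas = replicas         # desired copies per blob
        self.locations = {}              # blobid -> set of blobserver names
        self._next_id = 1
        self._rng = random.Random(seed)  # assumed policy: random placement

    def allocate(self):
        """Pick a new blobid and the blobservers that should store the blob."""
        if len(self.servers) < self.replicas:
            raise RuntimeError("not enough live blobservers")
        blob_id = self._next_id
        self._next_id += 1
        chosen = set(self._rng.sample(sorted(self.servers), self.replicas))
        self.locations[blob_id] = chosen
        return blob_id, set(chosen)

    def lookup(self, blob_id):
        """Tell a client which blobservers hold the blob."""
        return set(self.locations[blob_id])

    def server_failed(self, name):
        """Forget a failed server; return (blobid, source, target) copy tasks."""
        self.servers.discard(name)
        plan = []
        for blob_id, holders in self.locations.items():
            if name not in holders:
                continue
            holders.discard(name)
            candidates = sorted(self.servers - holders)
            if holders and candidates:
                source = sorted(holders)[0]
                target = self._rng.choice(candidates)
                holders.add(target)  # recorded on the assumption the copy succeeds
                plan.append((blob_id, source, target))
        return plan


master = BlobMaster([f"bs{i}" for i in range(10)], replicas=3, seed=1)
blob_id, holders = master.allocate()
victim = sorted(holders)[0]
plan = master.server_failed(victim)
```

After `server_failed`, `plan` lists which surviving blobserver should copy the blob to which new one, and `lookup` again reports three holders.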

The blobs are validated with an MD5 checksum. This makes sure that failing disks and/or human mistakes are detected. The blobmaster keeps all its data in RAM (it is not very large, since it is only the metadata on the blobs).
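
A minimal sketch of such a checksum validation, using Python's standard `hashlib`. The helper names `md5_hex` and `verify_blob` are made up for this example; where drbs actually stores and compares the checksum is not stated here.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a blob as a hex string."""
    return hashlib.md5(data).hexdigest()


def verify_blob(data: bytes, expected_md5: str) -> bool:
    """True if the blob still matches the checksum recorded when it was written."""
    return md5_hex(data) == expected_md5


original = b"example blob contents"
recorded = md5_hex(original)          # would be kept with the blob's metadata
corrupted = b"example blob c0ntents"  # simulates a flipped byte on disk
```

Here `verify_blob(original, recorded)` succeeds while `verify_blob(corrupted, recorded)` fails, which is how a damaged replica could be spotted and then replaced from a healthy copy.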

The blobserver keeps all its metadata in RAM and has the blobs as files in the ordinary file system. The blobserver logs all changes in a logfile, so it can be restarted quickly: on startup it reads the logfile and replays the actions, reaching the old state again. Since the logfile is just mmap'ed, it can be read and interpreted fast.
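
The replay-on-startup idea can be sketched like this. The log format is an assumption made for the example (one text line per action, `ADD <blobid> <checksum>` or `DEL <blobid>`); the real drbs logfile layout is not described here, and `append_record` and `replay` are hypothetical names.

```python
import mmap
import os
import tempfile


def append_record(path, action, blob_id, checksum=""):
    """Append one action to the logfile (assumed text format, one line each)."""
    with open(path, "a", encoding="ascii") as log:
        log.write(f"{action} {blob_id} {checksum}\n")


def replay(path):
    """Rebuild {blobid: checksum} by replaying the mmap'ed log from the start."""
    state = {}
    if os.path.getsize(path) == 0:  # an empty file cannot be mmap'ed
        return state
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        for raw in iter(view.readline, b""):
            parts = raw.decode("ascii").split()
            action, blob_id = parts[0], int(parts[1])
            if action == "ADD":
                state[blob_id] = parts[2]
            elif action == "DEL":
                state.pop(blob_id, None)
    return state


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "blobserver.log")
    append_record(log_path, "ADD", 1, "aaa")
    append_record(log_path, "ADD", 2, "bbb")
    append_record(log_path, "DEL", 1)
    state = replay(log_path)
```

Because the in-memory state is derived purely from the log, a restart only needs the logfile and the blob files themselves.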

Of course it would be possible to implement such a solution on top of an ordinary database, but I follow the paper "The Google File System", which claims all this can be done with much lower overhead.

This solution is cheaper: do the math yourself and calculate what a fileserver and this el-cheapo solution would cost you. This software assumes that hardware will fail, so cheaper hardware that will fail can be chosen.

While this blob server works on a single machine, it is intended to scale up to store larger sets of blobs on many machines. The Google paper talks of hundreds of machines.


distributed replicated blob server 20040804 keywords

The Freeduc-cd 1.5: Freeduc is a "run-from-CD" Linux distribution based on Knoppix and created by OFSET in France: "Until now - and probably for a while
Featherweight Linux 1.3: Featherweight Linux is my Live-CD installable Linux distribution that I remastered from Feather Linux, which is built on knoppix te
Fenris 0.07-m2 build 3245: Fenris is a suite of tools suitable for code analysis, debugging, protocol analysis, reverse engineering, forensics, diagnostics, sec
TYPO3 4.0.3 RC1: TYPO3 is a free Open Source content management system for enterprise purposes on the web and in intranets. TYPO3 offers full flexi
SquirrelMail 1.5.1: SquirrelMail is a standards-based Webmail package written in PHP4
SLAMD 1.8.2: The SLAMD Distributed Load Generation Engine (SLAMD) is a Java-based application designed for stress testing and performance analysis


Alternative/similar