High Scalability - Pomegranate - Storing Billions and Billions of Tiny Little Files

Pomegranate is a novel distributed file system built over distributed tabular storage that acts an awful lot like a NoSQL system. It's targeted at increasing the performance of tiny object access in order to support applications like online photo and micro-blog services, which require high concurrency, high throughput, and low latency. Their tests seem to indicate it works:

  <blockquote>
    <p>
      We have demonstrated that a file system over tabular storage performs well for highly concurrent access. In our test cluster, we observed a <strong>linear</strong> increase to more than <strong>100,000</strong> aggregate read and write requests served per second (<span class="caps">RPS</span>).
    </p>
  </blockquote>
  
  <p>
    Rather than sitting atop the file system like almost every other K-V store, Pomegranate is baked into the file system. The idea is that the file system API is common to every platform, so no separate API is needed and every application could use it out of the box.
  </p>
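  <p>
    To illustrate the "out of the box" point, here is a minimal sketch of an application that stores and reads tiny objects using nothing but ordinary file calls. The function names and the temporary directory are assumptions for illustration only; with a mounted file system such as Pomegranate, <code>root</code> would presumably be its mount point, and nothing else in the code would change.
  </p>

```python
import os
import tempfile


def store_tiny_files(root, items):
    """Write each (name, payload) pair as its own small file under root."""
    os.makedirs(root, exist_ok=True)
    for name, payload in items.items():
        with open(os.path.join(root, name), "wb") as f:
            f.write(payload)


def load_tiny_file(root, name):
    """Read one small file back with a plain open/read."""
    with open(os.path.join(root, name), "rb") as f:
        return f.read()


# Stand-in directory (assumption): a real deployment would point this at
# the mount point of the distributed file system instead.
root = tempfile.mkdtemp()
store_tiny_files(root, {"post-42.txt": b"hello", "thumb-7.bin": b"\x00\x01"})
print(load_tiny_file(root, "post-42.txt"))
```

  <p>
    The application never sees a key-value client library; whether the bytes land on a local disk or in tabular storage is decided by what is mounted at <code>root</code>.
  </p>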
  
  <p>
    The features of Pomegranate are:
  </p>
  
  <ul>
    <li>
      It handles billions of small files efficiently, even in one directory;
    </li>
    <li>
      It provides a separate and scalable caching layer, which can be snapshot-able;
    </li>
    <li>
      The storage layer uses a log-structured store to absorb small file writes and make full use of the disk bandwidth;
    </li>
    <li>
      It builds a global namespace for both small files and large files;
    </li>
    <li>
      It uses columnar storage to exploit temporal and spatial locality;
    </li>
    <li>
      It uses a distributed extendible hash to index metadata;
    </li>
    <li>
      It uses snapshot-able and reconfigurable caching to increase parallelism and tolerate failures;
    </li>
    <li>
      Pomegranate should be the first file system built over tabular storage, and the experience of building it should be valuable to the file system community.
    </li>
  </ul>

<div class="posterous_quote_citation">
  via <a href="http://highscalability.com/blog/2010/8/30/pomegranate-storing-billions-and-billions-of-tiny-little-fil.html">highscalability.com</a>
</div>
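<p>
  The feature list mentions an extendible hash for indexing metadata. As a rough illustration of the general technique, and not of Pomegranate's actual implementation, here is a minimal single-process sketch. The class names, the bucket capacity, and the use of Python's built-in <code>hash</code> are all assumptions; a distributed version would spread the buckets across servers.
</p>

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # local depth: number of hash bits this bucket distinguishes
        self.items = {}


class ExtendibleHash:
    """Single-process extendible hash table (illustrative sketch only)."""

    def __init__(self, bucket_capacity=4):
        self.capacity = bucket_capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]  # always 2 ** global_depth slots

    def _slot(self, key):
        # Index the directory with the low global_depth bits of the hash.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        return self.directory[self._slot(key)].items.get(key, default)

    def put(self, key, value):
        # Note: keys whose hashes agree in every bit would make this loop
        # split forever; a real implementation needs overflow handling.
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory; the new half mirrors the old half.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        high_bit = 1 << (bucket.depth - 1)
        # Repoint the slots whose newly significant bit is set.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute only the entries of the bucket that overflowed.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            target = sibling if hash(k) & high_bit else bucket
            target.items[k] = v


index = ExtendibleHash()
for inode in range(1000):
    index.put(inode, "metadata for file %d" % inode)
print(index.get(42))
```

<p>
  The appeal for a directory holding a huge number of entries is that growth is incremental: an overflow splits one bucket and, at most, doubles a small directory of pointers, rather than rehashing every entry.
</p>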

<p>
  Very cool technology. This reminded me of a distributed filesystem Google Tech Talk (<a href="http://www.youtube.com/watch?v=3xKZ4KGkQY8">http://www.youtube.com/watch?v=3xKZ4KGkQY8</a>) on Wuala (<a href="http://www.wuala.com/">http://www.wuala.com/</a>) that I found fascinating for all the little problems they had to overcome to make this work.
</p>

Posted via email from Sijin Joseph

