<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Hbase for storing Users?</title>
	<atom:link href="http://andrewmccall.com/2009/06/hbase-for-storing-users/feed/" rel="self" type="application/rss+xml" />
	<link>http://andrewmccall.com/2009/06/hbase-for-storing-users/</link>
	<description>If you want to know what I think...</description>
	<lastBuildDate>Tue, 02 Mar 2010 12:33:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: andrewmccall</title>
		<link>http://andrewmccall.com/2009/06/hbase-for-storing-users/comment-page-1/#comment-63</link>
		<dc:creator>andrewmccall</dc:creator>
		<pubDate>Sun, 28 Jun 2009 14:15:25 +0000</pubDate>
		<guid isPermaLink="false">http://andrewmccall.com/?p=410#comment-63</guid>
		<description>Tim, that sounds interesting. I&#039;ve not played with Solr, but I am creating a Lucene index using some of the fields in some of the tables - the cluster is running some highly customised Nutch jobs based on the code here: &lt;a href=&quot;http://github.com/andrewmccall/nutchbase&quot; rel=&quot;nofollow&quot;&gt;http://github.com/andrewmccall/nutchbase&lt;/a&gt;. I considered putting the user IDs in a Lucene index and using that to find users, but I was a bit reluctant to implement it because I felt there was too much I didn&#039;t know.&lt;br&gt;&lt;br&gt;Thinking about it again, I may look at both implementations in more depth, because it may be a better way to go, especially as indexes start to pile up.</description>
		<content:encoded><![CDATA[<p>Tim, that sounds interesting. I&#39;ve not played with Solr, but I am creating a Lucene index using some of the fields in some of the tables &#8211; the cluster is running some highly customised Nutch jobs based on the code here: <a href="http://github.com/andrewmccall/nutchbase" rel="nofollow">http://github.com/andrewmccall/nutchbase</a>. I considered putting the user IDs in a Lucene index and using that to find users, but I was a bit reluctant to implement it because I felt there was too much I didn&#39;t know.</p>
<p>Thinking about it again, I may look at both implementations in more depth, because it may be a better way to go, especially as indexes start to pile up.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Sell</title>
		<link>http://andrewmccall.com/2009/06/hbase-for-storing-users/comment-page-1/#comment-60</link>
		<dc:creator>Tim Sell</dc:creator>
		<pubDate>Sun, 28 Jun 2009 04:32:15 +0000</pubDate>
		<guid isPermaLink="false">http://andrewmccall.com/?p=410#comment-60</guid>
		<description>Just an idea.&lt;br&gt;Have you considered using a search server for the indexing/searching and HBase for the storage?&lt;br&gt;You&#039;d have to keep them in sync, of course, but Solr is quite useful. You can optimize for returning just IDs by indexing fields without storing them, and there is progress on sharding if you really need it. It doesn&#039;t scale to billions of rows, of course, but it&#039;s unlikely that will be a problem for users. You can do exact matches on any of the fields and, of course, utilise full-text searches where appropriate.</description>
		<content:encoded><![CDATA[<p>Just an idea.<br />Have you considered using a search server for the indexing/searching and HBase for the storage?<br />You&#39;d have to keep them in sync, of course, but Solr is quite useful. You can optimize for returning just IDs by indexing fields without storing them, and there is progress on sharding if you really need it. It doesn&#39;t scale to billions of rows, of course, but it&#39;s unlikely that will be a problem for users. You can do exact matches on any of the fields and, of course, utilise full-text searches where appropriate.</p>
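<p>A minimal sketch of the split described above, assuming a toy in-memory inverted index stands in for Solr and a plain dictionary stands in for the HBase table. The names store, index, save_user, search_ids and fetch are hypothetical illustrations, not Solr or HBase APIs. The index keeps only ids per indexed term (indexed but not stored), and the full rows are then fetched from the store by id:</p>

```python
# Toy stand-ins (hypothetical, not the real Solr or HBase APIs):
# `store` plays the HBase table holding full user rows, and `index` plays a
# search index that indexes fields but stores only the matching ids.
from collections import defaultdict

store = {}                 # user id -> full row
index = defaultdict(set)   # (field, term) -> set of user ids


def tokens(field, value):
    """Exact-match key for the field, plus lower-cased words for full-text search."""
    keys = {(field, value)}
    keys.update((field + ":text", word.lower()) for word in value.split())
    return keys


def save_user(user_id, row):
    """Write to the store, then re-index; the app must keep both in sync."""
    old = store.get(user_id, {})
    for field, value in old.items():
        for key in tokens(field, value):
            index[key].discard(user_id)
    store[user_id] = dict(row)
    for field, value in row.items():
        for key in tokens(field, value):
            index[key].add(user_id)


def search_ids(field, term, full_text=False):
    """Ask the index for ids only; nothing but ids is kept there."""
    key = (field + ":text", term.lower()) if full_text else (field, term)
    return sorted(index.get(key, set()))


def fetch(ids):
    """Load the full rows from the store by id."""
    return [store[i] for i in ids]


save_user("u1", {"email": "alice@example.com", "bio": "Likes HBase and Lucene"})
save_user("u2", {"email": "bob@example.com", "bio": "Runs Nutch crawls"})
print(search_ids("email", "bob@example.com"))
print(fetch(search_ids("bio", "hbase", full_text=True)))
```

<p>Running it prints ['u2'] for the exact email match and Alice&#39;s full row for the full-text search on "hbase". The sync burden mentioned above shows up in save_user: every write has to touch both structures.</p>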
]]></content:encoded>
	</item>
	<item>
		<title>By: andrewmccall</title>
		<link>http://andrewmccall.com/2009/06/hbase-for-storing-users/comment-page-1/#comment-58</link>
		<dc:creator>andrewmccall</dc:creator>
		<pubDate>Sat, 27 Jun 2009 01:58:49 +0000</pubDate>
		<guid isPermaLink="false">http://andrewmccall.com/?p=410#comment-58</guid>
		<description>Thanks for that, Jonathan; good to know I&#039;m more or less on the right path. Now that you mention it, I remember reading about it in the docs but had since forgotten. I looked again and saw this:&lt;br&gt;&lt;br&gt;&lt;a href=&quot;http://hadoop.apache.org/hbase/docs/current/api/index.html?org/apache/hadoop/hbase/regionserver/transactional/package-summary.html&quot; rel=&quot;nofollow&quot;&gt;http://hadoop.apache.org/hbase/docs/current/api...&lt;/a&gt;&lt;br&gt;&lt;br&gt;I&#039;ll look into it and post about it if it&#039;s useful.</description>
		<content:encoded><![CDATA[<p>Thanks for that, Jonathan; good to know I&#39;m more or less on the right path. Now that you mention it, I remember reading about it in the docs but had since forgotten. I looked again and saw this:</p>
<p><a href="http://hadoop.apache.org/hbase/docs/current/api/index.html?org/apache/hadoop/hbase/regionserver/transactional/package-summary.html" rel="nofollow">http://hadoop.apache.org/hbase/docs/current/api...</a></p>
<p>I&#39;ll look into it and post about it if it&#39;s useful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Gray</title>
		<link>http://andrewmccall.com/2009/06/hbase-for-storing-users/comment-page-1/#comment-57</link>
		<dc:creator>Jonathan Gray</dc:creator>
		<pubDate>Fri, 26 Jun 2009 22:30:04 +0000</pubDate>
		<guid isPermaLink="false">http://andrewmccall.com/?p=410#comment-57</guid>
		<description>Basic secondary indexing on HBase is done as you describe: create an additional table for each index, where the row id is the indexed field. This is also included as an integrated feature in TransactionalHBase, which will take care of managing the secondary tables for you. It uses OCC (optimistic concurrency control) for safety.&lt;br&gt;&lt;br&gt;In my own usage, I manage the secondary tables at the application level. This is faster but less safe.&lt;br&gt;&lt;br&gt;I plan to add a less safe but fast server-side implementation of this in the future for my own purposes. But I&#039;ve also heard there&#039;s a chance OCC will be made pluggable in the current implementation, in which case I&#039;d just use that. Sign up to the mailing list: the 0.20.0 release is coming up soon, and this will be decided for that release.</description>
		<content:encoded><![CDATA[<p>Basic secondary indexing on HBase is done as you describe: create an additional table for each index, where the row id is the indexed field. This is also included as an integrated feature in TransactionalHBase, which will take care of managing the secondary tables for you. It uses OCC (optimistic concurrency control) for safety.</p>
<p>In my own usage, I manage the secondary tables at the application level. This is faster but less safe.</p>
<p>I plan to add a less safe but fast server-side implementation of this in the future for my own purposes. But I&#39;ve also heard there&#39;s a chance OCC will be made pluggable in the current implementation, in which case I&#39;d just use that. Sign up to the mailing list: the 0.20.0 release is coming up soon, and this will be decided for that release.</p>
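<p>A minimal sketch of the application-level approach described above, assuming plain in-memory dictionaries stand in for the two HBase tables. The table, column and function names (users, users_by_email, put_user, find_user_by_email) are hypothetical, not the HBase client API. The index table is keyed by the indexed field (email) and points back to the user id:</p>

```python
# In-memory stand-ins for two HBase tables: row key -> {column: value}.
# Hypothetical illustration only; this is not the HBase client API.
users = {}            # primary table, row key = user id
users_by_email = {}   # index table, row key = indexed field (email)


def put_user(user_id, email, name):
    """Write the user row and keep the email index in step (application-level)."""
    old = users.get(user_id)
    if old is not None and old["info:email"] != email:
        # The indexed value changed: drop the stale index row.
        users_by_email.pop(old["info:email"], None)
    users[user_id] = {"info:email": email, "info:name": name}
    # Second, separate write: not atomic with the one above, so a crash
    # between them can leave the index out of sync (faster but less safe).
    users_by_email[email] = {"index:id": user_id}


def find_user_by_email(email):
    """Get the index row by its key, then get the primary row it points to."""
    index_row = users_by_email.get(email)
    if index_row is None:
        return None
    return users.get(index_row["index:id"])


put_user("u1", "alice@example.com", "Alice")
put_user("u2", "bob@example.com", "Bob")
put_user("u1", "alice@example.org", "Alice")  # email change moves the index row
print(find_user_by_email("alice@example.org"))
print(find_user_by_email("alice@example.com"))
```

<p>The second lookup prints None because the stale index row was removed when the email changed. The unguarded pair of writes in put_user is the "faster but less safe" trade-off: a crash between them leaves the index stale, which is the gap the comment says TransactionalHBase covers with OCC.</p>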
]]></content:encoded>
	</item>
</channel>
</rss>
