<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Antipatter &#187; ec2</title>
	<atom:link href="http://antipatter.com/tag/ec2/feed/" rel="self" type="application/rss+xml" />
	<link>http://antipatter.com</link>
	<description>The Web, The Business, The Smoke and Mirrors</description>
	<lastBuildDate>Tue, 15 Nov 2011 15:34:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Everybody Disco</title>
		<link>http://antipatter.com/2008/09/everybody-disco/</link>
		<comments>http://antipatter.com/2008/09/everybody-disco/#comments</comments>
		<pubDate>Fri, 05 Sep 2008 02:13:04 +0000</pubDate>
		<dc:creator>loren</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[concurrent]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[erlang]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://antipatter.com/?p=181</guid>
		<description><![CDATA[
I&#8217;m looking at a new project called Disco.  It was developed by Nokia as an implementation of MapReduce, the Google-spawned algorithm for sharding a large computing task into pieces that can be crunched by multiple cores or servers.  Word has it that MapReduce is used extensively at Google, probably to build that big index, I&#8217;m [...]
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m looking at a new project called <a title="Disco" href="http://www.discoproject.org" target="_blank">Disco</a>.  It was developed by Nokia as an implementation of <a title="MapReduce" href="http://labs.google.com/papers/mapreduce.html" target="_blank">MapReduce</a>, the Google-spawned algorithm for sharding a large computing task into pieces that can be crunched by multiple cores or servers.  Word has it that MapReduce is used extensively at Google, probably to build that big index, I&#8217;m guessing.</p>
<p>The core of Disco is implemented in <a title="Erlang" href="http://www.erlang.org/" target="_blank">Erlang</a>, the concurrent, fault-tolerant, multicore-ready distributed computing platform developed some time ago at Ericsson.  Erlang is brilliant, though <a title="Thinking in Concurrency" href="http://antipatter.com/2008/07/thinking-in-concurrency/" target="_blank">kind of weird</a>, and represents a big educational hurdle for the existing programming population.  It&#8217;s just too different from the existing major programming paradigms.</p>
<p>Disco takes a stab at solving that problem, by allowing programmers to write their jobs in <a title="Python" href="http://python.org/" target="_blank">Python</a>.  The jobs are executed by the Erlang core, buying all that distributed, fault-tolerant goodness that Erlang provides, but keeping it safely sealed away from application developers who can work in the relatively friendlier world of Python.</p>
<p>Here (lifted directly from the Disco documentation) is a Disco job:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> disco.<span style="color: black;">core</span> <span style="color: #ff7700;font-weight:bold;">import</span> Disco, result_iterator
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> fun_map<span style="color: black;">&#40;</span>e, params<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span>w, <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> w <span style="color: #ff7700;font-weight:bold;">in</span> e.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> fun_reduce<span style="color: black;">&#40;</span><span style="color: #008000;">iter</span>, out, params<span style="color: black;">&#41;</span>:
    s = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> w, f <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">iter</span>:
        s<span style="color: black;">&#91;</span>w<span style="color: black;">&#93;</span> = s.<span style="color: black;">get</span><span style="color: black;">&#40;</span>w, <span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span> + <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>f<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> w, f <span style="color: #ff7700;font-weight:bold;">in</span> s.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
        out.<span style="color: black;">add</span><span style="color: black;">&#40;</span>w, f<span style="color: black;">&#41;</span>
&nbsp;
results = Disco<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;disco://localhost&quot;</span><span style="color: black;">&#41;</span>.<span style="color: black;">new_job</span><span style="color: black;">&#40;</span>
    name = <span style="color: #483d8b;">&quot;wordcount&quot;</span>,
    <span style="color: #008000;">input</span> = <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;http://discoproject.org/chekhov.txt&quot;</span><span style="color: black;">&#93;</span>,
    <span style="color: #008000;">map</span> = fun_map,
    <span style="color: #008000;">reduce</span> = fun_reduce<span style="color: black;">&#41;</span>.<span style="color: black;">wait</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">for</span> word, frequency <span style="color: #ff7700;font-weight:bold;">in</span> result_iterator<span style="color: black;">&#40;</span>results<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> word, frequency</pre></div></div>

<p>So this code snippet creates a word count of some text.  MapReduce always consists of two functions &#8211; the Map function, which turns each piece of the input into intermediate key/value pairs that can be produced independently of one another, and the Reduce function, which combines those pairs into a single result.  (This is the essence of MapReduce, and isn&#8217;t tied to a particular technology.)</p>
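<p>To make the two phases concrete, here is a minimal plain-Python sketch of the pattern, with no Disco involved.  The names map_words, reduce_counts and run_mapreduce are mine, invented for illustration, and the snippet is written for Python 3.  A real framework would spread the map and reduce calls across cores or servers rather than looping over them in one process.</p>

```python
from collections import defaultdict


def map_words(line):
    # Map step: turn one input record into (key, value) pairs.
    return [(word, 1) for word in line.split()]


def reduce_counts(word, counts):
    # Reduce step: collapse all values collected for one key into one result.
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    # Group every emitted value under its key; this is the hand-off
    # between the map phase and the reduce phase.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    # Run the reduce function once per key and collect the results.
    return dict(reduce_fn(key, values) for key, values in grouped.items())


lines = ["the cat sat", "the cat ran"]
counts = run_mapreduce(lines, map_words, reduce_counts)
print(counts)
```

<p>The grouping step in the middle is what lets a reduce call see every value for its key, no matter which input record produced it.</p>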
<p>The code above has two fun_* functions.  &#8220;Fun&#8221; is an Erlang-ism: it is the keyword that creates an anonymous function, not unlike a lambda in Python.  The functions themselves are passed into the Disco instance, which then spits out the results, presumably once all the reduce functions have exited.</p>
<p>So what is the code example above actually doing?  fun_map is called once per input entry (a line of text by default, as I read the Disco docs) and returns a (word, 1) pair for every word in that entry.  fun_reduce is then handed an iterator over those pairs; it sums the counts per word in a dictionary and adds each total to the &#8220;out&#8221; accumulator.  So the work is split by chunks of input rather than by word, and no reduce call is dedicated to a single word.  Disco ties it all together and returns it as the &#8220;results&#8221;.</p>
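<p>One way to see what each function receives is to run the two by hand, outside Disco.  This is a rough sketch, not Disco&#8217;s actual machinery: LocalOut is a hypothetical stand-in for the &#8220;out&#8221; object, params is simply passed as None, a single reduce call sees all the pairs, and iteritems() becomes items() so the snippet runs on Python 3.</p>

```python
class LocalOut:
    # Hypothetical stand-in for the "out" object that Disco passes to reduce.
    def __init__(self):
        self.pairs = []

    def add(self, key, value):
        self.pairs.append((key, value))


def fun_map(e, params):
    # Same body as in the post: one (word, 1) pair per word in the entry.
    return [(w, 1) for w in e.split()]


def fun_reduce(iter, out, params):
    # Same logic as in the post, with items() instead of Python 2's iteritems().
    s = {}
    for w, f in iter:
        s[w] = s.get(w, 0) + int(f)
    for w, f in s.items():
        out.add(w, f)


text = ["to be or not", "to be"]

# Map: one call per input entry, each returning a list of (word, 1) pairs.
mapped = [pair for entry in text for pair in fun_map(entry, None)]

# Reduce: a single call walks every pair and sums the counts per word.
out = LocalOut()
fun_reduce(iter(mapped), out, None)
result = dict(out.pairs)
print(result)
```

<p>Six pairs go into the reduce call and four totals come out, one per distinct word.</p>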
<p>Wait, this gets better.  Disco comes with tools that allow it to be deployed on <a title="Amazon EC2" href="http://www.amazon.com/gp/browse.html?node=201590011" target="_blank">Amazon&#8217;s EC2</a> computing cloud.  (Hm, Python.  <em>Django-Disco</em> anyone?) Imagine dynamic, linear capacity scaling, on rented compute cycles, with easily written Python jobs.  I think I might be salivating a bit.</p>
<p>I&#8217;m a huge fan of anything that can deliver concurrent programming power in a form that&#8217;s palatable to programmers who haven&#8217;t grown up with it.  I&#8217;m going to eagerly watch the Disco project to see how it does.</p>
]]></content:encoded>
			<wfw:commentRss>http://antipatter.com/2008/09/everybody-disco/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

