<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Antipatter &#187; python</title>
	<atom:link href="http://antipatter.com/tag/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://antipatter.com</link>
	<description>The Web, The Business, The Smoke and Mirrors</description>
	<lastBuildDate>Tue, 15 Nov 2011 15:34:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Everybody Disco</title>
		<link>http://antipatter.com/2008/09/everybody-disco/</link>
		<comments>http://antipatter.com/2008/09/everybody-disco/#comments</comments>
		<pubDate>Fri, 05 Sep 2008 02:13:04 +0000</pubDate>
		<dc:creator>loren</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[concurrent]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[erlang]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://antipatter.com/?p=181</guid>
		<description><![CDATA[
I&#8217;m looking at a new project called Disco.  It was developed by Nokia as an implementation of MapReduce, the Google-spawned algorithm for sharding a large computing task into pieces that can be crunched by multiple cores or servers.  Word has it that MapReduce is used extensively at Google, probably to build that big index, I&#8217;m [...]
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m looking at a new project called <a title="Disco" href="http://www.discoproject.org" target="_blank">Disco</a>.  It was developed by Nokia as an implementation of <a title="MapReduce" href="http://labs.google.com/papers/mapreduce.html" target="_blank">MapReduce</a>, the programming model Google devised for splitting a large computing task into pieces that can be crunched by multiple cores or servers.  Google uses MapReduce extensively; according to the paper, it even runs the production indexing system behind that big index.</p>
<p>The core of Disco is implemented in <a title="Erlang" href="http://www.erlang.org/" target="_blank">Erlang</a>, the concurrent, fault-tolerant, multicore-ready distributed computing platform developed some time ago at Ericsson.  Erlang is brilliant, though <a title="Thinking in Concurrency" href="http://antipatter.com/2008/07/thinking-in-concurrency/" target="_blank">kind of weird</a>, and represents a big educational hurdle for the existing programming population.  It&#8217;s just too different from the existing major programming paradigms.</p>
<p>Disco takes a stab at solving that problem, by allowing programmers to write their jobs in <a title="Python" href="http://python.org/" target="_blank">Python</a>.  The jobs are executed by the Erlang core, buying all that distributed, fault-tolerant goodness that Erlang provides, but keeping it safely sealed away from application developers who can work in the relatively friendlier world of Python.</p>
<p>Here (lifted directly from the Disco documentation) is a Disco job:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> disco.<span style="color: black;">core</span> <span style="color: #ff7700;font-weight:bold;">import</span> Disco, result_iterator
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> fun_map<span style="color: black;">&#40;</span>e, params<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span>w, <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> w <span style="color: #ff7700;font-weight:bold;">in</span> e.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> fun_reduce<span style="color: black;">&#40;</span><span style="color: #008000;">iter</span>, out, params<span style="color: black;">&#41;</span>:
    s = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> w, f <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">iter</span>:
        s<span style="color: black;">&#91;</span>w<span style="color: black;">&#93;</span> = s.<span style="color: black;">get</span><span style="color: black;">&#40;</span>w, <span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span> + <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>f<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> w, f <span style="color: #ff7700;font-weight:bold;">in</span> s.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
        out.<span style="color: black;">add</span><span style="color: black;">&#40;</span>w, f<span style="color: black;">&#41;</span>
&nbsp;
results = Disco<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;disco://localhost&quot;</span><span style="color: black;">&#41;</span>.<span style="color: black;">new_job</span><span style="color: black;">&#40;</span>
                name = <span style="color: #483d8b;">&quot;wordcount&quot;</span>,
                <span style="color: #008000;">input</span> = <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;http://discoproject.org/chekhov.txt&quot;</span><span style="color: black;">&#93;</span>,
                <span style="color: #008000;">map</span> = fun_map,
                <span style="color: #008000;">reduce</span> = fun_reduce<span style="color: black;">&#41;</span>.<span style="color: black;">wait</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">for</span> word, frequency <span style="color: #ff7700;font-weight:bold;">in</span> result_iterator<span style="color: black;">&#40;</span>results<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> word, frequency</pre></div></div>

<p>So this code computes a word count for a piece of text.  A MapReduce job always consists of two functions: the Map function, which is applied independently to each piece of the input and emits intermediate key/value pairs, and the Reduce function, which merges all the values that share a key into a final result.  (This is the essence of MapReduce, and isn&#8217;t tied to a particular technology.)</p>
<p>The code above has two fun_* functions.  The prefix is presumably a nod to Erlang, where &#8220;fun&#8221; is the keyword that creates an anonymous function, not unlike a lambda in Python; here, though, they are ordinary named Python functions.  Both are passed to new_job on the Disco instance, and the wait() call appears to block until the job has finished before handing back the results.</p>
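<p>Reading the example more closely: fun_map is called on each input entry e (presumably a line of text here) and emits a (word, 1) pair for every word in it.  Disco collects those pairs and feeds them to fun_reduce, which iterates over many (word, count) pairs, totals them per word in the dictionary s, and writes each word and its total to the &#8220;out&#8221; object.  Because fun_reduce keeps a dictionary rather than assuming a single word, each reduce call evidently handles a whole batch of words rather than one word apiece, and several such calls can run concurrently on different batches.  Disco ties it all together, and result_iterator walks over the combined (word, frequency) results.</p>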
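<p>To make the division of labour concrete, here is a minimal single-process sketch of the same word count in plain Python 3.  It does not use Disco or its API; the names map_words, shuffle, reduce_counts and word_count are hypothetical, and shuffle stands in for the grouping step a MapReduce framework performs between the map and reduce phases.</p>

```python
from collections import defaultdict
from itertools import chain

def map_words(line):
    # Map step (same idea as fun_map): one line in, (word, 1) pairs out.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group every emitted value under its key. A real MapReduce framework
    # does this for you, moving data between machines if necessary.
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_counts(word, counts):
    # Reduce step: collapse one key's list of values into a single result.
    return word, sum(counts)

def word_count(lines):
    # Every map call is independent of the others...
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    # ...and so is every reduce call, one per distinct key.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())

print(word_count(["the cat sat", "the cat ran"]))
# {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

<p>Because each map call and each reduce call depends only on its own input, a framework like Disco is free to run them on different cores or machines; only the shuffle has to move data between them.</p>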
<p>Wait, this gets better.  Disco comes with tools that allow it to be deployed on <a title="Amazon EC2" href="http://www.amazon.com/gp/browse.html?node=201590011" target="_blank">Amazon&#8217;s EC2</a> computing cloud.  (Hm, Python.  <em>Django-Disco</em> anyone?) Imagine dynamic, linear capacity scaling, on rented compute cycles, with easily written Python jobs.  I think I might be salivating a bit.</p>
<p>I&#8217;m a huge fan of anything that can deliver concurrent programming power in a form that&#8217;s palatable to programmers who haven&#8217;t grown up with it.  I&#8217;m going to watch the Disco project eagerly to see how it does.</p>
]]></content:encoded>
			<wfw:commentRss>http://antipatter.com/2008/09/everybody-disco/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thinking in Concurrency</title>
		<link>http://antipatter.com/2008/07/thinking-in-concurrency/</link>
		<comments>http://antipatter.com/2008/07/thinking-in-concurrency/#comments</comments>
		<pubDate>Thu, 24 Jul 2008 15:58:20 +0000</pubDate>
		<dc:creator>loren</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[rampant speculation]]></category>
		<category><![CDATA[concurrent]]></category>
		<category><![CDATA[erlang]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[reia]]></category>

		<guid isPermaLink="false">http://antipatter.com/?p=27</guid>
		<description><![CDATA[
A few weeks ago, Anwar Ghuloum at the Research@Intel blog advised developers that they should start thinking about &#8220;tens, hundreds and thousands of cores&#8221;.  In essence, the people that control processor development are telling us that the future of programming is going to move out rather than up.  To get an idea of the time [...]
]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago, Anwar Ghuloum at the Research@Intel blog advised developers that they should start thinking about <a href="http://blogs.intel.com/research/2008/06/unwelcome_advice.php" target="_blank">&#8220;tens, hundreds and thousands of cores&#8221;</a>.  In essence, the people who control processor development are telling us that the future of programming is going to move out rather than up: more cores, not faster ones.  To get an idea of the timeline, Intel&#8217;s CEO tells us to expect <a href="http://techfreep.com/intel-80-cores-by-2011.htm" target="_blank">80 cores by 2011</a>.  We can probably assume the number of cores on a processor will keep increasing geometrically in the years that follow.</p>
<p><strong>The catch is that we don&#8217;t really know how to program for this.</strong></p>
<p>To take advantage of massively multicore architectures, we really need to take our computing problems and decompose them into smaller bits that can be farmed out to various worker processes.  This idea is what Google&#8217;s <a href="http://labs.google.com/papers/mapreduce.html" target="_blank">MapReduce</a> is all about, and is an approach that has worked well with certain categories of problems, such as bioinformatic data analysis, or 3D rendering.</p>
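<p>Here is a minimal sketch of that decompose, farm out, recombine pattern in Python 3, using the standard library&#8217;s concurrent.futures module.  The helper names split, square_chunk and farmed_out_squares are hypothetical.  A thread pool keeps the sketch short and portable, but because of CPython&#8217;s Global Interpreter Lock it will not actually speed up CPU-bound work like this; ProcessPoolExecutor offers the same map interface and is the usual swap when real parallelism is needed.</p>

```python
from concurrent.futures import ThreadPoolExecutor

def split(seq, size):
    # Decompose the problem: break the input into independent chunks.
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def square_chunk(chunk):
    # The unit of work needs nothing but its own chunk, so any worker can take it.
    return [n * n for n in chunk]

def farmed_out_squares(numbers, workers=4, chunk_size=1000):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map hands chunks to workers and yields results in input order.
        partials = list(pool.map(square_chunk, split(numbers, chunk_size)))
    # Recombine the partial results into a single answer.
    return [sq for part in partials for sq in part]

print(farmed_out_squares(list(range(10)), workers=2, chunk_size=3))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```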
<h2>Programming Crisis</h2>
<p>I see a programmer education crisis coming.  Most programmers just aren&#8217;t trained to think about this kind of decomposition.  They&#8217;ve learned procedural and object-oriented languages, and tend to think sequentially.  Furthermore, the languages and platforms they&#8217;ve worked on have taught them that multithreaded programming is extremely tricky.  And on mainstream platforms, it is.</p>
<p>Much of the difficulty with well-known languages such as Java lies in managing shared state, that is, memory that several threads can read and write.  Java has various synchronization mechanisms, but using them correctly in heavily multithreaded code is hard.  I&#8217;ve seen unit tests pass or fail at random, depending on a race condition: whichever thread finished first decided the outcome.</p>
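<p>The same trap exists in Python.  In this minimal sketch (Counter and run are hypothetical names), several threads bump one shared counter; the threading.Lock is what keeps each read, add, write sequence from interleaving with another thread&#8217;s and losing updates.</p>

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads can both read the same old value,
        # both add one, and both write back: one update is silently lost.
        with self._lock:
            self.value += 1

def run(threads=8, increments=10000):
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter.value

print(run())  # 80000
```

<p>With the lock in place the total is always threads times increments.  The price is that every increment waits its turn, and the programmer has to remember to take the lock everywhere that state is touched.</p>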
<p>In the web world we&#8217;ve become rather fond of very high-level languages such as Python and Ruby.  Here the runtime story gets even worse: most scripting language runtimes can&#8217;t run code on more than one processor at a time (CPython&#8217;s Global Interpreter Lock, for instance, lets only one thread execute Python bytecode at once), so the only way to parallelize CPU-bound work is to start multiple instances of the runtime.  Besides being a hack, this makes communication between processes awkward, usually relying on expensive data serialization.  It also brings back the kind of problem we&#8217;ve been trying to get away from with high-level languages: the developer has to deal with boilerplate computer science problems rather than concentrating exclusively on business problems.</p>
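<p>To see why that communication is awkward, here is a small sketch of what crossing a process boundary involves in Python.  It does not start a second process: send_to_other_process is a hypothetical stand-in that performs the same kind of pickle round trip that multiprocessing.Pool applies to task arguments and results.  The point is that the receiver always gets a serialized copy, never the original object.</p>

```python
import pickle

def send_to_other_process(obj):
    # Hypothetical stand-in for a process boundary:
    # serialize, "transmit" the bytes, deserialize on the other side.
    payload = pickle.dumps(obj)
    return pickle.loads(payload), len(payload)

original = {"name": "job-42", "scores": [1, 2, 3]}
received, nbytes = send_to_other_process(original)

received["scores"].append(4)   # the "worker" changes its copy...
print(original["scores"])      # ...and the parent's data is untouched: [1, 2, 3]
print(received["scores"])      # [1, 2, 3, 4]
print(nbytes, "bytes had to be serialized for one small dict")
```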
<h2>Concurrent Languages</h2>
<p>One answer seems to be tech stacks that are designed for concurrency from the ground up.  Enter <a href="http://www.erlang.org/" target="_blank">Erlang</a>, a language developed for telephony applications by Ericsson.  Erlang handles concurrency very, very well.  For example, it circumvents the shared state problem by <em>not having any shared state</em>.  However, from the perspective of someone with a Java or Python background (e.g. me) it&#8217;s frickin&#8217; weird.  Some serious adjustments in the way problems are approached have to be made.</p>
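<p>Python can imitate the shape of Erlang&#8217;s share-nothing, message-passing model, though not its lightweight processes or its fault tolerance.  In this minimal sketch a thread owns its own state and is reachable only through two queue.Queue objects acting as mailboxes; accumulator and sum_by_messages are hypothetical names.</p>

```python
import queue
import threading

def accumulator(inbox, outbox):
    # This worker owns its state: nothing outside can touch `total` directly.
    total = 0
    while True:
        msg = inbox.get()        # block until a message arrives, a bit like Erlang's receive
        if msg == "stop":
            outbox.put(total)    # reply with the final state, then exit
            return
        total += msg

def sum_by_messages(numbers):
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=accumulator, args=(inbox, outbox))
    worker.start()
    for n in numbers:
        inbox.put(n)             # communicate only by sending messages
    inbox.put("stop")
    result = outbox.get()
    worker.join()
    return result

print(sum_by_messages([1, 2, 3, 4]))  # 10
```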
<p>Here&#8217;s a really simple snippet that generates the squares of a list of numbers.  First, a naive implementation in Python:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> squares<span style="color: black;">&#40;</span>num_list<span style="color: black;">&#41;</span>:
    square_list = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> n <span style="color: #ff7700;font-weight:bold;">in</span> num_list:
        square_list.<span style="color: black;">append</span><span style="color: black;">&#40;</span>n <span style="color: #66cc66;">*</span> n<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> square_list</pre></div></div>

<p>A more compact version in Python, using a list comprehension:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> squares<span style="color: black;">&#40;</span>num_list<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#91;</span>n<span style="color: #66cc66;">*</span>n <span style="color: #ff7700;font-weight:bold;">for</span> n <span style="color: #ff7700;font-weight:bold;">in</span> num_list<span style="color: black;">&#93;</span></pre></div></div>

<p>Here&#8217;s the same code in Erlang:</p>
<pre>square([H|T]) -&gt; [H*H | square(T)];
square([]) -&gt; [].</pre>
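<p>Assuming you don&#8217;t know Erlang, can you even tell what&#8217;s going on?  You&#8217;re looking at a completely different paradigm.  The left side of each clause (to the left of the &#8220;-&gt;&#8221; marker) relies on pattern matching, which picks the clause whose shape fits the argument: &#8220;[H|T]&#8221; matches a non-empty list, binding its first element to H and the rest to T, while &#8220;[]&#8221; matches the empty list.  (It&#8217;s sort of like regular expressions, but for data structures.)  On the right side, the implementation uses recursion instead of a loop: square the head and prepend it to the squares of the tail.  Strictly speaking this isn&#8217;t <a title="Tail Recursion" href="http://en.wikipedia.org/wiki/Tail_recursion" target="_blank">tail recursion</a>, since the list cell is built after the recursive call returns, but recursion is how Erlang expresses iteration.  Needless to say, this is a bit of a departure for many developers.  I shudder to think of trying to take a department of developers who cut their teeth on Java and PHP, and train them up in Erlang.</p>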
<p>This is why I&#8217;m so interested in projects like <a title="Reia" href="http://wiki.reia-lang.org/wiki/Main_Page" target="_blank">Reia</a>.  To fix the &#8220;impedance mismatch&#8221; between the coming massively multicore future and today&#8217;s programming skills, its goal is a high-level, Python/Ruby-like language that compiles to bytecode for the Erlang VM.  It&#8217;s at a very early stage, and I&#8217;m not sure it will ever gain critical mass, but this is a problem that needs solving, and I&#8217;m interested in any attempt.</p>
]]></content:encoded>
			<wfw:commentRss>http://antipatter.com/2008/07/thinking-in-concurrency/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

