Showing posts with label Amazon. Show all posts

Monday, March 23, 2009

Streaming Hadoop Data Into R Scripts

Along the lines of Mongo Measurement Requires Mongo Management, the HadoopStreaming package on CRAN provides utilities for applying R scripts to Hadoop streaming.
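For readers who haven't used Hadoop streaming, here is a minimal sketch of the contract a streaming script has to follow: the mapper reads lines and writes tab-separated key/value records, and the reducer receives those records sorted by key. It is written in Python purely for illustration (an R script would follow the same stdin/stdout convention), it does not use or reproduce the HadoopStreaming package's API, and the names `mapper` and `reducer` and the word-count task are my own assumptions. The sort step is simulated locally with `sorted`.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, the way a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; assumes the records arrive sorted by key."""
    parsed = (rec.rstrip("\n").split("\t", 1) for rec in sorted_records)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Local stand-in for map -> shuffle/sort -> reduce on a hypothetical input.
sample_lines = ["hadoop streams data", "R scripts read data"]
for record in reducer(sorted(mapper(sample_lines))):
    print(record)
```

On a real cluster the two functions would live in separate scripts reading standard input, and Hadoop itself would do the sorting between them.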

Hadoop has been deployed on Amazon's EC2. See our more recent ACM article, "Hadoop Superlinear Scalability: The Perpetual Motion of Parallel Performance" for a more detailed discussion about scalability issues.

Saturday, August 23, 2008

It's the PLANNING, Stupid!

The phrase "It's the ECONOMY, stupid!" helped boost Bill Clinton's election prospects in 1992. In 1999, when eBay.com was having its "CNN moments" (as we called them, back then), I declared that capacity planning seemed to be an oxymoron for many pre-bubble-bursting web sites. They were prepared to throw any amount of money at lots of iron, which presumably meant they understood the "capacity" part (capital expenditure). It's the "planning" part they didn't grok. Planning means thinking ahead. Planning requires investment in the future. If the Financial Dept. can do it, why can't IT?

Thursday, August 21, 2008

GCaP Book Availability

Guerrilla graduate Greg Rogers alerted me to the fact that Amazon.com is showing my GCaP book as being "Temporarily out of stock". In a roundabout way, this turns out to be good news. I know some of you will be thinking ahead to Xmas gifts, so don't be dissuaded. :-) I contacted my editor at Springer.de and here's what's going on.

Back in April, I was busily directing the attention of the GCaP course attendees to the Guerrilla Manual booklet inserted in the back cover when, to their surprise and my chagrin, they discovered that their copies did not have the booklet. Mine did, but it was an older copy. What the ...!? Turns out, it was a production error in the latest printing, which is now in the process of being corrected. The misprint versions have possibly been "recalled" and that's why it is temporarily out of stock. I'm sure Amazon will still be happy to take your order. Thanks, Greg.

Saturday, February 16, 2008

Web 2.0 Meets Error 33

Apparently Amazon's Elastic Cloud snapped yesterday and havoc rained down on a number of Web 2.0 sites. This is unfortunate because the same kind of technology was deployed very rapidly (elastically?), exactly one year ago, to help search for missing computer scientist and yachtsman, Jim Gray.

When I was at Xerox PARC, we had a term for this kind of failure mode: Error 33. Error 33 states that it is not a good idea for the success of your research project to be dependent on the possible failure of someone else's research project. This term was coined by the first Director of Xerox PARC, Dr. George Pake, and the nomenclature is reminiscent of Catch 22.

Error 33 is an all too appropriate reminder that a lot of Web 2.0 technology, which is hyped as ready for prime-time, is really still in the R&D phase. It's probably only very annoying when SmugMug is off the air for several hours, but mission-critical services like banks and hospitals should approach with caution. Achieving higher reliability is only likely to come at a higher premium.

Monday, February 26, 2007

Helping Amazon's Mechanical Turk Search for Jim Gray's Yacht

Regrettably, Jim Gray is still missing, but I thought Amazon.com deserved more kudos than they got in the press for their extraordinary effort to help in the search for Gray's yacht. Google got a lot of press coverage for talking up the idea of using satellite image sources, but Amazon did it. Why is that? One reason is that Amazon has a lot of experience operating and maintaining very large distributed databases (VLDBs). Another reason is that it's not just Google that has been developing interesting Internet tools. Amazon (somewhat quietly, by comparison) has also developed their own Internet tools, like the Mechanical Turk. These two strengths combined at Amazon and enabled them to load a huge number of satellite images of the Pacific into the Turk database, thereby allowing anyone (many eyes) to scan them via the Turk interface, and all that in very short order. Jim would be impressed.

I spent several hours on Sunday, Feb 4th, using Amazon's Mechanical Turk to help look for Gray's yacht. The images above (here about one quarter the size displayed by the Turk) show one example where I thought there might have been an interesting object; possibly a yacht. Image A was captured by the satellite a short time before image B (t1 < t2). You can think of the satellite as sweeping down this page. Things like whitecaps on the ocean surface tend to dissipate and thus change pixels between successive frames, whereas a solid object like a ship will tend to remain invariant. The red circle (which I added) marks such a stable set of pixels, which also has approximately the correct dimensions for a yacht, i.e., about 10 pixels long (as explained by Amazon). Unfortunately, what appears to be an interesting object here has not led to the discovery of Gray's whereabouts.
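The "stable pixels" idea can be sketched in a few lines of Python. This is only an illustration of the reasoning, not what Amazon or the Turk actually did: it assumes two co-registered grayscale frames given as lists of rows with values 0 to 255, and the function name `stable_bright_pixels`, the brightness threshold of 200, and the toy frames are all hypothetical. A real candidate would be about 10 pixels long; the toy "ship" here is only 3.

```python
def stable_bright_pixels(frame_a, frame_b, threshold=200):
    """Return (row, col) positions that are bright in BOTH frames.

    Transient features such as whitecaps are bright in only one frame,
    so they drop out; a solid object stays bright in both.
    """
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(frame_a, frame_b))
        for c, (a, b) in enumerate(zip(row_a, row_b))
        if a >= threshold and b >= threshold
    ]


# Toy frames: frame A at time t1, frame B at time t2 (t1 < t2).
frame_a = [
    [10, 250, 10, 10, 10],    # whitecap at (0, 1), gone by t2
    [10, 10, 10, 10, 10],
    [10, 240, 245, 250, 10],  # candidate object in row 2
    [10, 10, 10, 10, 10],
]
frame_b = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 235, 250, 245, 10],  # same object, still there
    [10, 10, 10, 10, 255],    # new whitecap at (3, 4)
]

print(stable_bright_pixels(frame_a, frame_b))
```

Only the pixels in row 2 survive the comparison, which is the kind of invariant patch the red circle marks.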

Use of the Turk satellite images was hampered by a lack of any way to reference the images (about 5 per web page) by number, and there was no coordinate system within each image to express the location of any interesting objects. These limitations could have led to ambiguities in the follow-up human expert search of flagged images. However, given that time was of the essence for any possible rescue effort, omitting these niceties was a completely understandable decision.