SAM SIG: Thinking at Scale with Hadoop



  • Title: Thinking at Scale with Hadoop

    Processing big data with Hadoop requires a new way of thinking about data problems. Many people approach Hadoop with an SQL or MPP bias, but the MapReduce paradigm is better suited to a fields/operations/pipes view. In this talk, we'll cover examples of typical data processing tasks and how to model them using this approach, as well as how to accelerate this kind of development with the open source Cascading project.

    Speaker Bio:
    Ken Krugler has been a software developer, consultant, trainer and entrepreneur for over 20 years. He started Krugle in 2005, as a pioneer in code search and an early adopter/supporter of Nutch, Hadoop, Lucene and Solr. Currently, his Bixo Labs web mining company uses Hadoop, Cascading, and Bixo to solve large-scale web mining and data analysis problems. He is a committer for the Apache Tika project, an author of one of the new Lucene in Action use cases, and an expert in web crawling and data mining.

    New Location:
    LinkedIn in Mountain View (details to follow)