Business Intelligence SIG: BI Over Petabytes: Meet Apache Mahout




    BI Over Petabytes: Meet Apache Mahout


    There are many useful applications of Machine Learning algorithms to Business Intelligence, yet analyzing the vast amounts of data available today is still mostly a black art. The Apache Mahout project is dedicated to producing open source Machine Learning tools for Apache Hadoop, a distributed computing platform that can orchestrate thousands of computers to analyze huge volumes of data in reasonable time. Mahout currently offers highly scalable programs for classifying (is this spam?), clustering (are these similar?), recommending (if you like X, you might also like Y) and other tasks that improve their performance by learning from past experience. Coupled with cost-effective cloud computing infrastructure such as Amazon's EC2/S3, this makes it practical for even small companies to distill Business Intelligence from Internet-sized datasets. The speaker will give an overview of the Mahout project and show some illustrative examples of its work.

    Speaker Bio:

    Jeff Eastman is an engineer, mentor and entrepreneur with many years of experience building advanced software applications. He was a long-time engineer and architect at HP before leaving to start his own software consulting practice in 1994. More recently he has developed a passion for cloud computing, and the Hadoop platform in particular. Looking for an application domain for this technology, he joined the Mahout project shortly after its inception in January 2008 and has been a committer for about a year.



    6:30pm - 7:00pm - Registration
    7:00pm - 8:30pm - Presentation

    $15 at the door for non-SDForum members
    No charge for SDForum members

    For more information, visit: