Saturday 5 January 2013

Installation Of Mahout In Ubuntu


Apache Mahout is an Apache project to produce free implementations of distributed or otherwise scalable machine learning algorithms on the Hadoop platform.

This walkthrough installs mahout-distribution-0.4-src on the following versions of Linux, Java and Hadoop.

UBUNTU 12.04 LTS
JAVA 1.7.0_09
HADOOP 1.1.0

Install Maven using the command below (run it as root, or prefix it with sudo).
apt-get install maven2

After installation, check the Maven version using the command mvn --version

Set the Maven environment variables as shown below.
export MAVEN_HOME="/usr/share/maven2"
export PATH=$PATH:$MAVEN_HOME/bin
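
The two export lines above only last for the current shell session. A minimal sketch of making them permanent is shown here; the function name persist_maven_env and the choice of ~/.bashrc as the default profile file are assumptions, not part of the original setup.

```shell
# persist_maven_env is a hypothetical helper.
# It appends the Maven variables to a profile file (default: ~/.bashrc),
# but only if MAVEN_HOME is not already exported there.
persist_maven_env() {
    profile="${1:-$HOME/.bashrc}"
    touch "$profile"
    if ! grep -q '^export MAVEN_HOME=' "$profile"; then
        {
            echo 'export MAVEN_HOME="/usr/share/maven2"'
            # Single quotes keep $PATH and $MAVEN_HOME unexpanded in the file.
            echo 'export PATH=$PATH:$MAVEN_HOME/bin'
        } >> "$profile"
    fi
}
```

Calling persist_maven_env with no argument and then opening a new terminal (or running . ~/.bashrc) should make mvn available in later sessions. Calling it twice does not duplicate the lines.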

I use hduser as a dedicated Hadoop system user, and Hadoop is installed in the /home/hduser/hadoop folder. I am going to install Mahout in the /home/hduser folder. Change to the /home/hduser directory and execute the commands below.

Download Mahout from the URL below using wget.
wget http://apache.techartifact.com/mirror/mahout/0.4/mahout-distribution-0.4-src.tar.gz
[The mirror directory holds several archives; download the one ending in -src.tar.gz]

Extract the tar archive.
sudo tar xzf mahout-distribution-0.4-src.tar.gz

Rename the extracted folder to mahout.
sudo mv mahout-distribution-0.4-src mahout

Now go to the mahout folder and execute the command below.
mvn install
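
One way to check that the build produced something usable is to look for the job JARs it leaves behind. This is a sketch under an assumption: that a successful build writes files ending in -job.jar into target folders of the source tree (for example under examples/target). The helper name list_job_jars is hypothetical.

```shell
# list_job_jars is a hypothetical helper.
# It prints every file ending in -job.jar under the given directory
# (default: the current directory), sorted by path.
list_job_jars() {
    find "${1:-.}" -type f -name '*-job.jar' | sort
}
```

Running list_job_jars /home/hduser/mahout after the build should print at least one path if the assumption holds; empty output suggests the build did not finish.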

That’s it. We can now run the Mahout examples.

After every example you have to clear the output directory and the temp directory. Otherwise the next run fails with an error like: Output directory temp/itemIDIndex already exists.
For me those directories are
/home/hduser/output        [specified in the --output option]
/home/hduser/temp          [specified as hadoop.tmp.dir in the core-site.xml file]
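
The cleanup between runs can be wrapped in a small function. This is a sketch that assumes both directories live on the local filesystem, as the paths above suggest; the helper name clean_mahout_dirs is hypothetical. If the directories are in HDFS instead, Hadoop 1.x offers hadoop fs -rmr for the same purpose.

```shell
# clean_mahout_dirs is a hypothetical helper.
# It removes each directory passed as an argument and reports what it removed.
# Empty arguments and the filesystem root are skipped as a safety guard.
clean_mahout_dirs() {
    for dir in "$@"; do
        case "$dir" in
            ""|"/")
                echo "skipping unsafe path: '$dir'" >&2
                continue
                ;;
        esac
        if [ -d "$dir" ]; then
            rm -rf "$dir"
            echo "removed $dir"
        fi
    done
}
```

With the paths from this post, the call would be clean_mahout_dirs /home/hduser/output /home/hduser/temp before starting the next example. Directories that do not exist are silently ignored.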
