Wednesday, August 1, 2007

Weka Data mining on EC2 - install

As I mentioned, I am working through my roadmap of testing various data mining and business intelligence tools running in virtual machines (VMs). My platform of choice is Amazon EC2, using AMIs (Amazon Machine Images).

Installing Weka was straightforward. I used the WEKA wiki as a guide, along with the README extracted from the zip file.

Install Summary:

Note: I used an existing VM which had Java 1.5 installed.

Use this command to check that the Java environment is set up:

java -version

Output:


java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) Client VM (build 1.5.0_12-b04, mixed mode, sharing)
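
If you want to script this check rather than read the output by eye, something like the following could work. It is only a sketch: parse_java_version is a helper name I made up, and it assumes the first line of the output has the quoted-version form shown above. Note that java -version writes to stderr, hence the redirect.

```shell
# Extract the quoted version string from the first line of `java -version` output.
# parse_java_version is a hypothetical helper, not part of Java or Weka.
parse_java_version() {
    sed -n '1s/.*"\(.*\)".*/\1/p'
}

if command -v java >/dev/null 2>&1; then
    # java -version prints to stderr, so merge it into stdout for the pipe
    java -version 2>&1 | parse_java_version
else
    echo "java not found on PATH" >&2
fi
```

On the VM above this would be expected to print 1.5.0_12.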


Create a weka user and use the following as its profile:

groupadd weka
useradd -g weka weka
cd /home/weka/
vi .bash_profile


# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs

LD_LIBRARY_PATH=/usr/local/java/lib
JAVA_HOME=/usr/local/java
WEKAHOME=/usr/local/weka

PATH=/usr/local/java/bin/:$PATH:$HOME/bin
CLASSPATH=$CLASSPATH:$WEKAHOME/weka.jar

export PATH LD_LIBRARY_PATH JAVA_HOME WEKAHOME CLASSPATH
unset USERNAME
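
Once Weka has been unpacked (see the steps further down) and you are logged in as the weka user, a quick way to confirm the profile took effect is to list each CLASSPATH entry and flag any that do not exist on disk. This is a rough sketch; check_classpath is a name I invented for illustration.

```shell
# Print each entry of a colon-separated classpath with an ok/missing flag.
# check_classpath is a hypothetical helper, not a Weka or Java tool.
check_classpath() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
        [ -z "$entry" ] && continue   # skip empty entries
        if [ -e "$entry" ]; then
            echo "ok       $entry"
        else
            echo "missing  $entry"
        fi
    done
}

check_classpath "${CLASSPATH:-}"
```

If weka.jar shows up as missing, the usual suspects are a typo in WEKAHOME or a profile that has not been re-read since it was edited.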


Get Weka using wget

wget http://optusnet.dl.sourceforge.net/sourceforge/weka/weka-3-4-11.zip
unzip weka-3-4-11.zip

Move the unzipped directory to /usr/local and create a symbolic link /usr/local/weka.
This allows you to upgrade or downgrade WEKA as required.

mv weka-3-4-11 /usr/local/
cd /usr/local
ln -s weka-3-4-11/ weka
chown -R weka:weka /usr/local/weka/
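
To illustrate why the symbolic link helps, here is a sketch of switching between two unpacked releases by repointing the link. switch_version is a hypothetical helper, the second version number is only a placeholder, and the demonstration runs in a scratch directory rather than /usr/local.

```shell
# Repoint a version symlink at another unpacked release directory.
# $1 = parent directory, $2 = release directory name, $3 = link name
# switch_version is a hypothetical helper written for this example.
switch_version() {
    if [ ! -d "$1/$2" ]; then
        echo "no such release: $1/$2" >&2
        return 1
    fi
    # -n replaces the link itself instead of descending into its current target
    (cd "$1" && ln -sfn "$2" "$3")
}

# Demonstration in a scratch directory; weka-3-4-12 is a placeholder name.
demo=$(mktemp -d)
mkdir "$demo/weka-3-4-11" "$demo/weka-3-4-12"
switch_version "$demo" weka-3-4-11 weka
switch_version "$demo" weka-3-4-12 weka
readlink "$demo/weka"   # prints weka-3-4-12
```

Because WEKAHOME and CLASSPATH refer to /usr/local/weka rather than a versioned directory, the profile does not need to change when the link moves. The newly unpacked directory would still need the same chown as above.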

Test that Weka is working:

su - weka
mkdir tutorial
cd tutorial/
java weka.classifiers.trees.J48 -t $WEKAHOME/data/iris.arff

Output:


J48 pruned tree
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
| petalwidth <= 1.7
| | petallength <= 4.9: Iris-versicolor (48.0/1.0)
| | petallength > 4.9
| | | petalwidth <= 1.5: Iris-virginica (3.0)
| | | petalwidth > 1.5: Iris-versicolor (3.0/1.0)
| petalwidth > 1.7: Iris-virginica (46.0/1.0)

Number of Leaves : 5

Size of the tree : 9


Time taken to build model: 0.1 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===

Correctly Classified Instances 147 98 %
Incorrectly Classified Instances 3 2 %
Kappa statistic 0.97
Mean absolute error 0.0233
Root mean squared error 0.108
Relative absolute error 5.2482 %
Root relative squared error 22.9089 %
Total Number of Instances 150


=== Confusion Matrix ===

a b c <-- classified as
50 0 0 | a = Iris-setosa
0 49 1 | b = Iris-versicolor
0 2 48 | c = Iris-virginica



=== Stratified cross-validation ===

Correctly Classified Instances 144 96 %
Incorrectly Classified Instances 6 4 %
Kappa statistic 0.94
Mean absolute error 0.035
Root mean squared error 0.1586
Relative absolute error 7.8705 %
Root relative squared error 33.6353 %
Total Number of Instances 150


=== Confusion Matrix ===

a b c <-- classified as
49 1 0 | a = Iris-setosa
0 47 3 | b = Iris-versicolor
0 2 48 | c = Iris-virginica
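
As a cross-check on the summary figures, the accuracy and kappa values can be recomputed from a confusion matrix. The sketch below assumes Weka's Kappa statistic is the standard Cohen's kappa, (po - pe) / (1 - pe), where po is the observed agreement (the diagonal over the total) and pe is the agreement expected by chance from the row and column totals. matrix_stats is a name I made up.

```shell
# Recompute accuracy and Cohen's kappa from a confusion matrix on stdin
# (one row per line, cells separated by spaces).
# matrix_stats is a hypothetical helper, not part of Weka.
matrix_stats() {
    awk '{
             for (i = 1; i <= NF; i++) {
                 total += $i; row[NR] += $i; col[i] += $i
                 if (i == NR) diag += $i      # cells on the diagonal are correct
             }
         }
         END {
             po = diag / total
             for (k = 1; k <= NR; k++) pe += row[k] * col[k] / (total * total)
             printf "accuracy %.2f kappa %.2f\n", po, (po - pe) / (1 - pe)
         }'
}

# Cross-validation matrix from the output above
printf '49 1 0\n0 47 3\n0 2 48\n' | matrix_stats   # prints: accuracy 0.96 kappa 0.94
```

Feeding in the training-data matrix (50 0 0 / 0 49 1 / 0 2 48) gives accuracy 0.98 and kappa 0.97, matching the first block of output.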
