Saturday, December 8, 2007

JGroups Cluster on EC2 large 64bit instances

Introduction:

In this series of articles I have been covering getting the proposed Pentaho Cluster running on EC2.
http://blog.vmdatamine.com/2007/09/pentaho-business-suite-cluster-research.html
http://blog.vmdatamine.com/2007/09/pentaho-cluster-installing-jgroups.html
http://blog.vmdatamine.com/2007/11/pentaho-cluster-installing-jgroups-on.html

In the last article, I ran a JGroups cluster test and found the results disappointing compared to the test results published by JBoss.

Given that new, larger instance types with better rated performance are now available, I decided to see how JGroups would perform on them.

The specification of the large instances can be found in Amazon's announcement.

The only changes from the last test were an upgrade to Java (JDK 6 Release 3) and running on an Amazon public Fedora 64-bit image.

I tried both the large and extra-large instances in a 4-node cluster setup running over TCP.

Comments:

  1. The network bandwidth is still the limiting factor.
  2. I had to modify the tcp.xml settings to enable the thread-pool queues, which stopped the test from hanging sporadically.
  3. The larger 64-bit instances have more throughput than the small instances. This could be due to settings, or CPU may be an underlying factor after all.
  4. You are not going to reach the published JBoss performance results without a faster network.
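To put the last point in context, here is a quick back-of-envelope conversion of the throughput figures into approximate wire rates, using the best extra-large result from the runs below (14.99 MB/sec) and the figure from the JBoss performance report (60.78 MB/sec):

```shell
# Convert MB/sec throughput figures into approximate Mbit/sec wire rates:
# best extra-large run (14.99 MB/sec) vs the JBoss lab report (60.78 MB/sec).
awk 'BEGIN {
  printf "EC2 xlarge  : %.0f Mbit/sec\n", 14.99 * 8
  printf "JBoss report: %.0f Mbit/sec\n", 60.78 * 8
}'
```

Roughly 120 Mbit/sec versus 486 Mbit/sec, which is why the network, not CPU, is the first thing standing between these instances and the published numbers.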

Results:

Two nodes: 2 senders:

-- results:

10.252.93.220:7800 (myself):
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=3211ms, msgs/sec=6228.59, throughput=6.23MB

10.252.99.47:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=3236ms, msgs/sec=6180.47, throughput=6.18MB

combined: 6204.53 msgs/sec averaged over all receivers (throughput=6.2MB/sec)

Two nodes: 1 sender, 1 receiver

-- results:

10.252.93.220:7800 (myself):
num_msgs_expected=10000, num_msgs_received=10000 (loss rate=0.0%), received=10MB, time=2607ms, msgs/sec=3835.83, throughput=3.84MB

10.252.99.47:7800:
num_msgs_expected=10000, num_msgs_received=10000 (loss rate=0.0%), received=10MB, time=2515ms, msgs/sec=3976.14, throughput=3.98MB

combined: 3905.98 msgs/sec averaged over all receivers
(throughput=3.9MB/sec)

4 nodes: 2 senders, 2 receivers

-- results:

10.252.23.15:7800:

num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4640ms, msgs/sec=4310.34, throughput=4.31MB
10.252.98.208:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4687ms, msgs/sec=4267.12, throughput=4.27MB
10.252.79.0:7800 (myself):
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4642ms, msgs/sec=4308.49, throughput=4.31MB
10.252.93.203:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4654ms, msgs/sec=4297.38, throughput=4.3MB

combined: 4295.83 msgs/sec averaged over all receivers (throughput=4.3MB/sec)


4 nodes: 2 senders, 2 receivers 100k messages test

-- results:

10.252.23.15:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=19828ms, msgs/sec=10086.75, throughput=10.09MB

10.252.98.208:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=19893ms, msgs/sec=10053.79, throughput=10.05MB

10.252.79.0:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=19828ms, msgs/sec=10086.75, throughput=10.09MB

10.252.93.203:7800 (myself):
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=19917ms, msgs/sec=10041.67, throughput=10.04MB

combined: 10067.24 msgs/sec averaged over all receivers (throughput=10.07MB/sec)

2nd run:

-- results:

10.252.23.15:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=20605ms, msgs/sec=9706.38, throughput=9.71MB

10.252.98.208:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=20629ms, msgs/sec=9695.09, throughput=9.7MB

10.252.79.0:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=20590ms, msgs/sec=9713.45, throughput=9.71MB

10.252.93.203:7800 (myself):
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=20636ms, msgs/sec=9691.8, throughput=9.69MB

combined: 9701.68 msgs/sec averaged over all receivers (throughput=9.7MB/sec)

Extra large 4 nodes : 2 senders, 2 receivers

-- results:

10.252.106.3:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=17857ms, msgs/sec=11200.09, throughput=11.2MB

10.252.15.79:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=17831ms, msgs/sec=11216.42, throughput=11.22MB

10.252.10.223:7800 (myself):
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=17837ms, msgs/sec=11212.65, throughput=11.21MB

10.252.6.223:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=17837ms, msgs/sec=11212.65, throughput=11.21MB

combined: 11210.45 msgs/sec averaged over all receivers (throughput=11.21MB/sec)


2nd run:

-- results:

10.252.106.3:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=15585ms, msgs/sec=12832.85, throughput=12.83MB

10.252.15.79:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=15564ms, msgs/sec=12850.17, throughput=12.85MB

10.252.10.223:7800 (myself):
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=15563ms, msgs/sec=12850.99, throughput=12.85MB

10.252.6.223:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=15563ms, msgs/sec=12850.99, throughput=12.85MB

combined: 12846.25 msgs/sec averaged over all receivers (throughput=12.85MB/sec)

Extra large 4 nodes : 4 senders and TCP queues

-- results:

10.252.106.3:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=27784ms, msgs/sec=14396.78, throughput=14.4MB

10.252.15.79:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=27744ms, msgs/sec=14417.53, throughput=14.42MB

10.252.10.223:7800 (myself):
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=27761ms, msgs/sec=14408.7, throughput=14.41MB

10.252.6.223:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=27755ms, msgs/sec=14411.82, throughput=14.41MB

combined: 14408.71 msgs/sec averaged over all receivers (throughput=14.41MB/sec)

With TCP queue_max_size set to 1000:

10.252.106.3:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26667ms, msgs/sec=14999.81, throughput=15MB

10.252.15.79:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26709ms, msgs/sec=14976.23, throughput=14.98MB

10.252.10.223:7800 (myself):
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26676ms, msgs/sec=14994.75, throughput=14.99MB

10.252.6.223:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26710ms, msgs/sec=14975.66, throughput=14.98MB

combined: 14986.61 msgs/sec averaged over all receivers (throughput=14.99MB/sec)
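The queue_max_size change above is a one-attribute edit in tcp.xml. A sketch with sed, shown on a stdin snippet first so the substitution is visible before touching the real file (note the pattern would also match the oob_thread_pool line if applied file-wide; tighten it if you only want one pool):

```shell
# Raise the thread-pool queue limit from 100 to 1000. Demonstrated on a
# one-line snippet; the in-place variant for the real file is commented.
echo 'thread_pool.queue_max_size="100"' \
  | sed 's/queue_max_size="100"/queue_max_size="1000"/'
# In place (creates tcp.xml.bak):
#   sed -i.bak 's/queue_max_size="100"/queue_max_size="1000"/' tcp.xml
```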

Example of the TCP protocol section from tcp.xml:


<TCP start_port="7800"
     loopback="false"
     discard_incompatible_packets="true"
     max_bundle_size="64000"
     max_bundle_timeout="30"
     use_incoming_packet_handler="true"
     enable_bundling="true"
     use_send_queues="true"
     sock_conn_timeout="300"
     skip_suspected_members="true"

     use_concurrent_stack="true"

     thread_pool.enabled="true"
     thread_pool.min_threads="8"
     thread_pool.max_threads="40"
     thread_pool.keep_alive_time="5000"
     thread_pool.queue_enabled="true"
     thread_pool.queue_max_size="100"
     thread_pool.rejection_policy="run"

     oob_thread_pool.enabled="true"
     oob_thread_pool.min_threads="8"
     oob_thread_pool.max_threads="20"
     oob_thread_pool.keep_alive_time="5000"
     oob_thread_pool.queue_enabled="true"
     oob_thread_pool.queue_max_size="100"
     oob_thread_pool.rejection_policy="run"/>

<TCPPING timeout="3000"
         initial_hosts="${jgroups.tcpping.initial_hosts:10.252.106.3[7800],10.252.15.79[7800],10.252.10.223[7800],10.252.6.223[7800]}"
         port_range="1"
         num_initial_members="2"/>


Monday, November 5, 2007

Pentaho Cluster : Installing JGroups on EC2

Overview:

Wondering why I hadn't updated my progress with installing JGroups on EC2?
It was because I had three false starts and got nowhere.

Finally however I found some more documentation and was able to get it running.

I found this report about a JGroups Performance test and the associated JBoss wiki Perftests.

That was enough information to understand how to get it working. It also helped that the more recent JGroups 2.5.1 release came with some sample configuration files.

Comments:

  1. The network bandwidth between EC2 nodes is the limiting factor.
  2. For 2 nodes: 4183.65 msgs/sec averaged over all receivers (throughput=4.18MB/sec) vs the published 60783.12 msgs/sec (throughput=60.78MB/sec).
  3. For 4 nodes: 3852.11 msgs/sec averaged over all receivers (throughput=3.85MB/sec) vs the same published 60783.12 msgs/sec (throughput=60.78MB/sec).

The JGroups-on-EC2 performance was very poor compared to the JGroups performance report: without changing any settings it was roughly 15 times slower. That is the difference a 1 Gigabit LAN makes over a 100 Megabit-class network.

The next step is to test on the larger instances. If their network performance is, as rated, better than the default instances', it should show up in the results.


Install:
  1. wget http://easynews.dl.sourceforge.net/sourceforge/javagroups/JGroups-2.5.1.bin.zip
  2. unzip JGroups-2.5.1.bin.zip -d YourJavaLibDirectory
  3. cd YourJavaLibDirectory.
  4. Run nslookup `hostname` to get your server's IP address.
  5. edit the JGroups-2.5.1.bin/config.txt and JGroups-2.5.1.bin/tcp.xml to add the hosts. See the sample files at the bottom of this post.
  6. java -cp JGroups-2.5.1.bin/concurrent.jar:JGroups-2.5.1.bin/jgroups-all.jar:JGroups-2.5.1.bin/commons-logging.jar org.jgroups.tests.perf.Test -receiver -config JGroups-2.5.1.bin/config.txt -props JGroups-2.5.1.bin/tcp.xml
  7. java -cp JGroups-2.5.1.bin/concurrent.jar:JGroups-2.5.1.bin/jgroups-all.jar:JGroups-2.5.1.bin/commons-logging.jar org.jgroups.tests.perf.Test -sender -config JGroups-2.5.1.bin/config.txt -props JGroups-2.5.1.bin/tcp.xml
  8. If the hosts are correct, the test should run.
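For convenience, steps 6 and 7 can be wrapped so the classpath is built once and shared by the sender and receiver invocations (paths as unpacked in step 2; this is just a sketch of the same commands):

```shell
# Build the JGroups perf-test classpath once; the commands from steps 6-7
# then differ only in the -sender/-receiver flag.
JG=JGroups-2.5.1.bin
CP="$JG/concurrent.jar:$JG/jgroups-all.jar:$JG/commons-logging.jar"
echo "$CP"   # sanity-check the jar list

# Receiver (start on each receiving node first):
#   java -cp "$CP" org.jgroups.tests.perf.Test -receiver \
#        -config "$JG/config.txt" -props "$JG/tcp.xml"
# Sender:
#   java -cp "$CP" org.jgroups.tests.perf.Test -sender \
#        -config "$JG/config.txt" -props "$JG/tcp.xml"
```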

Results:

2 nodes
-- results:

10.255.23.160:7800 (myself):
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4664ms, msgs/sec=4288.16, throughput=4.29MB

10.255.26.143:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4903ms, msgs/sec=4079.14, throughput=4.08MB

combined: 4183.65 msgs/sec averaged over all receivers (throughput=4.18MB/sec)

4 nodes (2 senders, 2 receivers):

-- results:

10.253.15.95:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=5212ms, msgs/sec=3837.3, throughput=3.84MB

10.255.23.160:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=5174ms, msgs/sec=3865.48, throughput=3.87MB

10.255.26.143:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=5192ms, msgs/sec=3852.08, throughput=3.85MB

10.253.83.143:7800 (myself):
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=5190ms, msgs/sec=3853.56, throughput=3.85MB

combined: 3852.11 msgs/sec averaged over all receivers (throughput=3.85MB/sec)


Sample Output:

2 Nodes:

----------------------- TEST -----------------------
Date: Mon Nov 05 04:30:54 EST 2007
Run by: root

mcast_port: 7500
log_interval: 1000
sender: true
props: JGroups-2.5.1.bin/tcp.xml
jmx: false
bind_addr: localhost
num_members: 2
msg_size: 1000
dump_transport_stats: false
start_port: 7800
topic: topic/testTopic
num_senders: 2
cluster: 10.255.23.160:7800,10.255.26.143:7801
num_msgs: 10000
transport: org.jgroups.tests.perf.transports.JGroupsTransport
config: JGroups-2.5.1.bin/config.txt
processing_delay: 0
mcast_addr: 228.1.2.3
JGroups version: 2.5.1

Nov 5, 2007 4:30:54 AM org.jgroups.JChannel init
INFO: JGroups version: 2.5.1

-------------------------------------------------------
GMS: address is 10.255.26.143:7800
-------------------------------------------------------
-- 10.255.26.143:7800 joined
-- waiting for 2 members to join
-- 10.255.23.160:7800 joined
-- READY (2 acks)

-- sending 10000 1KB messages
-- received 1000 messages
-- received 2000 messages
++ sent 1000
-- received 3000 messages
++ sent 2000
-- received 4000 messages
-- received 5000 messages
++ sent 3000
-- received 6000 messages
++ sent 4000
-- received 7000 messages
-- received 8000 messages
++ sent 5000
-- received 9000 messages
-- received 10000 messages
-- received 11000 messages
++ sent 6000
-- received 12000 messages
++ sent 7000
-- received 13000 messages
-- received 14000 messages
-- received 15000 messages
++ sent 8000
-- received 16000 messages
-- received 17000 messages
++ sent 9000
-- received 18000 messages
-- received 19000 messages
++ sent 10000
-- received 20000 messages

-- results:

10.255.23.160:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4664ms, msgs/sec=4288.16, throughput=4.29MB

10.255.26.143:7800 (myself):
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4903ms, msgs/sec=4079.14, throughput=4.08MB

combined: 4183.65 msgs/sec averaged over all receivers (throughput=4.18MB/sec)


4 Nodes:


Receiver Node Output

----------------------- TEST -----------------------
Date: Mon Nov 05 04:46:12 EST 2007
Run by: root

mcast_port: 7500
log_interval: 1000
sender: false
props: JGroups-2.5.1.bin/tcp.xml
jmx: false
bind_addr: localhost
num_members: 4
msg_size: 1000
dump_transport_stats: false
start_port: 7800
topic: topic/testTopic
num_senders: 2
cluster: 10.255.23.160:7800,10.255.26.143:7801,10.253.83.143:7802,10.253.15.95:7803
num_msgs: 10000
transport: org.jgroups.tests.perf.transports.JGroupsTransport
config: JGroups-2.5.1.bin/config.txt
processing_delay: 0
mcast_addr: 228.1.2.3
JGroups version: 2.5.1

Nov 5, 2007 4:46:12 AM org.jgroups.JChannel init
INFO: JGroups version: 2.5.1

-------------------------------------------------------
GMS: address is 10.253.83.143:7800
-------------------------------------------------------
-- 10.253.15.95:7800 joined
-- 10.253.83.143:7800 joined
-- waiting for 4 members to join
-- 10.255.23.160:7800 joined
-- 10.255.26.143:7800 joined
-- READY (4 acks)

-- received 1000 messages
-- received 2000 messages
-- received 3000 messages
-- received 4000 messages
-- received 5000 messages
-- received 6000 messages
-- received 7000 messages
-- received 8000 messages
-- received 9000 messages
-- received 10000 messages
-- received 11000 messages
-- received 12000 messages
-- received 13000 messages
-- received 14000 messages
-- received 15000 messages
-- received 16000 messages
-- received 17000 messages
-- received 18000 messages
-- received 19000 messages
-- received 20000 messages

-- local results:
sender: 10.255.23.160:7800: num_msgs_expected=10000, num_msgs_received=10000 (loss rate=0.0%), received=10MB, time=5180ms,

msgs/sec=1930.5, throughput=1.93MB
sender: 10.253.15.95:7800: num_msgs_expected=10000, num_msgs_received=10000 (loss rate=0.0%), received=10MB, time=4832ms,

msgs/sec=2069.54, throughput=2.07MB


-- results:

10.253.15.95:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=5212ms, msgs/sec=3837.3, throughput=3.84MB

10.255.23.160:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=5174ms, msgs/sec=3865.48, throughput=3.87MB

10.255.26.143:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=5192ms, msgs/sec=3852.08, throughput=3.85MB

10.253.83.143:7800 (myself):
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=5190ms, msgs/sec=3853.56, throughput=3.85MB

combined: 3852.11 msgs/sec averaged over all receivers (throughput=3.85MB/sec)



Sample config.txt file

############################
# only used by TCP Transport
############################

# List of hosts in the cluster. Since we don't specify ports, you cannot run multiple TcpTransports
# on the same machine: each member has to be run on a separate machine (this may change in a future version)
#cluster=127.0.0.1:7800,127.0.0.1:7801
# 2nodes # cluster=10.255.23.160:7800,10.255.26.143:7801
cluster=10.255.23.160:7800,10.255.26.143:7801,10.253.83.143:7802,10.253.15.95:7803


Sample hosts line in tcp.xml



initial_hosts="${jgroups.tcpping.initial_hosts:10.255.23.160[7800],10.255.26.143[7801],10.253.83.143[7802],10.253.15.95[7803]}"





Thursday, October 18, 2007

IOzone benchmark on EC2

I have run another IO benchmark on EC2, this time using IOzone.

The OS is CentOS 4.4, running as an Amazon Machine Image (AMI) on a Xen-based virtual machine (VM).

There seem to be a couple of sweet spots identified by the benchmark.

  1. To stay in CPU cache, keep your file size under 256 KB and read and write in 64-byte chunks.
  2. If you must read from larger files, the CPU cache size is the maximum file size that will remain in memory.
  3. Reading is best done in 64-byte to 512-byte chunks.
  4. At least on EC2, stay away from reading 16 KB files in 128-byte, 1 KB and 8 KB chunks.
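For concreteness, point 1's numbers translate into simple block-size arithmetic: a 256 KB file written in 64-byte records is 4096 operations. A quick dd illustration (dd here is only a stand-in for the iozone record size; the file name is a throwaway):

```shell
# Write a 256 KB file in 64-byte records: bs = record size,
# count = file size / record size = 262144 / 64 = 4096.
dd if=/dev/zero of=/tmp/iozone_demo bs=64 count=4096 2>/dev/null
wc -c < /tmp/iozone_demo    # 262144 bytes = 256 KB
rm -f /tmp/iozone_demo
```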
The results and larger graphs can be found here:

http://s3.amazonaws.com/dbadojo_benchmark/iozone_ec2_write.GIF
http://s3.amazonaws.com/dbadojo_benchmark/iozone_ec2_read.GIF
http://s3.amazonaws.com/dbadojo_benchmark/iozone_ec2_random_read.GIF
http://s3.amazonaws.com/dbadojo_benchmark/iozone_ec2_random_write.GIF
http://s3.amazonaws.com/dbadojo_benchmark/iozone_benchmark_ec2.zip (Right Click SAVE AS)

Have Fun

Paul

Thursday, October 4, 2007

Bonnie IO benchmark on EC2

I thought it might be useful to cross-link to an article I posted about running the bonnie and bonnie++ IO benchmark tools against EC2.

http://blog.dbadojo.com/2007/10/bonnie-io-benchmark-vs-ec2.html

Now back to finalizing testing JGroups on 2 separate nodes.

Have Fun

Paul

Thursday, September 20, 2007

Pentaho Cluster: Installing JGroups


As I mentioned in the previous post, I am going to attempt to make a Pentaho Cluster, based on this presentation (PDF) and outlined in this preparation post.

The fundamental building block for this cluster is the JGroups Java library, so first cab off the rank is to download and install the jar files and get the demo working.

After some mishaps with my CLASSPATH, I got that sorted and tested the demo using two JGroups instances running on the same node. See the picture: drawing in one window was immediately reflected in the second window on the right.

The install.html which is part of the zipped download file was good and explained the procedure well.

So the next thing is to try JGroups clustering on two separate EC2 nodes. That is next...

Have Fun

Paul

Installing JGroups onto an EC2 node with Java already installed.

Check Java is installed

java -version


java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) Client VM (build 1.5.0_12-b04, mixed mode, sharing)


Download JGroups

http://labs.jboss.com/jgroups/downloads
wget http://easynews.dl.sourceforge.net/sourceforge/javagroups/JGroups-2.3.bin.zip

Unzip and copy files to /usr/local

unzip JGroups-2.3.bin.zip -d /usr/local

Run the checker to make sure you have installed correctly

java -jar JGroups-2.3.bin/jgroups-all.jar


Version: 2.3
CVS: $Id: Version.java,v 1.35 2006/06/11 19:15:23 belaban Exp $
History: (see doc/history.txt for details)


Start X-window to EC2

ssh -i id_rsa-gsg-keypair -X root@yourEC2host

Check the display is set correctly

echo $DISPLAY

localhost:10.0

Run the demo twice, using & to background each command from the command line.

cd /usr/local
java -cp JGroups-2.3.bin/concurrent.jar:JGroups-2.3.bin/jgroups-all.jar:JGroups-2.3.bin/commons-logging.jar org.jgroups.demos.Draw


Sep 20, 2007 7:34:29 AM org.jgroups.protocols.UDP createSockets
INFO: sockets will use interface 10.253.22.176
Sep 20, 2007 7:34:29 AM org.jgroups.protocols.UDP createSockets
INFO: socket information:
local_addr=10.253.22.176:32772, mcast_addr=228.8.8.8:45566, bind_addr=/10.253.22.176, ttl=32
sock: bound to 10.253.22.176:32772, receive buffer size=64000, send buffer size=32000
mcast_recv_sock: bound to 10.253.22.176:45566, send buffer size=64000, receive buffer size=64000
mcast_send_sock: bound to 10.253.22.176:32773, send buffer size=64000, receive buffer size=64000

-------------------------------------------------------
GMS: address is 10.253.22.176:32772
-------------------------------------------------------
** View=[10.253.22.176:32769|1] [10.253.22.176:32769, 10.253.22.176:32772]
** View=[10.253.22.176:32769|1] [10.253.22.176:32769, 10.253.22.176:32772]






Monday, September 10, 2007

Pentaho Business Suite Cluster : Research and preparation

As I mentioned in this article on running the Pentaho Demo on EC2, according to a recent presentation, Pentaho has the ability to be clustered using JBoss Clustering (JBoss JGroups), JBoss AS and Apache.

This post is to outline the background documentation I am using to give this cluster a go on EC2.
There are some posts indicating that the lack of multicast support on EC2 is an issue; however, comments and the documentation suggest that JGroups will fall back to TCP, although that is slower.

Given I already have Pentaho installed and saved as an AMI, I am going to build on that to make the single node. The documentation and wikis are confident that it should auto-discover and be a piece of cake; however, we shall see.

Have Fun

Paul

Here is the specification of the cluster node (from that presentation)

Single Node

Single CPU
2 GB RAM
JBoss AS 4.0.3
JBoss JGroups?
Pentaho 1.1.5

Cluster Master:

Apache HTTP server 2.0.58 with mod_jk module version 1.2.15
JGroups cluster master
JMS / Web Services for Operational BI

Doco (Documentation)

http://docs.jboss.org/jbossas/jboss4guide/r4/html/jbosscache.chapt.html
http://docs.jboss.org/jbossas/jboss4guide/r4/html/cluster.chapt.html
http://www.onjava.com/pub/a/onjava/2002/07/10/jboss.html
http://www.jboss.org/wiki/Wiki.jsp?page=JBossHA
http://www.jboss.org/developers/projects/jboss/clustering
http://clusterstore.demo.jboss.com/

Forums and Blogs:

http://www.jgroups.org/javagroupsnew/docs/Perftest.html
http://blog.decaresystems.ie/index.php/2007/01/29/amazon-web-services-the-future-of-datacenter-computing-part-1
http://blog.decaresystems.ie/index.php/2007/02/12/amazon-web-services-the-future-of-datacenter-computing-part-2

Apache
http://jakarta.apache.org/tomcat/tomcat-3.3-doc/mod_jk-howto.html

Downloads

http://labs.jboss.com/jbossas/downloads
http://labs.jboss.com/jgroups/downloads
http://labs.jboss.com/jbosscache/download/index.html

Monday, September 3, 2007

Mondrian OLAP on MySQL EC2 Part 1


If you are wondering why there have been no recent postings, it is mainly due to struggling to get Mondrian installed. It had nothing to do with the environment and mostly came down to the documentation missing vital examples or explanations.

It is with relief I can say I have passed the test and can pass over to the fair lands of using Mondrian OLAP and running the demo.

You can download Mondrian from the Pentaho website or directly from SourceForge.
Use this installation document as a guide; however, it glosses over some of the nice little details which will make the demo work or not.

Comments:

  1. Installing the demo was way too hard. Just bundle Tomcat, or build an include-everything binary for MySQL, PostgreSQL, Oracle or whatever. You want people to try the program; not many people will persist like I did to get it to work.
  2. Having to hand-edit every file which connects to the database or is used as a Tomcat configuration file sucks; ever heard of storing that stuff in a single file or in the database?
  3. Provide a simple SQL script to create the schema objects and load the data. Running the FoodMart loader did not prove you could run the demo! Otherwise, use this SQL script from Gizzar.
  4. Some kind of verification tool for Tomcat or the rest of the stack would be good, rather than having to lather, rinse and repeat over Java, JDBC, Tomcat and red-herring errors.
Many thanks to some other blogs and sites which helped solve the various issues that cropped up:
Gizzar article on Mondrian Open Source OLAP with MySQL
University of Vienna (Wien) for Tomcat FAQ solving tomcat shutdown on missing X
Mondrian Forum

Have Fun

Paul

Install:
Note: This really requires a HOWTO document; I will work on one based on this post. Hopefully you can use this in conjunction with the installation guide and the various other blogs that had fun with this.

  1. Install some linux packages: yum install gcc autoconf
  2. Download the last release of Mondrian non-embedded files from SourceForge.
  3. Download and install Java 1.5 or better from Sun Java Downloads.
  4. Download and install MySQL 5.0 from MySQL Downloads.
  5. Download and unzip the MySQL JDBC driver
  6. Download and install Apache Tomcat 5.0.28.
  7. Verify that Java, MySQL and Tomcat are working by doing the following:
  8. java -version
  9. mysql -V
  10. /usr/local/tomcat/bin/startup.sh
  11. Point your browser at http://yourhostname:8080, if it works, tomcat is working.
  12. unzip Mondrian.zip -d /usr/local/mondrian
  13. Create the MySQL database: mysqladmin create foodmart -u root -p
  14. Create the Foodmart User: create user 'foodmart'@'yourhostname' identified by 'foodmart';
  15. Grant permissions: grant all privileges on *.* to 'foodmart'@'yourhostname' identified by 'foodmart';
  16. Explode the mondrian.war into /usr/local/tomcat/webapps/mondrian: jar -xvf mondrian.war
  17. Locate the 4 jar files: eigenbase-properties.jar, eigenbase-resgen.jar, eigenbase-xom.jar and log4j-1.2.9.jar
  18. Run the MondrianFoodMartLoader to create the tables and load the data, passing in the full paths to all the required JAR files; otherwise it will fail with a ClassNotFound error. This is an example:


  19. java -cp "lib/mondrian.jar:/usr/local/mysql-5.0.45-linux-i686/mysql-connector-java-5.0.7/src/lib/log4j-1.2.9.jar:lib/eigenbase-xom.jar:lib/eigenbase-resgen.jar:lib/eigenbase-properties.jar:/usr/local/mysql/mysql-connector-java-5.0.7/mysql-connector-java-5.0.7-bin.jar" \
    mondrian.test.loader.MondrianFoodMartLoader \
    -verbose -tables -data -indexes \
    -jdbcDrivers=com.mysql.jdbc.Driver \
    -inputFile=demo/FoodMartCreateData.sql \
    -outputJdbcURL="jdbc:mysql://localhost/foodmart?user=foodmart&password=foodmart"


  20. cp /usr/local/jakarta-tomcat-5.0.28/webapps/mondrian/WEB-INF/lib/xalan.jar /usr/local/jakarta-tomcat-5.0.28/common/endorsed/
  21. Modify the query files, web.xml, datasource.xml and mondrian.properties file to replace localhost with your hostname


  22. sed -i`date +%y%m%d` -e "s/localhost/yourhostname/" fourhier.jsp
    sed -i`date +%y%m%d` -e "s/localhost/yourhostname/" mondrian.jsp
    sed -i`date +%y%m%d` -e "s/localhost/yourhostname/" colors.jsp
    sed -i`date +%y%m%d` -e "s/localhost/yourhostname/" arrows.jsp
    sed -i`date +%y%m%d` -e "s/localhost/yourhostname/" web.xml
    sed -i`date +%y%m%d` -e "s/localhost/yourhostname/" datasource.xml
    sed -i`date +%y%m%d` -e "s/localhost/yourhostname/" mondrian.properties

  23. Modify the connection string in each file as per the install guide, including the &#38; entities; it is not a browser character issue.
  24. Test connectivity for the MySQL user foodmart@yourhostname. Connection errors can cause this error:


  25. Mondrian Error:Internal error: Error while creating SQL dialect

  26. export CATALINA_OPTS='-Djava.awt.headless=true' (or add it to your shell profile); this stops Tomcat dying when X-windows is not found!! Thanks to this link for solving that.
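Step 22's repeated sed commands can be collapsed into a loop. Demonstrated here in a scratch directory with a stand-in file; run the same loop from the exploded mondrian webapp directory against the real file list (fourhier.jsp, mondrian.jsp, colors.jsp, arrows.jsp, web.xml, datasource.xml, mondrian.properties):

```shell
# Replace localhost with the real hostname in every file carrying a
# connection string. Stand-in file and scratch dir so this is safe to try.
cd "$(mktemp -d)"
echo 'jdbc:mysql://localhost/foodmart' > datasource.xml

HOST=yourhostname                       # placeholder for your real hostname
for f in datasource.xml; do             # swap in the full file list for real
  sed -i"$(date +%y%m%d)" -e "s/localhost/$HOST/" "$f"   # dated backup, as in the post
done
cat datasource.xml
```

The dated `-i` suffix keeps a backup of each file named after the day it was edited, matching the original commands (GNU sed syntax).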

Saturday, August 25, 2007

Pentaho Business Intelligence suite on EC2


As I mentioned in the road map, the plan is to run through the popular data mining and business intelligence software installing and running any demos available.

This is partially to gain experience with the software but also to demonstrate the ability to use the on demand nature of Amazon's EC2 (Elastic Cloud beta) to provide the ability to use the tools when required and ramp up or down the amount of computing resources used.

Pentaho is a popular open source Business Intelligence (BI) software suite; it has an active support group and good support forums with active Pentaho employee participation. You can download the software suite or subsections of it from the downloads area, or alternatively go to the SourceForge site and get them from there.

Like any good software vendor, open source or not, they provide a Pentaho 1.2.1 GA demo of their software so potential clients can get a good look and feel for the product.

I used my old faithful CentOS 4.4 Linux distro, which is essentially Red Hat Enterprise Linux 4 (RHEL4), running MySQL 5.1 as a base to install the Pentaho demo.
The Pentaho BI Suite is built on Java and the demo uses JBoss, providing access to the various parts of the BI Suite (Reporting, Kettle ETL, Weka and Shark Workflow Engine).
So naturally it required a Java JDK. Given I had JDK 1.5.0_12 for Linux handy, I installed that.

Comments on the install and demo:

  • I tried the Pentaho 1.6.0 Release Candidate 1 demo (pentaho_demo_mysql5-1.6.0-RC1.782.tar.gz) and the demo install failed with a bunch of java class errors. I found this Pentaho forum post indicating similar issues. I haven't tried the 1.6.0 zip file to check whether it is indeed an issue with missing jar files.
  • Once I reverted to the Pentaho 1.2.1 GA demo everything was sweet.
  • To run the Pentaho Server on EC2 and use your browser you will need to update the /install_dir/pentaho-demo/jboss/server/default/deploy/pentaho.war/WEB-INF/web.xml and modify the base-url to be the hostname of your server.
  • The log produced by start-pentaho.sh was very verbose and actually very interesting for seeing the calls made to service the web requests.
  • Set the EC2-security group to allow access to port 8080 unless you are using the default security group.
  • Point your browser at http://yourEC2-DNS-hostname:8080
I haven't delved sufficiently into the documentation to determine what is required to separate the components of Pentaho onto separate server/compute resources. However it seems Pentaho has already delivered a presentation on that ability, so that is something to test in the future.

I have included a screenshot of the home page once the demo was up and running. As per normal, I have dumped the most relevant pieces of my work at the end of this post.

Have Fun

Paul



Get the Java 1.5 JDK, follow the instructions at Java 1.5, and install:

cd /usr/local
sh /mnt/jdk-1_5_0_12-linux-i586.bin

Do you agree to the above license terms? [yes or no]
yes
Unpacking...
Checksumming...
0
0
Extracting...
UnZipSFX 5.42 of 14 January 2001, by Info-ZIP (Zip-Bugs@lists.wku.edu).
creating: jdk1.5.0_12/
creating: jdk1.5.0_12/jre/
creating: jdk1.5.0_12/jre/bin/
inflating: jdk1.5.0_12/jre/bin/java
inflating: jdk1.5.0_12/jre/bin/keytool
inflating: jdk1.5.0_12/jre/bin/policytool
...

Set up a bunch of symbolic links. Note: this allows flexibility to change the versions in the future.

ln -s /usr/local/jdk1.5.0_12/ java
cd bin
ln -s /usr/local/jdk1.5.0_12/bin/java java
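
The payoff of the symlink layer shows up at upgrade time: a later Java upgrade is just re-pointing /usr/local/java at the new JDK directory, and nothing that references /usr/local/java needs to change. A sketch using stand-in directories under /tmp (the JDK paths here are placeholders):

```shell
# Demonstrate swapping a JDK behind a stable symlink, using scratch dirs.
base=/tmp/jdk_symlink_demo
rm -rf "$base" && mkdir -p "$base/jdk1.5.0_12/bin" "$base/jdk1.6.0/bin"
ln -s "$base/jdk1.5.0_12" "$base/java"      # initial install
ln -sfn "$base/jdk1.6.0" "$base/java"       # upgrade: -f replace, -n treat the old link as a file
readlink "$base/java"                       # prints: /tmp/jdk_symlink_demo/jdk1.6.0
```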

Check java is working

java -version

java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) Client VM (build 1.5.0_12-b04, mixed mode, sharing)


Edit .bash_profile to add JAVA_HOME and put the java and mysql binaries on the PATH, then reload it:

[pentaho@domU-12-31-35-00-53-92 ~]$ source .bash_profile

Example of bash_profile:

cat .bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs

# JAVA_HOME must be set before it is used in PATH
JAVA_HOME=/usr/local/java/
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:/usr/local/mysql/bin

export PATH JAVA_HOME
unset USERNAME

Check the JAVA and MySQL versions and path:

java -version

java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) Client VM (build 1.5.0_12-b04, mixed mode, sharing)

mysql -V

mysql Ver 14.13 Distrib 5.1.20-beta, for pc-linux-gnu (i686) using readline 5.0

Get the Pentaho demo 1.2.1 GA zipfile (to be safe)

wget http://umn.dl.sourceforge.net/sourceforge/pentaho/pentaho_demo-1.2.1.625-GA.zip
unzip pentaho_demo-1.2.1.625-GA.zip -d /usr/local/pentaho

cd /usr/local/pentaho
chown -R root:pentaho .
ls -la

total 12
drwxr-xr-x 3 root pentaho 4096 Aug 25 03:01 .
drwxr-xr-x 15 root root 4096 Aug 25 03:00 ..
drwxr-xr-x 5 root pentaho 4096 Aug 25 03:01 pentaho-demo

Loading the sample data and checking what was created in MySQL database:

cd /usr/local/pentaho/pentaho-demo/data
mysql -u root -p < SampleDataDump_MySql.sql
mysql -u root -p
Enter password:

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.20-beta-log MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hibernate |
| mysql |
| quartz |
| sampledata |
| test |
+--------------------+
6 rows in set (0.00 sec)


mysql> use sampledata
Database changed
mysql> show tables;
+----------------------+
| Tables_in_sampledata |
+----------------------+
| CUSTOMERS |
| CUSTOMER_W_TER |
| DEPARTMENT_MANAGERS |
| EMPLOYEES |
| OFFICES |
| ORDERDETAILS |
| ORDERFACT |
| ORDERS |
| PAYMENTS |
| PRODUCTS |
| QUADRANT_ACTUALS |
| TIME |
| TRIAL_BALANCE |
+----------------------+
13 rows in set (0.00 sec)

mysql> show table status;
+---------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+----------------------------------------------------------------------------------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+---------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+----------------------------------------------------------------------------------+
| CUSTOMERS | InnoDB | 10 | Compact | 117 | 420 | 49152 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:58 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| CUSTOMER_W_TER | InnoDB | 10 | Compact | 103 | 477 | 49152 | 0 | 16384 | 0 | NULL | 2007-08-25 03:04:58 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| DEPARTMENT_MANAGERS | InnoDB | 10 | Compact | 4 | 4096 | 16384 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:58 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| EMPLOYEES | InnoDB | 10 | Compact | 23 | 712 | 16384 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:58 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| OFFICES | InnoDB | 10 | Compact | 7 | 2340 | 16384 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:58 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| ORDERDETAILS | InnoDB | 10 | Compact | 2913 | 61 | 180224 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:58 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| ORDERFACT | InnoDB | 10 | Compact | 3027 | 173 | 524288 | 0 | 131072 | 0 | NULL | 2007-08-25 03:04:58 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB; (`PRODUCTCODE`) REFER `sampledata`.`PRODUCTS`(`PRODUCTCOD |
| ORDERS | InnoDB | 10 | Compact | 227 | 216 | 49152 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:59 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| PAYMENTS | InnoDB | 10 | Compact | 272 | 60 | 16384 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:59 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| PRODUCTS | InnoDB | 10 | Compact | 91 | 720 | 65536 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:58 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| QUADRANT_ACTUALS | InnoDB | 10 | Compact | 148 | 110 | 16384 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:59 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| TIME | InnoDB | 10 | Compact | 207 | 237 | 49152 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:59 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
| TRIAL_BALANCE | InnoDB | 10 | Compact | 22 | 744 | 16384 | 0 | 0 | 0 | NULL | 2007-08-25 03:04:59 | NULL | NULL | latin1_general_cs | NULL | | InnoDB free: 11264 kB |
+---------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+----------------------------------------------------------------------------------+
13 rows in set (0.01 sec)

The Pentaho 1.2.1 GA demo does not set the permissions correctly for Linux;
the files lack execute permission.


chmod -R +x /usr/local/pentaho/pentaho-demo/
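
A narrower variant (my own sketch, not from the Pentaho docs) is to grant execute permission only to the shell scripts rather than to every file, which avoids marking data files executable; demonstrated here on a stand-in directory:

```shell
# Narrower alternative to chmod -R +x: mark only *.sh scripts executable.
# Stand-in directory and file names for demonstration.
demo=/tmp/pentaho_chmod_demo
rm -rf "$demo" && mkdir -p "$demo/pentaho-demo"
touch "$demo/pentaho-demo/start-pentaho.sh" "$demo/pentaho-demo/readme.txt"
find "$demo/pentaho-demo" -name "*.sh" -exec chmod +x {} \;
```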

Start the server and redirect STDOUT and STDERR to the one file

mkdir -p /usr/local/pentaho/pentaho-demo/logs
cd /usr/local/pentaho/pentaho-demo
./start-pentaho.sh > logs/pentaho_`date +%Y%m%d`.log 2>&1 &
[1] 3375
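
A handy trick instead of watching tail -f by hand is to poll the log until the ready line turns up. A sketch against a stand-in log file (in real use, point it at logs/pentaho_YYYYMMDD.log):

```shell
# Block until the "server is ready" line appears in the log.
# The log here is a stand-in seeded with two lines from the real startup output.
log=/tmp/pentaho_ready_demo.log
{
  echo "04:26:13,863 INFO  [Server] Starting JBoss (MX MicroKernel)..."
  echo "04:29:36,794 INFO  [STDOUT] Pentaho BI Platform server is ready. (1.2.1-625 GA)"
} > "$log"
until grep -q "server is ready" "$log"; do sleep 1; done
echo "server is up"    # prints: server is up
```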

Check the output of the server log

tail -f /usr/local/pentaho/pentaho-demo/logs/pentaho_20070825.log

JAVA_HOME set to /usr/local/java/
JAVA is /usr/local/java//bin/java
=========================================================================

JBoss Bootstrap Environment

JBOSS_HOME: /usr/local/pentaho/pentaho-demo/jboss

JAVA: /usr/local/java//bin/java

JAVA_OPTS: -server -Xms128m -Xmx512m -XX:MaxPermSize=256m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterva
l=3600000 -Djava.awt.headless=true -Djava.io.tmpdir=/tmp/ -Dprogram.name=run.sh

CLASSPATH: /usr/local/pentaho/pentaho-demo/jboss/bin/run.jar:/usr/local/java//lib/tools.jar

=========================================================================

[Server@1a758cb]: [Thread[main,5,main]]: checkRunning(false) entered
[Server@1a758cb]: [Thread[main,5,main]]: checkRunning(false) exited
[Server@1a758cb]: Startup sequence initiated from main() method
[Server@1a758cb]: Loaded properties from [/usr/local/pentaho/pentaho-demo/data/server.properties]
[Server@1a758cb]: Initiating startup sequence...
[Server@1a758cb]: Server socket opened successfully in 6 ms.
04:26:13,863 INFO [Server] Starting JBoss (MX MicroKernel)...
04:26:13,865 INFO [Server] Release ID: JBoss [Zion] 4.0.4.GA (build: CVSTag=JBoss_4_0_4_GA date=200605151000)

04:26:13,866 INFO [Server] Home Dir: /usr/local/pentaho/pentaho-demo/jboss
04:26:13,947 INFO [Server] Home URL: file:/usr/local/pentaho/pentaho-demo/jboss/
04:26:13,949 INFO [Server] Patch URL: null
04:26:13,949 INFO [Server] Server Name: default
04:26:13,949 INFO [Server] Server Home Dir: /usr/local/pentaho/pentaho-demo/jboss/server/default
04:26:13,949 INFO [Server] Server Home URL: file:/usr/local/pentaho/pentaho-demo/jboss/server/default/
04:26:13,949 INFO [Server] Server Log Dir: /usr/local/pentaho/pentaho-demo/jboss/server/default/log
04:26:13,950 INFO [Server] Server Temp Dir: /usr/local/pentaho/pentaho-demo/jboss/server/default/tmp
04:26:13,950 INFO [Server] Root Deployment Filename: jboss-service.xml
04:26:14,735 INFO [ServerInfo] Java version: 1.5.0_12,Sun Microsystems Inc.
04:26:14,735 INFO [ServerInfo] Java VM: Java HotSpot(TM) Server VM 1.5.0_12-b04,Sun Microsystems Inc.
04:26:14,735 INFO [ServerInfo] OS-System: Linux 2.6.16-xenU,i386
04:26:16,761 INFO [Server] Core system initialized
[Server@1a758cb]: Database [index=0, id=0, db=file:sampledata/sampledata, alias=sampledata] opened sucessfully in 6835 ms.
[Server@1a758cb]: Database [index=1, id=1, db=file:shark/shark, alias=shark] opened sucessfully in 46 ms.
[Server@1a758cb]: Database [index=2, id=2, db=file:hibernate/hibernate, alias=hibernate] opened sucessfully in 13 ms.
[Server@1a758cb]: Database [index=3, id=3, db=file:quartz/quartz, alias=quartz] opened sucessfully in 79 ms.
[Server@1a758cb]: Startup sequence completed in 6984 ms.
...
04:29:36,794 INFO [STDOUT] Pentaho BI Platform server is ready. (1.2.1-625 GA)
04:29:44,906 INFO [TomcatDeployer] deploy, ctxPath=/sw-style, warUrl=.../deploy/sw-style.war/
04:29:45,985 INFO [Http11BaseProtocol] Starting Coyote HTTP/1.1 on http-0.0.0.0-8080
04:29:46,132 INFO [ChannelSocket] JK: ajp13 listening on /0.0.0.0:8009
04:29:46,276 INFO [JkMain] Jk running ID=0 time=0/163 config=null
04:29:46,294 INFO [Server] JBoss (MX MicroKernel) [4.0.4.GA (build: CVSTag=JBoss_4_0_4_GA date=200605151000)] Started in 3m:32s:342ms

Edit the web.xml file to set the base-url

vi /usr/local/pentaho/pentaho-demo/jboss/server/default/deploy/pentaho.war/WEB-INF/web.xml

Replace localhost:8080 with your external EC2 DNS name, e.g. ec2-67-202-2-78.z-2.compute-1.amazonaws.com:8080
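
That replace can be scripted too. A sketch against a stand-in web.xml line (the real file holds the base-url inside a context-param); the hostname is the example hostname from this post, so substitute your own instance's public DNS name:

```shell
# Script the base-url edit on a stand-in web.xml fragment.
webxml=/tmp/baseurl_demo_web.xml
echo "<param-value>http://localhost:8080/pentaho/</param-value>" > "$webxml"
sed -i -e "s|localhost:8080|ec2-67-202-2-78.z-2.compute-1.amazonaws.com:8080|" "$webxml"
cat "$webxml"
```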

Friday, August 17, 2007

Weka Web service on EC2

Time is flying. We are working on getting a WEKA EC2 node presented as a web service, so essentially you can point your WSDL (Web Services Description Language) aware software at it and use the WEKA data mining tool as required.

Other stuff I have been working on and reviewing was the next build of the Pentaho BI Suite of products. I have the software downloaded, just requires time to run through the install and run the demos if available.

I am also looking at taking part in the Amazon Paid AMI program, so once these AMIs (Amazon Machine Images) are ready, anyone who signs up for Amazon EC2 (and S3) can run specific nodes as required.

Other stuff: I ran through an install of the Oracle SOA Suite as well, so I have a couple of articles in the pipeline.

I am interested in your thoughts on using paid AMIs versus a web service.

Have Fun

Paul

Tuesday, August 7, 2007

Weka Data mining on EC2 - testing

Once the VM was set up and running, it was time to see how WEKA performed in a virtual environment.

The performance on the EC2 node was good. These are not large datasets, though I have a couple of larger ones to play with in the near future, given you can join the Netflix Prize competition and download a dataset with 100 million data points (more than 2 GB).

If you are surprised at the length of this post: I have found in the past that when I am the one using a search engine, I want to see as much information as possible. There might be someone out there who in the future wants a quick solution to running WEKA without having to read the documentation.

Most of this work was guided by the README; once I got the hang of it, I got some datasets on Leukemia (ALL/AML) and ran WEKA on those.

Have Fun

Paul



List options for Weka Classifying

java weka.classifiers.trees.J48

Weka exception: No training file and no object input file given.

General options:

-t
Sets training file.
-T
Sets test file. If missing, a cross-validation will be performed on the training data.
-c
Sets index of class attribute (default: last).
-x
Sets number of folds for cross-validation (default: 10).
-s
Sets random number seed for cross-validation (default: 1).
-m
Sets file with cost matrix.
-l
Sets model input file.
-d
Sets model output file.
-v
Outputs no statistics for training data.
-o
Outputs statistics only, not the classifier.
-i
Outputs detailed information-retrieval statistics for each class.
-k
Outputs information-theoretic statistics.
-p
Only outputs predictions for test instances, along with attributes (0 for none).
-r
Only outputs cumulative margin distribution.
-z
Only outputs the source representation of the classifier, giving it the supplied name.
-g
Only outputs the graph representation of the classifier.

Options specific to weka.classifiers.trees.J48:

-U
Use unpruned tree.
-C
Set confidence threshold for pruning.
(default 0.25)
-M
Set minimum number of instances per leaf.
(default 2)
-R
Use reduced error pruning.
-N
Set number of folds for reduced error
pruning. One fold is used as pruning set.
(default 3)
-B
Use binary splits only.
-S
Don't perform subtree raising.
-L
Do not clean up after the tree has been built.
-A
Laplace smoothing for predicted probabilities.
-Q
Seed for random data shuffling (default 1).

Running the NaiveBayes Classifier on the labor dataset

java weka.classifiers.bayes.NaiveBayes -t $WEKAHOME/data/labor.arff

Naive Bayes Classifier

Class bad: Prior probability = 0.36

duration: Normal Distribution. Mean = 2 StandardDev = 0.7071 WeightSum = 20 Precision = 1.0
wage-increase-first-year: Normal Distribution. Mean = 2.6563 StandardDev = 0.8643 WeightSum = 20 Precision = 0.3125
wage-increase-second-year: Normal Distribution. Mean = 2.9524 StandardDev = 0.8193 WeightSum = 15 Precision = 0.35714285714285715
wage-increase-third-year: Normal Distribution. Mean = 2.0344 StandardDev = 0.1678 WeightSum = 4 Precision = 0.38749999999999996
cost-of-living-adjustment: Discrete Estimator. Counts = 10 2 6 (Total = 18)
working-hours: Normal Distribution. Mean = 39.4887 StandardDev = 1.8903 WeightSum = 19 Precision = 1.8571428571428572
pension: Discrete Estimator. Counts = 12 3 6 (Total = 21)
standby-pay: Normal Distribution. Mean = 2.5 StandardDev = 0.866 WeightSum = 4 Precision = 2.0
shift-differential: Normal Distribution. Mean = 2.4691 StandardDev = 1.5738 WeightSum = 9 Precision = 2.7777777777777777
education-allowance: Discrete Estimator. Counts = 4 10 (Total = 14)
statutory-holidays: Normal Distribution. Mean = 10.2 StandardDev = 0.805 WeightSum = 20 Precision = 1.2
vacation: Discrete Estimator. Counts = 12 8 3 (Total = 23)
longterm-disability-assistance: Discrete Estimator. Counts = 6 9 (Total = 15)
contribution-to-dental-plan: Discrete Estimator. Counts = 8 8 1 (Total = 17)
bereavement-assistance: Discrete Estimator. Counts = 10 4 (Total = 14)
contribution-to-health-plan: Discrete Estimator. Counts = 9 3 7 (Total = 19)


Class good: Prior probability = 0.64

duration: Normal Distribution. Mean = 2.25 StandardDev = 0.6821 WeightSum = 36 Precision = 1.0
wage-increase-first-year: Normal Distribution. Mean = 4.3837 StandardDev = 1.1773 WeightSum = 36 Precision = 0.3125
wage-increase-second-year: Normal Distribution. Mean = 4.447 StandardDev = 0.9805 WeightSum = 31 Precision = 0.35714285714285715
wage-increase-third-year: Normal Distribution. Mean = 4.5795 StandardDev = 0.7893 WeightSum = 11 Precision = 0.38749999999999996
cost-of-living-adjustment: Discrete Estimator. Counts = 14 8 3 (Total = 25)
working-hours: Normal Distribution. Mean = 37.5491 StandardDev = 2.9266 WeightSum = 32 Precision = 1.8571428571428572
pension: Discrete Estimator. Counts = 1 3 8 (Total = 12)
standby-pay: Normal Distribution. Mean = 11.2 StandardDev = 2.0396 WeightSum = 5 Precision = 2.0
shift-differential: Normal Distribution. Mean = 5.6818 StandardDev = 5.0584 WeightSum = 22 Precision = 2.7777777777777777
education-allowance: Discrete Estimator. Counts = 8 4 (Total = 12)
statutory-holidays: Normal Distribution. Mean = 11.4182 StandardDev = 1.2224 WeightSum = 33 Precision = 1.2
vacation: Discrete Estimator. Counts = 8 11 15 (Total = 34)
longterm-disability-assistance: Discrete Estimator. Counts = 16 1 (Total = 17)
contribution-to-dental-plan: Discrete Estimator. Counts = 3 9 14 (Total = 26)
bereavement-assistance: Discrete Estimator. Counts = 19 1 (Total = 20)
contribution-to-health-plan: Discrete Estimator. Counts = 1 8 15 (Total = 24)
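
As a side note, the priors appear to be Laplace-smoothed class frequencies, (count + 1) / (total + number of classes), given the 20 bad and 37 good contracts among the 57 instances. That is my reading of the output, not something from the Weka docs, but the arithmetic lines up:

```shell
# Hedged check: do the reported priors match Laplace-smoothed class
# frequencies for 20 bad / 37 good of 57 instances?
awk 'BEGIN { printf "bad=%.2f good=%.2f\n", (20 + 1) / (57 + 2), (37 + 1) / (57 + 2) }'
# prints: bad=0.36 good=0.64
```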


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0.01 seconds

=== Error on training data ===

Correctly Classified Instances 56 98.2456 %
Incorrectly Classified Instances 1 1.7544 %
Kappa statistic 0.961
Mean absolute error 0.0481
Root mean squared error 0.1532
Relative absolute error 10.5249 %
Root relative squared error 32.1057 %
Total Number of Instances 57
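
The percentage columns are nothing mysterious, just the ratio of correctly classified instances to the total:

```shell
# 56 of the 57 training instances were classified correctly.
awk 'BEGIN { printf "%.4f %%\n", 56 / 57 * 100 }'
# prints: 98.2456 %
```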


=== Confusion Matrix ===

a b <-- classified as
19 1 | a = bad
0 37 | b = good

=== Stratified cross-validation ===

Correctly Classified Instances 51 89.4737 %
Incorrectly Classified Instances 6 10.5263 %
Kappa statistic 0.7741
Mean absolute error 0.1042
Root mean squared error 0.2637
Relative absolute error 22.7763 %
Root relative squared error 55.2266 %
Total Number of Instances 57

=== Confusion Matrix ===

a b <-- classified as
18 2 | a = bad
4 33 | b = good

Trying a different classifier from the list on the same dataset

java weka.classifiers.lazy.IBk -t $WEKAHOME/data/labor.arff

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification


Time taken to build model: 0 seconds
Time taken to test model on training data: 0.02 seconds

=== Error on training data ===

Correctly Classified Instances 57 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.0169
Root mean squared error 0.0169
Relative absolute error 3.7085 %
Root relative squared error 3.5513 %
Total Number of Instances 57


=== Confusion Matrix ===

a b <-- classified as
20 0 | a = bad
0 37 | b = good

=== Stratified cross-validation ===

Correctly Classified Instances 47 82.4561 %
Incorrectly Classified Instances 10 17.5439 %
Kappa statistic 0.6235
Mean absolute error 0.1876
Root mean squared error 0.4113
Relative absolute error 41.0144 %
Root relative squared error 86.1487 %
Total Number of Instances 57

=== Confusion Matrix ===

a b <-- classified as
16 4 | a = bad
6 31 | b = good

What the dataset looks like in ARFF format

cat $WEKAHOME/data/labor.arff

% Date: Tue, 15 Nov 88 15:44:08 EST
% From: stan
% To: aha@ICS.UCI.EDU
%
% 1. Title: Final settlements in labor negotitions in Canadian industry
%
% 2. Source Information
% -- Creators: Collective Barganing Review, montly publication,
% Labour Canada, Industrial Relations Information Service,
% Ottawa, Ontario, K1A 0J2, Canada, (819) 997-3117
% The data includes all collective agreements reached
% in the business and personal services sector for locals
% with at least 500 members (teachers, nurses, university
% staff, police, etc) in Canada in 87 and first quarter of 88.
% -- Donor: Stan Matwin, Computer Science Dept, University of Ottawa,
% 34 Somerset East, K1N 9B4, (stan@uotcsi2.bitnet)
% -- Date: November 1988
%
% 3. Past Usage:
% -- testing concept learning software, in particular
% an experimental method to learn two-tiered concept descriptions.
% The data was used to learn the description of an acceptable
% and unacceptable contract.
% The unacceptable contracts were either obtained by interviewing
% experts, or by inventing near misses.
% Examples of use are described in:
% Bergadano, F., Matwin, S., Michalski, R.,
% Zhang, J., Measuring Quality of Concept Descriptions,
% Procs. of the 3rd European Working Sessions on Learning,
% Glasgow, October 1988.
% Bergadano, F., Matwin, S., Michalski, R., Zhang, J.,
% Representing and Acquiring Imprecise and Context-dependent
% Concepts in Knowledge-based Systems, Procs. of ISMIS'88,
% North Holland, 1988.
% 4. Relevant Information:
% -- data was used to test 2tier approach with learning
% from positive and negative examples
%
% 5. Number of Instances: 57
%
% 6. Number of Attributes: 16
%
% 7. Attribute Information:
% 1. dur: duration of agreement
% [1..7]
% 2 wage1.wage : wage increase in first year of contract
% [2.0 .. 7.0]
% 3 wage2.wage : wage increase in second year of contract
% [2.0 .. 7.0]
% 4 wage3.wage : wage increase in third year of contract
% [2.0 .. 7.0]
% 5 cola : cost of living allowance
% [none, tcf, tc]
% 6 hours.hrs : number of working hours during week
% [35 .. 40]
% 7 pension : employer contributions to pension plan
% [none, ret_allw, empl_contr]
% 8 stby_pay : standby pay
% [2 .. 25]
% 9 shift_diff : shift differencial : supplement for work on II and III shift
% [1 .. 25]
% 10 educ_allw.boolean : education allowance
% [true false]
% 11 holidays : number of statutory holidays
% [9 .. 15]
% 12 vacation : number of paid vacation days
% [ba, avg, gnr]
% 13 lngtrm_disabil.boolean :
% employer's help during employee longterm disabil
% ity [true , false]
% 14 dntl_ins : employers contribution towards the dental plan
% [none, half, full]
% 15 bereavement.boolean : employer's financial contribution towards the
% covering the costs of bereavement
% [true , false]
% 16 empl_hplan : employer's contribution towards the health plan
% [none, half, full]
%
% 8. Missing Attribute Values: None
%
% 9. Class Distribution:
%
% 10. Exceptions from format instructions: no commas between attribute values.
%
%
@relation 'labor-neg-data'
@attribute 'duration' real
@attribute 'wage-increase-first-year' real
@attribute 'wage-increase-second-year' real
@attribute 'wage-increase-third-year' real
@attribute 'cost-of-living-adjustment' {'none','tcf','tc'}
@attribute 'working-hours' real
@attribute 'pension' {'none','ret_allw','empl_contr'}
@attribute 'standby-pay' real
@attribute 'shift-differential' real
@attribute 'education-allowance' {'yes','no'}
@attribute 'statutory-holidays' real
@attribute 'vacation' {'below_average','average','generous'}
@attribute 'longterm-disability-assistance' {'yes','no'}
@attribute 'contribution-to-dental-plan' {'none','half','full'}
@attribute 'bereavement-assistance' {'yes','no'}
@attribute 'contribution-to-health-plan' {'none','half','full'}
@attribute 'class' {'bad','good'}
@data
1,5,?,?,?,40,?,?,2,?,11,'average',?,?,'yes',?,'good'
2,4.5,5.8,?,?,35,'ret_allw',?,?,'yes',11,'below_average',?,'full',?,'full','good'
?,?,?,?,?,38,'empl_contr',?,5,?,11,'generous','yes','half','yes','half','good'
3,3.7,4,5,'tc',?,?,?,?,'yes',?,?,?,?,'yes',?,'good'
3,4.5,4.5,5,?,40,?,?,?,?,12,'average',?,'half','yes','half','good'
2,2,2.5,?,?,35,?,?,6,'yes',12,'average',?,?,?,?,'good'
3,4,5,5,'tc',?,'empl_contr',?,?,?,12,'generous','yes','none','yes','half','good'
3,6.9,4.8,2.3,?,40,?,?,3,?,12,'below_average',?,?,?,?,'good'
2,3,7,?,?,38,?,12,25,'yes',11,'below_average','yes','half','yes',?,'good'
1,5.7,?,?,'none',40,'empl_contr',?,4,?,11,'generous','yes','full',?,?,'good'
3,3.5,4,4.6,'none',36,?,?,3,?,13,'generous',?,?,'yes','full','good'
2,6.4,6.4,?,?,38,?,?,4,?,15,?,?,'full',?,?,'good'
2,3.5,4,?,'none',40,?,?,2,'no',10,'below_average','no','half',?,'half','bad'
3,3.5,4,5.1,'tcf',37,?,?,4,?,13,'generous',?,'full','yes','full','good'
1,3,?,?,'none',36,?,?,10,'no',11,'generous',?,?,?,?,'good'
2,4.5,4,?,'none',37,'empl_contr',?,?,?,11,'average',?,'full','yes',?,'good'
1,2.8,?,?,?,35,?,?,2,?,12,'below_average',?,?,?,?,'good'
1,2.1,?,?,'tc',40,'ret_allw',2,3,'no',9,'below_average','yes','half',?,'none','bad'
1,2,?,?,'none',38,'none',?,?,'yes',11,'average','no','none','no','none','bad'
2,4,5,?,'tcf',35,?,13,5,?,15,'generous',?,?,?,?,'good'
2,4.3,4.4,?,?,38,?,?,4,?,12,'generous',?,'full',?,'full','good'
2,2.5,3,?,?,40,'none',?,?,?,11,'below_average',?,?,?,?,'bad'
3,3.5,4,4.6,'tcf',27,?,?,?,?,?,?,?,?,?,?,'good'
2,4.5,4,?,?,40,?,?,4,?,10,'generous',?,'half',?,'full','good'
1,6,?,?,?,38,?,8,3,?,9,'generous',?,?,?,?,'good'
3,2,2,2,'none',40,'none',?,?,?,10,'below_average',?,'half','yes','full','bad'
2,4.5,4.5,?,'tcf',?,?,?,?,'yes',10,'below_average','yes','none',?,'half','good'
2,3,3,?,'none',33,?,?,?,'yes',12,'generous',?,?,'yes','full','good'
2,5,4,?,'none',37,?,?,5,'no',11,'below_average','yes','full','yes','full','good'
3,2,2.5,?,?,35,'none',?,?,?,10,'average',?,?,'yes','full','bad'
3,4.5,4.5,5,'none',40,?,?,?,'no',11,'average',?,'half',?,?,'good'
3,3,2,2.5,'tc',40,'none',?,5,'no',10,'below_average','yes','half','yes','full','bad'
2,2.5,2.5,?,?,38,'empl_contr',?,?,?,10,'average',?,?,?,?,'bad'
2,4,5,?,'none',40,'none',?,3,'no',10,'below_average','no','none',?,'none','bad'
3,2,2.5,2.1,'tc',40,'none',2,1,'no',10,'below_average','no','half','yes','full','bad'
2,2,2,?,'none',40,'none',?,?,'no',11,'average','yes','none','yes','full','bad'
1,2,?,?,'tc',40,'ret_allw',4,0,'no',11,'generous','no','none','no','none','bad'
1,2.8,?,?,'none',38,'empl_contr',2,3,'no',9,'below_average','yes','half',?,'none','bad'
3,2,2.5,2,?,37,'empl_contr',?,?,?,10,'average',?,?,'yes','none','bad'
2,4.5,4,?,'none',40,?,?,4,?,12,'average','yes','full','yes','half','good'
1,4,?,?,'none',?,'none',?,?,'yes',11,'average','no','none','no','none','bad'
2,2,3,?,'none',38,'empl_contr',?,?,'yes',12,'generous','yes','none','yes','full','bad'
2,2.5,2.5,?,'tc',39,'empl_contr',?,?,?,12,'average',?,?,'yes',?,'bad'
2,2.5,3,?,'tcf',40,'none',?,?,?,11,'below_average',?,?,'yes',?,'bad'
2,4,4,?,'none',40,'none',?,3,?,10,'below_average','no','none',?,'none','bad'
2,4.5,4,?,?,40,?,?,2,'no',10,'below_average','no','half',?,'half','bad'
2,4.5,4,?,'none',40,?,?,5,?,11,'average',?,'full','yes','full','good'
2,4.6,4.6,?,'tcf',38,?,?,?,?,?,?,'yes','half',?,'half','good'
2,5,4.5,?,'none',38,?,14,5,?,11,'below_average','yes',?,?,'full','good'
2,5.7,4.5,?,'none',40,'ret_allw',?,?,?,11,'average','yes','full','yes','full','good'
2,7,5.3,?,?,?,?,?,?,?,11,?,'yes','full',?,?,'good'
3,2,3,?,'tcf',?,'empl_contr',?,?,'yes',?,?,'yes','half','yes',?,'good'
3,3.5,4,4.5,'tcf',35,?,?,?,?,13,'generous',?,?,'yes','full','good'
3,4,3.5,?,'none',40,'empl_contr',?,6,?,11,'average','yes','full',?,'full','good'
3,5,4.4,?,'none',38,'empl_contr',10,6,?,11,'generous','yes',?,?,'full','good'
3,5,5,5,?,40,?,?,?,?,12,'average',?,'half','yes','half','good'
3,6,6,4,?,35,?,?,14,?,9,'generous','yes','full','yes','full','good'
%
%
%
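
The '?' markers in the @data rows are missing values. You can reproduce Weka's missing-value percentages by counting '?' per column; a sketch for the fourth comma-separated field (wage-increase-third-year), shown on three sample rows (on the full labor.arff this comes to 42 of 57, the 74% in the Basic Statistics output):

```shell
# Count missing values ('?') in field 4 of the @data rows, skipping
# % comment lines and @ header lines. Three sample rows for demonstration.
awk -F, '!/^[%@]/ && NF > 3 && $4 == "?" { n++ } END { print n + 0 }' <<'EOF'
1,5,?,?,?,40,?,?,2,?,11,'average',?,?,'yes',?,'good'
3,3.7,4,5,'tc',?,?,?,?,'yes',?,?,?,?,'yes',?,'good'
2,4.5,5.8,?,?,35,'ret_allw',?,?,'yes',11,'below_average',?,'full',?,'full','good'
EOF
# prints: 2
```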

Basic Statistics and Validation of dataset

java weka.core.Instances $WEKAHOME/data/labor.arff

Relation Name: labor-neg-data
Num Instances: 57
Num Attributes: 17

Name Type Nom Int Real Missing Unique Dist
1 duration Num 0% 98% 0% 1 / 2% 0 / 0% 3
2 wage-increase-first-year Num 0% 49% 49% 1 / 2% 7 / 12% 17
3 wage-increase-second-year Num 0% 47% 33% 11 / 19% 8 / 14% 15
4 wage-increase-third-year Num 0% 14% 12% 42 / 74% 6 / 11% 9
5 cost-of-living-adjustment Nom 65% 0% 0% 20 / 35% 0 / 0% 3
6 working-hours Num 0% 89% 0% 6 / 11% 3 / 5% 8
7 pension Nom 47% 0% 0% 30 / 53% 0 / 0% 3
8 standby-pay Num 0% 16% 0% 48 / 84% 6 / 11% 7
9 shift-differential Num 0% 54% 0% 26 / 46% 5 / 9% 10
10 education-allowance Nom 39% 0% 0% 35 / 61% 0 / 0% 2
11 statutory-holidays Num 0% 93% 0% 4 / 7% 0 / 0% 6
12 vacation Nom 89% 0% 0% 6 / 11% 0 / 0% 3
13 longterm-disability-assis Nom 49% 0% 0% 29 / 51% 0 / 0% 2
14 contribution-to-dental-pl Nom 65% 0% 0% 20 / 35% 0 / 0% 3
15 bereavement-assistance Nom 53% 0% 0% 27 / 47% 0 / 0% 2
16 contribution-to-health-pl Nom 65% 0% 0% 20 / 35% 0 / 0% 3
17 class Nom 100% 0% 0% 0 / 0% 0 / 0% 2

Trying Associations

java weka.associations.Apriori -t $WEKAHOME/data/weather.nominal.arff

Apriori
=======

Minimum support: 0.15 (2 instances)
Minimum metric : 0.9
Number of cycles performed: 17

Generated sets of large itemsets:

Size of set of large itemsets L(1): 12

Size of set of large itemsets L(2): 47

Size of set of large itemsets L(3): 39

Size of set of large itemsets L(4): 6

Best rules found:

1. humidity=normal windy=FALSE 4 ==> play=yes 4 conf:(1)
2. temperature=cool 4 ==> humidity=normal 4 conf:(1)
3. outlook=overcast 4 ==> play=yes 4 conf:(1)
4. temperature=cool play=yes 3 ==> humidity=normal 3 conf:(1)
5. outlook=rainy windy=FALSE 3 ==> play=yes 3 conf:(1)
6. outlook=rainy play=yes 3 ==> windy=FALSE 3 conf:(1)
7. outlook=sunny humidity=high 3 ==> play=no 3 conf:(1)
8. outlook=sunny play=no 3 ==> humidity=high 3 conf:(1)
9. temperature=cool windy=FALSE 2 ==> humidity=normal play=yes 2 conf:(1)
10. temperature=cool humidity=normal windy=FALSE 2 ==> play=yes 2 conf:(1)
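
The conf:(1) figures are just rule confidence: the support of antecedent plus consequent divided by the support of the antecedent. For rule 1, humidity=normal windy=FALSE covers 4 instances and all 4 also have play=yes:

```shell
# Confidence of rule 1: support(antecedent and consequent) / support(antecedent).
awk 'BEGIN { printf "conf=%g\n", 4 / 4 }'
# prints: conf=1
```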

Trying FILTER

java weka.filters.supervised.attribute.Discretize \
-i $WEKAHOME/data/iris.arff -c last

@relation iris-weka.filters.supervised.attribute.Discretize-Rfirst-last

@attribute sepallength {'\'(-inf-5.55]\'','\'(5.55-6.15]\'','\'(6.15-inf)\''}
@attribute sepalwidth {'\'(-inf-2.95]\'','\'(2.95-3.35]\'','\'(3.35-inf)\''}
@attribute petallength {'\'(-inf-2.45]\'','\'(2.45-4.75]\'','\'(4.75-inf)\''}
@attribute petalwidth {'\'(-inf-0.8]\'','\'(0.8-1.75]\'','\'(1.75-inf)\''}
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}

@data

'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(2.95-3.35]\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(2.95-3.35]\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(2.95-3.35]\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(-inf-2.95]\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(2.95-3.35]\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
...
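
Once the supervised Discretize filter has chosen its class-informed cut points, mapping a value onto an interval label is a simple binary search. A sketch using the sepallength cut points shown above (the plain labels below drop Weka's extra quoting):

```python
import bisect

# Cut points for sepallength as chosen by the supervised Discretize filter above.
cuts = [5.55, 6.15]
labels = ["(-inf-5.55]", "(5.55-6.15]", "(6.15-inf)"]

def discretize(value, cuts, labels):
    # bisect_left: a value equal to a cut point falls into the lower interval,
    # matching the half-open "(a-b]" convention in the attribute labels.
    return labels[bisect.bisect_left(cuts, value)]

print(discretize(5.0, cuts, labels))   # prints (-inf-5.55]
```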


Running an experiment

java weka.experiment.Experiment -r -T $WEKAHOME/data/iris.arff \
-D weka.experiment.InstancesResultListener \
-P weka.experiment.RandomSplitResultProducer -- \
-W weka.experiment.ClassifierSplitEvaluator -- \
-W weka.classifiers.rules.OneR

Experiment:
Runs from: 1 to: 10
Datasets: /usr/local/weka/data/iris.arff
Custom property iterator: off
ResultProducer: RandomSplitResultProducer: -P 66.0 -W weka.experiment.ClassifierSplitEvaluator --:
ResultListener: weka.experiment.InstancesResultListener@1270b73

Initializing...
RandomSplitResultProducer: setting additional measures for split evaluator
Iterating...
Postprocessing...

Running the Lazy Classifier on a larger dataset:

java weka.classifiers.lazy.IBk -t $WEKAHOME/data/soybean.arff

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 4.38 seconds

=== Error on training data ===

Correctly Classified Instances 682 99.8536 %
Incorrectly Classified Instances 1 0.1464 %
Kappa statistic 0.9984
Mean absolute error 0.0029
Root mean squared error 0.0152
Relative absolute error 2.9949 %
Root relative squared error 6.9346 %
Total Number of Instances 683


=== Confusion Matrix ===

 a b c d e f g h i j k l m n o p q r s <-- classified as
 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = diaporthe-stem-canker
 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = charcoal-rot
 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = rhizoctonia-root-rot
 0 0 0 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = phytophthora-rot
 0 0 0 0 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = brown-stem-rot
 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = powdery-mildew
 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 | g = downy-mildew
 0 0 0 0 0 0 0 92 0 0 0 0 0 0 0 0 0 0 0 | h = brown-spot
 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 | i = bacterial-blight
 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 | j = bacterial-pustule
 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 | k = purple-seed-stain
 0 0 0 0 0 0 0 0 0 0 0 44 0 0 0 0 0 0 0 | l = anthracnose
 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 | m = phyllosticta-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 91 0 0 0 0 0 | n = alternarialeaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 90 0 0 0 0 | o = frog-eye-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 | p = diaporthe-pod-&-stem-blight
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 | q = cyst-nematode
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 0 | r = 2-4-d-injury
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 | s = herbicide-injury

=== Stratified cross-validation ===

Correctly Classified Instances 623 91.2152 %
Incorrectly Classified Instances 60 8.7848 %
Kappa statistic 0.9036
Mean absolute error 0.0122
Root mean squared error 0.0879
Relative absolute error 12.71 %
Root relative squared error 40.1285 %
Total Number of Instances 683

=== Confusion Matrix ===

 a b c d e f g h i j k l m n o p q r s <-- classified as
 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = diaporthe-stem-canker
 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = charcoal-rot
 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = rhizoctonia-root-rot
 0 0 0 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = phytophthora-rot
 0 0 0 0 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = brown-stem-rot
 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = powdery-mildew
 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 | g = downy-mildew
 0 0 0 0 0 0 0 81 0 0 0 0 5 4 2 0 0 0 0 | h = brown-spot
 0 0 0 0 0 0 0 0 19 1 0 0 0 0 0 0 0 0 0 | i = bacterial-blight
 0 0 0 0 0 0 0 0 2 17 0 0 1 0 0 0 0 0 0 | j = bacterial-pustule
 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 | k = purple-seed-stain
 0 0 0 0 0 0 0 0 0 0 0 44 0 0 0 0 0 0 0 | l = anthracnose
 0 0 0 0 0 0 0 6 0 0 0 0 13 0 1 0 0 0 0 | m = phyllosticta-leaf-spot
 0 0 0 0 0 0 0 4 0 0 0 0 0 81 6 0 0 0 0 | n = alternarialeaf-spot
 0 0 0 0 0 0 0 3 0 0 0 0 0 17 71 0 0 0 0 | o = frog-eye-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 | p = diaporthe-pod-&-stem-blight
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 | q = cyst-nematode
 2 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 8 3 | r = 2-4-d-injury
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 | s = herbicide-injury
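
The kappa statistic reported above corrects raw accuracy for chance agreement: kappa = (po - pe) / (1 - pe), where po is the observed accuracy and pe is the accuracy expected from the row and column marginals alone. A minimal sketch (the small 2x2 matrix is illustrative, not taken from the output):

```python
def kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows = actual, cols = predicted)."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)

# Illustrative 2x2 matrix: 85% observed accuracy, 60% chance agreement.
print(round(kappa([[20, 5], [10, 65]]), 3))   # prints 0.625
```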


Testing the Instances call

java weka.core.Instances $WEKAHOME/data/soybean.arff

Relation Name: soybean
Num Instances: 683
Num Attributes: 36

Name Type Nom Int Real Missing Unique Dist
1 date Nom 100% 0% 0% 1 / 0% 0 / 0% 7
2 plant-stand Nom 95% 0% 0% 36 / 5% 0 / 0% 2
3 precip Nom 94% 0% 0% 38 / 6% 0 / 0% 3
4 temp Nom 96% 0% 0% 30 / 4% 0 / 0% 3
5 hail Nom 82% 0% 0% 121 / 18% 0 / 0% 2
6 crop-hist Nom 98% 0% 0% 16 / 2% 0 / 0% 4
7 area-damaged Nom 100% 0% 0% 1 / 0% 0 / 0% 4
8 severity Nom 82% 0% 0% 121 / 18% 0 / 0% 3
9 seed-tmt Nom 82% 0% 0% 121 / 18% 0 / 0% 3
10 germination Nom 84% 0% 0% 112 / 16% 0 / 0% 3
11 plant-growth Nom 98% 0% 0% 16 / 2% 0 / 0% 2
12 leaves Nom 100% 0% 0% 0 / 0% 0 / 0% 2
13 leafspots-halo Nom 88% 0% 0% 84 / 12% 0 / 0% 3
14 leafspots-marg Nom 88% 0% 0% 84 / 12% 0 / 0% 3
15 leafspot-size Nom 88% 0% 0% 84 / 12% 0 / 0% 3
16 leaf-shread Nom 85% 0% 0% 100 / 15% 0 / 0% 2
17 leaf-malf Nom 88% 0% 0% 84 / 12% 0 / 0% 2
18 leaf-mild Nom 84% 0% 0% 108 / 16% 0 / 0% 3
19 stem Nom 98% 0% 0% 16 / 2% 0 / 0% 2
20 lodging Nom 82% 0% 0% 121 / 18% 0 / 0% 2
21 stem-cankers Nom 94% 0% 0% 38 / 6% 0 / 0% 4
22 canker-lesion Nom 94% 0% 0% 38 / 6% 0 / 0% 4
23 fruiting-bodies Nom 84% 0% 0% 106 / 16% 0 / 0% 2
24 external-decay Nom 94% 0% 0% 38 / 6% 0 / 0% 3
25 mycelium Nom 94% 0% 0% 38 / 6% 0 / 0% 2
26 int-discolor Nom 94% 0% 0% 38 / 6% 0 / 0% 3
27 sclerotia Nom 94% 0% 0% 38 / 6% 0 / 0% 2
28 fruit-pods Nom 88% 0% 0% 84 / 12% 0 / 0% 4
29 fruit-spots Nom 84% 0% 0% 106 / 16% 0 / 0% 4
30 seed Nom 87% 0% 0% 92 / 13% 0 / 0% 2
31 mold-growth Nom 87% 0% 0% 92 / 13% 0 / 0% 2
32 seed-discolor Nom 84% 0% 0% 106 / 16% 0 / 0% 2
33 seed-size Nom 87% 0% 0% 92 / 13% 0 / 0% 2
34 shriveling Nom 84% 0% 0% 106 / 16% 0 / 0% 2
35 roots Nom 95% 0% 0% 31 / 5% 0 / 0% 3
36 class Nom 100% 0% 0% 0 / 0% 0 / 0% 19

Using NaiveBayes Classifier on soybean data

java weka.classifiers.bayes.NaiveBayes -t $WEKAHOME/data/soybean.arff

Naive Bayes Classifier

Class diaporthe-stem-canker: Prior probability = 0.03

date: Discrete Estimator. Counts = 1 1 1 6 6 6 6 (Total = 27)
plant-stand: Discrete Estimator. Counts = 21 1 (Total = 22)
precip: Discrete Estimator. Counts = 1 1 21 (Total = 23)
temp: Discrete Estimator. Counts = 1 21 1 (Total = 23)
hail: Discrete Estimator. Counts = 20 2 (Total = 22)
crop-hist: Discrete Estimator. Counts = 1 7 8 8 (Total = 24)
area-damaged: Discrete Estimator. Counts = 18 4 1 1 (Total = 24)
severity: Discrete Estimator. Counts = 1 15 7 (Total = 23)
seed-tmt: Discrete Estimator. Counts = 12 10 1 (Total = 23)
...
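
Each "Discrete Estimator. Counts = ..." line is a per-class frequency table with Laplace smoothing: every count starts at 1, which is why the 20 diaporthe-stem-canker instances show up as plant-stand counts of 21 and 1 (Total = 22). The conditional probability is then just count divided by total. A small sketch of that idea (my own reconstruction, not Weka's actual class):

```python
def discrete_estimator(raw_counts):
    """Laplace-smoothed frequency estimates: start every bin at 1, then normalize."""
    counts = [c + 1 for c in raw_counts]
    total = sum(counts)
    return [c / total for c in counts]

# All 20 instances of the class have plant-stand = normal:
probs = discrete_estimator([20, 0])   # smoothed counts 21 and 1, total 22
print(round(probs[0], 4))             # prints 0.9545, i.e. 21/22
```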

Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0.11 seconds

=== Error on training data ===

Correctly Classified Instances 640 93.7042 %
Incorrectly Classified Instances 43 6.2958 %
Kappa statistic 0.931
Mean absolute error 0.0081
Root mean squared error 0.0765
Relative absolute error 8.4277 %
Root relative squared error 34.8958 %
Total Number of Instances 683


=== Confusion Matrix ===

 a b c d e f g h i j k l m n o p q r s <-- classified as
 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = diaporthe-stem-canker
 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = charcoal-rot
 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = rhizoctonia-root-rot
 0 0 0 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = phytophthora-rot
 0 0 0 0 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = brown-stem-rot
 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = powdery-mildew
 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 | g = downy-mildew
 0 0 0 0 0 0 0 79 0 0 0 0 5 4 4 0 0 0 0 | h = brown-spot
 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 | i = bacterial-blight
 0 0 0 0 0 0 0 0 1 19 0 0 0 0 0 0 0 0 0 | j = bacterial-pustule
 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 | k = purple-seed-stain
 0 0 0 0 0 0 0 0 0 0 0 44 0 0 0 0 0 0 0 | l = anthracnose
 0 0 0 0 0 0 0 2 0 0 0 0 18 0 0 0 0 0 0 | m = phyllosticta-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 91 0 0 0 0 0 | n = alternarialeaf-spot
 0 0 0 0 0 0 0 3 0 0 0 0 0 21 66 1 0 0 0 | o = frog-eye-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 | p = diaporthe-pod-&-stem-blight
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 | q = cyst-nematode
 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 | r = 2-4-d-injury
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 | s = herbicide-injury

=== Stratified cross-validation ===

Correctly Classified Instances 635 92.9722 %
Incorrectly Classified Instances 48 7.0278 %
Kappa statistic 0.923
Mean absolute error 0.0096
Root mean squared error 0.0817
Relative absolute error 9.9344 %
Root relative squared error 37.2742 %
Total Number of Instances 683

=== Confusion Matrix ===

 a b c d e f g h i j k l m n o p q r s <-- classified as
 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = diaporthe-stem-canker
 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = charcoal-rot
 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = rhizoctonia-root-rot
 0 0 0 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = phytophthora-rot
 0 0 0 0 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = brown-stem-rot
 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = powdery-mildew
 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 | g = downy-mildew
 0 0 0 0 0 0 0 77 0 0 0 0 5 6 4 0 0 0 0 | h = brown-spot
 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 | i = bacterial-blight
 0 0 0 0 0 0 0 0 2 18 0 0 0 0 0 0 0 0 0 | j = bacterial-pustule
 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 | k = purple-seed-stain
 0 0 0 0 0 0 0 0 0 0 0 44 0 0 0 0 0 0 0 | l = anthracnose
 0 0 0 0 0 0 0 2 0 0 0 0 17 1 0 0 0 0 0 | m = phyllosticta-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 91 0 0 0 0 0 | n = alternarialeaf-spot
 0 0 0 0 0 0 0 3 0 0 0 0 0 22 65 1 0 0 0 | o = frog-eye-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 | p = diaporthe-pod-&-stem-blight
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 | q = cyst-nematode
 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 | r = 2-4-d-injury
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 | s = herbicide-injury

The same dataset as a Pruned Decision Tree

java weka.classifiers.trees.J48 -t $WEKAHOME/data/soybean.arff

J48 pruned tree
------------------

leafspot-size = lt-1/8
| canker-lesion = dna
| | leafspots-marg = w-s-marg
| | | seed-size = norm: bacterial-blight (21.0/1.0)
| | | seed-size = lt-norm: bacterial-pustule (3.23/1.23)
| | leafspots-marg = no-w-s-marg: bacterial-pustule (17.91/0.91)
| | leafspots-marg = dna: bacterial-blight (0.0)
| canker-lesion = brown: bacterial-blight (0.0)
| canker-lesion = dk-brown-blk: phytophthora-rot (4.78/0.1)
| canker-lesion = tan: purple-seed-stain (11.23/0.23)
leafspot-size = gt-1/8
| roots = norm
| | mold-growth = absent
| | | fruit-spots = absent
| | | | leaf-malf = absent
| | | | | fruiting-bodies = absent
| | | | | | date = april: brown-spot (5.0)
| | | | | | date = may: brown-spot (24.0/1.0)
| | | | | | date = june
| | | | | | | precip = lt-norm: phyllosticta-leaf-spot (4.0)
| | | | | | | precip = norm: brown-spot (5.0/2.0)
| | | | | | | precip = gt-norm: brown-spot (21.0)
| | | | | | date = july
| | | | | | | precip = lt-norm: phyllosticta-leaf-spot (1.0)
| | | | | | | precip = norm: phyllosticta-leaf-spot (2.0)
| | | | | | | precip = gt-norm: frog-eye-leaf-spot (11.0/5.0)
| | | | | | date = august
| | | | | | | leaf-shread = absent
| | | | | | | | seed-tmt = none: alternarialeaf-spot (16.0/4.0)
| | | | | | | | seed-tmt = fungicide
| | | | | | | | | plant-stand = normal: frog-eye-leaf-spot (6.0)
| | | | | | | | | plant-stand = lt-normal: alternarialeaf-spot (5.0/1.0)
| | | | | | | | seed-tmt = other: frog-eye-leaf-spot (3.0)
| | | | | | | leaf-shread = present: alternarialeaf-spot (2.0)
| | | | | | date = september
| | | | | | | stem = norm: alternarialeaf-spot (44.0/4.0)
| | | | | | | stem = abnorm: frog-eye-leaf-spot (2.0)
| | | | | | date = october: alternarialeaf-spot (31.0/1.0)
| | | | | fruiting-bodies = present: brown-spot (34.0)
| | | | leaf-malf = present: phyllosticta-leaf-spot (10.0)
| | | fruit-spots = colored
| | | | fruit-pods = norm: brown-spot (2.0)
| | | | fruit-pods = diseased: frog-eye-leaf-spot (62.0)
| | | | fruit-pods = few-present: frog-eye-leaf-spot (0.0)
| | | | fruit-pods = dna: frog-eye-leaf-spot (0.0)
| | | fruit-spots = brown-w/blk-specks
| | | | crop-hist = diff-lst-year: brown-spot (0.0)
| | | | crop-hist = same-lst-yr: brown-spot (2.0)
| | | | crop-hist = same-lst-two-yrs: brown-spot (0.0)
| | | | crop-hist = same-lst-sev-yrs: frog-eye-leaf-spot (2.0)
| | | fruit-spots = distort: brown-spot (0.0)
| | | fruit-spots = dna: brown-stem-rot (9.0)
| | mold-growth = present
| | | leaves = norm: diaporthe-pod-&-stem-blight (7.25)
| | | leaves = abnorm: downy-mildew (20.0)
| roots = rotted
| | area-damaged = scattered: herbicide-injury (1.1/0.1)
| | area-damaged = low-areas: phytophthora-rot (30.03)
| | area-damaged = upper-areas: phytophthora-rot (0.0)
| | area-damaged = whole-field: herbicide-injury (3.66/0.66)
| roots = galls-cysts: cyst-nematode (7.81/0.17)
leafspot-size = dna
| int-discolor = none
| | leaves = norm
| | | stem-cankers = absent
| | | | canker-lesion = dna: diaporthe-pod-&-stem-blight (5.53)
| | | | canker-lesion = brown: purple-seed-stain (0.0)
| | | | canker-lesion = dk-brown-blk: purple-seed-stain (0.0)
| | | | canker-lesion = tan: purple-seed-stain (9.0)
| | | stem-cankers = below-soil: rhizoctonia-root-rot (19.0)
| | | stem-cankers = above-soil: anthracnose (0.0)
| | | stem-cankers = above-sec-nde: anthracnose (24.0)
| | leaves = abnorm
| | | stem = norm
| | | | plant-growth = norm: powdery-mildew (22.0/2.0)
| | | | plant-growth = abnorm: cyst-nematode (4.3/0.39)
| | | stem = abnorm
| | | | plant-stand = normal
| | | | | leaf-malf = absent
| | | | | | seed = norm: diaporthe-stem-canker (21.0/1.0)
| | | | | | seed = abnorm: anthracnose (9.0)
| | | | | leaf-malf = present: 2-4-d-injury (3.0)
| | | | plant-stand = lt-normal
| | | | | fruiting-bodies = absent: phytophthora-rot (50.16/7.61)
| | | | | fruiting-bodies = present
| | | | | | roots = norm: anthracnose (11.0/1.0)
| | | | | | roots = rotted: phytophthora-rot (12.89/2.15)
| | | | | | roots = galls-cysts: phytophthora-rot (0.0)
| int-discolor = brown
| | leaf-malf = absent: brown-stem-rot (35.73/0.73)
| | leaf-malf = present: 2-4-d-injury (3.15/0.68)
| int-discolor = black: charcoal-rot (22.22/2.22)

Number of Leaves : 61

Size of the tree : 93


Time taken to build model: 0.23 seconds
Time taken to test model on training data: 0.09 seconds

=== Error on training data ===

Correctly Classified Instances 658 96.3397 %
Incorrectly Classified Instances 25 3.6603 %
Kappa statistic 0.9598
Mean absolute error 0.0104
Root mean squared error 0.0625
Relative absolute error 10.7981 %
Root relative squared error 28.5358 %
Total Number of Instances 683


=== Confusion Matrix ===

 a b c d e f g h i j k l m n o p q r s <-- classified as
 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = diaporthe-stem-canker
 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = charcoal-rot
 1 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = rhizoctonia-root-rot
 0 0 0 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = phytophthora-rot
 0 0 0 0 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = brown-stem-rot
 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = powdery-mildew
 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 | g = downy-mildew
 0 0 0 0 0 0 0 90 0 0 0 0 0 0 2 0 0 0 0 | h = brown-spot
 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 | i = bacterial-blight
 0 0 0 0 0 0 0 0 1 19 0 0 0 0 0 0 0 0 0 | j = bacterial-pustule
 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 | k = purple-seed-stain
 0 0 0 1 0 0 0 0 0 0 0 43 0 0 0 0 0 0 0 | l = anthracnose
 0 0 0 0 0 0 0 3 0 0 0 0 17 0 0 0 0 0 0 | m = phyllosticta-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 88 3 0 0 0 0 | n = alternarialeaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 10 81 0 0 0 0 | o = frog-eye-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 | p = diaporthe-pod-&-stem-blight
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 | q = cyst-nematode
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 0 | r = 2-4-d-injury
 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 | s = herbicide-injury

=== Stratified cross-validation ===

Correctly Classified Instances 625 91.5081 %
Incorrectly Classified Instances 58 8.4919 %
Kappa statistic 0.9068
Mean absolute error 0.0135
Root mean squared error 0.0842
Relative absolute error 14.0484 %
Root relative squared error 38.4134 %
Total Number of Instances 683

=== Confusion Matrix ===

 a b c d e f g h i j k l m n o p q r s <-- classified as
 19 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = diaporthe-stem-canker
 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = charcoal-rot
 1 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = rhizoctonia-root-rot
 0 0 0 87 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 | d = phytophthora-rot
 0 0 0 0 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = brown-stem-rot
 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = powdery-mildew
 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 | g = downy-mildew
 0 0 0 0 0 0 0 85 0 0 0 0 2 1 4 0 0 0 0 | h = brown-spot
 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 | i = bacterial-blight
 0 0 0 0 0 0 0 0 1 19 0 0 0 0 0 0 0 0 0 | j = bacterial-pustule
 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 | k = purple-seed-stain
 0 0 0 4 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 | l = anthracnose
 0 0 0 0 0 0 0 3 0 0 0 0 14 0 3 0 0 0 0 | m = phyllosticta-leaf-spot
 0 0 0 0 0 0 0 1 0 0 0 0 0 85 5 0 0 0 0 | n = alternarialeaf-spot
 0 0 0 0 0 0 0 3 0 0 0 0 1 20 67 0 0 0 0 | o = frog-eye-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 | p = diaporthe-pod-&-stem-blight
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 | q = cyst-nematode
 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 14 0 | r = 2-4-d-injury
 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 2 3 | s = herbicide-injury

Building a data model

java weka.classifiers.trees.J48 -t $WEKAHOME/data/soybean.arff \
-i -k -d J48-data.model > J48-data.out &

On the segment data provided, build on one set and check against another

java weka.classifiers.trees.J48 -t $WEKAHOME/data/segment-test.arff \
-i -k -d J48-segment-data.model >J48-segment-data.out

The results:

[weka@domU-12-31-36-00-26-23 tutorial]$ ls -l
total 108
-rw-rw-r-- 1 weka weka 60556 Aug 1 09:13 J48-data.model
-rw-rw-r-- 1 weka weka 12906 Aug 1 09:13 J48-data.out
-rw-rw-r-- 1 weka weka 18784 Aug 1 09:17 J48-segment-data.model
-rw-rw-r-- 1 weka weka 6146 Aug 1 09:17 J48-segment-data.out

more J48-segment-data.out

J48 pruned tree
------------------

region-centroid-row <= 155
| intensity-mean <= 31.6296
| | hue-mean <= -1.84512
| | | hue-mean <= -2.22949
| | | | saturation-mean <= 0.48999: window (3.0)
| | | | saturation-mean > 0.48999: foliage (77.0)
| | | hue-mean > -2.22949
| | | | saturation-mean <= 0.864482
| | | | | rawgreen-mean <= 14.6667
| | | | | | region-centroid-col <= 100
| | | | | | | hue-mean <= -2.03349
| | | | | | | | hue-mean <= -2.14532: foliage (2.0)
| | | | | | | | hue-mean > -2.14532: window (13.0/3.0)
| | | | | | | hue-mean > -2.03349
| | | | | | | | region-centroid-row <= 150: brickface (2.0)
| | | | | | | | region-centroid-row > 150: window (2.0)
| | | | | | region-centroid-col > 100: window (56.0)
| | | | | rawgreen-mean > 14.6667
| | | | | | region-centroid-row <= 122: window (26.0/1.0)
| | | | | | region-centroid-row > 122
| | | | | | | region-centroid-col <= 165: cement (10.0)
| | | | | | | region-centroid-col > 165: window (4.0/1.0)
| | | | saturation-mean > 0.864482
| | | | | hue-mean <= -2.101: foliage (22.0)
| | | | | hue-mean > -2.101
| | | | | | region-centroid-row <= 132
| | | | | | | hue-mean <= -2.08047: foliage (9.0)
| | | | | | | hue-mean > -2.08047: window (3.0/1.0)
| | | | | | region-centroid-row > 132
| | | | | | | region-centroid-row <= 143: window (10.0)
| | | | | | | region-centroid-row > 143: foliage (2.0)
| | hue-mean > -1.84512
| | | exgreen-mean <= -5.77778
| | | | exred-mean <= -5.88889
| | | | | region-centroid-row <= 104: brickface (6.0)
| | | | | region-centroid-row > 104: foliage (3.0)
| | | | exred-mean > -5.88889: brickface (118.0/1.0)
| | | exgreen-mean > -5.77778
| | | | exred-mean <= -0.777778: grass (5.0/1.0)
| | | | exred-mean > -0.777778
| | | | | region-centroid-col <= 34: foliage (2.0)
| | | | | region-centroid-col > 34: window (14.0)
| intensity-mean > 31.6296
| | rawblue-mean <= 88.4444: cement (94.0/1.0)
| | rawblue-mean > 88.4444: sky (110.0)
region-centroid-row > 155
| rawred-mean <= 23.3333
| | exgreen-mean <= -3.77778: cement (5.0/1.0)
| | exgreen-mean > -3.77778: grass (118.0)
| rawred-mean > 23.3333: path (94.0)

Number of Leaves : 26

Size of the tree : 51


Time taken to build model: 0.45 seconds
Time taken to test model on training data: 0.02 seconds

=== Error on training data ===

Correctly Classified Instances 800 98.7654 %
Incorrectly Classified Instances 10 1.2346 %
Kappa statistic 0.9856
K&B Relative Info Score 79692.1947 %
K&B Information Score 2232.1312 bits 2.7557 bits/instance
Class complexity | order 0 2268.6706 bits 2.8008 bits/instance
Class complexity | scheme 45.7746 bits 0.0565 bits/instance
Complexity improvement (Sf) 2222.896 bits 2.7443 bits/instance
Mean absolute error 0.0058
Root mean squared error 0.054
Relative absolute error 2.3848 %
Root relative squared error 15.443 %
Total Number of Instances 810


=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure Class
1 0.001 0.992 1 0.996 brickface
1 0 1 1 1 sky
0.959 0 1 0.959 0.979 foliage
0.973 0.003 0.982 0.973 0.977 cement
0.992 0.009 0.954 0.992 0.973 window
1 0 1 1 1 path
0.992 0.001 0.992 0.992 0.992 grass


=== Confusion Matrix ===

 a b c d e f g <-- classified as
 125 0 0 0 0 0 0 | a = brickface
 0 110 0 0 0 0 0 | b = sky
 0 0 117 1 4 0 0 | c = foliage
 0 0 0 107 2 0 1 | d = cement
 1 0 0 0 125 0 0 | e = window
 0 0 0 0 0 94 0 | f = path
 0 0 0 1 0 0 122 | g = grass

=== Stratified cross-validation ===

Correctly Classified Instances 757 93.4568 %
Incorrectly Classified Instances 53 6.5432 %
Kappa statistic 0.9235
K&B Relative Info Score 75326.8356 %
K&B Information Score 2110.05 bits 2.605 bits/instance
Class complexity | order 0 2268.8296 bits 2.801 bits/instance
Class complexity | scheme 37665.7637 bits 46.5009 bits/instance
Complexity improvement (Sf) -35396.9341 bits -43.6999 bits/instance
Mean absolute error 0.02
Root mean squared error 0.1312
Relative absolute error 8.1735 %
Root relative squared error 37.5168 %
Total Number of Instances 810

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure Class
0.96 0.009 0.952 0.96 0.956 brickface
1 0.001 0.991 1 0.995 sky
0.844 0.022 0.873 0.844 0.858 foliage
0.9 0.01 0.934 0.9 0.917 cement
0.881 0.031 0.841 0.881 0.86 window
0.989 0.001 0.989 0.989 0.989 path
0.984 0.003 0.984 0.984 0.984 grass

=== Confusion Matrix ===

 a b c d e f g <-- classified as
 120 0 3 0 2 0 0 | a = brickface
 0 110 0 0 0 0 0 | b = sky
 4 0 103 1 14 0 0 | c = foliage
 0 1 2 99 5 1 2 | d = cement
 2 0 10 3 111 0 0 | e = window
 0 0 0 1 0 93 0 | f = path
 0 0 0 2 0 0 121 | g = grass

Checking meta classifier:

java weka.classifiers.meta.ClassificationViaRegression \
-W weka.classifiers.functions.LinearRegression \
-t $WEKAHOME/data/iris.arff -x 2 -- -S 1

Options: -W weka.classifiers.functions.LinearRegression -- -S 1

Classification via Regression

Classifier for class with index 0:


Linear Regression Model

class =

0.0656 * sepallength +
0.2425 * sepalwidth +
-0.2228 * petallength +
-0.0634 * petalwidth +
0.1225

Classifier for class with index 1:


Linear Regression Model

class =

-0.0215 * sepallength +
-0.4407 * sepalwidth +
0.2185 * petallength +
-0.4832 * petalwidth +
1.563

Classifier for class with index 2:


Linear Regression Model

class =

-0.0441 * sepallength +
0.1982 * sepalwidth +
0.0042 * petallength +
0.5465 * petalwidth +
-0.6854
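
ClassificationViaRegression fits one linear model per class against a 0/1 class-membership indicator and predicts the class whose model scores highest. A sketch evaluating the three models above on a typical setosa-like measurement (the instance values are illustrative, not taken from the run):

```python
# Coefficients from the three per-class linear models above:
# (sepallength, sepalwidth, petallength, petalwidth, intercept)
models = {
    "Iris-setosa":     ( 0.0656,  0.2425, -0.2228, -0.0634,  0.1225),
    "Iris-versicolor": (-0.0215, -0.4407,  0.2185, -0.4832,  1.563),
    "Iris-virginica":  (-0.0441,  0.1982,  0.0042,  0.5465, -0.6854),
}

def predict(x):
    """Score each class's regression model on instance x and return the argmax."""
    score = lambda w: sum(wi * xi for wi, xi in zip(w, x)) + w[4]
    return max(models, key=lambda c: score(models[c]))

pred = predict((5.1, 3.5, 1.4, 0.2))   # a typical Iris-setosa measurement
print(pred)                            # prints Iris-setosa
```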



Time taken to build model: 0.14 seconds
Time taken to test model on training data: 0.01 seconds

=== Error on training data ===

Correctly Classified Instances 127 84.6667 %
Incorrectly Classified Instances 23 15.3333 %
Kappa statistic 0.77
Mean absolute error 0.2164
Root mean squared error 0.2943
Relative absolute error 48.6997 %
Root relative squared error 62.4309 %
Total Number of Instances 150


=== Confusion Matrix ===

 a b c <-- classified as
 50 0 0 | a = Iris-setosa
 0 34 16 | b = Iris-versicolor
 0 7 43 | c = Iris-virginica

=== Stratified cross-validation ===

Correctly Classified Instances 123 82 %
Incorrectly Classified Instances 27 18 %
Kappa statistic 0.73
Mean absolute error 0.2349
Root mean squared error 0.3157
Relative absolute error 52.8443 %
Root relative squared error 66.9658 %
Total Number of Instances 150

=== Confusion Matrix ===

 a b c <-- classified as
 49 1 0 | a = Iris-setosa
 0 33 17 | b = Iris-versicolor
 0 9 41 | c = Iris-virginica

Testing some real datasets now: Leukemia-ALLAML

The data can be found here http://research.i2r.a-star.edu.sg/rp/Leukemia/ALLAML.html

java weka.classifiers.trees.J48 -t $WEKAHOME/data/ALL-AML_train.arff \
-T $WEKAHOME/data/ALL-AML_test.arff -i -k \
-d Leukemia-ALLAML.tree.J48.model > Leukemia-ALLAML.tree.J48.out

The results:

more Leukemia-ALLAML.tree.J48.out

J48 pruned tree
------------------

attribute4847 <= 938: ALL (27.0)
attribute4847 > 938: AML (11.0)

Number of Leaves : 2

Size of the tree : 3
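
The whole model reduces to a single threshold on one gene's expression level, which makes it trivial to apply by hand. A sketch of the tree as a function:

```python
def classify(attribute4847):
    """The entire J48 tree above: one split on attribute4847."""
    return "ALL" if attribute4847 <= 938 else "AML"

print(classify(500))    # prints ALL
print(classify(1200))   # prints AML
```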


Time taken to build model: 1.13 seconds
Time taken to test model on training data: 0.07 seconds

=== Error on training data ===

Correctly Classified Instances 38 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
K&B Relative Info Score 3744.5181 %
K&B Information Score 33.0001 bits 0.8684 bits/instance
Class complexity | order 0 33.0001 bits 0.8684 bits/instance
Class complexity | scheme 0 bits 0 bits/instance
Complexity improvement (Sf) 33.0001 bits 0.8684 bits/instance
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 38


=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure Class
1 0 1 1 1 ALL
1 0 1 1 1 AML


=== Confusion Matrix ===

 a b <-- classified as
 27 0 | a = ALL
 0 11 | b = AML

=== Error on test data ===

Correctly Classified Instances 31 91.1765 %
Incorrectly Classified Instances 3 8.8235 %
Kappa statistic 0.8198
K&B Relative Info Score 3160.6324 %
K&B Information Score 27.8544 bits 0.8192 bits/instance
Class complexity | order 0 34.609 bits 1.0179 bits/instance
Class complexity | scheme 3222 bits 94.7647 bits/instance
Complexity improvement (Sf) -3187.391 bits -93.7468 bits/instance
Mean absolute error 0.0882
Root mean squared error 0.297
Relative absolute error 18.9873 %
Root relative squared error 58.8575 %
Total Number of Instances 34

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure Class
0.9 0.071 0.947 0.9 0.923 ALL
0.929 0.1 0.867 0.929 0.897 AML

=== Confusion Matrix ===

 a b <-- classified as
 18 2 | a = ALL
 1 13 | b = AML

Same data with NaiveBayes:

java weka.classifiers.bayes.NaiveBayes -t $WEKAHOME/data/ALL-AML_train.arff \
-T $WEKAHOME/data/ALL-AML_test.arff -i -k \
-d Leukemia-ALLAML.NaiveBayes.J48.model > Leukemia-ALLAML.NaiveBayes.J48.out

Checking the results (trimmed):

tail -100 Leukemia-ALLAML.NaiveBayes.J48.out

attribute7096: Normal Distribution. Mean = 17632.9975 StandardDev = 5491.6378 WeightSum = 11 Precision = 885.6756756756756
attribute7097: Normal Distribution. Mean = 16260.5405 StandardDev = 1742.1779 WeightSum = 11 Precision = 687.9459459459459
attribute7098: Normal Distribution. Mean = 918.855 StandardDev = 410.0149 WeightSum = 11 Precision = 64.37837837837837
attribute7099: Normal Distribution. Mean = 280.1548 StandardDev = 173.7343 WeightSum = 11 Precision = 17.216216216216218
attribute7100: Normal Distribution. Mean = 59.3889 StandardDev = 189.0803 WeightSum = 11 Precision = 29.694444444444443
attribute7101: Normal Distribution. Mean = 11265.5725 StandardDev = 2448.7777 WeightSum = 11 Precision = 390.9189189189189
attribute7102: Normal Distribution. Mean = 10453.7396 StandardDev = 3122.437 WeightSum = 11 Precision = 419.6756756756757
attribute7103: Normal Distribution. Mean = 318.7273 StandardDev = 376.3747 WeightSum = 11 Precision = 47.37837837837838
attribute7104: Normal Distribution. Mean = 2731.2801 StandardDev = 1380.7546 WeightSum = 11 Precision = 236.56756756756758
attribute7105: Normal Distribution. Mean = -288.0413 StandardDev = 90.8241 WeightSum = 11 Precision = 11.606060606060606
attribute7106: Normal Distribution. Mean = 0 StandardDev = 63.0836 WeightSum = 11 Precision = 7.324324324324325
attribute7107: Normal Distribution. Mean = 300.6417 StandardDev = 114.8094 WeightSum = 11 Precision = 27.558823529411764
attribute7108: Normal Distribution. Mean = -6.5039 StandardDev = 40.9087 WeightSum = 11 Precision = 8.942857142857143
attribute7109: Normal Distribution. Mean = 249.1057 StandardDev = 80.4043 WeightSum = 11 Precision = 16.81081081081081
attribute7110: Normal Distribution. Mean = 56.7107 StandardDev = 49.8522 WeightSum = 11 Precision = 6.636363636363637
attribute7111: Normal Distribution. Mean = 63.7126 StandardDev = 31.0336 WeightSum = 11 Precision = 9.870967741935484
attribute7112: Normal Distribution. Mean = -16.5111 StandardDev = 217.4379 WeightSum = 11 Precision = 25.945945945945947
attribute7113: Normal Distribution. Mean = 267.1091 StandardDev = 128.0862 WeightSum = 11 Precision = 16.6
attribute7114: Normal Distribution. Mean = 122.4791 StandardDev = 87.51 WeightSum = 11 Precision = 17.054054054054053
attribute7115: Normal Distribution. Mean = 233.8717 StandardDev = 111.0206 WeightSum = 11 Precision = 11.588235294117647
attribute7116: Normal Distribution. Mean = 307.9662 StandardDev = 139.9155 WeightSum = 11 Precision = 24.37142857142857
attribute7117: Normal Distribution. Mean = -319.0614 StandardDev = 110.253 WeightSum = 11 Precision = 25.43243243243243
attribute7118: Normal Distribution. Mean = -2319.9951 StandardDev = 878.3917 WeightSum = 11 Precision = 105.89189189189189
attribute7119: Normal Distribution. Mean = 378.2703 StandardDev = 120.9712 WeightSum = 11 Precision = 94.56756756756756
attribute7120: Normal Distribution. Mean = 182.4489 StandardDev = 82.9293 WeightSum = 11 Precision = 10.1875
attribute7121: Normal Distribution. Mean = 797.0098 StandardDev = 352.9267 WeightSum = 11 Precision = 38.62162162162162
attribute7122: Normal Distribution. Mean = 11.3143 StandardDev = 56.262 WeightSum = 11 Precision = 11.314285714285715
attribute7123: Normal Distribution. Mean = 348.8624 StandardDev = 134.0911 WeightSum = 11 Precision = 67.32432432432432
attribute7124: Normal Distribution. Mean = -17.8909 StandardDev = 48.2762 WeightSum = 11 Precision = 4.685714285714286
attribute7125: Normal Distribution. Mean = 1109.484 StandardDev = 549.1813 WeightSum = 11 Precision = 57.2972972972973
attribute7126: Normal Distribution. Mean = 326.3333 StandardDev = 147.522 WeightSum = 11 Precision = 29.666666666666668
attribute7127: Normal Distribution. Mean = 8.5 StandardDev = 20.0873 WeightSum = 11 Precision = 5.5
attribute7128: Normal Distribution. Mean = 1145.2208 StandardDev = 1057.6857 WeightSum = 11 Precision = 91.28571428571429
attribute7129: Normal Distribution. Mean = -24.6494 StandardDev = 26.9834 WeightSum = 11 Precision = 3.7142857142857144
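
For numeric attributes, NaiveBayes models each class's values with a normal distribution; every "Mean = ... StandardDev = ..." line above parameterizes one such density. A sketch of the Gaussian likelihood it evaluates at prediction time (my own reconstruction, not Weka's NormalEstimator class):

```python
import math

def normal_density(x, mean, stddev):
    """Gaussian likelihood used for numeric attributes in naive Bayes."""
    z = (x - mean) / stddev
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * stddev)

# attribute7129 for this class: Mean = -24.6494, StandardDev = 26.9834
d = normal_density(-24.6494, -24.6494, 26.9834)   # density at the mean
```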


Time taken to build model: 0.42 seconds
Time taken to test model on training data: 1.28 seconds

=== Error on training data ===

Correctly Classified Instances 38 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
K&B Relative Info Score 3744.5181 %
K&B Information Score 33.0001 bits 0.8684 bits/instance
Class complexity | order 0 33.0001 bits 0.8684 bits/instance
Class complexity | scheme 0 bits 0 bits/instance
Complexity improvement (Sf) 33.0001 bits 0.8684 bits/instance
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 38


=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure Class
1 0 1 1 1 ALL
1 0 1 1 1 AML


=== Confusion Matrix ===

a b <-- classified as
27 0 | a = ALL
0 11 | b = AML


=== Error on test data ===

Correctly Classified Instances 30 88.2353 %
Incorrectly Classified Instances 4 11.7647 %
Kappa statistic 0.7518
K&B Relative Info Score 2905.1505 %
K&B Information Score 25.6028 bits 0.753 bits/instance
Class complexity | order 0 34.609 bits 1.0179 bits/instance
Class complexity | scheme 4296 bits 126.3529 bits/instance
Complexity improvement (Sf) -4261.391 bits -125.335 bits/instance
Mean absolute error 0.1176
Root mean squared error 0.343
Relative absolute error 25.3165 %
Root relative squared error 67.9628 %
Total Number of Instances 34


=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure Class
0.95 0.214 0.864 0.95 0.905 ALL
0.786 0.05 0.917 0.786 0.846 AML


=== Confusion Matrix ===

a b <-- classified as
19 1 | a = ALL
3 11 | b = AML

Running off predictions:


java weka.classifiers.trees.J48 -t $WEKAHOME/data/ALL-AML_train.arff \
-T $WEKAHOME/data/ALL-AML_test.arff -i -k \
-d Leukemia-ALLAML.tree.J48.model -p 0 > Leukemia-ALLAML.tree.J48.out

more Leukemia-ALLAML.tree.J48.out

0 ALL 1.0 ALL
1 ALL 1.0 ALL
2 ALL 1.0 ALL
3 ALL 1.0 ALL
4 ALL 1.0 ALL
5 ALL 1.0 ALL
6 ALL 1.0 ALL
7 ALL 1.0 ALL
8 ALL 1.0 ALL
9 ALL 1.0 ALL
10 ALL 1.0 ALL
11 ALL 1.0 ALL
12 ALL 1.0 ALL
13 ALL 1.0 ALL
14 AML 1.0 ALL
15 ALL 1.0 ALL
16 AML 1.0 ALL
17 ALL 1.0 ALL
18 ALL 1.0 ALL
19 ALL 1.0 ALL
20 AML 1.0 AML
21 AML 1.0 AML
22 AML 1.0 AML
23 AML 1.0 AML
24 AML 1.0 AML
25 AML 1.0 AML
26 AML 1.0 AML
27 AML 1.0 AML
28 AML 1.0 AML
29 AML 1.0 AML
30 ALL 1.0 AML
31 AML 1.0 AML
32 AML 1.0 AML
33 AML 1.0 AML
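
The mismatches in a dump like this can be tallied directly. A quick awk sketch, assuming the columns of the `-p 0` output are instance number, predicted class, confidence, and actual class (which matches the rows above), counts the lines where the predicted and actual labels differ:

```shell
# Count misclassified instances in a Weka -p 0 prediction dump.
# Assumed columns: inst#, predicted, confidence, actual.
awk '$2 != $4 { errors++ } END { print errors+0 " misclassified of " NR }' \
    Leukemia-ALLAML.tree.J48.out
```

On the J48 dump above this flags instances 14, 16 and 30, i.e. 3 misclassified of 34.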

java -mx1024m weka.classifiers.bayes.NaiveBayes \
-t $WEKAHOME/data/ALL-AML_train.arff \
-T $WEKAHOME/data/ALL-AML_test.arff -i -k \
-d Leukemia-ALLAML.NaiveBayes.J48.model -p 0 > Leukemia-ALLAML.NaiveBayes.J48.pred

The results:

[weka@domU-12-31-36-00-26-23 tutorial]$ ls -l
total 3920
-rw-rw-r-- 1 weka weka 60556 Aug 1 09:13 J48-data.model
-rw-rw-r-- 1 weka weka 12906 Aug 1 09:13 J48-data.out
-rw-rw-r-- 1 weka weka 18784 Aug 1 09:17 J48-segment-data.model
-rw-rw-r-- 1 weka weka 6146 Aug 1 09:17 J48-segment-data.out
-rw-rw-r-- 1 weka weka 1506090 Aug 1 09:41 Leukemia-ALLAML.NaiveBayes.J48.model
-rw-rw-r-- 1 weka weka 1703981 Aug 1 09:36 Leukemia-ALLAML.NaiveBayes.J48.out
-rw-rw-r-- 1 weka weka 535 Aug 1 09:41 Leukemia-ALLAML.NaiveBayes.J48.pred
-rw-rw-r-- 1 weka weka 666093 Aug 1 09:40 Leukemia-ALLAML.tree.J48.model
-rw-rw-r-- 1 weka weka 535 Aug 1 09:40 Leukemia-ALLAML.tree.J48.out

more Leukemia-ALLAML.NaiveBayes.J48.pred

0 ALL 1.0 ALL
1 ALL 1.0 ALL
2 AML 1.0 ALL
3 ALL 1.0 ALL
4 ALL 1.0 ALL
5 ALL 1.0 ALL
6 ALL 1.0 ALL
7 ALL 1.0 ALL
8 ALL 1.0 ALL
9 ALL 1.0 ALL
10 ALL 1.0 ALL
11 ALL 1.0 ALL
12 ALL 1.0 ALL
13 ALL 1.0 ALL
14 ALL 1.0 ALL
15 ALL 1.0 ALL
16 ALL 1.0 ALL
17 ALL 1.0 ALL
18 ALL 1.0 ALL
19 ALL 1.0 ALL
20 AML 1.0 AML
21 AML 1.0 AML
22 AML 1.0 AML
23 AML 1.0 AML
24 ALL 1.0 AML
25 AML 1.0 AML
26 AML 1.0 AML
27 AML 1.0 AML
28 AML 1.0 AML
29 AML 1.0 AML
30 ALL 1.0 AML
31 AML 1.0 AML
32 ALL 1.0 AML
33 AML 1.0 AML
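
The same kind of one-liner can turn a prediction dump into an accuracy figure; a sketch, again assuming column 2 is the predicted class and column 4 the actual one:

```shell
# Accuracy from a -p 0 dump: fraction of rows where the predicted
# label (column 2) matches the actual label (column 4).
awk '$2 == $4 { ok++ } END { printf "%.4f %%\n", 100 * ok / NR }' \
    Leukemia-ALLAML.NaiveBayes.J48.pred
```

Run on the NaiveBayes dump above (4 mismatches out of 34, at instances 2, 24, 30 and 32), this gives 88.2353 %, agreeing with the "Error on test data" figures.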