Introduction:
In this series of articles I have been working through getting the proposed Pentaho cluster running on EC2. The earlier articles are:
http://blog.vmdatamine.com/2007/09/pentaho-business-suite-cluster-research.html
http://blog.vmdatamine.com/2007/09/pentaho-cluster-installing-jgroups.html
http://blog.vmdatamine.com/2007/11/pentaho-cluster-installing-jgroups-on.html
In the last article, I ran a JGroups cluster test and found the results disappointing compared to the test results published by JBoss.
Since larger instances with better performance are now available, I decided to see how JGroups would perform on them.
The specification of the large instances can be found in Amazon's announcement.
The only changes from the last test were an upgrade to Java (JDK 6 Release 3) and running on Amazon's public Fedora 64-bit image.
I tried both the large and extra large instances in a 4-node cluster setup running TCP.
Comments:
- The network bandwidth is still the limiting factor.
- I had to modify the tcp.xml settings to enable queues to stop the test hanging sporadically.
- The larger 64-bit instances have more throughput than the small nodes. This could be due to settings, or CPU may be an underlying factor after all.
- You are not going to reach the JBoss performance results without a faster network.
Results:
Two nodes: 2 senders
-- results:
10.252.93.220:7800 (myself):
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=3211ms, msgs/sec=6228.59, throughput=6.23MB
10.252.99.47:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=3236ms, msgs/sec=6180.47, throughput=6.18MB
combined: 6204.53 msgs/sec averaged over all receivers (throughput=6.2MB/sec)
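The per-receiver figures in these result blocks can be reproduced from the message count and elapsed time. The sketch below is only an illustration: the function name `rates` is made up here, and the 1000-byte message size is an assumption inferred from the output (20000 messages reported as 20MB, with MB taken as 10^6 bytes).

```python
def rates(num_msgs, elapsed_ms, msg_size_bytes=1000):
    """Return (msgs/sec, MB/sec) for a single receiver.

    msg_size_bytes=1000 is an assumption inferred from the test output,
    where 20000 messages are reported as 20MB.
    """
    seconds = elapsed_ms / 1000.0
    msgs_per_sec = num_msgs / seconds
    mb_per_sec = msgs_per_sec * msg_size_bytes / 1_000_000
    return msgs_per_sec, mb_per_sec


# First receiver of the two-node, two-sender run: 20000 messages in 3211ms.
msgs, mb = rates(20000, 3211)
print(f"msgs/sec={msgs:.2f}, throughput={mb:.2f}MB")
```

The printed values should line up with the msgs/sec and throughput fields reported for 10.252.93.220 above.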
Two nodes: 1 sender, 1 receiver
-- results:
10.252.93.220:7800 (myself):
num_msgs_expected=10000, num_msgs_received=10000 (loss rate=0.0%), received=10MB, time=2607ms, msgs/sec=3835.83, throughput=3.84MB
10.252.99.47:7800:
num_msgs_expected=10000, num_msgs_received=10000 (loss rate=0.0%), received=10MB, time=2515ms, msgs/sec=3976.14, throughput=3.98MB
combined: 3905.98 msgs/sec averaged over all receivers (throughput=3.9MB/sec)
4 nodes: 2 senders, 2 receivers
-- results:
10.252.23.15:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4640ms, msgs/sec=4310.34, throughput=4.31MB
10.252.98.208:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4687ms, msgs/sec=4267.12, throughput=4.27MB
10.252.79.0:7800 (myself):
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4642ms, msgs/sec=4308.49, throughput=4.31MB
10.252.93.203:7800:
num_msgs_expected=20000, num_msgs_received=20000 (loss rate=0.0%), received=20MB, time=4654ms, msgs/sec=4297.38, throughput=4.3MB
combined: 4295.83 msgs/sec averaged over all receivers (throughput=4.3MB/sec)
4 nodes: 2 senders, 2 receivers 100k messages test
-- results:
10.252.23.15:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=19828ms, msgs/sec=10086.75, throughput=10.09MB
10.252.98.208:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=19893ms, msgs/sec=10053.79, throughput=10.05MB
10.252.79.0:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=19828ms, msgs/sec=10086.75, throughput=10.09MB
10.252.93.203:7800 (myself):
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=19917ms, msgs/sec=10041.67, throughput=10.04MB
combined: 10067.24 msgs/sec averaged over all receivers (throughput=10.07MB/sec)
2nd run:
-- results:
10.252.23.15:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=20605ms, msgs/sec=9706.38, throughput=9.71MB
10.252.98.208:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=20629ms, msgs/sec=9695.09, throughput=9.7MB
10.252.79.0:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=20590ms, msgs/sec=9713.45, throughput=9.71MB
10.252.93.203:7800 (myself):
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=20636ms, msgs/sec=9691.8, throughput=9.69MB
combined: 9701.68 msgs/sec averaged over all receivers (throughput=9.7MB/sec)
Extra large 4 nodes : 2 senders, 2 receivers
-- results:
10.252.106.3:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=17857ms, msgs/sec=11200.09, throughput=11.2MB
10.252.15.79:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=17831ms, msgs/sec=11216.42, throughput=11.22MB
10.252.10.223:7800 (myself):
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=17837ms, msgs/sec=11212.65, throughput=11.21MB
10.252.6.223:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=17837ms, msgs/sec=11212.65, throughput=11.21MB
combined: 11210.45 msgs/sec averaged over all receivers (throughput=11.21MB/sec)
-- results:
10.252.106.3:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=15585ms, msgs/sec=12832.85, throughput=12.83MB
10.252.15.79:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=15564ms, msgs/sec=12850.17, throughput=12.85MB
10.252.10.223:7800 (myself):
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=15563ms, msgs/sec=12850.99, throughput=12.85MB
10.252.6.223:7800:
num_msgs_expected=200000, num_msgs_received=200000 (loss rate=0.0%), received=200MB, time=15563ms, msgs/sec=12850.99, throughput=12.85MB
combined: 12846.25 msgs/sec averaged over all receivers (throughput=12.85MB/sec)
Extra large 4 nodes : 4 senders and TCP queues
-- results:
10.252.106.3:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=27784ms, msgs/sec=14396.78, throughput=14.4MB
10.252.15.79:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=27744ms, msgs/sec=14417.53, throughput=14.42MB
10.252.10.223:7800 (myself):
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=27761ms, msgs/sec=14408.7, throughput=14.41MB
10.252.6.223:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=27755ms, msgs/sec=14411.82, throughput=14.41MB
combined: 14408.71 msgs/sec averaged over all receivers (throughput=14.41MB/sec)
with TCP queue_max_size set to 1000
10.252.106.3:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26667ms, msgs/sec=14999.81, throughput=15MB
10.252.15.79:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26709ms, msgs/sec=14976.23, throughput=14.98MB
10.252.10.223:7800 (myself):
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26676ms, msgs/sec=14994.75, throughput=14.99MB
10.252.6.223:7800:
num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26710ms, msgs/sec=14975.66, throughput=14.98MB
combined: 14986.61 msgs/sec averaged over all receivers (throughput=14.99MB/sec)
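To put the last run in network terms, the result lines can be parsed and the combined rate converted to megabits per second. This is a rough sketch, not part of the JGroups test itself: the regular expression and the helper name `combined_rate` are illustrative, and the 1000-byte message size is again inferred from the output (400000 messages reported as 400MB).

```python
import re

# Pull the received count and elapsed time out of one result line.
RESULT = re.compile(r"num_msgs_received=(\d+).*?time=(\d+)ms")


def combined_rate(lines):
    """Average msgs/sec over all receivers found in the given lines."""
    per_receiver = []
    for line in lines:
        match = RESULT.search(line)
        if match:
            received = int(match.group(1))
            elapsed_ms = int(match.group(2))
            per_receiver.append(received / (elapsed_ms / 1000.0))
    return sum(per_receiver) / len(per_receiver)


# Result lines from the extra large run with queue_max_size set to 1000.
sample = [
    "num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26667ms, msgs/sec=14999.81, throughput=15MB",
    "num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26709ms, msgs/sec=14976.23, throughput=14.98MB",
    "num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26676ms, msgs/sec=14994.75, throughput=14.99MB",
    "num_msgs_expected=400000, num_msgs_received=400000 (loss rate=0.0%), received=400MB, time=26710ms, msgs/sec=14975.66, throughput=14.98MB",
]

avg_msgs = combined_rate(sample)
# Assumed 1000 bytes per message, 8 bits per byte, 10^6 bits per Mbit.
mbit_per_sec = avg_msgs * 1000 * 8 / 1_000_000
print(f"combined: {avg_msgs:.2f} msgs/sec, about {mbit_per_sec:.0f} Mbit/s per receiver")
```

Under those assumptions each receiver in the best run takes in roughly 120 Mbit/s of payload, which gives a concrete figure to set against whatever bandwidth the instances actually have.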
Example of tcp.xml (excerpt):
<>
<TCP start_port="7800"
     loopback="false"
     discard_incompatible_packets="true"
     max_bundle_size="64000"
     max_bundle_timeout="30"
     use_incoming_packet_handler="true"
     enable_bundling="true"
     use_send_queues="true"
     sock_conn_timeout="300"
     skip_suspected_members="true"
     use_concurrent_stack="true"
     thread_pool.enabled="true"
     thread_pool.min_threads="8"
     thread_pool.max_threads="40"
     thread_pool.keep_alive_time="5000"
     thread_pool.queue_enabled="true"
     thread_pool.queue_max_size="100"
     thread_pool.rejection_policy="run"
     oob_thread_pool.enabled="true"
     oob_thread_pool.min_threads="8"
     oob_thread_pool.max_threads="20"
     oob_thread_pool.keep_alive_time="5000"
     oob_thread_pool.queue_enabled="true"
     oob_thread_pool.queue_max_size="100"
     oob_thread_pool.rejection_policy="run"/>
<TCPPING timeout="3000"
         initial_hosts="${jgroups.tcpping.initial_hosts:10.252.106.3[7800],10.252.15.79[7800],10.252.10.223[7800],10.252.6.223[7800]}"
         port_range="1"
         num_initial_members="2"/>
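Because EC2 instances get new internal addresses each time they are launched, the initial_hosts value has to be rebuilt for every cluster. A small sketch of generating it is shown here; the helper name `tcpping_initial_hosts` is hypothetical, and the only thing taken from the configuration above is the host[port] format with comma separators.

```python
def tcpping_initial_hosts(hosts, port=7800):
    """Build a TCPPING initial_hosts value in host[port],host[port] form."""
    return ",".join(f"{host}[{port}]" for host in hosts)


# Internal addresses of the four extra large nodes used in the test.
nodes = ["10.252.106.3", "10.252.15.79", "10.252.10.223", "10.252.6.223"]
value = tcpping_initial_hosts(nodes)
print(value)
```

Since the attribute is written as ${jgroups.tcpping.initial_hosts:...}, the generated string could presumably be supplied through the jgroups.tcpping.initial_hosts system property instead of editing the file, with the hard-coded list acting as the default.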