VM Datamining: Weka Data mining on EC2

Once the VM was set up and running, it was time to see how WEKA performed in a virtual environment.

The performance on the EC2 node was good. These are not large datasets, though, and I have a couple of larger ones to play with in the near future. For example, you can join the Netflix Prize competition and download a dataset with 100 million data points (more than 2 GB).

If you are surprised at the length of this post: I have found in the past that when I am the one using a search engine, I want to see as much information as possible. There might be someone out there who, in the future, wants a quick solution to running WEKA without having to read the documentation.

Some of this work was guided by the README. Once I got the hang of it, I got some datasets on Leukemia-ALLAML and ran WEKA on those.

Have Fun

Paul


List options for Weka Classifying

java weka.classifiers.trees.J48

Weka exception: No training file and no object input file given.

General options:

-t 
      Sets training file.
-T 
      Sets test file. If missing, a cross-validation will be performed on the training data.
-c 
      Sets index of class attribute (default: last).
-x 
      Sets number of folds for cross-validation (default: 10).
-s 
      Sets random number seed for cross-validation (default: 1).
-m 
      Sets file with cost matrix.
-l 
      Sets model input file.
-d 
      Sets model output file.
-v
      Outputs no statistics for training data.
-o
      Outputs statistics only, not the classifier.
-i
      Outputs detailed information-retrieval statistics for each class.
-k
      Outputs information-theoretic statistics.
-p 
      Only outputs predictions for test instances, along with attributes (0 for none).
-r
      Only outputs cumulative margin distribution.
-z 
      Only outputs the source representation of the classifier, giving it the supplied name.
-g
      Only outputs the graph representation of the classifier.

Options specific to weka.classifiers.trees.J48:

-U
      Use unpruned tree.
-C 
      Set confidence threshold for pruning.
      (default 0.25)
-M 
      Set minimum number of instances per leaf.
      (default 2)
-R
      Use reduced error pruning.
-N 
      Set number of folds for reduced error
      pruning. One fold is used as pruning set.
      (default 3)
-B
      Use binary splits only.
-S
      Don't perform subtree raising.
-L
      Do not clean up after the tree has been built.
-A
      Laplace smoothing for predicted probabilities.
-Q 
      Seed for random data shuffling (default 1).
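
If you end up running many variations of these options, it can help to assemble the command line in a script. The following is only a sketch: the helper name j48_command and the file name labor.arff are made up for illustration, and it only uses flags that appear in the listing above (-t, -T, -x, -C, -M). It builds the argument list without running Weka.

```python
import shlex


def j48_command(train_file, test_file=None, folds=10, confidence=0.25, min_leaf=2):
    """Assemble (but do not run) a J48 command line from the options listed above."""
    args = ["java", "weka.classifiers.trees.J48", "-t", train_file]
    if test_file is not None:
        # -T: evaluate on a separate test file.
        args += ["-T", test_file]
    else:
        # No test file: Weka cross-validates on the training data, -x sets the folds.
        args += ["-x", str(folds)]
    # -C: pruning confidence threshold, -M: minimum instances per leaf.
    args += ["-C", str(confidence), "-M", str(min_leaf)]
    return args


cmd = j48_command("labor.arff", folds=5)
print(shlex.join(cmd))
# java weka.classifiers.trees.J48 -t labor.arff -x 5 -C 0.25 -M 2
```

The list form can be handed to subprocess.run on a machine where Weka is on the classpath.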

Running the NaiveBayes Classifier on the labor dataset

java weka.classifiers.bayes.NaiveBayes -t $WEKAHOME/data/labor.arff

Naive Bayes Classifier

Class bad: Prior probability = 0.36

duration:  Normal Distribution. Mean = 2 StandardDev = 0.7071 WeightSum = 20 Precision = 1.0
wage-increase-first-year:  Normal Distribution. Mean = 2.6563 StandardDev = 0.8643 WeightSum = 20 Precision = 0.3125
wage-increase-second-year:  Normal Distribution. Mean = 2.9524 StandardDev = 0.8193 WeightSum = 15 Precision = 0.35714285714285715
wage-increase-third-year:  Normal Distribution. Mean = 2.0344 StandardDev = 0.1678 WeightSum = 4 Precision = 0.38749999999999996
cost-of-living-adjustment:  Discrete Estimator. Counts =  10 2 6  (Total = 18)
working-hours:  Normal Distribution. Mean = 39.4887 StandardDev = 1.8903 WeightSum = 19 Precision = 1.8571428571428572
pension:  Discrete Estimator. Counts =  12 3 6  (Total = 21)
standby-pay:  Normal Distribution. Mean = 2.5 StandardDev = 0.866 WeightSum = 4 Precision = 2.0
shift-differential:  Normal Distribution. Mean = 2.4691 StandardDev = 1.5738 WeightSum = 9 Precision = 2.7777777777777777
education-allowance:  Discrete Estimator. Counts =  4 10  (Total = 14)
statutory-holidays:  Normal Distribution. Mean = 10.2 StandardDev = 0.805 WeightSum = 20 Precision = 1.2
vacation:  Discrete Estimator. Counts =  12 8 3  (Total = 23)
longterm-disability-assistance:  Discrete Estimator. Counts =  6 9  (Total = 15)
contribution-to-dental-plan:  Discrete Estimator. Counts =  8 8 1  (Total = 17)
bereavement-assistance:  Discrete Estimator. Counts =  10 4  (Total = 14)
contribution-to-health-plan:  Discrete Estimator. Counts =  9 3 7  (Total = 19)


Class good: Prior probability = 0.64

duration:  Normal Distribution. Mean = 2.25 StandardDev = 0.6821 WeightSum = 36 Precision = 1.0
wage-increase-first-year:  Normal Distribution. Mean = 4.3837 StandardDev = 1.1773 WeightSum = 36 Precision = 0.3125
wage-increase-second-year:  Normal Distribution. Mean = 4.447 StandardDev = 0.9805 WeightSum = 31 Precision = 0.35714285714285715
wage-increase-third-year:  Normal Distribution. Mean = 4.5795 StandardDev = 0.7893 WeightSum = 11 Precision = 0.38749999999999996
cost-of-living-adjustment:  Discrete Estimator. Counts =  14 8 3  (Total = 25)
working-hours:  Normal Distribution. Mean = 37.5491 StandardDev = 2.9266 WeightSum = 32 Precision = 1.8571428571428572
pension:  Discrete Estimator. Counts =  1 3 8  (Total = 12)
standby-pay:  Normal Distribution. Mean = 11.2 StandardDev = 2.0396 WeightSum = 5 Precision = 2.0
shift-differential:  Normal Distribution. Mean = 5.6818 StandardDev = 5.0584 WeightSum = 22 Precision = 2.7777777777777777
education-allowance:  Discrete Estimator. Counts =  8 4  (Total = 12)
statutory-holidays:  Normal Distribution. Mean = 11.4182 StandardDev = 1.2224 WeightSum = 33 Precision = 1.2
vacation:  Discrete Estimator. Counts =  8 11 15  (Total = 34)
longterm-disability-assistance:  Discrete Estimator. Counts =  16 1  (Total = 17)
contribution-to-dental-plan:  Discrete Estimator. Counts =  3 9 14  (Total = 26)
bereavement-assistance:  Discrete Estimator. Counts =  19 1  (Total = 20)
contribution-to-health-plan:  Discrete Estimator. Counts =  1 8 15  (Total = 24)


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0.01 seconds

=== Error on training data ===

Correctly Classified Instances          56               98.2456 %
Incorrectly Classified Instances         1                1.7544 %
Kappa statistic                          0.961
Mean absolute error                      0.0481
Root mean squared error                  0.1532
Relative absolute error                 10.5249 %
Root relative squared error             32.1057 %
Total Number of Instances               57


=== Confusion Matrix ===

  a  b   <-- classified as
 19  1 |  a = bad
  0 37 |  b = good

=== Stratified cross-validation ===

Correctly Classified Instances          51               89.4737 %
Incorrectly Classified Instances         6               10.5263 %
Kappa statistic                          0.7741
Mean absolute error                      0.1042
Root mean squared error                  0.2637
Relative absolute error                 22.7763 %
Root relative squared error             55.2266 %
Total Number of Instances               57


=== Confusion Matrix ===

  a  b   <-- classified as
 18  2 |  a = bad
  4 33 |  b = good

Trying a different classifier from the list on the same dataset

java weka.classifiers.lazy.IBk -t $WEKAHOME/data/labor.arff

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification


Time taken to build model: 0 seconds
Time taken to test model on training data: 0.02 seconds

=== Error on training data ===

Correctly Classified Instances          57              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error                      0.0169
Root mean squared error                  0.0169
Relative absolute error                  3.7085 %
Root relative squared error              3.5513 %
Total Number of Instances               57


=== Confusion Matrix ===

  a  b   <-- classified as
 20  0 |  a = bad
  0 37 |  b = good

=== Stratified cross-validation ===

Correctly Classified Instances          47               82.4561 %
Incorrectly Classified Instances        10               17.5439 %
Kappa statistic                          0.6235
Mean absolute error                      0.1876
Root mean squared error                  0.4113
Relative absolute error                 41.0144 %
Root relative squared error             86.1487 %
Total Number of Instances               57


=== Confusion Matrix ===

  a  b   <-- classified as
 16  4 |  a = bad
  6 31 |  b = good

What the dataset looks like in ARFF format

cat $WEKAHOME/data/labor.arff

% Date: Tue, 15 Nov 88 15:44:08 EST
% From: stan 
% To: aha@ICS.UCI.EDU
%
% 1. Title: Final settlements in labor negotitions in Canadian industry
%
% 2. Source Information
%    -- Creators: Collective Barganing Review, montly publication,
%       Labour Canada, Industrial Relations Information Service,
%         Ottawa, Ontario, K1A 0J2, Canada, (819) 997-3117
%         The data includes all collective agreements reached
%         in the business and personal services sector for locals
%         with at least 500 members (teachers, nurses, university
%         staff, police, etc) in Canada in 87 and first quarter of 88.
%    -- Donor: Stan Matwin, Computer Science Dept, University of Ottawa,
%                 34 Somerset East, K1N 9B4, (stan@uotcsi2.bitnet)
%    -- Date: November 1988
%
% 3. Past Usage:
%    -- testing concept learning software, in particular
%       an experimental method to learn two-tiered concept descriptions.
%       The data was used to learn the description of an acceptable
%       and unacceptable contract.
%       The unacceptable contracts were either obtained by interviewing
%       experts, or by inventing near misses.
%       Examples of use are described in:
%         Bergadano, F., Matwin, S., Michalski, R.,
%         Zhang, J., Measuring Quality of Concept Descriptions,
%         Procs. of the 3rd European Working Sessions on Learning,
%         Glasgow, October 1988.
%         Bergadano, F., Matwin, S., Michalski, R., Zhang, J.,
%         Representing and Acquiring Imprecise and Context-dependent
%         Concepts in Knowledge-based Systems, Procs. of ISMIS'88,
%         North Holland, 1988.
% 4. Relevant Information:
%    -- data was used to test 2tier approach with learning
% from positive and negative examples
%
% 5. Number of Instances: 57
%
% 6. Number of Attributes: 16
%
% 7. Attribute Information:
%    1.  dur: duration of agreement
%        [1..7]
%    2   wage1.wage : wage increase in first year of contract
%        [2.0 .. 7.0]
%    3   wage2.wage : wage increase in second year of contract
%        [2.0 .. 7.0]
%    4   wage3.wage : wage increase in third year of contract
%        [2.0 .. 7.0]
%    5   cola : cost of living allowance
%        [none, tcf, tc]
%    6   hours.hrs : number of working hours during week
%        [35 .. 40]
%    7   pension : employer contributions to pension plan
%        [none, ret_allw, empl_contr]
%    8   stby_pay : standby pay
%        [2 .. 25]
%    9   shift_diff : shift differencial : supplement for work on II and III shift
%        [1 .. 25]
%   10   educ_allw.boolean : education allowance
%        [true false]
%   11   holidays : number of statutory holidays
%        [9 .. 15]
%   12   vacation : number of paid vacation days
%        [ba, avg, gnr]
%   13   lngtrm_disabil.boolean :
%        employer's help during employee longterm disabil
%        ity [true , false]
%   14   dntl_ins : employers contribution towards the dental plan
%        [none, half, full]
%   15   bereavement.boolean : employer's financial contribution towards the
%        covering the costs of bereavement
%        [true , false]
%   16   empl_hplan : employer's contribution towards the health plan
%        [none, half, full]
%
% 8. Missing Attribute Values: None
%
% 9. Class Distribution:
%
% 10. Exceptions from format instructions: no commas between attribute values.
%
%
@relation 'labor-neg-data'
@attribute 'duration' real
@attribute 'wage-increase-first-year' real
@attribute 'wage-increase-second-year' real
@attribute 'wage-increase-third-year' real
@attribute 'cost-of-living-adjustment' {'none','tcf','tc'}
@attribute 'working-hours' real
@attribute 'pension' {'none','ret_allw','empl_contr'}
@attribute 'standby-pay' real
@attribute 'shift-differential' real
@attribute 'education-allowance' {'yes','no'}
@attribute 'statutory-holidays' real
@attribute 'vacation' {'below_average','average','generous'}
@attribute 'longterm-disability-assistance' {'yes','no'}
@attribute 'contribution-to-dental-plan' {'none','half','full'}
@attribute 'bereavement-assistance' {'yes','no'}
@attribute 'contribution-to-health-plan' {'none','half','full'}
@attribute 'class' {'bad','good'}
@data
1,5,?,?,?,40,?,?,2,?,11,'average',?,?,'yes',?,'good'
2,4.5,5.8,?,?,35,'ret_allw',?,?,'yes',11,'below_average',?,'full',?,'full','good'
?,?,?,?,?,38,'empl_contr',?,5,?,11,'generous','yes','half','yes','half','good'
3,3.7,4,5,'tc',?,?,?,?,'yes',?,?,?,?,'yes',?,'good'
3,4.5,4.5,5,?,40,?,?,?,?,12,'average',?,'half','yes','half','good'
2,2,2.5,?,?,35,?,?,6,'yes',12,'average',?,?,?,?,'good'
3,4,5,5,'tc',?,'empl_contr',?,?,?,12,'generous','yes','none','yes','half','good'
3,6.9,4.8,2.3,?,40,?,?,3,?,12,'below_average',?,?,?,?,'good'
2,3,7,?,?,38,?,12,25,'yes',11,'below_average','yes','half','yes',?,'good'
1,5.7,?,?,'none',40,'empl_contr',?,4,?,11,'generous','yes','full',?,?,'good'
3,3.5,4,4.6,'none',36,?,?,3,?,13,'generous',?,?,'yes','full','good'
2,6.4,6.4,?,?,38,?,?,4,?,15,?,?,'full',?,?,'good'
2,3.5,4,?,'none',40,?,?,2,'no',10,'below_average','no','half',?,'half','bad'
3,3.5,4,5.1,'tcf',37,?,?,4,?,13,'generous',?,'full','yes','full','good'
1,3,?,?,'none',36,?,?,10,'no',11,'generous',?,?,?,?,'good'
2,4.5,4,?,'none',37,'empl_contr',?,?,?,11,'average',?,'full','yes',?,'good'
1,2.8,?,?,?,35,?,?,2,?,12,'below_average',?,?,?,?,'good'
1,2.1,?,?,'tc',40,'ret_allw',2,3,'no',9,'below_average','yes','half',?,'none','bad'
1,2,?,?,'none',38,'none',?,?,'yes',11,'average','no','none','no','none','bad'
2,4,5,?,'tcf',35,?,13,5,?,15,'generous',?,?,?,?,'good'
2,4.3,4.4,?,?,38,?,?,4,?,12,'generous',?,'full',?,'full','good'
2,2.5,3,?,?,40,'none',?,?,?,11,'below_average',?,?,?,?,'bad'
3,3.5,4,4.6,'tcf',27,?,?,?,?,?,?,?,?,?,?,'good'
2,4.5,4,?,?,40,?,?,4,?,10,'generous',?,'half',?,'full','good'
1,6,?,?,?,38,?,8,3,?,9,'generous',?,?,?,?,'good'
3,2,2,2,'none',40,'none',?,?,?,10,'below_average',?,'half','yes','full','bad'
2,4.5,4.5,?,'tcf',?,?,?,?,'yes',10,'below_average','yes','none',?,'half','good'
2,3,3,?,'none',33,?,?,?,'yes',12,'generous',?,?,'yes','full','good'
2,5,4,?,'none',37,?,?,5,'no',11,'below_average','yes','full','yes','full','good'
3,2,2.5,?,?,35,'none',?,?,?,10,'average',?,?,'yes','full','bad'
3,4.5,4.5,5,'none',40,?,?,?,'no',11,'average',?,'half',?,?,'good'
3,3,2,2.5,'tc',40,'none',?,5,'no',10,'below_average','yes','half','yes','full','bad'
2,2.5,2.5,?,?,38,'empl_contr',?,?,?,10,'average',?,?,?,?,'bad'
2,4,5,?,'none',40,'none',?,3,'no',10,'below_average','no','none',?,'none','bad'
3,2,2.5,2.1,'tc',40,'none',2,1,'no',10,'below_average','no','half','yes','full','bad'
2,2,2,?,'none',40,'none',?,?,'no',11,'average','yes','none','yes','full','bad'
1,2,?,?,'tc',40,'ret_allw',4,0,'no',11,'generous','no','none','no','none','bad'
1,2.8,?,?,'none',38,'empl_contr',2,3,'no',9,'below_average','yes','half',?,'none','bad'
3,2,2.5,2,?,37,'empl_contr',?,?,?,10,'average',?,?,'yes','none','bad'
2,4.5,4,?,'none',40,?,?,4,?,12,'average','yes','full','yes','half','good'
1,4,?,?,'none',?,'none',?,?,'yes',11,'average','no','none','no','none','bad'
2,2,3,?,'none',38,'empl_contr',?,?,'yes',12,'generous','yes','none','yes','full','bad'
2,2.5,2.5,?,'tc',39,'empl_contr',?,?,?,12,'average',?,?,'yes',?,'bad'
2,2.5,3,?,'tcf',40,'none',?,?,?,11,'below_average',?,?,'yes',?,'bad'
2,4,4,?,'none',40,'none',?,3,?,10,'below_average','no','none',?,'none','bad'
2,4.5,4,?,?,40,?,?,2,'no',10,'below_average','no','half',?,'half','bad'
2,4.5,4,?,'none',40,?,?,5,?,11,'average',?,'full','yes','full','good'
2,4.6,4.6,?,'tcf',38,?,?,?,?,?,?,'yes','half',?,'half','good'
2,5,4.5,?,'none',38,?,14,5,?,11,'below_average','yes',?,?,'full','good'
2,5.7,4.5,?,'none',40,'ret_allw',?,?,?,11,'average','yes','full','yes','full','good'
2,7,5.3,?,?,?,?,?,?,?,11,?,'yes','full',?,?,'good'
3,2,3,?,'tcf',?,'empl_contr',?,?,'yes',?,?,'yes','half','yes',?,'good'
3,3.5,4,4.5,'tcf',35,?,?,?,?,13,'generous',?,?,'yes','full','good'
3,4,3.5,?,'none',40,'empl_contr',?,6,?,11,'average','yes','full',?,'full','good'
3,5,4.4,?,'none',38,'empl_contr',10,6,?,11,'generous','yes',?,?,'full','good'
3,5,5,5,?,40,?,?,?,?,12,'average',?,'half','yes','half','good'
3,6,6,4,?,35,?,?,14,?,9,'generous','yes','full','yes','full','good'
%
%
%
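
The listing shows the three parts of an ARFF file: % comment lines, an @relation/@attribute header, and comma-separated @data rows in which ? marks a missing value. A rough sketch of a reader for just this simple dialect follows. SAMPLE is a cut-down, hypothetical three-attribute file written for the example (not the real labor.arff), and read_arff is my own helper, not part of Weka. It does not handle quoted commas, sparse rows, or dates.

```python
import re

SAMPLE = """\
% comment lines start with a percent sign
@relation 'labor-neg-data'
@attribute 'duration' real
@attribute 'pension' {'none','ret_allw','empl_contr'}
@attribute 'class' {'bad','good'}
@data
1,?,'good'
2,'none','bad'
"""

# Attribute name (optionally quoted) followed by its type or nominal value set.
ATTRIBUTE = re.compile(r"@attribute\s+'?([^'\s]+)'?\s+(.+)", re.IGNORECASE)


def read_arff(text):
    """Return (attributes, rows) for the simple ARFF dialect shown above."""
    attributes, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            # '?' marks a missing value; strip the quotes around nominal values.
            rows.append([None if v == "?" else v.strip("'") for v in line.split(",")])
        elif line.lower().startswith("@data"):
            in_data = True
        else:
            match = ATTRIBUTE.match(line)
            if match:
                name, kind = match.groups()
                if kind.startswith("{"):
                    # Nominal attribute: turn {'a','b'} into a list of labels.
                    kind = [v.strip().strip("'") for v in kind.strip("{}").split(",")]
                attributes.append((name, kind))
    return attributes, rows


attributes, rows = read_arff(SAMPLE)
print(attributes)
print(rows)
```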

Basic Statistics and Validation of dataset

java weka.core.Instances  $WEKAHOME/data/labor.arff

Relation Name:  labor-neg-data
Num Instances:  57
Num Attributes: 17

   Name                      Type  Nom  Int Real     Missing      Unique  Dist
 1 duration                   Num   0%  98%   0%     1 /  2%     0 /  0%     3
 2 wage-increase-first-year   Num   0%  49%  49%     1 /  2%     7 / 12%    17
 3 wage-increase-second-year  Num   0%  47%  33%    11 / 19%     8 / 14%    15
 4 wage-increase-third-year   Num   0%  14%  12%    42 / 74%     6 / 11%     9
 5 cost-of-living-adjustment  Nom  65%   0%   0%    20 / 35%     0 /  0%     3
 6 working-hours              Num   0%  89%   0%     6 / 11%     3 /  5%     8
 7 pension                    Nom  47%   0%   0%    30 / 53%     0 /  0%     3
 8 standby-pay                Num   0%  16%   0%    48 / 84%     6 / 11%     7
 9 shift-differential         Num   0%  54%   0%    26 / 46%     5 /  9%    10
10 education-allowance        Nom  39%   0%   0%    35 / 61%     0 /  0%     2
11 statutory-holidays         Num   0%  93%   0%     4 /  7%     0 /  0%     6
12 vacation                   Nom  89%   0%   0%     6 / 11%     0 /  0%     3
13 longterm-disability-assis  Nom  49%   0%   0%    29 / 51%     0 /  0%     2
14 contribution-to-dental-pl  Nom  65%   0%   0%    20 / 35%     0 /  0%     3
15 bereavement-assistance     Nom  53%   0%   0%    27 / 47%     0 /  0%     2
16 contribution-to-health-pl  Nom  65%   0%   0%    20 / 35%     0 /  0%     3
17 class                      Nom 100%   0%   0%     0 /  0%     0 /  0%     2
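
To see where these columns come from, the row for wage-increase-third-year can be recomputed by hand. The sketch below copies the 15 known values of that attribute from the @data rows of the ARFF listing; the other 42 of the 57 rows hold ?. The reading of "Unique" as values that occur exactly once and "Dist" as the number of distinct values, as well as rounding to the nearest percent, are my assumptions, though they do reproduce the numbers in the table.

```python
from collections import Counter

# Fourth field of every @data row that is not '?', in file order.
known = [5, 5, 5, 2.3, 4.6, 5.1, 4.6, 2, 5, 2.5, 2.1, 2, 4.5, 5, 4]
column = known + [None] * 42  # 57 instances in total


def summarize(values):
    """Recompute the Int / Real / Missing / Unique / Dist columns for one attribute."""
    total = len(values)
    present = [v for v in values if v is not None]
    counts = Counter(present)
    ints = sum(1 for v in present if float(v).is_integer())

    def pct(n):
        return round(100 * n / total)

    return {
        "int_pct": pct(ints),
        "real_pct": pct(len(present) - ints),
        "missing": total - len(present),
        "missing_pct": pct(total - len(present)),
        "unique": sum(1 for c in counts.values() if c == 1),
        "unique_pct": pct(sum(1 for c in counts.values() if c == 1)),
        "distinct": len(counts),
    }


stats = summarize(column)
print(stats)
```

The result lines up with row 4 of the table: 14% Int, 12% Real, 42 missing (74%), 6 unique (11%), 9 distinct.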

Trying Associations

java weka.associations.Apriori -t $WEKAHOME/data/weather.nominal.arff

Apriori
=======

Minimum support: 0.15 (2 instances)
Minimum metric : 0.9
Number of cycles performed: 17

Generated sets of large itemsets:

Size of set of large itemsets L(1): 12

Size of set of large itemsets L(2): 47

Size of set of large itemsets L(3): 39

Size of set of large itemsets L(4): 6

Best rules found:

1. humidity=normal windy=FALSE 4 ==> play=yes 4    conf:(1)
2. temperature=cool 4 ==> humidity=normal 4    conf:(1)
3. outlook=overcast 4 ==> play=yes 4    conf:(1)
4. temperature=cool play=yes 3 ==> humidity=normal 3    conf:(1)
5. outlook=rainy windy=FALSE 3 ==> play=yes 3    conf:(1)
6. outlook=rainy play=yes 3 ==> windy=FALSE 3    conf:(1)
7. outlook=sunny humidity=high 3 ==> play=no 3    conf:(1)
8. outlook=sunny play=no 3 ==> humidity=high 3    conf:(1)
9. temperature=cool windy=FALSE 2 ==> humidity=normal play=yes 2    conf:(1)
10. temperature=cool humidity=normal windy=FALSE 2 ==> play=yes 2    conf:(1)
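
In each rule, the number after the left-hand side is how many instances match it, the number after the right-hand side is how many of those also match the conclusion, and conf is the ratio of the two. The sketch below checks a few rules by counting. The 14 rows are the weather.nominal data as I remember it from the Weka distribution, so treat them as an assumption; rule_stats is a made-up helper, and this is plain counting, not the Apriori search itself.

```python
FIELDS = ("outlook", "temperature", "humidity", "windy", "play")

# Assumed contents of weather.nominal.arff (14 instances).
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def matches(row, conditions):
    """True if the row has the given value for every attribute in conditions."""
    record = dict(zip(FIELDS, row))
    return all(record[name] == value for name, value in conditions.items())


def rule_stats(antecedent, consequent):
    """Return (instances matching the left side, of those matching the right side, confidence)."""
    covered = [r for r in ROWS if matches(r, antecedent)]
    correct = [r for r in covered if matches(r, consequent)]
    return len(covered), len(correct), len(correct) / len(covered)


print(rule_stats({"outlook": "overcast"}, {"play": "yes"}))                   # rule 3
print(rule_stats({"humidity": "normal", "windy": "FALSE"}, {"play": "yes"}))  # rule 1
print(rule_stats({"outlook": "sunny"}, {"play": "no"}))                       # not in the list
```

The last rule has a confidence of 0.6, below the minimum metric of 0.9, which is why it does not show up among the best rules.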

Trying a Filter

java weka.filters.supervised.attribute.Discretize \
-i $WEKAHOME/data/iris.arff -c last

@relation iris-weka.filters.supervised.attribute.Discretize-Rfirst-last

@attribute sepallength {'\'(-inf-5.55]\'','\'(5.55-6.15]\'','\'(6.15-inf)\''}
@attribute sepalwidth {'\'(-inf-2.95]\'','\'(2.95-3.35]\'','\'(3.35-inf)\''}
@attribute petallength {'\'(-inf-2.45]\'','\'(2.45-4.75]\'','\'(4.75-inf)\''}
@attribute petalwidth {'\'(-inf-0.8]\'','\'(0.8-1.75]\'','\'(1.75-inf)\''}
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}

@data

'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(2.95-3.35]\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(2.95-3.35]\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(2.95-3.35]\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(-inf-2.95]\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(2.95-3.35]\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
'\'(-inf-5.55]\'','\'(3.35-inf)\'','\'(-inf-2.45]\'','\'(-inf-0.8]\'',Iris-setosa
...
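
The filter replaces every numeric value with the label of the interval it falls into. Here is a small sketch of that last step only: the cut points are read off the @attribute lines above, while choosing them is the filter's job and is not shown. The measurements 5.1, 3.5, 1.4, 0.2 are what I believe to be the first row of iris.arff, so treat them as an assumption; bin_label is a hypothetical helper.

```python
from bisect import bisect_left

# Cut points taken from the discretized @attribute declarations above.
CUTS = {
    "sepallength": [5.55, 6.15],
    "sepalwidth": [2.95, 3.35],
    "petallength": [2.45, 4.75],
    "petalwidth": [0.8, 1.75],
}


def bin_label(value, cuts):
    """Return the interval label for value; intervals are closed on the right."""
    edges = ["-inf"] + [str(c) for c in cuts] + ["inf"]
    # bisect_left puts a value equal to a cut point into the lower interval.
    i = bisect_left(cuts, value)
    closing = ")" if i == len(cuts) else "]"
    return f"({edges[i]}-{edges[i + 1]}{closing}"


row = {"sepallength": 5.1, "sepalwidth": 3.5, "petallength": 1.4, "petalwidth": 0.2}
labels = [bin_label(row[name], CUTS[name]) for name in CUTS]
print(labels)
```

For that row the labels are the same as in the first @data line of the filter output.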


Running an experiment

java weka.experiment.Experiment -r -T $WEKAHOME/data/iris.arff \
-D weka.experiment.InstancesResultListener \
-P weka.experiment.RandomSplitResultProducer --  \
-W weka.experiment.ClassifierSplitEvaluator --  \
-W weka.classifiers.rules.OneR

Experiment:
Runs from: 1 to: 10
Datasets: /usr/local/weka/data/iris.arff
Custom property iterator: off
ResultProducer: RandomSplitResultProducer: -P 66.0 -W weka.experiment.ClassifierSplitEvaluator --: 
ResultListener: weka.experiment.InstancesResultListener@1270b73

Initializing...
RandomSplitResultProducer: setting additional measures for split evaluator
Iterating...
Postprocessing...
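
The output says the experiment does runs 1 to 10 with a RandomSplitResultProducer at -P 66.0, which I read as: for each run, shuffle the data and train on 66% of it, test on the rest. A sketch of that idea follows. The function name, using the run number as the seed, and the 150 instances for iris are assumptions on my part, and Weka's own shuffling will pick different rows.

```python
import random


def random_split(n_instances, train_percent=66.0, seed=1):
    """Shuffle instance indices and cut them into a train part and a test part."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_train = round(n_instances * train_percent / 100)
    return indices[:n_train], indices[n_train:]


# One split per run, mirroring "Runs from: 1 to: 10".
for run in range(1, 11):
    train, test = random_split(150, train_percent=66.0, seed=run)
    print(run, len(train), len(test))
```

With 150 instances each run trains on 99 and tests on 51; only the membership of the two parts changes from run to run.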

Running the Lazy Classifier on a larger dataset:

java weka.classifiers.lazy.IBk -t $WEKAHOME/data/soybean.arff

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification


Time taken to build model: 0.01 seconds
Time taken to test model on training data: 4.38 seconds

=== Error on training data ===

Correctly Classified Instances         682               99.8536 %
Incorrectly Classified Instances         1                0.1464 %
Kappa statistic                          0.9984
Mean absolute error                      0.0029
Root mean squared error                  0.0152
Relative absolute error                  2.9949 %
Root relative squared error              6.9346 %
Total Number of Instances              683


=== Confusion Matrix ===

  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s   <-- classified as
 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  a = diaporthe-stem-canker
  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  b = charcoal-rot
  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  c = rhizoctonia-root-rot
  0  0  0 88  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  d = phytophthora-rot
  0  0  0  0 44  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  e = brown-stem-rot
  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0 |  f = powdery-mildew
  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0 |  g = downy-mildew
  0  0  0  0  0  0  0 92  0  0  0  0  0  0  0  0  0  0  0 |  h = brown-spot
  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0 |  i = bacterial-blight
  0  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0 |  j = bacterial-pustule
  0  0  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0 |  k = purple-seed-stain
  0  0  0  0  0  0  0  0  0  0  0 44  0  0  0  0  0  0  0 |  l = anthracnose
  0  0  0  0  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0 |  m = phyllosticta-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0 91  0  0  0  0  0 |  n = alternarialeaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0  1 90  0  0  0  0 |  o = frog-eye-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 15  0  0  0 |  p = diaporthe-pod-&-stem-blight
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0  0 |  q = cyst-nematode
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 16  0 |  r = 2-4-d-injury
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8 |  s = herbicide-injury

=== Stratified cross-validation ===

Correctly Classified Instances         623               91.2152 %
Incorrectly Classified Instances        60                8.7848 %
Kappa statistic                          0.9036
Mean absolute error                      0.0122
Root mean squared error                  0.0879
Relative absolute error                 12.71   %
Root relative squared error             40.1285 %
Total Number of Instances              683


=== Confusion Matrix ===

  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s   <-- classified as
 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  a = diaporthe-stem-canker
  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  b = charcoal-rot
  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  c = rhizoctonia-root-rot
  0  0  0 88  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  d = phytophthora-rot
  0  0  0  0 44  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  e = brown-stem-rot
  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0 |  f = powdery-mildew
  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0 |  g = downy-mildew
  0  0  0  0  0  0  0 81  0  0  0  0  5  4  2  0  0  0  0 |  h = brown-spot
  0  0  0  0  0  0  0  0 19  1  0  0  0  0  0  0  0  0  0 |  i = bacterial-blight
  0  0  0  0  0  0  0  0  2 17  0  0  1  0  0  0  0  0  0 |  j = bacterial-pustule
  0  0  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0 |  k = purple-seed-stain
  0  0  0  0  0  0  0  0  0  0  0 44  0  0  0  0  0  0  0 |  l = anthracnose
  0  0  0  0  0  0  0  6  0  0  0  0 13  0  1  0  0  0  0 |  m = phyllosticta-leaf-spot
  0  0  0  0  0  0  0  4  0  0  0  0  0 81  6  0  0  0  0 |  n = alternarialeaf-spot
  0  0  0  0  0  0  0  3  0  0  0  0  0 17 71  0  0  0  0 |  o = frog-eye-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 15  0  0  0 |  p = diaporthe-pod-&-stem-blight
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0  0 |  q = cyst-nematode
  2  1  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  8  3 |  r = 2-4-d-injury
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8 |  s = herbicide-injury
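
The "Correctly Classified Instances" and "Kappa statistic" lines can be recomputed from a confusion matrix, which is a handy sanity check when reading these reports. The sketch below uses the small 2x2 matrices from the labor runs earlier in this post rather than the 19x19 soybean ones. It assumes Weka's kappa is the usual Cohen's kappa (observed agreement corrected for chance agreement); the numbers agree with the output, but I have not checked Weka's source. The function name is my own.

```python
def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of instances of actual class i classified as class j."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: the diagonal holds the correctly classified instances.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return observed, (observed - expected) / (1 - expected)


labor_runs = {
    "NaiveBayes, training data": [[19, 1], [0, 37]],
    "NaiveBayes, cross-validation": [[18, 2], [4, 33]],
    "IBk, training data": [[20, 0], [0, 37]],
    "IBk, cross-validation": [[16, 4], [6, 31]],
}

for name, matrix in labor_runs.items():
    accuracy, kappa = accuracy_and_kappa(matrix)
    print(f"{name}: {100 * accuracy:.4f} % correct, kappa {kappa:.4f}")
```

The same function works on the soybean matrices; they are just more typing.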


Testing the Instances call

java weka.core.Instances  $WEKAHOME/data/soybean.arff

Relation Name:  soybean
Num Instances:  683
Num Attributes: 36

   Name                      Type  Nom  Int Real     Missing      Unique  Dist
 1 date                       Nom 100%   0%   0%     1 /  0%     0 /  0%     7
 2 plant-stand                Nom  95%   0%   0%    36 /  5%     0 /  0%     2
 3 precip                     Nom  94%   0%   0%    38 /  6%     0 /  0%     3
 4 temp                       Nom  96%   0%   0%    30 /  4%     0 /  0%     3
 5 hail                       Nom  82%   0%   0%   121 / 18%     0 /  0%     2
 6 crop-hist                  Nom  98%   0%   0%    16 /  2%     0 /  0%     4
 7 area-damaged               Nom 100%   0%   0%     1 /  0%     0 /  0%     4
 8 severity                   Nom  82%   0%   0%   121 / 18%     0 /  0%     3
 9 seed-tmt                   Nom  82%   0%   0%   121 / 18%     0 /  0%     3
10 germination                Nom  84%   0%   0%   112 / 16%     0 /  0%     3
11 plant-growth               Nom  98%   0%   0%    16 /  2%     0 /  0%     2
12 leaves                     Nom 100%   0%   0%     0 /  0%     0 /  0%     2
13 leafspots-halo             Nom  88%   0%   0%    84 / 12%     0 /  0%     3
14 leafspots-marg             Nom  88%   0%   0%    84 / 12%     0 /  0%     3
15 leafspot-size              Nom  88%   0%   0%    84 / 12%     0 /  0%     3
16 leaf-shread                Nom  85%   0%   0%   100 / 15%     0 /  0%     2
17 leaf-malf                  Nom  88%   0%   0%    84 / 12%     0 /  0%     2
18 leaf-mild                  Nom  84%   0%   0%   108 / 16%     0 /  0%     3
19 stem                       Nom  98%   0%   0%    16 /  2%     0 /  0%     2
20 lodging                    Nom  82%   0%   0%   121 / 18%     0 /  0%     2
21 stem-cankers               Nom  94%   0%   0%    38 /  6%     0 /  0%     4
22 canker-lesion              Nom  94%   0%   0%    38 /  6%     0 /  0%     4
23 fruiting-bodies            Nom  84%   0%   0%   106 / 16%     0 /  0%     2
24 external-decay             Nom  94%   0%   0%    38 /  6%     0 /  0%     3
25 mycelium                   Nom  94%   0%   0%    38 /  6%     0 /  0%     2
26 int-discolor               Nom  94%   0%   0%    38 /  6%     0 /  0%     3
27 sclerotia                  Nom  94%   0%   0%    38 /  6%     0 /  0%     2
28 fruit-pods                 Nom  88%   0%   0%    84 / 12%     0 /  0%     4
29 fruit-spots                Nom  84%   0%   0%   106 / 16%     0 /  0%     4
30 seed                       Nom  87%   0%   0%    92 / 13%     0 /  0%     2
31 mold-growth                Nom  87%   0%   0%    92 / 13%     0 /  0%     2
32 seed-discolor              Nom  84%   0%   0%   106 / 16%     0 /  0%     2
33 seed-size                  Nom  87%   0%   0%    92 / 13%     0 /  0%     2
34 shriveling                 Nom  84%   0%   0%   106 / 16%     0 /  0%     2
35 roots                      Nom  95%   0%   0%    31 /  5%     0 /  0%     3
36 class                      Nom 100%   0%   0%     0 /  0%     0 /  0%    19

Using NaiveBayes Classifier on soybean data

java weka.classifiers.bayes.NaiveBayes -t $WEKAHOME/data/soybean.arff

Naive Bayes Classifier

Class diaporthe-stem-canker: Prior probability = 0.03

date:  Discrete Estimator. Counts =  1 1 1 6 6 6 6  (Total = 27)
plant-stand:  Discrete Estimator. Counts =  21 1  (Total = 22)
precip:  Discrete Estimator. Counts =  1 1 21  (Total = 23)
temp:  Discrete Estimator. Counts =  1 21 1  (Total = 23)
hail:  Discrete Estimator. Counts =  20 2  (Total = 22)
crop-hist:  Discrete Estimator. Counts =  1 7 8 8  (Total = 24)
area-damaged:  Discrete Estimator. Counts =  18 4 1 1  (Total = 24)
severity:  Discrete Estimator. Counts =  1 15 7  (Total = 23)
seed-tmt:  Discrete Estimator. Counts =  12 10 1  (Total = 23)
...

Time taken to build model: 0.01 seconds
Time taken to test model on training data: 0.11 seconds

=== Error on training data ===

Correctly Classified Instances         640               93.7042 %
Incorrectly Classified Instances        43                6.2958 %
Kappa statistic                          0.931
Mean absolute error                      0.0081
Root mean squared error                  0.0765
Relative absolute error                  8.4277 %
Root relative squared error             34.8958 %
Total Number of Instances              683


=== Confusion Matrix ===

  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s   <-- classified as
 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  a = diaporthe-stem-canker
  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  b = charcoal-rot
  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  c = rhizoctonia-root-rot
  0  0  0 88  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  d = phytophthora-rot
  0  0  0  0 44  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  e = brown-stem-rot
  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0 |  f = powdery-mildew
  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0 |  g = downy-mildew
  0  0  0  0  0  0  0 79  0  0  0  0  5  4  4  0  0  0  0 |  h = brown-spot
  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0 |  i = bacterial-blight
  0  0  0  0  0  0  0  0  1 19  0  0  0  0  0  0  0  0  0 |  j = bacterial-pustule
  0  0  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0 |  k = purple-seed-stain
  0  0  0  0  0  0  0  0  0  0  0 44  0  0  0  0  0  0  0 |  l = anthracnose
  0  0  0  0  0  0  0  2  0  0  0  0 18  0  0  0  0  0  0 |  m = phyllosticta-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0 91  0  0  0  0  0 |  n = alternarialeaf-spot
  0  0  0  0  0  0  0  3  0  0  0  0  0 21 66  1  0  0  0 |  o = frog-eye-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 15  0  0  0 |  p = diaporthe-pod-&-stem-blight
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0  0 |  q = cyst-nematode
  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0 |  r = 2-4-d-injury
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8 |  s = herbicide-injury

=== Stratified cross-validation ===

Correctly Classified Instances         635               92.9722 %
Incorrectly Classified Instances        48                7.0278 %
Kappa statistic                          0.923
Mean absolute error                      0.0096
Root mean squared error                  0.0817
Relative absolute error                  9.9344 %
Root relative squared error             37.2742 %
Total Number of Instances              683


=== Confusion Matrix ===

  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s   <-- classified as
 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  a = diaporthe-stem-canker
  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  b = charcoal-rot
  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  c = rhizoctonia-root-rot
  0  0  0 88  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  d = phytophthora-rot
  0  0  0  0 44  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  e = brown-stem-rot
  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0 |  f = powdery-mildew
  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0 |  g = downy-mildew
  0  0  0  0  0  0  0 77  0  0  0  0  5  6  4  0  0  0  0 |  h = brown-spot
  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0 |  i = bacterial-blight
  0  0  0  0  0  0  0  0  2 18  0  0  0  0  0  0  0  0  0 |  j = bacterial-pustule
  0  0  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0 |  k = purple-seed-stain
  0  0  0  0  0  0  0  0  0  0  0 44  0  0  0  0  0  0  0 |  l = anthracnose
  0  0  0  0  0  0  0  2  0  0  0  0 17  1  0  0  0  0  0 |  m = phyllosticta-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0 91  0  0  0  0  0 |  n = alternarialeaf-spot
  0  0  0  0  0  0  0  3  0  0  0  0  0 22 65  1  0  0  0 |  o = frog-eye-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 15  0  0  0 |  p = diaporthe-pod-&-stem-blight
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0  0 |  q = cyst-nematode
  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0 |  r = 2-4-d-injury
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8 |  s = herbicide-injury

The same dataset as a Pruned Decision Tree

java weka.classifiers.trees.J48 -t $WEKAHOME/data/soybean.arff

J48 pruned tree
------------------

leafspot-size = lt-1/8
|   canker-lesion = dna
|   |   leafspots-marg = w-s-marg
|   |   |   seed-size = norm: bacterial-blight (21.0/1.0)
|   |   |   seed-size = lt-norm: bacterial-pustule (3.23/1.23)
|   |   leafspots-marg = no-w-s-marg: bacterial-pustule (17.91/0.91)
|   |   leafspots-marg = dna: bacterial-blight (0.0)
|   canker-lesion = brown: bacterial-blight (0.0)
|   canker-lesion = dk-brown-blk: phytophthora-rot (4.78/0.1)
|   canker-lesion = tan: purple-seed-stain (11.23/0.23)
leafspot-size = gt-1/8
|   roots = norm
|   |   mold-growth = absent
|   |   |   fruit-spots = absent
|   |   |   |   leaf-malf = absent
|   |   |   |   |   fruiting-bodies = absent
|   |   |   |   |   |   date = april: brown-spot (5.0)
|   |   |   |   |   |   date = may: brown-spot (24.0/1.0)
|   |   |   |   |   |   date = june
|   |   |   |   |   |   |   precip = lt-norm: phyllosticta-leaf-spot (4.0)
|   |   |   |   |   |   |   precip = norm: brown-spot (5.0/2.0)
|   |   |   |   |   |   |   precip = gt-norm: brown-spot (21.0)
|   |   |   |   |   |   date = july
|   |   |   |   |   |   |   precip = lt-norm: phyllosticta-leaf-spot (1.0)
|   |   |   |   |   |   |   precip = norm: phyllosticta-leaf-spot (2.0)
|   |   |   |   |   |   |   precip = gt-norm: frog-eye-leaf-spot (11.0/5.0)
|   |   |   |   |   |   date = august
|   |   |   |   |   |   |   leaf-shread = absent
|   |   |   |   |   |   |   |   seed-tmt = none: alternarialeaf-spot (16.0/4.0)
|   |   |   |   |   |   |   |   seed-tmt = fungicide
|   |   |   |   |   |   |   |   |   plant-stand = normal: frog-eye-leaf-spot (6.0)
|   |   |   |   |   |   |   |   |   plant-stand = lt-normal: alternarialeaf-spot (5.0/1.0)
|   |   |   |   |   |   |   |   seed-tmt = other: frog-eye-leaf-spot (3.0)
|   |   |   |   |   |   |   leaf-shread = present: alternarialeaf-spot (2.0)
|   |   |   |   |   |   date = september
|   |   |   |   |   |   |   stem = norm: alternarialeaf-spot (44.0/4.0)
|   |   |   |   |   |   |   stem = abnorm: frog-eye-leaf-spot (2.0)
|   |   |   |   |   |   date = october: alternarialeaf-spot (31.0/1.0)
|   |   |   |   |   fruiting-bodies = present: brown-spot (34.0)
|   |   |   |   leaf-malf = present: phyllosticta-leaf-spot (10.0)
|   |   |   fruit-spots = colored
|   |   |   |   fruit-pods = norm: brown-spot (2.0)
|   |   |   |   fruit-pods = diseased: frog-eye-leaf-spot (62.0)
|   |   |   |   fruit-pods = few-present: frog-eye-leaf-spot (0.0)
|   |   |   |   fruit-pods = dna: frog-eye-leaf-spot (0.0)
|   |   |   fruit-spots = brown-w/blk-specks
|   |   |   |   crop-hist = diff-lst-year: brown-spot (0.0)
|   |   |   |   crop-hist = same-lst-yr: brown-spot (2.0)
|   |   |   |   crop-hist = same-lst-two-yrs: brown-spot (0.0)
|   |   |   |   crop-hist = same-lst-sev-yrs: frog-eye-leaf-spot (2.0)
|   |   |   fruit-spots = distort: brown-spot (0.0)
|   |   |   fruit-spots = dna: brown-stem-rot (9.0)
|   |   mold-growth = present
|   |   |   leaves = norm: diaporthe-pod-&-stem-blight (7.25)
|   |   |   leaves = abnorm: downy-mildew (20.0)
|   roots = rotted
|   |   area-damaged = scattered: herbicide-injury (1.1/0.1)
|   |   area-damaged = low-areas: phytophthora-rot (30.03)
|   |   area-damaged = upper-areas: phytophthora-rot (0.0)
|   |   area-damaged = whole-field: herbicide-injury (3.66/0.66)
|   roots = galls-cysts: cyst-nematode (7.81/0.17)
leafspot-size = dna
|   int-discolor = none
|   |   leaves = norm
|   |   |   stem-cankers = absent
|   |   |   |   canker-lesion = dna: diaporthe-pod-&-stem-blight (5.53)
|   |   |   |   canker-lesion = brown: purple-seed-stain (0.0)
|   |   |   |   canker-lesion = dk-brown-blk: purple-seed-stain (0.0)
|   |   |   |   canker-lesion = tan: purple-seed-stain (9.0)
|   |   |   stem-cankers = below-soil: rhizoctonia-root-rot (19.0)
|   |   |   stem-cankers = above-soil: anthracnose (0.0)
|   |   |   stem-cankers = above-sec-nde: anthracnose (24.0)
|   |   leaves = abnorm
|   |   |   stem = norm
|   |   |   |   plant-growth = norm: powdery-mildew (22.0/2.0)
|   |   |   |   plant-growth = abnorm: cyst-nematode (4.3/0.39)
|   |   |   stem = abnorm
|   |   |   |   plant-stand = normal
|   |   |   |   |   leaf-malf = absent
|   |   |   |   |   |   seed = norm: diaporthe-stem-canker (21.0/1.0)
|   |   |   |   |   |   seed = abnorm: anthracnose (9.0)
|   |   |   |   |   leaf-malf = present: 2-4-d-injury (3.0)
|   |   |   |   plant-stand = lt-normal
|   |   |   |   |   fruiting-bodies = absent: phytophthora-rot (50.16/7.61)
|   |   |   |   |   fruiting-bodies = present
|   |   |   |   |   |   roots = norm: anthracnose (11.0/1.0)
|   |   |   |   |   |   roots = rotted: phytophthora-rot (12.89/2.15)
|   |   |   |   |   |   roots = galls-cysts: phytophthora-rot (0.0)
|   int-discolor = brown
|   |   leaf-malf = absent: brown-stem-rot (35.73/0.73)
|   |   leaf-malf = present: 2-4-d-injury (3.15/0.68)
|   int-discolor = black: charcoal-rot (22.22/2.22)

Number of Leaves  :     61

Size of the tree :      93


Time taken to build model: 0.23 seconds
Time taken to test model on training data: 0.09 seconds

=== Error on training data ===

Correctly Classified Instances         658               96.3397 %
Incorrectly Classified Instances        25                3.6603 %
Kappa statistic                          0.9598
Mean absolute error                      0.0104
Root mean squared error                  0.0625
Relative absolute error                 10.7981 %
Root relative squared error             28.5358 %
Total Number of Instances              683


=== Confusion Matrix ===

  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s   <-- classified as
 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  a = diaporthe-stem-canker
  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  b = charcoal-rot
  1  0 19  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  c = rhizoctonia-root-rot
  0  0  0 88  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  d = phytophthora-rot
  0  0  0  0 44  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  e = brown-stem-rot
  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0 |  f = powdery-mildew
  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0 |  g = downy-mildew
  0  0  0  0  0  0  0 90  0  0  0  0  0  0  2  0  0  0  0 |  h = brown-spot
  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0 |  i = bacterial-blight
  0  0  0  0  0  0  0  0  1 19  0  0  0  0  0  0  0  0  0 |  j = bacterial-pustule
  0  0  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0 |  k = purple-seed-stain
  0  0  0  1  0  0  0  0  0  0  0 43  0  0  0  0  0  0  0 |  l = anthracnose
  0  0  0  0  0  0  0  3  0  0  0  0 17  0  0  0  0  0  0 |  m = phyllosticta-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0 88  3  0  0  0  0 |  n = alternarialeaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0 10 81  0  0  0  0 |  o = frog-eye-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 15  0  0  0 |  p = diaporthe-pod-&-stem-blight
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0  0 |  q = cyst-nematode
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 16  0 |  r = 2-4-d-injury
  0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0  4 |  s = herbicide-injury


=== Stratified cross-validation ===

Correctly Classified Instances         625               91.5081 %
Incorrectly Classified Instances        58                8.4919 %
Kappa statistic                          0.9068
Mean absolute error                      0.0135
Root mean squared error                  0.0842
Relative absolute error                 14.0484 %
Root relative squared error             38.4134 %
Total Number of Instances              683


=== Confusion Matrix ===

  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s   <-- classified as
 19  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  a = diaporthe-stem-canker
  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  b = charcoal-rot
  1  0 19  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  c = rhizoctonia-root-rot
  0  0  0 87  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0 |  d = phytophthora-rot
  0  0  0  0 44  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |  e = brown-stem-rot
  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0 |  f = powdery-mildew
  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0 |  g = downy-mildew
  0  0  0  0  0  0  0 85  0  0  0  0  2  1  4  0  0  0  0 |  h = brown-spot
  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0 |  i = bacterial-blight
  0  0  0  0  0  0  0  0  1 19  0  0  0  0  0  0  0  0  0 |  j = bacterial-pustule
  0  0  0  0  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0 |  k = purple-seed-stain
  0  0  0  4  0  0  0  0  0  0  0 40  0  0  0  0  0  0  0 |  l = anthracnose
  0  0  0  0  0  0  0  3  0  0  0  0 14  0  3  0  0  0  0 |  m = phyllosticta-leaf-spot
  0  0  0  0  0  0  0  1  0  0  0  0  0 85  5  0  0  0  0 |  n = alternarialeaf-spot
  0  0  0  0  0  0  0  3  0  0  0  0  1 20 67  0  0  0  0 |  o = frog-eye-leaf-spot
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 15  0  0  0 |  p = diaporthe-pod-&-stem-blight
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0  0 |  q = cyst-nematode
  0  0  0  1  0  0  0  0  0  0  0  1  0  0  0  0  0 14  0 |  r = 2-4-d-injury
  0  0  0  1  0  0  0  0  0  0  0  1  0  0  0  1  0  2  3 |  s = herbicide-injury

Building a data model

java weka.classifiers.trees.J48 -t $WEKAHOME/data/soybean.arff \
-i -k -d J48-data.model > J48-data.out &
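
If you end up repeating runs like the one above, the command line can be assembled from a small script. The following is only a sketch: the helper name weka_command and the placeholder default for WEKAHOME are my own assumptions, and it only builds the argument list, using the -t, -T and -d options from the listing at the top of this post. It does not call Weka.

```python
import os
import shlex


def weka_command(classifier, train, test=None, model_out=None, extra=()):
    """Assemble (but do not run) a Weka command line as an argv list.

    weka_command is a hypothetical helper, not part of Weka.
    """
    argv = ["java", classifier, "-t", train]
    if test is not None:
        # Without -T, Weka cross-validates on the training file.
        argv += ["-T", test]
    if model_out is not None:
        # -d sets the model output file.
        argv += ["-d", model_out]
    argv += list(extra)
    return argv


# The default path is a placeholder; set WEKAHOME as in the commands above.
weka_home = os.environ.get("WEKAHOME", "/path/to/weka")
cmd = weka_command(
    "weka.classifiers.trees.J48",
    os.path.join(weka_home, "data", "soybean.arff"),
    model_out="J48-data.model",
    extra=["-i", "-k"],
)
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run with stdout pointed at a file, which would play the role of the shell redirect used above.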

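On the segment data, the plan was to build on one set and check against another. Note that the command below passes only -t, so WEKA cross-validates on segment-test.arff rather than testing against a second file.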

java weka.classifiers.trees.J48 -t $WEKAHOME/data/segment-test.arff \
-i -k -d J48-segment-data.model >J48-segment-data.out

The results:

[weka@domU-12-31-36-00-26-23 tutorial]$ ls -l
total 108
-rw-rw-r--  1 weka weka 60556 Aug  1 09:13 J48-data.model
-rw-rw-r--  1 weka weka 12906 Aug  1 09:13 J48-data.out
-rw-rw-r--  1 weka weka 18784 Aug  1 09:17 J48-segment-data.model
-rw-rw-r--  1 weka weka  6146 Aug  1 09:17 J48-segment-data.out

more J48-segment-data.out

J48 pruned tree
------------------

region-centroid-row <= 155
|   intensity-mean <= 31.6296
|   |   hue-mean <= -1.84512
|   |   |   hue-mean <= -2.22949
|   |   |   |   saturation-mean <= 0.48999: window (3.0)
|   |   |   |   saturation-mean > 0.48999: foliage (77.0)
|   |   |   hue-mean > -2.22949
|   |   |   |   saturation-mean <= 0.864482
|   |   |   |   |   rawgreen-mean <= 14.6667
|   |   |   |   |   |   region-centroid-col <= 100
|   |   |   |   |   |   |   hue-mean <= -2.03349
|   |   |   |   |   |   |   |   hue-mean <= -2.14532: foliage (2.0)
|   |   |   |   |   |   |   |   hue-mean > -2.14532: window (13.0/3.0)
|   |   |   |   |   |   |   hue-mean > -2.03349
|   |   |   |   |   |   |   |   region-centroid-row <= 150: brickface (2.0)
|   |   |   |   |   |   |   |   region-centroid-row > 150: window (2.0)
|   |   |   |   |   |   region-centroid-col > 100: window (56.0)
|   |   |   |   |   rawgreen-mean > 14.6667
|   |   |   |   |   |   region-centroid-row <= 122: window (26.0/1.0)
|   |   |   |   |   |   region-centroid-row > 122
|   |   |   |   |   |   |   region-centroid-col <= 165: cement (10.0)
|   |   |   |   |   |   |   region-centroid-col > 165: window (4.0/1.0)
|   |   |   |   saturation-mean > 0.864482
|   |   |   |   |   hue-mean <= -2.101: foliage (22.0)
|   |   |   |   |   hue-mean > -2.101
|   |   |   |   |   |   region-centroid-row <= 132
|   |   |   |   |   |   |   hue-mean <= -2.08047: foliage (9.0)
|   |   |   |   |   |   |   hue-mean > -2.08047: window (3.0/1.0)
|   |   |   |   |   |   region-centroid-row > 132
|   |   |   |   |   |   |   region-centroid-row <= 143: window (10.0)
|   |   |   |   |   |   |   region-centroid-row > 143: foliage (2.0)
|   |   hue-mean > -1.84512
|   |   |   exgreen-mean <= -5.77778
|   |   |   |   exred-mean <= -5.88889
|   |   |   |   |   region-centroid-row <= 104: brickface (6.0)
|   |   |   |   |   region-centroid-row > 104: foliage (3.0)
|   |   |   |   exred-mean > -5.88889: brickface (118.0/1.0)
|   |   |   exgreen-mean > -5.77778
|   |   |   |   exred-mean <= -0.777778: grass (5.0/1.0)
|   |   |   |   exred-mean > -0.777778
|   |   |   |   |   region-centroid-col <= 34: foliage (2.0)
|   |   |   |   |   region-centroid-col > 34: window (14.0)
|   intensity-mean > 31.6296
|   |   rawblue-mean <= 88.4444: cement (94.0/1.0)
|   |   rawblue-mean > 88.4444: sky (110.0)
region-centroid-row > 155
|   rawred-mean <= 23.3333
|   |   exgreen-mean <= -3.77778: cement (5.0/1.0)
|   |   exgreen-mean > -3.77778: grass (118.0)
|   rawred-mean > 23.3333: path (94.0)

Number of Leaves  :     26

Size of the tree :      51
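
The numbers in brackets at each leaf can be tallied to cross-check the summary figures. In J48 output, a leaf annotated (N/E) was reached by N training instances, E of which it misclassifies; a bare (N) means no errors. The sketch below is my own helper, not something Weka provides, and it assumes the tree text has one node per line. It is run here on just the "region-centroid-row > 155" branch.

```python
import re

# Excerpt of the J48 output above: the "region-centroid-row > 155" subtree.
TREE_EXCERPT = """\
region-centroid-row > 155
|   rawred-mean <= 23.3333
|   |   exgreen-mean <= -3.77778: cement (5.0/1.0)
|   |   exgreen-mean > -3.77778: grass (118.0)
|   rawred-mean > 23.3333: path (94.0)
"""

# A leaf line ends in ": <class> (<reached>[/<misclassified>])".
LEAF = re.compile(r":\s*(\S+)\s+\((\d+(?:\.\d+)?)(?:/(\d+(?:\.\d+)?))?\)\s*$")


def summarize_leaves(tree_text):
    """Return (leaf_count, instances_reaching_leaves, misclassified)."""
    leaves = 0
    reached = 0.0
    wrong = 0.0
    for line in tree_text.splitlines():
        match = LEAF.search(line)
        if match is None:
            continue  # internal node: no class label on this line
        leaves += 1
        reached += float(match.group(2))
        wrong += float(match.group(3) or 0.0)
    return leaves, reached, wrong


print(summarize_leaves(TREE_EXCERPT))  # (3, 217.0, 1.0)
```

Adding up the leaf annotations of the whole segment tree by hand gives 26 leaves, 810 instances and 10 errors, which agrees with the leaf count above and with the training-error figures that follow. On datasets with missing values (the soybean tree, for example) the counts are fractional, so the totals will not line up as neatly.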


Time taken to build model: 0.45 seconds
Time taken to test model on training data: 0.02 seconds

=== Error on training data ===

Correctly Classified Instances         800               98.7654 %
Incorrectly Classified Instances        10                1.2346 %
Kappa statistic                          0.9856
K&B Relative Info Score              79692.1947 %
K&B Information Score                 2232.1312 bits      2.7557 bits/instance
Class complexity | order 0            2268.6706 bits      2.8008 bits/instance
Class complexity | scheme               45.7746 bits      0.0565 bits/instance
Complexity improvement     (Sf)       2222.896  bits      2.7443 bits/instance
Mean absolute error                      0.0058
Root mean squared error                  0.054
Relative absolute error                  2.3848 %
Root relative squared error             15.443  %
Total Number of Instances              810


=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall  F-Measure   Class
1         0.001      0.992     1         0.996    brickface
1         0          1         1         1        sky
0.959     0          1         0.959     0.979    foliage
0.973     0.003      0.982     0.973     0.977    cement
0.992     0.009      0.954     0.992     0.973    window
1         0          1         1         1        path
0.992     0.001      0.992     0.992     0.992    grass
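
As a quick sanity check on the table above: the F-Measure column is the harmonic mean of Precision and Recall (and TP Rate is the same number as Recall). A small sketch, with the values typed in from four of the rows; the function name f_measure is my own:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) copied from the table above.
ROWS = {
    "brickface": (0.992, 1.0, 0.996),
    "foliage": (1.0, 0.959, 0.979),
    "cement": (0.982, 0.973, 0.977),
    "window": (0.954, 0.992, 0.973),
}

for name, (p, r, reported) in ROWS.items():
    print(name, round(f_measure(p, r), 3), reported)
```

Because the table values are already rounded to three decimals, a recomputed F-measure can in general differ from the printed one in the last digit; for these four rows it rounds to the same value.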


=== Confusion Matrix ===

   a   b   c   d   e   f   g   <-- classified as
 125   0   0   0   0   0   0 |   a = brickface
   0 110   0   0   0   0   0 |   b = sky
   0   0 117   1   4   0   0 |   c = foliage
   0   0   0 107   2   0   1 |   d = cement
   1   0   0   0 125   0   0 |   e = window
   0   0   0   0   0  94   0 |   f = path
   0   0   0   1   0   0 122 |   g = grass


=== Stratified cross-validation ===

Correctly Classified Instances         757               93.4568 %
Incorrectly Classified Instances        53                6.5432 %
Kappa statistic                          0.9235
K&B Relative Info Score              75326.8356 %
K&B Information Score                 2110.05   bits      2.605  bits/instance
Class complexity | order 0            2268.8296 bits      2.801  bits/instance
Class complexity | scheme            37665.7637 bits     46.5009 bits/instance
Complexity improvement     (Sf)     -35396.9341 bits    -43.6999 bits/instance
Mean absolute error                      0.02
Root mean squared error                  0.1312
Relative absolute error                  8.1735 %
Root relative squared error             37.5168 %
Total Number of Instances              810


=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall  F-Measure   Class
0.96      0.009      0.952     0.96      0.956    brickface
1         0.001      0.991     1         0.995    sky
0.844     0.022      0.873     0.844     0.858    foliage
0.9       0.01       0.934     0.9       0.917    cement
0.881     0.031      0.841     0.881     0.86     window
0.989     0.001      0.989     0.989     0.989    path
0.984     0.003      0.984     0.984     0.984    grass


=== Confusion Matrix ===

   a   b   c   d   e   f   g   <-- classified as
 120   0   3   0   2   0   0 |   a = brickface
   0 110   0   0   0   0   0 |   b = sky
   4   0 103   1  14   0   0 |   c = foliage
   0   1   2  99   5   1   2 |   d = cement
   2   0  10   3 111   0   0 |   e = window
   0   0   0   1   0  93   0 |   f = path
   0   0   0   2   0   0 121 |   g = grass

Checking meta classifier:

java weka.classifiers.meta.ClassificationViaRegression \
-W weka.classifiers.functions.LinearRegression \
-t $WEKAHOME/data/iris.arff -x 2 -- -S 1

Options: -W weka.classifiers.functions.LinearRegression -- -S 1

Classification via Regression

Classifier for class with index 0:


Linear Regression Model

class =

    0.0656 * sepallength +
    0.2425 * sepalwidth +
   -0.2228 * petallength +
   -0.0634 * petalwidth +
    0.1225

Classifier for class with index 1:


Linear Regression Model

class =

   -0.0215 * sepallength +
   -0.4407 * sepalwidth +
    0.2185 * petallength +
   -0.4832 * petalwidth +
    1.563

Classifier for class with index 2:


Linear Regression Model

class =

   -0.0441 * sepallength +
    0.1982 * sepalwidth +
    0.0042 * petallength +
    0.5465 * petalwidth +
   -0.6854
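
To make the three models above a bit more concrete: classification via regression fits one regression per class value and predicts the class whose model gives the largest output. The sketch below applies the printed coefficients to a typical setosa measurement. Two assumptions on my part: the class indices 0, 1, 2 are mapped to names in the order the confusion matrix lists them, and taking the largest raw output is a simplification (Weka may post-process the outputs before comparing them).

```python
# Coefficients copied from the three "Linear Regression Model" blocks above:
# (sepallength, sepalwidth, petallength, petalwidth, intercept).
# Mapping index 0/1/2 to these names is an assumption based on class order.
MODELS = {
    "Iris-setosa": (0.0656, 0.2425, -0.2228, -0.0634, 0.1225),
    "Iris-versicolor": (-0.0215, -0.4407, 0.2185, -0.4832, 1.563),
    "Iris-virginica": (-0.0441, 0.1982, 0.0042, 0.5465, -0.6854),
}


def scores(sample):
    """Evaluate each per-class linear model on one instance."""
    out = {}
    for label, (*weights, intercept) in MODELS.items():
        out[label] = sum(w * x for w, x in zip(weights, sample)) + intercept
    return out


def predict(sample):
    """Pick the class whose regression output is largest."""
    result = scores(sample)
    return max(result, key=result.get)


# sepal length, sepal width, petal length, petal width of a setosa flower
sample = (5.1, 3.5, 1.4, 0.2)
print(scores(sample))
print(predict(sample))
```

For this sample the setosa model scores close to 1 and the other two close to 0, so the prediction is Iris-setosa.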



Time taken to build model: 0.14 seconds
Time taken to test model on training data: 0.01 seconds

=== Error on training data ===

Correctly Classified Instances         127               84.6667 %
Incorrectly Classified Instances        23               15.3333 %
Kappa statistic                          0.77
Mean absolute error                      0.2164
Root mean squared error                  0.2943
Relative absolute error                 48.6997 %
Root relative squared error             62.4309 %
Total Number of Instances              150


=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 34 16 |  b = Iris-versicolor
  0  7 43 |  c = Iris-virginica


=== Stratified cross-validation ===

Correctly Classified Instances         123               82      %
Incorrectly Classified Instances        27               18      %
Kappa statistic                          0.73
Mean absolute error                      0.2349
Root mean squared error                  0.3157
Relative absolute error                 52.8443 %
Root relative squared error             66.9658 %
Total Number of Instances              150


=== Confusion Matrix ===

  a  b  c   <-- classified as
 49  1  0 |  a = Iris-setosa
  0 33 17 |  b = Iris-versicolor
  0  9 41 |  c = Iris-virginica

Testing some real datasets now Leukemia-ALLAML

The data can be found here: http://research.i2r.a-star.edu.sg/rp/Leukemia/ALLAML.html

java weka.classifiers.trees.J48 -t $WEKAHOME/data/ALL-AML_train.arff \
-T $WEKAHOME/data/ALL-AML_test.arff -i -k \
-d Leukemia-ALLAML.tree.J48.model  > Leukemia-ALLAML.tree.J48.out

The results:

more Leukemia-ALLAML.tree.J48.out

J48 pruned tree
------------------

attribute4847 <= 938: ALL (27.0)
attribute4847 > 938: AML (11.0)

Number of Leaves  :     2

Size of the tree :      3


Time taken to build model: 1.13 seconds
Time taken to test model on training data: 0.07 seconds

=== Error on training data ===

Correctly Classified Instances          38              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
K&B Relative Info Score               3744.5181 %
K&B Information Score                   33.0001 bits      0.8684 bits/instance
Class complexity | order 0              33.0001 bits      0.8684 bits/instance
Class complexity | scheme                0      bits      0      bits/instance
Complexity improvement     (Sf)         33.0001 bits      0.8684 bits/instance
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0      %
Root relative squared error              0      %
Total Number of Instances               38


=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall  F-Measure   Class
1         0          1         1         1        ALL
1         0          1         1         1        AML


=== Confusion Matrix ===

  a  b   <-- classified as
 27  0 |  a = ALL
  0 11 |  b = AML


=== Error on test data ===

Correctly Classified Instances          31               91.1765 %
Incorrectly Classified Instances         3                8.8235 %
Kappa statistic                          0.8198
K&B Relative Info Score               3160.6324 %
K&B Information Score                   27.8544 bits      0.8192 bits/instance
Class complexity | order 0              34.609  bits      1.0179 bits/instance
Class complexity | scheme             3222      bits     94.7647 bits/instance
Complexity improvement     (Sf)      -3187.391  bits    -93.7468 bits/instance
Mean absolute error                      0.0882
Root mean squared error                  0.297
Relative absolute error                 18.9873 %
Root relative squared error             58.8575 %
Total Number of Instances               34


=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall  F-Measure   Class
0.9       0.071      0.947     0.9       0.923    ALL
0.929     0.1        0.867     0.929     0.897    AML


=== Confusion Matrix ===

  a  b   <-- classified as
 18  2 |  a = ALL
  1 13 |  b = AML

Same data with NaiveBayes:

java weka.classifiers.bayes.NaiveBayes -t $WEKAHOME/data/ALL-AML_train.arff \
-T $WEKAHOME/data/ALL-AML_test.arff -i -k \
-d Leukemia-ALLAML.NaiveBayes.J48.model  > Leukemia-ALLAML.NaiveBayes.J48.out

Checking the results (trimmed): 

tail -100 Leukemia-ALLAML.NaiveBayes.J48.out

attribute7096:  Normal Distribution. Mean = 17632.9975 StandardDev = 5491.6378 WeightSum = 11 Precision = 885.6756756756756
attribute7097:  Normal Distribution. Mean = 16260.5405 StandardDev = 1742.1779 WeightSum = 11 Precision = 687.9459459459459
attribute7098:  Normal Distribution. Mean = 918.855 StandardDev = 410.0149 WeightSum = 11 Precision = 64.37837837837837
attribute7099:  Normal Distribution. Mean = 280.1548 StandardDev = 173.7343 WeightSum = 11 Precision = 17.216216216216218
attribute7100:  Normal Distribution. Mean = 59.3889 StandardDev = 189.0803 WeightSum = 11 Precision = 29.694444444444443
attribute7101:  Normal Distribution. Mean = 11265.5725 StandardDev = 2448.7777 WeightSum = 11 Precision = 390.9189189189189
attribute7102:  Normal Distribution. Mean = 10453.7396 StandardDev = 3122.437 WeightSum = 11 Precision = 419.6756756756757
attribute7103:  Normal Distribution. Mean = 318.7273 StandardDev = 376.3747 WeightSum = 11 Precision = 47.37837837837838
attribute7104:  Normal Distribution. Mean = 2731.2801 StandardDev = 1380.7546 WeightSum = 11 Precision = 236.56756756756758
attribute7105:  Normal Distribution. Mean = -288.0413 StandardDev = 90.8241 WeightSum = 11 Precision = 11.606060606060606
attribute7106:  Normal Distribution. Mean = 0 StandardDev = 63.0836 WeightSum = 11 Precision = 7.324324324324325
attribute7107:  Normal Distribution. Mean = 300.6417 StandardDev = 114.8094 WeightSum = 11 Precision = 27.558823529411764
attribute7108:  Normal Distribution. Mean = -6.5039 StandardDev = 40.9087 WeightSum = 11 Precision = 8.942857142857143
attribute7109:  Normal Distribution. Mean = 249.1057 StandardDev = 80.4043 WeightSum = 11 Precision = 16.81081081081081
attribute7110:  Normal Distribution. Mean = 56.7107 StandardDev = 49.8522 WeightSum = 11 Precision = 6.636363636363637
attribute7111:  Normal Distribution. Mean = 63.7126 StandardDev = 31.0336 WeightSum = 11 Precision = 9.870967741935484
attribute7112:  Normal Distribution. Mean = -16.5111 StandardDev = 217.4379 WeightSum = 11 Precision = 25.945945945945947
attribute7113:  Normal Distribution. Mean = 267.1091 StandardDev = 128.0862 WeightSum = 11 Precision = 16.6
attribute7114:  Normal Distribution. Mean = 122.4791 StandardDev = 87.51 WeightSum = 11 Precision = 17.054054054054053
attribute7115:  Normal Distribution. Mean = 233.8717 StandardDev = 111.0206 WeightSum = 11 Precision = 11.588235294117647
attribute7116:  Normal Distribution. Mean = 307.9662 StandardDev = 139.9155 WeightSum = 11 Precision = 24.37142857142857
attribute7117:  Normal Distribution. Mean = -319.0614 StandardDev = 110.253 WeightSum = 11 Precision = 25.43243243243243
attribute7118:  Normal Distribution. Mean = -2319.9951 StandardDev = 878.3917 WeightSum = 11 Precision = 105.89189189189189
attribute7119:  Normal Distribution. Mean = 378.2703 StandardDev = 120.9712 WeightSum = 11 Precision = 94.56756756756756
attribute7120:  Normal Distribution. Mean = 182.4489 StandardDev = 82.9293 WeightSum = 11 Precision = 10.1875
attribute7121:  Normal Distribution. Mean = 797.0098 StandardDev = 352.9267 WeightSum = 11 Precision = 38.62162162162162
attribute7122:  Normal Distribution. Mean = 11.3143 StandardDev = 56.262 WeightSum = 11 Precision = 11.314285714285715
attribute7123:  Normal Distribution. Mean = 348.8624 StandardDev = 134.0911 WeightSum = 11 Precision = 67.32432432432432
attribute7124:  Normal Distribution. Mean = -17.8909 StandardDev = 48.2762 WeightSum = 11 Precision = 4.685714285714286
attribute7125:  Normal Distribution. Mean = 1109.484 StandardDev = 549.1813 WeightSum = 11 Precision = 57.2972972972973
attribute7126:  Normal Distribution. Mean = 326.3333 StandardDev = 147.522 WeightSum = 11 Precision = 29.666666666666668
attribute7127:  Normal Distribution. Mean = 8.5 StandardDev = 20.0873 WeightSum = 11 Precision = 5.5
attribute7128:  Normal Distribution. Mean = 1145.2208 StandardDev = 1057.6857 WeightSum = 11 Precision = 91.28571428571429
attribute7129:  Normal Distribution. Mean = -24.6494 StandardDev = 26.9834 WeightSum = 11 Precision = 3.7142857142857144


Time taken to build model: 0.42 seconds
Time taken to test model on training data: 1.28 seconds

=== Error on training data ===

Correctly Classified Instances          38              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
K&B Relative Info Score               3744.5181 %
K&B Information Score                   33.0001 bits      0.8684 bits/instance
Class complexity | order 0              33.0001 bits      0.8684 bits/instance
Class complexity | scheme                0      bits      0      bits/instance
Complexity improvement     (Sf)         33.0001 bits      0.8684 bits/instance
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0      %
Root relative squared error              0      %
Total Number of Instances               38


=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall  F-Measure   Class
1         0          1         1         1        ALL
1         0          1         1         1        AML


=== Confusion Matrix ===

  a  b   <-- classified as
 27  0 |  a = ALL
  0 11 |  b = AML


=== Error on test data ===

Correctly Classified Instances          30               88.2353 %
Incorrectly Classified Instances         4               11.7647 %
Kappa statistic                          0.7518
K&B Relative Info Score               2905.1505 %
K&B Information Score                   25.6028 bits      0.753  bits/instance
Class complexity | order 0              34.609  bits      1.0179 bits/instance
Class complexity | scheme             4296      bits    126.3529 bits/instance
Complexity improvement     (Sf)      -4261.391  bits   -125.335  bits/instance
Mean absolute error                      0.1176
Root mean squared error                  0.343
Relative absolute error                 25.3165 %
Root relative squared error             67.9628 %
Total Number of Instances               34


=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall  F-Measure   Class
0.95      0.214      0.864     0.95      0.905    ALL
0.786     0.05       0.917     0.786     0.846    AML


=== Confusion Matrix ===

  a  b   <-- classified as
 19  1 |  a = ALL
  3 11 |  b = AML

Running off predictions


java weka.classifiers.trees.J48 -t $WEKAHOME/data/ALL-AML_train.arff \
-T $WEKAHOME/data/ALL-AML_test.arff -i -k \
-d Leukemia-ALLAML.tree.J48.model -p 0  > Leukemia-ALLAML.tree.J48.out

more Leukemia-ALLAML.tree.J48.out

0 ALL 1.0 ALL
1 ALL 1.0 ALL
2 ALL 1.0 ALL
3 ALL 1.0 ALL
4 ALL 1.0 ALL
5 ALL 1.0 ALL
6 ALL 1.0 ALL
7 ALL 1.0 ALL
8 ALL 1.0 ALL
9 ALL 1.0 ALL
10 ALL 1.0 ALL
11 ALL 1.0 ALL
12 ALL 1.0 ALL
13 ALL 1.0 ALL
14 AML 1.0 ALL
15 ALL 1.0 ALL
16 AML 1.0 ALL
17 ALL 1.0 ALL
18 ALL 1.0 ALL
19 ALL 1.0 ALL
20 AML 1.0 AML
21 AML 1.0 AML
22 AML 1.0 AML
23 AML 1.0 AML
24 AML 1.0 AML
25 AML 1.0 AML
26 AML 1.0 AML
27 AML 1.0 AML
28 AML 1.0 AML
29 AML 1.0 AML
30 ALL 1.0 AML
31 AML 1.0 AML
32 AML 1.0 AML
33 AML 1.0 AML
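
The prediction listing above can be turned back into a confusion matrix, accuracy and Kappa with a few lines of script. This is a sketch under one assumption: I am reading each row as index, predicted class, confidence, actual class, which fits the test set (20 ALL followed by 14 AML in the last column). The function names are my own. The parser is shown on a handful of rows; the full counts are the ones from the J48 test confusion matrix earlier in the post.

```python
from collections import Counter

# A few rows in the format shown above: index, predicted, confidence, actual.
SAMPLE = """\
13 ALL 1.0 ALL
14 AML 1.0 ALL
15 ALL 1.0 ALL
16 AML 1.0 ALL
29 AML 1.0 AML
30 ALL 1.0 AML
"""


def confusion_counts(text):
    """Count (actual, predicted) pairs from '-p 0' style output."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        _index, predicted, _confidence, actual = fields
        counts[(actual, predicted)] += 1
    return counts


def accuracy(counts):
    """Fraction of instances whose predicted class equals the actual class."""
    total = sum(counts.values())
    correct = sum(n for (actual, pred), n in counts.items() if actual == pred)
    return correct / total if total else 0.0


def kappa(counts):
    """Cohen's kappa: agreement corrected for chance agreement."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    row = Counter()  # totals per actual class
    col = Counter()  # totals per predicted class
    for (actual, pred), n in counts.items():
        row[actual] += n
        col[pred] += n
    expected = sum(row[label] * col[label] for label in row) / total ** 2
    if expected == 1:
        return 0.0
    return (accuracy(counts) - expected) / (1 - expected)


print(confusion_counts(SAMPLE))

# Counts taken from the J48 "Error on test data" confusion matrix.
FULL_TEST_COUNTS = Counter({
    ("ALL", "ALL"): 18, ("ALL", "AML"): 2,
    ("AML", "ALL"): 1, ("AML", "AML"): 13,
})
print(round(accuracy(FULL_TEST_COUNTS) * 100, 4))  # 91.1765
print(round(kappa(FULL_TEST_COUNTS), 4))           # 0.8198
```

Both figures match what WEKA reported for the J48 test run.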

java -mx1024m weka.classifiers.bayes.NaiveBayes \
-t $WEKAHOME/data/ALL-AML_train.arff \
-T $WEKAHOME/data/ALL-AML_test.arff -i -k \
-d Leukemia-ALLAML.NaiveBayes.J48.model -p 0 > Leukemia-ALLAML.NaiveBayes.J48.pred

The results:

[weka@domU-12-31-36-00-26-23 tutorial]$ ls -l
total 3920
-rw-rw-r--  1 weka weka   60556 Aug  1 09:13 J48-data.model
-rw-rw-r--  1 weka weka   12906 Aug  1 09:13 J48-data.out
-rw-rw-r--  1 weka weka   18784 Aug  1 09:17 J48-segment-data.model
-rw-rw-r--  1 weka weka    6146 Aug  1 09:17 J48-segment-data.out
-rw-rw-r--  1 weka weka 1506090 Aug  1 09:41 Leukemia-ALLAML.NaiveBayes.J48.model
-rw-rw-r--  1 weka weka 1703981 Aug  1 09:36 Leukemia-ALLAML.NaiveBayes.J48.out
-rw-rw-r--  1 weka weka     535 Aug  1 09:41 Leukemia-ALLAML.NaiveBayes.J48.pred
-rw-rw-r--  1 weka weka  666093 Aug  1 09:40 Leukemia-ALLAML.tree.J48.model
-rw-rw-r--  1 weka weka     535 Aug  1 09:40 Leukemia-ALLAML.tree.J48.out

more Leukemia-ALLAML.NaiveBayes.J48.pred

0 ALL 1.0 ALL
1 ALL 1.0 ALL
2 AML 1.0 ALL
3 ALL 1.0 ALL
4 ALL 1.0 ALL
5 ALL 1.0 ALL
6 ALL 1.0 ALL
7 ALL 1.0 ALL
8 ALL 1.0 ALL
9 ALL 1.0 ALL
10 ALL 1.0 ALL
11 ALL 1.0 ALL
12 ALL 1.0 ALL
13 ALL 1.0 ALL
14 ALL 1.0 ALL
15 ALL 1.0 ALL
16 ALL 1.0 ALL
17 ALL 1.0 ALL
18 ALL 1.0 ALL
19 ALL 1.0 ALL
20 AML 1.0 AML
21 AML 1.0 AML
22 AML 1.0 AML
23 AML 1.0 AML
24 ALL 1.0 AML
25 AML 1.0 AML
26 AML 1.0 AML
27 AML 1.0 AML
28 AML 1.0 AML
29 AML 1.0 AML
30 ALL 1.0 AML
31 AML 1.0 AML
32 ALL 1.0 AML
33 AML 1.0 AML
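
With both prediction listings in hand, it is easy to see whether the two classifiers trip over the same test instances. A small sketch, again assuming the rows read as index, predicted, confidence, actual, with a helper name of my own. Only the rows where at least one classifier is wrong are copied in; every other row in the two listings is classified correctly by both.

```python
def misclassified(text):
    """Return the set of instance indices whose predicted and actual labels differ."""
    wrong = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        index, predicted, _confidence, actual = fields
        if predicted != actual:
            wrong.add(int(index))
    return wrong


# Rows 2, 14, 16, 24, 30 and 32 from the J48 listing.
J48_ROWS = """\
2 ALL 1.0 ALL
14 AML 1.0 ALL
16 AML 1.0 ALL
24 AML 1.0 AML
30 ALL 1.0 AML
32 AML 1.0 AML
"""

# The same rows from the NaiveBayes listing.
NB_ROWS = """\
2 AML 1.0 ALL
14 ALL 1.0 ALL
16 ALL 1.0 ALL
24 ALL 1.0 AML
30 ALL 1.0 AML
32 ALL 1.0 AML
"""

j48_wrong = misclassified(J48_ROWS)
nb_wrong = misclassified(NB_ROWS)
print(sorted(j48_wrong))             # [14, 16, 30]
print(sorted(nb_wrong))              # [2, 24, 30, 32]
print(sorted(j48_wrong & nb_wrong))  # [30]
```

So J48 makes 3 errors and NaiveBayes 4, in line with the test statistics above, and only instance 30 is missed by both.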

Tuesday, August 7, 2007

Weka Data mining on EC2 - testing
