Apriori Online
The Apriori algorithm was initially developed by Agrawal et al. (Agrawal 93, Agrawal 94) and is used to find association rules in a dataset. This online version of the Apriori algorithm is based on Christian Borgelt's implementation.
A comma-separated values (CSV) file is required as input. The file must be in tabular format and can be created with either Excel or OpenOffice. The first row of the file holds the column headings, and each subsequent row represents one observation. Example input:
college,state,pub_priv,math_sat
Alaska Pacific University,AK,2,490
University of Alaska at Fairbanks,AK,1,499
University of Alaska Southeast,AK,1,
University of Alaska at Anchorage,AK,1,459
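Apriori works on transactions (sets of items), not on tables. The following minimal Python sketch shows one way to turn tabular CSV input like the example above into transactions. It assumes that each non-empty cell becomes an item of the form column=value and that empty cells are simply skipped. The helper names `load_rows` and `to_transactions` are hypothetical, and the online tool may encode rows differently.

```python
import csv
import io

# Mirrors the example input; the empty math_sat field is a missing value.
SAMPLE = """college,state,pub_priv,math_sat
Alaska Pacific University,AK,2,490
University of Alaska at Fairbanks,AK,1,499
University of Alaska Southeast,AK,1,
University of Alaska at Anchorage,AK,1,459
"""

def load_rows(text):
    """Parse CSV text: the first row supplies the column headings and
    every later row becomes one observation (a dict keyed by heading)."""
    return list(csv.DictReader(io.StringIO(text)))

def to_transactions(rows):
    """Hypothetical encoding: one "column=value" item per non-empty cell."""
    return [
        {f"{col}={val}" for col, val in row.items() if val != ""}
        for row in rows
    ]

rows = load_rows(SAMPLE)
transactions = to_transactions(rows)
for t in transactions:
    print(sorted(t))
```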
Before Apriori is run on the dataset, the dataset is preprocessed in two steps. First, each missing value is replaced with the average of its column; currently this works only for numeric attributes. Second, numeric columns may be categorized: if a column contains more unique values than the Categorization Threshold, its values are grouped into the number of categories given by Category Granularity. More features are coming.
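The two preprocessing steps can be sketched in Python as follows. The page does not say how categories are formed, so this sketch assumes equal-width bins between the column's minimum and maximum. The names `impute_mean`, `categorize`, and the `binN` labels are hypothetical. The `threshold` and `granularity` parameters stand in for the Categorization Threshold and Category Granularity fields.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the present ones.
    Only meaningful for numeric columns."""
    present = [v for v in values if v is not None]
    avg = mean(present)
    return [avg if v is None else v for v in values]

def categorize(values, threshold, granularity):
    """If the column has more than `threshold` unique values, map each value
    to one of `granularity` equal-width bins (an assumed binning scheme);
    otherwise return the values unchanged."""
    if len(set(values)) <= threshold:
        return list(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / granularity
    labels = []
    for v in values:
        if width == 0:
            idx = 0
        else:
            # Clamp so the column maximum lands in the last bin, not past it.
            idx = min(int((v - lo) / width), granularity - 1)
        labels.append(f"bin{idx}")
    return labels

math_sat = [490, 499, None, 459]  # None marks the missing value
filled = impute_mean(math_sat)
print(categorize(filled, threshold=3, granularity=2))  # ['bin1', 'bin1', 'bin1', 'bin0']
```

Equal-width binning keeps the sketch short. Equal-frequency (quantile) binning is a common alternative when values are skewed.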
The following are required fields:
Category Granularity (how many categories to make in a given column)
Categorization Threshold (if the number of unique values in a column is greater than this number, categorization is performed)
Dataset (*.CSV) file input
The following are optional:
Target type (default: association rules)
s: item sets
c: closed item sets
m: maximal item sets
r: association rules
h: association hyperedges
Minimal number of items per set/rule/hyperedge (default: 1)
Maximal number of items per set/rule/hyperedge (default: 5)
Minimal support of a set/rule/hyperedge (default: 10%)
Maximal support of a set/rule/hyperedge (default: 100%)
Minimal confidence of a rule/hyperedge (default: 80%)
Item separator for output (default: " ")
Output format for support/confidence (default: "%.1f")
Check Yes to enable any of the following:
Extended support output (print both rule support types)
Print absolute support (number of transactions)
Print lift value (confidence divided by prior)
Print value of additional rule evaluation measure
Write output in scannable form (quote certain characters)
Sort items w.r.t. their frequency (default: 2)
1: ascending
-1: descending
0: do not sort
2: ascending w.r.t. transaction size sum
-2: descending w.r.t. transaction size sum
Check Yes to enable either of the following:
Do not organize transactions as a prefix tree
Use quicksort to sort the transactions (default: heapsort)
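For reference, the core of the algorithm can be sketched in Python. It runs a level-wise search for item sets that meet the minimal support, then generates rules and keeps those that meet the minimal confidence. This is a simplified illustration, not Borgelt's implementation. Supports are fractions of all transactions, and only rules with a single-item consequent and a non-empty antecedent are built. Lift is computed as confidence divided by the prior (the consequent's support), as in the lift option above. The function names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_supp=0.10, max_items=5):
    """Level-wise (Apriori) search. Returns {frozenset(items): support} for
    every item set with support >= min_supp and at most max_items items."""
    txs = [frozenset(t) for t in transactions]
    n = len(txs)

    def support(itemset):
        return sum(1 for t in txs if itemset <= t) / n

    # Level 1: frequent single items.
    level = {}
    for i in {i for t in txs for i in t}:
        s = frozenset([i])
        v = support(s)
        if v >= min_supp:
            level[s] = v

    frequent = {}
    k = 1
    while level and k <= max_items:
        frequent.update(level)
        # Join: merge pairs of frequent k-sets into (k+1)-item candidates.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Prune: every k-item subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k))}
        level = {}
        for c in candidates:
            v = support(c)
            if v >= min_supp:
                level[c] = v
        k += 1
    return frequent

def association_rules(frequent, min_conf=0.80):
    """Rules body -> head with a single-item head and a non-empty body.
    confidence = supp(body | {head}) / supp(body); lift = confidence / supp({head})."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for head in itemset:
            body = itemset - {head}
            conf = supp / frequent[body]  # body is frequent by downward closure
            if conf >= min_conf:
                lift = conf / frequent[frozenset([head])]
                rules.append((body, head, supp, conf, lift))
    return rules

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
freq = frequent_itemsets(transactions, min_supp=0.10, max_items=5)
for body, head, supp, conf, lift in association_rules(freq, min_conf=0.80):
    print(sorted(body), "->", head, f"supp={supp:.1%} conf={conf:.1%} lift={lift:.2f}")
```

The minimal number of items and the maximal support only filter what is reported. For example, `{s: v for s, v in freq.items() if len(s) >= 2 and v <= 1.0}` keeps item sets of at least two items.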
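The closed and maximal target types are both subsets of the frequent item sets. A frequent set is closed if no proper superset has the same support, and maximal if no proper superset is frequent at all. The following small Python sketch shows those two filters. For a toy example it uses a brute-force support table instead of Apriori, and the helper names are hypothetical.

```python
from itertools import combinations

def all_frequent(transactions, min_supp):
    """Brute-force support table for a tiny example (not Apriori itself)."""
    txs = [frozenset(t) for t in transactions]
    items = sorted({i for t in txs for i in t})
    table = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            supp = sum(1 for t in txs if s <= t) / len(txs)
            if supp >= min_supp:
                table[s] = supp
    return table

def closed_sets(frequent):
    """Frequent sets with no proper superset of equal support.
    (Equal counts give identical floats here, so == is safe.)"""
    return {s for s, v in frequent.items()
            if not any(s < t and frequent[t] == v for t in frequent)}

def maximal_sets(frequent):
    """Frequent sets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
freq = all_frequent(transactions, min_supp=0.10)
print(sorted(sorted(s) for s in closed_sets(freq)))   # [['a'], ['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b']]
print(sorted(sorted(s) for s in maximal_sets(freq)))  # [['a', 'b', 'c']]
```

On this toy data, {c} is not closed because {a, c} has the same support, and only {a, b, c} is maximal.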