MOA Social-based Ensembles and Clusterers


  • Abstract
This page provides the MOA implementations (source code included) of Social Adaptive Ensemble 2 (SAE2), Scale-free Network Classifier (SFNClassifier), and the Social Network Clusterer Stream (SNCStream).
These algorithms run on the MOA framework, which is available from the Massive Online Analysis (MOA) Framework.
  • Documentation
To use these algorithms, download both moa.jar and sizeofag.jar from the Massive Online Analysis (MOA) Framework.
Then download SocialEnsembles.jar (for SAE2 or SFNClassifier) or SNCStream.jar (for SNCStream), both available at the bottom of this page, and add the file to the classpath when launching MOA.
Example (Windows):
java -classpath .;SocialEnsembles.jar;moa.jar;weka.jar -javaagent:sizeofag.jar moa.gui.GUI
Example (Linux/Mac):
java -cp SocialEnsembles.jar:moa.jar:weka.jar -javaagent:sizeofag.jar moa.gui.GUI
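
The two launch commands above differ mainly in the classpath separator (";" on Windows, ":" on Linux/Mac). As a minimal sketch, the command can be assembled for the current platform with the standard File.pathSeparator constant. The class MoaLaunchCommand and its build method are hypothetical helpers written for this illustration; they are not part of MOA.

```java
import java.io.File;
import java.util.List;

// Hypothetical helper (not part of MOA): assembles the launch command shown above.
final class MoaLaunchCommand {

    // Joins the JAR names with the given separator and builds the full command line.
    static String build(List<String> jars, String separator) {
        String classpath = String.join(separator, jars);
        return "java -cp " + classpath + " -javaagent:sizeofag.jar moa.gui.GUI";
    }

    public static void main(String[] args) {
        List<String> jars = List.of("SocialEnsembles.jar", "moa.jar", "weka.jar");
        // File.pathSeparator is ";" on Windows and ":" on Linux/Mac.
        System.out.println(build(jars, File.pathSeparator));
    }
}
```

To launch MOA with SNCStream instead, pass SNCStream.jar in place of SocialEnsembles.jar.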

Alternatively, you can unzip any of the JAR files, add the source code of the desired algorithms to the moa.classifiers or moa.clusterers package, and recompile MOA.
  • Social-based Algorithms
    • SAE2

Social Adaptive Ensemble 2 (SAE2) is a dynamic ensemble classifier for data stream classification that builds on the Social Adaptive Ensemble (SAE). Like its predecessor, SAE2 maintains an ensemble of classifiers arranged as a network in which a connection is created between two classifiers if they make similar predictions. Compared with SAE, SAE2 includes a more scalable adaptation method, achieved by updating classifiers' connection weights before they are added to the ensemble; an alternative combination method based on maximal cliques; a voting strategy based on weighted majority, which reduces prediction ties; and some minor enhancements, such as a threshold that limits the maximum ensemble size. Its parameters are:

    • -l BaseLearner (default: trees.HoeffdingTree)
      Base learner for new experts
    • -c PeriodLengthOption (default: 10000)
      Number of instances before a network update takes place
    • -o MaxExperts (default: 10)
      Maximum number of experts at once
    • -e TxminE (default: 0.7)
      EXPERT minimum correctly classified rate
    • -n CsMin (default: 0.9)
      Minimum Similarity Coefficient between EXPERTS (Activate Connection)
    • -x CsMax (default: 0.99)
      Maximum Similarity Coefficient between EXPERTS (Redundant Expert) - 1.01 = allow redundants
    • -v combinationMethod (default: hmg.sae.combination.MaximalCliques)
      Which algorithm should be used to group classifiers.
    • -a votingMethod (default: hmg.sae.vote.WeightedMajorityVoteCurrentPeriod)
      Which algorithm should be used for prediction and tie break.
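
The role of the CsMin and CsMax thresholds can be illustrated with a small sketch. It assumes that the similarity coefficient is the fraction of instances in a period on which two experts predict the same class, and that both thresholds are compared with ">="; both points are assumptions made for illustration, and the exact definitions are given in the SAE2 paper. The class and method names are hypothetical and do not come from SocialEnsembles.jar.

```java
// Hypothetical sketch (not the SAE2 source): classifies the relation between two experts
// from their predictions over one period, using thresholds like CsMin and CsMax.
final class SimilaritySketch {

    enum Relation { NOT_CONNECTED, CONNECTED, REDUNDANT }

    // Assumed similarity coefficient: fraction of instances with identical predictions.
    static double similarity(int[] predictionsA, int[] predictionsB) {
        if (predictionsA.length != predictionsB.length || predictionsA.length == 0) {
            throw new IllegalArgumentException("prediction arrays must be non-empty and of equal length");
        }
        int agreements = 0;
        for (int i = 0; i < predictionsA.length; i++) {
            if (predictionsA[i] == predictionsB[i]) {
                agreements++;
            }
        }
        return (double) agreements / predictionsA.length;
    }

    // Assumed rule: at or above csMax the expert is redundant; at or above csMin a connection is active.
    static Relation relate(double cs, double csMin, double csMax) {
        if (cs >= csMax) {
            return Relation.REDUNDANT;
        }
        if (cs >= csMin) {
            return Relation.CONNECTED;
        }
        return Relation.NOT_CONNECTED;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 1, 0, 1, 1, 0, 1, 1, 0};
        int[] b = {0, 1, 1, 0, 1, 1, 0, 1, 1, 1}; // differs from a on the last instance only
        double cs = similarity(a, b);
        System.out.println(cs + " -> " + relate(cs, 0.9, 0.99));
    }
}
```

Under these assumptions, setting CsMax to 1.01 means no pair can ever reach the redundancy threshold, which matches the "allow redundants" note on the -x option.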

Other parameters are related to debugging and visualization:

    • -w DoNotWriteNetwork
      Activate/Deactivate Pajek network output to file.
    • -q DoNotWriteMeasurements
      Activate/Deactivate network measurements output to file.
    • -p pajekFile (default: sae-net)
      Pajek network project file name.
    • -z measurementsFile (default: sae-measurements)
      Network measurements file name.
    • -r randomSeed (default: 1)
      Seed for random behaviour of the classifier.
    • SFNClassifier
Scale-free Network Classifier (SFNClassifier) is an ensemble-based classifier for data streams, conceived as a dynamic scale-free network. Representing the ensemble as a network allows us to extract centrality metrics, which are used to perform a weighted majority vote. In our empirical studies, SFNClassifier achieved accuracy comparable to other ensemble learners and outperformed them in processing time.
SFNClassifier has only six parameters:
      • -e: an expected hit rate that determines the expected output threshold of the network. (default: 0.95)
      • -l: a base learner to train. (default: HoeffdingTree)
      • -m: the centrality metric used for polling votes for instances. (default: Eigenvector)
      • -u: the update period, which determines how many instances are evaluated before a network update takes place. (default: 1000)
      • -k: the maximum number of nodes (classifiers) in the network. (default: 3)
      • -r: Seed for random behaviour of the network (used for preferential attachment process). (default: 1)
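
The centrality-weighted majority vote described above can be sketched as follows. This is a simplified illustration, not the SFNClassifier source: it assumes each classifier's centrality value is already computed and used directly as its vote weight, and that ties go to the lowest class index. The class and method names are hypothetical.

```java
// Hypothetical sketch (not the SFNClassifier source): weighted majority vote in which
// each classifier's vote counts in proportion to its centrality in the network.
final class CentralityVoteSketch {

    // predictions[i] is the class index predicted by classifier i;
    // centrality[i] is that classifier's (precomputed) centrality score.
    static int vote(int[] predictions, double[] centrality, int numClasses) {
        if (predictions.length != centrality.length) {
            throw new IllegalArgumentException("one centrality value is required per prediction");
        }
        double[] score = new double[numClasses];
        for (int i = 0; i < predictions.length; i++) {
            score[predictions[i]] += centrality[i];
        }
        // Pick the class with the highest accumulated weight (ties go to the lowest index).
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (score[c] > score[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] predictions = {0, 1, 1};
        double[] centrality = {0.7, 0.2, 0.2};
        // The single highly central classifier outweighs the two peripheral ones.
        System.out.println(vote(predictions, centrality, 2));
    }
}
```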
  • Social-based Clustering Algorithms
    • SNCStream
The Social Network Clusterer Stream (SNCStream) is a one-step, social network-based data stream clustering algorithm capable of finding non-hyper-spherical clusters. SNCStream uses a scale-free-like homophily procedure to track the evolution of clusters over the course of a data stream. It achieves high clustering quality (in terms of the Clustering Mapping Measure, CMM) while maintaining suitable CPU Time and RAM Hours. Additionally, SNCStream does not require the user to specify the number of ground-truth clusters to be found.

SNCStream has the following parameters (those marked OPTIONAL do not need to be set):

      • -m: mu (default: 1)
      • -i: initial amount of instances. (default: 100)
      • -l: lambda: a decaying factor. (default: 0.25)
      • -e: epsilon: the epsilon neighborhood. (default: 0.02)
      • -b: beta: the constant for outlier detection. (default: 0.2)
      • -d: Distance metric (Euclidian, Mahalanobis, Cosine, Fractional, Manhattan) (OPTIONAL). (default: Euclidian)
      • -k: The number of edges to be built on each node addition. (default: 4)
      • -u: Pruning (OPTIONAL). (default: false)
      • -]: Network maximum size (OPTIONAL - Must be set if -u is set to true). (default: 1000)
      • -": Pruning strategy: Weakest, Lowest Degree, Highest Degree, Minimal Edge (OPTIONAL). (default: Weakest)
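
To make the epsilon and distance-metric parameters more concrete, the sketch below shows a generic epsilon-neighborhood test under two of the listed metrics. It only illustrates the standard notion of "within distance epsilon"; how SNCStream actually applies epsilon, and whether it normalizes attributes first, is described in the SNCStream paper and is not reproduced here. All names are hypothetical.

```java
// Hypothetical sketch (not the SNCStream source): generic epsilon-neighborhood test
// using two of the distance metrics listed for the -d option.
final class EpsilonNeighborhoodSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // A point lies in another point's epsilon neighborhood if their distance is at most epsilon.
    static boolean withinEpsilon(double distance, double epsilon) {
        return distance <= epsilon;
    }

    public static void main(String[] args) {
        double[] p = {0.10, 0.10};
        double[] q = {0.11, 0.11};
        double epsilon = 0.02; // default value of -e
        System.out.println(withinEpsilon(euclidean(p, q), epsilon));
    }
}
```

With a default as small as 0.02, the choice of metric and the scale of the attributes strongly affect which instances count as neighbors.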
  • References
    • SAE
Heitor Murilo Gomes and Fabrício Enembreck
SAE: Social Adaptive Ensemble Classifier for Data Streams.
In IEEE 2013 Symposium on Computational Intelligence and Data Mining (CIDM), 2013, Singapore.
    • SAE2
Heitor Murilo Gomes and Fabrício Enembreck
SAE2: Advances On The Social Adaptive Ensemble Classifier for Data Streams.
In ACM 29th Symposium On Applied Computing (ACM SAC), 2014, Gyeongju, South Korea.
    • SFNClassifier
Jean Paul Barddal, Heitor Murilo Gomes and Fabrício Enembreck
SFNClassifier: A Scale-free Social Network Method to Handle Concept Drift.
In ACM 29th Symposium On Applied Computing (ACM SAC), 2014, Gyeongju, South Korea.

    • SNCStream
Jean Paul Barddal, Heitor Murilo Gomes and Fabrício Enembreck
SNCStream: A Social Network-based Data Stream Clustering Algorithm.
In ACM 30th Symposium On Applied Computing (ACM SAC), 2015, Salamanca, Spain.
  • Contact
    • Fabrício Enembreck
fabricio@ppgia.pucpr.br
    • Heitor Murilo Gomes
hmgomes@ppgia.pucpr.br
    • Jean Paul Barddal
jean.barddal@ppgia.pucpr.br

SNCStream.jar (86k), uploaded by Jean Paul Barddal, May 12, 2015, 10:04
SocialEnsembles.jar (88k), uploaded by Heitor Gomes, June 25, 2014, 03:02