D-MAST Platform

Distributed Multi-Agent Based Platform for High Performance Computing Infrastructures

The Distributed Multi-Agent System Simulation Toolbox (D-MAST) is a dynamic platform for high performance computing infrastructures (grid infrastructures). It supports an unlimited number of agents and overcomes the difficulties of executing experiments on grid infrastructures with an easier, quicker and technically robust approach, focusing on large-scale machine learning experiments. It divides a large experiment into many sub-experiments, allocated to independent Worker Nodes (WN) for better parallelization, and can be deployed over existing grid infrastructures without compromising the security of existing systems. An innovative web-based monitoring system and the flexible, modular, layered architecture of the platform allow easy integration of external tools. In short, D-MAST is a multi-agent based high performance computing platform for machine learning experiments.

The D-MAST platform was developed in Java, and its architecture is based on three modular layers (see the next figure), each of which consists of several objects and sub-objects.

Large-Scale Multi-Agent Experiment Distribution

This section describes how a large-scale multi-agent system experiment is segmented into sub-experiments that run in parallel, in order to optimize resource usage. Owing to the nature of our experiments, segmentation is implemented by creating matches as autonomous grid jobs. A social event experiment consists of many matches between the participants. Each match is an autonomous job with its own input and output data, managed by the multi-agent based simulation platform, which is located in the administrative domain (the user interface) of the grid infrastructure. Each job runs on a random free worker node of a random free cluster of the grid infrastructure (in principle, a user has no control over which worker node is selected for a given job).
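As a rough illustration, one match-job can be modelled as a small value object that renders its own grid job description. This is a minimal sketch under stated assumptions: the source does not name the middleware or the job description format, so the gLite-style JDL below (suggested by the WMS, CE and storage element terminology), the /usr/bin/java path, the argument order and the results.txt output file are all hypothetical, and for brevity the jar travels in the input sandbox rather than through the storage element the platform actually uses. Only the game jar name comes from the demo data.

```java
// Hypothetical model of one match-job; names and JDL contents are illustrative only.
public class MatchJobDemo {

    /** One match between two agents, submitted as an autonomous grid job. */
    record MatchJob(String gameJar, String agentA, String agentB, int iterations, int step) {

        /**
         * Renders a gLite-style JDL description (ASSUMPTION: the source does not name
         * the job description format). The jar is shipped in the input sandbox and the
         * hypothetical results.txt comes back in the output sandbox.
         */
        String toJdl() {
            return String.join("\n",
                    "[",
                    "  Executable = \"/usr/bin/java\";",
                    "  Arguments = \"-jar " + gameJar + " " + agentA + " " + agentB + " " + iterations + "\";",
                    "  StdOutput = \"std.out\";",
                    "  StdError = \"std.err\";",
                    "  InputSandbox = {\"" + gameJar + "\"};",
                    "  OutputSandbox = {\"std.out\", \"std.err\", \"results.txt\"};",
                    "]");
        }
    }

    public static void main(String[] args) {
        // Step 1 of the round-robin sequence: agent1 plays agent2 for 1000 iterations.
        MatchJob job = new MatchJob("RockPaperScissors.jar", "agent1", "agent2", 1000, 1);
        System.out.println(job.toJdl());
    }
}
```

The step number is carried as metadata only; it does not affect the job description but is what a monitoring table would display per job.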

After the experiment set-up is transferred to the grid administrative domain, segmentation proceeds by applying the round-robin algorithm to all agents, so that every agent is paired with every other agent. The large folder icon over the administrative domain (labelled Large Scale Experiment) in the figure above represents the multi-agent system experiment and its input and output data. After segmentation, the platform transfers all necessary data of each match-job to the storage element in parallel (floppy disk icons encircled by a dashed line at the far left of the figure above). During the data transfer, the system submits the autonomous jobs and finds the available clusters of the high performance computing infrastructure through the Workload Management System (WMS). Thereafter, the Computing Element (CE) of each cluster searches for free worker nodes while transferring the necessary experiment data from the grid storage element to its own storage element. When an available worker node is found, the job starts once all required data have been fed from the storage element to that worker node. When the job starts, a unique Job ID is created and communicated to the user interface and to the web-based graphical user interface, which monitors the status of the job. After the job ends, a reverse data flow brings the result of the experiment and all newly generated data back to the user interface (floppy disk icons encircled by a solid line), where they are available to subsequent stages. This process is the same for all matches of a social event.
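The round-robin segmentation can be sketched with the classic circle method, which pairs every agent with every other agent exactly once and groups the pairings into steps (the step number is what layer B2 later shows for each job). This is a sketch of a standard round-robin scheduler, not D-MAST's own code; the Match record and its field names are hypothetical, and an odd number of agents is handled with an empty "bye" slot.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobin {

    /** One pairing of two agents in a given round-robin step (hypothetical names). */
    record Match(int step, String home, String away) { }

    /**
     * Circle method: with n slots there are n - 1 steps and every pair of agents meets
     * exactly once. An odd number of agents gets an extra null "bye" slot, and matches
     * against the bye are skipped.
     */
    static List<Match> schedule(List<String> agents) {
        List<String> ring = new ArrayList<>(agents);
        if (ring.size() % 2 == 1) {
            ring.add(null); // bye slot
        }
        int n = ring.size();
        List<Match> matches = new ArrayList<>();
        for (int step = 1; step < n; step++) {
            for (int i = 0; i < n / 2; i++) {
                String a = ring.get(i);
                String b = ring.get(n - 1 - i);
                if (a != null && b != null) {
                    matches.add(new Match(step, a, b));
                }
            }
            // Keep the first slot fixed and rotate the others one position.
            ring.add(1, ring.remove(n - 1));
        }
        return matches;
    }

    public static void main(String[] args) {
        for (Match m : schedule(List.of("A", "B", "C", "D"))) {
            System.out.println(m);
        }
    }
}
```

For four agents A, B, C and D this yields three steps of two matches each (A-D and B-C, then A-C and D-B, then A-B and C-D); in the scheme described above, each of these matches would become one autonomous grid job.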

The key advantage of the D-MAST platform is that it does not require any specialized knowledge of grid usage.

The D-MAST Platform Management Toolkit

Access to the grid infrastructure is established through an internet access terminal; the D-MAST platform services and the grid services can be managed through a web-based Java application. This application implements SSH-based communication, through which all prerequisite grid commands can be submitted and executed. It was built on top of the existing infrastructure, aiming at safer and faster access to the grid and to the newly developed multi-agent based simulation services.

The figure above presents the graphical user interface toolkit for the top layer of the D-MAST platform, used to access the grid infrastructure and manage the multi-agent based simulation application. It is divided into two column panels, A and B, each consisting of several layers.

Panel A represents the SSH secure communication sub-systems, which enable command-based communication between the two computers, the user terminal and the grid user interface. Initially, the A1 layer is used to connect to the remote computer (in our case, the grid user interface). The remote domain information should be entered in the form username@domainname.com. After a connection request, the A3 layer (information window) informs the user about any responses from the remote domain. All available commands of the user interface, the SSH protocol and the multi-agent based simulation platform can be entered as text in the input field of the A2 layer, and the results can then be read from the A3 layer. In short, the A2 layer is the input of the SSH system and the A3 layer is its output.
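The A1 input and the A2 command flow can be sketched as follows: parse the username@domainname.com string and build the argument list for running one command on the grid user interface. This is only an illustration under assumptions: D-MAST implements SSH inside its own Java application, whereas this sketch assumes the system OpenSSH client (ssh) is invoked through ProcessBuilder; the Target record and its methods are hypothetical.

```java
import java.util.List;

public class SshTarget {

    /** A remote login as typed into the A1 layer, e.g. "username@domainname.com". */
    record Target(String user, String host) {

        /** Parses "user@host", rejecting empty parts or more than one '@'. */
        static Target parse(String input) {
            String s = input.trim();
            int at = s.indexOf('@');
            if (at <= 0 || at == s.length() - 1 || s.indexOf('@', at + 1) != -1) {
                throw new IllegalArgumentException("expected username@domainname, got: " + input);
            }
            return new Target(s.substring(0, at), s.substring(at + 1));
        }

        /** Argument list for running one A2-layer command through the system OpenSSH client. */
        List<String> sshCommand(String remoteCommand) {
            return List.of("ssh", user + "@" + host, remoteCommand);
        }
    }

    public static void main(String[] args) {
        Target t = Target.parse("username@domainname.com");
        System.out.println(t.sshCommand("ls -l"));
        // To execute it (requires ssh and a reachable host):
        // new ProcessBuilder(t.sshCommand("ls -l")).inheritIO().start().waitFor();
    }
}
```

Passing the command as a separate list element, rather than concatenating one shell string locally, keeps the local side free of quoting issues; the remote shell on the user interface still interprets the command text, just as when it is typed into the A2 field.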

Panel B contains three layers, which correspond to experiment execution, experiment distribution monitoring (and management) and experiment evolution monitoring. After a user has successfully connected to the system, as described in the previous paragraph, the “Init” button of the B1 layer should be pressed before starting any grid activity; this initializes the grid customizations by issuing specific grid infrastructure commands (for example, declarations of virtual organizations or of storage element usage, which can also be submitted manually from panel A). Pressing the small button of layer B1 automatically loads all of the user's experiments located in the user interface into the drop-down list on the left. Selected experiments can be managed from the remaining components of the application. To start a new experiment, select one of the experiments in the drop-down list, enter the number of iterations per match in the input field, and press the “Start” button. Each time the “Update” button is pressed, the application loads the status of the experiment selected in the drop-down list into layers B2 and B3. Layer B2 displays, in tabular form, information about each job-match of the experiment: the agents' names, the number of iterations per match, the step number in the round-robin sequence, the job location (worker node), the status, the unique job ID, and the start, stop and delay times. Selecting a job row and pressing the “Get” button opens a new browser tab with more detailed information about the cluster, worker node and job status. Users can stop a job-match by selecting the corresponding row in layer B2 and pressing the “Kill” button; the D-MAST platform then resubmits the same job to different resources after a few seconds. Terminating a job is sometimes necessary because of long waits or other problems that arise in the grid infrastructure.
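The bookkeeping behind the B2 table and the “Kill” button can be sketched as a small in-memory job table. This is a hypothetical model, not the platform's code: the status values are a subset of the gLite job states (an assumption about what the status column shows), a random UUID stands in for the grid-assigned job ID, and the few-second delay and the choice of different resources, which the text attributes to the platform and the grid, are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class JobTable {

    /** Subset of gLite job states; what D-MAST's status column shows is an assumption. */
    enum Status { SUBMITTED, SCHEDULED, RUNNING, DONE, ABORTED, CANCELLED }

    /** One row of the B2 table (worker node and dates omitted for brevity). */
    record JobRow(String jobId, String agentA, String agentB, int iterations, int step, Status status) {
        JobRow withStatus(Status newStatus) {
            return new JobRow(jobId, agentA, agentB, iterations, step, newStatus);
        }
    }

    private final List<JobRow> table = new ArrayList<>();

    /** Adds a new job row; a random UUID stands in for the grid-assigned job ID. */
    JobRow submit(String agentA, String agentB, int iterations, int step) {
        JobRow row = new JobRow(UUID.randomUUID().toString(), agentA, agentB, iterations, step, Status.SUBMITTED);
        table.add(row);
        return row;
    }

    /** "Kill": mark the row as cancelled and resubmit the same match as a new job. */
    JobRow killAndResubmit(String jobId) {
        for (int i = 0; i < table.size(); i++) {
            JobRow row = table.get(i);
            if (row.jobId().equals(jobId)) {
                table.set(i, row.withStatus(Status.CANCELLED));
                return submit(row.agentA(), row.agentB(), row.iterations(), row.step());
            }
        }
        throw new IllegalArgumentException("unknown job id: " + jobId);
    }

    /** Snapshot of the table, as shown after pressing "Update". */
    List<JobRow> rows() {
        return List.copyOf(table);
    }

    public static void main(String[] args) {
        JobTable jobs = new JobTable();
        JobRow first = jobs.submit("agent1", "agent2", 1000, 1);
        jobs.killAndResubmit(first.jobId());
        jobs.rows().forEach(System.out::println);
    }
}
```

Keeping the cancelled row instead of overwriting it preserves the history of each match, so the table can show both the killed job and its replacement, each with its own job ID.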

General Instructions

  • First, accept the Java security notifications in order to run the application.
  • Using an FTP client, transfer the Tournament Demo data file to your Grid UI root storage and unzip it there.
  • ** Do not change the Tournament folder name. Each Tournament folder name must begin with the word "Tour". **
  • Move only the MABSServer.jar and the userinfo.xml files to the root storage.
  • Update the userinfo.xml file with your details; most of them are provided by your Grid UI administrator.
  • The Tournament demo folder contains two games: RLGame (RLGameBL.jar) and Rock, Paper, Scissors (RockPaperScissors.jar).
  • The source code of the Rock, Paper, Scissors game is provided as an example for users who want to adapt their own ML systems, so that they can run and manage them with this platform on Grid HPC infrastructures.
  • Connect with your Grid account from the SSH Client For RLGTournament Platform and follow the numbered steps shown at the top of the platform.
  • Type the name of the executable .jar file in the first input field.
  • Initialize the required Grid attributes with the Init button.
  • Press the green button to load all available Tournaments, choose one, type the number of repetitions of the games and start the experiment.
  • When the experiment ends, collect your tournament data from the root storage of your Grid UI.
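Loading the available tournaments with the green button presumably relies on the folder naming rule in the list above. A minimal sketch, assuming that tournaments are simply the sub-folders of the Grid UI root storage whose names start with "Tour" (the class and the method name findTournaments are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TournamentFinder {

    /** Lists sub-folders of root whose names start with "Tour", sorted by name. */
    static List<String> findTournaments(Path root) throws IOException {
        // Files.list is not recursive and must be closed, hence try-with-resources.
        try (Stream<Path> entries = Files.list(root)) {
            return entries
                    .filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.startsWith("Tour"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the home directory when no root path is given.
        Path root = Path.of(args.length > 0 ? args[0] : System.getProperty("user.home"));
        findTournaments(root).forEach(System.out::println);
    }
}
```

Under this assumption, a tournament folder renamed so that it no longer starts with "Tour" would simply not be listed, and plain files with that prefix are ignored.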

***It is recommended to use the platform in a browser with your Grid certificate installed.***

Publications

  • Kiourt, C. and Kalles, D.: A platform for large-scale game-playing multi-agent systems on a high performance computing infrastructure. Multiagent and Grid Systems, 12(1), pp. 35-54, 2016. DOI: http://dx.doi.org/10.1016/j.culher.2016.06.007
  • Kiourt, C. and Kalles, D.: Building a Social Multi-Agent System Simulation Management Toolbox. 6th Balkan Conference in Informatics (BCI 2013), Thessaloniki, Greece, Sep. 19-21, 2013, pp. 66-70. DOI: https://doi.org/10.1145/2490257.2490293
  • Kiourt, C. and Kalles, D.: Development of Grid-Based Multi Agent Systems for Social Learning. IEEE International Conference on Information, Intelligence, Systems and Applications (IISA 2015), Corfu, Greece, Jul. 04-07, 2015. DOI: https://doi.org/10.1109/IISA.2015.7387973