Design Tasks
Create a dynamic AWS cloud platform that simulates an enterprise network, then collect data from the network.
Feed data into our TensorFlow model that can predict network outages.
Utilize Python to dispatch Ansible configurations to an AWS node should TensorFlow sense an outage is about to occur.
Design Goals
Agent shall seamlessly integrate into existing nodes.
Agent shall not consume too many resources.
AI shall be trained with relative ease and work on a variety of enterprise networks.
Specific Design Goals
Develop a dashboard that shall allow users to configure the features of the agent and the AI.
Create an agent that shall have a heightened detection of down nodes.
Create a quick and efficient agent that shall traverse to a specific node.
Design Specifications
Cloud Environment
Amazon Web Services - Allows for configuration of VMs and physical servers in the network mesh.
Physical Resources
2-5 Raspberry Pi 3B+ boards running Raspberry Pi OS - Simulate physical servers that connect to the AWS interface.
Software
TensorFlow - AI tool that will take our data on the network and predict if a network outage may occur.
Python - Program will compare the outage risk predicted by TensorFlow against a threshold and, when the risk is higher than usual, send network instructions to Ansible.
Ansible - Automation tool that enables network configuration.
GNS3 - Simulate network behavior.
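The Python threshold check described above could look roughly like the following. This is a minimal sketch, not the project's implementation: the 0.8 cutoff, the function names, and the playbook and inventory paths are hypothetical, and the outage probability is assumed to arrive from the TensorFlow model as a float between 0 and 1. The only Ansible details relied on are the standard `-i` and `--limit` options of `ansible-playbook`.

```python
import subprocess

# Hypothetical cutoff: predicted outage probability above which we act.
OUTAGE_THRESHOLD = 0.8


def should_dispatch(outage_probability, threshold=OUTAGE_THRESHOLD):
    """Return True when the model's predicted risk exceeds the threshold."""
    return outage_probability > threshold


def build_ansible_command(playbook, inventory, host):
    """Build an ansible-playbook command restricted to a single host."""
    return ["ansible-playbook", "-i", inventory, playbook, "--limit", host]


def dispatch_if_needed(outage_probability, playbook, inventory, host,
                       run=subprocess.run):
    """Run the playbook against `host` only if the risk is above the threshold.

    `run` is injectable so the logic can be exercised without Ansible
    installed. Returns the command that was run, or None if nothing was done.
    """
    if not should_dispatch(outage_probability):
        return None
    command = build_ansible_command(playbook, inventory, host)
    run(command, check=True)  # check=True raises if the playbook fails
    return command
```

Passing `run` as a parameter keeps the decision logic testable on a machine that has no Ansible installation, which may be useful on the Raspberry Pi nodes.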
Paths Eliminated
Eliminated advanced health checks because they do not speed up the reconfiguration process.
Eliminated extra connections between nodes because it will increase algorithm complexity.
Eliminated load balancers in each small cluster because that would significantly affect overhead and hardware cost.
Eliminated adding backup nodes because that would also significantly increase hardware cost.
Eliminated adding more network managers per subnet because it would significantly increase hardware cost.
Eliminated upgrading existing network managers because that is only a temporary solution that might soon become obsolete, and repeatedly replacing them would increase hardware costs.
Quantitative Design Constraints
Data Security (High Importance): Ensure that our tool is secure and not susceptible to tampering or malicious use.
Scope: Ensure that our tool is optimized for high-traffic, high-risk enterprise computer networks.
Flexibility: Ensure that our final product is adaptable for further development and expansion.
Attributes to Guide Final Solution
Accuracy: Verify that our tool accurately predicts which nodes may go down.
Speed: Verify that once a node goes down, our tool can quickly resolve the issue.
Simplicity: Ensure that our product is easy to follow and understand such that troubleshooting is simple.
Scale: Ensure that our product is able to work well in large networks where network slowdown is most damaging.
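The Accuracy and Speed attributes above can be made measurable. The sketch below is one possible way to do so, under assumptions not stated in this document: accuracy is scored as precision and recall over sets of node names, and speed is scored as the mean time between a node going down and being restored. All function and parameter names are illustrative.

```python
def precision_recall(predicted_down, actually_down):
    """Score a prediction of which nodes would go down.

    precision: fraction of predicted nodes that really went down.
    recall:    fraction of nodes that went down that were predicted.
    """
    predicted = set(predicted_down)
    actual = set(actually_down)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def mean_recovery_seconds(outages):
    """Average recovery time over (down_time, restored_time) pairs in seconds."""
    if not outages:
        return 0.0
    return sum(restored - down for down, restored in outages) / len(outages)
```

For example, predicting nodes a, b, and c when b, c, d, and e actually fail gives a precision of 2/3 and a recall of 1/2.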
Step-by-Step Approach to Solution
Create a cloud computing network in AWS.
Test Ansible scripts to determine which are most effective at getting networks back up and running.
Collect network data via the AWS network that will be used to train our AI model.
Set up TensorFlow AI model to predict instances when network nodes may go down.
Integrate AWS, TensorFlow, and Ansible via a Python program that will sense any network hiccups and dispatch Ansible configurations to resolve the issue.
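The final integration step can be sketched as a single monitoring pass that wires the three pieces together. This is an illustrative outline only: `collect`, `predict`, and `dispatch` are hypothetical callables standing in for the AWS data collection, the TensorFlow model, and the Ansible dispatch respectively, and the 0.8 default threshold is an assumed value.

```python
def run_cycle(nodes, collect, predict, dispatch, threshold=0.8):
    """Run one monitoring pass over `nodes`.

    collect(node)  -> metrics for that node (stand-in for AWS data collection)
    predict(m)     -> outage risk in [0, 1] (stand-in for the TensorFlow model)
    dispatch(node) -> remediation action (stand-in for an Ansible playbook run)

    Returns the list of nodes for which remediation was dispatched.
    """
    remediated = []
    for node in nodes:
        metrics = collect(node)
        risk = predict(metrics)
        if risk > threshold:
            dispatch(node)
            remediated.append(node)
    return remediated
```

Keeping the three stages as injected callables means each can be developed and tested separately, then swapped for the real AWS, TensorFlow, and Ansible components once they exist.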