Experiment

Materials

  • Laptop or Computer
  • University of Wisconsin Breast Cancer Data Set
  • Eclipse Indigo
  • Java Runtime Environment
  • Google App Engine
  • Google Web Toolkit

Controlled Variables

  1. Each neural network should adhere to the following standards:
    1. Each trial shall use a different set of randomly selected instances for testing.
    2. Each commercial package shall be optimized to yield the best results. For example, if a package supports multiple hidden layers, that option will be evaluated to determine the best settings. The custom network will be implemented with proper settings.
    3. Each neural network will use the University of Wisconsin Original Breast Cancer Database.
    4. Trials of the same implementation will all be run using the same settings.
  2. All trials will be run on the same computer.
  3. The number of iterations for custom neural networks will be 1,500.

Test Variable

The level of diagnostic success achieved by the networks.

Abridged versions of the procedures for this project are detailed below.

Experiment Phase 1

  1. Understand the significance of each of the 9 inputs: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses.
  2. Architect and optimize a neural network model for each of three commercial software packages. Each network should adhere to the following controlled variable standards:
    1. Reserve 70 instances for testing. In other words, those instances should not be used for training.
    2. Each trial shall use a different set of randomly selected instances for testing.
    3. Each package shall be optimized to yield the best results. For example, if a package supports multiple hidden layers, that option should be evaluated to decide the optimal settings.
    4. If multiple neural network training architectures are supported, a backpropagation technique shall be chosen.
  3. Execute 10 trials for each network implementation and record the number of correctly predicted malignant and benign tumors, as well as the number of false malignant and false benign predictions.
  4. Analyze results to determine the successes and failures of the implementations.
  5. Determine whether modern neural networks were successful at predicting malignant versus benign when including outliers.
  6. Identify areas for improvement.
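
The trial bookkeeping in Phase 1 can be sketched in Java as follows. This is a minimal sketch, not the procedure used with the commercial packages: the class HoldOutTrial, the Classifier interface, and the placeholder records are all hypothetical names introduced for illustration. It shows one way to reserve 70 randomly selected instances per trial and to tally the four outcome counts from step 3.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class HoldOutTrial {

    /** One record: nine attribute scores plus the known diagnosis. */
    static final class Instance {
        final int[] features;
        final boolean malignant;

        Instance(int[] features, boolean malignant) {
            this.features = features;
            this.malignant = malignant;
        }
    }

    /** Hypothetical stand-in for whichever trained network is being tested. */
    interface Classifier {
        boolean predictMalignant(int[] features);
    }

    /** The four counts captured for each trial. */
    static final class Tally {
        int trueMalignant;
        int trueBenign;
        int falseMalignant;
        int falseBenign;
    }

    /**
     * Shuffles a copy of the data and splits it.
     * Index 0 of the result is the test set, index 1 is the training set.
     */
    static List<List<Instance>> split(List<Instance> all, int testSize, Random rng) {
        if (testSize < 0 || testSize > all.size()) {
            throw new IllegalArgumentException("testSize out of range: " + testSize);
        }
        List<Instance> shuffled = new ArrayList<>(all);
        Collections.shuffle(shuffled, rng);
        List<Instance> test = new ArrayList<>(shuffled.subList(0, testSize));
        List<Instance> train = new ArrayList<>(shuffled.subList(testSize, shuffled.size()));
        List<List<Instance>> parts = new ArrayList<>();
        parts.add(test);
        parts.add(train);
        return parts;
    }

    /** Compares each prediction with the known diagnosis and counts outcomes. */
    static Tally score(Classifier classifier, List<Instance> testSet) {
        Tally tally = new Tally();
        for (Instance instance : testSet) {
            boolean predicted = classifier.predictMalignant(instance.features);
            if (predicted && instance.malignant) {
                tally.trueMalignant++;
            } else if (!predicted && !instance.malignant) {
                tally.trueBenign++;
            } else if (predicted) {
                tally.falseMalignant++;
            } else {
                tally.falseBenign++;
            }
        }
        return tally;
    }

    public static void main(String[] args) {
        // Synthetic placeholder records, not real patient data.
        List<Instance> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            data.add(new Instance(new int[9], i % 3 == 0));
        }
        // Placeholder classifier that calls every mass malignant.
        Classifier alwaysMalignant = new Classifier() {
            @Override
            public boolean predictMalignant(int[] features) {
                return true;
            }
        };
        Random rng = new Random();
        for (int trial = 1; trial <= 10; trial++) {
            List<List<Instance>> parts = split(data, 70, rng);
            // parts.get(1) is the training set; a real trial would train on it here.
            Tally t = score(alwaysMalignant, parts.get(0));
            System.out.println("Trial " + trial + ": trueMalignant=" + t.trueMalignant
                    + " trueBenign=" + t.trueBenign + " falseMalignant=" + t.falseMalignant
                    + " falseBenign=" + t.falseBenign);
        }
    }
}
```

Shuffling a copy before each split gives every trial a different random test set while leaving the source list untouched, which matches the controlled variable that each trial use different test instances.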

Experiment Phase 2

  1. Design pseudocode for a custom-developed breast cancer neural network, including the following algorithmic components:
    1. Artificial Input Layer.  Convert inputs to binary inputs to simulate the on/off firing of neurons.
    2. Sigmoid Function.  A logistic function that introduces nonlinearity into the processing.
    3. Summation Function.  Matrix math function to propagate neural firings through the network. 
    4. Step Function.  Incorporate malignancy weightings.
    5. Inconclusive Assessment.  Evaluate using multiple independently trained networks.
  2. Define a neural network model with artificial input layer.
  3. Identify a way to weight malignant false negatives higher.
  4. Implement a custom neural network in Java.
  5. Implement logic that allows the network to rule masses inconclusive.
  6. Train multiple base networks using the data from the dataset, allowing the computer to do all weighting on its own.
  7. Tune the network with different numbers of hidden nodes and malignancy weightings to identify the optimal configuration.
  8. Test the network by training the network with all samples but one. Run 10 different trials for each sample (this should result in 6,800 trials).
  9. Run network using all samples in the training set to compare results and determine optimal capability.
  10. Establish a website to host the neural network implementation and provide an interface for hospitals.
  11. Design and implement a web service suitable for integration with the cloud via Google App Engine.
  12. Deploy the web service and web application to the cloud.
  13. Test the network with different test sample sizes to determine correlation to success rates.  Use a large number of tests to reduce the impact of randomness.  Running 1,500 networks for each training size at 20 increments will result in over 7,000,000 trials.
  14. Analyze results and present findings.
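
The algorithmic components listed in step 1 of Phase 2 might look roughly like the Java sketch below. Several details are assumptions, since the procedure does not specify them: the thermometer-style binary encoding of each 1 to 10 attribute score, the use of a lowered decision threshold as the malignancy weighting, and the unanimous-vote rule for ruling a mass inconclusive. The class and method names (CustomNetworkSketch, toBinaryInputs, propagate, isMalignant, combine) are hypothetical and are not taken from the project's implementation.

```java
public class CustomNetworkSketch {

    enum Diagnosis { BENIGN, MALIGNANT, INCONCLUSIVE }

    /** Logistic sigmoid: squashes any real input into the range (0, 1). */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /**
     * Artificial input layer (assumed encoding): each 1-10 score becomes
     * ten on/off inputs, with the first `score` inputs switched on.
     */
    static double[] toBinaryInputs(int[] scores) {
        double[] bits = new double[scores.length * 10];
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] < 1 || scores[i] > 10) {
                throw new IllegalArgumentException("score must be 1-10: " + scores[i]);
            }
            for (int level = 0; level < scores[i]; level++) {
                bits[i * 10 + level] = 1.0;
            }
        }
        return bits;
    }

    /** Summation function: weighted sum per node, passed through the sigmoid. */
    static double[] propagate(double[] inputs, double[][] weights) {
        double[] outputs = new double[weights.length];
        for (int node = 0; node < weights.length; node++) {
            double sum = 0.0;
            for (int i = 0; i < inputs.length; i++) {
                sum += weights[node][i] * inputs[i];
            }
            outputs[node] = sigmoid(sum);
        }
        return outputs;
    }

    /**
     * Step function (assumed form of malignancy weighting): a threshold
     * below 0.5 makes the network quicker to call a mass malignant,
     * trading extra false malignants for fewer false benigns.
     */
    static boolean isMalignant(double output, double threshold) {
        return output >= threshold;
    }

    /** Inconclusive assessment (assumed rule): any disagreement is inconclusive. */
    static Diagnosis combine(boolean[] malignantVotes) {
        if (malignantVotes.length == 0) {
            throw new IllegalArgumentException("at least one vote is required");
        }
        boolean first = malignantVotes[0];
        for (boolean vote : malignantVotes) {
            if (vote != first) {
                return Diagnosis.INCONCLUSIVE;
            }
        }
        return first ? Diagnosis.MALIGNANT : Diagnosis.BENIGN;
    }

    public static void main(String[] args) {
        // Illustrative scores only, not a record from the dataset.
        int[] scores = {5, 1, 1, 1, 2, 1, 3, 1, 1};
        double[] inputs = toBinaryInputs(scores);

        // Three untrained single-output networks with random weights,
        // standing in for independently trained networks.
        java.util.Random rng = new java.util.Random();
        boolean[] votes = new boolean[3];
        for (int n = 0; n < votes.length; n++) {
            double[][] weights = new double[1][inputs.length];
            for (int i = 0; i < inputs.length; i++) {
                weights[0][i] = rng.nextGaussian() * 0.1;
            }
            double output = propagate(inputs, weights)[0];
            votes[n] = isMalignant(output, 0.3);
        }
        System.out.println("Combined diagnosis: " + combine(votes));
    }
}
```

A stricter or looser agreement rule (for example, a majority vote) would change how often masses are ruled inconclusive; the unanimous rule here is only the simplest choice to illustrate the idea.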