Machine Learning and Bioinformatics


Course website: http://syslab.nchu.edu.tw/

Time: Thursday, periods 2-4
Classroom: Computer Science Department Computer Lab 821

This course is co-taught by Prof. 劉俊吉 (2/23 ~ 5/11) and Prof. 江信毅 (5/18 ~ 6/15).

Textbook

Ethem Alpaydin, Introduction to Machine Learning, third edition

The MIT Press

http://www.cmpe.boun.edu.tr/~ethem/i2ml3e/



Syllabus

  1. Introduction (ppt)
  2. Supervised Learning (ppt)
  3. Support vector machine (SVM) (ppt)
  4. Dimensionality Reduction (ppt)
  5. Genetic Algorithm (ppt)
  6. Simulated Annealing (ppt)
  7. Bayesian Decision Theory (ppt)
  8. Bayesian Probabilistic Model (ppt)
  9. Hidden Markov Models (HMM) (ppt)
  10. Baum-Welch Algorithm (ppt)
  11. Label propagation algorithm (ppt)
  12. Neural Networks (ppt)

Keras: Deep Learning library

Homework 2 

  • Data source: Best MG, Sol N, Kooi I, Tannous J et al. RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics. Cancer Cell 2015 Nov 9;28(5):666-76.  http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE68086 
  • Features with standard deviation (SD) > 5 were selected, resulting in 290 samples and 6,847 features.
  • 200 samples were randomly selected as training/testing data and 70 samples as validation data (see http://syslab.nchu.edu.tw/learning2017/Homework1_SD5.xlsx). 
    • Split the 200 training/testing samples into 130 training samples and 70 testing samples.
    • Perform deep learning with various parameters, e.g. number of layers, dropout rates, activation functions, layer dimensions, etc. 
    • Write a performance analysis report.
  • Submit your results to http://syslab5.nchu.edu.tw/Homework2 (you may submit at most 10 times)
  • Email the performance analysis report to chunchiliu@gmail.com
  • Deadline: June 8.
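The 130/70 split described above can be sketched with NumPy; this is a minimal illustration with a random permutation over an in-memory array (loading of the actual homework file is not shown):

```python
import numpy as np

def split_train_test(data, labels, n_train=130, seed=42):
    """Randomly split samples into training and testing sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))           # shuffle sample indices
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    return (data[train_idx], labels[train_idx],
            data[test_idx], labels[test_idx])

# Dummy data shaped like the homework set (200 samples, 6847 features)
X = np.zeros((200, 6847))
y = np.zeros(200)
X_tr, y_tr, X_te, y_te = split_train_test(X, y)
print(X_tr.shape, X_te.shape)  # (130, 6847) (70, 6847)
```

Fixing the random seed keeps the split reproducible across runs, which matters when comparing the different parameter settings the homework asks for.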
from keras.models import Sequential
from keras.layers import Dense, Dropout
from fn import get_data, corr  # course-provided helpers
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # silence TensorFlow log messages


output_dim1 = 1024   # units in the first hidden layer
output_dim2 = 512    # units in the second hidden layer
batch_size = 200
epochs = 1000


data, labels, classes = get_data("Training.csv")

model = Sequential()

# Two fully connected hidden layers with dropout for regularization
model.add(Dense(output_dim1, input_dim=len(data[0]), activation="relu"))
model.add(Dropout(0.25))
model.add(Dense(output_dim2, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(len(classes), activation="sigmoid"))

model.compile(loss="mse", optimizer="rmsprop")
model.fit(data, labels, epochs=epochs, batch_size=batch_size)

data2, labels2, classes2 = get_data("Test.csv")
# predict_classes() was removed in newer Keras; take the argmax instead
predict_types = model.predict(data2).argmax(axis=1)
print("Classes:", classes2)
print("True Types:", labels2.argmax(1))
print("Predict Types:", predict_types)
print("Corr:", corr(labels2.argmax(1).tolist(), predict_types.tolist()))
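The `fn` module imported above is course-provided and its source is not shown here. A plausible sketch, assuming the CSV's first column holds the class label and the remaining columns the feature values, and `corr` is a Pearson correlation, might look like:

```python
import csv
import numpy as np

def get_data(path):
    """Load a CSV whose first column is the class label and the rest
    are feature values. Returns (features, one-hot labels, class names)."""
    with open(path) as f:
        rows = list(csv.reader(f))
    names = [r[0] for r in rows]
    data = np.array([[float(v) for v in r[1:]] for r in rows])
    classes = sorted(set(names))
    index = {c: i for i, c in enumerate(classes)}
    labels = np.zeros((len(names), len(classes)))
    for i, n in enumerate(names):
        labels[i, index[n]] = 1.0   # one-hot encode the class
    return data, labels, classes

def corr(a, b):
    """Pearson correlation between two label sequences."""
    return float(np.corrcoef(a, b)[0, 1])
```

The file layout and both signatures are assumptions inferred from how the training script uses them, not the actual course code.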
  1. Python Introduction, Setup Environment
  2. Basic Syntax
  3. Variable, Assignment, Operator
  4. String, Number
  5. If - else
  6. Loop
  7. Tuple, List, Set, Dictionary
  8. Function
  9. File, Module, Exception
  10. Keras
  11. NumPy
  12. Series
  13. DataFrame
  14. Pandas Data Access Functions
  15. Pandas Plotting Functions
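Topics 11-15 above (NumPy, Series, DataFrame, and the Pandas data functions) can be previewed with a minimal sketch, assuming NumPy and pandas are installed; the gene/sample names are made up for illustration:

```python
import pandas as pd

# A Series is a labeled 1-D array; a DataFrame is a table of Series columns
expr = pd.Series([2.1, 0.4, 5.7], index=["gene1", "gene2", "gene3"])
df = pd.DataFrame({"sample1": expr, "sample2": expr * 2})

print(df)
print(df.mean(axis=1))            # per-gene mean across samples
high_sd = df[df.std(axis=1) > 1]  # row filter, like the SD > 5 selection in Homework 2
print(high_sd.index.tolist())     # ['gene1', 'gene3']
```

Boolean row filtering like `df[df.std(axis=1) > 1]` is the same pattern used to select high-variance features in the homework data preparation.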


Homework 1 

Machine learning challenge 

SVM Practice


      public static svm_model buildModel(List<svm_node[]> positives, List<svm_node[]> negatives, 
         int SVM_TYPE, double gamma, double C) {
            svm_problem problem = new svm_problem();
            int numP = positives.size();
            int numN = negatives.size();
            int numNodes = numP+numN;

            // Only the outer array is needed; each row is assigned below
            problem.x = new svm_node[numNodes][];
            double[] y = new double[numNodes];
            for (int i=0; i<numP; i++) {
                  problem.x[i]=positives.get(i);
                  y[i]=1.0;
            }
            for (int i=0; i<numN; i++)      {
                  y[numP+i]=-1.0;
                  problem.x[numP+i]=negatives.get(i);
            }
            problem.y = y;
            problem.l = numNodes;
            
            svm_parameter params = new svm_parameter();
            
            if (SVM_TYPE == svm_parameter.LINEAR) { 
                  params.svm_type=svm_parameter.C_SVC;
                  params.kernel_type=svm_parameter.LINEAR;
                  params.gamma=gamma; // gamma is ignored by the linear kernel
                  params.cache_size=100;
                  params.eps=0.001;
                  params.C = C;
                  params.nr_weight=2;
                  params.weight_label = new int[] {-1,1};
            }

            if (SVM_TYPE == svm_parameter.RBF) { 
                  params.svm_type=svm_parameter.C_SVC;
                  params.kernel_type=svm_parameter.RBF;
                  params.gamma=gamma; // typically 1/k, k = number of attributes in the input data
                  params.cache_size=100;
                  params.eps=0.001;
                  params.C = C; // default 1
                  params.nr_weight=2;
                  params.weight_label = new int[] {-1,1};
            }
            
            if (SVM_TYPE == svm_parameter.POLY) { 
                  params.svm_type=svm_parameter.C_SVC;
                  params.kernel_type= svm_parameter.POLY;
                  params.degree=5;
                  params.gamma=1.0/(positives.size()+negatives.size());
                  params.cache_size=100;
                  params.eps=0.001;
                  params.C = 1;
                  params.nr_weight=2;
                  params.weight_label = new int[] {-1,1};
            } 
            
            // Weight the positive class by numN/numP to compensate for class imbalance
            double pWeight = (double)numN/(double)numP;
            params.weight = new double[] {1.0,pWeight};
            return svm.svm_train(problem, params);
      }      

      public static RocResult svmPredict( svm_model model, List<svm_node[]> nodes, 
            boolean expected) {
            RocResult results = new RocResult();
            for (svm_node[] n : nodes) {
                  //double svm_predict = svm.svm_predict(model, n);
                  double[] dec_values = new double[1];
                  svm.svm_predict_values(model, n, dec_values);
                  int label = (dec_values[0] > 0.0) ? 1 : -1;
                  results.addSample(expected, label, dec_values[0]);
            }
            return results;
      }
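The Java routine above trains a class-weighted C-SVC and labels each sample by the sign of its decision value. A rough equivalent in Python using scikit-learn (an assumption; the course code calls libsvm's Java binding directly, and scikit-learn wraps the same libsvm internally) could be:

```python
import numpy as np
from sklearn.svm import SVC

def build_model(positives, negatives, kernel="rbf", gamma=0.001, C=1.0):
    """Train a C-SVC with the positive class weighted by numN/numP,
    mirroring the weighting in the Java buildModel()."""
    X = np.vstack([positives, negatives])
    y = np.array([1] * len(positives) + [-1] * len(negatives))
    p_weight = len(negatives) / len(positives)
    model = SVC(kernel=kernel, gamma=gamma, C=C,
                class_weight={-1: 1.0, 1: p_weight})
    return model.fit(X, y)

def svm_predict(model, nodes):
    """Label by the sign of the decision value, as in svmPredict()."""
    dec = model.decision_function(nodes)
    return np.where(dec > 0.0, 1, -1), dec

# Toy usage: two well-separated Gaussian clusters
rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, size=(20, 2))
neg = rng.normal(loc=-2.0, size=(40, 2))
model = build_model(pos, neg, gamma=0.5)
labels, dec = svm_predict(model, np.array([[2.0, 2.0], [-2.0, -2.0]]))
print(labels)  # [ 1 -1]
```

As in the Java code, the unequal class sizes (20 vs. 40 here) are balanced through the per-class weight rather than by resampling the data.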

