機器學習與生物資訊  Machine Learning and Bioinformatics


Course website: http://syslab.nchu.edu.tw/

Class time: Thursday, periods 2-4
Classroom: Computer Science Department, Computer Classroom 821

This course is co-taught by Dr. 江信毅 (3/1 ~ 3/29) and Dr. 劉俊吉 (4/12 ~ 6/21).

Textbook

Ethem Alpaydin, Introduction to Machine Learning, third edition

The MIT Press

http://www.cmpe.boun.edu.tr/~ethem/i2ml3e/



Syllabus

  1. Introduction (ppt) (Python Ch 1, 2)
  2. Supervised Learning (ppt) (Python Ch 3, 4)
  3. Support vector machine (SVM) (ppt) (Python Ch 5 ~ 8)
  4. Dimension Reduction (ppt) (Python Ch 9) (手把手打開Python資料分析大門, page 110)
  5. Genetic Algorithm (ppt)
  6. Simulated Annealing (ppt)
  7. Bayesian Decision Theory (ppt)
  8. Bayesian Probabilistic model (ppt)
  9. Hidden Markov Models (HMM) (ppt)
  10. Baum-Welch Algorithm (ppt)
  11. Label propagation algorithm (ppt)
  12. Neural Networks (ppt)
  13. Gradient descent, how neural networks learn https://www.youtube.com/watch?v=IHZwWFHWa-w
  14. What is backpropagation really doing? https://www.youtube.com/watch?v=Ilg3gGewQ5U 
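To accompany the gradient-descent topic above, here is a minimal NumPy sketch of the idea (illustrative only; the linear model, the quadratic loss, and all variable names are assumptions for the example, not taken from the course slides):

```python
import numpy as np

# Fit y = w*x + b by gradient descent on the mean squared error.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + 0.5  # ground truth: w = 3.0, b = 0.5

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * x + b
    grad_w = 2 * np.mean((pred - y) * x)  # d(MSE)/dw
    grad_b = 2 * np.mean(pred - y)        # d(MSE)/db
    w -= lr * grad_w                      # step against the gradient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges toward w = 3.0, b = 0.5
```

The same update rule, applied layer by layer via the chain rule, is what backpropagation computes for a neural network.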

Keras: Deep Learning library


Python Programming

  1. Python Introduction, Setup Environment (簡介、安裝環境)
  2. Basic Syntax (基本語法)
  3. Variable, Assignment, Operator (變數、賦值、運算子)
  4. String, Number (字串、數字)
  5. If - else (判斷)
  6. Loop (迴圈)
  7. Tuple, List, Set, Dictionary (序對、串列、集合、字典)
  8. Function (函數)
  9. File, Module, Exception (檔案、模組、例外)
  10. Keras
  11. NumPy
  12. Series
  13. DataFrame
  14. Pandas data access functions
  15. Pandas plotting functions
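As a small taste of topics 11-14 above, the sketch below shows how a Series and a DataFrame relate (the column names and values are invented for illustration):

```python
import pandas as pd

# A Series is a labeled 1-D array; a DataFrame is a table of Series.
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
df = pd.DataFrame({"gene": ["g1", "g2", "g3"], "expr": s.values})

print(s["b"])             # label-based access to a Series element
print(df["expr"].mean())  # column-wise statistics on a DataFrame
```

Both prints give 2.0 here; selecting a column of a DataFrame returns a Series, so the same indexing and aggregation methods apply.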


Homework 1

  1. Data https://docs.google.com/spreadsheets/d/1mQIDKuibYGNAmqtllRs8Eaq6EYEmqYSmCQhxLPBcRpE/edit?usp=sharing 
  2. The spreadsheet has 1992 columns. 
  3. Train an SVM to predict the labels of columns 2 ~ 61 (60 columns).
  4. sklearn.svm.SVC http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
dataset = "Homework1.tsv"
test_len = 60

import pandas as pd
import numpy as np

df = pd.read_csv(dataset, sep="\t", index_col=0, header=None)
label1 = df.loc["Label", :]
label_array = label1.values
print("testing label ", label_array[:test_len])
print("training label ", label_array[test_len:])

# Drop the first two rows (name and label rows) so only features remain.
df.drop(df.index[0], inplace=True)
df.drop(df.index[0], inplace=True)
df.head()

training = list(df.columns[test_len:])
test = list(df.columns[:test_len])
print("training ", len(training), " test ", len(test))
training_array = np.array(df[training])
test_array = np.array(df[test])

from sklearn import svm
clf1 = svm.SVC(kernel='linear')
# Samples are stored as columns, so transpose before fitting.
clf1.fit(training_array.T, label_array[test_len:])
result = list(clf1.predict(test_array.T))
print("result ", result)
print(str(result).replace(".0, ", ""))  # compact print of the predictions
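Before predicting the unlabeled columns, it can be worth sanity-checking the classifier with cross-validation on the labeled portion. A self-contained sketch (this uses synthetic stand-in data, not the homework file, and is not part of the assignment):

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the homework matrix: 200 samples, 50 features,
# with a simple linear rule generating the labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = svm.SVC(kernel='linear')
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold accuracy estimates
print(scores.mean())
```

With the real data, `X` would be `training_array.T` and `y` the training labels; a mean accuracy near chance level would signal a problem before the test predictions are submitted.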



Homework 2

The top 4 students:
  1. 鄭凱儒  0.7853
  2. 蕭智予  0.7514
  3. 張君壁  0.7511
  4. 靳開翔  0.7498

Homework 3

  • Perform machine learning to predict the labels of columns 2 ~ 61 (60 columns).
  • Email your machine learning report to chunchiliu@gmail.com by June 14.

Homework 4 (optional)

Three SVR methods:
  • class sklearn.svm.SVR(kernel='rbf', degree=3, gamma='auto', coef0=0.0, tol=0.001, C=1.0, epsilon=0.1, shrinking=True, cache_size=200, verbose=False, max_iter=-1)
  • class sklearn.svm.LinearSVR(epsilon=0.0, tol=0.0001, C=1.0, loss='epsilon_insensitive', fit_intercept=True, intercept_scaling=1.0, dual=True, verbose=0, random_state=None, max_iter=1000)
  • class sklearn.svm.NuSVR(nu=0.5, C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, tol=0.001, cache_size=200, verbose=False, max_iter=-1)
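The three regressors share the same fit/predict interface, so they can be compared side by side. A minimal sketch on a toy 1-D target (the data and parameter choices are assumptions for illustration, not from the homework):

```python
import numpy as np
from sklearn.svm import SVR, LinearSVR, NuSVR

# Toy regression target: y = 2x plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)

scores = {}
for model in (SVR(kernel='linear'),
              LinearSVR(max_iter=10000),
              NuSVR(kernel='linear', nu=0.5)):
    model.fit(X, y)
    scores[type(model).__name__] = model.score(X, y)  # R^2 on training data
    print(type(model).__name__, round(scores[type(model).__name__], 2))
```

All three should reach a high R^2 on this nearly linear target; they differ mainly in the loss they optimize (epsilon-insensitive for SVR/LinearSVR, a nu-parameterized variant for NuSVR).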

dataset = "Homework4.txt"
test_len = 100

import pandas as pd
import numpy as np

df = pd.read_csv(dataset, sep="\t", index_col=0, header=None)
label1 = df.loc["Label", :]
label_array = label1.values
# Columns split into two test sets of 100 samples each; the rest train.
test1_label = label_array[:test_len]
test2_label = label_array[test_len:test_len*2]
training_label = label_array[test_len*2:]

print("test1 label ", test1_label, len(test1_label))
print("test2 label ", test2_label, len(test2_label))
print("training label ", training_label, len(training_label))

# Drop the first two rows (name and label rows) so only features remain.
df.drop(df.index[0], inplace=True)
df.drop(df.index[0], inplace=True)
df.head()
    
def feature_selection_SFS(X, y, sel_num_, forward_, f_names,
                          method = 1, C_coe = 1.0, alpha_coe = 0.1, max_iter_ = 100):
    # Build the base regressor selected by `method`.
    if method == 1:
        from sklearn.svm import SVR
        clf = SVR(C=C_coe, epsilon=alpha_coe, kernel='linear', max_iter=max_iter_)
    if method == 2:
        from sklearn.svm import LinearSVR
        clf = LinearSVR(C=C_coe, epsilon=alpha_coe, max_iter=max_iter_)
    if method == 3:
        from sklearn.svm import NuSVR
        clf = NuSVR(C=C_coe, nu=alpha_coe, kernel='linear', max_iter=max_iter_)
    if method == 4:
        from sklearn.neural_network import MLPRegressor
        clf = MLPRegressor(hidden_layer_sizes=(20, 20), alpha=alpha_coe, activation="logistic", max_iter=max_iter_)
    if method == 5:
        from sklearn.neural_network import MLPRegressor
        clf = MLPRegressor(hidden_layer_sizes=(30, 30), alpha=alpha_coe, activation="logistic", max_iter=max_iter_)
    from sklearn import linear_model
    if method == 6:
        clf = linear_model.Lasso(alpha=alpha_coe, max_iter=max_iter_)
    if method == 7:
        clf = linear_model.Ridge(alpha=alpha_coe, max_iter=max_iter_)
    if method == 8:
        clf = linear_model.ElasticNet(alpha=alpha_coe, max_iter=max_iter_)
    if method == 9:
        from sklearn.ensemble import RandomForestRegressor
        clf = RandomForestRegressor(max_depth=10)
    if method == 10:
        from sklearn.ensemble import GradientBoostingRegressor
        clf = GradientBoostingRegressor(max_depth=10)

    print("feature_selection_SFS method", method, ":", clf)
    # SFS clones and refits the estimator internally, so pass it unfitted
    # and select on (X, y) rather than on module-level arrays.
    from mlxtend.feature_selection import SequentialFeatureSelector as SFS
    sfs = SFS(clf, k_features=sel_num_, forward=forward_, floating=False,
              scoring='r2', cv=5)
    fit = sfs.fit(X, y)
    print("sfs.k_feature_idx_ ", sfs.k_feature_idx_)
    # Map the selected column indices back to feature names.
    sel = [f_names[idx] for idx in fit.k_feature_idx_]
    print("sel ", sel)
    return sel
    
def learning(method = 3, C_coe = 1.0, alpha_coe = 0.1, max_iter_ = 1000): 
    test1_col = list(df.columns[:test_len])
    test2_col = list(df.columns[test_len:test_len*2])
    training_col = list(df.columns[test_len*2:])

    training_array = np.array(df[training_col])
    test1_array = np.array(df[test1_col])
    test2_array = np.array(df[test2_col])

    if method == 1:
        from sklearn.svm import SVR
        clf1 = SVR(C=C_coe, epsilon=alpha_coe, kernel='linear', max_iter=max_iter_)
    if method == 2:
        from sklearn.svm import LinearSVR
        clf1 = LinearSVR(C=C_coe, epsilon=alpha_coe, max_iter=max_iter_)
    if method == 3:
        from sklearn.svm import NuSVR
        clf1 = NuSVR(C=C_coe, nu=alpha_coe, kernel='linear', max_iter=max_iter_)
    if method == 4:
        from sklearn.neural_network import MLPRegressor
        clf1 = MLPRegressor(hidden_layer_sizes=(100, 100), alpha=alpha_coe, activation="logistic", max_iter=max_iter_)
    if method == 5:
        from sklearn.neural_network import MLPRegressor
        clf1 = MLPRegressor(hidden_layer_sizes=(300, 300), alpha=alpha_coe, activation="logistic", max_iter=max_iter_)
    from sklearn import linear_model
    if method == 6:
        clf1 = linear_model.Lasso(alpha=alpha_coe, max_iter=max_iter_)
    if method == 7:
        clf1 = linear_model.Ridge(alpha=alpha_coe, max_iter=max_iter_)
    if method == 8:
        clf1 = linear_model.ElasticNet(alpha=alpha_coe, max_iter=max_iter_)
    if method == 9:
        from sklearn.ensemble import RandomForestRegressor
        clf1 = RandomForestRegressor(max_depth=20)
    if method == 10:
        from sklearn.ensemble import GradientBoostingRegressor
        clf1 = GradientBoostingRegressor(max_depth=20)

    # Samples are stored as columns, so transpose before fitting.
    clf1.fit(training_array.T, training_label)
    result = clf1.predict(test1_array.T)

    # Score held-out set 2 with the Pearson correlation coefficient.
    from scipy.stats import pearsonr
    result2 = list(clf1.predict(test2_array.T))
    r2 = pearsonr(result2, test2_label)
    print("m", method, "a", alpha_coe, "i", max_iter_, "r2", r2[0])
    if r2[0] > 0.7:
        print("result ", result)
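The learning() routine above scores its predictions with the Pearson correlation. As a quick standalone reminder, scipy's pearsonr returns a (coefficient, p-value) pair, which is why the code reads r2[0] (the toy arrays below are invented for illustration):

```python
from scipy.stats import pearsonr

a = [1.0, 2.0, 3.0, 4.0]
b = [2.1, 3.9, 6.2, 8.0]  # roughly 2*a, so r should be near 1
r, p = pearsonr(a, b)     # returns (correlation coefficient, p-value)
print(round(r, 3))
```

A coefficient near 1 means the predicted and true values rise together, which is the criterion the homework loop uses to flag promising models.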
    
test1_col = list(df.columns[:test_len])
test2_col = list(df.columns[test_len:test_len*2])
training_col = list(df.columns[test_len*2:])

print("test1_col ", len(test1_col))
print("test2_col ", len(test2_col))
print("training_col ", len(training_col))

training_array = np.array(df[training_col])
test1_array = np.array(df[test1_col])
test2_array = np.array(df[test2_col])

# Keep only the 8 features chosen by forward selection.
sel = feature_selection_SFS(training_array.T, training_label, 8, True, df.index,
        method = 9, C_coe = 1.0, alpha_coe = 0.1, max_iter_ = 100)
df = df.loc[sel, :]
df.head()
    
# Sweep the regularization parameter for every method.
for m in range(1, 11):
    learning(method = m, C_coe = 1.0, alpha_coe = 0.1, max_iter_ = 10000)
    learning(method = m, C_coe = 1.0, alpha_coe = 0.2, max_iter_ = 10000)
    learning(method = m, C_coe = 1.0, alpha_coe = 0.3, max_iter_ = 10000)
        


Attachments:
  • Homework2.zip (697k), Jim Liu, May 23, 2018
  • Homework4_data.zip (934k), Jim Liu, Jun 17, 2018