https://github.com/thundergolfer
http://www.norvig.com/Lisp-retro.html
http://simpleai.readthedocs.org/en/latest/
https://github.com/simpleai-team/simpleai
http://www.game-theory-class.org/
http://www.openculture.com/freeonlinecourses
https://news.ycombinator.com/item?id=11897766 AI class Berkeley
Markov Decision Process
http://www.cs.brown.edu/research/ai/pomdp/tutorial/index.html
http://www.scholarpedia.org/article/Category:Machine_Learning
AI class
http://www.reddit.com/r/aiclass/
http://www.reddit.com/r/aiclass/comments/l8c89/anyone_up_for_implementing_the_algorithms_in/
http://larvecode.tumblr.com/tagged/ai-class
http://therohanaurora.com/stanford-ai-open-class-resources/
http://www.reddit.com/r/aiclass/comments/lf568/alternative_free_online_courses_in_ai/
http://inst.eecs.berkeley.edu/~cs188/sp09/lectures.html AI Berkeley Class
http://www.stanford.edu/class/cs221/
http://aima.cs.berkeley.edu/code.html
http://code.google.com/p/aima-python/
TRANSCRIPT http://www.wonderwhy-er.com/ai-class/
explaining-away effect
d-separation
http://www.reddit.com/r/aiclass/comments/lpw21/a_challenge_to_the_over_achievers_who_really/
BAYES
http://allendowney.blogspot.com/2011/10/all-your-bayes-are-belong-to-us.html
http://en.wikipedia.org/wiki/Bayes%27_theorem
http://en.wikipedia.org/wiki/Bayesian_inference
1. P(A|B) = P(A and B)/P(B) - does conditioning introduce time, i.e. A happened after B? No: conditional probability implies no temporal order.
2. P(B|A) = P(A and B)/P(A)
--------------------------------
P(A|B)/P(B|A) = P(A)/P(B) -> obtained by dividing 1. by 2.
P(A|B) = [P(B|A)P(A)]/P(B) Bayes' theorem
P(A,B) = P(A|B)P(B) product rule
P(A) = P(A,B) + P(A, not B) total probability : let B' = not B
P(A) = P(A|B)P(B) + P(A|B')P(B') total probability
P(A1,A2,A3,A4) = P(A1|A2,A3,A4)P(A2|A3,A4)P(A3|A4)P(A4) chain rule - can be extended to n variables
When two features are conditionally independent, their co-occurrence factors into a simple product: P(A,B|C) = P(A|C)*P(B|C)
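
A quick Python sketch of Bayes' theorem with P(B) expanded by total probability, as in the formulas above (the numbers are made up for illustration):

def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) via Bayes' theorem, expanding P(B) by total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)  # P(B) = P(B|A)P(A) + P(B|A')P(A')
    return p_b_given_a * p_a / p_b

# Example: P(B|A) = 0.8, P(A) = 0.5, P(B|not A) = 0.1
print(posterior(0.8, 0.5, 0.1))  # 0.888... = 8/9
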
Bayes Networks: how many parameters
A independent of B given C -> does A independent of B follow? NO
A independent of B -> does A independent of B given C follow? NO (cf. the explaining-away effect)
Expectation Maximization vs K-Means Clustering
Laplace Smoothing
http://en.wikipedia.org/wiki/Additive_smoothing
applied to http://en.wikipedia.org/wiki/Multinomial_distribution
Additive smoothing is a type of shrinkage estimator: the resulting estimate lies between the empirical estimate x(i)/N and the uniform probability 1/d.
theta(i) = (x(i) + k)/(N+k*d)
x(i) - number of observed success events of type i (the vector x has size d)
N - total number of events
k - smoothing param
d - number of parameters (the size of the theta and x vectors); in other words, d is the number of values the variable x can take on (for spam/nospam, d = 2)
From the lecture's spam example: 12 = total # of unique words (the spam+ham vocabulary); 9 = # of words in the spam messages; 15 = # of words in the ham messages.
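
A minimal Python sketch of the smoothing formula theta(i) = (x(i)+k)/(N+k*d); the word count below is illustrative, not the lecture's exact example:

def laplace_smooth(count, total, k, d):
    """Smoothed estimate: (x_i + k) / (N + k*d)."""
    return (count + k) / (total + k * d)

d = 12   # unique words in the combined spam+ham vocabulary
k = 1    # smoothing parameter
# e.g. a word that occurs twice among the 9 spam words:
print(laplace_smooth(2, 9, k, d))   # (2+1)/(9+12) = 1/7
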
http://171.64.93.201/ClassX/system/users/web/pg/view_subject.php?subject=CS229_FALL_2011_2012
Propositional Logic / First Order Logic
Logical implication p -> q is false in exactly one case:
the first operand (p) is true and the second (q) is false.
p → q is equivalent to ¬p ∨ q
Truth table (columns: p, q, p→q, p∧q, p∨q):
p  q  p→q  p∧q  p∨q
0  0   1    0    0
0  1   1    0    1
1  0   0    0    1
1  1   1    1    1
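
A quick Python check that reproduces the table above and the equivalence p -> q == (not p) or q:

for p in (0, 1):
    for q in (0, 1):
        implication = int((not p) or q)   # p -> q, via the equivalence above
        print(p, q, implication, p and q, p or q)
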
http://www.oursland.net/aima/propositionApplet.html
http://www.inf.unibz.it/~franconi/teaching/propcalc/
http://www-users.cselabs.umn.edu/classes/Fall-2008/csci5511/PredCalculusPractice.pdf
Absorption laws:
1) A ∨ (A ∧ B) = A
2) A ∧ (A ∨ B) = A
Situation calculus
Poss(a,s): a special binary predicate that denotes the executability of action a in situation s
Markov Decision Process
http://en.wikipedia.org/wiki/Markov_decision_process
http://en.wikipedia.org/wiki/Bellman_equation
http://www.karenkopecky.net/Teaching/eco613614/Notes_ValueFunctionIteration.pdf
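
A minimal sketch of value iteration (the topic of the notes above) for the Bellman equation V(s) = max_a [R(s) + gamma * sum_s' P(s'|s,a)*V(s')]; the tiny 2-state MDP below is made up for illustration:

gamma = 0.9
states = ["s0", "s1"]
reward = {"s0": 0.0, "s1": 100.0}
# transitions[s][a] = list of (next_state, probability)
transitions = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 0.8), ("s0", 0.2)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s0", 1.0)]},
}

V = {s: 0.0 for s in states}
for _ in range(100):  # repeat the Bellman backup until (nearly) converged
    V = {
        s: max(
            reward[s] + gamma * sum(p * V[s2] for s2, p in succ)
            for succ in transitions[s].values()
        )
        for s in states
    }
print(V)
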
Berkeley AI Class
http://inst.eecs.berkeley.edu/~cs188/fa11/lectures.html
REINFORCEMENT LEARNING
http://en.wikipedia.org/wiki/Reinforcement_learning
http://en.wikipedia.org/wiki/Q-learning
http://people.revoledu.com/kardi/tutorial/ReinforcementLearning
http://en.wikipedia.org/wiki/Temporal_difference_learning
http://www.scholarpedia.org/article/Temporal_difference_learning
HW 5.1
http://www.reddit.com/r/aiclass/comments/m80hq/home_work_51_could_someone_please_clarifying/
http://www.reddit.com/r/aiclass/comments/m73wm/problem_with_sarsa_formula_in_hw_51/
http://www.aiqus.com/questions/9802/q-learning-in-a-stochastic-environment
Q(s,a) = Q(s,a) + alpha ( R(s) + gamma*Q(s',a') - Q(s,a))
alpha=0.5
gamma=0.9
R(s)=100 ??
We chose North but ended up going East (stochastic environment)! Initial state: Q(33, any action) = 0
Q(33, North) = 0 + alpha ( R(s) + gamma*Q(s',a') - Q(s,a)) = 0.5 *(100 +0.9*100) = 95
Q(33, North) = 0 + alpha ( R(s) + gamma*Q(s',a') - Q(s,a)) = 0.5 *(0 +0.9*100) = 45
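
The two updates above, reproduced with a small Python helper (alpha = 0.5, gamma = 0.9 as given):

alpha, gamma = 0.5, 0.9

def sarsa_update(q_sa, r, q_next):
    """Q(s,a) <- Q(s,a) + alpha * (R(s) + gamma * Q(s',a') - Q(s,a))"""
    return q_sa + alpha * (r + gamma * q_next - q_sa)

print(sarsa_update(0.0, 100, 100))  # 0.5 * (100 + 0.9*100) = 95.0
print(sarsa_update(0.0, 0, 100))    # 0.5 * (0 + 0.9*100)   = 45.0
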
Q-matrix
STATE / ACTION : NORTH WEST EAST SOUTH
(3,3) 0 0 0 0
R-matrix
STATE / ACTION : NORTH WEST EAST SOUTH
(3,4) 100 100 100 100
other states: 0 0 0 0
Assume we are at (3,4)
Q=0
FOR each episode
    select a random initial state
    DO WHILE goal state not reached
        select one among all possible actions
        using this action, consider going to the next state
        MAX = max[ Q(next state, all actions) ]
        Q(s,a) = R(s,a) + gamma * MAX
        set the next state as the current state
    ENDDO
ENDFOR
Q(33 -> 34) = R(33 -> 34) + gamma * MAX{ Q(34, all actions) } = 0 + 0.9*100 = 90
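
A runnable sketch of the loop above, assuming a made-up 1-D corridor world (4 states, goal at state 3 with reward 100) instead of the grid:

import random

N_STATES, GOAL, GAMMA = 4, 3, 0.9
ACTIONS = (-1, +1)  # move left / move right

def step(s, a):
    """Deterministic transition, clamped to the corridor ends."""
    return min(max(s + a, 0), N_STATES - 1)

# R(s,a): 100 for any action that enters the goal state, else 0
R = {(s, a): (100 if step(s, a) == GOAL else 0)
     for s in range(N_STATES) for a in ACTIONS}
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(200):                          # FOR each episode
    s = random.randrange(N_STATES)            # select a random initial state
    while s != GOAL:                          # DO WHILE goal not reached
        a = random.choice(ACTIONS)            # select one possible action
        s2 = step(s, a)                       # consider going to next state
        max_q = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] = R[(s, a)] + GAMMA * max_q # Q(s,a) = R(s,a) + gamma*MAX
        s = s2                                # next state becomes current
print(Q)  # Q along the corridor decays toward the goal: 100, 90, 81, ...
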
HW 5.2
http://www.reddit.com/r/aiclass/comments/m92qr/rewrite_of_hw_52/
HW 5.3
http://www.reddit.com/r/aiclass/comments/m75ui/important_the_clarification_of_hw_53_at_the/
http://www.reddit.com/r/aiclass/comments/m63wf/hw_53/
http://www.aiqus.com/tags/hw5-3/
http://mostafaabedi.blogspot.com/2011/11/general-tips-for-homework-5-of-ai-class.html
Particle Filter
http://en.wikipedia.org/wiki/Particle_filter
A* algorithm
An admissible heuristic never overestimates the cost of reaching the goal.
A*: f(s) = g(s) + h(s), where g = distance from the root to the current state and h = heuristic distance from the current state to the goal.
We are minimizing the total cost f(s).
A search algorithm is admissible if it is guaranteed to find a minimal path to a solution whenever such a solution exists. Breadth-first search is admissible, because it looks at every state at level n before considering any state at level n+1. Breadth-first search is an A* algorithm in which f(n) = g(n) + 0.
Eight-puzzle example: the heuristic "number of tiles out of place" is certainly less than the actual number of moves to the goal state, so this heuristic (combined with best-first search) gives an admissible algorithm. So is the heuristic "sum of the distances of out-of-place tiles", because it too underestimates the actual number of moves required to reach a goal state.
If we overestimate this distance, however, it is not guaranteed to give us the shortest path. In such cases, we have what is called an "inadmissible heuristic."
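
A minimal A* sketch in Python with the (admissible) Manhattan-distance heuristic; the toy grid and its walls are made up for illustration:

import heapq

GRID = ["....",
        ".##.",
        "...."]
START, GOAL = (0, 0), (2, 3)

def h(cell):
    """Admissible heuristic: Manhattan distance to the goal."""
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])

def neighbors(cell):
    r, c = cell
    for r2, c2 in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
        if 0 <= r2 < len(GRID) and 0 <= c2 < len(GRID[0]) and GRID[r2][c2] != "#":
            yield (r2, c2)

def astar(start, goal):
    frontier = [(h(start), 0, start)]     # entries: (f = g + h, g, state)
    best_g = {start: 0}
    while frontier:
        f, g, s = heapq.heappop(frontier)
        if s == goal:
            return g                      # cost of a minimal path
        for s2 in neighbors(s):
            if g + 1 < best_g.get(s2, float("inf")):
                best_g[s2] = g + 1
                heapq.heappush(frontier, (g + 1 + h(s2), g + 1, s2))
    return None

print(astar(START, GOAL))  # 5
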
HW 6
Problem 3: HMM
P(A0 | X0) = P(X0|A0)*P(A0)/P(X0) = 0.1 * 0.5 /(0.1*0.5 + 0.8 *0.5) = 1/9 = 0.111
P(A1|x0) = P(A0|x0)*P(A->A) + P(B0|x0)*P(B->A) = (1/9)*0.5 + (8/9)*0.5 = 0.5
P(A1|x0,x1) = P(x1|A1)*P(A1|x0) / P(x1|x0)
P(x0,x1) = sum over the four state paths:
  P(A0)*P(x|A)*P(A->A)*P(x|A)   ( 0.5*0.1*0.5*0.1 )
+ P(A0)*P(x|A)*P(A->B)*P(x|B)   ( 0.5*0.1*0.5*0.8 )
+ P(B0)*P(x|B)*P(B->B)*P(x|B)   ( 0.5*0.8*0.5*0.8 )
+ P(B0)*P(x|B)*P(B->A)*P(x|A)   ( 0.5*0.8*0.5*0.1 )
= 0.25*0.1*0.9 + 0.25*0.8*0.9 = 0.25*0.9*0.9 = 0.2025
P(x1|x0) = P(x0,x1)/P(x0) = 0.2025/0.45 = 0.45
=> P(A1|x0,x1) = (0.1*0.5)/0.45 = 1/9 ≈ 0.111
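
The same forward computation in Python, using the problem's numbers (P(A0) = P(B0) = 0.5, all transitions 0.5, P(x|A) = 0.1, P(x|B) = 0.8):

prior = {"A": 0.5, "B": 0.5}
trans = {("A", "A"): 0.5, ("A", "B"): 0.5, ("B", "A"): 0.5, ("B", "B"): 0.5}
emit_x = {"A": 0.1, "B": 0.8}

# forward step 0: alpha0(s) = P(s0) * P(x0|s0)
alpha = {s: prior[s] * emit_x[s] for s in "AB"}
print(sum(alpha.values()))                      # P(x0) = 0.45
print(alpha["A"] / sum(alpha.values()))         # P(A0|x0) = 1/9

# forward step 1: alpha1(s) = P(x1|s) * sum_s' alpha0(s') * P(s'->s)
alpha1 = {s: emit_x[s] * sum(alpha[s0] * trans[(s0, s)] for s0 in "AB")
          for s in "AB"}
print(sum(alpha1.values()))                     # P(x0, x1) = 0.2025
print(alpha1["A"] / sum(alpha1.values()))       # P(A1|x0, x1) = 1/9
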
GAME THEORY
Dominant strategy (does not depend on the opponent's move)
Pareto optimality
Every finite game has an equilibrium point at which no player wants to unilaterally change strategy (Nash).
However, Nash equilibrium does not necessarily mean the best payoff for all the players involved; in many cases, all the players might improve their payoffs if they could somehow agree on strategies different from the Nash equilibrium: e.g., competing businesses forming a cartel in order to increase their profits.
2x2 game
4p-3(1-p) = -5p +5(1-p) --> 7p-3=5-10p --> 17p=8 --> p = 8/17
4q - 5(1-q) = -3q+5(1-q) --> 9q-5 = -8q +5 --> 17q=10 -> q=10/17
Utility = 4*p*q - 5*p*(1-q) - 3*(1-p)*q + 5*(1-p)*(1-q) =
12pq - 5p - 3q + 5*(9/17)*(7/17) =
12*(8/17)*(10/17) - 5*(8/17) - 3*(10/17) + 5*(9/17)*(7/17) =
(12*8*10 - 5*8*17 - 3*10*17 + 5*9*7) / (17*17) =
(960 - 680 - 510 + 315)/289 = 85/289 = 5/17 ≈ 0.29
(sanity check: the row player's indifference value 4q - 5(1-q) at q = 10/17 is 40/17 - 35/17 = 5/17)
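
Verifying the corrected utility in Python with exact fractions (payoff matrix taken from the coefficients 4, -5, -3, 5 above):

from fractions import Fraction

p = Fraction(8, 17)    # row player's mixed strategy from the indifference equation
q = Fraction(10, 17)   # column player's mixed strategy

utility = 4*p*q - 5*p*(1 - q) - 3*(1 - p)*q + 5*(1 - p)*(1 - q)
print(utility)          # 5/17
print(float(utility))   # 0.294...
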
http://alchemy.cs.washington.edu/ Open Source AI