compsci
Mon 01/22/07
answer to three questions on the next exam: depends.
file structures -- too big for memory
programs will take a while to run.
programs will be hybrids of our in-class work. the ideas are solid
read chapter 1
we are in 10.4
how fast is it? -- well, how many things does it do -- of course a faster machine is faster
for i = 1 to 10
a = b + c + d
End
how fast is that.
go around 100 times -> 10 times slower.
add b = bob(b);
now it depends on what bob is doing.
if we go to n -- how fast is it -- ? n !!
proportional to n
O(n)
that means proportional to n
if it's proportional to 10 then it's a constant - every time i run it, it will take the same time.
proportional to n --> has to do with how big the thing is you're doing -- SNC's n is 2,200, Madison -> 20,000
not precise but pretty close math.
2^n -> slow -- 1 -- is fast
we are going to experiment -- lab reports should be standalone documents --
-------
Wed 01/24/07
You can't take somebody's word for it
Big O
O(1) - constant.
ex:
a=4;
b=2;
c=d;
O(3)
3 * O(1)
do it 10 times and it's still a constant.
this is as fast as you can get.
always dealing with n things.
no matter how big n is, i'm going to find whatever i'm looking for in constant speed. you don't get faster than proportional to 1.
O(n) -- good buddy the "for" loop
for (i=0; i<n; i++) {
if(data[i] == ?)
}
that's O(n) -- function of how big it is.
adding a line of code doesn't change that.
O(n^2)
for (i=0; i< n; i++) {
for (j=0; j< n; j++) {
//
}
}
n*n -> n^2
example - bubble sort.
most simple sorting algorithms are n^2
O(log n)
2^? = n
binary search.
32/(2^5) = 1
log will always be base 2.
n
2
4
8
16
32
64
n^2 grows quickly!
sorting is n^2 --> don't sort :-D
..with that algorithm
remember, fastest thing is proportional to 1. you can have a bad constant of proportionality.
example of O(1) -- array of id numbers --
but you've got to have consecutive ID numbers.
for i=1 to n
a=4;
b=3;
for j=1 to 10
end
end
still n.
but if we use n for the inner loop -- n^2
n/2 * n = n^2/2 --> 1/2 O(n^2) -- that's still n^2
p = n
while p > 1 do
...
...
...
p=p/2
end
32.
16
8
log (n)
matrix
2d array
ADTs
you can use it without worrying about how it was implemented
ex: 2d array
m.put(81, row, col)
value = m.get(r,c)
m(size)
a[size][size]
A[r][c] = value
that's O(1)
return a[r][c]
fast.
that's also O(1)
if the array is big -- well it won't fit.
another way to do it
m.put(v,r,c)
a[n].v=v
a[n].r=r
a[n].c=c
n++;
that's still O(1).
but lets do the get
get(r,c)
for i=0 to n-1
if(a[].v= ...
end
return 0
so it's O(n) to get it out.
but we need to exclude 0.
if(v!=0) then do this.
we also need to figure out how to change a value .. could just add to the end and then search from the end.
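the three-parallel-array idea above (O(1) put by appending, O(n) get by scanning from the end so the newest value for a cell wins) can be sketched like this -- the class name, capacity, and member names are my guesses, not the handout's matrix.cpp:

```cpp
// Sparse matrix as three parallel arrays of (value, row, col) triples.
// put appends at the end: O(1). get scans from the end so the most
// recently added value for a cell wins: O(n). Missing cells read as 0.
struct SparseMatrix {
    static const int MAX = 1000;   // assumed capacity
    int v[MAX], r[MAX], c[MAX];
    int n = 0;                     // number of stored triples

    void put(int value, int row, int col) {
        if (value == 0) return;    // exclude zeros, per the notes
        v[n] = value; r[n] = row; c[n] = col;
        n++;
    }
    int get(int row, int col) {
        for (int i = n - 1; i >= 0; i--)   // newest entry first
            if (r[i] == row && c[i] == col) return v[i];
        return 0;                  // not stored -> zero entry
    }
};
```

changing a value is then just another put: the backwards scan in get finds the newer triple first.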
-------
Fri 01/26/07
handout on G:
matrix.cpp
O(1) - constant, really fast
O(log n)
O(n)
O(n log n)
O(n^2)
stack - abstract data type.
last thing in is the first thing out.
all inserts and deletes occur on the same end.
array implementation
int stack[100];
stack
a
---
b
---
c
---
top = 2
efficient implementations.
initialize
get something off
put something on
int
del
add
empty
add
top++
stack[top] = value
//check for overflow though
if(top >= maxstack) -> trouble.
this is O(1). "really quick"
cout << ++a;
increment a and then cout it out.
v = s.delete();
Del ()
Return (stack[top])
cool -- but we have to adjust top and check for empty
Empty
(top == -1)
if (top <= -1)
init ---> top = -1
empty
Return (top <= -1)
BUT!! how big do i make it?
int stack[500];
int * stack;
stack = new int[x];
it depends :-)
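the array-stack pieces above (init, add, del, empty, overflow check) collected into one runnable sketch; the size and the bool-returning error style are my choices:

```cpp
// Array stack: init sets top = -1; add is top++ then store (watching for
// overflow); del returns stack[top] and moves top down. Everything O(1).
const int MAXSTACK = 100;
int stk[MAXSTACK];
int top = -1;                        // init: empty

bool empty() { return top <= -1; }

bool add(int value) {
    if (top + 1 >= MAXSTACK) return false;   // overflow -> trouble
    stk[++top] = value;
    return true;
}

int del() {                          // caller checks empty() first
    return stk[top--];
}
```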
top--[c]->[b]->[a]/
empty condition - top=null
add - give me a new node, add data, and change the pointers.
x = new Node()
[ if x == null, PROBLEMS]
x -> data = value
x -> link = top
top = x
Del
if (top == null) empty
temp = top -> data;
q = top;
top = top -> link;
delete q;
Return (temp);
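the linked version of the same stack, roughly as drawn on the board (top--[c]->[b]->[a]/); del returning a success flag is my convention:

```cpp
#include <cstddef>

// Linked-list stack: add links a new node in front of top; del saves the
// data, advances top, and returns the old node to the heap. No fixed size.
struct Node {
    char data;
    Node* link;
};
Node* top = NULL;                  // empty condition: top == NULL

void add(char value) {
    Node* x = new Node;            // (the notes check x == NULL; modern new throws)
    x->data = value;
    x->link = top;
    top = x;
}

bool del(char& out) {
    if (top == NULL) return false; // empty
    Node* q = top;
    out = q->data;
    top = top->link;
    delete q;
    return true;
}
```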
void Sally () {
stack s;
s.add(x);
cout << "hello";
return;
}
static is guaranteed to stay around between calls
-------
Mon 01/29/07
stack
everything proportional to 1
stack used by
recursion
expression evaluation
backtracking -> like gaming, mazes, a bit of AI.
expression evaluation
cout << 1+2*3;
* has higher priority
2^n
A+B^2*D-4
squaring, * D, add A, sub 4.
usually done using postfix notation
postfix and infix
infix: operator (+) is inbetween operands
Postfix - operator is after the operands.
postfix - AB+
() and can evaluate quickly.
A+B*C - infix
ABC*+ - postfix
A+B*C
t[1] = B*C
t[2] = A+t[1]
BC*
AT[1]+
substitute
ABC*+
evaluate:
start at front, go until you hit an operator, pull your two operands and keep going.
BC* T[1] = B * C
t[2] = A+T[1]
answer in t[2]
infix A/B^C+D*E-A*C
postfix -> ABC^/DE*+AC*-
t[1] = b^c
at[1]/de*+ac*-
t[2] = a / t[1]
t[2] de*+ac*-
t[3] = d*e
t[4] = t[2]+t[3]
t[5] = a * c
t[6] = t[4] - t[5]
answer is in t[6]
eval code
loop
stack
case
hit an operator
unstack, perform math, stack answer
keep going
loop
if you're an operand stack it
forever
AB+C*
2 3, 4
(a+b)*c
next token - gets next number/operator
is it the end of my expression no
is it an operand? yes, stack it
b - operand, stack it
must be an operator
pop our operands
t[1] = A + B
stack t[1]
next c .. stack
next * -- pop it
t[2] = t[1] * C
t[2] is stacked
are you done? return the top of the stack.
ABC*+
A+B*C
stack a -- a is 2, so stack 2
b is 3 so stack 3
c is 4 -- stack 4
next token
* is an operator, so pop 4 and 3
stack the 12
next token
+ --> pop operands
2+12=14.
stack the result -- 14
answer on top of stack.
eval -- stacks operands
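a runnable version of the eval loop traced above; restricting operands to single digits (so no tokenizer is needed) is my simplification:

```cpp
#include <stack>
#include <string>
#include <cctype>

// Postfix evaluation: operands are stacked; an operator pops two operands,
// performs the math, and stacks the answer. O(n) in the expression length.
int evalPostfix(const std::string& expr) {
    std::stack<int> s;
    for (char t : expr) {
        if (isdigit(t)) {
            s.push(t - '0');           // operand: stack it
        } else {
            int b = s.top(); s.pop();  // pop two operands...
            int a = s.top(); s.pop();
            switch (t) {               // ...perform the math...
                case '+': s.push(a + b); break;
                case '-': s.push(a - b); break;
                case '*': s.push(a * b); break;
                case '/': s.push(a / b); break;
            }                          // ...and stack the answer
        }
    }
    return s.top();                    // done: answer on top of the stack
}
```

with a=2, b=3, c=4 this reproduces both class traces: "23+4*" is AB+C* and "234*+" is ABC*+.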
so how do we convert from infix to postfix
Postfix
// converts from infix to postfix
loop
x = next token
case
x= ∞ {}
x = operand output(x)
...
incoming priority
^
*/
+-
(
) is a special case -
else // operator
unstack the top operators that have priority over x.
how fast is the eval function - proportional to n
A-(B+D)*E∞
A
put -∞ at start of stack
A is an operand, so output it
- -> an operator
isp - in stack priority
unstack all operators with higher priority
nothing there
so we just stack it
next (
the isp of a - is a 1 and the incoming priority of a ( is a four
sym:      ^   */  +-  (   -∞
incoming: 4   2   1   4   n/a
in stack: 3   2   1   0   -
again nothing to unstack, so stack it
operand .. so
AB
on output
next an operator
isp of ( is a zero -- 0 is not greater than or equal to +, so we stack the plus.
open parens are never unstacked.
a 0 is never ≥ any incoming value.
they get unstacked when you find a matching closing.
we could have made the open paren a special case, but you'd pay for it.
next operand, so move to output
ABD
whenever you find a closing, unstack until you find an open
ABD+
and then delete the open paren
* --> minus sign does not have priority over *, so stack the *
E - operand
output it
ABD+E
∞ - empty and output the stack
ABD+E*-∞
t[1] = b+d
t[2] = t[1] * E
t[3] = a- t[2]
A+B*C^D-E∞
A - operand, move it to output
+ -> nothing to unstack, so go ahead and stack it
b - operand ,move to output
AB
the instack priority is a 1 and incoming (*) is a 2.
no precedence, so the * gets stacked
c - move to output
ABC
^ - nobody has more precedence so stack it
operand - move to output
ABCD
- -> instack is a 3 , incoming is a 1,
so unstack and output until we get down to ≥
ABCD^*+
operand --> output
ABCD^*+E
empty the stack
ABCD^*+E-
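the conversion algorithm with the isp/icp values from class, as a sketch; I check for an empty stack instead of stacking -∞, and operands are single characters:

```cpp
#include <stack>
#include <string>
#include <cctype>

// isp/icp tables: ^ is 3/4 (so it associates right to left); '(' is 0/4
// (stacks easily, never unstacked except by a matching ')').
int isp(char op) {   // in-stack priority
    switch (op) { case '^': return 3; case '*': case '/': return 2;
                  case '+': case '-': return 1; case '(': return 0; }
    return -1;
}
int icp(char op) {   // incoming priority
    switch (op) { case '^': case '(': return 4; case '*': case '/': return 2;
                  case '+': case '-': return 1; }
    return -1;
}

std::string toPostfix(const std::string& infix) {
    std::stack<char> s;
    std::string out;
    for (char x : infix) {
        if (isalnum(x)) out += x;             // operand: straight to output
        else if (x == ')') {                  // unstack down to the open paren
            while (s.top() != '(') { out += s.top(); s.pop(); }
            s.pop();                          // delete the open paren
        } else {                              // operator or '('
            while (!s.empty() && isp(s.top()) >= icp(x)) {
                out += s.top(); s.pop();      // unstack higher priority
            }
            s.push(x);
        }
    }
    while (!s.empty()) { out += s.top(); s.pop(); }  // empty the stack
    return out;
}
```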
A-(B+C)*(D-E*F)-4∞
operand
A
operator - stack
stack (
+ gets stacked
closing paren -- output everything to open paren
ABC+
*
ABC+D
-
ABC+DE
*
ABC+DEF
)
ABC+DEF*-
-
ABC+DEF*-*-
stack -
4 - operand, output it: ABC+DEF*-*-4
∞ - empty the stack: ABC+DEF*-*-4-
e^x^2
so we need incoming priority of 4 instead of 3.
-------
Wed 01/31/07
fortran
** means exponentiation
a**n
ftn -- on the unix box
x^y^z
2^3^2
= 2^(3^2) = 2^9 = 512 in Fortran (right to left)
= (2^3)^2 = 8^2 = 64 in BASIC (left to right)
e^(x^2)
is what we want
not!
(e^x)^2
we need to go backwards
sym   isp  icp
^      3    4
*/     2    2
+-     1    1
(      0    4
-∞    -1   n/a
postfix
operands -> output(x)
every time you got an operator you stacked it
operators that had precedence would get unstacked first
-- so we use 4 for the icp so that we can go backwards
right to left exponents
( --- in stack 0, incoming 4
if(a<b+1 || d< c-4)
true or false
a<b+1
ab1+<
add < to the table
isp - -> .8 icp .8
and > ≠ etc
all of them have to be done last.
|| is last -- so give it a .5
= gets .1 and .15 (we do = from right to left)
for (i=1+b; i< ..
everything in c is an expression.
a=2;
cout << b=a+1; //gives you 3
if(a=b) --> always true
Error checking
eval -
evaluated postfix expressions
AB+C*
had a stack.
if you have more left on the stack, not enough operators
AB+CD-R
if there's nothing to pop for operands -- too many operators
A+*B
postfix -
if you have ( left on stack ..
missing a close paren
if you have ) left, you're missing open parens.
you got too many operators, etc
another component -- queue ..
first in, first out
waiting line.
all inserts and deletes occur on opposite ends.
usually, we add to the rear and delete from the front.
like a line
this is like multitasking.
ADT
q.Add(val)
q.Delete()
q.Empty()
q.Init()
class Queue
Queue T;
Q
[]
[]
[]
[]
[]
[]
ints front and rear
want to add to the rear.
Add
R++;
Q[R] = val;
Del
temp = Q[f]
f++;
Return(temp);
OR . .
Return (Q[f++])
val = Del();
but that's not a very efficient use of the array
we'll make a circular array.
queuesize = 6
r= (r+1) % queuesize
for delete
f = (f+1) % queuesize
full
-
[a]
[b]
[c]
[d]
[e] -r
[g] -f
f = r+1 -- i'm full;
or f= 0 and r = queuesize-1
empty
f=r+1;
f=r-1;
no no no!
can't tell you when i'm empty!!
so . . .
define empty as f=r.
it will cost us 1 location in the array.
f = 1 less than real front.
new full condition:
f= r +1
or
f == 0 && r == queuesize - 1
Add
r = (r + 1) % queuesize
If (f == r) full
Q[r] = val
Del
f = (f + 1) % queuesize
Return Q[f]
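the circular-array queue in one piece: empty is f == r, f sits one slot behind the real front, and one array slot is sacrificed so full is distinguishable from empty. QUEUESIZE of 6 matches the example; the bool-returning style is mine:

```cpp
// Circular-array queue. f trails the real front by one; empty is f == r;
// add refuses when advancing r would make f == r again. All ops O(1).
const int QUEUESIZE = 6;
int Q[QUEUESIZE];
int f = 0, r = 0;                 // init: empty (f == r)

bool qEmpty() { return f == r; }

bool qAdd(int val) {
    int nr = (r + 1) % QUEUESIZE;
    if (nr == f) return false;    // full: one slot is sacrificed
    r = nr;
    Q[r] = val;
    return true;
}

bool qDel(int& out) {
    if (f == r) return false;     // empty
    f = (f + 1) % QUEUESIZE;
    out = Q[f];
    return true;
}
```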
talk to com ports
$A4, length, data, checksum (byte)
friday - linked list version
-------
Fri 02/02/07
f=r
circular array queue
linked list queue
[c]->[b]->[a]/
adding
add to the rear
give us a new node x, put the new value in, change r's link to x, and change r to x.
delete from front
q = f->data
eventually
Return (q)
but we can't change the f
we can't get back.
f r
[a]-> [b]-> [c]->
we just flip the labels around and we're good
q = f -> data
t=f
t
f = f->link
delete t
return q
add to the rear
x = new Node;
x->data gets value
x->link gets null
r->link = x
r= x
how we gonna represent empty
say f=null is our empty
but that makes adding a problem
if f=null
f=r=x
else r->link=x
delete --
//(unstack)
if f=null, then we're empty and i've got nothing to give
else [
q = f->data
t= f
f=f->link;
delete t
return(q)
]
what if the queue has only one value
delete
q='a'
t = f
f = f->link // so f is null
delete t
return q
so r points to nowhere but that's ok because we set r when we come in to the add function.
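the linked queue from the board as one sketch -- f == NULL is empty, the first add sets both f and r, and the dangling r after the last delete is harmless because add tests f first:

```cpp
#include <cstddef>

// Linked-list queue: add to the rear, delete from the front.
struct QNode {
    char data;
    QNode* link;
};
QNode* f = NULL;                    // front (NULL means empty)
QNode* r = NULL;                    // rear

void qAdd(char value) {
    QNode* x = new QNode;
    x->data = value;
    x->link = NULL;
    if (f == NULL) f = r = x;       // first node is both front and rear
    else { r->link = x; r = x; }
}

bool qDel(char& out) {
    if (f == NULL) return false;    // empty: nothing to give
    QNode* t = f;
    out = t->data;
    f = f->link;                    // NULL again if that was the only node
    delete t;
    return true;
}
```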
A= ((((((((((((((()))))))))))))))
they match but it won't compile.
A= (((((((((((((())))))))))))))
does compile
so you're limited to 14 because they use an array stack.
Bob
gas station
two pumps -- should i add another pump
write a program that takes cars coming by based on some statistics; they come in and go to the shortest queue; generate some random numbers to figure out how long they stay here
we can compute
change one thing by adding another pump and see what happens.
1 - acquire statistics
need to know the average waiting time at a pump
need to know arrival rate -- how frequently does a car come in
that should be it for now
a customer arrives every six minutes on the average
the average wait at the pump is 5 minutes - that's how long you stay at the gas station.
we need random numbers, pseudo random number generators.
have a variable called time -- minutes -- every click of the clock is going to be a minute .. start at 1 and go through a whole day
see handout
event list
leave when done pumping
we don't allow switching of q's
time arrival/departure what q
30 A
32 D    34 A
35 A
we don't know what queue they're going into until they get there.
little variable time gets bigger and bigger
time 30 -- we get an arrival.
time 30
we get an arrival
so we queue it at the shortest line -- say q 2
32 -- departure
we keep track of the pump time in the q
in queue
id,
what q
when you got the pump
arrival time
when it's done, you leave and the next guy gets the pump - record when he gets the pump
34 - arrival
35 -
at the end of the day we look at our stats and compare what we got.
can simulate a lab, stoplights.
this is going to work .
first arrival
step time by 1
time 1 arrival
step time by 1
time 5 -- we will have another arrival
start over
queue
event list
what rand does -
gives you integers in the range of zero to RAND_MAX
predefined constant called RAND_MAX
rand() % 6 +1
that gives us uniform distribution.
we want a uniform distribution that averages to 6
we can get random numbers from zero to 12.
-------
Mon 02/05/07
random numbers
rand()
numbers from 0 to rand_max
normally rand is from 0 to less than one.
x = int(Rand() * 6) +1 // excel rand.
0 to .99999999999
0 to 5.9999999999
int
0 to 5
from 1 to 6
random number generator
usually generators are multiplicative.
once you have one random number, the next one is a function of the previous plus a couple magic numbers.
repeats after you generate another of the same random number
also something called a seed .. the random number is a function of the last one --
how do you get the first one.
static seed = .51917
if you have a static it stays around the length of your program.
other variables disappear at the end of their scope.
while (1)
cout << rand();
srand();
we like the numbers to be the same each time when debugging. easier that way.
float Rnd()
0 ≤ x <1
return (rand()/(float)(RAND_MAX + 1));
uniform distribution (flat)
bell curve way "normal distribution"
Poisson distribution
chop off the left tail. -- the negative part.
so for bob
average is 6
when is the next arrival going to happen.
according to the Poisson curve.
Poisson code
float Poisson (float mean)
{
return(-mean * log(Rnd()));   // ln is log() in C
}
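Rnd and the interarrival formula together, runnable; the srand seeding and the nextArrival helper are my additions (and note -mean*log(U) is strictly an exponential interarrival time, which is what a Poisson arrival process produces):

```cpp
#include <cstdlib>
#include <cmath>

// Random interarrival times for the gas-station simulation.
float Rnd() {                       // uniform, 0 <= x < 1
    return rand() / (float)(RAND_MAX + 1.0f);
}

float Poisson(float mean) {         // minutes until the next arrival
    return -mean * log(Rnd());      // ln is log() in C
}

float nextArrival(float now, float mean) {
    return now + Poisson(mean);     // schedule the next arrival event
}
```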
linked lists
top->[a]->[b]->[c]/
class Node {
public
char data;
Node * link;
}
Node * top;
Node * x;
x = new Node;
makes us a brand new node pointed to by x;
delete x;
x -> data= __ ;
(*x).data = __ ;
x -> link = __;
(*x).link = __ ;
Data(x)
Link(x)
will be used now on the board.
printlist *top
q = top
while (q != null)
Print Data(q)
q = Link(q)
end
end
void Printlist (Node * top) {
Node * q;
q = top;
while (q!= null) {
cout << q -> data;
q = q -> link;
}
}
GetNode(x)
ReturnNode(x)
top->[a]->[b]->[c]-/
pointers point to the whole node
data link
0 A 1
1 B 2
2 C -1
array representation of the heap
Printlist(top)
Printlist (z)
avail
linked list of every memory box that's free
with that avail list -- everyone is on some list -- no memory leak.
getnode(x)
data(x) = w
link(x)= -1
adding -- stack
getnode is really an unstack of avail.
getnode
free = avail.
avail = link(avail);
return free;
but if avail = -1, then the heap is full
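the array heap in one piece -- parallel data/link arrays, -1 as the null pointer, avail as a linked stack of free nodes; the size and function names are assumed:

```cpp
// Array representation of the heap: getnode is an unstack of avail,
// retnode stacks a node back on avail. Subscripts act as pointers; -1 is NULL.
const int HEAPSIZE = 100;
char data[HEAPSIZE];
int  link[HEAPSIZE];
int  avail;                       // head of the free list

void initHeap() {                 // first, put everything on avail
    for (int i = 0; i < HEAPSIZE - 1; i++) link[i] = i + 1;
    link[HEAPSIZE - 1] = -1;
    avail = 0;
}

int getnode() {                   // unstack a free node
    if (avail == -1) return -1;   // heap full
    int freeNode = avail;
    avail = link[avail];
    return freeNode;
}

void retnode(int x) {             // return a node: stack it on avail
    link[x] = avail;
    avail = x;
}
```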
-------
Wed 02/07/07
the heap
class node
char data
node * link
you own heap
everything on avail
avail = 0
build top->[a]->[b]/
getnode(top)
top = 0
avail = link(avail) ..
avail = 1
data(top) = a
link(top) = NULL .. puts a -1 in the link.
Getnode(x)
avail = 2 .. link(1)
data(x) = 'b'
link(x) = null
link(top) = x
we build on everything.
what if you've got more fields
name
major
gpa
advisor
link field
then we'd have more parallel arrays
top - [a]->[b]->[c]/
problem --- we can go this way ---> but not this way <---
one way --- stack and then print.
so we put backward pointers as well
so
Node * forward and
Node * backward
in our array based representation -- add another field called "backward"
print data(p)
f = forward(p)
data(f)
f = p -> forward
b = backward(p)
doubly linked list.
we'll treat it as an abstract type
with a singly linked list, we can't delete a random node w/o knowing anything else.
but what do we do about top previous.
use a header node.
[a] [b] [c]
top points to the header node -- the data field is junk.
link it up (forward)
we're going to go forward.
and we'll make it circular.
while p is not top, .....
..
end while
empty list -- link of top == top
now double link the list.
same printlist for forward.
for back , just change link to back -- or forward to back.
top
forward - points to top
backward points to top
top -> back = top
top -> front = top
Node * top;
top = new Node;
top->front = top->back = top;
InsertAfter(p, datav)
{
GetNode(x)
Data(x)=datav
Front(x)= Front(p)
Back(x) = p
Front(p) = x
Back(Front(x)) = x
}
empty list
if you want to add a node, the only place p would be is the header node.
o(1)!!!!!!
insertBefore(P, data)
insertAfter(Back(p), data)
End
we may be asked to do the reverse on the test.
Delete( p, top)
If p = top, get out
Back(Front(p)) = Back(p)
Front(back(p)) = Front(p)
RetNode(p)
End
here you're already there.
so why not always use doubly linked lists -- well, you pay an extra pointer per node.
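the doubly linked circular list with a header node, as built up above; InsertBefore(p) is just InsertAfter(Back(p)). The struct and function names are mine:

```cpp
#include <cstddef>

// Doubly linked circular list with a header: when empty, the header's front
// and back point at itself. InsertAfter and Delete are both O(1).
struct DNode {
    char data;
    DNode* front;   // forward pointer
    DNode* back;    // backward pointer
};

DNode* makeList() {               // header node; its data field is junk
    DNode* top = new DNode;
    top->front = top->back = top;
    return top;
}

DNode* insertAfter(DNode* p, char datav) {
    DNode* x = new DNode;
    x->data = datav;
    x->front = p->front;
    x->back = p;
    p->front = x;
    x->front->back = x;           // Back(Front(x)) = x
    return x;
}

void del(DNode* p, DNode* top) {
    if (p == top) return;         // never delete the header
    p->front->back = p->back;
    p->back->front = p->front;
    delete p;
}
```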
-------
Fri 02/09/07
top -> [a]->[b]->[c]/
copy the list.
(shallow copy -- b=top)
when done we should have a whole separate linked list.
printlist problem
p=top
while ( p!= null)
new node
data(x) = data(p)
link(x) = next value :-(
p = p->link;
}
that's a little messy
copy(top)
If top=null return null;
otherwise
getnode(x)
DATA(X) = data(top)
link(x) = copy(link(top))
.....
return(x)
copy is supposed to copy a list, so it returns a pointer that's a copy of the list.
recursive copy.
activation stack
individual parts are called the activation records
every call allocates
all parameters, any local variables, RA
every return, you deallocate (unstack)
all reference to the stuff that we're stacking is through the top of the activation stack.
for a b c list
give address 91, 21, and 6 to the nodes
so top = 91
stack
top x ra
91 ?-- x M
....
normal recursion picture - i'll draw it for you if you like
recursion is powerful, but it costs you -- you pay for it.
top -> ....
b = copy(top);
we could run out of the activation records
so let's simulate the recursion.
1) maintain our own stack {Add, Del, Push, Pop}
On every call, stack our current values (all param, locals, ra)
2) set new parameters
3) branch (jump) to the beginning
on every return,
1. if our stack is empty, then we'll use the normal return
1.5 if we're dealing with a function, then i'm gonna save the function return values
2. unstack everything -- params and locals, and then branch to the return address.
3.5 - if function whereever we're branching to , i need to process the return value.
we'll actually reference the parameter and the local variable itself now -- why stack them if we don't need them back
so . . convert copy(top) to a non-recursive routine that does the same thing
Function copy(top)
init stack
if { top is null then // Return(Null)
if s.empty then return (NULL)
save = Null // need to save the return value
unstack (top, x, RA)
goto(RA)
}
Getnode (x)
Data(x) = Data(top)
// copy(link(top))
stack (top, x, RA (1))
top = link(top);
goto begin;
(1) Link(x) = save
// Return(x)
If s.empty then Return(x)
save = x
unstack(top, x, RA)
Goto RA
end
to refine this -- the first part down to goto begin is a loop
so
init stack;
while (top !=null) {
Getnode(x)
Data(x) = Data(top)
stack (TOP, x, (1))
top = Link(top)
End
if s.empty then Return(Null)
save = null
unstack (top, x, ra)
goto ra
(1): Link(x) = save
If stack empty then return (x)
save= x
unstack (top, x, ra)
goto ra
but we can do better
the only possible value for ra is (1)
so get rid of "goto ra", change to goto 1
and now we're not using ra anymore, so don't stack it
but the last part .. we can get rid of the goto
watch this.
x=Null
while ! stack empty do
save
unstack
link of x
end
return x
so.
loop from up top
plus
x=Null
while ! stack empty do
save
unstack
link of x
end
return x
but why are you stacking top if you're not going to use it.
to get rid of top
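the fully refined non-recursive Copy, as a sketch: the first loop is the "goto begin" part (it no longer stacks top, per the last remark), the second loop plays back the stacked copies, wiring each link to "save":

```cpp
#include <cstddef>
#include <stack>

// Non-recursive list copy. Pass one: walk the list, build a copy of each
// node, stack the copies. Pass two: unstack, linking each copy to the one
// made after it (the old "save" variable).
struct Node {
    char data;
    Node* link;
};

Node* copyList(Node* top) {
    std::stack<Node*> s;
    while (top != NULL) {              // the "goto begin" loop
        Node* x = new Node;
        x->data = top->data;
        s.push(x);                     // stack x; top isn't needed back
        top = top->link;
    }
    Node* save = NULL;                 // the Return(NULL) case falls out here
    while (!s.empty()) {
        Node* x = s.top(); s.pop();
        x->link = save;                // Link(x) = save
        save = x;
    }
    return save;                       // head of the copied list
}
```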
-------
Mon 02/12/07
Recursion -
removing recursion
tree
- formal definition: a finite set of one or more nodes such that:
1) There is a special node called root
2) all the nodes are partitioned into N≥0 disjoint sub-sets, where each subset is a tree.
Root
Parent
Child
Sibling
"tree is a finite set" - it terminates -- that's good. a tree quits sometime
one or more nodes
one of them called root.
all the other nodes are partitioned into n≥0 disjoint (really important) sub-sets where each subset is a tree.
Root (A)
Nodes a b c d
----- B
----- C
but then b and c are trees
so D can be a subset of B
disjoint means that the subtrees don't connect. -- there is only one path to a given node.
one special node called root
everything else is the subset
the root is the parent of everybody, then everybody is the child -- or great grand child
think of a family tree.
usually we draw them downward in computer
and every one of these subsets is in fact a tree.
just one node is a tree -- a root with no children
who are the children of c? -- fg e -- they're one level down from C
given any node, there's only one parent -- one way to get there
sibling -- you've got the same parent
"all other nodes are partitioned into N ≥0" --
disjoint -- can't connect multiple times. -- only one path to a node
subsets are trees.
in definition, there's no implied ordering. -- you can move the siblings around under the parent.
corporate tree
sports conference
how will we represent that
could have a node with a lot of pointers that point to the children.
a node has a variable amount of link fields.
binary tree
every node splits two ways -- but you can have empty subtrees.
we will convert other trees to binary trees.
Finite set of nodes which is either empty or has a root with two links.
and both subtrees are binary trees.
Level 1 -- root
level 2 - children or root
etc.
powers of 2.
2^(level# -1)
Max number of nodes in a tree with n levels is 2^n - 1
every node splits two ways .
An implementation
Data [A|B|C|D|E|F|G]
that's a tree
Root =1
left = pos *2
right = pos*2+1
parent = pos div 2
these rules recreate a tree.
you can't do any better than this for a tree.
but if we give F a child,
that child, h is at 13 -- and we have tons of blank memory cells.
so this only works well for a dense tree
this is called the linear representation of a binary tree
root -> [left|data|right]
its a linked listish tree
class Node {
public:
char data;
Node * left;
Node * right;};
Node * root = null;
Visit(Root) {
if root = null, return
cout << Data(root)
Visit(left(root))
Visit(right(root))
}
remove recursion for this... maintain our own stack
Visit(Root)
Init(stack);
Begin:
If Root = null, then //return
{
(3) if s.empty() then return
unstack( __, __, ra)
goto ra
}
print data(root)
// visit (left(root))
stack(root, ra (1))
root = left(root)
goto begin;
(1)
//visit(right(root))
stack(root, (2))
root = right(root)
Goto Begin
(2) // Return
Goto (3)
Visit(root)
Init Stack
Begin:
while(root ≠ null) {
print Data(root)
stack ( root, (1))
root = left(root)
}
(z)// root = null
If s.empty then return;
unstack (root, ra)
goto RA
(1)
stack (root, (2))
root = right(root)
Goto begin
(2) Go to (z)
but .. if i go to 2 i go to z, so just make z a 2.
begin.. go to begin -- is a loop forever
goto ra from 2 -- has to be either a 1 or 2.
but we throw away the twos that we stack, so don't stack them
and then the goto becomes goto 1
which is the next statement
but now we don't use ra anymore, so don't stack it
that looks pretty nice.
-------
Wed 02/14/07
binary trees -- every node has a right , a left, and a data field
all nodes are binary trees.
visit/ process a tree
preorder - visit first, then you go left and right
inorder - visit in between
postorder visit last
preorder
starting at root
Visit data(root)
preorder(left)
preorder(right)
return
in order
visit in between
postorder
visit last
left
right
visit
post order doesn't refine as nicely.
nonRecPreOrd(Root)
init stack
r=root
loop
while (r != null)
visit Data(r)
stack (r)
r = left (r)
end
if stackempty then return
unstack(r)
r=right(r)
forever
try it
in order -
init stack
r=root
loop
while (r != null)
//visit Data(r)
stack (r)
r = left (r)
end
if stackempty then return
unstack(r)
visit Data(r)
r=right(r)
forever
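the same traversal loop in C++ (inorder shown; moving the visit up to before the stack() call gives preorder). TNode and the string-building visit are my scaffolding:

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Non-recursive inorder traversal: run left stacking nodes, then unstack,
// visit, and go right. The only difference from preorder is where the
// visit happens: before stacking (pre) or after unstacking (in).
struct TNode {
    char data;
    TNode* left;
    TNode* right;
};

std::string inorder(TNode* root) {
    std::string out;
    std::stack<TNode*> s;
    TNode* r = root;
    for (;;) {                       // loop ... forever
        while (r != NULL) {
            s.push(r);               // stack(r)
            r = r->left;             // r = left(r)
        }
        if (s.empty()) return out;
        r = s.top(); s.pop();        // unstack(r)
        out += r->data;              // visit Data(r) -- inorder position
        r = r->right;                // r = right(r)
    }
}
```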
post order doesn't refine as nicely.
parallel arrays
grow a tree
tree
pointer somewhere in middle of tree
add a new node as the immediate right subtree of the node pointed to by p
InsertRight(P, data)
Getnode(x)
Left(x) = null
Right(x) = right(p) // in our example that's null at this point but this way it'll work in the middle
data(x) = data;
right(p)=x
InsertLeft( P, data)
Getnode(x)
Right(x) = null
Left(x) = left(p) // in our example that's null at this point but this way it'll work in the middle
data(x) = data;
left(p)=x
delete
we want to know p -- but we gotta know parent.
we've got problems too if we delete a child that has children.
if we want all the children and grandchildren of a node gone,
we can delete doing a postorder.
that deletes the last one first.
... windows folders
InsertRight
InsertLeft
"del"?
delete command
just
delete root; -- put it back to the heap
Mary
Joe Sam
Bill Fred Wanda
Bill
Joe
fred
mary
sam
wanda
in order traversal .. is in order!!
binary search tree
Left of P has values < Data(P)
Right of P -- values ≥ Data(P)
(fred was in the wrong place)
sally would be sam's left child
O(n log n) to build
so how do you build one of these
r = root
if (value > Data(r))
R = Right(r)
--- if that's null, insert right of r
else
R = Left(R)
--- if null --- insert left(r)
build root -- put max_int for value.
binary search tree.
but if the tree is real lopsided -- it's more like n/2
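a sketch of growing the search tree as described -- less goes left, greater-or-equal goes right, new nodes always land at the bottom. I use a recursive insert for brevity; inserting the class's names and walking inorder gives them back sorted:

```cpp
#include <cstddef>
#include <string>

// Binary search tree: Left of P has values < Data(P), Right has >= Data(P).
struct BNode {
    std::string data;
    BNode* left;
    BNode* right;
};

BNode* insert(BNode* root, const std::string& value) {
    if (root == NULL) {                     // fell off the tree: add here
        BNode* x = new BNode;
        x->data = value;
        x->left = x->right = NULL;
        return x;
    }
    if (value < root->data) root->left = insert(root->left, value);
    else root->right = insert(root->right, value);   // >= goes right
    return root;
}

std::string inorder(BNode* r) {             // inorder traversal is in order!
    if (r == NULL) return "";
    return inorder(r->left) + r->data + " " + inorder(r->right);
}
```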
-------
Fri 02/16/07
big O
O(1) - flat
O(log n)
O(n)
O(n log n)
O(n^2)
ADT
Matrix -- a mostly empty (sparse) matrix takes up a lot of empty space
stack - array, linked list
postfix eval
state diagrams -- show it working.
don't memorize the postfix and eval code -- understand how they work, well enough that given enough time you could reconstruct them.
icp and isp
postfix algorithm works because of those values.
queues
^ 3 4
*/ 2 2
+- 1 1
( 0 4
-∞ -1 n/a
say you're in an environment that uses one priority table, not two - what does that imply about the expressions it's handling?
then we'd need a special case for (
exponent would have to be a special case (can't do it) as well.
also can't handle a=b=c
Queue - array and linked list
major application: simulation.
bob's gas
arrivals
departures
event list
arrival patterns - uniform, bell curve, Poisson curve
random #'s
array representation of linked list
int heap
getnode
retnode
parallel arrays -- data and link field
subscripts became pointers. avail was a pointer to all the free nodes.
doubly linked
access methods
add,
addafter
addbefore (today is yesterday's tomorrow)
delete
used header node and circular to make the header and delete procedures easier.
recursion and linked structures.
copy a singly linked list
removing recursion
given a recursive procedure, come up with a non-recursive version
General trees
... binary trees
traversal
preorder
inorder
postorder
references
chapter 6 - linked list
6.6 - array representation
chapter 7 - stacks
postfix and eval
chapter 8 - simulation
chapter 10 - recursion
10.3 implementing
11 - doubly linked lists
11.3
12 - binary trees
short answer.
no fillins, no tf, no multiple guess
short questions after ideas.
coding in our pseudo language.
Print(v) {
if(v≤0) return;
cout << v;
print(v-1);
return
}
print(4)
what we did in lab ... important
fill lab.
when you get an operator, you should unstack whatever has greater priority than what is coming in.
Binary search tree
L <
R >
Search(root, value)
(if we don't find it, return 'null')
R= root
While (R ≠ Null)
If (val < Data(R)) R = Left(R);
else if (val > Data(R)) R = Right(R);
else [Found it at "R"; Return(R)]
End
Return (Null) // not found
End
look for 9
speed of search -- log n provided that the tree is nicely shaped.
could have a lopsided tree
-------
Mon 02/19/07
array representation of a linked list
first, put everything on avail.
getnode gives you a free node from the front of the list of avail ..
basically an unstack
x = avail;
avail = link(avail);
Delete
data array and link array
pointers became subscripts.
link(top) = link[top]
printlist
print data[top]
top = link[top]
traversing a binary tree
in order -- not going to visit until you do the whole left.
doubly linked list.
with header node
deleting
if p != header node
left(right(p))=left(p)
right(left(p))=right(p)
matrix -
2d array.
Get();
Put();
you could make this a 2d array.
but bad for a sparse matrix.
we tried 3 1-d arrays
adding was fast
searching was slow
Postfix
A+B+C ..
ABC*+4,7^-
stack operators
unstack something that has priority.
while (isp(top) ≥ icp(token))
unstack operator
output
end
stack(token)
isp, icp
^ 3 4
*/ 2 2
+- 1 1
( 0 4
-∞ -1
operand to output
unstack if higher priority
empty the stack at the end.
< done after +, -, so between 0 and 1.
( at bottom because we only unstack at special case.
and OR is later than the greater than, less than.
removing recoursion
stack parameters and locals, and ra.
return - unstack.
...
every time you make a call, stack current values (parameters, locals, ra)
set new parameters
jump
return . . .
unstack and go to ra
distributions
uniform
vs.
Poisson
-------
Wed 02/21/07
binary tree
[left][data][right]
7 nodes
14 link fields
only one way to get to a particular node.
root is a pointer, but its not a link field.
6 nodes pointed to by a link field
we've got more null links than not null.
some folks think we should make use of those null links
call them threads -- they can point to another node.
right threads -- point to the inorder successor
in order traversal
if empty, stop
left
visit
right
non-recursive version
InOrder(Root)
init stack
loop
while root ≠ null
stack(root)
root = left(root)
end
if s.empty, return
unstack(root)
visit root
root = right(root)
forever
end
every place we have a null right, point it to the in order successor.
if the last thing you did was visit a node
say we just visited b -- goto the right of b
that's a
by definition, it's an in order successor, so we stop
now i just visited a
go right -- go left as far as you can .. get to l.
header node
new format -- left, data, right, bool thread
INOrderSucc(P)
right (one)
if thread then stop
left as far as possible
now we can do an in order traversal without a stack.
InOrderSuc(P)
s = right(P)
if (p.thread) Return (s)
while (left(s) != null) do
s = left(s)
end
return (s)
end
P=root
loop
P = InOrderSucc(p)
if p = root return
visit p
forever
try it out
BST
insertLeft
getnode(x)
left(x) = NULL
right(x) = p
thread(x) = true
left(p) = x
or so
InsertRight(p, val)
//
//
Getnode(x)
data(x) = val
left(x) = null
right(x) = right(p)
thread(x) = thread(p)
right(p) = x
thread(p) = false
costs us two statements to build.
InsertLeft(P, val)
getnode(x)
data(x) = val
at the end of the tree (left(p) = null)
Left(p) = x
Left(x) = null
Right(x) = p
Thread(x) = true
every node with a non null left is pointed to by a thread
we gotta find the thread that pointed to the parent .. it's an arbitrary distance away and we gotta point it to the new node.
if left(p) ≠ null
q = left(p)
while !thread(q)
Q = right(q)
end
Left(x) = left(p)
right(x) = p
thread(x) = true
left(p) = x
right(q) = x
so adjusting the thread is proportional to n
but a binary search tree never adds in the middle
so then we never run into this problem.
Take A and point it to the first child
the other children of A are going to be hooked up on the sibling pointer
Put right threads in
flip the tree 45 degrees
empty sibling points to parents
so to get the parents, follow the sibling pointers until you get the parent.
any tree you draw, you can come up with
wanna save the binary tree.
how????
combine pre and in
A is first node
then all the nodes before the root are on the left
well who's the root of left?
b because its visited first.
p = p -> link; // the suck
slow!!
L Data R arrays
binary write the whole structure at once.
so all you need is an array.
Sort algorithms
binary search tree
n * log n
can't sort unless you can search
Indexed search
search index -- index is going to be parallel.
search array for key value -- if you find it return the position where you found it .. otherwise, return status indicator
index and n
void Search (val) {
p = -1
for (i = 0; i < n; i++)
if (index[i] == val) { p = i; return p; }
return p; // -1
and a for is really a while
there are two decisions
can we make only 1?
yes -- put the if statement as the stopping condition
what if its not found?
put it in index[n]. we're not using n -- the data only uses 0 to n-1.
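a runnable sketch of the one-decision search (function and variable names are mine, not from class):

```cpp
#include <vector>

// sentinel sequential search: store the value we're looking for in the
// unused slot index[n], so the loop needs only one stopping condition
int search(std::vector<int>& index, int n, int val) {
    index[n] = val;              // the sentinel; real data lives in 0..n-1
    int i = 0;
    while (index[i] != val) ++i; // guaranteed to stop by slot n
    return (i < n) ? i : -1;     // -1: we only found the sentinel
}
```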
-------
Fri 02/23/07
threaded tree
right threads
advantage: process the tree with an inorder traversal without a stack
doesn't cost us much (except insertLeft in the middle of the tree)
binary search tree -- always add at end of tree.
Converting from general to binary
take our node -- instead of left and right pointers, have first child and sibling.
-------
Mon 02/26/07
sequential search
faster sequential search
proportional to n
binary search
bring in array, value
return position
probe in the middle
stopping condition -- if low is ever bigger than high
low is the lower bound of where we are searching, high is the top end.
should be on g:
if the value is less than the probe, keep the low and redefine the high
otherwise, if greater than, do it the other way
otherwise it's equal to, so you found it.
finding 32
0, 9
so first probe at 4
n=15
well constructed binary search tree // full tree
8
4 12
2 6 10 14
1 3 5 7 9 11 13 15
optimal tree
how fast -- log(n)
but we can be more precise
actual average search time is log(n) - 1
because most of your searches require four looks (the bottom level holds about half the nodes).
average for the first 3 levels:
1*1 + 2*2 + 4*3 = 17
/ 7 = 2.4
four levels:
(17 + 8*4) / 15 ≈ 3.3
gets closer and closer to log (n) -1
for the average search.
biggest yeahbut about the binary search -- the data has to be ordered.
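the probe-in-the-middle loop as a runnable sketch (0-based array; naming is mine):

```cpp
#include <vector>

// binary search: probe the middle, keep low and redefine high (or the
// reverse) until low passes high. assumes data is already ordered.
int binarySearch(const std::vector<int>& data, int val) {
    int low = 0, high = (int)data.size() - 1;
    while (low <= high) {               // stop if low is ever bigger than high
        int mid = (low + high) / 2;     // probe in the middle
        if (val < data[mid]) high = mid - 1;
        else if (val > data[mid]) low = mid + 1;
        else return mid;                // otherwise it's equal: found it
    }
    return -1;                          // not found
}
```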
int Luck(int data[], int n, int val) {
int x;
do {
x = rand() % n;
if (data[x] == val) return (x);
} while (1);
}
sequential:
best: 1
worst: n (at the end, or not in there)
average: n/2
binary search:
best: 1
worst: log(n)
average: log(n) -1
binary search tree:
best: 1
worst: n
average:
luck:
best: 1
worst: ∞
avg: pretty good -- because there's no overhead.
O(n) (?)
best algorithm - index search -- no search! -- O(1)
***** (don't forget this)
yeahbuts - needs dense part numbers
not necessarily start at 0, but dense.
address calculation -- data[x-100]
Hashing
keys are not dense
key  probe
41   11
55   13
12   12
42   2
60   5
90   0
put in array
probe: key % tablesize
return (probe)
probe -- works better if you use a prime number for the table size
one way to handle two things for a probe -- bucket
we'll look at chaining
and progressive overflow.
chaining -
probe = key % table_size -- we want prime number (why?????) for table size
36 goes to 1
22 goes to 1
31 goes to 3
make a linked list at each spot.
FindKey(key)
probe = key % tablesize
p = table[probe]
then walk the list -- a while-loop sequential search.
return a pointer to where you found it
best case = 1
worst = n
avg = 1.5 -- according to "them"
addkey
probe = key % tablesize
and add to front
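a minimal chained-table sketch -- table size 7 (a prime) is my assumption to match the 36/22 -> spot 1, 31 -> spot 3 example, and std::list stands in for the hand-built linked list:

```cpp
#include <list>
#include <vector>

const int TABLE_SIZE = 7; // a prime, as suggested in class

struct ChainTable {
    std::vector<std::list<int>> table;
    ChainTable() : table(TABLE_SIZE) {}

    void addKey(int key) {
        int probe = key % TABLE_SIZE;
        table[probe].push_front(key);  // add to front: O(1)
    }
    bool findKey(int key) const {
        int probe = key % TABLE_SIZE;  // hash to a spot...
        for (int k : table[probe])     // ...then sequential search the list
            if (k == key) return true;
        return false;
    }
};
```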
wed. progressive overflow
-------
Fri 3/02/07
lab help
find the depth -- 16 or 17.
writing the file
reading it -- using getline
fstream df;
df.open("bob",ios::out);
cout at the end.
while loop in your depth program -- not going to work.
see handout.
divisor method: probe = key % tablesize; -- cheap, but it doesn't always give very good results.
you can do almost anything and hashing works.
so the whole idea is to do something to the key and hope that it distributes your data nicely.
another technique - mid-square
square the key
extract the middle digits
example
100,000 - 999,999
7,000 are unique
we'll use a table size from 0 - 6999
so square the key
172148
029634933904
you get no more than a 12 digit number back.
now remove the middle four digits
how many do i remove? -- if my table size is 7000, i need to remove four in order to represent 7000
take num / 10000 = 02963493
probe = 02963493 % 10000 .. gives us 3493
but we need a range of 0 to 6999
so multiply by .7 aka 7000/10000
so the divisor method gets the probe right off the first statement
this one is a bit of work but may give you better results.
we have 7000 unique probes.
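the mid-square steps above as a sketch -- 64-bit math because the square can be 12 digits, too big for 32 bits:

```cpp
#include <cstdint>

// mid-square hashing for six-digit keys (100,000 - 999,999)
// mapped into a 0 - 6999 table, as in the class example
int midSquare(std::int64_t key) {
    std::int64_t sq = key * key;              // e.g. 172148^2 = 029634933904
    std::int64_t mid = (sq / 10000) % 10000;  // drop the low 4, keep the middle 4
    return (int)(mid * 7000 / 10000);         // scale 0-9999 down to 0-6999
}
```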
collisions are bad -- we get sequential searches.
if you get a collision, rehash -- maybe you hash with one function (divisor), then take a different function, rehash, and use that as the offset to your position.
.. and one rehash is good -- it's not worth it to rehash again.
can't let hashing go, because its average performance is proportional to 1.
but the worst case is really bad.
find pankratz in the phone book using hashing -- you can't .. nice try just using the string directly.
but you can sum the letters and use that to get your probe.
yeah but ABC = CBA.
we want to involve the positions as well.
so multiply by 100 each time.
but then our smallest key is 656565
but if we subtract 'a' -- then we can get to zero.
well we still have big numbers
why not multiply by 2 instead -- then we have something that screams, sure we'll have collisions that way (?)
shift ---> << operator ????
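one way to sketch the shift idea -- the single-bit shift and table size here are my choices, not the class's:

```cpp
#include <string>

// position-aware string hash: subtract 'a' so the smallest key can reach
// zero, and shift left one bit per letter so "abc" and "cba" differ
unsigned stringProbe(const std::string& word, unsigned tableSize) {
    unsigned h = 0;
    for (char c : word)
        h = (h << 1) + (unsigned)(c - 'a');  // shift, then add this letter
    return h % tableSize;
}
```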
what if you were going to write a speller
here at snc we have 400,000 words -- no word is longer than 6 characters.
we want to look up the words.
fast!
we could do a 26 branch tree instead of a binary tree
that would be 26^6 nodes!
to print it, you'd have to use recursion
could have a * at the end of every word to differentiate car from carry
sorting.
bubble sort.
you could build a linked list.
also n^2
add to array
also n^2
binary search tree - for nice looking tree log(n) -- did that for every node, so n log(n)
we're going to look at three more sorts
insertion (fastest for small n proportional to n^2)
quicksort (fastest for big n)
heapsort (in the middle ..)
quicksort is n * log(n)
insertion sort -
insert your value in the right place.
14
18
12
91
6
insertion sort says this much of the array is ordered --
take the next value and call it my key -- put 12 in the right spot --
two parts
figure out where 12 goes and move the other guys down -- but we'll do that all in one.
if you're bigger than the key, move it down -- that moves 18 down 1
now we clobbered our 12 but we saved it in the key
i'm empty so my key goes here -- so that much of the array is sorted
now we do it again.
say the array goes from 1 to n.
we'll do the trick that we did with the sequential search.
one stopping condition is when you're done
the other is if its a smaller value
so lets put a very small value at the top
so we speed up the inside loop by having only one stopping condition.
Proc InsertionSort(data, n)
// sort data from 1 to n
data[0] = -∞
for i = 2 to n
j = i - 1
key = data[i]
while key < data[j]
data[j+1] = data[j]
j--
end
data[j+1] = key
end
end
demo.
insertion sort likes ordered data.
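the pseudocode above, translated to C++ -- data lives in 1..n, slot 0 holds the "very small value" sentinel so the inner loop has one stopping condition:

```cpp
#include <limits>
#include <vector>

void insertionSort(std::vector<int>& data) {
    data[0] = std::numeric_limits<int>::min();   // -infinity stand-in
    for (int i = 2; i < (int)data.size(); ++i) {
        int key = data[i];
        int j = i - 1;
        while (key < data[j]) {   // sentinel guarantees this stops at j = 0
            data[j + 1] = data[j];
            --j;
        }
        data[j + 1] = key;        // drop the key into the empty spot
    }
}
```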
-------
Mon 03/05/07
Insertion sort
how fast?
best -
worst
avg
takes values -- and says this much of the array is ordered.
grabs the next one as the key.
best case O(n) .. for every value, you didn't have to do anything
.. happens when its already ordered.
proportional to n * number of values out of order
worst case -- reverse order
n/2 for the inner loop -- n times
to O(n^2)
what's the average
assuming random ordering
still O(n^2), but lower constant.
Quicksort
Proc Qsort(low, high)
if low ≥ high return
i = low; j = high + 1
key = data[low]
loop
repeat i++ until data[i] ≥ key
repeat j-- until data[j] ≤ key
if i < j then swap(data[i], data[j])
else break
forever
swap(data[low], data[j])
Qsort(low, j-1)
Qsort(j+1, high)
end

Proc QuickSort()
// data in 1..n
data[n+1] = ∞
Qsort(1, n)
end
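a runnable translation of that pseudocode, keeping the 1..n layout and the ∞ sentinel at n+1:

```cpp
#include <limits>
#include <utility>
#include <vector>

void qsortRange(std::vector<int>& data, int low, int high) {
    if (low >= high) return;
    int i = low, j = high + 1;
    int key = data[low];
    while (true) {
        do { ++i; } while (data[i] < key);   // stop at a value >= key
        do { --j; } while (data[j] > key);   // stop at a value <= key
        if (i < j) std::swap(data[i], data[j]);
        else break;                          // crossed over: partition done
    }
    std::swap(data[low], data[j]);           // key is placed forever
    qsortRange(data, low, j - 1);
    qsortRange(data, j + 1, high);
}

void quickSort(std::vector<int>& data, int n) {
    data.push_back(std::numeric_limits<int>::max()); // sentinel at n+1
    qsortRange(data, 1, n);
    data.pop_back();
}
```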
Data
26
5
22
37
1
17
61
11 <--8
∞ <--9
try it
low = 1, high = 8
i = 1 (low)
j = 9 (high + 1 , and then we decrement it)
key = low = 26
start looking for values bigger than 26 .. , less than 26
repeat means -- do while
i stops at 37 -- that's bigger than 26
j -- we stop decrementing whenever we find a value less than our key - we get one right away (11) so j stops there
the two havent crossed over
both are out of order -- so lets swap them.
we dont look at 11 again -- because its i++
i stops at 61 because that's greater than 26
j stops at 17 -- because its less than the key
the two crossed over, so we're out of the loop
so our key gets swapped at position j
26 is placed forever -- it will not move
we've got two separate processes so we do it again
call quicksort on low and j-1
....
another example
50
8
52
6
90
17
89
..
powers of 2
16 values
1st time thru, you look at everybody
level 1 --- 16 or n
level 2 --- n/2 n/2
level 3 = 4(n/4)
...
we look at them all log of n times
n * log(n)
this is the fastest algorithm known for sorting random data.
worst case: n^2
reverse order is bad -- that's the worst case
in order must be good.
best case - n log n
worst case BAD n^2 -- lot of overhead.
what if data is in order.
that looks like the worst case.
wait .. they're both the worst case.
quick sort HATES ordered data!
it wants splits in the middle
what's the best case ? -- when the key that you grab is in the middle. .. but then there's the key you swapped ....
quicksort wants about equal splits
QuickSort -- nobody's beat it.
average case -- n log n ish.
gets really bad when you get dinky little arrays
a lot of overhead for an array of 2 .. or 3
but that's ok -- insertion sort likes little arrays.
furthermore, quicksort likes to give you sorta in order data.
so when you get small, call the insertion sort.
so when you make the quicksort calls, if the size is small, call insertion sort.
what if we look at 3 keys -- first, last and middle -- and take the middle of those three.
so now the worst case becomes the best case.
for random, it doesn't matter much -- 8% increase.
if you run out of stack -- do insertion sort.
-------
Wed 03/07/07
insertion sort
16
18
21
6
14
91
14
9
grabbing and finding where it goes -- two separate problems.
the search part was n/2
move everybody down -- another proportional to n
1) find location
2) move/ make room.
but we can make location search log(n)
Insertion Sort Big N -- we can do a binary search for step 1.
but we should never use insertion sort for Big N
Heap sort -
with a linked list,
sort - O(n)
adding is 1.
quicksort
heap - n log(n)
"heap" - 1. where dynamic memory comes from.
2. a binary tree with each node's value ≥ its children.
how does it work?
1) build the initial heap
2)
sort this
81 (root) is the biggest, so put it down and never move it.
swap the root with the bottom.
Adjust -
adjusts a heap when only one node is out of place
adjust is log n
we call it n times.
position 1 -- 14
position 2 -- 13
position 3 -- 10
position 4 -- 8
position 5 -- 6
position 6 -- 1
position 7 -- 4
left = p * 2
right = p * 2 +1
parent = p / 2
heap sort actually represents the tree as an array.
biggest val at 1
put it at bottom of tree
so swap 14 and 4
chop off box 7
now bottom at 6
call adjust
largest value is one of the children
largest value is at either 2 or 3
swap 13 and 4.
now the tree at position 2 is out of order, so we call adjust on it.
biggest is either at 4 or 5
so we swap
.....
take the biggest, 13 and swap it with bottom.
code for adjust
Proc Adjust (i,n)
// make a heap of tree rooted at i, n nodes in the heap
provided the children of i are heaps
if i is a leaf node return (leaf node = no children) -- half of them are leaf .. that's how a binary tree works -- so if i > n/2, i'm a leaf node .. so get out
otherwise,
m = position of the max of data[i*2] and data[i*2+1]
if data[m] > data[i], then swap em
but when we swap them, we may screw up the tree below, so we call adjust(m, n)
Heap sort
Proc HeapSort(data, n)
// build the initial heap
for i = n/2 down to 1
Adjust(i, n)
end
// pull the max off the top, n-1 times
for i = n down to 2
Swap(data[1], data[i])
Adjust(1, i-1)
end
end
but how do we build the initial heap?
well all leafs are heaps
so we don't have to loook at half the tree.
can call ajust as long as the children are heaps
Ajust 5,n
adjust 4,n
adjust 3,n
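Adjust and HeapSort as runnable C++ over a 1-based array (slot 0 unused), matching the pseudocode above:

```cpp
#include <utility>
#include <vector>

// make a heap of the subtree rooted at i (n nodes total), provided the
// children of i are already heaps. left = i*2, right = i*2+1.
void adjust(std::vector<int>& data, int i, int n) {
    if (i > n / 2) return;                        // leaf node: get out
    int m = i * 2;                                // left child
    if (m + 1 <= n && data[m + 1] > data[m]) ++m; // m = position of max child
    if (data[m] > data[i]) {
        std::swap(data[m], data[i]);
        adjust(data, m, n);                       // the swap may break things below
    }
}

// heap sort over data[1..n]
void heapSort(std::vector<int>& data, int n) {
    for (int i = n / 2; i >= 1; --i)              // build the initial heap
        adjust(data, i, n);
    for (int i = n; i >= 2; --i) {
        std::swap(data[1], data[i]);              // biggest goes to the bottom
        adjust(data, 1, i - 1);                   // re-heap what's left
    }
}
```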
-------
Fri 03/09/07
56
5
77
1
61
11
59
15
48
19
tree
root =1
left p*2
right = p*2 +1
build an initial heap
start at bottom; keep calling adjust
start calling with position 5 -- 10/2 -- first node with a child.
adjust(5)
adjust(4)
....
adjust(1)
with initial heap
take the biggest value (position 1) .. stick it on the bottom -- and cut that off.
call adjust on position 1
heap sort -- n log n.
but we build a heap first -- which itself is .5 (n log n)
so 1.5 (n log n)
best case, worst case -- it's always that n log n
stable sorts.
if we have duplicates and they come out in the same order as they went in, that's a stable sort.
why does this matter?
sorting the phone book
we want to sort by last name, then by first name if there's a conflict: sort by first name first, then sort by last name -- that only works with a stable sort
so what are our algs.
insertion sort is stable
25
22a
1
22b
3
stable because you say "if you're bigger, move it down -- NOT ≥"
heap and qsort are not stable
sorting character data.
sure we can .. but be careful.
better be consistent with lower and uppercase
in ascii land -- all the uppercase letters come before the lowercase ones
in ebcdic
blank
special stuff
a-z
A-Z
numbers
its worse if you sort numbers as characters.
for floats, line up those decimal points!
he likes printf to generate the text file
don't let the user generate the text file if possible.
First Last City State
state is my major sort key -- i want everything ordered by state
how do you do that?
one way -- sort on the most minor key first.
then sort on the next most minor.
but those sorts would have to be stable or it wouldn't work .. no qsort or heap.
another way to do this.
.. make a pass of the data file.
build a concatenated key
take state, concat city, then last name, and then first name
then sort on this concatenated new key
that puts all the wisconsin people together and then only looks at the city if you're in the same state.
this only works if these are fixed sized fields.
sorting with the major key via three stable sorts: 3n^2 .. vs n log n
concatenated-key sort:
2n to build the keys, then n log n for one sort, then 2n again to get it back.
one more thing to think about -- an index sort.
build an index on your key -- say, artist .. with pointers to the data
then just sort the index --- not the data.
so the pointers are rearranged, not the actual data.
could have two indices
so it looks like the file is sorted by both title and artist.
merge sort
two separate arrays
A1 and A2
sort them together
smaller one gets copied in .
but the two arrays are already sorted.
but you can sort little chunks of it -- call them a1 and a2 and merge those together; then merge a3 and a4 together, and then merge the two results.
-------
Mon 03/12/07
20,000 values in random order -- and you want the 10 smallest.
Heap
always a full tree
fill it with the first 10 values.
-- the 10 smallest so far
by definition, we're only going to hold the 10 smallest.
3 is less than 101 -- the biggest of our smalls
so put 3 in for 100 and call adjust
etc.
so .. we have to read every value, but we're only comparing to the root -- you either ignore or swap
re-adjust is n log 10 .. we're only storing 10 nodes.
its an interesting algorithm to keep in your head.
AVL tree
Adelson-Velskii and Landis
a binary search tree that guarantees log(n) performance -- (not optimal, but log(n))
everything associated with the BST was log(n) when it was optimal
these guys rotate parts of it when it gets lopsided
also called a height-balanced tree .. meaning it's balanced by height, not by weight.
the height of the left == height of the right
every node in the tree has a balance factor =
the height of the left subtree - the height of the right subtree
if i'm here, what's the longest path ?
you have an avl tree if the balance factor is -1, 0, or 1 on every node.
if the tree ever gets out of whack, you put it back by doing these rotations.
how do you build? start at the root
C
B
A
that screws my tree up. now the balance factor for C is 2
LL - rotate around the node that's out of balance.
so we will promote B, keep A where it is. and the node that's out of balance gets demoted down.
call it LL because we inserted left.
D
C E
A
Add a B - that screws up two nodes, but the rotation is always around the nearest node to the inserted one.
This is a LR rotation -- we did the left subtree and then the right subtree
A stays in its place .. B gets to be the root, and the C gets demoted.
four rotations, but only two to worry abut
LL and LR
and then the absolute symetric RR and RL
G
E H
C F
add D
rotation is always around the nearest node that went out of balance.
LL rotation
so E becomes the root -- G gets demoted down, along with anything attached to it, and the D stays
but what happens to F?
it got disconnected.
well G's left is now open -- so what ever was in E's right goes in G's left
and now we have a balanced tree
E
C G
B D F H
add M, and everyone's happy.
now add Z.
Ouch.
rotate around H
RR rotation
opposite cupping
so M becomes the root -- H gets demoted and Z stays in place
add another H (that's greater than the first one)
RL rotation
on G M and H
M stays
H gets promoted
G goes down; H2 gets connected to M's left
and the rest of the tree stays.
gotta get down to n to add .. log n
rotation is constant -- based on the node that went out of order. constant cost.
this is huge -- prevents really lopsided trees.
an O(n log n) sort -- log n plus you do it for every n.
code isn't as sticky as you might think -- see book.
whole idea is that you build the tree as you normally would -- rotate when you have to.
could use hashing .. it has a great best case -- but a bad worst case -- progressive overflow fills up and gets muddy at 80% full .. chaining doubles the storage initially, and the linked lists can get long
hashing's worst case is really bad -- so if you need guaranteed log n, use an AVL tree
--------
Graph, Network
you can get to a node multiple ways -- a node has multiple parents
not disjoint
computer networks.
transportation graphs
vertices and edges
edge - link
vertex - node
a vertex has multiple link fields
how do you represent this?
adjacency matrix
adjacency list
matrix:
A B C D
A 1 1
B 1 1
C 1 1
D 1 1
for graph
A
B C
D
with links like an outline
one-way link -- you have to know which way you can go -- directed graph
undirected -- two-way links -- you can go from either vertex to the other with equal ease
we've just shown our matrix as undirected
and when it's undirected, like in mileage charts, we only have to represent half of it
and .. we're only using a bit to make the connection -- a zero or a 1
so to store an adjacency matrix for 100 vertices:
(100 * 100)/8 bytes, and /2 if undirected
adjacency list
A ---> B->C/
B ---> A->D
C ---> A->D
D ---> B->C
works really nicely if the connections are sparse.
the size of the list is n, but the number of nodes is how many connections you've got.
different representations.
could connect B and C
in a graph, the edges may carry information about the connections.
no root -- because anything is equally accessible.
given that, you're going to start somewhere
typical problem - given the graph, visit all the nodes.
proc visit(spot)
spot = A
Visit spot
For each vertex connected to spot
visit(that spot)
End
End
umm - that's infinite
so we need to change it to remember that you visited the spot
and then only visit it if you haven't already.
-------
Wed 03/14/07
Qsort
recursive
qsort doesn't like small n -- we said when the subarrays get small, call insertion sort
or when the subarrays get small, don't do anything
then when
qsort is done, call insertion sort on the WHOLE array
because its primed for insertion sort
one more misc thing
find the median value of a whole set of values
make a heap of the small values so the largest of the smalls is at the root -- that's the median
a heap of the n/2 smallest values.
graphs
really, a tree is a graph
a graph has multiple connections to a node
undirected - 2 way links
directed - one way links -- labeled which way do you go
adjacency matrix
rows and columns of all the vertices, listed both ways -- only a bitmap.
if all the links are two way, then you only need to represent half of it.
adjacency list
a list of all the vertices and the connections to each.
works out pretty nicely if you don't have a whole lot of connections (edges)
we were looking at processing this ..
adjacency list
1 -> 2, 5
2 -> 1, 3, 4
etc.
we'll use recursion
for every vertex that's hooked to this one, we'll make a recursive call
gotta make sure you don't revisit.
so we'll have another structure that says "you've already visited"
Proc Depth1st(start)
// set everyone to not visited
for i = 1 to n
visited[i] = false
end
visit(start)
end
Proc Visit(p)
visit vertex p
ptr = List[p]
while ptr != null Do
v=vertex(ptr)
if not visited[v] visit(v)
ptr = link(ptr)
end while
end
Depth1st(1)
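Depth1st/Visit as runnable C++ -- a vector-of-vectors adjacency list stands in for the linked nodes:

```cpp
#include <vector>

// depth-first visit; visited[] keeps us from going in circles
void visit(int p, const std::vector<std::vector<int>>& adj,
           std::vector<bool>& visited, std::vector<int>& order) {
    visited[p] = true;
    order.push_back(p);                      // "visit vertex p"
    for (int v : adj[p])                     // everyone hooked to p
        if (!visited[v]) visit(v, adj, visited, order);
}

std::vector<int> depthFirst(int start, const std::vector<std::vector<int>>& adj) {
    std::vector<bool> visited(adj.size(), false);  // set everyone to not visited
    std::vector<int> order;
    visit(start, adj, visited, order);
    return order;
}
```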
Depth1st -- you go as deep as you can
spanning tree - a tree because we visited every node only once.
spanning tree algorithm
vertices numbered 1-6, costs on each edge.
looking for a spanning tree that minimizes the cost
that looks hard -- lots of possibilities
algorithm -- in our book
1) sort edges based on weight
2) pull from the list the next smallest edge weight
3) if it doesn't form a cycle with what's already included, then we include it.
sort this
at 5 2,3
at 6 2,4
at 10 3,4
at 11 6,2
14 6,4
2 and 3 haven't been included, so take 2 and 3 at a cost of 5
4 hasn't been included, so include it at a cost of 6
n + n log n
-----------------
for our visit algorithm, we could replace the recursion with a stack .. push the neighbors, then take the next one from the top of the stack.
now let's change from a stack to a queue
that's a breadth-first algorithm.
binary tree
recursion -- depth first
queue -- breadth first
task/job management
3 programs running at the same time.
binary files
fstream df;
df.open("file", ios::in|ios::binary);
//df.read( address , # of bytes);
Data2 d;
df.read((char *) &d,sizeof(d));
as soon as you send the read command, the request to read is queued
anything you get in memory is free compared to disk access.
-------
Fri 03/16/07
interrupt-driven and time slicing
generally interrupts - i/o request
time slicing -- if you've got four things being processed at once.
read/write -- once you issue that command , you can't continue.
binary
class Data2
int
float
array int [10]
array of chars.
fstream df;
Data2 d;
df.open("hey.txt", ios::out|ios::binary);
d.i = ....
strcpy(d.str, "hi");
df.write((char *)&d,sizeof(d))
do it as many times as you need records
when you're done,
df.close();
read
df.read((char *)&d, sizeof(d));
read until you get to the eof.
do
read
check for eof
process
repeat forever
see handout
disks - data recorded on tracks. - concentric circles. .. every track holds the same amount of data.
disk delays: seek time, rotational time, and transfer time.
we can eliminate seek time by using many read heads .. but that's costly.
if we have 12 records, we have to pay the i/o cost 12 times.
when we finish reading the first record, we're in perfect position to read the 2nd one .. so read it and put it in some temporary place, because we're gonna need the rest of it.
so we read ahead and put it in the buffer.
so if the data is buffered, the next time the program asks for data, its already read for it.
this is called blocking.
physical vs. logical
logical i/o is what the program thinks.
physically, i've got 12 records at a crack.
os pays buffering cost.
what if we have more than 12 records?
then we have more than one record block
we'll put the other block in another buffer.
3rd chunk? we ran out of memory for buffers at that point.
having two buffers -- double buffering, anticipatory buffering.
if we process the first block fast enough, then we can reuse its buffer for the 3rd chunk of data -- but that's a huge hope.
this buffering is so important that windows does it no matter what.
works fine for sequential files but not direct ones.
file systems, file structures.
file systems -- os support.
text files -
they're worse, he says
df.open("bob", ios::out);
df <<" blah";
df.close();
text files convert everything to char -- you pay for the conversion there and back
text files are under software control
1234 bob 12.4
can do
df>>i>>s>>f;
getline(df, s);
i don't know how to read until runtime.
he says binary is 41 times faster.
-------
Mon 03/26/07
disks - one platter
concentric circles -- tracks.
track addressable and sector addressable --
our disks today are sector addressable
you chop the tracks into sectors --
it looks like pizza slices.
physical i/o
vs logical i/o
logical -- what the program thinks is out there.
df.read((char *) myData, sizeof(myData));
say that the data is 310 bytes (per record)
in memory for program
------------
code
------
data
i/o channel -- disk controller
i/o channel sticks the data into the buffer
tracks - concentric circle -- every track holds the same amount of data
cylinder - every track is numbered -- the same track number across every platter.
so if you have a pretty big file, you want it all allocated to the same cylinder.
worst case is to have a little data way out on one track and more in a track far away.
when you get to the disk, all i/o is physical.
that means that you're always going to read a whole physical record. -- the space delimited by boundaries - sectors (or tracks)
ch='A';
df.write((char *)&ch, 1);
and loop
that's foolish -- you're writing one byte and you're taking up the whole track.
all i/o on the disk is physical
with a track based disk , we have record gaps to worry about.
our current os usually sets the buffer to the sector size -- its gotta read a whole sector at a time anyway.
so if the sector size is 512, a record size of 310 doesn't work so well. - we cross sector boundaries -- and then its two physical i/o's
track addressable disk -- can only sense the beginning of a track.
you put the data out here and it has something called record gaps.
the minimal amount of space you get is a track.
but the record gaps can be sensed as well. the faster the disk spins, the larger the record gap has to be.
we're kinda going back to track-based disks for servers because HD space is so cheap now.
text files
if you send a write command to a text file, it doesn't actually write the data until the write buffer is full.
the flush will write it right away .. but that slows things down immensely.
if you write to the size of the sector, things scream. so writing more data is faster -- because you're lining up the sectors with your records.
records usually are staggered -- so that there's breather room between them.
you can append files, but that leads to fragmentation - usually we cant add to where we left off.
so its best to write a bunch of records at once -- so that they'll be together on disk
so what can we do?
use a multiple of the sector size -- at least a power of 2.
don't use text.
disk delays
- seek time - move to proper track
- rotational delay -- the time it takes for my my record to line up underneath.
- transfer time -- how fast is the drive? how long does it take to read?
see handout
i have a file and I want it sorted.
we want to sort by ID.
we have 2000 bytes on each person.
it won't fit in memory -- sort it.
process charge -
the way you sort is you copy - and the copy is sorted.
the big problem -- you can't read and write a file at the same time.
in --> sort/merge --> out
unix has one -- windows has one only in DOS.
probably 12 man years in designing this thing.
works because you only need to have smaller chunks in memory.
so say sort/merge exists.
lets sort a binary file.
gotta tell sort/merge
1) where the key is
2) how big the records are .. the key HAS TO BE in the same place in every record, or how will it know where to find it?
how do you compare the values -- binary representations are all different.
so it's hard to sort a binary file.
say the file's a text file....
gotta tell it the key.
where the key is.
you have variable record lengths -- where the eol is -- under software control.
but you have tokens -- but then you have to do token processing.
DOS
in --> sort --> out
how to do it
sort /+10 <in> out
+10 -- key starts at col 10.
dos won't sort binary
so by putting col 10 -- we mean that 10 positions from the beginning is the start of our key.
starts at 10 and goes on.
unix is more complicated.
man sort
sort +0.17 -0.20 /* minor key */ +0.0 -.11 .. from 1 to 10
major sort key and then minor sort keys.
sorts on 18 and 19 -- starts after 17 -- but the first one is zero.
at then end of the mess, you can put options
-b ignore leading blanks
-f uppercase=lowercase.
-n numeric ordering
-r z-a
0. -- means the first token -- we won't get into that.
-------
Fri 03/30/07
BST creation/traversing
(Binary tree, thread
#) BST associated with Big search.
---
general trees.
if you have a general tree, you can always convert that to a binary tree
[first child|data|sibling]
searching
sequential
sequential with trick -- we stored the value we were looking for (as a sentinel) and that saved us a decision.
O(n)
if, in the sequential search, you move up the search result one cell -- promote it any time you search for it .. then it'll find popular things faster.
Address calculation. - fastest search is no search.
binary search
BST
hashing - O(1)
- dealing with collisions.
IV) sorting
insertion
heap
quick
(merge)
--> Hybrids
cobol has a sort verb -- but what kind of sort is it doing?
heap n log n
quick n log n
quick is faster on average, but worst case is worse than heap.
stable sorts.
sorting numbers as chars.
multiple key
index sort.
avl tree - balanced tree.
BST is a sort -- an AVL tree never gets out of balance, so always n log n.
be aware of the rotations in an avl tree!
graphs
directed
undirected
representation
adjacency matrix
" list
traversal -> spanning tree.
looked at both depth first and breadth first --
depth uses stack (double check)
files
os - multiprogramming environment.
interrupts, time slicing.
worried a lot about file properties -- most certainly text vs binary.
text files are under software control -- have to be parsed out.
physical properties of the disk -- sectors.
blocking -- take a whole group of records and you make a larger record out of it
buffering -- two buffer system.
references:
nyhoff
BST 12.4
Threads 12.6
Bin Search 10.1, 12.1
Sorting 13
AVL 15.2
Graphs 16
Hashing 12.7
pseudocode is fine.
- - - - -
Test Monday - yay!
updating a file -- change -- add, delete records, change records.
if the file is small enough, you can read it into a huge internal structure, update it, and then write it back -- this is how word and notepad do it.
but if your file won't fit in memory, we've got issues; in that case, the only way you're going to update a file is with a copy.
if you copy a file you get a new version
file, old master -> update -> new master.
transactions such as adds, changes and deletes -- you update by copying, and in the process you get an updated file.
we generally have a huge number of records but only a few that need changes.
how fast is that -- because its a sequential file, you're going to have to read every record and write every record, minus the deletes on the writes, plus the adds on the writes --
generally the deletes and adds cancel each other out.
changes don't cost you anything.
how fast ? how many i/o's are you doing .. that's the part that's going to kill us.
one step deeper
how fast is it -- its like 2n, sort of .
but what kills you is not logical i/o's, it's physical i/o's.
so 2n / blocking factor.
my file has a key and it has to be ordered on that key.
we want to change 41's address and 62's phone number and 10's salary.
but if the file is already sorted, our list of modifications must also be sorted.
so if you have a match, make the change.
only need to touch what's being changed.
now we got another problem.
we pay the same amount no matter how many changes you make to a sequential file -- therefore you must batch your transactions.
M (master)   T (transactions, numbered in arrival order)
             3. 100 change
             4. 100 change
100          1. 220 add
200          5. 500 A
300          6. 500 C
400
500          2. 500 del
600
700
if we wait too long for the batch time, we might not be able to get the same result.
we can't put it in chronological order because we're putting it in key order.
sorted by id, but not in the right order
so we're going to do stable sort on this.
balanced line algorithm
every time you do a merge, you have two cleanup loops -- if i run out of one, then i have the other column to copy.