compsci
Mon 01/22/07
answer to three questions on the next exam: depends.
file structures -- too big for memory
programs will take a while to run.
programs will be hybrids of our in-class work. the ideas are solid
read chapter 1
we are in 10.4
how fast is it? -- well, how many things does it do -- of course a faster machine is faster
for i = 1 to 10
a = b + c + d
End
how fast is that.
go around 100 times -> 10 times slower.
add b = bob(b);
now it depends on what bob is doing.
if we go to n -- how fast is it -- ? n !!
proportional to n
O(n)
that means proportional to n
if it's proportional to 10 then it's a constant - every time i run it, it will take the same time.
proportional to n --> has to do with how big the thing is you're doing -- SNC's n is 2,200, Madison -> 20,000
not precise but pretty close math.
2^n -> slow -- 1 -- is fast
we are going to experiment -- lab reports should be standalone documents --
-------
Wed 01/24/07
You can't take somebody's word for it
Big O
O(1) - constant.
ex:
a=4;
b=2;
c=d;
O(3)
3 * O(1)
do it 10 times and it's still a constant.
this is as fast as you can get.
always dealing with n things.
no matter how big n is, i'm going to find whatever i'm looking for in constant speed. you don't get faster than proportional to 1.
O(n) -- good buddy the "for" loop
for (i=0; i<n; i++) {
if(data[i] == ?)
}
that's O(n) -- function of how big it is.
adding a line of code doesn't change that.
O(n^2)
for (i=0; i< n; i++) {
for (j=0; j< n; j++) {
//
}
}
n*n -> n^2
example - bubble sort.
most simple sorting algorithms are n^2
O(log n)
2^? = n
binary search.
32/(2^5) = 1
log will always be base 2.
n
2
4
8
16
32
64
n^2 grows quickly!
sorting is n^2 --> don't sort :-D
..with that algorithm
remember, fastest thing is proportional to 1. you can have a bad constant of proportionality.
example of O(1) -- array of id numbers --
but you've got to have consecutive ID numbers.
for i=1 to n
a=4;
b=3;
for j=1 to 10
end
end
still n.
but if we use n for the inner loop -- n^2
n/2 * n = n^2/2 --> 1/2 O(n^2) -- that's still n^2
p = n
while p > 1 do
...
...
...
p=p/2
end
32.
16
8
log (n)
matrix
2d array
ADTs
you can use it without worrying about how it was implemented
ex: 2d array
m.put(81, row, col)
value = m.get(r,c)
m(size)
a[size][size]
A[r][c] = value
that's O(1)
return a[r][c]
fast.
that's also O(1)
if the array is big -- well it won't fit.
another way to do it
m.put(v,r,c)
a[n].v=v
a[n].r=r
a[n].c=c
n++;
that's still O(1).
but lets do the get
get(r,c)
for i=0 to n-1
if(a[].v= ...
end
return 0
so it's O(n) to get it out.
but we need to exclude 0.
if(v!=0) then do this.
we also need to figure out how to change a value .. could just add to the end and then search from the end.
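the three-parallel-array idea above (O(1) put by appending, O(n) get by scanning from the end so the newest value for a cell wins) can be sketched like this -- the class name, capacity, and member names are my guesses, not the handout's matrix.cpp:

```cpp
// Sparse matrix as three parallel arrays of (value, row, col) triples.
// put appends at the end: O(1). get scans from the end so the most
// recently added value for a cell wins: O(n). Missing cells read as 0.
struct SparseMatrix {
    static const int MAX = 1000;   // assumed capacity
    int v[MAX], r[MAX], c[MAX];
    int n = 0;                     // number of stored triples

    void put(int value, int row, int col) {
        if (value == 0) return;    // exclude zeros, per the notes
        v[n] = value; r[n] = row; c[n] = col;
        n++;
    }
    int get(int row, int col) {
        for (int i = n - 1; i >= 0; i--)   // newest entry first
            if (r[i] == row && c[i] == col) return v[i];
        return 0;                  // not stored -> zero entry
    }
};
```

changing a value is then just another put: the backwards scan in get finds the newer triple first.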
-------
Fri 01/26/07
handout on G:
matrix.cpp
O(1) - constant, really fast
O(log n)
O(n)
O(n log n)
O(n^2)
stack - abstract data type.
last thing in is the first thing out.
all inserts and deletes occur on the same end.
array implementation
int stack[100];
stack
a
---
b
---
c
---
top = 2
efficient implementations.
initialize
get something off
put something on
int
del
add
empty
add
top++
stack[top] = value
//check for overflow though
if(top >= maxstack) -> trouble.
this is O(1). "really quick"
cout << ++a;
increment a and then cout it out.
v = s.delete();
Del ()
Return (stack[top])
cool -- but we have to adjust top and check for empty
Empty
(top == -1)
if (top <= -1)
init ---> top = -1
empty
Return (top <= -1)
BUT!! how big do i make it?
int stack[500];
int * stack;
stack = new int[x];
it depends :-)
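the array-stack pieces above (init, add, del, empty, overflow check) collected into one runnable sketch; the size and the bool-returning error style are my choices:

```cpp
// Array stack: init sets top = -1; add is top++ then store (watching for
// overflow); del returns stack[top] and moves top down. Everything O(1).
const int MAXSTACK = 100;
int stk[MAXSTACK];
int top = -1;                        // init: empty

bool empty() { return top <= -1; }

bool add(int value) {
    if (top + 1 >= MAXSTACK) return false;   // overflow -> trouble
    stk[++top] = value;
    return true;
}

int del() {                          // caller checks empty() first
    return stk[top--];
}
```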
top--[c]->[b]->[a]/
empty condition - top=null
add - give me a new node, add data, and change the pointers.
x = new Node()
[ if x == null, PROBLEMS]
x -> data = value
x -> link = top
top = x
Del
if (top == null) empty
temp = top -> data;
q = top;
top = top -> link;
delete q;
Return (temp);
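the linked version of the same stack, roughly as drawn on the board (top--[c]->[b]->[a]/); del returning a success flag is my convention:

```cpp
#include <cstddef>

// Linked-list stack: add links a new node in front of top; del saves the
// data, advances top, and returns the old node to the heap. No fixed size.
struct Node {
    char data;
    Node* link;
};
Node* top = NULL;                  // empty condition: top == NULL

void add(char value) {
    Node* x = new Node;            // (the notes check x == NULL; modern new throws)
    x->data = value;
    x->link = top;
    top = x;
}

bool del(char& out) {
    if (top == NULL) return false; // empty
    Node* q = top;
    out = q->data;
    top = top->link;
    delete q;
    return true;
}
```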
void Sally () {
stack s;
s.add(x);
cout << "hello";
return;
}
static is guaranteed to stay around between calls
-------
Mon 01/29/07
stack
everything proportional to 1
stack used by
recursion
expression evaluation
backtracking -> like gaming, mazes, a bit of AI.
expression evaluation
cout << 1+2*3;
* has higher priority
2^n
A+B^2*D-4
squaring, * D, add A, sub 4.
usually done using postfix notation
postfix and infix
infix: operator (+) is inbetween operands
Postfix - operator is after the operands.
postfix - AB+
() and can evaluate quickly.
A+B*C - infix
ABC*+ - postfix
A+B*C
t[1] = B*C
t[2] = A+t[1]
BC*
AT[1]+
substitute
ABC*+
evaluate:
start at front, go until you hit an operator, pull your two operands and keep going.
BC* T[1] = B * C
t[2] = A+T[1]
answer in t[2]
infix A/B^C+D*E-A*C
postfix -> ABC^/DE*+AC*-
t[1] = b^c
at[1]/de*+ac*-
t[2] = a / t[1]
t[2] de*+ac*-
t[3] = d*e
t[4] = t[2]+t[3]
t[5] = a * c
t[6] = t[4] - t[5]
answer is in t[6]
eval code
loop
stack
case
hit an operator
unstack, perform math, stack answer
keep going
loop
if you're an operand stack it
forever
AB+C*
2 3, 4
(a+b)*c
next token - gets next number/operator
is it the end of my expression no
is it an operand? yes, stack it
b - operand, stack it
must be an operator
pop our operands
t[1] = A + B
stack t[1]
next c .. stack
next * -- pop it
t[2] = t[1] * C
t[2] is stacked
are you done? return the top of the stack.
ABC*+
A+B*C
stack a -- a is 2, so stack 2
b is 3 so stack 3
c is 4 -- stack 4
next token
* is an operator, so pop 4 and 3
stack the 12
next token
+ --> pop operands
2+12=14.
stack the result -- 14
answer on top of stack.
eval -- stacks operands
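a runnable version of the eval loop traced above; restricting operands to single digits (so no tokenizer is needed) is my simplification:

```cpp
#include <stack>
#include <string>
#include <cctype>

// Postfix evaluation: operands are stacked; an operator pops two operands,
// performs the math, and stacks the answer. O(n) in the expression length.
int evalPostfix(const std::string& expr) {
    std::stack<int> s;
    for (char t : expr) {
        if (isdigit(t)) {
            s.push(t - '0');           // operand: stack it
        } else {
            int b = s.top(); s.pop();  // pop two operands...
            int a = s.top(); s.pop();
            switch (t) {               // ...perform the math...
                case '+': s.push(a + b); break;
                case '-': s.push(a - b); break;
                case '*': s.push(a * b); break;
                case '/': s.push(a / b); break;
            }                          // ...and stack the answer
        }
    }
    return s.top();                    // done: answer on top of the stack
}
```

with a=2, b=3, c=4 this reproduces both class traces: "23+4*" is AB+C* and "234*+" is ABC*+.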
so how do we convert from infix to postfix
Postfix
// converts from infix to postfix
loop
x = next token
case
x= ∞ {}
x = operand output(x)
...
incoming priority
^
*/
+-
(
) is a special case -
else // operator
unstack the top operators that have priority over x.
how fast is the eval function - proportional to n
A-(B+D)*E∞
A
put -∞ at start of stack
A is an operand, so output it
- -> an operator
isp - in stack priority
unstack all operators with higher priority
nothing there
so we just stack it
next (
the isp of a - is a 1 and the incoming priority of a ( is a four
sym:      ^   */  +-  (   -∞
incoming: 4   2   1   4   n/a
in stack: 3   2   1   0   -
again nothing to unstack, so stack it
operand .. so
AB
on output
next an operator
isp of ( is a zero -- 0 is not greater than or equal to +, so we stack the plus.
open parens are never unstacked.
a 0 is never ≥ any incoming value.
they get unstacked when you find a matching closing.
we could have made the open paren a special case, but you'd pay for it.
next operand, so move to output
ABD
whenever you find a closing, unstack until you find an open
ABD+
and then delete the open paren
* --> minus sign does not have priority over *, so stack the *
E - operand
output it
ABD+E
∞ - empty and output the stack
ABD+E*-∞
t[1] = b+d
t[2] = t[1] * E
t[3] = a- t[2]
A+B*C^D-E∞
A - operand, move it to output
+ -> nothing to unstack, so go ahead and stack it
b - operand ,move to output
AB
the instack priority is a 1 and incoming (*) is a 2.
no precedence, so the * gets stacked
c - move to output
ABC
^ - nobody has more precedence so stack it
operand - move to output
ABCD
- -> instack is a 3 , incoming is a 1,
so unstack and output until we get down to ≥
ABCD^*+
operand --> output
ABCD^*+E
empty the stack
ABCD^*+E-
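the conversion algorithm with the isp/icp values from class, as a sketch; I check for an empty stack instead of stacking -∞, and operands are single characters:

```cpp
#include <stack>
#include <string>
#include <cctype>

// isp/icp tables: ^ is 3/4 (so it associates right to left); '(' is 0/4
// (stacks easily, never unstacked except by a matching ')').
int isp(char op) {   // in-stack priority
    switch (op) { case '^': return 3; case '*': case '/': return 2;
                  case '+': case '-': return 1; case '(': return 0; }
    return -1;
}
int icp(char op) {   // incoming priority
    switch (op) { case '^': case '(': return 4; case '*': case '/': return 2;
                  case '+': case '-': return 1; }
    return -1;
}

std::string toPostfix(const std::string& infix) {
    std::stack<char> s;
    std::string out;
    for (char x : infix) {
        if (isalnum(x)) out += x;             // operand: straight to output
        else if (x == ')') {                  // unstack down to the open paren
            while (s.top() != '(') { out += s.top(); s.pop(); }
            s.pop();                          // delete the open paren
        } else {                              // operator or '('
            while (!s.empty() && isp(s.top()) >= icp(x)) {
                out += s.top(); s.pop();      // unstack higher priority
            }
            s.push(x);
        }
    }
    while (!s.empty()) { out += s.top(); s.pop(); }  // empty the stack
    return out;
}
```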
A-(B+C)*(D-E*F)-4∞
operand
A
operator - stack
stack (
+ gets stacked
closing paren -- output everything to open paren
ABC+
*
ABC+D
-
ABC+DE
*
ABC+DEF
)
ABC+DEF*-
-
ABC+DEF*-*-
stack -
4 - operand, output it: ABC+DEF*-*-4
∞ - empty the stack: ABC+DEF*-*-4-
e^x^2
so we need incoming priority of 4 instead of 3.
-------
Wed 01/31/07
fortran
** means exponentiation
a**n
ftn -- on the unix box
x^y^z
2^3^2
= 2^(3^2) = 2^9 = 512 in Fortran (right to left)
= (2^3)^2 = 8^2 = 64 in BASIC (left to right)
e^(x^2)
is what we want
not!
(e^x)^2
we need to go backwards
sym   isp  icp
^      3    4
*/     2    2
+-     1    1
(      0    4
-∞    -1   n/a
postfix
operands -> output(x)
every time you got an operator you stacked it
operators that had precedence would get unstacked first
-- so we use 4 for the icp so that we can go backwards
right to left exponents
( --- in stack 0, incoming 4
if(a<b+1 || d< c-4)
true or false
a<b+1
ab1+<
add < to the table
isp - -> .8 icp .8
and > ≠ etc
all of them have to be done last.
|| is last -- so give it a .5
= gets .1 and .15 (we do = from right to left)
for (i=1+b; i< ..
everything in c is an expression.
a=2;
cout << b=a+1; //gives you 3
if(a=b) --> always true
Error checking
eval -
evaluated postfix expressions
AB+C*
had a stack.
if you have more left on the stack, not enough operators
AB+CD-R
if there's nothing to pop for operands -- too many operators
A+*B
postfix -
if you have ( left on stack ..
missing a close paren
if you have ) left, you're missing open parens.
you got too many operators, etc
another component -- queue ..
first in, first out
waiting line.
all inserts and deletes occur on opposite ends.
usually, we add to the rear and delete from the front.
like a line
this is like multitasking.
ADT
q.Add(val)
q.Delete()
q.Empty()
q.Init()
class Queue
Queue T;
Q
[]
[]
[]
[]
[]
[]
ints front and rear
want to add to the rear.
Add
R++;
Q[R] = val;
Del
temp = Q[f]
f++;
Return(temp);
OR . .
Return (Q[f++])
val = Del();
but that's not a very efficient use of the array
we'll make a circular array.
queuesize = 6
r= (r+1) % queuesize
for delete
f = (f+1) % queuesize
full
-
[a]
[b]
[c]
[d]
[e] -r
[g] -f
f = r+1 -- i'm full;
or f= 0 and r = queuesize-1
empty
f=r+1;
f=r-1;
no no no!
can't tell you when i'm empty!!
so . . .
define empty as f=r.
it will cost us 1 location in the array.
f = 1 less than real front.
new full condition:
f= r +1
or
f == 0 && r == queuesize - 1
Add
r = (r + 1) % queuesize
If (f == r) full
Q[r] = val
Del
f = (f + 1) % queuesize
Return Q[f]
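the circular-array queue in one piece: empty is f == r, f sits one slot behind the real front, and one array slot is sacrificed so full is distinguishable from empty. QUEUESIZE of 6 matches the example; the bool-returning style is mine:

```cpp
// Circular-array queue. f trails the real front by one; empty is f == r;
// add refuses when advancing r would make f == r again. All ops O(1).
const int QUEUESIZE = 6;
int Q[QUEUESIZE];
int f = 0, r = 0;                 // init: empty (f == r)

bool qEmpty() { return f == r; }

bool qAdd(int val) {
    int nr = (r + 1) % QUEUESIZE;
    if (nr == f) return false;    // full: one slot is sacrificed
    r = nr;
    Q[r] = val;
    return true;
}

bool qDel(int& out) {
    if (f == r) return false;     // empty
    f = (f + 1) % QUEUESIZE;
    out = Q[f];
    return true;
}
```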
talk to com ports
$A4, length, data, checksum (byte)
friday - linked list version
-------
Fri 02/02/07
f=r
circular array queue
linked list queue
[c]->[b]->[a]/
adding
add to the rear
give us a new node x, put the new value in, change r's link to x, and change r to x.
delete from front
q = f->data
eventually
Return (q)
but we can't change the f
we can't get back.
f r
[a]-> [b]-> [c]->
we just flip the labels around and we're good
q = f -> data
t=f
t
f = f->link
delete t
return q
add to the rear
x = new Node;
x->data gets value
x->link gets null
r->link = x
r= x
how we gonna represent empty
say f=null is our empty
but that makes adding a problem
if f=null
f=r=x
else r->link=x
delete --
//(unstack)
if f=null, then we're empty and i've got nothing to give
else [
q = f->data
t= f
f=f->link;
delete t
return(q)
]
what if the queue has only one value
delete
q='a'
t = f
f = f->link // so f is null
delete t
return q
so r points to nowhere but that's ok because we set r when we come in to the add function.
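the linked queue from the board as one sketch -- f == NULL is empty, the first add sets both f and r, and the dangling r after the last delete is harmless because add tests f first:

```cpp
#include <cstddef>

// Linked-list queue: add to the rear, delete from the front.
struct QNode {
    char data;
    QNode* link;
};
QNode* f = NULL;                    // front (NULL means empty)
QNode* r = NULL;                    // rear

void qAdd(char value) {
    QNode* x = new QNode;
    x->data = value;
    x->link = NULL;
    if (f == NULL) f = r = x;       // first node is both front and rear
    else { r->link = x; r = x; }
}

bool qDel(char& out) {
    if (f == NULL) return false;    // empty: nothing to give
    QNode* t = f;
    out = t->data;
    f = f->link;                    // NULL again if that was the only node
    delete t;
    return true;
}
```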
A= ((((((((((((((()))))))))))))))
they match but it won't compile.
A= (((((((((((((())))))))))))))
does compile
so you're limited to 14 because they use an array stack.
Bob
gas station
two pumps -- should i add another pump
write a program that takes cars coming by based on some statistics; they come in and go to the shortest queue; generate some random numbers to figure out how long they stay here
we can compute
change one thing by adding another pump and see what happens.
1 - acquire statistics
need to know the average waiting time at a pump
need to know arrival rate -- how frequently does a car come in
that should be it for now
a customer arrives every six minutes on the average
the average wait at the pump is 5 minutes - that's how long you stay at the gas station.
we need random numbers, pseudo random number generators.
have a variable called time -- minutes -- every click of the clock is going to be a minute .. start at 1 and go through a whole day
see handout
event list
leave when done pumping
we don't allow switching of q's
time arrival/departure what q
30 A
32 D    34 A
35 A
we don't know what queue they're going into until they get there.
little variable time gets bigger and bigger
time 30 -- we get an arrival.
time 30
we get an arrival
so we queue it at the shortest line -- say q 2
32 -- departure
we keep track of the pump time in the q
in queue
id,
what q
when you got the pump
arrival time
when it's done, you leave and the next guy gets the pump - record when he gets the pump
34 - arrival
35 -
at the end of the day we look at our stats and compare what we got.
can simulate a lab, stoplights.
this is going to work .
first arrival
step time by 1
time 1 arrival
step time by 1
time 5 -- we will have another arrival
start over
queue
event list
what rand does -
gives you integers in the range of zero to RAND_MAX
predefined constant called RAND_MAX
rand() % 6 +1
that gives us uniform distribution.
we want a uniform distribution that averages to 6
we can get random numbers from zero to 12.
-------
Mon 02/05/07
random numbers
rand()
numbers from 0 to rand_max
normally rand is from 0 to less than one.
x = int(Rand() * 6) +1 // excel rand.
0 to .99999999999
0 to 5.9999999999
int
0 to 5
from 1 to 6
random number generator
usually generators are multiplicative.
once you have one random number, the next one is a function of the previous plus a couple magic numbers.
repeats after you generate another of the same random number
also something called a seed .. the random number is a function of the last one --
how do you get the first one.
static seed = .51917
if you have a static it stays around the length of your program.
other variables disappear at the end of their scope.
while (1)
cout << rand();
srand();
we like the numbers to be the same each time when debugging. easier that way.
float Rnd()
0 ≤ x <1
return (rand()/(float)(RAND_MAX + 1));
uniform distribution (flat)
bell curve way "normal distribution"
Poisson distribution
chop off the left tail. -- the negative part.
so for bob
average is 6
when is the next arrival going to happen.
according to the Poisson curve.
Poisson code
float Poisson (float mean)
{
return(-mean * log(Rnd()));   // ln is log() in C
}
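Rnd and the interarrival formula together, runnable; the srand seeding and the nextArrival helper are my additions (and note -mean*log(U) is strictly an exponential interarrival time, which is what a Poisson arrival process produces):

```cpp
#include <cstdlib>
#include <cmath>

// Random interarrival times for the gas-station simulation.
float Rnd() {                       // uniform, 0 <= x < 1
    return rand() / (float)(RAND_MAX + 1.0f);
}

float Poisson(float mean) {         // minutes until the next arrival
    return -mean * log(Rnd());      // ln is log() in C
}

float nextArrival(float now, float mean) {
    return now + Poisson(mean);     // schedule the next arrival event
}
```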
linked lists
top->[a]->[b]->[c]/
class Node {
public
char data;
Node * link;
}
Node * top;
Node * x;
x = new Node;
makes us a brand new node pointed to by x;
delete x;
x -> data= __ ;
(*x).data = __ ;
x -> link = __;
(*x).link = __ ;
Data(x)
Link(x)
will be used now on the board.
printlist *top
q = top
while (q != null)
Print Data(q)
q = Link(q)
end
end
void Printlist (Node * top) {
Node * q;
q = top;
while (q!= null) {
cout << q -> data;
q = q -> link;
}
}
GetNode(x)
ReturnNode(x)
top->[a]->[b]->[c]-/
pointers point to the whole node
data link
0 A 1
1 B 2
2 C -1
array representation of the heap
Printlist(top)
Printlist (z)
avail
linked list of every memory box that's free
with that avail list -- everyone is on some list -- no memory leak.
getnode(x)
data(x) = w
link(x)= -1
adding -- stack
getnode is really an unstack of avail.
getnode
free = avail.
avail = link(avail);
return free;
but if avail = -1, then the heap is full
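the array heap in one piece -- parallel data/link arrays, -1 as the null pointer, avail as a linked stack of free nodes; the size and function names are assumed:

```cpp
// Array representation of the heap: getnode is an unstack of avail,
// retnode stacks a node back on avail. Subscripts act as pointers; -1 is NULL.
const int HEAPSIZE = 100;
char data[HEAPSIZE];
int  link[HEAPSIZE];
int  avail;                       // head of the free list

void initHeap() {                 // first, put everything on avail
    for (int i = 0; i < HEAPSIZE - 1; i++) link[i] = i + 1;
    link[HEAPSIZE - 1] = -1;
    avail = 0;
}

int getnode() {                   // unstack a free node
    if (avail == -1) return -1;   // heap full
    int freeNode = avail;
    avail = link[avail];
    return freeNode;
}

void retnode(int x) {             // return a node: stack it on avail
    link[x] = avail;
    avail = x;
}
```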
-------
Wed 02/07/07
the heap
class node
char data
node * link
you own heap
everything on avail
avail = 0
build top->[a]->[b]/
getnode(top)
top = 0
avail = link(avail) ..
avail = 1
data(top) = a
link(top) = NULL .. puts a -1 in the link.
Getnode(x)
avail = 2 .. link(1)
data(x) = 'b'
link(x) = null
link(top) = x
we build on everything.
what if you've got more fields
name
major
gpa
advisor
link field
then we'd have more parallel arrays
top - [a]->[b]->[c]/
problem --- we can go this way ---> but not this way <---
one way --- stack and then print.
so we put backward pointers as well
so
Node * forward and
Node * backward
in our array based representation -- add another field called "backward"
print data(p)
f = forward(p)
data(f)
f = p -> forward
b = backward(p)
doubly linked list.
we'll treat it as an abstract type
with a singly linked list, we can't delete a random node w/o knowing anything else.
but what do we do about top previous.
use a header node.
[a] [b] [c]
top points to the header node -- the data field is junk.
link it up (forward)
we're going to go forward.
and we'll make it circular.
while p is not top, .....
..
end while
empty list -- link of top == top
now double link the list.
same printlist for forward.
for back , just change link to back -- or forward to back.
top
forward - points to top
backward points to top
top -> back = top
top -> front = top
Node * top;
top = new Node;
top->front = top->back = top;
InsertAfter(p, datav)
{
GetNode(x)
Data(x)=datav
Front(x)= Front(p)
Back(x) = p
Front(p) = x
Back(Front(x)) = x
}
empty list
if you want to add a node, the only place p would be is the header node.
o(1)!!!!!!
insertBefore(P, data)
insertAfter(Back(p), data)
End
we may be asked to do the reverse on the test.
Delete( p, top)
If p = top, get out
Back(Front(p)) = Back(p)
Front(back(p)) = Front(p)
RetNode(p)
End
here you're already there.
so why not always use doubly linked lists -- well, you pay an extra pointer per node.
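the doubly linked circular list with a header node, as built up above; InsertBefore(p) is just InsertAfter(Back(p)). The struct and function names are mine:

```cpp
#include <cstddef>

// Doubly linked circular list with a header: when empty, the header's front
// and back point at itself. InsertAfter and Delete are both O(1).
struct DNode {
    char data;
    DNode* front;   // forward pointer
    DNode* back;    // backward pointer
};

DNode* makeList() {               // header node; its data field is junk
    DNode* top = new DNode;
    top->front = top->back = top;
    return top;
}

DNode* insertAfter(DNode* p, char datav) {
    DNode* x = new DNode;
    x->data = datav;
    x->front = p->front;
    x->back = p;
    p->front = x;
    x->front->back = x;           // Back(Front(x)) = x
    return x;
}

void del(DNode* p, DNode* top) {
    if (p == top) return;         // never delete the header
    p->front->back = p->back;
    p->back->front = p->front;
    delete p;
}
```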
-------
Fri 02/09/07
top -> [a]->[b]->[c]/
copy the list.
(shallow copy -- b=top)
when done we should have a whole separate linked list.
printlist problem
p=top
while ( p!= null)
new node
data(x) = data(p)
link(x) = next value :-(
p = p->link;
}
that's a little messy
copy(top)
If top=null return null;
otherwise
getnode(x)
DATA(X) = data(top)
link(x) = copy(link(top))
.....
return(x)
copy is supposed to copy a list, so it returns a pointer that's a copy of the list.
recursive copy.
activation stack
individual parts are called the activation records
every call allocates
all parameters, any local variables, RA
every return, you deallocate (unstack)
all reference to the stuff that we're stacking is through the top of the activation stack.
for a b c list
give address 91, 21, and 6 to the nodes
so top = 91
stack
top x ra
91 ?-- x M
....
normal recursion picture - i'll draw it for you if you like
recursion is powerful, but it costs you -- you pay for it.
top -> ....
b = copy(top);
we could run out of the activation records
so let's simulate the recursion.
1) maintain our own stack {Add, Del, Push, Pop}
On every call, stack our current values (all param, locals, ra)
2) set new parameters
3) branch (jump) to the beginning
on every return,
1. if our stack is empty, then we'll use the normal return
1.5 if we're dealing with a function, then i'm gonna save the function return values
2. unstack everything -- params and locals, and then branch to the return address.
3.5 - if function whereever we're branching to , i need to process the return value.
we'll actually reference the parameter and the local variable itself now -- why stack them if we don't need them back
so . . convert copy(top) to a non-recursive routine that does the same thing
Function copy(top)
init stack
if { top is null then // Return(Null)
if s.empty then return (NULL)
save = Null // need to save the return value
unstack (top, x, RA)
goto(RA)
}
Getnode (x)
Data(x) = Data(top)
// copy(link(top))
stack (top, x, RA (1))
top = link(top);
goto begin;
(1) Link(x) = save
// Return(x)
If s.empty then Return(x)
save = x
unstack(top, x, RA)
Goto RA
end
to refine this -- the first part down to goto begin is a loop
so
init stack;
while (top !=null) {
Getnode(x)
Data(x) = Data(top)
stack (TOP, x, (1))
top = Link(top)
End
if s.empty then Return(Null)
save = null
unstack (top, x, ra)
goto ra
(1): Link(x) = save
If stack empty then return (x)
save= x
unstack (top, x, ra)
goto ra
but we can do better
the only possible value for ra is (1)
so get rid of "goto ra", change to goto 1
and now we're not using ra anymore, so don't stack it
but the last part .. we can get rid of the goto
watch this.
x=Null
while ! stack empty do
save
unstack
link of x
end
return x
so.
loop from up top
plus
x=Null
while ! stack empty do
save
unstack
link of x
end
return x
but why are you stacking top if you're not going to use it.
to get rid of top
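the fully refined non-recursive Copy, as a sketch: the first loop is the "goto begin" part (it no longer stacks top, per the last remark), the second loop plays back the stacked copies, wiring each link to "save":

```cpp
#include <cstddef>
#include <stack>

// Non-recursive list copy. Pass one: walk the list, build a copy of each
// node, stack the copies. Pass two: unstack, linking each copy to the one
// made after it (the old "save" variable).
struct Node {
    char data;
    Node* link;
};

Node* copyList(Node* top) {
    std::stack<Node*> s;
    while (top != NULL) {              // the "goto begin" loop
        Node* x = new Node;
        x->data = top->data;
        s.push(x);                     // stack x; top isn't needed back
        top = top->link;
    }
    Node* save = NULL;                 // the Return(NULL) case falls out here
    while (!s.empty()) {
        Node* x = s.top(); s.pop();
        x->link = save;                // Link(x) = save
        save = x;
    }
    return save;                       // head of the copied list
}
```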
-------
Mon 02/12/07
Recursion -
removing recursion
tree
- formal definition: a finite set of one or more nodes such that:
1) There is a special node called root
2) all the nodes are partitioned into N≥0 disjoint sub-sets, where each subset is a tree.
Root
Parent
Child
Sibling
"tree is a finite set" - it terminates -- that's good. a tree quits sometime
one or more nodes
one of them called root.
all the other nodes are partitioned into n≥0 disjoint (really important) sub-sets where each subset is a tree.
Root (A)
Nodes a b c d
----- B
----- C
but then b and c are trees
so D can be a subset of B
disjoint means that the subtrees don't connect. -- there is only one path to a given node.
one special node called root
everything else is the subset
the root is the parent of everybody, then everybody is the child -- or great grand child
think of a family tree.
usually we draw them downward in computer
and every one of these subsets is in fact a tree.
just one node is a tree -- a root with no children
who are the children of c? -- fg e -- they're one level down from C
given any node, there's only one parent -- one way to get there
sibling -- you've got the same parent
"all other nodes are partitioned into N ≥0" --
disjoint -- can't connect multiple times. -- only one path to a node
subsets are trees.
in definition, there's no implied ordering. -- you can move the siblings around under the parent.
corporate tree
sports conference
how will we represent that
could have a node with a lot of pointers that point to the children.
a node has a variable amount of link fields.
binary tree
every node splits two ways -- but you can have empty subtrees.
we will convert other trees to binary trees.
Finite set of nodes which is either empty or has a root with two links.
and both subtrees are binary trees.
Level 1 -- root
level 2 - children or root
etc.
powers of 2.
2^(level# -1)
Max number of nodes in a tree with n levels is 2^n - 1
every node splits two ways .
An implementation
Data [A|B|C|D|E|F|G]
that's a tree
Root =1
left = pos *2
right = pos*2+1
parent = pos div 2
these rules recreate a tree.
you can't do any better than this for a tree.
but if we give F a child,
that child, h is at 13 -- and we have tons of blank memory cells.
so this only works well for a dense tree
this is called the linear representation of a binary tree
root -> [left|data|right]
its a linked listish tree
class Node {
public:
char data;
Node * left;
Node * right;};
Node * root = null;
Visit(Root) {
if root = null, return
cout << Data(root)
Visit(left(root))
Visit(right(root))
}
remove recursion for this... maintain our own stack
Visit(Root)
Init(stack);
Begin:
If Root = null, then //return
{
(3) if s.empty() then return
unstack( __, __, ra)
goto ra
}
print data(root)
// visit (left(root))
stack(root, ra (1))
root = left(root)
goto begin;
(1)
//visit(right(root))
stack(root, (2))
root = right(root)
Goto Begin
(2) // Return
Goto (3)
Visit(root)
Init Stack
Begin:
while(root ≠ null) {
print Data(root)
stack ( root, (1))
root = left(root)
}
(z)// root = null
If s.empty then return;
unstack (root, ra)
goto RA
(1)
stack (root, (2))
root = right(root)
Goto begin
(2) Go to (z)
but .. if i go to 2 i go to z, so just make z a 2.
begin.. go to begin -- is a loop forever
goto ra from 2 -- has to be either a 1 or 2.
but we throw away the twos that we stack, so don't stack them
and then the goto becomes goto 1
which is the next statement
but now we don't use ra anymore, so don't stack it
that looks pretty nice.
-------
Wed 02/14/07
binary trees -- every node has a right , a left, and a data field
all nodes are binary trees.
visit/ process a tree
preorder - visit first, then you go left and right
inorder - visit in between
postorder visit last
preorder
starting at root
Visit data(root)
preorder(left)
preorder(right)
return
in order
visit in between
postorder
visit last
left
right
visit
post order doesn't refine as nicely.
nonRecPreOrd(Root)
init stack
r=root
loop
while (r != null)
visit Data(r)
stack (r)
r = left (r)
end
if stackempty then return
unstack(r)
r=right(r)
forever
try it
in order -
init stack
r=root
loop
while (r != null)
//visit Data(r)
stack (r)
r = left (r)
end
if stackempty then return
unstack(r)
visit Data(r)
r=right(r)
forever
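the same traversal loop in C++ (inorder shown; moving the visit up to before the stack() call gives preorder). TNode and the string-building visit are my scaffolding:

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Non-recursive inorder traversal: run left stacking nodes, then unstack,
// visit, and go right. The only difference from preorder is where the
// visit happens: before stacking (pre) or after unstacking (in).
struct TNode {
    char data;
    TNode* left;
    TNode* right;
};

std::string inorder(TNode* root) {
    std::string out;
    std::stack<TNode*> s;
    TNode* r = root;
    for (;;) {                       // loop ... forever
        while (r != NULL) {
            s.push(r);               // stack(r)
            r = r->left;             // r = left(r)
        }
        if (s.empty()) return out;
        r = s.top(); s.pop();        // unstack(r)
        out += r->data;              // visit Data(r) -- inorder position
        r = r->right;                // r = right(r)
    }
}
```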
post order doesn't refine as nicely.
parallel arrays
grow a tree
tree
pointer somewhere in middle of tree
add a new node as the immediate right subtree of the node pointed to by p
InsertRight(P, data)
Getnode(x)
Left(x) = null
Right(x) = right(p) // in our example that's null at this point but this way it'll work in the middle
data(x) = data;
right(p)=x
InsertLeft( P, data)
Getnode(x)
Right(x) = null
Left(x) = left(p) // in our example that's null at this point but this way it'll work in the middle
data(x) = data;
left(p)=x
delete
we want to know p -- but we gotta know parent.
we've got problems too if we delete a child that has children.
if we want all the children and grandchildren of a node gone,
we can delete doing a postorder.
that deletes the last one first.
... windows folders
InsertRight
InsertLeft
"del"?
delete command
just
delete root; -- put it back to the heap
Mary
Joe Sam
Bill Fred Wanda
Bill
Joe
fred
mary
sam
wanda
in order traversal .. is in order!!
binary search tree
Left of P has values < Data(P)
Right of P -- values ≥ Data(P)
(fred was in the wrong place)
sally would be sam's left child
O(n log n) to build
so how do you build one of these
r = root
if (value > Data(r))
R = Right(r)
--- if that's null, insert right of r
else
R = Left(R)
--- if null --- insert left(r)
build root -- put max_int for value.
binary search tree.
but if the tree is real lopsided -- it's more like n/2
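a sketch of growing the search tree as described -- less goes left, greater-or-equal goes right, new nodes always land at the bottom. I use a recursive insert for brevity; inserting the class's names and walking inorder gives them back sorted:

```cpp
#include <cstddef>
#include <string>

// Binary search tree: Left of P has values < Data(P), Right has >= Data(P).
struct BNode {
    std::string data;
    BNode* left;
    BNode* right;
};

BNode* insert(BNode* root, const std::string& value) {
    if (root == NULL) {                     // fell off the tree: add here
        BNode* x = new BNode;
        x->data = value;
        x->left = x->right = NULL;
        return x;
    }
    if (value < root->data) root->left = insert(root->left, value);
    else root->right = insert(root->right, value);   // >= goes right
    return root;
}

std::string inorder(BNode* r) {             // inorder traversal is in order!
    if (r == NULL) return "";
    return inorder(r->left) + r->data + " " + inorder(r->right);
}
```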
-------
Fri 02/16/07
big O
O(1) - flat
O(log n)
O(n)
O(n log n)
O(n^2)
ADT
Matrix -- a mostly empty (sparse) matrix takes up a lot of empty space
stack - array, linked list
postfix eval
state diagrams -- show it working.
don't memorize the postfix and eval code -- understand how they work, well enough that given enough time you could reconstruct them.
icp and isp
postfix algorithm works because of those values.
queues
^ 3 4
*/ 2 2
+- 1 1
( 0 4
-∞ -1 n/a
say you're in an environment that uses one priority table, not two - what does that imply about the expressions it's handling?
then we'd need a special case for (
exponent would have to be a special case (can't do it) as well.
also can't handle a=b=c
Queue - array and linked list
major application: simulation.
bob's gas
arrivals
departures
event list
arrival patterns - uniform, bell curve, Poisson curve
random #'s
array representation of linked list
int heap
getnode
retnode
parallel arrays -- data and link field
subscripts became pointers. avail was a pointer to all the free nodes.
doubly linked
access methods
add,
addafter
addbefore (today is yesterday's tomorrow)
delete
used header node and circular to make the header and delete procedures easier.
recursion and linked structures.
copy a singly linked list
removing recursion
given a recursive procedure, come up with a non-recursive version
General trees
... binary trees
traversal
preorder
inorder
postorder
references
chapter 6 - linked list
6.6 - array representation
chapter 7 - stacks
postfix and eval
chapter 8 - simulation
chapter 10 - recursion
10.3 implementing
11 - doubly linked lists
11.3
12 - binary trees
short answer.
no fillins, no tf, no multiple guess
short questions after ideas.
coding in our pseudo language.
Print(v) {
if(v≤0) return;
cout << v;
print(v-1);
return
}
print(4)
what we did in lab ... important
fill lab.
when you get an operator, you should unstack whatever has greater priority than what is coming in.
Binary search tree
L <
R >
Search(root, value)
(if we don't find it, return 'null')
R= root
While (R ≠ Null)
If (val < Data(R)) R = Left(R);
else if (val > Data(R)) R = Right(R);
else [Found it at "R"; Return(R)]
End
Return (Null) // not found
End
look for 9
speed of search -- log n provided that the tree is nicely shaped.
could have a lopsided tree
-------
Mon 02/19/07
array representation of a linked list
first, put everything on avail.
getnode gives you a free node from the front of the list of avail ..
basically an unstack
x = avail;
avail = link(avail);
Delete
data array and link array
pointers became subscripts.
link(top) = link[top]
printlist
print data[top]
top = link[top]
traversing a binary tree
in order -- not going to visit until you do the whole left.
doubly linked list.
with header node
deleting
if p != header node
left(right(p))=left(p)
right(left(p))=right(p)
matrix -
2d array.
Get();
Put();
you could make this a 2d array.
but bad for a sparse matrix.
we tried 3 1-d arrays
adding was fast
searching was slow
Postfix
A+B+C ..
ABC*+4,7^-
stack operators
unstack something that has priority.
while (isp(top) ≥ icp(token))
unstack operator
output
end
stack(token)
isp, icp
^ 3 4
*/ 2 2
+- 1 1
( 0 4
-∞ -1
operand to output
unstack if higher priority
empty the stack at the end.
< done after +, -, so between 0 and 1.
( at bottom because we only unstack at special case.
and OR is later than the greater than, less than.
removing recoursion
stack parameters and locals, and ra.
return - unstack.
...
every time you make a call, stack current values (parameters, locals, ra)
set new parameters
jump
return . . .
unstack and go to ra
distributions
uniform
vs.
Poisson
-------
Wed 02/21/07
binary tree
[left][data][right]
7 nodes
14 link fields
only one way to get to a particular node.
root is a pointer, but its not a link field.
6 nodes pointed to by a link field
we've got more null links than not null.
some folks think we should make use of those null links
call them threads -- they can point to another node.
right threads -- point to the inorder successor
in order traversal
if empty, stop
left
visit
right
non-recursive version
InOrder(Root)
init stack
loop
while root ≠ null
stack(root)
root = left(root)
end
if s.empty, return
unstack(root)
visit root
root = right(root)
forever
end
every place we have a null right, point it to the in order successor.
if the last thing you did was visit a node
say we just visited b -- goto the right of b
that's a
by definition, it's an in order successor, so we stop
now i just visited a
go right -- go left as far as you can .. get to l.
header node
new format -- left, data, right, bool thread
INOrderSucc(P)
right (one)
if thread then stop
left as far as possible
now we can do an in order traversal without a stack.
InOrderSuc(P)
s = right(P)
if (p.thread) Return (s)
while (left(s) != null) do
s = left(s)
end
return (s)
end
P=root
loop
P = InOrderSucc(p)
if p = root return
visit p
forever
try it out
BST
insertLeft
getnode(x)
left(x) = NULL
right(x) = p
thread(x) = true
left(p) = x
or so
InsertRight(p, val)
//
//
Getnode(x)
data(x) = val
left(x) = null
right(x) = right(p)
thread(x) = thread(p)
right(p) = x
thread(p) = false
costs us two statements to build.
InsertLeft(P, val)
getnode(x)
data(x) = val
at the end of the tree (left(p) = null)
Left(p) = x
Left(x) = null
Right(x) = p
Thread(x) = true
every node with a non null left is pointed to by a thread
we gotta find the thread that pointed to the parent .. it's an arbitrary distance away and we gotta point it to the new node.
if left(p) ≠ null
q = left(p)
while !thread(q)
Q = right(q)
end
Left(x) = left(p)
right(x) = p
thread(x) = true
left(p) = x
right(q) = x
so adjusting the thread is proportional to n
but a binary search tree never adds in the middle
so then we never run into this problem.
Take A and point it to the first child
the other children of A are going to be hooked up on the sibling pointer
Put right threads in
flip the tree 45 degrees
empty sibling points to parents
so to get the parents, follow the sibling pointers until you get the parent.
any tree you draw, you can come up with
wanna save the binary tree.
how????
combine pre and in
A is first node
then all the nodes before the root are on the left
well who's the root of left?
b because its visited first.
p = p -> link; // the suck
slow!!
L Data R arrays
binary write the whole structure at once.
so all you need is an array.
Sort algorithms
binary search tree
n * log n
can't sort unless you can search
Indexed search
search index -- index is going to be parallel.
search array for key value -- if you find it return the position where you found it .. otherwise, return status indicator
index and n
void Search (val) {
p = -1
for (i = 0; i < n; i++)
if (index[i] == val) { p = i; return p; }
return p; // -1
and a for is really a while
there are two decisions
can we make only 1?
yes -- put the if statement as the stopping condition
what if its not found?
put it in index[n]. we're not using n -- the data only uses 0 to n-1.
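a runnable sketch of the one-decision search (function and variable names are mine, not from class):

```cpp
#include <vector>

// sentinel sequential search: store the value we're looking for in the
// unused slot index[n], so the loop needs only one stopping condition
int search(std::vector<int>& index, int n, int val) {
    index[n] = val;              // the sentinel; real data lives in 0..n-1
    int i = 0;
    while (index[i] != val) ++i; // guaranteed to stop by slot n
    return (i < n) ? i : -1;     // -1: we only found the sentinel
}
```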
-------
Fri 02/23/07
threaded tree
right threads
advantage: process the tree with an inorder traversal without a stack
doesn't cost us much (except insertLeft in the middle of the tree)
binary search tree -- always add at end of tree.
Converting from general to binary
take our node -- instead of left and right pointers, have first child and sibling.
-------
Mon 02/26/07
sequential search
faster sequential search
proportional to n
binary search
bring in array, value
return position
probe in the middle
stopping condition -- if low is ever bigger than high
low is the lower bound of where we are searching, high is the top end.
should be on g:
if the value is less than the probe, keep the low and redefine the high
otherwise, if greater than, do it the other way
otherwise it's equal to, so you found it.
finding 32
0, 9
so first probe at 4
n=15
well constructed binary search tree // full tree
8
4 12
2 6 10 14
1 3 5 7 9 11 13 15
optimal tree
how fast -- log(n)
but we can be more precise
actual average search time is log(n) - 1
because most of your searches require four looks (the bottom level holds about half the nodes).
average for the first 3 levels:
1*1 + 2*2 + 4*3 = 17
/ 7 = 2.4
four levels:
(17 + 8*4) / 15 ≈ 3.3
gets closer and closer to log (n) -1
for the average search.
biggest yeahbut about the binary search -- the data has to be ordered.
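the probe-in-the-middle loop as a runnable sketch (0-based array; naming is mine):

```cpp
#include <vector>

// binary search: probe the middle, keep low and redefine high (or the
// reverse) until low passes high. assumes data is already ordered.
int binarySearch(const std::vector<int>& data, int val) {
    int low = 0, high = (int)data.size() - 1;
    while (low <= high) {               // stop if low is ever bigger than high
        int mid = (low + high) / 2;     // probe in the middle
        if (val < data[mid]) high = mid - 1;
        else if (val > data[mid]) low = mid + 1;
        else return mid;                // otherwise it's equal: found it
    }
    return -1;                          // not found
}
```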
int Luck(int data[], int n, int val) {
int x;
do {
x = rand() % n;
if (data[x] == val) return (x);
} while (1);
}
sequential:
best: 1
worst: n (at the end, or not in there)
average: n/2
binary search:
best: 1
worst: log(n)
average: log(n) -1
binary search tree:
best: 1
worst: n
average:
luck:
best: 1
worst: ∞
avg: pretty good -- because there's no overhead.
O(n) (?)
best algorithm - index search -- no search! -- O(1)
***** (don't forget this)
yeahbuts - needs dense part numbers
not necessarily start at 0, but dense.
address calculation -- data[x-100]
Hashing
keys are not dense
key  probe
41   11
55   13
12   12
42   2
60   5
90   0
put in array
probe: key % tablesize
return (probe)
probe -- works better if you use a prime number for the table size
one way to handle two things for a probe -- bucket
we'll look at chaining
and progressive overflow.
chaining -
probe = key % table_size -- we want prime number (why?????) for table size
36 goes to 1
22 goes to 1
31 goes to 3
make a linked list at each spot.
FindKey(key)
probe = key % tablesize
p = table[probe]
then walk the list -- a while-loop sequential search.
return a pointer to where you found it
best case = 1
worst = n
avg = 1.5 -- according to "them"
addkey
probe = key % tablesize
and add to front
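a minimal chained-table sketch -- table size 7 (a prime) is my assumption to match the 36/22 -> spot 1, 31 -> spot 3 example, and std::list stands in for the hand-built linked list:

```cpp
#include <list>
#include <vector>

const int TABLE_SIZE = 7; // a prime, as suggested in class

struct ChainTable {
    std::vector<std::list<int>> table;
    ChainTable() : table(TABLE_SIZE) {}

    void addKey(int key) {
        int probe = key % TABLE_SIZE;
        table[probe].push_front(key);  // add to front: O(1)
    }
    bool findKey(int key) const {
        int probe = key % TABLE_SIZE;  // hash to a spot...
        for (int k : table[probe])     // ...then sequential search the list
            if (k == key) return true;
        return false;
    }
};
```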
wed. progressive overflow
-------
Fri 3/02/07
lab help
find the depth -- 16 or 17.
writing the file
reading it -- using getline
fstream df;
df.open("bob",ios::out);
cout at the end.
while loop in your depth program -- not going to work.
see handout.
divisor method: probe = key % tablesize; -- cheap, but it doesn't always give very good results.
you can do almost anything and hashing works.
so the whole idea is to do something to the key and hope that it distributes your data nicely.
another technique - mid-square
square the key
extract the middle digits
example
100,000 - 999,999
7,000 are unique
we'll use a table size from 0 - 6999
so square the key
172148
029634933904
you get no more than a 12 digit number back.
now remove the middle four digits
how many do i remove? -- if my table size is 7000, i need to remove four in order to represent 7000
take num / 10000 = 02963493
probe = 02963493 % 10000 .. gives us 3493
but we need a range of 0 to 6999
so multiply by .7 aka 7000/10000
so the divisor method gets the probe right off the first statement
this one is a bit of work but may give you better results.
we have 7000 unique probes.
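the mid-square steps above as a sketch -- 64-bit math because the square can be 12 digits, too big for 32 bits:

```cpp
#include <cstdint>

// mid-square hashing for six-digit keys (100,000 - 999,999)
// mapped into a 0 - 6999 table, as in the class example
int midSquare(std::int64_t key) {
    std::int64_t sq = key * key;              // e.g. 172148^2 = 029634933904
    std::int64_t mid = (sq / 10000) % 10000;  // drop the low 4, keep the middle 4
    return (int)(mid * 7000 / 10000);         // scale 0-9999 down to 0-6999
}
```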
collisions are bad -- we get sequential searches.
if you get a collision, rehash -- maybe you hash with one function (divisor), then take a different function, rehash, and use that as the offset to your position.
.. and one rehash is good -- it's not worth it to rehash again.
can't let hashing go, because its average performance is proportional to 1.
but the worst case is really bad.
find pankratz in the phone book using hashing -- you can't .. nice try just using the string directly.
but you can sum the letters and use that to get your probe.
yeah but ABC = CBA.
we want to involve the positions as well.
so multiply by 100 each time.
but then our smallest key is 656565
but if we subtract 'a' -- then we can get to zero.
well we still have big numbers
why not multiply by 2 instead -- then we have something that screams, sure we'll have collisions that way (?)
shift ---> << operator ????
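one way to sketch the shift idea -- the single-bit shift and table size here are my choices, not the class's:

```cpp
#include <string>

// position-aware string hash: subtract 'a' so the smallest key can reach
// zero, and shift left one bit per letter so "abc" and "cba" differ
unsigned stringProbe(const std::string& word, unsigned tableSize) {
    unsigned h = 0;
    for (char c : word)
        h = (h << 1) + (unsigned)(c - 'a');  // shift, then add this letter
    return h % tableSize;
}
```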
what if you were going to write a speller
here at snc we have 400,000 words -- no word is longer than 6 characters.
we want to look up the words.
fast!
we could do a 26 branch tree instead of a binary tree
that would be 26^6 nodes!
to print it, you'd have to use recursion
could have a * at the end of every word to differentiate car from carry
sorting.
bubble sort.
you could build a linked list.
also n^2
add to array
also n^2
binary search tree - for nice looking tree log(n) -- did that for every node, so n log(n)
we're going to look at three more sorts
insertion (fastest for small n proportional to n^2)
quicksort (fastest for big n)
heapsort (in the middle ..)
quicksort is n * log(n)
insertion sort -
insert your value in the right place.
14
18
12
91
6
insertion sort says this much of the array is ordered --
take the next value and call it my key -- put 12 in the right spot --
two parts
figure out where 12 goes and move the other guys down -- but we'll do that all in one.
if you're bigger than the key, move it down -- that moves 18 down 1
now we clobbered our 12 but we saved it in the key
i'm empty so my key goes here -- so that much of the array is sorted
now we do it again.
say the array goes from 1 to n.
we'll do the trick that we did with the sequential search.
one stopping condition is when you're done
the other is if its a smaller value
so lets put a very small value at the top
so we speed up the inside loop by having only one stopping condition.
Proc InsertionSort(data, n)
// sort data from 1 to n
data[0] = -∞
for i = 2 to n
j = i - 1
key = data[i]
while key < data[j]
data[j+1] = data[j]
j--
end
data[j+1] = key
end
end
demo.
insertion sort likes ordered data.
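the pseudocode above, translated to C++ -- data lives in 1..n, slot 0 holds the "very small value" sentinel so the inner loop has one stopping condition:

```cpp
#include <limits>
#include <vector>

void insertionSort(std::vector<int>& data) {
    data[0] = std::numeric_limits<int>::min();   // -infinity stand-in
    for (int i = 2; i < (int)data.size(); ++i) {
        int key = data[i];
        int j = i - 1;
        while (key < data[j]) {   // sentinel guarantees this stops at j = 0
            data[j + 1] = data[j];
            --j;
        }
        data[j + 1] = key;        // drop the key into the empty spot
    }
}
```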
-------
Mon 03/05/07
Insertion sort
how fast?
best -
worst
avg
takes values -- and says this much of the array is ordered.
grabs the next one as the key.
best case O(n) .. for every value, you didn't have to do anything
.. happens when its already ordered.
proportional to n * number of values out of order
worst case -- reverse order
n/2 for the inner loop -- n times
to O(n^2)
what's the average
assuming random ordering
still O(n^2), but lower constant.
Quicksort
Proc Qsort(low, high)
if low ≥ high return
i = low; j = high + 1
key = data[low]
loop
repeat i++ until data[i] ≥ key
repeat j-- until data[j] ≤ key
if i < j then swap(data[i], data[j])
else break
forever
swap(data[low], data[j])
Qsort(low, j-1)
Qsort(j+1, high)
end

Proc QuickSort()
// data in 1..n
data[n+1] = ∞
Qsort(1, n)
end
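a runnable translation of that pseudocode, keeping the 1..n layout and the ∞ sentinel at n+1:

```cpp
#include <limits>
#include <utility>
#include <vector>

void qsortRange(std::vector<int>& data, int low, int high) {
    if (low >= high) return;
    int i = low, j = high + 1;
    int key = data[low];
    while (true) {
        do { ++i; } while (data[i] < key);   // stop at a value >= key
        do { --j; } while (data[j] > key);   // stop at a value <= key
        if (i < j) std::swap(data[i], data[j]);
        else break;                          // crossed over: partition done
    }
    std::swap(data[low], data[j]);           // key is placed forever
    qsortRange(data, low, j - 1);
    qsortRange(data, j + 1, high);
}

void quickSort(std::vector<int>& data, int n) {
    data.push_back(std::numeric_limits<int>::max()); // sentinel at n+1
    qsortRange(data, 1, n);
    data.pop_back();
}
```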
Data
26
5
22
37
1
17
61
11 <--8
∞ <--9
try it
low = 1, high = 8
i = 1 (low)
j = 9 (high + 1 , and then we decrement it)
key = low = 26
start looking for values bigger than 26 .. , less than 26
repeat means -- do while
i stops at 37 -- that's bigger than 26
j -- we stop decrementing whenever we find a value less than our key - we get one right away (11) so j stops there
the two havent crossed over
both are out of order -- so lets swap them.
we dont look at 11 again -- because its i++
i stops at 61 because that's greater than 26
j stops at 17 -- because its less than the key
the two crossed over, so we're out of the loop
so our key gets swapped at position j
26 is placed forever -- it will not move
we've got two separate processes so we do it again
call quicksort on low and j-1
....
another example
50
8
52
6
90
17
89
..
powers of 2
16 values
1st time thru, you look at everybody
level 1 --- 16 or n
level 2 --- n/2 n/2
level 3 = 4(n/4)
...
we look at them all log of n times
n * log(n)
this is the fastest algorithm known for sorting random data.
worst case: n^2
reverse order is bad -- that's the worst case
in order must be good.
best case - n log n
worst case BAD n^2 -- lot of overhead.
what if data is in order.
that looks like the worst case.
wait .. they're both the worst case.
quick sort HATES ordered data!
it wants splits in the middle
what's the best case ? -- when the key that you grab is in the middle. .. but then there's the key you swapped ....
quicksort wants about equal splits
QuickSort -- nobody's beat it.
average case -- n log n ish.
gets really bad when you get dinky little arrays
a lot of overhead for an array of 2 .. or 3
but that's ok -- insertion sort likes little arrays.
furthermore, quicksort likes to give you sorta in order data.
so when you get small, call the insertion sort.
so when you make the quicksort calls, if the size is small, call insertion sort.
what if we look at 3 keys -- first, last and middle -- and take the middle of those three.
so now the worst case becomes the best case.
for random, it doesn't matter much -- 8% increase.
if you run out of stack -- do insertion sort.
-------
Wed 03/07/07
insertion sort
16
18
21
6
14
91
14
9
grabbing and finding where it goes -- two separate problems.
the search part was n/2
move everybody down -- another proportional to n
1) find location
2) move/ make room.
but we can make location search log(n)
Insertion Sort Big N -- we can do a binary search for step 1.
but we should never use insertion sort for Big N
Heap sort -
with a linked list,
sort - O(n)
adding is 1.
quicksort
heap - n log(n)
"heap" - 1. where dynamic memory comes from.
2. a binary tree with each node's value ≥ its children.
how does it work?
1) build the initial heap
2)
sort this
81 (root) is the biggest, so put it down and never move it.
swap the root with the bottom.
Adjust -
adjusts a heap when only one node is out of place
adjust is log n
we call it n times.
position 1 -- 14
position 2 -- 13
position 3 -- 10
position 4 -- 8
position 5 -- 6
position 6 -- 1
position 7 -- 4
left = p * 2
right = p * 2 +1
parent = p / 2
heap sort actually represents the tree as an array.
biggest val at 1
put it at bottom of tree
so swap 14 and 4
chop off box 7
now bottom at 6
call adjust
largest value is one of the children
largest value is at either 2 or 3
swap 13 and 4.
now the tree at position 2 is out of order, so we call adjust on it.
biggest is either at 4 or 5
so we swap
.....
take the biggest, 13 and swap it with bottom.
code for adjust
Proc Adjust (i,n)
// make a heap of tree rooted at i, n nodes in the heap
provided the children of i are heaps
if i is a leaf node return (leaf node = no children) -- half of them are leaf .. that's how a binary tree works -- so if i > n/2, i'm a leaf node .. so get out
otherwise,
m = position of the max of data[i*2] and data[i*2+1]
if data[m] > data[i], then swap em
but when we swap them, we may screw up the tree below, so we call adjust(m, n)
Heap sort
Proc HeapSort(data, n)
// build the initial heap
for i = n/2 down to 1
Adjust(i, n)
end
// pull the max off the top, n-1 times
for i = n down to 2
Swap(data[1], data[i])
Adjust(1, i-1)
end
end
but how do we build the initial heap?
well all leafs are heaps
so we don't have to loook at half the tree.
can call ajust as long as the children are heaps
Ajust 5,n
adjust 4,n
adjust 3,n
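Adjust and HeapSort as runnable C++ over a 1-based array (slot 0 unused), matching the pseudocode above:

```cpp
#include <utility>
#include <vector>

// make a heap of the subtree rooted at i (n nodes total), provided the
// children of i are already heaps. left = i*2, right = i*2+1.
void adjust(std::vector<int>& data, int i, int n) {
    if (i > n / 2) return;                        // leaf node: get out
    int m = i * 2;                                // left child
    if (m + 1 <= n && data[m + 1] > data[m]) ++m; // m = position of max child
    if (data[m] > data[i]) {
        std::swap(data[m], data[i]);
        adjust(data, m, n);                       // the swap may break things below
    }
}

// heap sort over data[1..n]
void heapSort(std::vector<int>& data, int n) {
    for (int i = n / 2; i >= 1; --i)              // build the initial heap
        adjust(data, i, n);
    for (int i = n; i >= 2; --i) {
        std::swap(data[1], data[i]);              // biggest goes to the bottom
        adjust(data, 1, i - 1);                   // re-heap what's left
    }
}
```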
-------
Fri 03/09/07
56
5
77
1
61
11
59
15
48
19
tree
root =1
left p*2
right = p*2 +1
build an initial heap
start at bottom; keep calling adjust
start calling with position 5 -- 10/2 -- first node with a child.
adjust(5)
adjust(4)
....
adjust(1)
with initial heap
take the biggest value (position 1) .. stick it on the bottom -- and cut that off.
call adjust on position 1
heap sort -- n log n.
but we build a heap first -- which itself is .5 (n log n)
so 1.5 (n log n)
best case, worst case -- it's always that n log n
stable sorts.
if we have duplicates and they come out in the same order as they went in, that's a stable sort.
why does this matter?
sorting the phone book
we want to sort by last name, then by first name if there's a conflict: sort by first name first, then sort by last name -- that only works with a stable sort
so what are our algs.
insertion sort is stable
25
22a
1
22b
3
stable because you say "if you're bigger, move it down -- NOT ≥"
heap and qsort are not stable
sorting character data.
sure we can .. but be careful.
better be consistent with lower and uppercase
in ascii land -- all the uppercase letters come before the lowercase ones
in ebcdic
blank
special stuff
a-z
A-Z
numbers
its worse if you sort numbers as characters.
for floats, line up those decimal points!
he likes printf to generate the text file
don't let the user generate the text file if possible.
First Last City State
state is my major sort key -- i want everything ordered by state
how do you do that?
one way -- sort on the most minor key first.
then sort on the next most minor.
but those sorts would have to be stable or it wouldn't work .. no qsort or heap.
another way to do this.
.. make a pass of the data file.
build a concatenated key
take state, concat city, then last name, and then first name
then sort on this concatenated new key
that puts all the wisconsin people together and then only looks at the city if you're in the same state.
this only works if these are fixed sized fields.
sorting with the major key via three stable sorts: 3n^2 .. vs n log n
concatenated-key sort:
2n to build the keys, then n log n for one sort, then 2n again to get it back.
one more thing to think about -- an index sort.
build an index on your key -- say, artist .. with pointers to the data
then just sort the index --- not the data.
so the pointers are rearranged, not the actual data.
could have two indices
so it looks like the file is sorted by both title and artist.
merge sort
two separate arrays
A1 and A2
sort them together
smaller one gets copied in .
but the two arrays are already sorted.
but you can sort little chunks of it -- call them a1 and a2 and merge those together; then merge a3 and a4 together, and then merge the two results.
-------
Mon 03/12/07
20,000 values in random order -- and you want the 10 smallest.
Heap
always a full tree
fill it with the first 10 values.
-- the 10 smallest so far
by definition, we're only going to hold the 10 smallest.
3 is less than 101 -- the biggest of our smalls
so put 3 in for 100 and call adjust
etc.
so .. we have to read every value, but we're only comparing to the root -- you either ignore or swap
re-adjust is n log 10 .. we're only storing 10 nodes.
its an interesting algorithm to keep in your head.
AVL tree
Adelson-Velskii and Landis
a binary search tree that guarantees log(n) performance -- (not optimal, but log(n))
everything associated with the BST was log(n) when it was optimal
these guys rotate parts of it when it gets lopsided
also called a height-balanced tree .. meaning it's balanced by height, not by weight.
the height of the left == height of the right
every node in the tree has a balance factor =
the height of the left subtree - the height of the right subtree
if i'm here, what's the longest path ?
you have an avl tree if the balance factor is -1, 0, or 1 on every node.
if the tree ever gets out of whack, you put it back by doing these rotations.
how do you build? start at the root
C
B
A
that screws my tree up. now the balance factor for C is 2
LL - rotate around the node that's out of balance.
so we will promote B, keep A where it is. and the node that's out of balance gets demoted down.
call it LL because we inserted left.
D
C E
A
Add a B - that screws up two nodes, but the rotation is always around the nearest node to the inserted one.
This is a LR rotation -- we did the left subtree and then the right subtree
A stays in its place .. B gets to be the root, and the C gets demoted.
four rotations, but only two to worry abut
LL and LR
and then the absolute symetric RR and RL
G
E H
C F
add D
rotation is always around the nearest node that went out of balance.
LL rotation
so E becomes the root -- G gets demoted down, along with anything attached to it, and the D stays
but what happens to F?
it got disconnected.
well G's left is now open -- so what ever was in E's right goes in G's left
and now we have a balanced tree
E
C G
B D F H
add M, and everyone's happy.
now add Z.
Ouch.
rotate around H
RR rotation
opposite cupping
so M becomes the root -- H gets demoted and Z stays in place
add another H (that's greater than the first one)
RL rotation
on G M and H
M stays
H gets promoted
G goes down; H2 gets connected to M's left
and the rest of the tree stays.
gotta get down to n to add .. log n
rotation is constant -- based on the node that went out of order. constant cost.
this is huge -- prevents really lopsided trees.
an O(n log n) sort -- log n plus you do it for every n.
code isn't as sticky as you might think -- see book.
whole idea is that you build the tree as you normally would -- rotate when you have to.
could use hashing .. it has a great best case -- but a bad worst case -- progressive overflow fills up and gets muddy at 80% full .. chaining doubles the storage initially, and the linked lists can get long
hashing's worst case is really bad -- so if you need guaranteed log n, use an AVL tree
--------
Graph, Network
you can get to a node multiple ways -- a node has multiple parents
not disjoint
computer networks.
transportation graphs
vertices and edges
edge - link
vertex - node
a vertex has multiple link fields
how do you represent this?
adjacency matrix
adjacency list
matrix:
A B C D
A 1 1
B 1 1
C 1 1
D 1 1
for graph
A
B C
D
with links like an outline
one-way link -- you have to know which way you can go -- directed graph
undirected -- two-way links -- you can go from either vertex to the other with equal ease
we've just shown our matrix as undirected
and when it's undirected, like in mileage charts, we only have to represent half of it
and .. we're only using a bit to make the connection -- a zero or a 1
so to store an adjacency matrix for 100 vertices:
(100 * 100)/8 bytes, and /2 if undirected
adjacency list
A ---> B->C/
B ---> A->D
C ---> A->D
D ---> B->C
works really nicely if the connections are sparse.
the size of the list is n, but the number of nodes is how many connections you've got.
different representations.
could connect B and C
in a graph, the edges may carry information about the connections.
no root -- because anything is equally accessible.
given that, you're going to start somewhere
typical problem - given the graph, visit all the nodes.
proc visit(spot)
spot = A
Visit spot
For each vertex connected to spot
visit(that spot)
End
End
umm - that's infinite
so we need to change it to remember that you visited the spot
and then only visit it if you haven't already.
-------
Wed 03/14/07
Qsort
recursive
qsort doesn't like small n -- we said when the subarrays get small, call insertion sort
or when the subarrays get small, don't do anything
then when
qsort is done, call insertion sort on the WHOLE array
because its primed for insertion sort
one more misc thing
find the median value of a whole set of values
make a heap of the small values so the largest of the smalls is at the root -- that's the median
a heap of the n/2 smallest values.
graphs
really, a tree is a graph
a graph has multiple connections to a node
undirected - 2 way links
directed - one way links -- labeled which way do you go
adjacency matrix
rows and columns of all the vertices, listed both ways -- only a bitmap.
if all the links are two way, then you only need to represent half of it.
adjacency list
a list of all the vertices and the connections to each.
works out pretty nicely if you don't have a whole lot of connections (edges)
we were looking at processing this ..
adjacency list
1 -> 2, 5
2 -> 1, 3, 4
etc.
we'll use recursion
for every vertex that's hooked to this one, we'll make a recursive call
gotta make sure you don't revisit.
so we'll have another structure that says "you've already visited"
Proc Depth1st(start)
// set everyone to not visited
for i = 1 to n
visited[i] = false
end
visit(start)
end
Proc Visit(p)
visit vertex p
ptr = List[p]
while ptr != null Do
v=vertex(ptr)
if not visited[v] visit(v)
ptr = link(ptr)
end while
end
Depth1st(1)
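Depth1st/Visit as runnable C++ -- a vector-of-vectors adjacency list stands in for the linked nodes:

```cpp
#include <vector>

// depth-first visit; visited[] keeps us from going in circles
void visit(int p, const std::vector<std::vector<int>>& adj,
           std::vector<bool>& visited, std::vector<int>& order) {
    visited[p] = true;
    order.push_back(p);                      // "visit vertex p"
    for (int v : adj[p])                     // everyone hooked to p
        if (!visited[v]) visit(v, adj, visited, order);
}

std::vector<int> depthFirst(int start, const std::vector<std::vector<int>>& adj) {
    std::vector<bool> visited(adj.size(), false);  // set everyone to not visited
    std::vector<int> order;
    visit(start, adj, visited, order);
    return order;
}
```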
Depth1st -- you go as deep as you can
spanning tree - a tree because we visited every node only once.
spanning tree algorithm
vertices numbered 1-6, costs on each edge.
looking for a spanning tree that minimizes the cost
that looks hard -- lots of possibilities
algorithm -- in our book
1) sort edges based on weight
2) pull from the list the next smallest edge weight
3) if it doesn't form a cycle with what's already included, then we include it.
sort this
at 5 2,3
at 6 2,4
at 10 3,4
at 11 6,2
14 6,4
2 and 3 haven't been included, so take 2 and 3 at a cost of 5
4 hasn't been included, so include it at a cost of 6
n + n log n
-----------------
for our visit algorithm, we could replace the recursion with a stack .. push the neighbors, then take the next one from the top of the stack.
now let's change from a stack to a queue
that's a breadth-first algorithm.
binary tree
recursion -- depth first
queue -- breadth first
task/job management
3 programs running at the same time.
binary files
fstream df;
df.open("file", ios::in|ios::binary);
//df.read( address , # of bytes);
Data2 d;
df.read((char *) &d,sizeof(d));
as soon as you send the read command, the request to read is queued
anything you get in memory is free compared to disk access.
-------
Fri 03/16/07
interrupt-driven and time slicing
generally interrupts - i/o request
time slicing -- if you've got four things being processed at once.
read/write -- once you issue that command , you can't continue.
binary
class Data2
int
float
array int [10]
array of chars.
fstream df;
Data2 d;
df.open("hey.txt", ios::out|ios::binary);
d.i = ....
strcpy(d.str, "hi");
df.write((char *)&d,sizeof(d))
do it as many times as you need records
when you're done,
df.close();
read
df.read((char *)&d, sizeof(d));
read until you get to the eof.
do
read
check for eof
process
repeat forever
see handout
disks - data recorded on tracks. - concentric circles. .. every track holds the same amount of data.
disk delays: seek time, rotational time, and transfer time.
we can eliminate seek time by using many read heads .. but that's costly.
if we have 12 records, we have to pay the i/o cost 12 times.
when we finish reading the first record, we're in perfect position to read the 2nd one .. so read it and put it in some temporary place, because we're gonna need the rest of it.
so we read ahead and put it in the buffer.
so if the data is buffered, the next time the program asks for data, its already read for it.
this is called blocking.
physical vs. logical
logical i/o is what the program thinks.
physically, i've got 12 records at a crack.
os pays buffering cost.
what if we have more than 12 records?
then we have more than one record block
we'll put the other block in another buffer.
3rd chunk? we ran out of memory for buffers at that point.
having two buffers -- double buffering, anticipatory buffering.
if we process the first block fast enough, then we can reuse its buffer for the 3rd chunk of data -- but that's a huge hope.
this buffering is so important that windows does it no matter what.
works fine for sequential files but not direct ones.
file systems, file structures.
file systems -- os support.
text files -
they're worse, he says
df.open("bob", ios::out);
df <<" blah";
df.close();
text files convert everything to char -- you pay for the conversion there and back
text files are under software control
1234 bob 12.4
can do
df>>i>>s>>f;
getline(df, s);
i don't know how to read until runtime.
he says binary is 41 times faster.
-------
Mon 03/26/07
disks - one platter
concentric circles -- tracks.
track addressable and sector addressable --
our disks today are sector addressable
you chop the tracks into sectors --
it looks like pizza slices.
physical i/o
vs logical i/o
logical -- what the program thinks is out there.
df.read((char *) myData, sizeof(myData));
say that the data is 310 bytes (per record)
in memory for program
------------
code
------
data
i/o channel -- disk controller
i/o channel sticks the data into the buffer
tracks - concentric circle -- every track holds the same amount of data
cylinder - every track is numbered -- the same track number across every platter.
so if you have a pretty big file, you want it all allocated to the same cylinder.
worst case is to have a little data way out on one track and more in a track far away.
when you get to the disk, all i/o is physical.
that means that you're always going to read a whole physical record. -- the space delimited by boundaries - sectors (or tracks)
ch='A';
df.write((char *)&ch, 1);
and loop
that's foolish -- you're writing one byte and you're taking up the whole track.
all i/o on the disk is physical
with a track based disk , we have record gaps to worry about.
our current os usually sets the buffer to the sector size -- its gotta read a whole sector at a time anyway.
so if the sector size is 512, a record size of 310 doesn't work so well. - we cross sector boundaries -- and then its two physical i/o's
track addressable disk -- can only sense the beginning of a track.
you put the data out here and it has something called record gaps.
the minimal amount of space you get is a track.
but the record gaps can be sensed as well. the faster the disk spins, the larger the record gap has to be.
we're kinda going back to track-based disks for servers because HD space is so cheap now.
text files
if you send a write command to a text file, it doesn't actually write the data until the write buffer is full.
the flush will write it right away .. but that slows things down immensely.
if you write to the size of the sector, things scream. so writing more data is faster -- because you're lining up the sectors with your records.
records usually are staggered -- so that there's breather room between them.
you can append files, but that leads to fragmentation - usually we cant add to where we left off.
so its best to write a bunch of records at once -- so that they'll be together on disk
so what can we do?
use a multiple of the sector size -- at least a power of 2.
don't use text.
disk delays
- seek time - move to proper track
- rotational delay -- the time it takes for my my record to line up underneath.
- transfer time -- how fast is the drive? how long does it take to read?
see handout
i have a file and I want it sorted.
we want to sort by ID.
we have 2000 bytes on each person.
it won't fit in memory -- sort it.
process charge -
the way you sort is you copy - and the copy is sorted.
the big problem -- you can't read and write a file at the same time.
in --> sort/merge --> out
unix has one -- windows has one only in DOS.
probably 12 man years in designing this thing.
works because you only need to have smaller chunks in memory.
so say sort/merge exists.
lets sort a binary file.
gotta tell sort/merge
1) where the key is
2) how big the records are .. the key HAS TO BE in the same place in every record, or how will it know where to find it?
how do you compare the values -- binary representations are all different.
so it's hard to sort a binary file.
say the file's a text file....
gotta tell it the key.
where the key is.
you have variable record lengths -- where the eol is -- under software control.
but you have tokens -- but then you have to do token processing.
DOS
in --> sort --> out
how to do it
sort /+10 <in> out
+10 -- key starts at col 10.
dos won't sort binary
so by putting col 10 -- we mean that 10 positions from the beginning is the start of our key.
starts at 10 and goes on.
unix is more complicated.
man sort
sort +0.17 -0.20 /* minor key */ +0.0 -.11 .. from 1 to 10
major sort key and then minor sort keys.
sorts on 18 and 19 -- starts after 17 -- but the first one is zero.
at then end of the mess, you can put options
-b ignore leading blanks
-f uppercase=lowercase.
-n numeric ordering
-r z-a
0. -- means the first token -- we won't get into that.
-------
Fri 03/30/07
BST creation/traversing
(Binary tree, thread
#) BST associated with Big search.
---
general trees.
if you have a general tree, you can always convert that to a binary tree
[first child|data|sibling]
searching
sequential
sequential with trick -- we stored the value we were looking for (as a sentinel) and that saved us a decision.
O(n)
if, in the sequential search, you move up the search result one cell -- promote it any time you search for it .. then it'll find popular things faster.
Address calculation. - fastest search is no search.
binary search
BST
hashing - O(1)
- dealing with collisions.
IV) sorting
insertion
heap
quick
(merge)
--> Hybrids
cobol has a sort verb -- but what kind of sort is it doing?
heap n log n
quick n log n
quick is faster on average, but worst case is worse than heap.
stable sorts.
sorting numbers as chars.
multiple key
index sort.
avl tree - balanced tree.
BST is a sort -- an AVL tree never gets out of balance, so always n log n.
be aware of the rotations in an avl tree!
graphs
directed
undirected
representation
adjacency matrix
" list
traversal -> spanning tree.
looked at both depth first and breadth first --
depth uses stack (double check)
files
os - multiprogramming environment.
interrupts, time slicing.
worried a lot about file properties -- most certainly text vs binary.
text files are under software control -- have to be parsed out.
physical properties of the disk -- sectors.
blocking -- take a whole group of records and you make a larger record out of it
buffering -- two buffer system.
references:
nyhoff
BST 12.4
Threads 12.6
Bin Search 10.1, 12.1
Sorting 13
AVL 15.2
Graphs 16
Hashing 12.7
pseudocode is fine.
- - - - -
Test Monday - yay!
updating a file -- change -- add, delete records, change records.
if the file is small enough, you can read it into a huge internal structure, update it, and then write it back -- this is how word and notepad do it.
but if your file won't fit in memory, we've got issues; in that case, the only way you're going to update a file is with a copy.
if you copy a file you get a new version
file, old master -> update -> new master.
transactions such as adds, changes and deletes -- you update by copying, and in the process you get an updated file.
we generally have a huge number of records but only a few that need changes.
how fast is that -- because its a sequential file, you're going to have to read every record and write every record, minus the deletes on the writes, plus the adds on the writes --
generally the deletes and adds cancel each other out.
changes don't cost you anything.
how fast ? how many i/o's are you doing .. that's the part that's going to kill us.
one step deeper
how fast is it -- its like 2n, sort of .
but what kills you is not logical i/o's, it's physical i/o's.
so 2n / blocking factor.
my file has a key and it has to be ordered on that key.
we want to change 41's address and 62's phone number and 10's salary.
but if the file is already sorted, our list of modifications must also be sorted.
so if you have a match, make the change.
only need to touch what's being changed.
now we got another problem.
we pay the same amount no matter how many changes you make to a sequential file -- therefore you must batch your transactions.
M (master)   T (transactions, numbered in arrival order)
             3. 100 change
             4. 100 change
100          1. 220 add
200          5. 500 A
300          6. 500 C
400
500          2. 500 del
600
700
if we wait too long for the batch time, we might not be able to get the same result.
we can't put it in chronological order because we're putting it in key order.
sorted by id, but not in the right order
so we're going to do stable sort on this.
balanced line algorithm
every time you do a merge, you have two cleanup loops -- if i run out of one, then i have the other column to copy.