language
Mon 08/25/08
everything on schedule page.
lab reports due thursday.
programming languages - define, describe, design
- implementation issues
some languages are written for specific machines -- completely proprietary
define: grammar .. there are very specific kinds of grammars
English grammar .. there are lots of rules - punctuation, nouns, verbs, etc.
- french - some modifiers come after the nouns, others precede them
requirements for the way we put words together
.. sentence construction
grammar consists of terminals (punctuation etc) and variables and rules.
if -> if ( expression ) statement;
or ..
{ statement; }
if, { etc are terminals .. constants
expression can be all kinds of things, so there are rules for the expression
so we're not going to write those .. they're almost endless
Pascal uses a "then"
and then "end if"
Bengali .. the subject is usually left out
implementation issues:
arithmetic expressions
2 + 3 * 4
C, C++: 2 + (3 * 4)
so, order of operations is one example of how you need to know how things are implemented.
if (i < size && a[i] > 0)
in Pascal, both operands get evaluated, so if i equals size the array test gives an access violation -- array index out of bounds
C++ lazy evaluation -- short-circuit evaluation -- if (i < size) fails, it doesn't even bother with the second part.
if (p != 0 && p->value == 0)
and in an OR situation, it doesn't evaluate the other side of a true OR
depends on how the language is implemented
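a minimal C++ sketch of short-circuit evaluation (Node and check() are made-up names for illustration):

// the right-hand operand of && / || only runs when it's needed
#include <iostream>

struct Node { int value; };

bool touched = false;
bool check(int v) { touched = true; return v > 0; }  // side effect lets us watch evaluation

int main() {
    Node* p = nullptr;
    if (p != nullptr && p->value == 0) { }  // safe: && never dereferences the null p
    if (true || check(5)) { }               // a true OR: the right side is skipped
    std::cout << (touched ? "check ran" : "check never ran") << "\n";  // prints "check never ran"
}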
-------
Wed 08/27/08
Last Time (recap)
language design, description
and why we care about implementation
today:
chapter 1 - reasons for studying concepts of programming languages .. preliminaries
reasons for knowing more than 1
- increases your ability to learn more languages - a lot of the foundation has been laid .. enables you to move on to the next one.
another reason to know more than one -- choose the appropriate language -- no C for string-matching AI
the TIOBE programming index .. web site:
tiobe.com
-- better understanding of your original language
-- different language paradigms encourage/require different thought process.
section 1.2 programming domains
scientific applications -- characterized by lots of digits and accuracy .. COBOL is not your choice, VB is not your choice; you want accuracy out to 10-12 digits, arrays and matrices
FORTRAN and ALGOL .. good scientific languages
business applications -- report generation
- decimal calculations done efficiently .. don't care about floating point past 2 digits
character data.
COBOL
another realm: artificial intelligence -- how to make a machine learn .. can it make associations .. a dog is an animal, but what are the differences?
uses symbolic computation, not numeric .. often comparing strings, names, fields .. might look for a certain adjective or name or a relationship between a and b
not about numbers
makes lists .. linked structures
lisp, scheme, prolog, ml
operating systems -- C
system software needs to touch devices .. printers, other machines, the network .. the file system .. touching the hardware -- so you have to be able to do that .. low-level accessibility .. and you can embed assembly in C .. because an OS better run efficiently
systems programmers are usually some of the best programmers in the company :-D ha!
fewer safety features in a language used to code an OS, because the expectation is that people writing system software can avoid those kinds of problems.
web software
----------
markup languages (not programming languages, because they're all about visuals, not about computing)
- scripting
- javascript
- perl
- java
very different .. if it crashes, oh well.
what makes a language good?
language evaluation criteria
- - - - - -
readability
writability
reliability
initially, in the 60s, who cared if it was readable .. programmers wrote code that would get shipped along and no one would look at it again
needed a new OS? you bought the whole thing; you didn't buy a lot of software -- it was built in
so when you wrote code, you saw it, the group did, and that was about it
but back in the 80s portability and reuse and upgrades came about
.. so somebody had to look at the code.
waterfall model
- requirements
- analyse the problem
- design a solution
- implement
- test/verify
- maintain
if you were going to maintain software, somebody had to look at the code and understand it again.
important to sustainable software .. also, as software got bigger, the people who implemented it were not necessarily the testers
not talking just comments, but that your code is readable as well.
simplicity - number of constructs the language has
for C++ there are 5 ways to make code repeat
if there are too many structures, then the subset that I know is not the subset that you know.
also harder to learn
something else for readability - feature multiplicity
.. there are several ways to say the exact same thing -- i++ vs ++i vs i = i+1 vs i += 1
p->value
vs (*p).value
operator overloading - if I say a + b
if a and b are ints or doubles, it adds
now the process is different:
strings -- concatenate them
but what about arrays? -- the meaning is not clear
typical vs atypical uses
so those things affect how readable, writable, and reliable a language is
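a sketch of why overloaded + is unclear for arrays (Vec is a made-up wrapper; element-wise addition is just one plausible choice -- concatenation would be another):

// + already means different things for different types; for a Vec it's a design choice
#include <iostream>
#include <string>
#include <vector>

struct Vec { std::vector<int> v; };

Vec operator+(const Vec& a, const Vec& b) {      // we chose element-wise addition
    Vec r;
    for (size_t i = 0; i < a.v.size() && i < b.v.size(); ++i)
        r.v.push_back(a.v[i] + b.v[i]);
    return r;
}

int main() {
    int i = 2 + 3;                               // arithmetic
    std::string s = std::string("con") + "cat";  // concatenation
    Vec a{{1, 2}}, b{{10, 20}};
    Vec c = a + b;                               // the reader can't tell which meaning without seeing the overload
    std::cout << i << " " << s << " " << c.v[0] << "," << c.v[1] << "\n";  // 5 concat 11,22
}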
orthogonality - the idea that you can use a small set of constructs, combined in a small number of ways, to build other structures
a small number of constructs can be combined in a small number of ways to build control and data structures
.. IBM assembly language vs VAX
in vax, we could addl (add long)
and I could specify the first and second operands
could be memory cell or register and whatever combination, it works.
op2 gets op1 + op2
small -- 1 statement to combine several operations
IBM
- add reg 1, mem1
register gets the contents of the memory cell plus the register
ADDR - adds two registers
that's not orthogonal -- I have to use two instructions to accomplish half of what the other machine does in one. in this case, I'm not allowed to combine constructs.
another example:
C++ function
function (int a, string b)
pass by value
except for arrays!
so this is an issue with readability and writability.
reliability -- the potential to change the array
given data *p and int i, p+i
is really p + (sizeof(data) * i)
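a small demonstration (plain C++; the array and offsets are arbitrary):

// pointer arithmetic is scaled by the pointee's size: p + i moves i * sizeof(*p) bytes
#include <cstdio>

int main() {
    double a[4] = {0.0, 1.1, 2.2, 3.3};
    double* p = a;
    std::printf("*(p + 2) = %g\n", *(p + 2));    // 2.2 -- same as a[2]
    std::printf("byte offset = %td\n",
                (const char*)(p + 2) - (const char*)p);  // 2 * sizeof(double) = 16 on typical machines
}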
control structures
repetition or conditionals
.. order or flow of the instructions
continue & break are really gotos
control structures can make it a whole lot easier.
-------
Fri 08/29/08
letters and numbers and the symbols .. have to strip them based on that.
finish chapter 1 today
last time:
- issues of language design
finish chapter 1 .. start chapter 3
read chapter 2.. but won't be tested on it.
chapter 3 -- taking a language and making it do something
data types and structures -- more readable, writable, reliable
more readable:
bool x; x = true; x = false;
C doesn't have bools
in C, false is 0, anything else is true
.. readability is harder that way..
being able to declare our own datatypes .. enumerated types
if I'm playing a game of cards, enumerate suits {diamonds=0, etc}
defines a new datatype named suits
.. certainly more readable
if you can make the meaning more clear, it's more reliable.
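a minimal sketch of the card-suit enumeration (the names are made up):

// an enumerated type makes intent readable; you're not juggling raw numbers
#include <iostream>

enum Suit { DIAMONDS = 0, CLUBS, HEARTS, SPADES };  // a new datatype named Suit

const char* name(Suit s) {
    switch (s) {
        case DIAMONDS: return "diamonds";
        case CLUBS:    return "clubs";
        case HEARTS:   return "hearts";
        case SPADES:   return "spades";
    }
    return "?";
}

int main() {
    Suit s = HEARTS;                // reads better than: int s = 2;
    std::cout << name(s) << "\n";
}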
structs, class, record, collection
function (player& s) .. we just sent a ton of data .. grouped into a meaningful entity
more reliable .. 'cuz you're not going to switch the numbers.
strong type checking
syntax design -
- in some languages, identifiers have a max length
pascal had a max length of 8 .. and ignored characters after 8
fortran
identifiers that started with i,j,k,l,m,n were treated as integers -- you didn't have to declare them.
good keywords
support for abstraction - may not help read and write but helps with reliability.
ability to write and reuse.
design something complex but hide details.
easy example is the sort or search function.
method of doing a linear search.
functions in general are abstractions
container classes .. lists, stacks, queues , arrays
template classes
vector is like an array but dynamic .. a template class.
linked list class
EXPRESSIVITY:
- - - - - -
++ --> convenient
C#'s foreach statement:
for each a in some collection
a convenience rather than required
writability and reliability
last three -- only reliability:
type checking -- type mismatches
the default setting allows you to put a double into an int in C++
.. you've just lost data (it does give you a warning)
type coercion - losing data and some reliability in the output
long ints, short ints, and overflow.
trying to add an int and a short int -- you'll probably get the correct result -- unless that short int is negative.
the tighter the type checking, the more reliable, but the more of a pain it is.
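a short sketch of the coercions in question (the result of the last conversion is implementation-dependent, which is the point):

// implicit coercion quietly loses data; most compilers only warn
#include <iostream>

int main() {
    double d = 3.99;
    int i = d;                      // narrowing: i becomes 3 (a warning at best)
    short s = -2;
    int sum = i + s;                // s is promoted to int before the add -- fine here
    long long big = 2147483647LL;   // INT_MAX on typical 32-bit ints
    int overflowed = big + 1;       // doesn't fit: the converted value is implementation-defined
    std::cout << i << " " << sum << " " << overflowed << "\n";
}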
exception handling - catching runtime errors within the program.
you asked for an integer and they've given you a string
division by 0,
index out of bounds
try
catch
throw
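a minimal C++ sketch of the try/catch/throw machinery, using division by zero:

#include <iostream>
#include <stdexcept>

int divide(int a, int b) {
    if (b == 0) throw std::runtime_error("division by zero");  // raise the error
    return a / b;
}

int main() {
    try {
        std::cout << divide(10, 0) << "\n";
    } catch (const std::runtime_error& e) {
        std::cout << "recovered: " << e.what() << "\n";  // caught at runtime; the program continues
    }
}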
restricted aliasing
aliasing .. two or more names associated with the same memory location.
we do this all the time -- reference parameters
- that's an intended use.
linked lists -- aliasing can be dangerous
if you can't alias, then the unwanted side-effects don't happen
other highlights
when choosing a programming language , you need to think about the cost of your choice
1. training the workers. you insist I use python, I gotta go learn python
2. cost of writing / time it takes to write - writability .. easy to write in
3. cost of compiling / compiling time. in the old days this was a big issue. .. depends on the size of what you're trying to do.
.. separate compilation -- allows you to have a.cpp, b.cpp and main.cpp .. I can compile each and produce object files and they'll get linked to produce main.
but why should we recompile a.cpp if it wasn't changed -- so we can compile parts of it.
linux and unix do this with make files.
optimization
4. cost of execution ... how long is it going to take to run?
C is lean and mean .. code is highly optimized.
some languages come down with a lot of extra stuff.
5. cost of the implementation environment
run a java program - free
ada - so expensive that only people who had defense contracts could afford it.
6. cost of poor reliability
7. maintenance costs can be 2-3 times the cost of the initial program
influences on language design:
computer architecture is one influence
cpu
control unit
arithmetic logic
memory -> data & program instructions
von Neumann model .. imperative languages .. this is the only model we have, though
imperative languages deal with statements, variables, and control structures.
we iterate .. statements are consecutive in memory .. so more efficient than jumping around.
variables are in memory -- memory cells are like variables .. closely tied to the von Neumann architecture
functional languages are exactly that .. all about functions and parameters -- no real blocks of code
...as are programming methodologies
what are the methods we use to program?
what are good ways to write programs
stepwise refinement, top-down design, structured programming.
we still have the imperative bodies.
late 70s, early 80s - more of a focus on the data vs procedures
- abstractions such as sort, search -- organizing the data.
early to mid 80s - object oriented
- data plus function
language categories
imperative - c, c++
- oop - a subset of the imperatives
- visual languages, ie, interfaces
- statements, control structures, order of statements important
- scripting languages
logic
- rule based - no particular order. rules used to generate truths.
functional - lots of recursion .. all about functions and parameters
markup
- html .. not a true language .. it doesn't execute .. but there is some actual code in there that runs
special purpose
- report gen, others
1.6 language design tradeoffs .. you might choose a language that allows sloppiness if you are sure you're not going to make mistakes like pointers out of bounds
size of the exe vs using things incorrectly
the exe runs slower because of the size of the source.
1.7 Implementation methods
----
1. compiler
2. pure interpretation
3. hybrid interpretation
method 2 -- pure interpretation
source program -> interpreter -> results
no exe.
the interpreter is really a simulator of a machine
VB typically runs as an interpreter
advantage .. easier debugging, but slower to execute
an interpreter is a piece of software that simulates a computer
method 1 -- compilation .. like C++
source code
|
\/
lexical analyzer - splits things into little parts that have meaning .. lexical units - lexemes
stores them in a symbol table.
syntax analyzer - takes the little parts and sees if "you are really an if statement"
parse trees; also interacts with the symbol table -- "ooooohhh, that's a function"
generate some intermediate code (not finished yet)
and a semantic analyzer - about meaning .. what does it mean to have a while statement
at this point, it might go into an optimizer or not.
intermediate code goes to a code generator that produces machine language.
a compiler takes source code to machine language.
load, link, run
results.
but we have something we can run over and over again .. vs interpreted.
but it has to run on a machine similar to mine.
last one -- the hybrid -- a lot of what java has done .. the idea:
take a high-level language into an intermediate language and then use an interpreter on the intermediate language
takes the source code in a high-level language, runs it through the lexical analyzer -> syntax analyzer -> intermediate code generator -> intermediate code
-> interpreter -> results
interpreters exist for different machines, and I can hand them the intermediate code -- java byte code.
running on any machine is what methods 2 and 3 buy you.
-------
Wed 09/03/08
lab due friday 1:40
in G: or attached to email
so far:
- language eval -- what makes it easy to read, etc
what things influence language design / tradeoffs.
today:
three implementation methods:
1. compiler
- takes source code .. part or entire program --> lexemes
- syntax analyzer
- parse trees
- intermediate code
- machine code generator
.. machine specific
symbol table - the first two phases put stuff in there; it's used later.
2. pure interpretation:
- simulator .. simulates the language
3. hybrid interpretation .. takes the source, which goes into the lexical analyzer, producing the lexemes; these go into the syntax analyzer
intermediate code generator.
and then that goes into an interpreter.
example: java .. doesn't produce machine language .. java byte code.
as long as you have that and the interpreter, you get to run.
so no compiling on the target machine at all, just taking byte code and sending it to the interpreter.
where we are in the text:
chapter 3 .. describing syntax and semantics.
chapter 4. lexical and syntax analysis
kinda out of order.
3.1 - 3.3
and then 4.2
terms
syntax - the actual form of an expression or a statement or a program unit; C & C++ don't actually have statements -- only expressions.
so this is form -- nothing to do with meaning.
semantics - assigns meanings .. how is it used? something as simple as an assignment statement has a meaning .. evaluate the right side and put it in the left.
in pascal,
repeat <something> until <bool>;
syntax involves the order of the identifiers and other parts
keywords - repeat, and until.
<statement> - placeholder for a valid statement list .. an allowable list of statements.
<boolexp> - placeholder for a valid boolean expression.
statement and boolexp have their own syntax rules
** how the thing gets put together
semantics - its meaning:
<statement> is executed, then the boolean expression is examined/evaluated
if true, it leaves the loop
if false, it goes back and executes the statement again.
it's backwords!!!!
post-test loop
so that's the meaning assigned to the statement
so semantics is about the meaning, syntax is about the form.
so the first thing we worry about: does it meet the required form?
how do we describe the form?
several formal methods.
used to define , describe syntax rules.
specify what strings of characters are allowed in a program.
samples from C++
+=, fred, 1.2345, for
pascal :=, =, +, fred, program
all allowable strings
very different parts, but all allowed.
if (x>7) x=75; .. also allowable string.
in its purest form, syntax rules involve lexical and syntactic rules.
lowest level units of syntax - lexemes
( lexical analysis)
separate breaking apart the pieces from saying "this is a good if statement"
give me the parts and I'll tell you later if the parts are in the right order.
lexemes can be defined using syntax rules .. but that's tedious and lengthy
so we typically separate them .. they are usually defined using a lexical specification.
ie, begin with a letter, followed by letters and digits.
describes the characters
if the language and its lexemes are described using syntax rules, then the lexical analyzer and syntax analyzer can be combined.
but we usually don't - typically they're separated.
why?
- simplicity - 1 task 1 subprogram
lexical analysis is simpler than syntactic.
by doing it separately, we hide the details from the syntax analyzer .. don't overload it with what an identifier is.
efficiency .. the lexical analyzer can be optimized .. compared to trying to do that while I'm doing syntax.
portability - where do I have to do this? is it only good on the platform i'm on?
syntax has nothing to do with the platform.
lexical analysis, because it involves characters, is somewhat platform dependent.
what does a lexical analyzer actually do?
we know that its purpose is to break things apart.
- identify tokens
- skip over comments
- skip over white space
- place user-defined identifiers into a symbol table.
example:
double cost;
cost = x+(.15 * x);
the lexical analyzer should identify double:
lexeme   token (internal code)
double   key_double
cost     identifier -- and place in symbol table with some attributes, ex: where it first appeared
;        semicolon
cost     identifier
=        assign_op
x        identifier
+        add_op
(        left_paren
.15      double_literal
*        mult_op
x        identifier
)        right_paren
;        semicolon
pulls apart identifiers .. decides if they're keywords
+= should be identified as a single token, not a plus and an equal:
Add_assign_op
so that's what this guy has to do.
need some easy ways to do this -- not as simple as we'd like it to be.
formalize an approach to lexical analysis:
1. write regular expressions that describe the patterns of lexemes. use these expressions as input to a software tool that takes them and generates a lexical analyzer.
unix has a program called lex, which generates the lexical analyzer and the symbol table handling.
regexp
way of describing what a pattern might be.
suppose it must begin with an a,b, or c
[a-zA-Z][a-zA-Z0-9]* .. must have one from the first set
* - then 0, 1, or many things from the second
"an identifier must start with a letter .. "
so that's a way of describing what an identifier might look like
C++ defines all kinds of things for numbers.
suppose we want to say an integer can begin with a + or - , followed by five digits, first non-zero.
[+-]? (? is the notation for optional) [1-9][0-9][0-9][0-9][0-9]
I can feed a whole bunch of expressions that look like this into a program .. and it'll spit back the lexemes if you get this right.
second approach - design a state transition diagram that describes token patterns.
and write a program from those.
state-transition diagram - directed graph with edge labels
- edge labels - symbols, actions
suppose I just read +
O -- + --> O -- =      --> add/assign operator (+=)
           -- +      --> increment operator (++)
           -- ' '    --> addition operator
           -- digit  --> keep the digit, but return add op. or sign
           -- letter --> add op
           -- other  --> ...
switch (ch)
{
    case '+':
        getchar(next);        // look at the next character
        switch (next)
        {
            case '=': ...     // += token
            case '+': ...     // ++ token
        }
}
the state diagram describes where I am
-- letter --> stay in this state while it's a letter or digit
if we hit whitespace (any whitespace)
return lookup(string) .. comes back as either keyword or identifier
end of identifier -- whitespace, symbol
isalpha, isdigit
.. collapse all these arrows
but the state diagram is huge.
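a compact C++ sketch of that state machine (the token names and keyword table are made up; a real scanner handles far more cases):

// one state per "where am I" in the diagram: the '+' state and the identifier state
#include <cctype>
#include <iostream>
#include <set>
#include <string>

enum Token { ADD_OP, ADD_ASSIGN_OP, INCREMENT_OP, IDENT, KEYWORD, DONE };

static std::string src = "count += 1";   // input under scan
static size_t pos = 0;
static std::set<std::string> keywords = {"if", "else", "while"};  // tiny stand-in lookup table

Token next_token(std::string& lexeme) {
    while (pos < src.size() && std::isspace((unsigned char)src[pos])) ++pos;  // skip whitespace
    if (pos >= src.size()) return DONE;
    char ch = src[pos++];
    if (ch == '+') {                       // state after reading '+': peek at the next character
        if (pos < src.size() && src[pos] == '=') { ++pos; return ADD_ASSIGN_OP; }
        if (pos < src.size() && src[pos] == '+') { ++pos; return INCREMENT_OP; }
        return ADD_OP;
    }
    if (std::isalpha((unsigned char)ch)) { // identifier state: stay while letter or digit
        lexeme = std::string(1, ch);
        while (pos < src.size() && std::isalnum((unsigned char)src[pos]))
            lexeme += src[pos++];
        return keywords.count(lexeme) ? KEYWORD : IDENT;   // the lookup(string) step
    }
    return DONE;                           // digits, other operators, etc. omitted in this sketch
}

int main() {
    std::string lex;
    for (Token t = next_token(lex); t != DONE; t = next_token(lex))
        std::cout << "token " << t << (t == IDENT ? " (" + lex + ")" : "") << "\n";
}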
-------
Fri 09/05/08
Last Time -
state diagrams
sections 4.1 - 2 and 3.1 - 3.3
idea is to be able to do some lexical analysis of a language and then some syntax analysis
state diagram in lab yesterday
gets really big for an entire language
so not always the best solution
can use regular expressions -- are really great with lexical analysis
regular expressions
trying to form words / sentences / constructs
according to some description
we tend to concatenate
union / option
and 0, 1, or many occurrences of some pattern
Theory of comp
a+b*
either a single a (+ is 'or') or 0, 1, or more b's
a, b, bb, bbb (and the empty string)
(a+b)(a+b)*
starts with an a or a b, followed by 0, 1, or more a's and b's
a, b, aaa, aba, ...
for programming languages .. good strings are good doubles , good operators, etc
page 255 -
regular expressions
.. enclosed in forward slashes
enclosed brackets [] denote a set
/[a-zA-Z]/
.. one from the set
/[a-zA-Z]+/ the plus is 1 or more from that set
/[a-zA-Z][a-zA-Z0-9]*/ must start with a letter and then can have any number of letters and digits
\d stands for a digit
or .. the pipe
/[a-z]* | [A-Z]*/ the or THE, not The
CSCI 322
ART 335
uppercase, 3 or 4 letters, then 3 digits
? -> 0 or 1
/[A-Z][A-Z][A-Z][A-Z]?[0-4]\d\d/
the author's example
/[\d+\.?\d* | \.\d+]/
\d+ one or more digits
\.? one or zero decimals
\d* 0,1 or many digits
5.5
2
so the second alternative allows floating points without digits before the decimal
if I try to combine them into one alternative, I make everything optional .. then the empty string is valid.
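these patterns drop straight into C++'s <regex>; a small sketch (the test strings are made up):

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::regex ident("[a-zA-Z][a-zA-Z0-9]*");                 // identifier: letter, then letters/digits
    std::regex course("[A-Z][A-Z][A-Z][A-Z]?[0-4]\\d\\d");    // the CSCI 322 / ART 335 pattern
    std::regex number("\\d+\\.?\\d*|\\.\\d+");                // the author's floating-point pattern

    for (std::string s : {"fred", "x9", "9x"})
        std::cout << s << (std::regex_match(s, ident) ? " is" : " is NOT") << " an identifier\n";
    for (std::string s : {"CSCI322", "ART335"})
        std::cout << s << (std::regex_match(s, course) ? " matches" : " does not match") << "\n";
    for (std::string s : {"5.5", "2", ".75", "."})
        std::cout << s << (std::regex_match(s, number) ? " is" : " is NOT") << " a number\n";
}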
lowest form of syntax -- the allowable strings
I can use a state diagram any time I can use a regular expression but not the other way around.
regexes and state diagrams are fine for describing the lexemes
reg ex cannot be used for some language constructs
state diagram can be used for most programming languages
it'll get big but i can ..
I can't use a regex to deal with things like nested ifs
two ways to describe languages ..
recognizer and a generator
a regex is a generator; a state diagram can be used in that way too, but it's also a recognizer -- though if we have the compiler but not the source, we can't tell what's in the language, only what's not
a grammar also describes the language
can't use regular expressions, and state diagrams get huge .. need another route .. grammars. the grammar drives the compiler, and the compiler can give you back the grammar
3.3 - formal methods for defining syntax
- most popular / widely used means are grammars.
grammars were really created back in the 50s and we still use the same stuff now
independently developed by
Noam Chomsky
John Backus
Peter Naur
BNF: Backus-Naur form
Chomsky was a linguist .. he had no care in the world about computing languages.
he came up with four different grammar types
1) regular - does exactly what regular expressions do .. in other words, lexical analysis
2) context free - good enough to describe most programming languages
3) context sensitive - harder to write but a bit more powerful
4) unrestricted - more powerful yet
regular languages (in the center of the circle)
context free
context sensitive
unrestricted (outermost)
he came up with this stuff early in the 50s .. but because he wasn't a mathematician, this work wasn't looked at the way we look at it now
but Backus and Naur were working with the German association for applied math and mechanics,
involved in a project to try and define the ALGOL language
.. they came up with a way of describing a grammar .. and it was the same kind of idea as Chomsky's
BNF: Backus-Naur Form
def.
a grammar is a collection of :
1. non-terminals - variables .. must be substituted for
2. terminals - things that cannot be changed
3. rules - a left side, an arrow, and a right side:
replace the left with the right
suppose
non-bnf grammar
Nonterminals S, A, B
Terminals a b
Rules S-> AB
S -> a
A -> aA | a
B -> bB| b
-------------
generative form - grammars are generators
I can replace S with AB
S => AB => aAB => aAb => aab
non-terminals -- things that you've got to get rid of eventually
In C++
variable declarations
int x;
double a,b,c;
--------------
string name = "Niky";
BNF -- in angle brackets -- non terminal
<VAR_DEC> -> <DATA_TYPE> <IDLIST>;
<DATA_TYPE> -> int|char|bool|double|string|<identifier>
<IDLIST> --> <ID> | <ID>, <IDLIST>
<ID> --> use reg exp
<VAR_DEC> -> <DATA_TYPE> <IDLIST>;
--> int <IDLIST>;
--> int <ID>;
--> int x;
<VAR_DEC> -> <DATA_TYPE> <IDLIST>;
--> double <IDLIST>;
--> double <ID>,<IDLIST>;
--> double <ID>,<ID>,<IDLIST>;
--> double <ID>,<ID>,<ID>;
--> double a, <ID>, <ID>;
--> double a, b, <ID>;
--> double a, b, c;
problems that lie ahead.
<expr> --> <expr> + <expr> | <expr> * <expr> | INT_LITERAL | <ID>
a+b*c
<expr> --> <expr> + <expr>
--> <ID> + <expr>
--> a + <expr>
--> a + <expr> * <expr>
....
by looking at that statement, we don't know whether we're supposed to multiply or add first.
what if we started with the * rule instead of the + rule?
it doesn't look all that different, but it has a very different meaning
two ways of generating exactly the same string.
the first is a + (b * c)
the 2nd is (a + b) * c
this thing truly doesn't tell me whether to multiply or add first
you can build this in.
two ways of generating the same string -- called an ambiguous grammar
have to enforce the proper meaning in the grammar
-------
Mon 09/08/08
Last time
- regular expressions - lexemes
- grammars
- bnf
today:
3.3, but with bits from other places
BNF description for a language
- a form of grammar
- a collection of non-terminals .. in BNF, written in angle brackets <~>
terminals -- not in angle brackets
start symbol - non-terminal
- rules - we like rules in which the left side is a single non-terminal - a context-free grammar
derivation - a sequence of steps that apply rules -- substituting for non-terminals
example:
--------
<program> --> program <Id> { <statement list> }
<statement list> --> <statement> ; | <statement> ; <statement list>
<statement> --> <assign>
<assign> --> <variable> = <expr>
<var> --> a | b | c
<expr> --> <var> + <var> | <var> * <var> | <var>
assigns values to variables.
start symbol is program
non-terminals are all in angle brackets
terminals: program { } ; = a b c + * -- you know they're terminals because they never appear on a left side to be expanded
sentence - a sequence of terminals derived from the start symbol
sentential form - a sequence of terminals and/or non-terminals derived from the start symbol .. not quite a sentence yet but has potential
leftmost derivation
rightmost derivation
or just derivation
in a leftmost derivation, we always apply a rule to the leftmost non-terminal in a sentential form.
rightmost .. always apply a rule to the rightmost.
derivation -- whichever
it shouldn't matter whether I choose leftmost or rightmost .. it should have the same meaning no matter what
so the order of a derivation has nothing to do with the meaning of the thing that we're putting together.
the rules matter .. not the order
<program> --> program <id> { <statement list> }
leftmost .. expand the <id>
--> program fred {<statement list>}
--> program fred { <statement> <statement list> }
it should not matter which one we do first, but it's easier to talk about if we're consistent
--> program fred { <assign> <statement list> }
--> program fred { <var> = <expr>; <statement list> }
--> program fred { a= <expr>; <statement list> }
. . . .
--> program fred { a=b + c; b = a; }
I should be able to get to that regardless of whether I do leftmost or rightmost.
parse tree
----------
- derivation - hides the order of expansion of non-terminals.
- tree - root = start symbol
- leaves = terminals
- internal nodes = non-terminals
a
/ | \
b c d
children on the right hand side of the rule
left hand side of the rule: the parent, always a non-terminal
children -- are the expansion.
all the guys on the right hand side of our work are sentential forms.
the leaves at any time give me a sentential form
the leaves in this case give a sentence in the language
parse trees hide the order in which we do things.
parse trees also do something the derivation doesn't - give us an idea of the order of processing statements. sometimes the parse tree gives us information
the a + b * c example
suppose <assign> is really <var> = <expr>
<var> -> a | b | c
<expr> --> <expr> + <expr> | <expr> * <expr>
A = A + B * C
unfortunately, we can make two different trees
start with <assign> -- assign gives me a var, an assignment operator, and an expression.
var gives an a
....
gotta do a post-order traversal to do an evaluation
multiplication before addition
but there's nothing to say that we would get that tree
(partial tree: <assign> expands to <var> = <expr>, and <var> gives a)
but maybe you wanna do the * first
....
we have the same "yield"
but in our new tree .. the addition occurs before the * -- which is not what we want
it's nothing to do with the order in which we applied the rules, but which rule I used at a given point.
two different parse trees.
for the same sentence
.. this can be bad.
so, two routes to go .. two ways of dealing with this:
1) during syntax analysis, I handle special cases - what to do with + and *
or 2) rewrite the grammar to remove the ambiguity
a grammar is ambiguous if there are two parse trees for the same sentential form.
side note: if I handed you a grammar and asked if it was ambiguous, showing 'yes' means giving two parse trees for the same sentence .. but showing 'no' is almost impossible to answer -- it would have to be a proof by exhaustion
no way you can say that the grammar is not ambiguous ... no algorithm possible.
the general problem of saying whether a grammar is ambiguous is impossible
no lovely way to tell
fix it!
we're worried about order of operations
two problems with this grammar caused by ambiguity
1) + and * and the order of operations
-- operator precedence: times should have precedence over plus
2) associativity of operators
a + b+c
= (a+b)+c or a+(b+c)
who cares? + is associative! same with multiplication
but we do care .. a lot .. about a - b - c
or a/b/c
2-3-4 = -5 or 3
we know that we do the left ones first .. rewrite the grammar to get rid of this problem
let's rewrite the grammar .. simple rules for doing so .. which is why we bother
we want * further down in the derivation
<expr> --> <expr> + <term>| <term>
<term> --> <term> * <factor> | <factor>
<factor> --> (<expr>) | <id>
I can still get a * b .. I can get a + b .. but in order to get the plus .. I have to do it in one way
push higher precedence further down the tree, because you evaluate from the bottom up
we've increased the length of the derivation - that's our tradeoff
but there's no ambiguity
solved operator precedence.
associativity
a = a+ b + c
the only way to get a + again [inside the right operand] is with the parens in <factor>
but I've just added the a and b before I added the c.
so plus is left associative
left recursive - the left hand side derives the left hand side and something
right recursive rules -
the left hand side derives something and then the left hand side .. for operators that are right associative
2**3 = 2^3 = 8
2**3**2 = 512
in C++, right-associative operators:
a=b=c;
we came across two problems with easy grammars that we could solve by changing the grammar
.. grammar helps to assign meaning.
-------
Wed 09/10/08
no class friday.
one thing missing on the hand drawn one ..
getchar has the next character to be processed.
.. stops when you hit something that's not, for example, a digit
don't get next char if division operator
last time:
grammars
- define language constructs
- ambiguous
- how to use a grammar to give meaning -- more than just syntax
- operator precedence
- operator assoc.
This time
- - - -
- a little more of the above
- extended bnf .. so we can read the rest of the book
- more complete example
a grammar is a collection of nonterminals, terminals, rules, and a start symbol (a non-terminal)
a sentence is a string of terminals derived from the start symbol
sentential form - a string of terminals and/or non-terminals
derivation tree - the root is the start symbol, the leaves are terminals, and the internal nodes are non-terminals.
derivation tree -- the same regardless of the order in which non-terminals are expanded
-- doesn't matter whether you expanded leftmost or rightmost
an ambiguous grammar has at least one sentence in its language that has two or more derivation trees
something very boring as an example:
<start> -> <start><start> | (<start>) | ()
.. generates balanced ()
non-terminals <start>
terminals ()
three rules .. three choices.
example: <start> => (<start>) => (())
show that this is ambiguous .. need to come up with a string with two different parse trees
.. a whole pile of them
start doing 3, 4, or 5 pairs
removing ambiguity is not the easiest thing to do
operator precedence and operator associativity
associativity: a + b + c is really (a+b) + c
left associative
a=b=c .. right associative
<expr> -> <expr> + <expr> | <expr> * <expr>
problematic -- two trees for the same string
expression tree tells us how to change our grammar
rather than use the same non-terminal everywhere,
add non-terminals, moving the operators of highest precedence further down, away from the start:
<expr> -> <expr> + <term> | <term>
<term> -> <term> * <factor> | <factor>
<factor> -> <ID> | (<expr>)
+ is left associative .. we have a left-recursive rule to deal with +
goal -- add the assignment operator to this.
a=b=c
want to say
we know that this guy is right-associative in C++
use a right-recursive rule
<e> --> <var> = <e> | <var>
try it with a | b | c
a=b=c+2
plus has higher precedence.
+ occurs first
so in our grammar, we have to put the = rule before the expression rule.
one thought:
add
<assign> --> <var> = <expr> /* simple ones */ | <var> = <assign>
the hope is to derive our sentence exactly one way
we don't want ambiguity
(parse tree: <assign> => <var> = <assign> => a = (<var> = <expr>) => a = (b = <expr> + <term>) => a = (b = c + 2))
but we can't do a= (b= c+2)
a=2*b=c
using the C++ compiler as a recognizer to test:
it doesn't compile .. 'cuz * has higher precedence
how to allow an assignment to be an expression?
<expr> --> <assign> .. could do .. does that work? gotta be careful
it's not always as simple as adding a rule
but we did take care of the idea of precedence and right associativity -- we just need to allow all the possibilities
extended bnf .. because the rest of the text uses it.
state diagrams and regular expressions .. helpful in lexical analysis
state diagrams and grammars are helpful in the syntax analysis phase -- does the program entered follow the syntax rules?
somewhere along the line .. use a grammar to perform syntax analysis
also assigning meaning
translate these grammar ideas into a program that allows us to do syntax analysis
ebnf
grammar and regex combined!
<statementlist> --> <statement> ; | <statement> ; <statementlist>
ebnf:
- <statementlist> --> <statement> ; { <statement> ; }
{ } means 0, 1, or many
replaces two rules with one.
so that's curly braces
<ifstatement> -> if <expr> then <statement> [else <statement>]
.. so [] means optional
.. so two rules in that one, too
<expr> --> <expr> + <term> | <expr> - <term>
can be written
<expr> (+|-) <term>
() <-- choose one
<expr> -> <term> {+ <term>} .. this shorthand loses the idea that the + is left-associative
if (x>0)
if (y >0)
cout << "fred";
else
cout << "barney";
x=3, y =-2 .. barney
x=-1, y=-3 .. nothing
<ifstmt> --> if <expr> <statement> else <statement>
| if <expr> <statement>
nested if .. ambiguous
to whom does the else belong?
else goes with nearest previous unmatched if
gotta build that into the grammar.
am I matched or not matched
the author's idea: look at what is matched and what is unmatched
new non-terminals to help deal with this:
<stmt> --> <matched> | <unmatched>
<matched> --> if <expr> <matched> else <matched> | any non-if statement
<unmatched> --> if <expr> <statement> | if <expr> <matched> else <unmatched>
let's try it.
-------
Mon 09/15/08
unary minus
.. appears to lose precedence .. but the code enforces it
Last time:
---------
use grammars to describe languages and sometimes meaning
-- ultimately we want to translate them into programs .. we don't have to build them by hand
.. bnf, extended bnf
expression
**
unary minus
-2**3 = -(2^3)
these are the meanings to enforce
recursive descent parsing
associativity and precedence .. how to enforce that
chap 3, sections 1-3
chap 4, sections 1-3
today 4.4
a=b=c;
a=2+(b=a)
a=..
error: a= 2+b=c
.. the + operator has higher precedence, so it tries to do 2+b first
how to solve .. push the + down and raise the = sign up
we also know that = is right associative
+ and * have higher precedence than the assignment .. so does everything else
<assign> --> <var> = <expr>
<expr> --> <expr> + <term> | <term> | <assign>
<term> --> <term> * <factor> | <factor>
<factor> --> <id> | ( <expr> )
parse tree
.. we have indeed honored precedence
more unfinished business .. the if statement
if ()
    if ()
        s1
    else
        s2
<ifstate> -> if (<bool>) <statement>
.....
else goes with the previous unmatched if
can't use an if in the if-part if it isn't matched with an else.
a grammar that prevents an unmatched if inside our if part.
only thing we really have to worry about is if .. else
while has one body, repeat ..
only the conditional statements have multiple bodies
<start> -> <matched> | <unmatched>
<matched> -> if (<bool>) <matched> else <matched> | non-if statement
<unmatched> --> if (<bool>) <statement> | if (<bool>) <matched> else <unmatched>
the only thing we disallowed is putting an unmatched if in front of an else
<state> -> {<statement>}
4.3 the parsing problem
----
the syntax analyzer is really the same thing as a parser .. the terms are interchangeable
two jobs
to determine if the input program is syntactically correct
if not, display a diagnostic message, recover, and continue
don't want a parser to say "I found an error on line 88" and stop
if your program is 255 lines long, you wouldn't know anything about the rest of the program.
idea of recovery .. get to a known point
.. compilers will often get rid of the rest of the line
cascading errors: an error on line 10, might often produce errors on the following lines.
a lot more to it than 'this is good and that is bad' ... it has to pay attention to recovery and cascading errors .. don't complain about the same thing over and over.
2) the parser has to gather information to allow program translation
what are the statements in the language
it either builds a parse tree or builds its traversals
.. can only do this for syntactically correct parts.
essentially two kinds of parsers with two methodologies
topDown - start from the root and build to the leaves.
bottomUp - start from the leaves and build to the root
Top down parsing:
- generally corresponds to a leftmost derivation .. always expanding the leftmost non-terminal
building off of the grammar
- constructs the parse tree in pre-order .. not really pre-order .. expand the children left to right
leftmost derivation grows first
(board sketch: E => E + T, then the leftmost T => T * F, then F => a .. the leftmost part of the tree grows first)
left-most derivation
expand the non-terminals in pre-order
every leftmost derivation step works on a sentential form:
terminals, non-terminals, or a mix of terminals and non-terminals
xAα -- x is all terminals, A is the leftmost non-terminal, α is the rest
top down parsing is about .. now I might replace A with the right hand side of one of its rules
.. deciding which rule to apply next .. and we're always applying it to the leftmost variable
so top down parsing often uses a look-ahead symbol (really a token)
.. how did I know what to choose last week .. if it was a plus, I chose one thing, otherwise another
one-symbol look ahead
if there are lots of rules, how do I decide which one is next?
recursive decent parsing is a top-down
idea of a function per non-terminal
easy to translate EBNF into code
- uses code to implement the rules
.. not a table
another top down method that uses table lookup
both examples of LL algorithms
recursive descent
top down
but also an LL algorithm for parsing.
Left to right scan of the input
the second L says "leftmost derivation"
.. always expanding the left most variable
....
bottom up:
build from the leaves (the string of terminals) and move to the root
a*b+(c+d)
- gives the reverse of a rightmost derivation
LR algorithm
left to right scan of the input
but uses a right-most derivation
matches from the right .. tries all the right hand sides of the rules and then decides which ones apply
example grammar
S-> aAc
A ->
aaBc
try to match a right hand side
a little different than top down:
trying to match right hand sides .. trying to choose the correct one.
a parsing algorithm that will correctly parse any language using its grammar takes O(n^3), where n is the length of the input
1,000 tokens --> on the order of 1,000,000,000 operations to process 1,000 tokens
this is the best you can do for a parsing algorithm that handles any kind of grammar
so maybe I can write an efficient parsing algorithm for grammars with certain characteristics
trade generality for efficiency
certain kinds of grammars -- top down
other kinds - bottom up
rather than have one method for any grammar, I can have several methods
result -- a parser written from a suitable unambiguous grammar can run in O(n).
characteristics for which recursive descent works:
- LL grammar characteristics
(recursive descent)
section 4.4 resolves an ambiguous grammar with code .. handles the ambiguous if statement, page 179(?)
why not
E -> E + T
void expr()
{
    expr();   // the first thing expr() does is call itself -- infinite recursion
}
that's bad
in EBNF
E -> T{+T}
makes it solvable
what this rule has
is direct left recursion,
because the variable on the left is the first symbol on the right.
I can eliminate direct left recursion and still have a plain grammar, without using EBNF
general idea
suppose
A --> Aab | Ab | BAa | Bbc
direct recursion in the first two rules
introduce a new non-terminal A-hat (A')
replace rules:
we want to replace the rules above with the following:
A -> BAaA-hat | BbcA-hat
A-hat -> abA-hat | bA-hat | ε, which means nothing
A--> BbcA-hat -->
we can use recursive descent on this w/o a problem
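going back to the earlier EBNF fix (E -> T{+T}), a minimal recursive descent parser sketch in C++ (single-character tokens and the test string are simplifications):

// one function per nonterminal; each consumes exactly the input its subtree generates
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <string>

static std::string src;       // input being parsed
static size_t pos = 0;
static char look() { return pos < src.size() ? src[pos] : '$'; }  // one-symbol look-ahead
static void advance() { ++pos; }
static void error(const char* msg) { std::printf("error: %s at %zu\n", msg, pos); std::exit(1); }

static void expr();           // forward declaration: F -> ( E ) recurses back to expr

static void factor() {        // F -> id | ( E )
    if (std::isalpha((unsigned char)look())) {
        advance();
    } else if (look() == '(') {
        advance(); expr();
        if (look() != ')') error("expected )");
        advance();
    } else error("expected id or (");
}
static void term() {          // T -> F {* F}
    factor();
    while (look() == '*') { advance(); factor(); }
}
static void expr() {          // E -> T {+ T}
    term();
    while (look() == '+') { advance(); term(); }
}

int main() {
    src = "a+b*c";
    expr();
    if (look() == '$') std::puts("parsed OK");
    else error("trailing input");
}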
-------
Wed 09/17/08
Exam I -- Monday Sept 29th .. in class
plan: Chapters 1,3, 4 .. 5
how do we do the associativity .. look at the grouping in the code.
.. what is it that doesn't group it the other way
B = C* (A * C+B)
3.2
page 123
isn't going to do precedence correctly
one of the other questions:
#12 .. show your work!
for "no" .. there's some patterns you can follow
section 4.4
recursive descent parsing
- top down method
- EBNF is easily translated into code
for each non-terminal (variable) we have a function
- it traces the parse tree .. rooted at the nonterminal, and whose leaves make up the input string
a+b*c
expression
/\
a + b * c eventually
and when this function call ends .. we know all of that has to be done
and inside there we call term
and at the end of term we'd better have the a
term calls factor .. that better produce the b*c
the responsibility of each function .. is exactly the leaves of its subtree
the token value (next token) has been read .. we know what the next token is .. one-token look-ahead
two problems with recursive descent parsing
with a grammar & recursive descent parsing:
1. direct left recursion: A --> Ax
(indirect left recursion: A --> Ba, B --> Ab
so we get right back where we started.)
2. A--> aB | abC | D
D--> ab| bD
my function for A here .. if the next token is an "a" .. what do we do .. which rule?
multiple right hand sides that start with the same token
expr()
{
    term();
    if (nexttoken == plus) ...
    else ...   // nexttoken is a star
}
removing left recursion
given rules
A -> Aα[1] | Aα[2] | β[1] | β[2]
where the α's and β's are strings
and β[i] does not begin with A
rules:
1 - introduce a new non-terminal A-hat, or A'
replace all rules with A on the left hand side
with
A -> β[1]A' | β[2]A'
.. so for any rule where A is not the first symbol on the right .. the non-left-recursive rules for A:
put the new guy behind them.
introduce new rules for A' so that
A' -> α[1]A' | α[2]A' | ε (nothing) .. ε erases the non-terminal
we've replaced four rules with five
A ==> Aα[1] ==> ..
we have to make sure that we can get the same strings in the new grammar
so from A we need to get the string above
==> β[1]A' ==> ...
left recursion eliminated.
E --> E + T | T
T --> T * F | F
F --> (E) | id
the first two rules have left recursion
E -> TE'
E' -> +TE' | ε
(α[1] = +T and β[1] = T)
T --> FT'
T' --> *FT' | ε
F --> (E) | id
this also loses associativity!
using the parse tree
it's a bigger tree
so the implementation must provide for associativity
useful for feeding a code generator that doesn't accept EBNF
problem #2 .. multiply-defined right sides .. with the same beginning token
NOT a+b vs a++ .. that's lexical analysis
this is about tokens
A --> aA | aB | bC | D
B --> bA | b
D --> aa | bC
C --> a
if the token is a, I have issues, eh?
aA? aB? D?
this grammar fails the "pairwise disjointness test"
'cuz we're interested in a single lookahead
for each non-terminal A in the grammar,
and for each pair of rules A --> α[i] and A --> α[j],
it must be true that First(α[i]) ∩ First(α[j]) is the empty set
First: .. what's the first token here?
First(α) is the set of terminals a where α derives, in some number of steps, ax -- a is a terminal, x is a string, possibly empty, of terminals and non-terminals
and a could be ε
so First of some expression is all the terminals it can produce in the front
so an example
A --> aA | bB | aAa | BB
B --> (B) | bb
terminals: a, b, ( )
nonterminals: A, B
First(B) = { (, b }
First(A) = { a, b, ( }
A => aA
A => bB
A => BB => (B)B
0, 1 or many derivation steps
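the "some number of steps" part is just a fixpoint; a small C++ sketch that computes First sets for this example grammar (uppercase = nonterminal; ε-rules aren't handled since this grammar has none):

#include <cctype>
#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <vector>

int main() {
    // the example grammar: A --> aA | bB | aAa | BB, B --> (B) | bb
    std::map<char, std::vector<std::string>> rules = {
        {'A', {"aA", "bB", "aAa", "BB"}},
        {'B', {"(B)", "bb"}},
    };
    std::map<char, std::set<char>> first;
    bool changed = true;
    while (changed) {                        // iterate until no First set grows
        changed = false;
        for (auto& [lhs, rhss] : rules)
            for (auto& rhs : rhss) {
                char c = rhs[0];             // only the first symbol matters (no ε here)
                std::set<char> add = std::isupper((unsigned char)c)
                                         ? first[c] : std::set<char>{c};
                for (char t : add)
                    changed |= first[lhs].insert(t).second;
            }
    }
    for (auto& [nt, s] : first) {            // prints First(A) = { ( a b }, First(B) = { ( b }
        std::printf("First(%c) = {", nt);
        for (char t : s) std::printf(" %c", t);
        std::printf(" }\n");
    }
}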
that's not the pairwise disjointness test yet .. the test happens on the right hand sides .. not the left
A --> aA | A --> bB
First(aA) = {a} (in 0 steps)
First(bB) = {b}, also in 0 steps
First(aA) ∩ First(bB) = empty, so these two rules are pairwise disjoint
but the grammar fails the test if it fails for any pair of rules
A --> bB and A --> BB
First(bB) = {b}
First(BB) = {b, (}
BB => (B)B
the intersection is not empty .. so this is what's going to give me trouble
so the whole grammar fails
same left hand side .. because we're talking about the function that goes along with a specific left side
so in this case it doesn't know which rule it should be using
so we can't use recursive descent unless we get rid of the problem
"left factoring"
- a method by which we can sometimes (usually) eliminate non-pairwise-disjoint rules
<assign> --> <var> = <exp>
<var> --> <id> | <id>[<expr>] /* assigning to an array */ | <id>.<id>
<ID> --> <Letter> ......
so var will give us trouble
<var> gives <id>, and <var> -> <id>[<expr>]
First(<id>) ∩ First(<id>[<expr>]) = {a, b, .., z, A, B, .., Z} -- there's lots of things in that intersection .. clearly not empty
introduce a new variable
<assign> --> <var> = <expr>
<var> --> <id><newvar>
<newvar> --> [<expr>] | .<id> | ε
newvar()
{
    if (tokenvalue == LBRACKET)
        ...            // the [<expr>] case
    else if (tokenvalue == DOT)
        ...            // the .<id> case
    else
        return;        // the ε case
}
var()
{
    id();
    newvar();
}
-------
Fri 09/19/08
Lab --
goal .. move from the kinds of things we did last week to the new bool stuff
please complete the if statement part .. she's giving you the code for that part.
the grammar is not C
can't do x = a<b
there are things allowed in C and C++ that this little grammar does not allow
and then there are things will parse differently than in c:
!a < b
it does have something that violates recursive descent parsing:
- get rid of left recursion
- rules must be pairwise disjoint .. and she violated that in this language
see handout.
pairwise disjoint
<bool> derives both (bool) and comparison expressions
First( (bool) ) = { ( }
First( cexpr ) = { a-z A-Z 0-9 ( }
so the intersection between these two guys is non-empty
so this can be a problem for recursive descent parsing, but she told us how to resolve it
ambiguous can be bad but can be overcome
Lab report
- can print source and place in G drive
- if you just put source on G: .. help her find it .. give a line number
give an index.
how to check you're doing it right:
recursive descent parsing says:
if this is a non-terminal .. in the subtree it generates, the leaves are exactly the input that matches that non-terminal
so for a+b:
expr generates all of that .. so on the first call to expr it would be nice to see the lexemes a+b, but you'll see the a happen on the outside and the next one right before leaving; if you draw a parse tree you should be able to detect that it's what it generated.
If statement:
the if rule is ambiguous
see handout
this rule is ambiguous
if ()
if()
else
.. there are two parse trees based on this grammar for that construct.
ifstmt() .. current token is "if"
lex gets the next token .. it had better be a left paren
get the next token to handle the boolean expression
bool comes back with the right paren .. it processed everything in between
get rid of that; begin on the next statement
call stmt .. it processes whatever's there
stmt calls ifstmt
.. processes the whole nested if statement
so even though the grammar is ambiguous, the code doesn't allow the ambiguity
we made the choice for the program .. it only does one thing
so most of the problems we've encountered with grammars, we can get rid of
sometimes recursive descent just isn't possible
need another method .. we'll look at it with an example, but not implement it.
section 4.5
bottom up parsing
input(source)
and move from the leaves to the root, which is the start symbol,
of the parse tree
- replacing right hand sides of rules with the left hand side of the rule
E → E+T | T
T → T*F | F
F → id | (E)
not easy
id * id + id
(parse tree, built from the leaves: each id reduces to F, id * id reduces to T, then T + F reduces to E at the root)
that's hard!
the idea :
bottom up parsing
given a sentential form α what substring of α is the right hand side of a rule in the grammar that must be reduced to it's left hand side to produce the previous sentential form in a right-most derivation
not only do i have to find a substring that matches one of the right hand sides, but it has to step me backwards into a right most derivation that I dont' know what it looks like yet
goal is to get back up to start
that's a lot to ask
so there are ways of solving this problem:
S -> aAb | bBA
A -> ab | aAB
B --> bB | b
given aaAbb .. how can we go backwards?
the hard part .. finding the RIGHT right side
β is a phrase of right sentential form γ
iff S =>* γ = α1Aα2 and A =>+ β (arrow-plus means one or more steps)
β is a simple phrase of right sentential form γ
iff S =>* γ = α1Aα2 and A => β in a single step
if I start from A, I can get β
β is a handle of right sentential form γ
γ = αβw iff S =>*rm αAw =>rm αβw
rm: rightmost!
for a handle we care about rightmost derivations
w is only terminals .. the rightmost derivation says that A is the rightmost non-terminal of γ
to bottom-up parse, find the handle at each step .. and it's unique -- there's only one
diagram. see handout
parse tree
phrase .. the leaves of a partial parse tree rooted at a non-terminal
for a variable, what can that variable give me
phrase .. everything that I can get from a variable that is at the bottom level of the tree at the moment.
simple phrase - exactly one step from a variable
b is the only simple phrase .. one step from a variable.
therefore, it must be a right hand side .. that's good .. 'cuz a handle must be the right hand side of a rule
our handle: the leftmost simple phrase.
in this case, the handle is the little b
replace the handle with its left hand side.
reduce the b to B
now we have a new right sentential form .. what are the phrases of this guy .. what are the simple phrases, what is the handle?
aaABb
aAB
aAB is my simple phrase and handle
the example ih the text has a couple simples phrases.
reduce by A -> aAB, giving aAb
phrases:
aAb
simple: aAb
handle: aAb
get to the start symbol .. if we can get back to the start symbol we have a successful bottom-up parse
but we were using the top down to get the bottom up!!
you build what's called a parse table .. from everything you've done so far .. and use a stack
shift-reduce algorithm
- uses a stack
- shift .. shifting from input to the stack .. I'm pushing
- reduce: pops to a specified point and then pushes
LR parser
use a stack, throw things on the stack, and then when the handle appears on the top of the stack, I'm going to replace the handle with its left hand side
what are the possible handles .. it's got to be the leftmost simple phrase
how am I going to know what all those are .. & it's not just string matching
what typically happens .. use the rules of the grammar to build a parse table.
.. use a tool for that!
YACC - yet another compiler compiler
handout: grammar rules numbered 1-6
r says reduce
action and goto tables
start with a 0 on the stack .. it stands for what state we're in
current state on top of the stack
Donald Knuth's algorithm
shift 5 -- puts id and the current state (5) onto the stack.
reduce 6 -- pop state 5, pop id, and push F
replacing right with left
how do you come up with the table?
building the table is what makes bottom up hard
but I care because it does everything .. everything recursive descent can do and more
also there's lots of tools out there.
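the handout's action/goto table appears to be the classic 12-state table for this six-rule grammar (the shift 5 / reduce 6 trace above matches it); a runnable sketch of the driver loop, with that table hard-coded:

// LR shift-reduce driver for: 1 E->E+T  2 E->T  3 T->T*F  4 T->F  5 F->(E)  6 F->id
// action encoding: positive = shift to that state, negative = reduce by rule -n, 999 = accept
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

static std::map<std::pair<int,char>, int> action = {
    {{0,'i'},5},{{0,'('},4},
    {{1,'+'},6},{{1,'$'},999},
    {{2,'+'},-2},{{2,'*'},7},{{2,')'},-2},{{2,'$'},-2},
    {{3,'+'},-4},{{3,'*'},-4},{{3,')'},-4},{{3,'$'},-4},
    {{4,'i'},5},{{4,'('},4},
    {{5,'+'},-6},{{5,'*'},-6},{{5,')'},-6},{{5,'$'},-6},
    {{6,'i'},5},{{6,'('},4},
    {{7,'i'},5},{{7,'('},4},
    {{8,'+'},6},{{8,')'},11},
    {{9,'+'},-1},{{9,'*'},7},{{9,')'},-1},{{9,'$'},-1},
    {{10,'+'},-3},{{10,'*'},-3},{{10,')'},-3},{{10,'$'},-3},
    {{11,'+'},-5},{{11,'*'},-5},{{11,')'},-5},{{11,'$'},-5},
};
static std::map<std::pair<int,char>, int> go_to = {   // goto table for nonterminals E, T, F
    {{0,'E'},1},{{0,'T'},2},{{0,'F'},3},
    {{4,'E'},8},{{4,'T'},2},{{4,'F'},3},
    {{6,'T'},9},{{6,'F'},3},
    {{7,'F'},10},
};
struct Rule { char lhs; int len; };                   // left side and right-side length
static Rule rules[] = {{0,0},{'E',3},{'E',1},{'T',3},{'T',1},{'F',3},{'F',1}};

int main() {
    std::string input = "i*i+i$";                     // 'i' stands for the id token
    std::vector<int> stack = {0};                     // stack of states; state 0 on the bottom
    for (size_t p = 0;;) {
        auto it = action.find({stack.back(), input[p]});
        if (it == action.end()) { std::puts("syntax error"); return 1; }
        int a = it->second;
        if (a == 999) { std::puts("accept"); return 0; }
        if (a > 0) {                                  // shift: push state, consume token
            stack.push_back(a); ++p;
            std::printf("shift %d\n", a);
        } else {                                      // reduce: pop |rhs| states, then goto
            Rule r = rules[-a];
            stack.resize(stack.size() - r.len);
            stack.push_back(go_to[{stack.back(), r.lhs}]);
            std::printf("reduce %d (push %c)\n", -a, r.lhs);
        }
    }
}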
-------
Mon 09/23/08
exam 1
Monday, 9-29-08
chap 1,3,4,5
! happens before ANDs and ORs
our grammar doesn't do comparison operators correctly.
Last time:
-----
- bottom up parsing method
- building from the leaves to the root
- can handle all programming languages .. even those that rec. descent cannot
- building the parse table is hard in general
- the tables get very large
action table
- column per terminal and one more
a 6-rule grammar ended up with 12 states
12 rows * those columns
a substantial sized table compared to the grammar we started with
so tables get large quickly
but with all these complaints, most tables are built programmatically by programs that take grammars as input
so most of these tables are the result of a grammar being fed into software .. out come the action and goto tables
then they get incorporated into the software
so although we need to know how a bottom-up parser works .. we won't be building one.
handout.
shifts:
shift 5 .. throw id on the stack
move to state five ..
1,$ .. accept
we've consumed all the input and gotten back to the root of the tree
need these tables to do this
a better way to do things than recursive descent
grammars are an effective means of describing languages and some of their semantics
take a language:
1 - decide if the program is correctly written in that language
2 - produce machine language code based on the meaning
first we have our input program
 |
 \/
lexical analyzer
 |
 \/
parser - have we met the rules for the language
 |
 \/
intermediate code
 |
 \/
machine language code
it's not good enough to say that the sentences are good .. that doesn't tell me everything there is to know about the language
int x;
y = x;
valid statements, but we don't know what y is .. and some languages don't care .. javascript
.. but we know in the C/C++ world, we have to define our variables first
we have to create what's called a symbol table
and when we work with the intermediate code .. we have to know about addresses and type checking
the symbol table needs to hold x, its type, its address
how long is x alive?
scoping issues and lifetime issues
a simple table in its most basic form contains user ids .. y would show up, but it hasn't been defined
symbol table .. user identifiers, constants .. where are they stored, what are their values, their types
so that I can do the rest of the things a compiler does.
chapter 5.
Names, bindings, type checking, scope
- - -
what the heck is a variable?
- a building block of imperative languages
imperative = based on memory cells, assignment statements, and iterative repetition
- we think of a variable as being a memory cell .. so it is the building block .. where we are .. the crux of how we manipulate data
attributes of a variable:
- has a name
- has an address of its corresponding storage
- what is its lifetime - how long is this variable associated with that memory cell
- what is its scope -- when can I access that guy
.. usually the lifetime is bigger than the scope
- size (sorta) .. in the respect of type .. and how that's treated .. an int isn't always a 4-byte int.
- value
- initialization .. when I first have it, what is it?
int x;
x [ ]
easiest one to talk about is names:
already done this a bit with lexical analysis
.. fixed length or variable length .. these things help with readability .. meaningful variable names
- case sensitive
a rose is not a Rose is not a ROSE
.. hard to read and debug.
a writability issue if it is case sensitive, for things like matching standard functions
read sec 5.1, 5.2, 5.3
some languages allow you to put spaces in the identifier
special words
keywords
reserved words
predefined identifiers
every language has a set of words that already have meaning
for example, if in c++ is a reserved word
.. cannot be used as a name
if, else -- they are what they are .. you can't change them
a keyword has a specific meaning in a specific context .. it can be used as a name outside of that context.
author's example: in fortran, the words Integer and Real are type identifiers .. they designate data types
so I can say Real x;
or Integer sum;
they are not reserved words, so I can use them in other situations .. so I can have a function named Real() .. different context
also, Real Real;
Real Integer;
the predefined identifiers are the real issue for 110 students
in C++ .. all programs have a main()
inside main, you can have a variable named main; .. so that's the closest I get to saying main is a keyword.
but more fun are the ideas of predefined identifiers.
iostream has at least 2;
.. cout!
int cout;
m = cout << 3;
now << is a shift because cout is an int
a reserved word that people have issues with is new
when we think of a variable,
the abstraction is the idea of a collection of memory cells
very seldom are we actually using a variable that corresponds to a single memory cell
char is one memory cell
int could be 2 bytes , 4 bytes
struct kid
name
age
collection of memory cells .. don't care how many .. just know it's there.
we know that a variable has a name
what is the address of a variable
we know ..
the memory address is not always the same
we think of the address of a variable as a machine address .. but in OS we know about paging and segments
we think of it as a physical number, but most machines run under the idea of virtual memory
.. we don't worry about that .. the hardware address.
but when does the same variable have a different address?
when it's in a function -- every time I call the function, I can get a different address
the first time, main calls f() and x can be put in some location
the second time, main calls g(), then g calls f .. whatever f needed before, it has the same amount of stuff but in a different place in memory
we're talking about the same variable, but it's at a different place
address isn't necessarily fixed
.. the hardware should be taking care of that .. so we can't compare the addresses
but at any time, there's only one address per variable -- it doesn't have multiple
aliasing -
- multiple variables may share the same location
- reference parameters
- pointers can do it as well.
aliasing is also a (readability) problem
address .. we think of it as a collection of memory cells -- it isn't maintained as the same address the whole time, but our program doesn't care
type
a data type tells us a lot of things:
int x; .. we know what x can contain .. we specify how it's used
the size of the location (compiler dependent)
the range of values as given by the size
suppose 16 bits: then -2^15 .. 2^15 - 1
char ch;
specify what ch can hold
I can go further and talk about unsigned int or unsigned short int;
modifiers to the datatype, but they do talk about range and type.
double z; .. specifies a range .. and a set of operations
because somewhere in the compiler is the idea that I can add, sub, *, /, % two ints .. shift an int
.. can't shift doubles
very different than just "here's a box for an int"
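a quick C++ check of what the type buys you (sizes and ranges vary by compiler, which is the point):

#include <cstdio>
#include <limits>

int main() {
    std::printf("int:   %zu bytes, %d..%d\n", sizeof(int),
                std::numeric_limits<int>::min(), std::numeric_limits<int>::max());
    std::printf("short: %zu bytes, %d..%d\n", sizeof(short),
                (int)std::numeric_limits<short>::min(), (int)std::numeric_limits<short>::max());
    int n = 5;
    n = n << 1;             // shifting an int is a defined operation (n is now 10)
    // double d = 1.5; d = d << 1;   // shifting a double is a compile error
    std::printf("n = %d\n", n);
}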
5.3.4 value -
the value of a variable -- two kinds of values
the l-value -- which is the address .. the location value
the other is the contents of the memory cell or cells
.. that's what you care about ..
what you really want is the content of the memory cells
this is called the r-value, as opposed to the l-value -- the left side of an assignment .. this is the right-side value
5.4 binding .. a huge thing that changes from language to language
when an association is made between an entity and an attribute
or an operation and a symbol
an example that the author slams through:
when does * mean multiplication
when is a variable associated with something physically in memory
what about constants .. what happens to those
possible bindings
assigning an attribute to something in the program
possible binding times
when I design the language .. language design time -- * means multiplication
or ++ means inc by one
or && means and
when I'm designing my language, I'm tying that symbol to that meaning
language implementation time:
(compiler design time) .. how am I going to interpret that thing within the program
.. I decide if an int will have 4 or 8 bytes .. because the compiler needs to make sure that memory gets allocated
compile time: when I actually run the compiler on the source .. the idea that a variable's type is bound to it at that point .. at that point in my symbol table x will be an int; average will be a double
load time: when the program is loaded into memory, that's most often when a variable's address is specified
link time: the address of a library / function call is linked to the call
execution time: pointers & allocated memory