Part 2: Implementing the CLOCK and MRU Buffer Replacement Strategies. (70%)
Setting up PostgreSQL and Other Files/Scripts
Everything needed for the second part of the assignment is available in an svn repos at ~cs186/fa08/postgress-8.0.3 ~cs186/fa08/hw1 on x86 linux instructional machines.
Note on conventions: In the labs we will use gray boxes with the '$' prompt to signify text that you should type in verbatim to the shell on one of the instructional machines using your cs186 account. Below we will provide an explanation of the commands you are entering.
To setup the environment start by:
$ /share/b/bin/mkhometmpdir
$ svn co file:///home/ff/cs186/fa08/postgres-8.0.3 /home/tmp/$USER/postgres-8.0.3
$ mkdir ~/hw1
$ cd ~/hw1
$ svn co file:///home/ff/cs186/fa08/hw1 hw1-code
$ ln -s /home/tmp/$USER/postgres-8.0.3 ./
- This is a script set up by the instructional computing support to create a special temporary directory. We will keep the source code and other large things here to prevent problems with disk quotas. You should clear this directory out (since it won't contain anything you can't recreate easily) after you are done with the project
- This checks out the slightly modified (for space reasons) version of the postgres source tree from Subversion.
- Create a directory for homework one in your home directory (~ is short hand for the current users home dir)
- Change into the directory we just created
- Check out the skeleton for the actual assignment (skeleton file of freelist.c). We are keeping this in subversion as well to ease deployment of possible updated that might need to be given out after the project is assigned. Typing 'svn update' from inside this directory will check for the latest version of the skeleton file at anytime.
- Create a symbolic link (like a pointer) to the temp directory copy of the postgres source. This is purely for your convenience.
You can build and install PostgrSQL by using these commands:
$ cd ~/hw1/postgres-8.0.3
$ ./configure --prefix=$HOME/pgsql --enable-debug --enable-cassert
$ make
$ make install
- Move to the checked out source tree.
- ./configure is a script found in many unix/linux programs that automatically customizes the build process for your specific computer. It is auto generated by autoconf (This might take a while).
- Build the source using make (This might take a while).
- Install postgres into the location you specified during configuration (--prefix=$HOME/pgsql).
Now that you have build and installed postgres, you can test your installation by running the following commands:
$ ~/pgsql/bin/initdb -D ~/pgdata
$ ~/pgsql/bin/pg_ctl -D ~/pgdata -l pre.log start
$ ~/pgsql/bin/psql template1
- Initialized a default empty database for postgres to use at ~/pgdata.
- Start the background server using the the blank environment you just initialized and the logfile ./pre.log (You can look in here for debugging information about the running server).
- psql is a shell that you can type SQL commands or special postgres commands into (starting with a \). Since we have not yet created a database yet, we will use the default one "template1".
Note: If you encounter an error like "FATAL: user "<login>" does not exist", you may be trying to connect to another student's Postgres instance on the same port number. See the troubleshooting section below.
This will open a postgres prompt from which you can type SQL commands to the database. Below are some example commands to try out. Feel free to play around to get a good feel for SQL. Can you implement the examples we talked about in discussion section? To quit simply press ctl-d or \q.
template1=# CREATE DATABASE test;
template1=# \connect test;
test=# CREATE TABLE test_table (field VARCHAR(25));
test=# INSERT INTO test_table (field) VALUES('Hello World');
test=# SELECT * FROM test_table;
test=# DROP TABLE test_table;
-
This is DDL to create a new database
- \connect lets you switch switch the connection to a new database (in this case from template1 to test)
- Create a table 'test_table" with a single field named 'field'.
- Insert the string 'Hello World' into the database. Note the use of single quotes instead of double quotes.
- Retrieve all the data from test_table
- Delete our experimental table.
Modifying the Buffer Pool Replacement Strategy
In version 8.0.3 a 2Q strategy is implemented. If you would like to check it out look at ~/hw1/postgres-8.0.3/src/backend/storage/buffer/freelist.c. For this part of the assignment you will be implementing two different strategies: MRU and Clock. You will then compare the performance of your implementation to the original. To help you out, we have provided the implementation for LRU.
Files to Look At
You can add and manage any new data structures that you need. The
existing code is not extremely clear, but it is understandable. It may
take you a few hours (or more) to digest it. Since understanding the
existing code is a significant part of the assignment, the TAs and
Professors will not assist you in your understanding of the code base
(beyond what we discuss here).
The actual buffer manager code is neatly separated from the rest of the code base. It is located in the postgres-8.0.3/src/backend/storage/buffer directory, and primarily made up of the files bufmgr.c and freelist.c. While bufmgr.c contains the implementation of the buffer manager, we are only interested in freelist.c, which defines the buffer manager strategy (e.g., LRU, MRU, CLOCK, etc.).
We have provided you with a commented example freelist.lru.c which details the functions you will need to implement. We recommend that you copy this to freelist.mru.c and freelist.clock.c in order to create your implementations. There is also stubs.c (which contains helper code needed to test) and buftest.c (a simple testcase). You may want to add additional tests to buftest.c as we will be doing this when we grade you.
The provided Makefile will compile your implementations and generate three binaries (testlru, testmru and testclock). Check the output to make sure you have implemented the policy correctly.
Rebuilding Postgres With Your Strategies
the following commands to rebuild and install a modified version of postgres that uses the clock or MRU replacement policy that you created, assuming you named your files freelist.mru.c and freelist.clock.c
$ cp freelist.<clock or mru>.c ~/hw1/postgres-8.0.3/src/backend/storage/buffer/freelist.c
$ cd ~/hw1/postgres-8.0.3/
$ gmake
$ gmake install
Note: The make file contains and example of how to automate this.
Performance Testing
After you have finished implementing and testing your strategies with the simple harness, it is time to test their performance in the real server. We first start the server using a log file named for the test we are doing (Ex: mru16.log), with additional arguments to see more statistics in the log.
$ ~/pgsql/bin/initdb -D ~/pgdata
$ ~/pgsql/bin/pg_ctl start -D ~/pgdata -l <log name>.log -o "-B <numbuffers> -N 8 -o '-te -fm -fh'"
$ ~/pgsql/bin/psql test
You can see the output of this log in realtime in another window by typing:
$ tail -f <log file>
For the performance testing you will be creating two tables (raw_s_tuples and raw_r_tuples) and populating them with the provided data.
test=# CREATE TABLE raw_r_tuples (tname varchar(2), pkey int, num2 int, num3 int, num1 int, node varchar(16), inserttime float8, lifetime float8, testname varchar(64));
test=# CREATE TABLE raw_s_tuples (tname varchar(2), pkey int, num2 int, num3 int, num1 int, node varchar(16), inserttime float8, lifetime float8, testname varchar(64));
test=# COPY raw_r_tuples FROM '/home/ff/cs186/fa08/data/R' DELIMITERS ',';
test=# COPY raw_s_tuples FROM '/home/ff/cs186/fa08/data/S' DELIMITERS ',';
Testing Matrix
For the final part of the lab, we are going to have you run two different queries under different circumstances and report your results. The statistics will be output in the log file you specify for the server and will be in the following format:
LOG: EXECUTOR STATISTICS
DETAIL: ! system usage stats:
! 0.065834 elapsed 0.068005 user 0.000000 system sec
! [36.306269 user 0.016001 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/0 [0/1915] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [134/88] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 71 read, 0 written, buffer hit rate = 99.26%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written
Take special note of the buffer hit rate (99.26% in this example). Run each of the following queries at buffer sizes of 16, 32, and 64. Repeat the results for LRU, MRU and CLOCK. Record your results and the answers to the following questions in a text file
performance.txt and turn it in with your submission. You'll also turn in a text file
analysis.txt with a brief analsis of the results (see below).
Queries:
- SELECT * FROM raw_r_tuples;
- SELECT * FROM raw_r_tuples r, raw_s_tuples s WHERE r.pkey = s.pkey;
Run each query three (3) times for each of the three buffer sizes, for each of LRU, MRU, and Clock. You will be recording your minimum and maximum hit rates for the combinations of buffer sizes and policies, for both queries. Record the hit rate as a decimal to two places, without the (%) sign, e.g. 99.26. You'll be filling in the following table for both queries.
Clarification: Run the first query several times for each replacement policy and record your hit rate results. Then do the same for the second query. You don't need to clear your buffer between the two queries. The important aspect of this is that you detail in your analysis what you did and why you think you got the results that you got.
| |
16, min |
16, max |
32, min |
32, max |
64,min |
64,max |
| LRU |
|
|
|
|
|
|
| Clock |
|
|
|
|
|
|
| MRU |
|
|
|
|
|
|
Sumission
Partners
We will be using the parners functionality of the grading scripts.
The student submits the assignment using 'submit'. The system will look for a file
called MY.PARTNERS in your current directory or ask you for the login of your partner. The format of the
MY.PARTNERS file is a list of user names delimited by spaces, tabs or
newlines, such as:
cs123-aa cs123-ab cs123-ac
What To Turn In
- performance.txt -- TEXT ONLY! Two (one for each query) performance comparison tables: three rows, six columns separated by space, which represent the two tables we asked you to fill. Put the second table below the first.
- analysis.txt -- In 1-2 paragraphs, discuss whether you think the hit rates for MRU and Clock are as expected from the discussion in class.
- freelist.mru.c -- Your implementation of the MRU policy
- freelist.clock.c -- Your implementation of the Clock policy
- MY.PARTNERS - Listing the other person that you worked with
How to Submit
- Save your five files performance.txt, analysis.txt, freelist.mru.c, freelist.clock.c,and MY.PARTNERS in a directory called "hw1p2" within your cs186 home directory.
- cd into hw1p2.
- Run: submit hw1p2
Notes on Debugging
You should ONLY proceed to debugging postgresql after you
ensure that the logic of your freelist.skel.c is correct with the Test
Harness discussed after this section!
We suggest that you debug using the logging/printing feature: elog(DEBUG1,<format>,<arg>). Please see bufmgr.c for example uses of this routine.
Also, you may use gdb with the src/backend/postgres
standalone executable. Make sure you run your version of postgres and
not the one in the class account directory! To start the backend server
in stand-alone mode, run src/backend/postgres -D <DATADIR> test. It will use the data directory (and data) you prepared using initdb -D <DATADIR> and database test you created using createdb test. Alternatively, you can use the scripts in /MyCode/PerformanceEvaluationScripts to quickly setup your database. The postgres interface is directly to the backend, so psql features will not work. If your binary does not work correctly, it may corrupt the data, so be warned.
Troubleshooting
Since you are going to be sharing the machines with many other students, you might run into port number conflicts. To change the port number of of your Postgres instance, first shut down the instance. Then change your port number in the Postgres configuration file using your favorite text editor, e.g:
$ ~/pgsql/bin/pg_ctl -D ~/pgdata stop
$ emacs ~/pgdata/postgresql.conf
Edit the line with the port number, making sure to remove the comment (#) at the beginning of the line. Port numbers within 129 - 32768 should be usable. Then start your instance as usual, and start the your database connection using your specified port number.
$ ~/pgsql/bin/pg_ctl -D ~/pgdata -l <log name>.log start
$ ~/pgsql/bin/psql template1 -p <port>