Greenplum Database Client Utility

Here are the list of Greenplum database client utilities as of V 3.0. I will describe each utilities in more details in my future blogs. You can also buy my ebook Greenplum Database Utilities Book Preview for your future reference purpose.

1. clusterdb: Reclusters tables that were previously clustered with CLUSTER. To cluster the database test: clusterdb test. To cluster a single table test in a database named sachi: clusterdb --table test sachi

2. createdb: Creates a new database. To create the database test using the default options: 

createdb test. 

To create the database demo using the Greenplum master on host gpmaster, port 54321, using the LATIN1 encoding scheme:

createdb -p 54321 -h gpmaster -E LATIN1 demo

3. createlang: Defines a new procedural language for a database. To install the language plperl into the database template1:

createlang plperl template1

4. createuser: Creates a new database role. Create a role named joe using the default options:

createuser joe

Shall the new role be a superuser? (y/n) n

Shall the new role be allowed to create databases? (y/n) n

Shall the new role be allowed to create more new roles? (y/n) n

CREATE ROLE

To create the same role joe using connection options and avoiding the prompts and taking a look at the underlying command:

createuser -h masterhost -p 54321 -S -D -R -e joe

CREATE ROLE joe NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;

CREATE ROLE

To create the role joe as a superuser, and assign a password immediately:

createuser -P -s -e joe

Enter password for new role: admin123

Enter it again: admin123

CREATE ROLE joe PASSWORD 'admin123' SUPERUSER CREATEDB CREATEROLE INHERIT LOGIN;

CREATE ROLE

In the above example, the new password is not actually echoed when typed, but we show what was typed for clarity. However the password will appear in the echoed command, as illustrated if the -e option is used.

5. dropdb: Removes a database. To destroy the database named demo using default connection parameters:

dropdb demo

To destroy the database named demo using connection options, with verification, and a peek at the underlying command:

dropdb -p 54321 -h masterhost -i -e demo

Database "demo" will be permanently deleted.

Are you sure? (y/n) y

DROP DATABASE "demo"

DROP DATABASE

6. droplang: Removes a procedural language. To remove the language pltcl:

droplang pltcl mydatabase

7. dropuser: Removes a database role. To remove the role joe using default connection options:

dropuser joe

DROP ROLE

To remove the role joe using connection options, with verification, and a peek at the underlying command:

dropuser -p 54321 -h masterhost -i -e joe

Role "joe" will be permanently removed.

Are you sure? (y/n) y

DROP ROLE "joe"

DROP ROLE

8. ecpg: Embedded SQL C preprocessor. If you have an embedded SQL C source file named prog1.pgc, you can create an executable program using the following sequence of commands:

ecpg prog1.pgc

cc -I/usr/local/pgsql/include -c prog1.c

cc -o prog1 prog1.o -L/usr/local/greenplum-db-4.3.x.x/lib -lecpg

9. pg_config: Retrieves information about the installed version of Greenplum Database. 

To reproduce the build configuration of the current Greenplum Database installation, run the following command:

eval ./configure 'pg_config --configure'

The output of pg_config --configure contains shell quotation marks so arguments with spaces are represented correctly. Therefore, using eval is required for proper results.

10. pg_dump: Extracts a database into a single script file or other archive file.

Dump a database called mydb into a SQL-script file:

pg_dump mydb > db.sql

To reload such a script into a (freshly created) database named newdb:

psql -d newdb -f db.sql

Dump a Greenplum database in tar file format and include distribution policy information:

pg_dump -Ft --gp-syntax mydb > db.tar

To dump a database into a custom-format archive file:

pg_dump -Fc mydb > db.dump

To reload an archive file into a (freshly created) database named newdb:

pg_restore -d newdb db.dump

To dump a single table named mytab:

pg_dump -t mytab mydb > db.sql

To specify an upper-case or mixed-case name in -t and related switches, you need to double-quote the name; else it will be folded to lower case. But double quotes are special to the shell, so in turn they must be quoted. Thus, to dump a single table with a mixed-case name, you need something like:

pg_dump -t '"MixedCaseName"' mydb > mytab.sql

11. pg_dumpall: Extracts all databases in a Greenplum Database system to a single script file or other archive file.

To dump all databases:

pg_dumpall > db.out

To reload this file:

psql template1 -f db.out

To dump only global objects (including filespaces and resource queues):

pg_dumpall -g -f -r

12. pg_restore: Restores a database from an archive file created by pg_dump. Assume we have dumped a database called mydb into a custom-format dump file:

pg_dump -Fc mydb > db.dump

To drop the database and recreate it from the dump:

dropdb mydb

pg_restore -C -d template1 db.dump

To reload the dump into a new database called newdb. Notice there is no -C, we instead connect directly to the database to be restored into. Also note that we clone the new database from template0 not template1, to ensure it is initially empty:

createdb -T template0 newdb

pg_restore -d newdb db.dump

To reorder database items, it is first necessary to dump the table of contents of the archive:

pg_restore -l db.dump > db.list

The listing file consists of a header and one line for each item, for example,

; Archive created at Fri Jul 28 22:28:36 2006

; dbname: mydb

; TOC Entries: 74

; Compression: 0

; Dump Version: 1.4-0

; Format: CUSTOM

;

; Selected TOC Entries:

;

2; 145344 TABLE species postgres

3; 145344 ACL species

4; 145359 TABLE nt_header postgres

5; 145359 ACL nt_header

6; 145402 TABLE species_records postgres

7; 145402 ACL species_records

8; 145416 TABLE ss_old postgres

9; 145416 ACL ss_old

10; 145433 TABLE map_resolutions postgres

11; 145433 ACL map_resolutions

12; 145443 TABLE hs_old postgres

13; 145443 ACL hs_old

Semicolons start a comment, and the numbers at the start of lines refer to the internal archive ID assigned to each item. Lines in the file can be commented out, deleted, and reordered. For example,

10; 145433 TABLE map_resolutions postgres

;2; 145344 TABLE species postgres

;4; 145359 TABLE nt_header postgres

6; 145402 TABLE species_records postgres

;8; 145416 TABLE ss_old postgres

Could be used as input to pg_restore and would only restore items 10 and 6, in that order:

pg_restore -L db.list db.dump

If your installation has any local additions to the template1 database, be careful to load the output of pg_restore into a truly empty database; otherwise you are likely to get errors due to duplicate definitions of the added objects. To make an empty database without any local additions, copy from template0 not template1, for example:

CREATE DATABASE foo WITH TEMPLATE template0;

Once restored, it is wise to run ANALYZE on each restored table so the query planner has useful statistics.

13. psql: Interactive command-line interface for Greenplum Database.

Start psql in interactive mode:

psql -p 54321 -U sally mydatabase

In psql interactive mode, spread a command over several lines of input. Notice the changing prompt:

testdb=> CREATE TABLE my_table (

testdb(> first integer not null default 0,

testdb(> second text)

testdb-> ;

CREATE TABLE

Look at the table definition:

testdb=> \d my_table

Table "my_table"

Attribute | Type | Modifier

-----------+---------+--------------------

first | integer | not null default 0

second | text |

Run psql in non-interactive mode by passing in a file containing SQL commands:

psql -f /home/gpadmin/test/myscript.sql

psql returns 0 to the shell if it finished normally, 1 if a fatal error of its own (out of memory, file not found) occurs, 2 if the connection to the server went bad and the session was not interactive, and 3 if an error occurred in a script and the variable ON_ERROR_STOP was set.

psql is a client application for Greenplum Database. In order to connect to a database you need to know the name of your target database, the host name and port number of the Greenplum master server and what database user name you want to connect as. psql can be told about those parameters via command line options, namely -d, -h, -p, and -U respectively. If an argument is found that does not belong to any option it will be interpreted as the database name (or the user name, if the database name is already given). Not all these options are required; there are useful defaults. If you omit the host name, psql will connect via a UNIX-domain socket to a master server on the local host, or via TCP/IP to localhost on machines that do not have UNIX-domain sockets. The default master port number is 5432. If you use a different port for the master, you must specify the port. The default database user name is your UNIX user name, as is the default database name. Note that you cannot just connect to any database under any user name. Your database administrator should have informed you about your access rights. When the defaults are not right, you can save yourself some typing by setting any or all of the environment variables PGAPPNAME, PGDATABASE, PGHOST, PGPORT, and PGUSER to appropriate values.

It is also convenient to have a ~/.pgpass file to avoid regularly having to type in passwords. This file should reside in your home directory and contain lines of the following format:

hostname:port:database:username:password

The permissions on .pgpass must disallow any access to world or group (for example: chmod 0600 ~/.pgpass). If the permissions are less strict than this, the file will be ignored. (The file permissions are not currently checked on Microsoft Windows clients, however.). If the connection could not be made for any reason (insufficient privileges, server is not running, etc.), psql will return an error and terminate.

In normal operation, psql provides a prompt with the name of the database to which psql is currently connected, followed by the string => for a regular user or =# for a superuser. For example:

testdb=>

testdb=#

At the prompt, the user may type in SQL commands. Ordinarily, input lines are sent to the server when a command-terminating semicolon is reached. An end of line does not terminate a command. Thus commands can be spread over several lines for clarity. If the command was sent and executed without error, the results of the command are displayed on the screen.

14. reindexdb: Rebuilds indexes in a database.

To reindex the database mydb:

reindexdb mydb

To reindex the table foo and the index bar in a database named abcd:

reindexdb --table foo --index bar abcd

15. vacuumdb: Garbage-collects and analyzes a database. 

To clean the database test:

vacuumdb test

To clean and analyze a database named bigdb:

vacuumdb --analyze bigdb

To clean a single table foo in a database named mydb, and analyze a single column bar of the table. Note the quotes around the table and column names to escape the parentheses from the shell:

vacuumdb --analyze --verbose --table 'foo(bar)' mydb

Greenplum Database uses the standard PostgreSQL client programs and provides additional client utilities for administering a distributed Greenplum Database DBMS. Greenplum Database client utilities reside in $GPHOME/bin.