PostgreSQL 11 Server Side Programming Quick Start Guide
上QQ阅读APP看书,第一时间看更新

Returning values to the caller

A function can either return a single value of the specified type, or a whole result set (called SETOF). In order to return a single value, either a scalar or complex type, a RETURN statement is used. When the control flow reaches a RETURN, it immediately stops the function execution and provides the value to the caller.

It is possible to issue an empty RETURN, which is a statement that simply ends the control flow of the function without returning any particular value.

In order to see how RETURN can work with a scalar type, let's examine the code snipper of Listing 46. Here, the function accepts a single argument, f_size, which is supposed to be the size of a file (in bytes). The function compares the file size to constant values and determines a final value and a unit of measurement to express the size in. It then creates a string that contains both the numeric value (rounded to two decimal points) and the unit, and performs a RETURN to provide the computed text to the caller. It is possible to invoke the function on each tuple of the files table to get the results, as shown in Listing 7, where the f_size field is passed as an argument to the function, making the latter evaluate the argument on each tuple extracted from the SELECT statement:

testdb=> CREATE OR REPLACE FUNCTION
f_human_file_size( f_size numeric )
RETURNS text AS $code$
DECLARE
size_kb CONSTANT int := 1024 * 1024;
size_mb CONSTANT bigint := size_kb * 1024;
size_gb CONSTANT bigint := size_mb * 1024;
unit text := 'bytes';
BEGIN
IF f_size > size_gb THEN
f_size := f_size / size_gb;
unit := 'MB';
ELSEIF f_size > size_mb THEN
f_size := f_size / size_mb;
unit := 'MB';
ELSEIF f_size > size_kb THEN
f_size := f_size / size_kb;
unit := 'KB';
ELSE
unit := 'bytes';
END IF;
RETURN round( f_size, 2 ) || unit;
END $code$ LANGUAGE plpgsql;

testdb=> SELECT f_name, f_human_file_size( f_size ) FROM files;
f_name | f_human_file_size
--------------+-------------------
picture1.png | 120.80MB
picture2.png | 520.10MB
picture3.png | 300.00MB
audio1.mp3 | 157.65MB
audio2.mp3 | 221.50MB
chapter2.org | 22.40KB
chapter3.org | 36.40KB
TestFile.txt | 123.56KB
foo.txt | 10.50KB
Listing 7:  Function returning a text string

It is possible to return tuples from a function. This can be done in two main ways:

  • If the tuples are extracted from a query, use a RETURN QUERY statement, providing the query to be executed
  • Use RETURN NEXT if the tuples are appended to the result set one at a time

Consider a function that returns all data in the files table that matches a particular type. The results set is made up of a simple SELECT statement. The function that implements this behavior is shown in Listing 8. It accepts a single argument, which is the type of file to search for. Note that the return type of the function is not a single value but instead a result set, so RETURNS SETOF is used to declare it. Since the function is going to return tuples coming from the files table, its return type will be the table itself. In other words, RETURNS SETOF means the function returns tuples with the same structure as the files table. The implementation of the function is straightforward: it performs a simple SELECT statement and returns the entire results set to the caller by means of RETURN QUERY. In other words, this function works a a query wrapper.

Finally, since the function produces a results set of tuples, it can be used where a set of tuples is expected. This means it can be used as a table in the query that runs it.

testdb=> CREATE OR REPLACE FUNCTION
f_file_by_type( file_type text DEFAULT 'txt' )
RETURNS SETOF files AS $code$
BEGIN
RETURN QUERY
SELECT * FROM files
WHERE f_type = file_type
ORDER BY f_name;
END $code$ LANGUAGE plpgsql;

testdb=> SELECT f_name, f_size FROM f_file_by_type( 'png' );
f_name | f_size
--------------+-----------------
picture1.png | 129708383096.83
picture2.png | 558453371016.99
picture3.png | 322122916439.53
Listing 8:  A function that returns tuple from a query

Another way to return tuples is by using the RETURN NEXT statement. This statement produces a function to append a tuple to the output result set. This, to some extent, gives precedence to the caller that can start consuming the tuples, then resumes its execution and appends a new tuple to the result set. Once the function has finished appending tuples, it must perform an empty RETURN. It is also possible to change Listing 8 into Listing 9, where a FOR loop over a query is performed and each tuple is returned to the caller.

testdb=> CREATE OR REPLACE FUNCTION
f_file_by_type( file_type text DEFAULT 'txt' )
RETURNS SETOF files AS $code$
DECLARE
current_tuple record;
BEGIN
FOR current_tuple IN SELECT * FROM files
WHERE f_type = file_type
ORDER BY f_name
LOOP
-- append this tuple to the result set
RETURN NEXT current_tuple;
END LOOP;
--mandatory when done
RETURN;
END $code$ LANGUAGE plpgsql;

testdb=> SELECT f_name, f_size FROM f_file_by_type( 'png' );
f_name | f_size
--------------+-----------------
picture1.png | 129708383096.83
picture2.png | 558453371016.99
picture3.png | 322122916439.53
Listing 9:  A function that returns tuples one at a time

Using RETURN NEXT is suggested when there is a very large result set, so that it can be built and consumed a piece at a time, such as by a result paginator. Alternatively, suppose we have a need to populate the files table with fake data.

It is possible to build a function that generates tuples on the fly, as shown in Listing 10:

testdb=> CREATE OR REPLACE FUNCTION
f_fake_files( max int DEFAULT 10 )
RETURNS SETOF files AS $code$
DECLARE
current_tuple files%rowtype;
BEGIN
IF max <= 0 THEN
RAISE EXCEPTION 'Specify a positive limit for tuple generation';
END IF;
FOR i IN 1 .. max LOOP
current_tuple.pk = nextval( 'files_pk_seq' );
current_tuple.f_name := initcap( 'file_' || i );
current_tuple.f_size := random() * 1024 * 1024;
CASE i % 3
WHEN 0 THEN current_tuple.f_type := 'txt';
WHEN 1 THEN current_tuple.f_type := 'png';
WHEN 2 THEN current_tuple.f_type := 'mp3';
END CASE;
current_tuple.f_hash := md5( current_tuple.f_name );
current_tuple.ts := now();
RAISE DEBUG '%) % generated', i, current_tuple.pk;
RETURN NEXT current_tuple;
END LOOP;
RETURN;
END $code$ LANGUAGE plpgsql;

testdb=> SET client_min_messages TO debug;
testdb=> SELECT * FROM f_fake_files( 3 );
DEBUG:  1) 42 generated
DEBUG: 2) 43 generated
DEBUG: 3) 44 generated
pk | f_name | f_size | f_hash | f_type | ts
----+--------+------------+----------------------------------+--------+----------------------------
42 | File_1 | 979597.23 | 8ba5c586d44c70ca777c0f66b5243b40 | png | 2018-07-16 16:18:27.782463
43 | File_2 | 7134489.49 | f08d556d964b7a7b9374d96c6c2ca8ec | mp3 | 2018-07-16 16:18:27.782463
44 | File_3 | 258768.02 | aac47180851be5de7f5b19fa7afa91b7 | txt | 2018-07-16 16:18:27.782463
Listing 10:  Generating fake tuples via a function

The function shown in Listing 10 returns a set of tuples like the files table ones, the same as in the preceding example. However, to handle the tuple it is constructing, the function declares a current_tuple variable of files%rowtype type. Row type variables are always of the special type <table>, followed by %rowtype. In this case, current_tuple is a tuple with the same data structure as a row from the files table. It is worth noting that a rowtype variable has the same structure as a table tuple, but not the same defaults and constraints. The function performs a FOR loop starting from 1 and limited to the maximum integer argument in order to generate the maximum number of tuples.

Within the FOR loop, the current_tuple fields are initialized with randomly generated values. Each field of the tuple is de-referenced with the dot operator (such as current_tuple.f_name) and each field is assigned an appropriate value. Note that some fields are set to the return value of another function, such as the pk field, which is set to the return value of nextval() against the same sequence of the files table. Once the tuple has been fully built, RETURN NEXT is executed to append the tuple to the output result set that yields the function execution. Once the tuple has been appended to the result set, the function resumes from the FOR cycle and initializes another tuple to append the results set. Once the FOR loop ends, a regular RETURN marks the result set as complete and ends the execution of the function.

In the result set of Listing 10, there is another important thing to note: the ts timestamp is the same for every tuple. This is due to the fact that the function is executed in a single-statement transaction ( SELECT) and the time is set at the beginning of the transaction. If you need to get a different time value each time the function is evaluated, it is required to substitute the now() function call to clock_timestamp().

As you can see in Listing 11, it is possible to use more than one function within the same statement, concatenating the results of a function as arguments of another function and more complex scenarios:

testdb=> SELECT f_name, f_human_file_size( f_size ) FROM f_fake_files( 3 );
f_name | f_human_file_size
--------+-------------------
File_1 | 146713.36bytes
File_2 | 590618.06bytes
File_3 | 1.38KB
Listing 11:  Using more than one function within the same statement

Let's now consider Listing 12, which accepts a Boolean parameter that indicates the function to either create fake tuples or return existing ones from the files table. The function achieves its aim by issuing either a RETURN QUERY, to provide existing tuples from the database, or a RETURN NEXT to dynamically generate tuples:

testdb=> CREATE OR REPLACE FUNCTION
f_files_tuples( fake bool DEFAULT false, max int DEFAULT 5 )
RETURNS SETOF files AS $code$
DECLARE
current_tuple files%rowtype;
BEGIN
IF fake THEN
FOR i IN 1 .. max LOOP
current_tuple.pk = nextval( 'files_pk_seq' );
current_tuple.f_name := initcap( 'file_' || i );
current_tuple.f_size := random() * 1024 * 1024;
CASE i % 3
WHEN 0 THEN current_tuple.f_type := 'txt';
WHEN 1 THEN current_tuple.f_type := 'png';
WHEN 2 THEN current_tuple.f_type := 'mp3';
END CASE;
current_tuple.f_hash := md5( current_tuple.f_name );
current_tuple.ts := now();
RETURN NEXT current_tuple;
END LOOP;
RETURN;
ELSE
RETURN QUERY
SELECT * FROM files ORDER BY random() LIMIT max;
END IF;
END $code$ LANGUAGE plpgsql;

testdb=> SELECT * FROM f_files_tuples( false, 2 )
UNION
SELECT * FROM f_files_tuples( true, 2 )
ORDER BY f_name;

pk | f_name | f_size | f_hash | f_type | ts
----+--------------+-----------------+----------------------------------+--------+----------------------------
61 | File_1 | 1705244.13 | 8ba5c586d44c70ca777c0f66b5243b40 | png | 2018-07-16 17:15:48.571018
62 | File_2 | 2520174.21 | f08d556d964b7a7b9374d96c6c2ca8ec | mp3 | 2018-07-16 17:15:48.571018
18 | chapter2.org | 23488405.98 | 14b8f225d4e6462022657d7285bb77ba | org | 2018-07-12 10:56:15.239616
21 | picture2.png | 558453371016.99 | 323412345666462022657daa85bb77a2 | png | 2018-07-12 10:56:15.239616
Listing 12:  A function that either creates or return existing tuples

Another way to return tuples out of a function is by using RETURN TABLE. The idea behind this is that the function declares a table-like structure to output, including column names and types. The names effectively work as OUT arguments to the function and are therefore variables that are visible within the code block. When the function needs to append a new record to the returning result set, it does so with an empty RETURN NEXT statement. All the values of the named arguments will be used to build the returning tuple. To make this clearer, consider Listing 13, which provides a function to generate fake files tuples, similar to that shown in Listing 10. The difference in this case is that the returned result set is shrunk down to only three columns, called pk, file_name, and file_type, within the RETURNS TABLE clause. This clause defines the three variables that are then assigned within the FOR loop. RETURN NEXT outputs these variable values and appends a new tuple to the result set. Note that neither the table files (or its structure) or a record variable have been used to achieve the final result set.

testdb=> CREATE OR REPLACE FUNCTION
f_fake_small_files( max int DEFAULT 10 )
RETURNS TABLE ( pk int, file_name text, file_type text )
AS $code$
BEGIN
IF max <= 0 THEN
RAISE EXCEPTION 'Specify a positive limit for tuple generation';
END IF;
FOR i IN 1 .. max LOOP
pk = nextval( 'files_pk_seq' );
file_name := initcap( 'file_' || i );
CASE i % 3
WHEN 0 THEN file_type := 'txt';
WHEN 1 THEN file_type := 'png';
WHEN 2 THEN file_type := 'mp3';
END CASE;
RETURN NEXT;
END LOOP;
RETURN;
END $code$ LANGUAGE plpgsql;

testdb=> SELECT * FROM f_fake_small_files( 5 );
pk | file_name | file_type
-----+-----------+-----------
148 | File_1 | png
149 | File_2 | mp3
150 | File_3 | txt
151 | File_4 | png
152 | File_5 | mp3
Listing 13:  A function that returns a table

From a purely syntactic point of view, the function of Listing 13 is equivalent to the one in Listing 14, which explicitly uses a few OUT parameters and RETURNS SETOF to achieve the same result:

testdb=> CREATE OR REPLACE FUNCTION
f_fake_small_files( max int DEFAULT 10, pk OUT int,
file_name OUT text,
file_type OUT text )
RETURNS SETOF record AS $code$
BEGIN
IF max <= 0 THEN
RAISE EXCEPTION 'Specify a positive limit for tuple generation';
END IF;
FOR i IN 1 .. max LOOP
pk = nextval( 'files_pk_seq' );
file_name := initcap( 'file_' || i );
CASE i % 3
WHEN 0 THEN file_type := 'txt';
WHEN 1 THEN file_type := 'png';
WHEN 2 THEN file_type := 'mp3';
END CASE;
RETURN NEXT;
END LOOP;
RETURN;
END $code$ LANGUAGE plpgsql;

testdb=> SELECT * FROM f_fake_small_files( 5 );
pk | file_name | file_type
-----+-----------+-----------
148 | File_1 | png
149 | File_2 | mp3
150 | File_3 | txt
151 | File_4 | png
152 | File_5 | mp3
Listing 14:  A function that returns a result set with OUT parameters

It is also possible to build a more complex function that scans a directory in order to build up a set of records that correspond to the files found in that directory. A possible simple implementation of such a function is shown in Listing 15, but before digging into the code of the function, bear in mind that this kind of function must be run as database superuser. The reason for this is that the function exploits other PostgreSQL functions, such as pg_ls_dir(), that require superuser privileges in order to inspect the filesystem. While it is possible to grant execution permissions to a normal user, the pg_ls_dir() function will only take into account relative path, resulting therefore in a partial limitation in its capabilities. To overtake the problem, and for sake of simplicity, let's assume the function will be executed by a database superuser.

testdb=# CREATE OR REPLACE FUNCTION
f_files_from_directory( dir text DEFAULT '.' )
RETURNS SETOF files AS $code$
DECLARE
current_tuple files%rowtype;
current_file text;
name_parts text[];
is_dir bool;
BEGIN
RAISE DEBUG 'Reading files into %', dir;
FOR current_file IN SELECT pg_ls_dir( dir ) LOOP
RAISE DEBUG 'Building tuple for entry [%]', current_file;
current_tuple.f_name := current_file;
-- absolute file name
current_file := dir || '/' || current_file;
-- use the real file size and moditification time
SELECT size, modification, isdir
FROM pg_stat_file( current_file )
INTO current_tuple.f_size, current_tuple.ts, is_dir;
-- do not manage directories
CONTINUE WHEN is_dir;
-- initialize tuple primary key
current_tuple.pk = nextval( 'files_pk_seq' );
-- try to get the type of the file from its extension
name_parts := string_to_array( current_tuple.f_name, '.' );
IF array_length( name_parts, 1 ) > 1 THEN
current_tuple.f_type := name_parts[ array_length( name_parts, 1 ) ];
ELSE
current_tuple.f_type := 'txt';
END IF;
-- read the content of the file (not binary!) and compute an hash
current_tuple.f_hash := md5( pg_read_file( current_file ) );
RETURN NEXT current_tuple;
END LOOP;
RETURN;
END $code$ LANGUAGE plpgsql;
Listing 15:  A function that scans the local filesystem to produce tuples

The function in Listing 15 accepts the dir argument, which represents the directory in which to search for files. The pg_ls_dir() is used to obtain a set of file names in that directory. Then, those files are cycled in a FOR loop, in which each relative file name is assigned to the current_file variable. Since pg_ls_dir() does not output only the files but also the subdirectories, current_file may also hold the name of a subdirectory. Another utility function, pg_stat_file(), provides information about the metadata of the files, such as its size, its modification timestamp, as well as whether the current entry is a regular file or a directory. After having kept the relative file name in the current_tuple.f_name field, SELECT INTO is performed against pg_stat_file() to get the metadata. Since the directories are not managed, CONTINUE is issued if the is_dir local variable is set to a true value. In this case, the FOR loop restarts from the next directory entry (the next file name). On the other hand, if CONTINUE is not issued (so if the current entry is a file), more current_tuple fields are initialized with data extracted from the pg_stat_file() function call.

It is then time to decide the file type. To keep things simple, the type is determined by the last extension of the file name (for example, foo.txt will result in a file type .txt). First of all, the file name is split into a string array by the string_to_array() function, where each part of the array is the string behind a '.' separator. The resulting array is assigned to a name_parts variable. For example, a file with the name foo.bar.txt is transformed into an array with {'foo', 'bar', 'txt'}. If the name_parts array has more than one element, the last element is the extension and therefore determines the type of the file, otherwise the file type defaults to a .txt one. The last field that must be properly initialized is f_hash. In order to compute the hash of the file, its whole content is read via pg_read_file() and the md5() hash is computed. This function chain works for the content of text files. At this point, the whole tuple has been properly initialized and the function can append it to the result set with RETURN NEXT. Once no more files on the disk are left, the FOR loop ends and the function does an empty RETURN to finish the result set.

Listing 16 shows a possible invocation of the function and its results. The result set of the function can be used to populate the files table via an INSERT statement:

testdb=# SELECT * FROM 
files_from_directory( '/home/luca/git/fluca1978-pg-utils/examples/cte' );
pk | f_name | f_size | f_hash | f_type | ts
-----+-------------------------------+---------+----------------------------------+--------+---------------------
124 | family_tree.sql | 1879.00 | cc948a6e78a1581e350958c71093927d | sql | 2018-05-31 16:17:19
125 | family_tree_recursive_cte.sql | 400.00 | 42a149f41d3c78241160ea473154e4b5 | sql | 2018-05-31 16:17:19
126 | file_system_cte.sql | 1424.00 | acc41b140745747e7647de742868d768 | sql | 2018-05-31 16:17:19
127 | star_wars_family_tree_cte.sql | 2937.00 | 3e2bf991e553ae86e6f1ca2aa525b597 | sql | 2018-05-31 16:17:19
Listing 16:  Filesystem inspecting results