Summary
Relational databases like PostgreSQL have long been dominant for data storage and access, but sometimes you need access from your application to data that’s either in a different database format, in a non-relational database, or not in a database at all. As shown in this “how-to” post, you can do that with PostgreSQL’s Foreign Data Wrapper feature.
Extending a PostgreSQL Datastore With FDWs
This how-to looks at using the Foreign Data Wrapper feature of PostgreSQL to enable access from your application to data that’s not in a PostgreSQL relational datastore. The how-to covers the following topics:
- About Foreign Data Wrappers
- Example Implementation: WhiteDB
- Setting the Environment
- Into the Code
- Some Tricky Bits
- And Beyond…
About Foreign Data Wrappers
Relational databases like PostgreSQL (PG) have long been dominant for data storage and access, but sometimes you need access from your application to data that’s either in a different database format, in a non-relational database, or not in a database at all. In PostgreSQL, this capability is provided by Foreign Data Wrappers (FDWs), which support pluggable data backends. FDW backends can be a surprisingly powerful tool when your data model isn’t classically relational but you still want all the nice things that come with PostgreSQL (aggregates, client libraries, authentication, group by, etc.).
Leveraging PostgreSQL’s support for ANSI SQL and secure client libraries like JDBC and ODBC, FDWs in PG support a wide range of applications. They provide access to key:value stores like MongoDB, to ACID guarantees when accessing remote MySQL or PostgreSQL servers, and to web services like Twitter and Philips Hue smart light bulbs. A mostly complete list of implementations is available from the PostgreSQL Wiki.
FDWs are implemented using callback functions. In this primer we’ll show how to use FDWs to front-end your own datastores, and to allow JOINs with native PG data and data stored in other FDW-accessible systems. We use FDWs this way at Kentik as part of the Kentik Data Engine (KDE) that powers Kentik Detect, the massively scalable big data-based SaaS for network visibility.
FDWs enable Kentik Detect to use SQL-compatible syntax and to take advantage of PostgreSQL’s authentication, SSL, and libpsql functionality. They also allow us to simplify some sections of the code by relying on PG’s ability to combine multiple result sets. Kentik Detect supports multi-petabyte datastores spread across tens or hundred of nodes, but to users these stores look like SELECTs from normal PG tables. Queries run in parallel across the cluster(s), with multi-tenancy features like rate-limiting and “query fragment” caching that are critical for our use cases but aren’t found in PG itself.
So far the biggest hurdle that aspiring PostgreSQL hackers face in writing FDWs of their own is the lack of documentation; the best documentation has typically been the actual FDW implementations. Hopefully this primer will help make things easier. So let’s dig into how to implement an FDW…
Example Implementation: WhiteDB
FDWs come in two main forms: writable and read only. The only difference is that support for writes requires a few additional callback functions.
I’ve posted on GitHub a simple writable FWD example for WhiteDB (WDB), which is a project that stores data in shared memory (see http://www.whitedb.org). This FDW, wdb_fdw, allows PostgreSQL to read and write into WDB managed memory.
While WDB is a key:value store, and therefore provides only a subset of what you could do with full-on SQL database functionality, wdb_fdw enables you to read, write, and delete using statements like the following:
INSERT INTO my_table (key, value) VALUES ('key', 'value');
SELECT * FROM my_table WHERE key = 'key';
DELETE FROM my_table WHERE key = 'key';
Note: The code posted to GitHub to support this article has the following limitations:
- It only supports adding and deleting values based on a key.
- Integers are cast to TEXT (string) when displayed by PostgreSQL.
Setting the Environment
To get started, let’s set up our dev environment:
- Assuming a debian/ubuntu setup,
apt-get install postgresql-server-dev-9.X
(Current latest release is 9.4). - Install whitedb from source.
- When these are done, you should be able to:
git clone https://github.com/Kentik/wdb_fdw.git
cd wdb_fdw
make
sudo make install
Diving into the source code, first note a lot of boilerplate in the Makefile. Major things to change are the flags:
SHLIB_LINK = -lwgdb -lsasl2EXTENSION = wdb_fdw
Everything else can be left as it is from the defaults. This works because the following magic line pulls in all of the machinery PostgreSQL needs to build an FDW:
PGXS := $(shell $(PG_CONFIG) --pgxs)include $(PGXS)
Once things are installed, create a new table by installing the FWD extension (actually a .so file) for your database and then creating a table using the FDW as its data engine:
CREATE EXTENSION wdb_fdw;
CREATE SERVER wdb FOREIGN DATA WRAPPER wdb_fdw OPTIONS (address '1000', size '1000000');
CREATE USER MAPPING FOR PUBLIC SERVER wdb;
CREATE FOREIGN TABLE _table_name_ (key TEXT, value TEXT) SERVER wdb;
In the CREATE SERVER
call, address
is a shared memory segment address, and size is how many bytes to grab. If you try to write more rows than can fit into the address you create here, bad things happen.
Note: PostgreSQL will reload the .so file once per client process. So when you make a change and re-install this file, make sure to exit and re-connect to your psql client to pick up the latest code.
Into the Code
Now that the extension is working, we can dive right into the meat of the code:
src/wdb_fwd.c
A few notes on PostgreSQL’s code conventions:
- PG uses the Datum type as a general placeholder for any data type to be stored.
- PG has two major collection types: linked list (LIST *) and hash table (HTAB * ).
- Almost every function gets its arguments in PG passed in as a set of void*. Figuring out what what exactly you are getting and how to use it boils down to grepping in the PG source code for example uses. This is about as entertaining as it sounds. The best places I know of to look are the contributed code in PG’s source distribution and other FDWs online.
- PG implements its own memory management with palloc and pfree. These do what you would expect from the std lib.
In wdb_fdw
, the first code block you come to is this:
PG_FUNCTION_INFO_V1(wdb_fdw_handler);
PG_FUNCTION_INFO_V1(wdb_fdw_validator);
wdb_fdw_validator()
is what makes the CREATE SERVER
call work. It parses the OPTIONS
clause and sets up a wdbTableOptions
stuct.
Note that options come in as a linked list. The following macro Is how PostgreSQL iterates over a LIST* value:
foreach(cell, options_list) {DefElem \*def = (DefElem\*) lfirst(cell);
Moving on, wdb_fdw_handler()
is the real code of the FDW. It sets up the FDW and lays out all of the callback functions supplied. The only required functions are these:
fdwroutine->GetForeignRelSize = wdbGetForeignRelSize; // How many rows are in a foreign table?
fdwroutine->GetForeignPaths = wdbGetForeignPaths; // Bookkeeping function, which makes the PG planner able to do its thing.
fdwroutine->GetForeignPlan = wdbGetForeignPlan; // Another bookkeeping function for the planner.
fdwroutine->BeginForeignScan = wdbBeginForeignScan; // As you might expect, called at the beginning of each scan over a foreign table.
fdwroutine->IterateForeignScan = wdbIterateForeignScan; // Called for every row. When this function returns null, means that the scan should stop.
fdwroutine->ReScanForeignScan = wdbReScanForeignScan; // Sometimes PG may need to re-scan a table.
fdwroutine->EndForeignScan = wdbEndForeignScan; // At the end, put all of your cleanup code here.
These functions are called in this order by PostgreSQL, during the course of running SELECT * FROM my_table
. Here’s a closer look at the functions:
GetForeignRelSize
,GetForeignPaths
, andGetForeignPlan
are all stubs, which don’t really do anything for our purposes. More information about them is in the comments above each function in the code.wdbBeginForeignScan
is given the options defined in the create server clause. It uses these to open a handle into the WDB table. This handle is then used at each iteration (IterateForeignScan
) to read or write into the WDB table. And, as you might expect, it is closed in theEndForeignScan()
function.IterateForeignScan
is called repeatedly, and it has the option of either add a row to the result set, or else returning null. When it returnsNULL
, PG assumes that all of the data has been returned.
Additional functions to know about include:
FillTupleSlot()
is a helper function used by IterateForeignScan to actually give data to PostgreSQL. This function gets two pointers:Datum *columnValues
andbool *columnNulls
. The i’th column value goes incolumnValues\[i\]
. If a column is null however,columnNulls\[i\]
is set to true, andcolumnValues\[i\]
is left null as well. This is called once for each row added to a result set.ExecForeignInsert
must be implemented to support inserts.ExecForeignUpdate
andExecForeignDelete
handle updates and deletes respectively.
And that’s it — go out and create (for simpler use cases)!
Some Tricky Bits
At this point the careful reader will start looking at the code and realize that a lot of functions have been left unmentioned, in particular functions to parse the WHERE clause of a query and to handle data types.
The function ApplicableOpExpressionList()
in src/wdb_query.c
shows how both of these can be accomplished. Taken from the FWD for MongoDB (github.com/citusdata/mongo_fdw
) by the good folks at CitusData, this function iterates over a list of WHERE clauses:
foreach(restrictInfoCell, restrictInfoList) {
The function returns a list of the clauses that can be used by WDB. For example, integer and string comparisons are returned, but LIKE conditions are rejected because WDB doesn’t support them. The list of valid clauses is then passed to BuildWhiteDBQuery()
, which uses them to build a WDB query struct. EncodeConstantValue()
is then used to map from PostgreSQL data types into types that WDB can understand, using a big switch statement.
There are, of course, many other tricks and tips, including how to handle GROUP BY
, which we’ll cover in future posts.
And Beyond
The above primer is just a small taste of how Foreign Data Wrappers can be applied to extend a PostgreSQL datastore. At Kentik, we’ve found FDWs built on top of PG to be both stable and scalable, greatly increasing the capacity and capabilities of Kentik Detect. We heartily recommend FDWs if you have similar needs.