PostgreSQL 8.3 Function Performance

While overhauling many of the queries and functions in a PostgreSQL 8.3 database, I noticed different elapsed times with minor changes to code structure and decided to benchmark my observations. Simple modifications, such as combining separate IF and ELSEIF expressions into one or replacing them entirely with conditional expressions, resulted in surprising performance boosts.

This very primitive benchmark function will determine the number of rows in which the first column (a) is the lowest number, the second column (b) contains the second-lowest number, and the middle column (c) is lower than both columns (d) and (e). It is not efficient and is not supposed to be, but it should be enough to provide a basis for more extensive formal testing, if needed.

There are much faster ways to do this; for instance, you could use a single SELECT statement:

SELECT COUNT(*) FROM foo WHERE a <= b AND b <= c AND c <= d AND c <= e;
count: 17994
Time: 209.326 ms

Or even slightly faster:

Time: 196.504 ms

Again, the purpose of this exercise is to test the impact of code decisions over millions of operations inside Postgres 8.3 functions, not to optimize a query for the fastest result, nor to compare features, functions, or performance against Postgres 9.x or higher.

Database Setup

Start by generating a simple one-million-row table with five columns of random integers from 0 to 100:

CREATE TABLE foo(a int, b int, c int, d int, e int);
INSERT INTO foo SELECT random()*100, random()*100, random()*100, random()*100, random()*100 FROM generate_series(1, 1000000);

Create a simple function to mimic the min() or least() function available in many languages:

CREATE OR REPLACE FUNCTION foo_min(integer[]) RETURNS integer AS $$
DECLARE result int; i int;
BEGIN
  FOR i IN 1 .. array_upper($1, 1) LOOP
    IF result IS NULL THEN result := $1[i];
    ELSEIF $1[i] < result THEN result := $1[i];
    END IF;
  END LOOP;
  RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
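
A quick sanity check of the new function (the sample values here are arbitrary, chosen only for illustration):

SELECT foo_min(ARRAY[42, 7, 13, 99, 21]);
foo_min: 7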

Unless specifically noted, new code snippets for this function will only replace the code between the BEGIN and END statements. Similarly, unless noted, the SELECT statement for additional benchmarks is the same as the initial benchmark below. The test queries return identical counts (unless noted), so the count will be generally omitted after the initial benchmark. Average times in benchmarks are the truncated mean of several groups of elapsed times in order to ensure reasonable consistency; fractional milliseconds are omitted from the average. All benchmarks are performed on the same dedicated hardware and software, and with no significant changes to the underlying database. All benchmarks were performed manually using the psql shell.
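
The Time: values shown with each benchmark are in the format of psql’s elapsed-time readout, which is toggled within a session by the \timing meta-command:

\timing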

Get a starting benchmark.

SELECT COUNT(*) FROM foo WHERE a = foo_min(ARRAY[a,b,c,d,e]) AND b = foo_min(ARRAY[b,c,d,e]) AND c = foo_min(ARRAY[e,d,c]);
count = 17994
Time: 6958.124 ms (Average: 6959 ms)

The third call to the foo_min function in the above benchmark statement specifies the array elements in order of (e), (d), and (c). While I would have preferred presenting it in alphabetical order (like the first two calls), a bug within the third-party plugin that displays formatted code required rearranging the fields. The re-ordering did not impact metrics.

Wow. Seven seconds. Pretty bad — as intended. Let’s try making some changes.

TEST: Reduce IF Expressions

Consolidate the two IF and ELSEIF expressions into a single IF. Average results were unexpectedly 13.8% faster. That’s quite an improvement for such a simple change.

FOR i IN 1 .. array_upper($1, 1) LOOP
  IF result IS NULL OR $1[i] < result THEN result := $1[i]; END IF;
END LOOP;
RETURN result;
Time: 5984.493 ms (Average: 5996 ms)

TEST: Replace IF with a Conditional Expression

Eliminate the IF/THEN/END IF expression and replace it with CASE WHEN/THEN/ELSE/END. Average results were 22% faster than the original benchmark. This is somewhat expected as SQL statements are generally faster than their PL/pgSQL equivalents.

FOR i IN 1 .. array_upper($1, 1) LOOP
  result := CASE WHEN $1[i] < result THEN $1[i] ELSE COALESCE(result, $1[i]) END;
END LOOP;
RETURN result;
Time: 5436.668 ms (Average: 5418 ms)

TEST: Use a Temporary Variable for Element Value

Since there were several references to $1[i], I thought I’d check the impact of assigning the value to a new local variable (e) and referencing that instead. The full function is included below because of the additional variable declaration above the BEGIN statement. Results were rather poor: 21% slower than the previous benchmark, and only 1.4% faster than the initial benchmark, which is too close to count as a real difference.

CREATE OR REPLACE FUNCTION foo_min(integer[]) RETURNS integer AS $$
DECLARE result int; i int; e int;
BEGIN
  FOR i IN 1 .. array_upper($1, 1) LOOP
    e := $1[i];
    result := CASE WHEN e < result THEN e ELSE COALESCE(result, e) END;
  END LOOP;
  RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
Time: 6858.071 ms (Average: 6859 ms)

TEST: Use Aliases for Function Parameters

Since the temporary variable was a bust but I’d still like to avoid the $n parameters, let’s try the CASE WHEN version with a parameter alias for increased readability. The full function is included below because of the altered function signature. Results were virtually identical to the CASE WHEN example; the difference (roughly 0.5%) is too negligible to consider an improvement.

CREATE OR REPLACE FUNCTION foo_min(a integer[]) RETURNS integer AS $$
DECLARE result int; i int;
BEGIN
  FOR i IN 1 .. array_upper(a, 1) LOOP
    result := CASE WHEN a[i] < result THEN a[i] ELSE COALESCE(result, a[i]) END;
  END LOOP;
  RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
Time: 5436.733 ms (Average 5390 ms)

TEST: Designate the Function as VOLATILE

Reuse the initial benchmark function and omit the last word, IMMUTABLE, so that the last line matches the following:

$$ LANGUAGE plpgsql;

An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever. If the CREATE FUNCTION command does not specify a volatility category, a default of VOLATILE is used. When IMMUTABLE is omitted from this function that would otherwise benefit from pre-evaluation by the query optimizer, performance drops dramatically, over 45% slower than the benchmark.

Time: 10141.483 ms (Average 10136 ms)
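
As an aside, the volatility category actually assigned to a function can be verified by querying the pg_proc system catalog; provolatile reports 'i' for immutable, 's' for stable, and 'v' for volatile:

SELECT proname, provolatile FROM pg_proc WHERE proname = 'foo_min';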

TEST: Use SQL instead of PL/pgSQL

For this test, a different paradigm, a completely different function, and a new initial benchmark are needed. The function in the initial language benchmark assumes the array will always have five non-NULL integer values, and there is no validation or error handling. The same database is used in these tests, but since this is not an “apples to apples” comparison, do not compare results for these two language-specific benchmarks against any of the benchmarks above.

The new, rather ugly and non-readable function for initial benchmarking with PL/pgSQL:

CREATE OR REPLACE FUNCTION foo_min(integer[]) RETURNS integer AS $$
BEGIN
  RETURN (SELECT CASE WHEN $1[1]<$1[2] AND $1[1]<$1[3] AND $1[1]<$1[4] AND $1[1]<$1[5] THEN $1[1] WHEN $1[2]<$1[3] AND $1[2]<$1[4] AND $1[2]<$1[5] THEN $1[2] WHEN $1[3]<$1[4] AND $1[3]<$1[5] THEN $1[3] WHEN $1[4]<$1[5] THEN $1[4] ELSE $1[5] END);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

The new SQL query that assumes exactly five input values, and the results for the PL/pgSQL function:

SELECT COUNT(*) FROM foo WHERE a = foo_min(ARRAY[a,b,c,d,e]);
count = 204543
Time: 23380.717 ms (Average 23505 ms)

Yikes. For the heck of it, I modified the PL/pgSQL version of the function above to use a parameter alias in the hope that it might somehow improve performance. Nope. The result was even worse — over 16% slower, averaging 28 seconds in duration.
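
For reference, that aliased variant looked something like the following (reconstructed here from the description above rather than copied from the original test):

CREATE OR REPLACE FUNCTION foo_min(a integer[]) RETURNS integer AS $$
BEGIN
  RETURN (SELECT CASE WHEN a[1]<a[2] AND a[1]<a[3] AND a[1]<a[4] AND a[1]<a[5] THEN a[1] WHEN a[2]<a[3] AND a[2]<a[4] AND a[2]<a[5] THEN a[2] WHEN a[3]<a[4] AND a[3]<a[5] THEN a[3] WHEN a[4]<a[5] THEN a[4] ELSE a[5] END);
END;
$$ LANGUAGE plpgsql IMMUTABLE;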

The pure SQL equivalent (simply created by removing the BEGIN, END and RETURN statements, and changing plpgsql to sql) was almost six times (568.9%) faster:

CREATE OR REPLACE FUNCTION foo_min(integer[]) RETURNS integer AS $$
SELECT CASE WHEN $1[1]<$1[2] AND $1[1]<$1[3] AND $1[1]<$1[4] AND $1[1]<$1[5] THEN $1[1] WHEN $1[2]<$1[3] AND $1[2]<$1[4] AND $1[2]<$1[5] THEN $1[2] WHEN $1[3]<$1[4] AND $1[3]<$1[5] THEN $1[3] WHEN $1[4]<$1[5] THEN $1[4] ELSE $1[5] END;
$$ LANGUAGE sql IMMUTABLE;
Time: 4171.685 ms (Average 4132 ms)


Significant performance improvements can sometimes be made with small, subtle changes. Use the IMMUTABLE volatility category where appropriate (and ONLY where appropriate), use the closest-to-native query language that can obtain desired results, and rethink the use of control structures where practical. Use function parameter aliases when benchmarking tests prove they have no impact, but avoid the unnecessary declaration and assignment of ephemeral variables whenever possible.

Dealing with EC2 Internal Compiler Errors

As my hosting needs grow, I am continually surprised at how well Amazon EC2 t1.micro instances meet those needs as long as I’m serving up rather static or barely dynamic content. However, they sometimes fall short when bursts of intense computation or network activity surface.

The Problem

While compiling a daemon from C++ source on my t1.micro, the build stopped with an unusual error:

g++: internal compiler error: Killed (program cc1plus)

The last few lines of output from dmesg showed the cause:

[26601705] Out of memory: Kill process 30909 (cc1plus)
[26601705] Killed process 30909 (cc1plus) total-vm:592952kB, anon-rss:548596kB, file-rss:0kB

The t1.micro instance simply ran out of memory needed for the compile.

The Solution

Create a 1GB swap file that enables some of your disk space to be used for memory:

sudo dd if=/dev/zero of=/swapfile bs=64M count=16
sudo mkswap /swapfile
sudo swapon /swapfile
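
If the swap file should also survive a reboot (not strictly required for a one-off compile), the usual follow-up is to restrict its permissions and register it in /etc/fstab; the entry below uses the conventional defaults, so adjust it to your own setup:

sudo chmod 600 /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab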

The process on the tiny instance was as achingly slow as expected, but after an hour of maxing out the CPU at 100%, my daemon finished compiling without any additional errors. Any glitches found thereafter were the fault of my own programming, and not the compiler.

Curbing the Grammar Nazi

I admit it: I am a sociodescriptive prescriptivist.

For the better part of several decades, I have been a self-described Grammar Nazi, a fanatical grammar snob. While I never reached the heights (or depths) of those similarly obsessed who wield black Sharpie pens and bottles of White-Out to vandalize bus-stop advertisements and wall-mounted eatery menus with their copy edits, I have long half-admired their antics and the ballsiness required to perform them, a ballsiness I lack.

It’s not entirely my fault; I am a product of my upbringing. As an infant, I attended school with my mother at university as she studied to be an English school teacher; it is certainly possible that it was there that I absorbed the proper use of the oft-abused apostrophe, the serial comma, the lonely semicolon. In my early teens, while perusing a menu at whatever neighborhood Asian restaurant or pizza parlor we happened to be dining at, my Dad and I would race to locate such classic dishes as “rorsted chicken” or “side of rice and scorn”, a practice I continue today with my own son. But before even that began in earnest, I was forever ruined by my first-grade teacher.

I am a product of two continents — while I was born in the United States, I spent my formative years in England. As bad luck would have it, I was teased in England for my youthful American accent, and mercilessly teased upon my return thanks to my pronounced British accent. I couldn’t win. Accent aside, I learnt to write in British public schools, not American ones. My first-grade teacher in the United States consistently gave me poor marks for spelling. One rueful day, fed up, I confronted her in the middle of class. Angrily, I pointed at each of the red-checked words, “What’s wrong with them?” Each, she huffily explained, was misspelled. No, they weren’t, I insisted. I grabbed the dictionary from her shelf and proceeded to show her every “misspelled” word. “If they’re in the dictionary, they aren’t spelt wrong, are they?” I asked with as much haughtiness as I could muster. The word amongst is the one that I still vividly remember to this day. While archaic even back in the ’70s, at least I had used the American spelling. Unfortunately, even though she was clearly wrong, my teacher refused to back down. A grammarian was born.

A few years ago, I hit rock bottom. Omitted (or extraneous) apostrophes or incorrect capitalization distracted me so much that I was frequently compelled to physically shield my eyes from the wall-projected offenses during business meetings in order to rely on aural input and deflect my focus back to the import of what was being meant, not what was incorrectly written. The errors became truly offensive, beacons of putrescence, literally screaming at me in rabid, frothing glorification of their unjust but undeniable existence. Just as an alcoholic must first admit they have an addiction before they can truly begin to loosen the shackles, the admission that I, too, was an addict — a grammar addict — was the first step. I am still in recovery.

I used to highlight so many errors on resumes that crossed my desk that they had more yellow pockmarks than a fair-skinned, green-eyed redhead has freckles on a sunny Irish afternoon. With the admission of the addiction slowly came a slight softening of strict standards, beginning with the realization that it is less important which standard is used than that some standard is consistently and appropriately used. Today, unless I’m hiring a professional writer, my standards are much more relaxed. I am much more likely to forgive an engineer for incorrectly listing "JAVA" in his list of programming language skills, as long as he is consistent in the mis-capitalization. A development manager who refers back to his work experience "in the 90’s" might still proceed to the interview stage as long as he is consistent in his misuse of the apostrophe. That said, a marketeer who spits out 140-character tweets with "u" or "2" in place of "you" or "to/too/two" should not carry that grammatical disregard over to a status post on Facebook, whether corporate or personal.

My cousin Meg recently reposted on Facebook a link to an article that catapulted me significantly across the chasm of recovery. The author, Mary Rolf, also a self-proclaimed grammar snob, recounted her own moments of epiphany that “changed [her] whole outlook, not just on grammar but also on the social impact of language,” and solidified my own loosening standards in concrete terms.

There are two schools of thought when it comes to how we should use language. One is “prescriptive” and it’s backed by grammar snobs and the kind of people who froth at the mouth over the decline of “the King’s English”. The other is “descriptive” and it’s more about accepting that how people use language is how language works. A prescriptivist believes in the idea of standard English and sees mistakes everywhere. A descriptivist sees many englishes [sic], and none of them are standard.

Descriptive linguistics is the objective analysis and description of how language is (or was) spoken by a group of people in a community, without the bias of preconceived ideas about how it ought to be. On the other hand, linguistic prescription, predominant in education and publishing, seeks to define standard language forms and provide subjective guidelines on the appropriate use of language. While prescription and description are complementary, descriptivism is the belief that description is more significant or important to teach, study, and practice than prescription. A grammarian must understand and conceptually embrace the two linguistic schools before attaining the optimum compromise of “sociolinguistic prescriptivism” — prescriptivism based on the application of socially relevant descriptive linguistics.

What that mouthful means in plain English is to apply the rules of grammar and vocabulary using standards dictated by the social setting. How people use language is how language works, especially socially. Twitter imposes an artificial constraint that impacts adherence to formal language standards, establishing its own unique norm. The Pavlovian knee-jerk responses generated by mobile messaging typically belie formality. Socioeconomic, educational, anthropological, and cultural factors also impact the use of language. Teens recognize this intuitively; the emails or messages they send to their parents or teachers are generally more formal than those sent to their BFFs.

Description is sinking in, although it has about as much chance of entirely replacing prescriptivism as the Titanic had of being unsinkable. It’s now OK that my boss makes spelling mistakes while we’re informally collaborating on the overhead. It’s OK that the five-year mission was to boldly split infinitives in the name of entertainment. It’s OK that my Ukrainian-born daughter with less than four years’ exposure to English informally texted me that she was waiting for me and “siting on the bench outside”. It’s OK that I used “less than” in the previous sentence instead of the more traditionally correct “fewer than” because it is contextually and socially appropriate. It’s OK that in the name of artistic licence, I sometimes spelled spelled as spelt and other times spelt spelt as spelled since spelled may be spelled spelt or sometimes spelt spelled, depending on which English is being writ at the tyme.

Yet the average grammar snob is wholly prescriptive, afflicted with the inability to make or accept these social distinctions — until recently, myself included. Lurking behind that mask of affliction is the knowledge that the English language is being socially changed by the ever-increasing population of people who don’t know any better, don’t give a dåmn, or don’t consider it their job to care. Grammar Nazis see themselves as the last bastion of proper English, crusaders in a losing battle to educate the uneducated, to carve out some perfection in an imperfect, chaotic, changing world in which the current form of English is being flavoured by the homogenous melting pot of an increasingly universal language. The author of a formal publication should be literally boiled in oil for the slightest of grammatical errors, any comments appended with an unsolicited “You’re welcome!” as if the literary offenders must also be educated on the necessity of gratitude, like it or not.

However, grammarians must avoid the parallel stigma of the KKK or the Nazis, and accept sociolinguistic prescriptivism such that the usage of formal English is much like the usage of formal clothing. Just as one should not expect their cardiologist to show up in surgery unshaven and wearing a dirty, holey T-shirt, one also should not expect their gardener to mow the lawn wearing a Caraceni suit. Conversely, on his days off, the surgeon can don his bathrobe and slippers on his front lawn for all I care. In the words of Robert Randolph and the Family Band, “Ain’t nothin’ wrong with that.”

I will leave you with this final philosophical thought experiment: If a grammar snob comments on an article, have they contributed anything meaningful to the conversation?