Multiple Macs with One Time Capsule

Time Machine is the built-in backup system in OS X that backs up all of your files so that you can restore them later or see how they looked in the past. It requires external storage of some kind, whether an attached USB drive, a network storage drive, or an Apple Time Capsule.

With many households today having several Apple computers, it is likely that there will be a few Macs that need to be backed up to the same storage device, often for multiple users. Personally, I use a separate USB drive attached to my iMac for the most efficient, easiest backup solution, but our three MacBooks are backed up to a single Time Capsule.

The easiest method of setting up access to the shared disk on a Time Capsule is to use the base station password, but that password is also used to reconfigure your Time Capsule, so that's a really bad idea. The most commonly recommended option is instead to specify a disk password. This assigns a separate password for use with Time Machine that cannot be used to reconfigure the Time Capsule and therefore does not compromise its security.

The final (and, in my opinion, best) option available is to set up different user accounts, each with its own name and password. This is the most secure but least used option, mainly because of a widely held misconception that maintenance on the Time Capsule becomes extremely difficult when each account can only see its own space on the disk. Time Capsules frequently run out of disk space, and Time Machine will delete the oldest backups to make room, but only for the Mac it is running on, and no others!

The Secret Solution

Apple has a secret, undocumented solution! There is one built-in user that can access all of the user folders. Browse to your Time Capsule using Finder. Click the “Connect As…” button. Enter “admin” as the Name, and the Time Capsule’s Base Station password in the Password field. This is a pseudo-root user that has access to all users’ folders and data.

Exactly how to use that user to perform maintenance and analyze usage is a topic for another time.



PostgreSQL 8.3 Function Performance

While overhauling many of the queries and functions in a PostgreSQL 8.3 database, I noticed different elapsed times with minor changes to code structure and decided to benchmark my observations. Simple modifications, such as combining separate IF and ELSEIF expressions into one or replacing them entirely with conditional expressions, resulted in surprising performance boosts.

This very primitive benchmark counts the number of rows in which the first column (a) holds the lowest value, the second column (b) holds the second-lowest, and the middle column (c) is no higher than columns (d) and (e). It is not efficient and is not supposed to be, but it should be enough to provide a basis for more extensive formal testing, if needed.

There are much faster ways to get this count; for instance, a plain SELECT statement:

SELECT COUNT(*) FROM foo WHERE a <= b AND b <= c AND c <= d AND c <= e;
count = 17994
Time: 209.326 ms

Or even slightly faster:

SELECT COUNT(*) FROM foo WHERE b BETWEEN a AND c AND c <= d AND c <= e;
Time: 196.504 ms

Again, the purpose of this exercise is to test the impact of code decisions over millions of operations inside Postgres 8.3 functions, not to optimize a query for the fastest result, nor to compare features, functions, or performance against Postgres 9.x or higher.

Database Setup

Start by generating a simple one-million-row table with five columns of random integers from 0 to 100:

CREATE TABLE foo(a int, b int, c int, d int, e int);
INSERT INTO foo SELECT random()*100, random()*100, random()*100, random()*100, random()*100 FROM generate_series(1, 1000000);
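
If you want to sanity-check the setup, a simple count should come back with exactly one million rows:

SELECT COUNT(*) FROM foo;
count = 1000000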

Create a simple function to mimic the min() or least() function available in many languages:

CREATE OR REPLACE FUNCTION foo_min(integer[]) RETURNS integer AS $$
DECLARE result int; i int;
BEGIN
  FOR i IN 1 .. array_upper($1, 1) LOOP
    IF result IS NULL THEN result := $1[i]; 
    ELSEIF $1[i] < result THEN result := $1[i];
    END IF;
  END LOOP;
  RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
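
A quick spot check with a made-up array confirms the function behaves like a minimum:

SELECT foo_min(ARRAY[42, 7, 19, 7, 100]);
foo_min = 7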

Unless specifically noted, new code snippets for this function will only replace the code between the BEGIN and END statements. Similarly, unless noted, the SELECT statement for additional benchmarks is the same as the initial benchmark below. The test queries return identical counts (unless noted), so the count will generally be omitted after the initial benchmark. Average times are the truncated mean of several groups of elapsed times to ensure reasonable consistency; fractional milliseconds are omitted from the average. All benchmarks were performed manually in the psql shell, on the same dedicated hardware and software, with no significant changes to the underlying database.

Get a starting benchmark:

SELECT COUNT(*) FROM foo WHERE a = foo_min(ARRAY[a,b,c,d,e]) AND b = foo_min(ARRAY[b,c,d,e]) AND c = foo_min(ARRAY[e,d,c]);
count = 17994
Time: 6958.124 ms (Average: 6959 ms)

The third call to the foo_min function in the above benchmark statement specifies the array elements in order of (e), (d), and (c). While I would have preferred presenting it in alphabetical order (like the first two calls), a bug within the third-party plugin that displays formatted code required rearranging the fields. The re-ordering did not impact metrics.

Wow. Seven seconds. Pretty bad — as intended. Let’s try making some changes.

TEST: Reduce IF Expressions

Consolidate the two IF and ELSEIF expressions into one. Average results were unexpectedly 13.8% faster. That's quite an improvement for such a simple change.

FOR i IN 1 .. array_upper($1, 1) LOOP
  IF result IS NULL OR $1[i] < result THEN result := $1[i];
  END IF;
END LOOP;
RETURN result;
Time: 5984.493 ms (Average: 5996 ms)

TEST: Replace IF with a Conditional Expression

Eliminate the IF/THEN/END IF expression and replace it with CASE WHEN/THEN/ELSE/END. Average results were 22% faster than the original benchmark. This is somewhat expected as SQL statements are generally faster than their PL/pgSQL equivalents.

FOR i IN 1 .. array_upper($1, 1) LOOP
  result := CASE WHEN $1[i] < result THEN $1[i] ELSE COALESCE(result, $1[i]) END;
END LOOP;
RETURN result;
Time: 5436.668 ms (Average: 5418 ms)

TEST: Use a Temporary Variable for Element Value

Since there were several references to $1[i], I thought I'd check the impact of assigning the value to a new local variable (e) and referencing that instead. The full function code snippet is included below due to the additional variable declaration above the BEGIN statement. Results were rather poor: roughly 27% slower than the previous benchmark, and only 1.4% faster than the initial benchmark, which is too close to call a real difference.

CREATE OR REPLACE FUNCTION foo_min(integer[]) RETURNS integer AS $$
DECLARE result int; i int; e int;
BEGIN
FOR i IN 1 .. array_upper($1, 1) LOOP
  e := $1[i];
  result := CASE WHEN e < result THEN e ELSE COALESCE(result, e) END;
END LOOP;
RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
Time: 6858.071 ms (Average: 6859 ms)

TEST: Use Aliases for Function Parameters

Since the temporary variable was a bust but I’d still like to avoid the $n parameters, let’s try the CASE WHEN with a parameter alias for increased readability. The full function code snippet is included below due to the altered method signature. Results were virtually identical to the CASE WHEN example, the difference too negligible (roughly 0.5%) to consider an improvement.

CREATE OR REPLACE FUNCTION foo_min(a integer[]) RETURNS integer AS $$
DECLARE result int; i int;
BEGIN
FOR i IN 1 .. array_upper(a, 1) LOOP
  result := CASE WHEN a[i] < result THEN a[i] ELSE COALESCE(result, a[i]) END;
END LOOP;
RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
Time: 5436.733 ms (Average: 5390 ms)

TEST: Designate the Function as VOLATILE

Reuse the initial benchmark function and omit the last word, IMMUTABLE, so that the last line matches the following:

$$ LANGUAGE plpgsql;

An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever. If the CREATE FUNCTION command does not specify a volatility category, a default of VOLATILE is used. When IMMUTABLE is omitted from this function, which would otherwise benefit from pre-evaluation by the query optimizer, performance drops dramatically: over 45% slower than the initial benchmark.

Time: 10141.483 ms (Average: 10136 ms)
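
If you want to verify which volatility category a given version of the function ended up with, the pg_proc catalog records it ('i' for immutable, 's' for stable, 'v' for volatile):

SELECT proname, provolatile FROM pg_proc WHERE proname = 'foo_min';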

TEST: Use SQL instead of PL/pgSQL

For this test, a different paradigm, a completely different function, and a new initial benchmark are needed. The function in the initial language benchmark assumes the array will always have five non-NULL integer values, and there is no validation or error handling. The same database is used in these tests, but since this is not an “apples to apples” comparison, do not compare results for these two language-specific benchmarks against any of the benchmarks above.

The new, rather ugly and unreadable function for initial benchmarking with PL/pgSQL:

CREATE OR REPLACE FUNCTION foo_min(integer[]) RETURNS integer AS $$
BEGIN
  RETURN (SELECT CASE WHEN $1[1]<$1[2] AND $1[1]<$1[3] AND $1[1]<$1[4] AND $1[1]<$1[5] THEN $1[1] WHEN $1[2]<$1[3] AND $1[2]<$1[4] AND $1[2]<$1[5] THEN $1[2] WHEN $1[3]<$1[4] AND $1[3]<$1[5] THEN $1[3] WHEN $1[4]<$1[5] THEN $1[4] ELSE $1[5] END);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

The new benchmark query, which passes exactly five values, and the results for the PL/pgSQL function:

SELECT COUNT(*) FROM foo WHERE a = foo_min(ARRAY[a,b,c,d,e]);
count = 204543
Time: 23380.717 ms (Average: 23505 ms)

Yikes. For the heck of it, I modified the PL/pgSQL version of the function above to use a parameter alias in the hope that it might somehow improve performance. Nope. The result was even worse — over 16% slower, averaging 28 seconds in duration.
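
The aliased variant was nothing fancier than the same one-liner with the parameter given a name and every $1 reference swapped out; roughly this (the alias name v is arbitrary):

CREATE OR REPLACE FUNCTION foo_min(v integer[]) RETURNS integer AS $$
BEGIN
  -- Same hard-coded five-element comparison as above, referenced via the alias v instead of $1.
  RETURN (SELECT CASE WHEN v[1]<v[2] AND v[1]<v[3] AND v[1]<v[4] AND v[1]<v[5] THEN v[1] WHEN v[2]<v[3] AND v[2]<v[4] AND v[2]<v[5] THEN v[2] WHEN v[3]<v[4] AND v[3]<v[5] THEN v[3] WHEN v[4]<v[5] THEN v[4] ELSE v[5] END);
END;
$$ LANGUAGE plpgsql IMMUTABLE;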

The pure SQL equivalent (created simply by removing the BEGIN, END, and RETURN statements, and changing plpgsql to sql) was almost six times as fast, with the PL/pgSQL average clocking in at 568.9% of the SQL average:

CREATE OR REPLACE FUNCTION foo_min(integer[]) RETURNS integer AS $$
SELECT CASE WHEN $1[1]<$1[2] AND $1[1]<$1[3] AND $1[1]<$1[4] AND $1[1]<$1[5] THEN $1[1] WHEN $1[2]<$1[3] AND $1[2]<$1[4] AND $1[2]<$1[5] THEN $1[2] WHEN $1[3]<$1[4] AND $1[3]<$1[5] THEN $1[3] WHEN $1[4]<$1[5] THEN $1[4] ELSE $1[5] END;
$$ LANGUAGE sql IMMUTABLE;
Time: 4171.685 ms (Average: 4132 ms)

Conclusions

Significant performance improvements can sometimes be made with small, subtle changes. Use the IMMUTABLE volatility category where appropriate (and ONLY where appropriate), use the closest-to-native query language that can obtain desired results, and rethink the use of control structures where practical. Use function parameter aliases when benchmarking tests prove they have no impact, but avoid the unnecessary declaration and assignment of ephemeral variables whenever possible.


Dealing with EC2 Internal Compiler Errors

As my hosting needs grow, I am continually surprised at how well Amazon EC2 t1.micro instances meet those needs as long as I'm serving up mostly static or barely dynamic content. However, they sometimes fall short when intense CPU or network bursts surface.

The Problem

While I was compiling a daemon from C++ source on my t1.micro instance, the build stopped with an unusual error:

g++: internal compiler error: Killed (program cc1plus)

The last few lines of output from dmesg showed the cause:

[26601705] Out of memory: Kill process 30909 (cc1plus)
[26601705] Killed process 30909 (cc1plus) total-vm:592952kB, anon-rss:548596kB, file-rss:0kB

The t1.micro instance simply ran out of memory needed for the compile.

The Solution

Create a 1GB swap file, which allows some of your disk space to be used as virtual memory:

sudo dd if=/dev/zero of=/swapfile bs=64M count=16
sudo mkswap /swapfile
sudo swapon /swapfile
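
While you're in there, two optional follow-ups are worth considering: tightening the swap file's permissions (swapon will typically warn about a world-readable swap file), and adding an /etc/fstab entry if you want the swap to survive a reboot. Something like:

# restrict the swap file to root
sudo chmod 600 /swapfile
# optional: activate the swap file automatically after a reboot
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab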

The process on the tiny instance was as achingly slow as expected, but after an hour of maxing out the CPU at 100%, my daemon finished compiling without any additional errors. Any glitches found thereafter were the fault of my own programming, and not the compiler.