The Perfect Credit Card Number RegEx

I’d love to give you a single credit-card-number-matching regular expression and tell you that my über-cool rule is The One Rule to match them all. I’d be lying.

Many developers have difficulty defining regular expressions; even more fail, without realizing it, to understand when and how regexes should be applied. In my experience, regex validation of CCNs is the most commonly misused of all because developers fail to distinguish between two subtly different questions:

  • Does it resemble a credit card number?
  • Is it probably a credit card number?

Despite their apparent similarities, the two questions and their answers are not the same.

Does it Resemble a Credit Card Number?

Resemblance is, by far, the most common type of validation performed on CCNs. Any decent shopping cart application or payment processing system will perform syntactic or basic regex validation at the time a customer enters their payment information; the goal is to assist the customer and help ensure that they have entered the correct number. Some use server-side validation, others use JavaScript client-side validation — the better ones use both.

During normal card number validation for commerce purposes, the widest possible net should be used — allowing anything that basically resembles a credit card number to pass input verification. Once basic resemblance is assured, rely on third-party merchant services to pass judgment on card validity and acceptance.

The most basic generally accepted rule for validating credit card numbers is that Visa cards start with “4”, Mastercard numbers start with “51” through “55”, and Discover numbers usually start with “6011” or “65”; all of these account numbers are generally 16 digits long. American Express card numbers start with “34” or “37” and are 15 digits long. Almost all modern credit card numbers end with a Luhn-10 (or modulus-10) checksum digit.

Tossing usability aside, most single-field form inputs for credit card numbers disallow delimiters and expect the number to be entered as a single grouping (without spaces or dashes). Given those basic assumptions, most use a basic regular expression such as the following to validate the majority of CCNs:

\b(?:3[47]\d|(?:4\d|5[1-5]|65)\d{2}|6011)\d{12}\b
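As a rough illustration, here is how the pattern above might be applied to a single form field in Python. The function name `resembles_ccn` is made up for this sketch, and the sample values are widely published test numbers rather than real accounts.

```python
import re

# The no-delimiter pattern quoted above, compiled once for reuse.
BASIC_CCN = re.compile(r"\b(?:3[47]\d|(?:4\d|5[1-5]|65)\d{2}|6011)\d{12}\b")


def resembles_ccn(value: str) -> bool:
    """True if the whole input is one undelimited 15- or 16-digit card-like number."""
    return BASIC_CCN.fullmatch(value) is not None


print(resembles_ccn("4111111111111111"))     # Visa-style prefix, 16 digits
print(resembles_ccn("378282246310005"))      # Amex-style prefix, 15 digits
print(resembles_ccn("4111-1111-1111-1111"))  # contains dashes, which this pattern disallows
```

Using `fullmatch` rather than `search` keeps the check tied to the entire field, which suits form input; scanning free text is a different problem.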

For non-commerce applications (or for the abysmally few commerce sites that embrace usability), 16-digit card numbers may contain spaces or dashes between blocks of four digits (e.g. NNNN-NNNN-NNNN-NNNN), while American Express card numbers are sometimes separated into blocks of four, six, and five digits (e.g. NNNN-NNNNNN-NNNNN). In order to cast an even wider net that also would match account numbers with dashes and spaces, the following regex could apply:

\b(?:3[47]\d{2}([\s-]?)\d{6}\1\d|(?:(?:4\d|5[1-5]|65)\d{2}|6011)([\s-]?)\d{4}\2\d{4}\2)\d{4}\b
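A short sketch of the wider pattern scanning free text, assuming Python's `re` module. The helper name `find_ccn_candidates` and the sample sentence are illustrative only. Note that the backreferences (`\1`, `\2`) force the same separator throughout one number, so mixed dashes and spaces are not matched.

```python
import re
from typing import List

# The separator-tolerant pattern quoted above, split across two lines for readability.
WIDE_CCN = re.compile(
    r"\b(?:3[47]\d{2}([\s-]?)\d{6}\1\d"
    r"|(?:(?:4\d|5[1-5]|65)\d{2}|6011)([\s-]?)\d{4}\2\d{4}\2)\d{4}\b"
)


def find_ccn_candidates(text: str) -> List[str]:
    """Return every substring that resembles a card number, separators included."""
    # finditer + group(0) is used because findall would return the capture groups.
    return [m.group(0) for m in WIDE_CCN.finditer(text)]


sample = "Visa 4111-1111-1111-1111, Amex 3782 822463 10005."
print(find_ccn_candidates(sample))
```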

Author’s Note: For simple form validation it might be easier to simply strip out dashes and spaces after input (e.g. s/[\s-]//g), then apply the simpler of the two regexes above. There’s no reason verification can’t occur in two or more steps.
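The two-step approach from the note could look something like this in Python; `normalize_ccn` and `validate_input` are hypothetical names chosen for the sketch.

```python
import re

# The simpler, no-delimiter pattern from earlier in the article.
BASIC_CCN = re.compile(r"\b(?:3[47]\d|(?:4\d|5[1-5]|65)\d{2}|6011)\d{12}\b")


def normalize_ccn(raw: str) -> str:
    # Python equivalent of the substitution in the note: remove whitespace and dashes.
    return re.sub(r"[\s-]", "", raw)


def validate_input(raw: str) -> bool:
    """Step 1: strip separators. Step 2: apply the simple pattern to what is left."""
    return BASIC_CCN.fullmatch(normalize_ccn(raw)) is not None


print(normalize_ccn("4111 1111-1111 1111"))
print(validate_input(" 3782-822463-10005 "))
```

One consequence of stripping first is that inconsistent separators are tolerated, which is usually what a customer-facing form wants.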

After simple regex matching, the next most common step is verification of the trailing check digit, which is computed by a mathematical algorithm designed to detect transpositional typing errors (e.g. “4123…” instead of “4213…”). However, for use in e-commerce, blanket reliance on Luhn checks is cautioned against:

“While the great majority of 16-digit Visa cards will pass modulus-10 checking, acquirers and merchants should be aware that some valid 19-digit debit cards may not pass modulus-10 checking. It is recommended that modulus-10 checking not be performed, particularly if both debit and credit are accepted.” — Visa Transaction Acceptance Device Guide 2.0 (March 2011)

Regex engines do not perform arithmetic during inspection; therefore, Luhn checking is not possible with regular expressions alone and has to happen as a separate step.
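Outside the regex engine, the check is only a few lines of code. Below is a minimal sketch of the standard Luhn (modulus-10) computation in Python; the function name `luhn_valid` is an assumption, and the input is expected to be already stripped of separators.

```python
def luhn_valid(number: str) -> bool:
    """Return True if a string of ASCII digits passes the Luhn (mod-10) check."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111111111111111"))  # a widely published test number
print(luhn_valid("4111111111111112"))  # same number with the last digit changed
```

Swapping two adjacent digits almost always changes the sum, which is why the check catches most transposition typos (the 09/90 swap is the known exception).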

Is it Probably a Credit Card Number?

Unlike form input validation, which accepts anything that resembles a credit card number, DLP applications or systems that create alerts or notifications based on the discovery of CCNs must ignore everything that doesn’t strongly resemble a credit card number. There’s a huge difference between asking “Hey, this is a credit card number; do you agree that it looks like a credit card number?” and asking “Hey, I’ve found something; do you think it’s a credit card number?”

While ideal for lightly vetting payment accounts, wide-open net patterns that merely check for basic resemblance to credit card numbers generate too many false positives when used for discovery. Most vendors and developers haven’t realized this, as evidenced by regexes found in their documentation or published by employee moderators in their support forums.

  • OpenDLP matches "4563 9601 2200-1999" with its Visa regex, but won’t match "4563960122001999", making their regex somewhat useless — but, hey, it’s open-source software, right? Fix it yourself.
    (\D|^)4[0-9]{3}(\ |\-|)[0-9]{4}(\ |\-|)[0-9]{4}(\ |\-|)[0-9]{4}(\D|$)
  • WebSense similarly matches "4563\9601-2200\1999" with its Visa regex, but misses the legitimate "4563960122001999". To their credit, WebSense indicates that improperly written regexes “can create many false-positive incidents intercepted on the system, can slow down Websense Data Security, and impede analysis.” They fail to mention that “stupidly written regexes will not match anything useful.”
    \b(4\d{3}[\-\\]\d{4}[\-\\]\d{4}[\-\\]\d{4})\b
  • McAfee HDLP catches a lot more than the other two, but overextends matches within random numerical data like "234563 9601-2200 1999-12" with its regex. Does it look like there’s a credit card number in there? McAfee says there is, wasting your time.
    (4\d{3})(-?|\040*)(\d{4}(-?|\040*?)){3}

All three of the above solutions are doing it wrong; they should be using a narrower net, decreasing the volume of matches in favor of focused probability, and letting through everything that doesn’t pass a much stricter credit card validation process. Even regular-expressions.info, a widely respected reference on regular expressions, written by the creator of RegexBuddy (the de facto regex tool for Windows platforms), incorrectly advises that “unless your company uses 16-digit numbers for other purposes, you’ll have few false positives.” The author then suggests, dubiously, that the following regex is “the only way” to find card numbers with spaces or dashes in them:

\b(?:\d[ -]*?){13,16}\b

I’m sure you realize how catastrophic that regex would be to use for discovery purposes across an entire file system, resulting in myriad false positives such as "37 36 35 34 2011-12-31".

Of the remaining major DLP players not mentioned above, only Symantec and Code Green Networks appear to be doing it right, cross-referencing intelligent datastores with Luhn-validated results from sophisticated regexes that match optional separators and eliminate likely false positives by ignoring sequential or repeated digits. Code Green Networks, for example, uses the following regex pattern that matches most major 15- or 16-digit credit card numbers:

\b(3[47]\d{2}([ -]?)(?!(\d)\3{5}|123456|234567|345678)\d{6}\2(?!(\d)\4{4})\d{5}|((4\d|5[1-5]|65)\d{2}|6011)([ -]?)(?!(\d)\8{3}|1234|3456|5678)\d{4}\7(?!(\d)\9{3})\d{4}\7\d{4})\b
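To see what the extra lookaheads buy, here is a small sketch that runs the pattern above through Python's `re` module. The helper name `strict_matches` and the sample strings are assumptions for illustration; only the pattern itself comes from the article.

```python
import re
from typing import List

# The stricter pattern quoted above, split into its Amex and 16-digit branches.
STRICT_CCN = re.compile(
    r"\b(3[47]\d{2}([ -]?)(?!(\d)\3{5}|123456|234567|345678)\d{6}\2(?!(\d)\4{4})\d{5}"
    r"|((4\d|5[1-5]|65)\d{2}|6011)([ -]?)(?!(\d)\8{3}|1234|3456|5678)\d{4}\7(?!(\d)\9{3})\d{4}\7\d{4})\b"
)


def strict_matches(text: str) -> List[str]:
    """Return candidate card numbers that survive the stricter pattern."""
    return [m.group(0) for m in STRICT_CCN.finditer(text)]


print(strict_matches("4563 9601 2200 1999"))  # plausible-looking blocks
print(strict_matches("4111 1111 1111 1111"))  # repeated digits in the second block
print(strict_matches("4563-9601 2200 1999"))  # inconsistent separators
```

The negative lookaheads discard blocks of repeated or sequential digits, and the backreferences to the separator group require the same delimiter throughout, so only the first sample is reported.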

Regular expressions alone cannot establish viable probability, especially wide-net ones.

Author’s Note: My integrity compels the disclosure that I personally created the credit card number regular expression above for Code Green Networks.

Recommendations

Form validation of credit cards requires the following steps:

  1. Check the credit cards against the widest-possible net regex; you don’t want to reject valid payment methods for a customer who is actively in the process of sending you money. Nothing’s stopping Danish electronic payment provider Nets Holdings (IIN: 457123) from issuing debit card number 4571 2345 6789 0111, a seemingly obviously fake account number that passes all regex, mod-10, and IIN-assignment tests.
  2. Perform Luhn-10 validation only when applicable, warning customers appropriately of potential mistypes, applying rejection only with 100% certainty. You must make the business decision yourself as to whether or not the less-typical 13-, 17-, 18-, or 19-digit credit, debit, or prepaid payment cards found mostly outside the United States should be supported.
  3. Validate the credit card number through common merchant services, such as Verified by Visa or MasterCard Developer Zone; or — if not for e-commerce — match the CCN to ones stored in your clean database (or, better yet, match a seeded MD5 of the CCN to a list of seeded hashes in your database to prevent unnecessary local storage of credit card data in the clear).
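The first two steps above might be wired together roughly as follows. This is a sketch under stated assumptions: the function names and the three outcome labels ("reject", "warn", "forward") are invented here, only 15- and 16-digit numbers are covered, and step 3 (the call to a merchant service) is left out entirely.

```python
import re

# Wide-net resemblance check, applied after separators are stripped.
WIDE_NET = re.compile(r"(?:3[47]\d|(?:4\d|5[1-5]|65)\d{2}|6011)\d{12}")


def luhn_ok(digits: str) -> bool:
    """Standard Luhn (mod-10) check on a string of digits."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return total % 10 == 0


def check_card_input(raw: str) -> str:
    """Classify form input as 'reject', 'warn', or 'forward' (hypothetical labels)."""
    digits = re.sub(r"[\s-]", "", raw)
    if WIDE_NET.fullmatch(digits) is None:
        return "reject"   # does not even resemble a card number
    if not luhn_ok(digits):
        return "warn"     # possible typo: ask the customer to double-check, do not refuse
    return "forward"      # hand off to the payment processor for the real decision


print(check_card_input("4111 1111 1111 1111"))
print(check_card_input("4111111111111112"))
print(check_card_input("1234"))
```

A Luhn failure produces a warning rather than a rejection, in keeping with the advice to reject only when certain.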

Applications using credit card filtering or discovery methods require a different approach:

  1. Match potential credit card data against the narrowest-possible net regex, eliminating up front 80-90% of false positives. It is probably better to miss one real credit card number than match and manually review 1,000 invalid account numbers, but that again is a business decision you must make for yourself.
  2. Always perform Luhn-10 validation, reducing false positives by 90%, while failing to match an acceptable estimated 9 ten-thousandths of one percent (0.0009%) due to a small number of cards issued without a Luhn-10 checksum digit.
  3. If possible, verify each potential match against a clean datastore of all credit card numbers in which you are interested. Based on the most common CCN format, up to one billion account numbers per IIN will pass regex and Luhn-10 validation — registering ten million clean credit card numbers to cross-check inspection results will reduce 99% of false positives.

    Author’s Note: I have yet to see a customer have a completely “clean” registration database. Every customer has fake or false data in their datastore that must be filtered out prior to registration and inspection — every field of fake data that you allow through registration will generate false positives.
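A discovery pipeline following the three steps above could be sketched like this. Everything other than the regex is an assumption: the function names, the placeholder key, and the use of HMAC-SHA256 fingerprints for the registered-number lookup (swapped in here for the seeded MD5 mentioned earlier, so that clear-text numbers need not be stored).

```python
import hashlib
import hmac
import re
from typing import List, Set

# Step 1: the narrow-net pattern from earlier in the article (ASCII digits only).
STRICT_CCN = re.compile(
    r"\b(3[47]\d{2}([ -]?)(?!(\d)\3{5}|123456|234567|345678)\d{6}\2(?!(\d)\4{4})\d{5}"
    r"|((4\d|5[1-5]|65)\d{2}|6011)([ -]?)(?!(\d)\8{3}|1234|3456|5678)\d{4}\7(?!(\d)\9{3})\d{4}\7\d{4})\b",
    re.ASCII,
)

HASH_KEY = b"replace-with-a-secret-key"  # hypothetical placeholder, not a real secret


def luhn_ok(digits: str) -> bool:
    """Step 2: standard Luhn (mod-10) check."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return total % 10 == 0


def fingerprint(digits: str) -> str:
    """Keyed hash of a card number, so the datastore never holds it in the clear."""
    return hmac.new(HASH_KEY, digits.encode("ascii"), hashlib.sha256).hexdigest()


def discover(text: str, registered: Set[str]) -> List[str]:
    """Return numbers that pass the regex, the Luhn check, and the datastore lookup."""
    hits = []
    for match in STRICT_CCN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group(0))
        if luhn_ok(digits) and fingerprint(digits) in registered:  # Step 3
            hits.append(digits)
    return hits


registered = {fingerprint("4563960122001999")}
print(discover("order 4563 9601 2200 1999 shipped", registered))
```

Each stage only sees what the previous stage let through, so the comparatively expensive datastore lookup runs on a small fraction of the scanned text.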

If you’ve gotten this far, be sure to read my “sticky” page on credit card number regexes.



LEGO TARDIS

Over a ten-day period last June, a LEGO® TARDIS slowly materialized in my living room. For those not in the know, a TARDIS is a space- and time-traveling vehicle from the British science-fiction show, Doctor Who.

Consisting of over five pounds of Bright Blue #23 LEGO blocks and plates plus a few black, white, and clear pieces, the mini TARDIS is just over 15 inches from base to beacon.

A few special features include a cookie-jar-like design — the one-brick-wide walls surround an empty core and the roof of the TARDIS lifts off, the light beacon making a convenient pull knob. Within the base and the left-rear column are two one-by wiring conduits, making future lighting or sound customizations a much easier proposition. The Stage #3 photo above shows the rear of the TARDIS; you should be able to make out the entrance hole of the conduit at the base of the corner column. The front door is removable, although not easily; minor deconstruction is required.

Photo © Richard D. LeCour

More than 1,500 individual pieces were used in the construction, sourced from my son’s LEGO collection, six different LEGO stores, special orders from stores in several states, and even a delivery from Poland.

Initially, my design was more modular: I pre-built the door and window panels separately, fitting them onto the base between columns and using the roof to lock in the doors. However, the door panels were structurally unstable; every time I picked up the TARDIS to view it from a different angle, it imploded. A new computer-aided design allowed for near uni-body construction of everything except the front door panel, which remains separate so that it can be removed.

Want Your Own?

Commission rates for this TARDIS model start at $750, and do not include shipping and handling charges or signage. Please contact me for additional details.

Desktop and Tablet Wallpapers

Several sizes of wallpapers of the artist’s proof model are available below.

768×680, 1080×960, 1366×768, 1440×900, 1920×1080, 2560×1440, 2560×1600, and 2880×1800.

Click the link of the desired size to open the wallpaper in a new window. Right-click and choose “Save Image As…” to save the image onto your computer.

If you post these on your own site, I ask that you provide proper credit to me as the artist, and also please don’t hot-link the images directly from my site. Unauthorized commercial use of the photographs is prohibited.

Update

Almost two years after building was complete, in April 2013, I transported the TARDIS to Davenport Beach near Aptos, California; it’s the closest I could get to somewhere that approximated Bad Wolf Bay. The pictures of the completed TARDIS and wallpapers were added to the site in June 2013, including computer generation of the signage. Due to transport difficulties for the art piece, portions of the bottom and top fourths were glued to ensure its permanency.


Which Airfare Website is the Cheapest?

About a decade ago, I spent a few years as a freelance travel agent as a way to gain a little extra income helping others get good travel deals, intending also to take advantage of the perks and low rates often offered to travel agents.

As soon as I began my exciting new career (my timing as perfect as always), the airlines stopped paying straight percentage commissions (usually at least 10%), deciding instead to cap commissions at just $25 per ticket. Even those tiny payouts were significantly reduced a short time later, leaving cruises and resort vacations as the only big money makers for travel agencies, areas in which I had very little interest. By 2002, led by Delta Air Lines, all US carriers had eliminated agency commissions altogether. I ducked out of travel soon thereafter because the business model just didn’t work; however, the experience taught me a lot about the industry.

While 80% of all airline ticket purchases were made through brick-and-mortar travel agencies ten years ago, only 51% were booked through such agencies in 2004, and that number will continue to fall (or at least shift) as Internet-only “agencies” replace the hands-on, personal service that travel agents have traditionally provided. Thanks to the elimination of all airline ticket commissions, agencies (including the Internet-based ones) are forced to rely on their own internal, often hidden fee structures to add black to the bottom line, leaving the consumer a bit befuddled as to where the best prices can be found.

Each of the major travel websites and aggregators was identically put to the test with four itineraries, involving many hours of agonizing tedium and several hundred searches. I also contacted four independent non-Internet-based travel agencies. Sadly, only one agency responded to my query and their quoted fares are included on each of the itineraries at the bottom of the table. Because I had to wait 36 hours before the first emailed response on Monday morning, I did not include the travel agencies in the final analysis because they would not have had access to the same Saturday-night rates as the travel websites, skewing the results.

Travel Website Pricing Comparison

Travel Website        LAX to SFO   JFK to MIA   BOS to LON   SJC to SJD   Total Cost
AirFare               $274         $125         $798         $434         $1,631
AirlineConsolidator   $207         $137         $815         $414         $1,573
CheapoAir             $191         $102         $778         $379         $1,450
CheapTickets          $172         $102         $919         $378         $1,571
Expedia               $177         $107         $781         $381         $1,446
Farecast              $169         $100         $937         $459         $1,665
Hotwire               $169         --           --           $374         N/A
Kayak                 $169         $100         $907         $429         $1,605
Lessno                $179         $109         $922         $389         $1,599
Mobissimo             $169         $100         $778         $379         $1,426
OneTravel             $182         $115         $788         $389         $1,474
Orbitz                $173         $104         $919         $380         $1,576
Priceline             $167         $100         $917         $373         $1,557
Sidestep              $169         $100         $907         $429         $1,605
Travelation           $194         $125         $798         $584         $1,701
Travelocity           $174         $106         $922         $381         $1,583
UltimateFares         $198         $131         $946         $389         $1,664
Vayama                --           $110         $783         $384         N/A
Travel Agency         $259         $100         $983         $459         $1,801

The Test Itineraries

Itinerary #1:

Los Angeles to San Francisco; nonstop; leaving next Wednesday before noon, and returning Thursday between 3:00 p.m. and 6:00 p.m.

Most sites selected an American Airlines flight as the best choice. Priceline managed to beat the American fare by $2 by switching to Virgin America, but CheapoAir, OneTravel, Travelocity, and UltimateFares (all of which also chose Virgin) were not able to meet Priceline’s low price. AirFare was the only website to choose Delta, a costly mistake. Vayama would not price flights less than four days from departure, so it was excluded from the first scenario.

Itinerary #2:

New York’s JFK to Miami; nonstop, one-way; leaving on May 20.

All of the websites returned the same two flights that matched the criteria best, one flying American Airlines, the other on Delta. Kayak, Mobissimo, and Sidestep had slightly better pricing (by $2) on American, while AirFare and CheapoAir went with Delta. The rest of the websites priced both carriers identically, with Priceline once again in the leader pack. Hotwire does not quote prices on one-way trips, forwarding visitors to its parent company of Expedia instead.

Itinerary #3:

Boston to London (either airport); one stop; leaving anytime on a Thursday or Friday in June, returning 10-12 days later on a Monday or Tuesday, also in June.

The websites were split between two different carriers with this scenario. The best fares were from those eight websites that included Iberia Airlines in their search, AirlineConsolidator being the worst of those, while Mobissimo and CheapoAir battled for first place. The rest had fares greater than $900 due to selecting Air Canada, provider of the second-best-priced flight that matched the criteria, with co-owned Sidestep and Kayak leading that pack. Hotwire prevented me from viewing complex search results for about an hour for this one itinerary; my best guess is that Hotwire might have invoked some form of IP address throttling because my activity during the study may have resembled that of a web spider. Since data for this pricing study was gathered within a two-hour period just before midnight EST in order to compare site results fairly, the delay caused Hotwire’s pricing to reflect the next day’s higher prices, so their pricing was removed from the results of the third scenario.

Itinerary #4:

San Jose, CA to San Jose del Cabo, Mexico; any number of stops; leaving on the morning of August 17, and returning the following Sunday evening.

All websites found the same flights (US Airways on the way out, and returning on Mexicana), except Farecast which recommended US Airways for both segments, and Travelation which used Alaska Airlines for the return; both failures to use Mexicana resulted in higher airfares.

Impressions and User Experience

Hotwire is owned by Expedia, so it was surprising to see different pricing from each. I rather expected the two to behave like Kayak and Sidestep, which merged in December 2007 and showed exactly the same results for each respective search.

Several other websites also had technical difficulties. On the very first search, UltimateFares returned a system error without returning results, although it worked just fine after that. Surprisingly, three websites (FarePath, SmartFares, and Travelation) had browser-compatibility issues with Firefox, forcing me to type in dates that would otherwise have been picked from a popup calendar. And CheapTickets frequently timed out requests, requiring me to repeatedly re-enter the flight search criteria; when it did work properly, it was consistently the slowest of the pack.

Orbitz repeatedly gave me pages full of returning flights outside the departure window I had set, making choosing a returning flight a bit more cumbersome. Expedia doesn’t allow searching by flexible dates for international flights, while CheapoAir and OneTravel did not allow flexible dates for any flights, so it was a chore coming up with good results for the third itinerary, requiring ten manual searches on each website. Kayak, Lessno, Priceline, and a few others allowed for some limited but useful flexibility when searching by date. On the other end of the spectrum, CheapTickets and Orbitz made the process extremely easy, providing completely relevant results with only a single search. Granted, the websites that offered flexible results were never the lowest…

Conclusions

Mobissimo won the price-comparison war among these airfare websites for the chosen itineraries. Not only did the site have the lowest price for one of the scenarios, it had the lowest total price and always came within only a few dollars when it was not the low-price leader. Mobissimo also did a great job determining which of the carriers to examine. Priceline led the pack in value for three of the flights, but missed the significantly cheaper flight in the third scenario. Had Priceline picked the correct flight, it clearly would have won.

The worst results were from Travelation: last place (or close to last) when pricing any of the segments, and the highest total price for all four flights. AirlineConsolidator was a major candidate for last place, saved only because its $1,573 total was a lot better than Travelation’s outrageous grand total of $1,701. Coming in only slightly better were AirFare and UltimateFares, both of which failed to produce any decent fares.

I was quite surprised by the results. For the past several years I have relied upon Travelocity, Orbitz, Hotwire, and Sidestep as my four travel search engines of choice, yet none of them had the lowest prices in any of the scenarios. When Travelocity found the same airfare, it consistently beat or tied Expedia, Lessno, and Vayama as I expected — only to be trumped by Orbitz, which itself was consistently beaten by CheapTickets. Hotwire did well (when results were found), as did CheapTickets, although they were both beaten by Priceline every time.

Despite coming in last place for its chosen flight in the first itinerary, CheapoAir should still be considered as a frontrunner for comparison shopping because of its tying win with Mobissimo on the third itinerary and its respectable total price for all flights. Skip Kayak and go to the identical Sidestep instead, as Sidestep produced reasonably good results despite completely missing the boat on the trip to Mexico.

My new fab four: Mobissimo, Priceline (shocker!), CheapoAir, and CheapTickets. I will undoubtedly continue to query Orbitz, Hotwire, and Sidestep, despite their mediocre performance, if only for quick sanity checks.