Regex for names


Question

Just starting to explore the 'wonders' of regex. Being someone who learns from trial and error, I'm really struggling because my trials are throwing up a disproportionate amount of errors... My experiments are in PHP using ereg().

Anyway. I work with first and last names separately but for now using the same regex. So far I have:

^[A-Z][a-zA-Z]+$  

Any length string that starts with a capital and has only letters (capital or not) for the rest. But where I fall apart is dealing with the special situations that can pretty much occur anywhere.

  • Hyphenated Names (Worthington-Smythe)
  • Names with Apostophies (D'Angelo)
  • Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.
  • Joint Names (Ben & Jerry)

Maybe there's some other way a name can be that I'm no thinking of, but I suspect if I can get my head around this, I can add to it. I'm pretty sure there will be instances where more than one of these situations comes up in one name.

So, I think the bottom line is to have my regex also accept a space, hyphens, ampersands and apostrophes - but not at the start or end of the name to be technically correct.

1
23
12/29/2010 4:52:50 AM

Accepted Answer

  • Hyphenated Names (Worthington-Smythe)

Add a - into the second character class. The easiest way to do that is to add it at the start so that it can't possibly be interpreted as a range modifier (as in a-z).

^[A-Z][-a-zA-Z]+$
  • Names with Apostophies (D'Angelo)

A naive way of doing this would be as above, giving:

^[A-Z][-'a-zA-Z]+$

Don't forget you may need to escape it inside the string! A 'better' way, given your example might be:

^[A-Z]'?[-a-zA-Z]+$

Which will allow a possible single apostrophe in the second position.

  • Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.

Here I'd be tempted to just do our naive way again:

^[A-Z]'?[- a-zA-Z]+$

A potentially better way might be:

^[A-Z]'?[- a-zA-Z]( [a-zA-Z])*$

Which looks for extra words at the end. This probably isn't a good idea if you're trying to match names in a body of extra text, but then again, the original wouldn't have done that well either.

  • Joint Names (Ben & Jerry)

At this point you're not looking at single names anymore?

Anyway, as you can see, regexes have a habit of growing very quickly...

40
11/8/2008 8:46:36 PM

This regex is perfect for me.

^([ \u00c0-\u01ffa-zA-Z'\-])+$

It works fine in php environments using preg_match(), but doesn't work everywhere.

It matches Jérémie O'Co-nor so I think it matches all UTF-8 names.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon