Validating an email address is a classic requirement of almost any web application. Most modern client-side and server-side web frameworks provide native methods to fulfill this need. However, as many web developers know all too well, these validation methods do not always return the same result, and what they consider "valid" does not always match what we need for our specific case.
To better understand this concept, consider the following email addresses:
- Abc\@[email protected]
- Fred\ [email protected]
- Joe.\\[email protected]
- "Abc@def"@example.com
- "Fred Bloggs"@example.com
- customer/[email protected]
- [email protected]
- !def!xyz%[email protected]
- [email protected]
As strange as it might seem, all the above e-mail addresses are "valid", or at least they were according to RFC 2822 section 3.4.1, until it was obsoleted by RFC 5322. However, even RFC 5322 defines a syntax for "valid" email addresses that is widely considered to be simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings).
Here's another small list of "valid" email addresses as per RFC 5322:
- joe.blow@[IPv6:2001:db8::1]
- joe.blow(comment)@example.com
- joe.blow(comment)@(another comment with spaces)example.com
- "Joe..Blow"@example.com
... and so on. As we can see, we're still far from what we need for practical use.
An "almost perfect" solution came with the HTML Living Standard, which introduced a definition based upon a "willful violation" of RFC 5322 to overcome the above issues. The new definition is even backed by a JavaScript- and Perl-compatible regular expression that can be used to implement it:
    /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
The above regex is good enough to cut out most of the "odd addresses" listed earlier, and that's the reason why I've used it for a lot of my personal and business apps, as well as suggesting it to my fellow developers and colleagues. The only real issue I have with it is that it still allows the following:
- admin@localhost
- abc@cba
As a matter of fact, there is nothing wrong with the above e-mail addresses: this "dot-less" format is definitely valid and makes sense in many scenarios, for example when we need to support "intranet" e-mail addresses. However, when implementing a web-based service for external users, we might want to exclude those "dot-less" e-mail addresses from the valid ones.
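To see this behaviour in practice, here is a minimal C# sketch that runs a handful of the addresses mentioned so far through the HTML Living Standard expression. The class name `HtmlEmailPatternDemo` and its `Matches` method are made up for this illustration; only the pattern itself comes from the standard.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: only the pattern is taken from the HTML Living Standard.
public static class HtmlEmailPatternDemo
{
    // Same expression as above, without the surrounding JavaScript slashes.
    private static readonly Regex HtmlEmailPattern = new Regex(
        @"^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$");

    // Returns true when the candidate matches the pattern; null never matches.
    public static bool Matches(string candidate) =>
        candidate != null && HtmlEmailPattern.IsMatch(candidate);

    public static void Main()
    {
        string[] samples =
        {
            "joe.blow@example.com",           // ordinary address: accepted
            "admin@localhost",                // dot-less domain: accepted
            "abc@cba",                        // dot-less domain: accepted
            "\"Joe..Blow\"@example.com",      // quoted local part: rejected
            "joe.blow(comment)@example.com",  // comment: rejected
            "joe.blow@[IPv6:2001:db8::1]",    // address literal: rejected
        };

        foreach (var sample in samples)
        {
            Console.WriteLine($"{sample} -> {Matches(sample)}");
        }
    }
}
```

The quoted, commented, and bracketed forms are all rejected, while the two dot-less addresses still pass, which is exactly the behaviour a public-facing service may want to tighten.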
For that very reason, I've ended up implementing my own C# helper class that can be used to validate e-mail addresses with or without dots.
Here's the source code:
    using System;
    using System.ComponentModel.DataAnnotations;
    using System.Text.RegularExpressions;

    public static class EmailValidator
    {
        /// <summary>
        /// ref.: https://html.spec.whatwg.org/multipage/forms.html#valid-e-mail-address (HTML5 living standard, willful violation of RFC 5322)
        /// </summary>
        public static readonly string EmailValidation_Regex = @"^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$";

        public static readonly Regex EmailValidation_Regex_Compiled = new Regex(EmailValidation_Regex, RegexOptions.IgnoreCase);

        public static readonly string EmailValidation_Regex_JS = $"/{EmailValidation_Regex}/";

        /// <summary>
        /// Checks if the given e-mail is valid using various techniques
        /// </summary>
        /// <param name="email">The e-mail address to check / validate</param>
        /// <param name="useRegEx">TRUE to use the HTML5 living standard e-mail validation RegEx, FALSE to use the built-in validator provided by .NET (default: FALSE)</param>
        /// <param name="requireDotInDomainName">TRUE to only validate e-mail addresses containing a dot in the domain name segment, FALSE to allow "dot-less" domains (default: FALSE)</param>
        /// <returns>TRUE if the e-mail address is valid, FALSE otherwise.</returns>
        public static bool IsValidEmailAddress(string email, bool useRegEx = false, bool requireDotInDomainName = false)
        {
            // EmailAddressAttribute.IsValid(null) returns true, so reject null up front for both branches
            if (email is null) return false;

            var isValid = useRegEx
                // see RegEx comments
                ? EmailValidation_Regex_Compiled.IsMatch(email)
                // ref.: https://stackoverflow.com/a/33931538/1233379
                : new EmailAddressAttribute().IsValid(email);

            if (isValid && requireDotInDomainName)
            {
                // exactly one "@" with non-empty parts, and at least one dot in the domain part
                var arr = email.Split('@', StringSplitOptions.RemoveEmptyEntries);
                isValid = arr.Length == 2 && arr[1].Contains(".");
            }
            return isValid;
        }
    }
As you can see, the validation process relies upon the IsValidEmailAddress static method, which accepts the following parameters:
- email : the e-mail address to check / validate
- useRegEx : TRUE to use the HTML5 living standard e-mail validation RegEx, FALSE to use the built-in validator provided by .NET (default: FALSE)
- requireDotInDomainName : TRUE to only validate e-mail addresses containing a dot in the domain name segment, FALSE to allow "dot-less" domains (default: FALSE)
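To get a feel for what the useRegEx switch changes, the following sketch runs a few sample addresses through both validators side by side. It does not call the EmailValidator class itself: the `ValidatorComparison` class and its `ByRegex` / `ByBuiltIn` methods are hypothetical names made up for this illustration. The output of the built-in validator is deliberately not listed, since the rules applied by EmailAddressAttribute have differed between .NET versions and are best checked on your own runtime.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.Text.RegularExpressions;

// Hypothetical comparison harness, not part of the EmailValidator class.
public static class ValidatorComparison
{
    // HTML Living Standard pattern, same expression used throughout this post.
    private static readonly Regex HtmlPattern = new Regex(
        @"^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$");

    // Corresponds to useRegEx = true.
    public static bool ByRegex(string email) =>
        email != null && HtmlPattern.IsMatch(email);

    // Corresponds to useRegEx = false.
    // EmailAddressAttribute.IsValid(null) returns true, hence the explicit null check.
    public static bool ByBuiltIn(string email) =>
        email != null && new EmailAddressAttribute().IsValid(email);

    public static void Main()
    {
        string[] samples =
        {
            "user@example.com",
            "admin@localhost",
            "Fred Bloggs@example.com",
            "user@@example.com",
        };

        foreach (var sample in samples)
        {
            // Print both verdicts so any disagreement is easy to spot.
            Console.WriteLine($"{sample,-28} regex: {ByRegex(sample),-5} built-in: {ByBuiltIn(sample)}");
        }
    }
}
```

Any row where the two columns disagree is a case where the choice of useRegEx matters for your application.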
The above code has been released under the MIT license, meaning that you're free to use it in any project or as a starting point for your own e-mail validator function.
Conclusion
That's basically it: if you like the above code, feel free to leave us your feedback in the comments section of this post.