A variation called American Soundex was used in the 1930s for a retrospective analysis of the US censuses from 1890 through 1920. The Soundex code came to prominence in the 1960s, when it was the subject of several articles in the Communications of the ACM and the Journal of the ACM (Association for Computing Machinery), and especially when it was described in Donald Knuth's The Art of Computer Programming, vol. 3: Sorting and Searching, Addison-Wesley Professional (1973), pp. 391-392.
The National Archives and Records Administration (NARA) maintains the current rule set for the official implementation of Soundex used by the U.S. Government. These encoding rules are available from NARA, upon request, in the form of General Information Leaflet 55, "Using the Census Soundex".
The Soundex code for a name consists of a letter followed by three digits: the letter is the first letter of the name, and the digits encode the remaining consonants. Similar-sounding consonants share the same digit; for example, the labial consonants B, F, P, and V are all encoded as 1. Vowels can affect the coding but are not coded themselves, except when a vowel is the first letter. However, if "h" or "w" separates two consonants that have the same Soundex code, the consonant to the right of the "h" or "w" is not coded.
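The letter-to-digit mapping can be sketched as a lookup table in Python. This is a minimal illustration; the names SOUNDEX_GROUPS, LETTER_TO_DIGIT and soundex_digit are hypothetical, and returning None for uncoded letters is a choice of this sketch.

```python
from typing import Optional

# Letters that sound alike share one digit (the Soundex digit groups).
SOUNDEX_GROUPS = {
    "1": "bfpv",      # labials
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}

# Invert the table once so that each lookup is a single dict access.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in SOUNDEX_GROUPS.items()
    for letter in letters
}


def soundex_digit(letter: str) -> Optional[str]:
    """Return the digit for a coded consonant, or None for a, e, i, o, u, y, h, w."""
    return LETTER_TO_DIGIT.get(letter.lower())


print([soundex_digit(c) for c in "BFPV"])      # ['1', '1', '1', '1']
print(soundex_digit("a"), soundex_digit("h"))  # None None
```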
The code can be computed as follows:
- If "h" or "w" separates two consonants with the same Soundex code, change the consonant to the right of the "h" or "w" into "h" so that it is not coded.
- Replace consonants with digits as follows (but do not change the first letter):
- b, f, p, v = 1
- c, g, j, k, q, s, x, z = 2
- d, t = 3
- l = 4
- m, n = 5
- r = 6
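- Collapse adjacent identical digits into a single digit of that value. For this step only, treat the first letter as its own digit, so that a following letter with the same code is dropped (Pfister is P-236, not P-123).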
- Remove all non-digits after the first letter.
- Return the starting letter and the first three remaining digits. If needed, append zeroes to make it a letter and three digits.
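The steps above can be sketched in Python as a single function. This is a minimal sketch, not NARA's implementation: the name soundex is hypothetical, input is assumed to be ASCII letters (other characters are discarded), and the first letter's digit is kept while collapsing so that a following letter with the same code is dropped, which is what the National Archives examples require for Pfister (P-236).

```python
from itertools import groupby

# Digit groups from the table above.
DIGITS = {c: d
          for d, group in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                           ("4", "l"), ("5", "mn"), ("6", "r"))
          for c in group}


def soundex(name: str) -> str:
    """Return the four-character Soundex code (a letter plus three digits)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        raise ValueError("name must contain at least one letter")

    # Step 1: if "h"/"w" separates two consonants with the same code, turn the
    # right-hand consonant into "h" so it is never coded. `prev` is the most
    # recent letter that is not "h" or "w".
    prev = None
    for i, c in enumerate(letters):
        if c in "hw":
            continue
        if (i > 0 and letters[i - 1] in "hw" and c in DIGITS
                and DIGITS.get(prev) == DIGITS[c]):
            letters[i] = "h"
        else:
            prev = c

    # Step 2: map consonants to digits. The first letter's digit is kept only
    # so that step 3 can merge a following same-code letter into it.
    coded = [DIGITS.get(c, c) for c in letters]

    # Step 3: collapse runs of identical digits into a single digit.
    collapsed = [key for key, _ in groupby(coded)]

    # Step 4: drop the entry for the first letter and every non-digit.
    digits = "".join(ch for ch in collapsed[1:] if ch.isdigit())

    # Step 5: the first letter plus the first three digits, zero-padded.
    return letters[0].upper() + (digits + "000")[:3]
```

Running it on examples from this section, such as Robert, Tymczak and Ashcraft, gives R163, T522 and A261.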
Additional Soundex Coding Rules (National Archives)
- Names With Double Letters
If the surname has any double letters, they should be treated as one letter. For example:
- Gutierrez is coded G-362 (G, 3 for the T, 6 for the first R, second R ignored, 2 for the Z).
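Treating a double letter as one letter amounts to collapsing runs of identical codes. A minimal sketch with itertools.groupby; here every uncoded letter is written as a placeholder "0", an assumption of this sketch that also means it does not implement the separate "h"/"w" rule. The helper names digit_string and collapse_runs are hypothetical.

```python
from itertools import groupby

# Soundex digit for each coded consonant; any other letter becomes "0",
# a placeholder meaning "not coded" in this sketch.
_DIGIT = {c: d
          for d, letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                             ("4", "l"), ("5", "mn"), ("6", "r"))
          for c in letters}


def digit_string(name: str) -> str:
    """Map every letter of the name to its Soundex digit, or '0' if uncoded."""
    return "".join(_DIGIT.get(c, "0") for c in name.lower())


def collapse_runs(digits: str) -> str:
    """Replace each run of identical characters with a single character."""
    return "".join(key for key, _ in groupby(digits))


raw = digit_string("Gutierrez")   # the doubled R appears as "66"
print(raw, collapse_runs(raw))    # 203006602 2030602
```

Dropping the zeros from "2030602" leaves 2362; the leading 2 belongs to the initial G, so the code is G-362.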
- Names with Letters Side-by-Side that have the Same Soundex Code Number
If the surname has different letters side-by-side that have the same number in the soundex coding guide, they should be treated as one letter. Examples:
- Pfister is coded as P-236 (P, F ignored, 2 for the S, 3 for the T, 6 for the R).
- Jackson is coded as J-250 (J, 2 for the C, K ignored, S ignored, 5 for the N, 0 added).
- Tymczak is coded as T-522 (T, 5 for the M, 2 for the C, Z ignored, 2 for the K). Since the vowel "A" separates the Z and K, the K is coded.
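The parenthetical explanations in these examples can be reproduced by a small tracing function. In this sketch, explain is a hypothetical name, y is assumed to separate codes the way a vowel does, and the output is not truncated to three digits.

```python
DIGITS = {c: d
          for d, group in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                           ("4", "l"), ("5", "mn"), ("6", "r"))
          for c in group}


def explain(name: str) -> list[tuple[str, str]]:
    """List each consonant with its digit or "ignored", NARA-example style.

    Uncoded letters are omitted from the result.
    """
    letters = [c.lower() for c in name if c.isalpha()]
    notes = [(letters[0].upper(), "first letter")]
    last = DIGITS.get(letters[0])  # code of the most recent consonant, or None
    for c in letters[1:]:
        digit = DIGITS.get(c)
        if digit is None:
            if c not in "hw":
                last = None  # a vowel (or y, by assumption) separates equal codes
            continue
        notes.append((c.upper(), "ignored" if digit == last else digit))
        last = digit
    return notes


print(explain("Jackson"))
# [('J', 'first letter'), ('C', '2'), ('K', 'ignored'), ('S', 'ignored'), ('N', '5')]
```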
- Names with Prefixes
If a surname has a prefix, such as Van, Con, De, Di, La, or Le, code both with and without the prefix because the surname might be listed under either code. Note, however, that Mc and Mac are not considered prefixes.
For example, VanDeusen might be coded two (2) ways:
- V-532 (V, 5 for N, 3 for D, 2 for S)
- D-250 (D, 2 for the S, 5 for the N, 0 added).
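Coding a surname both with and without its prefix can be sketched as follows. The compact soundex function and soundex_variants are hypothetical helpers, and the rule that a prefix counts only when the remaining name starts with a capital letter is an assumed heuristic; real records may need human judgment. Mc and Mac are deliberately left out of the prefix list, as the rule requires.

```python
DIGITS = {c: d
          for d, group in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                           ("4", "l"), ("5", "mn"), ("6", "r"))
          for c in group}

# Prefixes named by the National Archives rule.
PREFIXES = ("van", "con", "de", "di", "la", "le")


def soundex(name: str) -> str:
    """Compact single-pass Soundex: h/w do not separate codes, vowels and y do."""
    letters = [c for c in name.lower() if c.isalpha()]
    last = DIGITS.get(letters[0])
    digits = []
    for c in letters[1:]:
        d = DIGITS.get(c)
        if d is None:
            if c not in "hw":
                last = None
            continue
        if d != last:
            digits.append(d)
        last = d
    return letters[0].upper() + "".join(digits)[:3].ljust(3, "0")


def soundex_variants(surname: str) -> list[str]:
    """Code the surname as written and, if it has a listed prefix, without it."""
    codes = [soundex(surname)]
    for prefix in PREFIXES:
        if surname.lower().startswith(prefix):
            rest = surname[len(prefix):].lstrip(" '")
            # Heuristic: split only when the remainder starts with a capital,
            # as in "VanDeusen" or "Van Deusen" (but not "Dean").
            if rest[:1].isupper():
                codes.append(soundex(rest))
            break
    return codes


print(soundex_variants("VanDeusen"))  # ['V532', 'D250']
```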
- Consonant Separators
If a vowel (A, E, I, O, U) separates two consonants that have the same soundex code, the consonant to the right of the vowel is coded. Example:
- Tymczak is coded as T-522 (T, 5 for the M, 2 for the C, Z ignored (see "Side-by-Side" rule above), 2 for the K). Since the vowel "A" separates the Z and K, the K is coded.
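If "h" or "w" separates two consonants that have the same soundex code, the consonant to the right of the "h" or "w" is not coded. Example:
- Ashcraft is coded A-261 (A, 2 for the S, C ignored, 6 for the R, 1 for the F). It is not coded A-226.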
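The difference between a vowel separator and an "h" or "w" separator can be checked mechanically. This sketch only looks at a single separating letter; separated_pairs is a hypothetical name, and y is assumed to behave like a vowel.

```python
DIGITS = {c: d
          for d, group in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                           ("4", "l"), ("5", "mn"), ("6", "r"))
          for c in group}


def separated_pairs(name: str) -> list[tuple[str, str, str, bool]]:
    """Find same-code consonants separated by exactly one uncoded letter.

    Returns (left, separator, right, right_is_coded) tuples: the right-hand
    consonant is coded again after a vowel, but not after "h" or "w".
    """
    s = [c for c in name.lower() if c.isalpha()]
    found = []
    for i in range(1, len(s) - 1):
        left, sep, right = s[i - 1], s[i], s[i + 1]
        same_code = left in DIGITS and DIGITS.get(right) == DIGITS[left]
        if sep not in DIGITS and same_code:
            found.append((left, sep, right, sep not in "hw"))
    return found


print(separated_pairs("Tymczak"))   # [('z', 'a', 'k', True)]
print(separated_pairs("Ashcraft"))  # [('s', 'h', 'c', False)]
```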