There are several important steps in a record linking job:
• Initial analysis
• Data preparation
• Record linking
• Reviewing results
• Producing the final product
The record linking runs themselves often take much less time than preparing the data and analyzing the results, so it is important to plan who will do the work for these critical steps.
A typical record linking job progresses through these steps:
Initial Analysis and Report
The first step involves preparing a detailed report with an analysis of candidate linking fields and suggestions for cleaning and standardizing the data. It is also possible to estimate the maximum number of records that can be linked using different cut-off values, and to decide at this point whether the results would be satisfactory for the application.
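As a rough illustration of what an analysis of candidate linking fields might measure, the sketch below reports how often a field is filled in and how many distinct values it holds. The function name `profile_field`, the field names, and the sample records are all hypothetical; an actual initial report would cover much more than this.

```python
from collections import Counter


def profile_field(records, field):
    """Summarize how complete and how varied one candidate linking field is."""
    values = []
    for record in records:
        raw = record.get(field)
        # Treat missing values and blank strings the same way.
        values.append("" if raw is None else str(raw).strip())
    filled = [v for v in values if v]
    counts = Counter(filled)
    return {
        "field": field,
        "fill_rate": len(filled) / len(values) if values else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


# Hypothetical sample records, for illustration only.
records = [
    {"surname": "Smith", "sex": "M", "birth_month": "Feb"},
    {"surname": "Smith", "sex": "male", "birth_month": "02"},
    {"surname": "Jones", "sex": "", "birth_month": "February"},
    {"surname": "Brown", "sex": "1", "birth_month": None},
]

for name in ("surname", "sex", "birth_month"):
    print(profile_field(records, name))
```

A field with a low fill rate, or with several distinct values that clearly mean the same thing (as with `sex` here), is a candidate for cleaning before any linking is attempted.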
Data Preparation
Using the initial report as a guide, the data preparation steps are formalized and decisions are made about the resources needed to do the work. Depending on the quality of the data, this phase of the project can be very short or may require intensive effort. The better the quality of the data, and the fewer the variant values that represent the same underlying value, the better the record linking will work. For example, if a field has been coded in a variety of ways, it is important to pick one standardized value: changing "male", "m", and "1" to "M", and "Feb", "02", "February", "Feburary", etc. to 2, will improve record linking results. Often the data owner can do the work at this step; however, we are able to undertake this job on a time and materials basis.
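The standardization described above can be sketched with simple lookup tables. This is a minimal example, not our production tooling: the names `SEX_CODES`, `MONTHS`, and `standardize` are hypothetical, the sex table holds only the variants mentioned above, and a real table would be extended from the data's own coding.

```python
import calendar

# Only the variants mentioned in the text; extend from the data's own codes.
SEX_CODES = {"male": "M", "m": "M", "1": "M"}

# Build month variants: full names, abbreviations, and numeric forms.
MONTHS = {}
for number in range(1, 13):
    MONTHS[calendar.month_name[number].lower()] = number
    MONTHS[calendar.month_abbr[number].lower()] = number
    MONTHS[str(number)] = number
    MONTHS[f"{number:02d}"] = number

# Known misspellings are added by hand as they turn up in the data.
MONTHS["feburary"] = 2


def standardize(value, table):
    """Return the standard code for a raw value, or None if it is unknown."""
    key = str(value).strip().lower().rstrip(".")
    return table.get(key)


print(standardize("male", SEX_CODES))
print(standardize("Feburary", MONTHS))
print(standardize("Febuary", MONTHS))  # not in the table, so None
```

Returning `None` for unrecognized values, rather than guessing, leaves a list of leftovers that can be reviewed and added to the table.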
Record Linking
Once the data preparation is complete, a detailed record linking plan is prepared that outlines the different record linking passes and the checking required at each stage. Record linking runs can usually be turned around in 24 to 48 hours for jobs with fewer than 1 million total records. Larger jobs take longer.
Reviewing Results
Probabilistic record linking produces a score for each pair of records. It is important to check matches that fall around the cut-off value to see whether the results are satisfactory.
When the result of a record linking job is a genealogy, a number of reports are produced. For example, the distribution of children per marriage should reflect the population under study. When record linking parameters are not set correctly and over-linking occurs, large sibships are often a symptom. Links can be provided in a variety of formats for checking, or linked records can be viewed using the GenMergeDB viewer.
Once a record linking pass is checked and found satisfactory, additional record linking passes may be done. When building a population from individual record sets, each record linking pass adds data to merged individuals: for example, birth date and place and parents' names from a birth certificate, and death date and place and spouse name from a death certificate. Each piece of information adds to the detail for each person, and additional record linking passes may find further links by taking advantage of the composite information.
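Two of the checks mentioned in this step, reviewing matches near the cut-off and looking at the distribution of children per marriage, can be sketched as follows. The function names, the `(id_a, id_b, score)` tuple layout, and the sample scores are assumptions made for illustration; they do not describe the format of our reports or of the GenMergeDB viewer.

```python
from collections import Counter


def pairs_near_cutoff(scored_pairs, cutoff, margin):
    """Return pairs scoring within `margin` of the cut-off, lowest score first."""
    near = [pair for pair in scored_pairs if abs(pair[2] - cutoff) <= margin]
    return sorted(near, key=lambda pair: pair[2])


def sibship_sizes(child_to_marriage):
    """Count marriages by number of linked children.

    Marriages with no linked children do not appear in the input mapping,
    so they are not counted here.
    """
    children_per_marriage = Counter(child_to_marriage.values())
    return Counter(children_per_marriage.values())


# Hypothetical scored pairs: (record in file A, record in file B, score).
scored_pairs = [
    ("a1", "b7", 14.2),
    ("a2", "b3", 9.8),
    ("a5", "b9", 10.4),
    ("a6", "b2", 3.1),
]
for pair in pairs_near_cutoff(scored_pairs, cutoff=10.0, margin=1.0):
    print(pair)

# Hypothetical links from child records to marriage records.
child_to_marriage = {"c1": "m1", "c2": "m1", "c3": "m1", "c4": "m2"}
print(sorted(sibship_sizes(child_to_marriage).items()))
```

An unexpectedly long tail of very large sibships in the second report would be a prompt to revisit the linking parameters before running further passes.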
Producing the Final Product
For some applications, the list of matching records is the result. For others, it is a genealogy in GEDCOM, XML, or another format. We will produce the required output to your specification.
Copyright © 2022 Pleiades Software Development, Inc.
1338 S. Foothill Suite 324
Salt Lake City, UT 84108
Tel: (801) 560-3587