Casey Wolf

F23: Week 3—Working toward a Term List

Week 3 involved further paring down of the term list, a process that means going down the column and assessing the data against three requirements for completing the list: clarifying incomplete entries, cross-checking similar entries, and correctly identifying references to content within the letters. Because the data comes from an export of an EndNote library housing digital records created from the Pemberton Papers, much of it requires further editing to update the information and align it with protocols established since the digital library was created. I originally created and populated the database in 2018, shortly after completing the digitization of volumes 1 and 2 at the Historical Society of Pennsylvania, and my knowledge of the players, places, and the world they inhabited has increased since then. Decisions regarding uncertainties in both transcription and subject matter are therefore easier to make with more context and knowledge. However, as mentioned before, the process of reducing the complexities of life and the world into machine-readable ones and zeroes remains a complicated one.


Clarifying incomplete entries was the simplest of the three requirements. The correspondents and writers within the letters did not always include a full name, or the full name was not legible to me at the time I was writing metadata for the EndNote library. As a result, the term list contained entries that use PRINT’s standard notation for encoding uncertainty: square brackets enclosing the assumed information, if any, or an ellipsis if there is none. For example, “Chapman, [...]” indicates no first name was given, while “C[...], John” indicates the first letter of the last name and the first name were written but the remainder was illegible. In these instances, I searched for the terms within the EndNote library to find the associated record and attached image. Now, with a few more years of transcription experience, I was able to clarify several records that were illegible before.
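
As an illustration only, the bracket notation described above could be used to flag entries that still need a second look. The sketch below is not PRINT's tooling; the function name `classify_entry` and the entry "Smith, [John]" are made up for the example, and it assumes uncertainty is always marked with square brackets.

```python
import re

# Assumption: uncertainty is always written as square brackets that hold
# either an ellipsis (nothing legible) or the editor's assumed reading.
UNCERTAIN = re.compile(r"\[([^\]]*)\]")


def classify_entry(term: str) -> str:
    """Return 'complete', 'missing', or 'assumed' for one term-list entry."""
    bracketed = UNCERTAIN.findall(term)
    if not bracketed:
        return "complete"
    # An ellipsis inside any bracket means part of the name is still unknown.
    if any(part.strip() in ("...", "\u2026") for part in bracketed):
        return "missing"
    return "assumed"


# The first two entries come from the post; the last two are invented.
entries = ["Chapman, [...]", "C[...], John", "Smith, [John]", "Smith, John"]
needs_review = [e for e in entries if classify_entry(e) != "complete"]
```

Here `needs_review` collects every entry that still carries bracketed uncertainty, which is the set I would go back and check against the EndNote record and its attached image.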


Cross-checking similar entries proved a bit more complicated, especially given the strictures required by authority standardization. The first step was to run down the column and connect potential matches. These were linked by putting the line numbers of potential matches together in column B. Some still remain uncertain due to a lack of other occurrences in the letters, but others were successfully connected to each other. For each connected group, a standardized spelling of the name was chosen and any spelling variations were encoded in column D. This ensures that common spelling deviations, whether by a writer of the time or an end-product user in our time, still direct to the relevant entity, while aligning the data with linked open data standards. It also limits the terms and names project collaborators can deploy when further encoding PRINT documents, reducing the amount of time dedicated to data cleanup.
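
To show how the variant spellings in column D might direct a user back to one standardized name, here is a minimal sketch. The header names (`term`, `linked_lines`, `mentions`, `variants`), the semicolon separator, and the sample rows are all assumptions made for the example, not the actual layout or contents of the Term List CSV.

```python
import csv
import io

# Invented sample rows. Assumed layout: column A = standardized name,
# B = line numbers of linked entries, C = mention notes, D = variants.
SAMPLE = """term,linked_lines,mentions,variants
"Smith, John",2,,"Smyth, John; Smithe, John"
"Doe, Jane",,,
"""


def build_variant_index(rows):
    """Map every known spelling (case-insensitively) to its standard form."""
    index = {}
    for row in rows:
        standard = row["term"].strip()
        index[standard.casefold()] = standard
        for variant in row["variants"].split(";"):
            variant = variant.strip()
            if variant:
                index[variant.casefold()] = standard
    return index


def resolve(name, index):
    """Return the standardized name, or None if the spelling is unknown."""
    return index.get(name.strip().casefold())


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
index = build_variant_index(rows)
```

With this index, a search for "Smyth, John" and a search for "Smith, John" land on the same entity, and a spelling that is not in the list returns nothing rather than creating a new, unvetted term.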


Questions about correctly identifying references within the contents of the letters remain. Often, letter writers will mention those who traveled with them, where they were traveling, and when. Mentions such as these are a significant source of information for identifying an individual’s presence in a place or time where the historical record is otherwise silent. Because these instances were encoded in EndNote with curly brackets following the individual’s name, the information regarding each mention was pulled out into column C. Capturing these mentions and the associated records is proving complicated from a database-friendly, query-retrieval standpoint because of the complexity of the relationship that connects them. However, it is an important part of PRINT’s mission to connect correspondents within the networks through which they moved. Despite cross-checking to individuate people within the records, many remain unidentified due to a lack of context or of a larger data set. Paying closer attention to encoding mentions of people traveling or living alongside letter senders or receivers increases the likelihood that they can be connected to more context within larger data sets through proper metadata identification and digital connection.
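
Separating a name from its curly-bracket note could look something like the sketch below. The exact shape of the EndNote entries is an assumption here (a name followed by one `{...}` note at the end), and both the function name `split_mention` and the sample strings are invented for illustration.

```python
import re

# Assumption: a mention is written as the name followed by a single
# curly-bracketed note at the end of the entry.
MENTION = re.compile(r"^(?P<name>[^{}]+?)\s*\{(?P<note>[^{}]*)\}\s*$")


def split_mention(entry):
    """Split an entry into (name, note); note is None if no braces are found."""
    match = MENTION.match(entry)
    if match is None:
        return entry.strip(), None
    return match.group("name").strip(), match.group("note").strip()


# Invented examples: one entry with a mention note, one without.
name, note = split_mention("Smith, John {traveled with sender}")
plain = split_mention("Doe, Jane")
```

This only handles the easy part, pulling the note into its own column. Modeling the relationship between the mentioned person, the letter, and the sender or receiver is the harder database question and is not addressed by a split like this.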



An excerpt from the Term List CSV file showing the uses of the different columns



