Ancestry’s indexing experiment with firms in China

I follow genealogist Michele Lewis on TikTok. She recently found an unusual Ancestry.com transcription from the 1820 Federal Census. Check out the handwritten first name. What does it look like to you?

[Image: Ancestry.com transcription of the 1820 Federal Census entry]

Now, I get it that a 200-year-old handwritten scrawl can be hard to read. But how could a transcriber even consider “Elizabether” in this case?

I think I know the answer. In 2008, I worked for an online technology publication, The Industry Standard (no longer online). I interviewed Tim Sullivan, CEO of The Generations Network, which was Ancestry.com’s official corporate name until 2009. The article was published on The Industry Standard’s website on October 3, 2008 (see image below).

In the interview, Sullivan noted that computers were “not even close” to being able to read handwritten records, especially those from disparate sources such as census records which have many different styles of handwriting.

So Ancestry turned to human transcriptionists. Paid transcriptionists, not volunteers like on FamilySearch. Sullivan told me:

“The vast majority of the investment we’ve made in the last 10 years is not in acquisitions costs or imaging costs, it’s in the indexing costs.”

At the time, Sullivan said Ancestry was paying $10 million per year to transcribe old records. To cut costs, Ancestry hired overseas partners in China, where English was not widely spoken but census records could be transcribed for less money:

So how did The Generations Network import the data from millions of old census forms into its online database? Sullivan says the company spent about $75 million over 10 years to build its “content assets” including the census data, and much of that cost went into partnering with Chinese firms whose employees read the data and entered it into Ancestry.com’s database. The Chinese staff are specially trained to read the cursive and other handwriting styles from digitized paper records and microfilm. The task is ongoing with other handwritten records, at a cost of approximately $10 million per year, he adds.

If you have ever tried to read old handwriting in an unfamiliar language, I am sure you can appreciate how difficult this task would be. But the nonsensical transcriptions and the lack of quality checks are stunning. Keep in mind that Ancestry charges customers lots of money (up to 25% more as of January), but its main focus is generating profit for a string of private equity firms. Its current owner is a Wall Street PE firm, Blackstone Inc. It’s not clear whether Ancestry still outsources its transcriptions to overseas firms, or whether OCR technology is now good enough to hand off the task to computers.

Regardless, what’s especially frustrating is that Ancestry customers have attempted to correct this particular error. The actual name is “Christopher Orr.” They’ve added the correct annotation multiple times, but Ancestry still shows the name from that 200-year-old census return as “Elizabether Orr.” Lots of people searching for this ancestor will never find him, thanks to Ancestry’s cost-cutting moves 15 years ago and its lack of quality checks to correct such errors.

As Lewis notes at the end of her video, “Maybe you’re going to have to hand-search the indexes one at a time” to determine what the actual name is.

Archive of “Google stays mum on plans for public documents, Ancestry.com points to OCR hurdle.” By Ian Lamont. Published 10/3/2008, The Industry Standard.

[Image: screenshot of the archived article]

Using paper forms for family genealogy

Last month, my company launched Genealogy Basics In 30 Minutes: The quick guide to creating a family tree, building connections with relatives, and discovering the stories of your ancestors. Professional genealogist Shannon Combs-Bennett wrote the book, which explains basic concepts of interest to anyone researching family origins. As you might expect, the book has sections about family trees, interviewing tips, genetic genealogy, and different types of source records. As an amateur genealogist myself, I expected Shannon to delve into these issues when I read the manuscript. However, I did not expect the topic of using genealogy forms to track research to come up, except perhaps in passing. Instead, it took up the better part of Chapter 4, “Tracking and sharing your research.” Here is how she introduced the topic:

“Tracking includes everything from creating good source citations to outputting data to a chart or tree. Along with preserving research (which we will cover in Chapter 5), it’s one of my least favorite tasks. After the initial excitement of making easy discoveries, it’s so frustrating to deal with tracking and filing and storing all of the information and papers you have found.

On the other hand, charts and other summary documents are a great way to share findings to family members. When you bring a complete pedigree chart to a family reunion, it will attract attention and prompt lots of questions. Be sure to bring copies to give away!”

I was not expecting such a deep examination of tracking research with genealogy forms, partly because I use genealogy software to track my own research. The software lets me generate family group sheets, pedigree charts, and other pre-filled forms from my computer.

Not everyone uses family tree software for research, though. Some people prefer paper, and use blank genealogy forms to enter names, dates, and other information. In addition, as Shannon noted in the book, computers have drawbacks, including the risk of a crash or some other disaster that wipes out the data. Paper genealogy forms provide some reassurance on this front. They also do not require a power outlet!

Shannon and I discussed providing some free resources on the companion website to Genealogy Basics In 30 Minutes. Besides blog posts and tips, I have created a free genealogy forms starter kit that contains a five-generation pedigree chart.

The pedigree chart contains fields for recording birth, death, and marriage information, and goes back to great-great-grandparents (all 16 of them!). Names are numbered for easy cross-referencing.

UPDATE July 2018: Since this post was written in October 2016, my company has created other genealogy forms, including a kit that brings genealogy to kids!