How to do a "Binary Search": Applying Computer Science Algorithms to Genealogy Research

6 January 2024 by Sherri Mastrangelo

Let’s take the ideas and concepts from certain computer science algorithms, break them down into simpler terms to understand them better, and learn how we can apply them to our genealogy research. No tech skills or coding required! In fact, you may already utilize some of these methods without even recognizing them.

The first algorithm we’ll discuss is called a “Binary Search”.

A binary search reduces the search to half at each step, to save time. Think of it as “divide and conquer”.

If you’ve ever tried to go to a certain page number in a book, say you are looking for page 343, you may have implemented a form of binary search by opening the book in the middle. You’re at page 201, so you know you need to go higher. Instead of turning the pages one by one, you estimate another chunk of pages and now you’re at 356, so you need to go lower. This is basically the concept behind a binary search. You would continue dividing the sections into smaller and smaller results until you find the target page.

Instead of working in a linear pattern, i.e. starting at the beginning of a list and reading the values all the way through the end, a binary search would start in the middle and either ascend or descend the list depending on if the resulting value is greater or less than the target value, repeatedly dividing each new search area, until it either finds the target or determines it is not there. Of course there is more to it when it comes to the writing the formulas to make the algorithm work, and the computer algorithm may be more precise in terms of dividing each resulting search array in half while you have roughly estimated, but I think this explains the general concept we need to implement.

It’s important to note that a binary search will only work with an ordered set of data, like an alphabetized list or a set of files organized chronologically. The list can be sorted ascending or descending, as long as its ordered. If the set of data is compiled randomly this method would not work successfully.

Image above generated with in part with AI tools.

Another example I’ve heard to describe binary search would be that of a detective watching security footage tapes (as pictured above), trying to figure out when a car was stolen overnight. This example might be a bit outdated with today’s technology but the same idea applies. Let’s say there is perfect high resolution video, aimed directly at the car, that disappeared sometime between when the owner parked it at 8 pm and when they noticed it was gone at 7 am. Instead of sitting down and watching eleven hours of video footage from start to finish, which would take all day, even sped up, the detective is going to implement a form of binary search. He will fast forward the video halfway to 1:30 am. If the car is still there, fast forward half of the remaining time to 4:45 am. If the car is no longer there, go back and divide the previous time between 1:30 am and 4:45 am, and so on until the target time is discovered. This method of searching would take significantly less time than watching hours of video footage!

So how can we apply the concept of binary search to our genealogy research?

The best case scenario for applying this method to genealogy research, would be when you have to look through records that are not indexed, or not transcribed correctly (but are still in some order - alphabetically, chronologically, or otherwise) and you need to narrow down your scope quickly.

Some examples of this and additional scenarios:

  1. A large set of record images in a database online, like FamilySearch, that is not indexed (therefore not searchable by name) where you need to find the year in order to narrow down the hunt for your record. I know it can seem overwhelming when you come across these huge image-only datasets to browse - but using the binary search method will help save a lot of time! First confirm there is an order within the collection - are the surnames alphabetical? Or is the information organized by date? Look at the first few pages to give you an idea, then skip ahead to half. Depending on what you need, either skip ahead by another half or backwards by half. Note your page numbers to remember where you have searched. Just be aware often these collections have other, smaller collections within them, and there is usually a small note on the microfilm image when a new section starts.

  2. An alphabetical City Directory where you need to find a certain surname. Maybe it’s a physical book in your hands, or a digital copy you found on Internet Archive or Google Books. You wouldn’t start with “A” and flip page by page if your surname started with an “R”, you’d skip ahead, and then back, as needed. And once you got to the “R’s” you would do the same for the full surname.

  3. You don’t have a date of death for your ancestor, but you know their address and can look them up in City Directories. You know she must have died sometime between the birth of her youngest child in 1931 and the 1940 Census, where her husband is widowed. Instead of looking at all the city directories in a linear fashion, 1932, 1933, 1934, and so on, you can start with 1936 (about halfway), and you’ve likely just cut your search time in half. If she’s alive, you only have 4 more years. If she’s not alive, you work backward - cutting each new search in half.

    Of course you could “get lucky” when searching in a linear fashion, if the information you needed was in first City Directory you checked, for example. This is just an example to illustrate the search method. And remember to check the printing date on City Directories to be precise.

Can you think of another case scenario where this might come in handy?

More computer science algorithms coming soon!

Sources & Further Reading