Project title

What Seattleites were Reading in 2020

Authors

Ben Royce, Lauren Nguyen, John Harrison, Jake Seaberg

Date

5/16/22 Spring 2022

Abstract

Our main question is: how did the interests of Seattle residents shift throughout the year 2020? To address this question, we plan to analyze the change in the most popular genres throughout the year, which titles (excluding classic literature) were checked out most often, and how the overall checkout rates per month changed throughout 2020. These analyses are important for gaining deeper insight into the topics that were meaningful to Seattle residents and how Seattle residents used the Seattle Public Library system during the pandemic.

Since we were unable to push the dataset to our GitHub repository, here is a Google Drive link to the (abridged, 1.72-million-row) dataset we worked with.

Keywords

COVID-19, Human Interests, Education, Race Relations

Introduction

As we move further past 2020, it’s difficult to sift through the haze to grasp any specifics of those months in isolation. Between baking our fiftieth loaf of bread, protesting for social justice and watching the news cycle in dread, the whole year mushes together and the lines between our perspectives and those of the rest of the world are blurred. When we were locked inside for a year and a half, it’s safe to say that the media we consumed started to define us; what if we could paint a picture of a population’s beliefs, interests and fears based on something as simple as checkouts from a library? We hope to touch on this question at the level of the city of Seattle, using public checkout data from the Seattle Public Library to get a grasp on what was actually on the city’s mind during lockdown.

  • How did library checkout formats differ as the pandemic continued?
  • What were the most popular titles in 2020, excluding classic literature?
  • What were the most popular genres checked out in each month of 2020, and do the subgenres of materials being checked out indicate any reflection of current events at the time?

Understanding the response to a tumultuous year like 2020 at a citywide scale would provide valuable insight into how a population reacts to existential threats like COVID, as well as how it reckons with systemic racism, political turmoil, and the desire for personal improvement.

The Dataset

This data comes from Seattle Open Data and was collected by the City of Seattle. Whenever a book is checked out at one of the Seattle Public Library locations, information about the book is automatically added to the dataset, which updates on the 6th of every month and runs from April 2005 to March 2022. While the City of Seattle does not explain why they collect this data, we can assume that a catalog of how many times each book is checked out in a month is useful to the libraries because it informs them of which books are more popular than others. This dataset contains 39.9 million observations overall with 11 features, though for the purposes of this project we will only be looking at those checked out in 2020 (1.72 million observations).

When working with this data, we need to consider that the only people represented in the set are those who checked out a library book from a Seattle Public Library. Anyone who does not have a library card, or checks out from another library system, such as the University of Washington’s system, will not be represented.

This data is also limited in its scope of measuring the interests of Seattle because whatever someone is interested in must already have a book written about it. Books take a long time to produce, so it’s unlikely that there were any books about the pandemic during 2020, even though that was certainly an interest on people’s minds. Thus this data will only be useful in modeling the popularity of topics that already have books written about them.

The size of the dataset itself poses a challenge; given that the entire set contains nearly 40 million rows, few applications are equipped to handle it. Before manipulating the data, we have to slice out the subset we want, which in our case is all checkouts during 2020. This leaves us 1.7 million rows, which is still a very large dataset but is small enough to work with more programs. Another problem with the set is that some books do not contain subject headings. Although this does not affect a large proportion of the data, it’s possible for some genres to be underrepresented. We can fill in some of this missing data by analyzing the title of the books ourselves for certain keywords that would allocate them to specific genres.
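The two preparation steps described above (slicing out the 2020 rows and guessing a genre from the title when the subject heading is missing) could be sketched roughly as follows. This is a minimal illustration, not the code used for the project: the column names `CheckoutYear`, `Title`, and `Subjects` are assumed, the keyword map is hypothetical, and the sample rows are made up.

```python
import csv
import io

# Hypothetical keyword-to-genre map; a real list would need careful tuning.
KEYWORD_GENRES = {
    "cookbook": "Cooking",
    "mystery": "Mystery",
    "antiracist": "Social Justice",
}


def rows_for_year(lines, year):
    """Yield rows for one year, reading one row at a time so the
    full ~40-million-row file never has to sit in memory."""
    for row in csv.DictReader(lines):
        if row["CheckoutYear"] == str(year):
            yield row


def fill_missing_subject(row):
    """Return the row's subject, or a keyword-based guess when it is blank."""
    subject = row["Subjects"].strip()
    if subject:
        return subject
    title = row["Title"].lower()
    for keyword, genre in KEYWORD_GENRES.items():
        if keyword in title:
            return genre
    return "Unknown"


# Toy stand-in for the real export (made-up rows, not actual data).
sample = io.StringIO(
    "CheckoutYear,Title,Subjects\n"
    "2019,Old Book,Fiction\n"
    "2020,The Weeknight Cookbook,\n"
    "2020,Some Novel,Literature\n"
    "2020,Untitled Thing,\n"
)
subset = list(rows_for_year(sample, 2020))
subjects = [fill_missing_subject(r) for r in subset]
print(len(subset), subjects)
```

With the real file, `open(path, newline="", encoding="utf-8")` would take the place of the `io.StringIO` sample. Keyword matching of this kind is crude, so any genre it assigns should be treated as a guess rather than a catalog fact.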

Implications

Our analysis will help policymakers better understand the interests and sentiments of Seattle residents over the year 2020. It will help identify changes in Seattleites’ interest in social justice, self-help, disaster preparation, and other genres related to salient events in 2020, showing which topics residents were informing themselves on and which they found important. It will also show how residents responded to big, unsettling events, which can help policymakers decide how to help residents manage uncertainty. Finally, this analysis will help policymakers understand how Seattle residents engage with the Seattle Public Library system and identify areas for improvement in resource provision.

Limitations & Challenges

Limitations:

  • Demographics (age, ethnicity) of library users likely don’t reflect the demographics of Seattle residents

  • From March 13, 2020 until April 27, 2021, most branches were closed to the public in some capacity due to COVID-19. This means lending was restricted to digital copies and curbside pickup, which made it harder for low-income patrons to access library materials.

  • Because library access was limited for that year, those who would typically need the library the most, whether for Wi-Fi or books, no longer had access to Seattle Public Library’s resources, meaning our data might not be representative of typical SPL visitors.

Challenges:

  • Dataset includes a lot of qualitative data

  • Our pared-down dataset includes 1.72 million rows and covers every form of every material that was checked out in 2020. Many books are listed separately in abridged and unabridged versions, some titles include stray special characters, and the publication date column contains extraneous punctuation. In addition, the Subjects column includes both broad and specific subjects, making it difficult to ensure that we have captured every title within a genre when conducting analyses.

Summary Information

## List of 5
##  $ avg_checkouts     : num 504476
##  $ total_checkouts   : int 6053717
##  $ low_checkouts     : num 354572
##  $ max_checkouts     : int 817471
##  $ standard_deviation: num 153496

We found that in 2020 there were 6,053,717 total checkouts from the Seattle Public Library, an average of about 504,476 checkouts per month. Monthly totals varied widely, from 354,572 checkouts in April to 817,471 in January, more than double the April figure. This wide range is reflected in the high standard deviation of about 153,496. There were significantly more checkouts from January through March, followed by a sharp decrease in April, which can be explained by the library closures in mid-March due to COVID-19. After the closures, checkouts increased every month from April through December, with the exception of October.
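For readers who want to reproduce figures of this kind, the sketch below computes the same five summary values from a mapping of month to total checkouts. It is an illustration only: the `summarize` helper is hypothetical, the monthly numbers are toy values rather than the project's data, and it assumes the reported standard deviation is the sample (n - 1) version.

```python
from statistics import mean, stdev


def summarize(monthly_totals):
    """Summarize a {month: total_checkouts} mapping."""
    low_month = min(monthly_totals, key=monthly_totals.get)
    high_month = max(monthly_totals, key=monthly_totals.get)
    values = list(monthly_totals.values())
    return {
        "total_checkouts": sum(values),
        "avg_checkouts": mean(values),
        "low_checkouts": (low_month, monthly_totals[low_month]),
        "max_checkouts": (high_month, monthly_totals[high_month]),
        # Sample standard deviation (divides by n - 1); an assumption here.
        "standard_deviation": stdev(values),
    }


# Toy monthly totals (made up), keyed by month number.
toy = {1: 800, 2: 700, 3: 600, 4: 300}
summary = summarize(toy)
print(summary)
```

Returning the month alongside the low and high values keeps the "April" and "January" labels attached to the numbers instead of relying on a separate lookup.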

Table

| CheckoutMonth | Checkouts_for_pop_genre1 | Popular_genre1 | Popular_genre2 | popular_title |
| --- | --- | --- | --- | --- |
| 1 | 1558 | Laptop computers | iPad Computer | FlexTech–Laptops. |
| 2 | 1602 | NA | | Headphones / Seattle Public Library. |
| 3 | 1270 | Literature | Literature | There There: A Novel |
| 4 | 1894 | Juvenile Fiction | Juvenile Literature | Harry Potter and the Sorcerer’s Stone: Harry Potter Series, Book 1 (unabridged) (Unabridged) |
| 5 | 1093 | Juvenile Fiction | Juvenile Literature | Harry Potter and the Sorcerer’s Stone: Harry Potter Series, Book 1 (unabridged) (Unabridged) |
| 6 | 4903 | African American Nonfiction | Nonfiction | So You Want to Talk about Race (Unabridged) |
| 7 | 2016 | African American Nonfiction | Nonfiction | So You Want to Talk about Race (Unabridged) |
| 8 | 994 | African American Nonfiction | Nonfiction | So You Want to Talk about Race (Unabridged) |
| 9 | 835 | African American Fiction | Fiction | Book of the Little Axe |
| 10 | 569 | Literature | Literature | Where the Crawdads Sing |
| 11 | 830 | Fantasy | Young Adult Fiction | Reverie |
| 12 | 775 | Literature | Literature | The Queen’s Gambit |

This table shows the two most popular genres checked out from the Seattle Public Library in each month of 2020, along with the number of checkouts for the most popular genre and the most popular title of that month. The genre labels ‘Fiction’ and ‘Nonfiction’ were excluded from the most popular genre in order to achieve more informative categorizations. This table is included because it shows how reading trends changed throughout the year. The most notable trend is African American literature becoming the most popular genre from June to September. Beyond that, titles like ‘Harry Potter and the Sorcerer’s Stone’ and ‘The Queen’s Gambit’ were the most popular in their months.

Chart 1

As the times have changed, libraries have grown from spaces that simply house books into cultural hubs where materials of any medium can be shared. Within the dataset, there are over 30 unique categories of materials, from books and CDs to atlases and flash cards. We wanted to see the breakdown of how these materials were used by the general public, while simultaneously seeing how the closure of SPL locations due to the coronavirus pandemic affected checkout rates over the course of the year. We split the dataset into five coded categories: books, eBooks, audio-related materials (CDs, cassettes, audio books), video-related materials (DVDs, VHS tapes) and a catch-all “other” category.
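One way to express the five-category coding described above is a lookup table with a fallback to "Other", as in the sketch below. The material type codes shown (`BOOK`, `EBOOK`, `SOUNDDISC`, and so on) and the column names `MaterialType`, `CheckoutMonth`, and `Checkouts` are assumptions for illustration; the actual dataset has over 30 categories, and the project's own mapping may differ.

```python
from collections import Counter

# Assumed material type codes mapped to the five coded categories.
# Anything not listed here falls through to "Other".
FORMAT_GROUPS = {
    "BOOK": "Book",
    "EBOOK": "eBook",
    "AUDIOBOOK": "Audio",
    "SOUNDDISC": "Audio",
    "SOUNDCASS": "Audio",
    "VIDEODISC": "Video",
    "VIDEOCASS": "Video",
}


def format_group(material_type):
    """Map a raw material type to one of the five coded categories."""
    return FORMAT_GROUPS.get(material_type.strip().upper(), "Other")


def checkouts_by_month_and_format(rows):
    """Sum checkouts for each (month, format group) pair."""
    totals = Counter()
    for row in rows:
        key = (int(row["CheckoutMonth"]), format_group(row["MaterialType"]))
        totals[key] += int(row["Checkouts"])
    return totals


# Toy rows (made up) standing in for the 2020 subset.
toy_rows = [
    {"CheckoutMonth": "3", "MaterialType": "BOOK", "Checkouts": "5"},
    {"CheckoutMonth": "3", "MaterialType": "EBOOK", "Checkouts": "2"},
    {"CheckoutMonth": "4", "MaterialType": "EBOOK", "Checkouts": "4"},
    {"CheckoutMonth": "4", "MaterialType": "ATLAS", "Checkouts": "1"},
    {"CheckoutMonth": "4", "MaterialType": "SOUNDDISC", "Checkouts": "3"},
]
totals = checkouts_by_month_and_format(toy_rows)
print(sorted(totals.items()))
```

Keeping the mapping in one dictionary makes the coding decisions easy to audit, which matters because the "Other" share depends entirely on which codes are listed.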

When charting the checkout rates over the course of 2020, you’re able to see exactly how hard SPL was hit by COVID; overall checkouts were cut in half, going from over 200,000 in March to just under 100,000 in April. The split in format is also pretty distinct; eBooks jump in share slightly and audio checkouts shrink slightly, but physical books and videos are decimated until August, when curbside pickup opened early in the month. Overall patronage never recovered from its early highs in 2020, but the continued use of eBooks and digital audio book services showed that Seattleites still wanted to engage with the materials available to them during quarantine. Further exploration is needed to explain why the video category consistently ranked so low; is it simply because SPL only did physical checkouts of videos, or are its streaming services less appealing than something like Netflix or Hulu, if they’re present at all?

Chart 2

We chose to generate this chart because it displays the books that Seattle residents checked out most often from the Seattle Public Libraries in 2020. We chose to exclude all titles classified as classic literature because those titles skewed our results. People often choose to check out classic literature from public libraries because public libraries’ collections skew towards older publications and it is more convenient to borrow these books (that will only be read once) rather than purchase them. Such titles include Jane Eyre, Treasure Island, Pride and Prejudice and Frankenstein.
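The ranking behind a chart like this can be sketched as a filter followed by a count. The example below assumes a title counts as classic when its `Subjects` field contains the phrase "Classic Literature"; that rule, the column names, and the toy rows are all assumptions, and the project may have identified classics differently.

```python
from collections import Counter


def is_classic(row):
    """Assumed rule: a 'Classic Literature' subject marks a classic."""
    return "classic literature" in row["Subjects"].lower()


def top_titles(rows, n=10):
    """Return the n most checked-out titles, skipping classics."""
    totals = Counter()
    for row in rows:
        if is_classic(row):
            continue
        totals[row["Title"].strip()] += int(row["Checkouts"])
    return totals.most_common(n)


# Toy rows with made-up counts; placeholder titles stand in for real ones.
toy_rows = [
    {"Title": "Jane Eyre", "Subjects": "Classic Literature, Fiction", "Checkouts": "50"},
    {"Title": "Title A", "Subjects": "Nonfiction", "Checkouts": "30"},
    {"Title": "Title B", "Subjects": "Fiction", "Checkouts": "20"},
    {"Title": "Title A", "Subjects": "Nonfiction", "Checkouts": "15"},
]
ranked = top_titles(toy_rows, n=2)
print(ranked)
```

Because the same title can appear in several rows (different months, formats, or editions), summing by title before ranking matters; abridged and unabridged listings would still count separately unless titles are normalized first.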

Of all non-classic titles in 2020, the text that Seattleites were most interested in reading was Michelle Obama’s memoir, Becoming, followed by Ijeoma Oluo’s So You Want to Talk About Race and Tara Westover’s memoir Educated. Coinciding with protests in 2020 following the death of George Floyd at the hands of police, 40% of the top ten list speaks directly to race in America; notably, aside from Obama and Oluo’s aforementioned texts, Ibram X. Kendi and Robin Wall Kimmerer are the only other authors of color on this list, with How to Be an Antiracist and Braiding Sweetgrass.

This chart can serve as a proof of concept for what we hope to achieve with our analysis, in that it shows a general consensus of the city’s interests, beliefs and general political ideologies. Checkout trends generally follow what’s nationally interesting; at the end of 2020, Becoming had sat on the New York Times bestseller list for 88 weeks, Glennon Doyle’s Untamed for 39 weeks, and Braiding Sweetgrass for 34. The presence of so many texts surrounding social justice tracks with the left-leaning nature of the city and the state that it’s in, as does the inclusion of authors of color and authors within the LGBTQ+ community.

Chart 3

We chose this chart to analyze the public’s interest in specific genres over the course of the year. A top 6 was chosen as a compromise between overcrowding the graph and not tracking enough genres to highlight any insights. As we can see, the most popular genres stay fairly regular relative to one another, with the exception of Juvenile Fiction (YA novels), which began being checked out at an incredible rate as quarantine set in. A likely explanation for this finding is that when school was canceled for middle and high schoolers, many of them chose to check out their favorite books to read while stuck at home. Young Adult is a very popular genre with this age group, which would explain the sudden boom in checkouts.

Many of the books’ primary genres were labeled as fiction/nonfiction, which would’ve been the most popular “genres” by far had they been included on the graph. Because splitting up books into these two labels is vague and inconsistent (not every book had this distinction, and some had them elsewhere in their list of subjects), we chose to replace these labels with their secondary, more specific genre. “Literature” refers to the books that were labeled as fiction but had no other listed genre.
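The relabeling rule described above can be sketched as a small function that skips the broad labels and falls back to “Literature” for fiction with no other genre. Several details here are assumptions: that `Subjects` is a comma-separated string, that a nonfiction-only title keeps the label "Nonfiction", and that rows with no subjects are skipped. The `top_genres` helper and the toy rows are hypothetical.

```python
from collections import Counter

BROAD_LABELS = {"fiction", "nonfiction"}


def primary_genre(subjects):
    """Pick the first specific genre from a comma-separated subject list."""
    labels = [s.strip() for s in subjects.split(",") if s.strip()]
    if not labels:
        return None  # no subject headings at all
    for label in labels:
        if label.lower() not in BROAD_LABELS:
            return label
    # Only broad labels remain: fiction-only becomes "Literature";
    # keeping "Nonfiction" as-is is an assumption.
    lowered = {label.lower() for label in labels}
    return "Literature" if "fiction" in lowered else labels[0]


def top_genres(rows, n=6):
    """Return the n genres with the most checkouts."""
    totals = Counter()
    for row in rows:
        genre = primary_genre(row["Subjects"])
        if genre is not None:
            totals[genre] += int(row["Checkouts"])
    return [genre for genre, _ in totals.most_common(n)]


# Toy rows (made up) illustrating each branch of the rule.
toy_rows = [
    {"Subjects": "Fiction, Fantasy", "Checkouts": "10"},
    {"Subjects": "Fiction", "Checkouts": "7"},
    {"Subjects": "Nonfiction, Cooking", "Checkouts": "3"},
    {"Subjects": "Fantasy, Young Adult Fiction", "Checkouts": "5"},
    {"Subjects": "", "Checkouts": "100"},
]
genres = top_genres(toy_rows, n=2)
print(genres)
```

Scanning the whole subject list rather than only the first two entries handles the case mentioned above where the fiction/nonfiction label appears elsewhere in a book's subjects.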