Welcome to my information science doctoral program portfolio. Connect with me through E-mail, LinkedIn, and Goodreads. Below is my blog, where you can read about recent conferences, workshops, publications, fellowships, and other news items.
On Friday, January 24th, the Metadata Research Center and Drexel CCI hosted the LEADS Forum, a full-day workshop to celebrate the research outputs from two years of Library Education and Data Science for the National Digital Platform (LEADS-4-NDP) data science fellows, and hear from advisory board members, mentors, early-to-mid-career professionals, and special guests from OCLC and the Library of Congress.
The early-to-mid-career panel contributed some great ideas for how we can expand the scope of the LEADS program. The breakout sessions brainstormed which skill sets should be emphasized in future iterations of LEADS, and how participants might envision a model of LEADS in which doctoral students and early-to-mid-career professionals work together. We also heard from special guest Laurie Allen, of the Digital Strategy Directorate at the Library of Congress. Laurie shared news about exciting projects and opportunities at LC Labs.
LEADS PI Jane Greenberg and I took a break during lunch to catch a quick photo together.
Comprehensive Exams ✥
On Thursday, October 24th, I passed the comprehensive written and oral exams and advanced to doctoral candidacy. My committee consisted of Dr. Alex Poole, Dr. Weimao Ke, and my advisor, Dr. Jane Greenberg.
I am thrilled to have passed this major milestone, and look forward to the next stage: preparing to defend a doctoral proposal.
On Thursday, July 11th, I presented a paper with Temple University's Peter Logan at Digital Humanities 2019 in Utrecht, Netherlands.
Sam Grabus presenting at DH 2019, in Utrecht, demonstrating how the HIVE tool maps naturally-extracted keywords to controlled vocabulary terms.
The presentation, entitled "Knowledge Representation: Old, New, and Automated Indexing," shared comparative topic relevance results from automatically indexing 19th-century Encyclopedia Britannica entries with two controlled vocabularies: a historical knowledge organization system developed by Ephraim Chambers, and the contemporary Library of Congress Subject Headings.
Exploring Utrecht ✥
I had the great privilege of exploring the magical city of Utrecht during my stay in the Netherlands.
Highlights included exploring a local bookstore, Broese boekverkopers, enjoying craft beer and mussels at Brouwerij Oudaen, going on a guided tour of Museum Speelklok, and wandering the gorgeous canals that run through the city.
A personal favorite was Ethiopisch Restaurant Sunshine, where I experienced incredible hospitality, vibrant conversation, and pride in sharing their culture with me. The owner brought me the very best soul food, served with traditional injera (a thin flatbread, similar to a pancake, that you use to scoop up your food). I was also delighted to try his tej, which is Ethiopian mead, or honey wine. It was made with Ethiopian hops, so the taste was somewhere between beer and wine.
Boot Camp ✥
The LEADS-4-NDP 2019 fellowship program kicked off this week with a 3-day data science boot camp at Drexel University's College of Computing and Informatics. Eleven fellows from iSchools across the U.S. are paired with nine National Digital Platform partner sites for 10-week remote internships to address data science challenges.
Boot camp sessions included big data management; metadata; data pre-processing; data visualization; data mining and machine learning; large-scale and parallel computing; and automated data analytics tools. As part of the boot camp, LEADS mentors Jean Godby of OCLC and Richard Marciano of DCIC spoke about data science opportunities at their institutions, and LEADS mentors Steven Dilliplane (Academy of Natural Sciences) and Peter Logan (Temple University's Digital Scholarship Center) participated in boot camp activities.
Read more about the LEADS program HERE.
LEADS-4-NDP Fellowship Research ✥
I was very fortunate to be a part of the "Library Education and Data Science for the National Digital Platform" fellowship during the summer of 2018. The goal of the fellowship program is to prepare the next generation of LIS faculty to meaningfully integrate data science and LIS education. The program consisted of three parts: an online preparatory curriculum, a 3-day data science boot camp, and a 10-week virtual data science internship with a LEADS-4-NDP project partner.
I had the opportunity to work with Peter Logan, Academic Director of the Digital Scholarship Center (DSC) and Professor of English at Temple University. The DSC is working on "The 19th-Century Knowledge Project," which involves the digitization of four historical editions of the Encyclopedia Britannica (the 3rd, 7th, 9th, and 11th editions, spanning from 1797 to 1911).
Goals of the 19th-Century Knowledge Project:
- Long-term goal/broad data science question: Investigate how the specification of concepts changes over time across four historical editions of the Encyclopedia Britannica (1797-1911).
- Short term goal: Automated descriptive subject metadata creation for integration into the individual encyclopedia entry XML TEI headings
- Generating subject metadata for a very large corpus
- Linguistic idiosyncrasies of the primary source materials:
  - Alternate spellings
  - Obsolete & regional word usage
  - Misspellings, and more
- Need for controlled vocabulary terms for the sake of interoperability & metadata consistency
- Automated keyword extraction
- Transformation of keywords into controlled vocabulary terms
- Possibility of indexing with multiple controlled vocabularies
- **Large-scale automatic generation of metadata with Controlled Vocabulary terms**
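The step of transforming extracted keywords into controlled vocabulary terms can be illustrated with a minimal sketch. This is not the HIVE implementation or the project's actual code: the tiny `VOCABULARY` dictionary, its labels, and the helper names `normalize`, `build_index`, and `map_keywords` are all hypothetical, chosen only to show how alternate spellings might be folded onto a single preferred term through exact matching on normalized labels.

```python
import re
import unicodedata

# Hypothetical miniature vocabulary: preferred term -> alternate labels.
# Real controlled vocabularies are far larger and are not shaped like this.
VOCABULARY = {
    "Astronomy": ["astronomie"],
    "Chemistry": ["chymistry"],
    "Shipbuilding": ["ship-building", "naval architecture"],
}


def normalize(label):
    """Lowercase, strip accents, and collapse punctuation and whitespace."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def build_index(vocabulary):
    """Map every normalized label (preferred or alternate) to its preferred term."""
    index = {}
    for preferred, alternates in vocabulary.items():
        for label in [preferred, *alternates]:
            index[normalize(label)] = preferred
    return index


def map_keywords(keywords, index):
    """Return (matched preferred terms, keywords with no vocabulary match)."""
    matched, unmatched = [], []
    for keyword in keywords:
        term = index.get(normalize(keyword))
        if term is None:
            unmatched.append(keyword)
        elif term not in matched:
            matched.append(term)
    return matched, unmatched


if __name__ == "__main__":
    index = build_index(VOCABULARY)
    print(map_keywords(["Chymistry", "ship building", "phlogiston"], index))
```

Exact matching on normalized labels only catches variants that are already listed as alternate labels. Misspellings and obsolete usages that appear nowhere in the vocabulary end up in the unmatched list, which is one reason the linguistic idiosyncrasies listed above are a challenge.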
Moving Forward ✥
Research is moving ahead, beyond the scope of the fellowship, with HIVE (Helping Interdisciplinary Vocabulary Engineering), a linked vocabulary and automatic indexing tool hosted at Drexel University's Metadata Research Center. HIVE integrates controlled vocabularies in order to generate automatically indexed controlled vocabulary subject headings. Current iterative experimental research is refining the RAKE keyword extraction algorithm parameters (e.g., minimum word frequency) to determine which settings return the most relevant results for encyclopedia entries of various lengths. Encyclopedia entry word counts range from fewer than ten words to over 150,000 words, so the relevance of the results is not consistent unless the algorithm parameters are adjusted.
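To make the role of a minimum frequency parameter concrete, here is a simplified, self-contained sketch of RAKE-style keyword extraction. It is an illustration under stated assumptions, not the code used in HIVE or in this research: the short `STOPWORDS` set, the sample text, and the parameter names `min_frequency` and `max_words` are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; real runs use a fuller list).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in", "is",
    "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\"]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, min_frequency=1, max_words=4):
    """Score candidate phrases RAKE-style and return them best-first.

    Each word is scored as degree(word) / frequency(word); a phrase score is
    the sum of its word scores. Phrases that occur fewer than `min_frequency`
    times, or that are longer than `max_words`, are dropped from the output.
    """
    phrases = candidate_phrases(text)
    phrase_counts = Counter(phrases)
    frequency = Counter()
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    scores = {}
    for phrase, count in phrase_counts.items():
        if count < min_frequency or len(phrase) > max_words:
            continue
        scores[" ".join(phrase)] = sum(degree[w] / frequency[w] for w in phrase)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


SAMPLE = ("The steam engine was improved by Watt. "
          "Mills in the north adopted the steam engine.")

if __name__ == "__main__":
    print(rake(SAMPLE, min_frequency=1))  # every candidate phrase is kept
    print(rake(SAMPLE, min_frequency=2))  # only repeated phrases survive
```

In this sketch, a threshold of 2 keeps only phrases that repeat. For an entry of a few words, that would discard every candidate, while for a very long entry a threshold of 1 would admit a large number of one-off phrases. This is the kind of trade-off that tuning the parameter against entry length is meant to address.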
Presenting at Temple ✥
My fellowship and subsequent research on the 19th-Century Knowledge Project have generated much interest among the librarians and staff at Temple University's Paley Library. I am pleased to announce that I will be presenting my current progress next Monday, February 25th. Below you may view the slide deck, which includes the output from my fellowship work as well as current exploratory analysis and next steps.