ABOUT ME



I am an Assistant Professor of information science in the College of Computing and Informatics at Drexel University. Prior to joining Drexel in the fall of 2016, I was a postdoctoral researcher and faculty instructor at the University of California, Berkeley, where I taught for the Master of Information and Data Science program in the School of Information. Although my Ph.D. is in mathematical science, and I hold an MS in applied mathematics and a BA in physics, my research and teaching interests center on data science, computational social science, natural language processing, mathematics, machine learning, scientific programming, and algorithm design.

RESEARCH



With the ever-growing presence of data in modern society, it is increasingly important to analyze textual data in meaningful and transparent ways. As a mathematical linguist, I aim to develop and test mathematical and physical frameworks for understanding the social processes that generate lexical data. This research focus requires acute mathematical knowledge, collaboration across domains, and, because textual data frequently comes in large volumes, extensive computational skills for working in distributed cluster computing environments. My contributions to date include, but are not limited to, the development of novel techniques for the uninformed extraction of meaningful phrases from large text sources, as well as work that challenges a major current theory of language formation dating back 15 years. For more detailed information about my research, please see my current publications below.

PUBLICATIONS


Abstract

The outbreak and frequency of violent protest activity since 2010 have been a cause for alarm among policy makers and the public at large and have renewed interest in the study of violent forms of protest action. Until recently, the study of violent protest action, and indeed protest action in general, has been limited...

Abstract

Herbert Simon’s classic rich-gets-richer model is one of the simplest empirically supported mechanisms capable of generating heavy-tail size distributions for complex systems. Simon argued analytically that a population of flavored elements growing by either adding a novel element or randomly replicating an existing one would afford a distribution of group sizes with a power-law tail. Here, we show...

Abstract

In this article we present a novel algorithm for the task of comprehensively segmenting texts into MWEs. With the basis for this algorithm (referred to as text partitioning) being recently developed, these results constitute its first performance-evaluated application to a natural language processing task. A differentiating feature of this single-parameter model is its focus on gap (i.e., punctuation) crossings...

Abstract

The task of text segmentation may be undertaken at many levels in text analysis—paragraphs, sentences, words, or even letters. Here, we focus on a relatively fine scale of segmentation, hypothesizing it to be in accord with a stochastic model of language generation, as the smallest scale where independent units of meaning are produced. Our goals in this letter include the development of methods...

Abstract

The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, bearing profound implications for our understanding of human behavior. Given the growing assortment of sentiment measuring instruments, comparisons between them are evidently required. Here, we perform detailed tests...

Abstract

We propose and develop a Lexicocalorimeter: an online, interactive instrument for measuring the "caloric content" of social media and other large-scale texts. We do so by constructing extensive yet improvable tables of food and activity related phrases, and respectively assigning them with sourced estimates of caloric intake and expenditure...

Abstract

Background: Twitter has become the "wild-west" of marketing and promotional strategies for advertisement agencies. Electronic cigarettes have been heavily marketed across Twitter feeds, offering discounts, "kid-friendly" flavors, algorithmically generated false testimonials, and free samples. Methods: All electronic cigarette keyword related tweets from...

Abstract

In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we...

Abstract

Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. Due to the increasing popularity of Twitter, its perceived potential for exerting social influence has led to the rise of a diverse community of automatons, commonly referred to as bots. These inorganic and semi-organic Twitter entities can range...

Abstract

We demonstrate that the concerns expressed by Garcia et al. are misplaced, due to (1) a misreading of our findings; (2) a widespread failure to examine and present words in support of asserted summary quantities based on word usage frequencies; and (3) a range of misconceptions about word usage frequency, word rank, and expert-constructed word lists. In particular...

Abstract

Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf’s law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this “law” of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora since the late 1990s...

Abstract

With Zipf’s law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show...

Abstract

Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and...

Abstract

Profiting from the emergence of web-scale social data sets, numerous recent studies have systematically explored human mobility patterns over large populations and large time scales. Relatively little attention, however, has been paid to mobility and activity over smaller time-scales, such as a day. Here, we use Twitter to identify people’s frequently visited locations along with their...

Abstract

We propose a modification of a NALM-based 2R regenerator of phase-encoded signals which operates at considerably lower input powers than was studied earlier. Our modification consists of replacing the core-matched and lossless fiber coupler in the NALM by a coupler with a propagation constant mismatch and loss asymmetrically distributed between...

Abstract

We explore the potential of the nonlinear amplifying loop mirror (NALM)-based phase-preserving 2R (reamplification and reshaping) regenerator for simultaneous regeneration of multiple wavelength-division-multiplexed (WDM) channels. While not considering nonlinear multi-channel propagation, we address two issues of the phase-preserving NALM that...

TEACHING



I currently teach and develop coursework in data science for Drexel's Department of Information Science at both the graduate and undergraduate levels. My past teaching has taken place both in traditional environments and under the flipped classroom model, where in-class live sessions are devoted to exercises, projects, and discussions. I have taught and lectured on a range of subjects, from undergraduate algebra and calculus to research topics in complex systems, as well as machine learning with a specific focus on scalability. Outside of the classroom, I mentor a group of research students interested in machine learning and natural language processing, who come from a variety of backgrounds and educational levels.

CONTACT