Erjia Yan, Ph.D.

I am an assistant professor in the Department of Information Science, College of Computing and Informatics (CCI), at Drexel University. My research interests lie in informetrics and scientometrics, scholarly data mining and analysis, and knowledge diffusion studies. My research helps to provide a vital empirical foundation for many facets of scientific activity, such as the propagation of innovations, the promotion of better and more transparent science policymaking, and the development of an equitable and sustainable scientific workforce.

Recent Projects:

Open access journal impact

Closed access journals have a noticeable advantage in the social sciences, while open access journals perform well in medical and healthcare domains. After controlling for a journal’s rank and disciplinary differences, there are statistically more closed access journals in the top 10%, Quartile 1, and Quartile 2 categories as measured by CiteScore; in contrast, more open access journals in Quartile 4 gained scientific impact from 2011 to 2015. Considering dynamic and disciplinary trends in tandem, we find that more closed access journals in Social Sciences gained in impact, whereas in Biochemistry and Medicine, more open access journals experienced such gains.

R software mention and citation network analysis

We developed a software entity extraction method and identified 14,310 instances of R packages across the 13,684 PLoS journal papers mentioning or citing R. A paper-level co-mention network of these packages was visualized and analyzed. We found that the discipline and function of the packages can partly explain the largest clusters. The study offers the first large-scale analysis of R packages’ extensive use in scientific research. As such, it lays the foundation for future explorations of various roles played by software packages in the scientific enterprise.
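As a rough illustration of what a paper-level co-mention network involves, the sketch below counts, for each pair of packages, how many papers mention both. It is a minimal example under stated assumptions, not the extraction or network pipeline used in the study: the function name `co_mention_edges`, the input layout, and the toy paper-to-package data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_mention_edges(paper_packages):
    """Count, for each package pair, the number of papers mentioning both.

    paper_packages maps a paper identifier to the packages found in it
    (hypothetical input layout, assumed for this sketch).
    """
    edges = Counter()
    for packages in paper_packages.values():
        # De-duplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(packages)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy data, not taken from the study.
papers = {
    "paper1": ["ggplot2", "lme4", "vegan"],
    "paper2": ["ggplot2", "lme4"],
    "paper3": ["vegan"],
}
edges = co_mention_edges(papers)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting weighted edge list can be loaded into any network analysis tool for clustering and visualization.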

Research funding vs. citation impact

Using a regression model with Heckman bias correction, we find that funding has a positive, significant association with a paper’s citations in STEMM fields. Further analyses show that this association is magnified by the factors of multiple authorship and multiple institutions. For funded papers in STEM, multi-author and multi-institution papers tend to receive even more citations than single-authored and single-institution papers; however, funded papers in Medicine received less gain in citation impact when either factor is considered. Based on the finding that funding support has a stronger association with citation impact when it is treated as a binary variable than as a count variable, this study recommends the allocation of funding to researchers without active funding support, instead of giving awards to those with multiple funding supports at hand.

Data set mentions and citations

This study provides evidence of data set mentions and citations in multiple disciplines based on a content analysis of 600 publications in PLoS One. We find that data set mentions and citations varied greatly among disciplines in terms of how data sets were collected, referenced, and curated. While a majority of articles provided free access to data, formal ways of data attribution such as DOIs and data citations were used in a limited number of articles. In addition, data reuse took place in less than 30% of the publications that used data, suggesting that researchers are still inclined to create and use their own data sets, rather than reusing previously curated data. This study provides a comprehensive understanding of how data sets are used in science and helps institutions and publishers make useful data policies.

Word semantic change

We find that for the selected words in PubMed, overall, meanings are becoming more stable in the 2000s than they were in the 1980s and 1990s. At the topic level, the global distance of most topics is declining, suggesting that the words used to discuss these topics are stabilizing semantically. At the word level, this study identifies two different trends in word semantics, as measured by the aforementioned distance metrics: on the one hand, words can form clusters with their semantic neighbors, and these words, as a cluster, coevolve semantically; on the other hand, words can drift apart from their semantic neighbors while nonetheless stabilizing in the global context. In relating our work to language laws on semantic change, we find no overwhelming evidence to support either the law of parallel change or the law of conformity.

Domain-independent term extraction

This study developed an efficient, domain-independent term extraction method to extract disciplinary vocabularies from a large multidisciplinary corpus of PLoS ONE publications. It finds a power-law pattern in the frequency distributions of terms present in each discipline. The salient relationships amongst these vocabularies become apparent in application of a principal component analysis. For example, Mathematics and Computer and Information Sciences were found to have similar vocabulary use patterns along with Engineering and Physics; while Chemistry and the Social Sciences were found to exhibit contrasting vocabulary use patterns along with the Earth Sciences and Chemistry.

Faculty hiring network analysis

This study examines academic ranking and inequality in library and information science (LIS) using a faculty hiring network of 643 faculty members from 44 LIS schools in the United States. We study academic inequality using four distinct methods that include downward/upward placement, Lorenz curve, cliques, and egocentric networks of LIS schools and find that academic inequality exists in the LIS community. We show that the percentage of downward placement (68%) is much higher than that of upward placement (22%); meanwhile, 20% of the 30 LIS schools that have doctoral programs produced nearly 60% of all LIS faculty, with a Gini coefficient of 0.53. We also find cliques of highly ranked schools and a core/periphery structure that distinguishes LIS schools of different ranks.
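For readers unfamiliar with the Gini coefficient reported here, the sketch below shows one standard way to compute it from per-school placement counts, using the rank-weighted form that is equivalent to measuring the area under the Lorenz curve. It is a minimal illustration only: the function name `gini` and the toy counts are hypothetical, and the study's own data and exact computation are not reproduced.

```python
def gini(values):
    """Gini coefficient of non-negative counts (0 = perfectly equal).

    Uses the rank-weighted form on ascending-sorted data:
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with ranks i starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical placement counts per school, not taken from the study.
placements = [1, 2, 3, 10, 24]
print(round(gini(placements), 2))
```

A value of 0 means every school places the same number of faculty; values closer to 1 mean placements are concentrated in a few schools.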

Journal knowledge trading analysis

This study employs a set of trading-based indicators to assess sources’ trading impact. These indicators are applied to several time-sliced source-to-source citation networks that comprise 33,634 sources indexed in the Scopus database. Results show that several interdisciplinary sources, such as Nature, PLOS ONE, Proceedings of the National Academy of Sciences, and Science, and several specialty sources, such as Lancet, Lecture Notes in Computer Science, Journal of the American Chemical Society, Journal of Biological Chemistry, and New England Journal of Medicine, have demonstrated their marked importance in knowledge trading. Furthermore, this study also reveals that, overall, sources have established more trading partners, increased their trading volumes, broadened their trading areas, and diversified their trading contents over the 15 years from 1997 to 2011.

More research outputs can be found at Research.