TECHNICAL DOCUMENTATION

Technical Report

This report presents a methodology for identifying dataset mentions in research publications across several citation databases and for evaluating how coverage differs among those databases.

What is the Issue?

Federal datasets are widely used in research, but tracking their impact requires identifying where and how they are cited in publications. Tools like the Food and Agricultural Research Data Usage Dashboard rely on these mentions to measure reach, which in turn informs future data investments.

How to Use This Report

This report is preliminary. It provides an initial approach to characterizing mentions of food and agriculture research datasets in research papers indexed by three citation databases: Scopus, OpenAlex, and Dimensions. It includes procedures for:

  • Identifying publication coverage across citation databases
  • Cross-referencing publications between datasets
  • Analyzing research themes and institutional representation
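
As an illustration of the first two procedures, the sketch below cross-references publications by DOI and reports which sources list each one. It is a minimal example under stated assumptions, not the report's actual pipeline: the function names (normalize_doi, coverage_by_doi, single_source_dois), the list of DOI prefixes, and the sample DOIs are all hypothetical, and real exports from Scopus, OpenAlex, or Dimensions would need their own loading and cleaning steps.

```python
from collections import defaultdict

# Assumed prefixes under which a DOI may be stored; adjust to the actual exports.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Lowercase a DOI and strip a leading URL or 'doi:' prefix."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def coverage_by_doi(records_by_source):
    """Map each normalized DOI to the set of sources that list it."""
    coverage = defaultdict(set)
    for source, dois in records_by_source.items():
        for raw in dois:
            if raw and raw.strip():  # skip missing or blank DOIs
                coverage[normalize_doi(raw)].add(source)
    return dict(coverage)


def single_source_dois(coverage):
    """Return the DOIs that appear in exactly one source, sorted."""
    return sorted(doi for doi, sources in coverage.items() if len(sources) == 1)


# Made-up DOIs for illustration only; these are not real publications.
sample = {
    "scopus": ["10.1000/A1", "https://doi.org/10.1000/b2"],
    "openalex": ["10.1000/a1", "10.1000/c3"],
    "dimensions": ["doi:10.1000/a1", "10.1000/B2"],
}

coverage = coverage_by_doi(sample)
only_one = single_source_dois(coverage)

for doi in sorted(coverage):
    print(doi, sorted(coverage[doi]))
print("DOIs found in a single source:", only_one)
```

Normalizing before comparison matters because the same DOI can be stored with different casing or as a full URL in different sources; without it, overlap between databases would be understated.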

Stakeholder Applications

Research Evaluation
  • Track and evaluate the use of public datasets in academic research
  • Improve methods for measuring research impact and dataset reach

Strategic Planning
  • Understand coverage differences across citation databases
  • Inform decisions about data preservation, access, and investment

Note: The methods described can also be applied to other citation databases, such as Web of Science, Crossref, and Microsoft Academic.

Report Features

The report features these reusable components:

  • Code Repository: Data cleaning and standardization tools
  • Data Schemas: Structured schemas by citation database
  • Standardized Institution Tables: Institution tables using IPEDS identifiers

Report Highlights

  • Citation overlap across databases is limited; many DOIs appear in only one source.
  • Databases recover different sets of publications depending on the dataset.
  • Topical coverage varies by dataset, reflecting diverse policy and disciplinary uses.
  • Relying on a single database can undercount usage and research scope.

Target Audience

This report is designed for:

  • Researchers
  • Data scientists
  • Program officers
  • Academics
  • Policymakers
  • Local and federal government officials

How to Cite Our Work

Please use the following citation when referencing our methodology.

Chenarides, L., Bryan, C., Ladislau, R., & Lane, J. (2025). Methodology for comparing citation database coverage of dataset usage. Available at: https://laurenchenarides.github.io/compare_scopus_openalex_report/report.html