Show how public data are being used in science so that the government can make more transparent public investments. We use machine learning and natural language processing to enable government agencies and researchers to quickly find the information they need.
The platform provides information about the usage of data assets. It describes how datasets identified by federal agencies have been used in scientific research. It uses machine learning algorithms to search more than 90 million documents and find how datasets are cited, in which publications, and which topics they are used to study.
The Democratizing Data project is inspired by the Foundations for Evidence-Based Policymaking Act of 2018. The platform is designed to facilitate collaboration between federal agencies and the public in order to understand how government data assets are used.
The goal is to develop a community of practice through workshops, webinars, and direct engagement with the user community. The user community built the initial models through a Kaggle competition. A Show Us the Data conference then brought together researchers, academic institutions, chief data officers, and publishers, and subsequent conferences have built on the ideas. More are planned.
Dr. Ray Hart: used the National Assessment of Educational Progress (NCES) to examine the effectiveness of large urban schools in overcoming poverty-related challenges. His report, "Mirrors or Windows," challenges myths and provides fresh perspectives on urban education.
Dr. Becca Jablonski: used the Agricultural Resource Management Survey (USDA) to study the gap in support for small producers of artisan and locally produced food, relative to their large-scale counterparts. She explores sustainability policies and shares insights for researchers.
Dr. Chen Zhen: used retail scanner data (from USDA) to construct panel price indexes to understand how changes in food prices can affect consumer behavior and ultimately public health outcomes.
Dr. Tiffany Oliver: used the Survey of Earned Doctorates (NCSES) to explore the journeys of the Black women who earn STEM PhDs each year. She discusses her research on this demographic's educational history and postgraduation plans.
Dr. Janet Currie: used Vital Statistics data and the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) to investigate the effectiveness of government programs and maternal participation in improving children's health and well-being.
Dr. Julia Lane: used UMETRICS and the Survey of Earned Doctorates (NCSES) to explore how research experience influences career choices in STEM, focusing on the impact of gender and race on doctoral education.
The federal agencies have made a number of presentations at the Federal Committee on Survey Methodology and the Council of Professional Associations on Federal Statistics.
There have also been a number of webinars and conferences on how data contribute to the value of science.
The platform is expanding to include more agencies in 2024. There will be a special issue of the Harvard Data Science Review in March 2024, with almost 20 articles and vision pieces from well-known academics and government agency experts. There will be more webinars and podcasts.
Contact us here to stay informed.
The Democratizing Data initiative is working with a number of government agencies to ensure that data are more effectively used for public decision-making.