New release of OSDG Community dataset

January 1, 2022

The OSDG team is proud to present the second edition of the OSDG Community Dataset (OSDG-CD), the direct contribution of hundreds of citizen scientists who took part in our citizen science exercise – the OSDG Community platform.

The dataset is located in our repository at Zenodo under Version 2022.01.

You can find out more about the methodology behind the dataset in our initial announcement of OSDG-CD or at the repository.

We want to extend our gratitude to each volunteer who joined this community effort. Thanks to you, researchers across the globe can derive new insights into the nature of the Sustainable Development Goals (SDGs) using ontology-based or machine learning (ML) approaches.

How are you using the OSDG dataset?

We continue to welcome the contributions and discoveries you make using the dataset. Have you used the dataset in a research paper or a blog post? Perhaps you have built your own ML model? Maybe you have feedback on what could be improved? Please get in touch with us and share your ideas.

We also invite all data science enthusiasts to visit our GitHub repository to see examples of text classification in practice, or to contribute to the OSDG Labelling Tool. If you have any questions, just reach out to our team.

We aim to keep updating the OSDG-CD every quarter, provided there are notable changes in the number of labels. If you would like to join the citizen science exercise and contribute directly to OSDG-CD with your knowledge of the SDGs, simply fill out the registration form – we’re waiting for you!

If you plan to use the dataset in a research paper, please cite it as follows:

OSDG, UNDP IICPSD SDG AI Lab, & PPMI. (2022). OSDG Community Dataset (OSDG-CD) (2022.01) [Data set]. Zenodo.