Using the Crossref REST API. Part 6 (with NLS)



Christine Cormack Wood, Ulf Kronman – 2017 October 06

In APIs, Identifiers, Interoperability, API Case Study

Continuing our blog series highlighting the uses of Crossref metadata, we talked to Ulf Kronman, Bibliometric Analyst at the National Library of Sweden, about the work they’re doing and how they’re using our REST API as part of their workflow.

Introducing the National Library of Sweden (NLS)

The NLS is a state agency with a staff of about 320 and its main offices in Stockholm. Its primary duty is to preserve Swedish cultural heritage by collecting everything printed in Sweden, which it has been doing since 1661. Nowadays the library also collects Swedish TV and radio programs, movies, videos, music, and computer games.

The National Library coordinates services and programs for all publicly funded libraries in Sweden and runs the national library catalogue system Libris and the national database for Swedish scholarly output, SwePub. The library also runs the Bibsam consortium, negotiating national subscription licenses and open access publishing agreements with publishers.

Images left to right: External and internal view of the National Library of Sweden, and Ulf Kronman, Bibliometric Analyst at NLS.


What problem is your service trying to solve?

The metadata in the national scholarly publication database SwePub is harvested from the Swedish universities’ local publication systems, where data is often entered manually by librarians and researchers. This means that the metadata can contain many omissions, synonyms, spelling variants and errors. Using Crossref, we can enhance and correct the metadata delivered to us, provided we have a correct DOI.

Can you tell us how you are using Crossref metadata at the National Library of Sweden?

The Crossref metadata is presently used in two projects: Open APC Sweden, and our local analysis database for publication statistics, which is used in negotiations with publishers.

Open APC Sweden is a pilot project to gather data on open access publication costs (APCs, Article Processing Charges) from Swedish universities. The project is modelled on the German Bielefeld University Open APC initiative, which is part of the INTACT project. After APC data has been delivered to the APC system, scripts are run against the Crossref API to fetch information about publishers and journals. A description of Open APC Sweden can be found here.

When building our local analysis database for publisher statistics, we download data from the SwePub database, look up each record’s DOI in the Crossref API to add correct ISSN and publisher data, and then match the records against a list of publisher serials. In this way, we can see how much Swedish researchers have published with a given publisher and use this data when negotiating conditions for open access publishing with that publisher.
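The matching step could look something like the sketch below. It is not the NLS code: the function names, the record layout (a dict with an "issns" list) and the sample values are assumptions made for illustration. It only shows the idea of comparing a record's ISSNs against a publisher's serial list after stripping formatting differences.

```python
def normalise_issn(issn):
    """Strip hyphens and spaces and uppercase, so '1234-567x' equals '1234567X'."""
    return issn.replace("-", "").replace(" ", "").upper()


def matches_publisher(record_issns, publisher_serials):
    """Return True if any ISSN on the record is in the publisher's serial list."""
    wanted = {normalise_issn(issn) for issn in publisher_serials}
    return any(normalise_issn(issn) in wanted for issn in record_issns)


def count_publications(records, publisher_serials):
    """Count records (hypothetical dicts with an 'issns' list) that match."""
    return sum(
        1 for record in records
        if matches_publisher(record.get("issns", []), publisher_serials)
    )
```

Normalising before comparing matters because manually entered ISSNs may differ in hyphenation or case from the ones in a publisher's title list.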

What metadata values do you pull from the API?

In Open APC Sweden, a Python script supplied by staff at Bielefeld University is used to pull metadata about publisher and journal names and ISSNs from the Crossref API. The result is entered into an enriched version of the APC data files delivered by the universities, and statistics can then be calculated on the result using an R script. The result can be seen here.

In the local analysis database, a modified copy of the Bielefeld Python script is used to add the same metadata to the records before matching them against publisher serial ISSNs.
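For readers who have not used the API, here is a minimal sketch of this kind of lookup. It is not the Bielefeld script. It assumes the public `https://api.crossref.org/works/{DOI}` route, whose JSON response carries the record under `message` with `publisher`, `container-title` and `ISSN` fields. The function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def fetch_work(doi, timeout=30):
    """Fetch the Crossref record for one DOI and return its 'message' object."""
    url = CROSSREF_WORKS_URL + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["message"]


def extract_journal_metadata(message):
    """Pick publisher, journal title and ISSNs out of a Crossref work record.

    Missing fields become empty values, since not every record has all three.
    """
    titles = message.get("container-title") or []
    return {
        "publisher": message.get("publisher", ""),
        "journal": titles[0] if titles else "",
        "issns": list(message.get("ISSN") or []),
    }
```

A caller would combine the two, for example `extract_journal_metadata(fetch_work(doi))`, and write the resulting values into the enriched data file. Keeping the network call separate from the field extraction makes the extraction easy to check against saved responses.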

Have you built your own interface to extract this data?

In Open APC Sweden, the Python script is developed and maintained at Bielefeld University, and an exact copy is run in the Swedish project.

In the local analysis system, the Python script is somewhat modified to suit the special demands of this system.

But sometimes it is very convenient just to use the main DOI lookup to do a manual check-up of problematic records.

How often do you extract/query data?

In Open APC Sweden, usually about two to three times a month, when new datasets are delivered from the universities. In the local analysis database, lookups are usually done daily as development of the database continues.

What do you do with the metadata once it’s pulled from the API?

In Open APC Sweden, the metadata goes into the APC data files for statistical processing. In the local analysis database, the metadata is used to match records against publisher journal ISSNs.

What plans do you have for the future?

For Open APC Sweden, I would like to build a database system to make the project more scalable than just working with flat data files.

With both the SwePub system and the local analysis system, we are now using the new service oaDOI and their API to look up metadata about the open access status of the publications to enrich our local systems.

What else would you like to see the REST API offer?

In the process of normalising the publishers’ names, the names returned are sometimes at too high or too generic a level to be used to generate good statistics. For instance, Springer Nature is sometimes returned as Springer Nature, sometimes as Springer Science + Business Media and sometimes as Nature Publishing Group. Something similar applies to Taylor & Francis, where the parent company Informa UK Limited is returned instead of the publishing subsidiary. One thing to wish for here is that we could agree on some kind of normalisation of the publishers’ names and that Crossref could return this as a supplement to the present metadata.
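Until such a normalised name is available, a client can work around the problem with its own alias table. The sketch below is a hypothetical illustration and does not describe an existing Crossref feature or the NLS implementation. The alias entries are taken from the examples above. Mapping Informa UK Limited to Taylor & Francis is an assumption that only holds if no other Informa imprint appears in the data.

```python
# Hypothetical alias table: lowercased name variant -> preferred publisher name.
PUBLISHER_ALIASES = {
    "springer nature": "Springer Nature",
    "springer science + business media": "Springer Nature",
    "nature publishing group": "Springer Nature",
    # Assumption: every Informa record in the data set is a Taylor & Francis title.
    "informa uk limited": "Taylor & Francis",
}


def normalise_publisher(name):
    """Map a returned publisher name to its preferred form, if one is known.

    Unknown names are returned unchanged apart from trimmed whitespace.
    """
    key = " ".join(name.lower().split())  # lowercase and collapse whitespace
    return PUBLISHER_ALIASES.get(key, name.strip())
```

A table like this has to be maintained by hand, which is why a normalised name supplied by Crossref would be preferable.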

Thanks Ulf! If you would like to contribute a case study on the uses of Crossref Metadata APIs please contact the Community team.
