We’ve recently completed a project on Data Discoverability for the Scottish Government, which meshes very nicely with a best practice guide on search engine optimisation that the Geospatial Commission have just published. There are three facets to the work we’ve done:
Search Engine Optimisation
The basic idea is that data needs to be discoverable via search engines, because that is where most searches start, rather than with visits to metadata portals. The portals are, however, how search engines find and display that data, so they can benefit from the same search engine optimisation techniques as any other website.
For the Scottish Government project we worked with experts from JNCC to improve the “SEO” of their metadata portal, spatialdata.gov.scot. This uses GeoNetwork opensource, with a number of changes that make it easier to analyse how people are using the portal. These include reporting on the number of people clicking on WMS or WFS URLs, or other data downloads, as well as recording overall site visitor metrics. Consequently, Scottish Government can find out which datasets are actually being used, and what search terms led people to the site.
Furthermore, structured data is increasingly being used to inform the way that search results are displayed, and indeed how search engines identify what a page is about. That’s how you get nicely formatted recipe cards if you search Google for “apple pie recipe”, for instance!
Google adds any pages it recognises as datasets to its Dataset Search tool, which contains information about portals, licensing, citations and so on. This helps users differentiate canonical, or “source”, datasets from other web pages where the dataset is merely mentioned.
We’ve incorporated this structured data into the Gemini 2.3 plugin for GeoNetwork, by mapping elements from the metadata standard to the schema.org Dataset type. This means that Gemini 2.3 records created with our plugin automatically include structured data when output from GeoNetwork.
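As a rough illustration of what such a mapping involves (this is not the plugin’s actual code), the Python sketch below turns a simplified metadata record into a schema.org Dataset description embedded as JSON-LD. The record keys (title, abstract, keywords, licence_url, organisation), the function names and the example values are all assumptions made for the example; a real Gemini 2.3 record is XML and carries many more elements.

```python
import json


def to_schema_org_dataset(record: dict) -> dict:
    """Map a simplified (hypothetical) metadata record to a schema.org Dataset."""
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["abstract"],
    }
    # Optional elements are only emitted when the record provides them.
    if record.get("keywords"):
        dataset["keywords"] = list(record["keywords"])
    if record.get("licence_url"):
        dataset["license"] = record["licence_url"]
    if record.get("organisation"):
        dataset["creator"] = {
            "@type": "Organization",
            "name": record["organisation"],
        }
    return dataset


def to_json_ld_script(record: dict) -> str:
    """Wrap the Dataset description in the script tag that search engines read."""
    payload = json.dumps(to_schema_org_dataset(record), indent=2)
    # Escape "</" so text in the record cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Illustrative record only; the organisation and values are made up.
example_record = {
    "title": "Conservation Areas - Example Council",
    "abstract": "Boundaries of designated conservation areas within the council area.",
    "keywords": ["conservation areas", "planning"],
    "licence_url": "https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/",
    "organisation": "Example Council",
}

if __name__ == "__main__":
    print(to_json_ld_script(example_record))
```

The resulting script element would sit in the HTML of the record’s landing page, which is where search engines look for structured data.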
Finally, when search engines start to index your metadata portal, the quality of the metadata will directly affect the ranking of your datasets in search results. In a throw-back to Goldilocks and the Three Bears, elements such as page titles or descriptions, both of which derive from the metadata, cannot be “too long” or “too short”. It’s also a bad idea to have metadata records with duplicated titles, as search engines tend to assume this is an attempt to game the system. However, in a multi-agency portal such as the Scottish Spatial Data Infrastructure, several different organisations may legitimately have datasets of, for example, Conservation Areas, so we need naming conventions to ensure users know which dataset is which, and search engines don’t think we’re trying to cheat!
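To make the “too long” or “too short” idea concrete, here is a minimal Python sketch of a length check. The character limits are placeholders chosen for illustration, not the thresholds recommended to the Scottish Government, and the function name is our own invention.

```python
# Assumed limits (min, max) in characters; illustrative values only.
TITLE_LIMITS = (20, 70)
DESCRIPTION_LIMITS = (50, 160)


def check_length(text: str, min_chars: int, max_chars: int) -> str:
    """Classify text as 'too short', 'too long' or 'ok' by character count."""
    length = len(text.strip())  # ignore leading and trailing whitespace
    if length < min_chars:
        return "too short"
    if length > max_chars:
        return "too long"
    return "ok"


if __name__ == "__main__":
    for title in ["Trees", "Conservation Areas - Example Council"]:
        print(title, "->", check_length(title, *TITLE_LIMITS))
```

Keeping the limits in one place means they can be adjusted without touching the check itself when the guidance changes.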
Here, we get into the realm of data custodians rather than portal maintainers, but we’ve cooked up some reports for the Scottish Government to show which records don’t meet the data quality standards for titles and descriptions, or perhaps don’t have all the right elements. We’ve also created reports showing records with duplicate titles. These are delivered via an interactive dashboard that links directly to the metadata catalogue, and are kept up to date automatically.
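The duplicate-title report boils down to grouping records by title. The Python sketch below shows one way that could work, assuming each record is a dictionary with hypothetical id and title keys; it is not how the dashboard itself is built. Titles are compared after collapsing whitespace and ignoring case, which is our own choice for the example.

```python
from collections import defaultdict


def find_duplicate_titles(records: list[dict]) -> dict[str, list[str]]:
    """Return normalised titles shared by more than one record, with their ids."""
    ids_by_title = defaultdict(list)
    for record in records:
        # Collapse runs of whitespace and ignore case before comparing titles.
        key = " ".join(record["title"].split()).casefold()
        ids_by_title[key].append(record["id"])
    return {title: ids for title, ids in ids_by_title.items() if len(ids) > 1}


# Made-up records for illustration.
example_records = [
    {"id": "a1", "title": "Conservation Areas"},
    {"id": "b2", "title": "conservation  areas "},
    {"id": "c3", "title": "Conservation Areas - Example Council"},
]

if __name__ == "__main__":
    for title, ids in find_duplicate_titles(example_records).items():
        print(f"{title}: {', '.join(ids)}")
```

Only the first two records are reported here, because the third already follows a naming convention that includes the organisation.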
We recently held a joint webinar with Scottish Government and the Improvement Service to introduce the new recommendations and changes in GeoNetwork to the users, enabling them to start creating high-quality search-engine-friendly metadata.
We’ll shortly have further GeoNetwork enhancements to talk about, so watch this space!