It’s hard to ignore stories about ChatGPT at the moment, whether that’s people using it to write novels, letters and emails, or people worrying that it’s going to automate them out of a job. This is a perennial worry: in the 1400s, when Gutenberg introduced the printing press, an entire social class (scribes) found themselves out of work. In the 1920s, the steady automation of telephone exchanges put many young women out of a common entry-level job [1]. Since 2000, and particularly since COVID, concerns about automation have intensified. Now, however, the robots are coming for our GIS jobs too!
It’s true, ChatGPT can do things that seem like magic, and yes, you can get it to do crazy things like run Docker [2] or entire Linux virtual machines [3], create Python scripts to do GIS tasks [4] and so on. Who would blame us for feeling some slight trepidation about the future of our jobs in this case?
However, there is the case of a company inundated with complaints about a service that it doesn’t actually provide, but which ChatGPT says it does [5]; ChatGPT has been merrily providing code to run said “service”. In another, smaller case, a recent webinar [6] on integrating GPT within an FME workflow to do simple things like email or postcode lookups showed that the results weren’t 100% correct: where GPT couldn’t find the answer, it made a best guess. In some circumstances, a “best guess” may be fine, but an email address or postcode is either right or wrong; there is no ambiguity.
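One way to guard against a “best guess” is to treat anything the model returns as unverified until it has been checked against a source you trust. The sketch below is a minimal illustration of that idea, not a description of the webinar’s workflow: the `KNOWN_POSTCODES` set, the function names and the simplified postcode pattern are all assumptions made for the example, and a real check would use an authoritative address dataset.

```python
import re

# Hypothetical trusted lookup with sample values; in practice this would be
# loaded from an authoritative source such as an address gazetteer.
KNOWN_POSTCODES = {"SW1A 1AA", "EC1A 1BB"}

# Loose shape check for UK postcodes (an assumption for illustration,
# not the full official rule set).
POSTCODE_SHAPE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}$")


def normalise(postcode):
    """Upper-case the value and put a single space before the last three characters."""
    compact = re.sub(r"\s+", "", postcode).upper()
    return f"{compact[:-3]} {compact[-3:]}"


def check_suggested_postcode(suggestion):
    """Classify a model-suggested postcode as malformed, unverified or verified."""
    candidate = normalise(suggestion)
    if not POSTCODE_SHAPE.match(candidate):
        return "malformed"   # not even the right shape
    if candidate not in KNOWN_POSTCODES:
        return "unverified"  # plausible, but not confirmed by the trusted source
    return "verified"


for suggestion in ["sw1a1aa", "SW1A 2AA", "not a postcode"]:
    print(suggestion, "->", check_suggested_postcode(suggestion))
```

The distinction between “malformed” and “unverified” matters: a guess can look perfectly valid and still be wrong, so only a match against the trusted source counts as an answer.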
It’s worth taking a step back and thinking about what ChatGPT actually is, from a layperson’s perspective. It’s a chatbot built on a language model (GPT) that is really good at taking an input and generating a related output, based on the corpus it was trained on. In the case of ChatGPT, the actual training corpus is a bit of a mystery, but it appears to include Wikipedia, the web, and a bunch of public domain scans of books. So when we give ChatGPT a prompt, be that a question such as “Who voiced Uncle Bulgaria in the Wombles?” or “What sort of jobs is ChatGPT going to make obsolete?”, what we’re really asking is “Given the statistical distribution of words in the vast public corpus of (English) text, what words are most likely to follow the sequence ‘Who voiced Uncle Bulgaria in the Wombles?’”. That takes away a bit of the magic, doesn’t it?
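To make “what words are most likely to follow” concrete, here is a toy sketch that counts which word follows which in a tiny made-up corpus and then predicts the most frequent follower. This is a deliberately simplified illustration: GPT uses a large neural network over sub-word tokens rather than raw word counts, and the corpus and function names here are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def most_likely_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Tiny invented corpus; a real model is trained on vastly more text.
corpus = (
    "the wombles of wimbledon common are we "
    "the wombles pick up litter "
    "the wombles recycle rubbish"
)
model = train_bigram_model(corpus)

print(most_likely_next(model, "the"))       # wombles
print(most_likely_next(model, "postcode"))  # None
```

Note that the toy model has no notion of whether “wombles” is true, only that it is the likeliest continuation in the text it has seen, and it has nothing to say at all about words outside its corpus.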
If we consider ChatGPT as a powerful search engine, in which scams, fakes and SEO haven’t (yet) polluted the results to the point of meaninglessness, then it certainly has its uses. However, where the accuracy and provenance of the result are important, you wouldn’t necessarily trust the first result you came across in a web search, and similarly, you shouldn’t rely on ChatGPT for the definitive answer.
Furthermore, as an open source, metadata and standards advocate, I’m concerned by the black box approach being taken here. Without definitive information on the training corpus and workflow used to create ChatGPT, or any other Machine Learning model, it’s dangerous to rely on the results. Put simply, we all understand the importance of good metadata to help us pick the most appropriate dataset for our purpose, and the same should apply to AI.
Having said all that, there are circumstances where ChatGPT can be really useful. If you’re struggling for inspiration when writing any form of text, then sure, use ChatGPT to help you find the correct phrasing (note ChatGPT wasn’t used when writing this post, except to check who voiced Uncle Bulgaria).
You can also find all sorts of examples of people using it for code commenting and refactoring. There’s a plugin for QGIS [7], and there are multiple ways of integrating with PostgreSQL, including a clever Chrome extension [8] that allows you to test SQL code generated by ChatGPT on your actual database, and the pgvector extension, which you can use to prepare your data for classification with the OpenAI API that sits behind ChatGPT [9].
Another valuable use case for (Chat)GPT is the democratisation of Machine Learning tasks for GIS analytics. For example, you can use it for Natural Language Processing in keyword analysis or geoparsing, or text summarisation, simply by installing the GPT plugin for Google Sheets [10] and pointing it at your data.
In summary, (Chat)GPT and other generative models, such as DALL-E for images, are extremely powerful for specific use cases, but we should treat them with care and a degree of scepticism. Are they going to take our jobs? Probably not, but they might let us automate some of the boring, repetitive tasks and free us to focus on more interesting things!
At Astun, we’re actively looking at ways in which we can use Machine Learning in our data discoverability workflow. If this is something you’re interested in finding out more about, then get in touch!