GPT isn’t going to replace designers. But it’s coming for the users.
Every once in a while, we get hype about some new breakthrough in AI that will replace this job or that job. In the case of design, those tools have consistently flopped. Proponents usually dismiss the failures of their predecessors: this time it will be different, this time we have the technology.
The latest AI taking the discourse by storm — ChatGPT — is definitely very impressive technology, and capable of feats that its predecessors were not (such as generating jokes that, while not necessarily funny, resemble funny jokes).
But ChatGPT shares one weakness with its predecessors: its function is to match patterns rather than look for facts. This places hard limits on ChatGPT’s practical applications. Unfortunately, designers buying into the AI hype gloss over those limits with proposals such as “replace user research with ChatGPT.”
This is not a revolutionary or surprising application of AI — I predicted in 2020 that it would happen — because it fits neatly within the lineage of an age-old design problem: empathy.
The problem of empathy
In the popular imagination, empathy and design go hand in hand. The voice of the customer is held up as a core tenet of user-centered design. You’ll hear product managers talk about (or refer to themselves as) “user proxies,” able to speak on behalf of the user. Historically, this stance has failed to produce good decisions. Empathetic managers are more likely to reject evidence about users’ actual needs and preferences, because the belief that they “speak for the user” empowers them to equate their own opinions with the user’s. As a result, their product decisions are manager-centered rather than human-centered, and they inevitably lead to unpopular, unusable products.
Jade E. Davis says it best: “empathy privileges the self over the other, and moves the other into only being a result of mental reproduction.” These attempts to bring the user into the boardroom instead create a fake, synthetic user, which becomes an excuse not to talk to any real ones. Rather than “giving voice” to the user, this stand-in helps us pretend those voices don’t exist.
ChatGPT represents an evolution of this fake, synthetic user. Now you can just ask a robot what people want! Wow! And even more conveniently, if you don’t like the answer, then you can just re-engineer your prompt until you get one that you do like. Unlike an interview participant or even a human user proxy, ChatGPT cannot challenge any assumptions in its prompt. And because it’s code rather than a human, it benefits from algorithm-washing to create a perception of unbiased data.
This kind of empathy creates false confidence in the data the team will use to make decisions. Empathy-laundering masks the low quality of that information (it is essentially made up) and lets the team convince themselves that they are doing the right thing without talking to a single person who would be affected by their outputs. That empathy also acts as a shield against any objections they do get: “how dare you imply I’m not helping!”
Algorithmic empathy thus takes up the space of expert research and analysis; it feels like we have done something when we have done nothing. The final decision also becomes riskier for the decision-maker, because the computer cannot be held accountable if the information it provided was misleading. And once ChatGPT’s feedback has been proven wrong, there is no recourse: you can always interview different participants, but there is only one ChatGPT model.
Synthesis is the new empathy
User researchers gather data for a particular purpose: by listening to multiple perspectives, they can define (and iteratively refine) the user’s mental model of the problem they face. That mental model is then extremely useful for judging whether a given solution would actually help. This “empathy machine” of listening and synthesis has served design well for decades.
The trouble with ChatGPT is that it has no mental model, because it is not capable of thinking. Talking to it yields no valid information about your users. User researchers also cannot know whether GPT’s training corpus (the entire internet?) adequately represents their users’ context, and therefore whether its feedback is relevant to this particular problem.
It is tempting to reach for new tools just because they are new. ChatGPT (for text) and Stable Diffusion (for images) have a lot to offer designers as sketching or ideation aids. But GPT cannot play the role of the user, because it is the user’s understanding that lets us establish a “definition of good” to measure those sketches against. Otherwise the design process becomes GPT grading itself, and the designer will bear the consequences.