Why scale questions in surveys suck

Scale or rating questions are easy enough to design, but equally easy to mess up. This is your guide to designing meaningful and relevant questions.

What makes scaling questions so attractive is the ease of analysis and the perceived ease of design. They are direct, straightforward and easy to report on. No wonder this type of question is used and abused heavily across the industry. Researchers, and especially non-researchers tasked with finding out information via a survey, rely on rating questions when they perceive an easy win from them (“Oh, look! 78% agree that our new feature is the best feature they’ve ever seen!”).

Don’t get me wrong, rating questions can be helpful under certain circumstances and can definitely lead to reasonably robust results, if done correctly. However, over-reliance on such questions more often than not leads to bad data, wasted time and effort, and frustration when your predictions don’t materialise.

So what’s the big deal?

Scale/rating questions aim to assign a numerical value to someone’s attitudes and/or perceptions in order to make the experience “quantifiable”. For example, medical professionals use a scale to assess a patient’s level of pain and act accordingly. Scales are also used in psychiatric and psychological diagnosis and evaluation. But what interests us more here is the use of scaling questions in a UXR or market research context.

UX researchers are routinely tasked with evaluating concepts and sometimes very rough ideas related to a product. They also frequently step into the market researcher’s shoes and attempt to evaluate a product as a whole, its position within the market and its relation to competitors. One of the most common ways to do this is via a survey. It’s hard to see a survey without any rating questions nowadays. And while there is a plethora of research available on best practice, loads of even experienced researchers struggle to make the best use of these questions (or blatantly misuse them).

Poor design isn’t even the main issue with such questions; it’s rather the lack of understanding of their use cases that leads to rating questions being used for pretty much every question and every research task. This leads to strategies being devised on the basis of unreliable data, and solutions being rolled out to non-existent problems. While I’m not advocating for the complete removal of such questions from your research arsenal, I do argue that their usage needs to be heavily limited to certain instances.

Let us dive into the most common issues.

Why is it a rating question at all?

Look, as a researcher with a predominantly quantitative background, I’m definitely guilty of signing up to research panels for the sole purpose of critiquing their surveys. I’ll admit, though, that it was incredibly helpful at the beginning of my career to put myself in a respondent’s shoes and note the very obvious mistakes that wouldn’t have been so obvious had I designed these surveys myself.

What it showed me, though, is that even the most renowned research agencies mess up. And mess up badly. And scaling questions are one of their key vices.

Let’s look at our first example.

We actually have more than one problem here, but I’d like to focus on the statement first. What type of statement is it? If you unpack it, it’s actually talking about factual information. In other words, it’s a yes or no question. I either use project evaluation questions for the purposes they listed or I don’t. There might be a level of granularity there; for example, I might use them on certain occasions and not on others. But on a macro scale, it’s either yes (I use them) or no (I don’t).

What would “somewhat agree” mean in this case? I use it but not frequently? I use it but for other reasons? We can’t know. This is not the information we’ve asked of the respondent. In a nutshell, this scale is completely and utterly useless, as we are giving granular response options to a generic yes/no question. There is nothing extra I’ll be able to squeeze out of it by adding multiple agree/disagree scale options. In fact, I’m losing the clarity and validity of the responses, as participants will undoubtedly be forced to apply their own logic, unknown to the researcher.

Funny enough, most researchers remember the Research Methods 101 course they took at uni (i.e. your response options should always mirror your question) when designing other types of questions, but for some reason scale questions are just too attractive to pass up. A good question could have been phrased in this way:

“On what occasions, if at all, do you use project evaluation questions?” OR “Why do you use project evaluation questions?”

N.B. You’ll notice that the second question assumes you are using evaluative questions, but I’m not against phrasing it that way, as we are asking a factual question. The first answer option in this case should be “N/A — I don’t use project evaluation questions”.

Formulated in this way, we’ll be able to extract much richer data and proper use cases by asking directly what we want to know, instead of doing mental gymnastics and hiding the real question behind a scale.
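
If your survey platform lets you define questionnaires in code or config, the reworded question might look something like the minimal sketch below. The structure, field names and occasion options are hypothetical placeholders, not any real tool’s schema; the point is simply that the options mirror the question and lead with the opt-out.

```python
# Hypothetical questionnaire definition; field names and the occasion
# options are illustrative placeholders, not a real survey tool's schema.
occasions_question = {
    "text": "On what occasions, if at all, do you use project evaluation questions?",
    "type": "multiple_choice",
    "options": [
        "N/A - I don't use project evaluation questions",  # opt-out comes first
        "At the end of every project",
        "Only on large or high-stakes projects",
        "When a stakeholder requests an evaluation",
        "Other (please specify)",
    ],
}
```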

Why this scale?

Our second example is slightly less obvious, but incredibly common.

As with the previous example, we can actually see a few problems, the first being the one we’ve discussed above. Have a look at the last statement and ask yourself what “strongly disagree” would mean compared to “disagree”.

This time, however, I want to focus on the scale itself rather than the statements. Let’s look at the first statement, though: “the library was easy to find — agree/disagree”. You can, obviously, say you strongly disagree, and it will most likely mean that the library was very hard to find. But again, this is an assumption, as we aren’t actually asking our participants that. We ask them to what extent they agree or disagree that it was easy to find. Hence, our reporting could only look like this:

“1 in 7 disagree that the library is easy to find”

How excited would your stakeholders be to see that information? Does it even answer your question? Not really, if you put on your researcher’s hat. In fact, reporting it in any other way is a misinterpretation. Remember that quant research is highly specific; it shouldn’t leave room for the researcher’s (or, even worse, someone else’s) interpretation.
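
To make the point concrete, here is a minimal Python sketch of the only claim this data actually supports. The responses are made up for illustration; note that the summary can speak only to (dis)agreement with the statement, not to how easy or hard the library actually was to find.

```python
from collections import Counter

# Made-up responses to the statement "The library was easy to find".
responses = [
    "Strongly agree", "Agree", "Agree", "Somewhat agree",
    "Disagree", "Agree", "Strongly agree",
]

counts = Counter(responses)
disagree = counts["Disagree"] + counts["Strongly disagree"]

# The honest report: agreement with a statement, nothing more.
print(f"{disagree} in {len(responses)} disagree that the library is easy to find")
```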

But reporting isn’t the only problem. Introducing statement questions inevitably leads to acquiescence bias (the tendency of respondents to automatically agree with the researcher’s statements). This type of bias can be significantly diminished by formulating questions in a neutral way. In our example, the question is very vague and isn’t really a question at all: “Please rate your experience at the library today”. In fact, you aren’t even asking people to rate anything; you are asking them to agree or disagree with a list of statements that all have a positive connotation.

So how can we get the information we need? We can still use a scale question for that. What we need to do, though, is think carefully about the metrics this time. What we are really asking here is how easy it was to find the library. That’s our question. So why reinvent the wheel?

“How easy or difficult was it to find the library?” OR “To what extent, if at all, was it easy to find the library?”

Notice how I use both the negative and the positive word in the question itself. Multiple studies (see, for example, Sterngold, Warland and Herrmann, 1994) have shown that this reduces the aforementioned acquiescence bias by offering all possible options in the question rather than leading respondents to a positive answer. In my second suggestion I don’t use the negative as such, but by adding “if at all” I implicitly admit that participants might not have found it easy at all. The answer options in this case need to mirror the question, hence we go from a rather random agree/disagree scale to a more relevant easy/difficult one (or very easy to not easy at all, depending on your needs).
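
Expressed in the same hypothetical questionnaire-as-code style as before, the fixed question and its mirrored, balanced scale might look like this sketch:

```python
# Hypothetical definition; the labels now mirror the question's own
# easy/difficult wording instead of a generic agree/disagree scale.
ease_question = {
    "text": "How easy or difficult was it to find the library?",
    "type": "single_choice",
    "options": [
        "Very difficult",              # negative end first; balanced 2+1+2
        "Somewhat difficult",
        "Neither easy nor difficult",
        "Somewhat easy",
        "Very easy",
    ],
}
```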

The same logic can be applied to the rest of the statements (except for the last one, which talks about factual information and not attitude or perception).

Why do you ask me that?

Our third example isn’t as bad as the ones before, but it does demonstrate well how essential it is for a researcher to test the survey by trying to answer their own questions first.

Before we dive into this example, I just want to note that I wasn’t exaggerating when I said that research agencies are particularly notorious for making every question a scale question, and this is another prime example of that.

However, that’s not what I want to focus on here. This example was actually sent to me by my lovely colleague. Let’s see what she and I found wrong with it.

UX designers might actually be the first to spot the initial problem, and I’m talking about the elusive neutral option (in this case, neither/nor). An average participant spends seconds on each question and rarely wants or needs to pause and think about an answer. The creator of this example didn’t actually make a mistake, though. The neither/nor option isn’t hidden by bad design; it’s done on purpose. As researchers, we always struggle to find the balance between providing participants with relevant answer options and actually getting valuable insight out of questions. Neutral options are useless when it comes to reporting in 98% of cases, hence the more people select them, the more of your sample you lose.

I’m not against making neutral answers less visible or hiding them at the bottom of the list (let’s admit it, we all do it). What I’m strongly against is not including an exhaustive list of options and trying to cover your ass with a lazy neutral option that stands in for all other possibilities.

So let’s go back to that statement: “I tend to book my short breaks outside of school holidays”. Not everyone will have a problem with this question, but as a researcher I’m immediately faced with a number of assumptions when looking at it:

  1. I know when school holidays usually are (in fact, this further assumes that a) I have children of school age, or b) I’ve recently finished school and hence remember when school holidays are);
  2. I book or take short breaks;
  3. I consciously book holidays outside of school holidays, rather than that being down to chance or other factors (e.g. I work somewhere with set days/months for holidays that happen to coincide with school breaks).

None of the three conditions applies to everyone. In fact, none of them applied to me. I graduated from high school more than 10 years ago (let alone the fact that I didn’t even graduate in the UK, which is where the survey was conducted). I have no children. Consequently, I have no knowledge of when school holidays happen. And even if I had this knowledge, I might be booking my short breaks outside of those holidays by pure chance.

None of the answer options presented fits my situation well. I can obviously select a neutral option, because as a researcher I know how and why it’s usually used. Normal participants do not know that. They might be tempted to select “strongly disagree”, as in their mind (and they are not wrong) they aren’t booking their breaks outside of school holidays, because they don’t know when those holidays happen or because they don’t book short breaks at all. But that’s not how this data is going to be reported. For the researchers analysing this data, disagreeing would mean that I don’t tend to book my short breaks outside of school holidays. And that’s not factually correct (I might well be doing so, right? I just don’t know). That is why the list of answer options needs to be exhaustive, even if you use traditional scales. However, all this mess could have been avoided by asking a simple yes/no or frequency question with “not applicable” or “don’t know” options, or, even better, by routing the correct sample to this question by first checking whether respondents at least have children of school age.
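
As a rough illustration of that routing idea, here is a minimal sketch in Python. The question IDs and answer keys are invented for the example; the logic simply screens out respondents the statement cannot apply to.

```python
# Hypothetical routing logic; question IDs and answer keys are invented.
def next_question(answers: dict) -> str:
    """Decide which question a respondent sees next."""
    if answers.get("books_short_breaks") == "No":
        return "q_end_of_section"          # the statement can't apply at all
    if answers.get("has_school_age_children") != "Yes":
        return "q_short_breaks_general"    # skip the school-holidays block
    return "q_school_holiday_timing"       # safe to ask the original question

# Example: a respondent with school-age children who books short breaks.
print(next_question({"books_short_breaks": "Yes", "has_school_age_children": "Yes"}))
```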

Do you really want to disagree?

Finally, the king of all mistakes and the absolute destroyer of the validity of findings: unbalanced, wrong, utterly horribly designed scales.

I’m sure all of us have seen our fair share of badly designed scales. Even though reliable scales are readily available online by simply googling “Likert scale examples”, loads of researchers still go with what their gut tells them instead of following scientifically tested scales.

Here is a great example of that:

What we see here is an unbalanced scale, i.e. a scale where the positive options outnumber the negative ones, or vice versa. In particular, we have three positive likelihood options and two negative ones, with no neutral option. Not having a neutral option is fine in a lot of cases and would most likely have been fine in this one. However, by giving participants more positives than negatives, we are inevitably skewing the results towards a positive outcome.

Furthermore, ask yourself: what is the difference between “likely” and “somewhat likely”? I can guess your answer: there is none. In fact, just removing that useless “likely” option would make the scale almost perfect. I say almost, because there is something else quite subtle going on here.

You’ll notice how the values go from positive to negative. It’s actually a rather common and very logical thing to do; most of us think about scales exactly this way. We tend to think about positives first, that’s just how our brains work. However, multiple studies have shown that inverting the scale (i.e. going from negative to positive) greatly increases the reliability, and hence the validity, of findings. That is the case for a few reasons:

  1. Inverting the scale diminishes the infamous acquiescence bias by not presenting the option to agree right from the get-go;
  2. Because going from negative to positive feels unnatural, it forces participants to think a bit before giving an answer, instead of immediately agreeing with whatever has been presented to them.
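
If you define your scales in code, you can even sanity-check them before fielding. A minimal sketch, assuming the option labels are plain strings; the checks are illustrative, not a substitute for piloting the survey:

```python
# Sanity-check a likelihood scale: balanced, and ordered negative-to-positive.
scale = [
    "Very unlikely",
    "Somewhat unlikely",
    "Somewhat likely",
    "Very likely",
]

negatives = [o for o in scale if "unlikely" in o.lower()]
positives = [o for o in scale if o not in negatives]

assert len(positives) == len(negatives), "scale is unbalanced"
assert scale[0] in negatives, "scale should start from the negative end"
print("Scale looks balanced and inverted")
```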

On another note, this isn’t the purpose of the article, but I simply can’t pass by the fact that there is absolutely no explanation or guidance as to what they mean by “financial product”. Don’t let participants guess what you mean! They don’t have psychic abilities, and you’ll end up making decisions based on completely unreliable data.

If you want to make sure you’re using correct scales, try to answer the question yourself and see what meaning you assign to each scale option. If you see no difference between two or more options, drop some of them.

Saving you and your participants from useless questions

I’ve covered the most common examples of badly written scale questions. While this list is not exhaustive, it has hopefully given you some advice and ideas on how not to design this type of question. To sum it all up:

  1. Do mirror your answer options on your question’s wording
  2. Use balanced scales
  3. Always use inverted scales, even if it feels unnatural
  4. Include both positives and negatives in the question itself
  5. Ask yourself if the question even needs a scale (a lot of the time the answer is no), i.e. could you ask it in a different way?

Don’t fall into the trap of the perceived easiness of such questions, and always remember that your audience shouldn’t have to guess.
