The hidden insights in developers’ Google searches
A selection of real search queries, and what they can teach us about designing for developers.
Most of what developers do day to day involves lookups of one kind or another online: referencing documentation, choosing frameworks, investigating errors, checking keyboard shortcuts, and more.
Since online search is such an integral part of our work, it can act as a rich source of data about development practices, highlighting areas where good design can make a difference. A few years ago, I conducted research into exactly that — for two weeks, I collected the searches of 18 developers who volunteered for the study, asking them to explain some of their queries in detail every day.
The aim of the study was to look at search tooling specifically; it was only later that I realised the data might be interesting in a more general sense, too. Plus, a plain English summary of the full report is long overdue…
So, I’d like to offer here a curated selection of the search queries of those 18 programmers. I’ll cover what was sought (the different informational goals), and how (the strategies yielding the best results). I’ll point out the occasional design implication but, mostly, I want to illustrate the breadth of contexts and search styles, showing how search logs can act as an intimate window into the daily work of developers.
The data set
All participants signed an informed consent form and were fully aware of their search history being shared for the duration of the study. Their data was kept anonymous and used only for the research.
In total, I collected 2488 programming searches from 18 developers over two weeks of tracking. Participants made an average of 15.9 searches a day (min=1, max=83). Of all searches, 347 were annotated, which means the developer who made the search explained the context in further detail. Those 347 annotations were what I based most of the analysis on.
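To put those figures in proportion, here is a small sketch of the arithmetic. Only the three totals come from the study; the variable names are mine.

```python
# Totals reported in the study.
total_searches = 2488
annotated_searches = 347
participants = 18

# Share of searches that the developer explained in further detail.
annotated_share = annotated_searches / total_searches

# Average number of searches logged per participant over the whole study.
searches_per_participant = total_searches / participants

print(f"Annotated share: {annotated_share:.1%}")                    # Annotated share: 13.9%
print(f"Searches per participant: {searches_per_participant:.0f}")  # Searches per participant: 138
```

In other words, roughly one search in seven came with an explanation.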
Search categories
I identified six search categories in the data. From most to least popular, they were:
- Ad hoc how-tos. The developer is in the middle of a task, knows what they need to do next, but doesn’t know how.
- Understanding an API. The developer wants to learn about the particulars of a language API or any third-party code they’re working with (libraries, frameworks etc), often concerning low-level details.
- Recalling forgotten details. The developer needs to look up the specific syntax or naming of something.
- Learning, research and investigation. The developer is exploring best practices, trade-offs and popular opinions, or learning about something new.
- Troubleshooting. The developer is looking for the cause of an error or how to fix it.
- Resources. The developer is looking for tools or libraries, or official documentation for those tools or libraries.
Ad hoc how-tos
With almost a third of all queries falling into this category, finding practical techniques for immediate use was the most common search motivation seen.
angular route uib tab
Was wondering how to combine UIB library’s tab directive with Angular UIRouter.
react setstate sub property
In React, state is immutable so you have to reset the state as a whole, but I wanted to mutate a sub key and set the state in one line.
When unaware of the exact APIs or techniques involved, the developer simply described the desired outcome:
bootstrap button next to input
I wanted to align a button next to an input field using Bootstrap.
Understanding an API
This category somewhat overlaps with how-to and recall searches, but what sets it apart is the interest in the official documentation of a particular language, library or framework.
Some searches in this category (like php preg_match) were generic; others were motivated by questions like “Is this supported?” or “How has it been implemented?” Examples include:
forcelayout api
Was searching for an ability to select parent node in D3 force-layout.
golang copy built in
I was trying to understand memory efficiency when [this function is] used with slices.
More so than in other categories, searches here were sometimes prompted by reading code authored by someone else. This illustrates the wonderful learning opportunities hidden in code reviews!
strlen
I saw this php function in code and didn’t know what it did.
java comparator interface
→java default string comparator
I was reviewing some code written by a consultant and was wondering why they were not using methods/classes from the standard library.
Recalling forgotten details
This one is self-explanatory: the developer knows what she’s doing, but doesn’t remember the specific syntax or naming. With these searches, a quick code example is what people were often after.
ubuntu search packages
Searched after the command that’s used to search for programs on Linux Ubuntu.
URI uri = new URIBuilder
To make sure the syntax is correct.
java throw exception example
Looking for the correct syntax for method error throwing.
People often had a specific website in mind with these queries; they had likely made the search in the past and knew exactly where to look (this could also explain why a single search was often enough in this category). In this example, mdn refers to Mozilla Developer Network:
mdn transform origin
CSS transform syntax.
Recall searches made up nearly 20% of all cases, and the notable thing is that they occurred as often as they did. Many IDEs are syntax-aware and have built-in documentation; indeed, without those features the proportion of recall searches might have been even greater. Or perhaps we’re at peak efficiency: a quick switch to the browser is ingrained in muscle memory and usually yields quick results for recall searches, so what is the point of a specialised tool?
Learning, research and investigation
Queries here were about generic concepts (e.g., job queue), best practices (e.g., bad things about scala implicits) or trade-offs (e.g., svg vs png). They often occurred when planning a change or embarking on an unfamiliar task:
segmented circle css
We’re implementing a new design that looks like a donut cut into 8 parts — wanted to see what was possible with CSS vs SVG.
best programming language for both ios and android
→app dev design, what languages to use
→code java app for ios and android
→write helloworld ios app in java
Been tasked with designing an app, no idea where to start.
But there were also instances of casual curiosity:
boilerplate template
My colleague was talking about boilerplate templates and that I should check them out. So I Googled to find out what they were.
Troubleshooting
We all know the feeling — something is failing and you don’t know why; annoyed, you copy and paste the error message into Google and hope for the best.
show is not a member of org.apache.spark.sql.GroupedData
I didn’t know why I couldn’t show GroupedData. I wanted to find how to do it.
Possible infinite loop detected
This was an error message from the percona tools I was using to sync up the replica instance with the master.
Unlike in other searches, the queries here were consistently structured: they frequently either described the problem or used the error message verbatim. It is also here that queries were most often refined, with at least one instance of refinement in about half of the queries:
babel-jest
→can’t console log in babel jest
→logging in babel-jest
→jest-cli
Thought I was in a different file than I actually was in so when running specs I couldn’t get console.log to work.
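To make the idea of refinement concrete, here is a minimal sketch of how a chain of queries could be represented and counted. The `SearchSession` class and `refinement_rate` function are hypothetical names of my own, not the tooling used in the study, and the three sessions are just examples quoted in this article.

```python
from dataclasses import dataclass


@dataclass
class SearchSession:
    """One chain of queries made while pursuing a single goal."""
    category: str
    queries: list[str]  # in the order they were typed

    @property
    def refinements(self) -> int:
        # Every query after the first one counts as a refinement.
        return max(len(self.queries) - 1, 0)


def refinement_rate(sessions: list[SearchSession], category: str) -> float:
    """Fraction of sessions in `category` with at least one refinement."""
    in_category = [s for s in sessions if s.category == category]
    if not in_category:
        return 0.0
    refined = sum(1 for s in in_category if s.refinements > 0)
    return refined / len(in_category)


# Example sessions built from queries quoted in this article.
sessions = [
    SearchSession("troubleshooting", [
        "babel-jest",
        "can't console log in babel jest",
        "logging in babel-jest",
        "jest-cli",
    ]),
    SearchSession("troubleshooting", ["Possible infinite loop detected"]),
    SearchSession("recall", ["mdn transform origin"]),
]

print(sessions[0].refinements)                       # 3
print(refinement_rate(sessions, "troubleshooting"))  # 0.5
print(refinement_rate(sessions, "recall"))           # 0.0
```

Counting sessions rather than individual queries is a choice on my part: it keeps one long chain of rewording from dominating the rate.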
Resources
The resources people sought were most often third-party tools and libraries to integrate into their current project. For instance:
json minify
I wanted to find an online minifier tool to reduce whitespace in JSON. Standard JS minify wouldn’t work.
rails kineses gem
I’m looking at migrating our analytics pipeline to AWS and so I wanted to see if there were any gems to support this.
The rest were searches for reference tables such as character codes or keyboard shortcuts (e.g., unicode: “Looking for a unicode table”), or things like dummy data generators. These searches stood out for being just as likely to occur during coding as during higher-level planning and research.
Search strategies
We’ve covered what people search for. Let’s now look at the how: the strategies employed when formulating the query and evaluating results.
Choice of search engine
Though most people in the study said they use several alternative search engines, only 5% of searches logged were actually made on something other than Google. DuckDuckGo was the search engine of choice for one participant, and only sometimes did people search directly on specialised sites like Mozilla Developer Network, GitHub or Hoogle.
Multiple services were sometimes used during a single search session, as seen here:
Google: sumo collector filters
→ Google: sumo collector filters include
→ Sumo Logic Community: filters include
→ GitHub SumoLogic repository: filters
I wanted to see if we’d used filters before in sumologic (we hadn’t). Turns out they don’t work and their docs should be completely removed from SumoLogic’s website.
For the ten GitHub code searches that were annotated (out of 92 total), motivations varied from investigating low-level behaviour to using source code as documentation (as seen with filters above).
When it comes to non-browser-based search tools, only one participant had their IDE equipped with web search features. Those who claimed to use offline documentation apps like Dash still made numerous documentation searches online, too.
Query refinement and keyword foraging
As already mentioned, queries were refined the least in recall searches, and the most in troubleshooting searches.
Query refinements weren’t necessarily tied to the type of search, though; they also occurred during keyword foraging, where the person searching doesn’t know what the concepts they need are called. (In the original report, I refer to this as “cross-domain translations”, but I’ve since been made aware of the much more suitable term.) Often, a familiar term is used first in the hopes that it will lead to better keywords:
jquery add to list
→jquery add to array
mongoid update_attributes
Was looking for the equivalent of a Rails ActiveRecord method in the Mongoid persistence ORM
There was one particular case of query refinement where search language itself was altered:
step by step json lexer
→comme costruire un lettore json
→como crear lexer json
The participant noted: “Results in Spanish tend to have more extensive explanations. Results in Italian or French yield fewer results typically.”
The point at which a multi-lingual programmer abandons one language in favour of another can hint at how they evaluate resources. Queries like these also remind us of the importance of considering non-English content in any tool that aggregates resources.
Natural language vs code terms
Comparing the use of natural language to the use of code terms, recall searches included the most code terms across all categories. This was expected, as people looking for reminders usually have a good idea of the specific line of code they’re about to write.
Natural language queries (such as full-sentence questions) appeared at similar rates in all categories. For example:
does md5sum read file into memory?
(Understanding a library or API)
what determines if a site is considered intranet for ie compatibility view settings
(Troubleshooting)
javascript library that embed a string and if find an embed content render
(Resources)
It seems to me that people sometimes fall back to natural language when the question is highly specific, or expressing it concisely is a struggle. These are the situations where you might want to just turn to the person sitting next to you and ask for help. Perhaps not all questions can be addressed by better IDE or search design.
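For anyone wanting to try a similar comparison on their own search history, here is a rough sketch of how queries could be told apart automatically. The question-word list and the regular expression are assumptions of mine, not the coding scheme used in the study.

```python
import re

# Hypothetical heuristics, not the coding scheme used in the study.
QUESTION_WORDS = {"how", "what", "why", "when", "where", "which", "does", "is", "can"}

# Signs that a token was lifted from code: camelCase, snake_case,
# dotted names, parentheses, assignment and command-line flags.
CODE_TOKEN = re.compile(r"[a-z][A-Z]|_|\w\.\w|\(|\)|=|--")


def looks_like_natural_language(query: str) -> bool:
    """True if the query ends in '?' or opens with a question word."""
    stripped = query.strip()
    words = stripped.split()
    if not words:
        return False
    return stripped.endswith("?") or words[0].lower() in QUESTION_WORDS


def count_code_terms(query: str) -> int:
    """Number of whitespace-separated tokens that look like code."""
    return sum(1 for token in query.split() if CODE_TOKEN.search(token))


print(looks_like_natural_language("does md5sum read file into memory?"))  # True
print(looks_like_natural_language("URI uri = new URIBuilder"))            # False
print(count_code_terms("mongoid update_attributes"))                      # 1
print(count_code_terms("git --force"))                                    # 1
```

A heuristic this simple has obvious blind spots. It misses sentence-like queries that don’t open with a question word, such as the javascript library example above, so treat it as a first pass before manual review.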
Social validation
Finally, a prominent theme that emerged from the searches was the importance of social proof. Mentions of “the best” or “the accepted” way of doing things were frequent across categories:
react pass state to child
I wanted to pass the entire react state object to a child component, this is fairly simple to ‘just do’ but I was looking for the accepted and ‘safe’ way of doing so.
best process supervisor
I was . . . looking for people’s opinion on Linux process supervisors/monitors for managing web processes, workers, etc.
mysql using index
→mysql indexes best practices
When integrating search into IDEs, one design direction has been to filter and re-format results to make them quickly scannable, prominently featuring code snippets and removing the messy, non-parsable human discussions around them (see Assieme or Mica). While useful for syntax lookups and the like, it would completely fail when the aim is to gauge best practices, as with the queries above.
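One way to picture the alternative is a tool that chooses how to present results based on the query. The sketch below is purely illustrative: the cue words and the `result_view` function are hypothetical, and neither Assieme nor Mica is known to me to work this way.

```python
# Hypothetical cue words; the study did not produce such a list.
SOCIAL_PROOF_CUES = {"best", "accepted", "recommended", "vs"}


def result_view(query: str) -> str:
    """Pick a presentation mode for search results inside an IDE."""
    words = set(query.lower().split())
    if words & SOCIAL_PROOF_CUES:
        # Opinions and trade-offs live in the discussion, so keep it.
        return "full discussion"
    return "code snippets"


print(result_view("mysql indexes best practices"))  # full discussion
print(result_view("best process supervisor"))       # full discussion
print(result_view("mdn transform origin"))          # code snippets
print(result_view("react pass state to child"))     # code snippets
```

Note the last line: the participant behind that query wanted the accepted way of doing things, but nothing in the query itself says so. Intent often only surfaces in the annotation, which is one reason keyword rules alone are unlikely to be enough.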
Query logs as a research tool
For those interested, the full write-up contains more detailed analyses of the searches, as well as a description of the methodology. As a closing thought, I’d instead like to comment further on query logs as a research tool in developer experience (DX).
An intimate glimpse into how developers work
Because many of the previous studies in this area had been conducted in laboratory settings with made-up programming tasks (e.g., Brandt et al. 2009; Hoffmann et al. 2007), getting to see the very real everyday searches developers make was the most fun and exciting part of the project. (And before this comes across as voyeuristic, let me assure you again that the data was fully anonymised!)
But a search is only a query until you know what prompted it. At this point, we get a glimpse into the wider context of teamwork…
easing functions examples
To showcase to my fellow developer what easing functions are and to show visual examples.
learning workflows…
I’m programming something in golang, but I don’t have enough familiarity with [it] . . . so I’m implementing an algorithm in python before I attempt to port it over to the golang code.
frustrations…
hook-cron
. . . the documentation online for Drupal 6 is non-existent. In fact, I get an un-styled html page that says ‘File not found’ when I click on the docs… useless.
…and culture:
git --force
Making a nerdy star wars + git joke on slack
In a sense, then, search log analysis can be thought of as a kind of ethnography, where people’s experiences are captured in the wild as they occur. And search is an interesting proxy for those experiences, as, indeed, it is when facing a problem that people usually turn to Google.
So, make a note of the next thing you look up online and consider whether it’s a design opportunity in disguise!
Although it’s been a few years since the study took place, I’d like to once more extend my sincerest thanks to the participants whose searches you see in this article. Thank you for trusting me with such personal data!