In my career I have often been involved in collecting and reporting on the usage of websites and applications through the data captured in logging / tracking systems. It can be fun to analyze and select data points and pull them into graphs and charts to be presented to leadership. However, there is a danger in relying only on quantitative data to make design decisions. People often view data collected from websites and applications as definitive. It is, after all, a record of exactly what people clicked on and what pages / sections they visited, isn’t it? Unfortunately, this is an assumption, and combined with several other bad data manipulation practices, it can lead you to see problems where there are none.
You May Not Know You Have Incomplete Data
Let me start by sharing about a time when there was a huge effort to fix a nonexistent problem based on data that, unknown to us at the time, was incomplete. It was shortly after I finished grad school, and I was working temporarily as a business analyst while waiting for an opening on the newly forming UX team of a b2b distribution company. A report prepared by the site analyst had been shared with the CEO, showing that the search functionality for the eCommerce site was returning zero results at a rate of around 30%.
Given that the products we sold were very technical and had numerous attributes, this wasn’t immediately flagged as a problem. However, the company had invested millions in the search functionality, and at some point someone shared that 30% seemed high and that others in our industry were seeing 20% zero result rates. This prompted fire alarms to go off in the executive suite, and it came down to the e-commerce team to find out what was wrong and fix the problem.
Finding the Truth
I went into the analytics application to take a look at the data myself. There was one report showing how often the search engine returned X number of results, broken out on each line. It was also possible to see what people were searching for in another view. Again, given the technical nature of our products, there were a large number of searches for part numbers rather than for brand names, part names, or certain attributes. Going back to the X number of results report, I noticed a problem. The report started at 2 results even though the report query was set to show searches with any number of results. The data came back with counts only for searches with 2 or more matches.
I showed this to the site analyst and we both realized that the analytics software was not tracking single results or what we called exact matches. Well it was, but it was not including it in the search results report because the search functionality would immediately send the person to the product detail page and not the search result page. So it was not counting it as a search result.
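To see why a missing bucket matters so much, here is a minimal sketch in Python. The counts are invented purely for illustration and are not the numbers from our reports; the function name and its parameter are also hypothetical. It shows how leaving single-result searches out of the total inflates the zero result rate, even though the number of zero result searches never changes.

```python
# Hypothetical search counts, keyed by number of results returned.
# These numbers are invented for illustration only.
searches_by_result_count = {0: 300, 1: 250, 2: 150, 3: 300}


def zero_result_rate(counts, include_exact_matches):
    """Share of searches that returned nothing.

    When include_exact_matches is False, the single-result bucket is
    left out of the total, mimicking a report that never sees them.
    """
    total = sum(
        n for results, n in counts.items()
        if include_exact_matches or results != 1
    )
    return counts.get(0, 0) / total


reported = zero_result_rate(searches_by_result_count, include_exact_matches=False)
actual = zero_result_rate(searches_by_result_count, include_exact_matches=True)
print(f"reported: {reported:.0%}, actual: {actual:.0%}")
```

The zero result count is the same in both calls. Only the denominator changes, which is exactly the kind of gap that is invisible when you look at the rate alone.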
The next question was how many people were getting an exact match. We had no way of telling at that point, as the product detail page visits would not show that the search engine had been used. They would show the referring page as wherever the person had been when they used the search. If it was the home page, we could infer that many of those visitors had used search, as long as the product was not being featured on the homepage. However, people often searched for multiple parts, so many exact match searches would also have been coming from other product detail pages.
Going the Wrong Way
Before we had uncovered this problem in the data, the team had already embarked on numerous activities to try and fix the zero results “problem”. One of the decisions was to change the default behavior of the search when more than one term was entered from using the boolean AND to using OR. The reasoning for this was when using AND as the boolean between multiple terms, a product would need to have all terms to match. As a person added additional terms to their search there would be fewer results or very quickly no results.
By switching to OR for the boolean between multiple terms, a product only needed to match one of the terms. As a person added more terms to their search, they would see more products. While this change did indeed decrease the zero results, it also made our search return products that were less relevant. The team did tweak things so that products matching more of the terms were listed first, but it still made it appear that we carried more product than we really had for a particular category. There was also a problem with subsets of different combinations of multi-term matches that made relevancy sorting impossible.
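The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the catalog, the product names, and the `search` function are all made up, and the matching is a plain word lookup, not how our search engine actually worked.

```python
# Hypothetical mini catalog: product name -> set of searchable terms.
CATALOG = {
    "steel hex bolt": {"steel", "hex", "bolt"},
    "brass hex nut": {"brass", "hex", "nut"},
    "steel washer": {"steel", "washer"},
    "copper pipe": {"copper", "pipe"},
}


def search(query, mode="AND"):
    """Return product names matching the query terms.

    mode="AND": a product must contain every term.
    mode="OR": a product needs at least one term; results are
    ordered so products matching more terms come first.
    """
    terms = set(query.lower().split())
    hits = []
    for name, product_terms in CATALOG.items():
        matched = len(terms & product_terms)
        if mode == "AND" and matched == len(terms):
            hits.append((matched, name))
        elif mode == "OR" and matched > 0:
            hits.append((matched, name))
    # Most matched terms first; ties sorted by name for stable output.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [name for _, name in hits]


and_one = search("steel", "AND")
and_three = search("steel hex bolt", "AND")
or_one = search("steel", "OR")
or_three = search("steel hex bolt", "OR")

print(len(and_one), len(and_three))  # AND: more terms, fewer results
print(len(or_one), len(or_three))    # OR: more terms, more results
```

With AND, adding terms to "steel" shrinks the list down to the one bolt. With OR, the same three terms pull in the brass nut and the washer as well, even though neither is what the person asked for.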
As noted, the results now included more products that were not necessarily relevant. Even worse, if we didn’t carry the product being searched for at all, then none of the results were relevant. This was the complete opposite of what people expected from search: that entering more terms would narrow the results. Customer complaints started coming in that the search was broken. We also saw this in usability tests, where customers would show us how frustrating it was to get thousands of results for what should have been a small number of results.
The Truth is Revealed (Sort of)
In the meantime, I was finally able to narrow down the exact matches by using another tracking tool we had where we could see individual sessions. I was able to pull a subset of sessions where exact matches had occurred and determined that roughly 10% of searches ended in an exact match. This put our overall zero results at around 20%, which was in line with the industry at the time. Despite now knowing this, management was determined to keep the OR boolean configuration, as they considered more results to be better than no results.
Eventually we corrected the tracking in the main analytics software so that the report accurately included the exact matches. With the boolean change to OR, our zero result rate was now down to around 12%. However, the long-term damage of people being frustrated with the search behavior continued for several years before the team was able to change the multi-term boolean back.
All this to say that the quantitative data from logging / tracking only reflects what you measure. And while you may think you are measuring everything, there may be limits to what can be captured, or even a misconfiguration, as in this case. We made it a practice to verify what we wanted tracked when releasing new features and then checked to make sure we were capturing things correctly. This was only one example of many of how quantitative data was not the ironclad truth that people believed it to be. There can be other problems with interpreting quantitative data, which I will share in the next post.