story.
Do you really know your Total Website Traffic?
Do you own a website that gets close to a million page views a day? Not yet? Wait, you might already be getting that!
So here goes the story. We at InfiSecure were approached by a customer with 400k-500k page views a day. They wanted to detect bad bots on their website and stop online fraud. We did our regular job: got the customer integrated and started analyzing their traffic. Within a day, the results surprised us!
The customer was actually getting 900k hits a day, only 550k of which came from genuine humans. Read on to find out what made up the remaining 350k.
The majority (more than 80%) of the 350k hits came from crawlers. We saw a mind-boggling 150k hits coming from YandexBot alone, the crawler of Yandex (the most popular search engine in Russia). By simple math, 150k hits spread over the 86,400 seconds in a day comes close to 2 hits per second. What does this mean for the business? It means that you are processing close to 2 hits per second from Yandex alone, adding to your server costs and eating into your server capacity every single second.
But wait, it gets worse. The page hits from Yandex were not evenly spread. Within every hour there were periods when Yandex crawled more slowly and periods when it crawled faster. During the faster periods, the customer was getting over 5 requests per second from Yandex. That means processing 5 extra requests every second, even during peak genuine traffic. This not only raises the server cost, but can also leave genuine users with a bad experience: slower page loads or, in certain cases, requests being dropped altogether. Thankfully, our customer was spending heavily on servers, so genuine users were not affected.
But by far, the funniest part of the analysis is this - even when we picked a unique page title from the customer’s website and searched for it on Yandex, the customer's page showed up only on the second page of results or later.
To borrow from A Bronx Tale: the saddest thing in life here is wasted effort!
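The per-second figures quoted for Yandex follow from simple division, and it can help to see them worked out. The snippet below is only an illustrative sketch: the function and variable names are made up for this example, and the "sustained all day" figure is a what-if calculation, not something measured on the customer's site.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def hits_per_second(daily_hits: int) -> float:
    """Average request rate implied by a daily hit count."""
    return daily_hits / SECONDS_PER_DAY


def hits_per_day(rate_per_second: float) -> int:
    """Daily hit count implied by a rate sustained for a full day."""
    return round(rate_per_second * SECONDS_PER_DAY)


yandex_daily_hits = 150_000  # figure quoted in the article
average_rate = hits_per_second(yandex_daily_hits)

# What-if: the 5 requests/second burst rate held for a whole day.
burst_daily_equivalent = hits_per_day(5)

print(f"Average Yandex rate: {average_rate:.2f} hits/s")
print(f"5 hits/s sustained for a day: {burst_daily_equivalent:,} hits")
```

The average works out to about 1.74 hits per second, which is the "close to 2" mentioned above. The burst rate of 5 per second is nearly three times that average, which is why the uneven crawling matters more than the daily total suggests.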
Next in line among the crawlers was Google with its Googlebot. The remaining share was taken mostly by Bing and Baidu. There were small traces of AhrefsBot, DotBot, facebookexternalhit, Istella and Seznam as well. Most of these crawlers belong to search engines.
Apart from the crawlers, we saw close to 52k hits coming from bad bots. Very few of these were dumb; most were sophisticated, human-mimicking bots. The biggest loss for the customer in this particular case was money lost to click fraud. The bots arrived through an ad campaign that the customer was running. About 7-8% of the traffic coming from the publisher (where the ad had been posted) was non-human and non-crawler traffic. Almost all of these bots were sophisticated enough to be counted as visitors in Google Analytics.
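Putting the quoted numbers side by side makes the breakdown easier to follow. This is a rough sketch using the rounded figures from this story. It assumes that every non-human hit was either a crawler or a bad bot, which the article implies but does not state outright, and the variable names are invented for illustration.

```python
total_hits = 900_000    # all hits in a day
human_hits = 550_000    # genuine human traffic
bad_bot_hits = 52_000   # "close to 52k" hits from bad bots
yandex_hits = 150_000   # hits from Yandex alone

non_human_hits = total_hits - human_hits       # the "remaining 350k"
# Assumption: whatever is not a bad bot is a crawler.
crawler_hits = non_human_hits - bad_bot_hits


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


print(f"Non-human share of all hits:  {share(non_human_hits, total_hits):.1f}%")
print(f"Crawler share of non-human:   {share(crawler_hits, non_human_hits):.1f}%")
print(f"Yandex share of crawler hits: {share(yandex_hits, crawler_hits):.1f}%")
print(f"Bad-bot share of all hits:    {share(bad_bot_hits, total_hits):.1f}%")
```

Under that assumption, crawlers account for roughly 85% of the non-human hits, consistent with the "more than 80%" figure, and Yandex alone makes up about half of all crawler traffic.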
So what’s the gist and what’s the takeaway?
Remember, your server costs are yours to bear. Your server capacity is yours to handle. Your website content is yours to protect. Tackling bots, which make up around 20-50% of your website traffic, is important. Understanding who crawls your website, and how much, is as important as knowing who visits it.
Keep a check on your bot traffic: overall bot traffic on the web is growing at a much faster pace than our sweet old human traffic. Be aware, be protected! Do not fall victim to the countless online frauds.
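As a starting point for seeing who crawls your site, you can count hits per self-declared crawler in your access logs. The sketch below is a minimal illustration, not InfiSecure's method. It assumes logs in the common "combined" format, where the User-Agent is the last quoted field. The token list, the function names and the sample log lines are all made up for this example. Because a User-Agent header is trivial to fake, this only measures crawlers that announce themselves and will not catch the sophisticated bad bots described above.

```python
import re
from collections import Counter

# Substrings that well-known crawlers include in their User-Agent header.
# Illustrative only, not an exhaustive list.
CRAWLER_TOKENS = {
    "YandexBot": "Yandex",
    "Googlebot": "Google",
    "bingbot": "Bing",
    "Baiduspider": "Baidu",
    "AhrefsBot": "Ahrefs",
    "DotBot": "DotBot",
    "facebookexternalhit": "Facebook",
    "SeznamBot": "Seznam",
}

# Assumption: combined log format, User-Agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def classify(user_agent: str) -> str:
    """Map a User-Agent string to a crawler name, or 'other'."""
    lowered = user_agent.lower()
    for token, name in CRAWLER_TOKENS.items():
        if token.lower() in lowered:
            return name
    return "other"


def count_by_crawler(log_lines):
    """Count hits per self-declared crawler across access-log lines."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if match:
            counts[classify(match.group(1))] += 1
    return counts


# Hypothetical sample lines; addresses come from documentation-only ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
    '203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET /b HTTP/1.1" 200 734 "-" '
    '"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
    '198.51.100.7 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.9 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/121.0"',
]

for name, hits in count_by_crawler(SAMPLE_LOG).most_common():
    print(f"{name}: {hits}")
```

Running the same count over a full day of logs, and grouping by second or minute as well as by crawler, would show both the daily totals and the burst rates discussed earlier.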