One thing you notice when you run a web site like this one is the occasional aberration in access. A particular IP address starts hitting your site repeatedly. Or you get a lot of views from a country for a single page. From the outside, a person just sees the web site. But from behind the scenes, there can sometimes be a cat-and-mouse game to keep bad actors off your site.
I recently rebuilt a web site. One of the things the site admin didn’t do was watch the access log files. Before Google Analytics became a pervasive web site metrics tool, people relied on their log files for pageviews and hit counts. In most hosting environments, every activity on your site is logged. If someone loads a web page, each element – image, text, stylesheet – is logged as it is loaded.
When you look at the log file, you can see every access to your site. Looking at the log files might have helped prevent the site from being commandeered, or at least flagged earlier that it had been.
Watch for Mice
Here’s an example from a recent log file of mine. It gives you an IP address for the requester (which you can plug into WHOIS to see who it is), tells you what that request (GET) was for and when, and a bit about what technology the requester was using:
184.108.40.206 - - [26/Jan/2020:09:27:24 -0500] "GET /blog/2020/01/16/rebuild-a-hacked-web-site/ HTTP/1.1" 200 35318 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:23.0) Gecko/20131011 Firefox/23.0 (UF)"
220.127.116.11 - - [26/Jan/2020:09:27:48 -0500] "GET /blog/2002/04/30/sample-collection-development-policy/ HTTP/1.1" 200 35323 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
The first line is a user accessing that hacked web site post. The second is a Bing search crawler adding a page or its contents to the Bing search index.
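If you would rather not read the raw file by eye, the fields described above can be pulled apart with a short script. This is only a sketch: it assumes your host writes the common "combined" log format shown in the sample, and the names LOG_PATTERN and parse_line are mine, not part of any tool.

```python
import re
from datetime import datetime

# One entry in the "combined" access log format: requester IP, two
# usually-empty fields, timestamp, request, status, size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # The request field looks like: GET /some/path/ HTTP/1.1
    parts = entry["request"].split()
    entry["method"] = parts[0] if parts else ""
    entry["path"] = parts[1] if len(parts) > 1 else ""
    entry["status"] = int(entry["status"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# The Bing crawler line from the sample log.
sample = (
    '220.127.116.11 - - [26/Jan/2020:09:27:48 -0500] '
    '"GET /blog/2002/04/30/sample-collection-development-policy/ HTTP/1.1" '
    '200 35323 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"'
)
entry = parse_line(sample)
print(entry["ip"], entry["method"], entry["path"], entry["status"])
```

Once each line is a dictionary, it is easy to sort or count by IP address, path, or user agent.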
When you look at a log file, which you should do periodically, you can see what’s really happening on your web site. If you only interact with your web site through your content management system (WordPress, Joomla, etc.), you’re only seeing a narrow view of what’s actually happening.
Here’s a good example. This is some of the traffic from that hacked site. None of these pages exist on the site any longer, but requests are being made for them from some source:
18.104.22.168 - - [21/Jan/2020:09:51:45 -0500] "GET /Clarks-Girls-Shoes-DOLLY-HEARTH-Black-Leather-Mary-Jane-School-Various-Sizes-Clothes-Shoes-&-Accessories-194602/ HTTP/1.1" 200 220 "-" "Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)"
22.214.171.124 - - [21/Jan/2020:09:51:55 -0500] "GET /Clothes-Shoes-&-Accessories-Womens-Slippers-Haflinger-OLIVIA-OWL-HAFLINGER-SLIPPERS-524077/ HTTP/1.1" 200 220 "-" "Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)"
Bots are not necessarily bad, but there is also no reason to let most of them reach your site and use your server resources. You can find advice on filtering bots out of your analytics (they inflate your view and pageview counts), but it’s better to block them at the server.
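Before blocking anything, it helps to know which crawlers are actually visiting. Here is a rough sketch that tallies requests by crawler name. It assumes the user agent is the last quoted field on each log line and that crawlers identify themselves with a token ending in "bot", which the polite ones do and the rude ones often don't. The function name count_bots is my own.

```python
import re
from collections import Counter

# The user agent is the final quoted field in a combined-format log line.
AGENT_PATTERN = re.compile(r'"([^"]*)"\s*$')
# A crawler token such as "bingbot" or "SemrushBot" inside the user agent.
BOT_TOKEN = re.compile(r'([A-Za-z0-9_-]*bot)\b', re.IGNORECASE)


def count_bots(lines):
    """Count log lines per self-identified crawler, keyed by lowercase name."""
    counts = Counter()
    for line in lines:
        agent_match = AGENT_PATTERN.search(line)
        if not agent_match:
            continue
        token = BOT_TOKEN.search(agent_match.group(1))
        if token:
            counts[token.group(1).lower()] += 1
    return counts
```

You might call it with `count_bots(open("access.log"))`, where access.log stands in for wherever your host keeps the file, and then look at `most_common(10)` on the result to see who is using the most of your server.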
Another reason to watch is to see what other things people may be testing for. Are there strange URLs in your log file? Weird requests to your search engine or to a plugin on your site? Things like cross-site scripting and other exploits may occur if your site is misconfigured or your server software or CMS is not kept up to date.
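A script can do a first pass at spotting those strange URLs. The sketch below flags a few well-known probe patterns in the requested path. The pattern list is illustrative and far from complete, the labels and the name flag_request are my own, and a hit only means a line deserves a closer look.

```python
import re
from urllib.parse import unquote

# Grabs the requested path from the quoted request field of a log line.
REQUEST_PATTERN = re.compile(r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS) (\S+)')

# A few illustrative probe signatures; extend these for your own site.
SUSPICIOUS = {
    "path traversal": re.compile(r"\.\./"),
    "script injection": re.compile(r"<script", re.IGNORECASE),
    "sql injection": re.compile(r"union(?:\s|\+)+select", re.IGNORECASE),
    "config probe": re.compile(r"/\.env|wp-config\.php", re.IGNORECASE),
}


def flag_request(line):
    """Return the labels of every suspicious pattern found in the request path."""
    match = REQUEST_PATTERN.search(line)
    if not match:
        return []
    # Decode %3C and similar escapes so encoded payloads are visible.
    path = unquote(match.group(1))
    return [label for label, pattern in SUSPICIOUS.items() if pattern.search(path)]


# A made-up request (not from my logs) with an encoded script tag in a search query.
probe = (
    '203.0.113.9 - - [26/Jan/2020:10:00:00 -0500] '
    '"GET /?s=%3Cscript%3Ealert(1)%3C/script%3E HTTP/1.1" 200 512 "-" "curl/7.68.0"'
)
print(flag_request(probe))
```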
Harden Your Site
I posted about this about a year ago. When you run a web site, you will want to keep it secured. Content management systems like WordPress and Joomla have guides on how to secure the CMS and web site.
This doesn’t secure your entire web presence. If you run a CMS, that’s only an application running on the web server. As the site I recently worked on shows, there are lots of other things that could be happening outside the CMS.
I’m not going to repeat my previous post, but here are some additional resources that I used this time around.
Your CMS probably helps you block unwanted traffic. You can also block incoming requests before your CMS responds to them. This can be a bit technical but it means that the server is doing the least amount of work.
I use .htaccess for this but your web host may have other tools. The approach intercepts bad requests and redirects them to a dead end. This is a good explanation on how to do this, both explaining what the .htaccess file is and how to edit it after you’ve saved a copy. It doesn’t eliminate the resource usage on your server – your server will still have to respond to the request – but it doesn’t get to your web site.
Hide Your Login
When you run your web site on a common platform like WordPress, everyone knows how it is configured. Everyone knows that your login page is at https://yourdomain/wp-login.php and so on.
You can also restrict access to your login pages. Using .htaccess again, you can require that visitors to login pages only come from certain IP addresses. This is great if you’re working on a corporate site and have a static IP address (or range). It’s trickier if your IP address changes.
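As a rough illustration of that restriction, here is a small helper that builds the kind of stanza you might paste into .htaccess. It assumes Apache 2.4, where Require ip is the access directive, and a host that allows it in .htaccess files; check your own host's documentation first. The function name is mine and the addresses are documentation placeholders, not real ones.

```python
import ipaddress


def login_allowlist_block(filename, networks):
    """Build an Apache 2.4 <Files> stanza limiting one file to the given networks."""
    # ip_network raises ValueError on a malformed entry, so a typo is caught
    # here rather than after it has locked you out of your own login page.
    validated = [str(ipaddress.ip_network(network)) for network in networks]
    lines = [
        f'<Files "{filename}">',
        "    Require ip " + " ".join(validated),
        "</Files>",
    ]
    return "\n".join(lines)


# Placeholder addresses from the ranges reserved for documentation.
print(login_allowlist_block("wp-login.php", ["203.0.113.0/24", "198.51.100.7"]))
# Prints:
# <Files "wp-login.php">
#     Require ip 203.0.113.0/24 198.51.100.7/32
# </Files>
```

Save a copy of your .htaccess file before you edit it. A mistake in an allowlist is the kind that keeps you out along with everyone else.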
Firewall Your Site
This blog tends to be open to the world. But there are a couple of countries where I’ve had repeated instances of issues. If you visit from one of those countries, you get a challenge page. This allows real people to visit but, hopefully, blocks automated requests.
I used a similar approach with Cloudflare’s Firewall tool on the site I rebuilt. Since the site has a geographically limited customer base, I increased the number of countries that receive the challenge. This is not foolproof – if I use a VPN and impersonate a visitor from a different country I can bypass this check. But it meets my goal of reducing automated, undesirable visits. Since it happens at Cloudflare, those requests never reach the web site.
For example, rather than letting the bots even get to my web server, I have a Cloudflare firewall rule called bad bots. You turn on the Known Bots firewall rule and then customize it. Here’s mine, with Known Bots blocked but allowances made for Google and Bing to index my site, for Jetpack to monitor it, and for Feedly users to ping for updated posts:
Another firewall rule I have blocks access requests for any login files. This pushes those requests away from my server. I can still secure my login pages but it’s belt and suspenders.
Multi-Factor is Easier Now
Two-factor authentication seems much more common than it used to be. I’ve used Duo Security on my site for a while now. If someone gets hold of a password and gets to your login page, this provides one more step to block them. There are many ways to do multi-factor on WordPress and I’m sure it’s common on other sites.
I know I’m a broken record on this, but that only secures the web app. One reason that other site was hacked was that there was access outside the web app, WordPress. I don’t have a window into what happened there (no log files, etc.) but you can use multi-factor in multiple places.
Your web hosting account may offer two-factor authentication, which would help protect your personal information and credit card or financial information. Your web host may use cPanel to let you manage your site. You can apply two-factor authentication to cPanel as well.
Out of Sight
The biggest reminder from that hacked site was that someone needs to be watching for mice. There are plenty of bad actors out there that don’t care a fig for your web site. They aren’t going to hack the site itself. But they can use your web server resources – which you may be paying for by usage – for their own ends.
Schedule a periodic check to make sure that you know what’s going on at your web server. Look at what files and folders exist outside your CMS web app. Check your log files to see who is asking for what. And use those checkups as an opportunity to continue to adapt to any threats that you may see.
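Part of that checkup can be automated. Here is a sketch that lists files changed in the last week under a folder, which is one way to notice something appearing outside your CMS. The function name and the seven-day window are my own choices, and modification times can be faked, so treat the output as a prompt to look rather than proof that all is well.

```python
import os
import time


def recently_modified(root, days=7, now=None):
    """Return sorted paths under root whose modification time is within `days`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds in a day
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                modified = os.path.getmtime(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            if modified >= cutoff:
                found.append(path)
    return sorted(found)
```

Point it at your web root, for example `recently_modified("/path/to/your/web/root")` with your own path filled in, and compare the results against the changes you know you made.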