Three-Nines

Woohoo! We’ve made it to 99%+ uptime for our core systems since we began keeping track (about 4 months at the time of this post). That isn’t just “not bad”, it’s great, considering we were doing live migrations, emergency data recovery, and the cut-over to our new cluster all during this time.

screenshot_2016-09-11_02-04-55

Apart from some extra downtime shown for our gateway (it was blocking pings, and thus was artificially showing as down), we have done pretty well. EngSocSrv was the worst hit for true uptime, since it both had a failed disk earlier this year (knocking the club sites offline for a while) and was subject to downtime from our damaged ECF uplink. Both of these types of failure have been done away with now that all systems are RAIDed, mirrored, and backed up daily, and are also supported by a dual-wire redundant uplink.
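
To put the 99% figure in perspective, here is a rough sketch of how much downtime a given uptime percentage actually allows. The 120-day window is an assumption standing in for "about 4 months" (the exact monitoring window isn't stated), and the function name is made up for illustration.

```python
# Rough downtime budget for a given uptime percentage.
# Assumption: "about 4 months" is treated as 120 days.

def downtime_budget_hours(uptime_percent: float, window_days: float) -> float:
    """Return the hours of downtime allowed while still meeting uptime_percent."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    window_hours = window_days * 24
    return window_hours * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9):
        hours = downtime_budget_hours(target, window_days=120)
        print(f"{target}% uptime over 120 days allows {hours:.1f} h of downtime")
```

In other words, every extra nine cuts the allowed downtime by a factor of ten, which is why a failed disk or a flaky uplink eats the budget so quickly.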

Our first real reliability test came last week during the F!rosh events. Despite being hammered by high volumes of traffic from people checking out the F!rosh week schedule, and by people deliberately trying to overload the servers during HavengerScunt, everything stayed up.

Let’s cross our fingers that the good fortune carries on through midterms and finals, too!

New ECF Drop

A few months ago, our cable uplink to the ECF / ITS data closet was damaged, dropping it from gigabit to a measly 100Mbit. And what an unsteady 100Mbit connection it has been: it has already bounced back up to gigabit and down again a few times, and has caused us disruptions and downtime grief along the way.

That all changes this week! In mid-July, we had installers come in and wire in a dual, redundant CAT6 ethernet drop. This one is located behind the secure door of the EngSoc data closet, and replaces the old drop that was located across the room and required a second hop before it reached the data closet. This means far fewer chances for disruption (the wire is totally locked away & enclosed), and provides redundancy (we have a 2nd wire, if one ever gets damaged again). To provide this redundancy, the new EngSoc router is dual-homed, and has both of its ethernet cards bonded together: if either wire breaks, we’ll be ready.

Just take a look at the new performance:
speedtest

Where before we were struggling to push 80Mbit/s download and upload, we’re now well into the hundreds-of-megabits range. What’s this all mean? Faster website hosting speeds for clubs, less downtime, and improved internet speed for the EngSoc officers! 

He’s Dead, Jim

Well, we’ve had our first redshirt moment:

Redshirt_characters_from_Star_Trek
It was the reallocated sectors that did him in, Captain.

Despite passing the initial week-long HDD stress test, one of our 3TB Toshiba NAS drives began to throw SMART errors, even though it had been left idle after deployment. So what did all this mean? Warranty claims, but of course, and also a little bit of mdadm hot-swap RAID fun.

First, we’ve got to tell our NAS box its disk has failed. With a one-liner, the disk is marked as failed and removed from the RAID array. With a flick of a handle, out slides the drive from the new, convenient hot-swap trays. Take that, old and inexplicably locked-up Apple Xserve RAID!

IMG_20160728_165727
Hotswap = life made easier when it comes to repairs

Luckily, NCIX is a little better with warranty claims than Canada Computers, which is one of the reasons they were chosen as our HDD supply source. Within 10 minutes, they had my RMA approved with Toshiba, and I was given a new disk and was on my way — no waiting around for 2 weeks for a mail-in replacement.

Within a few minutes of being back at the data closet, the new drive is slid into its enclosure and into the NAS, and another two commands partition the drive, then re-add it to the RAID array. Ah, life’s good.
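
For the curious, here is a dry-run sketch of what the mdadm side of a swap like this can look like. It only builds and prints the commands rather than running them. The array and partition names (/dev/md0, /dev/sdc1) are hypothetical, and the partitioning step is left out because the tool used for it isn't named above.

```python
import shlex

# Hypothetical device names for illustration only; check /proc/mdstat
# for the real array and member names before running anything.
ARRAY = "/dev/md0"
FAILED_MEMBER = "/dev/sdc1"
NEW_MEMBER = "/dev/sdc1"  # the replacement disk's partition, same slot


def swap_commands(array, failed, replacement):
    """Return the mdadm invocations for a disk swap, in order."""
    return [
        # Mark the member as faulty and remove it from the array.
        ["mdadm", "--manage", array, "--fail", failed, "--remove", failed],
        # Once the new disk is partitioned, add it back to the array.
        ["mdadm", "--manage", array, "--add", replacement],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in swap_commands(ARRAY, FAILED_MEMBER, NEW_MEMBER):
        print(shlex.join(cmd))
```

Printing first and pasting second is a cheap safety net: a typo in a device name is a lot less exciting on screen than on a live array.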

What’s this say about Toshiba 3TB drives, though? Well, not much. One out of 6 disks failing might seem like a lot, but this is too small a sample size to draw any conclusions. Backblaze’s blog even does a breakdown for the same DT01ACA300 disks EngSoc uses, and notes that even 58 disks is too small a sample size to draw conclusions. One thing is certain though: as of June 2016, Toshiba drives undercut every other supplier on their cost per GB of storage!
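
To see just how little one failure in six tells us, here is a small sketch that puts a 95% confidence interval around the observed failure rate. The Wilson score interval is my choice of method for illustration; it is not something taken from Backblaze's write-up.

```python
import math


def wilson_interval(failures, total, z=1.96):
    """Wilson score interval for a failure proportion (z=1.96 is ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = failures / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    low, high = wilson_interval(failures=1, total=6)
    print(f"1 failure in 6 disks: plausible failure rate {low:.0%} to {high:.0%}")
```

With 1 failure out of 6, the interval runs from roughly 3% to 56%, which is far too wide to say anything useful about the drive model.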

 

 

The Servers are Coming!

The servers are coming, the servers are coming!

Yes, we’ve done it. We’ve gone and deployed a new, shiny, starburst-advertised server cluster (and a rack-mounted one at that, too). While there is still provisioning and service cut-over left to do (from the old systems to the new ones), the cluster is all up and running.

What we’ve got (from top of rack to bottom):

  • Lenovo SFF PC – acting as gateway router
  • Keyboard+Display & KVM switch
  • 2 x LXC host nodes
  • 12TB RAID server
  • DLink DGS1100-24 switch (on rear — not visible)
  • APC UPS1000
  • Old Xserve RAID, now defunct… but too heavy to move!

Some before and after shots:

before_closet before_server
after_closet after_server

New RAID Storage for EngSoc

EngSoc is getting new servers, and what better way to start off than by replacing the crufty, decade-and-a-half old Apple Xserve RAID that had been the backing storage for all of the club sites? Drawing 250W of power, and providing 1.2TB of space over 14 disks, the Xserve RAID is definitely due for replacement.

xserve_raid_slot
Xserve was Apple’s one-time venture into commercial hardware. Unsurprisingly, it flopped.

Our replacement: a hand-spun server, providing 12TB of RAID storage over only 6 disks, with a power footprint under 100W. Ten years makes a large difference, doesn’t it? While the Xserve RAID cost $5999 new in 2003, EngSoc’s solution cost only $1900 all told, thanks to using off-the-shelf parts.
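
Here is the same comparison worked out per terabyte, as a quick sketch. The numbers are the ones quoted above. Treating "under 100W" as exactly 100W, and comparing the two prices as-is with no inflation adjustment, are simplifying assumptions.

```python
# Figures quoted in the post. "Under 100W" is treated as an upper bound of 100.
# Assumption: prices are compared as-is, with no inflation adjustment.
SYSTEMS = {
    "Xserve RAID (2003)": {"cost": 5999, "capacity_tb": 1.2, "watts": 250},
    "New RAID server (2016)": {"cost": 1900, "capacity_tb": 12, "watts": 100},
}


def per_tb(value, capacity_tb):
    """Normalise a cost or power figure by storage capacity."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return value / capacity_tb


if __name__ == "__main__":
    for name, spec in SYSTEMS.items():
        dollars = per_tb(spec["cost"], spec["capacity_tb"])
        watts = per_tb(spec["watts"], spec["capacity_tb"])
        print(f"{name}: ${dollars:,.0f} per TB, {watts:.1f} W per TB")
```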

5742_10_rosewill_rsv_l4411_rackmount_server_case_review
The new RAID — still hotswap, and still rack-mountable, but with commodity hardware backing it 

We’re using good old Gigabit Ethernet for all our datanet backbones, and this server has 4 network cards. That’s a lot of bandwidth: 500MB/s in total, which is more than enough to keep up with the 400MB/s write speeds of this new server. Going with commodity Gigabit Ethernet is great, because it’s cheap, well-established, and inter-operable with any computer that can connect to an ethernet switch. This gives us much more flexibility in our network design. Previously, the Fibre Channel network for the Xserve RAID meant it could only operate as a ‘slave’ to another box with a Fibre Channel card.
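
The 500MB/s figure is just four gigabit links added together and converted from bits to bytes. A minimal sketch of that arithmetic follows. It uses raw line rate and ignores protocol overhead, so real-world throughput will be somewhat lower, and the function name is made up for illustration.

```python
def aggregate_mb_per_s(nic_count, gbit_per_nic=1.0):
    """Raw combined line rate in megabytes per second (8 bits per byte)."""
    megabits = nic_count * gbit_per_nic * 1000
    return megabits / 8


if __name__ == "__main__":
    network = aggregate_mb_per_s(4)
    disk_write = 400  # MB/s, the write speed quoted for the new server
    print(f"Network: {network:.0f} MB/s, disks: {disk_write} MB/s")
    print("Bottleneck:", "disks" if disk_write < network else "network")
```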

All of this means that EngSoc now has lots of storage for proper daily backups, and a proper CIFS/Samba share server with some redundancy. This will go a long way in giving me (the sysadmin), the webmaster, and the EngSoc Officers a staging area for files, temporary backups, and shares.

Speed Up Your WordPress with Caching

Hey Skuligans & Amateur Webmasters:

Today’s instalment deals with speeding up your Skule WordPress website. While half the battle is on my (the sysadmin’s) side (providing decent hardware, an up-to-date OS, tweaked webserver settings, blah blah), the other half of the battle is ensuring your site is well-optimised by design.

That’s where caching comes in. Each time a user loads a non-cached webpage, the webserver must regenerate the HTML code of your WordPress page from raw PHP. Caching, on the other hand, takes the generated HTML and stashes it away for the next time a visitor clicks on your site. While the servers we have contain fast processors, no matter how you slice it, having to do less processing will always be faster.
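
If you like seeing ideas as code, here is a toy sketch of what a page cache does. It is not how W3 Total Cache (or any WordPress plugin) is actually implemented; the class and function names are made up, and the sleep simply stands in for PHP doing its work.

```python
import time


class PageCache:
    """Toy page cache: render a page once, then serve the stored HTML."""

    def __init__(self, render):
        self._render = render  # the slow function that builds the HTML
        self._pages = {}       # url -> previously generated HTML

    def get(self, url):
        if url not in self._pages:
            # Cache miss: do the expensive work and stash the result.
            self._pages[url] = self._render(url)
        return self._pages[url]

    def empty_all(self):
        """Throw away every stored page, e.g. after publishing a new post."""
        self._pages.clear()


def slow_render(url):
    """Stand-in for the webserver generating a page from PHP."""
    time.sleep(0.05)
    return f"<html><body>Rendered {url}</body></html>"


if __name__ == "__main__":
    cache = PageCache(slow_render)
    for attempt in ("first", "second"):
        start = time.perf_counter()
        cache.get("/schedule")
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{attempt} load took {elapsed_ms:.1f} ms")
```

The first load pays the full rendering cost and every later load is served from the stored copy. Emptying the cache forces the next visitor to pay that cost again, which is the trade-off behind having to clear caches after a content change.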

Enabling Caching

Big Caveat: only attempt this if you have some prior experience with WordPress. Unexpected errors can and will pop up, so only attempt it if you are confident!

Luckily for you, WordPress has a good selection of caching plugins. The two best are the W3 Total Cache plugin and the WP Super Cache plugin. For Skule usage, the W3 plugin works best right out of the box; the WP Super Cache plugin requires more configuration. So, let’s get installing the W3 Total Cache plugin:

  1. Go to the Plugins page, click ‘Add’, then search for and select the W3 Total Cache plugin from the plugin store
  2. Enter your FTP password to begin installation. If installation fails, check that you have write permissions for your wp-content and wp-content/plugins folders. This can be done via the cPanel File Manager.
  3. Once installation is done, you may be warned about some post-installation items that failed. These are permissions-related. Follow the instructions, and enable write permissions on the files that failed. This can be done via the cPanel File Manager:
    perms
  4. Once done, enable all the caching modules you desire. The ones you definitely want are Page Cache, Object Cache, Minify Cache, and Database Cache. Unless you know what you are doing, don’t enable any others!
    enable
  5. Make sure to change back permissions on any of the files you changed in step 3!
  6. Caching should now be installed. Load your site to prime the cache; the first load is usually very slow.

Since caching is now enabled, you will have to clear your cache every time you create a new website post; this is done by clicking the ‘Performance’ tab on your WordPress topbar and selecting ‘Empty All Caches’.

Last note: you may have to edit the .htaccess file in your public_html folder to allow redirects. You will know if you get “page not found” when you click around your site. If so, you need to add this:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

Performance Gains & Testing Your Site

Now that caching is up and running, it’s time to test. Pingdom has a great tool for this.

Check it! Here is this site BEFORE caching:

precache

…And here it is AFTER caching:

postcache

That’s a change from 703ms before to about 394ms, roughly a 44% reduction in load time! Now go, and set up a cache on your site.

Eliminate Spam, Threats with WordPress Plugins & Updates

Dealing With Spam

WordPress is just as vulnerable to spam as email. In fact, more so, because the control of spam filters is put entirely in your (the end user’s) hands. A prime example of why you want to install spam filters for your WordPress comments was a site I just worked on:

spam

For those who can’t see it yet:
spam

That’s right: 1981 pending comments, and over 99% of them are spam. Truly a way to learn via the school of hard knocks. A quick way to limit your Skule website’s spam comments (and make it easier to publish valid comments) is to use the WP-Spamshield plugin:

https://wordpress.org/plugins/wp-spamshield/

Simply install it and keep it up to date, and your comments should stay largely spam-free.

Another helpful plugin, which will outright ban bad IPs and keep tabs on spammy visitors, is Wordfence:

https://en-ca.wordpress.org/plugins/wordfence/

Dealing With Viruses & Hackers

WordPress, like any software, needs to be kept up to date (here’s looking at you, <insert-lazy-club-webmaster-name-here> !) or you risk having hackers break into your site by exploiting vulnerabilities. It’s not hard, and it’s necessary, if only to keep me (your humble sysadmin) and the webmaster from pulling our hair out a few times each month.

It’s really quite a simple process:

  1. Log in to your WordPress site
    login
  2. Click on the updates button
    update
  3. Click upgrade/install
    reinstall
  4. Enter your FTP credentials. Note that you MUST use localhost as the hostname!
    ftp
  5. Upgrade & Enjoy!
    maint