<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mammoth VPS Status</title>
	<atom:link href="http://status.mammothvps.com.au/feed" rel="self" type="application/rss+xml" />
	<link>http://status.mammothvps.com.au</link>
	<description>Service Status page for Mammoth VPS.</description>
	<lastBuildDate>Tue, 08 May 2012 05:26:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Mammoth VPS Incident Report for Thursday, 3 May 2012</title>
		<link>http://status.mammothvps.com.au/view/1890</link>
		<comments>http://status.mammothvps.com.au/view/1890#comments</comments>
		<pubDate>Tue, 08 May 2012 05:20:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Incident Reports]]></category>

		<guid isPermaLink="false">http://status.mammothvps.com.au/?p=1890</guid>
		<description><![CDATA[On Thursday from approximately 9:30AM through to 2:15PM, customer VPSs on hosts vps01, vps02, vps03 and vps04 experienced significantly degraded performance, caused by a rebuild of the disk array on the SAN that provides storage for these hosts. While it wasn&#8217;t a total outage (some VPSs that don&#8217;t do a lot of I/O would have [...]]]></description>
			<content:encoded><![CDATA[<p>On Thursday from approximately 9:30AM through to 2:15PM, customer VPSs on hosts vps01, vps02, vps03 and vps04 experienced significantly degraded performance, caused by a rebuild of the disk array on the SAN that provides storage for these hosts.</p>
<p>While it wasn&#8217;t a total outage (some VPSs that don&#8217;t do a lot of I/O would have been barely affected), many customers were severely impacted as their VPS performance dropped well below acceptable levels.</p>
<p>Issues experienced by customers included:</p>
<ul>
<li>Very slow I/O on their VPS</li>
<li>Very long reboot times (caused by the slow I/O), which often made it look like the VPS was not working at all</li>
<li>Lack of confirmation from support as to when the rebuild would complete</li>
<li>Unable to move to a new host due to the degraded I/O</li>
</ul>
<p><strong>The problem:</strong></p>
<p>The root cause of this problem appears to be a drive in the SAN temporarily locking up early Wednesday morning (2nd May), triggering a failed drive reaction in the array controller, which in turn started an array rebuild onto a hot spare. An array rebuild is not a particularly uncommon occurrence, and it normally has less of an impact &#8211; we&#8217;ve had drive failures in the past and most customers have not even noticed the performance degradation that resulted while the array was rebuilding.</p>
<p>Performance at this point was slightly reduced but generally everything was working as expected. A rebuild usually takes around 24-36 hours.</p>
<p>At approximately 10:30 on Wednesday, two hosts that use this SAN, vps01 and vps03, began having issues, presumably due to the I/O load on the SAN from the array rebuild &#8211; vps01 became completely unresponsive and vps03 was almost completely unresponsive. We were unable to recover them, so we had to reset them. At this point we decided to drop the priority of the array rebuild to &#8216;low&#8217;, hoping it would prevent this and further I/O starvation issues during the rebuild.</p>
<p>The next morning, as the rebuild slowly progressed, we decided to mark the drive that had locked up as failed, to prevent the array from re-using it once the rebuild had completed. (Since it had only locked up temporarily and not actually failed, the array controller had once again marked it as healthy.) This was done at 09:20 on Thursday 3rd May.</p>
<p>Not long after this we noticed severely decreased performance for VPSs running off the SAN, and several customers were reporting availability and booting issues. We began looking into reasons for this and, because of the timing, the marking of that drive as failed seemed to be the most probable cause, even though it seemed unlikely that such an action should have any impact at all, since the drive was not part of the array or being used by the SAN at all.</p>
<p>Over the course of the next few hours we attempted various things, such as reducing our own load on the SAN as much as possible, to cure the degraded performance, but nothing was particularly successful. We were reluctant to modify the SAN settings in case it caused additional issues; however, since the progress of the rebuild had now slowed right down to a complete crawl and the impact on customers was so high, we felt we had no choice.</p>
<p>So at 14:15 Thursday 3rd May we set the drive previously marked as failed back to the “ready” state &#8211; this immediately improved performance. After further discussion, the cause appears to be a firmware bug in the RAID controller.</p>
<p>The array rebuild completed at 06:32 Friday 4th May and normal performance was restored.</p>
<p><strong>What we are doing now:</strong></p>
<p>A number of issues were raised during this outage that we&#8217;ll be addressing over the coming months.</p>
<ul>
<li>The first priority was to replace the suspect hard disk with a new drive as soon as the rebuild had finished. While it was reporting normally at the time, we didn&#8217;t want to take any further risks with it. It has now been replaced.</li>
<li>We&#8217;ll update our internal process regarding disk array rebuilds to ensure that we do not manually fail a drive out while the rebuild is still in progress.</li>
<li>We&#8217;ve changed a controller setting which will prevent it from automatically performing a copyback operation as soon as a rebuild is completed (this was not a factor in the outage but may have impacted performance after drive replacement).</li>
<li>We&#8217;ll be looking at implementing a system to allow us to move customers (at their request) to a new host from their most recent backup. This will enable customers to restore service in the event of problems with their host server by migrating to a new host, at the cost of slightly outdated data.</li>
</ul>
<p>We would also strongly advise customers who have business requirements around site uptime to investigate options that provide proper redundancy. While we make all efforts to ensure our servers are online and utilise redundant network, storage, and power supplies, hardware failures do occasionally occur. A redundant hosting strategy will minimise downtime during such an event. We provide <a href="http://www.mammothvps.com.au/add-ons/ip-failover">IP failover</a> as a free service for customers who have multiple VPSs.</p>
]]></content:encoded>
			<wfw:commentRss>http://status.mammothvps.com.au/view/1890/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>vps06 Outage</title>
		<link>http://status.mammothvps.com.au/view/1885</link>
		<comments>http://status.mammothvps.com.au/view/1885#comments</comments>
		<pubDate>Tue, 08 May 2012 03:48:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Maintenance]]></category>
		<category><![CDATA[Resolved]]></category>
		<category><![CDATA[Status]]></category>
		<category><![CDATA[Unscheduled]]></category>

		<guid isPermaLink="false">http://status.mammothvps.com.au/?p=1885</guid>
		<description><![CDATA[The host vps06 rebooted, resulting in an outage for all customers with VPSs on this host. The cause of the reboot is unknown and we&#8217;re investigating further to see if we can determine the exact problem. Apologies for any inconvenience.]]></description>
			<content:encoded><![CDATA[<p>The host vps06 rebooted, resulting in an outage for all customers with VPSs on this host.</p>
<p>The cause of the reboot is unknown and we&#8217;re investigating further to see if we can determine the exact problem.</p>
<p>Apologies for any inconvenience.</p>
]]></content:encoded>
			<wfw:commentRss>http://status.mammothvps.com.au/view/1885/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Degraded disk performance on vps1-4</title>
		<link>http://status.mammothvps.com.au/view/1856</link>
		<comments>http://status.mammothvps.com.au/view/1856#comments</comments>
		<pubDate>Wed, 02 May 2012 23:55:55 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Maintenance]]></category>
		<category><![CDATA[Resolved]]></category>
		<category><![CDATA[Status]]></category>
		<category><![CDATA[Unscheduled]]></category>

		<guid isPermaLink="false">http://status.mammothvps.com.au/?p=1856</guid>
		<description><![CDATA[A possible hard disk failure has triggered an array rebuild on our primary SAN. The array rebuilding is resulting in increased disk activity which in turn might lead to slower-than-normal I/O for customers that are connected to the SAN. Only customers on vps1, vps2, vps3 or vps4 might be affected. Apologies for the inconvenience. Update [...]]]></description>
			<content:encoded><![CDATA[<p>A possible hard disk failure has triggered an array rebuild on our primary SAN. The array rebuilding is resulting in increased disk activity which in turn might lead to slower-than-normal I/O for customers that are connected to the SAN. Only customers on vps1, vps2, vps3 or vps4 might be affected.</p>
<p>Apologies for the inconvenience.</p>
<p>Update 11:40AM AEST: We are still investigating the problem. The array is rebuilding but much slower than normal, and performance is massively below what it should be (normally a rebuild should only have a small performance hit, so we&#8217;re unsure exactly what is causing the massive degradation in performance). We&#8217;ll update this as more information comes to light.</p>
<p>Update 1:10PM AEST: We&#8217;ve made a few changes to accelerate the rebuild process, which has made a small impact on performance, but it is still slow. Investigations continue, but at this point we&#8217;re unsure how long the rebuild process will take.</p>
<p>Update 2:45PM AEST: We&#8217;ve made a configuration change which has stopped the crippling performance issue and our monitoring is indicating that things are back to a more normal state for an array rebuild. Customer VPSs should now be in a much more operable state, though <strong>please note</strong> that the array is still rebuilding so performance won&#8217;t be at normal speeds until this has finished.</p>
<p>Now that the rebuild is proceeding normally we hope to be able to provide an estimate for how long it will take to be completed; we&#8217;ll update this post as soon as we have that information.</p>
<p>Update 5:00PM AEST: The rebuild continues; we have enough information now from the rebuild progress to estimate that it has another 15 hours to go.</p>
<p>Update Friday, May 4, 9:00AM AEST: The array rebuild has finished and performance has returned to normal.</p>
<p>Update Friday, May 4, 1:00PM AEST: The disk responsible for the array rebuild has been swapped out for a new disk.</p>
<p>We will be providing more information about what happened (and what we&#8217;re doing to stop it from happening again) to customers very shortly.</p>
<p>The Incident Report is currently being written and our plan is to mail it to affected customers early next week (probably Tuesday at this point).</p>
]]></content:encoded>
			<wfw:commentRss>http://status.mammothvps.com.au/view/1856/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>vps01, vps03 Outage</title>
		<link>http://status.mammothvps.com.au/view/1848</link>
		<comments>http://status.mammothvps.com.au/view/1848#comments</comments>
		<pubDate>Wed, 02 May 2012 00:56:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Maintenance]]></category>
		<category><![CDATA[Resolved]]></category>
		<category><![CDATA[Status]]></category>
		<category><![CDATA[Unscheduled]]></category>

		<guid isPermaLink="false">http://status.mammothvps.com.au/?p=1848</guid>
		<description><![CDATA[The VPS hosts vps01 and vps03 are currently offline &#8211; we&#8217;re investigating this as a matter of urgency and will update this as soon as we have more information. Update 11:06am AEST: the servers have stopped responding, reason still unknown. The servers are in the process of rebooting; we&#8217;ll have updates soon. Update 11:13am AEST: [...]]]></description>
			<content:encoded><![CDATA[<p>The VPS hosts vps01 and vps03 are currently offline &#8211; we&#8217;re investigating this as a matter of urgency and will update this as soon as we have more information. </p>
<p>Update 11:06am AEST: the servers have stopped responding, reason still unknown. The servers are in the process of rebooting; we&#8217;ll have updates soon. </p>
<p>Update 11:13am AEST: vps01 host is back up, vps03 still booting. Please note your VPS instance may take a while to boot up after the host node becomes available. </p>
<p>Update 11:24am AEST: vps03 host is back up, servers on it are still starting. All customers on vps01 should be online now. </p>
<p>Update 11:35am AEST: All customers on vps03 should be online now.</p>
]]></content:encoded>
			<wfw:commentRss>http://status.mammothvps.com.au/view/1848/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Network Performance Issues</title>
		<link>http://status.mammothvps.com.au/view/1841</link>
		<comments>http://status.mammothvps.com.au/view/1841#comments</comments>
		<pubDate>Thu, 26 Apr 2012 23:21:28 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Maintenance]]></category>
		<category><![CDATA[Resolved]]></category>
		<category><![CDATA[Status]]></category>
		<category><![CDATA[Unscheduled]]></category>

		<guid isPermaLink="false">http://status.mammothvps.com.au/?p=1841</guid>
		<description><![CDATA[There appear to be some networking issues between our service and the Telstra network at the moment, which might result in slower than usual connectivity to your VPS if you&#8217;re on an affected network. We&#8217;re currently looking into this and will update as soon as we have more information. Update 11:05 AM AEST: The problem [...]]]></description>
			<content:encoded><![CDATA[<p>There appear to be some networking issues between our service and the Telstra network at the moment, which might result in slower than usual connectivity to your VPS if you&#8217;re on an affected network. We&#8217;re currently looking into this and will update as soon as we have more information. </p>
<p>Update 11:05 AM AEST: The problem appears to now be resolved. We&#8217;re still looking into the root cause with our upstream provider. Apologies for any inconvenience.</p>
]]></content:encoded>
			<wfw:commentRss>http://status.mammothvps.com.au/view/1841/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>vps10 experiencing high load</title>
		<link>http://status.mammothvps.com.au/view/1827</link>
		<comments>http://status.mammothvps.com.au/view/1827#comments</comments>
		<pubDate>Tue, 20 Mar 2012 22:08:06 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Resolved]]></category>
		<category><![CDATA[Unscheduled]]></category>

		<guid isPermaLink="false">http://status.mammothvps.com.au/?p=1827</guid>
		<description><![CDATA[Host node vps10 is currently experiencing high load resulting in slow or failed access to customer VPS. We are attempting to determine the cause. 0925 We have tracked the problem down to a Denial-of-Service attack against a VPS on this host node. After blocking the attack performance has returned to normal. 0955 Some customer VPS [...]]]></description>
			<content:encoded><![CDATA[<p>Host node vps10 is currently experiencing high load resulting in slow or failed access to customer VPS. We are attempting to determine the cause.</p>
<p><strong>0925 </strong>We have tracked the problem down to a Denial-of-Service attack against a VPS on this host node. After blocking the attack, performance has returned to normal.</p>
<p><strong>0955 </strong>Some customer VPS were shut down during the high load; these have now been restarted.</p>
]]></content:encoded>
			<wfw:commentRss>http://status.mammothvps.com.au/view/1827/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Website offline</title>
		<link>http://status.mammothvps.com.au/view/1822</link>
		<comments>http://status.mammothvps.com.au/view/1822#comments</comments>
		<pubDate>Sat, 17 Mar 2012 02:03:40 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Maintenance]]></category>
		<category><![CDATA[Resolved]]></category>
		<category><![CDATA[Status]]></category>
		<category><![CDATA[Unscheduled]]></category>

		<guid isPermaLink="false">http://status.mammothvps.com.au/?p=1822</guid>
		<description><![CDATA[The Mammoth VPS website at http://www.mammothvps.com.au is currently offline, which means mPanel is inaccessible. Apologies for the inconvenience! This is being investigated now. Customer VPSs are unaffected by this outage. Update 13:15 AEST: The site service has now been restored. The web servers were stuck in a weird, non-working state. This normally would have been [...]]]></description>
			<content:encoded><![CDATA[<p>The Mammoth VPS website at http://www.mammothvps.com.au is currently offline, which means mPanel is inaccessible. Apologies for the inconvenience! This is being investigated now.</p>
<p>Customer VPSs are unaffected by this outage.</p>
<p>Update 13:15 AEST: The site service has now been restored.</p>
<p>The web servers were stuck in a weird, non-working state. This normally would have been caught by alarming, but it looks like the alarming wasn&#8217;t set up correctly to catch this condition. We&#8217;ll be fixing this first thing to ensure that this does not happen again.</p>
<p>Again, our sincere apologies to those inconvenienced by this site outage.</p>
]]></content:encoded>
			<wfw:commentRss>http://status.mammothvps.com.au/view/1822/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Upstream Network Outage</title>
		<link>http://status.mammothvps.com.au/view/1819</link>
		<comments>http://status.mammothvps.com.au/view/1819#comments</comments>
		<pubDate>Fri, 17 Feb 2012 16:38:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Resolved]]></category>
		<category><![CDATA[Unscheduled]]></category>

		<guid isPermaLink="false">http://status.mammothvps.com.au/?p=1819</guid>
		<description><![CDATA[Between 3:19 AM and 3:25 AM we lost all network connectivity from our upstream provider, resulting in all VPS being unavailable from the internet during this period. We are awaiting an incident report from SOUL and will provide further details as they come to hand.]]></description>
			<content:encoded><![CDATA[<p>Between 3:19 AM and 3:25 AM we lost all network connectivity from our upstream provider, resulting in all VPS being unavailable from the internet during this period.</p>
<p>We are awaiting an incident report from SOUL and will provide further details as they come to hand.</p>
]]></content:encoded>
			<wfw:commentRss>http://status.mammothvps.com.au/view/1819/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intermittent minor packet loss</title>
		<link>http://status.mammothvps.com.au/view/1815</link>
		<comments>http://status.mammothvps.com.au/view/1815#comments</comments>
		<pubDate>Thu, 16 Feb 2012 02:27:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Maintenance]]></category>
		<category><![CDATA[Resolved]]></category>
		<category><![CDATA[Status]]></category>
		<category><![CDATA[Unscheduled]]></category>

		<guid isPermaLink="false">http://status.mammothvps.com.au/?p=1815</guid>
		<description><![CDATA[There is a known issue involving network packet loss; we are currently investigating. The effects are currently only barely visible but customers might notice the occasional packet getting dropped or brief period of network loss (a few seconds). Update: This is now resolved.]]></description>
			<content:encoded><![CDATA[<p>There is a known issue involving network packet loss; we are currently investigating.</p>
<p>The effects are currently only barely visible but customers might notice the occasional packet getting dropped or brief period of network loss (a few seconds).</p>
<p>Update: This is now resolved.</p>
]]></content:encoded>
			<wfw:commentRss>http://status.mammothvps.com.au/view/1815/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scheduled Network Maintenance</title>
		<link>http://status.mammothvps.com.au/view/1806</link>
		<comments>http://status.mammothvps.com.au/view/1806#comments</comments>
		<pubDate>Wed, 15 Feb 2012 04:29:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Resolved]]></category>
		<category><![CDATA[Scheduled]]></category>

		<guid isPermaLink="false">http://status.mammothvps.com.au/?p=1806</guid>
		<description><![CDATA[Expected downtime: 5 minutes During this maintenance period we will be changing to a new router configuration to enable the eventual supply of IPv6 to customers later this year. As part of the maintenance period we are also decommissioning an older switch. Customers should plan for the network to be inaccessible for a period of up [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Expected downtime: </strong>5 minutes</p>
<p>During this maintenance period we will be changing to a new router configuration to enable the eventual supply of IPv6 to customers later this year. As part of the maintenance period we are also decommissioning an older switch.</p>
<p>Customers should plan for the network to be inaccessible for a period of up to 30 minutes within the maintenance period. There is no other impact; all VPS will continue running and will be accessible without customer intervention following the scheduled maintenance period.</p>
<p>This maintenance period is outside of the 24/5 global trading window. Trading platforms can safely be left running. Individual VPSs will not be affected so you should not need to restart any services after the downtime, but we encourage users to double check just to make sure everything is working as expected after the maintenance window.</p>
]]></content:encoded>
			<wfw:commentRss>http://status.mammothvps.com.au/view/1806/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

