MonsterMegs - Investigating issue with Lightning server – Incident details

All systems operational

About This Site

Welcome to MonsterMegs' status page. If you want to keep on top of any disruptions in service to our website, control panel, or hosting platform, this is the page to check. We report minor/major outages and scheduled maintenance to this status page. However, issues affecting a small number of customers may be reported directly to affected customers in MyMonsterMegs (https://my.monstermegs.com). If you're experiencing an issue you do not see reported on this page, please log into MyMonsterMegs to view any alerts our team has added to your account.

Investigating issue with Lightning server

Resolved
Operational
Started 8 months ago
Lasted 4 days

Affected

Web Hosting Servers

Operational from 5:49 PM to 4:01 AM

Lightning

Operational from 5:49 PM to 4:01 AM

Updates
  • Resolved
    Resolved

    We are extremely happy to announce that the server is back online at full capacity. Working together, our staff and the CloudLinux staff were able to recover the boot partition and bring the server back online.

    We found that the corruption of the boot partition happened before the hardware migration took effect, so we are hoping the new hardware will resolve the random reboots on this server. We also have the CloudLinux staff reviewing a kernel dump from another server that has a similar issue with random reboots. Once they analyze the kdump from that server, we should have more information on what is causing them. However, from what we have seen on this server, its reboots may be unrelated and were most likely caused by a failing hardware component.

    We thank everyone for their patience during all this. This was certainly one of the worst and most challenging outages since we have been in business. We really felt it was going to lead to a server reinstall, but our team and the CloudLinux team pulled together and pulled off a miracle by rebuilding and restoring the boot partition.

  • Identified
    Update

    We have discovered that one of the partitions was corrupted when the server crashed. We are working to repair the partition and will go from there. If we are unable to repair it, we will most likely have to do a full reinstall and restore from backups.

    We anticipate that neither process will be quick, so please be prepared for a lengthy outage. Do not panic, though: we have full backups of all accounts from today, and they will be restored if we need to reinstall the server.

  • Identified
    Update

    We received notice that the drives were migrated to the new hardware, but we are now having trouble getting the server to boot the OS. We are working with the datacenter to get the server back online.

    In an absolute worst-case scenario, we do have up-to-date backups if it comes to that, but we are not at that stage yet.

  • Identified
    Update

    We are still waiting on confirmation of the hardware swap completion. We will update as soon as we hear anything.

  • Identified
    Update

    We are going to proceed with the hardware replacement to hopefully prevent further outages. You can expect around 2 hours of downtime while this takes place. We will update further once the server is back online.

  • Identified
    Update

    We are in talks about replacing the hardware on the server and moving the hard drives to the new server. Once we reach a decision on moving forward, we will update this incident.

  • Identified
    Identified

    The Lightning server seems to have crashed again. We have the datacenter looking into it and investigating the cause of the crash. Updates will follow when more information comes in.

  • Resolved
    Resolved

    We are still working with CloudLinux to troubleshoot these issues, but for now we are going to close this post to unclutter our client portal. We will still post updates as more information comes in or if there are any further issues.

  • Monitoring
    Update

    We have brought in CloudLinux technicians to take a further look at this. There have been similar issues on other servers since installing CloudLinux, and we are doing everything possible to get to the root of the problem.

  • Monitoring
    Monitoring

    The Lightning server is now back online. The datacenter did not detect any hardware issues, so we are going to dig through the server logs and see what is causing this. We will be monitoring the server very closely for any further disruptions.

  • Investigating
    Update

    The datacenter has reported that they are investigating the issue. We will update when we get more details.

  • Investigating
    Update

    We are still not sure of the cause, but the server is not coming up after a hardware reset, so we have contacted the datacenter to investigate further. We suspect it may be a failed hardware component, but we will wait for them to confirm.

  • Investigating
    Investigating

    We are aware of an issue with the Lightning server. It rebooted several times in quick succession and has not come back up. We are investigating this and will update when we have more information.