MonsterMegs - Notice history

All systems operational

About This Site

Welcome to MonsterMegs' status page. If you want to keep on top of any disruptions in service to our website, control panel, or hosting platform, this is the page to check. We report minor/major outages and scheduled maintenance to this status page. However, issues affecting a small number of customers may be reported directly to affected customers in MyMonsterMegs (https://my.monstermegs.com). If you're experiencing an issue you do not see reported on this page, please log into MyMonsterMegs to view any alerts our team has added to your account.

Website - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

Customer Portal - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

Thunder - Operational
98% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 93.77%)

Hurricane - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

Storm - Operational
100% uptime (Jan 2024: 99.99% · Feb 2024: 100.0% · Mar 2024: 100.0%)

Lightning - Operational
100% uptime (Jan 2024: 99.99% · Feb 2024: 100.0% · Mar 2024: 100.0%)

DNS-1 - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

DNS-2 - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

DNS-3 - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

DNS-4 - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

US Backup Storage Daily - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

US Backup Storage Weekly - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

EU Backup Storage Daily - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

EU Backup Storage Weekly - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

Zabbix Monitoring Server US - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

Zabbix Monitoring Server EU - Operational
100% uptime (Jan 2024: 100.0% · Feb 2024: 100.0% · Mar 2024: 100.0%)

Third Party: Cloudflare → Cloudflare Sites and Services → CDN/Cache - Operational

Notice history

Mar 2024

Investigating issue with Thunder server
  • Resolved
    Resolved
    This incident has been resolved.
  • Monitoring
    Monitoring

    The server is now back online. It looks to be a hardware issue similar to one we faced with another server in this datacenter. Here is the response from the datacenter:

    You are on a new 7950X3D CPU and motherboard.
    I'm assuming it is the same issue we've seen with a few other 7950X servers: they would randomly reboot over and over out of nowhere. We are guessing we got sent a bad batch from the manufacturer, as we have hundreds of others running fine. We've seen online that others have experienced the same thing with some 7950Xs. We have started buying 7950X3Ds instead; they are a little more expensive, but they are faster, use less power, and seem to be more stable.

    We strongly believe this should resolve the downtime issue, but we will continue to monitor this closely.

  • Identified
    Identified

    The datacenter is moving the hard drives to a completely new server setup to rule out any hardware issues. We will update as more information comes in.

  • Investigating
    Update

    It looks like the server is stuck in a reboot loop. The datacenter is actively working on it, checking the hardware and running some quick hardware tests.

  • Investigating
    Investigating

    It appears the server has crashed again. We are investigating.

  • Resolved
    Resolved

    We have now concluded our investigation into the outage of the Thunder server. Looking through the logs and the actions taken to bring the server back online, we have come to the following conclusions and timeline of events.

    1. The server crashed for no apparent reason. This has been an ongoing issue across our whole server fleet. We have set up our servers with kdump, which produces a kernel dump that can be used for analysis. We have submitted kernel dumps from several servers to Cloudlinux, and they stated it will take a few weeks to analyze them.

      Kernel dumps save the contents of system memory for analysis and average around 4 GB in size. Reading them requires specially trained technicians, and it is not something we can do in-house.

    2. Once we found the server to be down, we discovered it was not booting back into the kernel and was instead landing on a GRUB screen. At that point we began our troubleshooting process and brought in the Cloudlinux support staff to investigate. We also contacted the datacenter and requested that a USB drive with a rescue system be installed.

      Discovering the cause of a no-boot situation is a very lengthy process: it requires setting up a rescue system to boot into and mounting the server's original operating system. This all takes an extended amount of time, along with coordination with the various staff and the Cloudlinux support team.

    3. After several hours of investigation, we found that upon reboot, one of the NVMe drives that make up the RAID-1 array on the server had fallen out of the array. To simplify, a RAID-1 array mirrors all data across two drives: if one fails or falls out of the array, the other disk still holds a full copy of the data. (A short illustrative check for a degraded array appears after this summary.)

      In this case, the drive fell out of the array that contains the boot records. On newer servers, the EFI boot records are stored separately on each drive and are not mirrored the way they were on older systems. So when the drive fell out of the array, the EFI boot records it held had to be reinstalled.

    4. We then reinstalled the GRUB boot configuration and attempted a reboot, but the server still came back to the GRUB screen. We had the Cloudlinux staff attempt this as well, and the reboot still failed.

    5. This is when we made the decision to bring in outside help. We have one of the best Linux technicians in the country on standby, with whom we have developed a working relationship over the past few years. While he comes at a premium price per hour, we felt we had exhausted all of our own attempts to bring the server back online.

      We gave him a quick rundown of the state of the server and he began his work. After a few attempts to reinstall the boot records himself, he faced the same non-booting server. He continued working on the server for another two hours and was finally able to get the boot records to stick, and the server booted correctly. To verify further, we performed a second manual reboot, and the server booted into the operating system as it should.

    At this point we consider the server to be running stable, but we are still awaiting analysis of the various kernel dumps that have been provided. Some changes have been made since the random reboots/crashes started occurring, and this is the first crash in 3 weeks, compared to when they were happening several times a week across several servers. So we feel this may have been an isolated case of the hard drive dropping out of the RAID array, leading to the kernel panic and crash.

    We will continue to work with the Cloudlinux team to find the cause of the crashes.
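
    For readers who want to watch for the same failure mode, below is a minimal sketch of a degraded-array check. It is for illustration only and assumes a Linux host whose software RAID is reported in /proc/mdstat; it is not the tooling our datacenter or Cloudlinux used during this incident.

    # Minimal illustration: flag Linux software-RAID (md) arrays running degraded.
    # Assumes /proc/mdstat exists; not the exact procedure used during this incident.
    import re
    from pathlib import Path

    def degraded_md_arrays(mdstat_text: str) -> list[str]:
        """Return names of md arrays whose status line shows a missing member.

        /proc/mdstat prints a status like "[2/1] [U_]" for a two-disk array with
        only one active member; any "_" means a slot is missing or failed.
        """
        degraded, current = [], None
        for line in mdstat_text.splitlines():
            name = re.match(r"^(md\d+)\s*:", line)
            if name:
                current = name.group(1)
                continue
            status = re.search(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]", line)
            if current and status:
                total, active, flags = int(status.group(1)), int(status.group(2)), status.group(3)
                if active < total or "_" in flags:
                    degraded.append(current)
        return degraded

    if __name__ == "__main__":
        bad = degraded_md_arrays(Path("/proc/mdstat").read_text())
        print("Degraded arrays:", ", ".join(bad) if bad else "none")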

  • Monitoring
    Monitoring

    We are happy to state that the server is back online. We will be monitoring this closely, and we are also going to have Cloudlinux examine a kernel dump that was generated during the crash, along with investigating why the one hard drive partially fell out of the RAID array.

    We understand many customers wanted a timeline for when this server would be back online, but in situations like this, it is impossible to give one. In the first few hours, we thought the server would be back up in an hour or so, but that was not the case. If we give a timeline and miss it, there will be negative feedback about that.

    Furthermore, if we had ended up having to reinstall, it could have taken another day. In any outage we would love to give a timeline, but it is not always feasible to do so, as in this case.

    We thank everyone for their patience! If anything further pops up, we will update accordingly.

  • Identified
    Update

    Just a short update. We have pulled in one of the top Linux technicians in the country, and he has now taken over operations. He is actively working on the server as we speak, and we hope to have good news soon.

  • Identified
    Update

    The troubleshooting process is still ongoing. We may have tracked this down to RAID corruption on the server, but we are still not 100% sure of that.

    We appreciate everyone's patience. We know this is a long outage, and we understand everyone's frustration. We are facing the same frustration, but some server issues are not so cut and dried and can take hours and hours of troubleshooting. If at all possible, we would rather take a little extra time and try to repair the server than perform a full rebuild, which is not a quick process and can come with its own issues.

    We will continue to update as things progress.

  • Identified
    Update

    At this time there is nothing new to report. We are continuing to track down the issue and get the boot records into place. These types of issues are not quick fixes, and we do expect this to be an extended outage. We are doing everything we can to repair the state of the server and avoid a full server reinstall.

    We will post as soon as we have further details.

  • Identified
    Update

    Just a short update: we have had the datacenter install a rescue USB, and we will continue our troubleshooting to see why the server is not properly booting into the kernel. We have also brought aboard the Cloudlinux support team, and we are working together to identify and resolve the kernel issue.

    We have had clients reach out and ask about backups. In a worst-case scenario, where we would have to reinstall the server and restore backups, we have backups taken within 6 hours of the outage.

  • Identified
    Update

    We are continuing to work on the server and track down why it is not booting into the kernel. It may take a few hours to get the boot records reinstalled and the server booted back up.

  • Identified
    Identified

    Upon being notified that this server was down, we found that it had crashed and was not booting into the kernel properly. We are working on this and will update as more information comes in.

  • Investigating
    Investigating

    We are currently investigating an issue with the Thunder server. Our engineers have been alerted, and further details will be provided if necessary.

Feb 2024

Investigating issue with Lightning server
  • Resolved
    Resolved

    We are extremely happy to announce that the server is back online at full capacity. Through the collaboration of our staff and the Cloudlinux staff, we were able to recover the boot partition and bring the server back online.

    We found that this corruption of the boot partition happened before the hardware migration took effect, so we are hoping the new hardware will resolve the issue with the random reboots on this server. We also have the Cloudlinux staff reviewing a kernel dump from another server that has a similar random-reboot issue. Once they analyze the kdump from that server, we should have more information on what is causing the random reboots, although from what we have seen on this server, it might be unrelated and was most likely a failing hardware component.

    We thank everyone for their patience during all of this. This was certainly one of the worst and most challenging outages since we have been in business. We really felt it was going to lead to a server reinstall, but our team and the Cloudlinux team pulled together to rebuild and restore the boot partition.

  • Identified
    Update

    We have discovered that one of the partitions was corrupted during the server crash. We are working to repair the partition and will go from there. If we are unable to repair the partition, we will most likely have to do a full reinstall and restore backups.

    We anticipate that neither process will be quick, and you should be prepared for a lengthy outage. Do not panic, though: we have full backups of all accounts from today, and they will be restored if we need to reinstall the server.

  • Identified
    Update

    We received notice that the drives were migrated to the new hardware, but we are now facing issues getting the server to boot the OS. We are working on this with the datacenter to get the server back online.

    In an absolute worst-case scenario, we do have up-to-date backups if it comes to that, but we are not at that stage yet.

  • Identified
    Update

    We are still waiting on confirmation of the hardware swap completion. We will update as soon as we hear anything.

  • Identified
    Update

    We are going to proceed with the hardware replacement to hopefully prevent further outages. You can expect around 2 hours of downtime while this takes place. We will update further once the server is back online.

  • Identified
    Update

    We are in talks about replacing the hardware on the server and moving the hard drives to a new server. Once we decide to move forward, we will update this incident.

  • Identified
    Identified

    The Lightning server seems to have crashed again. We have the datacenter looking into it and investigating the cause of the crash. Updates will follow when more information comes in.

  • Resolved
    Resolved

    We are still working with Cloudlinux to troubleshoot these issues, but for now we are going to close this post to unclutter our client portal. We will still post updates as more information comes in or if there are any further issues.

  • Monitoring
    Update

    We have brought in Cloudlinux technicians to take a further look at this. There have been similar issues on other servers since installing Cloudlinux, and we are doing everything possible to get to the root of the problem.

  • Monitoring
    Monitoring

    The Lightning server is now back online. The datacenter did not detect any hardware issues, so we are going to dig through the server logs and see what is causing this. We will be monitoring the server very closely for any further disruptions.

  • Investigating
    Update

    The datacenter has reported that they are investigating the issue. We will update when we get more details.

  • Investigating
    Update

    We are still not sure of the issue, but the server is not coming up after a hardware reset. So we have contacted the datacenter to investigate further. We suspect it may be a failed hardware component, but we will wait for them to confirm.

  • Investigating
    Investigating

    We are aware of an issue with the Lightning server. It rebooted several times in a row and then did not come back up. We are investigating this and will update when we have more information.

Jan 2024

Emergency Hardware Replacement
  • Completed
    January 29, 2024 at 2:33 AM
    Completed
    January 29, 2024 at 2:33 AM

    The hardware components have been replaced and the server is now back online. We will continue to monitor the server closely for any further reboots, but hopefully the replacement of these components will be the solution.

  • In progress
    January 29, 2024 at 2:00 AM
    In progress
    January 29, 2024 at 2:00 AM

    Maintenance is now in progress

  • Planned
    January 29, 2024 at 2:00 AM
    Planned
    January 29, 2024 at 2:00 AM

    We are reaching out to inform you about an urgent maintenance operation that needs to be conducted on the server hosting your account. Over the past two weeks, we've encountered sporadic reboots on this server, occurring every few days. Initially, we suspected kernel panics as the root cause and took measures to enable extended logging to capture kernel dump logs during the subsequent reboots. However, upon analysis, we found no output from the kernel and no additional logs shedding light on the issue.

    Given the absence of software-related explanations, we have engaged with our datacenter, and they suspect a faulty CPU may be the culprit. To address this, we will need to replace the CPU, a process that will require the server to be offline for an estimated period of 30-60 minutes. Additionally, we've requested the datacenter to swap out the RAM sticks and the power supply concurrently. While the CPU is the primary suspect, we are taking a comprehensive approach to ensure that other hardware components are not contributing to the problem. The replacement of these additional components will only marginally extend the downtime, adding approximately 5-10 minutes.

    The scheduled maintenance to replace the CPU and other hardware components is set for 8 PM CST tonight. Typically, we like to provide more advance notice for such operations, but given the urgency of mitigating further disruptions caused by the random reboots, we believe swift action is warranted.

    We sincerely appreciate your understanding and apologize for any inconvenience this may cause. Should you have any concerns or questions, please don't hesitate to reach out to our support team.

    Thank you for your cooperation.

    Warm regards,

    MonsterMegs Team

Kernel Panics - Thunder
  • Resolved
    Resolved

    The reboot has been completed, and kdump extended logging has been enabled. It appears there have been several kernel panics that caused these random reboots throughout the week. With kdump in place, any future kernel panic will generate a dump file that Cloudlinux will review to determine the cause.

    We know this is directly related to Cloudlinux because we had this server running on cPanel for over a month before the migrations, with no reboots or issues of this kind. Since incorporating Cloudlinux into the server, the issue began to present itself a few days after the migrations were complete.

    Over the next several days we will be monitoring the server closely for any of these kernel panics and will work closely with the Cloudlinux team to track down the issue and get this resolved ASAP.

    If any more information comes to light, we will update this post accordingly.
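
    For technically minded customers, here is a small sketch of the first step in reviewing kdump output: listing any crash dumps so they can be handed to the Cloudlinux team. It is illustrative only and assumes the common default dump location /var/crash; our actual configuration may differ.

    # Illustrative only: list kdump crash dumps so they can be handed off for analysis.
    # Assumes the common default dump directory /var/crash.
    from datetime import datetime
    from pathlib import Path

    CRASH_DIR = Path("/var/crash")

    def list_crash_dumps(crash_dir: Path = CRASH_DIR):
        """Yield (path, size_in_bytes, modified_time) for each file under crash_dir."""
        if not crash_dir.is_dir():
            return
        for path in sorted(crash_dir.rglob("*")):
            if path.is_file():
                info = path.stat()
                yield path, info.st_size, datetime.fromtimestamp(info.st_mtime)

    if __name__ == "__main__":
        dumps = list(list_crash_dumps())
        if not dumps:
            print("No crash dumps found under", CRASH_DIR)
        for path, size, mtime in dumps:
            print(f"{path}  {size / 1e9:.1f} GB  {mtime:%Y-%m-%d %H:%M}")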

  • Monitoring
    Monitoring

    We are going to require an additional server reboot to enable additional logging. This is going to happen in the next 5 minutes and should take about 3-5 minutes.

  • Investigating
    Update

    The reboot was completed at 6:53am CST.

    The Cloudlinux team is still investigating the issue, and we will update as more information comes in. Please be aware that further reboots may be required, though this is not guaranteed. They may also need to restart services as they troubleshoot, but they have been instructed to keep any impact to a minimum.

  • Investigating
    Investigating

    We need to perform an emergency reboot on the server. We have found an issue where the server has been rebooting on its own and we are working with the Cloudlinux team to track down the cause.

Hosting Migrations - US Servers
  • Completed
    January 14, 2024 at 4:28 AM
    Completed
    January 14, 2024 at 4:28 AM

    The server migrations are now complete. Over the next few hours we will be resolving any small issues, performing follow-up tasks, and doing a post-migration audit.

    If you are using a 3rd-party DNS provider such as Cloudflare or DNS Made Easy, you may now update your DNS zone with the new IPs:

    Storm Server: 184.95.50.250 -> 38.46.220.132
    Thunder Server: 108.170.49.202 -> 38.46.220.134

    If you have a dedicated IP assigned to your account, you can find it listed in your cPanel account, in the right-hand column under the "General Information" section.

    We have also added Mailbaby email delivery to the Storm server. This is an email service that we have been using on our Semi-Dedicated hosting plans for the last year, with great success in email deliverability. Those using a 3rd-party DNS service will need to add the following to their SPF TXT entry:

    +include:relay.mailbaby.net

    Over the next couple of days we will be resolving any lingering issues, and we ask everyone to be patient as we work through them.
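
    For customers who manage DNS at a third party and want to confirm the SPF change, below is a minimal sketch that fetches a domain's TXT records and reports whether its SPF record includes relay.mailbaby.net. It is illustrative only: it assumes the third-party dnspython package is installed, and example.com is a placeholder for your own domain.

    # Illustrative check that a domain's SPF record includes the Mailbaby relay.
    # Requires the "dnspython" package (pip install dnspython).
    import dns.resolver

    def spf_includes_mailbaby(domain: str) -> bool:
        """Return True if a TXT record starting with "v=spf1" includes relay.mailbaby.net."""
        for rdata in dns.resolver.resolve(domain, "TXT"):
            txt = b"".join(rdata.strings).decode()
            if txt.lower().startswith("v=spf1") and "include:relay.mailbaby.net" in txt:
                return True
        return False

    if __name__ == "__main__":
        domain = "example.com"  # placeholder: replace with your own domain
        print(domain, "SPF includes relay.mailbaby.net:", spf_includes_mailbaby(domain))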

  • Update
    January 14, 2024 at 3:35 AM
    In progress
    January 14, 2024 at 3:35 AM

    The migration of the Thunder server is now complete. The Storm server is at about 74% and should complete within the next 2 hours or so.

    We will provide another update when the Storm server migration is complete.

    Once the migration itself is complete, we will have a couple of hours of backend tasks to finish, so please be patient as we update all of our backend components.

  • In progress
    January 14, 2024 at 2:00 AM
    In progress
    January 14, 2024 at 2:00 AM

    Maintenance is now in progress

  • Planned
    January 14, 2024 at 2:00 AM
    Planned
    January 14, 2024 at 2:00 AM

    As mentioned a couple of weeks ago, we will be migrating your server to a new, updated server and a new datacenter location in Salt Lake City, Utah (Fiberstate). In this email we will address a few details of the server migration process, along with a timeline. The timeline is a rough estimate, as it is hard to anticipate transfer speeds, but if anything we believe this will go faster than anticipated.

    We are very excited about this move, and we feel our customers are going to love the changes. The speeds on these new Ryzen 9 7950X servers are nothing short of amazing. These servers are outfitted with top-of-the-line components, and it really shows in the work we have done over the last month to get them ready for production.

    This Saturday (January 13th, 2024) at 8:00 PM CST we will begin the migration process. We anticipate the transfer will take 4-6 hours, but as stated earlier, we expect it to finish quite a bit faster. During the migration there will be no downtime, although there may be slowdowns due to the strain on the servers. We recommend avoiding any changes to websites, cPanel settings, etc. during this time, as these changes could be lost during the transfer process.

    Once the transfer is complete, we have some small back-end tasks to finish before we can consider the migration complete. Over the following 72 hours we will be tweaking settings and taking care of any small issues that are detected, so you may notice some small slowdowns or very short periods of failed connections as we restart services. We anticipate the impact will be little to none.

    With all migrations, the IPs that serve your website will be changing. If you are using our standard nameservers, such as ns1.megpanel.com, ns2.megpanel.com, etc., nothing is required on your end to point to the new server. As your account migrates to the new server, the new IP will be updated automatically in the DNS cluster.

    If you are using a 3rd-party DNS provider, such as Cloudflare, be aware that once the migrations are complete, you will need to update your DNS zones directly at the DNS provider with the new IP address. The IP changes are as follows (a short verification sketch appears at the end of this notice):

    Storm Server: 184.95.50.250 -> 38.46.220.132
    Thunder Server: 108.170.49.202 -> 38.46.220.134

    If your account has a dedicated IP and you are using a 3rd-party DNS provider, you will need to use that dedicated IP in place of the IPs listed above. You will be able to find your assigned dedicated IP in your cPanel interface after the migration, listed under the "General Information" section in the right-hand column.

    We will be posting updates on our Server Status page throughout the entire process, so if you have not yet subscribed, we urge you to do so. Our Server Status page is also directly integrated into our billing portal, so all details can also be found on the customer portal homepage.

    All migration updates and post-migration details will be posted on our Server Status page and in our client portal.

    We ask that everyone be patient during the migration process and in the days following. Our staff will be incredibly busy with post-migration tasks, so if possible, please hold off on any non-urgent tickets.

    We will send one final email the day before the migration that will serve as a final reminder.
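
    For customers on third-party DNS who want a quick post-migration check, here is a small sketch that resolves a hostname with the Python standard library and reports whether it still points at one of the old server IPs listed above. It is illustrative only; example.com is a placeholder for your own domain, and accounts with dedicated IPs would need their own mapping.

    # Illustrative post-migration check: does a hostname still resolve to an old server IP?
    # Standard library only; "example.com" is a placeholder domain.
    import socket

    # Old -> new IP mapping taken from the migration notice above (Storm, Thunder).
    MIGRATED_IPS = {
        "184.95.50.250": "38.46.220.132",
        "108.170.49.202": "38.46.220.134",
    }

    def check_host(hostname: str) -> None:
        """Print whether each IPv4 address the hostname resolves to is an old server IP."""
        addresses = {info[4][0] for info in socket.getaddrinfo(hostname, None, socket.AF_INET)}
        for addr in sorted(addresses):
            if addr in MIGRATED_IPS:
                print(f"{hostname} -> {addr}: still the OLD IP; update the record to {MIGRATED_IPS[addr]}")
            else:
                print(f"{hostname} -> {addr}: not an old server IP (already updated or proxied)")

    if __name__ == "__main__":
        check_host("example.com")  # placeholder: replace with your own domain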

Jan 2024 to Mar 2024
