Affected: Lightning server
Operational from 5:49 PM to 4:01 AM
- Resolved
We are extremely happy to announce that the server is back online at full capacity. With the collaboration of our staff and the CloudLinux staff, we were able to recover the boot partition and bring the server back online.
We found that the corruption of the boot partition happened before the hardware migration took effect, so we are hoping the new hardware will resolve the random reboots on this server. We also have the CloudLinux staff reviewing a kernel dump from another server that has a similar issue with random reboots. Once they analyze the kdump from that server, we should have more information on what is causing them. Although from what we have seen on this server, it may be unrelated and was most likely a failing hardware component.
We thank everyone for their patience during all this. This was certainly one of our worst and most challenging outages since we have been in business. We really felt it was going to lead to a server reinstall, but our team and the CloudLinux team pulled together and managed to rebuild and restore the boot partition.
- Update
We have discovered that after the crash of the server, one of the partitions was corrupted. We are working to repair the partition and will go from there. If we are unable to repair the partition, we will most likely have to do a full reinstall and restore from backups.
We anticipate that neither process will be quick, so you should prepare for a lengthy outage. Do not panic: we have full backups of all accounts from today, and they will be restored if we need to reinstall the server.
- Update
We received notice that the drives were migrated to the new hardware, but we are now facing issues getting the server to boot the OS. We are working on this along with the datacenter to get the server back online.
In an absolute worst-case scenario, we do have up-to-date backups if it comes to that, but we are not at that stage yet.
- Update
We are still waiting on confirmation that the hardware swap is complete. We will update as soon as we hear anything.
- Update
We are going to proceed with the hardware replacement to hopefully prevent further outages. You can expect around 2 hours of downtime while this takes place. We will update further once the server is back online.
- Update
We are in talks about replacing the hardware on the server and moving the hard drives to the new machine. Once we decide to move forward, we will update this incident.
- Identified
The Lightning server appears to have crashed again. We have the datacenter investigating the cause of the crash. Updates will follow as more information comes in.
- Resolved
We are still working with Cloudlinux to troubleshoot these issues, but for now we are going to close this post to unclutter our client portal. We will still post updates as more information comes in or if there are any further issues.
- Update
We have brought in CloudLinux technicians to take a further look at this. There have been similar issues on other servers since installing CloudLinux, and we are doing everything possible to get to the root of the problem.
- Monitoring
The Lightning server is now back online. The datacenter did not detect any hardware issues, so we are going to dig through the server logs and see what is causing this. We will be monitoring the server very closely for any further disruptions.
- Update
The datacenter has reported that they are investigating the issue. We will update when we get more details.
- Update
We are still not sure of the issue, but the server is not coming up after a hardware reset, so we have contacted the datacenter to investigate further. We suspect it may be a failed hardware component, but we will wait for them to confirm.
- Investigating
We are aware of an issue with the Lightning server. It rebooted several times in quick succession and then did not come back up. We are investigating and will update when we have more information.