[Update: Official Reason] WhatsApp, Facebook and Instagram servers down

Update 6: Real Reason for the Server Outage

Facebook has published a detailed explanation of the true cause of the outage. Read its statement below.

“This outage was triggered by the system that manages our global backbone network capacity. The backbone is the network Facebook has built to connect all our computing facilities together, which consists of tens of thousands of miles of fiber-optic cables crossing the globe and linking all our data centers.

Those data centers come in different forms. Some are massive buildings that house millions of machines that store data and run the heavy computational loads that keep our platforms running, and others are smaller facilities that connect our backbone network to the broader internet and the people using our platforms.

When you open one of our apps and load up your feed or messages, the app’s request for data travels from your device to the nearest facility, which then communicates directly over our backbone network to a larger data center. That’s where the information needed by your app gets retrieved and processed, and sent back over the network to your phone.

The data traffic between all these computing facilities is managed by routers, which figure out where to send all the incoming and outgoing data. And in the extensive day-to-day work of maintaining this infrastructure, our engineers often need to take part of the backbone offline for maintenance — perhaps repairing a fiber line, adding more capacity, or updating the software on the router itself.

This was the source of yesterday’s outage. During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally. Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool prevented it from properly stopping the command.
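To illustrate the kind of safeguard Facebook describes, here is a minimal Python sketch of a pre-execution audit that refuses any maintenance command that would leave no backbone links up. Everything in it (`Link`, `audit_command`, the three-link topology, the `min_links_up` threshold) is a hypothetical model; Facebook has not published how its audit tool works, only that a bug let the command through.

```python
from dataclasses import dataclass


# Hypothetical model of a backbone link; not Facebook's real data model.
@dataclass(frozen=True)
class Link:
    name: str
    up: bool


def links_after(links, links_to_disable):
    """Return the link states that would result if the command ran."""
    return [Link(l.name, l.up and l.name not in links_to_disable) for l in links]


def audit_command(links, links_to_disable, min_links_up=1):
    """Hypothetical audit check: allow the command only if at least
    `min_links_up` backbone links would remain up afterwards."""
    remaining = sum(1 for l in links_after(links, links_to_disable) if l.up)
    return remaining >= min_links_up


backbone = [
    Link("dc-a<->dc-b", True),
    Link("dc-b<->dc-c", True),
    Link("dc-a<->dc-c", True),
]

# Taking a single link down for maintenance passes the audit.
print(audit_command(backbone, {"dc-a<->dc-b"}))  # True
# A command that would take every link down is rejected.
print(audit_command(backbone, {l.name for l in backbone}))  # False
```

In the real incident, per the statement above, this kind of check existed but a bug kept it from stopping the command; a production audit would also reason about capacity per region rather than a simple link count.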

This change caused a complete disconnection of our server connections between our data centers and the internet. And that total loss of connection caused a second issue that made things worse.

One of the jobs performed by our smaller facilities is to respond to DNS queries. DNS is the address book of the internet, enabling the simple web names we type into browsers to be translated into specific server IP addresses. Those translation queries are answered by our authoritative name servers that occupy well known IP addresses themselves, which in turn are advertised to the rest of the internet via another protocol called the border gateway protocol (BGP).

To ensure reliable operation, our DNS servers disable those BGP advertisements if they themselves can not speak to our data centers, since this is an indication of an unhealthy network connection. In the recent outage the entire backbone was removed from operation, making these locations declare themselves unhealthy and withdraw those BGP advertisements. The end result was that our DNS servers became unreachable even though they were still operational. This made it impossible for the rest of the internet to find our servers.
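The health-check behavior described above can be modeled in a few lines. In this minimal Python sketch (the site names, `DnsSite` and `resolvable_from_internet` are hypothetical, not Facebook's actual software), each edge DNS site keeps its BGP advertisement only while it can reach a data center over the backbone. When the backbone disappears, every site withdraws at once, so the name servers become unreachable from the internet even though each one is still running.

```python
from dataclasses import dataclass


@dataclass
class DnsSite:
    name: str
    server_running: bool = True
    can_reach_datacenter: bool = True

    def advertises_route(self) -> bool:
        # Health check: keep the BGP advertisement only if the site can
        # reach a data center over the backbone.
        return self.server_running and self.can_reach_datacenter


def resolvable_from_internet(sites) -> bool:
    # The name servers' addresses are reachable only if at least one
    # site is still advertising a route to them via BGP.
    return any(site.advertises_route() for site in sites)


sites = [DnsSite("edge-1"), DnsSite("edge-2"), DnsSite("edge-3")]
print(resolvable_from_internet(sites))  # True

# The backbone goes down: every site fails its health check at once.
for site in sites:
    site.can_reach_datacenter = False

print(all(site.server_running for site in sites))  # True: servers still up
print(resolvable_from_internet(sites))  # False: no routes advertised
```

A single unhealthy site withdrawing is the intended behavior, since traffic simply shifts to the remaining sites. The failure mode in this outage comes from the check failing at every site simultaneously.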

All of this happened very fast. And as our engineers worked to figure out what was happening and why, they faced two large obstacles: first, it was not possible to access our data centers through our normal means because their networks were down, and second, the total loss of DNS broke many of the internal tools we’d normally use to investigate and resolve outages like this.

Our primary and out-of-band network access was down, so we sent engineers onsite to the data centers to have them debug the issue and restart the systems. But this took time, because these facilities are designed with high levels of physical and system security in mind. They’re hard to get into, and once you’re inside, the hardware and routers are designed to be difficult to modify even when you have physical access to them. So it took extra time to activate the secure access protocols needed to get people onsite and able to work on the servers. Only then could we confirm the issue and bring our backbone back online.

Once our backbone network connectivity was restored across our data center regions, everything came back up with it. But the problem was not over — we knew that flipping our services back on all at once could potentially cause a new round of crashes due to a surge in traffic. Individual data centers were reporting dips in power usage in the range of tens of megawatts, and suddenly reversing such a dip in power consumption could put everything from electrical systems to caches at risk.”

Update 5: Formal Address from Facebook

Facebook has addressed this issue formally via a blog post.

“To all the people and businesses around the world who depend on us, we are sorry for the inconvenience caused by today’s outage across our platforms. We’ve been working as hard as we can to restore access, and our systems are now back up and running. The underlying cause of this outage also impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem.

Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.

Our services are now back online and we’re actively working to fully return them to regular operations. We want to make clear at this time we believe the root cause of this outage was a faulty configuration change. We also have no evidence that user data was compromised as a result of this downtime.

People and businesses around the world rely on us every day to stay connected. We understand the impact outages like these have on people’s lives, and our responsibility to keep people informed about disruptions to our services. We apologize to all those affected, and we’re working to understand more about what happened today so we can continue to make our infrastructure more resilient.”

Update 4: Back online

All three services, each owned and operated by Facebook, are now back online and working as usual after the hours-long outage. The services went down at 11:39 a.m. ET. By around 6 p.m. ET, users of all three platforms reported that some service had been restored, but full functionality remained elusive well into Monday evening.

Facebook issued a statement confirming that the services are resuming and apologizing for the inconvenience.

“To the huge community of people and businesses around the world who depend on us: we’re sorry. We’ve been working hard to restore access to our apps and services and are happy to report they are coming back online now. Thank you for bearing with us,” said Facebook.

That’s not all: Facebook CEO Mark Zuckerberg also addressed the matter after the services resumed and apologized: “Sorry for the disruption today — I know how much you rely on our services to stay connected with the people you care about.”

Meanwhile, neither Facebook nor its employees have given any specific reason for this massive outage so far.

Update 3: WhatsApp Statement:

“We’re aware that some people are experiencing issues with WhatsApp at the moment. We’re working to get things back to normal and will send an update here as soon as possible,” wrote WhatsApp.

Update 2: Instagram Statement:

“Instagram and friends are having a little bit of a hard time right now, and you may be having issues using them. Bear with us, we’re on it! #instagramdown”

Update 1: Facebook Statement

Facebook quickly responded to the matter, saying:

“We’re aware that some people are having trouble accessing our apps and products. We’re working to get things back to normal as quickly as possible, and we apologize for any inconvenience.”

Meanwhile, users of the Facebook-owned services mentioned above continue to report the problem.

October 4, 2021: Server Down

Today, a massive outage occurred: when I tried to log in to my WhatsApp, Facebook, and Instagram accounts, they took far too long to respond and eventually failed to load the app or the corresponding website.

So I searched online and found the same reports surfacing across the internet: “WhatsApp, Facebook and Instagram are down.”

This is a major global outage, as users are reporting it from many countries and regions around the world.

This is a developing story, and we’ll keep you posted as more information becomes available, since Facebook has yet to respond to the matter.

Meanwhile, the fun continues on Twitter, as fans and users express their concerns and a whole bazaar of memes has opened up around the situation.

Copyright © 2022 Huaweicentral.com