Incident date: 2022-03-18, Distributed Denial of Service Attack
Note: All times are in Asia/Kuala_Lumpur (UTC +0800)
This document is a technical explanation of the incident on the 18th of March 2022. It serves as an avenue for the community to learn and understand the root cause of the incident.
On the 18th of March, users were unable to access Billplz services, including but not limited to making and receiving payments. Services were unavailable for approximately 4 hours. The outage was caused by a massive volume of requests to our services, which pushed our scaling configuration to its maximum.
As a solution, we configured a web application firewall (WAF) to filter out illegitimate requests and blocked requests made directly to the server. Further fine-tuning was completed at 12:10 PM on the 19th of March 2022, at which point the service was back to normal.
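The idea of blocking requests made directly to the server can be sketched as an allowlist check on the connecting address: only traffic arriving from the WAF's proxy ranges is accepted. This is a minimal illustration, not the configuration Billplz used. The constant `TRUSTED_PROXY_RANGES` and the method `from_trusted_proxy?` are hypothetical names, and the ranges shown are reserved documentation networks standing in for whatever ranges a WAF provider publishes.

```ruby
require "ipaddr"

# Hypothetical proxy ranges. These are documentation-only networks
# (RFC 5737 and RFC 3849); a real deployment would load the ranges
# published by its WAF provider instead.
TRUSTED_PROXY_RANGES = [
  IPAddr.new("203.0.113.0/24"),
  IPAddr.new("2001:db8::/32")
].freeze

# Returns true only when the connecting peer address falls inside one of
# the trusted ranges. Anything else is treated as a direct-to-origin
# request that bypassed the WAF.
def from_trusted_proxy?(remote_addr, ranges = TRUSTED_PROXY_RANGES)
  ip = IPAddr.new(remote_addr)
  ranges.any? { |range| range.include?(ip) }
rescue IPAddr::Error
  false # malformed addresses are treated as untrusted
end
```

The check is only meaningful when it is applied to the TCP peer address rather than to a client-supplied header such as `X-Forwarded-For`, and in practice the same rule is usually better enforced at the network firewall so that rejected traffic never reaches the application.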
The DDoS attack lasted from 04:00 AM to 05:00 PM on the 18th of March 2022. The Billplz web service was back to normal at 12:10 PM on the 19th of March 2022.
04:00 AM, 18 March 2022
Start of the incident. The majority of the services were still up but with degraded performance.
04:31 AM, 18 March 2022
First notification received from Sentry. Billplz web servers were unable to serve any requests. However, the Billplz merchant dashboard remained accessible with no performance impact, as it runs on a separate server. Merchants were able to get their account statements and perform some tasks provided through the dashboard.
04:33 AM, 18 March 2022
Received a ransom note demanding payment in Bitcoin to stop the attack.
07:00 AM, 18 March 2022
Engineers noticed the attack and immediately set up a web application firewall (WAF) to neutralize it.
08:30 AM, 18 March 2022
The server configuration was scaled up further to cater for the ongoing attacks. With most of the bad traffic filtered out, Billplz was back up and able to receive requests, with intermittent downtime.
02:30 PM, 18 March 2022
As the attacks continued and some requests still managed to bypass our web application firewall (WAF), we switched the WAF to a stricter mode to crack down further on the attacks.
05:00 PM, 18 March 2022
The attack stopped and we continued monitoring the impact with our merchants. To further secure our servers, we deployed an additional middleware at the application level as another layer of protection.
06:45 PM, 18 March 2022
The attacker moved the attack to Plzlogin. Merchants were unable to log in to their dashboard, as Plzlogin is the login module merchants use to get into the Billplz dashboard.
11:25 PM, 18 March 2022
The attacks on Plzlogin stopped and merchants were able to access the system again.
12:10 PM, 19 March 2022
System-wide web application firewall (WAF) strict mode has been removed. From our monitoring and feedback received, we confirmed that all services are back to normal. Incident closed.
What went well
- The attacks did not cause any database server to crash.
- Only web-facing servers crashed due to the attack. Non-web servers worked as usual and were not impacted.
What went wrong
- Cloudflare had not been set up prior to the incident, which complicated the mitigation process and translated into longer downtime.
Where we got lucky
- PgBouncer had been set up as a database connection pooler in 2021, so we did not run into database connection limits when scaling horizontally.
- The merchant dashboard had been separated from the main apps, which enabled merchants to continue their reconciliation process during the attack.
Lessons learned
- Keep the web application firewall (WAF) always on to ensure smooth mitigation in the event of an attack, thus reducing potential downtime.
- Middleware-level protection in Ruby is useful for blocking unexpected traffic, since a request can be rejected much faster in middleware than in the application layer.
- Auto scaling is important to ensure enough capacity to serve a massive volume of requests.
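To illustrate why rejecting traffic in middleware is cheap, here is a minimal sketch of a Rack-style middleware that applies a fixed-window request limit per client address before the application is ever invoked. It is an assumption-laden example, not the middleware Billplz deployed: the class name `RequestThrottle`, its `limit`, `window`, and `clock` parameters, and the in-memory counter are all hypothetical, and a multi-server deployment would need a shared store and eviction of old entries.

```ruby
# Rack-style middleware: an object whose #call(env) returns
# [status, headers, body]. No gems are required for this sketch.
class RequestThrottle
  # limit:  maximum requests allowed per address in each window (assumed value)
  # window: window length in seconds (assumed value)
  # clock:  injectable time source, which keeps the class easy to test
  def initialize(app, limit: 100, window: 60,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @app = app
    @limit = limit
    @window = window
    @clock = clock
    # address => [window_id, request_count]; in-memory and per-process only
    @counters = Hash.new { |hash, key| hash[key] = [nil, 0] }
    @mutex = Mutex.new
  end

  def call(env)
    ip = env["REMOTE_ADDR"].to_s
    # Reject before the application stack runs, so no routing, session,
    # or database work is spent on throttled requests.
    return reject if over_limit?(ip)

    @app.call(env)
  end

  private

  # Counts the request and reports whether the address exceeded its limit
  # within the current fixed window.
  def over_limit?(ip)
    window_id = (@clock.call / @window).floor
    @mutex.synchronize do
      entry = @counters[ip]
      if entry[0] != window_id
        entry[0] = window_id # a new window started, so reset the count
        entry[1] = 0
      end
      entry[1] += 1
      entry[1] > @limit
    end
  end

  def reject
    [429,
     { "content-type" => "text/plain", "retry-after" => @window.to_s },
     ["Too Many Requests\n"]]
  end
end
```

In a Rack application this kind of class would typically be mounted near the top of the stack, for example with `use RequestThrottle, limit: 100, window: 60` in `config.ru`, so that it runs before heavier middleware and the application itself.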