Resolved
We experienced downtime in our public API because a database job failed to create the new table partition for April. Without this partition, every write to that table failed, which took down our main API. Once the partition was created, writes to the table resumed and the API came back online.
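To show why a single missing partition can block every write, here is a minimal sketch assuming the table is range-partitioned by month on a date column, similar to PostgreSQL declarative partitioning. The `Partition` tuple, `route_write`, `NoPartitionError`, and the `events_YYYY_MM` names are hypothetical and for illustration only. A row is stored in the partition whose range covers its key. If no partition covers it, the database has nowhere to put the row and rejects the write.

```python
from datetime import date

# Hypothetical monthly range partition: (name, lower bound inclusive, upper bound exclusive).
Partition = tuple[str, date, date]


class NoPartitionError(Exception):
    """Raised when no partition's range covers the row's key."""


def route_write(partitions: list[Partition], created_at: date) -> str:
    """Return the partition a row would be stored in, or raise if none covers it."""
    for name, lower, upper in partitions:
        if lower <= created_at < upper:
            return name
    # With no matching range, the write is rejected outright.
    raise NoPartitionError(f"no partition found for row with created_at={created_at}")


if __name__ == "__main__":
    # Illustrative dates: partitions exist for February and March, but not April.
    partitions = [
        ("events_2024_02", date(2024, 2, 1), date(2024, 3, 1)),
        ("events_2024_03", date(2024, 3, 1), date(2024, 4, 1)),
    ]
    print(route_write(partitions, date(2024, 3, 31)))
    try:
        route_write(partitions, date(2024, 4, 1))
    except NoPartitionError as err:
        print(f"write rejected: {err}")
```

As soon as the calendar crosses into a month with no partition, every new row falls outside the existing ranges, which is why all writes failed at once rather than only some of them.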
We are working on the following:
1) Improving alerting and monitoring for the job that creates the table partitions
2) Removing the strong dependency between our main API and this table
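One way to approach the first item is a periodic check that compares the partitions that should exist against the ones that do and raises an alert on any gap. The sketch below is an illustration under stated assumptions, not our actual tooling: it assumes monthly partitions named with a hypothetical `<table>_<YYYY>_<MM>` scheme, and `month_starts`, `partition_name`, and `missing_partitions` are hypothetical helpers. In practice, the set of existing partitions would be read from the database catalog, and the alert would page on-call instead of printing.

```python
from datetime import date


def month_starts(today: date, months_ahead: int) -> list[date]:
    """First day of the current month and of each of the next `months_ahead` months."""
    starts = []
    year, month = today.year, today.month
    for _ in range(months_ahead + 1):
        starts.append(date(year, month, 1))
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return starts


def partition_name(table: str, month_start: date) -> str:
    # Hypothetical naming scheme: <table>_<YYYY>_<MM>
    return f"{table}_{month_start.year:04d}_{month_start.month:02d}"


def missing_partitions(table: str, existing: set[str], today: date, months_ahead: int = 2) -> list[str]:
    """Names of partitions that should already exist (current month plus lookahead) but do not."""
    expected = [partition_name(table, m) for m in month_starts(today, months_ahead)]
    return [name for name in expected if name not in existing]


if __name__ == "__main__":
    # Illustrative state: only the current month's partition exists.
    existing = {"events_2024_03"}
    missing = missing_partitions("events", existing, date(2024, 3, 20))
    if missing:
        # Placeholder for a real alert (pager, chat webhook, etc.).
        print(f"ALERT: missing partitions {missing}")
```

Checking a couple of months ahead rather than only the next one gives lead time: a single failed run of the creation job triggers an alert weeks before the missing partition is actually needed for writes.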
Monitoring
The API is back online.
Investigating
The Resend API is down. We have identified what we believe to be the issue and are working on a fix.