Our RDC check deposit service will be undergoing maintenance this evening from 8:00 PM Pacific to 12:00 AM Pacific. During this time, customers will not be able to deposit checks using the Simple app. All other Simple services will be available during the maintenance window. Please contact our Customer Relations team if you have any questions.
At 03:30 PDT, our processing partner will undergo a one-hour scheduled maintenance session to enhance their capacity. During this time, customers will be unable to use their cards for purchases or to make ATM withdrawals. We’ll update here as the maintenance proceeds. Please contact our Customer Relations team if you have any questions. Thanks! (^will)
04:30 PDT: All done! Thanks for waiting up with us; please let Customer Relations know if you have any questions. (^will)
We’re investigating reports that some customers’ swipes are being incorrectly declined. We’ll update here with more information as soon as possible. Please contact our Customer Relations team if you have any questions. Thank you! (^will)
09:12 PT: We continue to receive sporadic reports of swipes being incorrectly declined. We’ll update here when we know more. (^will)
10:11 PT: This issue is resolved. We’re following up with customers whose transactions were incorrectly declined. Thanks for your patience! (^will)
We’ve disabled the remote deposit capture feature while we work with our partner to resolve some technical issues. While the feature is disabled, customers will not be able to deposit checks with our mobile app. We’ll update here as soon as we’re able to turn the feature back on. Thanks for your patience! (^will 09:55 PT)
Update 11:30 PT: This issue is resolved. Thanks for your patience! (^will)
Remote Deposit Capture is temporarily unavailable while we investigate issues with our partner. While this feature is disabled, customers will not be able to deposit checks using our mobile app. We’ll post here when we have more information, but please contact Customer Relations if you have any questions. Thanks! (^will 10:30 PST)
11:30 PST: The underlying issue is resolved and we’ve re-enabled RDC. Customers who attempted to use RDC while the feature was disabled may need to log out of their app before the feature becomes available on their phone again. Thanks for your patience! (^will)
Starting earlier this morning, some customers reported problems swiping their cards. We’ve been working closely with our partner to debug and resolve this issue. While we continue to work on this issue, swipes will continue to fail for some of our customers and those customers will be unable to create or edit payment contacts. Most customers should experience no problems, and all customers can continue to send payments or access their account data in our web and mobile apps as usual. We’ll update here soon; please contact our Customer Relations team if you have any questions. Thanks!
2:15 PM PST: We’ve resolved this issue. All Simple services are now functioning and customers should now be able to swipe, deposit, and make payments as usual. We’ll continue to address smaller issues related to this incident over the next day, but please get in touch with our Customer Relations team if you have any questions. Thanks for your patience!
Transactions are currently unavailable in the web and mobile applications. Swipes, payments, and support messages are all working. We’ll update here when we’ve resolved the issue loading transactions. Thanks for your patience! (^will)
5:17 PM PST: And we’re back. Transactions are loading and all other services are online. Thanks! (^will)
Transactions are taking longer than usual to show up in a customer’s Activity view. Swipes, deposits, bill payments and other financial processes are working fine, and we’re pushing code fixes that will help address the slowdown. Please contact customer support if you have any questions. (^will, 0740 PT)
Update: We’ve deployed code fixes to address the slowdown and transactions are flowing again. (^will, 0900 PT)
On Thursday, November 15th 2012, the services behind the Simple web and mobile applications failed to handle higher-than-usual load caused by a bug in our transaction ingest pipeline. Between 5:30 AM PST and 11:00 AM PST, customers experienced intermittent failures to load or update their Activity feeds, and we temporarily disabled our Send Money feature. Ingestion is separate from the systems that approve or reject swipes, deposits and other transactions; these were unaffected, and card swipes, bill payments and deposits were all handled quickly and correctly throughout the morning.
At approximately 6:30 AM PST, we noticed that some Activity feeds were loading slowly and, in many cases, failed to load completely. We immediately paged our engineers and they began to investigate the cause.
Simple is built on a system of separate services. We partition these services as much as possible and avoid synchronous inter-service communication so that we can control the possibility of failure with a system of rate limiters and retries. Our web and mobile applications present information from many individual services, so they must rely on synchronous requests to render Activity, Goals, and support messages.
We traced the problem we observed on Thursday to the bill payment service behind our Send Money feature, which Activity contacts to load scheduled and upcoming payments. Our engineers discovered unoptimized database queries which caused the service to respond slowly. We temporarily disabled the Send Money feature while we worked on fixes to the underlying service. These fixes were deployed to production at 8:19 AM PST; while we saw response time improve significantly, customers continued to experience problems.
Our traffic patterns follow a common daily arc, with heavy load beginning around 6:00 AM PST and peaking between 12:00 PM and 1:00 PM PST. On the 1st and 15th of each month, many customers sign in to check their direct deposits and pay their bills. However, we discovered that the load on the bill payment service was nearly five times greater than we expected based on past traffic. We also noticed that while Activity was now loading correctly, its results were out of date.
When we receive a new transaction from our partners, we transform the incoming data to make it more useful to our customers. Raw bill payments contain a contact ID rather than the information about the contact. We correct this by looking up the contact ID and inserting that information into the transaction before presenting it in a customer’s Activity feed. Our transaction service needs to talk to the payment contacts service to perform this transformation. On Thursday, several bugs in these services came together to amplify the already high traffic.
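The transformation described above can be sketched roughly as follows. This is an illustration only, not Simple's actual code: the Transaction fields, the enrich function, and the lookup_contact callable are hypothetical names, and the real services communicate over the network rather than through an in-process function call.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Transaction:
    """Hypothetical shape of a raw transaction received from a partner."""
    id: str
    amount_cents: int
    contact_id: Optional[str] = None    # set only on bill payments
    contact_name: Optional[str] = None  # filled in by enrichment


def enrich(txn: Transaction, lookup_contact: Callable[[str], dict]) -> Transaction:
    """Return a copy of txn with contact details inserted.

    lookup_contact stands in for a call to the payment contacts service.
    """
    if txn.contact_id is None:
        # Not a bill payment, so there is nothing to look up.
        return txn
    contact = lookup_contact(txn.contact_id)
    return replace(txn, contact_name=contact["name"])


# Usage with an in-memory stand-in for the contacts service.
contacts = {"c-1": {"name": "Landlord"}}
raw = Transaction(id="t-1", amount_cents=120000, contact_id="c-1")
enriched = enrich(raw, contacts.__getitem__)
```

Because every bill payment needs this lookup before it can be shown, a slow contacts service slows ingestion of those transactions as well.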
First, the transaction service’s request rate limiter broke when the payments service failed to respond to its requests in time. Second, the transaction service was configured to make many attempts after a single request failed, increasing the load on an already beleaguered service. Lastly, the transaction service was configured with a short timeout, so many slow-yet-successful responses from the payments service were considered failures by the transaction service.
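A back-of-the-envelope model shows why a short timeout combined with aggressive retries multiplies load. The function names and all of the numbers below are made up for illustration; the actual timeout, retry count, and latencies are not stated in this report.

```python
def requests_sent(latency_s: float, timeout_s: float, max_attempts: int) -> int:
    """Requests one caller sends for a single logical lookup.

    Simplifying assumptions: every attempt sees the same latency, and
    retries fire immediately with no delay between them.
    """
    # A response slower than the timeout counts as a failure, even though
    # the downstream service did the work, so every attempt is used up.
    return 1 if latency_s <= timeout_s else max_attempts


def amplification(latencies_s, timeout_s: float, max_attempts: int) -> float:
    """Average number of requests sent per logical lookup."""
    total = sum(requests_sent(l, timeout_s, max_attempts) for l in latencies_s)
    return total / len(latencies_s)


# Hypothetical example: half of the lookups are slower than a 0.5 s timeout,
# and the caller makes up to 5 attempts per lookup.
factor = amplification([0.1, 0.1, 0.8, 0.9], timeout_s=0.5, max_attempts=5)
```

In this toy example the downstream service receives three times as many requests as there are lookups, and the extra requests arrive exactly when it is already slow.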
Together, these bugs turned a performance problem in the payments database into a persistent problem that blocked our transaction processors. While the transaction processors were spinning in retry loops, nearly 2,600 transactions entered a backlog in our queueing system.
At 9:49 AM PST, our engineers deployed a fix which implemented backoffs in the case of some connection errors. This gave the transaction service enough space to process all but 700 backlogged transactions by 10:00 AM PST. Soon, though, other errors in the same code began to cause the retry problem to recur, and processing of the transactions queue stalled again. Additional fixes were deployed at 10:09 AM PST and the queue was completely processed soon after. With the load removed from the payments service, we re-enabled bill pay.
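For readers unfamiliar with the technique, a backoff makes a caller wait longer after each failed attempt instead of retrying immediately. The following is a minimal sketch of exponential backoff with jitter, not the fix that was deployed: call_with_backoff and its parameters are hypothetical, and it handles only Python's built-in ConnectionError.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on ConnectionError with exponential backoff.

    The delay ceiling doubles after each failure (capped at max_delay), and
    the actual wait is a random fraction of that ceiling so that many
    callers do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(ceiling * rng())


# Usage: a stand-in lookup that fails twice, then succeeds.
calls = {"count": 0}

def flaky_lookup():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("contacts service unavailable")
    return {"name": "Landlord"}

waits = []
result = call_with_backoff(flaky_lookup, sleep=waits.append, rng=lambda: 1.0)
```

The sleep and rng parameters exist so the example can run without real delays. As the incident shows, a backoff that covers only some error types leaves the other failure paths free to retry in a tight loop.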
We are taking immediate steps to address the issues we’ve discovered during this incident. We have already deployed fixes for the misbehaving code and will standardize our approach to similar areas of the system to better control performance under future traffic peaks. We are also extending our monitoring systems to alert us more quickly based on anomalous traffic patterns, errors logged, and queue sizes.
Since this incident occurred on the 15th of the month, a common payday, several of our customers experienced delays in seeing their paychecks appear in their Activity, although the funds had been deposited as expected. We know this situation was especially stressful for many of our customers, and we apologize. As always, we strive to be as transparent as possible when we have problems, and we are thankful for our customers’ trust.
- Ian Eure, backend engineer
We’re debugging a problem that’s preventing customers’ Activity views from loading. Card swipes and customer deposits are not impacted and all scheduled payments are being sent as usual. (^will, 07:00 PST)
07:30 PST: We’ve temporarily disabled the bill pay feature while we continue to investigate the problem.
09:20 PST: We’re still seeing intermittent failures to log in and load Activity.
10:20 PST: We’ve deployed code fixes to address this issue. Our services are stabilizing and transactions are being ingested from the backlog for display in Activity. Swipes, deposits and payments continue to work without issue.
10:45 PST: We have finished processing all transactions that were stuck in our backlog. We’ve also re-enabled our bill pay feature. Please contact our support team if you notice any further issues. We’ll post a full analysis of this incident soon. Thanks for your patience!