High response times on Prod 1 server - Friday, 14 November 2025, 15:32:00


Between 9:48 AM and 10:52 AM, production server 1 experienced slowdowns, with response times peaking between 3.83 and 4.03 minutes.

Server stability was recorded at 15:21. The server remains under monitoring.

Azure Web App Diagnostics: The application maintained an average availability of 98.74% over 23.75 hours, processing 261,168 requests, with 2,871 successful requests and 3,297 server errors (5xx and ClientTimeout). Of the 3,245 total request failures, HTTP 400.604 errors were the most frequent (2,057 occurrences, 63.39%), indicating client requests that were cancelled or timed out before completion. A further 898 HTTP 500 errors (27.67%) were recorded, mostly caused by Azure request timeouts exceeding 230 seconds, and 149 HTTP 502 errors (4.59%) signaled issues with application startup, gateway configuration, or backend connectivity.

High latency was also detected, with a 95th-percentile average of 57,655 ms, pointing to performance bottlenecks most likely related to the Azure App Service platform and application code. Memory dumps were captured and analyzed, although some collection attempts failed due to repeated collections. No application exceptions were found, but the analyzer reported a System.OperationCanceledException during memory dump analysis. The root causes are primarily performance-related rather than direct application code faults.
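The 230-second figure above is the Azure App Service front-end request limit: any request that runs longer is terminated by the platform, which shows up as HTTP 500/502 at the gateway and as client cancellations (400.604) or OperationCanceledException inside the app. The sketch below shows one way such long-running work can be bounded below that limit in an ASP.NET Core endpoint; the route, the 120-second budget, and the RunReportAsync helper are illustrative assumptions, not part of the affected application.

```csharp
// Illustrative sketch only, assuming an ASP.NET Core minimal API; the route,
// the 120-second budget, and RunReportAsync are hypothetical, not taken from
// the affected application.
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var app = WebApplication.CreateBuilder(args).Build();

app.MapGet("/reports/slow", async (HttpContext context) =>
{
    // Link the client's own cancellation (the 400.604 case) with a server-side
    // budget kept well under the 230-second App Service front-end limit.
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(context.RequestAborted);
    cts.CancelAfter(TimeSpan.FromSeconds(120));

    try
    {
        var result = await RunReportAsync(cts.Token);
        return Results.Ok(result);
    }
    catch (OperationCanceledException)
    {
        // Return a deliberate 503 instead of letting the platform cut the
        // request off and report a 500/502 on our behalf.
        return Results.StatusCode(StatusCodes.Status503ServiceUnavailable);
    }
});

app.Run();

// Stand-in for the real long-running work.
static async Task<string> RunReportAsync(CancellationToken cancellationToken)
{
    await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
    return "ok";
}
```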

To fully mitigate the problem, we have identified the need to completely replace one of the affected servers. The estimated time for this task is 30 minutes, during which there may be temporary service interruptions.

A new increase in response time was recorded at 1:54 PM, reaching 3.84 minutes.

Server stability was recorded at 12:16. The server remains under monitoring.

Multiple occurrences of the following exception were recorded: "Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and the max pool size was reached." This indicates that all available connections in the SQL connection pool were exhausted: connections were not being released back to the pool in time, so incoming requests were left waiting for one, causing the slowness and the timeouts.
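That exception text matches the ADO.NET connection-pool timeout, so the usual remediation is twofold: make sure every connection is disposed promptly so it returns to the pool, and size the pool explicitly. The sketch below assumes a .NET application using Microsoft.Data.SqlClient; the server name, database, authentication mode, query, and Max Pool Size value are illustrative assumptions, not the actual configuration of Prod 1.

```csharp
// Illustrative sketch, assuming a .NET app on Microsoft.Data.SqlClient; server,
// database, authentication mode, query, and pool settings are hypothetical.
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class PooledQueryExample
{
    // "Max Pool Size" caps the pool (default 100). When it is exhausted and no
    // connection is freed within "Connect Timeout", the exact exception seen in
    // this incident is thrown.
    private const string ConnectionString =
        "Server=tcp:example.database.windows.net,1433;Database=AppDb;" +
        "Authentication=Active Directory Default;" +
        "Max Pool Size=200;Connect Timeout=30;";

    public static async Task<int> CountOrdersAsync(CancellationToken cancellationToken)
    {
        // Disposing the connection (await using) returns it to the pool as soon
        // as the work is done; holding connections open across slow operations
        // is what exhausts the pool and blocks other requests.
        await using var connection = new SqlConnection(ConnectionString);
        await connection.OpenAsync(cancellationToken);

        await using var command = new SqlCommand("SELECT COUNT(*) FROM dbo.Orders", connection);
        return (int)await command.ExecuteScalarAsync(cancellationToken);
    }
}
```

Raising Max Pool Size only buys headroom; if connections are still held open across slow work, the underlying slow queries or blocking also need to be addressed.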