End-to-end and smoke tests give a valuable angle on what the app is actually doing and can warn you about failures before your users hit them. However, because they exercise a live app and a live database over a live network, they can introduce a lot of flakiness. A smoke test can fail not just because the app changed, but because the data in the environment shifted or something else in the stack misbehaved.
How do you handle the inherent flakiness of testing against a live app?
When do you run smokes? On every phoenix branch? Pre-prod? Prod only?
Who fixes the issues that the smokes find?
End-to-end tests are basically non-deterministic state machines. Flakiness can come from any point in the test: bad tests, bad state management, conflicting tests, network hiccups, etc.
Your goal is to reduce every single one of those points of flakiness. Just make sure you keep track of what you find: sometimes flakiness in the tests is really pointing at flakiness in the product itself.
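One practical way to keep that record is to run with automatic retries and report which tests only pass on a retry. Here is a minimal sketch assuming Playwright as the e2e runner; the file name and output path are illustrative, and any runner with retry reporting works similarly:

```ts
// playwright.config.ts — a minimal sketch of flake tracking.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry failed tests so a one-off hiccup doesn't fail the whole run.
  retries: 2,
  // The JSON report records tests that failed and then passed on retry
  // as "flaky", giving you a per-test flake history to track over time.
  reporter: [['list'], ['json', { outputFile: 'smoke-results.json' }]],
});
```

The point of the report is the tracking, not the retries themselves: a test that keeps showing up as flaky is a signal to investigate, not to ignore.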
Some things that can help reduce that flakiness:

- Polling for expected state instead of sleeping for a fixed interval.
- Isolating test data so runs can't conflict with each other or with whatever else is in the environment.
- Retrying operations that are known to fail transiently, like requests over a live network.
- Quarantining known-flaky tests while you track down the cause, rather than letting them erode trust in the suite.
Polling is certainly useful, but at some point adding resilience to the tests degrades their effectiveness as an alarm. I certainly want to know if the app is unreachable over the open internet, and I absolutely need to know if a partner’s API is down.
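One way to get both properties is to bound every poll with a hard deadline, so transient slowness is tolerated but a genuinely unreachable app still fails loudly. The sketch below is framework-agnostic; `pollUntil`, the timeout values, and the health-check URL are all hypothetical:

```ts
// A bounded polling helper: retries a check until it succeeds or a
// deadline passes. Transient slowness is absorbed; a real outage
// still fails the smoke test within a predictable window.
async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  { timeoutMs = 30_000, intervalMs = 500 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result !== undefined) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage sketch: tolerate a slow cold start, but surface a down
// partner API within a minute. (URL is hypothetical.)
async function partnerApiIsUp(): Promise<Response> {
  return pollUntil(async () => {
    const res = await fetch('https://api.partner.example/health')
      .catch(() => undefined); // network errors count as "not yet", not a crash
    return res?.ok ? res : undefined;
  }, { timeoutMs: 60_000, intervalMs: 2_000 });
}
```

The deadline is the important design choice: unbounded retries quietly convert "the partner's API is down" into "the suite is slow today," which is exactly the signal you said you need to keep.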