Why Your Trading Bot Keeps Crashing (And How to Fix It)
If your Python trading bot keeps stopping, dying overnight, or throwing cryptic errors you've never seen before — you're not alone. After diagnosing hundreds of bot crashes from real users, we've found 90% of them fall into seven specific patterns. This guide walks through each, shows you the exact error signature, and gives you the fix.
The 7 reasons trading bots crash
01 Missing or mismatched dependencies
ModuleNotFoundError: No module named 'pandas'
ImportError: cannot import name 'X' from 'Y'
This is by far the most common cause of bot crashes — and the most frustrating, because it usually means the bot worked fine on your laptop but breaks the moment you deploy it somewhere else (a VPS, a different venv, after a system update).
Three sub-cases:
- Package not installed in the active environment. Easy fix once spotted, but the error is often buried 200 lines into a log file.
- Wrong Python version. Your bot expects 3.10+ but runs in 3.8. Type hints like list[str] break instantly.
- Import name mismatch. import cv2 wants the opencv-python package. import yaml wants PyYAML. There are dozens of these.
The manual fix: Maintain a strict requirements.txt, use a per-bot venv, and run pip install -r requirements.txt on every deploy. This works but is painful when you're iterating on bot code daily.
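A small preflight check at startup can turn the three sub-cases above into one readable message instead of a traceback buried in a log. The sketch below is an illustration only: `REQUIRED`, `MIN_PYTHON`, `missing_packages`, and `preflight` are hypothetical names, and the import-to-package mapping has to be filled in for your own bot.

```python
import importlib.util
import sys

# Hypothetical settings for illustration; adjust for your own bot.
MIN_PYTHON = (3, 10)
# import name -> pip package name (the two often differ)
REQUIRED = {"requests": "requests", "yaml": "PyYAML", "cv2": "opencv-python"}


def missing_packages(required):
    """Return pip package names whose top-level import cannot be found."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


def preflight(required=REQUIRED, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, running {found}")
    missing = missing_packages(required)
    if missing:
        problems.append("missing packages: pip install " + " ".join(missing))
    return problems
```

Calling `preflight()` before the main loop and exiting when the list is non-empty puts the failure on the first line of the log rather than line 200.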
The better fix: Use a runtime that auto-installs missing dependencies. WatchDog Bot watches every bot process for ModuleNotFoundError, parses the missing package name, installs it into the bot's isolated venv via uv, and retries — up to three times. After version 1.1.13 you basically never see this error class again.
02 Network errors that aren't being caught
requests.exceptions.ConnectionError: HTTPSConnectionPool(...)
socket.timeout: timed out
aiohttp.ClientConnectorError: Cannot connect to host
Your bot polls an exchange API every 5 seconds. 99.99% of the time it works. Then once a day, the exchange's load balancer hiccups for 300 ms and your bot dies with an unhandled exception.
This crash mode is insidious because it works most of the time. You think the bot is fine. Then you wake up and it's been dead since 4:17 AM.
The fix: Wrap every network call in a try/except, and your main loop in an outer try/except, with exponential backoff:
import random
import time

import requests

def fetch_with_retry(url, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            r = requests.get(url, timeout=10)
            r.raise_for_status()
            return r.json()
        except (requests.RequestException, ValueError) as e:
            # Out of attempts: let the caller see the real error.
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of jitter.
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"[warn] {e}; retrying in {wait:.1f}s")
            time.sleep(wait)
And the main loop:
while True:
    try:
        run_one_tick()
    except Exception as e:
        # Never let the loop die. Log and continue.
        print(f"[error] tick failed: {e!r}; sleeping 60s")
        time.sleep(60)
Anti-pattern alert: Don't write except: pass at the top level. You'll swallow real bugs (typos, logic errors, KeyboardInterrupt) and your bot will appear to "work" while doing nothing. Always catch Exception specifically, and always log.
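One way to follow that advice is to log the full traceback rather than just the message, so a typo inside a tick is still diagnosable the next morning. The sketch below is an assumption-laden illustration: `run_forever` and its `max_ticks` and `sleep` parameters are hypothetical and exist mainly so the loop can be exercised in a test.

```python
import logging
import time

log = logging.getLogger("bot")


def run_forever(tick, error_sleep=60.0, max_ticks=None, sleep=time.sleep):
    """Call tick() repeatedly; log failures with a traceback and keep going.

    max_ticks and sleep exist only so the loop can be tested; a real bot
    would leave them at their defaults. Returns the number of failed ticks.
    """
    failures = 0
    done = 0
    while max_ticks is None or done < max_ticks:
        done += 1
        try:
            tick()
        except Exception:
            # Exception excludes KeyboardInterrupt and SystemExit, so
            # Ctrl+C still stops the bot. log.exception keeps the traceback.
            failures += 1
            log.exception("tick %d failed; sleeping %.0fs", done, error_sleep)
            sleep(error_sleep)
    return failures
```

Because the handler catches `Exception` and not everything, pressing Ctrl+C propagates out of the loop instead of being swallowed.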
03 Silent rate limits from the exchange
HTTP 429 Too Many Requests
HTTP 418 (Binance "teapot": IP auto-banned for continuing to send requests after 429s; ban length escalates for repeat offenders)
Empty responses, all-zero orderbooks, "stale" data
You wrote a loop that polls market data every second. The exchange thanked you for 47 minutes — then started returning 429s, or worse, started returning cached stale data with no error code.
Crypto exchanges and Kalshi all have rate limits, and they're rarely documented well. Binance, for example, enforces several separate limits that vary by endpoint, account type, and time window.
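The fix is a shared rate limiter, not "sleep 1 second between calls". The simple version here enforces a minimum interval between calls from any thread; a full token bucket would also allow short bursts: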
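import threading
import time

import requests

class RateLimiter:
    """Enforce a minimum interval between calls, shared across threads."""

    def __init__(self, calls_per_second):
        self.interval = 1.0 / calls_per_second
        self.last_call = 0.0
        self.lock = threading.Lock()

    def wait(self):
        # Holding the lock while sleeping serializes callers on purpose.
        with self.lock:
            elapsed = time.monotonic() - self.last_call
            if elapsed < self.interval:
                time.sleep(self.interval - elapsed)
            self.last_call = time.monotonic()

limiter = RateLimiter(calls_per_second=5)

def safe_get(url):
    limiter.wait()
    # Always set a timeout; without one a stalled connection hangs the bot.
    return requests.get(url, timeout=10).json()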
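The limiter above spaces calls evenly. A token bucket goes one step further: it lets the bot spend saved-up capacity in a short burst while keeping the long-run average under the limit. The following is a minimal sketch, not a reference implementation; `TokenBucket` and its `clock` and `sleep` parameters are hypothetical names, and the injectable clock is there only to make the class testable.

```python
import threading
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens/sec."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.updated = clock()
        self.lock = threading.Lock()

    def _refill(self):
        # Add tokens for the time elapsed since the last update, capped.
        now = self.clock()
        gained = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + gained)
        self.updated = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        with self.lock:
            self._refill()
            while self.tokens < 1.0:
                # Sleep just long enough for the missing fraction to refill.
                self.sleep((1.0 - self.tokens) / self.rate)
                self._refill()
            self.tokens -= 1.0
```

With `TokenBucket(rate=5, capacity=10)`, the first ten calls go through immediately and later calls settle at five per second.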
Better yet: respect Retry-After headers when you do get a 429. Many exchanges return them and most bots ignore them.
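Honoring that header mostly comes down to parsing it. Per the HTTP specification, Retry-After carries either a number of seconds or an HTTP date. The helper below is a sketch under that assumption; `retry_after_seconds` is a hypothetical name, and `headers` can be any mapping, such as the headers of a `requests` response.

```python
import email.utils
import time
from datetime import timezone


def retry_after_seconds(headers, default=1.0, now=None):
    """Return how long to wait, in seconds, based on a Retry-After header.

    Falls back to `default` when the header is missing or unparseable.
    `now` is a Unix timestamp, injectable for testing.
    """
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError.
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = time.time() if now is None else now
    # A date in the past means "retry now", never a negative sleep.
    return max(0.0, when.timestamp() - current)
```

On a 429, sleeping for `retry_after_seconds(response.headers)` before the next attempt replaces the guesswork of a fixed backoff.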
04 Slow memory leaks over hours or days
MemoryError: Unable to allocate ... GiB
Process killed by OS (OOM killer)
Bot mysteriously dies overnight with no traceback
The symptom: your bot runs fine for the first hour. After 6 hours, it's using 800 MB. After 24 hours, the OS kills it and you wake up to a dead bot.
The cause is almost always one of these three:
- An unbounded list. You're appending every price tick to self.history = [] and never trimming it. After a day at 1 tick/sec, that's 86,400 floats. After a week, about 605,000. They stay in memory forever.
- Pandas DataFrame concatenation in a loop. df = pd.concat([df, new_row]) in a hot loop. Every call allocates a new DataFrame and copies the old one into it, so both the frame and the cost of each call keep growing.
- Logs that grow without bound. Python's built-in stream and file handlers flush after every record, so the culprit is usually an unrotated log file filling the disk, or a custom handler (or a plain list of log messages) that accumulates records in memory and never trims them.
The fix: Use collections.deque(maxlen=N) for any rolling window. Build a list and call pd.DataFrame(list) once per tick, not pd.concat. Use logging.handlers.RotatingFileHandler.
from collections import deque

class Strategy:
    def __init__(self):
        self.recent_prices = deque(maxlen=1000)  # auto-trims to last 1000

    def on_tick(self, price):
        self.recent_prices.append(price)
        # use it freely; memory is bounded
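The logging half of that fix can be sketched with the standard library alone. `make_rotating_logger` and its default sizes are hypothetical choices for illustration; `RotatingFileHandler`, `maxBytes`, and `backupCount` are the real standard-library names.

```python
import logging
import logging.handlers


def make_rotating_logger(name, path, max_bytes=1_000_000, backups=3):
    """Return a logger whose file rolls over at about max_bytes.

    At most `backups` old files are kept (path.1, path.2, ...), so total
    disk use stays bounded no matter how long the bot runs.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.handlers.RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=backups
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With the defaults shown, the log never occupies much more than 4 MB on disk: one active file plus three backups.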
05 Expired or rotated API credentials
HTTP 401 Unauthorized
HTTP 403 Forbidden
Cryptic JSON: {"code": "invalid_signature"}
You created the API key 90 days ago. Today, the exchange rotated or revoked it, or it expired (policies vary; some exchanges put a 90-day lifetime on keys or on their trading permissions, so check yours). Your bot is now authenticated as nobody.
The fix isn't in your bot — it's in how you store credentials. Three guidelines:
- Never hardcode keys in source. Always read from environment variables or a secrets store.
- Set up calendar reminders for key expiry — most exchanges email you, but the email goes to spam often enough that you can't rely on it.
- Catch 401/403 specifically and alert (email, Slack, push) instead of just retrying. A bot retrying with bad credentials looks like a brute-force attack and can get your IP banned.
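The first and third guidelines can be sketched in a few lines. Everything named here is an assumption for illustration: the environment variable names (`EXCHANGE_API_KEY`, `EXCHANGE_API_SECRET`), the `AuthError` class, and the `alert` callback, which stands in for whatever email, Slack, or push hook you use.

```python
import os


class AuthError(Exception):
    """Credentials were rejected; retrying will not help."""


def load_credentials(env=os.environ, prefix="EXCHANGE"):
    """Read the key and secret from environment variables.

    Fails loudly at startup if either is missing, rather than sending
    unsigned requests later. The variable names are hypothetical.
    """
    try:
        return env[f"{prefix}_API_KEY"], env[f"{prefix}_API_SECRET"]
    except KeyError as exc:
        raise RuntimeError(f"missing environment variable {exc.args[0]}") from None


def check_auth(status_code, alert):
    """Call alert(message) and raise AuthError on 401/403 instead of retrying."""
    if status_code in (401, 403):
        alert(f"exchange rejected credentials (HTTP {status_code}); bot paused")
        raise AuthError(status_code)
```

If the main loop catches `Exception` broadly, it should catch `AuthError` first and stop or pause there. Otherwise the generic handler will keep retrying with the same bad key.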
WatchDog Bot's wd.connection("Kalshi") reads credentials from an encrypted store keyed to the bot, not the source code. Rotating a key means updating one row in the settings UI — not redeploying every bot that uses it.
06 Unexpected payload shapes
KeyError: 'price'
TypeError: 'NoneType' object is not subscriptable
ValueError: could not convert string to float
The exchange returns {"price": "47000.50"} normally. Then, occasionally, it returns {"price": null} when the market is paused, or {"error": "halted"} when there's a system issue. Your code does float(response["price"]) and dies.
This is a defensive-programming problem. APIs lie. Treat every external response as untrusted.
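def parse_price(resp: dict) -> float | None:
    """Return the price as a float, or None if it is missing or malformed."""
    if not isinstance(resp, dict):
        return None
    raw = resp.get("price")
    if raw is None:
        return None
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None

# on_response is an illustrative wrapper so the early return is valid Python.
def on_response(response):
    price = parse_price(response)
    if price is None:
        print("[warn] no price available, skipping tick")
        return
    # now safe to use `price`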
For larger schemas, use Pydantic models with extra="ignore" so unknown fields don't crash you and missing required fields fail loudly with a clear message.
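Where Pydantic is not installed, the same two behaviors (ignore unknown fields, fail loudly on missing required ones) can be approximated with a plain dataclass. This is a standard-library stand-in, not Pydantic's API, and the `Ticker` schema with its `symbol` and `price` fields is hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Ticker:
    # Hypothetical schema for illustration.
    symbol: str
    price: float

    @classmethod
    def from_payload(cls, payload):
        """Build a Ticker from an untrusted dict.

        Unknown keys are ignored; missing, null, or malformed required
        fields raise ValueError with a message that names the problem.
        """
        if not isinstance(payload, dict):
            raise ValueError(f"expected dict, got {type(payload).__name__}")
        names = [f.name for f in fields(cls)]
        missing = [n for n in names if payload.get(n) is None]
        if missing:
            raise ValueError(f"missing fields: {', '.join(missing)}")
        try:
            return cls(symbol=str(payload["symbol"]), price=float(payload["price"]))
        except (TypeError, ValueError) as exc:
            raise ValueError(f"bad field value: {exc}") from None
```

A payload such as `{"error": "halted"}` then produces `ValueError: missing fields: symbol, price` instead of a bare `KeyError` deep in the strategy code.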
07 System-level issues (sleep, restart, network)
Bot died at 2:34 AM with no logs after that line
OSError: [Errno 24] Too many open files
BrokenPipeError
This is the category most users don't even consider. Your bot is fine — the machine running it isn't.
- Laptop went to sleep. macOS and Windows aggressively suspend processes when the lid closes. caffeinate (macOS) or disabling sleep (Windows) helps for short stretches; for production, run on a server.
- System restart for updates. Windows Update reboots your machine. Your bot is gone unless you set it to start on boot.
- Wi-Fi drops or VPN flaps. Network errors weren't caught (see #2) and the bot died silently.
- File handle leak. You're opening log files without closing them. After ~1,024 files (Linux default ulimit), every open() fails. Use context managers: with open(...) as f:.
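For the file handle case, the context-manager pattern and a quick way to read the current limit look roughly like this. `append_line` and `open_file_limit` are hypothetical helper names; the `resource` module is part of the standard library but only on Unix-like systems.

```python
def append_line(path, line):
    """Append one line; the with-block closes the file even if write() raises."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")


def open_file_limit():
    """Return the soft limit on open file descriptors, or None if unknown.

    The resource module is Unix-only, so this returns None on Windows.
    """
    try:
        import resource
    except ImportError:
        return None
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

Because each call closes its handle before returning, `append_line` can run millions of times without approaching the limit that `open_file_limit()` reports.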
For 24/7 reliability, the right move is to run bots somewhere they don't depend on your laptop being awake. Common options:
- A VPS (DigitalOcean, Hetzner, Linode) — $5–$10/mo, full control, but you manage Python, venvs, systemd, log rotation, the firewall, deploys, and uptime monitoring yourself.
- WatchDog Bot — desktop app that keeps bots running while your machine is awake, plus cloud logging so you see what's happening when you're not. Zero server setup.
The pattern behind all seven
Most bot crashes share one root cause: trading bots are long-running processes interacting with unreliable external systems, written in a language that treats every uncaught exception as fatal. The cure isn't "write better code." It's defense in depth:
- Catch exceptions at every layer (request, parser, strategy, top-level loop)
- Log every error with enough context to diagnose later
- Bound every resource (memory, file handles, rate limits, retries)
- Treat dependencies, credentials, and network as things that will fail
- Have something watching the watcher — uptime monitoring, cloud log shipping, alerts
You can build all of this yourself. Or you can use a runtime that builds it in.
Bots don't fail in production. Bots fail at 4:17 AM when you can't fix them.
Stop diagnosing crashes. Start trading.
WatchDog Bot handles all seven crash modes above out of the box. Free trial, no credit card.