Diagnosis · Python · Reliability

Why Your Trading Bot Keeps Crashing (And How to Fix It)

Published May 20, 2026 · 12 min read · By WatchDog Bot Team

If your Python trading bot keeps stopping, dying overnight, or throwing cryptic errors you've never seen before — you're not alone. After diagnosing hundreds of bot crashes from real users, we've found 90% of them fall into seven specific patterns. This guide walks through each, shows you the exact error signature, and gives you the fix.

The 7 reasons trading bots crash

Missing or mismatched dependencies
Network errors that aren't being caught
Silent rate limits from the exchange
Slow memory leaks over hours or days
Expired or rotated API credentials
Unexpected payload shapes
System-level issues (sleep, restart, network)

01 Missing or mismatched dependencies

Symptom
ModuleNotFoundError: No module named 'pandas'
ImportError: cannot import name 'X' from 'Y'

This is by far the most common cause of bot crashes — and the most frustrating, because it usually means the bot worked fine on your laptop but breaks the moment you deploy it somewhere else (a VPS, a different venv, after a system update).

Three sub-cases:

Package not installed in the active environment. Easy fix once spotted, but the error is often buried 200 lines into a log file.
Wrong Python version. Your bot expects 3.10+ but runs in 3.8. Annotations such as list[str] (valid from 3.9) or float | None (valid from 3.10) raise TypeError as soon as the module is imported.
Import name mismatch. import cv2 wants the opencv-python package. import yaml wants PyYAML. There are dozens of these.

The manual fix: Maintain a strict requirements.txt, use a per-bot venv, and run pip install -r requirements.txt on every deploy. This works but is painful when you're iterating on bot code daily.
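
One way to make the manual route less painful is a preflight check that runs before the bot touches the exchange. The sketch below is illustrative only: MIN_PYTHON, REQUIRED_MODULES, and check_environment are hypothetical names, and the module list is an assumption you would replace with your own bot's imports.

```python
import importlib.util
import sys

# Hypothetical values for illustration; adjust to your own bot.
MIN_PYTHON = (3, 10)
REQUIRED_MODULES = {
    # import name -> pip package name (the two often differ)
    "requests": "requests",
    "yaml": "PyYAML",
    "cv2": "opencv-python",
}


def check_environment(required=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")
    for import_name, pip_name in required.items():
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(import_name) is None:
            problems.append(
                f"missing module '{import_name}' (pip install {pip_name})"
            )
    return problems
```

Calling check_environment() at the top of your entry point and exiting with the joined messages when the list is non-empty puts the problem on the first line of the log instead of 200 lines in.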

The better fix: Use a runtime that auto-installs missing dependencies. WatchDog Bot watches every bot process for ModuleNotFoundError, parses the missing package name, installs it into the bot's isolated venv via uv, and retries — up to three times. After version 1.1.13 you basically never see this error class again.

02 Network errors that aren't being caught

Symptom
requests.exceptions.ConnectionError: HTTPSConnectionPool(...)
socket.timeout: timed out
aiohttp.ClientConnectorError: Cannot connect to host

Your bot polls an exchange API every 5 seconds. 99.99% of the time it works. Then once a day, the exchange's load balancer hiccups for 300 ms and your bot dies with an unhandled exception.

This crash mode is insidious because it works most of the time. You think the bot is fine. Then you wake up and it's been dead since 4:17 AM.

The fix: Wrap every network call in a try/except, and your main loop in an outer try/except, with exponential backoff:

import time, random
import requests

def fetch_with_retry(url, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            r = requests.get(url, timeout=10)
            r.raise_for_status()
            return r.json()
        except (requests.RequestException, ValueError) as e:
            if attempt == max_attempts - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"[warn] {e} — retrying in {wait:.1f}s")
            time.sleep(wait)

And the main loop:

while True:
    try:
        run_one_tick()
    except Exception as e:
        # Never let the loop die. Log and continue.
        print(f"[error] tick failed: {e!r} — sleeping 60s")
        time.sleep(60)

Anti-pattern alert: Don't write a bare except: pass at the top level. It swallows real bugs (typos, logic errors) and also traps KeyboardInterrupt and SystemExit, so your bot appears to "work" while doing nothing and can't even be stopped with Ctrl+C. Catch Exception rather than everything, and always log.

03 Silent rate limits from the exchange

Symptom
HTTP 429 Too Many Requests
HTTP 418 (Binance: IP auto-banned for continuing to send requests after 429s; bans scale from 2 minutes to 3 days)
Empty responses, all-zero orderbooks, "stale" data

You wrote a loop that polls market data every second. The exchange thanked you for 47 minutes — then started returning 429s, or worse, started returning cached stale data with no error code.

Crypto exchanges and Kalshi all have rate limits, and they're rarely documented well. Binance, for example, tracks request weight, order count, and raw request count separately, each over its own time window.

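The fix is a single shared limiter that every call passes through, not "sleep 1 second between calls" scattered around your code. The version below enforces a minimum interval between calls, even across threads: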

import time, threading
import requests

class RateLimiter:
    def __init__(self, calls_per_second):
        self.interval = 1.0 / calls_per_second
        self.last_call = 0.0
        self.lock = threading.Lock()

    def wait(self):
        with self.lock:
            elapsed = time.monotonic() - self.last_call
            if elapsed < self.interval:
                time.sleep(self.interval - elapsed)
            self.last_call = time.monotonic()

limiter = RateLimiter(calls_per_second=5)

def safe_get(url):
    limiter.wait()
    return requests.get(url, timeout=10).json()  # always set a timeout
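
The RateLimiter class above spaces calls evenly, which is simpler than a true token bucket. If you also want to allow short bursts, a token bucket might look like the sketch below. TokenBucket and its parameters are hypothetical names, and the injectable clock and sleep arguments are there only to make the class easy to test.

```python
import threading
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens/second."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full so the first burst is free
        self.clock = clock             # injectable for testing
        self.sleep = sleep
        self.updated = clock()
        self.lock = threading.Lock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity
        now = self.clock()
        earned = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.updated = now

    def acquire(self, cost=1.0):
        """Block until `cost` tokens are available, then spend them."""
        if cost > self.capacity:
            raise ValueError("cost exceeds bucket capacity")
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= cost:
                    self.tokens -= cost
                    return
                shortfall = (cost - self.tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked on it
            self.sleep(shortfall)
```

With bucket = TokenBucket(rate=5, capacity=10), calling bucket.acquire() before each request allows ten back-to-back calls and then settles to five per second. The cost argument lets you charge more for heavier endpoints on exchanges that weight requests.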

Better yet: respect Retry-After headers when you do get a 429. Many exchanges return them and most bots ignore them.
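
A minimal sketch of honoring that header, assuming the standard HTTP semantics in which Retry-After is either a number of seconds or an HTTP-date. retry_after_seconds is a hypothetical helper name, and the default delay is an arbitrary choice.

```python
import time
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=1.0, now=None):
    """Convert a Retry-After header value into a non-negative delay in seconds.

    Returns `default` when the header is missing or cannot be parsed.
    """
    if not header_value:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Form 1: delay in whole seconds, e.g. "120"
        return float(value)
    try:
        # Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    now = time.time() if now is None else now
    # A date in the past means we may retry immediately
    return max(0.0, when.timestamp() - now)
```

With requests, that becomes time.sleep(retry_after_seconds(r.headers.get("Retry-After"))) whenever r.status_code == 429, before the next attempt.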

04 Slow memory leaks over hours or days

Symptom
MemoryError: Unable to allocate ... GiB
Process killed by OS (OOM killer)
Bot mysteriously dies overnight with no traceback

The symptom: your bot runs fine for the first hour. After 6 hours, it's using 800 MB. After 24 hours, the OS kills it and you wake up to a dead bot.

The cause is almost always one of these three:

An unbounded list. You're appending every price tick to self.history = [] and never trimming it. After a day at 1 tick/sec, that's 86,400 floats. After a week, 600,000. They stay in memory forever.
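Pandas DataFrame concatenation in a loop. df = pd.concat([df, new_row]) in a hot loop. Every call copies the whole DataFrame into a new one, so the frame grows without bound and each tick gets slower than the last.
Logging that grows without bound. Python's standard StreamHandler and FileHandler flush after every record, so an unflushed buffer is rarely the culprit. The usual causes are calling addHandler on every tick (each handler holds an open file and duplicates output) or collecting records in an in-memory list. An unrotated log file can also fill the disk, which kills the bot just as effectively.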

The fix: Use collections.deque(maxlen=N) for any rolling window. Collect rows in a bounded list or deque and build the DataFrame once when you need it, instead of calling pd.concat on every tick. Use logging.handlers.RotatingFileHandler so log files are capped on disk.

from collections import deque

class Strategy:
    def __init__(self):
        self.recent_prices = deque(maxlen=1000)  # auto-trims to last 1000

    def on_tick(self, price):
        self.recent_prices.append(price)
        # use it freely — memory is bounded
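
For the logging side, a minimal sketch using the standard library's RotatingFileHandler. get_bot_logger is a hypothetical helper, and the size and backup counts are arbitrary defaults, not recommendations.

```python
import logging
from logging.handlers import RotatingFileHandler


def get_bot_logger(name, path, max_bytes=5_000_000, backups=3):
    """Return a logger that writes to `path` and rotates it at `max_bytes`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the handler only once. Calling addHandler on every tick leaks
    # file handles and duplicates every log line.
    if not any(isinstance(h, RotatingFileHandler) for h in logger.handlers):
        handler = RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=backups
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With backups=3 the logger keeps the current file plus at most three rotated copies, so disk usage stays bounded at roughly four times max_bytes.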

05 Expired or rotated API credentials

Symptom
HTTP 401 Unauthorized
HTTP 403 Forbidden
Cryptic JSON: {"code": "invalid_signature"}

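You created the API key 90 days ago. Today it stopped working: the exchange expired it, disabled its trading permission, or you rotated it and forgot to update the bot. Expiry policies differ by exchange and change over time, so check the current API key documentation for yours. Either way, your bot is now authenticated as nobody.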

The fix isn't in your bot — it's in how you store credentials. Three guidelines:

Never hardcode keys in source. Always read from environment variables or a secrets store.
Set up calendar reminders for key expiry — most exchanges email you, but the email goes to spam often enough that you can't rely on it.
Catch 401/403 specifically and alert (email, Slack, push) instead of just retrying. A bot retrying with bad credentials looks like a brute-force attack and can get your IP banned.
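
These guidelines can be sketched roughly as follows. Everything here is illustrative: the environment variable naming scheme, AuthError, classify_status, and the notify callback (standing in for your email, Slack, or push integration) are assumptions, not part of any exchange SDK.

```python
import os


class AuthError(RuntimeError):
    """Raised when credentials are missing or rejected; never retried."""


def load_credentials(prefix, env=os.environ):
    """Read <PREFIX>_API_KEY and <PREFIX>_API_SECRET from the environment."""
    key = env.get(f"{prefix}_API_KEY")
    secret = env.get(f"{prefix}_API_SECRET")
    if not key or not secret:
        raise AuthError(f"{prefix}_API_KEY / {prefix}_API_SECRET not set")
    return key, secret


def classify_status(status_code):
    """Map an HTTP status code to an action: 'ok', 'retry', or 'alert'."""
    if status_code in (401, 403):
        return "alert"   # bad or expired credentials: retrying cannot help
    if status_code == 429 or 500 <= status_code < 600:
        return "retry"   # transient: back off and try again
    if 200 <= status_code < 300:
        return "ok"
    return "alert"       # other 4xx: the request itself is wrong


def handle_response_status(status_code, notify):
    """Alert and raise AuthError on 401/403; otherwise return the action."""
    if status_code in (401, 403):
        notify(
            f"exchange returned {status_code}: "
            "check API key expiry and permissions"
        )
        raise AuthError(f"credentials rejected (HTTP {status_code})")
    return classify_status(status_code)
```

The important design choice is that AuthError escapes the retry loop: the bot stops hammering the exchange with bad credentials and a human gets told once.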

WatchDog Bot's wd.connection("Kalshi") reads credentials from an encrypted store keyed to the bot, not the source code. Rotating a key means updating one row in the settings UI — not redeploying every bot that uses it.

06 Unexpected payload shapes

Symptom
KeyError: 'price'
TypeError: 'NoneType' object is not subscriptable
ValueError: could not convert string to float

The exchange returns {"price": "47000.50"} normally. Then, occasionally, it returns {"price": null} when the market is paused, or {"error": "halted"} when there's a system issue. Your code does float(response["price"]) and dies.

This is a defensive-programming problem. APIs lie. Treat every external response as untrusted.

def parse_price(resp: dict) -> float | None:
    if not isinstance(resp, dict):
        return None
    raw = resp.get("price")
    if raw is None:
        return None
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None

def on_response(response):
    price = parse_price(response)
    if price is None:
        print("[warn] no price available, skipping tick")
        return
    # now safe to use `price`

For larger schemas, use Pydantic models with extra="ignore" so unknown fields don't crash you and missing required fields fail loudly with a clear message.

07 System-level issues (sleep, restart, network)

Symptom
Bot died at 2:34 AM with no logs after that line
OSError: [Errno 24] Too many open files
BrokenPipeError

This is the category most users don't even consider. Your bot is fine — the machine running it isn't.

Laptop went to sleep. macOS and Windows aggressively suspend processes when the lid closes. caffeinate (macOS) or disabling sleep (Windows) helps for short stretches; for production, run on a server.
System restart for updates. Windows Update reboots your machine. Your bot is gone unless you set it to start on boot.
Wi-Fi drops or VPN flaps. Network errors weren't caught (see #2) and the bot died silently.
File handle leak. You're opening log files without closing them. After about 1,024 open files (a common default soft limit on Linux; check yours with ulimit -n), every open() fails. Use context managers: with open(...) as f:.
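
For the file handle case, a minimal sketch of the context-manager pattern plus a way to read the current limit. append_line and open_file_limit are hypothetical helper names; the resource module is part of the standard library but exists only on Unix-like systems.

```python
def append_line(path, line):
    """Append one line; the `with` block closes the file even if write() raises."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def open_file_limit():
    """Return the soft limit on open files, or None where it cannot be queried."""
    try:
        import resource  # Unix-only; not available on Windows
    except ImportError:
        return None
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

Because each call to append_line closes its handle before returning, you can call it far more times than the limit allows open files without hitting Errno 24.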

For 24/7 reliability, the right move is to run bots somewhere they don't depend on your laptop being awake. Common options:

A VPS (DigitalOcean, Hetzner, Linode) — $5–$10/mo, full control, but you manage Python, venvs, systemd, log rotation, the firewall, deploys, and uptime monitoring yourself.
WatchDog Bot — desktop app that keeps bots running while your machine is awake, plus cloud logging so you see what's happening when you're not. Zero server setup.

The pattern behind all seven

Most bot crashes share one root cause: trading bots are long-running processes interacting with unreliable external systems, written in a language that treats every uncaught exception as fatal. The cure isn't "write better code." It's defense in depth:

Catch exceptions at every layer (request, parser, strategy, top-level loop)
Log every error with enough context to diagnose later
Bound every resource (memory, file handles, rate limits, retries)
Treat dependencies, credentials, and network as things that will fail
Have something watching the watcher — uptime monitoring, cloud log shipping, alerts
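
As one illustration of "watching the watcher", the bot can write a heartbeat file on every successful tick and a separate process (cron job, monitoring script) can alert when it goes stale. This is a sketch under assumptions: write_heartbeat, is_stale, and the file-based approach are hypothetical choices, not how any particular monitoring product works.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Record the current time; call this once per successful tick."""
    now = time.time() if now is None else now
    tmp = f"{path}.tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(f"{now:.3f}")
    # Atomic rename, so a reader never sees a half-written file
    os.replace(tmp, path)


def is_stale(path, max_age_seconds, now=None):
    """Return True if the heartbeat is missing, unreadable, or too old."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as f:
            last = float(f.read().strip())
    except (OSError, ValueError):
        return True
    return (now - last) > max_age_seconds
```

The checker must run outside the bot process (and ideally off the bot's machine), otherwise whatever killed the bot takes the alert down with it.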

You can build all of this yourself. Or you can use a runtime that builds it in.

Bots don't fail in production. Bots fail at 4:17 AM when you can't fix them.

Stop diagnosing crashes. Start trading.

WatchDog Bot handles all seven crash modes above out of the box. Free trial, no credit card.

