Why Your Trading Bot Keeps Crashing (And How to Fix It)

Published May 20, 2026 · 12 min read · By WatchDog Bot Team

If your Python trading bot keeps stopping, dying overnight, or throwing cryptic errors you've never seen before — you're not alone. After diagnosing hundreds of bot crashes from real users, we've found 90% of them fall into seven specific patterns. This guide walks through each, shows you the exact error signature, and gives you the fix.

The 7 reasons trading bots crash

  1. Missing or mismatched dependencies
  2. Network errors that aren't being caught
  3. Silent rate limits from the exchange
  4. Slow memory leaks over hours or days
  5. Expired or rotated API credentials
  6. Unexpected payload shapes
  7. System-level issues (sleep, restart, network)

01 Missing or mismatched dependencies

Symptom
ModuleNotFoundError: No module named 'pandas'
ImportError: cannot import name 'X' from 'Y'

This is by far the most common cause of bot crashes — and the most frustrating, because it usually means the bot worked fine on your laptop but breaks the moment you deploy it somewhere else (a VPS, a different venv, after a system update).

Three sub-cases:

The manual fix: Maintain a strict requirements.txt, use a per-bot venv, and run pip install -r requirements.txt on every deploy. This works but is painful when you're iterating on bot code daily.
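If you stay on the manual route, a preflight check at startup turns a mid-run ModuleNotFoundError into an immediate, readable failure. The following is a minimal sketch using only the standard library; the function name missing_modules and the REQUIRED list are hypothetical, and the check only confirms that a top-level module can be found, not that its version matches.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# Hypothetical list: replace with the top-level modules your bot imports.
REQUIRED = ["json", "math"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        raise SystemExit(
            f"[fatal] missing modules: {', '.join(missing)}; "
            "run pip install -r requirements.txt"
        )
    print("[ok] all required modules are importable")
```

Run it as the first step of every deploy, before the trading loop starts, so a broken environment fails in seconds rather than hours later.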

The better fix: Use a runtime that auto-installs missing dependencies. WatchDog Bot watches every bot process for ModuleNotFoundError, parses the missing package name, installs it into the bot's isolated venv via uv, and retries — up to three times. After version 1.1.13 you basically never see this error class again.

02 Network errors that aren't being caught

Symptom
requests.exceptions.ConnectionError: HTTPSConnectionPool(...)
socket.timeout: timed out
aiohttp.ClientConnectorError: Cannot connect to host

Your bot polls an exchange API every 5 seconds. 99.99% of the time it works. Then once a day, the exchange's load balancer hiccups for 300 ms and your bot dies with an unhandled exception.

This crash mode is insidious because it works most of the time. You think the bot is fine. Then you wake up and it's been dead since 4:17 AM.

The fix: Wrap every network call in a try/except, and your main loop in an outer try/except, with exponential backoff:

import time, random
import requests

def fetch_with_retry(url, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            r = requests.get(url, timeout=10)
            r.raise_for_status()
            return r.json()
        except (requests.RequestException, ValueError) as e:
            if attempt == max_attempts - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"[warn] {e} — retrying in {wait:.1f}s")
            time.sleep(wait)

And the main loop:

while True:
    try:
        run_one_tick()
    except Exception as e:
        # Never let the loop die. Log and continue.
        print(f"[error] tick failed: {e!r} — sleeping 60s")
        time.sleep(60)

Anti-pattern alert: Don't write except: pass at the top level. You'll swallow real bugs (typos, logic errors, KeyboardInterrupt) and your bot will appear to "work" while doing nothing. Always catch Exception specifically, and always log.

03 Silent rate limits from the exchange

Symptom
HTTP 429 Too Many Requests
HTTP 418 (Binance: IP auto-banned for ignoring 429s; bans scale from 2 minutes to 3 days)
Empty responses, all-zero orderbooks, "stale" data

You wrote a loop that polls market data every second. The exchange thanked you for 47 minutes — then started returning 429s, or worse, started returning cached stale data with no error code.

Crypto exchanges and Kalshi all have rate limits, and the details are often hard to find in the docs. Binance, for example, enforces separate limits for request weight, order count, and raw requests, each over its own time window.

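The fix is a shared rate limiter that every request passes through, not an ad hoc "sleep 1 second between calls". The simplest version enforces a minimum interval between calls: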

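import threading
import time

import requests

class RateLimiter:
    """Enforce a minimum interval between calls, shared across threads."""

    def __init__(self, calls_per_second):
        self.interval = 1.0 / calls_per_second
        self.last_call = 0.0
        self.lock = threading.Lock()

    def wait(self):
        # Holding the lock while sleeping is deliberate: it serializes callers
        # so that no two requests are released closer than `interval` apart.
        with self.lock:
            elapsed = time.monotonic() - self.last_call
            if elapsed < self.interval:
                time.sleep(self.interval - elapsed)
            self.last_call = time.monotonic()

limiter = RateLimiter(calls_per_second=5)

def safe_get(url):
    limiter.wait()
    r = requests.get(url, timeout=10)  # never call without a timeout
    r.raise_for_status()               # surface 429/418 instead of parsing an error body
    return r.json()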
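If the exchange allows short bursts, a token bucket is the usual next step: tokens refill at a steady rate up to a fixed capacity, and each request spends one or more. The sketch below is one way to write it with the standard library; the class name TokenBucket and its parameters are hypothetical, and the rate and capacity you pick should come from your exchange's published limits.

```python
import threading
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full so the first burst is free
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self):
        # Add the tokens earned since the last update, never exceeding capacity.
        now = time.monotonic()
        earned = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.updated = now

    def acquire(self, cost=1.0):
        """Block until `cost` tokens are available, then spend them."""
        if cost > self.capacity:
            raise ValueError("cost exceeds bucket capacity; it could never be satisfied")
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= cost:
                    self.tokens -= cost
                    return
                # Time until enough tokens have accumulated.
                wait = (cost - self.tokens) / self.rate
            # Sleep outside the lock so other threads can check the bucket.
            time.sleep(wait)
```

The cost argument is useful on exchanges that weight endpoints differently: a heavy endpoint can spend several tokens per call while a light one spends one.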

Better yet: respect Retry-After headers when you do get a 429. Many exchanges return them and most bots ignore them.
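Honoring Retry-After mostly comes down to parsing it safely. The header can carry either a number of seconds or an HTTP date, so a helper needs to handle both and fall back to a default when the value is missing or malformed. This is a minimal sketch; the function name retry_after_seconds, the 1-second default, and the 300-second cap are all assumptions to adjust for your exchange.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=1.0, cap=300.0, now=None):
    """Convert a Retry-After header value into a bounded number of seconds to wait."""
    if not header_value:
        return default
    value = header_value.strip()
    try:
        # Most common form: a plain number of seconds.
        seconds = float(value)
    except ValueError:
        try:
            # Alternative form: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return default
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        seconds = (when - now).total_seconds()
    # Never return a negative wait, and never trust an absurdly long one.
    return max(0.0, min(seconds, cap))
```

With requests, you would call it as retry_after_seconds(r.headers.get("Retry-After")) whenever r.status_code is 429, then sleep for the returned value before retrying.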

04 Slow memory leaks over hours or days

Symptom
MemoryError: Unable to allocate ... GiB
Process killed by OS (OOM killer)
Bot mysteriously dies overnight with no traceback

The symptom: your bot runs fine for the first hour. After 6 hours, it's using 800 MB. After 24 hours, the OS kills it and you wake up to a dead bot.

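The cause is almost always one of these three: a list or dict that grows on every tick and is never trimmed, a DataFrame extended with pd.concat inside the loop, or log output that accumulates without rotation.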

The fix: Use collections.deque(maxlen=N) for any rolling window. Collect rows in a plain list and build the DataFrame once with pd.DataFrame(rows) when you need it, rather than calling pd.concat on every tick. Rotate log files with logging.handlers.RotatingFileHandler.

from collections import deque

class Strategy:
    def __init__(self):
        self.recent_prices = deque(maxlen=1000)  # auto-trims to last 1000

    def on_tick(self, price):
        self.recent_prices.append(price)
        # use it freely — memory is bounded
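The same bounding idea applies to logs. Here is a minimal sketch of a rotating log setup with the standard library; the function name build_logger, the logger name, and the size and backup defaults are hypothetical values to tune for your disk.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(path, max_bytes=5_000_000, backups=3, name="bot"):
    """Return a logger whose file output is capped at roughly max_bytes * (backups + 1)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Guard against attaching a second handler if this is called twice.
    if not logger.handlers:
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Once the active file would exceed max_bytes, it is renamed to bot.log.1 (older backups shift up, and the oldest is deleted), so total log size stays bounded no matter how long the bot runs.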

05 Expired or rotated API credentials

Symptom
HTTP 401 Unauthorized
HTTP 403 Forbidden
Cryptic JSON: {"code": "invalid_signature"}

You created the API key 90 days ago. Today the exchange rotated it, or it expired (some exchanges expire or restrict keys on a fixed schedule, so check your exchange's API key policy). Your bot is now authenticated as nobody.

The fix isn't in your bot — it's in how you store credentials. Three guidelines:

WatchDog Bot's wd.connection("Kalshi") reads credentials from an encrypted store keyed to the bot, not the source code. Rotating a key means updating one row in the settings UI — not redeploying every bot that uses it.
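Outside any particular runtime, the same principle of keeping credentials out of source code can be sketched with environment variables. The variable naming scheme (PREFIX_API_KEY, PREFIX_API_SECRET) and the helper names below are hypothetical. The point is to fail loudly at startup when a credential is missing, and to treat 401 and 403 as a reason to stop and alert rather than retry.

```python
import os


class MissingCredentials(RuntimeError):
    """Raised at startup when a required credential is not configured."""


def load_credentials(prefix, env=None):
    """Read an API key and secret from environment variables named after `prefix`."""
    env = os.environ if env is None else env
    names = (f"{prefix}_API_KEY", f"{prefix}_API_SECRET")
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise MissingCredentials(f"missing environment variables: {', '.join(missing)}")
    return env[names[0]], env[names[1]]


def is_auth_failure(status_code):
    """401 and 403 mean the credentials are bad; retrying will not help."""
    return status_code in (401, 403)
```

In a retry loop, checking is_auth_failure before backing off keeps the bot from hammering the exchange with a dead key, which on some exchanges can itself trigger a ban.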

06 Unexpected payload shapes

Symptom
KeyError: 'price'
TypeError: 'NoneType' object is not subscriptable
ValueError: could not convert string to float

The exchange returns {"price": "47000.50"} normally. Then, occasionally, it returns {"price": null} when the market is paused, or {"error": "halted"} when there's a system issue. Your code does float(response["price"]) and dies.

This is a defensive-programming problem. APIs lie. Treat every external response as untrusted.

def parse_price(resp: dict) -> float | None:
    if not isinstance(resp, dict):
        return None
    raw = resp.get("price")
    if raw is None:
        return None
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None

def handle_response(response):
    # Hypothetical wrapper: `return` is only valid inside a function.
    price = parse_price(response)
    if price is None:
        print("[warn] no price available; skipping tick")
        return
    # now safe to use `price`

For larger schemas, use Pydantic models with extra="ignore" so unknown fields don't crash you and missing required fields fail loudly with a clear message.

07 System-level issues (sleep, restart, network)

Symptom
Bot died at 2:34 AM with no logs after that line
OSError: [Errno 24] Too many open files
BrokenPipeError

This is the category most users don't even consider. Your bot is fine — the machine running it isn't.
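One cheap diagnostic is to have the bot notice when far more wall-clock time has passed between ticks than it asked to sleep, which usually means the machine was suspended or the process was stalled. The sketch below assumes a fixed tick interval; the names suspend_gap and run_forever and the tolerance factor are hypothetical, and a large clock adjustment can trigger the same warning.

```python
import time


def suspend_gap(previous, current, expected_interval, tolerance=3.0):
    """Return the gap in seconds if it is suspiciously long, otherwise 0.0."""
    gap = current - previous
    if gap > expected_interval * tolerance:
        return gap
    return 0.0


def run_forever(tick, interval=5.0):
    """Call tick() every `interval` seconds and log when time jumps unexpectedly."""
    last = time.time()
    while True:
        tick()
        time.sleep(interval)
        now = time.time()  # wall clock, so it reflects time spent suspended
        gap = suspend_gap(last, now, interval)
        if gap:
            print(f"[warn] {gap:.0f}s since last tick; the machine may have slept or stalled")
        last = now
```

A warning like this in the log turns "died at 2:34 AM with no logs" into a concrete lead: the gap tells you when the machine went away and for how long.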

For 24/7 reliability, the right move is to run bots somewhere they don't depend on your laptop being awake. Common options:

The pattern behind all seven

Most bot crashes share one root cause: trading bots are long-running processes interacting with unreliable external systems, written in a language that treats every uncaught exception as fatal. The cure isn't "write better code." It's defense in depth:

  1. Catch exceptions at every layer (request, parser, strategy, top-level loop)
  2. Log every error with enough context to diagnose later
  3. Bound every resource (memory, file handles, rate limits, retries)
  4. Treat dependencies, credentials, and network as things that will fail
  5. Have something watching the watcher — uptime monitoring, cloud log shipping, alerts
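The last item is the easiest to start on: a heartbeat file. The bot writes a timestamp on every tick, and a separate process (cron job, monitor, or another script) checks that the timestamp is fresh. This is a minimal sketch with the standard library; the function names, the file format, and the max_age threshold are assumptions.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Record the current time in `path`, replacing the file atomically."""
    now = time.time() if now is None else now
    tmp = f"{path}.tmp"
    with open(tmp, "w") as f:
        f.write(f"{now:.3f}\n")
    # os.replace swaps the file in one step, so a reader never sees a partial write.
    os.replace(tmp, path)


def is_alive(path, max_age, now=None):
    """Return True if the heartbeat in `path` is at most `max_age` seconds old."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            last = float(f.read().strip())
    except (OSError, ValueError):
        # A missing or unreadable heartbeat counts as dead.
        return False
    return (now - last) <= max_age
```

Call write_heartbeat at the end of each successful tick, and have the watcher alert when is_alive returns False. Because the check runs in a different process, it still works when the bot itself has been killed.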

You can build all of this yourself. Or you can use a runtime that builds it in.

Bots don't fail in production. Bots fail at 4:17 AM when you can't fix them.

Stop diagnosing crashes. Start trading.

WatchDog Bot handles all seven crash modes above out of the box. Free trial, no credit card.
