DeepSeek API Python Tutorial 2026: Complete Guide

2026-05-05 — by Global API Team

deepseek-api python-tutorial deepseek-python llm-api openai-sdk-python deepseek-v4-flash ai-api-tutorial tutorial


DeepSeek's models deliver GPT-4-class performance at a fraction of the cost — $0.14/M input tokens and $0.28/M output tokens for DeepSeek V4 Flash. This guide walks you through everything you need to call the DeepSeek API from Python: installation, authentication, chat completions, streaming responses, function calling, error handling, and cost optimization patterns.

TL;DR: DeepSeek's API is OpenAI-compatible. Install openai, point base_url at https://global-apis.com/v1, and your existing OpenAI code works with only the base URL, API key, and model name changed, at a much lower per-token price than GPT-4o.

Prerequisites

Python 3.9+
A DeepSeek API key — get one for free (100 credits, no credit card required) at https://global-apis.com/register
Basic familiarity with Python and REST APIs

1. Installation

DeepSeek's API follows the OpenAI specification, so you use the official openai Python package — no vendor-specific SDK needed.

pip install openai

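That's it. The examples below use the v1.x interface of the openai package (`from openai import OpenAI`), so make sure you have version 1.0 or later; `pip install --upgrade openai` updates an older install. No other packages are required for basic usage.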

2. Authentication & Client Setup

Option A: Environment Variable (Recommended)

Set your API key as an environment variable so it's never hardcoded in source files:

# Linux / macOS
export DEEPSEEK_API_KEY="a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"

# Windows (PowerShell)
$env:DEEPSEEK_API_KEY = "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"

Then initialize the client:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

Option B: Inline (Quick Testing Only)

from openai import OpenAI

client = OpenAI(
    api_key="a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",  # Replace with your key
    base_url="https://global-apis.com/v1",
)

⚠️ Never commit a real API key to version control. Use environment variables or a secrets manager in production.
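If you want a clearer failure than a bare `KeyError` when the variable is missing, a small helper can check for it up front. This is a minimal sketch: `load_api_key` is a hypothetical helper (not part of the openai SDK), and `DEEPSEEK_API_KEY` is simply the variable name used throughout this guide.

```python
import os


def load_api_key(var_name: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, or fail with a helpful message.

    load_api_key is a hypothetical helper, not part of the openai SDK.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell "
            "(see Option A) before creating the client."
        )
    return key


# Usage (client setup as in Option A):
# client = OpenAI(api_key=load_api_key(), base_url="https://global-apis.com/v1")
```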

3. Your First Chat Completion

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a helpful Python programming assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
print(f"\nTokens used — input: {response.usage.prompt_tokens}, output: {response.usage.completion_tokens}")

Sample output:

def flatten(nested_list):
    result = []
    for item in nested_list:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result

Tokens used — input: 38, output: 71
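Because this example caps output at `max_tokens=512`, it is worth checking whether the reply was cut off. In OpenAI-compatible responses, `choices[0].finish_reason` is `"length"` when the token limit was hit (and typically `"stop"` when the model finished on its own). The sketch below assumes Global API reports `finish_reason` the same way; `get_reply` is a hypothetical helper name.

```python
import warnings


def get_reply(response) -> str:
    """Return the first choice's text, warning if it was truncated by max_tokens.

    Assumes an OpenAI-style response object (hypothetical helper).
    """
    choice = response.choices[0]
    if choice.finish_reason == "length":
        warnings.warn(
            "Response hit max_tokens and may be incomplete; "
            "raise max_tokens or ask for a shorter answer."
        )
    # content can be None (e.g. tool-call-only replies), so fall back to ""
    return choice.message.content or ""


# Usage with the response from the example above:
# print(get_reply(response))
```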

4. Choosing the Right Model

Global API exposes two DeepSeek models:

| Model | Best For | Input Price | Output Price |
|-------|----------|-------------|--------------|
| deepseek-v4-flash | Everyday tasks: summarization, coding assist, Q&A, translation | $0.14/M tokens | $0.28/M tokens |
| deepseek-reasoner | Complex reasoning: math proofs, multi-step logic, code debugging | $0.55/M tokens | $2.19/M tokens |

Rule of thumb: Start with deepseek-v4-flash. Switch to deepseek-reasoner only when the task genuinely requires multi-step logical reasoning — it's ~8× more expensive per output token.

# For complex reasoning tasks
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
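On DeepSeek's own API, `deepseek-reasoner` returns its chain of thought in a separate `reasoning_content` field next to the final `content`, and DeepSeek's documentation asks you not to send `reasoning_content` back in later turns. Assuming Global API passes that field through unchanged (inspect a raw response to confirm), a small helper can separate the two. `split_reasoning` and `history_entry` are hypothetical names.

```python
from typing import Optional, Tuple


def split_reasoning(message) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) from a chat completion message.

    `reasoning_content` is a DeepSeek extension, so getattr() with a default
    keeps this working if the field is absent.
    """
    reasoning = getattr(message, "reasoning_content", None)
    return reasoning, message.content or ""


def history_entry(message) -> dict:
    """Build the assistant turn to append to history: final answer only."""
    return {"role": "assistant", "content": message.content or ""}


# Usage with the reasoner response above:
# reasoning, answer = split_reasoning(response.choices[0].message)
# print(answer)
```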

5. Streaming Responses

For chatbot UIs or any scenario where latency matters, use streaming to display tokens as they arrive:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Explain async/await in Python in simple terms."}
    ],
    stream=True,
)

print("Response: ", end="", flush=True)
for chunk in stream:
    if not chunk.choices:  # some servers send a final chunk with no choices (e.g. usage-only)
        continue
    delta = chunk.choices[0].delta
    if delta.content is not None:
        print(delta.content, end="", flush=True)

print()  # newline at the end

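Streaming doesn't make generation faster; it lets you display the first tokens as soon as they are produced instead of waiting for the complete response. In interactive applications this can cut the wait before anything appears from seconds to a fraction of a second, which greatly improves perceived performance.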
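If you also need the complete text after streaming (for logging, or to append to the conversation history), accumulate the deltas as you print them. A minimal sketch: `print_and_collect` is a hypothetical helper, and it skips chunks with an empty `choices` list, which some OpenAI-compatible servers send (for example, a trailing usage-only chunk).

```python
import sys
from typing import Iterable, List, TextIO


def print_and_collect(stream: Iterable, out: TextIO = sys.stdout) -> str:
    """Echo streamed tokens to `out` and return the full concatenated reply."""
    parts: List[str] = []
    for chunk in stream:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        text = chunk.choices[0].delta.content
        if text:
            out.write(text)
            out.flush()
            parts.append(text)
    out.write("\n")
    return "".join(parts)


# Usage with a stream created as above (stream=True):
# full_reply = print_and_collect(stream)
```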

6. Multi-Turn Conversations

To maintain context across turns, pass the full conversation history in each request:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

def chat(messages: list[dict]) -> str:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        temperature=0.7,
    )
    return response.choices[0].message.content

# Build conversation history
history = [
    {"role": "system", "content": "You are a concise technical assistant."}
]

user_inputs = [
    "What is a Python decorator?",
    "Can you show me a practical example?",
    "Now make it work as a retry decorator.",
]

for user_message in user_inputs:
    history.append({"role": "user", "content": user_message})
    reply = chat(history)
    history.append({"role": "assistant", "content": reply})
    print(f"User: {user_message}")
    print(f"Assistant: {reply}\n{'─' * 60}")

Pro tip: To control costs in long conversations, trim old messages from history once the conversation exceeds your token budget. Keep the system prompt and the last N turns.
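One way to apply that tip is to drop the oldest non-system messages until the history fits a budget. The sketch below uses a rough four-characters-per-token estimate, which is only an approximation (an assumption, not DeepSeek's tokenizer); for exact counts, rely on the `usage.prompt_tokens` the API returns. `estimate_tokens` and `trim_to_budget` are hypothetical names.

```python
from typing import List


def estimate_tokens(message: dict) -> int:
    """Very rough token estimate: ~4 characters per token (heuristic only)."""
    return max(1, len(message.get("content") or "") // 4)


def trim_to_budget(history: List[dict], max_tokens: int) -> List[dict]:
    """Keep system messages plus the most recent messages that fit the budget."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]

    budget = max_tokens - sum(estimate_tokens(m) for m in system)
    kept: List[dict] = []
    # Walk backwards from the newest message, keeping as many as fit
    for message in reversed(rest):
        cost = estimate_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))


# Usage inside the loop above, before each request:
# reply = chat(trim_to_budget(history, max_tokens=8_000))
```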

7. Function Calling (Tool Use)

DeepSeek supports OpenAI-compatible function calling. This lets the model decide when to invoke your code:

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'London'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"],
            },
        },
    }
]

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub — replace with a real weather API call."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}

messages = [{"role": "user", "content": "What's the weather like in Tokyo?"}]

# First API call — model decides whether to call a function
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message

# Map tool names the model may request to local Python functions
available_functions = {"get_weather": get_weather}

# Check if the model wants to call one or more functions
if message.tool_calls:
    # Keep the assistant turn that requested the calls in the history
    messages.append(message)

    # Every tool call needs its own "tool" message with the matching id
    for tool_call in message.tool_calls:
        fn = available_functions[tool_call.function.name]
        fn_args = json.loads(tool_call.function.arguments)
        fn_result = fn(**fn_args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(fn_result),
        })

    # Second call: the model turns the tool results into a final answer
    final_response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        tools=tools,
    )
    print(final_response.choices[0].message.content)
else:
    print(message.content)
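When you expose more than one tool, a registry that maps tool names to Python callables keeps the dispatch logic in one place. This sketch (a hypothetical `run_tool_calls` helper) also turns unknown tool names and malformed JSON arguments into error payloads the model can read, instead of crashing. Whether to report errors this way or raise them is a design choice for your application.

```python
import json
from typing import Any, Callable, Dict, List


def run_tool_calls(tool_calls, registry: Dict[str, Callable[..., Any]]) -> List[dict]:
    """Execute each requested tool call and return the matching `tool` messages."""
    tool_messages = []
    for call in tool_calls:
        fn = registry.get(call.function.name)
        try:
            if fn is None:
                raise KeyError(f"unknown tool: {call.function.name}")
            args = json.loads(call.function.arguments or "{}")
            result: Any = fn(**args)
        except (KeyError, TypeError, json.JSONDecodeError) as exc:
            # Report the problem back to the model rather than crashing
            result = {"error": str(exc)}
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    return tool_messages


# Usage with the example above:
# messages.append(message)
# messages.extend(run_tool_calls(message.tool_calls, {"get_weather": get_weather}))
```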

8. Error Handling & Retries

Production code needs robust error handling. Here's a pattern with exponential backoff:

import os
import time
import random
from openai import (
    OpenAI,
    RateLimitError,
    APITimeoutError,
    APIConnectionError,
    InternalServerError,
)

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
    timeout=30.0,
    max_retries=0,  # the SDK retries some errors by default; disable that so only the loop below retries
)

# Errors worth retrying: rate limits, timeouts, dropped connections, and 5xx responses
TRANSIENT_ERRORS = (RateLimitError, APITimeoutError, APIConnectionError, InternalServerError)

def chat_with_retry(messages: list[dict], max_retries: int = 3, **kwargs) -> str:
    """Call the API up to `max_retries` times, backing off exponentially on transient errors."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="deepseek-v4-flash",
                messages=messages,
                **kwargs,
            )
            return response.choices[0].message.content

        except TRANSIENT_ERRORS as e:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            wait = (2 ** attempt) + random.uniform(0, 1)  # 1s, 2s, 4s, ... plus jitter
            print(f"{type(e).__name__}: retrying in {wait:.1f}s...")
            time.sleep(wait)

    raise RuntimeError("Max retries exceeded")  # only reached if max_retries < 1

# Usage
result = chat_with_retry(
    messages=[{"role": "user", "content": "Summarize the Python GIL in one paragraph."}],
    temperature=0.5,
    max_tokens=200,
)
print(result)
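The same backoff logic can be packaged as a reusable decorator. This is a generic sketch, not part of the openai SDK: `retry` is a hypothetical name, `retry_on` lists which exception types count as transient, and `sleep` is injectable so the behavior can be tested without real delays.

```python
import functools
import random
import time
from typing import Callable, Tuple, Type


def retry(
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry the wrapped function with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    # base_delay * 2**attempt (1s, 2s, 4s, ... by default) plus up to 1s of jitter
                    sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
        return wrapper
    return decorator


# Usage with the exception classes imported above:
# @retry(retry_on=(RateLimitError, APITimeoutError, APIConnectionError))
# def ask(prompt: str) -> str:
#     r = client.chat.completions.create(
#         model="deepseek-v4-flash",
#         messages=[{"role": "user", "content": prompt}],
#     )
#     return r.choices[0].message.content
```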

9. Tracking Token Usage & Cost

With DeepSeek V4 Flash pricing ($0.14/M input, $0.28/M output), even heavy workloads cost very little. Here's how to instrument cost tracking:

import os
from dataclasses import dataclass, field
from openai import OpenAI

INPUT_COST_PER_TOKEN  = 0.14 / 1_000_000   # $0.14 per 1M input tokens
OUTPUT_COST_PER_TOKEN = 0.28 / 1_000_000   # $0.28 per 1M output tokens

@dataclass
class UsageTracker:
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    request_count: int = 0
    total_cost_usd: float = field(init=False, default=0.0)

    def record(self, usage):
        self.total_input_tokens  += usage.prompt_tokens
        self.total_output_tokens += usage.completion_tokens
        self.request_count       += 1
        self.total_cost_usd = (
            self.total_input_tokens  * INPUT_COST_PER_TOKEN +
            self.total_output_tokens * OUTPUT_COST_PER_TOKEN
        )

    def report(self):
        print(f"Requests:       {self.request_count}")
        print(f"Input tokens:   {self.total_input_tokens:,}")
        print(f"Output tokens:  {self.total_output_tokens:,}")
        print(f"Estimated cost: ${self.total_cost_usd:.6f}")

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)
tracker = UsageTracker()

prompts = [
    "Explain list comprehensions.",
    "What is a Python context manager?",
    "When should I use a generator vs a list?",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
    )
    tracker.record(response.usage)

tracker.report()
# Output (approximate):
# Requests:       3
# Input tokens:   51
# Output tokens:  462
# Estimated cost: $0.000136
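The tracker above hardcodes V4 Flash prices. If you route some requests to `deepseek-reasoner`, a per-model price lookup keeps the estimates accurate. This sketch uses the per-million-token prices from the model table in Section 4; `PRICES_PER_M` and `estimate_cost` are hypothetical names, and the numbers need updating if Global API's pricing changes.

```python
from typing import Dict, Tuple

# USD per 1M tokens: (input, output), taken from the model table in Section 4
PRICES_PER_M: Dict[str, Tuple[float, float]] = {
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-reasoner": (0.55, 2.19),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of a single request for the given model."""
    try:
        input_price, output_price = PRICES_PER_M[model]
    except KeyError:
        raise ValueError(f"no price configured for model {model!r}") from None
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


# Usage: pass the model name you requested plus the response's usage counts
# cost = estimate_cost("deepseek-v4-flash", response.usage.prompt_tokens,
#                      response.usage.completion_tokens)
```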

10. Async Usage with `asyncio`

For high-throughput applications, use the async client to run multiple requests concurrently:

import asyncio
import os
from openai import AsyncOpenAI

async_client = AsyncOpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

async def summarize(text: str) -> str:
    response = await async_client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "system", "content": "Summarize the following in one sentence."},
            {"role": "user",   "content": text},
        ],
        max_tokens=100,
    )
    return response.choices[0].message.content

async def main():
    documents = [
        "Python was created by Guido van Rossum and released in 1991...",
        "Machine learning is a subset of artificial intelligence that...",
        "REST APIs use HTTP methods to perform CRUD operations on resources...",
    ]

    # Run all requests concurrently
    results = await asyncio.gather(*[summarize(doc) for doc in documents])

    for doc, summary in zip(documents, results):
        print(f"Original: {doc[:60]}...")
        print(f"Summary:  {summary}\n")

asyncio.run(main())

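Because the requests run concurrently, total wall-clock time is roughly that of the slowest single request rather than the sum of all three: for example, about 2s instead of about 6s if each call takes about 2s.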
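`asyncio.gather` starts every request at once, which is fine for three documents but can trip per-account rate limits with hundreds. A common fix is to cap the number of in-flight requests with `asyncio.Semaphore`. A minimal sketch: `gather_limited` is a hypothetical helper, and the limit of 5 is an arbitrary example, not a documented Global API limit.

```python
import asyncio
from typing import Awaitable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_limited(coros: Iterable[Awaitable[T]], limit: int = 5) -> List[T]:
    """Like asyncio.gather, but with at most `limit` coroutines running at once.

    Pass coroutine objects (not already-started Tasks) so the semaphore
    controls when each one begins. Results keep the input order.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(coro: Awaitable[T]) -> T:
        async with semaphore:  # wait for a free slot before starting
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))


# Usage inside main() above:
# results = await gather_limited([summarize(doc) for doc in documents], limit=5)
```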

11. Cost Optimization Tips

1. Set `max_tokens` on every request

Without max_tokens, the model can generate very long responses. Set a reasonable ceiling:

messages = [{"role": "user", "content": "Explain Python's GIL in three sentences."}]

# Bad: no explicit output cap, so cost per request is unpredictable
response = client.chat.completions.create(model="deepseek-v4-flash", messages=messages)

# Good: output capped at 500 tokens
response = client.chat.completions.create(model="deepseek-v4-flash", messages=messages, max_tokens=500)

2. Trim conversation history

Every request resends the entire history, so input cost grows with each turn. Once a conversation gets long (say, more than ~10 turns), keep only the most recent messages, or summarize the older ones:

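def trim_history(history: list[dict], max_turns: int = 6) -> list[dict]:
    """Keep the system prompt(s) plus the last `max_turns` user/assistant pairs."""
    system = [m for m in history if m["role"] == "system"]
    non_system = [m for m in history if m["role"] != "system"]
    if max_turns <= 0:
        return system  # guard: non_system[-0:] would otherwise keep everything
    return system + non_system[-max_turns * 2:]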
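The tip also mentions summarizing instead of discarding. One way to do that is to fold everything except the most recent messages into a single summary message. In this sketch, `compact_history` is a hypothetical helper and `summarize` is any function you supply that turns a list of messages into a short string (for example, one more `deepseek-v4-flash` call); injecting it keeps the helper testable.

```python
from typing import Callable, List


def compact_history(
    history: List[dict],
    summarize: Callable[[List[dict]], str],
    keep_last: int = 6,
) -> List[dict]:
    """Replace all but the last `keep_last` non-system messages with a summary."""
    keep_last = max(0, keep_last)
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    if len(rest) <= keep_last:
        return history  # nothing old enough to compact

    split = len(rest) - keep_last
    old, recent = rest[:split], rest[split:]
    summary = {
        "role": "system",
        "content": "Summary of the earlier conversation: " + summarize(old),
    }
    return system + [summary] + recent
```

Putting the summary in a second system message right after the original prompt is one design choice; appending it to the first system prompt works too.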

3. Use `deepseek-v4-flash` for most tasks

deepseek-reasoner costs ~8× more per output token (~4× more per input token). Route only genuinely complex reasoning tasks to it:

import re

# \b word boundaries, so e.g. "improve" does not match "prove"
REASONING_PATTERN = re.compile(
    r"\b(prove|derive|optimize|debug|analyze step by step)\b", re.IGNORECASE
)

def pick_model(prompt: str) -> str:
    if REASONING_PATTERN.search(prompt):
        return "deepseek-reasoner"
    return "deepseek-v4-flash"

4. Cache frequent prompts

If users ask the same questions repeatedly, cache responses in Redis or a local dict:

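import hashlib
import json

_cache: dict[str, str] = {}

def cached_chat(messages: list[dict], **kwargs) -> str:
    # Key on the messages *and* the generation params so different settings don't collide
    payload = json.dumps({"messages": messages, **kwargs}, sort_keys=True)
    key = hashlib.sha256(payload.encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    result = chat_with_retry(messages, **kwargs)  # from Section 8
    _cache[key] = result
    return result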
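A plain dict grows without limit and never expires entries. If you don't want to run Redis, a small in-process cache with a size cap and a time-to-live is a middle ground. This is a standard-library-only sketch; the `BoundedTTLCache` class is hypothetical, and `clock` is injectable for testing.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class BoundedTTLCache:
    """LRU cache with a maximum size and a per-entry time-to-live (seconds)."""

    def __init__(self, max_size: int = 1024, ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self._data: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self.clock() - stored_at > self.ttl:
            del self._data[key]  # expired
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used


# Usage, with a key built as in cached_chat above:
# cache = BoundedTTLCache(max_size=500, ttl=600)
# hit = cache.get(key)
```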

Real-World Cost Estimates

| Use Case | Monthly Volume | Estimated Cost |
|----------|----------------|----------------|
| Internal Slack bot (short Q&A) | 50K requests / 100M tokens | ~$21/mo |
| Customer support chatbot | 200K requests / 500M tokens | ~$98/mo |
| Code review pipeline (CI/CD) | 10K PRs / 200M tokens | ~$42/mo |
| Document summarization (RAG) | 1M docs / 2B tokens | ~$420/mo |

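These estimates assume roughly half input and half output tokens (the support-bot figure corresponds to about 60% input). At the same 50/50 split, GPT-4o ($2.50/M input, $10/M output) would put the Slack bot alone at about $625/mo, roughly 30× more.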
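To rerun these numbers for your own workload, multiply the input and output token counts by their per-million prices. The sketch below makes the input/output split an explicit parameter, since it drives the result. `monthly_cost` is a hypothetical helper, the prices are the ones quoted in this guide, and the 60/40 split for the support bot is an assumption that happens to reproduce the table's figure.

```python
def monthly_cost(total_tokens: float, input_share: float,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost for `total_tokens`, of which `input_share` (0 to 1) are input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    input_m = total_tokens * input_share / 1_000_000
    output_m = total_tokens * (1 - input_share) / 1_000_000
    return input_m * input_price_per_m + output_m * output_price_per_m


# Reproduce two rows of the table with DeepSeek V4 Flash prices:
print(round(monthly_cost(100e6, 0.5, 0.14, 0.28), 2))  # Slack bot (50/50): 21.0
print(round(monthly_cost(500e6, 0.6, 0.14, 0.28), 2))  # support bot (60/40): 98.0
```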

Want credits that never expire and no monthly subscription? Check out Global API credit packs.

Quick Reference

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

# Chat completion
r = client.chat.completions.create(
    model="deepseek-v4-flash",          # or "deepseek-reasoner"
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,                # lower = more focused, higher = more varied (0.0 to 2.0)
    max_tokens=512,
    stream=False,
)
print(r.choices[0].message.content)
print(r.usage.prompt_tokens, r.usage.completion_tokens)

| Parameter | Values | Default |
|-----------|--------|---------|
| model | deepseek-v4-flash, deepseek-reasoner | (required) |
| temperature | 0.0 – 2.0 | 1.0 |
| max_tokens | 1 – 8192 | model max |
| stream | True / False | False |
| top_p | 0.0 – 1.0 | 1.0 |
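If you build requests from user-supplied settings, checking them against these ranges before calling the API turns a server-side error into a clear local message. A minimal sketch: `check_params` is a hypothetical helper, the ranges are copied from the table above, and the server remains the final authority on what it accepts.

```python
from typing import Optional

ALLOWED_MODELS = {"deepseek-v4-flash", "deepseek-reasoner"}


def check_params(model: str, temperature: float = 1.0, top_p: float = 1.0,
                 max_tokens: Optional[int] = None) -> None:
    """Raise ValueError if a parameter falls outside the range listed in the table."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model {model!r}")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    if max_tokens is not None and not 1 <= max_tokens <= 8192:
        raise ValueError("max_tokens must be between 1 and 8192")


# Usage before building a request:
# check_params("deepseek-v4-flash", temperature=0.7, max_tokens=512)
```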

FAQ

Do I need a special SDK for DeepSeek?
No. DeepSeek's API is OpenAI-compatible. pip install openai is all you need.

What's the context window size?
deepseek-v4-flash supports up to 64K tokens per request (input + output combined).

Is the API rate-limited?
Global API applies per-account rate limits. For high-volume use cases, contact us — we offer custom rate limit increases.

Can I use LangChain or LlamaIndex with DeepSeek?
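Yes. Point the OpenAI integration at https://global-apis.com/v1. In LangChain, pass base_url (an alias of openai_api_base) to ChatOpenAI from the langchain-openai package. In LlamaIndex, set api_base, preferably on the OpenAILike class, which is meant for OpenAI-compatible endpoints with non-OpenAI model names. Most existing chains then work without further changes.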

How do credits work?
1 credit = $0.01. DeepSeek V4 Flash costs 14 credits per 1M input tokens and 28 credits per 1M output tokens. Credits never expire.

Get Started Today

You now have everything you need to call the DeepSeek API from Python — authentication, chat completions, streaming, function calling, async usage, and cost tracking.

Start free: https://global-apis.com/register — 100 free credits, no credit card required.
Explore pricing: https://global-apis.com/pricing — credit packs from $19.99.

Have a question or hit a snag? Drop us a message through the dashboard — we're happy to help.
