
How to Use DeepSeek API with Python: Complete Guide (2026)

2026-05-05, by Global API Team


DeepSeek's models deliver GPT-4-class performance at a fraction of the cost: $0.14/M input tokens and $0.28/M output tokens for DeepSeek V4 Flash. This guide walks you through everything you need to call the DeepSeek API from Python: installation, authentication, chat completions, streaming responses, function calling, error handling, and cost optimization patterns.

TL;DR: DeepSeek's API is fully OpenAI-compatible. Install openai, point base_url at https://global-apis.com/v1, and your existing OpenAI code works, at 74% less cost.


Prerequisites

  • Python 3.9+
  • A DeepSeek API key: get one for free (100 credits, no credit card required) at https://global-apis.com/register
  • Basic familiarity with Python and REST APIs

1. Installation

DeepSeek's API follows the OpenAI specification, so you use the official openai Python package; no vendor-specific SDK is needed.

pip install openai

That's it. No additional packages required for basic usage.


2. Authentication & Client Setup

Option A: Environment Variable (Recommended)

Set your API key as an environment variable so it's never hardcoded in source files:

# Linux / macOS
export DEEPSEEK_API_KEY="a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"

# Windows (PowerShell)
$env:DEEPSEEK_API_KEY = "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"

Then initialize the client:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

Option B: Inline (Quick Testing Only)

from openai import OpenAI

client = OpenAI(
    api_key="a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",  # Replace with your key
    base_url="https://global-apis.com/v1",
)

⚠️ Never commit a real API key to version control. Use environment variables or a secrets manager in production.
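
With Option A, a missing variable surfaces as a bare KeyError when the client is constructed. A minimal sketch of a fail-fast check follows; load_api_key is a hypothetical helper written for this guide, not part of the openai package:

```python
import os


def load_api_key(var_name: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, or raise a readable error.

    Hypothetical helper for illustration; not provided by the openai package.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it before running this script."
        )
    return key
```

Pass the result as api_key=load_api_key() when constructing OpenAI(...).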


3. Your First Chat Completion

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful Python programming assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
print(f"\nTokens used - input: {response.usage.prompt_tokens}, output: {response.usage.completion_tokens}")

Sample output:

def flatten(nested_list):
    result = []
    for item in nested_list:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result

Tokens used - input: 38, output: 71

4. Choosing the Right Model

Global API exposes two DeepSeek models:

| Model | Best For | Input Price | Output Price |
|-------|----------|-------------|--------------|
| deepseek-chat | Everyday tasks: summarization, coding assist, Q&A, translation | $0.14/M tokens | $0.28/M tokens |
| deepseek-reasoner | Complex reasoning: math proofs, multi-step logic, code debugging | $0.55/M tokens | $2.19/M tokens |

Rule of thumb: Start with deepseek-chat. Switch to deepseek-reasoner only when the task genuinely requires multi-step logical reasoning; it's ~8× more expensive per output token.

# For complex reasoning tasks
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
    max_tokens=1024,
)

5. Streaming Responses

For chatbot UIs or any scenario where latency matters, use streaming to display tokens as they arrive:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Explain async/await in Python in simple terms."}
    ],
    stream=True,
)

print("Response: ", end="", flush=True)
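for chunk in stream:
    if not chunk.choices:  # some servers send chunks with no choices (e.g. a usage-only chunk)
        continue
    delta = chunk.choices[0].delta
    if delta.content is not None:
        print(delta.content, end="", flush=True)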

print()  # newline at the end

With streaming, the first tokens typically reach the user in ~200ms instead of after the ~2s it takes to generate the full response, which dramatically improves perceived performance in interactive applications.
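
Applications often need the complete reply as well, for example to store it in conversation history. A minimal sketch of accumulating streamed fragments follows; collect_stream is a hypothetical helper, and the fake fragments stand in for chunk.choices[0].delta.content values so the example runs offline:

```python
from typing import Iterable, Optional


def collect_stream(deltas: Iterable[Optional[str]]) -> str:
    """Join streamed text fragments into the full reply, skipping empty ones."""
    parts = []
    for text in deltas:
        if text:  # None or "" marks a chunk that carries no new text
            parts.append(text)
    return "".join(parts)


# Simulated fragments standing in for delta.content values from a real stream.
fake_deltas = ["Async", None, " lets", " code", "", " wait."]
full_reply = collect_stream(fake_deltas)
print(full_reply)
```

With a real stream, the same helper could be fed a generator such as (c.choices[0].delta.content for c in stream if c.choices).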


6. Multi-Turn Conversations

To maintain context across turns, pass the full conversation history in each request:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

def chat(messages: list[dict]) -> str:
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=messages,
        temperature=0.7,
    )
    return response.choices[0].message.content

# Build conversation history
history = [
    {"role": "system", "content": "You are a concise technical assistant."}
]

user_inputs = [
    "What is a Python decorator?",
    "Can you show me a practical example?",
    "Now make it work as a retry decorator.",
]

for user_message in user_inputs:
    history.append({"role": "user", "content": user_message})
    reply = chat(history)
    history.append({"role": "assistant", "content": reply})
    print(f"User: {user_message}")
    print(f"Assistant: {reply}\n{'─' * 60}")

Pro tip: To control costs in long conversations, trim old messages from history once the conversation exceeds your token budget. Keep the system prompt and the last N turns.
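
One way to apply a token budget is sketched below. It assumes every message's content is a plain string and uses a rough heuristic of about 4 characters per token, which is an approximation rather than the model's real tokenizer; estimate_tokens and trim_to_budget are hypothetical names:

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: assumes about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_to_budget(history: list[dict], budget: int) -> list[dict]:
    """Keep system messages plus the newest messages that fit within `budget` tokens."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + kept[::-1]  # restore chronological order
```

Call it on the history right before each request, for example chat(trim_to_budget(history, budget=4000)). For exact counts, swap estimate_tokens for a real tokenizer.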


7. Function Calling (Tool Use)

DeepSeek supports OpenAI-compatible function calling. This lets the model decide when to invoke your code:

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'London'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"],
            },
        },
    }
]

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub: replace with a real weather API call."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}

messages = [{"role": "user", "content": "What's the weather like in Tokyo?"}]

# First API call: the model decides whether to call a function
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message

# Check if the model wants to call one or more functions
if message.tool_calls:
    available = {"get_weather": get_weather}  # map tool names to Python callables

    # Append the assistant message that contains the tool calls
    messages.append(message)

    # Run every requested call; each tool_call_id needs its own "tool" reply
    for tool_call in message.tool_calls:
        fn = available[tool_call.function.name]
        fn_args = json.loads(tool_call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(fn(**fn_args)),
        })

    # Second API call: the model turns the tool results into a final answer
    final_response = client.chat.completions.create(
        model="deepseek-chat",
        messages=messages,
    )
    print(final_response.choices[0].message.content)
else:
    print(message.content)
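
Model-generated arguments are not guaranteed to be valid JSON or to match the function signature. A minimal sketch of a defensive dispatcher follows; run_tool, the registry dict, and fake_weather are hypothetical names for illustration. It returns an error payload instead of raising, so the model can see what went wrong:

```python
import json
from typing import Callable


def run_tool(name: str, raw_args: str, registry: dict[str, Callable[..., dict]]) -> str:
    """Run a registered tool and always return a JSON string for the tool message."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(raw_args)
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})


def fake_weather(city: str, unit: str = "celsius") -> dict:
    """Stand-in tool used only for this example."""
    return {"city": city, "temperature": 22, "unit": unit}


registry = {"get_weather": fake_weather}
print(run_tool("get_weather", '{"city": "Tokyo"}', registry))
print(run_tool("get_weather", '{"town": "Tokyo"}', registry))
```

The returned string can be used directly as the "content" of the tool message.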

8. Error Handling & Retries

Production code needs robust error handling. Here's a pattern with exponential backoff:

import os
import time
import random
from openai import OpenAI, RateLimitError, APITimeoutError, APIConnectionError

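client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
    timeout=30.0,
    max_retries=0,  # disable the SDK's built-in retries (default: 2) so the loop below is the only retry layer
)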

def chat_with_retry(messages: list[dict], max_retries: int = 3, **kwargs) -> str:
    """Call the API with exponential backoff on transient errors."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="deepseek-chat",
                messages=messages,
                **kwargs,
            )
            return response.choices[0].message.content

        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {wait:.1f}s...")
            time.sleep(wait)

        except (APITimeoutError, APIConnectionError) as e:
            if attempt == max_retries - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Network error: {e}. Retrying in {wait:.1f}s...")
            time.sleep(wait)

    raise RuntimeError("Max retries exceeded")  # Should not be reached

# Usage
result = chat_with_retry(
    messages=[{"role": "user", "content": "Summarize the Python GIL in one paragraph."}],
    temperature=0.5,
    max_tokens=200,
)
print(result)
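
To see what the backoff formula produces, the sketch below computes the wait schedule on its own, without calling the API. backoff_delays is a hypothetical helper; the base, jitter, and seed parameters are assumptions added so the schedule can be inspected and reproduced:

```python
import random


def backoff_delays(max_retries: int, base: float = 1.0, jitter: float = 1.0, seed=None) -> list[float]:
    """Return the wait before each retry: base * 2**attempt plus random jitter."""
    rng = random.Random(seed)
    # The final attempt re-raises instead of sleeping, so there are max_retries - 1 waits.
    return [
        base * (2 ** attempt) + rng.uniform(0, jitter)
        for attempt in range(max_retries - 1)
    ]


delays = backoff_delays(max_retries=3, seed=0)
print([round(d, 2) for d in delays])
```

With max_retries=3 the first wait falls between 1 and 2 seconds and the second between 2 and 3 seconds. The jitter keeps many clients from retrying at the same instant.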

9. Tracking Token Usage & Cost

With DeepSeek V4 Flash pricing ($0.14/M input, $0.28/M output), even heavy workloads cost very little. Here's how to instrument cost tracking:

import os
from dataclasses import dataclass, field
from openai import OpenAI

INPUT_COST_PER_TOKEN  = 0.14 / 1_000_000   # $0.14 per 1M input tokens
OUTPUT_COST_PER_TOKEN = 0.28 / 1_000_000   # $0.28 per 1M output tokens

@dataclass
class UsageTracker:
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    request_count: int = 0
    total_cost_usd: float = field(init=False, default=0.0)

    def record(self, usage):
        self.total_input_tokens  += usage.prompt_tokens
        self.total_output_tokens += usage.completion_tokens
        self.request_count       += 1
        self.total_cost_usd = (
            self.total_input_tokens  * INPUT_COST_PER_TOKEN +
            self.total_output_tokens * OUTPUT_COST_PER_TOKEN
        )

    def report(self):
        print(f"Requests:       {self.request_count}")
        print(f"Input tokens:   {self.total_input_tokens:,}")
        print(f"Output tokens:  {self.total_output_tokens:,}")
        print(f"Estimated cost: ${self.total_cost_usd:.6f}")

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)
tracker = UsageTracker()

prompts = [
    "Explain list comprehensions.",
    "What is a Python context manager?",
    "When should I use a generator vs a list?",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
    )
    tracker.record(response.usage)

tracker.report()
# Output (approximate):
# Requests:       3
# Input tokens:   51
# Output tokens:  462
# Estimated cost: $0.000136

10. Async Usage with asyncio

For high-throughput applications, use the async client to run multiple requests concurrently:

import asyncio
import os
from openai import AsyncOpenAI

async_client = AsyncOpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

async def summarize(text: str) -> str:
    response = await async_client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "Summarize the following in one sentence."},
            {"role": "user",   "content": text},
        ],
        max_tokens=100,
    )
    return response.choices[0].message.content

async def main():
    documents = [
        "Python was created by Guido van Rossum and released in 1991...",
        "Machine learning is a subset of artificial intelligence that...",
        "REST APIs use HTTP methods to perform CRUD operations on resources...",
    ]

    # Run all requests concurrently
    results = await asyncio.gather(*[summarize(doc) for doc in documents])

    for doc, summary in zip(documents, results):
        print(f"Original: {doc[:60]}...")
        print(f"Summary:  {summary}\n")

asyncio.run(main())

Running 3 requests concurrently cuts total latency from ~6s (sequential) to ~2s (parallel).
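
With hundreds of documents, an unbounded gather can run into the per-account rate limits. A minimal sketch of capping concurrency with asyncio.Semaphore follows; gather_limited is a hypothetical helper, and fake_summarize stands in for a real API call so the example runs offline:

```python
import asyncio
from typing import Awaitable, Callable


async def gather_limited(
    worker: Callable[[str], Awaitable[str]],
    items: list[str],
    limit: int = 5,
) -> list[str]:
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: str) -> str:
        async with semaphore:  # waits here while `limit` calls are already running
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_summarize(text: str) -> str:
    """Stand-in for a real API call."""
    await asyncio.sleep(0.01)
    return text.upper()


results = asyncio.run(gather_limited(fake_summarize, ["a", "b", "c"], limit=2))
print(results)
```

In real use, pass the summarize coroutine in place of fake_summarize and tune limit to your account's rate limit.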


11. Cost Optimization Tips

1. Set max_tokens on every request

Without max_tokens, the model can generate very long responses. Set a reasonable ceiling:

# Bad: unlimited output, unpredictable cost
response = client.chat.completions.create(model="deepseek-chat", messages=[...])

# Good: capped at 500 tokens
response = client.chat.completions.create(model="deepseek-chat", messages=[...], max_tokens=500)

2. Trim conversation history

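Every request resends the full conversation history, so each new turn is billed for all earlier messages as input tokens. After ~10 turns, drop or summarize old messages. The helper below keeps the system prompt and the last max_turns exchanges: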

def trim_history(history: list[dict], max_turns: int = 6) -> list[dict]:
    system = [m for m in history if m["role"] == "system"]
    non_system = [m for m in history if m["role"] != "system"]
    return system + non_system[-max_turns * 2:]

3. Use deepseek-chat for most tasks

deepseek-reasoner is ~8× more expensive. Route only genuinely complex reasoning tasks to it:

def pick_model(prompt: str) -> str:
    reasoning_keywords = {"prove", "derive", "optimize", "debug", "analyze step by step"}
    if any(kw in prompt.lower() for kw in reasoning_keywords):
        return "deepseek-reasoner"
    return "deepseek-chat"

4. Cache frequent prompts

If users ask the same questions repeatedly, cache responses in Redis or a local dict:

import hashlib, json

_cache: dict[str, str] = {}

def cached_chat(messages: list[dict]) -> str:
    key = hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    result = chat_with_retry(messages)  # from Section 8
    _cache[key] = result
    return result
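
A plain dict grows without bound in a long-running process. A minimal sketch of a size-capped alternative follows, using least-recently-used eviction; LRUCache is a hypothetical class for illustration, and a Redis deployment would rely on its own eviction settings instead:

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache for response strings."""

    def __init__(self, max_items: int = 1000):
        self.max_items = max_items
        self._data = OrderedDict()

    def get(self, key: str):
        """Return the cached value, or None on a miss."""
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict the least recently used entry
```

It can replace the dict by calling get(key) before the API request and put(key, result) after it.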

Real-World Cost Estimates

| Use Case | Monthly Volume | Estimated Cost |
|----------|----------------|----------------|
| Internal Slack bot (short Q&A) | 50K requests / 100M tokens | ~$21/mo |
| Customer support chatbot | 200K requests / 500M tokens | ~$98/mo |
| Code review pipeline (CI/CD) | 10K PRs / 200M tokens | ~$42/mo |
| Document summarization (RAG) | 1M docs / 2B tokens | ~$420/mo |

Compare that to GPT-4o ($2.50/M input, $10/M output) at the same volumes: the Slack bot alone would cost $330/mo, about 15× more.
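
To estimate your own workload, the arithmetic can be sketched as below. The default prices are the deepseek-chat prices quoted in this guide. The even split between input and output tokens is an assumption made for this example; the article does not state the split behind its estimates. monthly_cost_usd is a hypothetical helper:

```python
def monthly_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.14,
    output_price_per_m: float = 0.28,
) -> float:
    """Estimate monthly spend from token counts and per-million-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Assumption: 100M tokens per month, split evenly between input and output.
slack_bot = monthly_cost_usd(50_000_000, 50_000_000)
print(f"${slack_bot:.2f}")
```

Under that 50/50 assumption the result is $21.00, in line with the Slack bot estimate. A workload with a different input/output mix will land elsewhere between $0.14 and $0.28 per million tokens.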

Want credits that never expire and no monthly subscription? Check out Global API credit packs.


Quick Reference

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

# Chat completion
r = client.chat.completions.create(
    model="deepseek-chat",          # or "deepseek-reasoner"
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,                # 0 = deterministic, 1 = creative
    max_tokens=512,
    stream=False,
)
print(r.choices[0].message.content)
print(r.usage.prompt_tokens, r.usage.completion_tokens)

| Parameter | Values | Default |
|-----------|--------|---------|
| model | deepseek-chat, deepseek-reasoner | none (required) |
| temperature | 0.0 – 2.0 | 1.0 |
| max_tokens | 1 – 8192 | model max |
| stream | True / False | False |
| top_p | 0.0 – 1.0 | 1.0 |


FAQ

Do I need a special SDK for DeepSeek?
No. DeepSeek's API is OpenAI-compatible. pip install openai is all you need.

What's the context window size?
deepseek-chat supports up to 64K tokens per request (input + output combined).

Is the API rate-limited?
Global API applies per-account rate limits. For high-volume use cases, contact us; we offer custom rate limit increases.

Can I use LangChain or LlamaIndex with DeepSeek?
Yes. Set openai_api_base (LangChain) or openai_base_url (LlamaIndex) to https://global-apis.com/v1. Your existing chains work without modification.

How do credits work?
1 credit = $0.01. DeepSeek V4 Flash costs 14 credits per 1M input tokens and 28 credits per 1M output tokens. Credits never expire.
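
As a worked example of the credit arithmetic, the sketch below converts token counts into credits at the rates quoted above. credits_used and the constant names are hypothetical:

```python
CREDIT_VALUE_USD = 0.01      # 1 credit = $0.01
INPUT_CREDITS_PER_M = 14     # credits per 1M input tokens
OUTPUT_CREDITS_PER_M = 28    # credits per 1M output tokens


def credits_used(input_tokens: int, output_tokens: int) -> float:
    """Convert token counts into credits at the rates quoted in this FAQ."""
    total = input_tokens * INPUT_CREDITS_PER_M + output_tokens * OUTPUT_CREDITS_PER_M
    return total / 1_000_000


# 2M input tokens and 1M output tokens
example = credits_used(2_000_000, 1_000_000)
print(example, "credits =", f"${example * CREDIT_VALUE_USD:.2f}")
```

That workload uses 56 credits ($0.56), so the 100 free signup credits would cover it with room to spare.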


Get Started Today

You now have everything you need to call the DeepSeek API from Python: authentication, chat completions, streaming, function calling, async usage, and cost tracking.

Start free: https://global-apis.com/register (100 free credits, no credit card required).
Explore pricing: https://global-apis.com/pricing (credit packs from $19.99).

Have a question or hit a snag? Drop us a message through the dashboard; we're happy to help.
