How to Use DeepSeek API with Python: Complete Guide (2026)
2026-05-05 — by Global API Team
DeepSeek's models deliver GPT-4-class performance at a fraction of the cost — $0.14/M input tokens and $0.28/M output tokens for DeepSeek V4 Flash. This guide walks you through everything you need to call the DeepSeek API from Python: installation, authentication, chat completions, streaming responses, function calling, error handling, and cost optimization patterns.
TL;DR: DeepSeek's API is fully OpenAI-compatible. Install `openai`, point `base_url` at https://global-apis.com/v1, and your existing OpenAI code works with only the `model` name changed, for 74% less.
Prerequisites
- Python 3.9+
- A DeepSeek API key — get one for free (100 credits, no credit card required) at https://global-apis.com/register
- Basic familiarity with Python and REST APIs
1. Installation
DeepSeek's API follows the OpenAI specification, so you use the official openai Python package — no vendor-specific SDK needed.
```bash
pip install openai
```
That's it. No additional packages required for basic usage.
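Every example in this guide uses the `OpenAI` client class, which first shipped in version 1.0 of the `openai` package; older 0.x releases used a different, module-level interface. If you're not sure which version an environment has, a quick check like this sketch can catch it early (the function name is our own, not part of the SDK):

```python
from importlib.metadata import PackageNotFoundError, version


def openai_sdk_is_v1(package: str = "openai") -> bool:
    """Return True if the installed package's major version is 1 or higher."""
    try:
        installed = version(package)  # e.g. "1.30.1"
    except PackageNotFoundError:
        return False  # package is not installed at all
    major = installed.split(".")[0]
    return major.isdigit() and int(major) >= 1


if __name__ == "__main__":
    print("openai >= 1.0 installed:", openai_sdk_is_v1())
```

If it prints `False`, `pip install --upgrade openai` brings the package up to a 1.x release.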
2. Authentication & Client Setup
Option A: Environment Variable (Recommended)
Set your API key as an environment variable so it's never hardcoded in source files:
```bash
# Linux / macOS
export DEEPSEEK_API_KEY="a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"
```

```powershell
# Windows (PowerShell)
$env:DEEPSEEK_API_KEY = "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"
```
Then initialize the client:
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)
```
Option B: Inline (Quick Testing Only)
```python
from openai import OpenAI

client = OpenAI(
    api_key="a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",  # Replace with your key
    base_url="https://global-apis.com/v1",
)
```
⚠️ Never commit a real API key to version control. Use environment variables or a secrets manager in production.
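`os.environ["DEEPSEEK_API_KEY"]` raises a bare `KeyError` when the variable is missing, which can be confusing for teammates. A small wrapper gives a clearer message; this is a sketch, and the helper name `get_api_key` is hypothetical:

```python
import os


def get_api_key(env_var: str = "DEEPSEEK_API_KEY") -> str:
    """Read the API key from the environment, failing with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or load it "
            "from your secrets manager before creating the client."
        )
    return key
```

Usage: `client = OpenAI(api_key=get_api_key(), base_url="https://global-apis.com/v1")`.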
3. Your First Chat Completion
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful Python programming assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
print(f"\nTokens used: input={response.usage.prompt_tokens}, output={response.usage.completion_tokens}")
```

Sample output:

```text
def flatten(nested_list):
    result = []
    for item in nested_list:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result

Tokens used: input=38, output=71
```
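Because `max_tokens` caps the reply, it's worth checking why generation stopped. In the OpenAI-compatible response format each choice carries a `finish_reason`, and a value of `"length"` means the reply was cut off by `max_tokens`. A minimal sketch (the helper name is ours) that works on any object with that response shape:

```python
from types import SimpleNamespace


def reply_text(response) -> tuple[str, bool]:
    """Return (text, truncated) for the first choice of a chat completion."""
    choice = response.choices[0]
    text = choice.message.content or ""  # content can be None, e.g. on tool calls
    truncated = choice.finish_reason == "length"  # generation stopped at max_tokens
    return text, truncated


# Offline demo with a stand-in object shaped like a chat completion
fake = SimpleNamespace(
    choices=[SimpleNamespace(
        message=SimpleNamespace(content="def flatten(x): ..."),
        finish_reason="length",
    )]
)
text, truncated = reply_text(fake)
if truncated:
    print("Reply was cut off; consider raising max_tokens.")
```

With a live call, pass the real `response` object instead of `fake`.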
4. Choosing the Right Model
Global API exposes two DeepSeek models:
| Model | Best For | Input Price | Output Price |
|-------|----------|-------------|--------------|
| deepseek-chat | Everyday tasks: summarization, coding assist, Q&A, translation | $0.14/M tokens | $0.28/M tokens |
| deepseek-reasoner | Complex reasoning: math proofs, multi-step logic, code debugging | $0.55/M tokens | $2.19/M tokens |
Rule of thumb: Start with deepseek-chat. Switch to deepseek-reasoner only when the task genuinely requires multi-step logical reasoning — it's ~8× more expensive per output token.
```python
# For complex reasoning tasks
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
    max_tokens=1024,
)
```
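DeepSeek's own API returns the reasoner's chain of thought in a separate `reasoning_content` field on the message, next to the final answer in `content`. Whether Global API passes that field through is an assumption here, so this sketch reads it defensively with `getattr` and falls back to `None`:

```python
from types import SimpleNamespace
from typing import Optional


def split_reasoning(message) -> tuple[Optional[str], str]:
    """Return (reasoning, answer) from a deepseek-reasoner message.

    Assumes the gateway forwards DeepSeek's `reasoning_content` field;
    returns None for the reasoning part when the field is absent.
    """
    reasoning = getattr(message, "reasoning_content", None)
    answer = message.content or ""
    return reasoning, answer


# Offline demo; with a live call, pass response.choices[0].message instead
msg = SimpleNamespace(
    reasoning_content="Assume sqrt(2) = p/q in lowest terms ...",
    content="Hence sqrt(2) is irrational.",
)
reasoning, answer = split_reasoning(msg)
print(answer)
```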
5. Streaming Responses
For chatbot UIs or any scenario where latency matters, use streaming to display tokens as they arrive:
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Explain async/await in Python in simple terms."}
    ],
    stream=True,
)

print("Response: ", end="", flush=True)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content is not None:
        print(delta.content, end="", flush=True)
print()  # newline at the end
```
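Streaming doesn't make the model generate faster overall, but text appears as soon as the first tokens are produced instead of only after the whole reply is finished. For most responses that means output starts showing in roughly 200ms rather than after ~2s, which dramatically improves perceived performance in interactive applications.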
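In a real app you usually also need the complete reply afterwards, for example to append it to the conversation history. This sketch (helper name is ours) prints deltas as they arrive and returns the joined text. It also skips chunks whose `choices` list is empty: OpenAI-compatible APIs can send such a final usage-only chunk when `stream_options={"include_usage": True}` is set, and indexing `choices[0]` on it would raise an `IndexError`.

```python
from types import SimpleNamespace


def collect_stream(stream, echo: bool = True) -> str:
    """Print streamed tokens as they arrive and return the full reply text."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            parts.append(delta.content)
            if echo:
                print(delta.content, end="", flush=True)
    if echo:
        print()  # finish the line after the last token
    return "".join(parts)


# Offline demo with fake chunks shaped like the SDK's stream events
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="async "))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="lets you wait"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
    SimpleNamespace(choices=[]),
]
full_text = collect_stream(fake_stream)
```

With a live call, pass the `stream` object from `client.chat.completions.create(..., stream=True)`.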
6. Multi-Turn Conversations
To maintain context across turns, pass the full conversation history in each request:
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

def chat(messages: list[dict]) -> str:
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=messages,
        temperature=0.7,
    )
    return response.choices[0].message.content

# Build conversation history
history = [
    {"role": "system", "content": "You are a concise technical assistant."}
]

user_inputs = [
    "What is a Python decorator?",
    "Can you show me a practical example?",
    "Now make it work as a retry decorator.",
]

for user_message in user_inputs:
    history.append({"role": "user", "content": user_message})
    reply = chat(history)
    history.append({"role": "assistant", "content": reply})
    print(f"User: {user_message}")
    print(f"Assistant: {reply}\n{'─' * 60}")
```
Pro tip: To control costs in long conversations, trim old messages from history once the conversation exceeds your token budget. Keep the system prompt and the last N turns.
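One way to apply that tip is to keep the system prompt plus as many recent messages as fit a token budget. This sketch estimates tokens with a rough 4-characters-per-token heuristic, which is an assumption and not DeepSeek's tokenizer; use `response.usage.prompt_tokens` from earlier calls if you need exact counts. Function names are hypothetical.

```python
def estimate_tokens(message: dict) -> int:
    """Very rough estimate: about 4 characters per token (heuristic only)."""
    return max(1, len(message.get("content") or "") // 4)


def trim_to_budget(history: list[dict], budget: int) -> list[dict]:
    """Keep system messages plus the newest messages that fit within `budget`."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    used = sum(estimate_tokens(m) for m in system)
    kept: list[dict] = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # everything older than this is dropped
        kept.append(message)
        used += cost
    return system + list(reversed(kept))  # restore chronological order
```

Usage: call `history = trim_to_budget(history, budget=4000)` before each request.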
7. Function Calling (Tool Use)
DeepSeek supports OpenAI-compatible function calling. This lets the model decide when to invoke your code:
```python
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'London'",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit",
                    },
                },
                "required": ["city"],
            },
        },
    }
]

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub: replace with a real weather API call."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}

messages = [{"role": "user", "content": "What's the weather like in Tokyo?"}]

# First API call: the model decides whether to call a function
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
message = response.choices[0].message

# Check if the model wants to call one or more functions
if message.tool_calls:
    # Keep the assistant message that requested the calls in the history
    messages.append(message)
    # The model may request several calls; each needs its own "tool" reply
    for tool_call in message.tool_calls:
        fn_args = json.loads(tool_call.function.arguments)
        if tool_call.function.name == "get_weather":
            fn_result = get_weather(**fn_args)
        else:
            fn_result = {"error": f"unknown tool {tool_call.function.name}"}
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(fn_result),
        })
    # Second API call: the model turns the tool results into a final answer
    final_response = client.chat.completions.create(
        model="deepseek-chat",
        messages=messages,
    )
    print(final_response.choices[0].message.content)
else:
    print(message.content)
```
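With more than one tool, an `if` chain gets unwieldy. A common pattern is a name-to-function registry that also reports bad arguments or unknown tools back to the model instead of crashing. This is a sketch; `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, and `get_weather` is the same stub as above.

```python
import json
from types import SimpleNamespace


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub: replace with a real weather API call."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}


# Hypothetical registry mapping tool names (as declared in `tools`) to callables
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls, registry=TOOL_REGISTRY) -> list[dict]:
    """Execute every tool call and return one "tool" message per call."""
    results = []
    for call in tool_calls:
        fn = registry.get(call.function.name)
        try:
            if fn is None:
                raise ValueError(f"unknown tool: {call.function.name}")
            args = json.loads(call.function.arguments or "{}")
            output = fn(**args)
        except Exception as exc:  # report the failure to the model rather than crash
            output = {"error": str(exc)}
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(output),
        })
    return results
```

In the flow above, you would `messages.append(message)` and then `messages.extend(run_tool_calls(message.tool_calls))` before the second request.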
8. Error Handling & Retries
Production code needs robust error handling. Here's a pattern with exponential backoff:
```python
import os
import time
import random
from openai import OpenAI, RateLimitError, APITimeoutError, APIConnectionError

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
    timeout=30.0,
    max_retries=0,  # disable the SDK's built-in retries so this loop is the only retry layer
)

def chat_with_retry(messages: list[dict], max_retries: int = 3, **kwargs) -> str:
    """Call the API with exponential backoff on transient errors."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="deepseek-chat",
                messages=messages,
                **kwargs,
            )
            return response.choices[0].message.content
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {wait:.1f}s...")
            time.sleep(wait)
        except (APITimeoutError, APIConnectionError) as e:
            if attempt == max_retries - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Network error: {e}. Retrying in {wait:.1f}s...")
            time.sleep(wait)
    raise RuntimeError("Max retries exceeded")  # only reached if max_retries < 1

# Usage
result = chat_with_retry(
    messages=[{"role": "user", "content": "Summarize the Python GIL in one paragraph."}],
    temperature=0.5,
    max_tokens=200,
)
print(result)
```
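The retry loop above is tied to one specific API call. To apply the same policy to any callable, a small generic helper works; this sketch (names are our own) also caps the delay and accepts a `sleep` function so it can be tested without waiting. Keep in mind that the `openai` client retries some failures on its own (the `max_retries` client option, 2 by default), so set `max_retries=0` on the client if you wrap calls in your own backoff.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with capped exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 1s of random jitter
            delay = min(max_delay, base_delay * 2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    raise RuntimeError("max_retries must be at least 1")
```

Usage: `call_with_backoff(lambda: client.chat.completions.create(model="deepseek-chat", messages=msgs), retry_on=(RateLimitError, APIConnectionError))`.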
9. Tracking Token Usage & Cost
With DeepSeek V4 Flash pricing ($0.14/M input, $0.28/M output), even heavy workloads cost very little. Here's how to instrument cost tracking:
```python
import os
from dataclasses import dataclass, field
from openai import OpenAI

INPUT_COST_PER_TOKEN = 0.14 / 1_000_000   # $0.14 per 1M input tokens
OUTPUT_COST_PER_TOKEN = 0.28 / 1_000_000  # $0.28 per 1M output tokens

@dataclass
class UsageTracker:
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    request_count: int = 0
    total_cost_usd: float = field(init=False, default=0.0)

    def record(self, usage):
        self.total_input_tokens += usage.prompt_tokens
        self.total_output_tokens += usage.completion_tokens
        self.request_count += 1
        self.total_cost_usd = (
            self.total_input_tokens * INPUT_COST_PER_TOKEN +
            self.total_output_tokens * OUTPUT_COST_PER_TOKEN
        )

    def report(self):
        print(f"Requests: {self.request_count}")
        print(f"Input tokens: {self.total_input_tokens:,}")
        print(f"Output tokens: {self.total_output_tokens:,}")
        print(f"Estimated cost: ${self.total_cost_usd:.6f}")

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

tracker = UsageTracker()
prompts = [
    "Explain list comprehensions.",
    "What is a Python context manager?",
    "When should I use a generator vs a list?",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
    )
    tracker.record(response.usage)

tracker.report()
# Output (approximate):
# Requests: 3
# Input tokens: 51
# Output tokens: 462
# Estimated cost: $0.000136
```
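The tracker above hard-codes deepseek-chat prices. If you route some requests to deepseek-reasoner, price each request by model instead. A sketch using the per-million rates from the model table in Section 4 (update the dictionary if pricing changes):

```python
# Per-million-token prices in USD, taken from the model table in Section 4
PRICES = {
    "deepseek-chat": {"input": 0.14, "output": 0.28},
    "deepseek-reasoner": {"input": 0.55, "output": 2.19},
}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request for the given model."""
    price = PRICES[model]
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1_000_000
```

Usage: `request_cost(model, response.usage.prompt_tokens, response.usage.completion_tokens)` after each call.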
10. Async Usage with asyncio
For high-throughput applications, use the async client to run multiple requests concurrently:
```python
import asyncio
import os
from openai import AsyncOpenAI

async_client = AsyncOpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

async def summarize(text: str) -> str:
    response = await async_client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "Summarize the following in one sentence."},
            {"role": "user", "content": text},
        ],
        max_tokens=100,
    )
    return response.choices[0].message.content

async def main():
    documents = [
        "Python was created by Guido van Rossum and released in 1991...",
        "Machine learning is a subset of artificial intelligence that...",
        "REST APIs use HTTP methods to perform CRUD operations on resources...",
    ]
    # Run all requests concurrently
    results = await asyncio.gather(*[summarize(doc) for doc in documents])
    for doc, summary in zip(documents, results):
        print(f"Original: {doc[:60]}...")
        print(f"Summary: {summary}\n")

asyncio.run(main())
```
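Because the three requests run concurrently, total wall-clock time is roughly that of the slowest single request (about 2s here) instead of the sum of all three (about 6s when run sequentially).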
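`asyncio.gather` as written fires every request at once. That is fine for three documents, but with hundreds it can trip per-account rate limits. A common fix is to cap the number of in-flight requests with `asyncio.Semaphore`. A sketch (the helper name and the default limit of 5 are arbitrary choices, not Global API limits):

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_limited(
    factories: Iterable[Callable[[], Awaitable[T]]], limit: int = 5
) -> List[T]:
    """Run coroutine factories concurrently, at most `limit` at a time, preserving order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits here while `limit` calls are already in flight
            return await factory()

    return await asyncio.gather(*(run(f) for f in factories))
```

Usage inside `main()`: `results = await gather_limited([lambda d=doc: summarize(d) for doc in documents], limit=5)`. Passing factories rather than coroutine objects means no request is created until a slot is free.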
11. Cost Optimization Tips
1. Set max_tokens on every request
Without max_tokens, the model can generate very long responses. Set a reasonable ceiling:
```python
# Bad: output length falls back to whatever default limit the API applies, so cost is harder to predict
response = client.chat.completions.create(model="deepseek-chat", messages=[...])

# Good: capped at 500 output tokens
response = client.chat.completions.create(model="deepseek-chat", messages=[...], max_tokens=500)
```
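Since output is billed per token, `max_tokens` also puts a hard ceiling on the output side of each request's cost. A quick calculation, defaulting to the deepseek-chat output rate from the model table (the function name is ours):

```python
def max_output_cost(max_tokens: int, output_price_per_million: float = 0.28) -> float:
    """Upper bound (USD) on the output cost of one request capped at max_tokens."""
    return max_tokens * output_price_per_million / 1_000_000


# 500 tokens at $0.28/M is at most $0.00014 of output per request
print(f"${max_output_cost(500):.5f}")
```

Pass `2.19` as the second argument to get the same bound for deepseek-reasoner.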
2. Trim conversation history
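Every request resends the full conversation history, so input tokens (and cost) grow with each turn. After ~10 turns, drop or summarize older messages. This helper keeps the system prompt plus the last `max_turns` user/assistant pairs:

```python
def trim_history(history: list[dict], max_turns: int = 6) -> list[dict]:
    """Keep system messages plus the last `max_turns` user/assistant pairs."""
    system = [m for m in history if m["role"] == "system"]
    non_system = [m for m in history if m["role"] != "system"]
    if max_turns <= 0:
        return system  # a [-0:] slice would otherwise keep everything
    return system + non_system[-max_turns * 2:]
```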
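The slice in `trim_history` assumes strict user/assistant alternation. Once a history contains tool calls (Section 7), the kept window can start on a `"tool"` message whose requesting assistant message was cut off, which OpenAI-compatible APIs typically reject. A hedged variant (the function name is ours) that drops leading messages until it reaches a user turn:

```python
def trim_history_safe(history: list[dict], max_turns: int = 6) -> list[dict]:
    """Like trim_history, but never starts the kept window mid-exchange.

    Drops leading messages until the first "user" message, so a "tool"
    result is never separated from the assistant message that requested it.
    """
    system = [m for m in history if m["role"] == "system"]
    non_system = [m for m in history if m["role"] != "system"]
    window = non_system[-max_turns * 2:] if max_turns > 0 else []
    while window and window[0]["role"] != "user":
        window = window[1:]  # skip orphaned assistant/tool messages
    return system + window
```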
3. Use deepseek-chat for most tasks
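deepseek-reasoner costs roughly 4× more per input token and 8× more per output token. Route only genuinely complex reasoning tasks to it, for example with a simple keyword heuristic:

```python
import re

# Naive heuristic: whole-word matches, so "improve" does not trigger "prove"
REASONING_PATTERN = re.compile(r"\b(prove|derive|optimize|debug|analyze step by step)\b")

def pick_model(prompt: str) -> str:
    if REASONING_PATTERN.search(prompt.lower()):
        return "deepseek-reasoner"
    return "deepseek-chat"
```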
4. Cache frequent prompts
If users ask the same questions repeatedly, cache responses in Redis or a local dict:
```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_chat(messages: list[dict]) -> str:
    key = hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    result = chat_with_retry(messages)  # from Section 8
    _cache[key] = result
    return result
```
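A plain dict grows forever and never refreshes an answer, and keying on messages alone means a different model or temperature would wrongly hit the same entry. A sketch of a bounded, expiring in-memory cache (class name and defaults are our own; a Redis key with an expiry serves the same purpose across processes). Caching fits best with low-`temperature` requests, since identical prompts otherwise legitimately get different answers.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Callable, Optional


class TTLCache:
    """In-memory cache with per-entry expiry and a maximum size (LRU eviction)."""

    def __init__(self, ttl_seconds: float = 3600, max_entries: int = 1000,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self.clock = clock
        self._data = OrderedDict()  # key -> (stored_at, value)

    @staticmethod
    def make_key(model: str, messages: list, **params) -> str:
        """Hash the model, messages, and sampling params together."""
        payload = json.dumps({"model": model, "messages": messages, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._data[key]  # expired: forget it
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
```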
Real-World Cost Estimates
| Use Case | Monthly Volume | Estimated Cost |
|----------|---------------|----------------|
| Internal Slack bot (short Q&A) | 50K requests / 100M tokens | ~$21/mo |
| Customer support chatbot | 200K requests / 500M tokens | ~$98/mo |
| Code review pipeline (CI/CD) | 10K PRs / 200M tokens | ~$42/mo |
| Document summarization (RAG) | 1M docs / 2B tokens | ~$420/mo |
Compare that to GPT-4o ($2.50/M input, $10/M output) at the same volumes: with the same even input/output split, the Slack bot alone would cost about $625/mo, roughly 30× more.
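The table's figures are consistent with deepseek-chat pricing and a roughly even input/output split: the Slack bot's ~$21 matches 50M input plus 50M output tokens, and the support bot's ~$98 matches 300M input plus 200M output. A small sketch for redoing the math with your own traffic (the split is an assumption; replace it with your measured usage):

```python
def monthly_cost(input_millions: float, output_millions: float,
                 input_price: float = 0.14, output_price: float = 0.28) -> float:
    """Monthly USD cost given token volumes in millions and per-million prices."""
    return input_millions * input_price + output_millions * output_price


slack_deepseek = monthly_cost(50, 50)            # ~$21 at deepseek-chat prices
slack_gpt4o = monthly_cost(50, 50, 2.50, 10.00)  # ~$625 at GPT-4o list prices
print(f"DeepSeek: ${slack_deepseek:.2f}, GPT-4o: ${slack_gpt4o:.2f}")
```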
Want credits that never expire and no monthly subscription? Check out Global API credit packs.
Quick Reference
```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

# Chat completion
r = client.chat.completions.create(
    model="deepseek-chat",      # or "deepseek-reasoner"
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,            # lower = more focused, higher = more varied
    max_tokens=512,
    stream=False,
)
print(r.choices[0].message.content)
print(r.usage.prompt_tokens, r.usage.completion_tokens)
```
| Parameter | Values | Default |
|-----------|--------|---------|
| model | deepseek-chat, deepseek-reasoner | — |
| temperature | 0.0 – 2.0 | 1.0 |
| max_tokens | 1 – 8192 | model max |
| stream | True / False | False |
| top_p | 0.0 – 1.0 | 1.0 |
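To catch typos such as `temperature=7` before they turn into API errors, you can check request parameters against the ranges in this table. A minimal sketch; the names are hypothetical, and the gateway's actual limits may differ from the table:

```python
# Ranges as listed in the parameter table (the gateway's own limits may differ)
PARAM_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "max_tokens": (1, 8192),
}


def validate_params(**params) -> None:
    """Raise ValueError if a known numeric parameter is outside its listed range."""
    for name, value in params.items():
        if name in PARAM_RANGES:
            low, high = PARAM_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
```

Usage: call `validate_params(**params)` just before `client.chat.completions.create(model="deepseek-chat", messages=msgs, **params)`. Parameters not in the table, such as `stream`, are passed through unchecked.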
FAQ
Do I need a special SDK for DeepSeek?
No. DeepSeek's API is OpenAI-compatible. pip install openai is all you need.
What's the context window size?
deepseek-chat supports up to 64K tokens per request (input + output combined).
Is the API rate-limited?
Global API applies per-account rate limits. For high-volume use cases, contact us — we offer custom rate limit increases.
Can I use LangChain or LlamaIndex with DeepSeek?
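Yes. In LangChain, pass `openai_api_base` (also accepted as `base_url`) to `ChatOpenAI`; in LlamaIndex, set `api_base`, for example on the `OpenAILike` LLM class, which accepts non-OpenAI model names. Point either at https://global-apis.com/v1 and use `deepseek-chat` or `deepseek-reasoner` as the model name; the rest of your chains can stay as they are.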
How do credits work?
1 credit = $0.01. DeepSeek V4 Flash costs 14 credits per 1M input tokens and 28 credits per 1M output tokens. Credits never expire.
Get Started Today
You now have everything you need to call the DeepSeek API from Python — authentication, chat completions, streaming, function calling, async usage, and cost tracking.
Start free: https://global-apis.com/register — 100 free credits, no credit card required.
Explore pricing: https://global-apis.com/pricing — credit packs from $19.99.
Have a question or hit a snag? Drop us a message through the dashboard — we're happy to help.