How to Use the DeepSeek API with Python: A Complete Guide (2026)

2026-05-05 — by Global API Team



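DeepSeek's models deliver GPT-4-level performance at a fraction of the cost: DeepSeek V4 Flash is priced at $0.14 per million input tokens and $0.28 per million output tokens. This guide walks through everything you need to call the DeepSeek API from Python: installation, authentication, chat completions, streaming responses, function calling, error handling, and cost-optimization patterns.

TL;DR: DeepSeek's API is fully OpenAI-compatible. Install openai, point base_url at https://global-apis.com/v1, swap in your API key and a DeepSeek model name, and your existing OpenAI code works, at 74% lower cost.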

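Prerequisites

Python 3.9+
A DeepSeek API key: get one for free at https://global-apis.com/register (100 credits, no credit card required)
Basic familiarity with Python and REST APIs

1. Installation

DeepSeek's API follows the OpenAI specification, so you use the official openai Python package. No vendor-specific SDK is needed.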

pip install openai

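That's all. No additional packages are needed for basic usage. The examples in this guide use the OpenAI client class, which requires openai 1.0 or later.

2. Authentication and Client Setup

Option A: Environment variable (recommended)

Set your API key as an environment variable so that it is never hard-coded in source files: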

# Linux / macOS
export DEEPSEEK_API_KEY="a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"

# Windows (PowerShell)
$env:DEEPSEEK_API_KEY = "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"

Then initialize the client:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)
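
If DEEPSEEK_API_KEY is not set, os.environ["DEEPSEEK_API_KEY"] fails with a bare KeyError. A small helper can turn that into a clearer message. This is only a minimal sketch: load_api_key is a hypothetical helper name, not part of the openai package.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    name: str = "DEEPSEEK_API_KEY",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key from the environment, or raise a readable error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it before creating the client."
        )
    return key


# Demo with an explicit mapping, so nothing depends on your real environment.
print(load_api_key(env={"DEEPSEEK_API_KEY": "demo-key"}))  # demo-key
```

You could then pass api_key=load_api_key() when constructing the client.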

Option B: Inline (quick tests only)

from openai import OpenAI

client = OpenAI(
    api_key="a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",  # Replace with your key
    base_url="https://global-apis.com/v1",
)

⚠️ Never commit a real API key to version control. In production, use environment variables or a secrets manager.

3. Your First Chat Completion

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a helpful Python programming assistant."},
        {"role": "user", "content": "Write a Python function to flatten a nested list."},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
print(f"\nTokens used — input: {response.usage.prompt_tokens}, output: {response.usage.completion_tokens}")

Sample output:

def flatten(nested_list):
    result = []
    for item in nested_list:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result

Tokens used — input: 38, output: 71

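4. Choosing the Right Model

Global API offers two DeepSeek models:

| Model | Best for | Input price | Output price |
|-------|----------|-------------|--------------|
| deepseek-v4-flash | Everyday tasks: summarization, coding help, Q&A, translation | $0.14/M tokens | $0.28/M tokens |
| deepseek-reasoner | Complex reasoning: math proofs, multi-step logic, code debugging | $0.55/M tokens | $2.19/M tokens |

Rule of thumb: start with deepseek-v4-flash. Switch to deepseek-reasoner only when a task genuinely needs multi-step logical reasoning, because it costs about 8 times more per output token ($2.19 vs. $0.28 per million).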

# For complex reasoning tasks
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
    max_tokens=1024,
)

5. Streaming Responses

For chatbot UIs and other latency-sensitive scenarios, use streaming to display tokens as they arrive:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Explain async/await in Python in simple terms."}
    ],
    stream=True,
)

print("Response: ", end="", flush=True)
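for chunk in stream:
    # Some chunks (for example a final usage-only chunk) may carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content is not None:
        print(delta.content, end="", flush=True)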

print()  # newline at the end

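For most responses, streaming lets the first tokens reach the user in about 200 ms, instead of after the roughly 2 seconds it takes to receive a complete non-streamed response. Total generation time stays roughly the same, but perceived performance in interactive applications improves dramatically.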
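
If you also need the complete reply once streaming finishes (for example, to append it to a conversation history), you can accumulate the chunks as they arrive. A minimal sketch, assuming chunks shaped like the ones in the loop above; collect_stream and the fake chunks built with SimpleNamespace are illustrative stand-ins so the example runs without an API call:

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # skip None and empty deltas
            parts.append(text)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Build an object with the same attribute path as a real stream chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk("Hel"),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk("lo"),
]
print(collect_stream(demo))  # Hello
```

With a real stream you would pass the object returned by client.chat.completions.create(..., stream=True) instead of demo.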

6. Multi-Turn Conversations

The API is stateless, so to keep context across turns, pass the full conversation history with every request:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

def chat(messages: list[dict]) -> str:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        temperature=0.7,
    )
    return response.choices[0].message.content

# Build conversation history
history = [
    {"role": "system", "content": "You are a concise technical assistant."}
]

user_inputs = [
    "What is a Python decorator?",
    "Can you show me a practical example?",
    "Now make it work as a retry decorator.",
]

for user_message in user_inputs:
    history.append({"role": "user", "content": user_message})
    reply = chat(history)
    history.append({"role": "assistant", "content": reply})
    print(f"User: {user_message}")
    print(f"Assistant: {reply}\n{'─' * 60}")

Pro tip: to control costs in long conversations, drop older messages from history once the conversation exceeds your token budget. Keep the system prompt and the last N turns.
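
The tip above can be sketched as follows. The 4-characters-per-token estimate is a rough assumption for English text, not DeepSeek's actual tokenizer, and estimate_tokens and trim_to_budget are hypothetical helper names. The sketch also assumes every message has string content.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(message: Message) -> int:
    """Rough size estimate: ~4 characters per token plus per-message overhead.

    This is a heuristic assumption, not the model's real tokenizer.
    """
    return len(message["content"]) // 4 + 4


def trim_to_budget(history: List[Message], max_tokens: int) -> List[Message]:
    """Keep system messages plus the newest messages that fit in max_tokens."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m) for m in system)

    kept: List[Message] = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order

    # Avoid starting the trimmed window with an assistant reply.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return system + kept


history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
trimmed = trim_to_budget(history, max_tokens=40)
print([m["role"] for m in trimmed])  # ['system', 'user']
```

For exact budgeting you would replace estimate_tokens with counts from response.usage or a real tokenizer.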

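7. Function Calling (Tool Use)

DeepSeek supports OpenAI-compatible function calling. The model decides when one of your functions should be called and returns its name and JSON arguments; your code runs the function and sends the result back: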

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'London'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"],
            },
        },
    }
]

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub — replace with a real weather API call."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}

messages = [{"role": "user", "content": "What's the weather like in Tokyo?"}]

# First API call — model decides whether to call a function
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message

# Check if the model wants to call one or more functions
if message.tool_calls:
    available_functions = {"get_weather": get_weather}

    # Append the assistant message once, then one tool result per call
    messages.append(message)
    for tool_call in message.tool_calls:
        fn = available_functions[tool_call.function.name]
        fn_args = json.loads(tool_call.function.arguments)

        # Execute the function and report its result under the matching id
        fn_result = fn(**fn_args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(fn_result),
        })

    final_response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
    )
    print(final_response.choices[0].message.content)
else:
    print(message.content)
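
In a real application you will probably register several tools and need to cope with malformed arguments. One possible approach, sketched here with a hypothetical run_tool_call helper and a stub get_weather like the one above, is to look the function up in a registry and hand errors back to the model as JSON instead of raising:

```python
import json
from typing import Callable, Dict


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub tool, mirroring the example above."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}


def run_tool_call(
    name: str,
    arguments_json: str,
    registry: Dict[str, Callable[..., dict]],
) -> str:
    """Run one tool call and return a JSON string for the 'tool' message."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Covers invalid JSON, unexpected argument names, and
        # results that cannot be serialized to JSON.
        return json.dumps({"error": str(exc)})


registry = {"get_weather": get_weather}
print(run_tool_call("get_weather", '{"city": "Tokyo"}', registry))
print(run_tool_call("get_time", "{}", registry))
```

The returned string would go into the "content" field of the tool message, so the model can see the error and recover.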

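8. Error Handling and Retries

Production code needs robust error handling. The following pattern retries transient errors with exponential backoff and random jitter: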

import os
import time
import random
from openai import OpenAI, RateLimitError, APITimeoutError, APIConnectionError

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
    timeout=30.0,
)

def chat_with_retry(messages: list[dict], max_retries: int = 3, **kwargs) -> str:
    """Call the API with exponential backoff on transient errors."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="deepseek-v4-flash",
                messages=messages,
                **kwargs,
            )
            return response.choices[0].message.content

        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {wait:.1f}s...")
            time.sleep(wait)

        except (APITimeoutError, APIConnectionError) as e:
            if attempt == max_retries - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Network error: {e}. Retrying in {wait:.1f}s...")
            time.sleep(wait)

    raise RuntimeError("Max retries exceeded")  # Should not be reached

# Usage
result = chat_with_retry(
    messages=[{"role": "user", "content": "Summarize the Python GIL in one paragraph."}],
    temperature=0.5,
    max_tokens=200,
)
print(result)
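
To see what the wait times look like, here is the delay calculation pulled out into a standalone function. The cap parameter is an addition of this sketch (the retry loop above has no upper bound), and backoff_delay is a hypothetical name:

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Exponential backoff (base * 2**attempt, capped) plus 0-1s of jitter."""
    rng = rng or random.Random()
    exponential = min(cap, base * (2 ** attempt))
    return exponential + rng.uniform(0, 1)


# Attempts 0..3 wait roughly 1s, 2s, 4s, and 8s, each plus up to 1s of jitter.
for attempt in range(4):
    print(f"attempt {attempt}: wait {backoff_delay(attempt):.1f}s")
```

The jitter spreads out retries from many clients so they do not all hit the API at the same instant.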

9. Tracking Token Usage and Cost

At DeepSeek V4 Flash prices ($0.14/M input, $0.28/M output), even heavy workloads cost very little. Here is one way to instrument cost tracking:

import os
from dataclasses import dataclass, field
from openai import OpenAI

INPUT_COST_PER_TOKEN  = 0.14 / 1_000_000   # $0.14 per 1M input tokens
OUTPUT_COST_PER_TOKEN = 0.28 / 1_000_000   # $0.28 per 1M output tokens

@dataclass
class UsageTracker:
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    request_count: int = 0
    total_cost_usd: float = field(init=False, default=0.0)

    def record(self, usage):
        self.total_input_tokens  += usage.prompt_tokens
        self.total_output_tokens += usage.completion_tokens
        self.request_count       += 1
        self.total_cost_usd = (
            self.total_input_tokens  * INPUT_COST_PER_TOKEN +
            self.total_output_tokens * OUTPUT_COST_PER_TOKEN
        )

    def report(self):
        print(f"Requests:       {self.request_count}")
        print(f"Input tokens:   {self.total_input_tokens:,}")
        print(f"Output tokens:  {self.total_output_tokens:,}")
        print(f"Estimated cost: ${self.total_cost_usd:.6f}")

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)
tracker = UsageTracker()

prompts = [
    "Explain list comprehensions.",
    "What is a Python context manager?",
    "When should I use a generator vs a list?",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=200,
    )
    tracker.record(response.usage)

tracker.report()
# Output (approximate):
# Requests:       3
# Input tokens:   51
# Output tokens:  462
# Estimated cost: $0.000136

10. Async Usage with `asyncio`

For high-throughput applications, use the async client to run multiple requests concurrently:

import asyncio
import os
from openai import AsyncOpenAI

async_client = AsyncOpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

async def summarize(text: str) -> str:
    response = await async_client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "system", "content": "Summarize the following in one sentence."},
            {"role": "user",   "content": text},
        ],
        max_tokens=100,
    )
    return response.choices[0].message.content

async def main():
    documents = [
        "Python was created by Guido van Rossum and released in 1991...",
        "Machine learning is a subset of artificial intelligence that...",
        "REST APIs use HTTP methods to perform CRUD operations on resources...",
    ]

    # Run all requests concurrently
    results = await asyncio.gather(*[summarize(doc) for doc in documents])

    for doc, summary in zip(documents, results):
        print(f"Original: {doc[:60]}...")
        print(f"Summary:  {summary}\n")

asyncio.run(main())

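Running the 3 requests concurrently cuts total latency from about 6 seconds (sequential) to about 2 seconds (concurrent), assuming each request takes about 2 seconds.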
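
An unbounded asyncio.gather can run into per-account rate limits once the list of documents grows. A common way to cap the number of in-flight requests is asyncio.Semaphore. The sketch below uses a fake coroutine in place of summarize so that it runs without network access; gather_limited and the limit of 2 are illustrative choices:

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_limited(
    func: Callable[[T], Awaitable[R]],
    items: Sequence[T],
    limit: int = 5,
) -> List[R]:
    """Run func over items concurrently, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await func(item)

    # gather returns results in the same order as the inputs
    return list(await asyncio.gather(*(run_one(item) for item in items)))


async def fake_summarize(text: str) -> str:
    """Stand-in for the real summarize(); sleeps instead of calling the API."""
    await asyncio.sleep(0.01)
    return text.upper()


results = asyncio.run(gather_limited(fake_summarize, ["a", "b", "c"], limit=2))
print(results)  # ['A', 'B', 'C']
```

In the example above you would call gather_limited(summarize, documents, limit=...) inside main() instead of the bare asyncio.gather.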

11. Cost Optimization Tips

1. Set `max_tokens` on every request

Without max_tokens, the model can generate very long responses. Set a reasonable upper bound:

# Bad — unlimited output, unpredictable cost
response = client.chat.completions.create(model="deepseek-v4-flash", messages=[...])

# Good — capped at 500 tokens
response = client.chat.completions.create(model="deepseek-v4-flash", messages=[...], max_tokens=500)

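2. Trim conversation history

Every turn of a multi-turn conversation is billed for the full context sent with it, so cost grows with the length of the history. Once a conversation runs past about 10 turns, summarize or drop the older messages. The helper below keeps the system prompt and the most recent max_turns exchanges: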

def trim_history(history: list[dict], max_turns: int = 6) -> list[dict]:
    system = [m for m in history if m["role"] == "system"]
    non_system = [m for m in history if m["role"] != "system"]
    return system + non_system[-max_turns * 2:]

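3. Use `deepseek-v4-flash` for most tasks

deepseek-reasoner costs about 8 times more per output token (and about 4 times more per input token). Route only genuinely complex reasoning tasks to it: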

def pick_model(prompt: str) -> str:
    reasoning_keywords = {"prove", "derive", "optimize", "debug", "analyze step by step"}
    if any(kw in prompt.lower() for kw in reasoning_keywords):
        return "deepseek-reasoner"
    return "deepseek-v4-flash"

4. Cache frequent prompts

If users ask the same questions repeatedly, cache the responses in Redis or a local dict:

import hashlib, json

_cache: dict[str, str] = {}

def cached_chat(messages: list[dict]) -> str:
    key = hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    result = chat_with_retry(messages)  # from Section 8
    _cache[key] = result
    return result
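
The cache key above depends only on messages, so two calls with different models or sampling parameters would share an entry. One way to avoid that, sketched here under the assumption that all parameters are JSON-serializable, is to hash the model and parameters together with the messages. make_cache_key is a hypothetical helper, and SHA-256 is used instead of MD5 simply as a conventional default:

```python
import hashlib
import json
from typing import Dict, List


def make_cache_key(model: str, messages: List[Dict[str, str]], **params) -> str:
    """Build a stable cache key from the model, messages, and parameters."""
    payload = {"model": model, "messages": messages, "params": params}
    # sort_keys makes the key independent of keyword-argument order
    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


msgs = [{"role": "user", "content": "What is a Python decorator?"}]
key_flash = make_cache_key("deepseek-v4-flash", msgs, temperature=0.5)
key_reasoner = make_cache_key("deepseek-reasoner", msgs, temperature=0.5)
print(key_flash != key_reasoner)  # True
```

Caching works best for deterministic settings such as temperature=0, since cached replies otherwise hide the variation users would normally see.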

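Real-World Cost Estimates

| Use case | Monthly volume | Estimated cost |
|----------|----------------|----------------|
| Internal Slack bot (short Q&A) | 50K requests / 100M tokens | ~$21/month |
| Customer support chatbot | 200K requests / 500M tokens | ~$98/month |
| Code review pipeline (CI/CD) | 10K PRs / 200M tokens | ~$42/month |
| Document summarization (RAG) | 1M documents / 2B tokens | ~$420/month |

Compared with GPT-4o ($2.50/M input, $10/M output) at the same volume, the Slack bot alone would cost about $330 per month, roughly 15 times more. The exact ratio depends on the mix of input and output tokens.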
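
To run the same kind of estimate for your own workload, the arithmetic is a one-liner. The table does not state how tokens are split between input and output; the even split used below is an assumption of this sketch (it reproduces the $21 and $42 rows), and monthly_cost_usd is a hypothetical helper name:

```python
def monthly_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.14,   # DeepSeek V4 Flash, $ per 1M input tokens
    output_price_per_m: float = 0.28,  # DeepSeek V4 Flash, $ per 1M output tokens
) -> float:
    """Estimate monthly spend from token volumes and per-million prices."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


# Assumption: tokens are split evenly between input and output.
slack_bot = monthly_cost_usd(50_000_000, 50_000_000)
code_review = monthly_cost_usd(100_000_000, 100_000_000)
print(f"Slack bot:   ${slack_bot:.2f}")    # Slack bot:   $21.00
print(f"Code review: ${code_review:.2f}")  # Code review: $42.00
```

Because output tokens cost twice as much as input tokens on this model, workloads with long prompts and short answers will come in below an even-split estimate.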

Want credits that never expire, with no monthly subscription? Check out the Global API credit packs.

Quick Reference

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://global-apis.com/v1",
)

# Chat completion
r = client.chat.completions.create(
    model="deepseek-v4-flash",          # or "deepseek-reasoner"
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,                # 0 = deterministic, 1 = creative
    max_tokens=512,
    stream=False,
)
print(r.choices[0].message.content)
print(r.usage.prompt_tokens, r.usage.completion_tokens)

| Parameter | Values | Default |
|-----------|--------|---------|
| model | deepseek-v4-flash, deepseek-reasoner | none |
| temperature | 0.0 – 2.0 | 1.0 |
| max_tokens | 1 – 8192 | model maximum |
| stream | True / False | False |
| top_p | 0.0 – 1.0 | 1.0 |

FAQ

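Do I need a special SDK for DeepSeek?
No. DeepSeek's API is OpenAI-compatible, so pip install openai is all you need.

How large is the context window?
deepseek-v4-flash supports up to 64K tokens per request (input and output combined).

Does the API have rate limits?
Global API applies per-account rate limits. For high-volume use cases, contact us; we offer custom rate-limit increases.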

Can I use LangChain or LlamaIndex with DeepSeek?
Yes. Set LangChain's openai_api_base, or the API base URL setting of LlamaIndex's OpenAI LLM class (api_base), to https://global-apis.com/v1. Existing chains should then work without other changes.

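How do credits work?
1 credit = $0.01. DeepSeek V4 Flash costs 14 credits per million input tokens and 28 credits per million output tokens. Credits never expire.

Get Started Today

You now have everything you need to call the DeepSeek API from Python: authentication, chat completions, streaming, function calling, async usage, and cost tracking.

Start for free: https://global-apis.com/register (100 free credits, no credit card required).
Browse pricing: https://global-apis.com/pricing (credit packs from $19.99).

Questions or problems? Message us through the dashboard and we will be happy to help.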

Start Building with Global API

Get 100 free credits when you sign up, no credit card required. Access 180+ AI models (DeepSeek, Qwen, Kimi, GLM, Doubao, and more) with a single OpenAI-compatible API key.

PayPal accepted (Visa, Mastercard, Amex). Setup takes 5 minutes.
