(1부) 슬라이딩 윈도우 API로 긴 대화 이어가기 Cookbook

CLOVA Studio 운영자 · June 14, 2024

들어가며

CLOVA Studio에서 제공하는 Chat Completions API는 HyperCLOVA X 모델을 기반으로 하며, 사용자의 입력에 따라 자연스러운 대화를 생성할 수 있습니다.
하지만 대화가 길어질수록 모델이 처리할 수 있는 토큰 수(4096) 제한에 부딪혀 대화를 이어갈 수 없는 한계가 있습니다. 이러한 문제를 해결하기 위해
슬라이딩 윈도우 API와 Chat Completion API를 연결하는 방법을 소개하고자 합니다. 이를 통해 대화의 길이에 상관없이 지속적으로 자연스러운 대화를 생성할 수 있게 됩니다.
두 API의 연계를 통해 Context window의 한계를 극복하고, 보다 원활하고 확장성 있는 대화 시스템을 구현할 수 있을 것입니다.

슬라이딩 윈도우 작동 원리

대화 턴에서 입력된 대화 내역의 총 토큰 수Ⓐ와 새롭게 생성할 대화의 최대 토큰 수Ⓑ의 합이 모델이 처리할 수 있는 최대 토큰 수Ⓧ를 초과하면(Ⓐ+Ⓑ > Ⓧ),
Chat Completions API는 대화 생성을 중단합니다. 이를 해결하기 위해 슬라이딩 윈도우 API는 초과된 토큰 수(Ⓐ+Ⓑ-Ⓧ)만큼 기존 대화 중 오래된 대화 턴부터 삭제합니다.
대화 턴의 일부만 삭제되지 않도록 대화 턴 단위로 묶어서 삭제합니다.

전체 구조

Quote

Python 3.12.2

필요한 API들을 별도의 파일로 분리하여 모듈화하였고, 이를 필요에 따라 불러올 수 있도록 구성하였습니다. 관리의 편의성을 위해 슬라이딩 윈도우 API와 Chat Completions API 기능을 각각 별도의 파일로 분리하고, main.py에서 필요한 모듈을 불러와 사용할 수 있도록 구성하였습니다.

프로젝트 구조

project-root/
│
├── clovastudio_executor.py
├── sliding_window_executor.py
├── completion_executor.py
└── main.py

각 모듈 설명

1. 상위 클래스 정의

CLOVAStudioExecutor는 API 호출에 필요한 공통 기능을 묶어놓은 상위 클래스입니다.
이 클래스는 인증 토큰을 관리하고 API 요청을 처리하는 역할을 합니다. 전체 프로젝트 폴더 내에 clovastudio_executor.py로 저장합니다.

clovastudio_executor.py

import json
import http.client
from http import HTTPStatus
 
class CLOVAStudioExecutor:
    def __init__(self, host, api_key, api_key_primary_val, request_id):
        self._host = host
        self._api_key = api_key
        self._api_key_primary_val = api_key_primary_val
        self._request_id = request_id
  
    def _send_request(self, completion_request, endpoint):
        headers = {
            'Content-Type': 'application/json; charset=utf-8',
            'X-NCP-CLOVASTUDIO-API-KEY': self._api_key,
            'X-NCP-APIGW-API-KEY': self._api_key_primary_val,
            'X-NCP-CLOVASTUDIO-REQUEST-ID': self._request_id
        }
  
        conn = http.client.HTTPSConnection(self._host)
        conn.request('POST', endpoint, json.dumps(completion_request), headers)
        response = conn.getresponse()
        status = response.status
        result = json.loads(response.read().decode(encoding='utf-8'))
        conn.close()
        return result, status
  
    def execute(self, completion_request, endpoint):
        res, status = self._send_request(completion_request, endpoint)
        if status == HTTPStatus.OK:
            return res, status
        else:
            error_message = res.get("status", {}).get("message", "Unknown error") if isinstance(res, dict) else "Unknown error"
            raise ValueError(f"오류 발생: HTTP {status}, 메시지: {error_message}")

2. Chat Completions API

Chat Completions API를 정의하는 클래스입니다. 전체 프로젝트 폴더 내에 completion_executor.py로 저장합니다.

completion_executor.py

from clovastudio_executor import CLOVAStudioExecutor
from http import HTTPStatus
import requests
 
class ChatCompletionExecutor(CLOVAStudioExecutor):
    def __init__(self, host, api_key, api_key_primary_val, request_id):
        super().__init__(host, api_key, api_key_primary_val, request_id)
 
    def execute(self, completion_request, stream=True):
        headers = {
            'X-NCP-CLOVASTUDIO-API-KEY': self._api_key,
            'X-NCP-APIGW-API-KEY': self._api_key_primary_val,
            'X-NCP-CLOVASTUDIO-REQUEST-ID': self._request_id,
            'Content-Type': 'application/json; charset=utf-8',
            'Accept': 'text/event-stream' if stream else 'application/json'
        }
 
        with requests.post(self._host + '/testapp/v1/chat-completions/HCX-003',
                           headers=headers, json=completion_request, stream=stream) as r:
            if stream:
                if r.status_code == HTTPStatus.OK:
                    response_data = ""
                    for line in r.iter_lines():
                        if line:
                            decoded_line = line.decode("utf-8")
                            print(decoded_line)
                            response_data += decoded_line + "\n"
                    return response_data
                else:
                    raise ValueError(f"오류 발생: HTTP {r.status_code}, 메시지: {r.text}")
            else:
                if r.status_code == HTTPStatus.OK:
                    return r.json()
                else:
                    raise ValueError(f"오류 발생: HTTP {r.status_code}, 메시지: {r.text}")

3. 슬라이딩 윈도우 API

슬라이딩 윈도우 API를 정의하는 클래스입니다. 전체 프로젝트 폴더 내에 sliding_window_executor.py로 저장합니다.

sliding_window_executor.py

from clovastudio_executor import CLOVAStudioExecutor
import json
 
class SlidingWindowExecutor(CLOVAStudioExecutor):
    def execute(self, completion_request):
        endpoint = '/v1/api-tools/sliding/chat-messages/HCX-003'
        try:
            result, status = super().execute(completion_request, endpoint)
            if status == 200:
                # 슬라이딩 윈도우 적용 후 메시지를 반환
                return result['result']['messages']
            else:
                error_message = result.get('status', {}).get('message', 'Unknown error')
                raise ValueError(f"오류 발생: HTTP {status}, 메시지: {error_message}")
        except Exception as e:
            print(f"Error in SlidingWindowExecutor: {e}")
            return 'Error'

4. main.py

main.py 파일은 사용자 입력을 받아 슬라이딩 윈도우 API와 Chat Completions API를 순서대로 호출합니다.
이를 통해 최대 토큰 수를 초과하지 않도록 오래된 대화를 삭제하여 대화를 지속할 수 있도록 합니다.

from completion_executor import ChatCompletionExecutor
from sliding_window_executor import SlidingWindowExecutor
import json
 
# 스트리밍 응답에서 content 부분만 추출
def parse_stream_response(response):
    content_parts = []
    for line in response.splitlines():
        if line.startswith('data:'):
            data = json.loads(line[5:])
            if 'message' in data and 'content' in data['message']:
                content_parts.append(data['message']['content'])
    content = content_parts[-1] if content_parts else ""
    return content.strip()
 
# 논스트리밍 응답에서 content 부분만 추출
def parse_non_stream_response(response):
    result = response.get('result', {})
    message = result.get('message', {})
    content = message.get('content', '')
    return content.strip()
 
def main():
    # 초기 시스템 프롬프트 설정
    system_prompt = "- HyperCLOVA X는 네이버 클라우드의 하이퍼스케일 AI입니다."
    messages = []
 
    sliding_window_executor = SlidingWindowExecutor(
        host='clovastudio.apigw.ntruss.com',
        api_key = '<api_key>',
        api_key_primary_val = '<api_key_primary_val>',
        request_id = '<request_id>'
    )
 
    completion_executor = ChatCompletionExecutor(
        host='https://clovastudio.stream.ntruss.com',
        api_key = '<api_key>',
        api_key_primary_val = '<api_key_primary_val>',
        request_id = '<request_id>'
    )
 
    # stream 옵션에 따라 응답을 토큰 단위(stream=True), 전체(stream=False)로 받을 수 있습니다.
    stream = True
 
    while True:
        user_input = input("USER: ('exit'으로 종료): ")
        if user_input.lower() in ['exit', 'quit']:
            break
 
        messages.append({"role": "user", "content": user_input})
 
        request_data = {
            "messages": [{"role": "system", "content": system_prompt}] + messages,
            "maxTokens": 100  # 슬라이딩 윈도우에서 사용할 토큰 수
        }
 
        # SlidingWindowExecutor를 사용하여 조정된 메시지 가져오기
        try:
            adjusted_messages = sliding_window_executor.execute(request_data)
            if adjusted_messages == 'Error':
                print("Error adjusting messages with SlidingWindowExecutor")
                continue
        except Exception as e:
            print(f"Error adjusting messages: {e}")
            continue
 
        # Chat Completion 요청 데이터 생성
        completion_request_data = {
            "messages": adjusted_messages,
            "maxTokens": 100,  # Chat Completion에서 사용할 토큰 수
            "temperature": 0.5,
            "topK": 0,
            "topP": 0.8,
            "repeatPenalty": 1.2,
            "stopBefore": [],
            "includeAiFilters": True,
            "seed": 0
        }
 
        try:
            response = completion_executor.execute(completion_request_data, stream=stream)
            if stream:
                response_text = parse_stream_response(response)
            else:
                response_text = parse_non_stream_response(response)
             
            messages.append({"role": "assistant", "content": response_text})
 
            # 대화 내역 표시
            print("\nAdjusted Messages:", adjusted_messages, "\n")
            print("System Prompt:", system_prompt)
            print("USER Input:", user_input)
            print("CLOVA Response:", response_text, "\n")
 
        except Exception as e:
            print(f"Error: {e}")
 
if __name__ == "__main__":
    main()

main.py 호출 구조

이제 main.py의 자세한 호출 구조를 설명하겠습니다.

▼ 다음은 stream 옵션을 실행했을 때와 실행하지 않았을 때 각각의 요청을 처리하기 위한 함수입니다.

모듈 불러오기

from completion_executor import ChatCompletionExecutor
from sliding_window_executor import SlidingWindowExecutor
import json

▼ 먼저 필요한 모듈을 임포트하는 부분입니다. 'SlidingWindowExecutor'와 'CompletionExecutor' 클래스를 각각의 모듈에서 가져옵니다.
불러온 모듈과 클래스는 이후 코드에서 사용될 것입니다.

작동을 위한 중심 코드

# 스트리밍 응답에서 content 부분만 추출
def parse_stream_response(response):
    content_parts = []
    for line in response.splitlines():
        if line.startswith('data:'):
            data = json.loads(line[5:])
            if 'message' in data and 'content' in data['message']:
                content_parts.append(data['message']['content'])
 
    content = content_parts[-1] if content_parts else ""
    return content.strip()
 
# 논스트리밍 응답에서 content 부분만 추출
def parse_non_stream_response(response):
    result = response.get('result', {})
    message = result.get('message', {})
    content = message.get('content', '')
    return content.strip()

▼ 사용자 입력을 처리하는 과정을 설명해 드리겠습니다. 먼저 시스템 프롬프트를 설정한 후, 사용자로부터 입력을 받습니다.
받은 입력은 messages라는 리스트에 차례로 추가됩니다.

초기 설정 및 사용자 입력 처리

def main():
    # 시스템 프롬프트 설정
    system_prompt = "- HyperCLOVA X는 네이버 클라우드의 하이퍼스케일 AI입니다."
    messages = []
 
    sliding_window_executor = SlidingWindowExecutor(
        host='clovastudio.apigw.ntruss.com',
        api_key = '<api_key>',
        api_key_primary_val = '<api_key_primary_val>',
        request_id = '<request_id>'
    )
 
    completion_executor = ChatCompletionExecutor(
        host='https://clovastudio.stream.ntruss.com',
        api_key = '<api_key>',
        api_key_primary_val = '<api_key_primary_val>',
        request_id = '<request_id>'
    )
 
    # stream 옵션에 따라 응답을 토큰 단위(stream=True), 전체(stream=False)로 받을 수 있습니다.
    stream = True
 
    while True:
        user_input = input("USER: ('exit'으로 종료): ")
        if user_input.lower() in ['exit', 'quit']:
            break
 
        messages.append({"role": "user", "content": user_input})

▼ 사용자의 새로운 입력과 기존 대화 내역을 슬라이딩 윈도우 API에 전달합니다. 이 API는 입력된 데이터의 토큰 수가 최대 허용 개수를 초과하지 않도록
대화 내역을 조정하는 역할을 합니다. 조정된 대화 내역은 'adjusted_messages'라는 변수에 저장되어 이후 처리 과정에서 활용됩니다.

슬라이딩 윈도우 API 호출

# 슬라이딩 윈도우 요청 데이터 생성
request_data = {
    "messages": [{"role": "system", "content": system_prompt}] + messages,
    "maxTokens": 100  # 슬라이딩 윈도우에서 사용할 토큰 수
}
 
try:
    adjusted_messages = sliding_window_executor.execute(request_data)
    if adjusted_messages == 'Error':
        print("Error adjusting messages with SlidingWindowExecutor")
        continue
except Exception as e:
    print(f"Error adjusting messages: {e}")
    continue

▼ 슬라이딩 윈도우 API로 조정된 'adjusted_messages' 변수를 Chat Completions API에 전달하여 사용자 입력에 대한 응답을 생성합니다.
생성된 응답은 대화 내역에 추가되며, 이 대화 내역은 터미널 창에 출력되어 사용자가 확인할 수 있습니다.

Chat Completions API 호출

# Chat Completion 요청 데이터 생성
completion_request_data = {
    "messages": adjusted_messages,
    "maxTokens": 100,  # Chat Completion에서 사용할 토큰 수
    "temperature": 0.5,
    "topK": 0,
    "topP": 0.8,
    "repeatPenalty": 1.2,
    "stopBefore": [],
    "includeAiFilters": True,
    "seed": 0
}
 
try:
    response = completion_executor.execute(completion_request_data, stream=stream)
    if stream:
        response_text = parse_stream_response(response)
    else:
        response_text = parse_non_stream_response(response)
     
    messages.append({"role": "assistant", "content": response_text})

▼ Chat Completions API를 통해 받은 응답을 처리하고 출력합니다. 이 스크립트는 대화 내역을 사용자가 보기 편하게 출력해 줍니다.
그리고 스크립트 실행 시 main() 함수가 자동으로 호출되도록 설계되어 있습니다.

응답 처리 및 출력

# 대화 내역 표시
            print("\nAdjusted Messages:", adjusted_messages, "\n")
            print("System Prompt:", system_prompt)
            print("USER Input:", user_input)
            print("CLOVA Response:", response_text, "\n")
 
        except Exception as e:
            print(f"Error: {e}")
 
if __name__ == "__main__":
    main()

터미널 출력

Adjusted Messages: [{'role': 'system', 'content': '- HyperCLOVA X는 네이버 클라우드의 하이퍼스케일 AI입니다.'}, {'role': 'user', 'content': '안녕 만나서 반가워 나는 서울에 살고 있어'}, {'role': 'assistant', 'content': '저도 사용자님을 만나게 되어 반갑습니다! 서울은 대한민국의 수도로 다양한 문화와 역사를 가진 도시죠. 혹시 특별히 좋아하는 장소가 있으신가요?'}, {'role': 'user', 'content': '내가 어디 살고 있는지 기억하니?'}] 
 
System Prompt: - HyperCLOVA X는 네이버 클라우드의 하이퍼스케일 AI입니다.
USER Input: 내가 어디 살고 있는지 기억하니?
CLOVA Response: 네, 사용자님께서는 서울에 거주하고 계신다고 말씀해 주셨습니다. 그 외에도 저에게 여러 가지 이야기를 해 주셨어요. 제가 도움 드릴 수 있는 다른 것이 있을까요?

맺음말

이 글을 통해 우리는 Chat Completions API와 슬라이딩 윈도우 API를 결합하여 대화의 길이에 구애받지 않고 자연스러운 대화를 유지하는 방법을 살펴보았습니다.
사용자와 AI 어시스턴트 간의 대화가 길어질수록 모델이 처리할 수 있는 토큰의 개수 제한에 부딪히게 되는데, 이는 대화의 연속성과 자연스러움을 저해하는 요인입니다.
하지만 슬라이딩 윈도우 API를 도입함으로써 대화 내역에서 오래된 대화 턴을 제거하고 메모리를 효율적으로 활용할 수 있게 되었습니다.
이를 통해 대화의 흐름을 끊지 않고 최신 문맥을 반영한 응답을 생성할 수 있게 되었으며, 사용자 친화적인 대화형 AI 시스템을 구현할 수 있었습니다.

앞으로도 다양한 엔지니어링 기법과 아이디어를 활용하여 대화형 AI 기술의 발전 방향을 모색해야 할 것입니다.

리드넘버 · October 22, 2024

자바로 된 예제 코드가 있을까요?

(1부) 슬라이딩 윈도우 API로 긴 대화 이어가기 Cookbook

Recommended Posts

CLOVA Studio 운영자

리드넘버

게시글 및 댓글을 작성하려면 로그인 해주세요.

NAVER Cloud

Home