Multi-Data Analysis and Visualization System Based on Function Calling

CLOVA Studio 운영자5 · September 9

Introduction

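Function calling is a feature that lets an LLM invoke specific tools to handle a user's request. By defining tools suited to your purpose, you can carry out tasks that are hard to solve with an LLM alone, such as real-time information processing, integration with external data, and complex calculations.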

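In this cookbook, we apply Function calling to a complex multi-turn scenario and look at how to build your own automated data analysis solution.
For more detail on the basic concepts and usage of Function calling, please refer to the previous posts on this topic.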

How it works

Multi-turn

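Recently, Function calling has been moving beyond a single call toward a multi-turn approach, in which the user and the LLM exchange several messages to reach a composite goal.
The conventional single-call approach suits simple scenarios such as "How's the weather today?", where one question is answered with one Function calling request. In practice, however, most user requests are either multi-query requests that bundle several asks, such as "Tell me today's weather, and remind me to take an umbrella if it rains," or multi-turn requests, where the user reads the model's answer and then asks for further calls.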

The multi-turn approach keeps the conversational context while running multiple Function calling requests sequentially or in parallel, which makes it possible to handle more complex and practical tasks.

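Basic flow: On receiving a user request, the LLM checks the list of available tools, decides which tool to use with which parameters, and returns that decision as JSON. The developer wires this output to the actual tool execution and passes the execution result back to the LLM.
Multi-turn flow: In the multi-turn case, this process does not end with a single request. Subsequent user requests are also handled with reference to the earlier context, and the cycle repeats until the conversation ends.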
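
As a rough illustration of the basic flow, the sketch below decodes one tool call and builds the `tool` message that carries the result back to the model. The field names (`function.name`, a JSON-encoded `function.arguments`, `tool_call_id`) mirror what this cookbook's own code reads and writes later; the `get_weather` tool and its stubbed result are hypothetical, and the exact response fields depend on the API you call.

```python
import json

# Hypothetical tool call, shaped like the entries this cookbook's code reads later.
tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Seoul"}'},
}

def get_weather(city: str) -> dict:
    # Stub standing in for a real weather lookup.
    return {"city": city, "forecast": "rain"}

# The arguments arrive as a JSON string, so decode them before calling the tool.
args = json.loads(tool_call["function"]["arguments"])
result = get_weather(**args)

# The result goes back to the model as a "tool" message tied to the call id.
tool_message = {
    "role": "tool",
    "content": json.dumps(result, ensure_ascii=False),
    "tool_call_id": tool_call["id"],
}
print(tool_message)
```

In a multi-turn conversation, messages like this accumulate in one history list, which is how later requests can refer back to earlier results.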

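Automating function execution

With conventional Function calling, the LLM only analyzes the user's request and returns a JSON object containing the function to call and the parameters it needs. The user then has to parse the returned JSON, run the code themselves, and pass the result back to the LLM, which is cumbersome.

This process flows as follows.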

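User request
The LLM selects a function and its parameters
The user runs the corresponding function code manually
The result is obtained
The result is passed back to the LLM
The LLM generates the final answer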
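
The six steps above can be traced in a few lines of Python. This is only a sketch under stated assumptions: `fake_llm` is a stand-in for a real model, `add` is a hypothetical tool, and the message fields are illustrative rather than the exact format of any particular API. Steps 3 to 5 are the part a user performs by hand in the conventional approach.

```python
import json
from typing import Any, Callable, Dict, List

def add(a: int, b: int) -> Dict[str, Any]:
    # Hypothetical tool standing in for the "function code" in the steps above.
    return {"sum": a + b}

TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {"add": add}

def fake_llm(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Stand-in for a real model: request the tool first, then answer from its result.
    last = messages[-1]
    if last["role"] == "tool":
        total = json.loads(last["content"])["sum"]
        return {"role": "assistant", "content": f"The answer is {total}."}
    call = {"id": "call_0", "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'}}
    return {"role": "assistant", "content": "", "tool_calls": [call]}

def run_turn(user_text: str) -> str:
    # Step 1: the user request starts the message history.
    messages: List[Dict[str, Any]] = [{"role": "user", "content": user_text}]
    # Step 2: the model picks a function and its parameters.
    reply = fake_llm(messages)
    if reply.get("tool_calls"):
        messages.append(reply)
        for call in reply["tool_calls"]:
            # Steps 3-4: run the function and obtain the result.
            args = json.loads(call["function"]["arguments"])
            result = TOOLS[call["function"]["name"]](**args)
            # Step 5: hand the result back to the model.
            messages.append(
                {"role": "tool", "content": json.dumps(result), "tool_call_id": call["id"]}
            )
        # Step 6: the model writes the final answer.
        reply = fake_llm(messages)
    return reply["content"]

print(run_turn("What is 2 + 3?"))  # prints: The answer is 5.
```

Wrapping steps 3 to 5 in one function, as `run_turn` does here, is the basic idea behind removing the manual work.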

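The approach in this cookbook, however, automates this process. The function code is defined in advance, and the system is designed so that the model's function calls are executed on its behalf. Once the LLM decides to call a function, the system runs the function without user intervention and passes the result back to the LLM, all automatically.
The structure of the multi-turn automated pipeline, compared with the conventional approach, is as follows.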

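Scenario

The scenario in this cookbook is an analysis of customer service (CS) data for ordered products. It uses the LLM's Function calling feature to automate complex, repetitive data analysis work, and shows how tasks such as data analysis and visualization are carried out in response to the user's natural-language questions.
The scenario defines five tools in total. Based on the user's request, the LLM calls these tools as needed to build a data analysis pipeline.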

load_data: loads the CSV file to analyze
filter_data: extracts only the data that matches given conditions
check_data: checks data quality
analyze_data: computes various statistics based on customer ratings
save_data: saves the analyzed data

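The focus is not on calling a tool once, but on showing how several tools are called in sequence over a multi-turn conversation with the user to carry out a composite analysis automatically.

The concrete pipeline for the CS data analysis scenario is as follows.

Analysis data

The data to analyze is a CSV file containing customer inquiries about purchased products.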

The example data file provided with this cookbook was constructed arbitrarily to keep the implementation simple.

The data contains the following fields.

id: unique identifier
date: date the inquiry was written
customer_id: customer ID
product_name: product name
cs_category: CS type (e.g., as, payment, exchange, delivery, return)
text: the customer's detailed inquiry
rating: service satisfaction rating (1 to 5)
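
To make the column layout concrete, the sketch below builds a tiny CSV with the same seven columns using only the standard library. The rows, customer IDs, and the "Earbuds" product are hypothetical placeholders and are not taken from the cookbook's data file; the second row has an empty `text` value to mimic a missing inquiry.

```python
import csv
import io

COLUMNS = ["id", "date", "customer_id", "product_name", "cs_category", "text", "rating"]

# Hypothetical rows for illustration only; they are not taken from the cookbook's data file.
SAMPLE_ROWS = [
    [1, "2025-01-03", "C001", "Smart watch", "as", "The touch screen does not respond.", 3],
    [2, "2025-01-04", "C002", "Smart watch", "delivery", "", 2],
    [3, "2025-02-10", "C003", "Earbuds", "return", "I would like to return this item.", 4],
]

def build_sample_csv() -> str:
    # Write the header and rows into an in-memory buffer and return the CSV text.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(SAMPLE_ROWS)
    return buffer.getvalue()

csv_text = build_sample_csv()
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["product_name"], rows[0]["rating"])
```

Writing `csv_text` to a file such as `cs_data.csv` would give a small input for trying out the functions defined later.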

Scenario implementation

1. Environment setup

The example code was tested on Python 3.11.1 and requires Python 3.7 or later.

To issue an API token, refer to the CLOVA Studio API guide.

2. Creating the files

The files to create are organized as follows.

cookbook/
├── main.py                    # Script that runs the multi-turn conversation scenario.
├── fc_core.py                 # Module that collects the core functions: function_call() plus the load_data/filter_data/analyze_data/check_data/save_data implementations.
├── config.py                  # Defines the API settings and the function execution priority.
└── tool_schema.py             # Defines the schemas of the tools the model can use.

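2.1. config.py

config.py is the configuration file that defines the CLOVA Studio API connection and the function execution priority. Keeping these core settings in one place makes the system easier to maintain.
To run the example, you need to enter your own CLOVA Studio API key.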

API_KEY = "YOUR_API_KEY" 
API_URL = "https://clovastudio.stream.ntruss.com/testapp/v3/chat-completions/HCX-DASH-002"

def get_headers() -> dict:
    return {
        "Content-Type": "application/json",
        "Authorization": API_KEY,
    }

# Function execution priority (lower values run first)
FUNCTION_PRIORITY = {
    "load_data": 0,
    "filter_data": 1,
    "check_data": 2,
    "analyze_data": 3,
    "save_data": 4,
}
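
If you would rather not hard-code the key in config.py, one common alternative is to read it from an environment variable. The sketch below assumes a variable named `CLOVA_STUDIO_API_KEY`; that name and the helpers `load_api_key` and `build_headers` are hypothetical and not part of the cookbook's code. The header layout is copied from `get_headers()` above.

```python
import os

def load_api_key(env_var: str = "CLOVA_STUDIO_API_KEY") -> str:
    # Read the key from the environment; fail early with a clear message if it is missing.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key

def build_headers(api_key: str) -> dict:
    # Same header layout as get_headers() in config.py, with the key passed in.
    return {"Content-Type": "application/json", "Authorization": api_key}
```

With this in place, `API_KEY = load_api_key()` could replace the literal assignment, which keeps the key out of source control.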

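2.2. tool_schema.py

tool_schema.py defines the schemas of the tools available to the LLM. It specifies which functions the LLM can call and which parameters each function accepts.
The functions to define are as follows.

load_data: The basic function that loads a CSV file. 'filename' is a required parameter, and a file with that name must exist in the working directory from which the script is run.
filter_data: Filters data by various conditions. The 'base' parameter selects what the filter is applied to, and each of the other parameters takes a value found in the corresponding column.
check_data: Checks and cleans data quality. 'check_type' and 'action' are required parameters that specify the type of check and the action to perform.
analyze_data: Analyzes rating statistics for a specific product and generates monthly charts. 'product_name' is a required parameter, and 'bins' specifies the number of bins for the visualization (the example implementation of analyze_data_func in fc_core.py does not read 'bins').
save_data: Saves the filtered data to a CSV file. 'filename' is a required parameter that specifies the name of the file to save.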

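All of these functions can be modified to suit your purpose. For example, analyze_data could use a different visualization tool from the one in this scenario, or gain a parameter for choosing the unit of analysis (daily, weekly, or monthly). Likewise, the function descriptions need not be used as written and can be adapted to your purpose.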

def get_tools() -> list:
    return [
        {
            "type": "function",
            "function": {
                "description": "CSV 파일을 로드하고 데이터 정보를 제공합니다.",
                "name": "load_data",
                "parameters": {
                    "properties": {
                        "filename": {
                            "type": "string",
                            "description": "업로드할 CSV 파일명 (예: 'data.csv', 'sales.csv')",
                        }
                    },
                    "required": ["filename"],
                    "type": "object",
                },
            },
        },
        {
            "type": "function",
            "function": {
                "description": "사용자가 특정 조건의 데이터를 요청할 때 호출합니다.",
                "name": "filter_data",
                "parameters": {
                    "properties": {
                        "base": {
                            "type": "string",
                            "enum": ["filtered", "all"],
                            "description": "필터 적용 대상. 'filtered'는 이전 필터링 결과에 추가 필터를 적용하고, 'all'은 전체 데이터에서 새로 필터링합니다. 기본값은 'filtered'입니다.",
                        },
                        "product_name": {"type": "string"},
                        "cs_category": {
                            "enum": [
                                "as",
                                "payment",
                                "exchange",
                                "delivery",
                                "return",
                            ],
                            "type": "string",
                        },
                        "date_range": {"type": "string"},
                        "rating_threshold": {"type": "integer"},
                        "customer_id": {"type": "string"},
                    },
                    "required": [],
                    "type": "object",
                },
            },
        },
        {
            "type": "function",
            "function": {
                "description": "특정 product의 rating 기술통계와 월별 시각화 차트를 생성합니다.",
                "name": "analyze_data",
                "parameters": {
                    "properties": {
                        "product_name": {"type": "string"},
                        "bins": {
                            "type": "integer",
                            "description": "rating 히스토그램 구간 수 (기본값: 12)",
                        },
                    },
                    "required": ["product_name"],
                    "type": "object",
                },
            },
        },
        {
            "type": "function",
            "function": {
                "description": "사용자가 데이터 품질 검사나 정리를 요청할 때 호출합니다.",
                "name": "check_data",
                "parameters": {
                    "properties": {
                        "check_type": {
                            "description": "검사 유형 (duplicates: 중복, missing: 결측값)",
                            "enum": ["duplicates", "missing"],
                            "type": "string",
                        },
                        "action": {
                            "description": "수행할 작업 (check: 확인만, remove: 제거)",
                            "enum": ["check", "remove"],
                            "type": "string",
                        },
                        "column": {
                            "description": "검사할 컬럼명 (없으면 전체)",
                            "type": "string",
                        },
                    },
                    "required": ["check_type", "action"],
                    "type": "object",
                },
            },
        },
        {
            "type": "function",
            "function": {
                "description": "사용자가 특정 파일명으로 데이터를 저장하라고 요청할 때 호출합니다. 필터링된 데이터를 CSV 형식으로 저장합니다.",
                "name": "save_data",
                "parameters": {
                    "properties": {
                        "filename": {
                            "description": "저장할 CSV 파일명 (예: 'result.csv')",
                            "type": "string",
                        },
                    },
                    "required": ["filename"],
                    "type": "object",
                },
            },
        },
    ]
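
Because the model fills in the parameters, it can be useful to check its arguments against the schema before running a function. The sketch below is a minimal, hand-rolled check of `required` and `enum` only, using a trimmed copy of the check_data parameters; `validate_args` is a hypothetical helper, not part of the cookbook's code or of any library.

```python
from typing import Any, Dict, List

# A trimmed copy of the check_data parameter schema from tool_schema.py.
CHECK_DATA_PARAMS: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "check_type": {"type": "string", "enum": ["duplicates", "missing"]},
        "action": {"type": "string", "enum": ["check", "remove"]},
        "column": {"type": "string"},
    },
    "required": ["check_type", "action"],
}

def validate_args(params: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    # Return a list of problems; an empty list means the arguments look usable.
    problems: List[str] = []
    for name in params.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = params.get("properties", {}).get(name)
        if spec is None:
            problems.append(f"unknown parameter: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems

print(validate_args(CHECK_DATA_PARAMS, {"check_type": "missing", "action": "remove", "column": "text"}))
print(validate_args(CHECK_DATA_PARAMS, {"check_type": "outliers"}))
```

Returning the problems as a tool result, instead of raising, would let the model see what went wrong and retry with corrected arguments.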

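2.3. fc_core.py

fc_core.py handles the core of the system: multi-turn handling, Function calling, and automated function execution. The file has three main parts: context management for multi-turn conversation, Function calling handling, and the implementation of the executable functions.

2.3.1. Passing information across turns

These are the context management functions that pass information from one turn to the next in a multi-turn conversation.

The build_summary function summarizes the current conversation state and returns it as a string. It includes function execution results such as the data load information, the most recent filter result, and the data check result, so that the LLM can understand what was done in earlier turns.

The append_context function adds that context summary to the message history. It is called before the next turn starts to pass the previous turn's results to the LLM.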

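from __future__ import annotations  # keeps the `... | None` return annotation on function_call valid on Python 3.7-3.9

import json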
import os
from typing import Any, Dict, List
import pandas as pd
import numpy as np
import requests
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: charts are only written to files
import matplotlib.pyplot as plt

from config import get_headers, API_URL, FUNCTION_PRIORITY

current_df = None  # Original data loaded by load_data.
filtered_df = None  # Data narrowed down by later function calls (filter/check).

conversation_state = {
    "data_loaded": False,
    "filename": None,
    "rows": 0,
    "columns": 0,
    "last_filter": None,
}

def build_summary() -> str:
    parts: List[str] = []
    if conversation_state.get("data_loaded"):
        parts.append(
            f"데이터 로드됨(file={conversation_state.get('filename')}, rows={conversation_state.get('rows')}, cols={conversation_state.get('columns')})"
        )
    lf = conversation_state.get("last_filter")
    if lf:
        conds = ", ".join(lf.get("conditions", [])) if lf.get("conditions") else "-"
        parts.append(
            f"최근 필터(count={lf.get('filtered_count')}/{lf.get('total_count')}, conds={conds})"
        )
    lc = conversation_state.get("last_check")
    if lc:
        check_type = lc.get("type", "")
        action = lc.get("action", "")
        column = lc.get("column", "")
        removed = lc.get("removed", 0)
        remaining = lc.get("remaining", 0)
        if check_type == "missing":
            parts.append(f"결측값 제거({column}: {removed}개 제거, {remaining}개 남음)")
        elif check_type == "duplicates":
            parts.append(f"중복 제거({column}: {removed}개 제거, {remaining}개 남음)")

    return "[context] " + " | ".join(parts) if parts else ""


def append_context(messages: List[Dict[str, Any]]) -> None:
    summary = build_summary()
    if summary:
        messages.append({"role": "assistant", "content": summary})

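2.3.2. Function calling handling

This part implements the Function calling flow, in which the LLM calls the defined tools based on the user's query.

The execute_function function runs the function the LLM requested. It takes a function name and parameters, calls the matching implementation, and returns the result.
The function_call function is the core function that carries out the function call flow of a multi-turn conversation.
1. The first call sends the messages and tool_schema to the LLM.
2. The functions requested by the LLM are executed in priority order.
3. The function results are passed to the LLM to generate the final response.
4. The second turn runs with the new message and the previous context.
5. Steps 1 to 3 are repeated to generate the final response.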

def execute_function(func_name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    if func_name == "load_data":
        return load_data_file(args.get("filename"))
    if func_name == "filter_data":
        return filter_data_func(args)
    if func_name == "analyze_data":
        return analyze_data_func(args)
    if func_name == "check_data":
        return check_data_func(args)
    if func_name == "save_data":
        return save_data_func(args)
    return {"error": f"알 수 없는 함수: {func_name}"}


def function_call(messages: List[Dict[str, Any]]) -> Dict[str, Any] | None:
    
    from tool_schema import get_tools

    # Build the request payload
    payload = {"messages": messages, "tools": get_tools(), "toolChoice": "auto"}
    try:
        # First call: send the messages and tool schemas
        response = requests.post(API_URL, headers=get_headers(), json=payload, timeout=30)
        if response.status_code != 200:
            return None  # Non-200 response: signal failure to the caller

        result = response.json()  # Parse the response body

        message = (
            result.get("result", {}).get("message", {})
            if "result" in result
            else (result.get("choices", [{}])[0].get("message", {}))
        )
        if not message:
            return result 

        # Extract the list of tool calls (either key spelling is accepted)
        tool_calls_raw = message.get("toolCalls", message.get("tool_calls", []))
        if not tool_calls_raw:
            return {"message": message}

        # Sort by priority; unknown function names go last
        tool_calls = sorted(
            tool_calls_raw,
            key=lambda c: FUNCTION_PRIORITY.get(c.get("function", {}).get("name", ""), 99),
        )

        tool_messages: List[Dict[str, Any]] = []
        for i, call in enumerate(tool_calls):
            func_name = call["function"]["name"]
            args_str = call["function"]["arguments"]
            call_id = call.get("id", f"call_{i}")
            # Execution log: print which function is called with which arguments
            print(f"\n 함수 실행 {i+1}: {func_name}")
            print(f"Arguments: {args_str}")

            try:
                args = json.loads(args_str) if isinstance(args_str, str) else args_str
                function_result = execute_function(func_name, args)  # Run the function
                
                print("함수 결과:")
                print(json.dumps(function_result, ensure_ascii=False, indent=2))
                tool_messages.append(
                    {
                        "role": "tool",
                        "content": json.dumps(function_result, ensure_ascii=False),
                        "tool_call_id": call_id,
                    }
                )
            except Exception:
                tool_messages.append(
                    {
                        "role": "tool",
                        "content": json.dumps({"error": "함수 실행 실패"}, ensure_ascii=False),
                        "tool_call_id": call_id,
                    }
                )

        # Second call
        messages.extend(tool_messages)  # Add the tool results to the original messages
        # Build the payload for the second call
        second_payload = {"messages": messages, "tools": get_tools(), "toolChoice": "auto"}
        # Send the second request
        second_response = requests.post(
            API_URL, headers=get_headers(), json=second_payload, timeout=30
        )
        if second_response.status_code != 200:
            return {"message": message, "has_tool_calls": True}

        second_result = second_response.json()
        # Parse the result of the second call
        second_message = (
            second_result.get("result", {}).get("message", {})
            if "result" in second_result
            else (second_result.get("choices", [{}])[0].get("message", {}))
        )
        return {"message": second_message} if second_message else {"message": message, "has_tool_calls": True}
    except Exception:
        return None
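
The priority sort in function_call is what lets the model list several tool calls in any order while the system still runs them in pipeline order. The standalone sketch below repeats the same `sorted` key with a copy of FUNCTION_PRIORITY; the out-of-order `calls` list is a hypothetical model output, and `sort_tool_calls` is a helper name introduced only for this illustration.

```python
from typing import Any, Dict, List

# Copy of FUNCTION_PRIORITY from config.py.
FUNCTION_PRIORITY = {
    "load_data": 0,
    "filter_data": 1,
    "check_data": 2,
    "analyze_data": 3,
    "save_data": 4,
}

def sort_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Unknown function names get priority 99, so they run after every known tool.
    return sorted(
        tool_calls,
        key=lambda c: FUNCTION_PRIORITY.get(c.get("function", {}).get("name", ""), 99),
    )

# Hypothetical model output that lists the calls out of pipeline order.
calls = [
    {"function": {"name": "save_data", "arguments": "{}"}},
    {"function": {"name": "analyze_data", "arguments": "{}"}},
    {"function": {"name": "check_data", "arguments": "{}"}},
]
ordered = [c["function"]["name"] for c in sort_tool_calls(calls)]
print(ordered)  # ['check_data', 'analyze_data', 'save_data']
```

Because `sorted` is stable, two calls to the same function keep the order in which the model listed them.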

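2.3.3. Implementing the executable functions

This part implements the tool functions that do the actual data processing. Each function directly performs the concrete task the LLM requests.

The load_data_file function reads a CSV file into current_df and returns basic information about the data. It checks that the file exists and builds a listing of column names and data types so the user can understand the structure of the data.
The filter_data_func function filters data by several conditions: product name, category, rating, customer ID, and date range. The result is stored in filtered_df, and the first three records are returned as a preview.
The check_data_func function checks and cleans data quality. It can check for or remove duplicate rows and missing values, optionally restricted to a specific column.
The analyze_data_func function analyzes rating statistics for a specific product and generates monthly charts. It computes descriptive statistics for the rating (count, mean, standard deviation, minimum, maximum, and quartiles) and writes two chart files, one for monthly case counts and one for monthly average rating.
The save_data_func function saves the filtered data to a CSV file. If the file name does not end in .csv, the extension is set to .csv automatically, and the function returns a completion message along with information about the saved file.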

# Function definitions
def load_data_file(filename: str) -> Dict[str, Any]:
    global current_df
    try:
        current_dir = os.getcwd()
        file_path = os.path.join(current_dir, filename)
        if not os.path.exists(file_path):
            return {"error": "파일을 찾을 수 없습니다"}

        current_df = pd.read_csv(file_path)

        # Build a listing of column names and dtypes
        df_info = "columns:\n"
        for i, col in enumerate(current_df.columns):
            dtype = str(current_df[col].dtype)
            df_info += f"   • {col}: {dtype}"
            if i < len(current_df.columns) - 1:
                df_info += "\n"

        conversation_state.update(
            {
                "data_loaded": True,
                "filename": file_path,
                "rows": int(len(current_df)),
                "columns": int(len(current_df.columns)),
            }
        )
        return {
            "success": True,
            "filename": file_path,
            "rows": int(len(current_df)),
            "columns": int(len(current_df.columns)),
            "data_info": df_info,
        }
    except Exception:
        return {"error": "CSV 읽기 실패"}


def filter_data_func(args: Dict[str, Any]) -> Dict[str, Any]:
    global current_df, filtered_df
    try:
        if current_df is None:
            return {"error": "먼저 load_data로 데이터를 로드해주세요."}

        base_target = args.get("base", "filtered")
        if base_target == "filtered" and filtered_df is not None and len(filtered_df) > 0:
            df = filtered_df.copy()
        else:
            df = current_df.copy()

        # Apply the filters
        filtered = df.copy()
        conditions: list[str] = []
        if "product_name" in args:
            filtered = filtered[filtered["product_name"] == args["product_name"]]
            conditions.append(f"product_name: {args['product_name']}")
        if "cs_category" in args:
            filtered = filtered[filtered["cs_category"] == args["cs_category"]]
            conditions.append(f"cs_category: {args['cs_category']}")
        if "rating_threshold" in args:
            filtered = filtered[filtered["rating"] >= args["rating_threshold"]]
            conditions.append(f"rating >= {args['rating_threshold']}")
        if "customer_id" in args:
            filtered = filtered[filtered["customer_id"] == args["customer_id"]]
            conditions.append(f"customer_id: {args['customer_id']}")
        if "date_range" in args:
            date_range = args["date_range"]
            if "1월" in date_range and "2월" in date_range and "date" in filtered.columns:
                filtered = filtered[filtered["date"].astype(str).str.contains("2025-01|2025-02")]
                conditions.append("date_range: 1-2월")

        filtered_df = filtered
        
        # Preview of the filtered data (first 3 rows only)
        preview_data = []
        if len(filtered_df) > 0:
            # Columns to include in the preview
            preview_cols = ["id", "date", "product_name", "cs_category", "text", "rating"]
            available_cols = [col for col in preview_cols if col in filtered_df.columns]
            preview_data = filtered_df[available_cols].head(3).to_dict('records')
        
        conversation_state.update(
            {
                "last_filter": {
                    "filtered_count": int(len(filtered_df)),
                    "total_count": int(len(df)),
                    "conditions": conditions,
                    "base": base_target,
                }
            }
        )
        return {
            "success": True,
            "total_count": int(len(df)),
            "filtered_count": int(len(filtered_df)),
            "filter_conditions": conditions,
            "preview_data": preview_data,
            "message": f"필터링 완료: 전체 {len(df)}개 중 {len(filtered_df)}개 데이터 추출 (필터링된 데이터가 저장되었습니다)",
        }
    except Exception:
        return {"error": "필터링 실패"}


def check_data_func(args: Dict[str, Any]) -> Dict[str, Any]:
    global current_df, filtered_df
    try:
        if current_df is None:
            return {"error": "먼저 load_data로 데이터를 로드해주세요."}
        # Always operate on the filtered data; return an error if there is none
        if filtered_df is None or len(filtered_df) == 0:
            return {"error": "필터링된 데이터가 없습니다. 먼저 filter_data를 수행하세요."}
        df = filtered_df

        check_type = args.get("check_type", "duplicates")
        action = args.get("action", "check")
        column = args.get("column")

        if check_type == "duplicates":
            if column:
                duplicates_count = df[column].duplicated().sum()
            else:
                duplicates_count = df.duplicated().sum()
            if action == "check":
                result = {
                    "success": True,
                    "check_type": "duplicates",
                    "action": "check",
                    "column": column,
                    "duplicates_found": int(duplicates_count),
                    "total_records": int(len(df)),
                }
                return result
            else:
                df_clean = df.drop_duplicates(subset=[column]) if column else df.drop_duplicates()
                # Update the filtered data
                filtered_df = df_clean
                conversation_state.update(
                    {
                        "last_check": {
                            "type": "duplicates",
                            "action": "remove",
                            "column": column,
                            "removed": int(duplicates_count),
                            "remaining": int(len(df_clean)),
                        }
                    }
                )
                return {
                    "success": True,
                    "check_type": "duplicates",
                    "action": "remove",
                    "column": column,
                    "original_records": int(len(df)),
                    "duplicates_removed": int(duplicates_count),
                    "remaining_records": int(len(df_clean)),
                }

        if check_type == "missing":
            if column:
                # Treat both NaN and empty strings as missing
                missing_count = df[column].isna().sum() + (df[column] == '').sum()
            else:
                # Find rows with NaN or an empty string in any column
                missing_mask = df.isna().any(axis=1) | (df == '').any(axis=1)
                missing_count = missing_mask.sum()
            if action == "check":
                return {
                    "success": True,
                    "check_type": "missing",
                    "action": "check",
                    "column": column,
                    "missing_found": int(missing_count),
                    "total_records": int(len(df)),
                }
            else:
                if column:
                    # Drop rows where the given column is NaN or an empty string
                    df_clean = df[~(df[column].isna() | (df[column] == ''))]
                else:
                    # Drop rows with NaN or an empty string in any column
                    df_clean = df[~(df.isna().any(axis=1) | (df == '').any(axis=1))]
                # Update the filtered data
                filtered_df = df_clean
                conversation_state.update(
                    {
                        "last_check": {
                            "type": "missing",
                            "action": "remove",
                            "column": column,
                            "removed": int(missing_count),
                            "remaining": int(len(df_clean)),
                        }
                    }
                )
                return {
                    "success": True,
                    "check_type": "missing",
                    "action": "remove",
                    "column": column,
                    "original_records": int(len(df)),
                    "missing_removed": int(missing_count),
                    "remaining_records": int(len(df_clean)),
                }
        return {"error": "지원하지 않는 check_type 입니다."}
    except Exception:
        return {"error": "데이터 검사 실패"}


def analyze_data_func(args: Dict[str, Any]) -> Dict[str, Any]:
    global current_df, filtered_df
    try:
        if current_df is None:
            return {"error": "먼저 load_data로 데이터를 로드해주세요."}
        base_df = filtered_df if (filtered_df is not None and len(filtered_df) > 0) else current_df
        df = base_df

        product_name = args.get("product_name")
        if not product_name:
            return {"error": "product_name 파라미터가 필요합니다."}
        if "rating" not in df.columns:
            return {"error": "rating 컬럼이 없습니다."}

        # Copy the slice so the date/month columns added below do not touch the source frame
        df_product = df[df["product_name"] == product_name].copy()
        if len(df_product) == 0:
            return {"error": f"해당 product 데이터 없음: {product_name}"}

        desc = df_product["rating"].describe()
        stats = {
            "count": int(desc.get("count", 0)),
            "mean": float(desc.get("mean", 0)) if not pd.isna(desc.get("mean", None)) else 0.0,
            "std": float(desc.get("std", 0)) if not pd.isna(desc.get("std", None)) else 0.0,
            "min": float(desc.get("min", 0)) if not pd.isna(desc.get("min", None)) else 0.0,
            "25%": float(desc.get("25%", 0)) if not pd.isna(desc.get("25%", None)) else 0.0,
            "50%": float(desc.get("50%", 0)) if not pd.isna(desc.get("50%", None)) else 0.0,
            "75%": float(desc.get("75%", 0)) if not pd.isna(desc.get("75%", None)) else 0.0,
            "max": float(desc.get("max", 0)) if not pd.isna(desc.get("max", None)) else 0.0,
        }

        # Build the monthly charts
        chart_files = []
        if "date" in df_product.columns:
            try:
                df_product["date"] = pd.to_datetime(df_product["date"])
                df_product["month"] = df_product["date"].dt.strftime("%Y-%m")
                monthly_counts = df_product["month"].value_counts().sort_index()
                monthly_ratings = df_product.groupby("month")["rating"].mean()
                
                # Chart 1: monthly case counts
                plt.figure(figsize=(10, 6))
                bars = plt.bar(monthly_counts.index, monthly_counts.values, color='skyblue', alpha=0.7)
                plt.title(f'{product_name} Monthly AS Cases', fontsize=14, fontweight='bold')
                plt.xlabel('Month')
                plt.ylabel('Number of Cases')
                plt.xticks(rotation=45)
                plt.grid(True, alpha=0.3)
                y_max = max(monthly_counts.values) if len(monthly_counts) > 0 else 0
                offset = max(0.02 * y_max, 0.5)
                for bar, count in zip(bars, monthly_counts.values):
                    y = max(bar.get_height() - offset, bar.get_height() * 0.5)
                    plt.text(
                        bar.get_x() + bar.get_width() / 2,
                        y,
                        str(count),
                        ha='center',
                        va='top',
                        fontweight='bold'
                    )
                plt.tight_layout(pad=1.2)
                chart_filename_1 = f"{product_name.replace(' ', '_')}_monthly_cases.png"
                plt.savefig(chart_filename_1, dpi=300)
                plt.close()
                
                # Chart 2: monthly average rating
                plt.figure(figsize=(10, 6))
                plt.plot(monthly_ratings.index, monthly_ratings.values, marker='o', linewidth=2, markersize=8, color='red')
                plt.title(f'{product_name} Monthly Average Rating', fontsize=14, fontweight='bold')
                plt.xlabel('Month')
                plt.ylabel('Average Rating')
                plt.xticks(rotation=45)
                plt.grid(True, alpha=0.3)
                plt.ylim(0, 5)
                
                for x, y in zip(monthly_ratings.index, monthly_ratings.values):
                    plt.text(x, y + 0.1, f'{y:.2f}', ha='center', va='bottom', fontweight='bold')
                plt.tight_layout(pad=1.2)
                chart_filename_2 = f"{product_name.replace(' ', '_')}_monthly_ratings.png"
                plt.savefig(chart_filename_2, dpi=300)
                plt.close()
                
                chart_files = [chart_filename_1, chart_filename_2]
            except Exception:
                chart_files = []
        
        return {
            "success": True,
            "product_name": product_name,
            "stats": stats,
            "records": int(len(df_product)),
            "chart_files": chart_files,
        }
    except Exception:
        return {"error": "분석 실패"}


def save_data_func(args: Dict[str, Any]) -> Dict[str, Any]:
    global filtered_df
    try:
        if filtered_df is None or len(filtered_df) == 0:
            return {"error": "저장할 데이터가 없습니다. 먼저 filter_data로 데이터를 준비하세요."}
        filename = args.get("filename")
        if not filename:
            return {"error": "filename 파라미터가 필요합니다."}
        base, ext = os.path.splitext(filename)
        if ext.lower() != ".csv":
            filename = f"{base}.csv"
        filtered_df.to_csv(filename, index=False, encoding="utf-8")
        return {
            "success": True,
            "filename": filename,
            "rows": int(len(filtered_df)),
            "cols": int(len(filtered_df.columns)),
            "data_source": "filtered",
            "message": f"데이터 저장 완료: {filename} ({len(filtered_df)}개 레코드, {len(filtered_df.columns)}개 컬럼)",
        }
    except Exception:
        return {"error": "저장 실패"}

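⚠️ Parameters of filter_data_func that take specific values, such as 'product_name' and 'cs_category', must exactly match the values stored in the CSV for filtering to work. For example, if the data stores 'Smart watch' but the user types '스마트워치', the match can fail because of differences in language, letter case, or spacing.
In such cases, you can modify the function implementations with extra steps such as normalization or synonym mapping, or adjust the system prompt and function descriptions to steer the model toward the correct parameter values.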
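
One way to approach the normalization and synonym mapping mentioned above is sketched below. It is a minimal example under stated assumptions: the `SYNONYMS` table and the `resolve_product_name` helper are hypothetical, and the matching rule (Unicode NFKC normalization, whitespace removed, case ignored) is just one reasonable choice.

```python
import unicodedata
from typing import Dict, Iterable, Optional

# Hypothetical synonym table; extend it with the spellings your users actually type.
SYNONYMS: Dict[str, str] = {"스마트워치": "Smart watch", "smartwatch": "Smart watch"}

def _key(text: str) -> str:
    # Normalize Unicode, drop all whitespace, and ignore case.
    return "".join(unicodedata.normalize("NFKC", text).split()).casefold()

def resolve_product_name(raw: str, known_names: Iterable[str]) -> Optional[str]:
    # Map a user-supplied name onto the exact spelling stored in the CSV, if any.
    canonical = {_key(name): name for name in known_names}
    synonyms = {_key(alias): name for alias, name in SYNONYMS.items()}
    key = _key(raw)
    if key in canonical:
        return canonical[key]
    return synonyms.get(key)

names_in_csv = ["Smart watch", "Earbuds"]  # placeholder values for illustration
print(resolve_product_name("smart  WATCH", names_in_csv))
print(resolve_product_name("스마트워치", names_in_csv))
```

A helper like this could be applied to `args["product_name"]` at the top of filter_data_func, with the known names taken from the unique values of the loaded column.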

2.4. main.py

main.py is the main file that runs the multi-turn conversation scenario. It feeds in queries and a prompt that fit the scenario to show an example of the data analysis workflow.

from typing import List, Dict, Any
from fc_core import function_call, append_context


def multiturn_calling() -> None:
    # Run the multi-turn conversation scenario

    # Set the system prompt
    messages: List[Dict[str, Any]] = [
        {
            "role": "system",
            "content": "당신은 상품 주문에 대한 고객 CS 데이터를 분석하는 AI 어시스턴트입니다. 함수를 호출해서 CSV 파일을 불러오고, 데이터를 필터링하고, 데이터를 검사하고 분석한 뒤 결과를 바탕으로 사용자에게 도움이 되는 답변을 제공하세요. 일반적인 답변은 하지 말고, 항상 적절한 함수를 선택해서 호출해야 합니다. 사용자가 여러 작업을 요청하면 필요한 모든 함수를 호출하세요.",
        }
    ]

    # 1st turn: load the CSV + filter Smart watch "as" records
    # - The context summary is added only in turn 2 to avoid duplicate or excessive messages
    print("\n🔄 턴 1")
    # Write the user query
    user_query_1 = (
        "cs_data.csv 파일을 로드해줘. 그리고 Smart watch의 as 관련 데이터만 필터링해줘"
    )
    print(f" User: {user_query_1}")
    print("-" * 60)
    messages.append({"role": "user", "content": user_query_1})
    result_1 = function_call(messages)  # First call + tool execution + second call that generates the response

    # The assistant's natural-language reply is not added to the message history (the tool results are enough)

    print("✅ 턴 1 완료")
    print('-'*60)

    # 2nd turn: analyze the previously filtered Smart watch data + remove missing values + save as CSV
    # Checks that the functions run in sequence on filtered_df
    print("\n🔄 턴 2")
    # Write the second user query
    user_query_2 = (
        "text가 빈 값들은 제거하고 Smart watch에 대해서 월별로 시각화 분석 한 뒤 데이터를 watch_as.csv 파일로 저장해줘."
    )
    print(f" User: {user_query_2}")
    print('-'*60)

    append_context(messages)  # Summarize the turn 1 results and pass them to the model once
    messages.append({"role": "user", "content": user_query_2})
    result_2 = function_call(messages)

    # The assistant's natural-language reply is not added to the message history
    print("✅ 턴 2 완료")



def main() -> None:
    # Run the multi-turn example
    multiturn_calling()


if __name__ == "__main__":
    main()

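The main.py script runs the first and second turns back to back in one file to make the example easier to follow. In a real scenario, the user runs the first turn, checks the result, and then gives the next instruction. After the first run, the system keeps the summary context and internal state, so the user only needs to send the follow-up request.
To restructure the code for real use, replace the automatic second turn with an input loop that shows the first turn's result and then handles one turn each time the user sends a new request.
Keep messages, or at least conversation_state and filtered_df, per session; before each turn, append the context with append_context, then run function_call and print or store the result.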
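
The input loop described above might look roughly like the following. This is a sketch, not the cookbook's implementation: `run_session` is a hypothetical helper that takes `function_call` and `append_context` as arguments so it can be tried with stand-ins, and it assumes the result shape `{"message": {"content": ...}}` that function_call returns in fc_core.py.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Message = Dict[str, Any]

def run_session(
    inputs: Iterable[str],
    function_call: Callable[[List[Message]], Optional[Dict[str, Any]]],
    append_context: Callable[[List[Message]], None],
    system_prompt: str,
) -> List[str]:
    # Process one user request per turn, keeping a single messages list for the session.
    messages: List[Message] = [{"role": "system", "content": system_prompt}]
    replies: List[str] = []
    for text in inputs:
        if text.strip().lower() in {"exit", "quit"}:
            break
        # Pass the summary of earlier turns forward before the new request.
        append_context(messages)
        messages.append({"role": "user", "content": text})
        result = function_call(messages)
        # function_call may return None on failure, so fall back to an empty reply.
        reply = (result or {}).get("message", {}).get("content", "")
        replies.append(reply)
    return replies
```

With the real modules, something like `run_session(iter(input, ""), function_call, append_context, system_prompt)` would read requests from the terminal until an empty line is entered. Since build_summary returns an empty string before any data is loaded, calling append_context on the first turn adds nothing.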

3. Running the files

With the files above saved, running main.py produces the following output.

🔄 턴 1
 User: cs_data.csv 파일을 로드해줘. 그리고 Smart watch의 as 관련 데이터만 필터링해줘
------------------------------------------------------------

 함수 실행 1: load_data
Arguments: {'filename': 'cs_data.csv'}
함수 결과:
{
  "success": true,
  "filename": "/Users/user/fc/cs_data.csv",
  "rows": 500,
  "columns": 7,
  "data_info": "columns:\n   • id: int64\n   • date: object\n   • customer_id: object\n   • product_name: object\n   • cs_category: object\n   • text: object\n   • rating: int64"
}

 함수 실행 2: filter_data
Arguments: {'product_name': 'Smart watch', 'cs_category': 'as'}
함수 결과:
{
  "success": true,
  "total_count": 500,
  "filtered_count": 101,
  "filter_conditions": [
    "product_name: Smart watch",
    "cs_category: as"
  ],
  "preview_data": [
    {
      "id": 5,
      "date": "2025-01-03",
      "product_name": "Smart watch",
      "cs_category": "as",
      "text": "터치가 안 되는데 수리 가능한가요",
      "rating": 3
    },
    {
      "id": 6,
      "date": "2025-01-03",
      "product_name": "Smart watch",
      "cs_category": "as",
      "text": NaN,
      "rating": 3
    },
    {
      "id": 10,
      "date": "2025-01-04",
      "product_name": "Smart watch",
      "cs_category": "as",
      "text": "지문인식이 안 되는데 AS 가능한가요",
      "rating": 3
    }
  ],
  "message": "필터링 완료: 전체 500개 중 101개 데이터 추출 (필터링된 데이터가 저장되었습니다)"
}
✅ 턴 1 완료
------------------------------------------------------------

🔄 턴 2
 User: text가 빈 값들은 제거하고 Smart watch에 대해서 월별로 시각화 분석 한 뒤 데이터를 watch_as.csv 파일로 저장해줘.
------------------------------------------------------------

 함수 실행 1: check_data
Arguments: {'check_type': 'missing', 'action': 'remove', 'column': 'text'}
함수 결과:
{
  "success": true,
  "check_type": "missing",
  "action": "remove",
  "column": "text",
  "original_records": 101,
  "missing_removed": 2,
  "remaining_records": 99
}

 함수 실행 2: analyze_data
Arguments: {'product_name': 'Smart watch', 'bins': 12}
함수 결과:
{
  "success": true,
  "product_name": "Smart watch",
  "stats": {
    "count": 99,
    "mean": 2.4343434343434343,
    "std": 0.9914081529558191,
    "min": 1.0,
    "25%": 1.5,
    "50%": 3.0,
    "75%": 3.0,
    "max": 4.0
  },
  "records": 99,
  "chart_files": [
    "Smart_watch_monthly_cases.png",
    "Smart_watch_monthly_ratings.png"
  ]
}

 함수 실행 3: save_data
Arguments: {'filename': 'watch_as.csv'}
함수 결과:
{
  "success": true,
  "filename": "watch_as.csv",
  "rows": 99,
  "cols": 7,
  "data_source": "filtered",
  "message": "데이터 저장 완료: watch_as.csv (99개 레코드, 7개 컬럼)"
}
✅ 턴 2 완료

Result data and visualization images

watch_as.csv

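Wrapping up

In this cookbook, we looked at multi-turn automation based on Function calling, a core LLM capability. The scenario used basic data analysis tools, but you can add further functions, such as visualization, sentiment analysis, or summarization, for richer analysis. For ease of understanding, the example implemented the multi-turn flow in a single script run, but in a real environment you can take input step by step, check each result, and decide on the next task. This lets you add or modify turns as needed for flexible analysis.
We hope this cookbook helps you build up your Function calling skills and create your own automation solution! 🚀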