Turing Post Korea
[Guest Column] Function Calling with Ollama, Llama 3, and Milvus

A step-by-step guide to connecting Llama 3.1 with external tools such as the Milvus vector DB and APIs

Stephen Batifol
August 19, 2024

I think of "function calling" as giving an LLM a connection point to the outside world. By connecting an LLM to functions you write yourself or to external APIs, you can build services that solve the problems you care about.

In this post, Zilliz's Stephen Batifol (a Milvus developer) briefly shows how to build powerful context-aware applications by connecting the recently released Llama 3.1 to Milvus, the open-source vector DB built by Zilliz, and to other APIs.

Understanding Function Calling

Today, LLMs such as GPT-4, Mistral Nemo, and Llama 3.1 can detect when a function needs to be called and write JSON containing the parameters required to call it.
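To make that concrete, here is a minimal sketch of what the application does with such JSON. The payload shape (a `name` plus a `parameters` object) is an assumption for illustration, loosely modeled on Llama 3.1's JSON tool-call format, and `get_weather` is a hypothetical function.

```python
import json


# Hypothetical tool that the application exposes to the model.
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "forecast": "sunny"})


# Registry mapping function names (as the model sees them) to Python callables.
TOOLS = {"get_weather": get_weather}


def handle_model_output(raw: str) -> str:
    """Parse a function-call JSON emitted by the model and run the matching function."""
    call = json.loads(raw)
    fn = TOOLS[call["name"]]
    return fn(**call["parameters"])


# Assumed model output: the model picked the function and filled in its parameters.
model_output = '{"name": "get_weather", "parameters": {"city": "Seoul"}}'
print(handle_model_output(model_output))  # {"city": "Seoul", "forecast": "sunny"}
```

The model only writes the JSON; the application is what actually executes the function and decides what to do with the result.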

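With function calling, AI application developers can go beyond the basic limitations of LLMs and build powerful AI applications such as:

LLM-based solutions that extract or tag data (for example, extracting people's names from Wikipedia) and put it to use
Applications that work by converting natural language into API calls or equivalent database queries
Conversational knowledge-retrieval engines over a specific knowledge base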
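As an illustration of the second use case, the sketch below assumes the model has already turned a question like "Which users in Seoul are over 30?" into structured arguments; the `users` table and the `find_users` tool are hypothetical, and an in-memory SQLite database stands in for a real application DB.

```python
import sqlite3

# Toy in-memory database standing in for a real application DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("Kim", "Seoul", 34), ("Lee", "Busan", 28), ("Park", "Seoul", 25)],
)


# Hypothetical tool: the model supplies the arguments, the application builds the query.
def find_users(city: str, min_age: int = 0) -> list[str]:
    rows = conn.execute(
        "SELECT name FROM users WHERE city = ? AND age > ? ORDER BY name",
        (city, min_age),
    ).fetchall()
    return [name for (name,) in rows]


# Arguments the model might produce for the question above (assumed, not real model output).
print(find_users(city="Seoul", min_age=30))  # ['Kim']
```

Passing the model's arguments through `?` placeholders, instead of pasting model-written SQL into the query, keeps the model from injecting arbitrary SQL.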

Let's look at what it takes to build this kind of AI application. The tools we'll use today are:

Ollama (Open Large Language Model Application)
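- Ollama is open-source software that makes it easy to run large language models (LLMs) on a local PC, and in particular makes Meta's Llama models easy to use.
- Everything needed to run an LLM is defined in a single file called a Modelfile, which covers the model data, its configuration, and how it is run.
- It is available on Windows, Linux, and macOS.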
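To illustrate the Modelfile idea, this sketch generates a small Modelfile as text. `FROM`, `PARAMETER`, and `SYSTEM` are Modelfile instructions; the `build_modelfile` helper, the parameter values, and the system prompt are assumptions chosen for the example.

```python
def build_modelfile(base: str, system: str, **params: object) -> str:
    """Return Modelfile text: a base model, PARAMETER lines, and a system prompt."""
    lines = [f"FROM {base}"]
    for key, value in params.items():
        lines.append(f"PARAMETER {key} {value}")
    lines.append(f'SYSTEM """{system}"""')
    return "\n".join(lines) + "\n"


text = build_modelfile(
    "llama3.1",
    "You are a concise assistant for flight questions.",
    temperature=0.2,
    num_ctx=8192,
)
print(text)
```

Saved to a file named `Modelfile`, it could be registered with `ollama create flight-assistant -f Modelfile` and started with `ollama run flight-assistant` (the name `flight-assistant` is arbitrary).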

Milvus 벡터 DB
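- A vector DB is a special type of database designed to process, index, and search unstructured data using embeddings produced by machine learning models. Instead of organizing data into tables as a traditional RDB (relational database) does, it represents and manages data as high-dimensional vectors called vector embeddings.
- There are many vector DBs on the market, and Milvus is one of them: an open-source vector DB.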
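The core operation of a vector DB, finding the stored vectors closest to a query vector, can be sketched in plain Python. The 3-dimensional "embeddings" below are made up for illustration; real embedding models output hundreds of dimensions, and a vector DB such as Milvus replaces this brute-force scan with indexes (for example, approximate nearest-neighbor indexes) so that search scales.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy "embeddings" keyed by the text they represent (made-up values).
store = {
    "AI history": [0.9, 0.1, 0.0],
    "Turing biography": [0.7, 0.6, 0.1],
    "Flight schedules": [0.0, 0.2, 0.9],
}


def search(query_vec: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force top-k search by cosine similarity, most similar first."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


print(search([1.0, 0.2, 0.0]))
```

Here the query lands closest to "AI history", then "Turing biography", while "Flight schedules" is left out of the top 2.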

Llama 3.1-8B
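- The smallest model in the Llama 3.1 family that Meta recently released. Compared with Llama 3, its context window has grown dramatically, from 8K to 128K tokens, and it handles multiple languages better.
- Most important for this post: unlike earlier versions, Llama 3.1 has function calling built in. Meta recommends Llama 3.1-70B or Llama 3.1-405B for function calling that must keep track of context across long conversations, but the 8B model handles a simple test like the one in this post without trouble.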

Using these tools, we'll walk through building a very simple AI application that runs through the following flow:

(Figure: Question-answering flow of the system)

Using Llama 3.1 with Ollama

Llama 3.1 has been fine-tuned for function calling. It supports single, nested, and parallel function calls, as well as multi-turn function calling. In other words, tasks that require multiple steps or complex parallel processing can be handled through function calls.
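The difference between parallel and nested calls can be sketched without a model. The tool-call lists below are hand-written stand-ins for what a model would emit, and `get_airport_code` and `lookup_flight` are hypothetical toy functions.

```python
import json


# Hypothetical tools for illustration only.
def get_airport_code(city: str) -> str:
    return {"New York": "NYC", "Los Angeles": "LAX"}.get(city, "UNKNOWN")


def lookup_flight(departure: str, arrival: str) -> str:
    return json.dumps({"route": f"{departure}-{arrival}"})


TOOLS = {"get_airport_code": get_airport_code, "lookup_flight": lookup_flight}


def run_calls(calls: list[dict]) -> list[str]:
    """Execute every tool call from a single model turn, in order."""
    return [TOOLS[c["name"]](**c["arguments"]) for c in calls]


# Parallel: one turn contains several independent calls.
codes = run_calls([
    {"name": "get_airport_code", "arguments": {"city": "New York"}},
    {"name": "get_airport_code", "arguments": {"city": "Los Angeles"}},
])
print(codes)  # ['NYC', 'LAX']

# Nested: the results of the first step become arguments of the next call,
# which the model can only emit after it has seen those results.
departure, arrival = codes
print(run_calls([
    {"name": "lookup_flight", "arguments": {"departure": departure, "arrival": arrival}},
]))
```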

In this post's example, we implement functions that make an API call to fetch flight times and that run a search in Milvus. Llama 3.1 decides which function to call based on the user's query.

Installing the Model and Libraries

First, we need to set up the environment for running the example. Download Llama 3.1 with Ollama:

ollama run llama3.1

This command downloads the model to your laptop or PC and gets it ready to use with Ollama. Next, install the required libraries:

! pip install ollama openai "pymilvus[model]"

This installs Milvus Lite, the version of Milvus that runs locally, together with the model package that embeds our data using models available in Milvus.

Creating Data in Milvus

Now let's insert some data into Milvus. This is the target data that the Llama 3.1 model will search and use.

from pymilvus import MilvusClient, model
embedding_fn = model.DefaultEmbeddingFunction()

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

vectors = embedding_fn.encode_documents(docs)

# The output vectors have 768 dimensions, matching the collection we create below.
print("Dim:", embedding_fn.dim, vectors[0].shape)  # Dim: 768 (768,)

# Each entity has id, vector representation, raw text, and a subject label.
data = [
    {"id": i, "vector": vectors[i], "text": docs[i], "subject": "history"}
    for i in range(len(vectors))
]

print("Data has", len(data), "entities, each with fields: ", data[0].keys())
print("Vector dim:", len(data[0]["vector"]))

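# Create a local Milvus Lite database file, (re)create the collection, and insert the data
client = MilvusClient('./milvus_local.db')

# Drop a collection left over from a previous run so the demo starts clean
if client.has_collection(collection_name="demo_collection"):
    client.drop_collection(collection_name="demo_collection")

client.create_collection(
    collection_name="demo_collection",
    dimension=768,  # The vectors we use in this demo have 768 dimensions
)

client.insert(collection_name="demo_collection", data=data)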

Running the code above creates a new collection containing three entities, each with a 768-dimensional vector.
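Before inserting, it can help to check that every entity matches the collection's schema. A minimal sketch of such a check, where `validate_entities` is a hypothetical helper and zero vectors stand in for real embeddings:

```python
def validate_entities(entities: list[dict], dim: int) -> None:
    """Raise ValueError if an entity is missing a field or has the wrong vector length."""
    required = {"id", "vector", "text", "subject"}
    for entity in entities:
        missing = required - set(entity)
        if missing:
            raise ValueError(f"entity {entity.get('id')} is missing {sorted(missing)}")
        if len(entity["vector"]) != dim:
            raise ValueError(
                f"entity {entity['id']} has {len(entity['vector'])} dims, expected {dim}"
            )


# Stand-in data shaped like the `data` list above, with zero vectors instead of embeddings.
fake = [
    {"id": i, "vector": [0.0] * 768, "text": f"doc {i}", "subject": "history"}
    for i in range(3)
]
validate_entities(fake, dim=768)  # passes silently
```

Catching a dimension mismatch here gives a clearer error than a failed insert.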

Defining the Functions

This example defines two functions: get_flight_times, which simulates an API call to look up flight times, and search_data_in_vector_db, which runs a search query against the Milvus DB.

search_data_in_vector_db uses the local Milvus DB (collection) we created earlier. get_flight_times, as you can see, keeps the data it needs inside the function itself, which is why we call it a simulation (a real application would call an external service that provides flight times).

from pymilvus import model
import json
import ollama

# Same default embedding model that was used to embed the documents
embedding_fn = model.DefaultEmbeddingFunction()

# Simulates an API call to get flight times
# In a real application, this would fetch data from a live database or API
def get_flight_times(departure: str, arrival: str) -> str:
    flights = {
        'NYC-LAX': {'departure': '08:00 AM', 'arrival': '11:30 AM', 'duration': '5h 30m'},
        'LAX-NYC': {'departure': '02:00 PM', 'arrival': '10:30 PM', 'duration': '5h 30m'},
        'LHR-JFK': {'departure': '10:00 AM', 'arrival': '01:00 PM', 'duration': '8h 00m'},
        'JFK-LHR': {'departure': '09:00 PM', 'arrival': '09:00 AM', 'duration': '7h 00m'},
        'CDG-DXB': {'departure': '11:00 AM', 'arrival': '08:00 PM', 'duration': '6h 00m'},
        'DXB-CDG': {'departure': '03:00 AM', 'arrival': '07:30 AM', 'duration': '7h 30m'},
    }

    key = f'{departure}-{arrival}'.upper()
    return json.dumps(flights.get(key, {'error': 'Flight not found'}))

# Search data related to Artificial Intelligence in a vector database.
# Uses `client`, the MilvusClient created in the previous code block.
def search_data_in_vector_db(query: str) -> str:
    query_vectors = embedding_fn.encode_queries([query])
    res = client.search(
        collection_name="demo_collection",
        data=query_vectors,
        limit=2,
        output_fields=["text", "subject"],  # specifies fields to be returned
    )

    print(res)
    return json.dumps(res)
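As its comment says, a real get_flight_times would query a live service. A hedged sketch of that variant: the endpoint `https://api.example.com/flights`, its `from`/`to` query parameters, and the helper names are all hypothetical. The fetcher is injectable so the function can be exercised offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

# Hypothetical endpoint; replace with a real flight-data provider's API.
FLIGHT_API_URL = "https://api.example.com/flights"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def get_flight_times_live(
    departure: str,
    arrival: str,
    fetch: Callable[[str], dict] = http_get_json,
) -> str:
    """Same contract as get_flight_times (returns a JSON string), backed by an HTTP API."""
    query = urllib.parse.urlencode({"from": departure.upper(), "to": arrival.upper()})
    try:
        data = fetch(f"{FLIGHT_API_URL}?{query}")
    except Exception as exc:  # network or decoding failure: report it to the model as data
        return json.dumps({"error": f"Flight lookup failed: {exc}"})
    return json.dumps(data)


# Offline check with a fake fetcher instead of a real network call.
def fake_fetch(url: str) -> dict:
    return {"url": url, "duration": "5h 30m"}


print(get_flight_times_live("nyc", "lax", fetch=fake_fetch))
```

Returning errors as JSON, like the original's `{'error': 'Flight not found'}`, lets the model explain the failure instead of the whole run crashing.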

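Letting the LLM Use the Functions

Next, we write the code that lets the LLM use the functions defined above. Rather than relying on a tool_choice parameter, Llama 3.1 uses a special prompt syntax for function calling; here we describe the functions through the tools argument of Ollama's chat call.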
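The tool definitions passed to `client.chat` below are JSON Schema dictionaries written by hand. A minimal sketch, assuming only `str`/`int`/`float`/`bool` parameters, that derives the same dictionary from a function's signature with `inspect`; `function_to_tool` is a hypothetical helper, not part of the Ollama library, and the stub mirrors the signature of this post's get_flight_times.

```python
import inspect
from typing import Callable

# Map Python annotations to JSON Schema type names (only the types assumed here).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool(fn: Callable, description: str, param_docs: dict[str, str]) -> dict:
    """Build a tool definition in the shape used by the `tools` list in run()."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {
            "type": JSON_TYPES.get(param.annotation, "string"),
            "description": param_docs.get(name, ""),
        }
        if param.default is inspect.Parameter.empty:  # no default, so it is required
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def get_flight_times(departure: str, arrival: str) -> str:  # stub with the post's signature
    return "{}"


tool = function_to_tool(
    get_flight_times,
    "Get the flight times between two cities",
    {
        "departure": "The departure city (airport code)",
        "arrival": "The arrival city (airport code)",
    },
)
print(tool["function"]["parameters"]["required"])  # ['departure', 'arrival']
```

Generating the schema from the signature keeps the description the model sees in sync with the function that actually runs.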

def run(model: str, question: str):
    # Ollama client (local to this function; the global Milvus `client` is unaffected)
    client = ollama.Client()

    # Initialize conversation with a user query
    messages = [{"role": "user", "content": question}]

    # First API call: Send the query and function description to the model
    response = client.chat(
        model=model,
        messages=messages,
        tools=[
            {
                "type": "function",
                "function": {
                    "name": "get_flight_times",
                    "description": "Get the flight times between two cities",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "departure": {
                                "type": "string",
                                "description": "The departure city (airport code)",
                            },
                            "arrival": {
                                "type": "string",
                                "description": "The arrival city (airport code)",
                            },
                        },
                        "required": ["departure", "arrival"],
                    },
                },
            },
            {
                "type": "function",
                "function": {
                    "name": "search_data_in_vector_db",
                    "description": "Search about Artificial Intelligence data in a vector database",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {
                                "type": "string",
                                "description": "The search query",
                            },
                        },
                        "required": ["query"],
                    },
                },
            },
        ],
    )

    # Add the model's response to the conversation history
    messages.append(response["message"])

    # If the model didn't decide to call a function, just show its reply
    if not response["message"].get("tool_calls"):
        print("The model didn't use the function. Its response was:")
        print(response["message"]["content"])
        return

    # Map the function names the model can request to their Python implementations
    available_functions = {
        "get_flight_times": get_flight_times,
        "search_data_in_vector_db": search_data_in_vector_db,
    }

    # Execute each function call the model requested
    for tool in response["message"]["tool_calls"]:
        function_to_call = available_functions[tool["function"]["name"]]
        function_args = tool["function"]["arguments"]
        function_response = function_to_call(**function_args)

        # Add the function's result to the conversation as a "tool" message
        messages.append(
            {
                "role": "tool",
                "content": function_response,
            }
        )

    # Second API call: Get final response from the model
    final_response = client.chat(model=model, messages=messages)

    print(final_response["message"]["content"])
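run() assumes the model always names a known function with well-formed arguments. A hedged sketch of a more defensive version of that lookup-and-call step: `call_tool` is a hypothetical helper, and the string-arguments branch covers OpenAI-compatible APIs, which return arguments as a JSON string rather than a dict.

```python
import json
from typing import Any, Callable


def call_tool(available: dict[str, Callable[..., str]], name: str, arguments: Any) -> str:
    """Run one tool call defensively and always return a string for the 'tool' message."""
    fn = available.get(name)
    if fn is None:
        return json.dumps({"error": f"Unknown function: {name}"})
    if isinstance(arguments, str):  # arguments delivered as a JSON string
        try:
            arguments = json.loads(arguments)
        except json.JSONDecodeError:
            return json.dumps({"error": "Arguments were not valid JSON"})
    try:
        return fn(**arguments)
    except TypeError as exc:  # missing/unexpected parameters or non-mapping arguments
        return json.dumps({"error": str(exc)})


# Toy tool for demonstration.
def echo(text: str) -> str:
    return json.dumps({"echo": text})


tools = {"echo": echo}
print(call_tool(tools, "echo", {"text": "hi"}))    # {"echo": "hi"}
print(call_tool(tools, "echo", '{"text": "hi"}'))  # {"echo": "hi"}
print(call_tool(tools, "missing", {}))             # {"error": "Unknown function: missing"}
```

Inside run(), the loop body could call `call_tool(available_functions, tool["function"]["name"], tool["function"]["arguments"])`, so a bad call becomes an error message the model can react to instead of an exception.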

Running the Example

Let's test whether the model can look up the time of a given flight:

question = "What is the flight time from New York (NYC) to Los Angeles (LAX)?"

run('llama3.1', question)

Running this gives the following result:

The flight time from New York (JFK/LGA/EWR) to Los Angeles (LAX) is approximately 5 hours and 30 minutes. However, please note that this time may vary depending on the airline, flight schedule, and any potential layovers or delays. It's always best to check with your airline for the most up-to-date and accurate flight information.

Next, let's check whether Llama 3.1 can run a vector search with Milvus.

question = "What is Artificial Intelligence?"

run('llama3.1', question)

This time the model calls search_data_in_vector_db, which prints the search results returned by Milvus:

data: ["[{'id': 0, 'distance': 0.4702666699886322, 'entity': {'text': 'Artificial intelligence was founded as an academic discipline in 1956.', 'subject': 'history'}}, {'id': 1, 'distance': 0.2702862620353699, 'entity': {'text': 'Alan Turing was the first person to conduct substantial research in AI.', 'subject': 'history'}}]"] , extra_info: {'cost': 0}
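Search results like these can be filtered before they go back to the model. A minimal sketch using the hits shown above: it assumes the collection's metric is a similarity (such as COSINE) where a larger `distance` means a closer match, as the descending order of the output suggests, and the 0.3 threshold and `relevant_texts` helper are arbitrary choices for illustration.

```python
# Search results shaped like the output above: one list of hits per query vector.
results = [[
    {"id": 0, "distance": 0.4702666699886322,
     "entity": {"text": "Artificial intelligence was founded as an academic discipline in 1956.",
                "subject": "history"}},
    {"id": 1, "distance": 0.2702862620353699,
     "entity": {"text": "Alan Turing was the first person to conduct substantial research in AI.",
                "subject": "history"}},
]]


def relevant_texts(results: list[list[dict]], min_score: float) -> list[str]:
    """Keep the texts of hits whose score clears a threshold (larger = more similar)."""
    return [hit["entity"]["text"] for hit in results[0] if hit["distance"] >= min_score]


print(relevant_texts(results, min_score=0.3))
# ['Artificial intelligence was founded as an academic discipline in 1956.']
```

Passing only the relevant text, rather than the raw result object, gives the model less noise to work through when it writes its final answer.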

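Wrapping Up

As mentioned earlier, calling functions through an LLM opens up the possibility of building applications that go beyond the LLM's own limitations and handle a wide variety of tasks. By connecting Llama 3.1 to Milvus or other third-party APIs, you can build powerful AI applications for the problems you want to solve.

This post showed only a very simple example, but we encourage you to build more sophisticated applications using the Milvus website, the code on GitHub, and other resources. If you're interested, join the Milvus community on Discord and connect with other developers.

*This post was originally published by Stephen Batifol on Zilliz.com. We thank Zilliz.com for its continued support of Turing Post Korea.

Thank you for reading. If you enjoyed this post, please recommend the newsletter to your friends and colleagues.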
