Sharing lessons from developing products that use generative AI as a core technology

PydanticOutputParser vs StructuredOutputParser

StructuredOutputParser

from langchain.output_parsers import ResponseSchema, StructuredOutputParser

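# Each ResponseSchema describes one output field. The descriptions are Japanese
# ("名前" = name, "体長(cm)" = body length in cm) and are copied verbatim into the prompt.
response_schemas = [
    ResponseSchema(name="name", description="名前", type="string"),
    ResponseSchema(name="length", description="体長(cm)", type="int"),
]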
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
print(output_parser.get_format_instructions())

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
        "name": string  // 名前
        "length": int  // 体長(cm)
}
```

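↑ Intuitive, simple instruction (StructuredOutputParser)

Long instruction → (PydanticOutputParser)

In my experience, StructuredOutputParser fails less often.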
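To make the comparison concrete, here is a minimal standard-library sketch of what consuming this format involves: find the fenced json block in a model reply and decode it. The helper name `parse_fenced_json` and the sample reply are hypothetical, and this is an illustration of the idea rather than LangChain's actual implementation.

```python
import json
import re

# Three backticks, built programmatically so this example nests cleanly.
FENCE = "`" * 3

# Matches a fenced block tagged "json" and captures its body.
_BLOCK = re.compile(re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE), re.DOTALL)

def parse_fenced_json(reply: str) -> dict:
    """Extract the first fenced json block from a model reply and decode it."""
    match = _BLOCK.search(reply)
    if match is None:
        raise ValueError("no fenced json block found in reply")
    return json.loads(match.group(1))

# Hypothetical model reply that follows the format instructions above.
reply = f'Sure!\n{FENCE}json\n{{"name": "sparrow", "length": 14}}\n{FENCE}'
bird = parse_fenced_json(reply)
print(bird["name"], bird["length"])
```

Because the target is a small fenced object rather than a full JSON Schema, there is little for the model to misread, which may be one reason the simpler instruction feels more robust.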

PydanticOutputParser

from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser

# Field descriptions are Japanese: "名前" = name, "体長(cm)" = body length in cm.
class Bird(BaseModel):
    name: str = Field(description="名前")
    length: int = Field(description="体長(cm)")

output_parser = PydanticOutputParser(pydantic_object=Bird)
print(output_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "名前", "title": "Name", "type": "string"}, "length": {"description": "体長(cm)", "title": "Length", "type": "integer"}}, "required": ["name", "length"]}
```
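The format instructions above distinguish a well-formatted instance from one that echoes the schema's `properties` wrapper. The following standard-library sketch makes that check explicit for the `name`/`length` schema. The function `check_instance` and the sample objects are hypothetical. Pydantic's own validation is considerably richer (and in its default mode coerces some values), so treat this only as an illustration of what "conforms to the schema" means here.

```python
import json

# The output schema printed above, trimmed to the keys this sketch checks.
SCHEMA = {
    "properties": {
        "name": {"type": "string"},
        "length": {"type": "integer"},
    },
    "required": ["name", "length"],
}

# JSON Schema primitive type name -> Python type (only the two used here).
_PY_TYPES = {"string": str, "integer": int}

def check_instance(instance: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the instance conforms."""
    problems = []
    for key in schema["required"]:
        if key not in instance:
            problems.append(f"missing required field: {key}")
    for key, spec in schema["properties"].items():
        if key in instance:
            value = instance[key]
            expected = _PY_TYPES[spec["type"]]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                problems.append(f"{key} should be {spec['type']}")
    return problems

good = json.loads('{"name": "sparrow", "length": 14}')
# The mistake the instructions warn about: echoing the "properties" wrapper.
bad = json.loads('{"properties": {"name": "sparrow", "length": 14}}')

print(check_instance(good, SCHEMA))
print(check_instance(bad, SCHEMA))
```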

Template	Format	Multimodal
PromptTemplate	String
ChatPromptTemplate	List of System/AI/Human messages
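The two formats in the table can be pictured with plain Python stand-ins. This is a rough sketch, not the LangChain classes themselves: the template text, speaker names, and the helper `flatten_history` are all hypothetical. It also shows the string-based workaround the deck mentions for multi-AI dialogue, where consecutive AI turns in a message list returned empty strings and the dialogue history was passed as a single string instead.

```python
# String form: one template, rendered into one string.
STRING_TEMPLATE = "Conversation so far:\n{history}\n\nNext speaker: {speaker}"

# Chat form: a list of (role, text) pairs using System/AI/Human roles.
chat_messages = [
    ("system", "You are AI-A in a group chat."),
    ("human", "Which bird is smaller?"),
    ("ai", "The sparrow, I think."),
    ("ai", "Agreed, about 14 cm."),  # two consecutive AI turns
]

def flatten_history(turns):
    """Render named turns as one string, one 'Speaker: text' line per turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)

# The same conversation with explicit speaker names instead of roles.
turns = [
    ("User", "Which bird is smaller?"),
    ("AI-A", "The sparrow, I think."),
    ("AI-B", "Agreed, about 14 cm."),
]
prompt = STRING_TEMPLATE.format(history=flatten_history(turns), speaker="AI-A")
print(prompt)
```

Flattening keeps each speaker's name in the text, which the role-based list cannot express once more than one AI takes part.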

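	Traditional ML models	Using LLMs
Updating	Goes through data preparation → training → offline evaluation → online evaluation	Easily changed via the model name or the prompt
Evaluation	Easy to quantify (e.g. regression, classification)	Natural-language conversational experiences are hard to evaluate; human evaluation is essential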

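Candidate	Result
Gemini 2.5 Flash-Lite	Same price, but perceived performance was clearly worse
Gemini 2.5 Flash (thinking off)	More than 3× the cost. Weak at following instructions and at handling long contexts; perceived performance seemed worse. Repeated the same utterances, replied to old messages, and fell into infinite loops in AI-to-AI dialogue.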
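Comparisons like the one above are easier to repeat when at least the mechanically checkable part is scripted. The following is a minimal offline-evaluation sketch under stated assumptions: the cases, the checks, and the two stub "models" are all hypothetical stand-ins, and in practice each model function would wrap a real LLM call. It does not replace the human evaluation that conversational quality needs.

```python
from typing import Callable

# Hypothetical fixed evaluation set: (prompt, check) pairs.
CASES = [
    ("Reply with the single word: sparrow", lambda out: out.strip() == "sparrow"),
    ("Reply with a number only: 14", lambda out: out.strip().isdigit()),
]

def pass_rate(model: Callable[[str], str]) -> float:
    """Fraction of cases whose output satisfies its check."""
    passed = sum(1 for prompt, check in CASES if check(model(prompt)))
    return passed / len(CASES)

# Stand-in "models"; each would wrap a real LLM call in practice.
def careful_stub(prompt: str) -> str:
    return prompt.split(": ", 1)[1]  # returns exactly the requested answer

def chatty_stub(prompt: str) -> str:
    return "Sure! " + prompt.split(": ", 1)[1]  # ignores the format constraint

for name, model in [("careful_stub", careful_stub), ("chatty_stub", chatty_stub)]:
    print(name, pass_rate(model))
```

Running the same fixed cases against every candidate turns "perceived performance" into a number that can be tracked across model or prompt changes.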

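Technologies used in developing products with generative AI at the core

Through a product for conversing with multiple AIs, and through day-to-day work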

Tomoki Yoshida

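Self-introduction

Tomoki Yoshida (吉田 知貴)

What you will learn today

Technical approaches for products with generative AI at the core

Approaches to product development as a whole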

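Technical approaches for products with generative AI at the core

Learning from a product for conversing with multiple AIs

Today's subject

Demo video: multi-AI, multimodal dialogue

System architecture

LLM architecture

Speaker-selection LLM - Learning structured output -

Using a dynamic schema

Structured output in LangChain

PydanticOutputParser vs StructuredOutputParser

StructuredOutputParser

PydanticOutputParser

Checking the behavior of with_structured_output

Improving the reliability of structured output

Retry

Asking another LLM to fix the output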

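Utterance-generation LLM - Challenges of multi-AI dialogue -

Cause of the error and how to fix it

Cause: when AIs speak consecutively, an empty string is returned

Solution: pass the dialogue history as a string

Multimodal support

We want to handle image input too!

What about multi-AI and multimodal together?

Multi-AI, multimodal support

Local LLM (Ollama) support

Speech synthesis - Approaches to reducing latency -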

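Approaches to product development as a whole

"Let's start with a high-performance model and switch to a cheaper one later."

"We can refine the prompt later."

Is that really OK?

The importance of evaluation - Changing models is inherently hard -

The importance of evaluation - Offline evaluation -

Prompt engineering

Basic techniques

Refining prompts

Context caching - Toward optimizing cost and speed -

Concrete example

Key points

Episode: the struggles of changing LLM models

Why use LangChain - Why not call the raw API directly -

On AI-driven development

Useful tools and sites for development

GitHub-related

MCP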

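Summary

Technical approaches for products with generative AI at the core

Approaches to product development as a whole

Thank you for your attention

Aside

FYI: DeNA generative AI new product development

Appendix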

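Speech recognition (ASR) - Implementation patterns -

Speech recognition - Approaches around voice activity detection (VAD) -

Approaches

Scene generation with the image generation feature

Gemini 2.0 Flash Preview Image Generation

FastSD CPU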
