[Text-based Image Editing] Editing Images with Text Instructions Using InstructPix2Pix

InstructPix2Pix, announced in November 2022, is a diffusion-model-based image editing model that edits images according to text instructions.

In this article, we take sample images and edit them by giving arbitrary text instructions.

Everything can be implemented easily in Google Colab, so please read through to the end.

Contents

・What is InstructPix2Pix

・Setting up InstructPix2Pix

・Implementation

・Implementation examples


What is InstructPix2Pix

InstructPix2Pix is a diffusion-model-based image editing model that edits images according to text instructions.

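It combines the knowledge of two pretrained models, the language model GPT-3 and the text-to-image model Stable Diffusion, to generate a large dataset of image editing examples. A conditional diffusion model is then trained on this generated data, so at inference time it can edit real images from user-written instructions without per-image fine-tuning or inversion.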
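To make the shape of the generated data concrete: according to the paper, GPT-3 writes an edit instruction and an edited caption from an input caption, and Stable Diffusion renders an image for each of the two captions. The sketch below models one such record. The class name `EditExample`, its field names, and the sample strings and paths are hypothetical illustrations, not taken from the official code or dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditExample:
    """One hypothetical training record in the style described above."""

    input_caption: str    # caption describing the original image
    instruction: str      # edit instruction written by the language model
    edited_caption: str   # caption after the edit, also from the language model
    input_image_path: Optional[str] = None   # image rendered from input_caption
    edited_image_path: Optional[str] = None  # image rendered from edited_caption

    def training_triplet(self):
        """Return (input image, instruction, edited image), the triple the editor learns from."""
        return (self.input_image_path, self.instruction, self.edited_image_path)


# Hypothetical example record (invented for illustration)
example = EditExample(
    input_caption="a photo of a dog sitting on a bench",
    instruction="replace the dog with monkey",
    edited_caption="a photo of a monkey sitting on a bench",
    input_image_path="pair_0000/input.png",
    edited_image_path="pair_0000/edited.png",
)
print(example.training_triplet())
```

The captions are only needed to generate the image pair; the model itself is trained on the image, instruction, and edited image.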

You can give text instructions that edit an image in arbitrary ways, such as turning a person into an animal or changing the color of an object.

For example, as shown below, editing the image on the left with the text "replace the dog with monkey" produces the image on the right.

Source: https://www.timothybrooks.com/instruct-pix2pix

For details, see the links below.

Code: https://github.com/timothybrooks/instruct-pix2pix
Paper: https://arxiv.org/abs/2211.09800

Setting up InstructPix2Pix

Setup

From here on, we implement everything in Google Colab.


First, configure the runtime so that a GPU is available.

From the "Runtime" menu, choose "Change runtime type" and set "Hardware accelerator" to GPU.

Next, mount Google Drive.

# Mount Google Drive so the cloned repository and images persist across sessions
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive

Clone the official repository.

!git clone https://github.com/timothybrooks/instruct-pix2pix

%cd instruct-pix2pix

Install the required libraries.

!pip install git+https://github.com/huggingface/diffusers.git@69c76173faaa3831cc4bc6f19b60b4ac8e9e4473
!pip install transformers==4.25.1 accelerate==0.15.0 safetensors==0.2.8
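Because the commands above pin exact versions, a quick check can confirm what actually ended up in the runtime. This is a minimal sketch using the standard-library `importlib.metadata`; the `PINNED` dictionary simply mirrors the second pip command (diffusers is installed from a commit, so it is left out), and `check_pins` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the pip command above
PINNED = {
    "transformers": "4.25.1",
    "accelerate": "0.15.0",
    "safetensors": "0.2.8",
}


def check_pins(pins):
    """Return {package: installed_version_or_None} for every package that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)  # reads installed package metadata
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = installed
    return mismatches


problems = check_pins(PINNED)
print("All pinned versions installed" if not problems else f"Mismatched: {problems}")
```

This reports what is installed on disk; if a package was already imported earlier in the session, restarting the runtime may be needed before the new version is actually used.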

This completes the setup.

Importing Libraries and Downloading the Model

Next, import the required libraries and download the model.

import PIL.Image  # import the submodule explicitly so PIL.Image.open is available
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

# Use the GPU if the Colab runtime provides one
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

Specify the model ID and load the pipeline from the Hugging Face Hub.

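model_id = "timbrooks/instruct-pix2pix"
# Load half-precision (fp16) weights; safety_checker=None disables the NSFW filter
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, revision="fp16", safety_checker=None
)
# Optional: the Hugging Face model card example also switches to this scheduler
# pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)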

This completes the preparation.

Implementation

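Prepare the image you want to edit. This time we use the following image, saved as test.jpg in the current directory (the cloned instruct-pix2pix folder).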

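# Move the pipeline to the GPU (the float16 weights assume a GPU runtime)
pipe.to(device)
# Compute attention in slices to reduce GPU memory usage
pipe.enable_attention_slicing()

# Load the input image and make sure it has three RGB channels
image = PIL.Image.open("./test.jpg")
image = image.convert("RGB")
image  # the last expression in a Colab cell is displayed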
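Large photos can use a lot of GPU memory. Stable Diffusion works on latents at 1/8 of the image resolution, so keeping width and height at multiples of 8 is a safe choice. The helper below is a hypothetical sketch, not part of diffusers; the 512-pixel limit is an assumption, chosen because it is a size commonly used with Stable Diffusion 1.x based models.

```python
def fit_size(width, height, max_side=512, multiple=8):
    """Shrink (width, height) so the longer side is at most max_side (never upscale),
    then round each side down to a multiple of `multiple`."""
    longest = max(width, height)
    if longest > max_side:
        # Integer arithmetic avoids float rounding surprises
        width = width * max_side // longest
        height = height * max_side // longest
    new_w = max(multiple, width - width % multiple)
    new_h = max(multiple, height - height % multiple)
    return new_w, new_h


print(fit_size(4032, 3024))  # a 4:3 smartphone-sized photo
```

In the notebook it could be applied as `image = image.resize(fit_size(*image.size), PIL.Image.LANCZOS)` before calling the pipeline.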

prompt = "what would it look like if it were snowing?"
# image_guidance_scale: higher values keep the result closer to the input image
pipe(prompt, image=image, num_inference_steps=100, image_guidance_scale=0.4).images[0]
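The two parameters `image_guidance_scale` and `guidance_scale` correspond to the two guidance weights in the paper, s_I for the input image and s_T for the text instruction. The paper combines three noise predictions: one with no conditioning, one conditioned on the image only, and one conditioned on both image and instruction. The function below is a scalar sketch of that formula, not the diffusers source; its name and argument names are hypothetical, and it works elementwise on floats, NumPy arrays, or torch tensors.

```python
def combine_guidance(eps_uncond, eps_image, eps_full, image_guidance_scale, guidance_scale):
    """Two-way classifier-free guidance as described in the InstructPix2Pix paper:

    eps = eps_uncond + s_I * (eps_image - eps_uncond) + s_T * (eps_full - eps_image)

    eps_uncond: noise prediction with neither the image nor the text condition
    eps_image:  prediction conditioned on the input image only
    eps_full:   prediction conditioned on both the input image and the instruction
    """
    return (
        eps_uncond
        + image_guidance_scale * (eps_image - eps_uncond)  # pull toward the input image
        + guidance_scale * (eps_full - eps_image)          # pull toward the instruction
    )


print(combine_guidance(0.0, 1.0, 3.0, image_guidance_scale=1.5, guidance_scale=7.0))  # 15.5
```

Raising s_I amplifies the image term and keeps the edit closer to the original, while raising s_T amplifies the instruction term. With both weights at 1, the formula reduces to the fully conditioned prediction.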

Implementation Examples

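Let's also try this image. Load it into `image` the same way as above (PIL.Image.open followed by convert("RGB")) before running the cells below.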

# Same settings for every prompt: 20 steps, image_guidance_scale=1.5, guidance_scale=7
# (guidance_scale: higher values follow the text instruction more strongly)
prompt = "replace the dog with cat"
pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]

prompt = "replace the dog with bird"
pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]

prompt = "replace the dog with monkey"
pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]
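The three cells above differ only in the animal name. A small loop avoids the repetition and also gives each result a file name, which is handy when saving outputs to Google Drive. `run_edits` and `prompt_to_filename` are hypothetical helpers, and the stub edit function used here just echoes its inputs so the sketch runs without a GPU.

```python
import re


def prompt_to_filename(prompt, suffix=".png"):
    """Turn a prompt into a safe file name (lowercase, non-alphanumerics -> '_')."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return (slug or "result") + suffix


def run_edits(edit_fn, prompts, **kwargs):
    """Call edit_fn(prompt, **kwargs) for each prompt; return {file name: result}."""
    return {prompt_to_filename(p): edit_fn(p, **kwargs) for p in prompts}


prompts = [f"replace the dog with {animal}" for animal in ("cat", "bird", "monkey")]
# Stub in place of the real pipeline: returns the prompt and settings it received
results = run_edits(lambda p, **kw: (p, kw), prompts, num_inference_steps=20)
print(sorted(results))
```

With the real pipeline, `edit_fn` could be `lambda p, **kw: pipe(p, image=image, **kw).images[0]`, called with `num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7`, and each returned PIL image saved with `img.save(name)`.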

Summary

Thank you for reading to the end.

In this article, we introduced InstructPix2Pix.

The progress of image-related technologies, starting with Stable Diffusion, is truly remarkable.
