
T3_03/31_Planned Approach

Max Kuo
Website Users

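Progress on 03/31

The original plan was to train the AI with a different model:

llama3.3 (via Ollama)

Today I came across some other options:

  • Use Ollama (lightweight local LLM)

    Use case: a natural-language control layer that generates processing pipelines or prompt explanations

  • There is also LM Studio, but I have not tried it yet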

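🧠 Ollama: analyzes semantics and generates recommended processing instructions or explanations

🐍 Python server (FastAPI or Flask): processes images and runs the AI models (e.g. denoising, auto-enhancement, white balance)

🧱 Java Spring: the main back-end controller; calls the APIs and handles the data logic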


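After the user uploads an image:

  1. Automatically analyze likely problems with the image (e.g. blur, underexposure, color cast)
  2. Suggest processing steps (e.g. brighten, denoise, white balance)
  3. Show these steps in the front end so the user can tick the ones they want to run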
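The first two steps can also be approximated without a language model, which gives a baseline to compare the LLM suggestions against. The sketch below is a minimal rule-based version. It assumes the feature dictionary produced by analyze_image_features in the Python source further down; the function name suggest_steps, the step names, and every threshold are illustrative assumptions, not tuned values.

```python
from typing import Any, Dict, List

# Hypothetical thresholds chosen for illustration; tune them on real images.
BLUR_THRESHOLD = 100.0          # Laplacian variance below this is treated as blurry
DARK_THRESHOLD = 80.0           # mean gray level below this is treated as too dark
BRIGHT_THRESHOLD = 180.0        # mean gray level above this is treated as too bright
LOW_CONTRAST_THRESHOLD = 40.0   # gray-level std below this is treated as flat


def suggest_steps(features: Dict[str, Any]) -> List[str]:
    """Map image features to snake_case step names for the front end."""
    steps = []
    if features["blur_score"] < BLUR_THRESHOLD:
        steps.append("sharpen")
    if features["brightness"] < DARK_THRESHOLD:
        steps.append("increase_brightness")
    elif features["brightness"] > BRIGHT_THRESHOLD:
        steps.append("decrease_brightness")
    if features["contrast"] < LOW_CONTRAST_THRESHOLD:
        steps.append("enhance_contrast")
    return steps


# Feature values taken from the sample output later in this post.
example = {"blur_score": 69.93, "brightness": 174.57,
           "color_dominance": "red", "contrast": 55.71}
print(suggest_steps(example))  # ['sharpen']
```

A color cast cannot be detected from the dominant channel alone, since one channel is always the largest, so this sketch leaves color balance to a later step.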

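🖼️ User uploads an image

🎯 Python AI server: image analysis module (analyzes the image itself)

🧠 Ollama: language model helps generate suggestions (combined with the image features)

📦 Results are passed to Java Spring / the front end

✅ User selects processing steps in the front end

⚙️ The selected items are sent back for processing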
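The round trip in this flow (suggestions out, user selection back) needs some agreed message shape. The following is one possible sketch using plain JSON. The field names image_id, options, step, and selected, the helper names, and the SUPPORTED_STEPS set are all assumptions made for illustration; the real contract between the Python server and Java Spring is not defined in this post.

```python
import json
from typing import List

# Hypothetical set of steps the Python server knows how to run.
SUPPORTED_STEPS = {"adjust_brightness", "denoise", "color_balance"}


def build_suggestion_payload(image_id: str, suggested: List[str]) -> str:
    """JSON the Python server could return to Java Spring / the front end."""
    options = [{"step": s, "selected": False}
               for s in suggested if s in SUPPORTED_STEPS]
    return json.dumps({"image_id": image_id, "options": options})


def parse_selection(body: str) -> List[str]:
    """Read the user's choices back, keeping only supported, ticked steps."""
    data = json.loads(body)
    return [o["step"] for o in data["options"]
            if o.get("selected") and o["step"] in SUPPORTED_STEPS]


# Unknown step names from the model (here "desharpen") are dropped.
payload = build_suggestion_payload("img-1", ["denoise", "desharpen", "adjust_brightness"])
reply = json.loads(payload)
reply["options"][0]["selected"] = True  # simulate the user ticking "denoise"
print(parse_selection(json.dumps(reply)))  # ['denoise']
```

Filtering against a fixed set on both directions keeps a free-text model reply from triggering a step the server cannot run.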


Download Ollama from its official website.

Models used so far:


Python source code

import cv2
import numpy as np
import requests
import json

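def analyze_image_features(image_path):
    img = cv2.imread(image_path)
    if img is None:  # cv2.imread returns None instead of raising on failure
        raise ValueError(f"Could not read image (missing or unsupported): {image_path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # --- Blur estimate: variance of the Laplacian (lower = blurrier) ---
    blur_score = cv2.Laplacian(gray, cv2.CV_64F).var()

    # --- Brightness: mean gray level ---
    brightness = np.mean(gray)

    # --- Color cast: channel with the highest mean (OpenCV order is B, G, R) ---
    mean_colors = cv2.mean(img)[:3]
    mean_b, mean_g, mean_r = mean_colors
    dominant_color = max(("blue", mean_b), ("green", mean_g), ("red", mean_r), key=lambda x: x[1])[0]

    # --- Contrast: standard deviation of the gray levels ---
    contrast = gray.std()

    # Return the image feature description
    return {
        "blur_score": blur_score,
        "brightness": brightness,
        "color_dominance": dominant_color,
        "contrast": contrast
    }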

def generate_processing_prompt(features):
    blur = features["blur_score"]
    brightness = features["brightness"]
    contrast = features["contrast"]
    color = features["color_dominance"]

    prompt = (
        "Given the following image characteristics:\n"
        f"- Blur score: {blur:.2f} (lower = blurrier)\n"
        f"- Brightness: {brightness:.2f}\n"
        f"- Contrast: {contrast:.2f}\n"
        f"- Dominant color: {color}\n\n"
        "Suggest 1-3 suitable image enhancement steps using snake_case English terms, "
        "such as adjust_brightness, denoise, color_balance, etc. "
        "Also briefly explain your reasoning."
    )
    return prompt

# Models tried so far (kept for reference; pass one as `model` below)
model_select = ["mistral", "deepseek-r1", "llama3.3"]

def query_ollama(prompt, model="deepseek-r1"):
    # Local Ollama server; "stream": False returns the whole reply in one JSON object
    url = "http://localhost:11434/api/generate"
    headers = {"Content-Type": "application/json"}
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, headers=headers, data=json.dumps(payload))
    return (response.json().get("response", "No response from model."), model)



def translate_to_chinese_ollama(text_to_translate, model="llama2"):
    """
    Translate text into Chinese using Ollama and the given model.
    Args:
        text_to_translate (str): The text to translate.
        model (str, optional): The Ollama model to use. Defaults to "llama2".
    Returns:
        tuple: (translated Chinese text, name of the model used).
    """
    url = "http://localhost:11434/api/generate"
    headers = {"Content-Type": "application/json"}
    # The prompt stays in Chinese on purpose: it asks the model to translate into Chinese.
    prompt = f"將以下文字翻譯成中文:\n\n{text_to_translate}"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    try:
        response = requests.post(url, headers=headers, data=json.dumps(payload))
        response.raise_for_status()  # Raise on a non-2xx HTTP status
        return (response.json().get("response", "No response from model."), model)
    except requests.exceptions.RequestException as e:
        return (f"Translation failed: {e}", model)
    except json.JSONDecodeError as e:
        return (f"Malformed response: {e}", model)


# 🔧 Wrap the whole flow in one call
def analyze_and_recommend(image_path):
    features = analyze_image_features(image_path)
    prompt = generate_processing_prompt(features)
    print("📤 Prompt to Ollama:\n", prompt)
    (response, model) = query_ollama(prompt)
    print("📥 Model Suggestion:\n", response)
    return features, response, model

# For testing

if __name__ == "__main__":
    img_path = "1743387383341.jpg"  # ← replace with any image
    (features, response, model) = analyze_and_recommend(img_path)
    (chinese_response, use_model) = translate_to_chinese_ollama(response)
    print(chinese_response)
    # Note: the "prompt:" section of the log stores the feature dictionary
    with open(f"{model}_response.txt", "w", encoding="UTF-8") as file:
        file.write(f"model: {model}\n{'-'*2}Start{'-'*2}\n")
        file.write(f"prompt:\n{features}\n{'-'*10}\n")
        file.write(f"response:\n{response}\n{'-'*10}\n")
        file.write(f"chinese_response:\n{chinese_response}\n{'-'*3}End{'-'*3}")
    # print(f"\nmodel: {model}")

Sample output:

model: deepseek-r1
--Start--
prompt:
{'blur_score': np.float64(69.92813807859359), 'brightness': np.float64(174.56549217972815), 'color_dominance': 'red', 'contrast': np.float64(55.71121399112855)}
----------
response:
<think>
Okay, I'm looking at this user query where they want me to suggest some image enhancement steps based on the given characteristics. The image has a blur score of 69.93, brightness of 174.57, contrast of 55.71, and a dominant color red.

First, I need to understand what each characteristic means. A high blur score suggests that the image is somewhat blurry or out of focus. Brightness at 174 seems very high; maybe it's overexposed? Wait, typical brightness ranges vary by sensor type. I should consider if this might be a problem. High contrast could mean there are sharp areas and others are too bright, leading to harsh edges.

Dominant color red implies the image has a lot of red elements, so enhancing colors around that would make sense. Now, thinking about enhancement steps. The user asked for snake_case terms like adjust_brightness etc., so I should use those.

I'll start with desharpening because high blur can be fixed by sharpening edges to bring focus back. Then, denoising since even if the image isn't too noisy, it's better than having excess noise.

Adjusting brightness: Since it's 174, which is super bright. If it's overexposed, maybe reduce it a bit. Maybe half of that? Not sure about the range, but reducing could help with overall tones and prevent harshness.

Contrast enhancement is next because high contrast can lead to harsh edges, so smoothing that out would improve visual appeal without losing important details.

Color balance adjustment makes sense too since dominant color is red; adjusting might make it more balanced. Maybe subtract some red to bring in warmer tones or vice versa depending on the image.

Wait, but order matters. Desharping first because blur can be fixed by sharpening edges, then noise reduction as desharping could create some artifacts that need cleaning up. Then brightness adjustment, followed by contrast. Finally, color balance if necessary.

Putting it all together: desharpen, denoise, adjust_brightness, enhance_contrast, adjust_color_balance. That seems logical and covers the main issues without overcomplicating things.
</think>

Here are 3 suitable image enhancement steps based on the provided characteristics:

1. **desharpen** - The blur score of 69.93 suggests that the image may be slightly blurred or out of focus in some areas. Desharpening can help bring back sharpness and clarity to the edges and details.

2. **denoise** - While the brightness (174.57) is quite high, it might indicate overexposure rather than actual noise. However, if noise exists, denoising can help reduce it while preserving image detail.

3. **adjust_brightness** - The brightness value of 174.57 could potentially be too high and may cause overexposure or harshness in the image's tones. Adjusting the brightness down (e.g., to half its current value) can help balance the overall tone and prevent visual discomfort.

These steps are suggested based on the assumption that the high blur score indicates a need for desharpening, while the very high brightness may require adjustment if it is indeed overexposure. The contrast enhancement could also be considered as an additional step to smooth out sharpness changes caused by desharping.
----------
chinese_response:
以下是翻譯的中文版:

我在思考用戶提供的图像,它具有輕度模糊(blur score 69.93)、亮度(174.57)和对比度(55.71)各自特點。首先,我需要了解這些特點的含義。高模糊分數可能表示图像有一些模糊或是否對焦。高亮度(174)可能會讓影像Overexpose,我需要考慮這可能會導致問題。高对比度可能表示图像有許多明亮區域和其他地方是太亮,使得邊緣模糊。

dominant color red implied that the image has a lot of red elements,因此提高顏色會更有感觸。现在,我們思考進一步的優化步驟。用戶提供了snake_caseterminal,我需要使用這些詞彙。

我們開始 с desharpening,因為高模糊可以通過sharpning Edge來 RESTORE focus。然後,我們進行denoising,因為不論是否有多少騷動,image的noise應該被降低。

接下來,我們考慮adjust_brightness,因為高亮度(174)可能會讓影像Overexpose。如果需要,可以降低其中的一半來實現更好的整體調整。最後,我們進行contrast和color_balance adjustment,因為高对比度可能會導致邊緣模糊,而dominant color red implied that the image has a lot of red elements,這可以通過调整顏色來更好地平衡。

根據我們的思考,這些步驟是最Logical and straightforward的:desharpening first、denoising second、adjust_brightness third、contrast和color_balance adjustment last.

以下是三個適當的圖像優化步驟,基於提供特點:

1. **desharpen** - 高模糊(69.93)可能表示图像有些模糊或是否對焦。通過sharpning Edge來 RESTORE focus。
2. **denoise** - 高亮度(174)可能會讓影像Overexpose,所以進行denoising來reduce noise而Preserve image detail。
3. **adjust_brightness** - 高亮度(174)可能會讓影像Overexpose。如果需要,可以降低其中的一半來更好地平衡整體調整。
---End---
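The sample output above shows two things that make the raw reply hard to use directly: deepseek-r1 wraps its reasoning in a <think> block, which then gets translated along with the answer, and the step names only appear as bolded words in a numbered list. The sketch below is one way to clean this up before translating or building the checkbox list. The helper names strip_think_block and extract_steps are hypothetical, and the regular expression assumes the "1. **step_name** - ..." layout seen in this particular reply, which a model is not guaranteed to follow.

```python
import re
from typing import List


def strip_think_block(text: str) -> str:
    """Remove a <think>...</think> section such as the one deepseek-r1 emitted above."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


def extract_steps(text: str) -> List[str]:
    """Collect bolded step names from lines like '1. **denoise** - ...'."""
    pattern = r"^[ \t]*\d+\.[ \t]*\*\*([a-z_]+)\*\*"
    return re.findall(pattern, strip_think_block(text), flags=re.MULTILINE)


# Shortened version of the reply shown above.
sample = (
    "<think>\nI'll start with desharpening...\n</think>\n\n"
    "Here are 3 suitable image enhancement steps:\n\n"
    "1. **desharpen** - ...\n\n"
    "2. **denoise** - ...\n\n"
    "3. **adjust_brightness** - ...\n"
)
print(extract_steps(sample))  # ['desharpen', 'denoise', 'adjust_brightness']
```

Note that "desharpen" is a name the model made up rather than one of the examples in the prompt, so extracted names probably still need to be checked against the steps the server actually implements.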

I also looked for models on GitHub to test. For face restoration and sharpening of blurry photos (faces specifically): https://github.com/TencentARC/GFPGAN

python inference_gfpgan.py -i inputs -o results -v 1.3 -s 2

Image source: National Taiwan University Library, Digital Archive. Before: [image: 32] After: [image: 32 (1)]


Image denoising:

Damaged-region repair (inpainting):

Resolution enhancement (super-resolution):

Color restoration (colorization):

Detail enhancement (sharpening & contrast):
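For the last category, a classical baseline can be useful to compare any learned model against. The sketch below is a percentile-based linear contrast stretch written with NumPy only. It is a generic technique rather than anything taken from the models discussed in this post, and the function name stretch_contrast and the 2%/98% percentiles are assumptions chosen for illustration.

```python
import numpy as np


def stretch_contrast(gray: np.ndarray, low_pct: float = 2.0,
                     high_pct: float = 98.0) -> np.ndarray:
    """Linearly rescale an 8-bit grayscale image so the chosen percentiles map to 0 and 255."""
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return gray.copy()
    scaled = (gray.astype(np.float64) - lo) / (hi - lo) * 255.0
    return np.clip(scaled, 0, 255).astype(np.uint8)


# Synthetic low-contrast image with gray levels between 100 and 150.
demo = np.linspace(100, 150, 64 * 64).reshape(64, 64).astype(np.uint8)
out = stretch_contrast(demo)
print(demo.std() < out.std())  # True: the stretched image has a wider spread
```

The gray-level standard deviation used here is the same quantity the analysis script reports as "contrast", so the effect of this step can be checked with the existing feature extraction.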


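I. Goal: types of models that analyze how an image needs to be edited

1. Automatic image enhancement suggestion models

These models produce enhancement suggestions for an image, such as raising contrast, correcting color temperature, or sharpening.

Recommended models:

  • EnhanceGAN: GAN-based; automatically analyzes image content and outputs editing suggestions (e.g. exposure, color, saturation)
  • Deep Photo Enhancer (DPE): infers enhancement suggestions in the style of high-quality images from its training data
  • Zero-DCE / Zero-DCE++: for underexposed images; automatically analyzes the image and then enhances brightness, contrast, etc.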

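2. Image aesthetic assessment models

These models score an image and point out where it falls short.

Recommended models:

  • NIMA (Neural Image Assessment): developed by Google; predicts an image's aesthetic score, which can be used to guide improvements
  • Models trained on the AVA dataset: CNNs trained to analyze photo composition, lighting, etc.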

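3. Style recognition + suggestion models

Suited to cases where the image should get a stylized adjustment or be optimized toward a specific style.

Recommended models:

  • Style Classification CNN: identifies whether an image leans toward a fresh, vintage, HDR, or other style, then decides the editing strategy
  • PhotoStyle Transfer (PhotoWCT): makes editing suggestions or converts the image to a specific style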

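4. Question-answering and multimodal analysis models (advanced)

Multimodal models such as GPT-4V (Vision), BLIP-2, and LLaVA can look at the image and answer with editing suggestions.

Usage example:

  • Give the model an image plus a prompt: "How should this image be edited to look better?"
  • The model replies: "Increase brightness and contrast, lower the blue color temperature, boost saturation..."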
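Since the rest of this post already talks to Ollama over /api/generate, the same endpoint could be used for this image-plus-prompt pattern: for multimodal models it accepts an images list of base64-encoded files alongside the prompt. The sketch below only builds the request body and does not send it. The helper name build_vision_payload is hypothetical, and it assumes a vision-capable model such as llava has already been pulled locally.

```python
import base64
from typing import Any, Dict


def build_vision_payload(image_bytes: bytes, model: str = "llava") -> Dict[str, Any]:
    """Build a request body for Ollama's /api/generate with one attached image."""
    return {
        "model": model,  # assumed to be a vision-capable model available locally
        "prompt": ("How should this image be edited to look better? "
                   "Answer with 1-3 snake_case steps."),
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


# Placeholder bytes stand in for a real file read with open(path, "rb").read().
payload = build_vision_payload(b"placeholder image bytes")
print(sorted(payload.keys()))  # ['images', 'model', 'prompt', 'stream']
```

The body can then be posted the same way query_ollama does, for example with requests.post(url, json=payload).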

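II. Suggested combinations (image analysis + editing model)

| Image analysis model | Output | Paired editing model |
| --- | --- | --- |
| EnhanceGAN / NIMA | Needs exposure correction, color adjustment | EnlightenGAN, Deep WB, RetinexNet |
| Style Classifier | Grayish, no distinct style | Style Transfer (AdaIN, WCT) |
| GPT-4V / LLaVA | Natural-language suggestions | Custom editing pipeline (implemented with OpenCV/PIL) |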
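The pairing in the table above can be kept as data so the back end looks up candidate editing models from the analysis result instead of hard-coding branches. This is only a sketch: the dictionary keys are hypothetical labels invented here for the three kinds of analysis output, and the values are just the model names from the table, not loadable identifiers.

```python
from typing import Dict, List

# Transcription of the table; the keys are made-up labels for illustration.
ROUTING: Dict[str, List[str]] = {
    "exposure_or_color": ["EnlightenGAN", "Deep WB", "RetinexNet"],
    "gray_or_no_style": ["Style Transfer (AdaIN, WCT)"],
    "natural_language_suggestion": ["Custom OpenCV/PIL pipeline"],
}


def candidate_models(analysis_output: str) -> List[str]:
    """Return the editing models paired with an analysis result, or [] if unknown."""
    return ROUTING.get(analysis_output, [])


print(candidate_models("exposure_or_color"))  # ['EnlightenGAN', 'Deep WB', 'RetinexNet']
print(candidate_models("unknown_label"))      # []
```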