The most common way to write an AI image prompt is to type a single sentence like “a penguin using a computer” and hope the AI lands on something usable. Sometimes you get lucky. More often you regenerate seven or eight times. In Penchan’s experience, structuring the prompt raises the success rate from roughly 30% to roughly 70%. This article documents the method Penchan actually uses.

The Four-Layer Prompt Structure

Split the prompt into four blocks. Each block answers one question:

Layer 1: Subject. What should be drawn?

This is the most basic layer. Describe the main character, scene, and action. The more specific, the better. “A penguin” and “a small penguin wearing an orange scarf, sitting at a desk with an open laptop in front of it” produce completely different results.

Layer 2: Style. What style?

Watercolor, 3D render, pixel art, colored pencil, Japanese illustration, minimal line art. Style decides the overall “feel” of the image. Colored pencil and flat illustration are relatively safe choices that do not look too AI-generated.

Layer 3: Composition. How should it be arranged?

Camera angle (top-down, eye level, low angle), subject placement (center, left third), whitespace position (empty space on the right for text), aspect ratio (16:9 banner, 1:1 square).

Layer 4: Constraints. What should be avoided?

Many people ignore this layer, but it is very effective for output control. “No text,” “no yellow beak,” “no oversaturated colors,” “no photorealistic style.”

[Figure: Four-layer prompt structure]
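
As a minimal sketch, the four layers can be kept as separate fields and joined into one labelled line per layer. The class name `FourLayerPrompt`, its field names, and the English labels are assumptions made for illustration, not something Gemini or ChatGPT requires.

```python
from dataclasses import dataclass, field


@dataclass
class FourLayerPrompt:
    """Hypothetical container for the four layers described above."""

    subject: str
    style: str
    composition: str
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        # One labelled line per layer, so each block answers one question.
        lines = [
            f"Subject: {self.subject}",
            f"Style: {self.style}",
            f"Composition: {self.composition}",
        ]
        if self.constraints:
            lines.append("Constraints: " + ", ".join(self.constraints))
        return "\n".join(lines)


prompt = FourLayerPrompt(
    subject="a small penguin wearing an orange scarf, sitting at a desk with an open laptop",
    style="colored pencil, soft warm tones",
    composition="16:9 banner, penguin in the left third, empty space on the right for text",
    constraints=["no text", "no yellow beak", "no photorealistic style"],
)
print(prompt.render())
```

Keeping the layers separate makes it easy to change one layer between generations while leaving the other three untouched.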

Practical Gemini/ChatGPT Prompt Examples

These are formats Penchan has actually used in Gemini.

Example 1: Blog Cover Image

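Subject: A small penguin sitting at a desk with three monitors in front of it, each showing a different AI tool interface
Style: Colored pencil style, soft warm tones, slightly hand-drawn feel
Composition: 16:9 banner, penguin in the left third of the frame, empty space on the right for title text
Constraints: No photorealistic style, no overly sharp edges, no yellow pointed beak (the beak is orange and rounded)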

Example 2: Social Image

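Subject: A small penguin holding a magnifying glass, looking at a glowing snippet of code
Style: Flat illustration, distinct color blocks, slight texture
Composition: 1:1 square, subject centered, simple background
Constraints: No 3D effects, no gradient background, use a single light color for the background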

Example 3: Tutorial Step Diagram

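Subject: A simple flowchart with a microphone icon on the left, a gear icon for AI processing in the middle, and a subtitle text icon on the right, connected by arrows
Style: Line illustration, dark blue with orange, clean and crisp
Composition: 16:9 banner, three elements evenly spaced
Constraints: No realistic images, no extra decorative elements, if there is text use English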

What these examples share: they are clearly structured, with one line for each part. Penchan writes them in Chinese, and Gemini understands this format very well. It does not need the English-plus-`--`-parameter style that Midjourney uses.

More Scenario Prompts You Can Copy Directly

The three examples above are tool-oriented. The following are the scenarios Penchan switches between most often in real work.

Article Cover Image (Blog, Newsletter, Press Release)

Scenario: Main image for a blog article, newsletter, or press release. Usually 16:9, with space on the right for a title.
Best tools: Gemini/ChatGPT (first choice, strongest instruction following), Midjourney (after translating into English).
How to use: Fill in the topic and title keywords, then paste into the Gemini chat window.

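Subject: Three notebooks scattered on a desk, a steaming cup of coffee, and an open laptop whose screen shows a simple text editor
Style: Watercolor, soft morning light, slight paper texture
Composition: 16:9 banner, objects grouped in the left half, right half left empty for overlaying title text
Color palette: Warm beige background with light brown and pale blue, low overall saturation
Topic keywords: [fill in the topic, e.g. morning writing habit]
Forbidden: Text, logos, 3D effects, overly sharp edges, high-saturation vivid color blocks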

Penchan tip: A blog cover should echo the page’s main color. In practice, upload an existing cover first and tell Gemini to “refer to this image’s color tone.” Consistency improves a lot.

Social Post Image (IG, Threads, X)

Scenario: Square or 4:5 vertical image for a short post. It needs to catch attention and stop the scroll.
Best tools: Gemini, ChatGPT, Midjourney.
How to use: Choose the ratio by platform: 1:1 for X and Threads, 4:5 for IG and Facebook.

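Subject: A simple visual metaphor expressing [post topic, e.g. information anxiety]
Style: Flat illustration, distinct color blocks, a little hand-drawn irregularity
Composition: 1:1 square, main subject centered and slightly high, bottom third left free for overlaid text
Color palette: Low-saturation Morandi palette, main color deep blue-gray with a touch of warm orange
Mood: Quiet, with a bit of humor, like a friend mentioning a small thing
Forbidden: Text, facial close-ups, high-saturation neon, gradient backgrounds, 3D rendering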

Penchan tip: The biggest risk for social images is looking too similar to everyone else. Fix a color palette, such as deep blue-gray plus warm orange, and apply it to every post. Over time, followers will recognize the image as yours.
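
One way to keep the ratio and the palette from drifting between posts is to store them once and build every social prompt from them. This is only a sketch: `PLATFORM_RATIO`, `BRAND_PALETTE`, and `social_prompt` are hypothetical names, the platform keys are made up, and the template is a shortened English version of the prompt above.

```python
# Ratios as stated in this section; the dictionary keys are hypothetical identifiers.
PLATFORM_RATIO = {"x": "1:1", "threads": "1:1", "ig": "4:5", "facebook": "4:5"}

# The example palette from the tip above, fixed once and reused for every post.
BRAND_PALETTE = "deep blue-gray with a touch of warm orange"


def social_prompt(platform: str, topic: str) -> str:
    """Build a social-image prompt with the platform's ratio and the fixed palette."""
    ratio = PLATFORM_RATIO[platform.lower()]
    shape = "square" if ratio == "1:1" else "vertical"
    return "\n".join(
        [
            f"Subject: a simple visual metaphor expressing {topic}",
            "Style: flat illustration, distinct color blocks",
            f"Composition: {ratio} {shape}, main subject centered",
            f"Color palette: {BRAND_PALETTE}",
            "Forbidden: text, gradient backgrounds, 3D rendering",
        ]
    )


print(social_prompt("IG", "information anxiety"))
```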

Product Promo Image (E-commerce, Crowdfunding)

Scenario: Context image for an e-commerce product page or crowdfunding page. It should make people want to buy without looking like stock material.
Best tools: Gemini/ChatGPT (first choice, can upload product photo as reference), Midjourney (for atmosphere images).
How to use: Always upload a real product photo before using this prompt.

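Subject: Using the uploaded product as reference, place it in an everyday usage scene: [e.g. on a desk on a weekend afternoon, next to an open book and a cup of tea]
Style: Lifestyle photography feel, natural light, shallow depth of field
Composition: 4:5 vertical, product horizontally centered and sitting around the lower third of the frame, background above slightly blurred
Lighting: Side light entering from the upper right of the frame, casting soft shadows on the product
Mood: Slow, quiet, lived-in, like a moment captured casually
Forbidden: Plastic look, excessive smoothness, AI-looking people, handshakes and business-suit scenes, fabricated product details
Important: The product’s appearance, color, and logo must match the uploaded image exactly and must not be altered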

Penchan tip: The last line, “do not change product appearance,” is important. Gemini sometimes helpfully “beautifies” a product, but then the output differs from the real product by one shade and clients get angry.

Character Illustration (Avoiding AI Faces)

Scenario: A blog illustration needs a person. AI-generated faces often have unnatural eyes and teeth.
Best tools: Gemini, ChatGPT, Midjourney.
How to use: The key is avoiding front-facing close-ups and using back views or side profiles.

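Subject: A person sitting at a desk by a window, seen from behind or from the side, with a book and a pen at hand
Style: Hand-drawn colored pencil, visible paper texture, slightly shaky lines
Composition: 16:9 banner, figure in the left third of the frame, facial features not shown from the front
Angle: Looking down at 45 degrees from behind and above, showing the back of the head and shoulders, face turned toward the window
Color palette: Warm orange of afternoon sunlight with pale green, low saturation
Forbidden: Front-facing faces, close-ups of teeth, eyes looking into the camera, plastic-looking skin, perfect facial features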

Penchan tip: If the prompt contains words like “front-facing” or “close-up,” AI easily draws a strange face. Use descriptions like “back view,” “45-degree side face,” or “only up to the shoulders,” and it almost never goes wrong. If you really need a face, use real photo material or shoot it yourself.
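
A quick check before sending a character prompt can catch the risky words mentioned in the tip. The sketch below is an assumption-heavy illustration: `find_risky_terms` is a hypothetical helper, the term list contains only the two examples from the tip, and lines starting with "Constraints:" or "Forbidden:" are skipped because listing a term as something to avoid is intended.

```python
RISKY_TERMS = ("front-facing", "close-up")
NEGATIVE_LABELS = ("constraints:", "forbidden:")


def find_risky_terms(prompt: str) -> list[str]:
    """Return risky terms that appear outside the negative-constraint lines."""
    hits: list[str] = []
    for line in prompt.splitlines():
        lowered = line.strip().lower()
        if lowered.startswith(NEGATIVE_LABELS):
            continue  # terms listed as things to avoid are fine
        for term in RISKY_TERMS:
            if term in lowered and term not in hits:
                hits.append(term)
    return hits


risky = find_risky_terms(
    "Subject: front-facing close-up portrait of a writer\n"
    "Forbidden: front-facing faces"
)
safe = find_risky_terms(
    "Subject: a person at a desk, seen from behind\n"
    "Forbidden: close-ups of teeth"
)
print(risky, safe)
```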

Information Diagrams (Flowcharts, Comparison Diagrams)

Scenario: An article needs a simple diagram to explain a flow or comparison. This is not a formal infographic.
Best tools: Gemini/ChatGPT (can draw simple line diagrams), manual Figma work (most stable; AI-generated text is often blurry).
How to use: If the diagram contains text, ask AI to draw only the graphics and add the text manually in Figma.

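Subject: A simple three-step flowchart, three rounded rectangles arranged left to right, connected by arrows
Elements:
  Box 1: A sheet-of-paper icon, representing input data
  Box 2: A gear combined with an AI chip, representing processing
  Box 3: A speech-bubble icon, representing output
Style: Minimal line illustration, uniform stroke width, no fill or light fill only
Composition: 16:9 banner, three boxes evenly spaced, blank background
Color palette: Pure white background #FFFFFF, dark gray lines #2D3748, a little light blue #90CDF4 as accent
Forbidden: Any text (neither Chinese nor English), 3D, gradients, shadows, extra decoration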

Penchan tip: The final line, “no text of any kind,” is the key. AI-generated text is almost always blurry or wrong. It is better to leave the image empty and add clean Chinese text in Figma. This saves an entire retry round.

Reference Images: The Key to Consistency

Pure text prompts have a ceiling: AI can only guess the image in your head. Reference images can close that gap significantly.

The practical method is to upload the image directly to Gemini, then tell it: “Refer to this image’s style and character design, then generate the following content.”

This is especially useful for solving character consistency. For example, the brand penguin has an orange rounded beak, but real penguins in AI training data mostly have yellow pointed beaks. If you only emphasize “orange rounded beak” in text, the model often gets pulled back to the yellow pointed beak. Once you attach a reference image, the error rate drops noticeably.

[Figure: Before and after prompt optimization]

How to Reduce the AI Look

AI-generated images have an “AI look” that people can recognize at a glance: high saturation, overly smooth textures, edges that are unnaturally sharp, lighting that is too perfect, gradients. There are several ways to reduce it:

Specify a textured style. Colored pencil, watercolor, pastel, crayon. These styles naturally include irregular strokes and textures, so they look less AI-generated than 3D render styles.

Lower saturation. Add “soft tones,” “low saturation,” or “muted colors” to the prompt. AI’s default colors tend to be highly saturated. Once you pull them down, the whole image feels much more comfortable.

Add a little imperfection. “Slightly hand-drawn,” “edges should not be too sharp,” “natural lighting, not over-HDR.” These small instructions make the final image feel less overly clean.

Avoid styles AI is best at. Hyperrealistic portraits, sci-fi scenes, 3D product renders. These are AI’s comfort zones, and the result often looks obviously AI-made. Imperfect styles like colored pencil and hand-drawn illustration tend to have much less AI look.

Penchan brand images almost all use colored pencil style for a simple reason: they are the least likely to be recognized as AI-generated at first glance.
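
The wording suggested in this section can be appended to a style line mechanically. The sketch below assumes a hypothetical helper, `soften_style`, and uses only phrases quoted above; it skips any phrase the style line already contains.

```python
# Phrases quoted in this section for lowering saturation and adding imperfection.
DE_AI_HINTS = (
    "soft tones",
    "low saturation",
    "slightly hand-drawn",
    "edges should not be too sharp",
    "natural lighting, not over-HDR",
)


def soften_style(style: str, hints: tuple[str, ...] = DE_AI_HINTS) -> str:
    """Append each hint that the style line does not already contain."""
    lowered = style.lower()
    missing = [hint for hint in hints if hint.lower() not in lowered]
    return ", ".join([style, *missing]) if missing else style


softened = soften_style("colored pencil, low saturation")
print(softened)
```

Whether all five phrases are needed at once is a judgment call; passing a shorter `hints` tuple keeps the style line from growing too long.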

Pitfall: The Penguin Beak Story

This pitfall deserves its own section because it shows a fundamental limitation of AI image generation.

The brand penguin has an orange rounded beak. A very simple feature, but AI keeps drawing it wrong.

The first instinct was that the prompt was not clear enough, so the line “the penguin has an orange rounded beak, NOT yellow, NOT pointy” was added. It improved things, but errors still appeared occasionally.

The real reason is that there are millions of penguin photos in the model’s training data, and most penguins have yellow pointed beaks. No matter how much the prompt emphasizes the difference, the model’s “instinct” still pulls it back to a yellow pointed beak.

The final solution was to combine reference images with text constraints. Attach one reference image with the correct beak, and also write “orange rounded beak” explicitly in the prompt. Only after using both did the success rate stabilize.

Lesson: AI output is strongly tied to training data. When what you want differs from common patterns in the training data, text description alone is not enough. Give it a visual reference.

Prompt Writing Differences by Tool

| Comparison item | Gemini (Nano Banana Pro / Nano Banana 2) | Latest Midjourney | ChatGPT built-in (GPT Image 2.0) |
| --- | --- | --- | --- |
| Language | Chinese and English both work | English only | Chinese works (conversation auto-translates) |
| Format | Natural language, no special syntax | Needs parameters like --ar, --style | Natural language, conversational |
| Negative constraints | Directly write “do not XX” | Use --no parameter | Directly write “do not XX” |
| Reference images | Upload image plus text description | Use image URL plus /describe | ChatGPT conversation can attach images |
| Style control | Describe the style in text | --style raw plus style keywords | Describe in text, weaker control |
| Learning curve | Low | High | Low |

For model-version differences, also see Gemini Free vs Pro differences.
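
To make the format difference concrete, here is a rough sketch that flattens the same layers into a single Midjourney-style line using the `--ar` and `--no` parameters named in the comparison. The function `to_midjourney` is hypothetical, and it assumes the layer text has already been translated into English and that the items to avoid are written as plain nouns rather than “no ...” phrases.

```python
def to_midjourney(subject: str, style: str, ratio: str, avoid: list[str]) -> str:
    """Flatten layered prompt parts into one line with trailing parameters."""
    parts = [f"{subject}, {style}", f"--ar {ratio}"]
    if avoid:
        # The constraints layer becomes a comma-separated --no list.
        parts.append("--no " + ", ".join(avoid))
    return " ".join(parts)


mj_prompt = to_midjourney(
    subject="a small penguin holding a magnifying glass",
    style="flat illustration, distinct color blocks",
    ratio="1:1",
    avoid=["3d", "gradient background"],
)
print(mj_prompt)
```

In Gemini or ChatGPT the same constraints would simply stay as a natural-language line, which is part of why the learning curve is lower there.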

Complete Image Generation Workflow

The flow from idea to finished image:

  1. First decide the purpose and placement of the image
  2. Write the prompt with the four-layer structure (subject, style, composition, constraints)
  3. If a brand character is involved, attach a reference image
  4. Generate 3-4 images and choose the closest one
  5. If none are right, adjust the weakest layer in the prompt and generate again
  6. After choosing, do final tweaks in Figma (add text, adjust colors, crop)

The whole process takes about 5-15 minutes per image. A new scene takes longer the first time because it needs more rounds to find the right direction.
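
Steps 4 and 5 form a loop, which can be sketched as follows. Everything here is a placeholder: `iterate`, `render`, and the three callables are hypothetical, no real image API is called, and in practice the choosing and adjusting are done by a person looking at the results.

```python
from typing import Callable, Optional

Layers = dict[str, str]


def render(layers: Layers) -> str:
    return "\n".join(f"{name}: {text}" for name, text in layers.items())


def iterate(
    layers: Layers,
    generate: Callable[[str, int], list[str]],
    choose: Callable[[list[str]], Optional[str]],
    adjust: Callable[[Layers], Layers],
    max_rounds: int = 3,
) -> Optional[str]:
    """Generate a batch, pick the closest image, otherwise adjust and retry."""
    for _ in range(max_rounds):
        candidates = generate(render(layers), 4)  # step 4: a small batch
        chosen = choose(candidates)
        if chosen is not None:
            return chosen
        layers = adjust(layers)  # step 5: change the weakest layer only
    return None


# Stand-ins so the sketch runs without any image service.
def fake_generate(prompt: str, count: int) -> list[str]:
    return [f"image {i} for: {prompt.splitlines()[1]}" for i in range(count)]


def fake_choose(candidates: list[str]) -> Optional[str]:
    hits = [c for c in candidates if "low saturation" in c]
    return hits[0] if hits else None


def fake_adjust(layers: Layers) -> Layers:
    return {**layers, "Style": layers["Style"] + ", low saturation"}


result = iterate(
    {"Subject": "a small penguin", "Style": "colored pencil"},
    fake_generate,
    fake_choose,
    fake_adjust,
)
print(result)
```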

FAQ

How do I write AI image prompts that do not produce strange images?

The key is structure. Split the prompt into four blocks: subject, style, composition, and constraints. The more concrete each block is, the less likely the AI is to drift. Pay special attention to the constraints layer. Clearly telling AI what you do not want is more effective than only telling it what you want.

Why do AI-generated images look fake?

Usually it is a style problem. AI defaults toward high saturation, smooth textures, and overly sharp rendering. That is the so-called AI look. Specifying hand-drawn, watercolor, colored pencil, or other textured styles can significantly reduce it.

Can AI image prompts be written in Chinese?

It depends on the tool. Gemini and ChatGPT understand Chinese prompts well, so you can write directly in Chinese. Midjourney only accepts English, so translate the prompt yourself or ask AI to turn it into an English prompt.

What is the most commonly ignored part of a prompt?

The constraints layer. Most people only tell AI what they want, but not what they do not want. Adding negative constraints, such as no text, no oversaturation, or no yellow beak, can significantly reduce the number of regenerations.

How do I make AI generate images in a consistent style every time?

Reference images are the most effective method. Upload an image you have already approved, then ask AI to reference its style. Generating continuously within the same session also maintains some consistency, but a new session the next day can drift.


Penchan’s Take

Penchan first encountered AI image generation during the early Midjourney Discord-interface period. Later, the main tools shifted to Gemini and ChatGPT for a simple reason: they follow Chinese instructions well, allow direct reference image uploads, and make brand character consistency much more stable than pure text descriptions. Canva’s AI image generation was also tested for a while, but its gradient handling and overall texture did not fit, so Penchan did not return to it.

“Colored pencil + no gradients” is the fixed foundation for Penchan brand images. The reason is that AI’s default high-saturation, gradient, 3D-texture style is too easy to recognize at a glance. Colored pencil style comes with hand-drawn texture and irregularity, so it has the lowest chance of falling into the AI look.

Building a prompt library is also a habit formed over the past few years. Every time a good instruction structure is found, it gets saved. The next time a similar image is needed, changing a few words is much faster than starting from zero. The pen-pings series is the sharing format that organizes these frequently used prompts.
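
A prompt library does not need special tooling. As one possible sketch, templates can live in a JSON file and be filled in with `str.format` when needed. The file name, the function names, and the `{subject}` placeholder are all assumptions for illustration, not a description of how Penchan stores prompts.

```python
import json
import tempfile
from pathlib import Path


def load_library(path: Path) -> dict[str, str]:
    """Read all saved templates, or return an empty library if none exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_prompt(path: Path, name: str, template: str) -> None:
    """Add or overwrite one named template in the JSON library."""
    library = load_library(path)
    library[name] = template
    # ensure_ascii=False keeps Chinese prompts readable in the file.
    path.write_text(json.dumps(library, ensure_ascii=False, indent=2), encoding="utf-8")


def fill(template: str, **values: str) -> str:
    """Replace {placeholders} in a saved template."""
    return template.format(**values)


with tempfile.TemporaryDirectory() as tmp:
    library_file = Path(tmp) / "prompts.json"
    save_prompt(
        library_file,
        "blog_cover",
        "Subject: {subject}\nStyle: colored pencil, no gradients\nComposition: 16:9 banner",
    )
    template = load_library(library_file)["blog_cover"]
    filled = fill(template, subject="a small penguin reading a newsletter")

print(filled)
```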

Prompting has no finish line. Every time a tool version changes, methods that worked before may stop working, and different models produce different results. In the long run, the key to consistently producing usable images is building your own instruction library and iterating with tool versions, not clinging to one “god prompt.”


— Penchan