So, we all know hard prompts. You ask an LLM like LLaMA or ChatGPT: "what is a hard prompt?" Or you tell a diffusion model like Flux: "generate a horse with big eyes and a huge frog riding it, in anime style."
It's easy for a human to see what you want. But plain text is not the easiest thing for the model itself to work with, so people looked for a better way.
It turns out you can start from random tokens like "fhytghhffbg" or "jgygfcbcvhh" and train them into soft prompts: tokens that are not readable by humans but are perfectly understandable by the LLM, or by a diffusion model for images.
So, how do you transform a hard prompt into trained soft prompt tokens?
I will show the process using CLIP.
CLIP is a transformer, a large neural network used to associate text with images in diffusion image generation models. CLIP knows how things look and can pass this knowledge on when complex images are generated. It knows what a horse looks like, what an eye looks like, and so on.
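To make that concrete, here is a minimal sketch of what "associating text with images" looks like in code, using the Hugging Face transformers library and the openai/clip-vit-base-patch32 checkpoint (the checkpoint, the image path, and the candidate captions are my own illustrative choices, not from this post):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Frozen, pre-trained CLIP: one image encoder + one text encoder
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("horse.jpg")  # any local image; the path is just an example
captions = ["a photo of a horse", "a photo of a frog", "a serene landscape at sunrise"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher score = CLIP thinks this caption matches the image better
print(outputs.logits_per_image.softmax(dim=-1))
```

Diffusion models reuse the text half of this pairing: the prompt's CLIP embeddings steer what the image generator draws.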
What does the training process look like?
- Starting Point:
- Hard prompt: “image of serene landscape at sunrise with trees mountains and lake”
- Random soft tokens: [v1][v2][v3]…[v10] (initialized as random noise); decoded back to text, the v's look something like "tfhgdth", as shown earlier.
- Training Process:
    Original prompt → CLIP embeddings
                            ↑
    [v1][v2][v3] ──── train these to match
- Get embeddings of your hard prompt using CLIP
- Gradually update the soft tokens to capture the same meaning
- Use a loss function to make the soft tokens match the semantic meaning: the training loop minimizes the distance between the hard-prompt embeddings and the soft-token embeddings (see the code sketch below, after the Usage example)
- Result:
- Trained tokens like [v1][v2][v3]… that encode the “landscape” concept
- These might look like gibberish (e.g., “jX#@ kL$%”) but they encode the meaning
- Usage:
    # Original way
    "image of serene landscape at sunrise with trees mountains and lake"

    # With trained soft prompt
    "[v1][v2][v3] mountain view"   # shorter but carries the same meaning
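Here is a minimal sketch of the training loop described above, assuming the Hugging Face transformers CLIP text encoder, MSE as the distance, and ten soft tokens; the checkpoint name, token count, and learning rate are illustrative assumptions, not details from this post. To keep it short, the loop compares the pooled soft tokens directly to the hard prompt's pooled embedding instead of re-encoding [v1]…[v10] through frozen CLIP at every step, which is what a full implementation would do.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
text_encoder.requires_grad_(False)  # CLIP stays frozen; only the soft tokens learn

# Step 1: get the target — CLIP embeddings of the hard prompt
hard_prompt = "image of serene landscape at sunrise with trees mountains and lake"
inputs = tokenizer(hard_prompt, return_tensors="pt")
with torch.no_grad():
    target = text_encoder(**inputs).pooler_output  # (1, 512) summary embedding

# Step 2: start from random noise — ten trainable vectors playing the role of [v1]...[v10]
soft_tokens = torch.nn.Parameter(torch.randn(10, target.shape[-1]) * 0.02)
optimizer = torch.optim.Adam([soft_tokens], lr=1e-2)

# Step 3: gradually pull the soft tokens toward the hard prompt's meaning
for step in range(500):
    # Simplified distance: pooled soft tokens vs. pooled hard-prompt embedding.
    # A full implementation would run [v1]...[v10] through the frozen text
    # encoder each step and compare the encoder outputs instead.
    loss = torch.nn.functional.mse_loss(soft_tokens.mean(dim=0, keepdim=True), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# soft_tokens now encodes the "serene landscape" concept in CLIP's embedding space
```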
The key idea is:
- Hard prompt: Human-readable, long, descriptive
- Soft prompt: Machine-optimized tokens that encode the same meaning more efficiently
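For LLMs, this same recipe is packaged in the Hugging Face PEFT library (linked in the further reading below) under the name prompt tuning. A rough sketch of the setup; the model choice, number of virtual tokens, and init text are arbitrary examples of mine:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

base = "gpt2"  # any causal LM works; gpt2 is just a small example
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# 20 virtual tokens = the soft prompt; the base model's weights stay frozen
config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="image of serene landscape at sunrise",
    tokenizer_name_or_path=base,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the virtual-token embeddings are trainable
```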
Further reading:
Here is a curated list of sources that delve into the concept of soft prompts and their applications in natural language processing:
1. “Soft Prompts” – Learn Prompting
• Description: An in-depth exploration of soft prompts, detailing their development and applications.
• Link: https://learnprompting.org/docs/trainable/soft_prompting
2. “Soft Prompts” – Hugging Face
• Description: A comprehensive guide on soft prompts, discussing their implementation and benefits in model training.
• Link: https://huggingface.co/docs/peft/conceptual_guides/prompting
3. “Understanding Prompt Tuning: Enhance Your Language Models with Precision” – DataCamp
• Description: An article explaining prompt tuning and how soft prompts can be utilized to improve language model performance.
• Link: https://www.datacamp.com/tutorial/understanding-prompt-tuning
4. “Soft Prompts” – KoboldAI/KoboldAI-Client GitHub Wiki
• Description: A technical overview of soft prompts, including their creation and integration within the KoboldAI framework.
• Link: https://github-wiki-see.page/m/KoboldAI/KoboldAI-Client/wiki/Soft-Prompts
5. “The Power of Scale for Parameter-Efficient Prompt Tuning” – arXiv
• Description: A research paper discussing the scalability and efficiency of prompt tuning using soft prompts.
• Link: https://arxiv.org/abs/2104.08691
6. “Guiding Frozen Language Models with Learned Soft Prompts” – Google Research
• Description: An article exploring how soft prompts can be used to guide pre-trained language models without altering their core parameters.
• Link: https://research.google/blog/guiding-frozen-language-models-with-learned-soft-prompts/
These resources provide comprehensive insights into soft prompts and their applications in enhancing language model performance.
Warning: deep rabbit hole ahead