Train Custom Model
🎯 What is this?
Train a custom embedding model from your domain-specific text. The model will learn terminology and synonyms specific to your field (medical, legal, finance, etc.).
📋 Requirements:
- Minimum: 1000 words across at least 3 paragraphs
- Recommended: 5000-10000 words (50-100 paragraphs)
- File format: .txt or .md
- Max file size: 10 MB
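The requirements above can be checked before uploading. Below is a minimal local pre-check sketch. The function names `validate_corpus` and `split_paragraphs` are hypothetical and are not part of this tool. It also assumes that "10 MB" means 10 × 1024 × 1024 bytes, that files are UTF-8, and that paragraphs are separated by blank lines.

```python
import re
from pathlib import Path

# Limits copied from the requirements list; reading "10 MB" as
# 10 * 1024 * 1024 bytes is an assumption.
ALLOWED_SUFFIXES = {".txt", ".md"}
MAX_BYTES = 10 * 1024 * 1024
MIN_WORDS = 1000
MIN_PARAGRAPHS = 3


def split_paragraphs(text: str) -> list[str]:
    """Split on one or more blank lines, discarding whitespace-only chunks."""
    chunks = re.split(r"\n\s*\n", text)
    return [c.strip() for c in chunks if c.strip()]


def validate_corpus(path: str) -> list[str]:
    """Hypothetical pre-check: return a list of problems (empty = passes)."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"file too large: {size} bytes (max {MAX_BYTES})")
        return problems  # don't read an oversized file into memory
    text = p.read_text(encoding="utf-8")  # assumes UTF-8 input
    words = len(text.split())
    if words < MIN_WORDS:
        problems.append(f"too few words: {words} (min {MIN_WORDS})")
    paragraphs = len(split_paragraphs(text))
    if paragraphs < MIN_PARAGRAPHS:
        problems.append(f"too few paragraphs: {paragraphs} (min {MIN_PARAGRAPHS})")
    return problems
```

Collecting every problem instead of stopping at the first lets you fix the corpus in one pass. The only exception is the size check, which returns early to avoid reading a very large file.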
⚠️ Important: Training takes 5-60 seconds depending on corpus size, and the page redirects when training completes. Larger corpora (5000+ words) produce better synonym associations.
💡 Tips for Best Results
- Use domain-specific text: Medical records, legal documents, financial reports, etc.
- Include synonyms: Use both technical and common terms (pyrexia/fever, renal/kidney)
- More is better: 50-100 paragraphs produce better models than 10-20
- Consistent formatting: Separate paragraphs with blank lines
- Clean text: Strip excessive formatting and remove page headers and footers
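The "Consistent formatting" and "Clean text" tips can be partly automated. The sketch below is a minimal illustration. The function `clean_corpus` is hypothetical, and its heuristics are assumptions: lines that are Markdown headings (`# ...`) or contain only a page number are treated as headers and footers and dropped. Real documents may need different rules.

```python
import re

# Hypothetical heuristics: Markdown heading lines and lines holding only a
# page number (e.g. "12" or "Page 3") are treated as headers/footers.
HEADING = re.compile(r"^\s*#{1,6}\s")
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)


def clean_corpus(text: str) -> str:
    """Drop header/footer-like lines and normalise paragraph spacing."""
    kept = [
        line for line in text.splitlines()
        if not HEADING.match(line) and not PAGE_NUMBER.match(line)
    ]
    # Paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", "\n".join(kept))
    # Collapse internal whitespace, including single line breaks, to one space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    # Re-join with exactly one blank line between non-empty paragraphs.
    return "\n\n".join(p for p in cleaned if p)
```

A single line break inside a paragraph is treated as ordinary whitespace. Only blank lines start a new paragraph, which is why separating paragraphs consistently with blank lines matters.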
📚 Example Use Cases
- Medical: Train on medical textbooks to understand pyrexia↔fever, renal↔kidney, UTI↔bladder infection
- Legal: Train on legal documents to understand plaintiff↔claimant, liability↔responsibility
- Finance: Train on financial reports to understand liquidity↔cash flow, equity↔stock
- Technical: Train on documentation to understand domain-specific jargon
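This page does not say which training method the tool uses. The sketch below swaps in a simple count-based co-occurrence model to illustrate the general idea behind these use cases: words that appear in similar contexts, such as pyrexia and fever, end up with similar vectors. The three-sentence corpus and the helper names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def cooccurrence_vectors(paragraphs: list[str], window: int = 2) -> dict[str, Counter]:
    """Count the context words within `window` positions of each word."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for para in paragraphs:
        tokens = re.findall(r"[a-z]+", para.lower())
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: "fever" and "pyrexia" share identical contexts.
corpus = [
    "The patient presented with high fever and chills.",
    "The patient presented with high pyrexia and chills.",
    "The ultrasound showed a normal kidney and bladder.",
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["fever"], vecs["pyrexia"]))  # identical contexts
print(cosine(vecs["fever"], vecs["kidney"]))   # share only "and"
```

Trained embedding models such as word2vec learn dense vectors rather than raw counts, but they rely on the same principle of shared contexts. For that reason, a corpus that uses technical and common terms in similar sentences gives the model more evidence that the two terms are related.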