Gemini: Multimodal & Workspace

Pre-Flight Briefing

The Million-Token Multimodal Engine

Google's Gemini 1.5 Pro offers a context window of 1 million tokens, expandable to 2 million. More importantly, it is natively multimodal: it doesn't just read text, it also takes images, audio, and video as direct input.
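
To get a feel for what a million-token window holds, here is a minimal budgeting sketch. The per-second token rates are assumptions used for illustration (roughly the figures quoted for Gemini 1.5 media input), and the function names are hypothetical; check current documentation before relying on the numbers.

```python
# Assumed per-second token rates for media input. These are illustrative
# values, not taken from this lesson; verify them against current documentation.
VIDEO_TOKENS_PER_SECOND = 263  # assumption
AUDIO_TOKENS_PER_SECOND = 32   # assumption


def estimate_tokens(video_seconds: int = 0, audio_seconds: int = 0, text_tokens: int = 0) -> int:
    """Return a rough token count for a mixed text/audio/video prompt."""
    if min(video_seconds, audio_seconds, text_tokens) < 0:
        raise ValueError("durations and token counts must be non-negative")
    return (
        video_seconds * VIDEO_TOKENS_PER_SECOND
        + audio_seconds * AUDIO_TOKENS_PER_SECOND
        + text_tokens
    )


def fits_in_context(total_tokens: int, context_window: int = 1_000_000) -> bool:
    """Check whether an estimated prompt fits in the given context window."""
    return total_tokens <= context_window


one_hour_video = estimate_tokens(video_seconds=3600)
print(one_hour_video, fits_in_context(one_hour_video))
```

Under these assumed rates, one hour of video comes to about 947,000 tokens, which fits in a 1-million-token window, while two hours would need the 2-million-token window.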

Because it processes video natively, you can ask about specific moments in a clip (e.g., 'What happens at timestamp 1:04?').
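
A timestamp question like this can be generated programmatically when you need to ask about many moments in one video. The sketch below only builds the prompt text and does not call any Gemini API. The helper names are hypothetical, and the zero-padded MM:SS format is an assumption (the example above writes 1:04).

```python
def format_timestamp(total_seconds: int) -> str:
    """Convert a number of seconds into a zero-padded MM:SS string."""
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}"


def build_timestamp_question(total_seconds: int, question: str = "What happens") -> str:
    """Build a prompt that asks about one specific moment in an attached video."""
    return f"{question} at timestamp {format_timestamp(total_seconds)} in the attached video?"


# 64 seconds corresponds to the 1:04 mark used in the example above.
print(build_timestamp_question(64))
```

The resulting string is sent alongside the video itself. Keeping the timestamp format consistent across prompts makes the answers easier to compare.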

Gemini also features Workspace Grounding. Including a tag such as @Gmail or @GoogleDocs in your prompt lets the model, with your authorization, search your own Google Workspace data (emails, documents) and pull the relevant, up-to-date content into context before it answers.

Reference Examples

Workspace Grounding

@Gmail Find the latest email from John about the 'Project Alpha' budget. Cross-reference it with the budget numbers in @GoogleDocs 'Q3 Finances' and note any discrepancies.