<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[My AI Journal]]></title><description><![CDATA[My AI Journal]]></description><link>https://blogs.sirsho.xyz</link><generator>RSS for Node</generator><lastBuildDate>Wed, 06 May 2026 07:33:57 GMT</lastBuildDate><atom:link href="https://blogs.sirsho.xyz/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Build Smarter AI Apps with This LangGraph + Vercel AI SDK Starter Template 🚀]]></title><description><![CDATA[For the past few weeks, I’ve been exploring how to build more sophisticated AI applications – not just chatbots, but actual agents that can reason, search the web, interact with APIs, and deliver useful outcomes.
I wanted a starter project that could...]]></description><link>https://blogs.sirsho.xyz/build-smarter-ai-apps-with-this-langgraph-vercel-ai-sdk-starter-template</link><guid isPermaLink="true">https://blogs.sirsho.xyz/build-smarter-ai-apps-with-this-langgraph-vercel-ai-sdk-starter-template</guid><category><![CDATA[langchain]]></category><category><![CDATA[vercel ai sdk]]></category><category><![CDATA[Vercel]]></category><category><![CDATA[AI]]></category><category><![CDATA[chatgpt]]></category><category><![CDATA[React]]></category><category><![CDATA[Next.js]]></category><category><![CDATA[serpapi]]></category><category><![CDATA[#ai-tools]]></category><category><![CDATA[ai agents]]></category><category><![CDATA[opensource]]></category><dc:creator><![CDATA[Sirsho Chakraborty]]></dc:creator><pubDate>Mon, 09 Jun 2025 13:00:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749473915402/c2a4562c-ba99-45ad-927c-f0c156cc21a8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For the past few weeks, I’ve been exploring how to build more sophisticated AI applications – not just chatbots, but actual <em>agents</em> that can reason, search the web, interact with APIs, and deliver useful outcomes.</p>
<p>I wanted a starter project that could bring together the power of the <strong>Vercel AI SDK</strong>, the flexibility of <strong>LangGraph (via LangChain)</strong>, and the usability of a <strong>modern React frontend</strong>, all with built-in support for <strong>real-time web search using SerpAPI</strong>.</p>
<p>To my surprise, there wasn’t a good one out there.</p>
<p>So, I decided to build it myself.</p>
<p>👉 <a target="_blank" href="https://github.com/Sirsho29/dia-langchain"><strong>GitHub Repo</strong></a></p>
<hr />
<h2 id="heading-what-this-project-solves">🧠 What This Project Solves</h2>
<p>Most AI starter kits are either too simple or too fragmented – they might show you how to call an OpenAI model, or how to use LangChain chains, or how to build a React UI... but not all of it working together in a coherent, extendable structure.</p>
<p>This project aims to <strong>bridge that gap</strong> and give developers a solid foundation to build truly intelligent applications.</p>
<hr />
<h2 id="heading-tech-stack">🔧 Tech Stack</h2>
<p>Here’s what’s under the hood:</p>
<h3 id="heading-1-vercel-ai-sdk">1. <strong>Vercel AI SDK</strong></h3>
<p>Makes it easier to stream responses from LLMs with built-in support for OpenAI, Anthropic, and others. No need to handle SSE manually.</p>
<h3 id="heading-2-langchain-langgraph">2. <strong>LangChain + LangGraph</strong></h3>
<p>Used to create structured, multi-step flows that let the AI reason, take actions, and return meaningful outputs – not just one-shot completions.</p>
<p>LangGraph brings agent workflows to life through a graph-based architecture, which is perfect for branching logic and tool use.</p>
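<p>To make the idea concrete – this is a conceptual sketch in plain Python, not code from the repo (which uses the TypeScript APIs), and every node name here is made up – a graph-based agent is just a set of node functions plus an edge map that routes shared state between them until a terminal node is reached:</p>

```python
# Conceptual sketch of a graph-based agent loop (no LangGraph dependency).
# Node names and routing logic are illustrative only.

def decide(state):
    # Route to the search tool when the question needs fresh facts.
    state["route"] = "search" if "latest" in state["question"] else "answer"
    return state

def search(state):
    state["context"] = f"(web results for: {state['question']})"
    return state

def answer(state):
    state["output"] = f"Answer using {state.get('context', 'model knowledge')}"
    return state

NODES = {"decide": decide, "search": search, "answer": answer}
# Edges decide where to go after each node; None marks the end of the graph.
EDGES = {
    "decide": lambda s: s["route"],
    "search": lambda s: "answer",
    "answer": lambda s: None,
}

def run_graph(state, start="decide"):
    node = start
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node](state)
    return state

result = run_graph({"question": "latest LangGraph release"})
```

<p>LangGraph formalises exactly this pattern – typed state, conditional edges, cycles – so you don’t have to hand-roll the loop yourself.</p>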
<h3 id="heading-3-react-frontend">3. <strong>React Frontend</strong></h3>
<p>A clean and responsive interface to interact with your AI agents, using the <code>useStream</code> React hook for a real-time streaming UX.</p>
<h3 id="heading-4-serpapi">4. <strong>SerpAPI</strong></h3>
<p>Brings live, contextual web search into your LangGraph agents. Your AI can now look things up before answering, and soon – cite sources too.</p>
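<p>For a sense of what the search tool does under the hood: a SerpAPI call is just an HTTP GET with your query and key. A rough sketch (parameter names follow SerpAPI’s Google engine; the key below is a placeholder):</p>

```python
import urllib.parse

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"

def build_search_url(query, api_key, num_results=5):
    """Build the request URL for a SerpAPI Google search (nothing is sent here)."""
    params = {
        "engine": "google",
        "q": query,
        "num": num_results,
        "api_key": api_key,
    }
    return SERPAPI_ENDPOINT + "?" + urllib.parse.urlencode(params)

url = build_search_url("LangGraph tutorials", "YOUR_SERPAPI_KEY")
```

<p>The agent’s search tool wraps a request like this and feeds the organic results back into the graph as context for the model.</p>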
<hr />
<h2 id="heading-why-this-project-matters">⚡ Why This Project Matters</h2>
<p>If you’re building:</p>
<ul>
<li><p>AI copilots</p>
</li>
<li><p>Research agents</p>
</li>
<li><p>Knowledge assistants</p>
</li>
<li><p>Workflow automation bots</p>
</li>
</ul>
<p>...then starting with a strong, integrated foundation saves weeks of effort and gets you building features faster.</p>
<hr />
<h2 id="heading-whats-next">📌 What’s Next?</h2>
<p>This is just the beginning. Here’s what I’m planning to add next:</p>
<ol>
<li><p><strong>Source Attribution</strong><br /> Show which web links or sources were used to generate the answer – transparency matters.</p>
</li>
<li><p><strong>Composio Integration</strong><br /> Enable the agent to take action across multiple tools and APIs (e.g., Notion, Gmail, Google Calendar) with just one setup.</p>
</li>
<li><p><strong>Complex LangGraph Architectures</strong><br /> Think beyond simple tool calls — build real multi-step agents with memory, conditional flows, and fallback strategies.</p>
</li>
<li><p><strong>Mem0 for Memory</strong><br /> Persistent, contextual memory for users, allowing the agent to retain information across sessions and be truly helpful over time.</p>
</li>
</ol>
<hr />
<h2 id="heading-contribute-or-fork">🤝 Contribute or Fork</h2>
<p>This project is open-source and built to be extended. Whether you want to build your own custom AI product or contribute back, I’d love your feedback and ideas.</p>
<p>Check out the code and get started here:<br />👉 <a target="_blank" href="https://github.com/Sirsho29/dia-langchain"><strong>https://github.com/Sirsho29/dia-langchain</strong></a></p>
<p>If you find it useful, give it a star 🌟 and feel free to share what you build!</p>
<hr />
<h2 id="heading-lets-connect">💬 Let’s Connect</h2>
<p>If you’re working on AI apps, agents, or just interested in the future of intelligent tooling – let’s talk!<br />I’d love to learn from others building in this space.</p>
<hr />
<p><strong>#LangChain #LangGraph #VercelAI #ReactJS #SerpAPI #AIagents #LLMs #OpenSource #Hackernotes #AItools</strong></p>
]]></content:encoded></item><item><title><![CDATA[Let’s Decode Google I/O 2025]]></title><description><![CDATA[Some days you dive into code. Other days, you sit back, sip chai (or coffee if you must), and try to make sense of everything Sundar Pichai just casually dropped like it’s no big deal. Today’s one of those days. Google I/O 2025 was less of a keynote ...]]></description><link>https://blogs.sirsho.xyz/lets-decode-google-io-2025</link><guid isPermaLink="true">https://blogs.sirsho.xyz/lets-decode-google-io-2025</guid><category><![CDATA[Google]]></category><category><![CDATA[google io]]></category><category><![CDATA[google i/o 2025]]></category><category><![CDATA[AI]]></category><category><![CDATA[gemini]]></category><dc:creator><![CDATA[Sirsho Chakraborty]]></dc:creator><pubDate>Thu, 22 May 2025 09:17:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747904941725/e3d40db3-f2c6-49ce-8365-bc054a91f17d.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Some days you dive into code. Other days, you sit back, sip chai (or coffee if you must), and try to make sense of everything Sundar Pichai just casually dropped like it’s no big deal. Today’s one of those days. Google I/O 2025 was less of a keynote and more of a full-blown “AI is eating the world” performance. And honestly? I’m here for it.</p>
<p>Let’s break it down!</p>
<hr />
<h3 id="heading-shipping-at-a-relentless-pace-and-they-mean-it">🚢 <em>“Shipping at a Relentless Pace” – and they mean it</em></h3>
<p>Gone are the days when tech giants waited for I/O to launch their shiny toys. In this Gemini era, Google just yeets state-of-the-art models on a random Tuesday. The highlight? Gemini 2.5 Pro now <em>sweeps</em> LMArena like it’s cleaning house. They’ve cranked their Elo scores up by over 300 points since the OG Gemini Pro. Casual.</p>
<p>Also, the new <strong>Ironwood TPUs</strong>? 42.5 exaflops per pod. That’s not a typo. That’s just <em>obscene</em> compute power casually packed into a machine. Apparently, AI inference is the new GPU arms race – and Google’s lifting heavy.</p>
<hr />
<h3 id="heading-480-trillion-tokens-later">🌍 <em>480 Trillion Tokens Later…</em></h3>
<p>Let’s talk scale. Last year they were processing 9.7 trillion tokens a month. Now it’s <strong>480 trillion</strong>. That’s not growth. That’s exponential puberty. And it’s not just devs geeking out.</p>
<ul>
<li><p>400M+ monthly Gemini users</p>
</li>
<li><p>7M+ developers building with Gemini</p>
</li>
<li><p>Usage on Vertex AI? Up 40x.</p>
</li>
</ul>
<p>Google has clearly figured out how to get their models <em>in</em> – in your apps, in your systems, and soon, in your cereal.</p>
<hr />
<h3 id="heading-project-starline-google-beam-holograms-but-real">👋 Project Starline → <strong>Google Beam</strong>: Holograms, But Real</h3>
<p>Remember Starline – that futuristic 3D calling concept? It just got real. Now called <strong>Google Beam</strong>, it uses six cameras, real-time head tracking, and a new AI-first video model to give you full-on <em>3D presence</em>.<br />Think Zoom… but if Zoom went to the gym, studied optics, and came out with an HP partnership.</p>
<p>Also – <strong>real-time speech translation</strong> in Google Meet is here. Matching your voice, tone, and facial expressions. English ↔ Spanish to start. So yes, your meetings might finally be both <em>global</em> and <em>understandable</em>.</p>
<hr />
<h3 id="heading-project-astra-gemini-live">👁️ Project Astra → <strong>Gemini Live</strong></h3>
<p>Now this is where it gets wild. Gemini Live is giving full-on Black Mirror vibes – in a good way (hopefully). Your phone camera becomes the eyes of the assistant. People are already using it to prep for interviews, plan marathons, and probably ask it if their outfit slaps.</p>
<p>Screen sharing, file uploads, and real-time assistant magic — all baked into your phone. Rolling out on Android now, iOS catching up.</p>
<hr />
<h3 id="heading-project-mariner-agent-mode">🕹️ Project Mariner → <strong>Agent Mode</strong></h3>
<p>This one is huge. Google’s building the agent economy.<br />Agent Mode can now use a computer like a person – click things, search Zillow, filter listings, and even <em>schedule a tour</em>. With "teach and repeat", you only have to show it once.</p>
<p>It’s like having an intern. A <em>very</em> competent, tireless, slightly sentient intern.</p>
<p>Bonus points to Google for backing interoperability. Their Agent2Agent protocol is now playing nice with Anthropic’s Model Context Protocol. In plain English? Agents can now talk to each other like it’s the beginning of an AI Avengers crossover.</p>
<hr />
<h3 id="heading-personal-context-smart-replies-that-actually-sound-like-you">✉️ Personal Context – Smart Replies That <em>Actually</em> Sound Like You</h3>
<p>Gemini will soon dig through your Docs, Drive, and Gmail (with permission) to craft hyper-personalised smart replies.<br />Your friend asks for road trip tips. Gemini will find that chaotic itinerary from 2018, capture your tone, and maybe even sneak in your usual “cheers, bro” signoff.</p>
<p>Gmail replies that actually sound like you <em>wrote them</em>? My inbox might finally stand a chance.</p>
<hr />
<h3 id="heading-ai-mode-in-search-a-full-redesign">🔍 <strong>AI Mode in Search</strong> – A Full Redesign</h3>
<p>We’ve seen AI Overviews. But now, <strong>AI Mode</strong> is a full-on <em>tab</em> in Google Search.</p>
<ul>
<li><p>You can ask longer, more complex queries</p>
</li>
<li><p>You can follow up naturally</p>
</li>
<li><p>You’ll actually want to scroll down</p>
</li>
</ul>
<p>It’s now live in the U.S. and coming soon elsewhere. Google is effectively turning Search into a conversation — but with the world’s most overqualified librarian.</p>
<hr />
<h3 id="heading-gemini-25-pro-flash-deep-think">⚡ Gemini 2.5 Pro + Flash + Deep Think</h3>
<p>We’re now entering <strong>boss-level model mode</strong>:</p>
<ul>
<li><p>Gemini 2.5 Flash: Fast, cheap, and nearly as good as Pro.</p>
</li>
<li><p>Gemini 2.5 Pro: Getting a turbo boost called <strong>Deep Think</strong> — a new reasoning mode using parallel thinking.</p>
</li>
</ul>
<p>It’s like Gemini got a brain upgrade and now thinks in multiple tabs <em>simultaneously</em>.</p>
<hr />
<h3 id="heading-media-models-go-full-hollywood">🎨 <strong>Media Models Go Full Hollywood</strong></h3>
<p>Enter <strong>Veo 3</strong> (AI video with sound) and <strong>Imagen 4</strong> (top-tier AI images). These are already in the Gemini app. And there’s <strong>Flow</strong> – a new filmmaker tool that lets you stitch scenes and extend clips.</p>
<p>If you’re creative, this is your playground. If you’re not, Flow might make you one.</p>
<hr />
<h3 id="heading-the-big-picture">💡 The Big Picture</h3>
<p>What really stood out to me wasn’t just the firehose of features. It was about how <em>personal</em> this AI wave is getting.</p>
<ul>
<li><p>From personalization in Gmail</p>
</li>
<li><p>To immersive video calls</p>
</li>
<li><p>To AI assistants that know your context, tone, and tasks</p>
</li>
<li><p>And models that think better, faster, deeper</p>
</li>
</ul>
<p>Google’s clearly aiming for AI that <em>doesn’t just work</em> — it <em>works for you</em>. And maybe that’s the ultimate unlock: AI that understands your world, your files, your tone, your mess — and helps you get through the day like a silent, brilliant partner.</p>
<hr />
<h3 id="heading-final-thought">Final Thought</h3>
<p>Sundar ended his talk with a sweet anecdote about his dad being wowed by Waymo. It’s easy to forget that the stuff we build — the tech, the models, the hype — eventually lands in the hands of real people. People who are just trying to get home, or reply to an email, or call their family from another city.</p>
<p>And when that tech makes life a little easier, a little more magical — that’s when it hits different.</p>
<p>Google I/O 2025 wasn’t just about product updates. It was a quiet declaration:<br />The future is here. It just wants to be useful.</p>
<hr />
<p>If you're still reading, go play with the Gemini app — and maybe tell it to write your next email. You might be surprised by how much it sounds like… well, you.</p>
<p><strong>PS – If you haven’t played the I/O game yet, just go and check it out</strong> <a target="_blank" href="https://io.google/2025/puzzle"><strong>here</strong></a><strong>.</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747905256095/0184caef-df43-4755-b2c8-ad1292ddc48d.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[What Happens After You Hit “Batch”?]]></title><description><![CDATA[If you’ve read my previous post, you know how to structure a .jsonlfile, upload it, and create a batch request to OpenAI.
But what happens after the request is made?
This blog walks you through:

How to track a batch job

How to download and interpre...]]></description><link>https://blogs.sirsho.xyz/what-happens-after-you-hit-batch</link><guid isPermaLink="true">https://blogs.sirsho.xyz/what-happens-after-you-hit-batch</guid><category><![CDATA[AI]]></category><category><![CDATA[openai]]></category><category><![CDATA[chatgpt]]></category><category><![CDATA[#ai-tools]]></category><category><![CDATA[APIs]]></category><dc:creator><![CDATA[Sirsho Chakraborty]]></dc:creator><pubDate>Wed, 21 May 2025 13:10:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747832895951/22d1fdf8-e782-436c-afd1-bfaca2c34d09.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you’ve read <a target="_blank" href="https://blogs.sirsho.xyz/what-are-batch-apis-feat-openai">my previous post</a>, you know how to structure a <code>.jsonl</code> file, upload it, and create a batch request to OpenAI.</p>
<p>But what happens <em>after</em> the request is made?</p>
<p>This blog walks you through:</p>
<ul>
<li><p>How to track a batch job</p>
</li>
<li><p>How to download and interpret the output</p>
</li>
<li><p>What to do when something fails</p>
</li>
<li><p>A few underrated batch methods you should know</p>
</li>
</ul>
<hr />
<h2 id="heading-step-1-wait-and-watch-fetching-status">Step 1: Wait and Watch (Fetching Status)</h2>
<p>After creating a batch, the first thing you should do is <strong>check its status</strong>. This helps you:</p>
<ul>
<li><p>Know if it’s still validating, in progress, or completed</p>
</li>
<li><p>Ensure there were no silent failures</p>
</li>
</ul>
<p>You can poll the status like so:</p>
<pre><code class="lang-python">batch = client.batches.retrieve(batch_id=<span class="hljs-string">"your_batch_id"</span>)
print(batch.status)
</code></pre>
<p>Once the status flips to <code>"completed"</code> and <code>output_file_id</code> is available, you’re ready to extract the results.</p>
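<p>In practice you’ll want to poll in a loop rather than re-running a script by hand. Here’s a small helper – a sketch, where <code>fetch_status</code> is any zero-argument callable (for example, one wrapping <code>client.batches.retrieve</code>), and the terminal state names follow OpenAI’s documented batch statuses:</p>

```python
import time

# Statuses after which a batch will never change again (per OpenAI's docs)
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(fetch_status, poll_seconds=30, max_polls=2880):
    """Poll fetch_status() until the batch reaches a terminal state.

    fetch_status: zero-arg callable returning the current status string,
    e.g. lambda: client.batches.retrieve(batch_id=bid).status
    The defaults roughly cover the 24h completion window (2880 * 30s).
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("batch did not reach a terminal state in time")
```

<p>Call it with <code>wait_for_batch(lambda: client.batches.retrieve(batch_id=batch_id).status)</code> and branch on the returned status before fetching the output file.</p>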
<hr />
<h2 id="heading-step-2-get-the-output-safely">Step 2: Get the Output (Safely)</h2>
<p>Here’s the complete script I used to:</p>
<ul>
<li><p>Fetch the output from OpenAI</p>
</li>
<li><p>Save it in both <code>.jsonl</code> and <code>.json</code> formats</p>
</li>
<li><p>Cleanly extract just the <code>custom_id</code> and final response message</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

<span class="hljs-comment"># Load environment variables from .env file</span>
load_dotenv()

<span class="hljs-comment"># Initialize OpenAI client</span>
client = OpenAI(api_key=os.getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>))

<span class="hljs-comment"># Provide your completed batch ID</span>
batch_id = <span class="hljs-string">"batch_682aed00c6d88190990751eb7966abeb"</span>

<span class="hljs-comment"># Retrieve batch info</span>
batch = client.batches.retrieve(batch_id=batch_id)

<span class="hljs-comment"># Ensure it's completed and output file exists</span>
<span class="hljs-keyword">if</span> batch.status != <span class="hljs-string">"completed"</span> <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> batch.output_file_id:
    <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">f"Batch not ready or missing output_file_id. Status: <span class="hljs-subst">{batch.status}</span>"</span>)

<span class="hljs-comment"># Get the output content using `files.content`</span>
output_file = client.files.content(batch.output_file_id)

<span class="hljs-comment"># Save raw output as JSONL</span>
lines = output_file.text
<span class="hljs-keyword">with</span> open(<span class="hljs-string">"openai_calls/batch_output.jsonl"</span>, <span class="hljs-string">"w"</span>, encoding=<span class="hljs-string">"utf-8"</span>) <span class="hljs-keyword">as</span> f:
    f.write(lines)

<span class="hljs-comment"># Parse and save a clean JSON array</span>
<span class="hljs-keyword">with</span> open(<span class="hljs-string">"openai_calls/batch_output.json"</span>, <span class="hljs-string">"w"</span>, encoding=<span class="hljs-string">"utf-8"</span>) <span class="hljs-keyword">as</span> f:
    json_data = [
        {
            <span class="hljs-string">'custom_id'</span>: json.loads(line)[<span class="hljs-string">'custom_id'</span>],
            <span class="hljs-string">'response'</span>: json.loads(line)[<span class="hljs-string">'response'</span>][<span class="hljs-string">'body'</span>][<span class="hljs-string">'choices'</span>][<span class="hljs-number">0</span>][<span class="hljs-string">'message'</span>][<span class="hljs-string">'content'</span>]
        }
        <span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> lines.splitlines()
    ]
    json.dump(json_data, f, indent=<span class="hljs-number">4</span>)
</code></pre>
<p>This structure makes it easy to read, log, or even pipe into downstream analytics tools or databases.</p>
<hr />
<h2 id="heading-bonus-functions-you-should-know">Bonus Functions You Should Know</h2>
<p>Batch APIs come with a couple of helpful utilities that make your workflow smoother:</p>
<h3 id="heading-1-list-all-your-batches">1. List All Your Batches</h3>
<p>See what you’ve run recently:</p>
<pre><code class="lang-python">batches = client.batches.list()
</code></pre>
<p>Use this to track jobs across your team or workspace, especially when running multiple experiments.</p>
<hr />
<h3 id="heading-2-cancel-a-batch-if-you-catch-a-mistake">2. Cancel a Batch (If You Catch a Mistake)</h3>
<p>Did you spot an error in your batch input right after launching it? Cancel it before it starts processing:</p>
<pre><code class="lang-python">client.batches.cancel(batch_id=<span class="hljs-string">"your_batch_id"</span>)
</code></pre>
<p><em>Note:</em> You can cancel a batch while it’s <code>validating</code> or <code>in_progress</code>. Cancellation isn’t instant – the status moves to <code>cancelling</code> while in-flight requests finish (up to about 10 minutes), then to <code>cancelled</code>, and any responses completed before that point are still available in the output file.</p>
<hr />
<h2 id="heading-summary-life-after-batch-creation">Summary: Life After Batch Creation</h2>
<p>Here’s what a full batch lifecycle looks like:</p>
<ol>
<li><p><strong>Create</strong> → Upload input and start the batch</p>
</li>
<li><p><strong>Track</strong> → Poll for status until completed</p>
</li>
<li><p><strong>Download</strong> → Use the output file ID to fetch responses</p>
</li>
<li><p><strong>Parse</strong> → Extract insights, summaries, or tags from the JSONL</p>
</li>
<li><p><strong>Repeat or Cancel</strong> → Use <code>list()</code> to audit, or <code>cancel()</code> when needed</p>
</li>
</ol>
<p>If you're working with any asynchronous, large-scale LLM task, batch APIs are not just a convenience; they're an optimisation layer.</p>
<hr />
<h2 id="heading-whats-next">What’s Next?</h2>
<p>I’m currently chaining these batch summaries into:</p>
<ul>
<li><p>Embedding pipelines (for search)</p>
</li>
<li><p>Auto-tagging workflows (for knowledge org)</p>
</li>
<li><p>Notification systems (summarize &amp; alert)</p>
</li>
</ul>
<p>If you're exploring something similar, feel free to fork the script or ping me. I’ll also share more about embedding and search in future blogs.</p>
<p>Stay tuned.</p>
<hr />
<p><strong>Read the previous blog →</strong> <a target="_blank" href="https://blogs.sirsho.xyz/what-are-batch-apis-feat-openai">What are Batch APIs? feat. OpenAI</a><br /><strong>Docs:</strong> <a target="_blank" href="https://platform.openai.com/docs/guides/batch?lang=python">OpenAI Batch API Guide</a></p>
]]></content:encoded></item><item><title><![CDATA[What are Batch APIs? feat. OpenAI]]></title><description><![CDATA[🧠 TL;DR
Batch APIs are your best friends when you want to run large-scale LLM tasks – without maxing out your rate limits or sending requests one at a time like it’s 2022.
In this post:

Why batch APIs are useful

A real-world use case: summarizing ...]]></description><link>https://blogs.sirsho.xyz/what-are-batch-apis-feat-openai</link><guid isPermaLink="true">https://blogs.sirsho.xyz/what-are-batch-apis-feat-openai</guid><category><![CDATA[AI]]></category><category><![CDATA[openai]]></category><category><![CDATA[coding]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[chatgpt]]></category><dc:creator><![CDATA[Sirsho Chakraborty]]></dc:creator><pubDate>Mon, 19 May 2025 11:32:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747654288034/ed860093-d6a5-40ad-8a3f-25f3804328e2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-tldr">🧠 TL;DR</h2>
<p>Batch APIs are your best friends when you want to run large-scale LLM tasks – without maxing out your rate limits or sending requests one at a time like it’s 2022.</p>
<p>In this post:</p>
<ul>
<li><p>Why batch APIs are useful</p>
</li>
<li><p>A real-world use case: summarizing a bunch of CXO emails</p>
</li>
<li><p>How to set it up with OpenAI</p>
</li>
<li><p>What I learned (and what to watch out for)</p>
</li>
</ul>
<hr />
<h2 id="heading-the-problem">📨 The Problem</h2>
<p>If you’re building anything LLM-powered, this probably sounds familiar:</p>
<blockquote>
<p>“I’ve got hundreds of emails/docs/chats... and I want a summary for each.”</p>
</blockquote>
<p>Now imagine calling OpenAI's chat endpoint 500 times, one after another. You’ll:</p>
<ul>
<li><p>Hit rate limits</p>
</li>
<li><p>Burn through API tokens inefficiently</p>
</li>
<li><p>Lose time and, frankly, patience</p>
</li>
</ul>
<p>So instead, we use…</p>
<hr />
<h2 id="heading-enter-batch-apis">🧩 Enter: Batch APIs</h2>
<p><strong>Batch APIs</strong> let you send a bunch of requests together – in a single file – and OpenAI will process them asynchronously on their side.</p>
<p>Here’s what makes them awesome:</p>
<ul>
<li><p>✅ More efficient than real-time calls</p>
</li>
<li><p>✅ No need to manage retries or throttling</p>
</li>
<li><p>✅ Great for summarization, embedding, tagging, etc.</p>
</li>
</ul>
<p>📌 <em>Important</em>: As of now, OpenAI only supports a <strong>24h completion window</strong>. That means your batch gets processed within a day.</p>
<p>👉 <a target="_blank" href="https://platform.openai.com/docs/guides/batch?lang=python">OpenAI Docs: Batch APIs</a></p>
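<p>Concretely, every line of the input file is one self-contained JSON request object. A minimal sketch of a single line (the <code>custom_id</code>, model name, and prompt are placeholders):</p>

```python
import json

# One request per line of the .jsonl input file (all values are placeholders)
line = json.dumps({
    "custom_id": "task-0",  # your own ID, echoed back so you can match outputs
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4.1-nano",
        "messages": [{"role": "user", "content": "Summarize this email ..."}],
    },
})
```

<p>Write a few hundred of these lines to a file, upload it, and OpenAI works through them asynchronously – that’s exactly what the walkthrough below builds.</p>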
<hr />
<h2 id="heading-real-use-case-summarizing-emails-from-cxos">🛠️ Real Use Case: Summarizing Emails from CXOs</h2>
<p>I prepared a dataset of internal and external emails (generated with ChatGPT) – like this one:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"from"</span>: <span class="hljs-string">"customer@loyalclient.com"</span>,
  <span class="hljs-attr">"to"</span>: <span class="hljs-string">"ceo@company.com"</span>,
  <span class="hljs-attr">"subject"</span>: <span class="hljs-string">"Praise and Feedback: Exceptional Support Experience"</span>,
  <span class="hljs-attr">"body"</span>: <span class="hljs-string">"I wanted to personally commend your support team—especially Priya and Omar..."</span>
}
</code></pre>
<p>And I wanted to generate short summaries like:</p>
<blockquote>
<p>"Michael O'Connor from LoyalClient Corp praised Priya and Omar for excellent integration support."</p>
</blockquote>
<p>So I did what any tired dev would do – I batch-processed all of them with GPT-4.1 nano using OpenAI’s Batch API.</p>
<hr />
<h2 id="heading-step-by-step-code-to-batch-like-a-pro">🧪 Step-by-Step: Code to Batch Like a Pro</h2>
<h3 id="heading-1-first-install-the-required-package">1. First, install the required package:</h3>
<pre><code class="lang-bash">pip install openai python-dotenv
</code></pre>
<h3 id="heading-2-set-up-your-environment">2. Set up your environment:</h3>
<p>Make sure your <code>.env</code> file contains:</p>
<pre><code class="lang-plaintext">OPENAI_API_KEY="your-api-key-here"
</code></pre>
<hr />
<h3 id="heading-3-python-code-batch-creation">3. Python Code (Batch Creation)</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI
<span class="hljs-keyword">import</span> json, os
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

load_dotenv()
client = OpenAI(api_key=os.getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>))

emails = json.load(open(<span class="hljs-string">"./dataset/email_samples.json"</span>, <span class="hljs-string">"r"</span>, encoding=<span class="hljs-string">"utf-8"</span>))

<span class="hljs-keyword">with</span> open(<span class="hljs-string">"openai_calls/batch_input.jsonl"</span>, <span class="hljs-string">"w"</span>) <span class="hljs-keyword">as</span> f:
    <span class="hljs-keyword">for</span> i, email <span class="hljs-keyword">in</span> enumerate(emails):
        prompt = <span class="hljs-string">f"From: <span class="hljs-subst">{email[<span class="hljs-string">'from'</span>]}</span>\nSubject: <span class="hljs-subst">{email[<span class="hljs-string">'subject'</span>]}</span>\n\n<span class="hljs-subst">{email[<span class="hljs-string">'body'</span>]}</span>"</span>
        obj = {
            <span class="hljs-string">"method"</span>: <span class="hljs-string">"POST"</span>,
            <span class="hljs-string">"url"</span>: <span class="hljs-string">"/v1/chat/completions"</span>,
            <span class="hljs-string">"body"</span>: {
                <span class="hljs-string">"model"</span>: <span class="hljs-string">"gpt-4.1-nano-2025-04-14"</span>,
                <span class="hljs-string">"messages"</span>: [
                    {
                        <span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>,
                        <span class="hljs-string">"content"</span>: <span class="hljs-string">"You are an email summarizer. Summarize this email in 2–3 sentences. Make sure to include all important pointers in the email."</span>
                    },
                    {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: prompt}
                ]
            },
            <span class="hljs-string">"custom_id"</span>: <span class="hljs-string">f"email-<span class="hljs-subst">{i}</span>"</span>
        }
        f.write(json.dumps(obj) + <span class="hljs-string">"\n"</span>)
</code></pre>
<hr />
<h3 id="heading-4-upload-the-input-file-to-openai">4. Upload the input file to OpenAI</h3>
<pre><code class="lang-python">batch_input_file = client.files.create(
    file=open(<span class="hljs-string">"openai_calls/batch_input.jsonl"</span>, <span class="hljs-string">"rb"</span>),
    purpose=<span class="hljs-string">"batch"</span>
)
batch_input_file_id = batch_input_file.id
print(<span class="hljs-string">f"Uploaded batch file ID: <span class="hljs-subst">{batch_input_file_id}</span>"</span>)
</code></pre>
<hr />
<h3 id="heading-5-create-the-batch-request">5. Create the batch request</h3>
<pre><code class="lang-python">batch = client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint=<span class="hljs-string">"/v1/chat/completions"</span>,
    completion_window=<span class="hljs-string">"24h"</span>,
    metadata={
        <span class="hljs-string">"description"</span>: <span class="hljs-string">"Summarize CXO email samples for AI blog"</span>
    }
)
print(<span class="hljs-string">"Batch request created:"</span>, batch)
</code></pre>
<p>⚠️ Don’t forget to store <code>batch.id</code> somewhere – you’ll need it to track status and fetch results later.</p>
<hr />
<h2 id="heading-checking-status">🔁 Checking Status</h2>
<pre><code class="lang-python">batch = client.batches.retrieve(batch_id=<span class="hljs-string">"your_batch_id_here"</span>)
print(batch.status)
</code></pre>
<p>Common statuses:</p>
<ul>
<li><p><code>validating</code></p>
</li>
<li><p><code>in_progress</code></p>
</li>
<li><p><code>completed</code></p>
</li>
<li><p><code>failed</code></p>
</li>
</ul>
<p>You can also list all batches:</p>
<pre><code class="lang-python">batches = client.batches.list()
<span class="hljs-keyword">for</span> b <span class="hljs-keyword">in</span> batches:
    print(b.id, b.status)
</code></pre>
<hr />
<h2 id="heading-when-should-you-use-batch-apis">🔎 When Should You Use Batch APIs?</h2>
<p>Here’s a quick cheat sheet:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Use Case</td><td>Good for Batch?</td></tr>
</thead>
<tbody>
<tr>
<td>Email summarization</td><td>✅</td></tr>
<tr>
<td>Document tagging</td><td>✅</td></tr>
<tr>
<td>Large-scale embedding</td><td>✅</td></tr>
<tr>
<td>Real-time chat</td><td>❌</td></tr>
<tr>
<td>Function calling with rapid response</td><td>❌</td></tr>
</tbody>
</table>
</div><p>Batch APIs shine in background tasks where latency doesn’t matter – but cost, efficiency, and scale do.</p>
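<p>The cost angle is worth spelling out: OpenAI prices batch requests at roughly half the synchronous rate (a ~50% discount at the time of writing – verify on the pricing page). A quick back-of-the-envelope helper, with the per-million-token prices left as placeholders for you to fill in:</p>
<pre><code class="lang-python">def batch_savings(n_requests, in_tokens, out_tokens,
                  in_price_per_m, out_price_per_m, discount=0.5):
    """Return (sync_cost, batch_cost) in dollars.

    Prices are per million tokens; discount=0.5 assumes the ~50%
    Batch API discount - check current rates before relying on this.
    """
    per_request = (in_tokens * in_price_per_m
                   + out_tokens * out_price_per_m) / 1_000_000
    sync_cost = n_requests * per_request
    return sync_cost, sync_cost * (1 - discount)

# e.g. 1,000 summaries at 1,000 input / 500 output tokens each,
# with placeholder prices of $1 / $2 per million tokens:
# batch_savings(1000, 1000, 500, 1.0, 2.0)  # (2.0, 1.0)
</code></pre>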
<hr />
<h2 id="heading-future-ideas">📦 Future Ideas</h2>
<p>I’m thinking of chaining this with:</p>
<ul>
<li><p>Embedding the summaries</p>
</li>
<li><p>Automatic storage into a vector database</p>
</li>
<li><p>Semantic search across summaries</p>
</li>
<li><p>Tag generation (next experiment?)</p>
</li>
</ul>
<hr />
<h2 id="heading-final-thoughts">🗨️ Final Thoughts</h2>
<p>If you’re still sending LLM requests one by one and hitting rate limits – do yourself a favour: batch it.</p>
<p>It’s faster. Cheaper. And built exactly for use cases like summarizing, tagging, classification, etc.</p>
<p>Let me know if you want a GitHub template or help plugging it into your own workflow.</p>
<p>📎 Docs link again (bookmark this): <a target="_blank" href="https://platform.openai.com/docs/guides/batch?lang=python">OpenAI Batch API Guide</a></p>
<p>📎 Codebase: <a target="_blank" href="https://github.com/Sirsho29/ai_blog_codebase">Github</a></p>
<p>📬 Got questions? DM me or drop a comment. Always happy to debug, rant, or batch together.</p>
]]></content:encoded></item><item><title><![CDATA[Enough theory; let's get our hands dirty!]]></title><description><![CDATA[Alright, fellow Gemini explorers, buckle up! In my last blog, we scratched the surface of the amazing things you can do with Vertex AI.
But enough theory, right? Let's dive into the good stuff: actually using the Gemini APIs with Python and the ever-...]]></description><link>https://blogs.sirsho.xyz/enough-theory-lets-get-our-hands-dirty</link><guid isPermaLink="true">https://blogs.sirsho.xyz/enough-theory-lets-get-our-hands-dirty</guid><category><![CDATA[AI]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[gemini]]></category><category><![CDATA[langchain]]></category><category><![CDATA[Google]]></category><category><![CDATA[Vertex-AI]]></category><dc:creator><![CDATA[Sirsho Chakraborty]]></dc:creator><pubDate>Fri, 16 May 2025 15:28:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/1xE5QnNXJH0/upload/cc3a460f3d0744aff5bde8306fe8db6b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Alright, fellow Gemini explorers, buckle up! In my last blog, we scratched the surface of the amazing things you can do with Vertex AI.</p>
<p>But enough theory, right? Let's dive into the good stuff: actually <em>using</em> the Gemini APIs with Python and the ever-so-handy Langchain.</p>
<h3 id="heading-whats-in-your-toolkit">What's in Your Toolkit?</h3>
<p>Before we embark on this coding adventure, make sure you have a few essentials:</p>
<ol>
<li><p><strong>A Python Virtual Environment:</strong> This is like creating a neat little sandbox for our project, keeping all our specific tools (packages) in one place without messing with your main Python setup. If you haven't got one, setting it up is a breeze. Most Python installations come with <code>venv</code>. Just navigate to your project directory in your terminal and type:</p>
<pre><code class="lang-bash"> python -m venv gemini_blog_env
</code></pre>
<p> And to activate it:</p>
<ul>
<li><p>On macOS and Linux: <code>source gemini_blog_env/bin/activate</code></p>
</li>
<li><p>On Windows: <code>.\gemini_blog_env\Scripts\activate</code> You'll know it's active when you see your environment's name in the terminal prompt.</p>
</li>
</ul>
</li>
<li><p><strong>A Few Key Packages:</strong> We'll need to invite some friends to our coding party. The main guests are:</p>
<ul>
<li><p><a target="_blank" href="https://pypi.org/project/langchain-google-genai/"><code>langchain-google-genai</code></a>: This is the star player, allowing Langchain to talk to Google's Gemini models.</p>
</li>
<li><p><a target="_blank" href="https://pypi.org/project/google-genai/"><code>google-genai</code></a>: The official Google AI Python SDK.</p>
</li>
<li><p><a target="_blank" href="https://pypi.org/search/?q=python-dotenv"><code>python-dotenv</code></a> (optional but recommended): Super useful for managing your precious API key without hardcoding it.</p>
</li>
</ul>
</li>
<li><p><strong>Your Gemini API Key:</strong> This is your golden ticket to access the Gemini models. You can grab one from Google AI Studio. Keep it secret, keep it safe!</p>
</li>
</ol>
<h3 id="heading-lets-get-installing">Let's Get Installing!</h3>
<p>Assuming your virtual environment is up and running (you'll see its name in your terminal prompt), let's install those packages. Open your terminal and type:</p>
<pre><code class="lang-bash">pip install langchain langchain-google-genai google-genai python-dotenv
</code></pre>
<p>Pip, Python's package installer, will fetch and install everything for you.</p>
<h3 id="heading-time-to-write-some-actual-code-the-exciting-part">Time to Write Some Actual Code! (The Exciting Part!)</h3>
<p>Alright, the stage is set. Let's get Langchain and Gemini to chat.</p>
<p>First, if you're using <code>python-dotenv</code> (which I highly recommend for keeping your API key secure), create a file named <code>.env</code> in your project directory and add your API key like this:</p>
<pre><code class="lang-plaintext">GOOGLE_API_KEY="YOUR_SUPER_SECRET_API_KEY_HERE"
</code></pre>
<p>Now, for the Python magic. Create a Python file (e.g., <code>gemini_chat.py</code>) and let's get coding:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv
<span class="hljs-keyword">from</span> langchain_google_genai <span class="hljs-keyword">import</span> ChatGoogleGenerativeAI
<span class="hljs-keyword">from</span> langchain.schema <span class="hljs-keyword">import</span> HumanMessage, SystemMessage

<span class="hljs-comment"># Load environment variables from .env file</span>
load_dotenv()

<span class="hljs-comment"># Securely get your API key (optional if you set it directly)</span>
<span class="hljs-comment"># Make sure your GOOGLE_API_KEY is set in your environment or .env file</span>
google_api_key = os.getenv(<span class="hljs-string">"GOOGLE_API_KEY"</span>)
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> google_api_key:
    <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"GOOGLE_API_KEY not found in environment variables."</span>)

<span class="hljs-comment"># Initialize the Gemini LLM with Langchain</span>
<span class="hljs-comment"># You can choose different models like "gemini-2.0-flash" etc.</span>
<span class="hljs-comment"># Check the Google AI documentation for the latest model names and capabilities.</span>
llm = ChatGoogleGenerativeAI(model=<span class="hljs-string">"gemini-2.5-flash-preview-04-17"</span>, google_api_key=google_api_key)

<span class="hljs-comment"># Define our roles with System and Human messages</span>
system_prompt_text = <span class="hljs-string">"""I am writing a series on Learning Gemini in form of blogs.
I am writing these blogs while I am learning myself.
You are an expert in using Python, Langchain and Gemini APIs.
Help me write blogs on topics that I give."""</span>

user_prompt_text = <span class="hljs-string">"Write a short blurb on Google Gemini."</span>

<span class="hljs-comment"># Create the messages</span>
messages = [
    SystemMessage(content=system_prompt_text),
    HumanMessage(content=user_prompt_text)
]

<span class="hljs-comment"># Let's get the response!</span>
response = llm.invoke(messages)

print(<span class="hljs-string">"Assistant's Response:"</span>)
print(response.content)
</code></pre>
<p>Run this script from your activated virtual environment: <code>python gemini_chat.py</code></p>
<p>And voila! You should see Gemini, guided by your system prompt, generating a blurb about itself. Talk about meta!</p>
<h3 id="heading-hold-on-isnt-this-recursion-deja-vu">Hold on, isn't this recursion? Deja Vu!</h3>
<p>You got me! Asking an AI that I'm learning about to help me write a blog about learning that AI... but don't you worry your human heads; I'm still the one typing these blogs out, adding my own (questionable) humour and insights. No infinite AI loops here... yet! 😉</p>
<h3 id="heading-lets-get-streamy-implementing-streaming-responses">Let's Get Streamy: Implementing Streaming Responses</h3>
<p>Sometimes, you don't want to wait for the whole answer to generate. You want it to flow, like a good conversation. Langchain and Gemini support streaming responses beautifully.</p>
<p>Here's how you can modify the code to get a streaming response:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv
<span class="hljs-keyword">from</span> langchain_google_genai <span class="hljs-keyword">import</span> ChatGoogleGenerativeAI
<span class="hljs-keyword">from</span> langchain.schema <span class="hljs-keyword">import</span> HumanMessage, SystemMessage

<span class="hljs-comment"># Load environment variables from .env file</span>
load_dotenv()

google_api_key = os.getenv(<span class="hljs-string">"GOOGLE_API_KEY"</span>)
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> google_api_key:
    <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"GOOGLE_API_KEY not found in environment variables."</span>)

llm = ChatGoogleGenerativeAI(model=<span class="hljs-string">"gemini-2.5-flash-preview-04-17"</span>, google_api_key=google_api_key) <span class="hljs-comment"># No constructor flag needed - calling .stream() below is what triggers streaming</span>

system_prompt_text = <span class="hljs-string">"""I am writing a series on Learning Gemini in form of blogs.
I am writing these blogs while I am learning myself.
You are an expert in using Python, Langchain and Gemini APIs.
Help me write blogs on topics that I give."""</span>

user_prompt_text = <span class="hljs-string">"Write a short blurb on Google Gemini, and make it snappy!"</span>

messages = [
    SystemMessage(content=system_prompt_text),
    HumanMessage(content=user_prompt_text)
]

print(<span class="hljs-string">"Assistant's Streaming Response:"</span>)
<span class="hljs-keyword">for</span> chunk <span class="hljs-keyword">in</span> llm.stream(messages):
    print(chunk.content, end=<span class="hljs-string">""</span>, flush=<span class="hljs-literal">True</span>)
print() <span class="hljs-comment"># For a new line at the end</span>
</code></pre>
<p>When you run this, you'll see the response appear chunk by chunk, which is pretty neat for more interactive applications.</p>
<p>And there you have it! Our first foray into coding with Gemini and Langchain. We've set up our environment, installed the necessary tools, had a (slightly recursive) chat with Gemini, and even made it stream its wisdom.</p>
<p>Adding the GitHub repo as well: <a target="_blank" href="https://github.com/Sirsho29/gemini_blog">https://github.com/Sirsho29/gemini_blog</a></p>
<p>Stay tuned for the next blog, where we'll dive deeper into more advanced features. Until then, happy coding, and don't let the AIs write <em>all</em> your content!</p>
]]></content:encoded></item><item><title><![CDATA[My First Week with Vertex AI & Gemini]]></title><description><![CDATA[First things first:If you’re wondering where to begin —👉 Head to Google Cloud Console → Vertex AI

Over the last few days, I’ve been diving into Gemini and Vertex AI — trying to figure out what’s changed, what’s possible, and how to actually use thi...]]></description><link>https://blogs.sirsho.xyz/my-first-week-with-vertex-ai-and-gemini</link><guid isPermaLink="true">https://blogs.sirsho.xyz/my-first-week-with-vertex-ai-and-gemini</guid><category><![CDATA[gemini]]></category><category><![CDATA[Google]]></category><category><![CDATA[GCP]]></category><category><![CDATA[google cloud]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Sirsho Chakraborty]]></dc:creator><pubDate>Thu, 15 May 2025 05:14:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747285828316/5d80b3cb-af24-4e2f-a517-64c3e7d65738.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>First things first:</strong><br />If you’re wondering where to begin —<br />👉 <a target="_blank" href="https://console.cloud.google.com/vertex-ai/dashboard">Head to Google Cloud Console → Vertex AI</a></p>
<hr />
<p>Over the last few days, I’ve been diving into <strong>Gemini and Vertex AI</strong> — trying to figure out what’s changed, what’s possible, and how to actually use this stack to build useful stuff.</p>
<p>Spoiler: A lot has changed.<br />I started out watching some YouTube videos on fine-tuning Gemini, and half the UI references were already outdated. Turns out, <strong>Google AI Studio</strong> (which used to handle this) now mostly focuses on chat + API keys, while <strong>Vertex AI</strong> has become the main control room.</p>
<p>So, this blog is the <strong>first in a series</strong> I’ll be writing while I learn. I’m not an expert — just documenting what I’m trying, what’s working, and what’s breaking. If you’re building with Gemini or exploring Vertex AI, you might find this helpful (or at least relatable).</p>
<hr />
<h2 id="heading-step-zero-setup">✅ Step Zero: Setup</h2>
<p>Once you're inside the Vertex AI dashboard, the first thing you need to do is click<br /><strong>“Enable all recommended APIs.”</strong><br />No fancy config required. That one click gets the engine running.</p>
<hr />
<h2 id="heading-what-youll-find-inside-vertex-ai-as-of-now">🧭 What You’ll Find Inside Vertex AI (as of now)</h2>
<p>Here’s a quick overview of what I’ve explored so far. I’ll be digging into each of these in later posts — this is just a surface-level map to orient myself (and maybe you too).</p>
<hr />
<h3 id="heading-model-garden">🌱 <strong>Model Garden</strong></h3>
<p>This is where most of your playing around will start.</p>
<p>You get access to:</p>
<ul>
<li><p><strong>Gemini (of course)</strong></p>
</li>
<li><p><strong>Claude (Anthropic)</strong></p>
</li>
<li><p><strong>LLaMA models</strong></p>
</li>
<li><p><strong>DeepSeek</strong></p>
</li>
<li><p>And a ton of open models from <strong>HuggingFace</strong></p>
</li>
<li><p>Sadly, no OpenAI (they are “non-profits”)</p>
</li>
</ul>
<p>You can try these out in the UI or deploy them into your own GCP setup.<br />⚠️ Just be careful with deployments — these models eat credits for breakfast.</p>
<hr />
<h3 id="heading-prompt-management">🧠 <strong>Prompt Management</strong></h3>
<p>This one’s underrated.</p>
<p>It’s like a <strong>CMS for prompts</strong> — super useful if you’re experimenting a lot.<br />What it helps with:</p>
<ul>
<li><p><strong>Version control</strong> for your prompts</p>
</li>
<li><p><strong>Decoupling prompts from your code</strong></p>
</li>
<li><p><strong>Performance tracking</strong> of different versions</p>
</li>
<li><p>And clean <strong>integration into your stack</strong></p>
</li>
</ul>
<hr />
<h3 id="heading-prompt-gallery">📚 <strong>Prompt Gallery</strong></h3>
<p>This feels like a little idea board — a public collection of prompts to:</p>
<ul>
<li><p>Learn better prompt writing</p>
</li>
<li><p>Find working examples</p>
</li>
<li><p>Share your own experiments</p>
</li>
</ul>
<p>It’s basically a way to <strong>avoid reinventing the wheel</strong>, especially when you’re stuck.</p>
<hr />
<h3 id="heading-finetuning">🛠️ <strong>Finetuning</strong></h3>
<p>This is where things get deeper.<br />You can <strong>actually fine-tune models</strong>, including Gemini, by feeding in your own data or instructions. I’m still exploring this — but the idea is to shape the model’s behaviour beyond just writing clever prompts.</p>
<p>Expect a detailed post on this soon – I’m working on it right now!</p>
<hr />
<h3 id="heading-agent-garden">🤖 <strong>Agent Garden</strong></h3>
<p>Think of this as a <strong>library of pre-built agents</strong> and toolkits.<br />Great for inspiration or if you want to fast-track a prototype without starting from zero.</p>
<hr />
<h3 id="heading-agent-engine">⚙️ <strong>Agent Engine</strong></h3>
<p>Once you’ve built your agent, this is where you deploy and manage it.</p>
<p>You get:</p>
<ul>
<li><p>Fully managed infra</p>
</li>
<li><p>Built-in testing + monitoring</p>
</li>
<li><p>Support for any framework you use</p>
</li>
</ul>
<p>Basically, this takes care of the backend mess so you can focus on logic and UX.</p>
<hr />
<h3 id="heading-datasets">🧾 <strong>Datasets</strong></h3>
<p>Vertex AI also provides a <strong>managed dataset service</strong>, so you can store, access, and train on your data directly inside GCP. No bucket shuffling or permission hell.</p>
<hr />
<h3 id="heading-model-development">🧪 <strong>Model Development</strong></h3>
<p>This covers the full journey — from:</p>
<ol>
<li><p>Defining your problem</p>
</li>
<li><p>Prepping data</p>
</li>
<li><p>Training + evaluating</p>
</li>
<li><p>Deploying your model</p>
</li>
</ol>
<p>Whether you’re building an agent, a classifier, or something niche — this is where the actual ML workflow lives.</p>
<hr />
<h2 id="heading-whats-next">💡 What’s Next?</h2>
<p>This blog was just me getting familiar with the Vertex AI ecosystem.<br />I’ll be learning <strong>finetuning, prompt design, agent workflows, and Gemini-specific tricks</strong> next — and documenting all of it as I go.</p>
<p>If you’re exploring this space too, feel free to build alongside. Ping me if you get stuck or figure out something I haven’t covered yet — I would love to include it in future posts.</p>
<hr />
<h3 id="heading-follow-the-series">Follow the Series →</h3>
<p>I’ll be publishing everything here on Hashnode as I go. No fluff — just real-time learning, mistakes, and progress.</p>
<p>And maybe at the end of this series, we’ll both have built something cool.</p>
]]></content:encoded></item><item><title><![CDATA[Quick AI Model Cost Estimator]]></title><description><![CDATA[Like many of you building with LLMs, I often found myself jumping between multiple documentation pages just to figure out how much a certain query would cost across different models.
And let’s be honest — no one has time to memorize OpenAI’s per-mill...]]></description><link>https://blogs.sirsho.xyz/quick-ai-model-cost-estimator</link><guid isPermaLink="true">https://blogs.sirsho.xyz/quick-ai-model-cost-estimator</guid><category><![CDATA[#ai-tools]]></category><category><![CDATA[aitools]]></category><category><![CDATA[AI]]></category><category><![CDATA[openai]]></category><category><![CDATA[#anthropic]]></category><category><![CDATA[claude.ai]]></category><category><![CDATA[gemini]]></category><category><![CDATA[Deepseek]]></category><category><![CDATA[chatgpt]]></category><dc:creator><![CDATA[Sirsho Chakraborty]]></dc:creator><pubDate>Thu, 15 May 2025 04:23:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747282803601/6072846e-4820-4c17-8849-08cd1f3dacfd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Like many of you building with LLMs, I often found myself jumping between multiple documentation pages just to figure out <strong>how much a certain query would cost</strong> across different models.</p>
<p>And let’s be honest — no one has time to memorize OpenAI’s per-million token costs, compare them with Anthropic, DeepSeek, or Gemini, and then mentally compute costs based on input/output tokens and query types.</p>
<p>So… I built myself a <strong>simple cost calculator</strong> that does exactly what I need:<br />📍 <strong>Give me an approximate cost of running a specific type of query on a selected model.</strong></p>
<hr />
<h2 id="heading-why-i-built-it">💡 Why I Built It</h2>
<p>I was repeatedly:</p>
<ul>
<li><p>Searching for the latest OpenAI pricing</p>
</li>
<li><p>Comparing it with Claude or Gemini</p>
</li>
<li><p>Trying to remember if 75 words = 100 tokens or the other way around</p>
</li>
<li><p>Doing math in my head or a notepad every time I needed to estimate costs</p>
</li>
</ul>
<p>It got old, fast.</p>
<p>So I made a <strong>small spreadsheet-based calculator</strong> that lets me:</p>
<p>✅ Pick a <strong>query type</strong> (normal, research, function calling, or custom)<br />✅ Choose a <strong>model</strong> from OpenAI, Anthropic, Google, or DeepSeek<br />✅ Instantly get a <strong>final estimated cost</strong> for a fixed number of queries (default: 10)</p>
<hr />
<h2 id="heading-what-it-does">🧮 What It Does</h2>
<p>Behind the scenes, the calculator:</p>
<ul>
<li><p>Uses a standard 75 words = 100 tokens conversion</p>
</li>
<li><p>Maps query types to <strong>average input/output token counts</strong></p>
</li>
<li><p>Pulls in model-specific <strong>cost per million tokens</strong></p>
</li>
<li><p>Computes the total cost for the number of queries selected</p>
</li>
</ul>
<p>And that’s it. <strong>Simple, fast, and surprisingly handy.</strong></p>
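<p>If spreadsheets aren’t your thing, the same math fits in a few lines of Python. A sketch using the 75-words-per-100-tokens rule above – the per-million prices and word counts are placeholders for whatever model and query type you’re estimating:</p>
<pre><code class="lang-python">WORDS_PER_100_TOKENS = 75  # the sheet's "75 words = 100 tokens" rule of thumb

def words_to_tokens(words):
    return words * 100 / WORDS_PER_100_TOKENS

def estimate_cost(in_words, out_words, in_price_per_m,
                  out_price_per_m, n_queries=10):
    """Approximate dollar cost for n_queries (default 10, as in the sheet).

    Prices are per million tokens - plug in the current numbers from
    each provider's pricing page.
    """
    in_tokens = words_to_tokens(in_words)
    out_tokens = words_to_tokens(out_words)
    per_query = (in_tokens * in_price_per_m
                 + out_tokens * out_price_per_m) / 1_000_000
    return n_queries * per_query
</code></pre>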
<hr />
<h2 id="heading-whats-inside">🔧 What's Inside?</h2>
<p>Here’s a peek at what powers the tool:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Component</td><td>Purpose</td></tr>
</thead>
<tbody>
<tr>
<td>Query Type</td><td>Sets average input/output tokens</td></tr>
<tr>
<td>Model Selection</td><td>Pulls cost per million tokens</td></tr>
<tr>
<td>Token Math</td><td>Computes cost per query type/model</td></tr>
<tr>
<td>Final Cost</td><td>Combines everything × number of queries</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-why-approximate">📌 Why Approximate?</h2>
<p>This tool isn’t meant to be 100% accurate to the last decimal — it’s designed for:</p>
<ul>
<li><p><strong>Quick ballparks</strong> during architecture discussions</p>
</li>
<li><p><strong>Budget estimates</strong> before production usage</p>
</li>
<li><p><strong>Cost comparisons</strong> across providers/models</p>
</li>
</ul>
<p>In real-world use, actual token counts vary due to system messages, model verbosity, and temperature settings. But this gives you a <strong>solid directional cost estimate</strong>.</p>
<hr />
<h2 id="heading-whats-next">🛠️ What's Next?</h2>
<p>I'm thinking of extending it with:</p>
<ul>
<li><p>Support for embedding models</p>
</li>
<li><p>Somehow factoring in cached-token pricing</p>
</li>
<li><p>Batch API estimations</p>
</li>
<li><p>Adjustable verbosity (to estimate more or fewer output tokens)</p>
</li>
<li><p><strong>Overhead token buffers</strong> for function calling</p>
</li>
</ul>
<p>Let me know if that would be useful — or if you want a copy to use or contribute to!</p>
<hr />
<h2 id="heading-want-to-try-it">🔗 Want to Try It?</h2>
<p><a target="_blank" href="https://docs.google.com/spreadsheets/d/18-TPKGPVEiYMH5I0dt9YMRQuL_--mqE-HbdxtW4vkzc/edit?gid=0#gid=0">https://docs.google.com/spreadsheets/d/18-TPKGPVEiYMH5I0dt9YMRQuL_--mqE-HbdxtW4vkzc/edit?gid=0#gid=0</a></p>
<p>Any feedback? Just drop a comment or DM me on Twitter/X.</p>
<hr />
<p>💬 Ever built your own utility out of frustration? Share your tool or workflow below!</p>
]]></content:encoded></item></channel></rss>