:::note
**TL;DR**
- A complete natal chart from the API is about 5,774 LLM tokens, roughly 22 kilobytes of JSON.
- Re-read on every turn of a 10-message chat, that is about 58,000 input tokens of context for the chart alone.
- Send the model only the fields it needs (planet, sign, retrograde) and the same chart drops to 216 tokens, a 96 percent cut.
- Fetch on demand over Remote MCP instead of pinning the whole chart in every prompt.
:::

You wire an astrology API into an AI chatbot, drop the natal chart into the system prompt, and the language-model bill climbs faster than your traffic. The cause is tokens. A full birth chart is a dense object: ten plus planets, twelve houses, every aspect, and an interpretation attached to each, and the model has to read all of it on every turn. Most of that data never reaches the answer. This post measures the real token cost of an astrology API response with a GPT tokenizer, shows why feeding the whole payload to a model is the expensive part, and gives two fixes that each cut it by more than 90 percent: trim the response to the fields your prompt needs, and ground the model on demand over Remote MCP. The good news is that the waste sits in your own code, not the API, so it is entirely under your control. Every number below is measured on the live endpoint, so you can reproduce it.

## How many LLM tokens does an astrology chart payload cost?

A complete natal chart from the API is about 5,774 LLM tokens, measured with a GPT-4 class tokenizer on the live endpoint. That is roughly 22 kilobytes of JSON holding every planet with its sign, degree, speed, retrograde flag, and house, all twelve house cusps, the full aspect grid, and an interpretation for each placement. The structure is lean, about 3.9 bytes per token, so the weight is the chart itself, not formatting.

::: stat 5,774 tokens
**One complete natal chart payload**, counted with a GPT-4 class tokenizer on the live [natal chart endpoint](/api-reference#tag/western-astrology/POST/astrology/natal-chart "natal chart API endpoint reference and live playground"). Pull the same response and count it yourself.
:::

For scale, a short paragraph of text is around 50 tokens and a typical system prompt is a few hundred. One chart is larger than the instructions you wrote for the entire assistant. That is fine when the chart is the point, but a chatbot rarely needs all of it to answer a single question, which is exactly where the cost leaks. The fix is not a cheaper model or a smaller plan, it is sending less to the model in the first place.

Ready to build this? The [Astrology API](/products/astrology-api "production astrology API for natal charts, transits, and synastry") returns clean, structured chart data you can trim to size. [See pricing](/pricing "RoxyAPI pricing tiers").

## Why does feeding full payloads to an LLM get expensive?

Language models are stateless, so they re-read their entire context on every turn. Anything pinned in the system prompt is billed again on each message. Put a 5,774-token chart in the prompt and a ten-message conversation pays for it ten times, about 58,000 input tokens, before you add the user messages, the replies, or any retrieved data. Across thousands of sessions the chart, not the conversation, dominates the bill.

It also crowds the window. Every token of chart is a token the model cannot spend on chat history or instructions, so long conversations truncate sooner and answers drift. Bigger inputs raise latency too, because the model reads more before it writes anything. The cost compounds in two directions at once, more tokens per call and more calls per session, so a feature that looked cheap in a demo becomes the biggest line on the model bill at scale. The full payload is the right thing to keep in your backend and the wrong thing to hand a model on every turn.

## How do you cut the token cost of an astrology API response?

Send the model only the fields it reasons over. Most chart questions turn on the planet, its sign, and whether it is retrograde, which for all bodies is 216 tokens, a 96 percent cut from the full 5,774. Because the response is flat structured JSON rather than deeply nested prose, the filter is a one-line map over the planets array. The interpretation text and high-precision coordinates stay server side until a turn actually calls for them.

The same idea scales by question. Pass houses only when the user asks about houses, aspects only for a synastry reading, the daily transit only when they ask what is happening today. Keep the raw chart in your database for rendering a wheel or a report, and feed the model a task-shaped slice. Cache that slice keyed by the birth data and house system, and repeat questions in the same session cost nothing extra. A lean payload is what makes the slice a one-liner instead of a parsing project.

## What makes an astrology API response easy to trim for an LLM?

A payload is easy to trim when its shape is predictable: flat objects, one value per key, and prose kept in its own field rather than woven through the data. The chart returned here is an array of planet objects with the same keys every time, so a single map reaches exactly the fields you want with no unwrapping. Coordinates, house, and interpretation each sit in a named field you can keep or drop in one line.

Verbose, deeply nested response shapes do the opposite. When a sign is buried several levels down, repeated as a label in three places, or returned as a sentence of ready-made HTML, the same chart costs more tokens and resists filtering, so teams give up and forward the whole thing to the model. Lean structured JSON is the difference between a one-line trim and a parser, and it is also why the data drops cleanly into a tool schema for Remote MCP.

## How to fetch a chart and feed your LLM only what it needs

Resolve coordinates with location search, request the natal chart, then map the planets array to the few fields your prompt needs before it reaches the model. For a multi-turn chatbot, do not pin the chart in the system prompt at all. Expose the API as tools over Remote MCP so the agent fetches the chart on the one turn that needs it and reasons over the lean result, the same grounding pattern that stops it from inventing placements.

:::tabs
### curl
```bash
# 1. Resolve coordinates and timezone
curl -s "https://roxyapi.com/api/v2/location/search?q=New%20York" \
  -H "X-API-Key: $ROXY_KEY"

# 2. Full chart, then trim to the lean 216-token subset with jq
curl -s -X POST "https://roxyapi.com/api/v2/astrology/natal-chart" \
  -H "X-API-Key: $ROXY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"date":"1990-07-15","time":"14:30:00","latitude":40.7128,"longitude":-74.006,"timezone":"America/New_York"}' \
  | jq '[.planets[] | {name, sign, isRetrograde}]'
```
### TypeScript
```ts
const res = await fetch("https://roxyapi.com/api/v2/astrology/natal-chart", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.ROXY_KEY!,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    date: "1990-07-15",
    time: "14:30:00",
    latitude: 40.7128,
    longitude: -74.006,
    timezone: "America/New_York",
  }),
});

const chart = await res.json();
// 5,774 tokens down to 216: hand the model only what it reasons over
const lean = chart.planets.map((p) => ({
  name: p.name,
  sign: p.sign,
  isRetrograde: p.isRetrograde,
}));
```
### Python
```python
import os, requests

res = requests.post(
    "https://roxyapi.com/api/v2/astrology/natal-chart",
    headers={"X-API-Key": os.environ["ROXY_KEY"]},
    json={
        "date": "1990-07-15",
        "time": "14:30:00",
        "latitude": 40.7128,
        "longitude": -74.006,
        "timezone": "America/New_York",
    },
)

chart = res.json()
lean = [
    {"name": p["name"], "sign": p["sign"], "isRetrograde": p["isRetrograde"]}
    for p in chart["planets"]
]
```
:::

:::tip
For a chatbot, ground the model over [Remote MCP](/docs/mcp "Remote MCP servers expose every RoxyAPI domain as tools for AI agents") so the chart is fetched on the turn that needs it, not pinned in every prompt. The same pattern keeps the model from [making up birth charts](/blogs/ai-chatbot-hallucinates-birth-charts-grounding "ground an AI chatbot so it stops inventing birth charts").
:::

## FAQ

**How many tokens is an astrology API response?**
A complete natal chart from the RoxyAPI astrology API is about 5,774 tokens, measured with a GPT-4 class tokenizer, or roughly 22 kilobytes of JSON. That covers every planet, all twelve houses, the aspect grid, and an interpretation for each placement. A positions-only view of the same chart is about 216 tokens.

**Why is my AI astrology chatbot so expensive to run?**
Most likely you are pinning the full chart in the system prompt. Models re-read their whole context on every turn, so a 5,774-token chart is billed again on each message, about 58,000 input tokens across a ten-message chat. The conversation is cheap; the un-trimmed chart is what runs up the bill.

**How do I reduce LLM token usage with an astrology API?**
Send the model only the fields it reasons over. Trimming a full chart to planet, sign, and retrograde state drops it from 5,774 to 216 tokens, a 96 percent cut, and the RoxyAPI response is flat structured JSON so the filter is one line. Keep the full payload in your backend for rendering and reports.

**Does Remote MCP lower token costs?**
Remote MCP lowers context cost by changing when data enters the prompt. Instead of pinning a chart in every message, the agent calls the RoxyAPI tool on the one turn that needs it and reasons over that result. The model also stops inventing placements because it is grounded on real calculated data.

**Is a smaller API plan the way to cut LLM costs?**
No. Your API plan covers requests to RoxyAPI; your token bill is what you forward to the language model. The lever is the size of the payload you hand the model, not the size of your plan. One Starter plan covers a busy chatbot across all 12 domains, so trim the response and ground over Remote MCP instead.

**How do I measure the token cost of my own API responses?**
Save a RoxyAPI response to a file and run it through a GPT or Claude tokenizer rather than guessing from byte size. The structured JSON runs about four bytes per token, so a 22 kilobyte chart lands near 5,774. Counting your actual payloads tells you which fields to trim before they reach the model.

## Conclusion

The token bill of an AI astrology app is set in your code, not your API plan. Measure your payloads, hand the model only the fields a turn needs, and ground it on demand over Remote MCP, and the same chart that cost 5,774 tokens costs a couple hundred. Start with the [Astrology API](/products/astrology-api "production astrology API with clean, structured chart data") and trim from there.