Understanding Your App's Performance with Loggy Metrics
A deep dive into Loggy's performance metrics — RPM, response times, throughput, and status code breakdowns. Learn how to spot bottlenecks, set SLOs, and keep your services running smoothly.
There’s a moment every developer hits eventually — usually at 2 AM, usually when something important is broken — where you realize that logs alone aren’t enough. You can see that something went wrong, but you can’t see the bigger picture. Was this a one-off blip, or has your API been slowly getting slower for the past three days? Are you handling more traffic than last week, or less? Is that new database query you shipped actually faster than the old one, or did you just convince yourself it was during local testing?
That’s the gap performance metrics fill. They give you the bird’s-eye view of how your application is actually behaving over time, not just individual request-by-request snapshots, but real trends and patterns that tell a story about your system’s health.
What We Actually Track
When we built Loggy’s performance metrics, we wanted to focus on the numbers that actually matter for day-to-day operations rather than drowning you in hundreds of charts you’d never look at. Here’s what you get, and more importantly, why each metric matters.
Requests per minute (RPM) is probably the most fundamental metric there is. It tells you how much work your application is doing at any given moment. But RPM isn’t just about knowing your traffic volume — it’s about understanding your traffic patterns. When you can see that your API handles 50 RPM at 3 AM but spikes to 300 RPM at 10 AM, you start making better decisions about scaling, caching, and rate limiting. You notice when traffic drops unexpectedly (which usually means something is broken upstream) or when it spikes beyond what you’ve planned for.
Response times are where things get really interesting. We track the average, the minimum, the maximum, and the percentiles — and honestly, the average is often the least useful of those numbers. Here’s why: if 99 requests take 50ms and one request takes 10 seconds, your average is about 150ms, which sounds perfectly fine. But that one user who waited 10 seconds? They’re furious. That’s why we prominently surface the P99 latency (the response time that 99% of requests are faster than), because it tells you how your worst experiences feel, not just your typical ones.
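To make the average-versus-percentile gap concrete, here's a small standalone sketch in plain Node (nothing Loggy-specific, and the numbers are invented): with a slow tail of just 2% of requests, the average still looks tolerable while the P99 gives the game away.

// 100 response times: 98 fast requests plus a 2% slow tail.
const latencies = [...Array(98).fill(50), 10000, 10000];

const avg = latencies.reduce((sum, ms) => sum + ms, 0) / latencies.length;

// Nearest-rank P99: the value that 99% of requests fall at or below.
const sorted = [...latencies].sort((a, b) => a - b);
const p99 = sorted[Math.ceil(0.99 * sorted.length) - 1];

console.log(`avg: ${avg}ms`); // 249ms, which looks fine on a summary card
console.log(`p99: ${p99}ms`); // 10000ms, the slow tail is impossible to miss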
Throughput — bytes in and bytes out — helps you understand the physical volume of data moving through your system. This matters more than people think. If your response sizes suddenly jump from 2KB to 200KB because someone added a new field that includes an entire user object instead of just an ID, throughput metrics will catch that before your bandwidth bill does. It’s also crucial for understanding the relationship between payload size and response time. Often the “slow” endpoint isn’t actually slow — it’s just returning way more data than it should be.
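As a rough illustration (the record shape here is invented for the example), this is how easily that happens when a serializer starts embedding a whole related object instead of just its ID:

// Hypothetical user record; only the id was ever meant to go over the wire.
const user = {
  id: 'u_123',
  email: 'ada@example.com',
  preferences: { theme: 'dark', locale: 'en-GB' },
  activityHistory: new Array(500).fill({ action: 'login', at: '2024-01-01T00:00:00Z' }),
};

const lean = JSON.stringify({ postId: 'p_9', authorId: user.id });
const bloated = JSON.stringify({ postId: 'p_9', author: user });

console.log(Buffer.byteLength(lean), 'bytes');    // a few dozen bytes
console.log(Buffer.byteLength(bloated), 'bytes'); // tens of kilobytes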
Status code breakdowns give you the health check at a glance. We bucket these into the standard HTTP categories: 2xx (success), 3xx (redirects), 4xx (client errors), and 5xx (server errors). The ratio between these tells you a lot. A high 4xx rate might mean your API documentation is confusing, or that a client is sending malformed requests. A sudden spike in 5xx means something is genuinely broken on your end and needs attention now.
Setting It Up
Getting performance metrics flowing into Loggy takes about two minutes if you’re using our SDK, which is one of those things we’re genuinely proud of. Most APM tools require you to install agents, configure exporters, set up collectors, and pray everything connects properly. With Loggy, you add a middleware and you’re done.
Here’s the Node.js setup with Express:
import express from 'express';
import { CreateLoggy, CreateMetrics } from '@loggydev/loggy-node';

const app = express();

const loggy = CreateLoggy({ accessToken: process.env.LOGGY_TOKEN });
const metrics = CreateMetrics({
  accessToken: process.env.LOGGY_TOKEN,
  flushIntervalMs: 60000 // aggregate and send every minute
});

// Add the metrics middleware — that's it
app.use(metrics.middleware());

app.get('/api/users', (req, res) => {
  // Your normal route handler
  res.json({ users: [] });
});

app.listen(3000);
And here’s the equivalent in Go:
package main

import (
    "net/http"
    "os"
    "time"

    loggy "github.com/loggy-dev/loggy-go"
)

func handleUsers(w http.ResponseWriter, r *http.Request) {
    // Your normal route handler
    w.Header().Set("Content-Type", "application/json")
    w.Write([]byte(`{"users": []}`))
}

func main() {
    metrics := loggy.NewMetrics(loggy.MetricsConfig{
        AccessToken:   os.Getenv("LOGGY_TOKEN"),
        FlushInterval: 60 * time.Second,
    })

    mux := http.NewServeMux()
    mux.HandleFunc("/api/users", handleUsers)

    // Wrap your handler with the metrics middleware
    http.ListenAndServe(":3000", metrics.Middleware(mux))
}
The middleware automatically captures the request method, path, status code, response time, and bytes transferred for every request that passes through it. It aggregates this data locally and flushes it to Loggy every minute (or whatever interval you configure), so you’re not adding per-request overhead to your API.
One thing worth mentioning: the middleware intentionally doesn’t capture request or response bodies. That would be a privacy nightmare and would add significant overhead. It only captures the metadata — timing, status codes, and byte counts. If you need to see what’s actually in the requests, that’s what logging is for.
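If you're curious what that looks like under the hood, here's a deliberately simplified sketch of the aggregation model: per-route, per-minute buckets of metadata, flushed on a timer. This is illustrative only, not Loggy's actual implementation.

// Illustrative only: aggregate request metadata into in-memory buckets
// and flush them once a minute, so nothing is sent per request.
const buckets = new Map(); // "METHOD path" -> { count, totalMs, byStatusClass }

function metricsMiddleware(req, res, next) {
  const startedAt = process.hrtime.bigint();
  res.on('finish', () => {
    const durationMs = Number(process.hrtime.bigint() - startedAt) / 1e6;
    const key = `${req.method} ${req.path}`;
    const bucket = buckets.get(key) ?? { count: 0, totalMs: 0, byStatusClass: {} };
    bucket.count += 1;
    bucket.totalMs += durationMs;
    const statusClass = `${Math.floor(res.statusCode / 100)}xx`;
    bucket.byStatusClass[statusClass] = (bucket.byStatusClass[statusClass] ?? 0) + 1;
    buckets.set(key, bucket);
  });
  next();
}

setInterval(() => {
  const snapshot = Object.fromEntries(buckets);
  buckets.clear();
  // sendToBackend(snapshot) would go here; a stand-in, not a real SDK call
}, 60000);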
Reading the Dashboard
Once metrics start flowing in, the performance page in your Loggy dashboard comes alive with charts and numbers. Let me walk you through how to actually read them, because raw numbers without context are about as useful as a speedometer without knowing the speed limit.
[Dashboard: Production API, last 24 hours. Avg RPM 120.5 (+18% vs yesterday) | Avg Response 94ms (-12ms vs yesterday) | P99 Latency 485ms (< 500ms SLO ✓) | Error Rate 1.3% (638 total 5xx)]
The top row shows your summary stats for the selected time window. The Avg RPM tells you your baseline traffic level, while the comparison to the previous period helps you spot trends. If RPM is climbing week over week, you might need to start thinking about scaling. If it suddenly drops, something might be wrong with your load balancer, DNS, or an upstream service that sends you traffic.
The RPM chart shows traffic over time, and this is where patterns jump out at you. Most web applications have a very recognizable daily cycle — low traffic at night, ramping up in the morning, peaking in the afternoon, and tapering off in the evening. When you see deviations from this pattern, pay attention. A flat line at 3 PM on a Tuesday usually means something is wrong, not that everyone decided to take the afternoon off.
The Response Time chart overlays your average and P99 latency, and ideally you want these two lines to be relatively close together. A big gap between average and P99 means you have a long tail of slow requests that are making some users miserable while most users are having a fine experience. If you’ve set up an SLO (service level objective), you’ll see a dashed line showing your target — like “P99 must be under 500ms.” When your actual P99 creeps close to or above that line, it’s time to investigate.
The Status Code donut gives you a quick health check. In a healthy system, you want to see that donut almost entirely green (2xx). A thin amber sliver of 4xx responses is normal — that’s clients making bad requests, which happens in any real-world API. But if the red 5xx slice is anything more than a sliver, you have a problem worth investigating.
Spotting and Diagnosing Performance Issues
Here’s where performance metrics really earn their keep. Let me walk through some real scenarios you might encounter and how the metrics help you figure out what’s going on.
The gradual slowdown is one of the trickiest issues to catch without metrics. Your API was responding in 80ms on average last week. This week it’s 95ms. Next week it’s 120ms. No single deployment caused it, no error spike accompanies it — it’s just getting slower. This almost always points to a growing dataset problem. A database query that was fast with 10,000 rows is getting slower as the table approaches a million rows. Or a cache that used to have a 90% hit rate is now at 60% because the working set has grown. The response time chart makes this visible in a way that individual request logs never would.
The endpoint outlier shows up clearly in the top endpoints table. When you see that /api/v1/reports/generate has a P99 of 4,200ms while every other endpoint is under 200ms, you’ve found your problem child. Now you can focus your optimization efforts where they’ll actually matter, instead of trying to shave 5ms off an endpoint that’s already fast enough.
The traffic spike is the classic scenario. Your RPM suddenly doubles and response times go through the roof. The metrics help you understand the sequence of events — did response times increase because of the traffic spike (capacity problem), or did something else slow down your responses and cause requests to pile up (dependency problem)? If response times were already climbing before the RPM spike, you probably have a downstream dependency that’s struggling. If response times were fine until the RPM jumped, you need more capacity.
The silent failure is when your 5xx rate creeps up slowly enough that no individual alert fires, but over a few hours it goes from 0.1% to 3%. The status code breakdown chart makes this visible even when absolute numbers are small. Three 500 errors out of 100 requests is a 3% error rate and deserves attention, even though three errors in isolation might not trigger an alert.
Combining Metrics with Logs and Traces
Performance metrics are powerful on their own, but they become even more useful when you combine them with Loggy’s other observability tools. The metrics tell you that something is slow or failing. The logs and traces tell you why.
Here’s a workflow that we find ourselves using constantly: you notice on the performance dashboard that P99 latency for your API has spiked in the last hour. You click through to the traces view for the same project and filter by duration — show me traces longer than 500ms. You find a trace that took 2.3 seconds, expand it, and see that the database query span took 2.1 seconds of that time. You click through to the correlated logs and see a warning: “sequential scan on users table, consider adding index on email column.” Problem identified, solution clear, and you didn’t have to guess or reproduce the issue locally.
This is the real power of having logging, tracing, and metrics in one platform. There’s no context-switching between three different tools, no trying to correlate timestamps across systems, no wondering if the log you’re looking at corresponds to the slow request you saw in your metrics tool. Everything is connected because it all lives in the same place.
Setting Up Alerts on Metrics
Once you’re comfortable reading the dashboard, the natural next step is setting up alerts so you don’t have to stare at charts all day. Loggy lets you create alert rules based on your metrics — for example, you might want to know if your P99 response time exceeds 500ms for more than 5 minutes, or if your 5xx rate goes above 2%.
The key to good alerting on performance metrics is avoiding both false positives and false negatives. Set your thresholds based on your actual traffic patterns, not on what you think they should be in an ideal world. If your P99 normally sits around 300ms and occasionally spikes to 450ms during peak hours, setting an alert at 400ms is going to wake you up for no reason. Set it at 600ms instead — something that’s genuinely abnormal and worth investigating.
Cooldown periods are your friend here too. A single minute where P99 hits 550ms might just be a garbage collection pause or a briefly slow database query. But if it stays elevated for 5 or 10 minutes, that’s a real problem. Configure your alert cooldowns to match the urgency of the metric — response time alerts can usually tolerate a few minutes of cooldown, while a 5xx spike alert should probably have a shorter cooldown because server errors affect real users immediately.
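The "elevated for several minutes" check is simple to reason about. Here's a purely illustrative sketch of that logic (Loggy's alert rules are configured in the dashboard, not written as code; the function and numbers below are made up):

// Fire only when every one of the last N one-minute P99 samples breaches the
// threshold; a single noisy minute is ignored.
function shouldAlert(p99Samples, { thresholdMs, sustainedMinutes }) {
  if (p99Samples.length < sustainedMinutes) return false;
  return p99Samples.slice(-sustainedMinutes).every((ms) => ms > thresholdMs);
}

const briefSpike = [480, 470, 495, 650, 500, 490];
const sustained = [480, 510, 620, 640, 655, 700, 720];

console.log(shouldAlert(briefSpike, { thresholdMs: 600, sustainedMinutes: 5 })); // false
console.log(shouldAlert(sustained, { thresholdMs: 600, sustainedMinutes: 5 }));  // true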
Retention and Limits
Performance metrics are available on Pro and Team plans. On Pro, you get 7 days of metric retention, which is enough to compare this week to last week and spot short-term trends. On Team, you get 30 days, which lets you do month-over-month comparisons and track the impact of larger initiatives like database migrations or infrastructure changes.
The metrics middleware aggregates data into one-minute buckets before sending, which keeps the data volume manageable while still giving you minute-level granularity. For most applications, minute-level resolution is more than sufficient — if you need to debug something that happened within a specific second, that’s what traces and logs are for.
The Bottom Line
Performance metrics close the gap between “I know my app works” and “I know my app works well.” They turn vague feelings (“the API seems slower lately”) into concrete data (“P99 latency increased 40% over the past week, concentrated on the /api/search endpoint”). And when something does go wrong, they give you the context you need to start debugging from the right place instead of flailing around in log files.
If you’re already using Loggy for logging, adding performance metrics is genuinely a two-minute setup. Drop in the middleware, deploy, and within a few minutes you’ll start seeing your first data points. You might be surprised by what you learn about how your application actually behaves in production — most developers are.