Implementing OAuth with an MCP (Model Context Protocol) AI Test Server: A Technical Deep Dive

I wasn’t really sure I even wanted to write this up — mostly because there are some limitations in MCP that make things a little… awkward. But I figured someone else is going to hit the same wall eventually, so here we are.

If you’re trying to use OAuth 2.0 with MCP, there’s something you should know: it doesn’t support the full OAuth framework. Not even close.

MCP only works with the default well-known endpoints:

  • /.well-known/openid-configuration
  • /.well-known/oauth-authorization-server

Before we get going, let me write this in the biggest and most awkward text I can find.

Run the Device Flow outside of MCP, then inject the token into the session manually.

And those have to be hosted at the default paths, on the same domain as the issuer. If you’re using split domains, custom paths, or a setup where your metadata lives somewhere else (which is super common in enterprise environments)… tough luck. There’s no way to override the discovery URL.
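If you're not sure whether your issuer will even play along, a quick probe of the two default paths tells you right away. A minimal sketch in Python (this is plain OIDC/OAuth 2.0 metadata discovery, nothing MCP-specific, and auth.example.com is obviously a placeholder for your issuer):

import requests

ISSUER = "https://auth.example.com"  # placeholder: your issuer's base URL

for path in ("/.well-known/openid-configuration",
             "/.well-known/oauth-authorization-server"):
    resp = requests.get(ISSUER + path, timeout=5)
    if resp.ok:
        # Per OIDC discovery / RFC 8414, the advertised issuer should match exactly
        print(path, "->", resp.json().get("issuer"))
    else:
        print(path, "->", resp.status_code, "(MCP discovery will choke here)")

If either path 404s, or the issuer field doesn't match the domain you're pointing MCP at, stop now and save yourself the debugging session.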

It also doesn’t support other flows like device_code, jwt_bearer, or anything that might require pluggable negotiation. You’re basically stuck with the default authorization code flow, and even that assumes everything is laid out exactly the way it expects.

So yeah — if you’re planning to hook MCP into a real-world OAuth deployment, just be aware of what you’re signing up for. I wish this part of the protocol were a little more flexible, but for now, it’s pretty locked down.
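And since the big awkward advice above was "run the Device Flow outside of MCP, then inject the token manually," here's roughly what that looks like. This is a minimal sketch of the plain OAuth 2.0 device flow (RFC 8628), not anything MCP blesses: the /device/code path is an assumption (check your provider's device_authorization_endpoint), and the final injection step depends entirely on which MCP client you're using, so it's left as a comment.

import time
import requests

AUTH_BASE = "https://auth.example.com"  # placeholder authorization server
CLIENT_ID = "your_client_id"

# Step 1: device authorization request
device = requests.post(
    f"{AUTH_BASE}/device/code",
    data={"client_id": CLIENT_ID, "scope": "openid mc:inference"},
).json()
print(f"Go to {device['verification_uri']} and enter code {device['user_code']}")

# Step 2: poll the token endpoint until the user approves (or something breaks)
while True:
    time.sleep(device.get("interval", 5))
    resp = requests.post(
        f"{AUTH_BASE}/token",
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
            "device_code": device["device_code"],
            "client_id": CLIENT_ID,
        },
    )
    body = resp.json()
    if resp.ok:
        access_token = body["access_token"]
        break
    if body.get("error") != "authorization_pending":
        raise RuntimeError(f"Device flow failed: {body}")

# Step 3: hand access_token to your MCP session yourself, e.g. by setting a
# static Authorization: Bearer header when you construct the client.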

Model Context Protocol (MCP) is an emerging standard for AI model interaction that provides a unified interface for working with various AI models. When implementing OAuth with an MCP test server, we’re dealing with a specialized scenario where authentication and authorization must accommodate both human users and AI agents.

This technical guide covers the implementation of OAuth 2.0 in an MCP environment, focusing on the unique requirements of AI model authentication, token exchange patterns, and security considerations specific to AI workflows.

Prerequisites

Before implementing OAuth with your MCP test server:

  1. MCP Server Setup: A running MCP test server (v0.4.0 or later)
  2. Developer Credentials: Client ID and secret from the MCP developer portal
  3. OpenSSL: For generating key pairs and testing JWT signatures (or see the Python sketch right after this list)
  4. Understanding of MCP’s Auth Requirements: Familiarity with MCP’s auth extensions for AI contexts
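Quick note on item 3: the assertion code later in this post reads private_key.pem, so if you'd rather stay in Python than shell out to openssl, here's a hedged equivalent using the cryptography package to generate an RSA key suitable for RS256 signing:

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# 2048-bit RSA key for signing RS256 client assertions
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

with open("private_key.pem", "wb") as f:
    f.write(key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    ))

# Public half, in case the developer portal asks for it in PEM form
with open("public_key.pem", "wb") as f:
    f.write(key.public_key().public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    ))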

Section 1: MCP-Specific OAuth Configuration

1.1 Registering Your Application

MCP extends standard OAuth client registration with AI-specific parameters (note the ai_service client type, the mc:* scopes, and the ai_metadata block):

curl -X POST https://auth.modelcontextprotocol.io/register \
  -H "Content-Type: application/json" \
  -d '{
    "client_name": "Your AI Agent",
    "client_type": "ai_service",
    "grant_types": ["authorization_code", "client_credentials"],
    "redirect_uris": ["https://your-domain.com/auth/callback"],
    "scopes": ["mc:inference", "mc:fine_tuning"],
    "ai_metadata": {
      "model_family": "your-model-family",
      "capabilities": ["text-generation", "embeddings"]
    }
  }'

1.2 Understanding MCP’s Auth Flows

MCP supports three primary OAuth flows (on paper, anyway; see the rant up top):

  1. Standard Authorization Code Flow: For human users interacting with MCP via UI
  2. Client Credentials Flow: For server-to-server AI service authentication (sucks, doesn’t work, even the workarounds don’t; don’t do it)
  3. Device Flow: For headless AI environments

Section 2: Implementing Authorization Code Flow

2.1 Building the Authorization URL

MCP extends standard OAuth parameters with AI context:

import json
from urllib.parse import urlencode

auth_params = {
    'response_type': 'code',
    'client_id': 'your_client_id',
    'redirect_uri': 'https://your-domain.com/auth/callback',
    'scope': 'openid mc:inference mc:models:read',
    'state': 'anti-csrf-token',
    'mcp_context': json.dumps({  # MCP-specific context
        'model_session_id': 'current-session-uuid',
        'intended_use': 'interactive_chat'
    }),
    'nonce': 'crypto-random-string'
}

auth_url = f"https://auth.modelcontextprotocol.io/authorize?{urlencode(auth_params)}"

2.2 Handling the Callback

The MCP authorization server will return additional AI context in the callback:

from flask import Flask, request, jsonify
import json
import jwt
import requests

app = Flask(__name__)

@app.route('/auth/callback')
def callback():
    auth_code = request.args.get('code')
    mcp_context = json.loads(request.args.get('mcp_context', '{}'))  # MCP extension

    token_response = requests.post(
        'https://auth.modelcontextprotocol.io/token',
        data={
            'grant_type': 'authorization_code',
            'code': auth_code,
            'redirect_uri': 'https://your-domain.com/auth/callback',
            'client_id': 'your_client_id',
            'client_secret': 'your_client_secret',
            'mcp_context': request.args.get('mcp_context')  # Pass context back
        }
    )

    # MCP tokens include AI-specific claims
    # (signature verification is skipped here for brevity; see Section 4.1 for real validation)
    id_token = jwt.decode(
        token_response.json()['id_token'],
        options={'verify_signature': False}
    )
    print(f"Model Session ID: {id_token['mcp_session_id']}")
    print(f"Allowed Model Operations: {id_token['mcp_scopes']}")

    return jsonify(id_token)

Section 3: Client Credentials Flow for AI Services

3.1 Requesting Machine-to-Machine Tokens

import requests

response = requests.post(
    'https://auth.modelcontextprotocol.io/token',
    data={
        'grant_type': 'client_credentials',
        'client_id': 'your_client_id',
        'client_secret': 'your_client_secret',
        'scope': 'mc:batch_inference mc:models:write',
        'mcp_assertion': generate_mcp_assertion_jwt()  # MCP requirement
    },
    headers={'Content-Type': 'application/x-www-form-urlencoded'}
)

token_data = response.json()
# MCP includes additional AI context in the response
model_context = token_data.get('mcp_model_context', {})

3.2 Generating MCP Assertion JWTs

MCP requires a signed JWT assertion for client credentials flow:

import jwt
import datetime

def generate_mcp_assertion_jwt():
    now = datetime.datetime.utcnow()
    payload = {
        'iss': 'your_client_id',
        'sub': 'your_client_id',
        'aud': 'https://auth.modelcontextprotocol.io/token',
        'iat': now,
        'exp': now + datetime.timedelta(minutes=5),
        'mcp_metadata': {  # MCP-specific claims
            'model_version': '1.2.0',
            'deployment_env': 'test',
            'requested_capabilities': ['inference', 'training']
        }
    }

    with open('private_key.pem', 'r') as key_file:
        private_key = key_file.read()

    return jwt.encode(payload, private_key, algorithm='RS256')

Section 4: MCP Token Validation

4.1 Validating ID Tokens

MCP ID tokens include standard OIDC claims plus MCP extensions:

import jwt
from jwt import PyJWKClient
from jwt.exceptions import InvalidTokenError

def validate_mcp_id_token(id_token):
    jwks_client = PyJWKClient('https://auth.modelcontextprotocol.io/.well-known/jwks.json')

    try:
        signing_key = jwks_client.get_signing_key_from_jwt(id_token)
        decoded = jwt.decode(
            id_token,
            signing_key.key,
            algorithms=['RS256'],
            audience='your_client_id',
            issuer='https://auth.modelcontextprotocol.io'
        )

        # Validate MCP-specific claims
        if not decoded.get('mcp_session_id'):
            raise InvalidTokenError("Missing MCP session ID")

        return decoded
    except Exception as e:
        raise InvalidTokenError(f"Token validation failed: {str(e)}")

4.2 Handling MCP Token Introspection

def introspect_mcp_token(token):
    response = requests.post(
        'https://auth.modelcontextprotocol.io/token/introspect',
        data={
            'token': token,
            'client_id': 'your_client_id',
            'client_secret': 'your_client_secret'
        }
    )

    introspection = response.json()
    if not introspection['active']:
        raise Exception("Token is not active")

    # Check MCP-specific introspection fields
    if 'mc:inference' not in introspection['scope'].split():
        raise Exception("Missing required inference scope")

    return introspection

Section 5: MCP-Specific Considerations

5.1 Handling Model Session Context

MCP tokens include session context that must be propagated:

def call_mcp_api(endpoint, access_token):
    headers = {
        'Authorization': f'Bearer {access_token}',
        'X-MCP-Context': json.dumps({
            'session_continuity': True,
            'model_temperature': 0.7,
            'max_tokens': 2048
        })
    }

    response = requests.post(
        f'https://api.modelcontextprotocol.io/{endpoint}',
        headers=headers,
        json={'prompt': 'Your AI input here'}
    )

    return response.json()

5.2 Token Refresh with MCP Context

def refresh_mcp_token(refresh_token, mcp_context):
    response = requests.post(
        'https://auth.modelcontextprotocol.io/token',
        data={
            'grant_type': 'refresh_token',
            'refresh_token': refresh_token,
            'client_id': 'your_client_id',
            'client_secret': 'your_client_secret',
            'mcp_context': json.dumps(mcp_context)
        }
    )

    if response.status_code != 200:
        raise Exception(f"Refresh failed: {response.text}")

    return response.json()

Section 6: Testing and Debugging

6.1 Using MCP’s Test Token Endpoint

curl -X POST https://test-auth.modelcontextprotocol.io/token \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials" \
  -d "client_id=test_client" \
  -d "client_secret=test_secret" \
  -d "scope=mc:test_all" \
  -d "mcp_test_mode=true" \
  -d "mcp_override_context={\"bypass_limits\":true}"

6.2 Analyzing MCP Auth Traces

Enable MCP debug headers:

headers = {
    'Authorization': 'Bearer test_token',
    'X-MCP-Debug': 'true',
    'X-MCP-Traceparent': '00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01'
}

Why did this suck so much?

Implementing OAuth with an MCP test server requires attention to MCP’s AI-specific extensions while following standard OAuth 2.0 patterns. Key takeaways:

  1. Always include MCP context parameters in auth flows
  2. Validate MCP-specific claims in tokens
  3. Propagate session context through API calls
  4. Leverage MCP’s test endpoints during development

For production deployments, ensure you:

  • Rotate keys and secrets regularly
  • Monitor token usage patterns
  • Implement proper scope validation
  • Handle MCP session expiration gracefully

Linux Kernel 6.1.6 rc3

What’s new? Oh, just your usual dose of 1,000+ micro-patches, mysterious scheduler “optimizations,” and a whole bunch of drivers you didn’t know your toaster needed.

The changelog reads like a novel written by a caffeinated robot: fixes for AMD, tweaks for Intel, a gentle pat on the back for ARM, and a completely normal update to BPF that definitely won’t break your debug setup (again).

Should you upgrade? Of course.

Kernel Hell: SLUB Allocator Contention on NUMA Machines

So here’s a weird rabbit hole I went down recently: trying to figure out why Linux memory allocation slows to a crawl under pressure — especially on big multi-socket systems with a ton of cores. The culprit? Good ol’ SLUB. And no, I don’t mean a rude insult — I mean the SLUB allocator, one of the core memory allocators in the Linux kernel.

If you’ve ever profiled a high-core-count server under load and seen strange latency spikes in malloc-heavy workloads, there’s a good chance SLUB contention is part of it.

The Setup

Let’s say you’ve got a 96-core AMD EPYC box. It’s running a real-time app that’s creating and destroying small kernel objects like crazy — maybe TCP connections, inodes, structs for netlink, whatever.

Now, SLUB is supposed to be fast. It uses per-CPU caches so that you don’t have to lock stuff most of the time. Allocating memory should be a lockless, per-CPU bump pointer. Great, right?

Until it’s not.

The Problem: The Slow Path of Doom

When the per-CPU cache runs dry (e.g., under memory pressure or fragmentation), you fall into the slow path, and that’s where things get bad:

  • SLUB has to grab the per-node list_lock to refill the cache, and the lockless fast path is gone.
  • If your NUMA node is short on memory, it might fall back to a remote node — so now you’ve got cross-node memory traffic.
  • Meanwhile, other cores are trying to do the same thing. Boom: contention.
  • Add slab merging and debug options like slub_debug into the mix, and now you’re in full kernel chaos mode.

If you’re really unlucky, your allocator calls will stall behind a memory compaction or even trigger the OOM killer if it can’t reclaim fast enough.

Why This Is So Hard

This isn’t just “optimize your code” kind of stuff — this is deep down in mm/slub.c, where you’re juggling:

  • Atomic operations in interrupt contexts
  • Per-CPU vs. global data structures
  • Memory locality vs. system-wide reclaim
  • The fact that one wrong lock sequence and you deadlock the kernel

There are tuning knobs (/proc/slabinfo, the per-cache files under /sys/kernel/slab/, the slub_debug boot parameter, etc.), but they’re like trying to steer a cruise ship with a canoe paddle. You might see symptoms, but fixing the cause takes patching and testing on bare metal.
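If you just want to watch the symptoms without patching anything, even a dumb poller over /proc/slabinfo gets you surprisingly far. A minimal sketch (needs root to read the file, and it assumes the slabinfo 2.1 column layout that current kernels print):

import time

def top_slabs(n=10):
    """Return the n slab caches holding the most memory, from /proc/slabinfo."""
    rows = []
    with open("/proc/slabinfo") as f:
        lines = f.readlines()[2:]  # skip the version line and the column header
    for line in lines:
        fields = line.split()
        name = fields[0]
        active_objs, num_objs, objsize = int(fields[1]), int(fields[2]), int(fields[3])
        rows.append((active_objs * objsize, name, active_objs, num_objs))
    return sorted(rows, reverse=True)[:n]

while True:
    print("---")
    for approx_bytes, name, active, total in top_slabs():
        print(f"{name:30s} {active:>9d}/{total:<9d} ~{approx_bytes / 2**20:7.1f} MiB")
    time.sleep(2)

It won't tell you who's holding list_lock, but watching which caches balloon right before a latency spike narrows the suspect list a lot.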

Things I’m Exploring

Just for fun (and pain), I’ve been poking around the idea of:

  • Introducing NUMA-aware slab refill batching, so we reduce cross-node fallout.
  • Using BPF to trace slab allocation bottlenecks live (if you haven’t tried this yet, it’s surprisingly helpful).
  • Adding a kind of per-node, per-type draining system where compaction and slab freeing can happen more asynchronously.

Not gonna lie — some of this stuff is hard. It’s race-condition-central, and the kind of thing where adding one optimization breaks five other things in edge cases you didn’t know existed.

SLUB is amazing when it works. But when it doesn’t — especially under weird multi-core, NUMA, low-memory edge cases — it can absolutely wreck your performance.

And like most things in the kernel, the answer isn’t always “fix it” — sometimes it’s “understand what it’s doing and work around it.” Until someone smarter than me upstreams a real solution.

I Don’t Want Your Kernel-Level Anti-Cheat. I Want to Play My Damn Game.

Let me be crystal clear: I hate kernel-level anti-cheat.

I’m not talking about a mild dislike, or a passing irritation. I mean deep, primal disgust. The kind you feel when you realize the thing you paid for—your game, your entertainment, your free time—comes with a side of invasive rootkit-level surveillance masquerading as “protection.”

You call it BattlEye, Vanguard, Xigncode, whatever the hell you want. I call it corporate spyware with a EULA.

It’s MY Computer, Not Yours

Let’s get this straight: no anti-cheat—none—has any damn business installing itself at the kernel level, the most privileged layer of my operating system. You know who else runs code in the kernel?

Rootkits. Malware. Nation-state surveillance tools.

So, no, I don’t want your “proactive protection system” watching every process I launch, analyzing memory usage, intercepting system calls, and God-knows-what-else while I try to enjoy a couple hours of gaming. That’s not anti-cheat. That’s anti-user.

“But It’s Necessary to Stop Cheaters!”

Don’t feed me that line. You mean to tell me the only way to stop some 14-year-old aimbotting in Call of Duty: Disco Warfare 7 is to give you full access to the innermost sanctum of my machine? Are you serious?

There’s a difference between anti-cheat and carte blanche to run black-box software with system-level privileges. If your defense against wallhacks requires administrator rights on MY computer 24/7, your design is broken.

Cheaters are clever, sure. But so are malware authors. You know what they do? Exploit over-privileged software running in the kernel. You’ve just handed them one more juicy target.

I Didn’t Sign Up to Beta Test Your Surveillance System

Let’s talk about transparency. There isn’t any.

What does your anti-cheat actually do? What telemetry does it collect? What heuristics are used to flag me? Does it store that data? Share it? Sell it? Run in the background when I’m not even playing?

You won’t tell me. You’ve locked it up tighter than Fort Knox and buried it in an NDA-laced support email chain.

And don’t even get me started on false positives. I’ve seen legit players banned for running Discord overlays or having a debugger left open from work. Their appeals? Ignored. Labeled cheaters by automated judgment, with no accountability.

The Irony? You Still Don’t Stop Cheaters

And here’s the kicker. You’re still losing.

Despite all this Big Brother garbage, cheaters still infest games. ESPs, ragebots, HWID spoofers—they’re thriving. Know why?

Because you’re fighting a cat-and-mouse game with people who are smarter, faster, and more motivated than your overfunded security team. You’re just screwing over everyone else in the process.

Enough

I don’t want a rootkit with my copy of Rainbow Six. I don’t want a watchdog in the kernel just to enjoy Escape from Tarkov. I don’t want to sacrifice privacy, performance, or basic control over my system for the privilege of not being called a cheater.

You say your mission is to “keep the game clean.”

I say: start by getting out of my goddamn kernel.

Using PyTorch to Investigate Catastrophic Forgetting in Continual Learning

I’ve been working on this for a while. I want to start writing more about PyTorch. One topic that has been taking a lot of my reading time these days is catastrophic forgetting. Let’s dive into it.

Catastrophic forgetting is a well-documented failure mode in artificial neural networks where previously learned knowledge is rapidly overwritten when a model is trained on new tasks. This phenomenon presents a major obstacle for systems intended to perform continual or lifelong learning. While human learning appears to consolidate past experiences in ways that allow for incremental acquisition of new knowledge (a huge fucking maybe here btw, in fact, a lot of this is a deep maybe), deep learning systems—especially those trained using stochastic gradient descent—lack native mechanisms for preserving older knowledge. In this article, we explore how PyTorch can be used to simulate and mitigate this effect using a controlled experiment involving disjoint classification tasks and a technique called Elastic Weight Consolidation (EWC).

Why do you care? Recently at work, my boss had to explain in detail how my company makes sure that there is no data left if processing nodes are reused. That really got me thinking about this…

We construct a continual learning environment using MNIST by creating two disjoint tasks: one involving classification of digits 0 through 4 and another involving digits 5 through 9. The dataset is filtered using torchvision utilities to extract samples corresponding to each task. A shared multilayer perceptron model is defined in PyTorch using two fully connected hidden layers followed by a single classification head, allowing us to isolate the effects of sequential training on a common representation space. The model is first trained exclusively on Task A using standard cross-entropy loss and Adam optimization. Performance is evaluated on Task A using a held-out test set. Following this, the model is trained on Task B without revisiting Task A, and evaluation is repeated on both tasks.
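Concretely, the two-task split looks something like this. A minimal sketch using torchvision's MNIST dataset and Subset; the paths, batch sizes, and names (train_a, train_b, and so on) are just my choices:

import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_full = datasets.MNIST("./data", train=True, download=True, transform=transform)
test_full = datasets.MNIST("./data", train=False, download=True, transform=transform)

def task_subset(dataset, labels):
    # Keep only samples whose label belongs to this task
    idx = [i for i, y in enumerate(dataset.targets) if int(y) in labels]
    return Subset(dataset, idx)

task_a, task_b = set(range(0, 5)), set(range(5, 10))
train_a = DataLoader(task_subset(train_full, task_a), batch_size=128, shuffle=True)
train_b = DataLoader(task_subset(train_full, task_b), batch_size=128, shuffle=True)
test_a = DataLoader(task_subset(test_full, task_a), batch_size=256)
test_b = DataLoader(task_subset(test_full, task_b), batch_size=256)

Labels are left as the original digits, so the single 10-way head in the model below covers both tasks without any remapping.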

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 256)
        self.out = nn.Linear(256, 10)  # single 10-way classification head shared by both tasks

    def forward(self, x):
        x = x.view(x.size(0), -1)   # flatten 28x28 images into 784-dim vectors
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)

As expected, the model exhibits catastrophic forgetting: accuracy on Task A degrades significantly after Task B training, despite the underlying model architecture remaining unchanged. This result validates the conventional understanding that deep networks, when trained naively on non-overlapping tasks, tend to fully overwrite internal representations. To counteract this, we implement Elastic Weight Consolidation, which penalizes updates to parameters deemed important for previously learned tasks.

def compute_fisher(model, dataloader, device="cpu"):
    # Empirical Fisher: accumulate squared gradients of the loss over Task A samples
    model.eval()
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    for x, y in dataloader:
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        out = model(x)
        loss = F.cross_entropy(out, y)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.data.pow(2)
    for n in fisher:
        fisher[n] /= len(dataloader)
    return fisher

To apply EWC, we compute the Fisher Information Matrix for the model parameters after Task A training. This is done by accumulating the squared gradients of the loss with respect to each parameter, averaged over samples from Task A. The Fisher matrix serves as a proxy for parameter importance—those parameters with large entries are assumed to play a critical role in preserving Task A performance. When training on Task B, an additional term is added to the loss function that penalizes the squared deviation of each parameter from its Task A value, weighted by the corresponding Fisher value. This constrains the optimizer to adjust the model in a way that minimally disrupts the structure needed for the first task.
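In code, that extra term is just a Fisher-weighted squared distance from the Task A weights, added to the ordinary cross-entropy loss. A minimal sketch that continues from the snippets above (it assumes model is an MLP already trained on Task A, and reuses train_a, train_b, and compute_fisher); the names old_params, ewc_penalty, and lam are mine, and the lambda value is only a starting point:

# Assumes `model` has already been trained on Task A.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Importance estimates and a parameter snapshot, taken right after Task A training
fisher = compute_fisher(model, train_a, device)
old_params = {n: p.detach().clone() for n, p in model.named_parameters() if p.requires_grad}

def ewc_penalty(model, fisher, old_params):
    # sum_i F_i * (theta_i - theta_i_A)^2
    total = 0.0
    for n, p in model.named_parameters():
        if n in fisher:
            total = total + (fisher[n] * (p - old_params[n]).pow(2)).sum()
    return total

lam = 100.0  # the lambda regularization knob; tune it per task pair
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(3):
    for x, y in train_b:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y) + lam * ewc_penalty(model, fisher, old_params)
        loss.backward()
        optimizer.step()

Setting lam to zero recovers the naive sequential training run, which makes it easy to reproduce the forgetting numbers reported below.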

Empirical evaluation demonstrates that with EWC, the model retains significantly more performance on Task A while still acquiring Task B effectively. Without EWC, Task A accuracy drops from 94 percent to under 50 percent. With EWC, Task A accuracy remains above 88 percent, while Task B accuracy only slightly decreases compared to the unconstrained case. The exact tradeoff can be tuned using the lambda regularization hyperparameter in the EWC loss.

This experiment highlights both the limitations and the flexibility of gradient-based learning in sequential settings. While deep neural networks do not inherently preserve older knowledge, PyTorch provides the low-level control necessary to implement constraint-aware training procedures like EWC. These mechanisms approximate the role of biological consolidation processes observed in the human brain and provide a path forward for building agents that learn continuously over time.

Future directions could include applying generative replay, using dynamic architectures that grow with tasks, or experimenting with online Fisher matrix approximations to scale to longer task sequences. While Elastic Weight Consolidation is only one tool in the broader field of continual learning, it serves as a useful reference implementation for those investigating ways to mitigate the brittleness of static deep learning pipelines.

Why the hell does this matter? Beyond classification accuracy and standard benchmarks, the structure of learning itself remains an open frontier—one where tools like PyTorch allow morons and nerds like me to probe and control the dynamics of plasticity and stability in artificial systems.