I keep noticing that seemingly unrelated fields share the same deep structure.

This is where I think out loud. Notes on AI, complexity, systems that learn to simplify themselves, and the recurring patterns I keep finding between domains that aren't supposed to be related.

Currently working on

Cognitive offloading (teaching AI agents to replace themselves with simpler code) · Token maturation for reducing hallucinations · A CVPR 2026 paper on the modality gap in vision-language models


/ Notes

The Sea Squirt Principle: When AI Learns to Shrink Itself

Cognitive Offloading · March 2026

There's an animal called the sea squirt that, once it settles on a rock, absorbs much of its own nervous system. It doesn't literally eat its brain (I'm saying this before the biologists come for me), but the core idea is fascinating: early on, it needs complex machinery to explore. Once settled, it doesn't.

I've been thinking about this as a learning principle. A useful sign of learning is not that a system thinks faster. It's that the same repeated situations require less explicit reasoning. A chess master isn't a beginner who calculates faster — in many positions, the master isn't "calculating" in the beginner's sense at all.

There's a similar pattern in LLM-based systems. An LLM may fail when directly asked how many r's are in "strawberry," yet easily write a short program that counts characters correctly. The code outperforms the model that wrote it. That's not just optimization — it's knowledge being converted from description to reliable execution.
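The program in question is trivial, which is exactly the point. A sketch (the function name is mine, not any particular model's output):

```python
def count_char(text: str, ch: str) -> int:
    # Deterministic character count: no tokenization, no guessing.
    return sum(1 for c in text if c == ch)

print(count_char("strawberry", "r"))  # → 3
```

Once the knowledge lives in code like this, it gives the same answer every time, regardless of how the model that wrote it tokenizes "strawberry".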

I've been building a system where the model gradually reduces its own role over time, crystallizing repeated reasoning into verified tools. Early results: 67% of tasks offloaded to deterministic code, same accuracy, a quarter of the cost. The system is learning by shrinking the set of things that still require open-ended reasoning.
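The control loop behind that can be sketched in a few lines. Everything below is illustrative: the names, the task-signature routing, and the verification step are stand-ins I made up, not the actual system.

```python
# Illustrative sketch of a "crystallization" loop: route repeated task
# types to verified deterministic code, reserve the model for whatever
# is still open-ended. All names here are hypothetical.

class Offloader:
    def __init__(self, llm, synthesize, verify, threshold=3):
        self.llm = llm                # fallback: open-ended reasoning
        self.synthesize = synthesize  # model writes a candidate tool
        self.verify = verify          # check candidate against past cases
        self.threshold = threshold    # repeats before attempting a tool
        self.tools = {}               # task signature -> verified function
        self.history = {}             # task signature -> solved examples

    def solve(self, signature, payload):
        if signature in self.tools:
            return self.tools[signature](payload)  # deterministic path
        answer = self.llm(signature, payload)      # expensive path
        cases = self.history.setdefault(signature, [])
        cases.append((payload, answer))
        # After enough repeats, try to crystallize a tool.
        if len(cases) >= self.threshold:
            candidate = self.synthesize(signature, cases)
            if self.verify(candidate, cases):
                self.tools[signature] = candidate
        return answer
```

The interesting design question is all in `verify`: a tool only replaces the model once it reproduces the model's own past answers, which is what makes the shrinking safe.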

AI Today is Dial-Up

Inference & the Future · February 2026

The state of AI today really reminds me of the internet in the 90s. Before broadband, the internet was mostly... chat. You'd type something, wait, get a text response. Sound familiar?

At 50–150 tokens/sec, AI is a chatbot. Everything we're building — the wrappers, UX patterns, business models — assumes that interaction model. But the hardware trajectory is pointing somewhere different. Groq, Cerebras, SambaNova, newer players like Taalas — all pushing into thousands of tokens/sec.

At that speed, AI stops being a chatbot. Just like broadband didn't make email faster — it created Netflix. Instead of ever-larger models that "know" the answer, you could run a small model thousands of times in parallel. Generate, mutate, verify, search at runtime. Interfaces that don't exist until you need them. Robotics untethered from the cloud.
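A toy version of that generate-mutate-verify loop, with a string-matching scorer standing in for whatever real verifier you'd run. The shape is the point: many cheap proposals plus a check, instead of one expensive call that is supposed to "know":

```python
import random

def search(propose, score, steps=20000, seed=0):
    # Runtime search: generate/mutate candidates in volume, keep what
    # the verifier scores higher. Cheap generation makes this viable.
    rng = random.Random(seed)
    best = propose(rng, None)
    best_score = score(best)
    for _ in range(steps):
        candidate = propose(rng, best)   # generate / mutate
        s = score(candidate)             # verify / evaluate
        if s > best_score:
            best, best_score = candidate, s
    return best
```

With tokens at thousands per second, `steps` stops being the scarce resource; the quality of `score` does.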

But these are just examples I can think of today. The most interesting things broadband enabled weren't predicted in 1995. I suspect the same will be true here. One thing I do think is predictable: when generation becomes nearly free, the bottleneck moves to verification. When tokens are cheap, trust is expensive.

The Interference Alignment Lesson

Patterns Across Fields · February 2026

Do you know Interference Alignment? For me, it's a story with a moral, one the wireless research community lived through almost 20 years ago: a lesson about the gap between mathematical beauty and engineering reality.

In 2008, a seemingly magical idea appeared: no matter how many users shared the channel, each one could still get half of the spectrum, as if there were only one other user. The math actually worked out. We tried to build it. It worked, at small scale, in the lab. The vibe was: "We just need better engineering."

But reality had other plans. The required accuracy of channel state information grows exponentially with users. At scale, almost 100% of bandwidth becomes overhead just to keep the alignment alive. The system collapsed under its own complexity. Not because the math was wrong, but because engineering details don't just change the constants — sometimes they break the scaling laws entirely.

Since then, whenever I see a beautiful theoretical result with incredible scaling, I first look for the hidden contract: what must stay perfectly aligned? And what does it cost to keep it that way? I see a very similar dynamic in quantum error correction. I could be wrong, but I've seen this movie before.

LLMs as Formulators, Not Optimizers

On AlphaEvolve & Classical Tools · 2026

One thing keeps bothering me about program synthesis approaches like AlphaEvolve. The optimization is performed through code generation — the LLM generates code that implicitly performs the search. That can easily mean thousands of model calls just to explore the search space.

But we already have extremely powerful tools: convex optimization, MILP solvers, CMA-ES, simulated annealing, reinforcement learning, dynamic programming. We spent more than 50 years building this toolbox.

Instead of synthesizing programs that implicitly perform optimization, we could often do something simpler: LLM formulates the problem, classical optimizer solves it. This reminds me of a very common research pattern in electrical engineering about 10–15 years ago: model the problem, derive a tractable relaxation, solve it with a solver, bound the gap. The hard part was never solving. It was formulating the problem correctly. The same might be true for LLMs today.
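A concrete version of the split, with a knapsack instance standing in for whatever the LLM would actually formulate. The dict format and the instance are mine; the solver is the textbook dynamic program, decades old and exact:

```python
# Division of labor: the LLM's job ends at producing a formulation
# (here, a plain dict). A classical algorithm does the solving.

def solve_knapsack(problem):
    """Classic 0/1 knapsack DP over capacity: exact, deterministic,
    zero model calls at solve time."""
    values, weights = problem["values"], problem["weights"]
    cap = problem["capacity"]
    best = [0] * (cap + 1)
    for v, w in zip(values, weights):
        for c in range(cap, w - 1, -1):   # reverse loop: each item once
            best[c] = max(best[c], best[c - w] + v)
    return best[cap]

# What the LLM would emit: a formulation, not a search procedure.
formulation = {"values": [60, 100, 120], "weights": [10, 20, 30], "capacity": 50}
print(solve_knapsack(formulation))  # → 220
```

One model call to produce the dict, then a solver that never hallucinates. That's the pattern: the search space is explored by something with guarantees.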

Complex Means Interesting

Complexity Theory · 2026

There are many definitions of complexity. Entropy, Kolmogorov complexity, logical depth, statistical complexity. Some contradict each other. Yet intuitively we rarely struggle to recognize complexity when we encounter it.

My current intuition is simple. Complex means interesting. From an evolutionary perspective, interesting means something more precise: it signals that investing energy to understand a phenomenon may produce advantage. Order is not interesting because there's nothing to gain. Pure randomness is not interesting because there's nothing to exploit. What we call complex sits exactly in between — it is structure that rewards stronger observers.

There's also a small paradox hidden here. If complexity is a signal of whether investment will pay off, then we need to know whether to invest before investing. This makes complexity less like an explicit metric and more like an internal heuristic. Perhaps this explains why complexity is so hard to define formally yet so easy to feel.


/ Lab

Published research and ongoing work. Links to papers where available.

In progress

Cognitive Offloading in Autonomous Agents

A system where LLM agents learn to replace their own reasoning with verified deterministic code. 67% offloaded, same accuracy, 4x cheaper.

Under review — ICML

Token Maturation

Delayed token commitment for reducing hallucinations. Letting representations mature before making hard decisions.

CVPR 2026

Closing the Modality Gap in Vision-Language Models

The modality gap in CLIP-style models hurts robustness. A few lines of linear algebra fix it — no retraining, drop-in for any VLM.

IBM · Published

Granite Vision

Co-authored IBM's compact 2B-parameter vision-language model, developed in large part by our team at IBM Haifa.

ACL · 21 citations in first year

Real-mm-RAG

An automatically generated benchmark for multimodal retrieval-augmented generation. The community needed it — adoption was faster than expected.

Published

Complexity as Advantage

Complexity as the performance gap between observers with different capabilities. Connects entropy, MDL, regret, and logical depth.

IEEE JSAC · 579 citations

Deep Multi-User Reinforcement Learning for Dynamic Spectrum Access

One of the earlier papers applying deep RL to multi-user wireless networks. Distributed policy learning without centralized coordination.


/ About

I trained my first neural network in high school, in Visual Basic, after reading Kurzweil. Then I spent 15 years doing what seemed like different things — applied math, signal processing, defense systems, distributed optimization. It took me a while to realize they were all the same thing.

During my postdoc at Washington University in St. Louis, I got curious about a reference in a widely cited 2010 paper. It pointed to a 1967 paper nobody had read. I ordered a physical copy through the library. It turned out that 20 years of modern research had been unknowingly rediscovering what was already there. That experience shaped how I work: I always go to the original source, because fields forget.

At Rafael, I worked on reinforcement learning for defense — systems that had to learn in real time, with no room for error. At IBM, I co-authored Granite Vision, created the Real-mm-RAG benchmark, and manage the AI for Knowledge group. Currently a Principal RSM and Master Inventor.

I'm genuinely interested in a lot of things — probably too many — and I tend to spread across fields rather than dig into one. But I've found that the interesting ideas usually live at the intersections.

28+
publications
1,200+
citations
2
US patents

/ Contact

If any of this resonated, or you're thinking about similar problems — I enjoy those conversations. I'm not always fast to reply, but I read everything.