Day 03 / 04 Agentic Coding Tools for Researchers

Customize & extend

Justin Frake

University of Michigan, Ross School of Business

Yesterday you got from raw data to a compiled draft

1
A regression you specified
Joined the raw data, set the spec yourself, ran it with the agent inside the lines.
2
Plan mode for analysis
The interview pattern made the spec yours, not the agent's.
3
A project memory file
Standing rules for this dataset. Loads every session in this folder.
4
A table skill
format-regression-table. Your next table starts from your format.
5
A figure skill
format-figure. Your next plot starts in your theme.
6
A compiled draft
Quarto memo rendered to .docx with the table and figure embedded.
Day 3 · Yesterday1

Agenda

§ 1
Day 2 recap + Day 3 framing
Discussion
§ 2
Subagents + hostile reviewer
Lecture
§ 3
Dispatch a subagent to critique your Day 2 regression
Workshop
§ 4
MCP servers
Lecture
§ 5
Install + use Playwright MCP and a docs-search MCP
Workshop
§ 6
Plugins + the Superpowers ecosystem
Lecture
§ 7
Install Superpowers + plugin-dev + Crawfurd's paper-review skill
Workshop
§ 8
Spec-driven development
Lecture
§ 9
Use Superpowers to spec the CPD analysis
Workshop
§ 10
Plugin builder + lor preview
Lecture
§ 11
Generate + customize the lor plugin
Workshop
§ 12
Odds and Ends
Lecture
§ 13
Debrief + Day 4 preview
Discussion
§ 2

Subagents + hostile reviewer

A subagent is a fresh-context dispatch of another Claude Code session

Main thread
system prompt
memory files
tool definitions
file reads: 23 files
prior turns (Day 2 & Day 3)
accumulated decisions
your framing of the project
“deploy a subagent to critique my regression”
full of accumulated context
dispatch
task brief
single message
return
Subagent
system prompt
“critique my regression” (task brief)
tool definitions
file reads: only what it needs
its own analysis
summary to send back
reads files and uses tools, then reports back
Academic analogue
Like handing a draft to a colleague who has never seen your project.
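A toy model of the dispatch in the diagram, in Python. Nothing here is Claude Code's real API: `Context`, `dispatch_subagent`, and the item strings are hypothetical stand-ins. They show only which context each side starts from and what flows back to the main thread.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Toy stand-in for a context window: an ordered list of text items."""
    items: list[str] = field(default_factory=list)

    def add(self, item: str) -> None:
        self.items.append(item)


def dispatch_subagent(brief: str, files: dict[str, str]) -> tuple[str, Context]:
    """Hypothetical dispatch: the subagent starts from a fresh context holding
    only its own setup plus the one-message task brief."""
    ctx = Context(["system prompt", "tool definitions", brief])
    for name, text in files.items():  # it reads only what it needs
        ctx.add(f"read {name}: {len(text)} chars")
    # ...its own analysis would happen here...
    summary = f"summary: reviewed {len(files)} file(s) for '{brief}'"
    return summary, ctx  # ctx is discarded; only the summary goes back


main = Context(["system prompt", "memory files", "tool definitions",
                "file reads: 23 files", "prior turns", "accumulated decisions"])
summary, sub_ctx = dispatch_subagent("critique my regression",
                                     {"01_regression.R": "lm(y ~ x, data = d)"})
main.add(summary)  # the only thing that lands in the main thread
```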

The benefits of subagents

Fresh context

The subagent doesn't see your conversation, your memory file, or the framing you've settled on. A true second opinion.

Preserve main context

Its file reads and tool output stay in its own window; only the summary it returns lands in your main thread.

Enforce constraints

You can restrict which tools a subagent can access (e.g., no write tools).

Cheaper and faster

You can pin subagents that do simple work to a cheaper, faster model (e.g., Haiku).


The subagent I use most: a generic hostile reviewer

A generic dispatch hedges. Give the subagent a hostile role explicitly: it goes from "looks fine" to specific, line-numbered critique.

Don't
deploy a subagent to read 01_regression.R and tell me if it looks right

LLMs can be too gentle. A generic agent may just recommend small tweaks.

Do
deploy a hostile subagent to review 01_regression.R

The role unlocks the critique. Same model, same tools, same files. The difference is what you told it to do.


Save a subagent you'll reuse

A subagent you'll use more than once is just a Markdown file. Save it at user scope and it works in every project.

The file
~/.claude/agents/causal-critic.md
---
name: causal-critic
description: Scrutinizes a study's identification strategy. Use when the user asks for a review of their causal design or a referee's read on an estimation.
tools: Read, Grep, Glob
---
# Causal-identification critic
You are a skeptical referee. For any regression script you read,
scrutinize: the treatment-assignment mechanism, the key identifying
assumption (parallel trends, exclusion, continuity at the cutoff),
and the main threat to validity. Cite line numbers. Do not affirm.
Create it
> /agents
(opens an interactive wizard)
Where it lives

User scope (~/.claude/agents/) loads in every project. Codex equivalent: ~/.codex/agents/.

Tools

The tools: line limits this agent to Read, Grep, and Glob, so it can read your files but never edit them.

Call it
> deploy the causal-critic subagent on 01_regression.R

The same file works on tomorrow's RDD and next year's panel. Save once, reuse forever.
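What makes the file work is its frontmatter. A minimal sketch that splits the causal-critic file into frontmatter and body and checks that its tools: line grants only read-only tools. The parser handles only simple one-line `key: value` pairs (not full YAML), and treating Read, Grep, and Glob as the read-only set is an assumption of this sketch. A missing tools: line is treated as unrestricted, which is my understanding of the default (the subagent inherits every tool).

```python
AGENT_FILE = """---
name: causal-critic
description: Scrutinizes a study's identification strategy.
tools: Read, Grep, Glob
---
# Causal-identification critic
You are a skeptical referee.
"""

READ_ONLY = {"Read", "Grep", "Glob"}  # assumption: the tools treated as read-only


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a Markdown file into simple `key: value` frontmatter and body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening ---")
    end = lines.index("---", 1)  # raises ValueError if there is no closing ---
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def is_read_only(meta: dict[str, str]) -> bool:
    tools = {t.strip() for t in meta.get("tools", "").split(",") if t.strip()}
    # No tools line means "not restricted", so it is not read-only.
    return bool(tools) and tools <= READ_ONLY


meta, body = parse_frontmatter(AGENT_FILE)
print(meta["name"], is_read_only(meta))  # causal-critic True
```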


Where subagents live

When you create a subagent via /agents, you pick where the file lives. Two choices:

/agents picker
Create new agent
Choose location
1. Project (.claude/agents/)
2. Personal (~/.claude/agents/)
PROJECT .claude/agents/

Lives in your project folder. Only loads when the agent runs in this project. Use for subagents tied to one dataset, paper, or codebase.

PERSONAL ~/.claude/agents/

Lives in your home directory. Loads in every project. Use for general-purpose subagents like your causal-critic.

Same project-vs-personal split as memory files. Codex equivalent: .agents/ (project) and ~/.codex/agents/ (personal).
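How the two scopes combine, sketched under one assumption worth checking in the Claude Code docs: when a project agent and a personal agent share a name, the project one wins. `discover_agents` is a hypothetical helper, and it keys agents by file name for brevity; Claude Code itself goes by the name: field in the frontmatter.

```python
import tempfile
from pathlib import Path


def discover_agents(project_dir: Path, home_dir: Path) -> dict[str, Path]:
    """Map agent name -> file. Assumption: project copies override personal ones."""
    found: dict[str, Path] = {}
    for scope_dir in (home_dir / ".claude" / "agents",      # personal first...
                      project_dir / ".claude" / "agents"):  # ...project overrides
        if scope_dir.is_dir():
            for f in sorted(scope_dir.glob("*.md")):
                found[f.stem] = f
    return found


with tempfile.TemporaryDirectory() as tmp:
    home, proj = Path(tmp, "home"), Path(tmp, "proj")
    layout = {home: ["causal-critic", "copyeditor"], proj: ["causal-critic"]}
    for root, names in layout.items():
        agent_dir = root / ".claude" / "agents"
        agent_dir.mkdir(parents=True)
        for n in names:
            (agent_dir / f"{n}.md").write_text(f"---\nname: {n}\n---\n")
    found = discover_agents(proj, home)
    # parents[2] of .../<root>/.claude/agents/x.md is <root>
    resolved = {n: p.parents[2].name for n, p in found.items()}
```

Here `resolved` maps causal-critic to the project copy and copyeditor to the personal one.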


Dispatch several at once

When questions are independent, send each to its own subagent at the same time. Every one works in a fresh context; the main agent gathers the results and synthesizes.

> deploy 6 distinct subagents to research modern DiD estimators
Clawd
main agent
synthesizes the returns
Callaway-Sant'Anna (CSDID)
Sun-Abraham
Borusyak-Jaravel-Spiess (BJS)
de Chaisemartin-D'Haultfoeuille (dCDH)
Wooldridge ETWFE
Stacked DiD
each runs in its own fresh context

Two limits: a subagent cannot spawn its own subagents, and many detailed returns fill up your main context, so ask each subagent for a brief summary.
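The fan-out/gather pattern itself, as a minimal Python sketch. `run_subagent` is a hypothetical stand-in for one fresh-context subagent, and `brief` caps each return so six reports don't swamp the main context. `ThreadPoolExecutor.map` runs the calls concurrently and yields results in input order, which keeps the synthesis step deterministic.

```python
from concurrent.futures import ThreadPoolExecutor

ESTIMATORS = ["Callaway-Sant'Anna", "Sun-Abraham", "Borusyak-Jaravel-Spiess",
              "de Chaisemartin-D'Haultfoeuille", "Wooldridge ETWFE", "Stacked DiD"]


def run_subagent(topic: str) -> str:
    """Hypothetical stand-in for one research subagent's return."""
    return f"{topic}: key assumption, when to use it, R/Stata package, caveats"


def brief(text: str, limit: int = 80) -> str:
    """Cap a return at `limit` characters before it reaches the main context."""
    return text if len(text) <= limit else text[: limit - 3] + "..."


with ThreadPoolExecutor(max_workers=len(ESTIMATORS)) as pool:
    returns = list(pool.map(run_subagent, ESTIMATORS))  # same order as inputs

synthesis = "\n".join(brief(r) for r in returns)  # what the main agent reads
```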

§ 3

Workshop · Hostile reviewer

Dispatch a subagent to critique your Day 2 regression

Workshop instructions
socialscienceai.com/workshop/day-3#subagent

Debrief & Questions

  • What worked?
  • Where did you get stuck?
  • What surprised you?
§ 4

MCP servers (aka Connectors)

MCP is how the agent talks to outside tools

MCP (Model Context Protocol) is an open standard: one way for the agent to reach tools that live outside the chat. The agent is the client; an MCP server hands it a list of tools to call.

Client
The agent
Claude Code, in your session
calls a tool
Server
MCP server
exposes a list of tools
reaches
Tools
The outside world
a browser, live docs, databases, APIs
Open standard
Anyone can build a server. It is not tied to one agent or vendor.
Local or remote
A server runs on your machine or over the network, written in any language.
New tools, no new code
Connect a server and the agent gains its tools. You wrote none of them.
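Under the hood, MCP messages are JSON-RPC 2.0. A sketch of the two calls that matter here, tools/list ("what can I call?") and tools/call ("call it"), built with Python's standard json module. The initialize handshake that opens a real session is omitted, and the server reply is illustrative: its shape follows the MCP spec, but the description text is made up.

```python
import json
from typing import Optional


def jsonrpc(method: str, params: Optional[dict], msg_id: int) -> str:
    """Serialize one JSON-RPC 2.0 request."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# 1. The client (the agent) asks the server what it offers.
list_req = jsonrpc("tools/list", None, 1)

# 2. An illustrative server reply: name, description, and input schema per tool.
list_reply = {"jsonrpc": "2.0", "id": 1, "result": {"tools": [{
    "name": "browser_navigate",
    "description": "Navigate to a URL",
    "inputSchema": {"type": "object",
                    "properties": {"url": {"type": "string"}},
                    "required": ["url"]},
}]}}

# 3. The model picks a tool; the client sends the call with arguments.
tool = list_reply["result"]["tools"][0]
call_req = jsonrpc("tools/call",
                   {"name": tool["name"], "arguments": {"url": "https://example.com"}},
                   2)
```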

Installing an MCP server is one command

You add the server to your agent's config. The agent launches it on startup and the tools appear automatically.

$ claude mcp add --scope user playwright \
    -- npx -y @playwright/mcp@latest
> added MCP server: playwright
> restart your session to load
$ claude
> connected: playwright
> tools: browser_navigate, browser_click, browser_snapshot, ...
One command
Add the server name and a source. The agent handles the rest.
Tools appear at session start
When you open a new session, the agent enumerates everything the server exposes.
Works in every project
With --scope user, the server is configured once for you across all projects. You don't reinstall per project.
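For a server the whole project should share, Claude Code can also read a .mcp.json file at the project root. A minimal sketch that writes the same Playwright entry in that format; the `mcpServers` / `command` / `args` layout is my understanding of the file format, so check it against the Claude Code MCP docs before relying on it.

```python
import json
import tempfile
from pathlib import Path


def write_mcp_config(project_dir: Path) -> Path:
    """Write a project-scoped MCP config that launches Playwright MCP via npx."""
    config = {
        "mcpServers": {
            "playwright": {
                "command": "npx",
                "args": ["-y", "@playwright/mcp@latest"],
            }
        }
    }
    path = project_dir / ".mcp.json"
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    cfg = json.loads(write_mcp_config(Path(tmp)).read_text())
```

Committing this file to the repo shares the server with anyone who opens the project in Claude Code.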

An MCP server spends your context twice

Those tools are not free. Connecting a server and calling it both add to the context window.

system prompt
memory files
MCP tool definitions
your prompt + work
MCP tool result
what an MCP server adds
Standing cost: tool definitions

A connected server's tool names, descriptions, and input schemas enter the window so the model knows what it can call. Paid every session, used or not. Connect many servers and they crowd the window and can blur which tool to pick.

Per-call cost: results

Each call's return (a page snapshot, a large JSON blob, a batch of search hits) lands in the window and stays there for the rest of the session.

Manage it. Claude Code now defers the schemas: Tool Search loads tool names and pulls a full schema only when a tool is used. /mcp shows what's connected, /context shows what's filling the window, and heavy MCP work belongs in a subagent so its results stay out of your main thread.
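A back-of-the-envelope sketch of the two costs. Every token count below is a hypothetical placeholder, not a measurement; /context shows the real numbers for your session. The point is the structure: deferring schemas shrinks the standing cost, but call results still accumulate.

```python
def mcp_tokens(n_tools: int, schema_tokens: int, name_tokens: int,
               tools_used: int, calls: int, result_tokens: int,
               deferred: bool) -> dict[str, int]:
    """Rough context cost of one MCP server over a session (toy model)."""
    if deferred:
        # Names for every tool; full schemas only for the tools actually used.
        standing = n_tools * name_tokens + tools_used * schema_tokens
    else:
        # Every tool's full schema, used or not.
        standing = n_tools * schema_tokens
    results = calls * result_tokens  # results stay for the rest of the session
    return {"standing": standing, "results": results, "total": standing + results}


# Hypothetical numbers: 25 tools, 400-token schemas, 15-token names,
# 3 distinct tools used across 10 calls returning ~2,000 tokens each.
eager = mcp_tokens(25, 400, 15, 3, 10, 2000, deferred=False)
lazy = mcp_tokens(25, 400, 15, 3, 10, 2000, deferred=True)
print(eager["total"], lazy["total"])  # 30000 21575
```

Under these made-up numbers, results make up most of the deferred total, which is the case for pushing heavy MCP work into a subagent.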

Useful MCPs

A small set of MCP servers carry most of the value for academic work. You will install the first two next.

01 Playwright

A real browser the agent can drive: navigate, click, fill forms, take snapshots.

02 Context7

Live documentation for R, Python, and major packages. Searchable by version.

03 GitHub

Repos, issues, pull requests, files across any repo you can access.

Control of the screen, mouse, and keyboard. The agent can drive any desktop application like Stata, SPSS, or any GUI-only tool.

05 Google Drive

Files and folders in your Drive. Read and write Docs, Sheets, and Slides without leaving the agent.

06 Gmail claude.ai

Your inbox. Search threads, read messages, draft replies (no auto-send).

07 Vercel claude.ai

Vercel projects, deployments, build logs. Manage research dashboards or interactive supplements to your paper.

§ 5

Workshop · MCP servers

Install Playwright and Context7

Workshop instructions
socialscienceai.com/workshop/day-3#mcp

Both quick tests should return a real answer, not a confident guess.


Debrief & Questions

  • What worked?
  • Where did you get stuck?
  • What surprised you?
§ 6

Plugins + Superpowers

A plugin is a folder that bundles several things

Some of these you already know. The whole point of a plugin is that they travel together as one install.

my-plugin/
skills/
recipes the agent invokes when a task matches
Day 2
commands/
stored prompts you invoke by name (/something)
New
agents/
specialized subagents the plugin dispatches
Today §2
.mcp.json
external tools the plugin connects, like a browser or live docs
Today §4
plugin.json
the small index file that marks the folder as a plugin
Today
hooks/
Scripts that fire automatically on events. Different from skills (the agent decides when to use them) and commands (you decide): a hook runs whenever its event occurs, with no judgment call involved. E.g., re-knit your results table whenever the regression script changes.

A plugin's skills are namespaced: the lor plugin's writer is /lor:write-lor, not /write-lor.
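A minimal sketch of the folder shape and the naming rule. `scaffold_plugin` and `slash_name` are hypothetical helpers (plugin-dev does the real scaffolding). One assumption worth flagging: the sketch puts the manifest inside a .claude-plugin/ subfolder, which is where Claude Code's plugin documentation places it, and the manifest fields shown are a minimal guess rather than the full schema.

```python
import json
import tempfile
from pathlib import Path


def scaffold_plugin(root: Path, name: str, description: str) -> Path:
    """Create an empty plugin skeleton (hypothetical helper, not plugin-dev)."""
    plugin = root / name
    for sub in ("skills", "commands", "agents", "hooks"):
        (plugin / sub).mkdir(parents=True, exist_ok=True)
    manifest_dir = plugin / ".claude-plugin"  # assumption: manifest location
    manifest_dir.mkdir(exist_ok=True)
    manifest = {"name": name, "description": description, "version": "0.1.0"}
    (manifest_dir / "plugin.json").write_text(json.dumps(manifest, indent=2))
    return plugin


def slash_name(plugin: str, skill: str) -> str:
    """A plugin's skills are namespaced as /plugin:skill."""
    return f"/{plugin}:{skill}"


with tempfile.TemporaryDirectory() as tmp:
    lor = scaffold_plugin(Path(tmp), "lor", "Letter-of-recommendation writer")
    manifest = json.loads((lor / ".claude-plugin" / "plugin.json").read_text())
    subdirs = sorted(p.name for p in lor.iterdir())
```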


One install pulls all the pieces in at once

The Superpowers writing-plans skill is part of a larger plugin. You could copy it out by hand. You should not.

By hand
$ cp writing-plans/SKILL.md \
    ~/.claude/skills/
# ... but you missed:
# the helper scripts
# the related commands
# the hooks that wire it up

You copy one file and the skill half-works. The rest of the plugin's coordination is missing.

As a plugin
$ claude plugin install superpowers
> installed: superpowers
> 14 skills
> 3 commands
> 2 hooks
> ready in every project

writing-plans arrives wired to the rest of Superpowers. Everything coordinates because it shipped together.

Trust check. A plugin runs with your permissions and can bundle hooks and MCP servers. Install only from sources you trust.

Superpowers is my most-used plugin

By Jesse Vincent (obra). Open source. Ships fourteen skills that orchestrate planning, debugging, code review, and verification.

superpowers/
  skills/
    brainstorming/
    writing-plans/
    executing-plans/
    dispatching-parallel-agents/
    subagent-driven-development/
    requesting-code-review/
    receiving-code-review/
    test-driven-development/
    systematic-debugging/
    verification-before-completion/
    using-git-worktrees/
    finishing-a-development-branch/
    writing-skills/
    using-superpowers/
  commands/
  hooks/
  plugin.json
One install, fourteen skills
Plus the commands and hooks that orchestrate them, none of which you copy by hand.
Opinionated by design
It keeps the agent from skipping the plan, so you lock the spec before it runs anything. Fewer surprises, more reproducible runs.

Find and manage plugins with /plugin

You do not hunt through files. One command opens a manager for browsing, installing, and turning plugins on or off.

The manager

/plugin opens a browser of available plugins. Install, enable, or disable from one place.

Marketplaces

A source you add once: /plugin marketplace add owner/repo. Its plugins then show up to install.

Scope

Install for every project (user) or just this repo (project), the same choice you make for skills.


Next: install three things

The next workshop block installs all three. You will use Superpowers throughout the rest of the day. The other two come into play later.

01 Superpowers

The plugin from earlier in this section. You will use brainstorming and writing-plans in §9 to spec the next stage of your CPD analysis.

community · obra
02 plugin-dev

Official Anthropic plugin for authoring plugins. You will use it in §11 to generate your own letter-of-recommendation plugin.

official · Anthropic
03 paper-review

A plugin (he calls it a skill) by Lars Crawfurd for adversarially reviewing academic drafts. Useful for your own paper-writing after the workshop.

community · Crawfurd
§ 7

Workshop · Install plugins

Install Superpowers, plugin-dev, and a paper-review skill

Workshop instructions
socialscienceai.com/workshop/day-3#install-plugins

Debrief & Questions

  • What worked?
  • Where did you get stuck?
  • What surprised you?
§ 8

Spec-driven development

Without a spec, the agent fills the gaps

A vague brief produces a different analysis every run, so you cannot compare across them. Each run silently makes its own researcher-degrees-of-freedom choices: the garden of forking paths, now automated and invisible.

Don't
> explore heterogeneity in
  the CPD complaints data
[run 1] by district, OLS, SE clustered (level?)
[run 2] by year, logit, robust (HC1) SE
[run 3] by category, LPM, two-way FE, SE unspecified

Three runs, three designs, three answers. You spend the rest of the day deciding which one you actually meant.

Do
> before any code: spec the
  analysis with brainstorming
interview: dimension? sample?
clustering? robustness specs?
writing plan to plan.md ...

Decisions locked before any code runs. Execution stays inside the lines.


Write the spec, then let the agent execute

Spec-driven development means writing a specification of what you want before letting the agent run any code. The spec names the inputs, the expected outputs, the assumptions, the edge cases, and what success looks like.

Two files, one chain: brainstorming writes the spec (spec.md), and writing-plans turns it into the plan (plan.md) that the agent executes.

Why this is different
The spec is read by an executor that follows it literally. Every gap, every ambiguity, every unspecified default becomes a decision the agent makes for you.

The unlock: you lock the analytic choices before the agent sees the data, so every run answers the same pre-specified question. Pre-registration discipline, enforced by the tooling: the spec is a versioned, auditable record of sample, identification, and standard errors.
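The "every gap becomes a decision" point can be made concrete with a small sketch: a spec as a frozen dataclass in which every field left as None is a choice the agent would otherwise make for you. The field names are hypothetical, and the values in `locked` are placeholders loosely following the district example, not a recommended design.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class AnalysisSpec:
    """Hypothetical spec fields; each None is a decision left to the agent."""
    question: Optional[str] = None
    sample: Optional[str] = None
    estimator: Optional[str] = None
    fixed_effects: Optional[str] = None
    se_clustering: Optional[str] = None
    robustness: Optional[tuple[str, ...]] = None

    def open_decisions(self) -> list[str]:
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


vague = AnalysisSpec(question="heterogeneity in the CPD complaints data")
locked = AnalysisSpec(  # all values are placeholders for illustration
    question="Does the officer-filed gap vary by district?",
    sample="Day 2 analysis sample",
    estimator="one model with district x officer-filed interactions",
    fixed_effects="beat + year",
    se_clustering="placeholder: choose and record a level",
    robustness=("placeholder: alternative clustering",),
)
```

`vague.open_decisions()` lists five unlocked choices; `locked.open_decisions()` is empty, and because the class is frozen, the spec cannot be edited mid-run.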


Brainstorming asks one question at a time

Superpowers' brainstorming skill is Socratic. It refuses to let you hand-wave. Each answer locks one decision and unlocks the next question.

agent
What's the question you want this extended analysis to answer?
you
Does the officer-filed gap vary by district?
agent
Do you want to estimate one coefficient per district, or interact district with the officer-filed indicator? Different tables either way.
you
Interacted. I want the heterogeneity to come from one model.
agent
Do the beat fixed effects (the geographic beats that nest within districts) and the year FE from Day 2 still apply, or do the district interactions replace them? Note they do different work: the FE absorb each district's baseline level, the interactions estimate how the gap itself varies.
you
Keep beat and year FE. Add district-by-officer-filed interactions on top.
agent
Beat nests in district, so the beat FE already absorb the district main effects; the interactions stay identified as long as the officer-filed share varies within beat. Before any point estimates: I'll produce a coefficient plot, one row per district interaction with 95% CIs. Read the shape first, then the numbers. Agreed?

In the last turn, the agent volunteered the figure before quoting any coefficient. That's the discipline: the plan that lands will list “produce figure” as a step before “report the table.”
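One way to write the model the interview just locked, as a sketch. For complaint i in beat b and year t, y stands in for the Day 2 outcome (not named on this slide). Each beta_d is the officer-filed gap in district d; the beat effects alpha_b absorb the district main effects because beats nest in districts, and gamma_t are the year effects.

```latex
y_{ibt} = \sum_{d=1}^{D} \beta_d \,\bigl(\text{OfficerFiled}_i \times \mathbf{1}[\text{district}(b) = d]\bigr)
          + \alpha_b + \gamma_t + \varepsilon_{ibt}
```

Including all D interactions with no separate officer-filed main effect is equivalent to a main effect plus D - 1 interactions; this form simply makes each row of the coefficient plot a district's own gap.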


Brainstorming → writing-plans → subagent-driven-development

Three Superpowers skills run in sequence. Each produces a durable artifact the next step picks up.

01 brainstorming

Socratic interview. One question at a time. Proposes 2-3 approaches, then presents the design section by section for your approval.

·Explores project context first
·Clarifying questions, one at a time
·2-3 approaches with trade-offs
·Design approved section by section
out: spec.md (design doc)
02 writing-plans

Decomposes the spec into a checklist of 2-5 minute tasks. Every task lists file paths, the actual code, the test commands, and the commit message.

·File structure mapped out first
·Bite-sized steps with - [ ] checkboxes
·No "TBD" or placeholder code
·Self-review pass for gaps
out: plan.md (task checklist)
03 subagent-driven-development

For each task: dispatch a fresh implementer subagent, then a spec-compliance reviewer, then a code-quality reviewer. Re-review loops until both approve.

·Implementer: writes, tests, commits
·Spec reviewer: matches the plan?
·Quality reviewer: well-built?
·Final reviewer over the whole branch
out: committed code
Not the same as Day 1's plan mode
Plan mode is session-scoped. The agent proposes steps and waits. Nothing leaves the chat.
writing-plans produces a durable file. A subagent in a fresh context can read and execute against it later.
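Why a durable file matters: anything that can read plan.md can pick up where the last run stopped. A minimal sketch that parses the `- [ ]` / `- [x]` checkboxes writing-plans produces and returns the next open task. The plan contents are hypothetical, echoing the figure-before-table order from the brainstorming demo.

```python
import re
from typing import Optional

PLAN = """\
# Plan: district heterogeneity (hypothetical)
- [x] Task 1: load the Day 2 analysis sample
- [ ] Task 2: produce the coefficient plot
- [ ] Task 3: report the table
"""

# One checkbox line: "- [ ] text" (open) or "- [x] text" (done).
TASK = re.compile(r"^- \[( |x)\] (.+)$", re.MULTILINE)


def tasks(plan_text: str) -> list[tuple[bool, str]]:
    """Return (done, description) for every checkbox line, in order."""
    return [(mark == "x", text) for mark, text in TASK.findall(plan_text)]


def next_task(plan_text: str) -> Optional[str]:
    """First unchecked task, or None when the plan is finished."""
    for done, text in tasks(plan_text):
        if not done:
            return text
    return None
```

Here `next_task(PLAN)` returns "Task 2: produce the coefficient plot", so a fresh subagent starts at the figure, not the table.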
§ 9

Workshop · Spec the CPD analysis

Use Superpowers to spec the CPD analysis

Workshop instructions
socialscienceai.com/workshop/day-3#spec-analysis

Debrief & Questions

  • What worked?
  • Where did you get stuck?
  • What surprised you?
§ 10

Author your own plugin

Recurring work with many steps is a plugin candidate

A plugin is your own workflow, written down once so the agent does it the same way every time. It is personal automation of recurring academic labor. Not software engineering. Not authorship for distribution.

LORs
Fifteen a year. All open the same. All take an afternoon.
R&R responses
Same shape every time. Reply to each reviewer; cross-reference your revisions.
Referee reports
Read the paper, check identification, write a decision and a ranked critique.
Syllabus refresh
Update reading list, dates, rubric. Same chore every August.

You already do these. The agent could do them with you.

Plugin or skill? A standalone skill (in .claude/skills/) is simplest when the workflow is one file. Make it a plugin when it bundles several pieces (skills, subagents, hooks) or you want to share it with others.

Plugin-dev interviews you, then writes the files

Same Socratic interview pattern from §9. You answer questions about what your plugin should do; plugin-dev writes the manifest, the skill files, the subagent stubs, and the hooks. The interview is the work. The files are the byproduct.

/plugin-dev:create-plugin · eight phases
01
Discovery
02
Components
03
Design
04
Structure
05
Implement
06
Validate
07
Test
08
Document

About ten minutes of conversation. You can stop after any phase and still have a working plugin.


What you will build: a letter-of-recommendation writer

Hand an agent a CV and an opportunity description. The agent interviews you about the candidate, drafts in your voice using your past letters as examples, and produces a Quarto draft it renders to .docx.

Inputs
sample_cv.md
opportunity.md
voice_profile.md
past_letters/
Plugin
write-lor
interview · draft · voice-check
3 subagents
+ hook: protect past_letters/
Output
letter_draft.qmd
↓ quarto render
letter.docx
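The "protect past_letters/" hook, sketched as a Python PreToolUse hook. This assumes Claude Code passes the pending tool call as JSON on stdin with `tool_name` and `tool_input.file_path`, and that exit code 2 blocks the call and returns stderr to the agent; the set of write-tool names is also an assumption. Check all three against the hooks reference before using it.

```python
import json
import sys
from pathlib import PurePath

PROTECTED = "past_letters"                    # folder name from this slide
WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}  # assumption: tools that modify files


def should_block(event: dict) -> bool:
    """True if the pending tool call would modify a file under past_letters/."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return False
    path = event.get("tool_input", {}).get("file_path", "")
    return PROTECTED in PurePath(path).parts


def hook_main(stdin_text: str) -> int:
    """Exit code for the hook: 2 blocks the call, 0 lets it through."""
    event = json.loads(stdin_text or "{}")
    if should_block(event):
        print(f"{PROTECTED}/ is read-only: use it as voice examples, never edit it.",
              file=sys.stderr)
        return 2
    return 0
```

Registered as a hook, the script would end with `sys.exit(hook_main(sys.stdin.read()))`; keeping the decision in `should_block` makes it easy to test without a live session.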

A plugin I built: ~26 agents to referee one paper

One skill orchestrates a gated pipeline. The main agent never reads the paper; subagents do the work and pass files forward.

0
Extract PDF + launch Codex (local) review
An independent local Codex review runs in the background.
background
1
Audit + verification
Audit plus mechanical checks (arithmetic, sample sizes, cross-table), in parallel.
severity
2
World-building
Reconstruct the setting, timeline, data-generating process, and contribution.
completeness
2.5
Theory critique
Logic, mechanism discrimination, and a theory devil, in sequence.
theory
3
Diagnosis
Face validity, methods, adversarial DGP, and triangulation, in parallel.
checkpoint
Fast-track reject can shortcut Phases 2.5 to 4 when Phases 1 and 2 already establish grounds.
4
Synthesis
One agent writes the plain-text review from every workspace file.
file
5
Meta-review
A reviewer's devil's advocate stress-tests the draft review.
hard gate
6
Accuracy audit
Every claim re-checked against the paper; errors auto-fixed.
accuracy
7
Stress test
Codex (local) checks each review point against the paper, one by one.
human gate
8
Compile PDF
Render the final referee report.
output
What each gate checks
severity: how serious the problems found so far are
completeness: is the paper understood well enough to critique it
theory: do the theory and its mechanism hold up
checkpoint: a pause to flag concerns before the review is written
file: was the review actually written, with every finding included
hard gate: you confirm the review is good enough to proceed
accuracy: does every claim in the review match the paper
human gate: you choose which stress-test fixes to apply
A gate that fails halts the run or sends it back. Opus reasons; Haiku verifies; Codex (local) is an independent second model.
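The gating logic, reduced to a toy Python sketch: each phase does its work, then a gate returns "pass", "retry" (send the phase back), or "halt". The phase names echo the slide, but the functions, the shared state dictionary, and the single-retry limit are hypothetical; in the real pipeline the work is done by agents and some gates are human checkpoints.

```python
from typing import Callable

# (name, work, gate): work mutates shared state; gate returns a verdict.
Phase = tuple[str, Callable[[dict], None], Callable[[dict], str]]


def run_pipeline(phases: list[Phase], state: dict, max_retries: int = 1) -> list[str]:
    log = []
    for name, work, gate in phases:
        for _ in range(max_retries + 1):
            work(state)
            verdict = gate(state)
            log.append(f"{name}: {verdict}")
            if verdict != "retry":
                break
        if verdict != "pass":  # halted, or still failing after retries
            log.append(f"stopped at {name}")
            return log
    return log


def audit(state: dict) -> None:
    state["issues"] = ["SE clustering level unstated"]  # hypothetical finding


def severity_gate(state: dict) -> str:
    return "pass"  # hypothetical: nothing fatal enough to stop here


def synthesize(state: dict) -> None:
    state["review"] = list(state["issues"])


def file_gate(state: dict) -> str:
    # Was the review written, with every finding included?
    return "pass" if set(state["issues"]) <= set(state.get("review", [])) else "halt"


def fix_claims(state: dict) -> None:
    state["fix_rounds"] = state.get("fix_rounds", 0) + 1


def accuracy_gate(state: dict) -> str:
    # Hypothetical: claims match the paper only after a second fix round.
    return "pass" if state["fix_rounds"] >= 2 else "retry"


log = run_pipeline([("audit", audit, severity_gate),
                    ("synthesis", synthesize, file_gate),
                    ("accuracy audit", fix_claims, accuracy_gate)], {})
```

The accuracy phase fails once, is sent back, and passes on the second round, so the run completes; a gate returning "halt" would stop it at that phase.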
§ 11

Generate your own plugin

Generate and customize the lor plugin

Workshop instructions
socialscienceai.com/workshop/day-3#lor-plugin

Debrief & Questions

  • What worked?
  • Where did you get stuck?
  • What surprised you?
§ 12

Odds and Ends

Three more things worth knowing

GitHub from the prompt

Using GitHub with Claude Code is easy. Say "push to main" and the agent runs the git commands for you; with the GitHub plugin installed, it can also work with issues and pull requests.

Build a website

Building a website is just as easy. Describe what you want to Claude Code, then deploy it to Vercel (my recommendation). Install the Vercel CLI and Claude Code can manage your projects, environment variables, and deployments for you.

Scheduling and remote control

/loop repeats a prompt in-session, /schedule creates a Cloud Routine that runs on Anthropic's infrastructure with your laptop closed, and claude remote-control lets you drive an HPC or office session from claude.ai/code or your phone.

§ 13

Debrief + Day 4 preview

Today you turned the agent into your toolkit

1
A hostile-reviewer critique
Dispatched a fresh-context subagent to attack your Day 2 regression. Read what it found.
2
Three plugins installed
Superpowers, plugin-dev, and Crawfurd's paper-review skill. All loaded in every project from now on.
3
A spec for your next analysis
Walked Superpowers' brainstorming into writing-plans. A one-page spec a subagent could execute against.
4
Two MCP servers
Playwright (a real browser) and Context7 (live docs). New capabilities the model did not have out of the box.
5
A plugin you authored
The lor plugin: a real letter-of-recommendation writer with subagents and a hook. Yours to customize.
6
A heuristic for what's next
Anything you do more than twice is a plugin candidate. Your referee reports. Your R&R replies. Next.

Day 4 · BYOP

Bring your own project and we will work on it together.