Day 03 / 04 Agentic Coding Tools for Researchers

Customize & extend

Justin Frake

University of Michigan, Ross School of Business

Yesterday

Yesterday you got from raw data to a compiled draft

A regression you specified

Joined the raw data, set the spec yourself, and had the agent run it inside those lines.

Plan mode for analysis

The interview pattern made the spec yours, not the agent's.

A project memory file

Standing rules for this dataset. Loads in every session you start in this folder.

A table skill

format-regression-table. Your next table starts from your format.

A figure skill

format-figure. Your next plot starts in your theme.

A compiled draft

Quarto memo rendered to .docx with the table and figure embedded.

Day 3 · Yesterday1

Today

Agenda

§ 1

Day 2 recap + Day 3 framing

Discussion

§ 2

Subagents + hostile reviewer

Lecture

§ 3

Dispatch a subagent to critique your Day 2 regression

Workshop

§ 4

MCP servers

Lecture

§ 5

Install + use Playwright MCP and a docs-search MCP

Workshop

§ 6

Plugins + the Superpowers ecosystem

Lecture

§ 7

Install Superpowers + plugin-dev + Crawfurd's paper-review skill

Workshop

§ 8

Spec-driven development

Lecture

§ 9

Use Superpowers to spec the CPD analysis

Workshop

§ 10

Plugin builder + lor preview

Lecture

§ 11

Generate + customize the lor plugin

Workshop

§ 12

Odds and Ends

Lecture

§ 13

Debrief + Day 4 preview

Discussion


§ 2

Subagents + hostile reviewer

Subagent

A subagent is a fresh-context dispatch of another Claude Code session

Main thread (full of accumulated context)

system prompt · memory files · tool definitions · file reads: 23 files · prior turns (Day 2 & Day 3) · accumulated decisions · your framing of the project · “deploy a subagent to critique my regression”

dispatch → task brief (single message) → return

Subagent (reads files and uses tools, then reports back)

system prompt · “critique my regression” (task brief) · tool definitions · file reads: only what it needs · its own analysis · summary to send back

Academic analogue

Like handing a draft to a colleague who has never seen your project.
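To make the context split concrete, here is a toy model in Python. It is a sketch, not how Claude Code is implemented: `Context`, `dispatch_subagent`, and `critique` are hypothetical names, a "context" is just a list of labels, and the returned finding is invented for illustration. What it shows is the shape of a dispatch: the subagent starts from the system prompt, the brief, and the tool definitions, does its work in its own list, and only its summary is appended to the main thread.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """A toy context window: an ordered list of labeled items."""
    items: list[str] = field(default_factory=list)

    def add(self, item: str) -> None:
        self.items.append(item)


def dispatch_subagent(main: Context, brief: str,
                      work: Callable[[Context], str]) -> str:
    """Run `work` in a fresh context and return only its summary to `main`."""
    # The subagent starts from scratch: no memory files, no prior turns.
    sub = Context(["system prompt", f"task brief: {brief}", "tool definitions"])
    summary = work(sub)                       # reads and analysis stay in `sub`
    main.add(f"subagent summary: {summary}")  # the only thing that comes back
    return summary


def critique(sub: Context) -> str:
    """Hypothetical subagent work: read the script, analyze it, summarize."""
    sub.add("file read: 01_regression.R")
    sub.add("its own analysis")
    return "clustering level is never stated"  # invented example finding


main = Context(["system prompt", "memory files", "tool definitions",
                "file reads: 23 files", "prior turns (Day 2 & Day 3)",
                "accumulated decisions", "your framing of the project"])
before = len(main.items)
dispatch_subagent(main, "critique my regression", critique)
print(len(main.items) - before)  # 1: only the summary entered the main thread
```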


Benefits

The benefits of subagents

No context

The subagent doesn't see your conversation, your memory file, or the framing you've settled on. A true second opinion.

Preserve main context

Keeps the subagent's file reads and intermediate work out of your main context; only its summary comes back.

Enforce constraints

You can restrict which tools a subagent can access (e.g., no write tools).

Cheaper and faster

You can assign subagents that handle simple tasks to a cheaper, faster model (e.g., Haiku).


Workflow

The subagent I use most: "generic" hostile reviewer

A generic dispatch hedges. Give the subagent a hostile role explicitly: it goes from "looks fine" to specific, line-numbered critique.

Don't

deploy a subagent to read 01_regression.R and tell me if it looks right

LLMs can be too gentle. A generic agent may just recommend small tweaks.

Do

deploy a hostile subagent to review 01_regression.R

The role unlocks the critique. Same model, same tools, same files. The difference is what you told it to do.


Save your own

Save a subagent you'll reuse

A subagent you'll use more than once is just a Markdown file. Save it at user scope and it works in every project.

The file

~/.claude/agents/causal-critic.md

---
name: causal-critic
description: Scrutinizes a study's identification strategy. Use when the user asks for a review of their causal design or a referee's read on an estimation.
tools: Read, Grep, Glob
---
# Causal-identification critic
You are a skeptical referee. For any regression script you read,
scrutinize: the treatment-assignment mechanism, the key identifying
assumption (parallel trends, exclusion, continuity at the cutoff),
and the main threat to validity. Cite line numbers. Do not affirm.
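The file format is simple enough to read by hand. The sketch below, an illustration and not Claude Code's own loader, splits a shortened copy of the file above into frontmatter and prompt and checks the `tools` allowlist. `parse_agent` and `tool_allowed` are hypothetical helpers that handle only flat `key: value` lines. One behavior worth knowing: if a subagent file leaves `tools` out, Claude Code lets the subagent inherit all of the main thread's tools, so the helper treats a missing field as "everything allowed".

```python
AGENT_FILE = """\
---
name: causal-critic
description: Scrutinizes a study's identification strategy.
tools: Read, Grep, Glob
---
# Causal-identification critic
You are a skeptical referee. Cite line numbers. Do not affirm.
"""


def parse_agent(text: str) -> tuple[dict[str, str], str]:
    """Split an agent file into frontmatter fields and the prompt body.

    Handles only flat `key: value` lines, which is all this file uses.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent file must open with '---'")
    end = lines.index("---", 1)  # position of the closing delimiter
    fields = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


def tool_allowed(fields: dict[str, str], tool: str) -> bool:
    """Is `tool` usable? A missing `tools` field means all tools are inherited."""
    if "tools" not in fields:
        return True
    allowed = {t.strip() for t in fields["tools"].split(",") if t.strip()}
    return tool in allowed


fields, body = parse_agent(AGENT_FILE)
print(fields["name"])                 # causal-critic
print(tool_allowed(fields, "Grep"))   # True
print(tool_allowed(fields, "Write"))  # False: the critic cannot edit your script
```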

Create it

> /agents
# interactive wizard

Where it lives

User scope (~/.claude/agents/) loads in every project. Codex equivalent: ~/.codex/agents/.

Tools

You can restrict which tools the agent can use (e.g., read only).

Call it

> deploy the causal-critic subagent on 01_regression.R

The same file works on tomorrow's RDD and next year's panel. Save once, reuse forever.


Scope

Where subagents live

When you create a subagent via /agents, you pick where the file lives. Two choices:

/agents picker

Create new agent
Choose location
› 1. Project (.claude/agents/)
  2. Personal (~/.claude/agents/)

PROJECT .claude/agents/

Lives in your project folder. Loads only when you run the agent in this project. Use for subagents tied to one dataset, paper, or codebase.

PERSONAL ~/.claude/agents/

Lives in your home directory. Loads in every project. Use for general-purpose subagents like your causal-critic.

Same shape as memory files. Codex equivalent: .agents/ (project) and ~/.codex/agents/ (personal).
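When the same name exists in both places, Claude Code gives the project-level file priority over the personal one, so a project can override a general-purpose subagent. The sketch below mimics that lookup under simplifying assumptions: `discover_agents` is a hypothetical helper, it keys agents by file name rather than by the `name:` field, it ignores other sources (such as plugins) the real tool also reads, and the `lit-scout` agent is invented for the demo.

```python
import tempfile
from pathlib import Path


def discover_agents(project_dir: Path, home_dir: Path) -> dict[str, Path]:
    """Map agent name -> definition file; project files override personal ones."""
    found: dict[str, Path] = {}
    personal = home_dir / ".claude" / "agents"
    project = project_dir / ".claude" / "agents"
    for folder in (personal, project):  # later folders win on name clashes
        if folder.is_dir():
            for path in sorted(folder.glob("*.md")):
                found[path.stem] = path  # simplification: name = file stem
    return found


with tempfile.TemporaryDirectory() as tmp:
    home, proj = Path(tmp, "home"), Path(tmp, "proj")
    # Hypothetical setup: two personal agents plus a project-specific critic.
    for base, name in [(home, "causal-critic"), (home, "lit-scout"),
                       (proj, "causal-critic")]:
        folder = base / ".claude" / "agents"
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"{name}.md").write_text(f"---\nname: {name}\n---\n")
    agents = discover_agents(proj, home)
    print(sorted(agents))                         # ['causal-critic', 'lit-scout']
    print(proj in agents["causal-critic"].parents)  # True: the project copy wins
```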


Parallel

Dispatch several at once

When questions are independent, send each to its own subagent at the same time. Every one works in a fresh context; the main agent gathers the results and synthesizes.

> deploy 6 distinct subagents to research modern DiD estimators

main agent

synthesizes the returns

CSDID

Sun-Abraham

BJS

dCDH

ETWFE

stacked DiD

each runs in its own fresh context

Two limits: a subagent cannot spawn its own subagents, and many detailed returns refill your main context, so ask each subagent for a brief return.
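The fan-out and synthesis can be modeled with Python's standard thread pool. This is only an analogy for the pattern, not how Claude Code schedules subagents: `research` and `synthesize` are hypothetical stand-ins, and each return is a single line to mirror the advice to keep returns brief.

```python
from concurrent.futures import ThreadPoolExecutor

ESTIMATORS = ["CSDID", "Sun-Abraham", "BJS", "dCDH", "ETWFE", "stacked DiD"]


def research(estimator: str) -> str:
    """Hypothetical stand-in for one subagent: a fresh context, one question,
    and a one-line return so the main context stays small."""
    return f"{estimator}: summary"


def synthesize(returns: dict[str, str]) -> str:
    """The main agent sees only the short returns, never the subagents' work."""
    return "\n".join(returns[name] for name in ESTIMATORS)


# Fan out: each question goes to its own worker at the same time.
with ThreadPoolExecutor(max_workers=len(ESTIMATORS)) as pool:
    returns = dict(zip(ESTIMATORS, pool.map(research, ESTIMATORS)))

# Fan in: one synthesis over all six returns.
report = synthesize(returns)
print(report)
```

`pool.map` yields results in input order, so the synthesis sees the estimators in the same order every run even though the workers finish in any order.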


§ 3

Workshop · Hostile reviewer

Workshop

Dispatch a subagent to critique your Day 2 regression

Workshop instructions

socialscienceai.com/workshop/day-3#subagent


Debrief

Debrief & Questions

What worked?
Where did you get stuck?
What surprised you?


§ 4

MCP servers (aka Connectors)

Protocol

MCP is how the agent talks to outside tools

MCP (Model Context Protocol) is an open standard: one way for the agent to reach tools that live outside the chat. The agent is the client; an MCP server hands it a list of tools to call.

Client (the agent: Claude Code, in your session) → calls a tool → Server (MCP server: exposes a list of tools) → reaches → Tools (the outside world: a browser, live docs, databases, APIs)

Open standard

Anyone can build a server. It is not tied to one agent or vendor.

Local or remote

A server can run on your machine or remotely over the network, and it can be written in any language.

New tools, no new code

Connect a server and the agent gains its tools. You wrote none of them.
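Under the hood, MCP messages are JSON-RPC 2.0: the client asks a server for its tools with `tools/list`, then invokes one with `tools/call`, passing the tool's name and arguments that match its input schema. The sketch below builds those messages in Python. It skips the `initialize` handshake a real session starts with, and the tool shown (`browser_navigate` with a `url` argument) is an illustrative assumption about what a Playwright server exposes.

```python
import json
from typing import Optional


def request(msg_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request, the envelope MCP messages use."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# 1. The client (the agent) asks the server which tools it exposes.
list_req = request(1, "tools/list")

# 2. A server's reply: each tool has a name, a description, and an input schema.
#    This particular tool is illustrative.
list_reply = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [{
            "name": "browser_navigate",
            "description": "Navigate to a URL",
            "inputSchema": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        }]
    },
}

# 3. The client calls a tool by name, with arguments shaped by its schema.
tool = list_reply["result"]["tools"][0]
call_req = request(2, "tools/call",
                   {"name": tool["name"], "arguments": {"url": "https://example.org"}})
print(call_req)
```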


Install

Installing an MCP server is one command

You add the server to your agent's config. The agent launches it on startup and the tools appear automatically.

$ claude mcp add --scope user playwright \
    -- npx -y @playwright/mcp@latest
> added MCP server: playwright
> restart your session to load
$ claude
> connected: playwright
>   tools: browser_navigate,
>          browser_click,
>          browser_snapshot, ...

One command

Give the server a name and the command that launches it. The agent handles the rest.

Tools appear at session start

When you open a new session, the agent enumerates everything the server exposes.

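Works in every project

With --scope user, the server is configured once for all your projects, so you don't reinstall per project. Without it, claude mcp add defaults to local scope: the current project only.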
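If you want the server recorded in the project instead, so collaborators who clone the repo get it too, `claude mcp add --scope project` writes it to a `.mcp.json` file at the project root. The Python below sketches the shape of that file for the Playwright server above; the same `mcpServers` shape is what a plugin's `.mcp.json` uses. `mcp_config` is a hypothetical helper, and only locally launched, command-based servers are covered.

```python
import json


def mcp_config(name: str, command: str, args: list[str]) -> dict:
    """Build an `mcpServers` entry for a server launched as a local process."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


config = mcp_config("playwright", "npx", ["-y", "@playwright/mcp@latest"])
text = json.dumps(config, indent=2)
print(text)  # the contents you would expect to find in .mcp.json
```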


Context cost

An MCP server spends your context twice

Those tools are not free. Connecting a server and calling its tools both consume the context window.

Context window: system prompt · memory files · MCP tool definitions · your prompt + work · MCP tool result

(MCP tool definitions and MCP tool results are what an MCP server adds.)

Standing cost: tool definitions

A connected server's tool names, descriptions, and input schemas enter the window so the model knows what it can call. This cost is paid every session, used or not. Connect many servers and they crowd the window and make it harder for the model to pick the right tool.

Per-call cost: results

Each call's return (a page snapshot, a large JSON blob, a batch of search hits) lands in the window and stays there for the rest of the session.

Manage it. Claude Code now defers the schemas: Tool Search loads tool names and pulls a full schema only when a tool is used. /mcp shows what's connected, /context shows what's filling the window, and heavy MCP work belongs in a subagent so its results stay out of your main thread.
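A back-of-the-envelope model shows why both costs matter. The sketch assumes roughly four characters per token (a common rule of thumb for English; real tokenizers differ) and a hypothetical server with 20 tools whose definitions run about 800 characters each. It ignores Tool Search, which exists precisely to shrink the standing cost.

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; an assumption


def est_tokens(text: str) -> int:
    """Estimate tokens as characters / 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def standing_cost(tool_defs: list[dict]) -> int:
    """Tokens spent describing the tools: paid every session, used or not."""
    return sum(est_tokens(json.dumps(t)) for t in tool_defs)


def session_cost(tool_defs: list[dict], results: list[str]) -> int:
    """Standing cost plus every call result, since results stay in the window."""
    return standing_cost(tool_defs) + sum(est_tokens(r) for r in results)


# Hypothetical server: 20 tools with roughly 800-character definitions.
defs = [{"name": f"tool_{i}", "description": "x" * 760} for i in range(20)]
snapshot = "y" * 40_000  # one hypothetical page snapshot, ~40 KB of text

print(standing_cost(defs))                 # before any call is made
print(session_cost(defs, [snapshot] * 3))  # after three snapshots
```

Under these assumptions, three page snapshots cost several times more than the tool definitions, which is why heavy browsing belongs in a subagent.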


Servers

Useful MCPs

A small set of MCP servers carry most of the value for academic work. You will install the first two next.

01 Playwright

A real browser the agent can drive: navigate, click, fill forms, take snapshots.

02 Context7

Live documentation for R, Python, and major packages. Searchable by version.

03 GitHub

Repos, issues, pull requests, files across any repo you can access.

04 Computer use

Control of the screen, mouse, and keyboard. The agent can drive any desktop application, such as Stata, SPSS, or another GUI-only tool.

05 Google Drive

Files and folders in your Drive. Read and write Docs, Sheets, and Slides without leaving the agent.

06 Gmail (claude.ai)

Your inbox. Search threads, read messages, draft replies (no auto-send).

07 Vercel (claude.ai)

Vercel projects, deployments, build logs. Manage research dashboards or interactive supplements to your paper.


§ 5

Workshop · MCP servers

Workshop

Install Playwright and Context7

Workshop instructions

socialscienceai.com/workshop/day-3#mcp

Both quick tests should return a real answer, not a confident guess.


Debrief

Debrief & Questions

What worked?
Where did you get stuck?
What surprised you?


§ 6

Plugins + Superpowers

Anatomy

A plugin is a folder that bundles several things

Some of these you already know. The whole point of a plugin is that they travel together as one install.

my-plugin/

skills/ · recipes the agent invokes when a task matches · Day 2
commands/ · stored prompts you invoke by name (/something) · New
agents/ · specialized subagents the plugin dispatches · Today §2
.mcp.json · external tools the plugin connects, like a browser or live docs · Today §4
.claude-plugin/plugin.json · the small manifest file that marks the folder as a plugin · Today

hooks/ · scripts that fire automatically on events. Skills fire when the agent decides and commands when you decide; hooks fire on what just happened. For example, re-knit your results table whenever the regression script changes.

A plugin's skills are namespaced: the lor plugin's writer is /lor:write-lor, not /write-lor.
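Claude Code expects a plugin's manifest at `.claude-plugin/plugin.json` inside the plugin folder, with the component folders alongside it at the plugin root. The Python sketch below inspects such a folder and builds the namespaced name a plugin skill is invoked by. `inspect_plugin` and `invoke_name` are hypothetical helpers, and treating `name` as the only manifest field that matters is a simplifying assumption.

```python
import json
import tempfile
from pathlib import Path

COMPONENTS = ["skills", "commands", "agents", "hooks", ".mcp.json"]


def inspect_plugin(root: Path) -> dict:
    """Read the manifest and list which optional components the plugin ships."""
    manifest = root / ".claude-plugin" / "plugin.json"
    if not manifest.is_file():
        raise FileNotFoundError(f"not a plugin: {manifest} is missing")
    meta = json.loads(manifest.read_text())
    present = [c for c in COMPONENTS if (root / c).exists()]
    return {"name": meta["name"], "components": present}


def invoke_name(plugin: str, skill: str) -> str:
    """Plugin skills are namespaced by plugin name."""
    return f"/{plugin}:{skill}"


with tempfile.TemporaryDirectory() as tmp:
    # Build a minimal, hypothetical lor plugin: manifest, one skill, a hooks folder.
    root = Path(tmp, "lor")
    (root / ".claude-plugin").mkdir(parents=True)
    (root / ".claude-plugin" / "plugin.json").write_text(json.dumps({"name": "lor"}))
    (root / "skills" / "write-lor").mkdir(parents=True)
    (root / "hooks").mkdir()
    info = inspect_plugin(root)

print(info)                                    # {'name': 'lor', 'components': ['skills', 'hooks']}
print(invoke_name(info["name"], "write-lor"))  # /lor:write-lor
```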


Mechanism

One install pulls all the pieces in at once

The Superpowers writing-plans skill is part of a larger plugin. You could copy it out by hand. You should not.

By hand

$ mkdir -p ~/.claude/skills/writing-plans
$ cp writing-plans/SKILL.md \
    ~/.claude/skills/writing-plans/
# ... but you missed:
# the helper scripts
# the related commands
# the hooks that wire it up

You copy one file and the skill half-works. The rest of the plugin's coordination is missing.

As a plugin

$ claude plugin install superpowers
> installed: superpowers
>   14 skills
>   3 commands
>   2 hooks
>   ready in every project

writing-plans arrives wired to the rest of Superpowers. Everything coordinates because it shipped together.

Trust check. A plugin runs with your permissions and can bundle hooks and MCP servers. Install only from sources you trust.


Example

Superpowers is my most-used plugin

By Jesse Vincent (obra). Open source. Ships fourteen skills that orchestrate planning, debugging, code review, and verification.

              
superpowers/
  skills/
    brainstorming/
    writing-plans/
    executing-plans/
    dispatching-parallel-agents/
    subagent-driven-development/
    requesting-code-review/
    receiving-code-review/
    test-driven-development/
    systematic-debugging/
    verification-before-completion/
    using-git-worktrees/
    finishing-a-development-branch/
    writing-skills/
    using-superpowers/
  commands/
  hooks/
  .claude-plugin/plugin.json

One install, fourteen skills

Plus the commands and hooks that orchestrate them, none of which you copy by hand.

Opinionated by design

It keeps the agent from skipping the plan, so you lock the spec before it runs anything. Fewer surprises, more reproducible runs.


Finding plugins

Find and manage plugins with /plugin

You do not hunt through files. One command opens a manager for browsing, installing, and turning plugins on or off.

The manager

/plugin opens a browser of available plugins. Install, enable, or disable from one place.

Marketplaces

A source you add once: /plugin marketplace add owner/repo. Its plugins then show up to install.

Scope

Install for every project (user) or just this repo (project), the same choice you make for skills.


Next: install three things

The next workshop block installs all three. You will use Superpowers throughout the rest of the day. The other two come into play later.

01 Superpowers

The plugin introduced two slides back. You will use brainstorming and writing-plans in §9 to spec the next stage of your CPD analysis.

community · obra

02 plugin-dev

Official Anthropic plugin for authoring plugins. You will use it in §11 to generate your own letter-of-recommendation plugin.

official · Anthropic

03 paper-review

A plugin (he calls it a skill) by Lars Crawfurd for adversarially reviewing academic drafts. Useful for your own paper-writing after the workshop.

community · Crawfurd


§ 7

Workshop · Install plugins

Workshop

Install Superpowers, plugin-dev, and a paper-review skill

Workshop instructions

socialscienceai.com/workshop/day-3#install-plugins


Debrief

Debrief & Questions

What worked?
Where did you get stuck?
What surprised you?


§ 8

Spec-driven development

Why

Without a spec, the agent fills the gaps

A vague brief produces a different analysis every run, so you cannot compare across them. Each run silently makes different researcher degrees-of-freedom choices: the garden of forking paths, now automated and invisible.

Don't

> explore heterogeneity in
  the CPD complaints data
[run 1] by district, OLS, SE clustered (level?)
[run 2] by year, logit, robust (HC1) SE
[run 3] by category, LPM, two-way FE, SE unspecified

Three runs, three designs, three answers. You spend the rest of the day deciding which one you actually meant.

Do

> before any code: spec the
  analysis with brainstorming
interview: dimension? sample?
clustering? robustness specs?
writing plan to plan.md ...

Decisions locked before any code runs. Execution stays inside the lines.
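Counting only the choices that differed across the three unspecified runs shows how quickly the forking paths multiply. The option lists below come from those runs; treating runs 1 and 2 as having no fixed effects is an assumption, and a real analysis has more dimensions than these four.

```python
from itertools import product

# Choices that differed across the three runs above (illustrative, not exhaustive).
dimension = ["district", "year", "category"]
model = ["OLS", "logit", "LPM"]
std_errors = ["clustered, level unstated", "robust (HC1)", "unspecified"]
fixed_effects = ["none", "two-way"]

paths = list(product(dimension, model, std_errors, fixed_effects))
print(len(paths))  # 54 defensible-looking analyses; a spec picks exactly one
```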


Concept

Write the spec, then let the agent execute

Spec-driven development means writing a specification of what you want before letting the agent run any code. The spec names the inputs, the expected outputs, the assumptions, the edge cases, and what success looks like.

One set of decisions, two files: brainstorming writes the spec (a design doc), and writing-plans turns it into plan.md, the checklist the agent executes.

Why this is different

The spec is read by an executor that follows it literally. Every gap, every ambiguity, every unspecified default becomes a decision the agent makes for you.

The unlock: you lock the analytic choices before the agent sees the data, so every run answers the same pre-specified question. Pre-registration discipline, enforced by the tooling: the spec is a versioned, auditable record of sample, identification, and standard errors.
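One way to make that record auditable in code is to store the locked choices as an immutable object and hash them, so every output can carry the fingerprint of the spec that produced it. The Python below is a sketch under assumptions: `AnalysisSpec` and its fields are hypothetical, the placeholder values are not recommendations, and Superpowers itself writes Markdown files, not Python objects.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class AnalysisSpec:
    """Analytic choices locked before any code runs (illustrative fields)."""
    question: str
    sample: str
    estimator: str
    fixed_effects: tuple[str, ...]
    cluster: str

    def fingerprint(self) -> str:
        # Serialize with sorted keys so the same choices always hash the same;
        # changing any choice changes the fingerprint.
        blob = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:12]


spec = AnalysisSpec(
    question="Does the officer-filed gap vary by district?",
    sample="placeholder: sample definition from the spec",
    estimator="OLS with district x officer-filed interactions",
    fixed_effects=("beat", "year"),
    cluster="placeholder: clustering level from the spec",
)

# A silent change to one choice is now detectable.
drifted = replace(spec, fixed_effects=("year",))
print(spec.fingerprint() != drifted.fingerprint())  # True
```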


Demo

Brainstorming asks one question at a time

Superpowers' brainstorming skill is Socratic. It refuses to let you hand-wave. Each answer locks one decision and unlocks the next question.

agent

What's the question you want this extended analysis to answer?

you

Does the officer-filed gap vary by district?

agent

Do you want to estimate one coefficient per district, or interact district with the officer-filed indicator? Different tables either way.

you

Interacted. I want the heterogeneity to come from one model.

agent

Do the beat fixed effects (the geographic beats that nest within districts) and the year FE from Day 2 still apply, or do the district interactions replace them? Note they do different work: the FE absorb each district's baseline level, the interactions estimate how the gap itself varies.

you

Keep beat and year FE. Add district-by-officer-filed interactions on top.

agent

Beat nests in district, so the beat FE already absorb the district main effects; the interactions stay identified as long as the officer-filed share varies within beat. Before any point estimates: I'll produce a coefficient plot, one row per district interaction with 95% CIs. Read the shape first, then the numbers. Agreed?

Last turn: agent volunteered the figure before quoting any coefficient. That's the discipline; the plan that lands will have “produce figure” as a step before “report the table.”
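Written out, the model the agent is converging on looks like the sketch below. The notation is an assumption for illustration: i indexes complaints, y_i is the Day 2 outcome (not named in this section), D is the number of districts, and b(i) and t(i) are the complaint's beat and year.

```latex
$$
y_i \;=\; \sum_{d=1}^{D} \beta_d \,\bigl(\mathrm{OfficerFiled}_i \times \mathbf{1}[\mathrm{district}(i) = d]\bigr)
\;+\; \alpha_{b(i)} \;+\; \gamma_{t(i)} \;+\; \varepsilon_i
$$
```

Each β_d is the officer-filed gap in district d. There is no separate district term because every beat sits in exactly one district, so the beat effects α_b already absorb district levels. An officer-filed main effect plus D − 1 interactions is the same model reparameterized, with each interaction then measuring a district's gap relative to an omitted reference district.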


Tools

Brainstorming → writing-plans → subagent-driven-development

Three Superpowers skills run in sequence. Each produces a durable artifact the next step picks up.

01 brainstorming

Socratic interview. One question at a time. Proposes 2-3 approaches, then presents the design section by section for your approval.

· Explores project context first

· Clarifying questions, one at a time

· 2-3 approaches with trade-offs

· Design approved section by section

out: spec.md (design doc)

02 writing-plans

Decomposes the spec into a checklist of 2-5 minute tasks. Every task lists file paths, the actual code, the test commands, and the commit message.

· File structure mapped out first

· Bite-sized steps with - [ ] checkboxes

· No "TBD" or placeholder code

· Self-review pass for gaps

out: plan.md (task checklist)

03 subagent-driven-development

For each task: dispatch a fresh implementer subagent, then a spec-compliance reviewer, then a code-quality reviewer. Re-review loops until both approve.

· Implementer: writes, tests, commits

· Spec reviewer: matches the plan?

· Quality reviewer: well-built?

· Final reviewer over the whole branch

out: committed code

Not the same as Day 1's plan mode

Plan mode is session-scoped. The agent proposes steps and waits. Nothing leaves the chat.

writing-plans produces a durable file. A subagent in a fresh context can read and execute against it later.
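The handoff between the three skills runs through a file, which is what makes it durable. The Python below sketches the execution loop under assumptions: the plan format is reduced to `- [ ]` checkbox lines, the plan content is invented, and `implement`, `spec_review`, and `quality_review` are hypothetical stand-ins for the fresh subagents that subagent-driven-development dispatches. It shows the control flow only: spec compliance first, code quality second, and re-review until both approve.

```python
import re

PLAN = """\
# Plan: district heterogeneity
- [x] Task 1: load cleaned complaints data
- [ ] Task 2: add district x officer-filed interactions
- [ ] Task 3: coefficient plot before the table
"""

CHECKBOX = re.compile(r"^- \[( |x)\] (.+)$")


def open_tasks(plan: str) -> list[str]:
    """Return the unchecked tasks, in plan order."""
    tasks = []
    for line in plan.splitlines():
        m = CHECKBOX.match(line)
        if m and m.group(1) == " ":
            tasks.append(m.group(2))
    return tasks


def run_task(task, implement, spec_review, quality_review, max_rounds=3):
    """Implement, then loop until both reviewers approve (or give up)."""
    feedback = None
    for _ in range(max_rounds):
        work = implement(task, feedback)
        ok_spec, note_spec = spec_review(task, work)
        # The quality review only runs once the work matches the spec.
        ok_quality, note_quality = quality_review(work) if ok_spec else (False, None)
        if ok_spec and ok_quality:
            return work
        feedback = note_spec if not ok_spec else note_quality
    raise RuntimeError(f"no approval after {max_rounds} rounds: {task}")


# Hypothetical subagents: the spec reviewer rejects each first draft once.
def implement(task, feedback):
    return f"code for {task}" + (" (revised)" if feedback else "")


def spec_review(task, work):
    return ("revised" in work, "match the plan's exact specification")


def quality_review(work):
    return (True, None)


done = [run_task(t, implement, spec_review, quality_review) for t in open_tasks(PLAN)]
print(len(done))  # 2: both open tasks approved on the second round
```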


§ 9

Workshop · Spec the CPD analysis

Workshop

Use Superpowers to spec the CPD analysis

Workshop instructions

socialscienceai.com/workshop/day-3#spec-analysis


Debrief

Debrief & Questions

What worked?
Where did you get stuck?
What surprised you?


§ 10

Author your own plugin

The pattern

Recurring work with many steps is a plugin candidate

A plugin is your own workflow, written down once so the agent does it the same way every time. It is personal automation of recurring academic labor. Not software engineering. Not authorship for distribution.

LORs

Fifteen a year. All open the same. All take an afternoon.

R&R responses

Same shape every time. Reply to each reviewer; cross-reference your revisions.

Referee reports

Read paper, check identification, write decision, ranked critique.

Syllabus refresh

Update reading list, dates, rubric. Same chore every August.

You already do these. The agent could do them with you.

Plugin or skill? A standalone skill (in .claude/skills/) is simplest for one project. Make it a plugin when you want it in every project or shared with others.


Plugin-dev

Plugin-dev interviews you, then writes the files

Same Socratic interview pattern from §9. You answer questions about what your plugin should do; plugin-dev writes the manifest, the skill files, the subagent stubs, and the hooks. The interview is the work. The files are the byproduct.

/plugin-dev:create-plugin · eight phases

01 Discovery · 02 Components · 03 Design · 04 Structure · 05 Implement · 06 Validate · 07 Test · 08 Document

About ten minutes of conversation. You can stop after any phase and still have a working plugin.


LOR plugin

What you will build: a letter-of-recommendation writer

The plugin hands an agent a CV and an opportunity description. The agent interviews you about the candidate, drafts in your voice using your past letters as style examples, and produces a Quarto draft that it renders to .docx.

Inputs: sample_cv.md · opportunity.md · voice_profile.md · past_letters/

→

Plugin: write-lor, with 3 subagents (interview, draft, voice-check) + hook: protect past_letters/

→

Output: letter_draft.qmd ↓ quarto render ↓ letter.docx
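The protect-past_letters/ hook is a good example of what a hook script does. Claude Code runs a PreToolUse hook before a tool call and passes the event to it as JSON on standard input; exiting with code 2 blocks the call and sends the script's standard-error message back to the agent. The Python below is a sketch of that check, not plugin-dev's generated file: `run_hook` is a hypothetical name, the example file names are invented, and it would be registered as a PreToolUse hook with a matcher such as `Write|Edit` (registration not shown).

```python
import json
import sys

PROTECTED = "past_letters/"
WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}


def run_hook(raw_event: str) -> tuple[int, str]:
    """Return (exit code, stderr message) for one PreToolUse event."""
    event = json.loads(raw_event)
    if event.get("tool_name") not in WRITE_TOOLS:
        return 0, ""  # reads and other tools pass through
    path = event.get("tool_input", {}).get("file_path", "").replace("\\", "/")
    if PROTECTED in path:
        return 2, f"{PROTECTED} is read-only: use past letters as voice samples, never edit them."
    return 0, ""


# Example events (file names are hypothetical).
blocked = {"tool_name": "Write", "tool_input": {"file_path": "past_letters/2023_letter.md"}}
allowed = {"tool_name": "Write", "tool_input": {"file_path": "letter_draft.qmd"}}
print(run_hook(json.dumps(blocked))[0])  # 2
print(run_hook(json.dumps(allowed))[0])  # 0

# Installed as a hook script, the file would end with:
#   code, message = run_hook(sys.stdin.read())
#   if message:
#       print(message, file=sys.stderr)
#   sys.exit(code)
```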


Real example

A plugin I built: ~26 agents to referee one paper

One skill orchestrates a gated pipeline. The main agent never reads the paper; subagents do the work and pass files forward.

0 · Extract PDF + launch Codex (local) review: an independent local Codex review runs in the background. [background]

1 · Audit + verification: audit plus mechanical checks (arithmetic, sample sizes, cross-table), in parallel. [severity]

2 · World-building: reconstruct the setting, timeline, data-generating process, and contribution. [completeness]

2.5 · Theory critique: logic, mechanism discrimination, and a theory devil, in sequence. [theory]

3 · Diagnosis: face validity, methods, adversarial DGP, and triangulation, in parallel. [checkpoint]

Fast-track reject can shortcut Phases 2.5 to 4 when Phases 1 and 2 already establish grounds.

4 · Synthesis: one agent writes the plain-text review from every workspace file. [file]

5 · Meta-review: a reviewer's devil's advocate stress-tests the draft review. [hard gate]

6 · Accuracy audit: every claim re-checked against the paper; errors auto-fixed. [accuracy]

7 · Stress test: Codex (local) checks each review point against the paper, one by one. [human gate]

8 · Compile PDF: render the final referee report. [output]

What each gate checks

severity: how serious the problems found so far are

completeness: is the paper understood well enough to critique it

theory: do the theory and its mechanism hold up

checkpoint: a pause to flag concerns before the review is written

file: was the review actually written, with every finding included

hard gate: you confirm the review is good enough to proceed

accuracy: does every claim in the review match the paper

human gate: you choose which stress-test fixes to apply

A gate that fails halts the run or sends it back. Opus reasons; Haiku verifies; Codex (local) is an independent second model.
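Stripped of the models and the parallelism, the control flow is a list of phases, each followed by a check on a shared workspace. The Python below sketches that idea under assumptions: `Phase` and `run_pipeline` are hypothetical, the audit finding is invented, a failed gate simply halts (the real pipeline can also send work back), and the background Codex review, fast-track reject, and human gates are left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Phase:
    label: str
    run: Callable[[dict], None]   # does the work, writing into the workspace
    gate: Callable[[dict], bool]  # checks the workspace before moving on


def run_pipeline(phases: list[Phase], workspace: dict) -> list[str]:
    """Run phases in order and halt at the first failing gate.

    Returns the labels of phases whose gates passed.
    """
    passed = []
    for phase in phases:
        phase.run(workspace)
        if not phase.gate(workspace):
            workspace["halted_at"] = phase.label
            break
        passed.append(phase.label)
    return passed


def audit(ws: dict) -> None:
    ws["issues"] = ["sample size in Table 2 does not match the text"]  # invented


def synthesize(ws: dict) -> None:
    ws["review"] = ""  # simulate a synthesis step that wrote nothing


def compile_pdf(ws: dict) -> None:
    ws["pdf"] = "report.pdf"


phases = [
    Phase("1 audit", audit, lambda ws: bool(ws["issues"])),
    Phase("4 synthesis", synthesize, lambda ws: bool(ws["review"])),  # the "file" gate
    Phase("8 compile", compile_pdf, lambda ws: "pdf" in ws),
]
workspace: dict = {}
print(run_pipeline(phases, workspace), workspace["halted_at"])  # ['1 audit'] 4 synthesis
```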


§ 11

Generate your own plugin

Workshop

Generate and customize the lor plugin

Workshop instructions

socialscienceai.com/workshop/day-3#lor-plugin


Debrief

Debrief & Questions

What worked?
Where did you get stuck?
What surprised you?


§ 12

Odds and Ends

Three more things worth knowing

GitHub from the prompt

Using GitHub with Claude Code is very easy. Say "push to main" and the agent runs the git commands for you, provided git is installed and signed in to GitHub. The GitHub plugin or MCP server adds issues and pull requests on top.

Build a website

Building a website is very easy too. Tell Claude Code what you want, then deploy it (I recommend Vercel). Install the Vercel CLI and Claude Code can manage your projects, environment variables, and deployments for you.

Scheduling and remote control

/loop repeats a prompt in-session, /schedule creates a Cloud Routine that runs on Anthropic's infrastructure with your laptop closed, and claude remote-control lets you drive an HPC or office session from claude.ai/code or your phone.


§ 13

Debrief + Day 4 preview

Debrief

Today you turned the agent into your toolkit

A hostile-reviewer critique

Dispatched a fresh-context subagent to attack your Day 2 regression. Read what it found.

Three plugins installed

Superpowers, plugin-dev, and Crawfurd's paper-review skill. All loaded in every project from now on.

A spec for your next analysis

Ran Superpowers' brainstorming, then writing-plans: a one-page spec a subagent could execute against.

Two MCP servers

Playwright (a real browser) and Context7 (live docs). New capabilities the model did not have out of the box.

A plugin you authored

The lor plugin: a real letter-of-recommendation writer with subagents and a hook. Yours to customize.

A heuristic for what's next

Anything you do more than twice is a plugin candidate. Your referee reports. Your R&R replies. Next.


Tomorrow

Day 4 · BYOP

Bring your own project and we will work on it together.
