Day 01 / 04 Foundations & First Workflow

Agentic Coding Tools for Researchers

Justin Frake

University of Michigan, Ross School of Business

Instructor

Justin Frake

Assistant Professor of Strategy, Ross School of Business, University of Michigan

Research

I study labor markets, misconduct, and the role of politics in the workplace.

Methods

Causal inference and quasi-experimental designs.

Tools

Daily user of agentic coding tools for well over a year: Claude Code, Cursor, Codex, and Antigravity. I will primarily use Claude Code.

Find me

justinfrake.com · jfrake@umich.edu

Why this matters

Why I love agentic coding tools

More time on

Brainstorming
Reading
Framing
Theory
Logic
Insight
Empirical design

Less time on

Collecting data
Merging data
Writing code
Writing first drafts and rote prose
Editing typos and grammar
Finding and formatting references
Wrangling minor formatting


What this looks like

What I actually use these tools for (research and teaching)

Research

All the coding for my papers
First drafts, then editing
Literature searches
Brainstorming and stress-testing designs
Scraping and API data collection
Replication packages
Data-visualization sites (politicsatwork.org)

Teaching

All slide decks, including this one
Syllabi and course pages
Problem sets and exam questions
Lecture notes, reading guides, and grading rubrics


What this looks like

What I actually use these tools for (editorial work and life)

Editorial and service

A first pass on papers I review
Organizing reviewer reports into a decision
Finding candidate reviewers
Drafting letters of recommendation

Admin and life

Making my personal website (justinfrake.com)
Triaging my inbox
Building my to-do list and schedule
A morning brief, emailed to me
Grocery shopping and recipe planning
House-management software that keeps my kids doing chores
A camera that nudges my dog off the couch with a sound only she can hear

I have not written a line of code by hand in over a year. Most of what's on these two slides is not code at all.


One rule

The judgment and responsibility are always mine

These tools do not make my decisions. They do the gathering, drafting, and formatting, so I spend my time on the part that is actually mine to do.

The agent prepares

Reads a submission and lays out the case for and against
Assembles reviewer reports and flags where they conflict
Does a first pass and maps the issues in a paper
Drafts a letter in my voice from the CV and my notes

I decide

Whether it is a desk reject
The decision, and every line of the letter
The actual assessment of the work
The strength of the signal I send

Everything I submit is mine, even if I did not write the first draft. I read it, and I sign off on it.


Who's in the room

Who you are

Role

Research field

Live, based on [N] survey responses.


Who's in the room

Your day-to-day setup

Operating system

Paper-writing tool

Live, based on [N] survey responses.


Tool experience

How you code now

Coding frequency

Languages used regularly

Live, based on [N] survey responses.


Tool experience

AI exposure and terminal comfort

AI tools used for work

Comfort with the terminal

Live, based on [N] survey responses.


Who's in the room

What you want from these tools

Projects you brought

Scraping news articles and product reviews
Cleaning large archival datasets (NLSY, administrative panels)
Meta-analysis and literature review work
Multilevel and dyadic team data
Re-running or restructuring an existing paper's analyses
Sentiment analysis on reviews

Goals for the four days

Build a research project end-to-end with the agent
Complex data analysis without losing researcher oversight
Level up from ChatGPT + R copy-paste to an agentic workflow
Specific deliverables: a personal website with chatbot; education research apps; remote-server data access
Manage AI use for reviewer and editor scrutiny


Course

Agenda for this week

Day 01 · today

Foundations & first workflow

· Mental model of agentic coding
· Install & first conversation
· Context, plan mode, permissions
· Failure modes & customization

Day 02

Working with data

· Power prompts
· Regressions, tables, figures
· Project CLAUDE.md
· Compiling a paper draft

Day 03

Customize & extend

· Slash commands & hooks
· Plugins & skills
· Sub-agents
· MCP integrations

Day 04

Applied & BYOP

· Bring your own project
· Apply everything to your research
· Open Q&A and troubleshooting


Today

Agenda

§ 1 · Foundations · Lecture
§ 2 · Setup & first conversation · Workshop
§ 3 · Tooling · Lecture
§ 4 · Install R and Python · Workshop
§ 5–7 · Working with the agent · Lecture
§ 8 · Visualize the data · Workshop
§ 9–10 · Slash commands & memory · Lecture
§ 11 · Write your global CLAUDE.md · Workshop
§ 12 · Privacy & copyright · Lecture
§ 13 · Failure modes · Lecture
§ 14 · Debrief & Day 2 preview · Discussion


How we'll work together

How we'll work together over the next four days

Format

Interactive. Ask questions any time. Interrupt me if something isn't clear.

About half lecture, half workshop.

Flexible. Feel free to modify, explore, or skip any exercise.

Conduct

Stay muted unless you're speaking to the room.

Keep your camera on unless you have a compelling reason not to.

Email me anytime: jfrake@umich.edu.


Goals

Three goals for the next four days

Get comfortable using agentic coding tools.

Use them to write code and write papers.

Use advanced features like plugins, skills, and sub-agents.

Reality check

You will not be an expert in four days. These tools take practice. You have to learn how to talk to them, what you can trust from them, and what you can't.


Workshop website

socialscienceai.com



§ 1

What are agentic coding tools?

Mental model

Agentic coding tools are like very smart (and sometimes weird) RAs

RA you can only text

Reads only what you paste in.

Can't see your files or run your code.

Can't change anything for you.

RA with your files and a computer

Reads your project files directly.

Runs code, edits scripts, runs tests.

Stops to ask when it's not sure.


Demo

Claude Code Demo

Landscape

Four major agentic coding tools; we will use Claude Code

Primary

Claude Code · Anthropic

Terminal-native agent. Reads files, runs code, asks permission. The tool I'll teach with this week.

claude.ai/code

Reasonable alt

Codex · OpenAI

Terminal agent comparable to Claude Code. Most concepts this week port over directly.

openai.com/codex

Cursor · Anysphere

Agent-first workspace built around its AI code editor. As of Cursor 3, the workflow is directing fleets of agents.

cursor.com

Antigravity · Google

Google's agentic development platform (IDE + CLI + SDK). Replaces Gemini CLI as of mid-2026.

antigravity.google


First, the names

The differences between Claude, Claude.ai, and Claude Code CLI

Claude.ai

The chat web app. claude.ai in a browser.

You type, the model answers. It cannot see your files, run your code, or save anything to your computer.

Same idea as ChatGPT, from Anthropic.

Claude Code CLI

An agentic command-line application. Lives in your terminal.

Reads your files. Runs commands. Edits code.

We'll refer to this as the harness.

Foundation model

Claude

e.g., Opus 4.8

The Claude desktop and mobile apps also sit on this model. This week we work only with Claude Code.


Same picture, OpenAI

The differences between ChatGPT and Codex CLI

ChatGPT

The chat web app. chatgpt.com in a browser.

You type, the model answers. It cannot see your files, run your code, or save anything to your computer.

Same role as Claude.ai, from OpenAI.

Codex CLI

An agentic command-line application. Lives in your terminal.

Reads your files. Runs commands. Edits code.

Same role as Claude Code, from OpenAI.

Foundation model

GPT-5.5

e.g., gpt-5.5


Codex vs Claude


§ 2

How do agents work?

The loop

What an agent does between your prompt and its reply

One real loop

1. prompt: “regress wages on tenure”
2. tool calls: writes 01_reg.R, then runs Rscript 01_reg.R
3. result: error, undefined column 'tenure_yrs'
   ↪ agent reads codebook.md, edits the script to fix the column name, and reruns it; this time it succeeds
4. reply: regression table with N, coefficient, SE
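
The loop above can be sketched as a toy simulation. This is a minimal sketch, not how Claude Code or Codex CLI is implemented: `fake_model` and `run_tool` are hypothetical stand-ins for the model and the harness's tool runner, and the column names are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the dataset's real column names.
DATA_COLUMNS = {"wage", "tenure_years"}


@dataclass
class Transcript:
    """Everything the model can see: the prompt plus every tool result."""
    messages: list = field(default_factory=list)


def run_tool(column: str) -> str:
    """Pretend to run a regression script; fail on an unknown column."""
    if column not in DATA_COLUMNS:
        return f"error: undefined column '{column}'"
    return f"ok: regressed wage on {column}"


def fake_model(transcript: Transcript) -> dict:
    """Pick the next action from what is in the transcript so far."""
    last = transcript.messages[-1]
    if last.startswith("prompt:"):
        return {"action": "tool", "column": "tenure_yrs"}  # first guess is wrong
    if last.startswith("error:"):
        return {"action": "tool", "column": "tenure_years"}  # corrected after the error
    return {"action": "reply", "text": "regression table ready"}


def agent_loop(prompt: str, max_steps: int = 5) -> Transcript:
    """Alternate model decisions and tool calls until the model replies."""
    transcript = Transcript([f"prompt: {prompt}"])
    for _ in range(max_steps):
        step = fake_model(transcript)
        if step["action"] == "reply":
            transcript.messages.append(f"reply: {step['text']}")
            break
        transcript.messages.append(run_tool(step["column"]))
    return transcript


transcript = agent_loop("regress wages on tenure")
for message in transcript.messages:
    print(message)
```

The point of the sketch is the shape: the error message lands back in the transcript, so the next decision can react to it. `max_steps` is there so a confused loop cannot run forever.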


Safety

The agent asks before doing anything irreversible

Three safeguards that hold across Claude Code and Codex CLI. They are the reason this is safe to run on your own laptop.

01 It asks first

Before the agent edits a file, runs a shell command, or touches anything outside the current folder, it pauses and asks for approval. You see exactly what it's about to do.

Edit 01_clean.R?
 1. Yes  2. No  3. Always allow

02 You can stop it

If the agent goes the wrong direction mid-task, hit Esc. It stops immediately. You can correct course or start over.

[Esc]
> interrupted. what would you like to do?

03 You can rewind

Hit Esc Esc to roll back to an earlier point in the conversation.

Claude Code

rewinds chat, file edits, or both, your choice

Codex CLI

rewinds chat only; files stay as the agent left them

Backup: keep your work backed up somewhere else, such as Dropbox, Google Drive, or GitHub.


Anatomy

The harness is not the model

The customization stack hangs off the harness, not the model. Codex CLI has analogous primitives with different names.


Where you are

Five levels of autonomy you give the agent

Browser chat

Copy from ChatGPT or Claude.ai, paste into your editor.

Code completion

Inline suggestions and chat inside your editor.

IDE agent mode

VS Code, JetBrains Junie, Windsurf, Zed. Reads files, runs tests, refactors.

Terminal agent

Claude Code or Codex CLI. The agent reads, edits, runs in your terminal.

Autonomous agents

The agent runs longer tasks without stopping to ask. Overnight jobs, scheduled reruns, parallel agents.

Common starting point

You’ll be here today

Start here by end of workshop


Why context

Context is the single most important concept this week

Context is everything the model can see right now.

The model has no other memory between turns. If it isn't in context, it doesn't exist.

Context determines quality.

Context determines how well the agent does what you want. It needs to know about your project and about your preferences.

Context is finite and it degrades.

Today's defaults: roughly 1M tokens for Claude Code (Opus 4.8), 400K for Codex CLI (GPT-5.5).
Big, but fills faster than you'd think.
Attention is a budget shared by every token, and quality drops well before the limit.


§ 3

Get set up

Install

One command installs the agent

Claude Code

macOS / Linux / WSL · in Terminal

curl -fsSL https://claude.ai/install.sh | bash

Windows · in PowerShell

irm https://claude.ai/install.ps1 | iex

Verify

claude --version

Codex CLI

macOS / Linux / WSL · in Terminal

curl -fsSL https://chatgpt.com/codex/install.sh | sh

Windows · in PowerShell

powershell -ExecutionPolicy ByPass -c "irm https://chatgpt.com/codex/install.ps1 | iex"

Verify

codex --version

Stuck? Troubleshooting at socialscienceai.com/help, including the inspect-then-run alternative for locked-down university machines. Windows: Git for Windows is optional but gives Claude Code a Bash shell.
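
If you would rather check the install from a script than by eye, here is a minimal sketch. It assumes only that the installers put `claude` and `codex` on your PATH, which is what the Verify commands above rely on; the helper names are my own.

```python
import shutil
import subprocess


def find_agents(names=("claude", "codex")) -> dict:
    """Map each CLI name to its location on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


def version_of(path: str) -> str:
    """Run `<cli> --version` and return whatever it prints."""
    done = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=30
    )
    return (done.stdout or done.stderr).strip()
```

After installing, `find_agents()` should return a path for each tool you installed; pass that path to `version_of` to see what `--version` prints. A `None` usually means the terminal needs to be reopened so it picks up the new PATH.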


Pricing

Pick a tier

Claude Code · Anthropic

Pro · Floor

$20/mo

Tight caps. Expect to hit the limit mid-exercise.

Max (5x) · Recommended

$100/mo

Recommended for workshop use. Enough headroom to daily-drive Claude Code.

Max (20x)

$200/mo

Heavy daily use. My tier; justified by daily research volume, not workshop needs.

API

Per token

No caps; needs an API key. Most expensive day-to-day.

Codex · OpenAI

Plus · Floor

$20/mo

2x promo through May 31; standard caps from June 1.

Pro

$200/mo

Heavy daily use. The Codex analogue of Max (20x).

API

Per token

No caps; needs an API key. Same per-token economics as Claude.

tiers as of May 28, 2026 · live pricing: claude.com/pricing, openai.com/chatgpt/pricing


§ 4

Workshop · Set up & first conversation

Workshop format

How workshop breakouts work

people per breakout room

When you get stuck, in this order:

01 Ask your agent

Ask in plain English. Just like you'd ask me in the chat.

02 Ask your group

See if any of the other people in your group can help you. They may have had the same issue.

03 Ask me

Click Ask for Help in your breakout toolbar. I get a popup and join your room.


Data tour

A quick look at the data you're about to download

All week we use the Chicago Police Department complaint data published by the Invisible Institute. The §2 workshop has you download it; here's what's in it before you do.

complaints-complaints.csv

234,971 rows × 19 cols
# key columns
cr_id           complaint id
incident_date   1967-2023
beat            CPD beat number
complainant_type CIVILIAN / CPD
final_finding   SU / NS / UN /
                EX / NA / blank

One row per complaint. Coverage densest 1990-2018; tails on either side.

Day 1 §2 downloads this

What's in final_finding

SU · Sustained. Allegation supported by the evidence; officer disciplined.

NS · Not sustained. Evidence insufficient.

UN · Unfounded. The alleged conduct did not occur.

EX · Exonerated. The conduct occurred but was justified.

NA / blank · Open, withdrawn, or not coded.

Day 2's regression builds sustained = (final_finding == "SU").

the outcome variable in tomorrow's spec

There's a second file, complaints-accused.csv, that links complaints to officers. We add it on Day 2 when we move to the merge.
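
Here is a minimal sketch of that outcome variable in Python, standard library only. The column names `cr_id` and `final_finding` come from the slide; the five rows are made up for illustration, and in the workshop the agent will likely write this in R or pandas instead.

```python
import csv
import io

# Made-up rows for illustration; only the column names come from the slide.
SAMPLE = """cr_id,final_finding
1001,SU
1002,NS
1003,
1004,EX
1005,SU
"""


def add_sustained(rows):
    """Return rows with sustained = 1 if final_finding is "SU", else 0."""
    out = []
    for row in rows:
        finding = (row.get("final_finding") or "").strip().upper()
        out.append({**row, "sustained": int(finding == "SU")})
    return out


rows = add_sustained(csv.DictReader(io.StringIO(SAMPLE)))
share = sum(r["sustained"] for r in rows) / len(rows)
print(f"{len(rows)} complaints, {share:.0%} sustained")
```

Note the choice hiding in this sketch: blank and NA findings are coded 0, not dropped. Whether open or uncoded complaints belong in the denominator is a research decision, so make it yourself instead of letting the agent pick.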


Workshop

Set up Claude Code and explore the data

Workshop instructions

socialscienceai.com/workshop/day-1#setup

Open the terminal, launch the agent, download the data, and ask it what it is.


Debrief

Debrief & Questions

What worked?
Where did you get stuck?
What surprised you?


§ 5

Tooling

Interface

Both tools work in every interface

Start here this week: Terminal (CLI)

Interfaces: Terminal (CLI) · Desktop app · IDE (VS Code, JetBrains) · Web (browser)

Tools: Claude Code (Anthropic) · Codex (OpenAI)

Some workflows (SSH to a remote machine, automating an agent over many files in a script) only run in the terminal. Switch to a graphical option if you prefer.


Language tools

R and Python work better with coding agents than Stata or SAS

R / Python

Plays well with coding agents.

Training data

An enormous amount of R and Python code is on the public web, so coding agents write both fluently.

Terminal access

Both print their results and errors straight to the terminal, so the agent sees what happened and fixes its own mistakes.

Lots of training data. Results land in the terminal.

Stata · SAS · SPSS

Works, but less smoothly.

Training data

Less of this code lives on the public web, so the agent’s output is rougher and needs more correction.

Terminal access

These run from the terminal too, but results and errors land in separate log or viewer files, so the agent has to dig them out before it can fix anything.

Less training data. Results are buried in log files.


Document tools

Compile your papers with Quarto or LaTeX, your choice

Both build a finished paper from a plain-text source the agent can read, diff, and version. Day 2 gives you two tracks. Use Quarto if your co-authors need to work in Word. If they are fine with it, use LaTeX to PDF, which is what I use.

Quarto (Word)

For co-authors who edit in Word.

The agent writes paper.qmd, a plain-text source mixing Markdown prose and code. Quarto runs the code, then renders a .docx through Pandoc; apply house styles with a reference template. The agent edits the source and re-renders rather than patching the binary file.

LaTeX (PDF)

What I use. Only if co-authors are on board.

The agent writes paper.tex, and LaTeX compiles it to PDF. Tables, figures, and citations land where the code puts them. The agent reads every intermediate file and iterates on its own.

Pick the track that matches where your document is going. The Day 2 exercise has you build the draft in whichever one you choose.


Companion tools

Two non-agentic tools worth installing this week

★ marks what I use.

Dictation

Talk, don’t type

Hold a key, speak, release. Text drops into your prompt. About 3x faster for long instructions.

Both Claude Code (/voice) and Codex now have built-in dictation, still rolling out.

★ Wispr Flow · Mac, Windows · wisprflow.ai

SuperWhisper · Mac · superwhisper.com

Built-in dictation · free; built into the OS.

Terminal

A better terminal

A modern terminal adds quality-of-life features: tabs for multiple sessions, autocomplete on commands, split panes for side-by-side work, clickable links and file paths, and cleaner handling of long agent output. The built-in Terminal.app still works.

★ Warp · Mac, Windows, Linux · warp.dev

Ghostty · Mac, Linux · ghostty.org

iTerm2 · Mac · iterm2.com


§ 6

Workshop · Install R and Python

Workshop

Install R and Python

Workshop instructions

socialscienceai.com/workshop/day-1#install

Have the agent install R, Python, Quarto, TinyTeX (optional), and the data-analysis packages.


Debrief

Debrief & Questions

What worked?
Where did you get stuck?
What surprised you?


§ 7

Context management

Context window

Everything the agent has seen this session is one long document

Limit: 1M tokens for Claude Code (Opus 4.8); 400K for Codex CLI (GPT-5.5). A token is roughly three-quarters of a word. Fills faster than you'd think.
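
To get a feel for how fast the window fills, here is a back-of-the-envelope sketch using the three-quarters-of-a-word rule of thumb. It is an estimate only: real tokenizers vary with the text, and code or tables usually cost more tokens per word than prose. The 9,000-word draft is a made-up example.

```python
WORDS_PER_TOKEN = 0.75  # the slide's rule of thumb; real tokenizers vary

# Limits quoted on the slide.
LIMITS = {"Claude Code": 1_000_000, "Codex CLI": 400_000}


def estimate_tokens(text: str) -> int:
    """Rough token count from a whitespace word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def share_of_window(tokens: int, limit: int) -> float:
    """Fraction of the context window a given token count would occupy."""
    return tokens / limit


paper = "word " * 9_000  # stand-in for a roughly 9,000-word paper draft
tokens = estimate_tokens(paper)
for tool, limit in LIMITS.items():
    share = share_of_window(tokens, limit)
    print(f"{tool}: about {tokens} tokens, {share:.1%} of the window")
```

One draft looks small against the limit. The window fills because every file the agent reads and every tool result it sees is added on top, turn after turn.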


Context window

See what's in the context window right now

The stack on the previous slide is abstract. The harness will show you the actual numbers. In Claude Code, type /context for a per-category breakdown. In Codex CLI, type /status for the equivalent summary.

What the terminal prints

> /context
System prompt         2.3k   (0.2%)
System tools         11.4k   (1.1%)
MCP tools                0   (0.0%)
Memory files          3.1k   (0.3%)
Messages             47.8k   (4.7%)
-----------------------------------
Free                  935k  (93.7%)

Illustrative numbers.


What to read off it

Memory files. Your CLAUDE.md and AGENTS.md totals. If this is double-digit percent, your memory file is too long.

System tools. The built-in Read / Edit / Bash schemas. Fixed cost, not yours to tune.

MCP tools. Anything you install on Day 3 shows up here.

Messages. Your prompts and the agent's replies, plus every tool result. Grows fastest.

Free. The headroom before auto-compaction triggers.

three habits: keep memory short, clear when free drops below 30%, compact intentionally
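
The two thresholds on this slide are easy to turn into a check. The sketch below reuses the illustrative token counts from the stylized `/context` output and assumes a 1M-token window; the function names are hypothetical, and you would type the numbers in by hand, since this does not read anything from the harness.

```python
# Illustrative token counts copied from the stylized /context output.
USAGE = {
    "System prompt": 2_300,
    "System tools": 11_400,
    "MCP tools": 0,
    "Memory files": 3_100,
    "Messages": 47_800,
}
WINDOW = 1_000_000


def context_report(usage: dict, window: int) -> dict:
    """Return each category's share of the window plus the free share."""
    shares = {name: tokens / window for name, tokens in usage.items()}
    shares["Free"] = 1 - sum(usage.values()) / window
    return shares


def advice(shares: dict) -> list:
    """Apply the slide's two thresholds to a report."""
    notes = []
    if shares["Memory files"] >= 0.10:
        notes.append("memory file is too long")
    if shares["Free"] < 0.30:
        notes.append("clear or compact")
    return notes


shares = context_report(USAGE, WINDOW)
print({name: f"{share:.1%}" for name, share in shares.items()})
print(advice(shares) or "nothing to do yet")
```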


Three sources

Context comes from three places. The agent can reach two of them.

Dictate it

When you know what to say but typing is slow. Wispr Flow, SuperWhisper, Aqua Voice.

Have the agent interview you

When you might not know what to say. Next slide.


Workspace setup

Drop everything the agent might need into your project folder

Your project folder
your-project/
  CLAUDE.md   (Claude Code)
  AGENTS.md   (Codex CLI)
  ↑ Loaded at every session start
  notes/
    research_questions.md
    data_dictionary.md
    meeting_notes.md
  papers/
    johnson-2023.pdf
    li-and-smith-2024.pdf
  code/
    01_clean.R
    02_analyze.R
  data/
    complaints-complaints.csv
  drafts/
    outline.md

CLAUDE.md / AGENTS.md is special

The only file the agent loads automatically at session start. Project conventions, your preferences, anything the agent should always know.

Other files

Research questions, data dictionaries, meeting notes, drafts. The agent reads them when you reference them, or when it explores the project.

Source materials

PDFs of papers you're citing, prior code, scratch analyses, codebooks. Drop them in. The agent reads them on demand.

Principle

Richer directory, richer context. If the agent might need it, drop it in.
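
You can ask the agent to set this up, but the layout is simple enough to sketch by hand. The folder names below follow the example on this slide; the `scaffold` helper is hypothetical, and the demo runs in a temporary directory so it touches nothing of yours.

```python
import tempfile
from pathlib import Path

# Folder names follow the slide's example layout; rename to suit your project.
FOLDERS = ["notes", "papers", "code", "data", "drafts"]
MEMORY_FILES = ["CLAUDE.md", "AGENTS.md"]


def scaffold(root: Path) -> list:
    """Create the folders and stub memory files; never overwrite existing ones."""
    created = []
    for folder in FOLDERS:
        path = root / folder
        if not path.exists():
            path.mkdir(parents=True)
            created.append(path)
    for name in MEMORY_FILES:
        path = root / name
        if not path.exists():
            path.write_text(f"# {root.name}\n", encoding="utf-8")
            created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "your-project"
    first = scaffold(demo_root)
    second = scaffold(demo_root)  # second run finds everything already in place
    print(len(first), "paths created, then", len(second))
```

The skip-if-exists check matters more than the rest: a setup script that overwrites a memory file you have been editing for weeks is worse than no script.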


Interview-me

Have the agent interview you before you ask for something hard

01 Vague task

plan my talk

what you typed

02 Interview

Q1. How long is the talk?

A. 20 minutes.

Q2. Who is the audience?

A. Business-school faculty.

Q3. What do you want them to remember?

A. My main result is robust to alternative IVs.

questions you may not have thought to specify

03 Sharp task

draft a 20-min talk for business-school faculty; landing point = main result is robust to alternative IVs

what plan mode now has to work with


Auto-compaction

When context fills, the agent summarizes older turns into a memory block

Auto-compact works fine for most research workflows when state is in files. Manual /compact gives you control over when and what to keep.
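
As a mental model only, compaction looks something like the toy below. It is not the harness's algorithm: word counts stand in for tokens, the "summary" is a crude truncation where a real harness has the model write one, and `compact` and its parameters are hypothetical.

```python
def compact(turns: list, budget: int, keep_recent: int = 2) -> list:
    """Fold older turns into one summary line once the word budget is exceeded."""
    used = sum(len(turn.split()) for turn in turns)
    cut = len(turns) - keep_recent
    if used <= budget or cut <= 0:
        return turns  # still fits, or nothing old enough to fold
    older, recent = turns[:cut], turns[cut:]
    summary = "[summary] " + " | ".join(turn[:20] for turn in older)
    return [summary] + recent


session = [
    "load the complaints file and profile it",
    "profile saved to results",
    "plot complaints per year",
    "figure written",
]
for line in compact(session, budget=10):
    print(line)
```

The truncation makes the cost visible: whatever was in the older turns survives only as far as the summary keeps it. That is why decisions you care about belong in files.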


Context rot

What is Context Rot?

The phenomenon

As context fills, the agent has more to keep track of and starts losing the thread. The signal-to-noise ratio drops.

After many turns in a heavy session, performance is noticeably worse than at the start.

What it looks like

The agent misremembers earlier decisions
Forgets constraints you set early in the session
Goes in circles on a problem it would solve fresh
Uses old patterns after you've corrected it

As your session grows, accuracy on the same task drops.

Pattern after Hong, Troynikov & Huber (Chroma Research, 2025); independently confirmed in Du et al. (EMNLP Findings, 2025). Curve illustrative.


Managing context

Three habits that keep context useful

01 Write state to files

Have the agent write progress, plans, and decisions to .md files in your project. Files persist across sessions; conversation context doesn't.

02 Watch the context budget

Run /context (Claude Code) or /status (Codex CLI) periodically. When free space drops below about 30% (roughly the zone where rot starts to bite), start fresh or compact. The signal is the budget, not a fixed turn count.

03 Compact intentionally

Type /compact before context fills, with a hint about what to keep. Better than waiting for auto-compact to lose detail.

three rules via Paul Goldsmith-Pinkham


§ 8

Plan mode

Plan mode is where you and the agent agree on the approach before any code runs.

Before the agent touches a file, it states what it intends to do. You read the steps, push back on the wrong ones, refine. By the time anything executes, the two of you have agreed on the approach.

Without plan mode: the agent assumes

The agent reads your prompt and acts. It picks variable names, file paths, methods, and sequencing on its own.

By the time you see what it did, you are unwinding decisions that were never yours.

With plan mode: you negotiate

The agent proposes the steps it would take, in order, before any of them run.

You approve, reject, or amend. The assumptions become visible and editable before they execute.


Workflow rhythm

A useful rhythm for non-trivial work: Plan, Execute, Clear

Naming the rhythm helps you notice which phase you're in mid-task. Use it as a default for non-trivial work. Skip Plan when the task is small.

01 Plan

The agent proposes what it would do, before it does anything. You read the plan, approve, reject, or amend.

Press Shift+Tab twice to enter plan mode (or skip it for your own planning ritual)

02 Execute

The agent runs the plan: edits files, runs commands, asks permission as needed. You watch, and intervene if it drifts.

ESC to interrupt

03 Clear

When context fills, /compact to keep going with key state preserved. When the task is done, /clear to start fresh. Files persist; conversation context doesn't.

/compact or /clear

framing via Steve Pocock


Plan-mode anatomy

What plan mode looks like in your terminal

Plan mode
1. Read codebook.md to confirm variable names
2. Load data/raw/complaints-complaints.csv
3. Profile: nrow, summary, missingness rates
4. Save profile to results/profile.txt
5. Report top findings inline
This plan requires approval
Do you want to proceed?
> 1. Yes, run the plan
2. No, let me give feedback to refine the plan
3. Cancel
Esc to cancel · Shift+Tab+Tab toggles plan mode

Stylized example. Live wording may differ slightly.

What’s happening

You pressed Shift+Tab twice to enter plan mode. The agent proposes the steps it would take, before doing any of them.

What you see

A numbered list of discrete actions, so you can see exactly what the agent intends. If you want to change a step, pick option 2 and tell it what to change in plain English.

What you do

1 approves the plan and exits plan mode into execute. 2 sends the agent back with your feedback. 3 cancels.

When to use it

Any task with more than one step: cleaning a dataset, multi-file refactors, anything where a wrong first move sends the agent in the wrong direction.


§ 9

Permissions

Permission model

The agent asks before doing anything that changes your system

Reads

File contents, project structure, git status. Looks at things without changing them.

Runs silently. Reads do not change anything, so no prompt.

Edits

Creating, editing, or deleting files. Changes things on your computer.

Asks each time. Shows the proposed change first. Approve, decline, or always-approve for this project.

Shell

Runs terminal commands like Rscript, git, rm. (A shell is the program that runs your terminal commands.)

Asks before every command, including destructive ones like rm or git reset.

By default the agent asks before almost every action. Use /permissions to pre-approve commands as you build trust.


Anatomy

What a permission prompt looks like in your terminal

A terminal is the text-based interface to your computer. The agent prints prompts like this one when it needs your approval.

Bash command
Rscript code/01_clean.R
Run the data cleaning script
This command requires approval
Do you want to proceed?
> 1. Yes
2. Yes, and don't ask again for: Rscript *
3. No
Esc to cancel

Stylized example. Live wording may differ slightly.

What's happening

The agent paused before running a Bash command and is asking for approval.

What you see

The exact command, then a one-line description of what it's for. Read both before approving.

What you do

1 runs once. 2 auto-allows the pattern (any Rscript *) for this project. 3 declines. Esc cancels.


Graduated path

Permission layers, least to most permissive.

The annoyance of repeated prompts is a real cost. The right fix is to remove the prompts you do not need, not to remove all the gates.

Layer 1. Allowlist specific commands.

Reads and other safe actions are always allowed without asking. You can add a list of other commands to skip prompts for, e.g. running R scripts or checking git status. Anything risky, like deleting files or force-pushing to a repo, still asks.

Layer 2. Auto-approve file edits.

File edits happen without asking. Running scripts still asks. Useful when edit approvals are the main annoyance and you know you can always undo with /rewind.

Layer 3. Auto mode (recommended starting point).

A separate safety check decides which actions look routine (lets them run) and which look risky (asks you). Better than skipping all checks, because the dangerous stuff still gets reviewed. Turn it on in ~/.claude/settings.json. Needs Opus 4.6+ or Sonnet 4.6. Codex version: --ask-for-approval (Auto by default).

Layer 4. Skip all prompts (once you trust the workflow).

Nothing asks for approval. Best when your work is recoverable (version control plus /rewind). As of recent versions this flag also bypasses writes it used to protect: .git/, .claude/, .vscode/, and shell config files (only catastrophic deletes still prompt), so the blast radius now includes your git history and dotfiles. Remaining risk: a webpage or document the agent reads could try to trick it into running something destructive. This is prompt injection; we come back to it in the next section. Codex version: --yolo.

Per session: Shift+Tab cycles modes in Claude Code; Codex CLI uses /permissions, which offers three modes: Auto (default), Read-only, and Full Access.
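
The logic of Layer 1 can be sketched as a toy decision function. This is a model of the idea, not the harness's rule engine: the patterns use Python's `fnmatch` syntax, not the harness's own rule syntax, and the rule lists and `decide` helper are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical rule lists, written in fnmatch syntax for this sketch only.
ALLOW = ["Rscript *", "git status"]
DENY = ["git push --force*"]


def decide(command: str, allow=ALLOW, deny=DENY) -> str:
    """Return "deny", "allow", or "ask" for a shell command."""
    if any(fnmatch(command, pattern) for pattern in deny):
        return "deny"  # deny rules are checked first in this sketch
    if any(fnmatch(command, pattern) for pattern in allow):
        return "allow"
    return "ask"  # anything not listed still prompts


for cmd in ["Rscript code/01_clean.R", "git status", "rm -rf data"]:
    print(f"{cmd!r}: {decide(cmd)}")
```

The design choice to notice is the default: a command that matches nothing falls through to "ask". An allowlist removes prompts you have decided you do not need and leaves every other gate in place.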


§ 10

Workshop · Visualize the data

Workshop

Visualize the data

Workshop instructions

socialscienceai.com/workshop/day-1#visualize

Have the agent make an annual time series and a density map of the complaints data.


Debrief

Debrief & Questions

What worked?
Where did you get stuck?
What surprised you?


§ 11

Slash commands

Why slash commands

Plain language goes to the model. Slash commands stop at the harness.

Three things only slash commands can do: clear context, show your cost, edit permissions.


Slash commands

The most important slash commands in Claude Code

/rewind

Roll back files and conversation to an earlier point. Your safety net for risky changes.

/clear

Start a new conversation. Past sessions stay in /resume.

/compact

Summarize the conversation to free context.

/resume

Resume a previous conversation by name or picker.

/usage

Show session cost, plan limits, activity stats. /cost is an alias.

/model

Switch models for the current session.

Opus: hard reasoning
Sonnet: fast iteration

/permissions

Manage allow / ask / deny rules for tool permissions.

/init

Initialize the project with a starter CLAUDE.md.

/exit

End the session.

+ advanced (later this week): /loop · /rc

These are Claude Code commands. Codex CLI has its own set; many overlap, but it has no /rewind (closest: /fork).


Recently shipped

Three newer commands worth knowing

All three landed in May 2026. The tools change weekly, so treat this as a snapshot, not a fixed feature set.

/effort

Dials how hard the model thinks, alongside /model. Turn it up (/effort xhigh) for a hard identification or debugging problem; turn it down to conserve rate limits on routine edits.

/goal

Set a finish line and Claude works across turns until it is met, showing elapsed time, turns, and tokens. A new rung on the autonomy ladder. Phrase the condition as something its own output demonstrates, e.g. “the tests pass.”

/workflows (preview)

Claude writes a script that fans the work across many background subagents at once. Plan-gated (Max, Team, Enterprise), so I may demo later in the week.

/goal is distinct from /loop, which repeats on a time interval rather than working toward a condition.


§ 12

Memory

Two primary places to store memories/rules

Every session, the agent reads both files and prepends them to its context. The global file follows you across every project. The project file lives in the folder and only applies there.

Global memory file

~/.claude/CLAUDE.md
~/.codex/AGENTS.md

What goes here. Who you are; which language and packages you use everywhere; the writing voice you want; reproducibility defaults.

Examples: “Primary languages are R and Python.” “Regressions in fixest.” “Cluster SE at the unit of treatment.”

written once on Day 1, edited rarely

Project memory file

<project>/CLAUDE.md
<project>/AGENTS.md

What goes here. The research question; data paths and codebook; the join key and unit of analysis; the project-specific clustering level.

Examples: “Cluster SE at beat.” “Inner join on cr_id; report the share dropped.” “Outcome is sustained.”

written when the project starts, edited as you learn the data

Both files are loaded into the agent's context at session start. Keep universal rules in the global file and facts that apply to only one paper in the project file.
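As a rough mental model of what happens at session start, the sketch below reads a global and a project memory file and places their text ahead of the user's prompt. The function name, the bracketed section labels, and the ordering are assumptions made for illustration; the real harness decides the exact format.

```python
from pathlib import Path
import tempfile


def build_context(global_file: Path, project_file: Path, prompt: str) -> str:
    """Put memory-file text ahead of the prompt (illustrative only).

    A missing file is skipped, so a project without its own memory
    file still gets the global rules.
    """
    parts = []
    for label, path in (("global memory", global_file), ("project memory", project_file)):
        if path.is_file():
            parts.append(f"[{label}]\n{path.read_text(encoding='utf-8').strip()}")
    parts.append(f"[prompt]\n{prompt}")
    return "\n\n".join(parts)


# Temporary files stand in for ~/.claude/CLAUDE.md and <project>/CLAUDE.md.
with tempfile.TemporaryDirectory() as tmp:
    global_md = Path(tmp) / "global_CLAUDE.md"
    project_md = Path(tmp) / "project_CLAUDE.md"
    global_md.write_text("Regressions in fixest.\n", encoding="utf-8")
    project_md.write_text("Cluster SE at beat.\n", encoding="utf-8")
    context = build_context(global_md, project_md, "Run the main spec.")

print(context)
```

Because every rule in both files is carried into every session, short and specific entries tend to serve better than long ones.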

Day 1 · Memory · 56

§ 13

Workshop · Write your global CLAUDE.md

Workshop

Write your global CLAUDE.md

Workshop instructions

socialscienceai.com/workshop/day-1#global-memory

Have the agent interview you, then write a global CLAUDE.md from the conversation.

Day 1 · Workshop · 57

Debrief

Debrief & Questions

What worked?
Where did you get stuck?
What surprised you?

Day 1 · Debrief · 58

§ 14

Privacy and copyright

Privacy

Does Anthropic train on your Claude Code data?

No, not by default.

Retention varies by plan.

Pro / Max

days retention

Not trained on by default. You can opt in to training (extends retention to 5 years).

API

days retention

Never trained on. Pay-per-token tier; standard for any researcher using an API key.

Enterprise / Team

days, with ZDR

Never trained on. Zero Data Retention available on request.

Using IRB-protected or otherwise sensitive data? Consult first.

Prompts and tool outputs travel to Anthropic regardless of training behavior. That may not satisfy your IRB protocol or data-use agreement. Check with your IRB office, library, or research compliance officer before pointing Claude at sensitive data.

Sources: privacy.claude.com · platform.claude.com · code.claude.com (ZDR)

Day 1 · Privacy · 59

Privacy

Does OpenAI train on your Codex CLI data?

Depends on how you sign in.

The default is the opposite of Claude's on consumer plans.

ChatGPT Plus / Pro

days after deletion

Trained on by default. Opt out in ChatGPT Settings → Data Controls → “Improve the model for everyone.”

API

days, abuse logs

Never trained on. Pay-per-token. ZDR available with OpenAI approval.

Business / Enterprise / Edu

days, with ZDR

Never trained on. Zero Data Retention supported for the Codex App, CLI, and IDE.

Using IRB-protected or otherwise sensitive data? Consult first.

Prompts and tool outputs travel to OpenAI regardless of training behavior. That may not satisfy your IRB protocol or data-use agreement. Check with your IRB office, library, or research compliance officer before pointing Codex at sensitive data.

Sources: developers.openai.com (API) · developers.openai.com (Codex enterprise) · help.openai.com (ChatGPT plans)

Day 1 · Privacy · 60

Can you upload copyrighted articles to Claude Code?

Probably not.

Most articles you have access to come through your library’s subscription.

Library-licensed PDFs

Most of what you read.

Articles accessed through your university’s subscription to Elsevier, Wiley, Springer, JSTOR, and similar publishers. The license between your library and the publisher typically restricts feeding subscribed content to AI tools.

Lower-risk sources

More permissive, but still verify.

CC-BY or CC0 articles. Check the explicit license; not all “open access” is permissive.

Preprints you authored. Verify the platform’s license; arXiv’s default does not grant AI-training rights.

Drafts of your own work that have not been signed over to a publisher.

Material your library has explicitly cleared for AI use. Most subscription contracts do not include this; ask before assuming.

When in doubt, ask first.

Most publisher subscription contracts restrict feeding subscribed content to AI tools, and personal purchase does not override the publisher’s TOS. Fair use is a US doctrine actively contested in pending AI litigation. Check with your library, IRB office, or general counsel before uploading material you did not write or explicitly license.

Sources: Sag, “Fairness and Fair Use in Generative AI” · SPARC TDM tracker · anthropic.com/legal/commercial-terms

I’m not a lawyer and this is not legal advice. Check with your library or counsel for your specific situation.

Day 1 · Copyright · 61

§ 15

Failure modes: what the agent can't do

Failure modes

You are the P.I. The agent is the R.A.

AI coding assistants can produce code, run scripts, and draft text. They do not have the contextual judgment to make the choices that determine whether an analysis is defensible. Six responsibilities remain with the researcher.

Responsibility	What it covers
Taste	The agent doesn't know your field. Journal-specific style conventions, current methodological debates, and niche or recent techniques are underrepresented in its training.
Theoretical framing	What conversation in your field the paper joins, what counts as a contribution, which precedent to cite. The agent can suggest framings; it can't tell you what your field will register as novel.
Data quality	Whether the data actually answers the question. The agent will run regressions on whatever variables you give it without flagging that they may not measure what you think they measure.
Methodological choices	Sample construction, estimator, identification strategy, and standard error specification.
Verification	Direct inspection of the analytic output: cleaned data, row counts, regression tables, figures.
Stopping rules	When the analysis is done versus when you're p-hacking. The agent will keep running specifications as long as you ask for more.

Day 1 · Failure modes · 62

Failure modes

Sycophancy and leading prompts

The agent tends to comply with user direction. Prompts that signal a desired result, or that request unmotivated changes to the sample or specification, produce output that conforms to those expectations rather than to the structure of the data.

Leading	Neutral
“This effect should be significant. Try a few specs.”	“Run the pre-registered spec. Report the coefficient and SE.”
“Add controls until the coefficient is significant.”	“Estimate the model with the pre-specified control set. Report results with and without controls.”
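One way to make the neutral prompts enforceable is to write the specification down before any results exist and have every model derive from it. The sketch below is a minimal illustration, not the workshop's actual specification: `treated` and the two control names are placeholders, and the fixest-style formula strings are used only as an example format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the spec cannot be edited after results come in
class Spec:
    outcome: str
    treatment: str
    controls: tuple[str, ...]
    fixed_effects: str

    def formulas(self) -> dict[str, str]:
        """Return exactly two models: without and with the pre-specified controls."""
        base = f"{self.outcome} ~ {self.treatment}"
        full = " + ".join((base, *self.controls))
        return {
            "no_controls": f"{base} | {self.fixed_effects}",
            "with_controls": f"{full} | {self.fixed_effects}",
        }


# Placeholder names, fixed before looking at any estimates.
spec = Spec(
    outcome="sustained",
    treatment="treated",
    controls=("officer_tenure", "complaint_year"),
    fixed_effects="beat",
)

for name, formula in spec.formulas().items():
    print(f"{name}: {formula}")
```

Handing the agent a fixed object like this leaves no room for "try a few specs": there are two models, and both get reported.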

Day 1 · Failure modes · 63

Failure modes

Three failure modes to watch for

Failure mode	What happens	Mitigation
Destructive actions	The agent runs a command that deletes or overwrites files you did not intend to lose.	Use version control. Back up raw data outside the project. Commit changes often.
Changes that spread	A request to change one file may quietly modify others.	Name the file you want changed. Review the diff before approving.
Gaming the checks	Asked to fix a failing check (e.g., "N should be 154,525"), the agent may weaken or delete the check itself rather than figure out why the number is off.	Write your own data-integrity checks. Review any agent change to them before approving.
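A data-integrity check of the kind described in the last row can be a few lines that you write and own yourself. The sketch below is an illustration under stated assumptions: the function name, the toy rows, and the expected count of 3 are made up for the demo (with real data the expected N would be a number such as the 154,525 quoted above), and `cr_id` is treated as a unique key only for the sake of the example.

```python
from collections import Counter


def check_rows(rows: list[dict], key: str, expected_n: int) -> None:
    """Fail loudly if the row count or key uniqueness is off.

    The check raises instead of adjusting anything, so the only way to
    'fix' it is to explain why the data changed.
    """
    if len(rows) != expected_n:
        raise ValueError(f"expected {expected_n} rows, found {len(rows)}")
    duplicates = [k for k, n in Counter(r[key] for r in rows).items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate {key} values: {sorted(duplicates)}")


# Toy data standing in for the cleaned file.
rows = [{"cr_id": 101}, {"cr_id": 102}, {"cr_id": 103}]
check_rows(rows, key="cr_id", expected_n=3)  # passes silently

try:
    check_rows(rows + [{"cr_id": 103}], key="cr_id", expected_n=3)
except ValueError as err:
    print(err)  # expected 3 rows, found 4
```

Keeping checks like this in a file the agent is told not to edit, and reviewing any diff that touches it, is what makes the mitigation work.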

Day 1 · Failure modes · 64

Failure modes

Prompt injection: the risk that comes from outside

The other failure modes come from the agent or from you. This one comes from a third party. The agent can mistake text it reads for your instructions, so a hidden line in an email, a PDF, or a web page can redirect it.

What it is	Detail
Where you are exposed	Anything that feeds the agent text you did not write: an email it triages, a paper you are reviewing, a scraped web page, a dataset's README, the output of an external tool.
What it can cause	Leaking data the agent can see, running a shell command, or sending an email or making an edit you never asked for. The damage scales with the agent's reach: file access, network, and send permissions.
How to limit it	Do not run full autonomy (Layer 4) on a session that is reading strangers' text. Keep approvals on for actions that reach outside: shell, send, network. Treat anything read from outside as data, not commands. Keep raw data backed up and work under version control so any edit is reversible.

The rule of thumb: the more autonomy you grant a session, the less untrusted text it should be reading.
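The rule of thumb can be written as a tiny policy. This is a hypothetical sketch, not a real Claude Code or Codex setting: the action names and both functions are invented to show the logic of the "How to limit it" row.

```python
# Actions that reach outside the session, per the table above: shell, send, network.
OUTWARD_ACTIONS = frozenset({"shell", "send", "network"})


def full_autonomy_allowed(read_untrusted_text: bool) -> bool:
    """Full autonomy is only acceptable when every input is text you trust."""
    return not read_untrusted_text


def needs_approval(action: str, read_untrusted_text: bool) -> bool:
    """Outward-reaching actions need a human check once untrusted text is in play.

    Local edits are not gated here because version control and backups
    are assumed to make them reversible.
    """
    return read_untrusted_text and action in OUTWARD_ACTIONS


# A session that triages email has read strangers' text.
print(full_autonomy_allowed(read_untrusted_text=True))   # False
print(needs_approval("send", read_untrusted_text=True))  # True
print(needs_approval("edit", read_untrusted_text=True))  # False
```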

Day 1 · Failure modes · 65

§ 16

Debrief & Day 2 preview

Debrief

Today you went from zero to working

Installed Claude Code

You now have a working agentic terminal.

Permissions you control

You decide what the agent can do and what it can access.

Plan mode

Read the plan before the agent does anything risky.

Real data exploration

Used shell tools and web search to inspect the Chicago police complaints data without writing a line of R.

Your global CLAUDE.md

Persists across every project, every session. Tomorrow the agent already knows you.

R and Python ready

Installed and verified. Tomorrow you start the analysis.

Day 1 · Today · 66

Tomorrow

Day 2 · Working with data

Plan mode in practice

Plan mode and the prompts that get you the spec you intended.

Join the data

Merge the complaints and accused-officer files; report row counts and match rates.

Your first regression

Beat fixed effects, clustered standard errors, interpret the coefficient.

Reusable skills

Author table- and figure-formatting skills that encode your conventions.

Compile your draft

A memo written in Quarto (markdown) or LaTeX, rendered to Word (.docx) or PDF.

Day 1 · Day 2 preview · 67
