Day 01 / 04 Foundations & First Workflow

Agentic Coding Tools
for Researchers

Justin Frake

University of Michigan, Ross School of Business

Justin Frake
Instructor

Justin Frake

Assistant Professor of Strategy, Ross School of Business, University of Michigan

Research
I study labor markets, misconduct, and the role of politics in the workplace.
Methods
Causal inference and quasi-experimental designs.
Tools
Daily user of agentic coding tools for well over a year: Claude Code, Cursor, Codex, and Antigravity. I will primarily use Claude Code.
Find me
justinfrake.com · jfrake@umich.edu

Why I love agentic coding tools

More time on

  • Brainstorming
  • Reading
  • Framing
  • Theory
  • Logic
  • Insight
  • Empirical design

Less time on

  • Collecting data
  • Merging data
  • Writing code
  • Writing first drafts and rote prose
  • Editing typos and grammar
  • Finding and formatting references
  • Wrangling minor formatting
Day 1 · Why this matters1

What I actually use these tools for (research and teaching)

Research

  • All the coding for my papers
  • First drafts, then editing
  • Literature searches
  • Brainstorming and stress-testing designs
  • Scraping and API data collection
  • Replication packages
  • Data-visualization sites (politicsatwork.org)

Teaching

  • All slide decks, including this one
  • Syllabi and course pages
  • Problem sets and exam questions
  • Lecture notes, reading guides, and grading rubrics
Day 1 · What this looks like2

What I actually use these tools for (editorial work and life)

Editorial and service

  • A first pass on papers I review
  • Organizing reviewer reports into a decision
  • Finding candidate reviewers
  • Drafting letters of recommendation

Admin and life

  • Making my personal website (justinfrake.com)
  • Triaging my inbox
  • Building my to-do list and schedule
  • A morning brief, emailed to me
  • Grocery shopping and recipe planning
  • House-management software that keeps my kids doing chores
  • A camera that nudges my dog off the couch with a sound only she can hear

I have not written a line of code by hand in over a year. Most of what's on these two slides is not code at all.

Day 1 · What this looks like3

The judgment and responsibility are always mine

These tools do not make my decisions. They do the gathering, drafting, and formatting, so I spend my time on the part that is actually mine to do.

The agent prepares

  • Reads a submission and lays out the case for and against
  • Assembles reviewer reports and flags where they conflict
  • Does a first pass and maps the issues in a paper
  • Drafts a letter in my voice from the CV and my notes

I decide

  • Whether it is a desk reject
  • The decision, and every line of the letter
  • The actual assessment of the work
  • The strength of the signal I send

Everything I submit is mine, even if I did not write the first draft. I read it, and I sign off on it.

Day 1 · One rule4

Who you are

Role
Research field

Live, based on [N] survey responses.

Day 1 · Who's in the room5

Your day-to-day setup

Operating system
Paper-writing tool

Live, based on [N] survey responses.

Day 1 · Who's in the room6

How you code now

Coding frequency
Languages used regularly

Live, based on [N] survey responses.

Day 1 · Tool experience7

AI exposure and terminal comfort

AI tools used for work
Comfort with the terminal

Live, based on [N] survey responses.

Day 1 · Tool experience8

What you want from these tools

Projects you brought

  • Scraping news articles and product reviews
  • Cleaning large archival datasets (NLSY, administrative panels)
  • Meta-analysis and literature review work
  • Multilevel and dyadic team data
  • Re-running or restructuring an existing paper's analyses
  • Sentiment analysis on reviews

Goals for the four days

  • Build a research project end-to-end with the agent
  • Complex data analysis without losing researcher oversight
  • Level up from ChatGPT + R copy-paste to an agentic workflow
  • Specific deliverables: a personal website with chatbot; education research apps; remote-server data access
  • Manage AI use for reviewer and editor scrutiny
Day 1 · Who's in the room9

Agenda for this week

Day 01 · today
Foundations & first workflow
  • · Mental model of agentic coding
  • · Install & first conversation
  • · Context, plan mode, permissions
  • · Failure modes & customization
Day 02
Working with data
  • · Power prompts
  • · Regressions, tables, figures
  • · Project CLAUDE.md
  • · Compiling a paper draft
Day 03
Customize & extend
  • · Slash commands & hooks
  • · Plugins & skills
  • · Sub-agents
  • · MCP integrations
Day 04
Applied & BYOP
  • · Bring your own project
  • · Apply everything to your research
  • · Open Q&A and troubleshooting
Day 1 · Course agenda10

Agenda

§ 1
Foundations
Lecture
§ 2
Setup & first conversation
Workshop
§ 3
Tooling
Lecture
§ 4
Install R and Python
Workshop
§ 5–7
Working with the agent
Lecture
§ 8
Visualize the data
Workshop
§ 9–10
Slash commands & memory
Lecture
§ 11
Write your global CLAUDE.md
Workshop
§ 12
Privacy & copyright
Lecture
§ 13
Failure modes
Lecture
§ 14
Debrief & Day 2 preview
Discussion
Day 1 · Agenda11

How we'll work together over the next four days

Format

1
Interactive. Ask questions any time. Interrupt me if something isn't clear.
2
About half lecture, half workshop.
3
Flexible. Feel free to modify, explore, or skip any exercise.

Conduct

1
Stay muted unless you're speaking to the room.
2
Keep your camera on unless you have a compelling reason not to.
3
Email me anytime: jfrake@umich.edu.
Day 1 · How we'll work together12

Three goals for the next four days

1
Get comfortable using agentic coding tools.
2
Use them to write code and write papers.
3
Use advanced features like plugins, skills, and sub-agents.
Reality check
You will not be an expert in four days. These tools take practice. You have to learn how to talk to them, what you can trust from them, and what you can't.
Day 1 · Goals13

socialscienceai.com

Open in new tab →
Day 1 · Workshop website14
§ 1

What are agentic coding tools?

Agentic coding tools are like very smart (and sometimes weird) RAs

RA you can only text

chatgpt.com ChatGPT Help me fix the bug in my regression analysis code I need the code. Paste the regression code that is failing. Include: • Language and libraries • Error messages or warnings • What you expect vs. what you get If the bug is logical rather than syntactic, say what looks wrong and why.
Reads only what you paste in.
Can't see your files or run your code.
Can't change anything for you.

RA with your files and a computer

Your Notes Your Code Your Data Terminal Internet Papers claude code > read codebook.md > head wages.csv > write reg.R > Rscript reg.R ↳ F = 12.4
Reads your project files directly.
Runs code, edits scripts, runs tests.
Stops to ask when it's not sure.
Day 1 · Mental model15
Demo

Claude Code Demo

Claude Code v2.1 terminal welcome screen showing the model, organization, and the prompt input.

Four major agentic coding tools; we will use Claude Code

Primary
Claude Code Anthropic
Terminal-native agent. Reads files, runs code, asks permission. The tool I'll instruct with.
claude.ai/code
Reasonable alt
Codex OpenAI
Terminal agent comparable to Claude Code. Most concepts this week port over directly.
openai.com/codex
Cursor Anysphere
Agent-first workspace built around its AI code editor. As of Cursor 3, the workflow is directing fleets of agents.
cursor.com
Antigravity Google
Google's agentic development platform (IDE + CLI + SDK). Replaces Gemini CLI as of mid-2026.
antigravity.google
Day 1 · Landscape16

The differences between Claude, Claude.ai, and Claude Code CLI

Claude Claude.ai

The chat web app. claude.ai in a browser.

You type, the model answers. It cannot see your files, run your code, or save anything to your computer.

Same idea as ChatGPT, from Anthropic.
Claude Claude Code CLI

An agentic command-line application. Lives in your terminal.

Reads your files. Runs commands. Edits code.

We'll refer to this as the harness.
Foundation model
Claude
e.g., Opus 4.8

The Claude desktop and mobile apps also sit on this model. This week we work only with Claude Code.

Day 1 · The names17

The differences between ChatGPT and Codex CLI

OpenAI ChatGPT

The chat web app. chatgpt.com in a browser.

You type, the model answers. It cannot see your files, run your code, or save anything to your computer.

Same role as Claude.ai, from OpenAI.
OpenAI Codex CLI

An agentic command-line application. Lives in your terminal.

Reads your files. Runs commands. Edits code.

Same role as Claude Code, from OpenAI.
Foundation model
GPT-5.5
e.g., gpt-5.5
Day 1 · The names18

Codex vs Claude

Codex runs long, hands-off, terse Claude interactive, explains, judgment Verifiable / unambiguous Ambiguous / judgment make the tests pass refactor a module design a figure frame the finding
Day 1 · Codex vs Claude19
§ 2

How do agents work?

What an agent does between your prompt and its reply

YOU User AGENT Claude Code picks a tool, runs it, repeats until done. YOUR COMPUTER read files edit files run shell run R / Python search the web prompt 1 tool call 2 result 3 reply 4 repeat until done
One real loop
1.
prompt: “regress wages on tenure”
2.
tool calls: writes 01_reg.R, then Rscript 01_reg.R
3.
result: error, undefined column 'tenure_yrs'
↪ agent reads codebook.md, edits the script to fix the column name, reruns it, this time succeeds
4.
reply: regression table with N, coefficient, SE
Day 1 · The loop20

The agent asks before doing anything irreversible

Three safeguards that hold across Claude Code and Codex CLI. They are the reason this is safe to run on your own laptop.

01 It asks first

Before the agent edits a file, runs a shell command, or touches anything outside the current folder, it pauses and asks for approval. You see exactly what it's about to do.

Edit 01_clean.R?
 1. Yes 2. No 3. Always allow
02 You can stop it

If the agent goes the wrong direction mid-task, hit Esc. It stops immediately. You can correct course or start over.

[Esc]
> interrupted. what would you like to do?
03 You can rewind

Hit Esc Esc to roll back to an earlier point in the conversation.

Claude Code
rewinds chat, file edits, or both, your choice
Codex CLI
rewinds chat only; files stay as the agent left them

Backup Keep your work backed up somewhere else: Dropbox, Google Drive, GitHub.

Day 1 · Safety21

The harness is not the model

The customization stack hangs off the harness, not the model. Codex CLI has analogous primitives with different names.

HARNESS Claude Code MODEL Claude generates language, reasons over context. What the harness does: · reads & edits files · runs tools / shells · asks permission · manages context · loads the stack → CUSTOMIZATION STACK CLAUDE.md Skills Subagents Hooks MCP servers Plugins
Day 1 · Mental model22

Five levels of autonomy you give the agent

1
Browser chat
Copy from ChatGPT or Claude.ai, paste into your editor.
2
Code completion
Inline suggestions and chat inside your editor.
3
IDE agent mode
VS Code, JetBrains Junie, Windsurf, Zed. Reads files, runs tests, refactors.
4
Terminal agent
Claude Code or Codex CLI. The agent reads, edits, runs in your terminal.
5
Autonomous agents
The agent runs longer tasks without stopping to ask. Overnight jobs, scheduled reruns, parallel agents.
Common starting point
You’ll be here today
Start here by end of workshop
Day 1 · The ladder23

Context is the single most important concept this week

1
Context is everything the model can see right now.
The model has no other memory between turns. If it isn't in context, it doesn't exist.
2
Context determines quality.
Context determines how well the agent does what you want. It needs to know about your project and about your preferences.
3
Context is finite and it degrades.
  • Today's defaults: roughly 1M tokens for Claude Code (Opus 4.8), 400K for Codex CLI (GPT-5.5).
  • Big, but fills faster than you'd think.
  • Attention is a budget shared by every token, and quality drops well before the limit.
Day 1 · Why context24
§ 3

Get set up

One command installs the agent

Claude Code
macOS / Linux / WSL · in Terminal
curl -fsSL https://claude.ai/install.sh | bash
Windows · in PowerShell
irm https://claude.ai/install.ps1 | iex
Verify
claude --version
Codex CLI
macOS / Linux / WSL · in Terminal
curl -fsSL https://chatgpt.com/codex/install.sh | sh
Windows · in PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://chatgpt.com/codex/install.ps1 | iex"
Verify
codex --version
Stuck? Troubleshooting at socialscienceai.com/help, including the inspect-then-run alternative for locked-down university machines. Windows: Git for Windows is optional but gives Claude Code a Bash shell.
Day 1 · Install25

Pick a tier

Claude Code · Anthropic
Pro · Floor
$20/mo
Tight caps. Expect to hit the limit mid-exercise.
Max (5x) · Recommended
$100/mo
Recommended for workshop use. Enough headroom to daily-drive Claude Code.
Max (20x)
$200/mo
Heavy daily use. My tier; justified by daily research volume, not workshop needs.
API
Per token
No caps; needs an API key. Most expensive day-to-day.
codex
Plus · Floor
$20/mo
2x promo through May 31; standard caps from June 1.
Pro
$200/mo
Heavy daily use. The Codex analogue of Max (20x).
API
Per token
No caps; needs an API key. Same per-token economics as Claude.

tiers as of May 28, 2026 · live pricing: claude.com/pricing, openai.com/chatgpt/pricing

Day 1 · Pricing26
§ 4

Workshop · Set up & first conversation

How workshop breakouts work

~5
people per breakout room

When you get stuck, in this order:

01 Ask your agent

Ask in plain English. Just like you'd ask me in the chat.

02 Ask your group

See if any of the other people in your group can help you. They may have had the same issue.

03 Ask me

Click Ask for Help in your breakout toolbar. I get a popup and join your room.

Day 1 · Breakout format27

A quick look at the data you're about to download

All week we use the Chicago Police Department complaint data published by the Invisible Institute. The §2 workshop has you download it; here's what's in it before you do.

complaints-complaints.csv
234,971 rows × 19 cols
# key columns
cr_id complaint id
incident_date 1967-2023
beat CPD beat number
complainant_type CIVILIAN / CPD
final_finding SU / NS / UN /
                EX / NA / blank

One row per complaint. Coverage densest 1990-2018; tails on either side.

Day 1 §2 downloads this
What's in final_finding
SU · Sustained. Allegation supported by the evidence; officer disciplined.
NS · Not sustained. Evidence insufficient.
UN · Unfounded. Allegation didn't happen.
EX · Exonerated. Happened but was justified.
NA / blank · Open, withdrawn, or not coded.

Day 2's regression builds sustained = (final_finding == "SU").

the outcome variable in tomorrow's spec

There's a second file, complaints-accused.csv, that links complaints to officers. We add it on Day 2 when we move to the merge.

Day 1 · Data tour28

Set up Claude Code and explore the data

Workshop instructions
socialscienceai.com/workshop/day-1#setup

Open the terminal, launch the agent, download the data, and ask it what it is.

Day 1 · Workshop29

Debrief & Questions

  • What worked?
  • Where did you get stuck?
  • What surprised you?
Day 1 · Debrief30
§ 5

Tooling

Both tools work in every interface

Start here this week
Terminal
(CLI)
Desktop
app
IDE
VS Code, JetBrains
Web
browser
Claude Code
Anthropic
Codex
OpenAI

Some workflows (SSH to a remote machine, automating an agent over many files in a script) only run in the terminal. Switch to a graphical option if you prefer.

Day 1 · Interface31

R and Python work better with coding agents than Stata or SAS

R  /  Python

Plays well with coding agents.

Training data
An enormous amount of R and Python code is on the public web, so coding agents write both fluently.
Terminal access
Both print their results and errors straight to the terminal, so the agent sees what happened and fixes its own mistakes.
Lots of training data. Results land in the terminal.
Stata · SAS · SPSS

Works, but less smoothly.

Training data
Less of this code lives on the public web, so the agent’s output is rougher and needs more correction.
Terminal access
These run from the terminal too, but results and errors land in separate log or viewer files, so the agent has to dig them out before it can fix anything.
Less training data. Results are buried in log files.
Day 1 · Language tools32

Compile your papers with Quarto or LaTeX, your choice

Both build a finished paper from a plain-text source the agent can read, diff, and version. Day 2 gives you two tracks: use Quarto if you must use Word to work with co-authors. If co-authors are okay with it, then LaTeX to PDF is what I use.

Quarto (Word)

For co-authors who edit in Word.

The agent writes paper.qmd, a plain-text source mixing Markdown prose and code. Quarto runs the code, then renders a .docx through Pandoc; apply house styles with a reference template. The agent edits the source and re-renders rather than patching the binary file.
LaTeX (PDF)

What I use. Only if co-authors are on board.

The agent writes paper.tex, and LaTeX compiles it to PDF. Tables, figures, and citations land where the code puts them. The agent reads every intermediate file and iterates on its own.

Pick the track that matches where your document is going. The Day 2 exercise has you build the draft in whichever one you choose.

Day 1 · Document tools33

Two non-agentic tools worth installing this week

marks what I use.

Dictation
Talk, don’t type
Hold a key, speak, release. Text drops into your prompt. About 3x faster for long instructions.
Both Claude Code (/voice) and Codex now have built-in dictation, still rolling out.
Wispr Flow Mac, Windows wisprflow.ai
SuperWhisper Mac superwhisper.com
Built-in dictation free; built into the OS.
Terminal
A better terminal
A modern terminal adds quality-of-life features: tabs for multiple sessions, autocomplete on commands, split panes for side-by-side work, clickable links and file paths, and cleaner handling of long agent output. The built-in Terminal.app still works.
Warp Mac, Windows, Linux warp.dev
Ghostty Mac, Linux ghostty.org
iTerm2 Mac iterm2.com
Day 1 · Companion tools34
§ 6

Workshop · Install R and Python

Install R and Python

Workshop instructions
socialscienceai.com/workshop/day-1#install

Have the agent install R, Python, Quarto, TinyTeX (optional), and the data-analysis packages.

Day 1 · Workshop35

Debrief & Questions

  • What worked?
  • Where did you get stuck?
  • What surprised you?
Day 1 · Debrief36
§ 7

Context management

Everything the agent has seen this session is one long document

SYSTEM INSTRUCTIONS CLAUDE.MD HIDDEN PREAMBLE loaded once at session start CLAUDE BUILT-IN TOOLS USER PROMPT YOUR PROMPT what you typed READ() WEB SEARCH() WRITE() TOOL CALLS files, web, shell, every result LLM MESSAGE LLM RESPONSE what the agent wrote back USER PROMPT READ() WRITE() LLM MESSAGE EACH TURN APPENDS prompts, tool calls, replies, all stacked USER PROMPT READ() WRITE() . . . until context fills. CONTEXT ROT agent gets dumber as context fills

Limit: 1M tokens for Claude Code (Opus 4.8); 400K for Codex CLI (GPT-5.5). A token is roughly three-quarters of a word. Fills faster than you'd think.

Day 1 · Context window37

See what's in the context window right now

The stack on the previous slide is abstract. The harness will show you the actual numbers. In Claude Code, type /context for a per-category breakdown. In Codex CLI, type /status for the equivalent summary.

What the terminal prints
> /context
System prompt        2.3k  (0.2%)
System tools         11.4k (1.1%)
MCP tools            0     (0.0%)
Memory files        3.1k  (0.3%)
Messages              47.8k (4.7%)
-----------------------------------
Free                 935k  (93.7%)

Illustrative numbers.

illustrative output
What to read off it
Memory files. Your CLAUDE.md and AGENTS.md totals. If this is double-digit percent, your memory file is too long.
System tools. The built-in Read / Edit / Bash schemas. Fixed cost, not yours to tune.
MCP tools. Anything you install on Day 3 shows up here.
Messages. Your prompts and the agent's replies, plus every tool result. Grows fastest.
Free. The headroom before auto-compaction triggers.
three habits: keep memory short, clear when free drops below 30%, compact intentionally
Day 1 · Context window38

Context comes from three places. The agent can reach two of them.

ON YOUR COMPUTER files, code, data IN YOUR HEAD intent, what “good” looks like ON THE INTERNET docs, papers, search CONTEXT WINDOW SYSTEM CLAUDE.MD USER PROMPT READ() WEB SEARCH() LLM MESSAGE
Dictate it
When you know what to say but typing is slow. Wispr Flow, SuperWhisper, Aqua Voice.
Have the agent interview you
When you might not know what to say. Next slide.
Day 1 · Three sources39

Drop everything the agent might need into your project folder

Your project folder
your-project/
CLAUDE.md Claude Code
AGENTS.md Codex CLI
↑ Loaded at every session start
notes/
research_questions.md
data_dictionary.md
meeting_notes.md
papers/
johnson-2023.pdf
li-and-smith-2024.pdf
code/
01_clean.R
02_analyze.R
data/
complaints-complaints.csv
drafts/
outline.md
CLAUDE.md / AGENTS.md is special
The only file the agent loads automatically at session start. Project conventions, your preferences, anything the agent should always know.
Other files
Research questions, data dictionaries, meeting notes, drafts. The agent reads them when you reference them, or when it explores the project.
Source materials
PDFs of papers you're citing, prior code, scratch analyses, codebooks. Drop them in. The agent reads them on demand.
Principle
Richer directory, richer context. If the agent might need it, drop it in.
Day 1 · Workspace setup40

Have the agent interview you before you ask for something hard

01 Vague task
plan my talk
what you typed
02 Interview

Q1. How long is the talk?

A. 20 minutes.

Q2. Who is the audience?

A. Business-school faculty.

Q3. What do you want them to remember?

A. My main result is robust to alternative IVs.

questions you may not have thought to specify
03 Sharp task
draft a 20-min talk for business-school faculty; landing point = main result is robust to alternative IVs
what plan mode now has to work with
Day 1 · Interview-me41

When context fills, the agent summarizes older turns into a memory block

SYSTEM INSTRUCTIONS CLAUDE.MD CLAUDE BUILT-IN TOOLS USER PROMPT READ() WEB SEARCH() WRITE() LLM MESSAGE USER PROMPT READ() WRITE() LLM MESSAGE USER PROMPT READ() WRITE() CONTEXT ROT agent gets dumber as context fills AFTER SYSTEM INSTRUCTIONS CLAUDE.MD CLAUDE BUILT-IN TOOLS USER PROMPT MEMORIES conversation gets summarized Recent conversation gets more detail. Older conversation gets less.

Auto-compact works fine for most research workflows when state is in files. Manual /compact gives you control over when and what to keep.

Day 1 · Auto-compaction42

What is Context Rot?

The phenomenon

As context fills, the agent has more to keep track of and starts losing the thread. The signal-to-noise ratio drops.

After many turns in a heavy session, performance is noticeably worse than at the start.

What it looks like
  • The agent misremembers earlier decisions
  • Forgets constraints you set early in the session
  • Goes in circles on a problem it would solve fresh
  • Uses old patterns after you've corrected it

As your session grows, accuracy on the same task drops.

100% 75% 50% 25% 0% 1K 4K 16K 64K 256K start ~turn 1 ~turn 3 ~turn 10 ~turn 30 Got it right Tokens in the prompt · approx. turn in a heavy agent session ~85% fresh session ~50% heavy, full session

Pattern after Hong, Troynikov & Huber (Chroma Research, 2025); independently confirmed in Du et al. (EMNLP Findings, 2025). Curve illustrative.

Day 1 · Context rot43

Three habits that keep context useful

01 Write state to files

Have the agent write progress, plans, and decisions to .md files in your project. Files persist across sessions; conversation context doesn't.

02 Watch the context budget

Run /context (Claude Code) or /status (Codex CLI) periodically. When free space drops below about 30% (roughly the zone where rot starts to bite), start fresh or compact. The signal is the budget, not a fixed turn count.

03 Compact intentionally

Type /compact before context fills, with a hint about what to keep. Better than waiting for auto-compact to lose detail.

three rules via Paul Goldsmith-Pinkham

Day 1 · Managing context44
§ 8

Plan mode

Plan mode is where you and the agent agree on the approach before any code runs.

Before the agent touches a file, it states what it intends to do. You read the steps, push back on the wrong ones, refine. By the time anything executes, the two of you have agreed on the approach.

Without plan mode Agent assumes

The agent reads your prompt and acts. It picks variable names, file paths, methods, and sequencing on its own.

By the time you see what it did, you are unwinding decisions that were never yours.

With plan mode you negotiate

The agent proposes the steps it would take, in order, before any of them run.

You approve, reject, or amend. The assumptions become visible and editable before they execute.

Day 1 · Plan mode45

A useful rhythm for non-trivial work: Plan, Execute, Clear

Naming the rhythm helps you notice which phase you're in mid-task. Use it as a default for non-trivial work. Skip Plan when the task is small.

01 Plan

The agent proposes what it would do, before it does anything. You read the plan, approve, reject, or amend.

Shift+Tab+Tab to enter plan mode (or skip it for your own planning ritual)
02 Execute

The agent runs the plan: edits files, runs commands, asks permission as needed. You watch, and intervene if it drifts.

ESC to interrupt
03 Clear

When context fills, /compact to keep going with key state preserved. When the task is done, /clear to start fresh. Files persist; conversation context doesn't.

/compact or /clear

framing via Steve Pocock

Day 1 · Plan / Execute / Clear46

What plan mode looks like in your terminal

Plan mode
1. Read codebook.md to confirm variable names
2. Load data/raw/complaints-complaints.csv
3. Profile: nrow, summary, missingness rates
4. Save profile to results/profile.txt
5. Report top findings inline
This plan requires approval
Do you want to proceed?
> 1. Yes, run the plan
2. No, let me give feedback to refine the plan
3. Cancel
Esc to cancel · Shift+Tab+Tab toggles plan mode

Stylized example. Live wording may differ slightly.

What’s happening
You hit Shift+Tab+Tab to enter plan mode. The agent proposes the steps it would take, before doing any of them.
What you see
A numbered list of discrete actions, so you can see exactly what the agent intends. If you want to change a step, pick option 2 and tell it what to change in plain English.
What you do
1 approves the plan and exits plan mode into execute. 2 sends the agent back with your feedback. 3 cancels.
When to use it
Any task with more than one step. Cleaning a dataset, multi-file refactors, anything where the wrong first move sends the agent in a wrong direction.
Day 1 · Plan-mode anatomy47
§ 9

Permissions

The agent asks before doing anything that changes your system

Reads
File contents, project structure, git status. Looks at things without changing them.
Runs silently. Reads do not change anything, so no prompt.
Edits
Creating, editing, or deleting files. Changes things on your computer.
Asks each time. Shows the proposed change first. Approve, decline, or always-approve for this project.
Shell
Runs terminal commands like Rscript, git, rm. (A shell is the program that runs your terminal commands.)
Asks before every command, including destructive ones like rm or git reset.

By default the agent asks before almost every action. Use /permissions to pre-approve commands as you build trust.

Day 1 · Permission model48

What a permission prompt looks like in your terminal

A terminal is the text-based interface to your computer. The agent prints prompts like this one when it needs your approval.

Bash command
Rscript code/01_clean.R
Run the data cleaning script
This command requires approval
Do you want to proceed?
> 1. Yes
2. Yes, and don't ask again for: Rscript *
3. No
Esc to cancel

Stylized example. Live wording may differ slightly.

What's happening
The agent paused before running a Bash command and is asking for approval.
What you see
The exact command, then a one-line description of what it's for. Read both before approving.
What you do
1 runs once. 2 auto-allows the pattern (any Rscript *) for this project. 3 declines. Esc cancels.
Day 1 · Anatomy49

Permission layers, least to most permissive.

The annoyance of repeated prompts is a real cost. The right fix is to remove the prompts you do not need, not to remove all the gates.

Layer 1. Allowlist specific commands.
Reads and other safe actions are always allowed without asking. You can add a list of other commands to skip prompts for, e.g. running R scripts or checking git status. Anything risky, like deleting files or force-pushing to a repo, still asks.
Layer 2. Auto-approve file edits.
File edits happen without asking. Running scripts still asks. Useful when edit approvals are the main annoyance and you know you can always undo with /rewind.
Layer 3. Auto mode. recommended starting point
A separate safety check decides which actions look routine (lets them run) and which look risky (asks you). Better than skipping all checks, because the dangerous stuff still gets reviewed. Turn it on in ~/.claude/settings.json. Needs Opus 4.6+ or Sonnet 4.6. Codex version: --ask-for-approval (Auto by default).
Layer 4. Skip all prompts. once you trust the workflow
Nothing asks for approval. Best when your work is recoverable (version control plus /rewind). As of recent versions this flag also bypasses writes it used to protect: .git/, .claude/, .vscode/, and shell config files (only catastrophic deletes still prompt), so the blast radius now includes your git history and dotfiles. Remaining risk: a webpage or document the agent reads could try to trick it into running something destructive. This is prompt injection; we come back to it in the next section. Codex version: --yolo.

Per session: Shift+Tab cycles modes in Claude Code; Codex CLI uses /permissions, which offers three modes: Auto (default), Read-only, and Full Access.

Day 1 · Graduated path50
§ 10

Workshop · Visualize the data

Visualize the data

Workshop instructions
socialscienceai.com/workshop/day-1#visualize

Have the agent make an annual time series and a density map of the complaints data.

Day 1 · Workshop51

Debrief & Questions

  • What worked?
  • Where did you get stuck?
  • What surprised you?
Day 1 · Debrief52
§ 11

Slash commands

Plain language goes to the model. Slash commands stop at the harness.

HARNESS Claude Code MODEL Claude generates language, reasons over context SLASH COMMAND /clear intercepted by the harness PLAIN LANGUAGE please clear our context passes through to the model

Three things only slash commands can do: clear context, show your cost, edit permissions.

Day 1 · Why slash commands53

The most important slash commands in Claude Code

/rewind
Roll back files and conversation to an earlier point. Your safety net for risky changes.
/clear
Start a new conversation. Past sessions stay in /resume.
/compact
Summarize the conversation to free context.
/resume
Resume a previous conversation by name or picker.
/usage
Show session cost, plan limits, activity stats. /cost is an alias.
/model
Switch models for the current session.
  • Opus: hard reasoning
  • Sonnet: fast iteration
/permissions
Manage allow / ask / deny rules for tool permissions.
/init
Initialize the project with a starter CLAUDE.md.
/exit
End the session.

+ advanced (later this week): /loop · /rc

These are Claude Code commands. Codex CLI has its own set; many overlap, but it has no /rewind (closest: /fork).

Day 1 · Slash commands54

Three newer commands worth knowing

All three landed in May 2026. The tools change weekly, so treat this as a snapshot, not a fixed feature set.

/effort
Dials how hard the model thinks, alongside /model. Turn it up (/effort xhigh) for a hard identification or debugging problem; turn it down to conserve rate limits on routine edits.
/goal
Set a finish line and Claude works across turns until it is met, showing elapsed time, turns, and tokens. A new rung on the autonomy ladder. Phrase the condition as something its own output demonstrates, e.g. “the tests pass.”
/workflows preview
Claude writes a script that fans the work across many background subagents at once. Plan-gated (Max, Team, Enterprise), so I may demo later in the week.

/goal is distinct from /loop, which repeats on a time interval rather than working toward a condition.

Day 1 · Recently shipped55
§ 12

Memory

Two primary places to store memories/rules

Every session, the agent reads both files and prepends them to its context. The global file follows you across every project. The project file lives in the folder and only applies there.

Global memory file
~/.claude/CLAUDE.md
~/.codex/AGENTS.md

What goes here. Who you are; which language and packages you use everywhere; the writing voice you want; reproducibility defaults.

Examples: “Primary languages are R and Python.” “Regressions in fixest.” “Cluster SE at the unit of treatment.”

written once on Day 1, edited rarely
Project memory file
<project>/CLAUDE.md
<project>/AGENTS.md

What goes here. The research question; data paths and codebook; the join key and unit of analysis; the project-specific clustering level.

Examples: “Cluster SE at beat.” “Inner join on cr_id; report the share dropped.” “Outcome is sustained.”

written when the project starts, edited as you learn the data

Both files are concatenated into the system prompt at session start. Keep universal rules in the global file. Keep facts that only apply to one paper in the project file.

Day 1 · Memory56
§ 13

Workshop · Write your global CLAUDE.md

Write your global CLAUDE.md

Workshop instructions
socialscienceai.com/workshop/day-1#global-memory

Have the agent interview you, then write a global CLAUDE.md from the conversation.

Day 1 · Workshop57

Debrief & Questions

  • What worked?
  • Where did you get stuck?
  • What surprised you?
Day 1 · Debrief58
§ 14

Privacy and copyright

Does Anthropic train on your Claude Code data?

No, not by default.
Retention varies by plan.
Pro / Max
30
days retention
Not trained on by default. You can opt in to training (extends retention to 5 years).
API
7
days retention
Never trained on. Pay-per-token tier; standard for any researcher using an API key.
Enterprise / Team
0
days, with ZDR
Never trained on. Zero Data Retention available on request.
Using IRB-protected or otherwise sensitive data? Consult first.
Prompts and tool outputs travel to Anthropic regardless of training behavior. That may not satisfy your IRB protocol or data-use agreement. Check with your IRB office, library, or research compliance officer before pointing Claude at sensitive data.

Sources: privacy.claude.com · platform.claude.com · code.claude.com (ZDR)

Day 1 · Privacy59

Does OpenAI train on your Codex CLI data?

Depends on how you sign in.
Default is opposite of Claude on consumer plans.
ChatGPT Plus / Pro
30
days after deletion
Trained on by default. Opt out in ChatGPT Settings → Data Controls → “Improve the model for everyone.”
API
30
days, abuse logs
Never trained on. Pay-per-token. ZDR available with OpenAI approval.
Business / Enterprise / Edu
0
days, with ZDR
Never trained on. Zero Data Retention supported for the Codex App, CLI, and IDE.
Using IRB-protected or otherwise sensitive data? Consult first.
Prompts and tool outputs travel to OpenAI regardless of training behavior. That may not satisfy your IRB protocol or data-use agreement. Check with your IRB office, library, or research compliance officer before pointing Codex at sensitive data.

Sources: developers.openai.com (API) · developers.openai.com (Codex enterprise) · help.openai.com (ChatGPT plans)

Day 1 · Privacy60

Can you upload copyrighted articles to Claude Code?

Probably not.
Most articles you have access to come through your library’s subscription.
Library-licensed PDFs
Most of what you read.
Articles accessed through your university’s subscription to Elsevier, Wiley, Springer, JSTOR, and similar publishers. The license between your library and the publisher typically restricts feeding subscribed content to AI tools.
Lower-risk sources
More permissive, but still verify.
CC-BY or CC0 articles. Check the explicit license; not all “open access” is permissive.
Preprints you authored. Verify the platform’s license; arXiv’s default does not grant AI-training rights.
Drafts of your own work that have not been signed over to a publisher.
Material your library has explicitly cleared for AI use. Most subscription contracts do not include this; ask before assuming.
When in doubt, ask first.
Most publisher subscription contracts restrict feeding subscribed content to AI tools, and personal purchase does not override the publisher’s TOS. Fair use is a US doctrine actively contested in pending AI litigation. Check with your library, IRB office, or general counsel before uploading material you did not write or explicitly license.

Sources: Sag, “Fairness and Fair Use in Generative AI” · SPARC TDM tracker · anthropic.com/legal/commercial-terms

I’m not a lawyer and this is not legal advice. Check with your library or counsel for your specific situation.

Day 1 · Copyright61
§ 15

Failure modes: what the agent can't do

You are the P.I. The agent is the R.A.

AI coding assistants can produce code, run scripts, and draft text. They do not have the contextual judgment to make the choices that determine whether an analysis is defensible. Six responsibilities remain with the researcher.

Responsibility What it covers
Taste The agent doesn't know your field. Journal-specific style conventions, current methodological debates, and niche or recent techniques are underrepresented in its training.
Theoretical framing What conversation in your field the paper joins, what counts as a contribution, which precedent to cite. The agent can suggest framings; it can't tell you what your field will register as novel.
Data quality Whether the data actually answers the question. The agent will run regressions on whatever variables you give it without flagging that they may not measure what you think they measure.
Methodological choices Sample construction, estimator, identification strategy, and standard error specification.
Verification Direct inspection of the analytic output: cleaned data, row counts, regression tables, figures.
Stopping rules When the analysis is done versus when you're p-hacking. The agent will keep running specifications as long as you ask for more.
Day 1 · Failure modes62

Sycophancy and leading prompts

The agent tends to comply with user direction. Prompts that signal a desired result, or that request unmotivated changes to the sample or specification, produce output that conforms to those expectations rather than to the structure of the data.

Leading Neutral
“This effect should be significant. Try a few specs.” “Run the pre-registered spec. Report the coefficient and SE.”
“Add controls until the coefficient is significant.” “Estimate the model with the pre-specified control set. Report results with and without controls.”
Day 1 · Failure modes63

Three failure modes to watch for

Failure mode What happens Mitigation
Destructive actions The agent runs a command that deletes or overwrites files you did not intend to lose. Use version control. Back up raw data outside the project. Commit changes often.
Changes that spread A request to change one file may quietly modify others. Name the file you want changed. Review the diff before approving.
Gaming the checks Asked to fix a failing check (e.g., "N should be 154,525"), the agent may weaken or delete the check itself rather than figure out why the number is off. Write your own data-integrity checks. Review any agent change to them before approving.
Day 1 · Failure modes64

Prompt injection: the risk that comes from outside

The other failure modes come from the agent or from you. This one comes from a third party. The agent treats text it reads as if it were your instructions, so a hidden line in an email, a PDF, or a web page can redirect it.

What it is Detail
Where you are exposed Anything that feeds the agent text you did not write: an email it triages, a paper you are reviewing, a scraped web page, a dataset's README, the output of an external tool.
What it can cause Leaking data the agent can see, running a shell command, or sending an email or making an edit you never asked for. The damage scales with the agent's reach: file access, network, and send permissions.
How to limit it Do not run full autonomy (Layer 4) on a session that is reading strangers' text. Keep approvals on for actions that reach outside: shell, send, network. Treat anything read from outside as data, not commands. Keep raw data backed up and work under version control so any edit is reversible.
The rule of thumb: the more autonomy you grant a session, the less untrusted text it should be reading.
Day 1 · Failure modes65
§ 16

Debrief & Day 2 preview

Today you went from zero to working

1
Installed Claude Code
You went from zero to a working agentic terminal.
2
Permissions you control
Allow you to control what the agent can do and access.
3
Plan mode
Read the plan before the agent does anything risky.
4
Real data exploration
Used shell tools and web search to inspect the Chicago police complaints data without writing a line of R.
5
Your global CLAUDE.md
Persists across every project, every session. Tomorrow the agent already knows you.
6
R and Python ready
Installed and verified. Tomorrow you start the analysis.
Day 1 · Today66

Day 2 · Working with data

Plan mode in practice
Plan mode and the prompts that get you the spec you intended.
Join the data
Merge the complaints and accused-officer files; report row counts and match rates.
Your first regression
Beat fixed effects, clustered standard errors, interpret the coefficient.
Reusable skills
Author table- and figure-formatting skills that encode your conventions.
Compile your draft
A Quarto/markdown and/or LaTeX memo rendered to Word (.docx) and/or PDF.
Day 1 · Day 2 preview67