Use advanced features like plugins, skills, and sub-agents.
Reality check
You will not be an expert in four days. These tools take practice. You have to learn how to talk to them, what you can trust from them, and what you can't.
The differences between Claude, Claude.ai, and Claude Code CLI
Claude.ai
The chat web app.claude.ai in a browser.
You type, the model answers. It cannot see your files, run your code, or save anything to your computer.
Same idea as ChatGPT, from Anthropic.
Claude Code CLI
An agentic command-line application. Lives in your terminal.
Reads your files. Runs commands. Edits code.
We'll refer to this as the harness.
Foundation model
Claude
e.g., Opus 4.8
The Claude desktop and mobile apps also sit on this model. This week we work only with Claude Code.
Day 1 · The names17
Same picture, OpenAI
The differences between ChatGPT and Codex CLI
ChatGPT
The chat web app.chatgpt.com in a browser.
You type, the model answers. It cannot see your files, run your code, or save anything to your computer.
Same role as Claude.ai, from OpenAI.
Codex CLI
An agentic command-line application. Lives in your terminal.
Reads your files. Runs commands. Edits code.
Same role as Claude Code, from OpenAI.
Foundation model
GPT-5.5
e.g., gpt-5.5
Day 1 · The names18
Codex vs Claude
Day 1 · Codex vs Claude19
§ 2
How do agents work?
The loop
What an agent does between your prompt and its reply
One real loop
1.
prompt: “regress wages on tenure”
2.
tool calls: writes 01_reg.R, then Rscript 01_reg.R
3.
result: error,undefined column 'tenure_yrs'
↪ agent reads codebook.md, edits the script to fix the column name, reruns it, this time succeeds
4.
reply: regression table with N, coefficient, SE
Day 1 · The loop20
Safety
The agent asks before doing anything irreversible
Three safeguards that hold across Claude Code and Codex CLI. They are the reason this is safe to run on your own laptop.
01
It asks first
Before the agent edits a file, runs a shell command, or touches anything outside the current folder, it pauses and asks for approval. You see exactly what it's about to do.
Edit 01_clean.R? 1. Yes 2. No 3. Always allow
02
You can stop it
If the agent goes the wrong direction mid-task, hit Esc. It stops immediately. You can correct course or start over.
[Esc] > interrupted. what would you like to do?
03
You can rewind
Hit Esc Esc to roll back to an earlier point in the conversation.
Claude Code
rewinds chat, file edits, or both, your choice
Codex CLI
rewinds chat only; files stay as the agent left them
Backup
Keep your work backed up somewhere else: Dropbox, Google Drive, GitHub.
Day 1 · Safety21
Anatomy
The harness is not the model
The customization stack hangs off the harness, not the model. Codex CLI has analogous primitives with different names.
Day 1 · Mental model22
Where you are
Five levels of autonomy you give the agent
1
Browser chat
Copy from ChatGPT or Claude.ai, paste into your editor.
Stuck? Troubleshooting at socialscienceai.com/help, including the inspect-then-run alternative for locked-down university machines. Windows: Git for Windows is optional but gives Claude Code a Bash shell.
Day 1 · Install25
Pricing
Pick a tier
Claude Code · Anthropic
Pro · Floor
$20/mo
Tight caps. Expect to hit the limit mid-exercise.
Max (5x) · Recommended
$100/mo
Recommended for workshop use. Enough headroom to daily-drive Claude Code.
Max (20x)
$200/mo
Heavy daily use. My tier; justified by daily research volume, not workshop needs.
API
Per token
No caps; needs an API key. Most expensive day-to-day.
codex
Plus · Floor
$20/mo
2x promo through May 31; standard caps from June 1.
Pro
$200/mo
Heavy daily use. The Codex analogue of Max (20x).
API
Per token
No caps; needs an API key. Same per-token economics as Claude.
tiers as of May 28, 2026 · live pricing: claude.com/pricing, openai.com/chatgpt/pricing
Day 1 · Pricing26
§ 4
Workshop · Set up & first conversation
Workshop format
How workshop breakouts work
~5
people per breakout room
When you get stuck, in this order:
01
Ask your agent
Ask in plain English. Just like you'd ask me in the chat.
02
Ask your group
See if any of the other people in your group can help you. They may have had the same issue.
03
Ask me
Click Ask for Help in your breakout toolbar. I get a popup and join your room.
Day 1 · Breakout format27
Data tour
A quick look at the data you're about to download
All week we use the Chicago Police Department complaint data published by the Invisible Institute. The §2 workshop has you download it; here's what's in it before you do.
complaints-complaints.csv
234,971 rows × 19 cols
# key columns
cr_id complaint id
incident_date 1967-2023
beat CPD beat number
complainant_type CIVILIAN / CPD
final_finding SU / NS / UN /
EX / NA / blank
One row per complaint. Coverage densest 1990-2018; tails on either side.
Day 1 §2 downloads this
What's in final_finding
SU · Sustained. Allegation supported by the evidence; officer disciplined.
NS · Not sustained. Evidence insufficient.
UN · Unfounded. Allegation didn't happen.
EX · Exonerated. Happened but was justified.
NA / blank · Open, withdrawn, or not coded.
Day 2's regression builds sustained = (final_finding == "SU").
the outcome variable in tomorrow's spec
There's a second file, complaints-accused.csv, that links complaints to officers. We add it on Day 2 when we move to the merge.
Open the terminal, launch the agent, download the data, and ask it what it is.
Day 1 · Workshop29
Debrief
Debrief & Questions
What worked?
Where did you get stuck?
What surprised you?
Day 1 · Debrief30
§ 5
Tooling
Interface
Both tools work in every interface
Start here this week
Terminal (CLI)
Desktop app
IDE VS Code, JetBrains
Web browser
Claude Code
Anthropic
Codex
OpenAI
Some workflows (SSH to a remote machine, automating an agent over many files in a script) only run in the terminal. Switch to a graphical option if you prefer.
Day 1 · Interface31
Language tools
R and Python work better with coding agents than Stata or SAS
R / Python
Plays well with coding agents.
Training data
An enormous amount of R and Python code is on the public web, so coding agents write both fluently.
Terminal access
Both print their results and errors straight to the terminal, so the agent sees what happened and fixes its own mistakes.
Lots of training data. Results land in the terminal.
Stata · SAS · SPSS
Works, but less smoothly.
Training data
Less of this code lives on the public web, so the agent’s output is rougher and needs more correction.
Terminal access
These run from the terminal too, but results and errors land in separate log or viewer files, so the agent has to dig them out before it can fix anything.
Less training data. Results are buried in log files.
Day 1 · Language tools32
Document tools
Compile your papers with Quarto or LaTeX, your choice
Both build a finished paper from a plain-text source the agent can read, diff, and version. Day 2 gives you two tracks: use Quarto if you must use Word to work with co-authors. If co-authors are okay with it, then LaTeX to PDF is what I use.
Quarto (Word)
For co-authors who edit in Word.
The agent writes paper.qmd, a plain-text source mixing Markdown prose and code. Quarto runs the code, then renders a .docx through Pandoc; apply house styles with a reference template. The agent edits the source and re-renders rather than patching the binary file.
LaTeX (PDF)
What I use. Only if co-authors are on board.
The agent writes paper.tex, and LaTeX compiles it to PDF. Tables, figures, and citations land where the code puts them. The agent reads every intermediate file and iterates on its own.
Pick the track that matches where your document is going. The Day 2 exercise has you build the draft in whichever one you choose.
Day 1 · Document tools33
Companion tools
Two non-agentic tools worth installing this week
★ marks what I use.
Dictation
Talk, don’t type
Hold a key, speak, release. Text drops into your prompt. About 3x faster for long instructions.
Both Claude Code (/voice) and Codex now have built-in dictation, still rolling out.
A modern terminal adds quality-of-life features: tabs for multiple sessions, autocomplete on commands, split panes for side-by-side work, clickable links and file paths, and cleaner handling of long agent output. The built-in Terminal.app still works.
Have the agent install R, Python, Quarto, TinyTeX (optional), and the data-analysis packages.
Day 1 · Workshop35
Debrief
Debrief & Questions
What worked?
Where did you get stuck?
What surprised you?
Day 1 · Debrief36
§ 7
Context management
Context window
Everything the agent has seen this session is one long document
Limit: 1M tokens for Claude Code (Opus 4.8); 400K for Codex CLI (GPT-5.5). A token is roughly three-quarters of a word. Fills faster than you'd think.
Day 1 · Context window37
Context window
See what's in the context window right now
The stack on the previous slide is abstract. The harness will show you the actual numbers. In Claude Code, type /context for a per-category breakdown. In Codex CLI, type /status for the equivalent summary.
What the terminal prints
> /context
System prompt 2.3k (0.2%)
System tools 11.4k (1.1%)
MCP tools 0 (0.0%)
Memory files 3.1k (0.3%)
Messages 47.8k (4.7%)
-----------------------------------
Free 935k (93.7%)
Illustrative numbers.
illustrative output
What to read off it
Memory files. Your CLAUDE.md and AGENTS.md totals. If this is double-digit percent, your memory file is too long.
System tools. The built-in Read / Edit / Bash schemas. Fixed cost, not yours to tune.
MCP tools. Anything you install on Day 3 shows up here.
Messages. Your prompts and the agent's replies, plus every tool result. Grows fastest.
Free. The headroom before auto-compaction triggers.
three habits: keep memory short, clear when free drops below 30%, compact intentionally
Day 1 · Context window38
Three sources
Context comes from three places. The agent can reach two of them.
Dictate it
When you know what to say but typing is slow. Wispr Flow, SuperWhisper, Aqua Voice.
Have the agent interview you
When you might not know what to say. Next slide.
Day 1 · Three sources39
Workspace setup
Drop everything the agent might need into your project folder
Your project folder
your-project/
CLAUDE.mdClaude Code
AGENTS.mdCodex CLI
↑ Loaded at every session start
notes/
research_questions.md
data_dictionary.md
meeting_notes.md
papers/
johnson-2023.pdf
li-and-smith-2024.pdf
code/
01_clean.R
02_analyze.R
data/
complaints-complaints.csv
drafts/
outline.md
CLAUDE.md / AGENTS.md is special
The only file the agent loads automatically at session start. Project conventions, your preferences, anything the agent should always know.
Other files
Research questions, data dictionaries, meeting notes, drafts. The agent reads them when you reference them, or when it explores the project.
Source materials
PDFs of papers you're citing, prior code, scratch analyses, codebooks. Drop them in. The agent reads them on demand.
Principle
Richer directory, richer context. If the agent might need it, drop it in.
Day 1 · Workspace setup40
Interview-me
Have the agent interview you before you ask for something hard
01
Vague task
plan my talk
what you typed
02
Interview
Q1. How long is the talk?
A. 20 minutes.
Q2. Who is the audience?
A. Business-school faculty.
Q3. What do you want them to remember?
A. My main result is robust to alternative IVs.
questions you may not have thought to specify
03
Sharp task
draft a 20-min talk for business-school faculty; landing point = main result is robust to alternative IVs
what plan mode now has to work with
Day 1 · Interview-me41
Auto-compaction
When context fills, the agent summarizes older turns into a memory block
Auto-compact works fine for most research workflows when state is in files. Manual /compact gives you control over when and what to keep.
Day 1 · Auto-compaction42
Context rot
What is Context Rot?
The phenomenon
As context fills, the agent has more to keep track of and starts losing the thread. The signal-to-noise ratio drops.
After many turns in a heavy session, performance is noticeably worse than at the start.
What it looks like
The agent misremembers earlier decisions
Forgets constraints you set early in the session
Goes in circles on a problem it would solve fresh
Uses old patterns after you've corrected it
As your session grows, accuracy on the same task drops.
Pattern after Hong, Troynikov & Huber (Chroma Research, 2025); independently confirmed in Du et al. (EMNLP Findings, 2025). Curve illustrative.
Day 1 · Context rot43
Managing context
Three habits that keep context useful
01
Write state to files
Have the agent write progress, plans, and decisions to .md files in your project. Files persist across sessions; conversation context doesn't.
02
Watch the context budget
Run /context (Claude Code) or /status (Codex CLI) periodically. When free space drops below about 30% (roughly the zone where rot starts to bite), start fresh or compact. The signal is the budget, not a fixed turn count.
03
Compact intentionally
Type /compact before context fills, with a hint about what to keep. Better than waiting for auto-compact to lose detail.
three rules via Paul Goldsmith-Pinkham
Day 1 · Managing context44
§ 8
Plan mode
Plan mode
Plan mode is where you and the agent agree on the approach before any code runs.
Before the agent touches a file, it states what it intends to do. You read the steps, push back on the wrong ones, refine. By the time anything executes, the two of you have agreed on the approach.
Without plan mode
Agent assumes
The agent reads your prompt and acts. It picks variable names, file paths, methods, and sequencing on its own.
By the time you see what it did, you are unwinding decisions that were never yours.
With plan mode
you negotiate
The agent proposes the steps it would take, in order, before any of them run.
You approve, reject, or amend. The assumptions become visible and editable before they execute.
Day 1 · Plan mode45
Workflow rhythm
A useful rhythm for non-trivial work: Plan, Execute, Clear
Naming the rhythm helps you notice which phase you're in mid-task. Use it as a default for non-trivial work. Skip Plan when the task is small.
01
Plan
The agent proposes what it would do, before it does anything. You read the plan, approve, reject, or amend.
Shift+Tab+Tab to enter plan mode (or skip it for your own planning ritual)
02
Execute
The agent runs the plan: edits files, runs commands, asks permission as needed. You watch, and intervene if it drifts.
ESC to interrupt
03
Clear
When context fills, /compact to keep going with key state preserved. When the task is done, /clear to start fresh. Files persist; conversation context doesn't.
/compact or /clear
framing via Steve Pocock
Day 1 · Plan / Execute / Clear46
Plan-mode anatomy
What plan mode looks like in your terminal
Plan mode
1. Read codebook.md to confirm variable names
2. Load data/raw/complaints-complaints.csv
3. Profile: nrow, summary, missingness rates
4. Save profile to results/profile.txt
5. Report top findings inline
This plan requires approval
Do you want to proceed?
> 1. Yes, run the plan
2. No, let me give feedback to refine the plan
3. Cancel
Esc to cancel · Shift+Tab+Tab toggles plan mode
Stylized example. Live wording may differ slightly.
What’s happening
You hit Shift+Tab+Tab to enter plan mode. The agent proposes the steps it would take, before doing any of them.
What you see
A numbered list of discrete actions, so you can see exactly what the agent intends. If you want to change a step, pick option 2 and tell it what to change in plain English.
What you do
1 approves the plan and exits plan mode into execute. 2 sends the agent back with your feedback. 3 cancels.
When to use it
Any task with more than one step. Cleaning a dataset, multi-file refactors, anything where the wrong first move sends the agent in a wrong direction.
Day 1 · Plan-mode anatomy47
§ 9
Permissions
Permission model
The agent asks before doing anything that changes your system
Reads
File contents, project structure, git status. Looks at things without changing them.
Runs silently. Reads do not change anything, so no prompt.
Edits
Creating, editing, or deleting files. Changes things on your computer.
Asks each time. Shows the proposed change first. Approve, decline, or always-approve for this project.
Shell
Runs terminal commands like Rscript, git, rm. (A shell is the program that runs your terminal commands.)
Asks before every command, including destructive ones like rm or git reset.
By default the agent asks before almost every action. Use /permissions to pre-approve commands as you build trust.
Day 1 · Permission model48
Anatomy
What a permission prompt looks like in your terminal
A terminal is the text-based interface to your computer. The agent prints prompts like this one when it needs your approval.
Bash command
Rscript code/01_clean.R
Run the data cleaning script
This command requires approval
Do you want to proceed?
> 1. Yes
2. Yes, and don't ask again for: Rscript *
3. No
Esc to cancel
Stylized example. Live wording may differ slightly.
What's happening
The agent paused before running a Bash command and is asking for approval.
What you see
The exact command, then a one-line description of what it's for. Read both before approving.
What you do
1 runs once. 2 auto-allows the pattern (any Rscript *) for this project. 3 declines. Esc cancels.
Day 1 · Anatomy49
Graduated path
Permission layers, least to most permissive.
The annoyance of repeated prompts is a real cost. The right fix is to remove the prompts you do not need, not to remove all the gates.
Layer 1. Allowlist specific commands.
Reads and other safe actions are always allowed without asking. You can add a list of other commands to skip prompts for, e.g. running R scripts or checking git status. Anything risky, like deleting files or force-pushing to a repo, still asks.
Layer 2. Auto-approve file edits.
File edits happen without asking. Running scripts still asks. Useful when edit approvals are the main annoyance and you know you can always undo with /rewind.
Layer 3. Auto mode.recommended starting point
A separate safety check decides which actions look routine (lets them run) and which look risky (asks you). Better than skipping all checks, because the dangerous stuff still gets reviewed. Turn it on in ~/.claude/settings.json. Needs Opus 4.6+ or Sonnet 4.6. Codex version: --ask-for-approval (Auto by default).
Layer 4. Skip all prompts.once you trust the workflow
Nothing asks for approval. Best when your work is recoverable (version control plus /rewind). As of recent versions this flag also bypasses writes it used to protect: .git/, .claude/, .vscode/, and shell config files (only catastrophic deletes still prompt), so the blast radius now includes your git history and dotfiles. Remaining risk: a webpage or document the agent reads could try to trick it into running something destructive. This is prompt injection; we come back to it in the next section. Codex version: --yolo.
Per session: Shift+Tab cycles modes in Claude Code; Codex CLI uses /permissions, which offers three modes: Auto (default), Read-only, and Full Access.
Have the agent make an annual time series and a density map of the complaints data.
Day 1 · Workshop51
Debrief
Debrief & Questions
What worked?
Where did you get stuck?
What surprised you?
Day 1 · Debrief52
§ 11
Slash commands
Why slash commands
Plain language goes to the model. Slash commands stop at the harness.
Three things only slash commands can do: clear context, show your cost, edit permissions.
Day 1 · Why slash commands53
Slash commands
The most important slash commands in Claude Code
/rewind
Roll back files and conversation to an earlier point. Your safety net for risky changes.
/clear
Start a new conversation. Past sessions stay in /resume.
/compact
Summarize the conversation to free context.
/resume
Resume a previous conversation by name or picker.
/usage
Show session cost, plan limits, activity stats. /cost is an alias.
/model
Switch models for the current session.
Opus: hard reasoning
Sonnet: fast iteration
/permissions
Manage allow / ask / deny rules for tool permissions.
/init
Initialize the project with a starter CLAUDE.md.
/exit
End the session.
+ advanced (later this week): /loop · /rc
These are Claude Code commands. Codex CLI has its own set; many overlap, but it has no /rewind (closest: /fork).
Day 1 · Slash commands54
Recently shipped
Three newer commands worth knowing
All three landed in May 2026. The tools change weekly, so treat this as a snapshot, not a fixed feature set.
/effort
Dials how hard the model thinks, alongside /model. Turn it up (/effort xhigh) for a hard identification or debugging problem; turn it down to conserve rate limits on routine edits.
/goal
Set a finish line and Claude works across turns until it is met, showing elapsed time, turns, and tokens. A new rung on the autonomy ladder. Phrase the condition as something its own output demonstrates, e.g. “the tests pass.”
/workflowspreview
Claude writes a script that fans the work across many background subagents at once. Plan-gated (Max, Team, Enterprise), so I may demo later in the week.
/goal is distinct from /loop, which repeats on a time interval rather than working toward a condition.
Day 1 · Recently shipped55
§ 12
Memory
Memory
Two primary places to store memories/rules
Every session, the agent reads both files and prepends them to its context. The global file follows you across every project. The project file lives in the folder and only applies there.
Global memory file
~/.claude/CLAUDE.md
~/.codex/AGENTS.md
What goes here. Who you are; which language and packages you use everywhere; the writing voice you want; reproducibility defaults.
Examples: “Primary languages are R and Python.” “Regressions in fixest.” “Cluster SE at the unit of treatment.”
written once on Day 1, edited rarely
Project memory file
<project>/CLAUDE.md
<project>/AGENTS.md
What goes here. The research question; data paths and codebook; the join key and unit of analysis; the project-specific clustering level.
Examples: “Cluster SE at beat.” “Inner join on cr_id; report the share dropped.” “Outcome is sustained.”
written when the project starts, edited as you learn the data
Both files are concatenated into the system prompt at session start. Keep universal rules in the global file. Keep facts that only apply to one paper in the project file.
Have the agent interview you, then write a global CLAUDE.md from the conversation.
Day 1 · Workshop57
Debrief
Debrief & Questions
What worked?
Where did you get stuck?
What surprised you?
Day 1 · Debrief58
§ 14
Privacy and copyright
Privacy
Does Anthropic train on your Claude Code data?
No, not by default.
Retention varies by plan.
Pro / Max
30
days retention
Not trained on by default. You can opt in to training (extends retention to 5 years).
API
7
days retention
Never trained on. Pay-per-token tier; standard for any researcher using an API key.
Enterprise / Team
0
days, with ZDR
Never trained on. Zero Data Retention available on request.
Using IRB-protected or otherwise sensitive data? Consult first.
Prompts and tool outputs travel to Anthropic regardless of training behavior. That may not satisfy your IRB protocol or data-use agreement. Check with your IRB office, library, or research compliance officer before pointing Claude at sensitive data.
Trained on by default. Opt out in ChatGPT Settings → Data Controls → “Improve the model for everyone.”
API
30
days, abuse logs
Never trained on. Pay-per-token. ZDR available with OpenAI approval.
Business / Enterprise / Edu
0
days, with ZDR
Never trained on. Zero Data Retention supported for the Codex App, CLI, and IDE.
Using IRB-protected or otherwise sensitive data? Consult first.
Prompts and tool outputs travel to OpenAI regardless of training behavior. That may not satisfy your IRB protocol or data-use agreement. Check with your IRB office, library, or research compliance officer before pointing Codex at sensitive data.
Can you upload copyrighted articles to Claude Code?
Probably not.
Most articles you have access to come through your library’s subscription.
Library-licensed PDFs
Most of what you read.
Articles accessed through your university’s subscription to Elsevier, Wiley, Springer, JSTOR, and similar publishers. The license between your library and the publisher typically restricts feeding subscribed content to AI tools.
Lower-risk sources
More permissive, but still verify.
CC-BY or CC0 articles. Check the explicit license; not all “open access” is permissive.
Preprints you authored. Verify the platform’s license; arXiv’s default does not grant AI-training rights.
Drafts of your own work that have not been signed over to a publisher.
Material your library has explicitly cleared for AI use. Most subscription contracts do not include this; ask before assuming.
When in doubt, ask first.
Most publisher subscription contracts restrict feeding subscribed content to AI tools, and personal purchase does not override the publisher’s TOS. Fair use is a US doctrine actively contested in pending AI litigation. Check with your library, IRB office, or general counsel before uploading material you did not write or explicitly license.
I’m not a lawyer and this is not legal advice. Check with your library or counsel for your specific situation.
Day 1 · Copyright61
§ 15
Failure modes: what the agent can't do
Failure modes
You are the P.I. The agent is the R.A.
AI coding assistants can produce code, run scripts, and draft text. They do not have the contextual judgment to make the choices that determine whether an analysis is defensible. Six responsibilities remain with the researcher.
Responsibility
What it covers
Taste
The agent doesn't know your field. Journal-specific style conventions, current methodological debates, and niche or recent techniques are underrepresented in its training.
Theoretical framing
What conversation in your field the paper joins, what counts as a contribution, which precedent to cite. The agent can suggest framings; it can't tell you what your field will register as novel.
Data quality
Whether the data actually answers the question. The agent will run regressions on whatever variables you give it without flagging that they may not measure what you think they measure.
Methodological choices
Sample construction, estimator, identification strategy, and standard error specification.
Verification
Direct inspection of the analytic output: cleaned data, row counts, regression tables, figures.
Stopping rules
When the analysis is done versus when you're p-hacking. The agent will keep running specifications as long as you ask for more.
Day 1 · Failure modes62
Failure modes
Sycophancy and leading prompts
The agent tends to comply with user direction. Prompts that signal a desired result, or that request unmotivated changes to the sample or specification, produce output that conforms to those expectations rather than to the structure of the data.
Leading
Neutral
“This effect should be significant. Try a few specs.”
“Run the pre-registered spec. Report the coefficient and SE.”
“Add controls until the coefficient is significant.”
“Estimate the model with the pre-specified control set. Report results with and without controls.”
Day 1 · Failure modes63
Failure modes
Three failure modes to watch for
Failure mode
What happens
Mitigation
Destructive actions
The agent runs a command that deletes or overwrites files you did not intend to lose.
Use version control. Back up raw data outside the project. Commit changes often.
Changes that spread
A request to change one file may quietly modify others.
Name the file you want changed. Review the diff before approving.
Gaming the checks
Asked to fix a failing check (e.g., "N should be 154,525"), the agent may weaken or delete the check itself rather than figure out why the number is off.
Write your own data-integrity checks. Review any agent change to them before approving.
Day 1 · Failure modes64
Failure modes
Prompt injection: the risk that comes from outside
The other failure modes come from the agent or from you. This one comes from a third party. The agent treats text it reads as if it were your instructions, so a hidden line in an email, a PDF, or a web page can redirect it.
What it is
Detail
Where you are exposed
Anything that feeds the agent text you did not write: an email it triages, a paper you are reviewing, a scraped web page, a dataset's README, the output of an external tool.
What it can cause
Leaking data the agent can see, running a shell command, or sending an email or making an edit you never asked for. The damage scales with the agent's reach: file access, network, and send permissions.
How to limit it
Do not run full autonomy (Layer 4) on a session that is reading strangers' text. Keep approvals on for actions that reach outside: shell, send, network. Treat anything read from outside as data, not commands. Keep raw data backed up and work under version control so any edit is reversible.
The rule of thumb: the more autonomy you grant a session, the less untrusted text it should be reading.
Day 1 · Failure modes65
§ 16
Debrief & Day 2 preview
Debrief
Today you went from zero to working
1
Installed Claude Code
You went from zero to a working agentic terminal.
2
Permissions you control
Allow you to control what the agent can do and access.
3
Plan mode
Read the plan before the agent does anything risky.
4
Real data exploration
Used shell tools and web search to inspect the Chicago police complaints data without writing a line of R.
5
Your global CLAUDE.md
Persists across every project, every session. Tomorrow the agent already knows you.
6
R and Python ready
Installed and verified. Tomorrow you start the analysis.
Day 1 · Today66
Tomorrow
Day 2 · Working with data
Plan mode in practice
Plan mode and the prompts that get you the spec you intended.
Join the data
Merge the complaints and accused-officer files; report row counts and match rates.
Your first regression
Beat fixed effects, clustered standard errors, interpret the coefficient.
Reusable skills
Author table- and figure-formatting skills that encode your conventions.
Compile your draft
A Quarto/markdown and/or LaTeX memo rendered to Word (.docx) and/or PDF.