# Project-Level CLAUDE.md Interview Prompt

**Use:** Day 3 of the workshop (customization day). Participants paste this into a fresh Claude Code session, run it inside their own research project's directory, and produce a `CLAUDE.md` at the project root. This is the project-specific overlay that sits on top of their global `~/.claude/CLAUDE.md`.

**Why it's separate from the global one.** The global file captures *you* (your conventions, preferences, anti-patterns that apply across all your work). The project file captures *this specific project* (the dataset, the variable names, the identification strategy, the file paths). Don't put project-specific things in your global file; don't restate global rules in your project file.

**Pre-test before workshop.** Run this in a representative empirical project of your own. Watch for: (a) the gating questions correctly skipping the identification-strategy section for non-empirical projects, (b) the path detection working when the participant runs it from a project subdirectory, (c) the merge-vs-overwrite behavior on existing project CLAUDE.md files.

---

## The prompt

````markdown
You are going to interview me to create a project-specific CLAUDE.md
file at the root of my current research project. This sits on top of
my global CLAUDE.md (at ~/.claude/CLAUDE.md), which already covers
universal conventions (languages, statistical defaults, working style,
anti-patterns). Do not restate those things here. This file should
contain only what is specific to *this* project: the data, the
variables, the identification strategy, the file paths, and any
project-specific anti-patterns.

Cover these EIGHT sections, in this order, asking ONE question at a
time. Two of the sections are gated by an opening question; skip the
section entirely if the gating answer is no. Do not produce empty
headers in the final file.

1. **Project location and existing CLAUDE.md** — confirm where the
   project lives and whether a CLAUDE.md already exists.
   - Use the `pwd` tool (or equivalent) to check the current working
     directory; report it back to me and ask whether this is the
     project root.
   - If it isn't, ask for the absolute path to the project root.
   - Check for an existing `CLAUDE.md` at that path. If one exists,
     ask whether I want to merge with it (you'll show a diff before
     writing) or overwrite it.

2. **Project identity** — what is this project?
   - What are the main research questions (1-3 sentences)?
   - What field or conversation is this in (e.g., "strategic human
     capital and inventor mobility")?
   - What is the current stage: ideation, data collection, analysis,
     drafting, revise-and-resubmit, post-acceptance?
   - Target outlet, if known.

3. **Data** — where the data lives and what's in it.
   - Where are the raw data files (paths relative to project root)?
   - Where do cleaned/processed data go?
   - Is there a codebook? Where?
   - What's the panel structure (unit of observation, time variable,
     panel ID)? Skip if cross-sectional.
   - Key variables: panel ID, treatment (if any), outcome, important
     covariates. List with one-line descriptions.
   - Known data quirks: missingness patterns, unit-of-analysis
     ambiguities, any silent-drop traps you've already encountered.

4. **Identification strategy** [GATED] — Open with: "Is this an
   empirical causal-inference project (DiD, IV, RDD, matching,
   synthetic control, event study)?" If no, skip the entire section.
   If yes, ask:
   - Treatment: what defines treated vs. control units?
   - Identifying assumption (parallel trends, exclusion restriction,
     continuity at cutoff, conditional independence)?
   - Main threat to validity in this context?
   - Default estimator (e.g., for staggered DiD, which robust
     estimator)?
   - Any controls that should NEVER be included because they're
     measured at or after treatment (mediator/collider concerns)?

5. **Project-specific statistical defaults** — defaults that differ
   from your global file or that need to be pinned for this project.
   - What level should standard errors be clustered at, in this
     project?
   - Default fixed-effects structure when not the focus of the
     analysis?
   - Sample restrictions that should apply by default (e.g., "drop
     observations before 1990," "include only firms with >5 years of
     data")?
   - How to handle missingness in *this* data, given its specific
     pattern?

6. **Output conventions for this project** — where outputs go and how
   they're named, if different from your global defaults.
   - Where do tables and figures save in this project?
   - Naming: numbered (`table_1.tex`) or descriptive
     (`main_baseline.tex`)?
   - Where does the writeup live (`paper/main.tex`,
     `notes/draft.md`, etc.)?
   - Any special output requirements from a co-author or target
     journal?

7. **Project-specific anti-patterns** — things Claude should never do
   in *this* project specifically. Probe for at least 3, drawn from
   things you've already been burned by or things you've explicitly
   decided.
   - Examples to probe for: silent drop of meaningful missing values,
     re-downloading large data files mid-session, using the wrong
     aggregation function on a multi-row-per-unit table, switching
     methods without flagging it, hand-editing generated outputs.

8. **Collaborators and other context** [GATED] — Open with: "Are
   there co-authors or other people whose conventions you need to
   honor on this project?" If no, skip the section. If yes, ask:
   - Who are the co-authors (initials are fine)?
   - Any shared conventions: variable-naming style, table style,
     comment style in scripts?
   - Anything specific they've asked for that should be noted?

Interview rules:
- Ask ONE question at a time. Wait for my answer before the next.
- Use multiple-choice options when reasonable to speed answering.
- For gated sections (Identification, Collaborators): ask the gating
  question first. If I say no, skip the entire section and do not
  include its header in the final file.
- If I say "I don't know" or "no strong opinion": propose a sensible
  default with a one-line rationale, and ask if I want to include it.
  Don't fabricate.
- If I give a path that doesn't exist, flag it and ask whether to
  proceed anyway or correct it.
- Be conversational, not robotic. Acknowledge my answers briefly
  before moving on.

When all sections are covered:
- Summarize the rules you've gathered, organized by section.
- If a CLAUDE.md already exists at the project root, show me a diff
  of what you propose to add or change.
- Ask if I want to (a) accept and write the file, (b) accept some and
  modify others, or (c) abort.
- Write the final file to <project_root>/CLAUDE.md. Do not write
  anywhere else.

Format the final file as plain markdown with:
- A short opening paragraph stating that this is a project-specific
  overlay on the global CLAUDE.md, and noting which file is which
- One H2 header per section
- Bulleted rules under each
- A one-line rationale comment after a rule only when non-obvious
- No preamble before the opening paragraph, no closing fluff

Begin with section 1.
````
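Two of the anti-patterns the prompt probes for in section 7 — the silent-drop merge trap and the wrong aggregation on a multi-row-per-unit table — are worth seeing concretely. A minimal pandas sketch (all variable names hypothetical, not from any participant's project):

```python
import pandas as pd

# Three firms in the outcomes table.
firms = pd.DataFrame({"firm_id": [1, 2, 3], "outcome": [10.0, 20.0, 30.0]})
# The treatment table is missing firm 3 -- in real data this gap may be meaningful.
treat = pd.DataFrame({"firm_id": [1, 2], "treated": [1, 0]})

# Silent-drop trap: pandas defaults to an inner join, so firm 3
# disappears with no warning.
merged = firms.merge(treat, on="firm_id")  # 2 rows, firm 3 gone

# Safer: keep all rows and make the mismatch visible.
safe = firms.merge(treat, on="firm_id", how="left", indicator=True)
unmatched = safe[safe["_merge"] == "left_only"]  # firm 3 surfaces here

# Aggregation trap: a firm-year panel where `employees` is a stock
# variable, constant within firm.
panel = pd.DataFrame({
    "firm_id":   [1, 1, 2],
    "year":      [2000, 2001, 2000],
    "employees": [100, 100, 50],
})
wrong = panel.groupby("firm_id")["employees"].sum()    # firm 1 -> 200 (double-counted)
right = panel.groupby("firm_id")["employees"].first()  # firm 1 -> 100
```

A good project CLAUDE.md turns traps like these into explicit rules ("left-join and check `_merge`"; "never sum stock variables across years"), which is exactly what section 7 of the interview is fishing for.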

---

## Notes for the workshop

- **Hand it to participants as a file**, not as something to retype. They open `materials/interview_prompts/project.md`, copy the fenced block, paste into a fresh Claude Code session inside their project directory.
- **Prerequisite: participants should have a working global CLAUDE.md first.** This prompt assumes the global file exists and contains universal preferences. If a participant skipped the Day 1 interview, run that one first.
- **Run the interview from the project root.** The first section asks Claude to confirm the working directory, but it's faster if the participant `cd`s there before pasting.
- **Watch for the gated sections.** Empirical participants get the identification-strategy questions; theoretical or qualitative participants skip them. The co-author section works the same way. Helpers should not be surprised when one participant's interview runs shorter than another's.
- **Verify the file after writing.** Have participants `cat CLAUDE.md` and confirm it looks right before continuing to the next exercise.
- **The merge step matters.** If a participant has an existing project CLAUDE.md (some come in with one), this prompt explicitly asks Claude to read it, show a diff, and ask before overwriting. Without that, participants can lose existing work.
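For the verification step, it helps to know the shape of a correct output. A sketch of what a written project CLAUDE.md might look like, following the format rules at the end of the prompt (all contents hypothetical):

```markdown
This file is a project-specific overlay on my global CLAUDE.md
(~/.claude/CLAUDE.md). Universal conventions live there and are
not restated here.

## Data

- Raw data in `data/raw/`; cleaned data goes to `data/clean/`
- Panel: firm-year; panel ID `firm_id`, time variable `year`

## Project-specific statistical defaults

- Cluster standard errors at the firm level
  <!-- treatment is assigned at the firm level -->

## Project-specific anti-patterns

- Never re-download the raw data files mid-session
- Never drop rows with missing outcome values silently; report counts first
```

Note what a correct file does *not* contain: empty headers for skipped gated sections, and restatements of global rules.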

## Related

- `global_quick.md` — focused (Day 1) interview for the global CLAUDE.md
- `global_deep.md` — comprehensive (post-workshop) interview for the global CLAUDE.md
- The workshop project CLAUDE.md at `github.com/jfrake/socialscienceai-workshop/CLAUDE.md` is the example output this prompt produces, applied to the workshop dataset
