# Global CLAUDE.md Interview Prompt: Deep tier

**Use:** After the workshop. Participants paste this into a fresh Claude Code session to extend the focused (Day 1) version of their CLAUDE.md with sections covering working style, causal inference, data integrity, reproducibility, prose drafting, and verification expectations.

**Companion:** `global_quick.md` is the focused (Day 1) tier. This file is the deeper, post-workshop version.

**Pre-test before posting.** Run this prompt yourself in a fresh Claude Code session each time the model version changes. Watch for: (a) the "ONE question at a time" rule holding across sections, (b) gated sections (Causal Inference, Academic Prose Drafting) properly skipping when the gating answer is no, (c) the merge-vs-overwrite behavior on the existing CLAUDE.md.

---

## The prompt

````markdown
You are going to interview me to expand my global CLAUDE.md file at
~/.claude/CLAUDE.md. This is the deep tier. It assumes I have already
done the focused (Day 1) interview and have a working CLAUDE.md with
sections for Identity, Languages, Statistical Defaults, Output
Conventions, and Anti-Patterns. This prompt adds sections for working
style, causal inference, data integrity, reproducibility, prose
drafting, and verification.

If no CLAUDE.md exists at ~/.claude/CLAUDE.md, run the focused tier
first. Do not proceed.

Cover these ELEVEN sections, in this order, asking ONE question at a
time. Two of the sections are gated by an opening question; skip the
section entirely if the gating answer is no. Do not produce empty
headers in the final file.

1. **Identity** — confirm and refine. Pull what is already in the
   existing file, ask if anything has changed since the focused
   interview, and revise.

2. **Working style with Claude** — how I want Claude Code to
   communicate with me. Cover, one question each:
   - Emojis: allowed in code, comments, table output, chat? (Default
     no.)
   - Em dashes in prose: allowed?
   - When I ask you something, should you restate my request before
     answering, or just answer?
   - Should you lead with the result, or describe what you are about
     to do first?
   - When a non-trivial choice is ambiguous (sample restriction,
     variable construction, model spec, control set), should you ask
     or guess?
   - Before launching a job likely to run more than a few minutes,
     should you estimate runtime and ask, or just start?
   - For non-trivial edits, should you summarize what will change
     before making the edit?

3. **Language & Package Preferences** — extend or refine. Confirm the
   primary languages and packages from the focused interview. Then
   probe for the next layer: which package for what scenario, when to
   prefer one over another (e.g., data.table vs. dplyr by file size or
   readability needs).

4. **Statistical Defaults** — extend. Beyond clustering and FE: how to
   report missingness (rate per variable, MCAR/MAR/MNAR assumption,
   chosen handling). Significance reporting (stars at which thresholds,
   SE in parentheses or brackets, decimal places).

5. **Causal Inference** [GATED] — Open with: "Do you do causal
   inference (DiD, IV, RDD, matching, synthetic control)?" If no,
   skip the entire section. If yes, ask:
   - Before finalizing an identification strategy, what should be
     stated explicitly (treatment-assignment mechanism, identifying
     assumption, main threat to validity)?
   - Default rule for control inclusion when a covariate is measured
     at or after treatment (mediator/collider check)?
   - Default DiD estimator for staggered adoption (Callaway-Sant'Anna,
     Sun-Abraham, Borusyak-Jaravel-Spiess, de Chaisemartin-D'Haultfoeuille,
     or stick with TWFE)?
   - Event-study figure expectation (always for DiD before quoting an
     estimate, sometimes, never)?
   - For IV: which first-stage statistics should always be reported
     (first-stage F, Montiel Olea-Pflueger effective F,
     weak-instrument thresholds)?

6. **Data Integrity** — what to assert after data manipulations:
   - After every merge: row count before and after, key uniqueness on
     the intended side, non-match rate, flag many-to-many?
   - After loading data: assert expected ranges, types, allowable
     values for treatment and outcome, fail loudly if violated?
   - When reading CSV/Excel: explicit column types and explicit
     sheets/ranges, never default type-guessing?
   - Rule for overwriting files in cleaned-data directories
     (confirm intent, identify which script wrote it)?

7. **Reproducibility** — quick:
   - Set a random seed at the top of any script with stochastic
     elements? (Default yes.)
   - Log session info (`sessionInfo()` in R, `pip freeze` in Python)
     at the end of any script that produces final output? (Default
     yes.)

8. **Output Conventions** — extend. Beyond format and directory
   structure: file-naming convention (numbered like `table_1.tex`
   vs. descriptive like `fig_event_study.pdf`, snake_case vs.
   kebab-case). Whether numbers in compiled documents must come from
   `\input{}` of code-generated `.tex` snippets vs. typed by hand.

9. **Academic Prose Drafting** [GATED] — Open with: "Do you ever
   delegate prose drafting (paper sections, abstracts, response
   letters) to Claude?" If no, skip the entire section. If yes, ask:
   - Should drafted prose be treated as a first draft requiring
     substantive revision by you?
   - Words to avoid (the Kobak et al. 2025 excess-word list:
     `delve`, `underscore`, `crucial`, `comprehensive`, `meticulous`,
     `intricate`, `pivotal`, `notable`, `potential` as filler,
     `additionally`, `moreover`, `furthermore`, `importantly`)?
   - Template phrases to avoid ("it's important to note", "paving the
     way", "not only X but also Y", "in conclusion", "stands as a
     testament")?

10. **Verification Before Completion** — what counts as "done":
    - Re-run the affected script end-to-end before claiming
      completion?
    - Confirm numbers in tables and figures match the latest output?
    - Report exactly what was verified (which script, which output,
      which numbers)?
    - What to do when verification cannot be performed (long-running
      job, missing data on this machine)?

11. **Anti-Patterns** — extend the focused list with at least three
    more drawn from past frustrations. Probe for things in these
    categories: silent data modification, fabricated numbers,
    unrequested cleanup or refactoring, error suppression
    (try/except, tryCatch), package installation inside analysis
    scripts, package switching mid-project, rounding that obscures
    precision.

Interview rules:
- Ask ONE question at a time. Wait for my answer before the next.
- Use multiple-choice options when reasonable to speed answering.
- For gated sections (Causal Inference, Academic Prose Drafting): ask
  the gating question first. If I say no, skip the entire section and
  do not include its header in the final file.
- If I say "I don't know" or "no strong opinion": propose a sensible
  default with a one-line rationale, and ask if I want to include it.
  Don't fabricate.
- Be conversational, not robotic. Acknowledge my answers briefly
  before moving on.

When all sections are covered:
- Read the existing ~/.claude/CLAUDE.md.
- Show me a unified diff of what you propose to add or change.
- Ask if I want to (a) accept all changes, (b) accept some and modify
  others, or (c) abort.
- Do not overwrite the existing file silently. If I accept, write the
  merged file.

Format the final file as plain markdown with:
- One H2 header per section
- Bulleted rules under each
- A one-line rationale comment after a rule only when non-obvious
- No preamble, no closing fluff

Begin with section 1.
````
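As a concrete illustration of what the Data Integrity section (6) is eliciting, here is a minimal sketch of a merge wrapper that asserts row counts, key uniqueness, and the non-match rate. It is a hypothetical Python/pandas example for discussion, not part of the prompt; the function name and the `max_nonmatch` threshold are invented, and R users would express the same checks with `data.table` or `dplyr` joins.

```python
# Illustrative sketch only. Encodes the post-merge assertions the Data
# Integrity section asks about: row count before/after, key uniqueness on
# the intended side, non-match rate, and a many-to-many guard.
import pandas as pd

def checked_merge(left: pd.DataFrame, right: pd.DataFrame, key: str,
                  how: str = "left", max_nonmatch: float = 0.05) -> pd.DataFrame:
    n_before = len(left)
    if not right[key].is_unique:
        # Duplicate keys on the lookup side mean a many-to-many blowup.
        raise ValueError(f"right side is not unique on {key!r}")
    merged = left.merge(right, on=key, how=how, indicator=True)
    if how == "left" and len(merged) != n_before:
        raise ValueError(f"row count changed: {n_before} -> {len(merged)}")
    nonmatch = (merged["_merge"] != "both").mean()
    if nonmatch > max_nonmatch:
        raise ValueError(f"non-match rate {nonmatch:.1%} exceeds {max_nonmatch:.0%}")
    return merged.drop(columns="_merge")
```

The same idea can be expressed more tersely with pandas' built-in `validate="many_to_one"` argument to `merge`; the explicit version above is easier to discuss line by line in a workshop.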

---

## Notes for the workshop

- **Hand it to participants as a file**, not as something to retype. They open `materials/interview_prompts/global_deep.md`, copy the fenced block, paste into a fresh Claude Code session.
- **Recommend this AFTER the workshop**, not during. Day 1 uses the focused (quick) tier; this is for participants who want to keep extending their CLAUDE.md once they have real usage data.
- **The merge step matters.** This prompt explicitly tells Claude to read the existing file, show a diff, and ask before overwriting. Without that, participants can lose their focused-tier work.
- **Gating reduces dread.** The Causal Inference and Academic Prose Drafting sections are skipped entirely for participants who don't do that work, rather than asking them to write "N/A" through a section that doesn't apply.
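If participants ask what the reproducibility defaults (section 7) look like in practice, a minimal sketch such as the following can be shown live. It is an assumed example, not part of the prompt: the log filename is invented, and R users would substitute `set.seed()` and `sessionInfo()`.

```python
# Minimal reproducibility footer: fixed seed at the top of any stochastic
# script, environment log at the end of any script producing final output.
import random
import subprocess
import sys

random.seed(20240101)  # set once, at the top

def log_environment(path: str = "session_info.txt") -> str:
    """Write the Python version and installed packages (pip freeze)."""
    freeze = subprocess.run([sys.executable, "-m", "pip", "freeze"],
                            capture_output=True, text=True, check=True).stdout
    with open(path, "w") as f:
        f.write(f"python {sys.version}\n\n{freeze}")
    return path
```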

## Related

- `global_quick.md` — the focused (Day 1) tier
- `project_claude_md.md` (TBD) — interview for the project-specific CLAUDE.md, used Day 3
