The annual performance review has a structural flaw: it rewards memory, not performance. By the time review season arrives, most of the year has evaporated. The project you rescued in March, the process you quietly fixed in May, the praise a senior manager sent in June, all of it competes with whatever happened in the last six weeks. Culture Amp defines this as recency bias, 'the tendency to focus on the most recent time period instead of the total time period', and notes it persists because recent events are simply easier to remember. Its recommended counter is unglamorous: collect data points throughout the year, so the review draws on the whole period rather than the tail end.
The manual version of that counter has been around for years. Software engineer Julia Evans calls it the brag document: a running record of accomplishments kept because you do not remember everything you did, and your manager does not remember everything you did either. Evans suggests short updates every couple of weeks, or a longer session every 6 to 12 months. It works, but it depends entirely on personal discipline, and when review season arrives it still leaves you to assemble the self-assessment, the talking points and the evidence trail by hand.
This module builds the AI-powered version: a dedicated Performance project space that holds your actual KPIs, logs achievements in a consistent evidence format, and turns roughly 10 minutes a month into a complete review pack whenever you need one. If you need a reason to bother, consider that the Fair Work Ombudsman's best practice guide recommends employers ask employees to 'complete a short self-review ahead of the performance review'. The self-review is coming either way. The only question is whether you walk into it with twelve months of dated evidence or a blank page and a fading memory.
The module runs in six parts. Part 1 builds the project space and its three working files. Part 2 installs the 10-minute monthly ritual. Part 3 chains prompts from the raw log to quarterly check-in and annual review packs. Part 4 turns the same log into an evidence-based remuneration case. Part 5 covers the leader's side of the table, including the decisions that must never be delegated to AI. Part 6 closes with the privacy guardrails that make the whole system safe to run at work.

Part 1: Build the Performance project space
The system lives in one dedicated workspace, not scattered across ad hoc chats. A dedicated space gives you three things a loose chat cannot: persistent instructions that govern every conversation, persistent reference files the assistant can always consult, and contained memory so your performance evidence does not leak into unrelated chats. Set it up once, in about 20 minutes, and it runs for years.
In ChatGPT, create a dedicated Project. Projects group chats, uploaded reference files and project-level instructions in one place, and project instructions apply only inside that project, overriding your global custom instructions there. Projects also carry built-in memory, chosen at creation as default or project-only. On Enterprise and Edu workspaces the containment is stronger still: project chats cannot reference conversations outside the project, and outside conversations cannot reference them. That walled-garden behaviour suits this use case exactly.
On Claude, the equivalent is a Project on claude.ai, which supports custom project instructions and a project knowledge base of uploaded documents used across all chats in the project, with each project holding its own separate memory space. If your organisation runs Claude Cowork on the desktop, you can go one step further: point a Cowork project at a local folder holding the three files below. Cowork reads and writes files in connected folders directly, folder instructions add project-specific context, and project memory is scoped so what Claude learns in your Performance project does not carry over to others. Whichever tool you use, the architecture is identical: one instruction set, three files, one contained memory.
The project instructions
The instructions are the contract between you and the assistant. They tell it what this project is for, how to record what you say, and what it must never do. Paste this blueprint in as your project instructions (ChatGPT) or project or folder instructions (Claude), and adjust the file names only if you rename the files themselves.
File 1: kpi-reference.md
This is the fixed reference point: what your year is actually judged against. Copy the relevant extract from your position description, your agreed goals, or your performance agreement into this structure. Be literal. If your KPIs have weightings, include them; the prompt chains use weightings to decide where the strongest evidence matters most. Update this file only when your goals formally change.
File 2: achievements-log.md
This is the running evidence log, the heart of the system. Every entry follows the same shape so the prompt chains can build claims without guesswork. The detail fields borrow the STAR structure of Situation, Task, Action and Result, which MIT Career Advising documents for behavioural interviews. MIT's guidance weights the telling at roughly 20 per cent situation, 10 per cent task, 60 per cent action and 10 per cent result, and asks for quantifiable results to be highlighted. MIT frames STAR for interviews, but the structure transfers directly to review evidence, because both tasks come down to the same job: proving that a contribution happened and that it mattered. The evidence pointer is the field people skip and later regret. It records where the proof lives, such as a dated email, a report name or a dashboard, described rather than attached.
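The entry template itself is not reproduced here, so treat the following as a minimal sketch under stated assumptions rather than the module's own format. It models one achievements-log.md entry as a Python dataclass with hypothetical field names (kpi, situation, task, action, result, evidence_pointer). It flags blank fields and results that contain no number, renders the entry as a Markdown block, and converts MIT's rough 20/10/60/10 weighting into a word budget for each STAR part. The Markdown layout and the digit-based "quantified" test are illustrative choices, not rules from the source.

```python
from dataclasses import dataclass, fields
from datetime import date

# MIT's rough STAR weighting: how much of the telling each part gets.
STAR_WEIGHTS = {"situation": 0.20, "task": 0.10, "action": 0.60, "result": 0.10}


@dataclass
class LogEntry:
    """One achievements-log.md entry. Field names are hypothetical."""
    entry_date: date
    kpi: str               # KPI name as written in kpi-reference.md
    situation: str
    task: str
    action: str
    result: str            # quantified wherever possible
    evidence_pointer: str  # where the proof lives, described, never attached


def missing_fields(entry: LogEntry) -> list[str]:
    """Names of text fields that are empty or whitespace only."""
    return [
        f.name
        for f in fields(entry)
        if isinstance(getattr(entry, f.name), str)
        and not getattr(entry, f.name).strip()
    ]


def is_quantified(text: str) -> bool:
    """Crude check: does the text contain at least one digit?"""
    return any(ch.isdigit() for ch in text)


def star_word_budget(total_words: int) -> dict[str, int]:
    """Split a word budget across the STAR parts using STAR_WEIGHTS."""
    return {part: round(total_words * w) for part, w in STAR_WEIGHTS.items()}


def to_markdown(entry: LogEntry) -> str:
    """Render one entry as a Markdown block for achievements-log.md."""
    return "\n".join([
        f"### {entry.entry_date.isoformat()} | {entry.kpi}",
        f"- Situation: {entry.situation}",
        f"- Task: {entry.task}",
        f"- Action: {entry.action}",
        f"- Result: {entry.result}",
        f"- Evidence: {entry.evidence_pointer}",
    ])


if __name__ == "__main__":
    # Illustrative, de-identified entry; not real data.
    entry = LogEntry(
        entry_date=date(2026, 10, 2),
        kpi="Control testing",
        situation="Quarterly control tests were overdue.",
        task="Clear the backlog before the committee meeting.",
        action="Re-sequenced the test plan and batched evidence requests.",
        result="Backlog cleared before quarter end.",
        evidence_pointer="October committee pack, control testing section",
    )
    print(missing_fields(entry))            # blank fields, if any
    print(is_quantified(entry.result))      # this result has no number yet
    print(star_word_budget(200))
    print(to_markdown(entry))
```

An empty evidence pointer or an unquantified result is a prompt to go back to your sent folder and find the number. It is not a licence to inflate the entry.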
File 3: feedback.md
Feedback is evidence too, and it evaporates fastest. This file holds positive and constructive feedback verbatim where possible, with the date and the source's role, never their name. Recording the situation the feedback relates to and its observable effect keeps entries factual rather than editorial; that discipline echoes the Situation-Behaviour-Impact model the Center for Creative Leadership developed for feedback conversations: describe the situation, the actual observable behaviour, and the impact, keeping to facts and leaving out judgements. For constructive feedback, add one more line: what you did about it. A logged correction plus a logged fix is development evidence, and it is far more persuasive than claiming you have no development areas at all.
Upload the three files to the project and keep them current. In a Cowork project, Claude updates the local files in place. In ChatGPT, ask the assistant to output the updated file at the end of each logging session, save it, and replace the copy in the project's sources. Keep the files plain text and focused: one file per job, no embedded screenshots and no pasted documents. That keeps the whole system small enough to read end to end in minutes.
Part 2: The 10-minute monthly ritual
The system runs on one recurring calendar appointment: 10 minutes, once a month, in a slot you protect. The first Friday of the month works well because the previous month is complete but still fresh. Book it as a recurring invite to yourself, treat it like a meeting with your future self, and do exactly four things:
- Open the Performance project and run the monthly logging prompt below.
- Answer the interview questions from memory, your sent folder and your calendar.
- Review the formatted entries the assistant produces, correct anything inflated or imprecise, and save them into achievements-log.md and feedback.md.
- Note any KPI the assistant flags as having no evidence this month, and decide whether the gap is real.
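Inside the project, the assistant does the flagging in the fourth step. If you keep the files in a plain text editor instead, the same check can be expressed as a few lines of Python. This is a sketch with stated assumptions: each log entry is tagged with exactly one KPI name that matches kpi-reference.md, and the reference file's weightings can be read as numbers. The KPI names and weights below are hypothetical. The function lists the KPIs with no evidence this month, heaviest weighting first, because that is where a gap costs most at review time. It only surfaces the gap. Deciding whether the gap is real stays with you.

```python
# Hypothetical structures: kpi-reference.md parsed into {KPI name: weighting},
# and this month's log entries as dicts carrying a "kpi" key.


def evidence_gaps(
    kpi_weights: dict[str, float], month_entries: list[dict[str, str]]
) -> list[tuple[str, float]]:
    """Return KPIs that no entry this month cites, highest weighting first.

    Ties keep kpi-reference.md order, because sorted() is stable even with
    reverse=True. Whether a gap is a quiet month or forgotten work is
    still your call.
    """
    cited = {entry["kpi"] for entry in month_entries}
    gaps = [(kpi, weight) for kpi, weight in kpi_weights.items() if kpi not in cited]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    weights = {
        "Control testing": 0.4,
        "Policy reviews": 0.3,
        "Incident reporting": 0.2,
        "Training": 0.1,
    }
    october = [{"kpi": "Control testing"}, {"kpi": "Training"}]
    for kpi, weight in evidence_gaps(weights, october):
        print(f"No evidence this month: {kpi} (weighting {weight:.0%})")
```

With these illustrative inputs the script prints Policy reviews (30%) and then Incident reporting (20%).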
A worked example
Here is what one October entry looks like for a de-identified GRC analyst, produced from a two-minute answer to question one and a follow-up question about numbers:
Why monthly beats quarterly cramming
Recency bias does not only affect your manager. It affects you. By month three, the specifics from month one are gone: the exact number of overdue tests, the minutes saved per email, who said what and when. Culture Amp's counter-measure for reviewers, collecting data points throughout the year so the end-of-year view covers the entire period, applies with equal force to your own record; this module simply extends that logic from the manager's side of the table to yours. A monthly entry is a contemporaneous record, written while the numbers are still checkable.
The Fair Work Ombudsman's underperformance guidance runs on the same logic from the employer's side: write down examples of behaviour at the time, note when they occurred and why they matter, and gather the documents that demonstrate them. Contemporaneous records matter because memory does not hold up once a record is contested, and the principle that protects an employer in a dispute also protects your achievements at review time. There is also a plain scheduling reason: a 10-minute monthly slot survives busy periods. A 90-minute quarterly session is precisely the kind of appointment that gets moved four times and then abandoned, which quietly converts your evidence system back into a memory test.
When the year goes wrong
Some years do not cooperate. A restructure removes the project you were hired to run, a budget freeze cancels the work at phase one, or a KPI becomes unmeasurable because the system behind it was decommissioned. The instinct is to stop logging, because nothing feels like an achievement. Log anyway, and log differently. Record what was delivered before the change, the decision trail showing that the cause sat outside your control, what you preserved or prevented during the disruption, and the skills the salvage work demonstrated. The Fair Work Ombudsman's documentation logic cuts both ways: the contemporaneous records that protect an employer in a dispute also protect an employee whose performance is contested. A dated entry written the week a project was cancelled is worth far more than an explanation assembled months later under pressure.
Here is what a defensive entry looks like for the same GRC analyst, written the week the news landed rather than reconstructed in review week:
At review time, that entry converts a cancelled project from a hole in the year into evidence: phase one landed on schedule, the cancellation was organisational, and the handover means the investment is recoverable. The same honesty applies to plain gaps. A KPI that produced nothing because the scope was cut reads very differently from a KPI that produced nothing with no explanation. Silence at review time gets filled by assumption; a dated record gets read.

Part 3: The prompt chains, from log to review pack
When a check-in or a review lands, you do not start writing. You run a chain. Each chain is a sequence of prompts run inside the Performance project, where every step reads the same three files and every claim traces back to a log entry. Two chains cover the review calendar: a three-step quarterly check-in pack and a four-step annual review pack.
Chain 1: The quarterly check-in pack
Three steps: synthesise, draft, attack. The order matters. Drafting before synthesis invites claims the evidence cannot carry; skipping the attack step sends your manager a document nobody has stress-tested.
Step 1: Synthesise the quarter
Step 2: Draft the one-pager
Step 3: Red-team the draft
What the finished one-pager looks like
Here is the chain's end product for the same GRC analyst, generated from the October quarter of the log and tightened through the red-team pass. Notice two things. Every win carries its number and its KPI, so nothing rests on adjectives. And the thin KPI is named honestly in the risks section rather than papered over: incident reporting produced one entry all quarter, so the document says so and converts the weakness into a specific support ask. That honesty is what makes the strong claims credible.
Chain 2: The annual review pack
The annual chain runs four steps instead of three because the stakes are higher. HBR guidance from Marlo Lyons is blunt about why the self-assessment deserves this effort: it 'will set the tone for your manager's evaluation of your work', which can affect remuneration outcomes such as merit increases and bonuses, and it should cover the entire year rather than just recent work. The four steps are: synthesise the year, draft in the employer's template, sharpen the impact statements, then write the development section honestly.
Step 1: Full-year synthesis
Step 2: Self-assessment draft
Step 3: Impact statements pass
Step 4: Gap and growth

Part 4: Negotiation prep
The same log that feeds your review also feeds a remuneration or promotion case, and the preparation rules are well documented. Harvard's Program on Negotiation advises benchmarking your market value against multiple sources, such as salary databases, industry associations, recruiter conversations and professional networks. It also advises building the case on measurable accomplishments: revenue generated or costs saved, efficiency improvements, leadership contributions and expanded responsibilities. It is equally clear about what to avoid, which is framing the request around personal expenses or financial stress. Need explains why you want more. It never explains why you have earned it.
The division of labour here is strict. The AI can assemble your strongest quantified contributions, surface evidence of scope growth, and anticipate the objections a manager is likely to raise, because all of that lives in your log. What it must never do is supply market data. A language model asked for salary benchmarks will produce plausible numbers with no provenance, and a single fabricated benchmark discredits an otherwise solid case. The prompt below deliberately leaves the market figures as placeholders you fill from real sources you have checked yourself.
Treat the output as your preparation document first and a handover document second. Rehearse the objection responses out loud, fill the benchmark placeholders with sourced figures, and only then decide whether a written version goes to your manager or the case is made in conversation with the document as your private script.

Part 5: The leader's side
If you lead a team, the same discipline scales up, and the boundaries get harder. A leader running six or eight reviews faces the same memory problem multiplied: a year of one-on-ones, incidents, wins and course corrections per person, most of it undocumented or scattered. AI can carry a real share of the preparation load. It cannot carry any share of the judgement.

What works
- Synthesising the year: feed in a year of your own one-on-one notes per person. De-identify them to [TEAM MEMBER A] before they go anywhere near the tool, and use only your employer-approved enterprise tool, because role labels alone do not fully de-identify a year of notes about one person. Ask for themes, trends and evidence gaps. You verify the themes against your records; the synthesis only saves you the re-reading.
- Consistent structures: draft a common review skeleton once and reuse it, so every team member is assessed against the same headings rather than whichever format survived from last year.
- Vagueness checking: run each draft review through a pass that flags unevidenced or vague statements, the "good team player" and "needs to be more strategic" filler that says nothing and defends nothing.
- Language comparison, as analysis support only: compare wording across draft reviews for consistency and bias signals, then re-examine the underlying evidence yourself wherever the comparison flags a mismatch.
A worked example: six reviews, one consistency pass
Consider a leader with six direct reports at annual review time. The inputs are the leader's own materials: six draft reviews and a year of one-on-one notes, de-identified to [TEAM MEMBER A] through [TEAM MEMBER F] before anything is pasted, inside the employer-approved enterprise tool. No ratings go in, and none are requested.
The consistency pass comes back with three flags. First, the draft for [TEAM MEMBER C] is built almost entirely on vague adjectives such as 'reliable', 'positive attitude' and 'good team player', with no evidenced example anywhere in the document, while the draft for [TEAM MEMBER E] cites a metric in every paragraph. Second, the drafts for [TEAM MEMBER B] and [TEAM MEMBER F] describe near-identical contributions, since both ran a control uplift project to completion. Yet one is described as having transformed the control environment, while the other merely completed assigned remediation tasks. Third, the pass lists the questions the leader should answer from their own records before finalising: which of C's claimed strengths actually appear in the one-on-one notes, and what specifically distinguishes B's project from F's.
What the leader does next is the point of the exercise. The [TEAM MEMBER C] draft is rewritten against the notes file, replacing each adjective with a dated example or deleting the claim it decorated. The B and F drafts are moderated so the language matches the actual gap between them, which the evidence shows is far smaller than the wording implied. The AI flagged the inconsistencies; the leader re-read the evidence and made every judgement. The ratings were never in the tool at all.
Fairness is procedural
Best practice employers have regular discussions with employees about performance. They set clear goals and provide feedback and support to help employees perform at their best.
That is the Fair Work Ombudsman's Managing Underperformance guide, and the rest of it is just as concrete: set clear performance expectations and record individual goals, provide regular, specific and timely feedback, conduct performance reviews every few months, and write things down, including examples of behaviour, when they occurred and the documents that demonstrate them. The guide lists the avoidance of legal disputes, such as unfair dismissal or bullying claims, among the benefits of getting this right. Read in reverse, that is the risk statement: procedurally poor performance management, the rushed review, the surprise rating, the undocumented history, is exactly what turns a performance problem into a claim. AI makes the documentation lighter. It does not substitute for a single one of the conversations.

What never to automate
Three decisions stay human, without exception: performance ratings, termination rationale, and remuneration decisions. This is not a stylistic preference; it is where Australian policy is visibly heading. The House of Representatives Standing Committee's Future of Work report, tabled in February 2025, recommended that AI systems used for employment related purposes, including recruitment, referral, hiring, remuneration, promotion, training, apprenticeship, transfer or termination, be classified as high-risk, and that the Fair Work Act be reviewed so decision making using AI and ADM is covered 'and employers remain liable for these decisions'. Those are committee recommendations, not yet law, but the direction of travel is unambiguous, and 'employers remain liable' is the sentence every leader should keep in view.
The privacy side is already legislated. The OAIC confirms that from 10 December 2026, entities using personal information in automated decision-making with the potential to affect rights or interests must set out in their privacy policies the kinds of personal information used and the kinds of decisions made. It is a transparency obligation, not a prohibition, but it means employment decisions made using automated decision-making become disclosable rather than invisible. And the federal employment department applies the same standard to itself: DEWR's AI transparency statement commits to human-in-the-loop mechanisms embedded at critical stages. The practical boundary for a leader is simple to state and easy to audit: use AI to prepare, structure and check. Decide alone. If an AI-generated line appears in a review you sign, it is your line, with everything that follows from that.
Part 6: Privacy guardrails
Four rules keep the system safe to run at work. They are not optional extras; they are the licence conditions.
- No colleague personal data, ever. De-identify by role: [TEAM MEMBER A], [SENIOR MANAGER], [PEER]. The log records your work, not your judgements about other people, and the project instructions tell the assistant to warn you if identifying detail slips in.
- Employer-approved tools only. Enterprise plans carry materially different data controls from consumer accounts: OpenAI states that it does not train its models on ChatGPT Enterprise business data by default and that workspace admins control retention. Confirm the equivalent data-control settings on whichever assistant your organisation has approved before the log holds a single entry.
- Check your organisation's AI and acceptable-use policies first. If policy says work information stays out of AI tools, respect that: the manual brag document still works, and the file templates in Part 1 run perfectly well in a plain text editor.
- Evidence pointers, not evidence. Reference where proof lives ("October 2026 committee pack", "email dated 14 May") instead of pasting sensitive documents into the log. The pointer is enough to retrieve the proof when you need it.
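The first rule can be partly checked before anything leaves your machine. The sketch below rests on one assumption: you keep a private, local watchlist of colleague names (hypothetical here) mapped to the role labels this module uses. The script reports which of those names appear in a draft entry and swaps each one for its label. It replaces the longest names first, so a full name is handled before a first name on its own. It catches only the names on your list. Job titles, project names and other detail that still point to one person will pass straight through, which is why the project instructions and your own read-through remain the real safeguard.

```python
import re

# Hypothetical private watchlist: colleague names mapped to role labels.
# Keep it local; it must never be uploaded to the project.
ROLE_LABELS = {
    "Priya Nair": "[SENIOR MANAGER]",
    "Priya": "[SENIOR MANAGER]",
    "Sam": "[PEER]",
}


def _name_pattern(name: str) -> str:
    """Whole-word, literal regex for a name."""
    return rf"\b{re.escape(name)}\b"


def find_identifiers(text: str, names: list[str]) -> list[str]:
    """Return the watchlist names that appear in text (case-insensitive)."""
    return [
        name for name in names
        if re.search(_name_pattern(name), text, flags=re.IGNORECASE)
    ]


def substitute_roles(text: str, labels: dict[str, str]) -> str:
    """Replace each listed name with its role label, longest names first."""
    for name in sorted(labels, key=len, reverse=True):
        label = labels[name]
        # A function replacement inserts the label literally, with no
        # backslash-escape processing.
        text = re.sub(_name_pattern(name), lambda _m: label, text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    draft = "Priya Nair praised the October pack; Sam asked for the template."
    print(find_identifiers(draft, list(ROLE_LABELS)))
    print(substitute_roles(draft, ROLE_LABELS))
```

The whole-word match is deliberate. A watchlist entry for "Sam" will not flag "Samuel", so a colleague with a longer name needs an entry of their own.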
The compound payoff
Twelve entries at 10 minutes each is two hours of effort across a year. What it buys: a review pack that assembles in an afternoon instead of a lost weekend, a negotiation case grounded in numbers, a permanently current source for your CV, and something rarer than any of those, an honest record of the year including its gaps. The difference between you and a colleague of identical performance is that your year is documented and theirs is remembered. Reviews reward the documented one.
The assessment below tests whether you can run this system, not merely describe it: the three files, the monthly mechanics, the prompt chains, the negotiation rules, and the boundaries that keep the leader's side lawful and fair. Work through it before you build.


