LM-S02 The AI-Powered Performance Review System

The annual performance review has a structural flaw: it rewards memory, not performance. By the time review season arrives, most of the year has evaporated. The project you rescued in March, the process you quietly fixed in May, the praise a senior manager sent in June, all of it competes with whatever happened in the last six weeks. Culture Amp defines this as recency bias, 'the tendency to focus on the most recent time period instead of the total time period', and notes it persists because recent events are simply easier to remember. Its recommended counter is unglamorous: collect data points throughout the year, so the review draws on the whole period rather than the tail end.

The manual version of that counter has been around for years. Software engineer Julia Evans calls it the brag document: a running record of accomplishments kept because you do not remember everything you did, and your manager does not remember everything you did either. Evans suggests short updates every couple of weeks, or a longer session every 6 to 12 months. It works, but it depends entirely on personal discipline, and when review season arrives it still leaves you to assemble the self-assessment, the talking points and the evidence trail by hand.

This module builds the AI-powered version: a dedicated Performance project space that holds your actual KPIs, logs achievements in a consistent evidence format, and turns roughly 10 minutes a month into a complete review pack whenever you need one. If you need a reason to bother, consider that the Fair Work Ombudsman's best practice guide recommends employers ask employees to 'complete a short self-review ahead of the performance review'. The self-review is coming either way. The only question is whether you walk into it with twelve months of dated evidence or a blank page and a fading memory.

The module runs in six parts. Part 1 builds the project space and its three working files. Part 2 installs the 10 minute monthly ritual. Part 3 chains prompts from raw log to quarterly check-in and annual review packs. Part 4 turns the same log into an evidence-based remuneration case. Part 5 covers the leader side of the table, including the decisions that must never be delegated to AI. Part 6 closes with the privacy guardrails that make the whole system safe to run at work.

Figure: diagram of the Performance project architecture (instructions, three evidence files, and outputs). Caption: The system at a glance: one project space, three files, four outputs.

Part 1: Build the Performance project space

The system lives in one dedicated workspace, not scattered across ad hoc chats. A dedicated space gives you three things a loose chat cannot: persistent instructions that govern every conversation, persistent reference files the assistant can always consult, and contained memory so your performance evidence does not leak into unrelated chats. Set it up once, in about 20 minutes, and it runs for years.

In ChatGPT, create a dedicated Project. Projects group chats, uploaded reference files and project-level instructions in one place, and project instructions apply only inside that project, overriding your global custom instructions there. Projects also carry built-in memory, chosen at creation as default or project-only. On Enterprise and Edu workspaces the containment is stronger still: project chats cannot reference conversations outside the project, and outside conversations cannot reference them. That walled-garden behaviour suits this use case exactly.

On Claude, the equivalent is a Project on claude.ai, which supports custom project instructions and a project knowledge base of uploaded documents used across all chats in the project, with each project holding its own separate memory space. If your organisation runs Claude Cowork on the desktop, you can go one step further: point a Cowork project at a local folder holding the three files below. Cowork reads and writes files in connected folders directly, folder instructions add project-specific context, and project memory is scoped so what Claude learns in your Performance project does not carry over to others. Whichever tool you use, the architecture is identical: one instruction set, three files, one contained memory.

The project instructions

The instructions are the contract between you and the assistant. They tell it what this project is for, how to record what you say, and what it must never do. Paste this blueprint in as your project instructions (ChatGPT) or project or folder instructions (Claude), and adjust the file names only if you rename the files themselves.

Project instructions

You are my performance evidence assistant. This project exists to
help me keep an accurate, evidence-based record of my work
performance against my documented KPIs.

Rules:
1. My KPIs, role context and capability expectations live in
   kpi-reference.md. Treat that file as the fixed reference point
   for everything else in this project.
2. When I log achievements, record them in the entry format used in
   achievements-log.md: date, what happened, situation and task in
   one or two lines, the action I took, the measurable result, the
   KPI it maps to, and where the evidence lives.
3. Feedback goes in feedback.md in its entry format: verbatim where
   possible, dated, source identified by role only.
4. Record what I tell you faithfully. Never embellish, inflate
   numbers, or add claims I did not make. If something I say is
   vague, ask me for the specific detail instead of filling the gap.
5. When you summarise or draft, flag any KPI with thin or no
   evidence. Do not paper over gaps.
6. De-identify people. Refer to colleagues by role, for example
   [TEAM MEMBER A] or [SENIOR MANAGER], never by name. Warn me if I
   paste in personal details about anyone else.
7. Keep all drafts in my voice: plain, specific, first person,
   no hype.

File 1: kpi-reference.md

This is the fixed reference point: what your year is actually judged against. Copy the relevant extract from your position description, your agreed goals, or your performance agreement into this structure. Be literal. If your KPIs have weightings, include them; the prompt chains use weightings to decide where the strongest evidence matters most. Update this file only when your goals formally change.

kpi-reference.md

# KPI reference: [YOUR NAME], [ROLE TITLE]
Review period: [START DATE] to [END DATE]
Last updated: [DATE]

## Role summary
Compliance analyst responsible for monitoring obligations under the
organisation's compliance framework, running scheduled control
testing, and reporting breaches and incidents to the risk committee.

## KPIs
1. Control testing programme: complete 100 per cent of scheduled
   control tests each quarter. Weighting: 30 per cent.
2. Incident reporting: all reportable incidents logged within 2
   business days of identification. Weighting: 20 per cent.
3. Obligation register: register reviewed and updated monthly, zero
   overdue reviews. Weighting: 20 per cent.
4. Stakeholder training: deliver 4 compliance training sessions per
   year with average feedback of 4.0 or higher. Weighting: 15 per cent.
5. Process improvement: deliver at least 2 documented process
   improvements per year. Weighting: 15 per cent.

## Capability expectations
- Communicates findings clearly to non-specialist audiences
- Works effectively across teams without requiring escalation
- Develops others through coaching and documentation
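If your KPIs carry weightings, it is worth confirming they add up before the file becomes the reference point for everything else. The following is a minimal sketch, assuming weightings are written in the "Weighting: N per cent" form used in the example file; the function name `kpi_weightings` is hypothetical and not part of any tool.

```python
import re

# KPI lines copied from the example kpi-reference.md in this module.
KPI_TEXT = """\
1. Control testing programme: complete 100 per cent of scheduled
   control tests each quarter. Weighting: 30 per cent.
2. Incident reporting: all reportable incidents logged within 2
   business days of identification. Weighting: 20 per cent.
3. Obligation register: register reviewed and updated monthly, zero
   overdue reviews. Weighting: 20 per cent.
4. Stakeholder training: deliver 4 compliance training sessions per
   year with average feedback of 4.0 or higher. Weighting: 15 per cent.
5. Process improvement: deliver at least 2 documented process
   improvements per year. Weighting: 15 per cent.
"""


def kpi_weightings(text: str) -> list[int]:
    """Return every 'Weighting: N per cent' value, in file order."""
    return [int(n) for n in re.findall(r"Weighting:\s*(\d+)\s*per cent", text)]


weights = kpi_weightings(KPI_TEXT)
total = sum(weights)
print(weights, total)
if total != 100:
    print(f"Weightings sum to {total}, not 100: check kpi-reference.md")
```

The pattern only matches numbers that follow the word "Weighting", so the "100 per cent" target inside KPI 1 is not counted.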

File 2: achievements-log.md

This is the running evidence log, the heart of the system. Every entry follows the same shape so the prompt chains can build claims without guesswork. The detail fields borrow the STAR structure (Situation, Task, Action, Result), which MIT Career Advising documents for behavioural interviews. MIT's guidance weights the telling: roughly 20 per cent situation, 10 per cent task, 60 per cent action and 10 per cent result, with quantifiable results highlighted. MIT frames STAR for interviews; the same structure works just as well for review evidence, because both do the same job: proving a contribution happened and mattered. The evidence pointer is the field people skip and later regret. It records where proof lives (a dated email, a report name, a dashboard), described rather than attached.

achievements-log.md

# Achievements log: [REVIEW PERIOD]
One entry per achievement. Newest first. Every entry maps to a KPI
or a capability expectation from kpi-reference.md.

---
Date: [YYYY-MM-DD]
What: [One line: what happened]
Situation/Task: [1-2 lines: the context and what needed doing]
Action: [2-3 lines: what YOU specifically did]
Result: [The measurable outcome. Numbers wherever possible]
KPI: [Which KPI or capability this maps to]
Evidence: [Where proof lives: email dated X, report name, dashboard]
---
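Because every entry uses the same field labels, the log can also be checked mechanically for entries that skipped a field. This is a rough sketch under stated assumptions: entries are separated by `---` lines, each field starts with its label and a colon, and wrapped lines continue the previous field. The names `parse_entries` and `missing_fields` are hypothetical, and the sample is a shortened copy of the October entry shown later in this module.

```python
FIELDS = ("Date", "What", "Situation/Task", "Action", "Result", "KPI", "Evidence")


def parse_entries(text: str) -> list[dict[str, str]]:
    """Split a log into entries, joining wrapped lines onto their field."""
    entries, current, key = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "---":  # entry separator
            if current:
                entries.append(current)
            current, key = {}, None
            continue
        name, sep, rest = line.partition(":")
        if sep and name in FIELDS:  # a new field starts here
            key = name
            current[key] = rest.strip()
        elif key and line:  # continuation of the previous field
            current[key] += " " + line
    if current:
        entries.append(current)
    return entries


def missing_fields(entry: dict[str, str]) -> list[str]:
    """List the fields an entry has left out or left empty."""
    return [field for field in FIELDS if not entry.get(field)]


SAMPLE = """\
# Achievements log: [REVIEW PERIOD]
---
Date: 2026-10-31
What: Cleared the overdue control testing backlog ahead of the
October risk committee meeting
Result: All 9 overdue tests completed before the committee meeting.
KPI: 1 (control testing programme), 5 (process improvement)
---
"""

entries = parse_entries(SAMPLE)
for entry in entries:
    print(entry["Date"], "missing:", missing_fields(entry))
```

One known limitation: a wrapped line that happens to begin with a field label and a colon would be read as a new field, so keep labels at the start of a line only where a field begins.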

File 3: feedback.md

Feedback is evidence too, and it evaporates fastest. This file holds positive and constructive feedback verbatim where possible, with the date and the source's role, never their name. Recording the situation the feedback relates to and its observable effect keeps entries factual rather than editorial; that discipline echoes the Situation-Behaviour-Impact model the Center for Creative Leadership developed for feedback conversations: describe the situation, the actual observable behaviour, and the impact, keeping to facts and leaving out judgements. For constructive feedback, add one more line: what you did about it. A logged correction plus a logged fix is development evidence, and it is far more persuasive than claiming you have no development areas at all.

feedback.md

# Feedback received: [REVIEW PERIOD]
Verbatim where possible. De-identified by role. Newest first.

---
Date: 2026-05-14
From: [SENIOR MANAGER], risk division
Type: Positive, unsolicited, via email
Feedback: "The board summary you prepared was the clearest we have
had this year. The committee asked no clarifying questions."
Context: Q3 obligations report
Evidence: Email dated 14 May 2026, saved in [FOLDER]
---
Date: 2026-04-02
From: [PEER], operations team
Type: Constructive, from project retrospective
Feedback: "The handover notes assumed too much system knowledge. It
took me half a day to get running."
Context: Claims system control testing handover
Action taken: Rewrote the handover template with a prerequisites
section. No repeat issue in the two handovers since.
---

Upload the three files to the project and keep them current. In a Cowork project, Claude updates the local files in place. In ChatGPT, ask the assistant to output the updated file at the end of each logging session, save it, and replace the copy in the project's sources. Keep the files text-forward and focused: one file per job, no embedded screenshots, no pasted documents. The whole system stays small enough to read end to end in minutes.
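Before a file goes into the project, a quick local scan can catch the most obvious de-identification slips. The sketch below is an assumption-heavy illustration, not a privacy tool: it only finds email addresses and names from a list you supply yourself, and the function `privacy_flags`, the sample text and the name "Jordan" are all hypothetical.

```python
import re

# Deliberately simple email pattern; it will miss unusual addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def privacy_flags(text: str, known_names: list[str]) -> list[tuple[int, str]]:
    """Return (line number, reason) pairs for lines that need de-identifying."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        if EMAIL.search(line):
            found.append((number, "email address"))
        for name in known_names:
            if re.search(rf"\b{re.escape(name)}\b", line, flags=re.IGNORECASE):
                found.append((number, f"name: {name}"))
    return found


# Hypothetical draft entry that still contains a name and an address.
draft = """\
From: [SENIOR MANAGER], risk division
Feedback: "Thanks Jordan, the board summary was the clearest this year."
Evidence: Email from j.example@example.com dated 14 May 2026
"""

for number, reason in privacy_flags(draft, ["Jordan"]):
    print(f"line {number}: {reason}")
```

A clean result means only that nothing on your list appeared; it does not prove the text is de-identified, so the read-through still matters.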

Part 2: The 10 minute monthly ritual

The system runs on one recurring calendar appointment: 10 minutes, once a month, in a slot you protect. The first Friday of the month works well because the previous month is complete but still fresh. Book it as a recurring invite to yourself, treat it like a meeting with your future self, and do exactly four things:

1. Open the Performance project and run the monthly logging prompt below.
2. Answer the interview questions from memory, your sent folder and your calendar.
3. Review the formatted entries the assistant produces, correct anything inflated or imprecise, and save them into achievements-log.md and feedback.md.
4. Note any KPI the assistant flags as having no evidence this month, and decide whether the gap is real.
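If you want the twelve session dates up front, for example to check them against leave or public holidays before booking the recurring invite, the first Friday of each month is simple to compute. A minimal sketch using only the Python standard library; the function name `first_friday` is hypothetical.

```python
from datetime import date, timedelta


def first_friday(year: int, month: int) -> date:
    """Return the date of the first Friday in the given month."""
    first = date(year, month, 1)
    # date.weekday() counts Monday as 0, so Friday is 4.
    return first + timedelta(days=(4 - first.weekday()) % 7)


sessions = [first_friday(2026, month) for month in range(1, 13)]
for session in sessions:
    print(session.isoformat())
```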

Prompt: monthly log ritual

It is the first Friday of [MONTH]. Run my monthly performance log.

Interview me one question at a time:
1. What did I ship, finish or deliver this month?
2. What problems did I solve, including quiet fixes nobody saw?
3. What numbers moved: time saved, errors reduced, volumes handled,
   feedback scores, dollars?
4. What feedback did I receive, positive or constructive, and from
   what role?
5. Did I prevent or defend anything: a risk caught, an error
   stopped, a deadline protected?
6. Did I help anyone else deliver: reviews, coaching, handovers?

Then:
- Format each item as an achievements-log.md entry (date, what,
  situation/task, action, result, KPI, evidence pointer).
- Map every entry to a KPI or capability from kpi-reference.md.
- Format any feedback as a feedback.md entry, de-identified by role.
- List every KPI with no new evidence this month, and ask me two
  targeted questions to check whether I have simply forgotten
  something before we accept the gap as real.

After I confirm the entries, append them to the top of
achievements-log.md and feedback.md, newest first. If you cannot
write files directly, output both updated files in full so I can
save them.

A worked example

Here is what one October entry looks like for a de-identified governance, risk and compliance (GRC) analyst, produced from a two-minute answer to question one and a follow-up question about numbers:

achievements-log.md entry: October

Date: 2026-10-31
What: Cleared the overdue control testing backlog ahead of the
October risk committee meeting
Situation/Task: Control testing had fallen 9 tests behind schedule
after two team departures. The backlog risked a red flag at the
October risk committee.
Action: Rebuilt the testing schedule, negotiated two weeks of
support from [TEAM MEMBER A], and automated the evidence request
emails, cutting chase time per test from about 40 minutes to
under 10.
Result: All 9 overdue tests completed before the committee meeting.
Testing programme back to 100 per cent on schedule. The automation
is now standard practice for the team.
KPI: 1 (control testing programme), 5 (process improvement)
Evidence: October 2026 committee pack; testing tracker snapshot;
email from [RISK MANAGER] dated 2026-10-29

Why monthly beats quarterly cramming

Recency bias does not only affect your manager. It affects you. By month three, the specifics from month one are gone: the exact number of overdue tests, the minutes saved per email, who said what and when. Culture Amp's counter-measure for reviewers, collecting data points throughout the year so the end-of-year view covers the entire period, applies with equal force to your own record; this module simply extends that logic from the manager's side of the table to yours. A monthly entry is a contemporaneous record, written while the numbers are still checkable.

The Fair Work Ombudsman's underperformance guidance runs on the same logic from the employer's side: write down examples of behaviour at the time, note when they occurred and why they matter, and gather the documents that demonstrate them. The regulator recommends contemporaneous documentation because memory does not survive contest. The same principle that protects an employer in a dispute protects your achievements at review time. There is also a plain scheduling reason: a 10 minute monthly slot survives busy periods. A 90 minute quarterly session is precisely the kind of appointment that gets moved four times and then abandoned, which quietly converts your evidence system back into a memory test.

When the year goes wrong

Some years do not cooperate. A restructure removes the project you were hired to run, a budget freeze cancels the work at phase one, a KPI becomes unmeasurable because the system behind it was decommissioned. The instinct is to stop logging, because nothing feels like an achievement. Log anyway, and log differently: record what was delivered before the change, the decision trail showing the cause sat outside your control, what you preserved or prevented during the disruption, and the skills the salvage work demonstrated. The Fair Work Ombudsman's documentation logic runs both ways here: the contemporaneous records the regulator recommends because they protect an employer in a dispute are the same records that protect an employee when performance is contested. A dated entry written the week a project was cancelled is worth far more than an explanation assembled months later under pressure.

Here is what a defensive entry looks like for the same GRC analyst, written the week the news landed rather than reconstructed in review week:

achievements-log.md entry: a defensive record

Date: 2026-09-04
What: Vendor risk dashboard project cancelled by the August
budget freeze after phase one delivery
Situation/Task: Commissioned in May to consolidate vendor risk
reporting across three business units. The organisation-wide
budget freeze announced on 28 August cancelled phases two and
three.
Action: Delivered phase one on schedule before the freeze: the
consolidated vendor register covering all three units, plus the
data quality rules that keep it clean. Documented the phase two
build plan and handed it to [RISK MANAGER] so the work can
restart without rework when funding returns.
Result: Register in production use by two of the three units at
cancellation. The cancellation decision and its rationale are
recorded in the steering note of 2 September; the freeze applied
organisation wide and was outside my control.
KPI: 5 (process improvement); capability: works effectively
across teams
Evidence: Phase one acceptance email dated 2026-08-21; steering
note dated 2026-09-02

At review time, that entry converts a cancelled project from a hole in the year into evidence: phase one landed on schedule, the cancellation was organisational, and the handover means the investment is recoverable. The same honesty applies to plain gaps. A KPI that produced nothing because the scope was cut reads very differently from a KPI that produced nothing with no explanation. Silence at review time gets filled by assumption; a dated record gets read.

Figure: timeline of the 10 minute monthly logging ritual across a review year. Caption: Twelve short sessions replace one desperate reconstruction in review week.

Part 3: The prompt chains, from log to review pack

When a check-in or a review lands, you do not start writing. You run a chain. Each chain is a sequence of prompts run inside the Performance project, where every step reads the same three files and every claim traces back to a log entry. Two chains cover the review calendar: a three-step quarterly check-in pack and a four-step annual review pack.

Chain 1: The quarterly check-in pack

Three steps: synthesise, draft, attack. The order matters. Drafting before synthesis invites claims the evidence cannot carry; skipping the attack step sends your manager a document nobody has stress-tested.

Step 1: Synthesise the quarter

Prompt: quarterly synthesis

Read every achievements-log.md entry dated between [QUARTER START]
and [QUARTER END], plus any feedback.md entries from the same
period.

Synthesise the quarter against kpi-reference.md:
1. For each KPI: list the entries that evidence it, with the
   strongest metric from each.
2. Group the entries into 3 to 5 achievement themes.
3. Flag every KPI with thin or no evidence this quarter. Do not
   soften the gaps.
Output a working summary, not polished prose.
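The synthesis step is something you can sanity-check yourself, because it is mostly a date filter and a grouping. The sketch below assumes entries have already been read into dictionaries and that the KPI field uses the "1 (control testing programme)" style from the worked example; `synthesise` is a hypothetical helper, and the second sample entry is invented purely to show the date filter.

```python
import re
from collections import defaultdict
from datetime import date


def synthesise(entries, start, end, kpi_count):
    """Group in-period entries by KPI number and list KPIs with no entries."""
    by_kpi = defaultdict(list)
    for entry in entries:
        if start <= date.fromisoformat(entry["Date"]) <= end:
            # Pull each KPI number that is followed by a bracketed label.
            for number in re.findall(r"(\d+)\s*\(", entry["KPI"]):
                by_kpi[int(number)].append(entry["What"])
    gaps = [k for k in range(1, kpi_count + 1) if k not in by_kpi]
    return dict(by_kpi), gaps


entries = [
    {   # From the October worked example in this module.
        "Date": "2026-10-31",
        "What": "Cleared the overdue control testing backlog",
        "KPI": "1 (control testing programme), 5 (process improvement)",
    },
    {   # Hypothetical entry outside the quarter, to show the filter.
        "Date": "2026-06-30",
        "What": "Monthly obligation register review completed",
        "KPI": "3 (obligation register)",
    },
]

by_kpi, gaps = synthesise(entries, date(2026, 10, 1), date(2026, 12, 31), kpi_count=5)
print(by_kpi)
print("No evidence this quarter:", gaps)
```

If the assistant's synthesis names a KPI as evidenced that this kind of check shows as empty, treat that as a prompt to re-read the log before the draft goes any further.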

Step 2: Draft the one-pager

Prompt: check-in one-pager

Turn the quarterly summary into a one-page check-in document with
four sections:
1. Wins: 3 to 5 achievements, each one sentence, each with a number.
2. Risks and blockers: anything threatening next quarter's KPIs,
   stated factually.
3. Support needed: specific asks of my manager, not complaints.
4. Next quarter focus: 3 priorities mapped to KPIs.
Keep it under 400 words, in plain language. Every claim must trace
to a log entry; add nothing that is not in the log.

Step 3: Red-team the draft

Prompt: red-team the draft

Red-team the check-in one-pager as a sceptical manager would:
1. Which claims would a manager challenge, and what would they ask?
2. Where is the evidence thin, secondhand, or missing a number?
3. Which win is really a team result I am claiming alone, and how
   should the wording share credit accurately?
4. What is missing that a manager who watched my work this quarter
   would expect to see?
List each challenge, then suggest the edit that answers it.

What the finished one-pager looks like

Here is the chain's end product for the same GRC analyst, generated from the October to December quarter of the log and tightened through the red-team pass. Notice two things. First, every win carries its number and its KPI, so nothing rests on adjectives. Second, the thin KPI is named honestly in the risks section rather than papered over: incident reporting produced one entry all quarter, so the document says so and converts the weakness into a specific support ask. That honesty is what makes the strong claims credible.

Worked example: quarterly check-in one-pager

Quarterly check-in: [YOUR NAME], compliance analyst
Period: 1 October to 31 December 2026

Wins
1. Cleared the 9-test control testing backlog before the October
   risk committee; programme back to 100 per cent on schedule
   (KPI 1).
2. Automated the evidence request emails, cutting chase time per
   test from about 40 minutes to under 10; now standard practice
   for the team (KPI 5).
3. Delivered the quarter's compliance training session to 34
   staff with average feedback of 4.4 against the 4.0 target
   (KPI 4).
4. Obligation register reviewed on time in all three months, zero
   overdue reviews (KPI 3).

Risks and blockers
- Incident reporting (KPI 2) has thin evidence this quarter: one
  reportable incident, logged within 2 business days, but intake
  still routes through a shared mailbox nobody owns after hours.

Support needed
- A decision on out-of-hours ownership of the incident mailbox
  before the December shutdown period.

Next quarter focus
1. Hold the testing programme at 100 per cent through the
   January leave season (KPI 1).
2. Propose a monitored intake queue to replace the shared
   incident mailbox (KPI 2).
3. Scope the second documented process improvement for the year
   (KPI 5).

Chain 2: The annual review pack

The annual chain keeps the synthesis and drafting steps and adds two passes of its own, because the stakes are higher. HBR guidance from Marlo Lyons is blunt about why the self-assessment deserves this effort: it 'will set the tone for your manager's evaluation of your work', which can affect remuneration outcomes such as merit increases and bonuses, and it should cover the entire year rather than just recent work. Four steps: synthesise the year, draft in the employer's template, sharpen the impact statements, then write the development section honestly.

Step 1: Full-year synthesis

Prompt: full-year synthesis

Read the full achievements-log.md and feedback.md for the review
period [START DATE] to [END DATE].

Produce a full-year synthesis mapped KPI by KPI:
1. For each KPI in kpi-reference.md: every supporting entry, the 2
   or 3 strongest results with numbers, and the trend across the
   year (improving, steady, declining).
2. A capability evidence section: entries demonstrating each
   capability expectation.
3. An honest gaps list: KPIs or capabilities where the year's
   evidence is thin.
4. Themes worth naming: threads that run across multiple entries.

Step 2: Self-assessment draft

Prompt: self-assessment draft

Draft my self-assessment using my employer's template structure,
which I will paste below. Rules:
- Cover the whole review period evenly, not just recent months.
- Every claim traces to a log entry. Nothing new, nothing inflated.
- Lead each section with the strongest quantified result.
- Where the template asks for ratings or reflections, draft in my
  voice and mark them [FOR MY REVIEW] so I decide the final wording.

[PASTE EMPLOYER TEMPLATE STRUCTURE HERE]

Step 3: Impact statements pass

Prompt: impact statements pass

Rewrite the 6 to 8 key achievements in the draft as impact
statements using the STAR structure: situation and task in one
line, action in one or two lines, result with the quantified
outcome. Weight the wording towards the action and the result. Cut
adjectives; keep numbers. Flag any statement where the result is
unquantified so I can either find the number or soften the claim.

Step 4: Gap and growth

Prompt: gap and growth

Draft the development section of my self-assessment:
1. From the honest gaps list, pick the 2 or 3 development areas
   that matter most against my KPIs.
2. Frame each one factually: the gap, why it matters to the role,
   and the actions I have already taken or will take (training,
   coaching, changed process).
3. No self-flagellation and no spin. A named gap with an action
   plan reads as self-awareness; a hidden gap found by my manager
   reads as a blind spot.

Figure: flow diagram of the quarterly and annual review pack prompt chains. Caption: From raw log entries to a red-teamed review pack, one prompt per step.

Part 4: Negotiation prep

The same log that feeds your review feeds a remuneration or promotion case, and the preparation rules are well documented. Harvard's Program on Negotiation advises benchmarking your market value from multiple sources (salary databases, industry associations, recruiter conversations and professional networks) and building the case on measurable accomplishments: revenue generated or costs saved, efficiency improvements, leadership contributions and expanded responsibilities. It is equally clear about what to avoid: framing the request around personal expenses or financial stress. Need explains why you want more. It never explains why you have earned it.

The division of labour here is strict. The AI can assemble your strongest quantified contributions, surface evidence of scope growth, and anticipate the objections a manager is likely to raise, because all of that lives in your log. What it must never do is supply market data. A language model asked for salary benchmarks will produce plausible numbers with no provenance, and a single fabricated benchmark discredits an otherwise solid case. The prompt below deliberately leaves the market figures as placeholders you fill from real sources you have checked yourself.

Prompt: negotiation case

Build a one-page case for a remuneration review from my
achievements log. Structure:
1. My 3 strongest quantified contributions this review period,
   each one sentence, each with the number and the KPI it served.
2. Scope growth: evidence that my responsibilities have expanded
   beyond what kpi-reference.md describes, with dates.
3. Market position: leave the placeholders [MARKET BENCHMARK
   SOURCE 1] and [MARKET BENCHMARK SOURCE 2] exactly as written.
   Do not estimate or invent market figures. I will source actual
   benchmarks myself.
4. Anticipated objections: the 3 most likely responses (timing,
   budget, parity) with a factual, non-defensive reply to each,
   drawn only from the log.
Frame everything on contribution and market evidence. Remove
anything that argues from personal financial need.
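Since the rule is that the assistant must not supply market data, it helps to check the draft it returns before trusting it. The following is a small sketch, assuming the two placeholder strings from the prompt above; `check_case` is a hypothetical helper and the sample drafts are invented for illustration. It flags every dollar figure, including legitimate ones from your log, so each can be traced to a source.

```python
import re

PLACEHOLDERS = ("[MARKET BENCHMARK SOURCE 1]", "[MARKET BENCHMARK SOURCE 2]")
MONEY = re.compile(r"\$\s?\d+(?:,\d{3})*(?:\.\d+)?[kK]?")


def check_case(draft: str) -> list[str]:
    """Report missing placeholders and any dollar figures that need a source."""
    flat = " ".join(draft.split())  # undo line wrapping inside placeholders
    problems = [f"placeholder missing: {p}" for p in PLACEHOLDERS if p not in flat]
    problems += [f"dollar figure to trace: {m}" for m in MONEY.findall(flat)]
    return problems


# Hypothetical drafts: the first follows the prompt, the second does not.
good = "3. Market position: [MARKET BENCHMARK\nSOURCE 1] and [MARKET BENCHMARK SOURCE 2]."
bad = "3. Market position: the median is $95,000 per [MARKET BENCHMARK SOURCE 1]."

print(check_case(good))
print(check_case(bad))
```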

Treat the output as your preparation document first and a handover document second. Rehearse the objection responses out loud, fill the benchmark placeholders with sourced figures, and only then decide whether a written version goes to your manager or the case is made in conversation with the document as your private script.

Figure: structure of an evidence-based remuneration case built from the achievements log. Caption: Quantified contributions and market benchmarks, never personal financial need.

Part 5: The leader's side

If you lead a team, the same discipline scales up, and the boundaries get harder. A leader running six or eight reviews faces the same memory problem multiplied: a year of one-on-ones, incidents, wins and course corrections per person, most of it undocumented or scattered. AI can carry a real share of the preparation load. It cannot carry any share of the judgement.

Figure: split diagram comparing the employee-led evidence system with leader-side AI support. Caption: The same discipline on both sides of the review table, with different boundaries.

What works

Synthesising the year: feed in a year of your own one-on-one notes per person, working only inside your employer-approved enterprise tool and replacing names with labels such as [TEAM MEMBER A] before the notes go anywhere near it. Both conditions apply, because role labels alone do not truly de-identify a year of notes about one person. Ask for themes, trends and evidence gaps. You verify the themes against your records; the synthesis just saves you the re-reading.
Consistent structures: draft a common review skeleton once and reuse it, so every team member is assessed against the same headings rather than whichever format survived from last year.
Vagueness checking: run each draft review through a pass that flags unevidenced or vague statements, the "good team player" and "needs to be more strategic" filler that says nothing and defends nothing.
Language comparison, as analysis support only: compare wording across draft reviews for consistency and bias signals, then re-examine the underlying evidence yourself wherever the comparison flags a mismatch.

Prompt: review draft consistency check

Here are my de-identified draft reviews for [N] team members,
labelled [TEAM MEMBER A] through [TEAM MEMBER N]. As analysis
support only:
1. Flag every statement that is vague or carries no evidence
   ("good team player", "needs to be more strategic").
2. Compare language across the drafts: note where similar
   performance is described in stronger or weaker terms, and any
   pattern in who gets outcome language versus effort language.
3. List the questions I should answer from my own records before
   finalising each review.
Do not score, rank or rate anyone. Ratings are my decision alone.

[PASTE THE DE-IDENTIFIED DRAFTS BELOW]
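The first item in that prompt, flagging vague or unevidenced statements, can be approximated locally before any draft is pasted anywhere. This is a crude sketch under stated assumptions: the phrase list comes from the examples in this module, the presence of a digit stands in for "cites a number", and `review_flags` and both sample drafts are hypothetical. It flags wording only and does not score, rank or rate anyone.

```python
import re

# Phrases this module gives as examples of filler.
VAGUE_PHRASES = (
    "good team player",
    "needs to be more strategic",
    "reliable",
    "positive attitude",
)


def review_flags(drafts: dict[str, str]) -> dict[str, dict]:
    """For each de-identified draft, list vague phrases and note any numbers."""
    report = {}
    for label, text in drafts.items():
        lowered = text.lower()
        report[label] = {
            "vague": [p for p in VAGUE_PHRASES if p in lowered],
            "has_number": bool(re.search(r"\d", text)),
        }
    return report


# Hypothetical drafts, already de-identified.
drafts = {
    "[TEAM MEMBER C]": "Reliable, with a positive attitude. A good team player.",
    "[TEAM MEMBER E]": "Completed all 12 scheduled control tests on time.",
}

for label, flags in review_flags(drafts).items():
    print(label, flags)
```

Substring matching is blunt (it would also flag "unreliable"), so treat each hit as a question to answer from your own records rather than a finding.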

A worked example: six reviews, one consistency pass

Consider a leader with six direct reports at annual review time. The inputs are the leader's own materials: six draft reviews and a year of one-on-one notes, de-identified to [TEAM MEMBER A] through [TEAM MEMBER F] before anything is pasted, inside the employer-approved enterprise tool. No ratings go in, and none are requested.

The consistency pass comes back with three flags. First, the draft for [TEAM MEMBER C] is built almost entirely on vague adjectives, 'reliable', 'positive attitude', 'good team player', with no evidenced example anywhere in the document, while the draft for [TEAM MEMBER E] cites a metric in every paragraph. Second, the drafts for [TEAM MEMBER B] and [TEAM MEMBER F] describe near-identical contributions, both ran a control uplift project to completion, but one is described as having transformed the control environment while the other merely completed assigned remediation tasks. Third, the pass lists the questions the leader should answer from their own records before finalising: which of C's claimed strengths actually appear in the one-on-one notes, and what specifically distinguishes B's project from F's.

What the leader does next is the point of the exercise. The [TEAM MEMBER C] draft is rewritten against the notes file, replacing each adjective with a dated example or deleting the claim it decorated. The B and F drafts are moderated so the language matches the actual gap between them, which the evidence shows is far smaller than the wording implied. The AI flagged the inconsistencies; the leader re-read the evidence and made every judgement. The ratings were never in the tool at all.

Fairness is procedural

'Best practice employers have regular discussions with employees about performance. They set clear goals and provide feedback and support to help employees perform at their best.'

That is the Fair Work Ombudsman's Managing Underperformance guide, and the rest of it is just as concrete: set clear performance expectations and record individual goals, provide regular, specific and timely feedback, conduct performance reviews every few months, and write things down, including examples of behaviour, when they occurred and the documents that demonstrate them. The guide lists the avoidance of legal disputes, such as unfair dismissal or bullying claims, among the benefits of getting this right. Read in reverse, that is the risk statement: procedurally poor performance management, the rushed review, the surprise rating, the undocumented history, is exactly what turns a performance problem into a claim. AI makes the documentation lighter. It does not substitute for a single one of the conversations.

Boundary diagram showing which review decisions must never be automated: ratings, termination rationale and remuneration decisions stay human.

What never to automate

Three decisions stay human, without exception: performance ratings, termination rationale, and remuneration decisions. This is not a stylistic preference; it is where Australian policy is visibly heading. The House of Representatives Standing Committee's Future of Work report, tabled in February 2025, recommended that AI systems used for employment related purposes, including recruitment, referral, hiring, remuneration, promotion, training, apprenticeship, transfer or termination, be classified as high-risk, and that the Fair Work Act be reviewed so decision making using AI and ADM is covered 'and employers remain liable for these decisions'. Those are committee recommendations, not yet law, but the direction of travel is unambiguous, and 'employers remain liable' is the sentence every leader should keep in view.

The privacy side is already legislated. The OAIC confirms that from 10 December 2026, entities using personal information in automated decision-making with the potential to affect rights or interests must set out in their privacy policies the kinds of personal information used and the kinds of decisions made. It is a transparency obligation, not a prohibition, but it means employment decisions made using automated decision-making become disclosable rather than invisible. And the federal employment department applies the same standard to itself: DEWR's AI transparency statement commits to human-in-the-loop mechanisms embedded at critical stages. The practical boundary for a leader is simple to state and easy to audit: use AI to prepare, structure and check. Decide alone. If an AI-generated line appears in a review you sign, it is your line, with everything that follows from that.

Part 6: Privacy guardrails

Four rules keep the system safe to run at work. They are not optional extras; they are the licence conditions.

No colleague personal data, ever. De-identify by role: [TEAM MEMBER A], [SENIOR MANAGER], [PEER]. The log records your work, not your judgements about other people, and the project instructions tell the assistant to warn you if identifying detail slips in.
Employer-approved tools only. Enterprise plans carry materially different data controls from consumer accounts: OpenAI states that it does not train its models on ChatGPT Enterprise business data by default and that workspace admins control retention. Confirm the equivalent data-control settings on whichever assistant your organisation has approved before the log holds a single entry.
Check your organisation's AI and acceptable-use policies first. If policy says work information stays out of AI tools, respect that: the manual brag document still works, and the file templates in Part 1 run perfectly well in a plain text editor.
Evidence pointers, not evidence. Reference where proof lives ("October 2026 committee pack", "email dated 14 May") instead of pasting sensitive documents into the log. The pointer is enough to retrieve the proof when you need it.
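The first rule can be backed by a quick local check before an entry goes into the log. The sketch below is a hedged illustration only: the function name and both patterns are assumptions, and it catches just two obvious slips (an email address, or an honorific followed by a capitalised name). It will miss plain first names, so it supplements the assistant's warning and your own read-through rather than replacing them.

```python
from __future__ import annotations

import re

# Assumed patterns for obvious identifying detail; both are illustrative.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HONORIFIC_NAME = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.?\s+[A-Z][a-z]+")


def find_identifying_detail(entry: str) -> list[str]:
    """Return substrings of a log entry that look like identifying detail.

    Role placeholders such as [TEAM MEMBER A] or [SENIOR MANAGER] are not
    matched, so a properly de-identified entry returns an empty list.
    """
    hits = [match.group(0) for match in EMAIL.finditer(entry)]
    hits += [match.group(0) for match in HONORIFIC_NAME.finditer(entry)]
    return hits
```

An empty result does not prove an entry is clean; a non-empty one is a clear signal to swap the detail for a role placeholder or an evidence pointer before saving.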

The compound payoff

Twelve entries at 10 minutes each is two hours of effort across a year. What it buys: a review pack that assembles in an afternoon instead of a lost weekend, a negotiation case grounded in numbers, a permanently current source for your CV, and something rarer than any of those, an honest record of the year including its gaps. The difference between you and a colleague of identical performance is that your year is documented and theirs is remembered. Reviews reward the documented one.

The assessment below tests whether you can run this system, not merely describe it: the three files, the monthly mechanics, the prompt chains, the negotiation rules, and the boundaries that keep the leader's side lawful and fair. Work through it before you build.

The AI-Powered Performance Review System

What you'll be able to do