These AI performance review survey questions help you measure how people experience AI-generated drafts, summaries, and suggestions in performance reviews, separately for employees and for managers. You’ll spot trust, fairness, and privacy issues early, and you’ll get clear thresholds that tell you when to pause, train, or tighten guardrails.
Survey questions
Use a 5-point Likert scale for the statements (1 = Strongly disagree, 5 = Strongly agree). Numbering below is for analysis and follow-up: E = Employees, M = Managers, S = Shared.
Employees (E1–E6) — Awareness & transparency
- (Employees, E1) I understand when AI is used in my performance review process.
- (Employees, E2) I know which parts of my review may include AI-generated text (drafts, summaries, suggested wording).
- (Employees, E3) I was told what the AI can and cannot do in performance reviews.
- (Employees, E4) I know who is accountable for the final review content (not the AI).
- (Employees, E5) I was informed if AI influenced ratings, calibration input, or performance labels.
- (Employees, E6) The company explained the reason for using AI in reviews in plain language.
Employees (E7–E12) — Quality & usefulness of AI-assisted feedback
- (Employees, E7) AI-assisted feedback in my review was specific to my actual work.
- (Employees, E8) The feedback included concrete examples or evidence, not just generic phrases.
- (Employees, E9) The feedback was consistent with what I heard in 1:1s during the cycle.
- (Employees, E10) The feedback clearly separated facts, interpretations, and expectations.
- (Employees, E11) The feedback helped me understand priorities for the next 3–6 months.
- (Employees, E12) The tone of the feedback felt respectful and professional.
Employees (E13–E18) — Fairness & bias perceptions
- (Employees, E13) AI-assisted feedback made the review feel more fair than fully manual feedback.
- (Employees, E14) I worry the AI may amplify bias (e.g., proximity bias, similarity bias, stereotypes).
- (Employees, E15) The AI-assisted feedback accurately reflected my contributions (not just visible work).
- (Employees, E16) I felt the same performance standard was applied to me as to comparable peers.
- (Employees, E17) I worry AI could misread context (e.g., parental leave, part-time, project changes).
- (Employees, E18) The review language avoided “coded” or ambiguous terms (e.g., “not assertive enough”).
Employees (E19–E24) — Psychological safety & trust
- (Employees, E19) I felt comfortable asking whether AI was used in my review.
- (Employees, E20) I felt comfortable challenging AI-influenced wording during the review conversation.
- (Employees, E21) My manager was open to correcting errors in the review content.
- (Employees, E22) I trust that AI use did not reduce my chance to be heard as a person.
- (Employees, E23) I believe my manager reviewed and owned the final feedback (not “copy-paste”).
- (Employees, E24) I know how to escalate concerns if AI-assisted feedback feels wrong or unfair.
Employees (E25–E30) — Data protection, privacy & consent
- (Employees, E25) I understand what data sources may be used by AI in the review process.
- (Employees, E26) I understand whether chat inputs, notes, or 360° comments can be used as AI inputs.
- (Employees, E27) I trust that sensitive personal data is not used for AI prompts in reviews.
- (Employees, E28) I understand where the data is processed/stored (e.g., EU/EEA) at a high level.
- (Employees, E29) I believe AI use in reviews follows GDPR principles (data minimisation, purpose limitation).
- (Employees, E30) I know the retention period for AI-related review artifacts (drafts, logs, summaries).
Employees (E31–E36) — Overall impact & preference
- (Employees, E31) AI-assisted reviews improved the clarity of expectations for me.
- (Employees, E32) AI-assisted feedback made the review feel more consistent across the company.
- (Employees, E33) AI-assisted feedback made the review feel less personal for me.
- (Employees, E34) I would prefer AI to be used only for drafting, not for rating suggestions.
- (Employees, E35) I would prefer AI to be used only with clear human review checkpoints.
- (Employees, E36) Overall, AI use improved my review experience this cycle.
Managers (M1–M6) — Onboarding & training
- (Managers, M1) I received training on where AI can be used in reviews and where it cannot.
- (Managers, M2) The training covered how to verify AI outputs with evidence (projects, outcomes, behaviors).
- (Managers, M3) The training covered GDPR-safe prompting (what data must not be entered).
- (Managers, M4) I understand how to explain AI use transparently to employees.
- (Managers, M5) I know what to do when an employee challenges AI-influenced wording.
- (Managers, M6) I feel prepared to use AI without weakening psychological safety in my team.
Managers (M7–M12) — Workflow & time impact
- (Managers, M7) AI reduced the time I needed to prepare reviews.
- (Managers, M8) AI helped me structure feedback faster (strengths, gaps, next steps).
- (Managers, M9) AI improved my ability to summarise 360° feedback without missing key points.
- (Managers, M10) AI increased my admin work due to extra checking and rewriting.
- (Managers, M11) AI helped me keep feedback consistent across multiple direct reports.
- (Managers, M12) AI support improved the quality of my review conversations.
Managers (M13–M18) — Quality of drafts & summaries
- (Managers, M13) AI-generated drafts were accurate enough to be a good starting point.
- (Managers, M14) The drafts included measurable outcomes or observable behaviors when prompted.
- (Managers, M15) AI helped avoid vague feedback by pushing for specifics.
- (Managers, M16) AI summaries captured context correctly (scope changes, constraints, dependencies).
- (Managers, M17) AI outputs matched our internal rubric and competency language.
- (Managers, M18) AI outputs avoided biased language without hiding performance issues.
Managers (M19–M24) — Judgment, oversight & human accountability
- (Managers, M19) I feel confident editing or rejecting AI suggestions.
- (Managers, M20) I consistently verify AI outputs against evidence before sharing with employees.
- (Managers, M21) I can explain the rationale for my final feedback without referring to AI.
- (Managers, M22) AI never overrides my judgment on ratings or performance outcomes.
- (Managers, M23) I understand the risk of over-relying on AI in sensitive people decisions.
- (Managers, M24) I know how to document decisions in an audit-ready way.
Managers (M25–M30) — Governance & guardrails
- (Managers, M25) The company has clear dos/don’ts for AI in performance reviews.
- (Managers, M26) I know which data must never be used in prompts (health, union, protected data).
- (Managers, M27) I know whether a works agreement (Dienstvereinbarung) or works council (Betriebsrat) agreement applies to this AI use case.
- (Managers, M28) I know who to contact when the tool output seems risky or wrong (HR/IT/Data Protection).
- (Managers, M29) Our process includes clear human checkpoints before anything impacts an employee.
- (Managers, M30) AI usage is logged in a way that supports transparency and accountability.
Managers (M31–M36) — Fairness, consistency & calibration support
- (Managers, M31) AI helped me apply our performance standards more consistently.
- (Managers, M32) AI increased the risk of “template feedback” that flattens differences between people.
- (Managers, M33) AI made it easier to spot missing evidence before calibration discussions.
- (Managers, M34) AI made it easier to avoid common review biases (recency, halo/horn, proximity).
- (Managers, M35) I worry AI could introduce new bias through training data or wording patterns.
- (Managers, M36) AI improved the quality of inputs I bring to calibration sessions.
Managers (M37–M42) — Overall confidence & willingness to continue
- (Managers, M37) I trust the tool’s outputs when used with careful human review.
- (Managers, M38) I feel comfortable being transparent with employees about AI use.
- (Managers, M39) I would use AI again for drafting feedback in the next cycle.
- (Managers, M40) I would use AI again for summarising 360° feedback in the next cycle.
- (Managers, M41) I would avoid using AI for rating suggestions unless governance improves.
- (Managers, M42) Overall, AI made my review work more effective this cycle.
Shared (S1–S6) — Experience across both audiences
- (Shared, S1) AI use in reviews is communicated in a clear, consistent way across teams.
- (Shared, S2) People affected by AI in reviews can give feedback without negative consequences.
- (Shared, S3) The process makes it easy to correct mistakes in AI-assisted review content.
- (Shared, S4) AI use in reviews aligns with our performance review rubric and expectations.
- (Shared, S5) AI improves the quality of review conversations (not just the paperwork).
- (Shared, S6) I trust the organisation’s guardrails for AI in performance reviews.
Overall ratings (0–10)
- (Employees) How much do you trust AI-assisted feedback in performance reviews? (0–10)
- (Employees) How much did AI improve the quality of feedback you received this cycle? (0–10)
- (Employees) How likely are you to recommend AI-assisted feedback in performance reviews to a colleague? (0–10)
- (Managers) How confident are you using AI in reviews without harming fairness? (0–10)
- (Managers) How much did AI improve your review preparation efficiency this cycle? (0–10)
- (Managers) How likely are you to recommend the current AI review workflow to another manager? (0–10)
Open-ended questions (12 total)
- (Employees) Where did AI-assisted feedback feel most accurate and helpful?
- (Employees) Where did AI-assisted feedback feel generic, wrong, or out of context?
- (Employees) Which sentence or section would you rewrite to better reflect your work?
- (Employees) What would make you feel safer challenging AI-influenced wording in a review conversation?
- (Employees) What is your biggest concern about data use or privacy in AI-assisted reviews?
- (Managers) In which part of the review workflow did AI save you the most time?
- (Managers) Where did AI create extra work (rewrites, verification, back-and-forth)?
- (Managers) What guardrail would prevent your biggest AI risk in reviews?
- (Managers) What training topic would most improve your AI use in performance feedback?
- (Shared) What should we stop doing with AI in reviews, starting next cycle?
- (Shared) What should we keep doing with AI in reviews because it clearly works?
- (Shared) If you could change 1 rule about AI in reviews, what would it be?
| Question(s) / area | Score / threshold | Recommended action | Owner | Due |
|---|---|---|---|---|
| E1–E6 (Transparency) + S1 | Avg <3,2 or ≥20% “Disagree” | Publish a 1-page AI-in-reviews explainer; manager talking points; add “AI used: yes/no” label in forms. | HR + Comms | Within 14 days |
| E7–E12 (Quality) + E36 | Avg <3,0 | Collect 10 anonymised “bad vs good” examples; refine prompts; require 2 evidence bullets per review section. | HR + Pilot managers | Within 21 days |
| E13–E18 (Fairness) + S6 | Avg <3,0 or group gap ≥0,4 | Run a bias review of AI outputs and review language; tighten rubric anchors; schedule a calibration refresh. | People Analytics + HR | Within 30 days |
| E19–E24 (Psychological safety) | Avg <3,2 | Set a correction right: employee can request edits; provide escalation path; train managers on challenge-safe scripts. | HRBP + Managers | Within 14 days |
| E25–E30 (Privacy) + M3 | Avg <3,5 or any severe comment | Re-brief GDPR rules; update prompt do-not-enter list; confirm retention period; align with DPO and Betriebsrat. | DPO + IT + HR | Within 7 days |
| M1–M6 (Training readiness) | Avg <3,0 | Mandatory 90-minute training + checklist; certify completion before next review cycle access. | L&D + HR | Within 30 days |
| M19–M24 (Oversight) | Avg <3,2 | Add “human sign-off” step; require evidence references; peer review 10% of AI-assisted reviews for quality. | HR + Function leaders | Next review cycle |
| 0–10 ratings (Employees or Managers) | Avg <6/10 or downtrend ≥1,0 | Run 45-minute focus groups (separate employee/manager); publish 3 changes you will make next cycle. | HR | Within 14 days |
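If you track these triggers in an analytics notebook rather than by hand, the table logic is easy to encode. The sketch below is a minimal, hypothetical Python example: the rule objects mirror a few rows of the table above, but the structure, names, and `triggered` helper are illustrative assumptions, not part of any specific survey tool.

```python
# Minimal sketch of the decision-table triggers above (rows and thresholds are illustrative).
from dataclasses import dataclass

@dataclass
class TriggerRule:
    block: str                  # question block, e.g. "E1-E6 Transparency + S1"
    max_avg: float              # trigger if the block average falls below this value
    max_disagree: float | None  # trigger if the share of 1-2 answers reaches this value
    owner: str
    due_days: int

RULES = [
    TriggerRule("E1-E6 Transparency + S1", 3.2, 0.20, "HR + Comms", 14),
    TriggerRule("E7-E12 Quality + E36", 3.0, None, "HR + Pilot managers", 21),
    TriggerRule("E25-E30 Privacy + M3", 3.5, None, "DPO + IT + HR", 7),
]

def triggered(rule: TriggerRule, avg: float, disagree_rate: float) -> bool:
    """Return True when a block's results cross the rule's threshold."""
    if avg < rule.max_avg:
        return True
    return rule.max_disagree is not None and disagree_rate >= rule.max_disagree

# Example: transparency averaged 3.4, but 22% chose "Disagree"/"Strongly disagree".
print(triggered(RULES[0], avg=3.4, disagree_rate=0.22))  # True -> assign owner and due date
```

The point is not the code but the discipline: a trigger either fires or it doesn’t, which keeps follow-up discussions about actions rather than anecdotes.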
Key takeaways
- Measure trust and fairness before AI becomes “silent infrastructure” in reviews.
- Use thresholds to trigger action, not debates about single comments.
- Separate employee and manager views; their risks and incentives differ.
- Track group gaps to spot bias patterns early and correct them fast.
- Close the loop publicly: what changed, who owns it, by when.
Definition & scope
This survey measures how employees and managers experience AI use in performance reviews: usefulness, quality, fairness, psychological safety, and GDPR-aligned guardrails. Use it for teams in AI-assisted review pilots (direct reports and their managers). Results support decisions on training, transparency, calibration, and whether to expand, pause, or restrict AI features.
Survey blueprints for AI performance review survey questions
Run different versions depending on timing and audience. Keep your deep-dive short enough to finish in ≤8 minutes. If you already run a classic post-cycle survey, align wording with your existing performance review survey questions so trends stay comparable even when AI changes the workflow.
Use this simple flow: pick blueprint → load items → set anonymity rules → send within 3–10 days after reviews → publish results and actions within 21 days.
- Pick 1 blueprint per cycle (HR owner, within 2 days after cycle close).
- Agree on anonymity threshold (HR + Betriebsrat, before launch).
- Send within 10 days post-cycle; keep field time to 7 days (HR ops).
- Share top themes and actions within 21 days (HR + leaders).
| Blueprint | Audience | When to run | Items (target) | Question mix (from bank) | Decision output |
|---|---|---|---|---|---|
| A) Employee post-cycle (pilot) | Employees in AI-assisted reviews | 3–10 days after reviews | 18–22 | E1–E6, E7–E12, E13–E18, E19–E24, E25–E30 + 2 ratings + 3 open | Keep/adjust AI use; fix transparency, safety, and privacy gaps |
| B) Manager post-cycle (pilot) | Managers using AI tools | 3–10 days after calibration | 18–22 | M1–M6, M7–M12, M13–M18, M19–M24, M25–M30 + 2 ratings + 3 open | Training plan; governance updates; workflow changes for next cycle |
| C) Combined pulse (during/after pilot) | Employees + Managers | Mid-pilot or right after first cycle | 12–15 | S1–S6 + E7/E13/E20/E29 + M19/M25 + 2 ratings + 2 open | Early warning: stop/continue decisions before scaling |
| D) Follow-up trend survey | Same populations as A/B | 6–12 months later | 12–18 | Repeat core items: E1/E7/E13/E20/E29/E36, M1/M7/M19/M25/M31/M42, S6 + ratings | Trust trend, fairness trend, adoption readiness for broader rollout |
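If you assemble the questionnaire programmatically (for example for a survey tool import), the blueprint mix above can live in a small config so every cycle pulls from the same bank. A minimal sketch, assuming hypothetical blueprint keys and plain item IDs; adapt the structure to whatever your tool expects.

```python
# Hypothetical blueprint config mirroring the table above; item IDs reference the question bank.
def item_range(prefix: str, start: int, end: int) -> list[str]:
    """Expand ("E", 1, 6) into ["E1", ..., "E6"]."""
    return [f"{prefix}{i}" for i in range(start, end + 1)]

BLUEPRINTS = {
    "A_employee_post_cycle": item_range("E", 1, 30),   # select 18-22 items from these blocks
    "B_manager_post_cycle":  item_range("M", 1, 30),   # select 18-22 items from these blocks
    "C_combined_pulse":      item_range("S", 1, 6) + ["E7", "E13", "E20", "E29", "M19", "M25"],
    "D_followup_trend":      ["E1", "E7", "E13", "E20", "E29", "E36",
                              "M1", "M7", "M19", "M25", "M31", "M42", "S6"],
}

# Example: export the item list for the combined pulse (Blueprint C).
print(BLUEPRINTS["C_combined_pulse"])
```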
Scoring & thresholds for AI performance review survey questions
Use the 5-point scale for statements and 0–10 for overall ratings. Keep decisions tied to thresholds so you don’t overreact to anecdotes. If your review forms are being updated, align this survey with your performance review templates so managers see the same rubric language in both places.
Define three bands: Low (Avg <3,0), Medium (3,0–3,9), High (≥4,0). For ratings, treat <6/10 as a stop-and-fix signal in pilots.
- Low (Avg <3,0): fix before scale; assign owner and deadline within 7–30 days.
- Medium (3,0–3,9): improve next cycle; run targeted training and prompt updates.
- High (≥4,0): standardise; publish examples of good practice.
- Group gap trigger: difference of ≥0,4 points, or a ≥15 percentage-point gap in favorable rate.
Turn scores into decisions with a 4-step routine: (1) compute averages per dimension, (2) check dispersion and “Disagree” rates, (3) compare groups, (4) map to actions in the decision table. A worked sketch of these calculations follows the metrics table below.
- HR computes dimension scores (E, M, S) and flags red thresholds (within 7 days).
- People Analytics checks group gaps and outliers (within 10 days).
- Leaders agree 3 priority fixes max (within 14 days).
- HR publishes a short “what changes next cycle” note (within 21 days).
| Metric | How to calculate | Threshold | Decision rule |
|---|---|---|---|
| Dimension average | Mean of items per block (e.g., E13–E18) | Avg <3,0 | Pause expansion; fix root cause before next cycle |
| Favorable rate | % choosing 4–5 (Agree/Strongly agree) | <60% | Targeted improvement plan with owner + deadline |
| Disagree concentration | % choosing 1–2 | ≥20% | Run focus groups; review comms and manager behaviour |
| Group gap | Difference between groups (e.g., remote vs office) | ≥0,4 points | Bias check + process audit; escalate to HR leadership |
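The four metrics in the table are simple to compute from raw item responses. Here is a minimal pandas sketch under assumed column names (`item`, `score`, `group`); the data and column layout are illustrative, not tied to a particular survey export.

```python
# Minimal sketch: compute dimension average, favorable rate, disagree concentration, and group gap.
# Assumed long format: one row per respondent x item, with a 1-5 Likert score.
import pandas as pd

responses = pd.DataFrame({
    "item":  ["E13", "E14", "E13", "E14", "E13", "E14"],
    "score": [4, 2, 3, 2, 5, 4],
    "group": ["remote", "remote", "remote", "office", "office", "office"],
})

block = responses[responses["item"].isin(["E13", "E14"])]    # e.g. part of the fairness block

dimension_avg  = block["score"].mean()                       # Dimension average
favorable_rate = (block["score"] >= 4).mean()                # % choosing 4-5
disagree_rate  = (block["score"] <= 2).mean()                # % choosing 1-2
group_means    = block.groupby("group")["score"].mean()
group_gap      = group_means.max() - group_means.min()       # e.g. remote vs office

# Decision rules from the table: pause expansion below 3.0; flag a group gap of 0.4 or more.
flags = {"pause_expansion": dimension_avg < 3.0,
         "improvement_plan": favorable_rate < 0.60,
         "focus_groups": disagree_rate >= 0.20,
         "bias_check": group_gap >= 0.4}
print(flags)
```

Running the same script every cycle keeps thresholds comparable and removes debate about how a number was produced.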
Follow-up & responsibilities
AI in reviews fails when nobody owns the messy parts: corrections, escalations, and governance. Set routing rules up front and keep response times tight. If you already run calibration meetings, align follow-up with your talent calibration guide so AI doesn’t become a backdoor to inconsistent standards.
Use these response times as defaults in pilots: ≤24 h for severe privacy concerns, ≤7 days to acknowledge low trust scores, ≤21 days to publish actions. A simple routing sketch follows the signal table below.
- Managers review team-level results and address E19–E24 signals (within 7 days).
- HR aggregates results, sets priorities, and tracks actions (within 14 days).
- DPO/IT handles privacy or data incidents (response within ≤24 h).
- Works Council (Betriebsrat) is briefed on changes that affect monitoring or decision logic (before next cycle).
| Signal | How it shows up | Owner | Action | Due |
|---|---|---|---|---|
| Low transparency | E1–E6 Avg <3,2 | HR + Comms | Update explainer + manager script; add “AI used” disclosure | Within 14 days |
| Low psychological safety | E19–E24 Avg <3,2 | HRBP + Managers | Introduce correction workflow; train “challenge-safe” conversation steps | Within 14 days |
| Low manager oversight | M20 Avg <3,2 or M22 Avg <3,5 | Function leader | Add evidence requirement; quality spot-check 10% of reviews | Next cycle |
| Governance confusion | M25–M30 Avg <3,2 | HR + Legal + DPO | Rewrite guardrails; confirm whether Dienstvereinbarung is needed | Within 30 days |
| Severe privacy concern | Comments indicate sensitive data misuse | DPO + IT | Containment, investigation, comms to HR leadership | Response ≤24 h |
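To keep response times tight, the routing above can be encoded as data so a detected signal immediately produces an owner and a due date. A minimal sketch, assuming hypothetical signal keys; the owners and SLAs mirror the table.

```python
# Hypothetical routing table: map a detected signal to its owner and response SLA.
from datetime import datetime, timedelta

ROUTING = {
    "low_transparency":       {"owner": "HR + Comms",       "sla": timedelta(days=14)},
    "low_psych_safety":       {"owner": "HRBP + Managers",  "sla": timedelta(days=14)},
    "governance_confusion":   {"owner": "HR + Legal + DPO", "sla": timedelta(days=30)},
    "severe_privacy_concern": {"owner": "DPO + IT",         "sla": timedelta(hours=24)},
}

def open_follow_up(signal: str, detected_at: datetime) -> dict:
    """Create a follow-up task with owner and due date for a detected signal."""
    rule = ROUTING[signal]
    return {"signal": signal, "owner": rule["owner"], "due": detected_at + rule["sla"]}

# Example: a severe privacy comment surfaces while reviewing open-text answers.
task = open_follow_up("severe_privacy_concern", datetime(2025, 3, 3, 9, 0))
print(task["owner"], task["due"])  # DPO + IT 2025-03-04 09:00:00
```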
Fairness & bias checks
Don’t treat fairness as a single average score. Break results down by relevant groups and compare both perception and process signals. If fairness concerns rise, use your internal bias playbooks and align language checks with performance review biases so managers can recognise patterns in wording and evidence quality.
Use practical red flags: a group gap ≥0,4 points, a ≥15 pp difference in favorable rates, or repeated open-text mentions of “generic,” “copy-paste,” “unfair,” or “privacy.” A slicing sketch follows the pattern table below.
- Slice by location, level, tenure band, remote vs office, team, and job family (People Analytics, within 10 days).
- Check if certain groups report lower E15/E16 and higher E14/E17 (HRBP, within 14 days).
- Review a small sample of AI-assisted drafts for language patterns (HR, within 21 days).
- Re-run a short pulse after fixes (HR, within 60 days).
| Pattern you see | Typical interpretation | What to do next | Owner |
|---|---|---|---|
| Remote workers: lower E1–E6 | Transparency gaps in distributed communication | Run a remote-first briefing; add disclosure in review tool UI | HR + Managers |
| Junior staff: higher E14/E17 | Higher uncertainty and power distance | Add “how to challenge” steps; create a safe escalation path | HRBP |
| One department: lower E16 and lower M31 | Inconsistent standards and calibration drift | Refresh rubric anchors; run targeted calibration workshop | Function leader + HR |
| High E33 + low E12 | Feedback feels impersonal or poorly edited | Set editing minimums; ban copy-paste; require personalised examples | Managers |
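To make the slicing repeatable, compute group gaps for every dimension and slice attribute in one pass. The sketch below checks the ≥0,4-point gap under assumed column names (`dimension`, `score`, `location`, `tenure_band`); the favorable-rate check works the same way on a 4–5 indicator. Data and thresholds are illustrative.

```python
# Minimal sketch: flag group gaps of 0.4 points or more across several slice attributes.
# Assumed long format: one row per respondent x dimension, with that person's block average.
import pandas as pd

df = pd.DataFrame({
    "dimension":   ["fairness", "fairness", "fairness", "fairness"],
    "score":       [3.8, 3.2, 4.0, 3.3],
    "location":    ["Berlin", "Munich", "Berlin", "Munich"],
    "tenure_band": ["0-2y", "0-2y", "3-5y", "3-5y"],
})

GAP_THRESHOLD = 0.4
flags = []
for slice_col in ["location", "tenure_band"]:
    means = df.groupby(["dimension", slice_col])["score"].mean()
    for dim, sub in means.groupby(level="dimension"):
        gap = sub.max() - sub.min()
        if gap >= GAP_THRESHOLD:
            flags.append({"dimension": dim, "slice": slice_col, "gap": round(gap, 2)})

# Before reporting, suppress any slice smaller than your anonymity threshold (e.g. 5-7 people).
print(flags)  # [{'dimension': 'fairness', 'slice': 'location', 'gap': 0.65}]
```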
Examples / use cases
Use the survey to make small, testable decisions. Treat these as concrete scenarios you can replicate in your environment, not as promised outcomes.
Scenario 1: Employees distrust AI because disclosure is unclear
You see E1–E6 Avg 2,8 and comments like “I only noticed AI wording after the meeting.” HR decides to add disclosure at the point of use: the review form shows “AI-assisted draft used: yes/no” plus a one-paragraph explanation. Managers get a 60-second script for the review conversation. You re-run the combined pulse (Blueprint C) in 60 days to confirm trust moved upward.
Scenario 2: Managers save time, but employees feel feedback is generic
M7 and M8 score high (≥4,0), but E7–E12 score low (Avg <3,0) and E33 rises. The decision: keep AI for structuring, but require evidence. Each review section must include 2 proof points (project, metric, observable behaviour). HR shares “good vs bad” examples and updates prompts. In the next cycle, you spot-check 10% of reviews for specificity before they go to employees.
Scenario 3: Fairness concerns cluster in one group
E16 is stable overall, but one location shows a gap of 0,5 points and higher E14. HR and the local Betriebsrat align on a short review: check whether local teams used different prompts, rubrics, or data sources. You run a calibration refresher, tighten rubric anchors, and re-brief managers on bias patterns. You then compare the same group slice in the follow-up trend survey (Blueprint D) to see if the gap narrowed.
- Write down the decision you will make from each dimension before you send the survey (HR, before launch).
- Run 2 focus groups when Avg <3,0: one employee, one manager (HRBP, within 14 days).
- Publish 3 changes max; assign owners and dates (HR lead, within 21 days).
- Re-test with a short pulse after 60–90 days (People Analytics).
Implementation & updates
Start small, then scale. In DACH, involve the Betriebsrat early when AI touches performance processes, monitoring questions, or decision support. Keep the survey clearly separate from individual outcomes: responses must not be used to adjust someone’s rating or pay, or you will kill trust and participation. A talent platform like Sprad Growth can help automate survey sends, reminders, and follow-up tasks, while keeping actions and deadlines visible for owners.
Use this rollout rhythm: pilot (6–10 weeks) → first review cycle → survey + fixes (≤30 days) → scale to next area → trend check at 6–12 months. Pair rollout with role-based training, using resources like AI training for employees, AI training for managers, and AI training for HR teams so people learn the same guardrails and editing standards.
- Pilot in 1 function with 20–50 participants; define allowed AI uses (HR + IT, within 14 days).
- Agree data minimisation, retention, and access rights; document in plain language (DPO + HR, within 30 days).
- Train managers on verification and challenge-safe conversations (L&D, before the cycle starts).
- Run Blueprint A and B after the cycle; publish actions within 21 days (HR lead).
- Review and update question bank annually or after major tool changes (HR + Betriebsrat, 1× per year).
| Metric to track | Target | Why it matters | Owner |
|---|---|---|---|
| Participation rate | ≥70% post-cycle | Low rates usually mean trust or survey fatigue issues | HR Ops |
| Time-to-action | ≤21 days | Speed builds credibility; slow action reduces honesty next round | HR Lead |
| Training completion (managers) | ≥90% | Reduces risky prompts and over-reliance on drafts | L&D |
| Fairness group gaps | <0,4 points | Early warning for bias patterns or inconsistent standards | People Analytics |
| Action completion rate | ≥80% | Shows follow-through; prevents “survey theatre” | HRBP + Leaders |
If you need a broader enablement roadmap, connect this survey to your wider AI governance and skills work. A structured program like AI training programs for companies and an internal skills baseline (for example a simple AI skills matrix) make your results easier to interpret: you can separate “tool issues” from “skill gaps” and plan fixes faster.
Done well, this survey gives you three practical benefits: you detect trust and privacy issues early, you improve the quality of review conversations (not only the text), and you get clearer priorities for training and guardrails. Next, pick one pilot area, load the relevant blueprint into your survey tool, and name owners for transparency, privacy, and calibration follow-up. After the first cycle, publish what you changed and when—then re-check trends 6–12 months later to confirm AI is helping, not quietly damaging fairness.
FAQ
How often should we run these AI performance review survey questions?
If you’re piloting, run it after each AI-assisted review cycle for the first 2 cycles. That gives you fast feedback while people still remember what happened. After that, a deep-dive 1× per year plus a short pulse after major AI feature changes works well. Keep at least 6–8 core items stable so you can track trust and fairness trends over time.
What should we do if scores are very low (Avg <3,0) or comments are harsh?
Start with containment and clarity, not defensiveness. Acknowledge results within ≤7 days and say what will happen next. Use focus groups to find root causes: is it disclosure, generic wording, privacy fear, or manager behaviour? Then pick 3 fixes max with owners and deadlines. Close the loop publicly within ≤21 days. If privacy concerns are severe, route to DPO/IT with response ≤24 h.
How do we keep this survey from feeling like monitoring or performance control (especially in DACH)?
Be explicit about purpose and separation: survey responses must not influence ratings, pay, or individual outcomes. Aggregate results and apply anonymity thresholds (for example, no reporting for groups smaller than 5–7). Align with the Betriebsrat early and document guardrails in plain language. If you need framing, the EDPB guidance on automated decision-making and profiling is a helpful reference point: EDPB Guidelines.
Should we tell employees when AI was used in their review?
Yes—if you want trust. Lack of transparency tends to inflate fairness concerns even when the output quality is fine. Keep it simple: say where AI was used (drafting, summarising, calibration support), what data it did and did not use, and that the manager owns the final content. Add a correction right: employees can challenge wording and request edits without needing to “prove” the AI was wrong.
How do we keep the question bank current as tools and policies change?
Run an annual review with HR, a few managers, People Analytics, and (where applicable) the Betriebsrat. Keep your core trend items stable (trust, fairness, safety, privacy), and rotate a small set of “feature-specific” questions based on what changed (new summaries, new rating suggestions, new data sources). Pilot new items with 1 team first, then scale once wording is unambiguous and action-ready.