This guide explains how to create one complete Arden task: a realistic clean-energy project, a financial projection question, and the rubrics used to evaluate model answers and reasoning.
You define the project → the model responds → we score correctness and reasoning.
Tasks in this scope ask the model to compute project economics. Tax outcomes are provided as inputs — the model does not calculate them. The model must produce:
Net project cost after applying the stated tax credit outcome
Assumed electricity rate and bill-savings methodology
Simple payback period
Internal rate of return (IRR)
Net present value (NPV) over a stated project life
Explicit financial assumptions (discount rate, escalation, degradation, O&M if applicable)
One alternative configuration with recomputed payback, IRR, and NPV
Step 1: Confirm Assignment, Create Task on Realm, Pick a Building (15–30 min)
Step 2: Fill All 15 Mandatory Inputs (30 min–1 hr)
Step 3: Write the Narrative Task Prompt (1.5–2 hrs)
Step 4: Incorporate Energy Engineer Feedback (15–45 min)
Step 5: Build the Response Rubric with 20+ criteria (2 hrs)
Step 6: Build the CoT Rubric with 20+ criteria (2 hrs)
Step 7: Run Model Tests and Score (2–3 hrs)
Step 8: Final Submission (30 min)
Step 1: Confirm Assignment, Create Task on Realm, Pick a Building
Estimated time: 15–30 minutes
Log in to the Realm platform using Google and your expert.micro1.ai email.
Create a new task and name it using the format: [State Abbreviation]-[Project Type]-FIN-###
Confirm the parameters you've been given: U.S. state and project type (residential, commercial, multifamily, edge case).
These assignments are not rigid. If there is a state and project-type combination you would rather research, or one where you have more expertise, you may switch to it, but notify us ahead of time.
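If you want to check a task name before creating it, a minimal sketch follows. It assumes the state abbreviation is two uppercase letters, the project type is a single hyphen-free word, and ### is exactly three digits. The guide only gives the overall format, so treat the pattern and the sample names as assumptions.

```python
import re

# Assumed pattern: two-letter state code, a hyphen-free project-type token,
# the literal "FIN", and a three-digit sequence number.
TASK_NAME = re.compile(r"[A-Z]{2}-[A-Za-z]+-FIN-\d{3}")


def is_valid_task_name(name: str) -> bool:
    """Return True if the name matches the assumed naming pattern."""
    return TASK_NAME.fullmatch(name) is not None


# Hypothetical names, for illustration only.
print(is_valid_task_name("CA-Residential-FIN-001"))
print(is_valid_task_name("ca-residential-fin-1"))
```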
Pick a property that is believable and flexible enough to support the financial complexity you want.
Use mapping and listing tools such as Google Maps, Street View, Zillow, Redfin, or LoopNet to find a building that matches your assignment, has reasonable size, age, and layout, and has roof geometry that could realistically host PV. Avoid famous buildings or landmarks; choose something typical.
Adding complexity through property selection: For commercial or multifamily projects, you can introduce multiple tenants with different load profiles, include mixed-use space, or use partial roof availability to constrain PV sizing. These affect financial analysis through different rate structures, allocation of savings, and system sizing tradeoffs.
Make assumptions about location characteristics that affect project economics.
For Financial Projection tasks, location affects:
Utility rates and rate structures (flat, TOU, tiered, demand charges)
Solar resource and production estimates
State or local incentives (if included as additional given inputs)
Net metering policies and export compensation rates
How to Research Location Characteristics
Utility Service Territory: Use EIA Electric Retail Service Territories to identify the serving utility. Use OpenEI Utility Rate Database for rate structures.
Solar Resource: Use NREL PVWatts to estimate production based on location coordinates.
State Incentives (optional): Use DSIRE database at dsireusa.org for state and local incentive programs.
If you cannot find reliable rate or resource data for your chosen location, mark [ESCALATE] in your notes and explain whether to use a different location or make simplifying assumptions.
Step 2: Fill All 15 Mandatory Inputs
Estimated time: 30 minutes–1 hour
Every Financial Projection task requires 15 inputs organized into three categories:
Location [1]–[5]: 5 inputs
Technology [6]–[7]: 2 inputs
Financial [8]–[15]: 8 inputs
You may use an LLM to generate non-critical numeric details as a starting point, but you must sanity-check them and add complexity that makes the aggregate problem difficult to optimize. You remain responsible for coherence.
[1] Street Address and Coordinates
Choose a real address from listing sites or maps. In Google Maps, right-click and select "What's here?" then copy latitude and longitude. Format: [Full Address] ([Latitude], [Longitude])
[2] Census Tract ID
Use Census Bureau Geocoder. Record the full 11-digit GEOID. This is included for completeness but is not central to financial analysis.
[3] Energy Community Status
State Yes or No. This is provided for context only — the tax credit amount is already given in [10], so Energy Community status does not affect the model's calculations.
[4] Utility Service Territory
Identify the serving utility. Note relevant rate structure (flat, TOU, tiered, demand-based). This is critical for financial analysis as it determines the value of electricity savings.
[5] Property Characteristics
Include only characteristics that affect energy consumption, PV hosting capacity, and site constraints: building type, size in square feet, roof type (flat vs. pitched), age, lot size, and relevant obstructions or constraints.
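The formats for [1] and [2] can be checked mechanically. The sketch below parses the "[Full Address] ([Latitude], [Longitude])" string and splits an 11-digit census tract GEOID into its state (2 digits), county (3 digits), and tract (6 digits) parts. The function names, the sample address, and the sample GEOID are made up for illustration.

```python
import re

# Address text followed by "(lat, lon)" in decimal degrees.
LOCATION_RE = re.compile(
    r"^(?P<address>.+?)\s*\((?P<lat>-?\d+(?:\.\d+)?),\s*(?P<lon>-?\d+(?:\.\d+)?)\)$"
)


def parse_location(text: str):
    """Split input [1] into (address, latitude, longitude)."""
    match = LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError("expected '[Full Address] ([Latitude], [Longitude])'")
    lat, lon = float(match["lat"]), float(match["lon"])
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    return match["address"], lat, lon


def split_geoid(geoid: str) -> dict:
    """Split an 11-digit tract GEOID into state, county, and tract codes."""
    if not re.fullmatch(r"\d{11}", geoid):
        raise ValueError("census tract GEOID must be exactly 11 digits")
    return {"state_fips": geoid[:2], "county_fips": geoid[2:5], "tract": geoid[5:]}


# Hypothetical values, for illustration only.
print(parse_location("123 Example St, Springfield, IL 62701 (39.7817, -89.6501)"))
print(split_geoid("17167001400"))
```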
[6] PV Capacity (kW DC)
Use NREL PVWatts at pvwatts.nrel.gov to size appropriately. Consider roof constraints, load offset goals, and budget.
[7] Expected Annual PV Generation (kWh)
Derive from PVWatts using your coordinates from [1]. This is a critical input for calculating annual electricity savings.
Note: For more complex projects involving battery storage, heat pumps, or EVSE, you may add optional technology inputs labeled [6a], [6b], and so on. Document these clearly in your legend. The base requirement is PV only.
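A quick coherence check on [6] and [7] is to compute the specific yield and capacity factor they imply. The sketch below is one way to do that. The 10%–25% plausibility band is an assumption chosen for illustration, not a value from this guide, and the example system is hypothetical.

```python
HOURS_PER_YEAR = 8760


def specific_yield(pv_kw_dc: float, annual_kwh: float) -> float:
    """Annual kWh produced per kW DC of installed capacity."""
    if pv_kw_dc <= 0:
        raise ValueError("PV capacity must be positive")
    return annual_kwh / pv_kw_dc


def capacity_factor(pv_kw_dc: float, annual_kwh: float) -> float:
    """Share of the year the array would need to run at full DC rating."""
    return specific_yield(pv_kw_dc, annual_kwh) / HOURS_PER_YEAR


def looks_plausible(pv_kw_dc: float, annual_kwh: float,
                    low: float = 0.10, high: float = 0.25) -> bool:
    """Flag [6]/[7] pairs outside an assumed capacity-factor band."""
    return low <= capacity_factor(pv_kw_dc, annual_kwh) <= high


# Hypothetical 8 kW DC system producing 11,200 kWh per year.
print(specific_yield(8, 11_200))
print(f"{capacity_factor(8, 11_200):.1%}")
print(looks_plausible(8, 11_200))
```

If the pair falls outside the band, re-run PVWatts with the coordinates from [1] before changing either input by hand.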
[8] Total Installed Cost (USD)
Use NREL and DOE cost benchmarks to anchor estimates. Include equipment, labor, and soft costs.
[9] Budget Type
Specify Hard Cap or Flexible Range. This affects whether alternative configurations can exceed budget.
[10] Stated Tax Credit Amount (USD)
This is the federal tax credit value provided as a given input. The model does not calculate this — it is the key difference from Tax Strategy tasks. Simply state the dollar amount the project receives.
Adding complexity through tax credit: You can provide a credit amount that reflects complex stacking (base + bonuses) without requiring the model to derive it. The model's job is to use this given value in financial calculations.
[11] Financial Objective
Express the client's financial goal. This guides the model's analysis and alternative configuration selection.
Examples: Minimize payback period. Maximize NPV. Achieve IRR of at least 8%. Balance payback and NPV.
Adding complexity through objectives: Use competing objectives (minimize payback vs. maximize NPV) that may yield different optimal configurations.
[12] Contract Signing Date
The date the contract is executed. Format as YYYY-MM-DD.
[13] Placed-in-Service Date
The date the system becomes operational. This is the start date for financial projections.
[14] Electricity Rate ($/kWh)
The retail electricity rate used for calculating bill savings. Specify whether it is a flat rate, average of TOU periods, or blended rate.
You may provide this explicitly, or leave it for the model to research and justify, which increases difficulty. If the utility has TOU rates, you can require the model to account for production timing.
Adding complexity through rates: Use TOU rates with significant peak/off-peak differentials, tiered rates, or demand charges that require more sophisticated savings calculations.
[15] Project Life (years)
The analysis period for IRR and NPV calculations. Standard assumption is 25 years for PV. May vary based on equipment warranties, financing terms, or client preference.
Optional [16] Discount Rate
You may provide a discount rate explicitly, or require the model to select and justify an appropriate rate, which increases difficulty. If omitted, add to the prompt: "Select and justify an appropriate discount rate."
Step 3: Write the Narrative Task Prompt
Estimated time: 1.5–2 hours
Turn your inputs into a coherent story. Use bracketed references throughout.
Opening Context
You are advising a [Project Type] in [City, State] with the goal of evaluating the financial performance of a proposed solar installation. The tax credit treatment has already been determined. Your task is to compute project economics and evaluate alternatives.
Location Block
The property is located at [1], in census tract [2]. The property is served by [4], which offers [describe rate structure].
Here are the property's characteristics: [5].
Technology Block
The owner is planning a solar PV installation sized at [6], designed to produce approximately [7] per year based on the site's solar resource and typical system losses.
Financial Block
The overall turnkey project is quoted at a total installed cost of [8]. This is a [9] for the owner.
The project qualifies for a federal tax credit of [10]. This credit amount is provided as a given — do not recalculate it.
The owner's financial objective is to [11].
Assume the contract is signed on [12], and the project is placed in service on [13].
Use an electricity rate of [14] for bill savings calculations. Assume a project life of [15] for IRR and NPV analysis.
At the end of the narrative, include this instruction block:
For every key calculation or recommendation, show your work and explain how you derived it from the inputs [1]–[15]. Use explicit formulas and state all assumptions clearly.
Your task:
Net Project Cost: Calculate the net project cost after applying the stated tax credit [10].
Bill Savings Methodology: State the assumed electricity rate [14] and explain your bill-savings calculation methodology. Show how annual savings are derived from expected generation [7].
Simple Payback Period: Calculate the simple payback period in years using the formula: Net Cost / Annual Savings.
Internal Rate of Return (IRR): Calculate the project IRR over the stated project life [15]. Show the cash flow structure used.
Net Present Value (NPV): Calculate the NPV using an explicitly stated discount rate. Account for electricity rate escalation, panel degradation, and O&M costs if applicable.
Financial Assumptions: List all assumptions used including discount rate, electricity escalation rate, panel degradation rate, O&M costs, and project life. Justify each assumption.
Alternative Configuration: Present one alternative configuration (different system size, different equipment, or different financing) with recomputed payback, IRR, and NPV. Explain the tradeoffs relative to the base case and the client's stated objective [11].
At the end, provide a short summary comparing the base case and alternative, with a recommendation aligned to the client's objective.
At the end of every prompt, include a legend mapping all 15 inputs:
Legend:
[1] [Full Address] ([Latitude], [Longitude]) – street address and coordinates
[2] [11-digit GEOID] – census tract ID
[3] [Yes/No] – Energy Community status (for context only, tax credit already given)
[4] [Utility Name] – utility service territory
[5] [Property details] – property characteristics
[6] [X] kW – PV capacity
[7] [X] kWh – expected annual PV generation
[8] [X] USD – total installed cost
[9] [Hard Cap/Flexible Range] – budget type
[10] [X] USD – stated federal tax credit (given, do not recalculate)
[11] [Objective statement] – financial objective
[12] [YYYY-MM-DD] – contract signing date
[13] [YYYY-MM-DD] – placed-in-service date
[14] [X] $/kWh – electricity rate
[15] [X] years – project life
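Before validating, you can check that the legend and narrative agree on their bracketed references. The sketch below is a hypothetical helper, not a Realm or Rhea feature. It assumes references are written as plain numbers in square brackets, so optional labels such as [6a] are not matched.

```python
import re

# Matches numeric references such as [7] or [14]; labels like [6a] are skipped.
REF_RE = re.compile(r"\[(\d{1,2})\]")


def referenced_inputs(text: str) -> set:
    """Collect every numeric bracket reference found in the text."""
    return {int(n) for n in REF_RE.findall(text)}


def check_legend(narrative: str, legend: str, required=range(1, 16)) -> dict:
    """Report required inputs absent from the legend and narrative
    references that the legend never defines."""
    legend_ids = referenced_inputs(legend)
    narrative_ids = referenced_inputs(narrative)
    return {
        "missing_from_legend": sorted(set(required) - legend_ids),
        "undefined_in_legend": sorted(narrative_ids - legend_ids),
    }


# Tiny made-up example: the narrative cites [8], which the legend lacks.
narrative = "The property is located at [1], in census tract [2]. Cost is [8]."
legend = "Legend: [1] address [2] census tract ID"
print(check_legend(narrative, legend, required=[1, 2, 8]))
```

An empty result for both keys means every required input is in the legend and every narrative reference is defined there.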
Validate with Rhea on the Realm platform.
If Rhea invalidates the prompt, make suggested changes and re-validate.
Once validated, submit for Review.
An energy expert will conduct an asynchronous feasibility review.
Your prompt passes only after engineering approval.
After approval, rubric steps are unlocked.
Step 4: Incorporate Energy Engineer Feedback
Estimated time: 15–45 minutes
After engineering review:
Update Task Prompt: If system sizes, production estimates, or cost assumptions change, update legend values, narrative references, and recalculate any reference values you will use in rubrics.
Resolve Disagreements: If you disagree with a suggested change that materially affects the scenario, mark it [ESCALATE] and bring in the HD team for a decision. Do not override feasibility concerns unilaterally.
Final Coherence Check: Confirm prompt and legend are consistent. Confirm [ESCALATE] items are resolved or documented. Re-validate with Rhea.
Step 5: Build the Response Rubric
Estimated time: 2 hours
The Response Rubric scores final outputs only, not reasoning. It evaluates what the model's answer explicitly states.
What to Include: Net cost, payback period (years), IRR (percentage), NPV (dollars), stated assumptions (discount rate, escalation, degradation, O&M), alternative scenario values
What NOT to Include: Reasoning steps, formula derivations, comparisons or tradeoff logic, justifications or explanations. Those belong in the Chain-of-Thought Rubric.
One Criterion = One Claim. No stacking multiple checks with "and/or." If you want to check two things, use two rows.
Binary Only. Each criterion must be satisfiable as true or false. No partial credit within a single row.
Self-Contained. A grader must evaluate the criterion using only the task prompt, the model's final answer, and the criterion text itself.
Numeric Checks Require Tolerances. All numeric criteria must include explicit tolerances. Use ±1% for percentages (IRR), ±$100 or ±2% for dollar amounts (NPV, net cost), ±0.5 years for payback.
Neutral, Observable Verbs. Start each criterion with States, Mentions, Identifies, Computes, Quantifies, Provides, or Assigns. Avoid subjective language such as "properly," "clearly," "thoroughly," "key," or "significant."
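The numeric tolerances above can be expressed as small check functions when you compute reference values. This sketch rests on two interpretations that the guide does not spell out: "±1%" for IRR is read as ±1 percentage point, and "±$100 or ±2%" is read as passing if either bound is met. Confirm both readings before relying on it.

```python
def within_abs(value: float, reference: float, tol: float) -> bool:
    """True if value is within an absolute tolerance of the reference."""
    return abs(value - reference) <= tol


def within_rel(value: float, reference: float, pct: float) -> bool:
    """True if value is within pct percent of the reference."""
    return abs(value - reference) <= abs(reference) * pct / 100


def irr_ok(stated_pct: float, reference_pct: float) -> bool:
    # Assumption: +/-1% means one percentage point (e.g. 8.6% vs 9.4%).
    return within_abs(stated_pct, reference_pct, 1.0)


def payback_ok(stated_years: float, reference_years: float) -> bool:
    return within_abs(stated_years, reference_years, 0.5)


def dollars_ok(stated: float, reference: float) -> bool:
    # Assumption: either the $100 bound or the 2% bound is sufficient.
    return within_abs(stated, reference, 100) or within_rel(stated, reference, 2)


# Hypothetical stated vs. reference values.
print(irr_ok(9.4, 8.6), payback_ok(10.8, 10.0), dollars_ok(10_150, 10_000))
```

Whichever reading you adopt, write the tolerance into the criterion text itself so the grader does not have to infer it.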
Each criterion requires these fields:
Score — Integer points (positive or negative)
Type — Financial
Criterion — Single observable claim with tolerance if numeric
Source — Primary reference URL (NREL, DOE, financial methodology reference)
Quote — Short supporting excerpt (1–2 phrases)
Justification — Why this output is required; for numeric checks, show formula and reference value
Minimum 20 Response Rubric criteria.
Convert each prompt requirement into multiple atomic checks.
Include negative (penalty) criteria for serious errors.
Avoid criteria requiring the grader to do new research.
Avoid criteria that reference other rubric items.
Example Positive Criteria:
States net project cost within ±[tolerance] of [reference value].
States simple payback period within ±0.5 years of [reference value].
States IRR as a percentage within ±1% of [reference value].
States NPV within ±[tolerance] of [reference value].
Identifies the discount rate used for NPV calculation.
States the electricity escalation rate assumption.
States the panel degradation rate assumption.
Provides an alternative configuration with different system size or parameters.
States the alternative configuration payback period.
States the alternative configuration NPV.
Example Negative Criteria:
Uses an electricity rate that differs from [14] without justification.
Computes NPV without stating discount rate.
Computes IRR without showing or describing cash flow structure.
Ignores panel degradation in lifetime production estimate.
Proposes alternative that exceeds budget when [9] is Hard Cap.
Recalculates or questions the stated tax credit [10] instead of accepting it as given.
Step 6: Build the CoT Rubric
Estimated time: 2 hours
The CoT Rubric scores reasoning steps that appear in the answer text. It evaluates whether required reasoning actions are explicitly performed, not whether conclusions are optimal.
What to Score: Savings calculation methodology, formula selection and application, assumption justification, degradation and escalation logic, alternative configuration reasoning, tradeoff analysis
What NOT to Include: Final numeric outputs (those go in Response Rubric), final selections, narrative summaries, writing quality, repetition of answers already graded in Response Rubric
Each criterion requires:
Score — Integer points (positive or negative)
Type — Financial
Criterion — Binary description of one reasoning step
Source — Reference from Resource List
Quote — Short supporting excerpt
Justification — Why this reasoning step matters
Approved Verbs for CoT Criteria: Explains, Describes, Identifies, States, Computes, Quantifies, Connects, Compares, Evaluates, Considers, Derives, Shows
Express only one reasoning idea per criterion. Make criteria self-contained using [1]–[15] labels.
Minimum 20 CoT criteria covering all prompt asks.
Example Positive CoT Criteria:
Explains how annual savings are derived from [7] and the electricity rate [14].
Describes the formula used for simple payback calculation.
Shows the cash flow structure used for IRR calculation.
Explains how panel degradation affects lifetime production.
Explains how electricity escalation affects lifetime savings.
Justifies the selected discount rate with reference to market rates or client context.
Compares base case and alternative NPV with explicit reasoning.
Explains the tradeoff between payback and NPV for the alternative configuration.
Connects the recommendation to the client's stated objective [11].
Describes how O&M costs are incorporated into cash flows.
Example Negative CoT Criteria:
Uses annual savings without showing derivation from [7] and [14].
Applies degradation rate without explaining its effect on production.
Selects discount rate without any justification.
Proposes alternative without explaining how it differs from base case.
Ignores budget constraint [9] when reasoning about alternatives.
Treats the tax credit [10] as something to be calculated rather than a given input.
Ignores the client's stated objective [11] when making recommendations.
Step 7: Run Model Tests and Score
Estimated time: 2–3 hours
Evaluate four LLM responses on your task prompt.
Use your rubrics to grade each response.
Confirm all models score below 60% of total possible points.
Generate Model Outputs: Run your prompt through four models externally: GPT 5.2, Claude Opus 4.5, Gemini 3 Pro, and Llama 4. Use the exact same prompt for all. Purchase deep research for LLMs (you will be reimbursed). Copy responses to the "Evaluate Models" section on Realm.
Score with Rubrics: For the Response Rubric, Rhea auto-assesses each criterion; double-check Rhea's assessments and correct any that are wrong. Rhea cannot assess the CoT Rubric, so you must assess each of its criteria manually. In both rubrics, mark each row as satisfied or not.
Calculate Scores: For each model, Response Score = (Awarded Points / Total Points) × 100 and CoT Score = (Awarded Points / Total Points) × 100. If the awarded points sum to less than zero due to penalties, cap the score at 0%.
Check 60% Threshold: Every model must score below 60% on both rubrics. If any model scores 60% or higher, do NOT down-score retroactively. Instead, increase task difficulty prospectively by adding realistic complexity to the scenario. Do not reverse-engineer difficulty from what the model got wrong.
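The score calculation and threshold check can be sketched as follows. One assumption to confirm: Total Points is taken here as the sum of the positive criterion scores, and a satisfied negative criterion subtracts its points from the awarded total. The row data is invented for illustration.

```python
def rubric_score(rows) -> float:
    """rows: iterable of (points, satisfied) pairs for one model on one rubric.

    Returns the percentage score, floored at 0 when penalties outweigh credit.
    """
    rows = list(rows)
    # Assumption: the maximum achievable total counts positive criteria only.
    total = sum(points for points, _ in rows if points > 0)
    if total <= 0:
        raise ValueError("rubric needs at least one positive criterion")
    awarded = sum(points for points, satisfied in rows if satisfied)
    return max(0.0, awarded / total * 100)


def all_below_threshold(scores, threshold: float = 60.0) -> bool:
    """True only if every score is strictly below the threshold."""
    return all(score < threshold for score in scores)


# Hypothetical rubric: three positive rows and one penalty row.
example_rows = [(5, True), (5, False), (10, True), (-5, True)]
score = rubric_score(example_rows)
print(score)                         # 50.0
print(all_below_threshold([score]))  # True
```

Run the check once per rubric with the four model scores; the task passes this step only when both checks return True.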
Step 8: Final Submission
Estimated time: 30 minutes
Task prompt is approved (post-engineering review).
Response Rubric has 20+ criteria including negatives.
CoT Rubric has 20+ criteria including negatives.
Both rubrics cover every prompt ask.
All four models score below 60% on both rubrics.
Legend is complete and matches narrative.
All [ESCALATE] items are resolved.
Submit your finalized task on Realm.
Here is a consolidated list of complexity-adding strategies for Financial Projection tasks.
Non-obvious rate structures: TOU rates with significant peak/off-peak differentials requiring production timing analysis. Tiered rates where marginal value of savings changes. Demand charges for commercial projects.
Multiple degradation factors: Panel degradation (typically 0.5% per year) plus inverter efficiency loss. Require explicit treatment of both.
Competing objectives: Client objective is to minimize payback, but the configuration that minimizes payback does not maximize NPV. Model must recognize and address this tension.
Constraint binding: Budget constraint [9] as Hard Cap eliminates the configuration that would otherwise be optimal on financial metrics.
Sensitivity requirements: Require the model to show how NPV changes with different discount rate or escalation assumptions.
Financing variations: Compare cash purchase vs. loan financing with different optimal recommendations. Loan scenarios require modeling of interest payments and different cash flow timing.
Export compensation complexity: Net metering with different export rates, or net billing where export compensation differs from retail rate.
O&M cost variations: Require explicit treatment of inverter replacement in years 12–15, or annual O&M as a percentage of system cost.
Inflation and escalation mismatch: Electricity escalation rate differs from general inflation rate used for discounting.
Alternative configuration constraints: Alternative must achieve a minimum threshold (e.g., "payback under 10 years") while optimizing a different metric.
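To compute reference values for a financing variation, you need the loan's cash flow timing. The sketch below uses the standard level-payment (annuity) formula with annual payments. Holding savings flat and paying the down payment at year 0 are simplifying assumptions for illustration, and all numbers are hypothetical.

```python
def annual_loan_payment(principal: float, annual_rate: float, term_years: int) -> float:
    """Level annual payment that fully amortizes the loan over its term."""
    if annual_rate == 0:
        return principal / term_years
    return principal * annual_rate / (1 - (1 + annual_rate) ** -term_years)


def loan_cash_flows(net_cost: float, loan_principal: float, annual_rate: float,
                    term_years: int, annual_savings: float, project_life: int) -> list:
    """Owner cash flows for years 0..project_life under loan financing.

    Year 0 is the down payment (net cost not covered by the loan); later
    years are savings minus debt service while the loan is outstanding.
    """
    payment = annual_loan_payment(loan_principal, annual_rate, term_years)
    flows = [-(net_cost - loan_principal)]
    for year in range(1, project_life + 1):
        debt_service = payment if year <= term_years else 0.0
        flows.append(annual_savings - debt_service)
    return flows


# Hypothetical case: $16,800 net cost, $10,000 financed at 6% over 10 years.
flows = loan_cash_flows(16_800, 10_000, 0.06, 10, 1_680, 25)
print(round(annual_loan_payment(10_000, 0.06, 10), 2))
print(flows[:3])
```

Compared with a cash purchase, the year-0 outlay shrinks and early-year net flows fall, which is why the two financing cases can rank differently on payback, IRR, and NPV.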
These are the standard formulas the model should use. Include expected values in your rubric justifications.
Net Project Cost: Net Cost = Total Installed Cost [8] − Tax Credit [10]
Annual Savings (Year 1): Annual Savings = Expected Generation [7] × Electricity Rate [14]
Simple Payback: Payback = Net Cost / Annual Savings (Year 1)
Annual Production with Degradation: Production(year n) = [7] × (1 − degradation rate)^(n−1)
Annual Savings with Escalation and Degradation: Savings(year n) = Production(year n) × [14] × (1 + escalation rate)^(n−1)
NPV: NPV = −Net Cost + Σ [Savings(year n) / (1 + discount rate)^n] for n = 1 to [15]
IRR: The discount rate that makes NPV = 0, solved iteratively or with a financial function.
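The formulas above can be turned into a small reference calculator for your rubric justifications. This is a minimal sketch using only the standard library. The bisection solver is one possible iterative method for IRR, and every input value in the example is hypothetical.

```python
def yearly_savings(gen_kwh, rate, years, degradation, escalation):
    """Savings(year n) for n = 1..years, with degradation and escalation."""
    return [
        gen_kwh * (1 - degradation) ** (n - 1) * rate * (1 + escalation) ** (n - 1)
        for n in range(1, years + 1)
    ]


def npv(net_cost, savings, discount_rate):
    """NPV = -Net Cost + sum of Savings(year n) / (1 + r)^n."""
    return -net_cost + sum(
        s / (1 + discount_rate) ** n for n, s in enumerate(savings, start=1)
    )


def irr(net_cost, savings, lo=-0.99, hi=1.0, tol=1e-7):
    """Find the rate where NPV = 0 by bisection.

    Assumes NPV falls as the rate rises and changes sign between lo and hi.
    """
    if npv(net_cost, savings, lo) < 0 or npv(net_cost, savings, hi) > 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(net_cost, savings, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical inputs standing in for [7], [8], [10], [14], and [15].
installed_cost, tax_credit = 24_000, 7_200
generation_kwh, rate_per_kwh, project_life = 11_200, 0.15, 25

net_cost = installed_cost - tax_credit
year1_savings = generation_kwh * rate_per_kwh
savings = yearly_savings(generation_kwh, rate_per_kwh, project_life,
                         degradation=0.005, escalation=0.02)

print(f"Net cost: ${net_cost:,.0f}")
print(f"Simple payback: {net_cost / year1_savings:.1f} years")
print(f"NPV at 6%: ${npv(net_cost, savings, 0.06):,.0f}")
print(f"IRR: {irr(net_cost, savings):.2%}")
```

O&M costs are left out here, matching the formulas as written; if your task includes them, subtract them from each year's savings before discounting.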
Resource List
NREL System Advisor Model (SAM) documentation at sam.nrel.gov
NREL PVWatts Calculator at pvwatts.nrel.gov
NREL Annual Technology Baseline at atb.nrel.gov
Lawrence Berkeley National Laboratory Tracking the Sun at lbl.gov/tracking-the-sun
NREL U.S. Solar Photovoltaic System and Energy Storage Cost Benchmarks at nrel.gov/docs/fy23osti/83586.pdf
EnergySage Solar Marketplace Intel Reports at energysage.com/data
OpenEI Utility Rate Database at openei.org/wiki/Utility_Rate_Database
EIA Electric Power Monthly at eia.gov/electricity/monthly
EIA State Electricity Profiles at eia.gov/electricity/state
DSIRE (Database of State Incentives for Renewables & Efficiency) at dsireusa.org
Census Bureau Geocoder at geocoding.geo.census.gov/geocoder
EIA Electric Retail Service Territories at eia.gov/maps