Finance Scope Only

Purpose

This guide explains how to create one complete Arden task: a realistic clean-energy project, a financial projection question, and the rubrics used to evaluate model answers and reasoning.

You define the project → the model responds → we score correctness and reasoning.

Tasks in this scope ask the model to compute project economics. Tax outcomes are provided as inputs — the model does not calculate them. The model must produce:

Net project cost after applying the stated tax credit outcome

Assumed electricity rate and bill-savings methodology

Simple payback period

Internal rate of return (IRR)

Net present value (NPV) over a stated project life

Explicit financial assumptions (discount rate, escalation, degradation, O&M if applicable)

One alternative configuration with recomputed payback, IRR, and NPV

Quick Reference: Step Index

Step 1: Confirm Assignment, Create Task on Realm, Pick a Building (15–30 min)

Step 2: Fill All 15 Mandatory Inputs (30 min–1 hr)

Step 3: Write the Narrative Task Prompt (1.5–2 hrs)

Step 4: Incorporate Energy Engineer Feedback (15–45 min)

Step 5: Build the Response Rubric with 20+ criteria (2 hrs)

Step 6: Build the CoT Rubric with 20+ criteria (2 hrs)

Step 7: Run Model Tests and Score (2–3 hrs)

Step 8: Final Submission (30 min)

Step 1: Confirm Assignment, Create Task on Realm, and Pick a Building

Estimated time: 15–30 minutes

1.1 Create a New Task on Realm

Create a new task and name it using the format: [State Abbreviation]-[Project Type]-FIN-###

1.2 Check Your Assignment

Confirm the parameters you've been given: U.S. state and project type (residential, commercial, multifamily, edge case).

These are not rigid assignments. If you have a combination you're particularly excited to research or where you have more expertise, you may adjust accordingly—just notify us ahead of time.

1.3 Choose a Realistic Subject Property

Pick a property that is believable and flexible enough to support the financial complexity you want.

Use mapping and listing tools such as Google Maps, Street View, Zillow, Redfin, or LoopNet to find a building that matches your assignment, has reasonable size, age, and layout, and has roof geometry that could realistically host PV. Avoid famous buildings or landmarks; choose something typical.

Adding complexity through property selection: For commercial or multifamily projects, you can introduce multiple tenants with different load profiles, include mixed-use space, or use partial roof availability to constrain PV sizing. These affect financial analysis through different rate structures, allocation of savings, and system sizing tradeoffs.

1.4 Determine Location Characteristics

Make assumptions about location characteristics that affect project economics.

For Financial Projection tasks, location affects:

Utility rates and rate structures (flat, TOU, tiered, demand charges)

Solar resource and production estimates

State or local incentives (if included as additional given inputs)

Net metering policies and export compensation rates

How to Research Location Characteristics

Utility Service Territory: Use the EIA Electric Retail Service Territories data to identify the serving utility. Use the OpenEI Utility Rate Database for rate structures.

Solar Resource: Use NREL PVWatts to estimate production based on location coordinates.

State Incentives (optional): Use the DSIRE database at dsireusa.org for state and local incentive programs.

If you cannot find reliable rate or resource data for your chosen location, mark [ESCALATE] in your notes and explain whether to use a different location or make simplifying assumptions.

Step 2: Fill All 15 Mandatory Inputs

Estimated time: 30 minutes–1 hour

Every Financial Projection task requires 15 inputs organized into three categories:

Location [1]–[5]: 5 inputs

Technology [6]–[7]: 2 inputs

Financial [8]–[15]: 8 inputs

You may use an LLM to generate non-critical numeric details as a starting point, but you must sanity-check them and add complexity that makes the aggregate problem difficult to optimize. You remain responsible for coherence.

2.1 Location Inputs [1]–[5]

[1] Street Address and Coordinates

Choose a real address from listing sites or maps. In Google Maps, right-click and select "What's here?" then copy latitude and longitude. Format: [Full Address] ([Latitude], [Longitude])

[2] Census Tract ID

Use the Census Bureau Geocoder. Record the full 11-digit GEOID. This is included for completeness but is not central to financial analysis.
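An 11-digit tract GEOID is the 2-digit state FIPS code, the 3-digit county FIPS code, and the 6-digit tract code joined together. A minimal sketch for catching a truncated ID (for example, a leading zero dropped by a spreadsheet) is shown below. The function name and the sample GEOID are illustrative assumptions, not part of the Realm workflow.

```python
def split_tract_geoid(geoid: str) -> dict:
    """Split an 11-digit census tract GEOID into its FIPS components."""
    if len(geoid) != 11 or not (geoid.isascii() and geoid.isdigit()):
        raise ValueError(f"expected exactly 11 digits, got {geoid!r}")
    return {
        "state": geoid[:2],    # 2-digit state FIPS
        "county": geoid[2:5],  # 3-digit county FIPS
        "tract": geoid[5:],    # 6-digit tract code
    }


# Illustrative value only; substitute the GEOID returned by the geocoder.
parts = split_tract_geoid("06037207400")
print(parts)
```

Keeping the GEOID as a string, never as a number, avoids losing the leading zero in states whose FIPS code starts with 0.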

[3] Energy Community Status

State Yes or No. This is provided for context only — the tax credit amount is already given in [10], so Energy Community status does not affect the model's calculations.

[4] Utility Service Territory

Identify the serving utility. Note relevant rate structure (flat, TOU, tiered, demand-based). This is critical for financial analysis as it determines the value of electricity savings.

[5] Property Characteristics

Include only characteristics that affect energy consumption, PV hosting capacity, and site constraints: building type, size in square feet, roof type (flat vs. pitched), age, lot size, and relevant obstructions or constraints.

2.2 Technology Inputs [6]–[7]

[6] PV Capacity (kW DC)

Use NREL PVWatts at pvwatts.nrel.gov to size appropriately. Consider roof constraints, load offset goals, and budget.

[7] Expected Annual PV Generation (kWh)

Derive from PVWatts using your coordinates from [1]. This is a critical input for calculating annual electricity savings.
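A quick way to sanity-check [6] against [7] is to compute the specific yield (kWh per kW DC) and the capacity factor they imply. The sketch below uses hypothetical values and function names; a result far from what PVWatts reports for the site usually points to a transcription or unit error.

```python
HOURS_PER_YEAR = 8760


def specific_yield(annual_kwh: float, capacity_kw_dc: float) -> float:
    """Annual generation per unit of DC capacity, in kWh per kW."""
    return annual_kwh / capacity_kw_dc


def capacity_factor(annual_kwh: float, capacity_kw_dc: float) -> float:
    """Fraction of the year the array would need to run at full DC rating."""
    return annual_kwh / (capacity_kw_dc * HOURS_PER_YEAR)


# Hypothetical inputs standing in for [6] and [7].
capacity_kw_dc = 8.0
annual_kwh = 11_200.0

print(specific_yield(annual_kwh, capacity_kw_dc))             # 1400.0
print(round(capacity_factor(annual_kwh, capacity_kw_dc), 3))  # 0.16
```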

Note: For more complex projects involving battery storage, heat pumps, or EVSE, you may add optional technology inputs labeled [6a], [6b], and so on. Document these clearly in your legend. The base requirement is PV only.

2.3 Financial Inputs [8]–[15]

[8] Total Installed Cost (USD)

Use NREL and DOE cost benchmarks to anchor estimates. Include equipment, labor, and soft costs.
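Cost benchmarks are usually quoted in dollars per watt DC, so converting [8] and [6] to that unit makes the comparison direct. This is a minimal sketch with hypothetical numbers; the benchmark value to compare against is whatever the cited NREL or DOE source reports for the project type.

```python
def cost_per_watt(installed_cost_usd: float, capacity_kw_dc: float) -> float:
    """Installed cost in USD per watt DC."""
    return installed_cost_usd / (capacity_kw_dc * 1000)


# Hypothetical values standing in for [8] and [6].
print(cost_per_watt(24_000, 8.0))  # 3.0
```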

[9] Budget Type

Specify Hard Cap or Flexible Range. This affects whether alternative configurations can exceed budget.

[10] Stated Tax Credit Amount (USD)

This is the federal tax credit value provided as a given input. The model does not calculate this — it is the key difference from Tax Strategy tasks. Simply state the dollar amount the project receives.

Adding complexity through tax credit: You can provide a credit amount that reflects complex stacking (base + bonuses) without requiring the model to derive it. The model's job is to use this given value in financial calculations.

[11] Financial Objective

Express the client's financial goal. This guides the model's analysis and alternative configuration selection.

Examples: Minimize payback period. Maximize NPV. Achieve IRR of at least 8%. Balance payback and NPV.

Adding complexity through objectives: Use competing objectives (minimize payback vs. maximize NPV) that may yield different optimal configurations.

[12] Contract Signing Date

The date the contract is executed. Format as YYYY-MM-DD.

[13] Placed-in-Service Date

The date the system becomes operational. This is the start date for financial projections.

[14] Electricity Rate ($/kWh)

The retail electricity rate used for calculating bill savings. Specify whether it is a flat rate, average of TOU periods, or blended rate.

You may provide this explicitly, or leave it for the model to research and justify, which increases difficulty. If the utility has TOU rates, you can require the model to account for production timing.

Adding complexity through rates: Use TOU rates with significant peak/off-peak differentials, tiered rates, or demand charges that require more sophisticated savings calculations.
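When [14] is given as a blended or TOU-average rate, one defensible way to derive it is to weight each period's rate by the share of PV generation that falls in that period. The sketch below assumes this generation-weighted approach and uses made-up rates and production splits; it does not model tiered rates or demand charges.

```python
def blended_rate(period_rates: dict, generation_kwh: dict) -> float:
    """Generation-weighted average rate in $/kWh across TOU periods."""
    total_kwh = sum(generation_kwh.values())
    total_value = sum(period_rates[p] * kwh for p, kwh in generation_kwh.items())
    return total_value / total_kwh


# Hypothetical TOU rates ($/kWh) and annual PV generation by period (kWh).
rates = {"peak": 0.40, "off_peak": 0.20}
generation = {"peak": 3_000.0, "off_peak": 7_000.0}

print(round(blended_rate(rates, generation), 4))  # 0.26
```

A simple average of the two rates would give 0.30 $/kWh here, so the choice of weighting changes annual savings and should be stated in the prompt or required of the model.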

[15] Project Life (years)

The analysis period for IRR and NPV calculations. The standard assumption is 25 years for PV, but it may vary based on equipment warranties, financing terms, or client preference.

Optional [16] Discount Rate

You may provide a discount rate explicitly, or require the model to select and justify an appropriate rate, which increases difficulty. If omitted, add to the prompt: "Select and justify an appropriate discount rate."

Step 3: Write the Narrative Task Prompt

Estimated time: 1.5–2 hours

3.1 Narrative Structure

Turn your inputs into a coherent story. Use bracketed references throughout.

Opening Context

You are advising the owner of a [Project Type] property in [City, State] on the financial performance of a proposed solar installation. The tax credit treatment has already been determined. Your task is to compute project economics and evaluate alternatives.

Location Block

The property is located at [1], in census tract [2]. The property is served by [4], which offers [describe rate structure].

Here are the property's characteristics: [5].

Technology Block

The owner is planning a solar PV installation sized at [6], designed to produce approximately [7] per year based on the site's solar resource and typical system losses.

Financial Block

The overall turnkey project is quoted at a total installed cost of [8]. The owner's budget is a [9].

The project qualifies for a federal tax credit of [10]. This credit amount is provided as a given; do not recalculate it.

The owner's financial objective is to [11].

Assume the contract is signed on [12], and the project is placed in service on [13].

Use an electricity rate of [14] for bill savings calculations. Assume a project life of [15] for IRR and NPV analysis.

3.2 Task Request

At the end of the narrative, include this instruction block:

For every key calculation or recommendation, show your work and explain how you derived it from the inputs [1]–[15]. Use explicit formulas and state all assumptions clearly.

Your task:

Net Project Cost: Calculate the net project cost after applying the stated tax credit [10].

Bill Savings Methodology: State the assumed electricity rate [14] and explain your bill-savings calculation methodology. Show how annual savings are derived from expected generation [7].

Simple Payback Period: Calculate the simple payback period in years using the formula: Net Cost / Annual Savings.

Internal Rate of Return (IRR): Calculate the project IRR over the stated project life [15]. Show the cash flow structure used.

Net Present Value (NPV): Calculate the NPV using an explicitly stated discount rate. Account for electricity rate escalation, panel degradation, and O&M costs if applicable.

Financial Assumptions: List all assumptions used including discount rate, electricity escalation rate, panel degradation rate, O&M costs, and project life. Justify each assumption.

Alternative Configuration: Present one alternative configuration (different system size, different equipment, or different financing) with recomputed payback, IRR, and NPV. Explain the tradeoffs relative to the base case and the client's stated objective [11].

At the end, provide a short summary comparing the base case and alternative, with a recommendation aligned to the client's objective.

3.3 Legend

At the end of every prompt, include a legend mapping all 15 inputs:

Legend:

[1] [Full Address] ([Latitude], [Longitude]) – street address and coordinates

[2] [11-digit GEOID] – census tract ID

[3] [Yes/No] – Energy Community status (for context only, tax credit already given)

[4] [Utility Name] – utility service territory

[5] [Property details] – property characteristics

[6] [X] kW – PV capacity

[7] [X] kWh – expected annual PV generation

[8] [X] USD – total installed cost

[9] [Hard Cap/Flexible Range] – budget type

[10] [X] USD – stated federal tax credit (given, do not recalculate)

[11] [Objective statement] – financial objective

[12] [YYYY-MM-DD] – contract signing date

[13] [YYYY-MM-DD] – placed-in-service date

[14] [X] $/kWh – electricity rate

[15] [X] years – project life

3.4 Validate and Submit for Review

Validate with Rhea on the Realm platform.

If Rhea invalidates the prompt, make suggested changes and re-validate.

Once validated, submit for Review.

An energy expert will conduct an asynchronous feasibility review.

Your prompt passes only after engineering approval.

After approval, rubric steps are unlocked.

Step 4: Incorporate Energy Engineer Feedback

Estimated time: 15–45 minutes

After engineering review:

Update Task Prompt: If system sizes, production estimates, or cost assumptions change, update legend values, narrative references, and recalculate any reference values you will use in rubrics.

Resolve Disagreements: If you disagree with a suggested change that materially affects the scenario, mark it [ESCALATE] and bring in the HD team for a decision. Do not override feasibility concerns unilaterally.

Final Coherence Check: Confirm prompt and legend are consistent. Confirm [ESCALATE] items are resolved or documented. Re-validate with Rhea.
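The prompt-versus-legend consistency check can be partly automated by comparing the bracketed labels used in the narrative with those defined in the legend. The sketch below is a hypothetical helper, not a Realm or Rhea feature; it assumes labels look like [7] or [6a] and ignores placeholders such as [Full Address].

```python
import re

# Matches numeric labels such as [7] or [6a]; placeholders like [X] do not match.
REF = re.compile(r"\[(\d+[a-z]?)\]")


def check_references(narrative: str, legend: str) -> dict:
    """Report labels used but not defined, and labels defined but never used."""
    used = set(REF.findall(narrative))
    defined = set(REF.findall(legend))
    return {
        "missing_from_legend": sorted(used - defined),
        "unused_in_narrative": sorted(defined - used),
    }


# Shortened, made-up narrative and legend for illustration.
narrative = "The property at [1] is served by [4]. Cost is [8]; storage is sized at [6a]."
legend = "Legend: [1] address [4] utility [8] cost [9] budget type"

print(check_references(narrative, legend))
# {'missing_from_legend': ['6a'], 'unused_in_narrative': ['9']}
```

A label that appears only in the legend is not necessarily an error (a context-only input, for instance), but each one is worth a deliberate look.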

Step 5: Build the Response Rubric

Estimated time: 2 hours

5.1 Purpose

The Response Rubric scores final outputs only, not reasoning. It evaluates what the model's answer explicitly states.

What to Include: Net cost, payback period (years), IRR (percentage), NPV (dollars), stated assumptions (discount rate, escalation, degradation, O&M), alternative scenario values

What NOT to Include: Reasoning steps, formula derivations, comparisons or tradeoff logic, justifications or explanations. Those belong in the Chain-of-Thought Rubric.

5.2 Mandatory Rules for Response Rubric Criteria

One Criterion = One Claim. No stacking multiple checks with "and/or." If you want to check two things, use two rows.

Binary Only. Each criterion must be satisfiable as true or false. No partial credit within a single row.

Self-Contained. A grader must evaluate the criterion using only the task prompt, the model's final answer, and the criterion text itself.

Numeric Checks Require Tolerances. All numeric criteria must include explicit tolerances. Use ±1% for percentages (IRR), ±$100 or ±2% for dollar amounts (NPV, net cost), ±0.5 years for payback.

Neutral, Observable Verbs. Start each criterion with States, Mentions, Identifies, Computes, Quantifies, Provides, or Assigns. Avoid subjective language such as "properly," "clearly," "thoroughly," "key," or "significant."
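Several of these rules are mechanical enough to lint before review. The sketch below is a hypothetical pre-check, not part of Rhea's validation: it flags a criterion that does not start with one of the listed verbs, contains one of the listed subjective words, or stacks checks with "and/or". It cannot judge whether a criterion is binary or self-contained.

```python
import re

APPROVED_VERBS = {"States", "Mentions", "Identifies", "Computes",
                  "Quantifies", "Provides", "Assigns"}
SUBJECTIVE_WORDS = {"properly", "clearly", "thoroughly", "key", "significant"}


def lint_criterion(text: str) -> list[str]:
    """Return a list of rule violations found in one criterion."""
    problems = []
    words = text.split()
    first_word = words[0] if words else ""
    if first_word not in APPROVED_VERBS:
        problems.append(f"starts with {first_word!r}, not an approved verb")
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = sorted(SUBJECTIVE_WORDS.intersection(tokens))
    if hits:
        problems.append("subjective wording: " + ", ".join(hits))
    if "and/or" in text.lower():
        problems.append("stacked check: contains 'and/or'")
    return problems


print(lint_criterion("States IRR as a percentage within ±1% of 9.4%."))  # []
print(len(lint_criterion("Clearly explains payback and/or NPV.")))       # 3
```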

5.3 Response Rubric Structure

Each criterion requires these fields:

Score — Integer points (positive or negative)

Type — Financial

Criterion — Single observable claim with tolerance if numeric

Source — Primary reference URL (NREL, DOE, financial methodology reference)

Quote — Short supporting excerpt (1–2 phrases)

Justification — Why this output is required; for numeric checks, show formula and reference value
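One way to keep rubric rows complete while drafting is to hold each row in a small record type that rejects missing fields. The sketch below mirrors the six fields listed above; the class name, the validation rules (including treating a zero score as invalid), and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    score: int          # positive for credit, negative for a penalty
    type: str           # "Financial" in this scope
    criterion: str      # single observable claim, with tolerance if numeric
    source: str         # primary reference URL
    quote: str          # short supporting excerpt
    justification: str  # why required; formula and reference value if numeric

    def __post_init__(self):
        if not isinstance(self.score, int) or self.score == 0:
            raise ValueError("score must be a non-zero integer (assumed rule)")
        for name in ("type", "criterion", "source", "quote", "justification"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


# Illustrative row; the quote is a placeholder, not a real excerpt.
example = Criterion(
    score=5,
    type="Financial",
    criterion="States simple payback period within ±0.5 years of 10.0 years.",
    source="sam.nrel.gov",
    quote="[short excerpt from the source]",
    justification="Payback = Net Cost / Annual Savings = 16,800 / 1,680 = 10.0 years.",
)
print(example.score, example.type)
```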

5.4 Coverage Guidelines

Minimum 20 Response Rubric criteria.

Convert each prompt requirement into multiple atomic checks.

Include negative (penalty) criteria for serious errors.

Avoid criteria requiring the grader to do new research.

Avoid criteria that reference other rubric items.

Example Positive Criteria:

States net project cost within ±[tolerance] of [reference value].

States simple payback period within ±0.5 years of [reference value].

States IRR as a percentage within ±1% of [reference value].

States NPV within ±[tolerance] of [reference value].

Identifies the discount rate used for NPV calculation.

States the electricity escalation rate assumption.

States the panel degradation rate assumption.

Provides an alternative configuration with different system size or parameters.

States the alternative configuration payback period.

States the alternative configuration NPV.

Example Negative Criteria:

Uses an electricity rate that differs from [14] without justification.

Computes NPV without stating discount rate.

Computes IRR without showing or describing cash flow structure.

Ignores panel degradation in lifetime production estimate.

Proposes alternative that exceeds budget when [9] is Hard Cap.

Recalculates or questions the stated tax credit [10] instead of accepting it as given.
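Numeric criteria such as these are easier to grade consistently when the tolerance test is written down once. The sketch below shows an absolute and a relative check with made-up values. It reads ±1% for IRR as one percentage point, which is an assumption; state the intended reading in the criterion text.

```python
def within_abs(value: float, reference: float, tolerance: float) -> bool:
    """True if value is within an absolute tolerance of the reference."""
    return abs(value - reference) <= tolerance


def within_rel(value: float, reference: float, fraction: float) -> bool:
    """True if value is within a fraction (0.02 = 2%) of the reference."""
    return abs(value - reference) <= fraction * abs(reference)


# Hypothetical model outputs versus hypothetical reference values.
print(within_abs(8.3, 8.0, 0.5))          # payback, ±0.5 years -> True
print(within_abs(9.9, 9.0, 1.0))          # IRR in %, ±1 point   -> True
print(within_rel(10_150, 10_000, 0.02))   # NPV, ±2%             -> True
print(within_abs(10_150, 10_000, 100))    # NPV, ±$100           -> False
```

The last two lines give different verdicts for the same answer, so a dollar-amount criterion should name exactly one tolerance.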

Step 6: Build the CoT (Chain-of-Thought) Rubric

Estimated time: 2 hours

6.1 Purpose

The CoT Rubric scores reasoning steps that appear in the answer text. It evaluates whether required reasoning actions are explicitly performed, not whether conclusions are optimal.

What to Score: Savings calculation methodology, formula selection and application, assumption justification, degradation and escalation logic, alternative configuration reasoning, tradeoff analysis

What NOT to Include: Final numeric outputs (those go in Response Rubric), final selections, narrative summaries, writing quality, repetition of answers already graded in Response Rubric

6.2 CoT Rubric Structure

Each criterion requires:

Score — Integer points (positive or negative)

Type — Financial

Criterion — Binary description of one reasoning step

Source — Reference from Resource List

Quote — Short supporting excerpt

Justification — Why this reasoning step matters

Approved Verbs for CoT Criteria: Explains, Describes, Identifies, States, Computes, Quantifies, Connects, Compares, Evaluates, Considers, Derives, Shows

Express only one reasoning idea per criterion. Make criteria self-contained using [1]–[15] labels.

6.3 CoT Coverage Guidelines

Minimum 20 CoT criteria covering all prompt asks.

Example Positive CoT Criteria:

Explains how annual savings are derived from [7] and the electricity rate [14].

Describes the formula used for simple payback calculation.

Shows the cash flow structure used for IRR calculation.

Explains how panel degradation affects lifetime production.

Explains how electricity escalation affects lifetime savings.

Justifies the selected discount rate with reference to market rates or client context.

Compares base case and alternative NPV with explicit reasoning.

Explains the tradeoff between payback and NPV for the alternative configuration.

Connects the recommendation to the client's stated objective [11].

Describes how O&M costs are incorporated into cash flows.

Example Negative CoT Criteria:

Uses annual savings without showing derivation from [7] and [14].

Applies degradation rate without explaining its effect on production.

Selects discount rate without any justification.

Proposes alternative without explaining how it differs from base case.

Ignores budget constraint [9] when reasoning about alternatives.

Treats the tax credit [10] as something to be calculated rather than a given input.

Ignores the client's stated objective [11] when making recommendations.

Step 7: Run Model Tests and Score

Estimated time: 2–3 hours

Goal

Evaluate four LLM responses on your task prompt.

Use your rubrics to grade each response.

Confirm all models score below 60% of total possible points.

Process

Generate Model Outputs: Run your prompt through four models externally: GPT 5.2, Claude Opus 4.5, Gemini 3 Pro, and Llama 4. Use the exact same prompt for all. Purchase deep-research access for the models where needed (you will be reimbursed). Copy responses to the "Evaluate Models" section on Realm.

Score with Rubrics: For the Response Rubric, Rhea auto-assesses each criterion. Double-check Rhea's assessments, correct any that are wrong, and mark each row as satisfied or not. Rhea cannot assess the CoT Rubric, so you must assess each CoT criterion manually and mark each row as satisfied or not.

Calculate Scores: For each model, Response Score = (Awarded Points / Total Points) × 100 and CoT Score = (Awarded Points / Total Points) × 100. If the awarded points sum to less than zero due to penalties, cap the score at 0%.

Check 60% Threshold: Both rubrics must show below 60% for all four models. If any model scores 60% or higher, do NOT down-score retroactively. Instead, increase task difficulty prospectively by adding realistic complexity to the scenario. Do not reverse-engineer difficulty from what the model got wrong.
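The score formula and the threshold check can be sketched as follows. Two points are assumptions rather than stated rules: Total Points is taken as the sum of positive criterion points, and a satisfied negative criterion subtracts its points from the awarded total. Model names and rows are placeholders.

```python
def rubric_score(rows: list[tuple[int, bool]]) -> float:
    """Percent score from (points, satisfied) rows, floored at 0."""
    total = sum(points for points, _ in rows if points > 0)
    if total <= 0:
        raise ValueError("rubric needs at least one positive criterion")
    awarded = sum(points for points, satisfied in rows if satisfied)
    return max(0.0, awarded / total * 100)


def models_at_or_above(scores: dict, threshold: float = 60.0) -> list[str]:
    """Models whose Response or CoT score reaches the threshold."""
    return sorted(
        name for name, (response, cot) in scores.items()
        if response >= threshold or cot >= threshold
    )


rows = [(5, True), (5, False), (3, True), (-4, True)]  # awarded 4 of 13
print(round(rubric_score(rows), 1))                    # 30.8
print(rubric_score([(5, False), (-4, True)]))          # 0.0 (penalty floor)

scores = {"model_a": (30.8, 45.0), "model_b": (62.5, 40.0)}
print(models_at_or_above(scores))                      # ['model_b']
```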

Step 8: Final Submission

Estimated time: 30 minutes

Pre-Submission Checklist

Task prompt is approved (post-engineering review).

Response Rubric has 20+ criteria including negatives.

CoT Rubric has 20+ criteria including negatives.

Both rubrics cover every prompt ask.

All four models score below 60% on both rubrics.

Legend is complete and matches narrative.

All [ESCALATE] items are resolved.

Submit

Submit your finalized task on Realm.

Appendix A: Adding Complexity

Here is a consolidated list of complexity-adding strategies for Financial Projection tasks.

Non-obvious rate structures: TOU rates with significant peak/off-peak differentials requiring production timing analysis. Tiered rates where marginal value of savings changes. Demand charges for commercial projects.

Multiple degradation factors: Panel degradation (typically 0.5% per year) plus inverter efficiency loss. Require explicit treatment of both.

Competing objectives: Client objective is to minimize payback, but the configuration that minimizes payback does not maximize NPV. Model must recognize and address this tension.

Constraint binding: Budget constraint [9] as Hard Cap eliminates the configuration that would otherwise be optimal on financial metrics.

Sensitivity requirements: Require the model to show how NPV changes with different discount rate or escalation assumptions.

Financing variations: Compare cash purchase vs. loan financing with different optimal recommendations. Loan scenarios require modeling of interest payments and different cash flow timing.

Export compensation complexity: Net metering with different export rates, or net billing where export compensation differs from retail rate.

O&M cost variations: Require explicit treatment of inverter replacement in years 12–15, or annual O&M as a percentage of system cost.

Inflation and escalation mismatch: The electricity escalation rate differs from the general inflation rate used for discounting.

Alternative configuration constraints: Alternative must achieve a minimum threshold (e.g., "payback under 10 years") while optimizing a different metric.
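As an illustration of the sensitivity requirement in the list above, the sketch below recomputes NPV across several discount rates while holding the other assumptions fixed. All input values are hypothetical, and the cash flow model is the simple savings-only form (no O&M, financing, or export compensation).

```python
def npv(net_cost: float, year1_savings: float, years: int,
        discount: float, escalation: float = 0.0,
        degradation: float = 0.0) -> float:
    """NPV with savings escalating and production degrading each year."""
    total = -net_cost
    for n in range(1, years + 1):
        growth = ((1 - degradation) * (1 + escalation)) ** (n - 1)
        total += year1_savings * growth / (1 + discount) ** n
    return total


# Hypothetical base case: net cost, year-1 savings, and project life.
base = dict(net_cost=16_800.0, year1_savings=1_680.0, years=25,
            escalation=0.02, degradation=0.005)

for rate in (0.04, 0.06, 0.08):
    print(f"discount {rate:.0%}: NPV = {npv(discount=rate, **base):,.0f}")
```

The same loop can sweep the escalation rate instead, which is useful when writing rubric reference values for a sensitivity ask.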

Appendix B: Reference Formulas

These are the standard formulas the model should use. Include expected values in your rubric justifications.

Net Project Cost: Net Cost = Total Installed Cost [8] − Tax Credit [10]

Annual Savings (Year 1): Annual Savings = Expected Generation [7] × Electricity Rate [14]

Simple Payback: Payback = Net Cost / Annual Savings (Year 1)

Annual Production with Degradation: Production(year n) = [7] × (1 − degradation rate)^(n−1)

Annual Savings with Escalation and Degradation: Savings(year n) = Production(year n) × [14] × (1 + escalation rate)^(n−1)

NPV: NPV = −Net Cost + Σ [Savings(year n) / (1 + discount rate)^n] for n = 1 to [15]

IRR: IRR is the discount rate that makes NPV = 0. Solved iteratively or using financial functions.
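The formulas above can be turned into a small reference calculator for producing rubric values. The sketch below follows them term by term and solves IRR by bisection, which is one iterative option among several. The input values are hypothetical, and O&M is omitted because it does not appear in the formulas as written.

```python
def net_cost(installed_cost: float, tax_credit: float) -> float:
    return installed_cost - tax_credit


def savings_by_year(generation_kwh: float, rate: float, years: int,
                    degradation: float = 0.0, escalation: float = 0.0) -> list[float]:
    """Savings(year n) for n = 1..years, with degradation and escalation."""
    return [
        generation_kwh * (1 - degradation) ** (n - 1) * rate * (1 + escalation) ** (n - 1)
        for n in range(1, years + 1)
    ]


def npv(cost: float, cash_flows: list[float], discount_rate: float) -> float:
    return -cost + sum(cf / (1 + discount_rate) ** n
                       for n, cf in enumerate(cash_flows, start=1))


def irr(cost: float, cash_flows: list[float],
        lo: float = -0.99, hi: float = 1.0, tol: float = 1e-7) -> float:
    """Bisection on NPV(rate) = 0; assumes NPV changes sign between lo and hi."""
    if npv(cost, cash_flows, lo) * npv(cost, cash_flows, hi) > 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(cost, cash_flows, mid) > 0:
            lo = mid   # NPV still positive, so the root lies at a higher rate
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical inputs standing in for [8], [10], [7], [14], and [15].
cost = net_cost(24_000.0, 7_200.0)
flows = savings_by_year(11_200.0, 0.15, 25, degradation=0.005, escalation=0.02)

payback = cost / flows[0]
print(round(payback, 2))                      # 10.0
print(round(npv(cost, flows, 0.06), 2))       # NPV at a 6% discount rate
print(round(irr(cost, flows) * 100, 2))       # IRR in percent
```

The bisection assumes a single upfront cost followed by positive savings, so NPV falls as the rate rises; cash flows with a mid-life replacement cost may need a different bracket or method.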

Appendix C: Resource List (Primary Sources)

Financial Analysis Resources

NREL System Advisor Model (SAM) documentation at sam.nrel.gov

NREL PVWatts Calculator at pvwatts.nrel.gov

NREL Annual Technology Baseline at atb.nrel.gov

Lawrence Berkeley National Laboratory Tracking the Sun at lbl.gov/tracking-the-sun

Cost Benchmarks

NREL U.S. Solar Photovoltaic System and Energy Storage Cost Benchmarks at nrel.gov/docs/fy23osti/83586.pdf

EnergySage Solar Marketplace Intel Reports at energysage.com/data

Utility Rates

OpenEI Utility Rate Database at openei.org/wiki/Utility_Rate_Database

EIA Electric Power Monthly at eia.gov/electricity/monthly

EIA State Electricity Profiles at eia.gov/electricity/state

Incentive Databases

DSIRE (Database of State Incentives for Renewables & Efficiency) at dsireusa.org

Census and Mapping

Census Bureau Geocoder at geocoding.geo.census.gov/geocoder

EIA Electric Retail Service Territories at eia.gov/maps
