Can I add sources not in the inputs? Yes! But do skew toward official/primary sources and cite properly.
What if sources conflict? Report the conflict, state which you follow and why; include both citations.
Can I change the deliverable format? Within reason. The prompt should always contain all the required inputs and full coverage of the task requests, but you are free to adjust per your research for a specific problem.
What is the project goal? To design energy tax credit optimization problems that AI models can't solve, and to provide both the correct answers and rubrics that measure how correct a model's answers and reasoning are.
Why do we need both tax experts and energy experts? Tax experts do the bulk of the work: designing the problem, solving it, and creating rubrics. Energy experts review the project proposal to make sure it's realistic and feasible.
What exactly am I creating? Tax experts create a realistic energy project scenario and tax credit optimization problem (prompt) as well as rubrics to grade AI model responses. Energy experts review those scenarios for technical feasibility.
Do I also grade the AI models after creating my task? Yes. After your task is approved, you'll run your prompt through four AI models and paste their responses on our platform to be graded automatically using your rubrics.
Can I choose any energy project scenario I want? Yes, within the bounds of your task type assignments (residential with/without battery, commercial, multifamily, or edge cases), so that you hit the required distribution of projects. You're also welcome to adjust the assignment if you find a more interesting combination. The rest of the scenario will be devised and researched by you and should be coherent. Technical aspects of the energy choices will be reviewed by an energy expert for feasibility.
Where do I get the 20 mandatory inputs from? You fill them out based on realistic project specs. Use census tract databases for location data, IRS Energy Community tools for status verification, and standard engineering specs for technology inputs. Keep track of your sources, and cite them in the golden response rubric if you assign points for that component.
Where do I actually submit my work? Tax Experts: Write your Task in a Google Doc, your Golden Response in a Google Doc, and your rubrics as two tabs within a Google Sheets spreadsheet. You will copy and paste each of these into the Realm platform and validate with Rhea per these instructions and then submit for review on the platform. You then submit this Google Form with your links, and your work will be reviewed by an energy expert. Upon review, you will see on the platform whether you need to make any adjustments, and we will send you the expert review. Energy Experts: You will receive the full task bundle (task prompt, golden response, rubrics). Submit your feedback using this Google Form.
After I submit my work, who reviews it first? There are several layers of review. First, Rhea, our internal validation tool (on the micro1 platform), will validate or invalidate your prompt and rubrics in a very minimal way, and you should adjust using its suggestions. See more here. Then your task goes to an energy expert for technical review. After that, it goes back to the tax expert for rework if needed or, if approved, to the project team for final approval.
How long until I get feedback? Target is 12-24 hours for first review.
What does "rework" mean and do I get paid for it? Rework means you need to make changes based on feedback. You're paid per completed task (12 hours' worth), so rework time is included in that rate.
How many times can a task be sent back? Maximum 2 rework cycles. After that, it goes to the project team for a final decision on acceptance or rejection.
What's the difference between Response Rubric and CoT Rubric? Response Rubric checks if the AI got the right answer (correct credits, correct amounts). CoT Rubric checks if the AI used the right reasoning process to get there.
How many rubric items do I need total? Minimum 15 items for Response Rubric and 15 items for CoT Rubric. At least 5 negative items across both rubrics combined.
What is a "negative rubric item"? A penalty for doing something wrong. If the AI makes a specific mistake (like claiming two incompatible credits), it loses points.
Where do I find sources to cite? IRS.gov, Treasury guidance documents, IRS Private Letter Rulings, state utility commission sites, and official energy program documentation. You need a minimum of 5 different sources across all rubric items.
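The rubric minimums above (15 items per rubric, 5 negative items combined, 5 different sources) can be self-checked before submission. The following is only a rough sketch: the `RubricItem` fields and the `check_rubrics` helper are hypothetical names, not part of the platform or of Rhea, and it assumes a negative item is recorded with a negative point value.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    rubric: str   # "response" or "cot" (hypothetical labels)
    points: int   # assumption: a negative value marks a penalty item
    source: str   # citation backing the item


def check_rubrics(items):
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    # Each rubric needs at least 15 items.
    for name in ("response", "cot"):
        count = sum(1 for item in items if item.rubric == name)
        if count < 15:
            problems.append(f"{name} rubric has {count} items, needs at least 15")
    # At least 5 negative items across both rubrics combined.
    negatives = sum(1 for item in items if item.points < 0)
    if negatives < 5:
        problems.append(f"only {negatives} negative items, needs at least 5")
    # At least 5 different sources across all rubric items.
    sources = {item.source for item in items if item.source}
    if len(sources) < 5:
        problems.append(f"only {len(sources)} distinct sources, needs at least 5")
    return problems
```

Calling `check_rubrics` on the combined list of items from both tabs returns a list of human-readable problems to fix, or an empty list when all three minimums are satisfied.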
Where exactly do I create my task? On the micro1 platform at realm.micro1.ai/arden. You'll input everything directly there.
What is Rhea? An AI validation tool that checks your prompt and rubrics for common errors before you submit. It helps catch issues early.
Can I save drafts and come back later? Yes. You can save your work in progress on the platform and return to it later.
Which AI models do I test and how many? 5 models total. The specific models will be provided when you start model evaluation.
Do I grade the model responses myself? Yes. You use your rubrics to grade each model's response. Rhea helps with initial grading, but you need to review and override if needed.
What does "models must score below 40%" mean? Your prompt and rubrics should be difficult enough that no AI model gets more than 40% of the possible points. This ensures we're training on challenging scenarios.
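As a rough illustration of the 40% rule, the sketch below converts points into a percentage and checks every model against the limit. The function names are hypothetical. It also assumes that earned points are already net of any penalties and that a score of exactly 40% is acceptable (following the "no more than 40%" wording); the platform's actual scoring may differ.

```python
def score_percent(earned_points, possible_points):
    """Percentage of possible points a model earned (penalties already subtracted)."""
    return 100.0 * earned_points / possible_points


def hard_enough(model_scores, limit=40.0):
    """True when no model scores more than the limit, in percent."""
    return all(score <= limit for score in model_scores)


# Example: three models graded against a rubric worth 40 possible points.
scores = [score_percent(earned, 40) for earned in (12, 16, 10)]
print(scores, hard_enough(scores))
```

If any model exceeds the limit, the prompt or rubrics likely need to be made harder before the task can pass review.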
How much do I get paid per task? You get paid for 12 hours of work per completed task at your hourly rate. So if your rate is $40/hour, you get $480 per task that reaches sign-off.
Is it per-task or per-hour? Per-task with a fixed 12-hour rate. Whether it takes you 8 hours or 15 hours, you get paid for 12.
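The payment arithmetic is simple enough to write out. This is a minimal sketch with a hypothetical `task_payment` helper; the only facts taken from the answers above are the fixed 12 hours per completed task and the $40/hour example.

```python
HOURS_PER_TASK = 12  # fixed, regardless of how long the task actually took


def task_payment(hourly_rate, completed_tasks=1):
    """Pay for tasks that reached sign-off: rate x 12 hours x number of tasks."""
    return hourly_rate * HOURS_PER_TASK * completed_tasks


print(task_payment(40))     # one task at $40/hour
print(task_payment(40, 3))  # three tasks at $40/hour
```

Actual hours worked do not appear anywhere in the calculation, which is why 8 hours and 15 hours pay the same.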
When does payment happen? Payments are processed twice monthly (1st-15th and 16th-end of month). Your payment shows up about 4-5 days after the pay period ends.
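To estimate when a payment should arrive, the pay period and the 4-5 day delay can be sketched as below. The helper names are hypothetical, and the delay is assumed to be counted in calendar days; if it is counted in business days, the real date will be later.

```python
import calendar
from datetime import date, timedelta


def pay_period(day):
    """Return (start, end) of the pay period containing the given date."""
    if day.day <= 15:
        return date(day.year, day.month, 1), date(day.year, day.month, 15)
    last_day = calendar.monthrange(day.year, day.month)[1]  # days in the month
    return date(day.year, day.month, 16), date(day.year, day.month, last_day)


def expected_payment_window(day):
    """Earliest and latest expected payment dates (assumes calendar days)."""
    _, period_end = pay_period(day)
    return period_end + timedelta(days=4), period_end + timedelta(days=5)


print(pay_period(date(2024, 2, 20)))
print(expected_payment_window(date(2024, 3, 10)))
```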
Do I submit timesheets? No. Just track your time in Hubstaff while working. The platform automatically calculates your payment based on completed tasks.
If I have a question, where do I ask? Post in #realm-arden-working on Slack and tag @Erick.
How fast should I expect responses? Urgent/blocking issues within 2 hours. Normal questions same business day.
Who do I contact for urgent vs general questions? General questions: @Erick in Slack. Urgent/blocking issues: @Erick first, @Zizo if critical.
What if real IRS guidance is unclear on something? Document the ambiguity in your rubric justification. Cite the conflicting sources and explain which interpretation you're using and why.
What are the most common reasons tasks get rejected? Missing mandatory inputs, unrealistic scenarios, rubric items without proper sources, scenarios that are too easy (models score above 40%), and technical infeasibility.