You are responsible for designing the full energy project and tax credit optimization task for a given scenario. Your deliverables are:
A fully specified task prompt (narrative + legend) with 20 mandatory inputs.
A Golden Response (a full, correct solution that answers every request in your prompt as an expert would).
A Golden Response Rubric (criteria derived from your Golden Response, used to score an LLM's final answer).
A Golden Chain-of-Thought (CoT) Rubric (criteria derived from the reasoning in your Golden Response, used to score the reasoning within an LLM's answer).
Initial model tests run in micro1’s Realm platform, confirming that all evaluated models score below the 40% threshold on your rubrics, plus three calculated metrics for each LLM response.
Your goal is to create an energy tax credit optimization problem difficult enough that AI models cannot solve it, and then to provide the best solution along with rubrics for scoring the LLMs' responses and CoT.
Each task consists of four stages: you generate the outputs, an energy engineer reviews them, you integrate the engineer's feedback, and LLM outputs are evaluated against your rubrics before submission. The energy engineer does not create rubrics -- they only review your scenario for technical feasibility.
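As a rough illustration of the 40% difficulty check described above, here is a minimal Python sketch. It assumes each model's rubric score is expressed as a fraction between 0 and 1; the model names and scores are hypothetical, and the Realm platform may compute or report scores differently.

```python
THRESHOLD = 0.40  # the 40% threshold from this guide, expressed as a fraction


def task_is_hard_enough(scores: dict[str, float], threshold: float = THRESHOLD) -> bool:
    """Return True only if at least one model was evaluated and every
    evaluated model scores strictly below the threshold."""
    return bool(scores) and all(score < threshold for score in scores.values())


# Hypothetical model names and rubric scores, for illustration only.
example_scores = {"model_a": 0.22, "model_b": 0.35, "model_c": 0.41}
print(task_is_hard_enough(example_scores))  # False: model_c is at 41%
```

In this sketch a single model at or above 40% fails the check, which matches the requirement that all evaluated models score below the threshold.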
Compensation
We expect the whole process to take around 12 hours per task, from start to revised, final submission. This is our fixed Average Handling Time (AHT) cap. That means if your rate is $50/hr, you will be paid $600 for a completed task bundle, no matter how long it actually takes. Please try to avoid going over 12 hours of work!
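The fixed-payment arithmetic above can be sketched in a few lines of Python. The function names are hypothetical and used only for illustration; the 12-hour cap and the $50/hr example come from this guide.

```python
AHT_CAP_HOURS = 12  # fixed Average Handling Time cap from this guide


def task_payment(hourly_rate: float, cap_hours: int = AHT_CAP_HOURS) -> float:
    """Fixed payment for one completed task bundle, regardless of hours worked."""
    return hourly_rate * cap_hours


def effective_hourly_rate(hourly_rate: float, hours_worked: float,
                          cap_hours: int = AHT_CAP_HOURS) -> float:
    """What the fixed payment works out to per hour actually spent."""
    return task_payment(hourly_rate, cap_hours) / hours_worked


print(task_payment(50))               # 600
print(effective_hourly_rate(50, 15))  # 40.0
```

Because the payment is fixed, spending 15 hours on a task at a $50/hr rate works out to $40 per hour actually worked, which is why staying within the 12-hour cap matters.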
If you are unsure about anything substantive while composing the task prompt or rubrics, mark it with [ESCALATE] and add a comment explaining what you’re unsure about before sending it to the energy engineer or HD team. You can also message us on Slack in the interim if you would like to resolve it.
Please refer to the example task (prompt + rubrics) as a model for structure and level of detail. We highly recommend looking at the example before reading this guide, and then also pairing these instructions with the example as a parallel reference.
Note that this example was written by our team internally and may not be legally consistent or sensible from a tax perspective, which is why we need expertise like yours for additional prompts and rubrics. The general structure, the inclusion of 20 mandatory inputs, and the task requests are all representative of a good task prompt, but what makes a task really strong is legal soundness and tax optimization under complex scenarios, which you'll be designing and solving for. Again, we are not tax experts, so please treat all of the following as a starting place!
Suggested Workflow
We recommend starting by getting very familiar with the Realm platform (log in with your expert.micro1.ai Google account). This is where you'll do the bulk of your work; it saves your progress as you develop your outputs and lets you validate them with our internal quality checker. We go into more detail throughout these instructions. Once you have your initial assignment, create a new task, copy and paste the body from this template, and work from the platform itself.
Realm platform (log in using expert.micro1.ai email through Google)
Form for submitting links to your documents for review and final submission (after submitting using Realm platform)
Start by reading through the full step-by-step instruction set to understand every step of the workflow. Pair this with the example task prompt, Golden Response, and Golden Rubrics here.