Q: Are overlength (>300 characters) but coherent outputs penalized in evaluation, or are truncated results acceptable if they remain fluent and informative?
For ideas longer than 300 characters, we simply truncate the text; no additional penalty is applied. The limit is meant only to standardize the amount of information. If the truncated idea remains coherent and meaningful, it will be evaluated as usual.
Q: Does the automatic evaluation pipeline address LLMs’ position bias between idea_1 and idea_2?
We recognize the position bias in LLM-as-a-Judge. To mitigate it, we will
- run the evaluation with five different random seeds, and
- swap the positions of idea_1 and idea_2.
Thus, each idea pair is evaluated at least 5 (seeds) × 2 (orders) = 10 times, and we use majority voting over these judgments to determine the final winner.
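The seed-and-swap procedure above can be sketched as follows. This is a minimal illustration, not the actual pipeline: the `judge` function is a hypothetical stub standing in for the real LLM call, and the seed values are assumptions.

```python
import random
from collections import Counter

def judge(idea_a: str, idea_b: str, seed: int) -> str:
    """Hypothetical stub for the LLM judge: returns "A" if the first
    idea shown wins, "B" otherwise. The real pipeline would prompt an
    LLM here with the given random seed."""
    rng = random.Random(seed)
    return rng.choice(["A", "B"])

def evaluate_pair(idea_1: str, idea_2: str, seeds=(0, 1, 2, 3, 4)) -> str:
    """Judge the pair with each seed in both presentation orders
    (5 seeds x 2 orders = 10 votes), then majority-vote."""
    votes = []
    for s in seeds:
        # Original order: "A" means idea_1 wins.
        votes.append("idea_1" if judge(idea_1, idea_2, s) == "A" else "idea_2")
        # Swapped order: "A" now means idea_2 wins, cancelling position bias.
        votes.append("idea_2" if judge(idea_2, idea_1, s) == "A" else "idea_1")
    # Majority vote over all 10 judgments (a 5-5 tie is broken arbitrarily here).
    winner, _ = Counter(votes).most_common(1)[0]
    return winner
```

Note that swapping the order with the same seed guarantees each idea is seen in each position equally often, so any systematic preference for the first slot cancels out in the vote tally.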
Q: Some patent applicants may already offer commercial services. Does the novelty evaluation process take existing commercial offerings based on the patents into account?
Both human annotators and LLM-as-a-Judge may search the web to assess innovativeness. Ideas that have already been commercialized in a similar form may receive lower novelty scores. However, if the patented technology is applied in a genuinely new way, the idea can still earn a high innovativeness score.
Q: Does the evaluation process consider the risk of patent infringement, or is this outside the scope of the shared task?
Patent-infringement risk is outside the scope of this shared task. That said, overlap with existing patents can still reduce the innovativeness score.