The needs that justify an electrical and computer engineering problem-solving effort:
Power needs
Automation
Designing efficient computer hardware
Our Design Proposal:
Large language models (LLMs) are trained on large numbers of servers that draw a great deal of computing power, all of which must be cooled because of the heat they generate. This cooling uses a substantial amount of water, raising environmental concerns as demand for AI grows.
Objective (why?):
Current AI systems and supercomputers consume a lot of water to cool their hardware. Limiting the amount of water used would reduce their environmental impact.
Background (who? where?):
The audience is people who are interested in AI and the environment. Large GPU and CPU companies would also be interested in such a solution, since it would improve their computing efficiency.
Methodology (how? when?):
We would create highly efficient algorithms that need less power, which in turn reduces cooling needs. This work would proceed as major leaps in LLM power efficiency are achieved.
Expected results (what?):
Increase the efficiency with which LLMs perform compute tasks, reducing overall consumption of water and other cooling resources, with a target of reducing water consumption by 5%.
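As a rough illustration of what the 5% target means in absolute terms, the sketch below converts a baseline water figure into liters saved and liters remaining. The function name and the baseline value are hypothetical placeholders, not measurements; only the 5% goal comes from this proposal.

```python
def water_target(baseline_liters: float, reduction_fraction: float = 0.05) -> tuple:
    """Return (liters saved, liters remaining) for a fractional reduction."""
    if not 0.0 <= reduction_fraction <= 1.0:
        raise ValueError("reduction_fraction must be between 0 and 1")
    saved = baseline_liters * reduction_fraction
    return saved, baseline_liters - saved


# Hypothetical baseline of 1,000,000 liters per year (an assumption, not data).
saved, remaining = water_target(1_000_000)
print(f"Saved: {saved:,.0f} L, remaining: {remaining:,.0f} L")
```

A real baseline would have to be measured at the data center in question before the target could be evaluated.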
Costs (how much will it cost?):
Cost of Graphics Processing Units (GPUs): NVIDIA Tesla V100: $8,000 to $10,000/unit, $2.48/hour; NVIDIA A100: $10,000 to $15,000/unit, $2.93/hour
Cost of using Tensor Processing Units (TPUs): Google Cloud TPU v3: $4.50/hour; TPU v4: $8.00/hour
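The prices listed above can be turned into a rough budget. The sketch below uses only the hourly rates and unit prices given in this section, and assumes the hourly figures are rental rates. The number of compute hours is a hypothetical placeholder, and the break-even estimate ignores electricity, cooling, and hosting costs.

```python
# Hourly rates (USD/hour) as listed in this proposal.
HOURLY_RATE_USD = {
    "NVIDIA Tesla V100": 2.48,
    "NVIDIA A100": 2.93,
    "Google Cloud TPU v3": 4.50,
    "Google Cloud TPU v4": 8.00,
}

# Purchase price ranges (USD/unit) as listed in this proposal.
UNIT_PRICE_USD = {
    "NVIDIA Tesla V100": (8_000, 10_000),
    "NVIDIA A100": (10_000, 15_000),
}


def rental_cost(device: str, hours: float) -> float:
    """Cost of renting one device for the given number of hours."""
    return HOURLY_RATE_USD[device] * hours


def break_even_hours(device: str) -> tuple:
    """Hours of rental that equal the low and high purchase prices."""
    low, high = UNIT_PRICE_USD[device]
    rate = HOURLY_RATE_USD[device]
    return low / rate, high / rate


ASSUMED_HOURS = 1_000  # hypothetical workload size, not from the proposal
for device in HOURLY_RATE_USD:
    cost = rental_cost(device, ASSUMED_HOURS)
    print(f"{device}: ${cost:,.2f} for {ASSUMED_HOURS:,} hours")
```

Comparing the rental total with the break-even hours gives a first indication of whether buying or renting hardware makes more sense for a given workload.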
Time (how much time will it take?):
Time to research: It will take 6 months to 1 year to fully research the problem and use this research to find viable solutions.
Time to develop: It will take 2 years to fully develop the best solution to the problem.
Time to test: It will take 1 year to test our solution to see if it works properly.
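Adding the three phases gives the overall schedule. The sketch below sums the durations stated above, assuming the phases run back to back with no overlap; the proposal does not say whether they overlap, so that is an assumption.

```python
# Phase durations in months as (minimum, maximum), from the estimates above.
PHASES_MONTHS = {
    "research": (6, 12),
    "develop": (24, 24),
    "test": (12, 12),
}


def total_duration_months(phases: dict) -> tuple:
    """Sum the minimum and maximum durations across all phases."""
    min_total = sum(low for low, _ in phases.values())
    max_total = sum(high for _, high in phases.values())
    return min_total, max_total


low, high = total_duration_months(PHASES_MONTHS)
print(f"Total: {low / 12:.1f} to {high / 12:.1f} years")
```

Under that assumption the project spans 3.5 to 4 years from the start of research to the end of testing.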
Sources and References:
https://www.sciencedirect.com/science/article/pii/S1359431124007804#s0015 (new, interesting study on air/liquid cooling)
“In response, the academia has seen a surge of innovation in this area (specifically in chip level), with alternative techniques ranging from thermoelectric (TE) assisted air-cooling [59] to phase change material (PCM)-based thermal management systems [60]. General challenges of these technologies have limited their widespread adoption are complexity and expense, the high-level coordination required between electrical circuit and thermal engineers, new manufacturing processes, reliability risks, yield reduction, restricted re-workability, and supply-chain risks [61].”