Energy efficiency for high-powered research computing: Case study in Green ICT
Summary
Running high performance computing (HPC) work for research and other institutional activities through a program called Condor can help reduce energy bills, provided it is used effectively. Dr Hugh Beedie, Chief Technology Officer for Information Services at Cardiff University, has gained a number of insights into how to increase energy efficiency in HPC for research.
The challenge
Hugh had identified a demand for a particular type of high performance computing service: researchers needed to run the same job on their PC many times over long periods to obtain the data required for their projects. The large amount of time this took on a single PC was hindering the researchers’ work.
Hugh saw the potential for improving the IT systems at Cardiff University by doing this work on PCs that were switched on, but not in use. By sharing the work between many idle PCs, researchers would get results more quickly. Because these PCs were already turned on, the calculations used less additional electricity than the alternative: a set of PCs dedicated to running such programs.
The innovative approach
Collaborating with Cardiff’s School of Computer Science, Hugh used Cardiff’s fully automated application delivery system to install a computing program called Condor on thousands of PCs. Condor is an open source product developed by the University of Wisconsin which allows efficient use of computer power by allocating jobs to any idle computer within a pool.
The new system was initially run as a pilot with a small pool of up to 50 PCs, but because of the success of the project, the Information Services Board agreed to support it as a full service. Hugh said: ‘We knew there were certain types of jobs that ran well on Condor pools, and we recognised that it’s easier to encourage non-HPC literate people to use a Windows-based system … you’re providing an entry point that wouldn’t be there otherwise, which helps new HPC users understand the concepts and get results without a huge learning curve.’
One potential drawback of the Condor system is that if a job has been allocated to one of the PCs in the pool and another person uses that PC, the job will be interrupted and evicted, wasting time and energy. However, some researchers worked out their own solution to this: a process called ‘checkpointing’, where a temporary backup file is created every 10-15 minutes, allowing the PC to start the job again at the point it left off if the job gets interrupted. Another solution is to split the job into smaller chunks and run these through the Condor system.
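The checkpointing idea described above can be illustrated with a minimal Python sketch. This is not the researchers’ actual code and does not use any Condor API; the function names (save_checkpoint, load_checkpoint, run_job), the JSON file format, and the toy workload (a sum of squares) are all assumptions made for illustration. For a deterministic demonstration it checkpoints every fixed number of steps rather than every 10-15 minutes.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place.

    The rename is atomic, so an eviction in the middle of a write
    cannot leave a half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"next_step": 0, "total": 0}


def run_job(path, total_steps, checkpoint_every, stop_after=None):
    """Toy job: sum of i*i for i in range(total_steps), resumable.

    stop_after is a hypothetical test hook that simulates eviction
    after that many steps in the current run. Returns the final total,
    or None if the run was evicted before finishing.
    """
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return None  # evicted: work since the last checkpoint is lost
        i = state["next_step"]
        state["total"] += i * i
        state["next_step"] = i + 1
        steps_this_run += 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "job.ckpt")
        run_job(ckpt, 1000, 100, stop_after=250)   # first run is evicted
        print(run_job(ckpt, 1000, 100))            # second run resumes
```

In this sketch an evicted run loses only the steps completed since the last checkpoint, which is the trade-off behind choosing a checkpoint interval: shorter intervals waste less work on eviction but spend more time writing files.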
Results and benefits
Initially, Hugh was focused on the capital savings the new system could provide, but the green aspect became evident early on. Hugh said: ‘Initially, I was focused on the capital savings … I then realised there was a green angle because not only was the PC sitting there, but it was turned on using its base electrical load, and all that adding Condor would do is increase that slightly. Instead of taking a computer from “power off” to “fully loaded”, you’re taking it from idle to fully loaded. You might be talking about 150 watts at full load and 75 when idle. You’re only looking at a 75 watt leap. That’s the basic model that made me think, “Oh yes, this is green, this is good”.’
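The arithmetic in Hugh’s “basic model” can be written out as a short Python sketch. The 150 watt and 75 watt figures are the approximate values he quotes; the 10-hour job length and the function names are assumptions chosen only to make the comparison concrete.

```python
FULL_LOAD_W = 150.0  # approximate full-load draw quoted in the case study
IDLE_W = 75.0        # approximate idle draw quoted in the case study


def marginal_power_w(full_load_w, baseline_w):
    """Extra power attributable to the job, relative to a baseline state."""
    return full_load_w - baseline_w


def energy_kwh(power_w, hours):
    """Energy in kilowatt-hours for a constant power draw over some hours."""
    return power_w * hours / 1000.0


if __name__ == "__main__":
    job_hours = 10.0  # hypothetical job length, not from the case study

    # PC that is already on and idle: only the idle-to-full-load leap counts.
    extra_on_idle_pc = energy_kwh(marginal_power_w(FULL_LOAD_W, IDLE_W), job_hours)

    # PC taken from "power off" to "fully loaded": the whole draw counts.
    extra_from_off = energy_kwh(marginal_power_w(FULL_LOAD_W, 0.0), job_hours)

    print(f"Already-on PC:  {extra_on_idle_pc} kWh extra")
    print(f"From power off: {extra_from_off} kWh extra")
```

Under these assumed figures, the job attributed to an already-on PC accounts for half the extra energy of the same job run on a PC switched on for the purpose.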
In addition, the team recognised that using newer PCs in the Condor pool would greatly reduce power consumption and increase efficiency: ‘… if you care about being “green”, you should only put your newer PCs in the Condor pool. The older ones are just acting as slow, calculating room-heaters …’
The new system benefits researchers from a wide range of schools at Cardiff University, who can run experiments and jobs more effectively and efficiently than they could before using only their own PC. However, although Condor can be a strong tool for cutting power usage, it has to be deployed with care and with consideration of all of the elements that affect it:
‘The balance of what to run where changes over time. It’s not immediately intuitive, you need to think it through – but it’s worth it.’
Further Information
A longer version of this case study was originally produced by Grid Computing Now! and is available from the SusteIT project website.