As the water resources community moves towards more widespread use of 2D and more complex modeling, access to optimal hardware becomes increasingly important. Partially inspired by a study carried out by the Army Corps of Engineers, the Galileo team decided to launch its own investigation to find the fastest setup for compute-intensive 2D HEC-RAS runs, especially focusing on the number of cores allocated to each run.
We started by running scalability tests on large models for several HEC-RAS users, using Galileo to run each plan on a 48-core cloud instance, scaling from 1 core up to the ‘All Available’ setting and recording the effect on runtime. Here’s the first of two case studies we’ll discuss in more detail:
We were able to get the runtime down from 10.5 hours on a typical 4-core, 8-thread workstation to about 7.5 hours, nearly a 30% improvement. As you can see, throwing more computation threads at the problem definitely yields runtime savings with 2D models. We can extrapolate that leveraging additional machines with more threads, whether on-prem or in the cloud, will save engineers and firms time and money. However, the data shows that this is only effective up to a point, hence the ‘elbow’-shaped curve. More on this later.
Our second test case is particularly interesting because it was built, in part, to test the limits of RAS itself! Chris Goodell of Kleinschmidt Associates and The RAS Solution built his Missoula Floods model out of an interest in understanding the geological formations of the Pacific Northwest. As a long-time HEC-RAS user and instructor, he also wanted to test the limits of the software in terms of model scope and complexity.
His model is an attempt to simulate the floods resulting from the failure of a glacial dam approximately 10 to 40 thousand years ago. The ice dam was estimated to reach 630 m in height, and glacial Lake Missoula, impounded by the glacier, contained a volume of water approximately equal to that of Lake Ontario and Lake Erie combined. Floodwaters affected territory spanning from present-day western Montana, across the Idaho panhandle and eastern Washington, through the Columbia River Gorge and the Willamette Valley of Oregon, to the mouth of the Columbia River. To Goodell’s knowledge, HEC-RAS had never been used to model a dam breach of this magnitude. He originally built the model in 2008 in 1D and later created a 2D version. We ran scalability tests on the 2D model:
As in our first case study above, we found that modelers should expect significant benefits from spreading the computation across more processing threads, but again we observed diminishing marginal returns beyond a certain threshold.
Across multiple tests running these and other models, we observed anywhere from a 5% to a 30% runtime improvement when scaling beyond 8 to 16 threads (the quantity typically available to the modelers we work with), depending on the model. The inflection point depends on several factors, including cell count, the size of the geographic area, and the time-stepping resolution.
There are reasons for the asymptotic shape of these curves. First, any given RAS plan may not need, or be able to use, all of the cores available. More importantly, adding cores increases the computing overhead generated by the communication needed to share information between them. At a certain point, the time saved by distributing the task across one more core no longer outweighs the cost of that communication overhead.
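To build some intuition for that trade-off, here is a purely illustrative sketch in Python. It models a run as a serial portion that never parallelizes, a parallel portion that shrinks with core count, and a communication term that grows with core count. The single-core runtime, serial fraction, and per-core communication cost are assumed values chosen for illustration, not measurements from the case studies above.

```python
# Toy model only: why adding cores yields diminishing, and eventually negative, returns.
# All constants below are assumptions for illustration, not measured values.

def toy_runtime(cores, single_core_hours=40.0, serial_fraction=0.05,
                comm_hours_per_core=0.08):
    serial = single_core_hours * serial_fraction              # work that never parallelizes
    parallel = single_core_hours * (1 - serial_fraction) / cores
    communication = comm_hours_per_core * (cores - 1)         # overhead grows with core count
    return serial + parallel + communication

for n in (1, 2, 4, 8, 16, 32, 48):
    print(f"{n:2d} cores -> {toy_runtime(n):5.1f} h")
```

With these made-up numbers the predicted runtime drops steeply at first, flattens around 16 to 32 cores, and then starts creeping back up, which is the same ‘elbow’ shape we observed in the tests.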
As we confronted this physical limitation, we began to look for smarter ways to circumvent the obstacle rather than overcome it by brute force. Our solution was to schedule multi-core parallel plan runs, made possible by Galileo.
Previous tests (above) had shown us that a machine with 20+ cores might optimally handle two or more models running in parallel. Armed with this knowledge, we decided to experiment with parallelism and different core-count setups on a 48-core instance, working with up to 12 RAS plans. Here are the results:
The dark blue line, ascending from left to right, shows cumulative runtime as each additional plan is added when plans run sequentially with the “All Available” cores RAS configuration. In this case, each run took approximately 7 hours. Given sequential runs, the addition of a new plan meant an extra 7 hours added to the total runtime. This is essentially the status quo predicament, especially for engineers running on their own personal workstations.
The efficiency gains from parallelism are immediately evident when you compare the blue and yellow lines. The yellow line represents two parallel batches of 6 plans each, with each RAS plan constrained to run on 8 cores. Although the first batch, and therefore the first run, took 11 hours instead of 7, we completed 6 runs in those first 11 hours. All 12 runs were completed in 22 hours. That is an enormous improvement over the roughly 42 hours needed for just 6 sequential runs, and the roughly 84 hours it would have taken to run all 12 sequentially.
The remaining lines illustrate variations on this theme. Running all 12 plans in parallel, on 4 cores each, was the optimal solution in this case, for this model.
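For readers who want to experiment with this kind of scheduling on their own hardware, here is a minimal sketch of launching a batch of plans concurrently under a per-plan core budget. The run_hecras_plan command is a hypothetical placeholder for however you trigger a single plan run (for example, through a wrapper script around your usual workflow), and the sketch assumes each plan’s core count has already been set in its computation options.

```python
# Minimal sketch: run many HEC-RAS plans concurrently, each limited to a fixed
# core budget. "run_hecras_plan" is a hypothetical placeholder command, and
# per-plan core counts are assumed to be set in each plan's computation options.
import subprocess
from concurrent.futures import ThreadPoolExecutor

TOTAL_CORES = 48        # cores available on the machine or cloud instance
CORES_PER_PLAN = 4      # per-plan limit (configured in the plan, not here)
PLAN_IDS = [f"plan_{i:02d}" for i in range(1, 13)]  # 12 hypothetical plan names

def run_plan(plan_id: str) -> str:
    # Placeholder: replace with your actual mechanism for running one plan.
    subprocess.run(["run_hecras_plan", plan_id], check=True)
    return plan_id

# With 48 cores and 4 cores per plan, all 12 plans can run at once.
max_concurrent = TOTAL_CORES // CORES_PER_PLAN
with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
    for finished in pool.map(run_plan, PLAN_IDS):
        print(f"{finished} finished")
```

The same pattern covers the two-batch case from the yellow line: set CORES_PER_PLAN to 8 and the core budget limits you to 6 concurrent plans, so the 12 plans run as two batches of 6.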
Overall, running in parallel is the best solution we found as a remedy for runtime woes, but there are some caveats. If you lose a significant amount of time attempting to access a remote machine with more cores, it could cancel out your time saved. This is precisely the problem that Galileo was built to solve, by allowing you to deploy to other machines in as little time as it would take to start running locally.
Although communication overhead imposes a limit on the acceleration potential of your RAS plans, you can definitely circumvent the issue by running in parallel. With Galileo, you can easily access RAS-optimized machines, with as many cores as you need for your models, instead of tying up your own workstation. Contact us below to try it out for free.