TOPIC: Slowing down in Telemac

Slowing down in Telemac 5 years 3 months ago #32846

  • nguyenblue
Telemac still runs slowly on large cases. My test takes one week to complete.

Profiling with gprof and Valgrind shows that about 20% of the computation time is spent in this block (in mvseg.f):

[Attached image: 51315350_402675200537108_4250785646030880768_n.jpg, the code block from mvseg.f]


I have attempted to change ncsize but it doesn't work well and my computer has just 8 cores.

Do you think I can accelerate Telemac by optimizing the above code? Or is there any other BIEF file that could be improved?

I am still a newbie in Fortran as well as in Telemac, but it seems I have no choice but to modify the Telemac code to make it run faster... :(
The administrator has disabled public write access.

Slowing down in Telemac 5 years 3 months ago #32855

  • c.coulet
  • Moderator
  • Posts: 3632
  • Thank you received: 1010
Hi
It seems a little surprising to me that you would be able to modify the Telemac code to improve computation speed, particularly if you are a newbie in Telemac and, maybe more importantly, in Fortran...

You should know that Telemac's computation capabilities have been tested and, nowadays, the most used solution to reduce computational time is to use the parallelism capabilities of Telemac...

You said that Telemac runs slowly for a large example, but could you give us more detail about your particular case? Maybe share your case...
Did you try running the malpasset test case to compare your computation time with the other results given in the benchmark section?

As a final comment, I just ran a model of around 700 km x 500 km on a coastal area. Mesh size ranges from 100 m locally at the coast to 20 km offshore.
~18000 nodes
It takes 1 h to simulate 1 year on 12 cores...

regards
Christophe
The following user(s) said Thank You: nguyenblue

Slowing down in Telemac 5 years 3 months ago #32862

  • amanj2013
  • Expert Boarder
  • Posts: 211
  • Thank you received: 24
Hi,

Well, I ran a coastal storm surge model with 309,000 nodes, including a large database file of spatially and temporally varying wind and atmospheric pressure, with output every 30 minutes. The computation time was only 1 hour and 45 minutes.

Cheers,
AMANJ

Slowing down in Telemac 5 years 3 months ago #32882

  • nguyenblue
Thank you for all your replies. Your messages help me understand how hard my optimization idea will be.

Our testing model consists of 754906 elements corresponding to 384691 nodes. In the future we are planning to target domains of 20 million elements or more. So we are looking for possibilities to improve Telemac performance.

About the one-week run time: my bad. My computer has two execution modes and I chose a buggy one, so it took too long. :( After changing the mode, performance is much better.
c.coulet wrote: "telemac computation capabilities had been tested and nowadays, the most used solution to reduce computational time is to use parallelism capabilities"

I know Telemac is one of the best-performing CFD codes, which is why I chose it. But you know, even the best want to be better; I believe Telemac will perform even better in the future ;).

My post here just shares my idea about improving/optimizing Telemac. Maybe someone has already done this for their own purposes/test cases/examples. Or maybe someone will share his/her experience, like: "Oh man, I attempted to modify this code one year ago and I failed" or "You might use the latest technologies or approaches like Hybrid MPI + OpenMP or Process in Process"

Back to the code above: I think the main problem is that the compiler cannot automatically vectorize the loop because of data dependencies. Also, between iterations the program has to jump back and forth to access elements of X and Y instead of using unit-stride memory access.

I wrote a small program to test my ideas. First, this program executes the original loop:
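To make the dependency and memory-access concern concrete, here is a minimal C sketch. The actual loop in mvseg.f is not reproduced in this thread, so the loop body below is only an assumption built from the array names mentioned in this post (X, XA1, XA2, Y, G1, G2); seg_update is a hypothetical name, and the indices are 0-based as usual in C, whereas the Fortran code would use 1-based indices.

```c
#include <stddef.h>

/* Hypothetical segment-wise update with indirect addressing.
 * ASSUMPTION: the loop form is guessed, not copied from mvseg.f.
 * g1[i] and g2[i] are the two node indices (0-based) of segment i. */
void seg_update(double *x, const double *xa1, const double *xa2,
                const double *y, const int *g1, const int *g2, size_t nseg)
{
    for (size_t i = 0; i < nseg; ++i) {
        /* Gather: y is read through the index arrays (non-unit stride). */
        /* Scatter: x is written through the index arrays, so two
         * iterations may touch the same x entry. */
        x[g1[i]] += xa1[i] * y[g2[i]];
        x[g2[i]] += xa2[i] * y[g1[i]];
    }
}
```

In a sketch like this, two segments that share a node both accumulate into the same entry of x, so the compiler cannot prove that iterations are independent, and the accesses to x and y follow the order of g1 and g2 rather than walking memory sequentially.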

[Attached image: originalloop.png]


Second, I wrote a new subroutine sumMult(Z, XA1, Y, G1, G2, N) like this:

[Attached image: newsubroutine.png]


Third, I compiled the module OptimizedLib.f90 (which contains sumMult) with the option -O3. The main program was still compiled with -O0.

Running the loop 10000 (ten thousand) times, with each of the six arrays (X, XA1, XA2, Y, G1, G2) holding about 280 thousand numbers (real and integer), I estimate that my modification saves 59% of the computation time of the original loop.

Of course, it is too early to judge its performance. I need to confirm that my suggestion works with the real program, real data and real hardware.

There are two big questions:
1. When compiling TELEMAC, is it possible to use different optimization options for different Fortran modules?
2. Are the six arrays (X, XA1, XA2, Y, G1, G2) independent of each other? For example, a change to X will not affect the values of the elements of the others, will it?

Thanks in advance

Slowing down in Telemac 5 years 3 months ago #32863

  • amanj2013
  • Expert Boarder
  • Posts: 211
  • Thank you received: 24
Well, if you have a supplementary FORTRAN file that reads data from a file, this may be a cause of the slowdown. I faced this problem when retrieving data from a file, and I came up with an idea to optimize my FORTRAN code that made my computation three times faster.

AMANJ
The following user(s) said Thank You: nguyenblue

Slowing down in Telemac 5 years 3 months ago #32883

  • nguyenblue
Sorry, my bad.
I didn't notice that TELEMAC also uses -O3 to compile its code. After recompiling, the original loop performs better than mine.

:( Sorry again

Slowing down in Telemac 5 years 3 months ago #32893

  • nguyenblue
I found a new way to calculate X by storing temporary variables in C arrays.

Instead of calculating X in Fortran code, I wrote a C function, and all calculation steps are done in C code. This C function is called from the Fortran code.

The performance is much better: for all calculation steps (multiply and add), the computation time is reduced by 60%.

However, writing the element values back from the C array to the Fortran array slows down the whole program.


[Attached image: Cassignment.png]


The above figure shows the C statements that assign values from collection (the C array) to X (the Fortran array); they are what slows my program down. Curiously, assigning values from the Fortran array to the C array is very fast.

I am looking for an efficient way to assign values from a C array to a Fortran array.
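For reference, here are two common options as a minimal C sketch; it is not a measured fix for this case. The function names copy_back and axpy_in_place are hypothetical, and the sketch assumes that X is a contiguous DOUBLE PRECISION array (matching C double) whose address is passed to C, for example through an ISO_C_BINDING interface with bind(C) and the length passed with the value attribute.

```c
#include <stddef.h>
#include <string.h>

/* Option 1: one bulk copy from the C work buffer into the Fortran array.
 * ASSUMPTION: x and collection are contiguous, non-overlapping, and both
 * hold n doubles. */
void copy_back(double *x, const double *collection, size_t n)
{
    memcpy(x, collection, n * sizeof *x);
}

/* Option 2: skip the temporary buffer and accumulate directly into the
 * memory of the Fortran array, so that no copy back is needed.
 * The multiply-add shown here is only a placeholder computation. */
void axpy_in_place(double *x, const double *a, const double *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        x[i] += a[i] * y[i];
}
```

A Fortran array passed by reference is just a block of memory on the C side, so the second option avoids the write-back step entirely. Whether either option is faster than the statements in the figure would need to be timed on the real data.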

Slowing down in Telemac 5 years 3 months ago #32884

  • judicael
Hello,

nguyenblue wrote: "I have attempted to change ncsize but it doesn't work well and my computer has just 8 cores."

What do you mean by "it doesn't work well"?
If you want performance in Telemac, you need to use the MPI parallelization.
Did you try adding "PARALLEL PROCESSORS = 8" to your CAS file?

Slowing down in Telemac 5 years 3 months ago #32898

  • judicael
Hi,
What do you mean when you say that you changed ncsize but it doesn't work well? Have you tried adding "PARALLEL PROCESSORS = 8" to your case file?