
TOPIC: Genop optimizer from TelApy

Genop optimizer from TelApy 2 years 1 month ago #40111

  • qilong
  • Expert Boarder
  • Posts: 340
  • Thank you received: 33
Hello Telemac fellows,

I'm learning how to use TelApy from the examples in the source code located in v8p3r1\notebooks\optim\telemac2d_optim_genop.ipynb.

In the section "Run the optimization", the notebook calls the optimizer like this:
# run optimization in parallel mode
# ...comments: change parameter nproc depending on your machine
fcost, xopt = mypb.optimize(nproc=4)

With nproc > 2, I get this error:
OSError: /home/qilong/Telemac/v8p3r1/examples/telemac2d/estimation/libuser_fortran.so: file too short

With nproc <= 2, however, it works fine.

How does the optimization work when it uses multiple processors?
Could the error come from each process trying to compile its own version of libuser_fortran.so and corrupting the file?

Thanks in advance!

Kind regards,
Qilong
The administrator has disabled public write access.

Genop optimizer from TelApy 2 years 1 month ago #40161

  • pham
  • Administrator
  • Posts: 1469
  • Thank you received: 563
Hello Qilong,

Could you please post the command you ran?
I am not a TelApy specialist, but other people might answer with more information.

Chi-Tuan

Genop optimizer from TelApy 2 years 1 month ago #40168

  • qilong
  • Expert Boarder
  • Posts: 340
  • Thank you received: 33
Hello Chi-Tuan,

Thanks for following up on this. I was running the Genop optimizer example in \v8p3r1\notebooks\optim\telemac2d_optim_genop.ipynb. The only thing I changed was the number of processors used by the optimizer, in cell 9:
# run optimization in parallel mode
# ...comments: change parameter nproc depending on your machine 
fcost, xopt = mypb.optimize(nproc=2)

With 2 it worked fine, but with 4 or 8 I got the following error (shown here for nproc=8):
~> Checking keyword/rubrique coherence
  ~> Checking keyword/rubrique coherence
  ~> Checking keyword/rubrique coherence
  ~> Checking keyword/rubrique coherence
  ~> Checking keyword/rubrique coherence
  ~> Checking keyword/rubrique coherence
  ~> Checking keyword/rubrique coherence
  ~> Checking keyword/rubrique coherence
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "<ipython-input-4-7e435a25630a>", line 6, in estimation
    Fx = study.h_x(CHESTR[0]) # Telemac computation with new friction coefficient
  File "/home/qilong/Telemac/v8p3r1/scripts/python3/telapy/tools/study_t2d.py", line 19, in h_x
    h, _, _ = self.run_telemac2d(k, finalize=True)
  File "/home/qilong/Telemac/v8p3r1/scripts/python3/telapy/tools/study_t2d.py", line 25, in run_telemac2d
    self.t2d = Telemac2d(self.studyFiles['t2d.cas'],
  File "/home/qilong/Telemac/v8p3r1/scripts/python3/telapy/api/t2d.py", line 58, in __init__
    super(Telemac2d, self).__init__("t2d", casfile, user_fortran,
  File "/home/qilong/Telemac/v8p3r1/scripts/python3/telapy/api/api_module.py", line 166, in __init__
    ctypes.cdll.LoadLibrary(user_fortran_lib_path)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/qilong/Telemac/v8p3r1/examples/telemac2d/estimation/libuser_fortran.so: file too short
"""

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
<ipython-input-9-6e8281fa46e5> in <module>
     17 # run optimization in parallel mode
     18 # ...comments: change parameter nproc depending on your machine
---> 19 fcost, xopt = mypb.optimize(nproc=8)

~/Telemac/v8p3r1/scripts/python3/telapy/tools/genop/genop.py in optimize(self, nbgen, nproc)
    115         self._pop = genpop(self.bounds, self.nvar, self.popsize)
    116         # Compute the initial score for the population
--> 117         self._fvalpop, ncalls = cost(self._pop, self.popsize, self.nvar,
    118                                      self.function, nproc)
    119         self.nsimul = self.nsimul + ncalls

~/Telemac/v8p3r1/scripts/python3/telapy/tools/genop/costfunction.py in cost(pop, npop, nvar, fname, nproc)
     48     else:
     49         pool = mp.Pool(processes=nproc)
---> 50         feval = pool.map(fname, pop)
     51         pool.close()
     52         pool.join()

/usr/lib/python3.8/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    362         in a list that is returned.
    363         '''
--> 364         return self._map_async(func, iterable, mapstar, chunksize).get()
    365 
    366     def starmap(self, func, iterable, chunksize=None):

/usr/lib/python3.8/multiprocessing/pool.py in get(self, timeout)
    769             return self._value
    770         else:
--> 771             raise self._value
    772 
    773     def _set(self, i, obj):

OSError: /home/qilong/Telemac/v8p3r1/examples/telemac2d/estimation/libuser_fortran.so: file too short

It seems that libuser_fortran.so was corrupted somehow. I don't know whether this is due to my testing environment (gfortran, OpenMPI and Python on Ubuntu 20.04 LTS under WSL2, with VS Code as the IDE) or because multiple processes were trying to overwrite the file at the same time.

With 2 processors, however, it worked as expected.
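
The "file too short" symptom is at least consistent with one process reading the library while another is rewriting it. The sketch below does not use TelApy and is not a trace of what TelApy actually does. It only shows, under that assumption, that reopening a file for writing truncates it immediately, so a reader arriving at that moment sees an empty file. The function name and the 4096-byte payload are made up for illustration.

```python
import os
import tempfile


def demo_truncated_read():
    """Return the file size before, during and after a second rewrite."""
    payload = b"x" * 4096  # stand-in for a compiled shared library
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "libuser_fortran.so")

        # First "compilation": the file is written completely.
        with open(path, "wb") as first:
            first.write(payload)
        full_size = os.path.getsize(path)

        # Second "compilation" on the same path: mode "wb" truncates at once.
        writer = open(path, "wb")
        size_during_rewrite = os.path.getsize(path)  # what a reader would see

        # The payload stays in the write buffer until the file is closed.
        writer.write(payload)
        writer.close()
        size_after = os.path.getsize(path)
    return full_size, size_during_rewrite, size_after


print(demo_truncated_read())
```

If several workers rebuild the same path at slightly different times, the more workers there are, the more likely one of them loads the file inside such a window, which would fit the observation that small nproc values succeed.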

Kind regards,
Qilong

Genop optimizer from TelApy 1 year 7 months ago #41209

  • qilong
  • Expert Boarder
  • Posts: 340
  • Thank you received: 33
Hello developers,

Recently I tried to apply the Genop optimizer to our Telemac-2D models, based on the example in the notebook. The goal was to auto-calibrate the hydrodynamics. It worked in general, but I ran into the same issue as described above.

When the model included "user_fortran" and the optimization ran on multiple CPU cores, the modified subroutines were recompiled and I often got this error:
OSError: /home/qilong/Telemac/v8p3r1/examples/telemac2d/estimation/libuser_fortran.so: file too short

The direct cause was that the file "libuser_fortran.so" was corrupted. I suspect this came from concurrent I/O under multiprocessing: Genop launches a batch of simulations at the same time, and each one tries to do I/O on this file.
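
If concurrent access really is the cause, one generic way to avoid it is to serialize the build step with an advisory file lock, so that only one process writes the library and the others wait. This is a minimal sketch for Linux (it relies on `fcntl`), not TelApy code: `exclusive_lock`, `build_once` and the `build` callback are hypothetical names, and the ".lock" suffix is an arbitrary choice.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the with-block."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def build_once(lib_path, build):
    """Call build(lib_path) only if the library does not exist yet.

    Returns True if this process did the build, False if it found the
    library already in place after acquiring the lock.
    """
    with exclusive_lock(lib_path + ".lock"):
        if not os.path.exists(lib_path):
            build(lib_path)
            return True
    return False
```

Here `build` stands for whatever produces the shared library. The first worker to take the lock builds it, and later workers see a complete file. A real setup would also need a rule for when a stale library must be rebuilt, which this sketch leaves out.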

Do you also see this issue on your side?

Later I removed "user_fortran" from the model; the problem went away and the optimizer worked as expected. If concurrent access is indeed the cause, would it be possible to append a serial number to "libuser_fortran.so" so that each process uses its own file and cannot corrupt another's?
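
The naming idea could look something like the sketch below, which uses the process ID as the serial number. It only builds a file name. Nothing here is part of TelApy, and `per_process_lib_name` is a hypothetical helper. Whether TelApy could be pointed at such a per-process file is exactly the open question.

```python
import os


def per_process_lib_name(base_name="libuser_fortran.so", pid=None):
    """Return base_name with a process ID inserted before the extension."""
    if pid is None:
        pid = os.getpid()  # unique among processes running at the same time
    stem, ext = os.path.splitext(base_name)
    return f"{stem}_{pid}{ext}"


# Each worker would derive its own name, e.g. for a worker with PID 1234:
print(per_process_lib_name(pid=1234))  # libuser_fortran_1234.so
```

Using the PID avoids any coordination between workers, at the cost of leaving one library per process behind unless the files are cleaned up afterwards.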

Thanks in advance!

Kind regards,
Qilong
