Model Run HowTo

OpenM++ model run overview

We recommend starting with the single desktop version of openM++.

OpenM++ models can be run on Windows and Linux platforms: on a single desktop computer, on multiple computers over a network, in an HPC cluster, or in a cloud environment (e.g. Google Cloud, Microsoft Azure, Amazon, ...).

You need the cluster version of openM++ to run a model on multiple computers in your network, in the cloud, or in an HPC cluster environment. OpenM++ uses MPI to run models on multiple computers.

By default, an openM++ model runs with one sub-value in a single thread, which is convenient for debugging or studying your model. The following options are available to run an openM++ model:

  • "default" run: one sub-value and single thread
  • "desktop" run: multiple sub-values and multiple threads
  • "restart" run: finish a model run after a previous failure (e.g. a power outage)
  • "task" run: multiple input sets of data (a.k.a. multiple "scenarios" in Modgen), multiple sub-values and threads
  • "cluster" run: multiple sub-values, threads and model process instances running on a LAN or in the cloud (requires MPI)
  • "cluster task" run: same as "cluster" plus multiple input sets of data (requires MPI)

For more details, see also Model Run Cycle: How model finds input parameters.

Sub-values: sub-samples, members, replicas

The terms "simulation member", "replica", and "sub-sample" are often used interchangeably in micro-simulation conversations, depending on context. To avoid terminology disputes, openM++ uses "sub-value" as the equivalent of all of the above; some older pages of this wiki may still say "sub-sample" instead.

Default run: simplest

Figure: Model run by default: single thread and one sub-value

If no options are specified when the model is run, then:

  • all parameters come from the default input data set
  • a single thread is used for modeling
  • only one sub-value is calculated
modelOne.exe

This is the simplest way to debug your model.

Desktop run: model run on single computer

Figure: Model run on desktop: multiple threads and multiple sub-values

If only a single computer is available, the user can specify:

  • which set of input data to use (by set name or id)
  • the number of sub-values to calculate
  • the number of modeling threads to use
modelOne.exe -OpenM.SetName modelOne -OpenM.Subvalues 16 -OpenM.Threads 4
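
When the model is launched from a script rather than typed by hand, it can help to assemble the command line from named options. The following is a minimal Python sketch, not part of openM++ itself: the helper name build_model_command is hypothetical, and it only uses the three options shown in the command above.

```python
def build_model_command(exe, set_name=None, subvalues=None, threads=None):
    """Return the argument list for a desktop model run.

    Hypothetical helper: it only knows the options shown on this page
    (-OpenM.SetName, -OpenM.Subvalues, -OpenM.Threads) and omits any
    option left as None, so the model falls back to its defaults.
    """
    args = [exe]
    if set_name is not None:
        args += ["-OpenM.SetName", set_name]
    if subvalues is not None:
        args += ["-OpenM.Subvalues", str(subvalues)]
    if threads is not None:
        args += ["-OpenM.Threads", str(threads)]
    return args


# Rebuild the desktop run command shown above.
cmd = build_model_command("modelOne.exe", set_name="modelOne", subvalues=16, threads=4)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where modelOne.exe is available. Calling the helper with only the executable name reproduces the "default" run.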

Restart run: finish model run after previous failure

If the previous model run was not completed (e.g. due to a power failure or insufficient disk space), you can restart it by specifying its run id:

modelOne.exe -OpenM.RestartRunId 11

Task run: multiple sets of input data

Figure: Modeling task run: multiple sets of input data

A modeling task consists of multiple sets of input data and can be run in batch mode. For example, it makes sense to create a modeling task to run the RiskPaths model from R with 800 sets of input data to study Childlessness by varying:

  • Age baseline for first union formation
  • Relative risks of union status on first pregnancy
RiskPaths.exe -OpenM.TaskName Childlessness -OpenM.Subvalues 8 -OpenM.Threads 4

Running such a modeling task reads 800 input sets with set ids [1, 800] and produces 800 model run outputs with run ids [801, 1600], respectively.

Dynamic task run: wait for input data

It is possible to append new sets of input data to a task while it runs. This allows you to use optimization methods rather than simply calculating all possible combinations of input parameters. In that case the modeling task is not completed automatically but waits for an external "task can be completed" signal. For example:

#
# pseudo script to run RiskPaths and find optimal solution for Childlessness problem
# you can use R or any other tools of your choice
#
# # create Childlessness task
# # run loop until you are satisfied with results

RiskPaths.exe -OpenM.TaskName Childlessness -OpenM.TaskWait true

# # find your modeling task run id, i.e.: 1234
# # analyze model output tables
# # if results are not optimal
#   # then append new set of input data into task "Childlessness" and continue loop
#   # else signal to RiskPaths model "task can be completed":
#   #   UPDATE task_run_lst SET status = 'p' WHERE task_run_id = 1234;
#
# Done.
#
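
The final step of the pseudo script, sending the "task can be completed" signal, can be sketched in Python as below. This is an illustration under stated assumptions, not the openM++ API: it assumes the model database is reachable through Python's sqlite3 module, the function name signal_task_can_complete is hypothetical, and the demo uses an in-memory stand-in table with only two columns and a placeholder initial status 'w'. Only the UPDATE statement itself is taken from the pseudo script above.

```python
import sqlite3


def signal_task_can_complete(conn, task_run_id):
    """Set status = 'p' for one task run, as in the pseudo script's UPDATE.

    Returns the number of rows changed, so the caller can detect
    a wrong task run id (0 rows updated).
    """
    cur = conn.execute(
        "UPDATE task_run_lst SET status = 'p' WHERE task_run_id = ?",
        (task_run_id,),
    )
    conn.commit()
    return cur.rowcount


# Demo on an in-memory stand-in; the two-column table and the initial
# status 'w' are assumptions made only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_run_lst (task_run_id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO task_run_lst VALUES (1234, 'w')")

updated = signal_task_can_complete(conn, 1234)
status = conn.execute(
    "SELECT status FROM task_run_lst WHERE task_run_id = 1234"
).fetchone()[0]
print(updated, status)
```

Using a parameterized query keeps the task run id out of the SQL text, and checking the returned row count guards against signalling a task run that does not exist.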

Cluster run: model run on multiple computers

Figure: Model run on cluster: multiple modeling processes

Use MPI to run the model on multiple computers over a network, in the cloud, or on an HPC cluster. For example, to run 4 instances of modelOne.exe with 2 threads each and compute 16 sub-values:

mpiexec -n 4 modelOne.exe -OpenM.Threads 2 -OpenM.Subvalues 16
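
To choose consistent values for the number of processes, threads and sub-values, it can help to work out how much each worker has to compute. The sketch below assumes that sub-values are split evenly across all processes and then across the threads of each process; that even split is an assumption made for illustration, and the helper name subvalues_per_worker is hypothetical.

```python
def subvalues_per_worker(subvalues, processes, threads):
    """Return (sub-values per process, sub-values per thread).

    Assumes an even split across processes and threads; raises
    ValueError when the numbers do not divide evenly.
    """
    per_process, rest_p = divmod(subvalues, processes)
    per_thread, rest_t = divmod(per_process, threads)
    if rest_p or rest_t:
        raise ValueError("sub-values do not divide evenly across processes and threads")
    return per_process, per_thread


# Numbers from the mpiexec command above: 16 sub-values, 4 processes, 2 threads each.
per_process, per_thread = subvalues_per_worker(16, 4, 2)
print(per_process, per_thread)
```

Under this assumption, the command above gives each of the 4 processes 4 sub-values, and each of its 2 threads 2 sub-values.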

Please note that "mpiexec -n 4 ...." as above is suitable for testing only; use your cluster tools for real model runs.

Cluster task: run modeling task on multiple computers

A modeling task with thousands of input data sets can take a long time to run, so it is recommended to use a cluster (multiple computers over a network) or a cloud service, such as Google Compute Engine. For example, the RiskPaths task above can be calculated much faster if 200 servers are available to run it:

mpiexec -n 200 RiskPaths.exe -OpenM.TaskName Childlessness -OpenM.Subvalues 16 -OpenM.Threads 4

Please note that "mpiexec -n 200 ...." as above is suitable for testing only; use your cluster tools for real model runs.

Dynamic task: you can use the -OpenM.TaskWait true argument, as described above, to change the task dynamically while it runs.
