OpenMPCD
MPCDAnalysis.DataManager.DataManager Class Reference

Public Member Functions

def __init__ (self, dataPaths=None)
 
def getProjects (self)
 
def getClusterNodesGroupedByGPUCount (self)
 
def getSubmittedJobBatchesGroupedByJobBatchSize (self)
 
def selectProject (self, name)
 
def getProject (self)
 
def getRuns (self)
 
def getNumberOfSweepsByRun (self, run)
 
def getNumberOfSweepsByRuns (self, rundirs)
 
def getRunsFilteredByConfig (self, filters)
 
def getRunsWithMatchingConfiguration (self, configuration, pathTranslators=[])
 
def getTargetDataSpecifications (self)
 
def getTargetDataSpecificationStatus (self, specification, defaultTargetSweepCount=None)
 
def getTargetDataSpecificationsGroupedByStatus (self, defaultTargetSweepCount=None)
 
def createRundirByTargetDataSpecification (self, specification)
 
def createSlurmJobScripts (self, rundirs, executablePath, executableOptions, slurmOptions={}, srunOptions={}, chunkSize=1, excludedNodes=None, pathTranslator=None)
 

Static Public Member Functions

def getConfigurationPath ()
 

Public Attributes

 rundirs
 
 runs
 
 pathTranslators
 
 pathTranslatorsServerToLocal
 
 projects
 
 project
 
 cluster
 

Detailed Description

Provides information on the data that have been generated by OpenMPCD.

Throughout this class, a "config part generator" is understood to be a
generator in the Python sense (c.f. the `yield` keyword), which yields a
list; each of the elements of this list is a dictionary, containing:
  - `settings`:
    A dictionary, with each key being a configuration setting name, and the
    value being the corresponding value;
  - `pathComponents`:
    A list of all path components that the generator requests be added to the
    run directory name.
  - `targetSweepCount`:
    The minimum number of sweeps this configuration must be simulated for,
    possibly across multiple runs. Set to `None` if no such minimum is
    desired.

Definition at line 28 of file DataManager.py.
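
The structure of a config part generator described above can be sketched as follows. This is only an illustration: the function name `temperature_parts`, the setting name `example.temperature`, and all values are hypothetical, and yielding a single list is one possible reading of the description.

```python
def temperature_parts(temperatures=(0.5, 1.0, 2.0)):
    """Hypothetical config part generator; names and values are illustrative."""
    parts = []
    for temperature in temperatures:
        parts.append({
            # Configuration setting name mapped to its value.
            "settings": {"example.temperature": temperature},
            # Components to be added to the run directory name.
            "pathComponents": ["T-" + str(temperature)],
            # None means no minimum number of sweeps is requested.
            "targetSweepCount": None,
        })
    # The generator yields a list of dictionaries.
    yield parts


for parts in temperature_parts():
    for part in parts:
        print(part["pathComponents"], part["settings"])
```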

Constructor & Destructor Documentation

◆ __init__()

def MPCDAnalysis.DataManager.DataManager.__init__ (   self,
  dataPaths = None 
)
    The constructor.

    This will require the file returned by `getConfigurationPath`
    to be readable, and contain in `dataPaths` a list of OpenMPCD run
    directories. Each entry in the list will be processed through Python's
    `glob.glob` function, and as such, special tokens such as '*' may be
    used to match a larger number of directories. If any of the matching
    directories is found to not be an OpenMPCD run directory, it is ignored.
    Furthermore, each entry in `dataPaths` may contain an initial '~'
    character, which will be expanded to the user's home directory.

    Alternatively, the list of data paths may be supplied as the `dataPaths`
    argument, which takes precedence over the configuration file.

    @param[in] dataPaths
               A list of data paths, or `None` for the default (see function
               description).

Definition at line 61 of file DataManager.py.
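
The expansion of `dataPaths` entries described above can be approximated with the standard library. The sketch below is not the class's actual code: `expand_data_paths` is a hypothetical helper, and the check for whether a match really is an OpenMPCD run directory is left out.

```python
import glob
import os
import tempfile


def expand_data_paths(data_paths):
    """Expand a leading '~' and glob patterns in each entry (sketch only)."""
    expanded = []
    for entry in data_paths:
        pattern = os.path.expanduser(entry)
        # glob.glob returns every existing path that matches the pattern.
        expanded.extend(sorted(glob.glob(pattern)))
    return expanded


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as base:
    for name in ("run-a", "run-b", "other"):
        os.mkdir(os.path.join(base, name))
    found = expand_data_paths([os.path.join(base, "run-*")])
    names = [os.path.basename(path) for path in found]
    print(names)
```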

Member Function Documentation

◆ createRundirByTargetDataSpecification()

def MPCDAnalysis.DataManager.DataManager.createRundirByTargetDataSpecification (   self,
  specification 
)
    Creates a new rundir, and configuration files therein, for the given
    target data specification (c.f. `getTargetDataSpecifications`), and
    returns the newly created path.

Definition at line 474 of file DataManager.py.

◆ createSlurmJobScripts()

def MPCDAnalysis.DataManager.DataManager.createSlurmJobScripts (   self,
  rundirs,
  executablePath,
  executableOptions,
  slurmOptions = {},
  srunOptions = {},
  chunkSize = 1,
  excludedNodes = None,
  pathTranslator = None 
)
    For each of the given `rundirs`, creates a Slurm job script at
    `input/job.slrm` (relative to the respective rundir) that can be used to
    submit a job via `sbatch`, or alternatively, if the rundir is part of a
    larger job controlled via a jobscript in another run directory, creates
    the file `input/parent-job-path.txt`, which contains the absolute path
    to the parent job.

    The job script will assume that the OpenMPCD executable will reside at
    `executablePath`, which should most probably be an absolute path. The
    `--rundir` option, with the respective rundir specification as its
    value, will be added to the string of program arguments
    `executableOptions`.

    `executableOptions` is a string that contains all options that are
    passed to the executable upon invocation, as if specified in `bash`.

    `slurmOptions` is a dictionary, with each key specifying a Slurm
    `sbatch` option (e.g. "-J" or "--job-name") and its value specifying
    that option's value. There, the special string "$JOBS_PER_NODE" will be
    replaced with the number of individual invocations of the given
    executable in the current Slurm job. See `chunkSize` below.
    Furthermore, for each value, the special string "$RUNDIR" will be
    replaced with the absolute path to the run directory that will contain
    the `sbatch` job script.

    `srunOptions` is a dictionary, with each key specifying a `srun` option
    (e.g. "--gres") and its value specifying that option's value, or `None`
    if there is no value.
    For each value, the special string "$RUNDIR" will be replaced with the
    absolute path to the run directory.

    `chunkSize` can be used to specify that one Slurm job should contain
    `chunkSize` many individual invocations of the executable given. If
    the number of `rundirs` is not divisible by `chunkSize`, an exception is
    thrown.

    If `excludedNodes` is not `None`, it is a string describing,
    in Slurm's syntax (e.g. "n25-02[1-2]"), which compute nodes
    should be excluded from executing the jobs.

    If `pathTranslator` is not `None`, it is called on each absolute path
    before writing its value to some file, or returning it from this
    function.
    This is useful if this program is run on one computer, but the resulting
    files will be on another, where the root directory of the project (or
    the user's home) is different.

    The function returns a list of server paths to the created jobfiles.

Definition at line 560 of file DataManager.py.
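
The chunking rule and the placeholder replacement described above can be sketched in isolation. The helpers `chunk_rundirs` and `substitute_placeholders` are hypothetical and not part of `DataManager`; the exception type, the choice of the first rundir in a chunk as the one holding the job script, and the example paths and option values are all assumptions made for illustration.

```python
def chunk_rundirs(rundirs, chunk_size):
    """Split rundirs into consecutive groups of chunk_size entries."""
    if len(rundirs) % chunk_size != 0:
        # The exception type is an assumption; the text only says one is thrown.
        raise ValueError("number of rundirs is not divisible by chunkSize")
    return [rundirs[i:i + chunk_size]
            for i in range(0, len(rundirs), chunk_size)]


def substitute_placeholders(options, rundir, jobs_per_node=None):
    """Replace "$RUNDIR" (and optionally "$JOBS_PER_NODE") in option values."""
    result = {}
    for option, value in options.items():
        if value is not None:  # srun options may carry no value at all
            value = value.replace("$RUNDIR", rundir)
            if jobs_per_node is not None:
                value = value.replace("$JOBS_PER_NODE", str(jobs_per_node))
        result[option] = value
    return result


rundirs = ["/data/run-1", "/data/run-2", "/data/run-3", "/data/run-4"]
slurm_options = {
    "--job-name": "example",
    "--output": "$RUNDIR/slurm.log",
    "--ntasks-per-node": "$JOBS_PER_NODE",
}
for chunk in chunk_rundirs(rundirs, 2):
    # Assumption: the first rundir of each chunk holds the job script.
    print(substitute_placeholders(slurm_options, chunk[0], len(chunk)))
```

With `chunkSize=2` and four rundirs, this yields two jobs of two invocations each; three rundirs with the same chunk size would raise.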

◆ getClusterNodesGroupedByGPUCount()

def MPCDAnalysis.DataManager.DataManager.getClusterNodesGroupedByGPUCount (   self)
    Returns a dictionary, with each key corresponding to the number of GPUs
    installed on the individual systems grouped in the corresponding
    dictionary value. Each value is a dictionary, with the following
    entries:
        * "nodes": A list of nodes that fall into that category
        * "SlurmNodeList": A string, compatible with the `Slurm` scheduler,
          that collects all the nodes in entry `nodes`.

Definition at line 135 of file DataManager.py.
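
To make the shape of the returned dictionary concrete, the sketch below builds a value of that form from a hypothetical mapping of node names to GPU counts. The helper and the node names are invented for illustration, and the node list is written as a plain comma-separated string; the actual method may format `SlurmNodeList` differently, for example with Slurm's bracket ranges.

```python
def group_nodes_by_gpu_count(gpu_counts):
    """Group node names by GPU count; gpu_counts maps node name -> count."""
    grouped = {}
    for node, count in sorted(gpu_counts.items()):
        grouped.setdefault(count, {"nodes": []})["nodes"].append(node)
    for entry in grouped.values():
        # A comma-separated host list is one form of node list Slurm accepts.
        entry["SlurmNodeList"] = ",".join(entry["nodes"])
    return grouped


# Hypothetical cluster description.
example = group_nodes_by_gpu_count({"node-a": 2, "node-b": 4, "node-c": 2})
for gpu_count, entry in sorted(example.items()):
    print(gpu_count, entry["nodes"], entry["SlurmNodeList"])
```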

◆ getConfigurationPath()

def MPCDAnalysis.DataManager.DataManager.getConfigurationPath ( )
static
    Returns the path at which the configuration file for this class is
    expected.

Definition at line 36 of file DataManager.py.

◆ getNumberOfSweepsByRun()

def MPCDAnalysis.DataManager.DataManager.getNumberOfSweepsByRun (   self,
  run 
)
    Returns the number of completed sweeps in the given `run`, which may be
    an instance of `Run`, or be a string pointing to a run directory.
    The returned value corresponds to `run.getNumberOfCompletedSweeps()`,
    as if `run` were an instance of `Run`.

Definition at line 255 of file DataManager.py.

◆ getNumberOfSweepsByRuns()

def MPCDAnalysis.DataManager.DataManager.getNumberOfSweepsByRuns (   self,
  rundirs 
)
    Returns the sum of the number of completed sweeps in all of the given
    `rundirs`, which may be instances of `Run`, or strings pointing to run
    directories.

Definition at line 269 of file DataManager.py.

◆ getProject()

def MPCDAnalysis.DataManager.DataManager.getProject (   self)
    Returns the currently selected project, or `None` if none has been
    selected.

Definition at line 227 of file DataManager.py.

◆ getProjects()

def MPCDAnalysis.DataManager.DataManager.getProjects (   self)
    Returns all configured projects.

Definition at line 120 of file DataManager.py.

◆ getRuns()

def MPCDAnalysis.DataManager.DataManager.getRuns (   self)
    Returns a list of `Run` instances, corresponding to the run directories
    that have been found.

Definition at line 237 of file DataManager.py.

◆ getRunsFilteredByConfig()

def MPCDAnalysis.DataManager.DataManager.getRunsFilteredByConfig (   self,
  filters 
)
    Returns a list of runs that match the given criteria.

    The argument `filters` is expected to be a function, or a list of
    functions, each taking an object that represents the configuration for a
    particular run, returning `False` if it should be filtered out of the
    result set, or `True` otherwise (filters applied later might still
    remove that configuration from the result set).

Definition at line 288 of file DataManager.py.
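
The filter semantics described above (a single function or a list of functions, all of which must return `True` for a run to be kept) can be sketched as follows. Plain dictionaries stand in for the configuration objects here, and `apply_filters` as well as the setting names are hypothetical.

```python
def apply_filters(configs, filters):
    """Keep only the configurations accepted by every filter."""
    if callable(filters):
        # A single function is treated like a one-element list.
        filters = [filters]
    return [config for config in configs
            if all(accept(config) for accept in filters)]


# Dictionaries stand in for per-run configuration objects.
configs = [
    {"example.temperature": 1.0, "example.size": 10},
    {"example.temperature": 2.0, "example.size": 10},
    {"example.temperature": 1.0, "example.size": 20},
]

kept = apply_filters(configs, [
    lambda config: config["example.temperature"] == 1.0,
    lambda config: config["example.size"] >= 20,
])
print(kept)
```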

◆ getRunsWithMatchingConfiguration()

def MPCDAnalysis.DataManager.DataManager.getRunsWithMatchingConfiguration (   self,
  configuration,
  pathTranslators = [] 
)
    Returns the result of `getRuns` filtered by the condition that the run's
    configuration must be equivalent (in the sense of
    `Configuration.isEquivalent`) to the given `configuration`.

    @param[in] pathTranslators
               This argument will be passed as the `pathTranslators`
               argument to `Configuration.isEquivalent`.

Definition at line 320 of file DataManager.py.

◆ getSubmittedJobBatchesGroupedByJobBatchSize()

def MPCDAnalysis.DataManager.DataManager.getSubmittedJobBatchesGroupedByJobBatchSize (   self)
    Returns a dictionary, with each key being a number of jobs executed in
    parallel on one node, and the value being the list of job batches
    having exactly this many jobs, which are submitted and pending
    execution.

Definition at line 162 of file DataManager.py.

◆ getTargetDataSpecifications()

def MPCDAnalysis.DataManager.DataManager.getTargetDataSpecifications (   self)
    Returns, for the currently selected project, a list of dictionaries;
    the latter each contain a key `config`, which contains a `Configuration`
    instance, and a key `targetSweepCount`, which contains the number of
    sweeps that are desired to be in the data gathered with this
    configuration, or `None` if none is specified. Furthermore, the key
    `pathComponents` contains a list of all path components that the
    generators request be added to the run directory name.

Definition at line 346 of file DataManager.py.

◆ getTargetDataSpecificationsGroupedByStatus()

def MPCDAnalysis.DataManager.DataManager.getTargetDataSpecificationsGroupedByStatus (   self,
  defaultTargetSweepCount = None 
)
    Takes the values returned by `getTargetDataSpecifications`, and groups
    them into three categories in the returned dictionary: `completed`
    contains all the target data specifications that have achieved their
    target sweep counts, `pending` contains all the target data
    specifications that are not yet completed, but have runs being executed
    or being scheduled for execution, and `incomplete` contains the rest.

    @param[in] defaultTargetSweepCount
               This parameter is used as the sweep count for target data
               specifications that do not have a target sweep count set.
               If the `defaultTargetSweepCount` parameter is `None`, all target
               data specifications must specify a target sweep count.

Definition at line 448 of file DataManager.py.

◆ getTargetDataSpecificationStatus()

def MPCDAnalysis.DataManager.DataManager.getTargetDataSpecificationStatus (   self,
  specification,
  defaultTargetSweepCount = None 
)
    For the given target data `specification`, returns:
      - `"completed"`
        if the specification has achieved its target sweep count,
      - `"pending"`
        if it is not yet completed, but has runs being executed or being
        scheduled for execution, or
      - `"incomplete"` in any other case.

    @param[in] defaultTargetSweepCount
               This parameter is used as the sweep count for target data
               specifications that do not have a target sweep count set.
               If the `defaultTargetSweepCount` parameter is `None`, all target
               data specifications must specify a target sweep count.

Definition at line 404 of file DataManager.py.
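
The three-way status decision described above can be written as a small standalone function. This is a sketch under assumptions, not the method's implementation: the inputs `completed_sweeps` and `has_pending_runs` are hypothetical stand-ins for what the class derives from its runs and job batches, reaching the target is taken to mean "at least as many sweeps", and raising `ValueError` when no target is available is a guess at the failure behavior.

```python
def classify_status(completed_sweeps, target_sweep_count, has_pending_runs,
                    default_target_sweep_count=None):
    """Return "completed", "pending", or "incomplete" for one specification."""
    if target_sweep_count is None:
        # Fall back to the default when the specification sets no target.
        target_sweep_count = default_target_sweep_count
    if target_sweep_count is None:
        # Assumption: a missing target with no default is an error.
        raise ValueError("no target sweep count and no default given")
    if completed_sweeps >= target_sweep_count:
        return "completed"
    if has_pending_runs:
        return "pending"
    return "incomplete"


print(classify_status(1000, 1000, False))
print(classify_status(400, 1000, True))
print(classify_status(400, None, False, default_target_sweep_count=500))
```

Grouping specifications by status then amounts to collecting each specification under the key returned for it.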

◆ selectProject()

def MPCDAnalysis.DataManager.DataManager.selectProject (   self,
  name 
)
    Selects the project of the given `name` as the currently active one.

Definition at line 186 of file DataManager.py.


The documentation for this class was generated from the following file: