# Migrating to AiiDA

:::{admonition} Learning Objectives
:class: learning-objectives

In this section, we will look at how to migrate from running a quantum code directly with text-based input files to running it within AiiDA, and see how AiiDA automates the execution of the computation and the parsing of its outputs.

We shall take the example of Quantum ESPRESSO, but the same principles apply to any other code.
A typical command-line invocation of a Quantum ESPRESSO relaxation looks like this:

```console
$ mpirun -np 2 pw.x -in pwx.in > pwx.out
```

:::


## Modularising the inputs

The first step is to modularise the inputs contained in the `pwx.in` file, together with any pseudo-potential files.

By splitting them into separate components, we can create **re-usable** building blocks for multiple calculations.
We shall also see later how these components can be generated from external data sources, such as databases or web APIs.

![pw-to-aiida](_static/pw-to-aiida.svg){height=500px}


In the diagram above, we have split the input generation into separate entities, handling the different aspects of the calculation and allowing for component re-use.
For a `pw.x` calculation, we need to create the following nodes:

- {term}`Computer`, which describes how we interface with a compute resource
- {term}`Code`, which contains the information on how to execute a single calculation
- `StructureData`, which contains the crystal structure
- `UpfData`, which contains the pseudo-potentials, one per atomic species
- `KpointsData`, which contains the k-point mesh
- `Dict`, which contains the parameters for the calculation
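
As a rough illustration of this decomposition, the sketch below scans a `pw.x` input for its namelists and cards and reports which of the node types listed above would carry the information in each. The `SECTION_TO_NODES` mapping and the `map_sections` helper are hypothetical simplifications (for instance, the `&SYSTEM` keys `nat` and `ntyp` are really derived from the structure); the conversion actually used in this tutorial is done with `qe_tools` further below.

```python
import re

# Hypothetical, simplified mapping from pw.x input sections to the AiiDA
# node types that would hold their information.
SECTION_TO_NODES = {
    "CONTROL": ("Dict",),
    "SYSTEM": ("Dict",),
    "ELECTRONS": ("Dict",),
    "IONS": ("Dict",),
    "CELL": ("Dict",),
    "ATOMIC_SPECIES": ("StructureData", "UpfData"),
    "ATOMIC_POSITIONS": ("StructureData",),
    "CELL_PARAMETERS": ("StructureData",),
    "K_POINTS": ("KpointsData",),
}


def map_sections(input_text: str) -> dict[str, tuple[str, ...]]:
    """Return {section name: node types} for every known namelist or card."""
    found = {}
    for line in input_text.splitlines():
        # Namelists start with '&NAME'; cards start with the card name,
        # optionally followed by an option such as 'crystal' or 'automatic'.
        match = re.match(r"&?([A-Za-z_]+)", line.strip())
        if match and match.group(1).upper() in SECTION_TO_NODES:
            name = match.group(1).upper()
            found[name] = SECTION_TO_NODES[name]
    return found


# A made-up toy input, not the tutorial's pwx.in.
example_input = """\
&CONTROL
  calculation = 'relax'
/
&SYSTEM
  ibrav = 0
  ecutwfc = 30.0
/
&ELECTRONS
/
ATOMIC_SPECIES
  Si 28.086 Si.pbe-n-rrkjus_psl.1.0.0.UPF
CELL_PARAMETERS angstrom
  0.0 2.7 2.7
  2.7 0.0 2.7
  2.7 2.7 0.0
ATOMIC_POSITIONS crystal
  Si 0.00 0.00 0.00
  Si 0.25 0.25 0.25
K_POINTS automatic
  4 4 4 0 0 0
"""

for section, node_types in map_sections(example_input).items():
    print(f"{section:<17} -> {', '.join(node_types)}")
```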


## The AiiDA Profile

First we need to create a new AiiDA profile.
This is where we store all the nodes generated for a project, and the links between them.


:::{note}

Here we generate a profile with temporary, in-memory storage, which will be destroyed when the Python kernel is restarted.
This is useful for testing, but for a real project you would create a persistent profile connected to a PostgreSQL database,
using the `verdi quicksetup` command.

:::


In [None]:
from local_module import load_temp_profile

data = load_temp_profile(name="qe-to-aiida")
data

In [None]:
import aiida

profile = aiida.load_profile("qe-to-aiida")
profile

In [None]:
%verdi profile show qe-to-aiida

We can check on the status of the profile using the `verdi status` command.


In [None]:
%verdi -p qe-to-aiida status --no-rmq

We can also check the statistics of the profile's storage.
Before running any simulations, we see that only a single {term}`User` node has been created, which is the default creator of data for the profile.


In [None]:
%verdi storage info

## Connecting to a compute resource


An AiiDA {term}`Computer` represents a compute resource, such as a local or remote machine.
It contains information on how to connect to the machine, how to **transport** data to/from the compute resource, and how to **schedule** jobs on it.

In the following, we will use a simple `local_direct` computer, which connects to the local machine and runs the calculations directly, without any scheduler.

AiiDA also has built-in support for a number of {term}`Scheduler`s, including:

- `pbspro`
- `slurm`
- `sge`
- `torque`
- `lsf`

Connections to remote machines can be made using the `SSH` {term}`Transport`, and [aiida-code-registry](https://github.com/aiidateam/aiida-code-registry) provides a collection of example configurations for Swiss-based HPC clusters.
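
The scheduler plugin is what turns the same resource request into a scheduler-specific submission script. The toy function below is a hypothetical illustration, not AiiDA's actual template: with a direct scheduler the script is simply executed, while for SLURM the request is written as `#SBATCH` directives at the top of the script. Compare it with the `_aiidasubmit.sh` that AiiDA generates for the `local_direct` computer later in this section.

```python
def submission_header(
    scheduler: str, num_machines: int, mpiprocs_per_machine: int, walltime_seconds: int
) -> list[str]:
    """Return the header lines of a job script for a given scheduler (toy version)."""
    if scheduler == "direct":
        # No queueing system: the script runs immediately, so there is nothing to request.
        return ["#!/bin/bash"]
    if scheduler == "slurm":
        hours, remainder = divmod(walltime_seconds, 3600)
        minutes, seconds = divmod(remainder, 60)
        return [
            "#!/bin/bash",
            f"#SBATCH --nodes={num_machines}",
            f"#SBATCH --ntasks-per-node={mpiprocs_per_machine}",
            f"#SBATCH --time={hours:02d}:{minutes:02d}:{seconds:02d}",
        ]
    raise ValueError(f"scheduler not covered by this sketch: {scheduler}")


for name in ("direct", "slurm"):
    print(f"--- {name} ---")
    print("\n".join(submission_header(name, 1, 2, 30 * 60)))
```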

We can create the computer using the `verdi computer setup` CLI.


In [None]:
%verdi computer setup \
 --non-interactive \
 --label local_direct \
 --hostname localhost \
 --description "Local computer with direct scheduler" \
 --transport core.local \
 --scheduler core.direct \
 --work-dir {data.workdir} \
 --mpiprocs-per-machine {data.cpu_count}

In [None]:
%verdi computer configure core.local local_direct \
 --non-interactive \
 --safe-interval 0

Or we can use the `Computer` class from the `aiida.orm` API module.


In [None]:
created, computer = aiida.orm.Computer.collection.get_or_create(
 label="local_direct",
 description="local computer with direct scheduler",
 hostname="localhost",
 workdir=str(data.workdir),
 transport_type="core.local",
 scheduler_type="core.direct",
)
if created:
 computer.store()
 computer.set_minimum_job_poll_interval(0.0)
 computer.set_default_mpiprocs_per_machine(data.cpu_count)
 computer.configure()
computer

Now we have a computer, ready to run calculations on.


In [None]:
%verdi computer show local_direct

## Setting up a code plugin


An AiiDA {term}`Code` represents a single executable, and contains information on how to execute it.
The `Code` node is associated with a specific `Computer`, contains the path to the executable, and is associated with a specific {term}`CalcJob` plugin, which we shall discuss later.


Again, we can use either the CLI or the API to create a new `Code` node.


In [None]:
%verdi code setup \
 --non-interactive \
 --label pw.x \
 --description "Quantum ESPRESSO pw.x code" \
 --computer local_direct \
 --remote-abs-path {data.pwx_path} \
 --input-plugin quantumespresso.pw \
 --prepend-text "export OMP_NUM_THREADS=1"

In [None]:
try:
 code = aiida.orm.load_code("pw.x@local_direct")
except aiida.common.NotExistent:
 code = aiida.orm.Code(
 input_plugin_name="quantumespresso.pw",
 remote_computer_exec=[computer, data.pwx_path],
 )
 code.label = "pw.x"
 code.description = "Quantum ESPRESSO pw.x code"
 code.set_prepend_text("export OMP_NUM_THREADS=1")
 code.store()
code

Now we have a code ready to run our computations.


In [None]:
%verdi code show pw.x

## Deconstructing the input file


Let's now take a look at a typical `pw.x` input file, and how we can convert it to the requisite AiiDA nodes.

:::{note}

Here we are simply generating the inputs from a pre-written input file.
But in practice, you would want to generate the inputs from a Python script, or from a database or web API, as we shall see in the next section.

:::


In [None]:
%cat direct_run/pwx.in

To decompose this file into the components we need, we can use the [qe_tools](https://pypi.org/project/qe-tools/) package, which provides a Python API to parse Quantum ESPRESSO input files.


In [None]:
import qe_tools
from pathlib import Path

pw_input = qe_tools.parsers.PwInputFile(Path("direct_run/pwx.in").read_text())
pw_input

We can then generate our AiiDA input {term}`Data` nodes.


In [None]:
structure = aiida.orm.StructureData(cell=pw_input.structure["cell"])
for p, s in zip(pw_input.structure["positions"], pw_input.structure["atom_names"]):
 structure.append_atom(position=p, symbols=s)
structure

In [None]:
kpoints = aiida.orm.KpointsData()
kpoints.set_cell_from_structure(structure)
kpoints.set_kpoints_mesh(
 pw_input.k_points["points"],
 offset=pw_input.k_points["offset"],
)
kpoints

In [None]:
# AiiDA sets the file and directory names itself (pseudo_dir, outdir, prefix),
# and derives the number of atoms and species (nat, ntyp) from the structure,
# so we remove these keys from the parameters.
_parameters = pw_input.namelists
for disallowed in ["pseudo_dir", "outdir", "prefix"]:
 _parameters["CONTROL"].pop(disallowed, None)
for disallowed in ["nat", "ntyp"]:
 _parameters["SYSTEM"].pop(disallowed, None)
parameters = aiida.orm.Dict(dict=_parameters)
parameters

In [None]:
from os.path import abspath

pseudo_si, _ = aiida.orm.UpfData.get_or_create(
 abspath("direct_run/pseudo/Si.pbe-n-rrkjus_psl.1.0.0.UPF")
)
pseudo_si

## Setting up the inputs for a calculation


Using `verdi plugin list aiida.calculations` we can inspect the full specification for the inputs of the calculation plugin we wish to use.


In [None]:
%verdi plugin list aiida.calculations quantumespresso.pw

Since we already assigned the `quantumespresso.pw` plugin to our `Code` node, we can load it and use its `get_builder` method to generate a template for the inputs, known as the `Builder`.

The `Builder` provides a structured way to add (and validate) the inputs for the calculation.
Below we add the input nodes that we have created for our calculation.


In [None]:
code = aiida.orm.load_code("pw.x@local_direct")
builder = code.get_builder()
builder.structure = structure
builder.parameters = parameters
builder.kpoints = kpoints
builder.pseudos = {"Si": pseudo_si}

# we can also add metadata like the maximum walltime
builder.metadata.options.max_wallclock_seconds = 30 * 60

builder
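
To make the idea of a validating input container concrete, here is a minimal sketch in plain Python. The `ToyBuilder` class is hypothetical: AiiDA's builder derives its checks from the calculation plugin's input specification (as listed by `verdi plugin list` above) and validates much more, but the principle is similar: unknown input names and wrongly typed values can be rejected as soon as they are assigned, rather than when the calculation starts.

```python
class ToyBuilder:
    """Hypothetical input container that validates names and types on assignment."""

    def __init__(self, spec: dict[str, type]):
        # Use object.__setattr__ so these internal attributes skip validation.
        object.__setattr__(self, "_spec", spec)
        object.__setattr__(self, "_values", {})

    def __setattr__(self, name, value):
        if name not in self._spec:
            raise AttributeError(f"unknown input '{name}'")
        expected = self._spec[name]
        if not isinstance(value, expected):
            raise TypeError(
                f"input '{name}' expects {expected.__name__}, got {type(value).__name__}"
            )
        self._values[name] = value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for input names.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None

    def missing(self) -> list[str]:
        """Names of inputs that have not been set yet."""
        return [name for name in self._spec if name not in self._values]


toy_builder = ToyBuilder({"parameters": dict, "kpoints_mesh": tuple})
toy_builder.parameters = {"CONTROL": {"calculation": "relax"}}
print(toy_builder.missing())  # ['kpoints_mesh']

try:
    toy_builder.kpoint_mesh = (4, 4, 4)  # typo in the input name
except AttributeError as error:
    print(error)  # unknown input 'kpoint_mesh'
```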

## Running the calculation


AiiDA provides two main ways to run a calculation:

1. The `engine.run` family of functions (such as `run` and `run_get_node`, used below), which run the computation in the current Python process and wait for it to complete.
2. The `engine.submit` function, which hands the calculation over to the AiiDA daemon; the daemon runs in the background and manages the execution of the calculations, so the call returns immediately.
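
The practical difference between the two is blocking versus non-blocking execution. The sketch below uses Python's standard `concurrent.futures` module as a stand-in for the daemon; this is an analogy, not how AiiDA is implemented. In AiiDA itself, the non-blocking variant is `aiida.engine.submit(builder)`, which returns the stored process node straight away; the calculation then only progresses while the daemon is running.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_calculation(x: float) -> float:
    """Stand-in for a calculation that takes a while to finish."""
    time.sleep(0.2)
    return x * 2


# "run" style: the caller blocks until the result is available.
result = fake_calculation(1.5)
print("run returned:", result)

# "submit" style: the work is handed to a background worker and a handle
# is returned straight away; the result is collected later.
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(fake_calculation, 1.5)
    print("submitted, finished yet?", future.done())
    print("submit result:", future.result())  # blocks only when we ask for it
```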


In [None]:
output = aiida.engine.run_get_node(builder)
output.node

## How the calculation is run


On executing the calculation, AiiDA will:

1. Generate the input files necessary for the calculation, and the submission script specific to the computer's scheduler.
2. Write the input files to the desired location on the local/remote computer.
3. Submit the job to the scheduler.
4. Monitor the job until it completes.
5. Retrieve the output files from the computer.
6. Parse the output files and store the results.
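
Step 1 is essentially templating. As a sketch of what it involves for `pw.x`, the hypothetical `render_namelists` function below turns a nested dictionary of namelists (the same shape as the `Dict` node built earlier) into Fortran namelist syntax. The real plugin also writes the structure, pseudo-potential and k-point cards and applies its own formatting rules, none of which this sketch attempts; compare its output with the `aiida.in` file printed below.

```python
def format_value(value) -> str:
    """Convert a Python value to Fortran namelist syntax."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return ".true." if value else ".false."
    if isinstance(value, str):
        return f"'{value}'"
    return repr(value)


def render_namelists(namelists: dict[str, dict]) -> str:
    """Render {"CONTROL": {...}, ...} as '&CONTROL ... /' blocks.

    pw.x expects the namelists in a fixed order (CONTROL, SYSTEM, ELECTRONS, ...),
    so the caller must pass them in that order; dicts preserve insertion order.
    """
    lines = []
    for name, values in namelists.items():
        lines.append(f"&{name.upper()}")
        for key, value in sorted(values.items()):
            lines.append(f"  {key} = {format_value(value)}")
        lines.append("/")
    return "\n".join(lines)


print(render_namelists({
    "CONTROL": {"calculation": "relax", "tprnfor": True},
    "SYSTEM": {"ecutwfc": 30.0, "ecutrho": 240.0},
    "ELECTRONS": {},
}))
```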


The generated input files are stored in the file repository of the `CalcJobNode`.

In [None]:
calcnode_repo = output.node.base.repository
print("input files: ", calcnode_repo.list_object_names())
print("-" * 10 + "\naiida.in\n" + "-" * 10)
print(calcnode_repo.get_object_content("aiida.in"))
print("-" * 16 + "\n_aiidasubmit.sh\n" + "-" * 16)
print(calcnode_repo.get_object_content("_aiidasubmit.sh"))

These are then "transported" to the (local or remote) computer, into a unique sub-folder of the working directory.

:::{tip}

These folders and their contents are not deleted by default after the calculation has completed, and can be inspected at any time with `verdi calcjob gotocomputer <PK>`.

Many workflows can, however, be configured to clean up these folders after the calculation has (successfully) completed, to save disk space.

:::

In [None]:
output.node.get_remote_workdir()
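
The unique sub-folder is derived from the UUID of the `CalcJobNode`. As a sketch, assuming the three-level layout used by aiida-core (the first two characters of the UUID, then the next two, then the remainder; check this against the path returned above for your version), the path can be rebuilt as follows. The base directory `/tmp/aiida_run` is a hypothetical placeholder. Spreading the folders over two levels of prefixes keeps any single directory from growing too large when many calculations are run.

```python
import uuid
from pathlib import PurePosixPath


def sharded_workdir(base_workdir: str, node_uuid: str) -> PurePosixPath:
    """Build <base>/<uuid[:2]>/<uuid[2:4]>/<uuid[4:]> for a calculation UUID."""
    return PurePosixPath(base_workdir) / node_uuid[:2] / node_uuid[2:4] / node_uuid[4:]


# A random UUID stands in for the UUID of a real calculation node here.
example_uuid = str(uuid.uuid4())
print(sharded_workdir("/tmp/aiida_run", example_uuid))
```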

The retrieved output files are stored in the `retrieved` output node of the `CalcJobNode`.

In [None]:
retrieved = output.node.get_retrieved_node()
print("output files:", retrieved.list_object_names())
print("-" * 10 + "\naiida.out\n" + "-" * 10)
print(retrieved.get_object_content("aiida.out"))

Finally, the parsed results are stored in the output nodes defined by the calculation plugin, which are linked to the `CalcJobNode`.

In [None]:
%verdi process show {output.node.pk}

We can then access key results from the calculation using the `CalcJobNode`'s `outputs` attribute (or by loading the node from its identifier).

In [None]:
output.node.outputs.output_parameters.get_dict()

AiiDA automatically generates links between the inputs, the calculation and the outputs, to build the provenance graph.
The provenance graph is a directed acyclic graph (DAG) that contains the nodes and the links between them; it can be used to visualise a calculation or workflow, and for advanced querying of the stored results.


In [None]:
from aiida.tools.visualization import Graph

graph = Graph()
graph.add_incoming(output.node, annotate_links="both")
graph.add_outgoing(output.node, annotate_links="both")
graph.graphviz

The same links can also be traversed with the `QueryBuilder`. Below, we look for structures that were used as input to a finished calculation, and project them together with the calculation's ID and the structure that the calculation produced.

In [None]:
query = aiida.orm.QueryBuilder()
query.append(aiida.orm.StructureData, tag="initial", project="*")
query.append(
 aiida.orm.CalcJobNode,
 filters={"attributes.process_state": "finished"},
 tag="calculation",
 with_incoming="initial",
 project="id",
)
query.append(
 aiida.orm.StructureData, tag="final", with_incoming="calculation", project="*"
)
query.dict()
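
Conceptually, the query above walks the provenance graph: start from a structure, follow its outgoing link to a finished calculation, then follow the calculation's outgoing link to another structure. The sketch below performs the same walk over a tiny hand-made graph in plain Python. The node labels and the dictionary representation are hypothetical; `QueryBuilder` translates such paths into database queries rather than looping like this.

```python
from collections import defaultdict

# Hypothetical toy provenance graph: labels and states are made up.
toy_nodes = {
    "structure_in": {"type": "StructureData"},
    "relax_1": {"type": "CalcJobNode", "process_state": "finished"},
    "structure_out": {"type": "StructureData"},
    "relax_2": {"type": "CalcJobNode", "process_state": "excepted"},
    "structure_partial": {"type": "StructureData"},
}
# Directed links (source -> target): inputs point to calculations,
# calculations point to the outputs they created.
toy_links = [
    ("structure_in", "relax_1"),
    ("relax_1", "structure_out"),
    ("structure_in", "relax_2"),
    ("relax_2", "structure_partial"),
]

outgoing = defaultdict(list)
for source, target in toy_links:
    outgoing[source].append(target)


def initial_final_pairs():
    """Yield (initial, calculation, final) paths through finished calculations."""
    for initial, info in toy_nodes.items():
        if info["type"] != "StructureData":
            continue
        for calc in outgoing[initial]:
            calc_info = toy_nodes[calc]
            if calc_info["type"] != "CalcJobNode":
                continue
            if calc_info.get("process_state") != "finished":
                continue
            for final in outgoing[calc]:
                if toy_nodes[final]["type"] == "StructureData":
                    yield initial, calc, final


print(list(initial_final_pairs()))
# [('structure_in', 'relax_1', 'structure_out')]
```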

## Saving compute time with caching

Over the course of a project, you may end up re-running the same calculations multiple times, perhaps because two workflows include the same calculation.

Since AiiDA stores the full provenance of each calculation, it can detect whether a calculation has been run before and, instead of running it again, simply reuse its outputs, thereby saving valuable computational resources. This is what we mean by **caching** in AiiDA.

With caching enabled, AiiDA searches the database for a calculation with the same hash, which is computed from the calculation and its inputs. If one is found, AiiDA creates a copy of that calculation node and its results instead of running it again, which ensures that the resulting provenance graph is the same whether or not caching is enabled.

![caching](_static/caching.png){align=center width="150px"}
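
To make the mechanism concrete, here is a toy cache keyed on a hash of the inputs, built with `hashlib` and a canonical JSON encoding. It is a hypothetical illustration only: AiiDA computes its own hashes (from, among other things, the node's attributes and the hashes of its input nodes) and stores a copy of the cached calculation node rather than returning a plain dictionary.

```python
import hashlib
import json
from math import prod


def input_hash(inputs: dict) -> str:
    """Hash a JSON-serialisable dict of inputs; key order does not matter."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToyCache:
    """Re-use outputs for inputs that hash to an already-seen value."""

    def __init__(self):
        self._store: dict[str, dict] = {}
        self.executions = 0  # how many times the "calculation" actually ran

    def run(self, inputs: dict) -> tuple[dict, bool]:
        """Return (outputs, from_cache); only does the work on a cache miss."""
        key = input_hash(inputs)
        if key in self._store:
            # Hand back a copy, loosely mirroring how AiiDA clones cached results.
            return dict(self._store[key]), True
        self.executions += 1
        outputs = {"num_kpoints": prod(inputs["mesh"])}  # stand-in for the real work
        self._store[key] = outputs
        return dict(outputs), False


toy_cache = ToyCache()
first, first_cached = toy_cache.run({"ecutwfc": 30.0, "mesh": [4, 4, 4]})
second, second_cached = toy_cache.run({"mesh": [4, 4, 4], "ecutwfc": 30.0})
print(first_cached, second_cached, toy_cache.executions)  # False True 1
```

Sorting the keys before hashing is what makes two logically identical input sets, written in a different order, hit the same cache entry.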

Caching happens at the calculation level (not the workchain level), and is **not** enabled by default.
We can enable it for specific calculation plugins using the `caching` options of `verdi config`.

In [None]:
%verdi config set caching.enabled_for 'aiida.calculations:quantumespresso.pw'

In [None]:
%verdi config list caching

Now, when we run the same calculation again, AiiDA will detect that it has already been run, and will simply reuse the results from the previous run!

In [None]:
output = aiida.engine.run_get_node(builder)
output.node

We can see that the calculation was created from the cache by checking the following:

In [None]:
output.node.base.caching.is_created_from_cache

:::{seealso}

The [caching how-to documentation](https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/run_codes.html#how-to-run-codes-caching).

:::