Installation

MegaQC has been written in Python using the Flask web framework. MegaQC is designed to be very simple to get up and running for basic testing and evaluation, yet super easy to configure for a high performance production installation. The various ways of getting a runnable MegaQC instance are explained in the following sections.

Production

This section explains how to set up a production environment without containers. If you want to run MegaQC in a containerized environment, please refer to the Docker section.

1. Install the MegaQC package

MegaQC is available on the Python Package Index (PyPI), and we are planning to add it to Conda soon. To install it from PyPI, run the following command:

pip install "megaqc[prod]"

2. Export environment variables

By default, MegaQC runs in development mode with an SQLite flat-file database (this is to make it as simple as possible to get up and running for a quick test or demo). To run MegaQC in production mode, set the MEGAQC_PRODUCTION environment variable to true (export MEGAQC_PRODUCTION=1).

If you are running MegaQC behind a custom domain name (recommended, as it’s nicer than a difficult-to-remember IP address), you also need to set SERVER_NAME to the URL of the website.

Add the following lines to your ~/.bashrc file:

export MEGAQC_PRODUCTION=1
export SERVER_NAME='http://megaqc.yourdomain.com'
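To confirm these variables are visible to the shell that will start MegaQC (for example after running source ~/.bashrc), a small pre-flight check can help. The sketch below is hypothetical: check_production_env is not part of MegaQC, it expects the value 1 used in this guide, and it does not reproduce how MegaQC itself parses these variables.

```python
import os
from urllib.parse import urlparse


def check_production_env(env=None):
    """Return a list of warnings about MEGAQC_PRODUCTION and SERVER_NAME.

    Hypothetical helper: it checks the values used in this guide and does
    not reflect how MegaQC itself parses these variables.
    """
    env = os.environ if env is None else env
    warnings = []
    if env.get("MEGAQC_PRODUCTION") != "1":
        warnings.append("MEGAQC_PRODUCTION is not set to 1")
    server_name = env.get("SERVER_NAME")
    if server_name:
        # This guide sets SERVER_NAME to a full URL such as http://megaqc.yourdomain.com
        parsed = urlparse(server_name)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            warnings.append(f"SERVER_NAME={server_name!r} does not look like a URL")
    return warnings


if __name__ == "__main__":
    problems = check_production_env()
    for problem in problems:
        print("WARNING:", problem)
    if not problems:
        print("Production environment variables look OK")
```

SERVER_NAME is optional here, so an unset value produces no warning.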

3. Set up the database

MegaQC uses the Flask SQLAlchemy plugin, meaning that it can be used with any SQL database (PostgreSQL, MySQL, SQLite and others).

MegaQC has been developed with PostgreSQL; see below for instructions. If you use MegaQC with any other database and could contribute to the documentation, that would be great!

3.1 Using a PostgreSQL database

First, install PostgreSQL: https://wiki.postgresql.org/wiki/Detailed_installation_guides

Then, install the Python package that handles database requests (psycopg2):

pip install psycopg2

MegaQC checks whether the configured database is PostgreSQL. If it is, it will try to connect as megaqc_user to the database megaqc on localhost:5432. If that fails, MegaQC will attempt to create the user and the database, and will then export the schema.

To make this happen, run:

megaqc initdb
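If initdb cannot connect, it can help to test the same connection by hand. The sketch below assembles a standard postgresql:// URI from the defaults described above (megaqc_user, megaqc, localhost:5432) and tries it with psycopg2. The helper names are hypothetical, and it assumes your server accepts this user without a password unless you pass one.

```python
from urllib.parse import quote


def postgres_url(user="megaqc_user", database="megaqc", host="localhost", port=5432, password=None):
    """Build a postgresql:// URI from the defaults described in this guide."""
    auth = quote(user, safe="")
    if password is not None:
        # Percent-encode so characters such as '@' or ':' don't break the URI
        auth += ":" + quote(password, safe="")
    return f"postgresql://{auth}@{host}:{port}/{database}"


def can_connect(url):
    """Try a real connection with psycopg2 (pip install psycopg2)."""
    import psycopg2  # imported lazily so postgres_url() works without it

    try:
        psycopg2.connect(url).close()
        return True
    except psycopg2.OperationalError as err:
        print("Connection failed:", err)
        return False


if __name__ == "__main__":
    url = postgres_url()
    print(url, "reachable" if can_connect(url) else "not reachable")
```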

3.2 Using a MySQL database

Although PostgreSQL is highly recommended, MegaQC should work with other SQL database back ends, such as MySQL.

Please note that MySQL support is currently untested and unsupported. If you use MegaQC with MySQL we’d love to hear about your experiences!

First, install MySQL: https://dev.mysql.com/doc/refman/5.7/en/installing.html

Then install the Python MySQL connector, either by following its official installation instructions or from PyPI.

Now, create a custom MegaQC configuration file somewhere and set the environment variable MEGAQC_CONFIG to point to it. For example, in ~/.bashrc:

export MEGAQC_CONFIG="/path/to/megaqc_config.yaml"

Then, in this file, set the following configuration key-value pair:

SQLALCHEMY_DBMS: mysql
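If you prefer to script this step, the following hypothetical helper writes the configuration file and prints the matching export line for ~/.bashrc. It is a minimal sketch that assumes plain top-level key: value pairs needing no YAML quoting, and ~/megaqc_config.yaml is only an example path; for anything more complex, write the file by hand.

```python
from pathlib import Path


def write_megaqc_config(path, settings):
    """Write simple top-level `key: value` pairs as YAML and return the export line."""
    path = Path(path).expanduser().resolve()
    lines = [f"{key}: {value}" for key, value in settings.items()]
    path.write_text("\n".join(lines) + "\n")
    return f'export MEGAQC_CONFIG="{path}"'


if __name__ == "__main__":
    # Example location; choose any path readable by the MegaQC process
    print(write_megaqc_config("~/megaqc_config.yaml", {"SQLALCHEMY_DBMS": "mysql"}))
```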

This should, hopefully, make everything work. If you have problems, please create an issue and we’ll do our best to help.

4. Start the web server

gunicorn --log-file megaqc.log --timeout 300 megaqc.wsgi:app

Note: We recommend using a long timeout, as uploading data from MultiQC can take several minutes for large reports.

At this point, MegaQC should be running on the default gunicorn port (8000).

You should now have a fully functional MegaQC server running! 🎉
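To check that the server is answering before pointing MultiQC at it, a quick request with Python's standard library is enough. This sketch assumes gunicorn's default bind address, http://127.0.0.1:8000/. megaqc_is_up is a hypothetical helper that treats any response below 500 (including a 404) as a sign the server is running, and it bypasses any configured HTTP proxy so it talks to the local server directly.

```python
import urllib.error
import urllib.request


def megaqc_is_up(url="http://127.0.0.1:8000/", timeout=5.0):
    """Return True if an HTTP server at `url` answers without a 5xx error."""
    # An empty ProxyHandler ignores http_proxy settings, so we reach the local server directly
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True  # success status (redirects are followed automatically)
    except urllib.error.HTTPError as err:
        # The server answered, just with an error status such as 404
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, unknown host, timeout, ...
        return False


if __name__ == "__main__":
    print("MegaQC is up" if megaqc_is_up() else "No answer on port 8000")
```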

Troubleshooting

The password encryption relies on the libffi-devel package to work. If you run an older OS, ensure that the package is installed.

Docker

MegaQC offers two ways of getting a containerized setup running:

  1. A single Docker container containing MegaQC with a Gunicorn WSGI HTTP server

  2. A Docker Compose stack containing the MegaQC container, a Postgres container and an NGINX container

The MegaQC Docker container

Overview

The MegaQC container builds on two base containers: the Node container, which compiles all JavaScript, and the Gunicorn Flask container, which provides Gunicorn, Flask and MegaQC preconfigured for production deployments. The Gunicorn Flask container is also the one that runs the final server.

Pulling the Docker image from Docker Hub

To run MegaQC with docker, simply use the following command:

docker run -p 80:80 ewels/megaqc

This will pull the latest image from Docker Hub and run MegaQC on port 80.

Note that you need to publish the port (-p 80:80 above) to access MegaQC from the host or other machines. For more information, see the Docker run reference: https://docs.docker.com/engine/reference/run/

Building your own Docker image

If you have cloned the MegaQC code from GitHub, you can build your own Docker image instead. cd to the MegaQC root directory and run:

docker build . -t ewels/megaqc

You can then run MegaQC as described above:

docker run -p 80:80 ewels/megaqc

Configuration

Besides the sections below, it is also worth reading the Gunicorn Flask container documentation, which explains how to customize the host IP on which Gunicorn listens for requests, the port the container listens on and binds, and the actual host and port passed to Gunicorn, as well as how to use custom Gunicorn configuration files.

Environment variables

By default, the MegaQC related environment variables are set to:

MEGAQC_PRODUCTION=1
MEGAQC_SECRET="SuperSecretValueYouShouldReallyChange"
MEGAQC_CONFIG=""
APP_MODULE=megaqc.wsgi:app
DB_HOST="127.0.0.1"
DB_PORT="5432"
DB_NAME="megaqc"
DB_USER="megaqc"
DB_PASS="megaqcpswd"

To run MegaQC with custom environment variables, use the -e KEY=value option of docker run. For more information, please read Docker - setting environment variables. For example, to run MegaQC with a custom database password:

docker run -p 80:80 -e DB_PASS=someotherpassword ewels/megaqc
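When scripting deployments, building the docker run arguments as a list avoids shell-quoting problems with values such as passwords. The sketch below is hypothetical (docker_run_args is not part of MegaQC or Docker); it only uses the -p and -e options and the ewels/megaqc image described in this section.

```python
import shlex
import subprocess


def docker_run_args(image="ewels/megaqc", publish="80:80", env=None):
    """Build a `docker run` argument list with one `-e KEY=value` per variable.

    Passing a list to subprocess avoids shell quoting issues, e.g. for
    passwords containing spaces or `$`.
    """
    args = ["docker", "run", "-p", publish]
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args


if __name__ == "__main__":
    args = docker_run_args(env={"DB_PASS": "someotherpassword"})
    print(shlex.join(args))  # shell-safe rendering, handy for logs
    subprocess.run(args, check=True)
```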

Furthermore, be aware that the default latest tag will typically be a development version and may not be very stable. You can specify a tagged version to run a release instead:

docker run -p 80:80 ewels/megaqc:v0.1

Also note that docker will use a local version of the image if it exists. To pull the latest version of MegaQC use the following command:

docker pull ewels/megaqc

Using persistent data

The Dockerfile has been configured to automatically create persistent volumes for the data and log directories. These volumes are created without any additional input from you, but if you want to re-use them with a new container, you must specify them when running the Docker image.

The easiest way to ensure the database persists between containers is to always specify the same named volume for /usr/local/lib/postgresql. If a volume with that name exists, Docker reuses it; otherwise, it creates a new one.

To create or re-use a docker volume named pg_data:

docker run -p 80:80 -v pg_data:/usr/local/lib/postgresql ewels/megaqc

The same can be done for a log directory volume called pg_logs:

docker run -p 80:80 -v pg_data:/usr/local/lib/postgresql -v pg_logs:/var/log/postgresql ewels/megaqc

If you did not specify a volume name, Docker will have given the volume a long hexadecimal string as a unique name. If you only have a few volumes, you can find it in the output of docker volume ls and docker volume inspect $VOLUME_NAME. However, the easiest way is to inspect the container:

# ugly default docker output
docker inspect --format '{{json .Mounts}}' example_container

# use jq for pretty formatting
docker inspect --format '{{json .Mounts}}' example_container | jq

# or use python for pretty formatting
docker inspect --format '{{json .Mounts}}' example_container | python -m json.tool

Example output for the above, nicely formatted:

[
{
   "Type": "volume",
   "Name": "7c8c9dfbcc66874b472676659dde6a5c8e15dea756a620435c83f5980c21d804",
   "Source": "/var/lib/docker/volumes/7c8c9dfbcc66874b472676659dde6a5c8e15dea756a620435c83f5980c21d804/_data",
   "Destination": "/usr/local/lib/postgresql",
   "Driver": "local",
   "Mode": "",
   "RW": true,
   "Propagation": ""
},
{
   "Type": "volume",
   "Name": "6d48d24a660d078dfe4c04960aeb1848ea688a3eae0d4b7b54b1043f7885e428",
   "Source": "/var/lib/docker/volumes/6d48d24a660d078dfe4c04960aeb1848ea688a3eae0d4b7b54b1043f7885e428/_data",
   "Destination": "/var/log/postgresql",
   "Driver": "local",
   "Mode": "",
   "RW": true,
   "Propagation": ""
}
]
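
To turn this output into -v options for a new container, you can parse the JSON with Python. This sketch runs the same docker inspect command as above and keeps only mounts of type volume; volume_names and container_volumes are hypothetical helpers. The printed options can be pasted into a docker run command to re-use the volumes.

```python
import json
import subprocess
import sys


def volume_names(mounts_json):
    """Map each mount destination to its Docker volume name.

    `mounts_json` is the output of `docker inspect --format '{{json .Mounts}}'`.
    Bind mounts and other non-volume mounts are skipped.
    """
    return {
        mount["Destination"]: mount["Name"]
        for mount in json.loads(mounts_json)
        if mount.get("Type") == "volume"
    }


def container_volumes(container):
    """Run the docker inspect command shown above and parse its output."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .Mounts}}", container],
        capture_output=True, text=True, check=True,
    )
    return volume_names(result.stdout)


if __name__ == "__main__":
    # Print reusable -v options, e.g. -v <volume name>:/usr/local/lib/postgresql
    for destination, name in container_volumes(sys.argv[1]).items():
        print(f"-v {name}:{destination}")
```
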
Running MegaQC with a local Postgres database

To access a Postgres database running on the host machine, the container needs to use the host’s network. For more information, read https://docs.docker.com/network/host/

An example command to run MegaQC with a Postgres database accessible on localhost:5432 looks as follows:

docker run --network="host" ewels/megaqc

With host networking, port publishing (-p) has no effect: MegaQC listens directly on the host’s port 80. Note that DB_HOST defaults to 127.0.0.1, i.e. localhost.
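Because the container shares the host’s network, Postgres must accept connections at DB_HOST:DB_PORT (127.0.0.1:5432 by default) on the host itself. A minimal check with Python’s standard library follows; port_open is a hypothetical helper that only confirms something accepts TCP connections there, not that it is Postgres or that the DB_USER and DB_PASS credentials are valid.

```python
import socket


def port_open(host="127.0.0.1", port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, timeout, ...
        return False


if __name__ == "__main__":
    print("Postgres port reachable" if port_open() else "Nothing listening on 127.0.0.1:5432")
```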

The MegaQC Docker Compose stack

Since a fully working and performant MegaQC instance depends on a SQL database and a reverse proxy, MegaQC offers a docker-compose stack, which sets up three containers for a zero configuration setup.

Overview

The docker-compose configuration can be found in the deployment folder. It provides the MegaQC Docker container, a Postgres container for the SQL database and an NGINX container for the reverse proxy.

Usage

The deployment folder contains the docker-compose configuration together with its associated .env file. To spin up all containers, run the following from inside the deployment folder:

docker-compose up

All containers should now spin up and the MegaQC server should be accessible on 0.0.0.0:80. Alternatively, you can spin up the containers in the background:

docker-compose up -d

The -d option detaches from the containers, but will keep them running.

Configuration

Environment variables

The default environment variables for MegaQC, used when starting the MegaQC Docker container, are defined in the .env file. Edit the file, and the new values will be passed to the MegaQC Docker container the next time you start the stack.
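To review the values that will be passed to the container, you can read the .env file with a few lines of Python. This is a simplified, hypothetical parser (read_env_file): it skips blank lines and # comments and strips one pair of surrounding quotes, but it does not implement every rule docker-compose applies, such as variable substitution. Values of keys containing PASS or SECRET are masked when printed.

```python
import sys
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a docker-compose .env file."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, comment or not an assignment
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values


if __name__ == "__main__":
    env_path = sys.argv[1] if len(sys.argv) > 1 else ".env"
    for key, value in sorted(read_env_file(env_path).items()):
        # Avoid echoing secrets to the terminal
        hidden = any(word in key for word in ("PASS", "SECRET"))
        print(f"{key}={'********' if hidden else value}")
```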

Further runtime arguments

Further runtime arguments can be added to a command section inside the docker-compose configuration file.

HTTPS

By default, the MegaQC stack ships with a self-signed SSL certificate for testing purposes. For this reason, we recommend that you use HTTP to access the stack. However, if you want to enable HTTPS, perhaps because you are making MegaQC available on the public internet, it should be simple to install your own certificates. To do so, go to the deployment directory and edit the .env file, setting these lines to the full file paths of your .crt and .key files:

CRT_PATH=./nginx-selfsigned.crt
KEY_PATH=./nginx-selfsigned.key
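Before restarting the stack, you can check that the files referenced by CRT_PATH and KEY_PATH exist and belong together. The sketch below is a hypothetical helper built on Python’s ssl module: load_cert_chain fails if either file is not valid PEM or if the key does not match the certificate. It assumes an unencrypted key (an encrypted one would trigger an interactive passphrase prompt), and relative paths are resolved against the directory you run it from, so run it from the deployment directory.

```python
import ssl
import sys
from pathlib import Path


def check_cert_pair(crt_path, key_path):
    """Return None if the certificate and key load together, else an error message."""
    for path in (crt_path, key_path):
        if not Path(path).is_file():
            return f"file not found: {path}"
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Fails if a file is not valid PEM or the key does not match the certificate.
        # An encrypted key would make OpenSSL prompt for its passphrase here.
        context.load_cert_chain(certfile=crt_path, keyfile=key_path)
    except OSError as err:  # ssl.SSLError is a subclass of OSError
        return f"certificate/key problem: {err}"
    return None


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_cert.py /path/to/your.crt /path/to/your.key
    error = check_cert_pair(sys.argv[1], sys.argv[2])
    print(error or "Certificate and key match")
```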

After this, run the stack as described above, and then you should be able to access MegaQC on https://your_hostname.

Development

Prerequisites

You will need git, Python 3 and Node.js with npm, all of which are used in the steps below.

1. Clone the repo

If you’re doing development work, you need access to the source code:

git clone https://github.com/ewels/MegaQC

2. Install MegaQC into a Python virtual environment

You should do your development in a virtual environment. You also need to install MegaQC and all its dependencies there:

cd MegaQC
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

3. Enable development mode

Setting this environment variable runs MegaQC in development mode. This means that it will show full Python exception tracebacks in the web browser and enable additional Flask plugins that help with debugging and performance testing.

export FLASK_DEBUG=1

4. Set up the database

Running this command creates an empty SQLite MegaQC database file called megaqc.db in the installation directory:

megaqc initdb

5. Start megaqc

Start MegaQC:

megaqc run

You will have to run the rest of these commands in another terminal window, because megaqc run blocks the terminal.

6. Set up your access key

  • Register an account in your browser at http://localhost:5000/register/ (the port might differ; use the one printed by megaqc run in the previous step).

  • Once registered, visit http://localhost:5000/users/multiqc_config and follow the instructions there to configure your access token in ~/.multiqc_config.yaml.

  • Note: if you’d rather not pollute your home directory, you can instead name the file multiqc_config.yaml and place it in the current (MegaQC) directory. However, you will then have to run megaqc upload from that directory each time.

7. Load test data

In order to develop new features, you need some data to test them with:

git clone https://github.com/TMiguelT/1000gFastqc
for report in $(find 1000gFastqc -name '*.json')
    do megaqc upload $report
done
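The shell loop above splits find’s output on whitespace, which is fine for this repository’s file names. If you adapt it to your own reports, a Python equivalent that passes each path as a single argument may be more robust. This is a sketch: find_reports and upload_all are hypothetical helpers that simply call megaqc upload for every .json file and stop at the first failed upload.

```python
import subprocess
import sys
from pathlib import Path


def find_reports(root):
    """Return every *.json file below `root` as a sorted list of paths."""
    return sorted(str(path) for path in Path(root).rglob("*.json"))


def upload_all(root):
    """Run `megaqc upload <report>` for each report; stop at the first failure."""
    for report in find_reports(root):
        # A list argument keeps paths with spaces intact (no shell involved)
        subprocess.run(["megaqc", "upload", report], check=True)


if __name__ == "__main__":
    upload_all(sys.argv[1] if len(sys.argv) > 1 else "1000gFastqc")
```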

8. Install the JavaScript dependencies and start compiling

This command will run until you cancel it, but will ensure that any changes to the JavaScript are compiled instantly:

npm install
npm run watch

9. Install the pre-commit hooks

MegaQC uses a number of pre-commit hooks, which automatically format and check your code before you commit. To set them up, run:

pre-commit install

From now on, whenever you commit, each changed file will get processed by the pre-commit hooks. If a file is changed by this process (because your code style didn’t match the configuration), you’ll have to git add the files again, and then re-run git commit. If it lets you write a commit message then everything has succeeded.

Next Steps

You should now have a fully functional MegaQC test server running, accessible on your localhost at http://127.0.0.1:5000

Migrations

Introduction

Migrations are updates to a database schema. This is relevant if, for example, you set up a MegaQC database (using initdb), and then a new version of MegaQC is released that needs new tables or columns.

When to migrate

Every time a new version of MegaQC is released, you should ensure your database is up to date. You don’t need to run the migrations the first time you install MegaQC, because the megaqc initdb command replaces the need for migrations.

How to migrate

To migrate, run the following commands:

cd megaqc
export FLASK_APP=wsgi.py
flask db upgrade

Note: when you run these migrations, you must use the same environment as when you run MegaQC normally, meaning the same values of the FLASK_DEBUG and MEGAQC_PRODUCTION environment variables. Otherwise, it will migrate the wrong database (or a non-existent one).

Stamping your database

The complete migration history has only recently been added. This means that, if you were using MegaQC in the past when migrations were not included in the repo, your database won’t know what version you’re currently at.

To fix this, first you need to work out which migration your database is up to. Browse through the files in megaqc/migrations/versions, starting from the oldest date (at the top of each file), until you find a change that wasn’t present in your database. The migration just before that one is the last one your database already contains: note its revision value at the top of the file (e.g. revision = "007c354223ec"). This is also the down_revision value of the first missing migration.

Next, run the following command, replacing <revision ID> with the revision you noted above:

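flask db stamp <revision ID>

After that, running flask db upgrade as described in How to migrate will apply the remaining migrations.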
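Rather than comparing dates by eye, you can print the migrations in order by following each file’s down_revision link, which makes it easier to spot where your database stops matching the migration history. This sketch assumes the standard Alembic layout (module-level revision = ... and down_revision = ... assignments) and a linear history without branches or merges; migration_chain is a hypothetical helper.

```python
import re
import sys
from pathlib import Path

# Module-level assignments as written by Alembic, with or without type
# annotations, e.g. revision = "007c354223ec" or revision: str = "007c354223ec"
_REVISION = re.compile(r"^revision(?:\s*:[^=\n]*)?\s*=\s*['\"](\w+)['\"]", re.M)
_DOWN_REVISION = re.compile(
    r"^down_revision(?:\s*:[^=\n]*)?\s*=\s*(?:['\"](\w+)['\"]|None)", re.M
)


def migration_chain(versions_dir):
    """Return [(revision, filename), ...] ordered from oldest to newest.

    Assumes a linear history: each down_revision is a single ID or None.
    """
    parent_of, filename_of = {}, {}
    for path in Path(versions_dir).glob("*.py"):
        text = path.read_text()
        revision = _REVISION.search(text)
        if revision is None:
            continue  # not a migration file
        down = _DOWN_REVISION.search(text)
        parent_of[revision.group(1)] = down.group(1) if down else None
        filename_of[revision.group(1)] = path.name
    # Invert the links, then walk forward from the base migration (parent None)
    child_of = {parent: rev for rev, parent in parent_of.items()}
    chain, current = [], child_of.get(None)
    while current is not None:
        chain.append((current, filename_of[current]))
        current = child_of.get(current)
    return chain


if __name__ == "__main__":
    versions = sys.argv[1] if len(sys.argv) > 1 else "megaqc/migrations/versions"
    for position, (revision, filename) in enumerate(migration_chain(versions), 1):
        print(f"{position:3d}  {revision}  {filename}")
```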