How does it work?

As shown in the diagram, the system is controlled by an extensible reference ontology published by a central ontology manager. The ontology determines precisely which data is permitted to enter the system. Data can then enter the system in two ways:

  • Via a data lake or enterprise data warehouse that has already brought together data from multiple sources.
  • By bringing together data from disparate databases across an organisation or system.
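To make the gating role of the reference ontology concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the ontology can be modelled as a mapping from each permitted data item to the answer types it may take. `ReferenceOntology`, `extend`, `admits` and `filter_incoming` are hypothetical names, not part of the actual system.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceOntology:
    """Hypothetical model of the reference ontology: maps each permitted
    data item (question) to the answer types it may take."""
    permitted: dict = field(default_factory=dict)

    def extend(self, question, answer_types):
        # The ontology is extensible: new items or answer types can be published.
        self.permitted.setdefault(question, set()).update(answer_types)

    def admits(self, question, answer_type):
        # Only items explicitly listed in the ontology may enter the system.
        return answer_type in self.permitted.get(question, set())


def filter_incoming(records, ontology):
    """Split (question, answer_type, answer) records into accepted and rejected lists."""
    accepted, rejected = [], []
    for record in records:
        question, answer_type, _answer = record
        target = accepted if ontology.admits(question, answer_type) else rejected
        target.append(record)
    return accepted, rejected


ontology = ReferenceOntology()
ontology.extend("Systolic blood pressure", {"numeric"})
ontology.extend("Smoking status", {"coded"})

incoming = [
    ("Systolic blood pressure", "numeric", 132),
    ("Smoking status", "coded", "Never smoked"),
    ("Favourite colour", "text", "Blue"),  # not in the ontology, so refused
]
accepted, rejected = filter_incoming(incoming, ontology)
print(len(accepted), rejected)  # 2 [('Favourite colour', 'text', 'Blue')]
```

Returning rejected items rather than silently dropping them would let an ontology manager review them and decide whether the ontology should be extended.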

As data enters the system, it is passed through a coding engine that transforms it into a novel coded format designed to support flexible manipulation and analysis (see below). The coded data is then used to define cohorts and to set the data restrictions that control access to the system. Each user can view only the cohorts of patients and the datasets associated with their role.
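The role-based restriction described above might be sketched as follows. This assumes, hypothetically, that cohorts are stored as sets of patient IDs and that each patient's record belongs to one named dataset; `Role` and `visible_patients` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role definition: which cohorts and datasets it may view."""
    name: str
    cohorts: frozenset
    datasets: frozenset


def visible_patients(role, cohort_members, dataset_of_patient):
    """Patients a role may view: in at least one permitted cohort AND in a permitted dataset."""
    in_cohorts = set()
    for cohort in role.cohorts:
        in_cohorts |= cohort_members.get(cohort, set())
    return {p for p in in_cohorts if dataset_of_patient.get(p) in role.datasets}


cohort_members = {
    "diabetes_clinic": {"p1", "p2", "p3"},
    "cardiology": {"p3", "p4"},
}
dataset_of_patient = {"p1": "gp_records", "p2": "hospital_ehr",
                      "p3": "gp_records", "p4": "gp_records"}

nurse = Role("diabetes_nurse", frozenset({"diabetes_clinic"}), frozenset({"gp_records"}))
print(sorted(visible_patients(nurse, cohort_members, dataset_of_patient)))  # ['p1', 'p3']
```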

Within the scope of that role, any end user can create further cohorts of whatever complexity is required. For any selected cohort or cohorts, the user can then explore and visualise their characteristics using a suite of 16 ‘self-service’ Apps covering the key areas of day-to-day healthcare.
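One way to picture cohorts "of any required level of complexity" is as nested set expressions over existing cohorts, always evaluated inside the user's role scope. The sketch below is an assumption about how this could work, not the system's actual design; `Named`, `All`, `AnyOf`, `Not` and `evaluate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Named:
    name: str        # an existing cohort


@dataclass(frozen=True)
class All:
    parts: tuple     # patients in every sub-cohort (intersection)


@dataclass(frozen=True)
class AnyOf:
    parts: tuple     # patients in at least one sub-cohort (union)


@dataclass(frozen=True)
class Not:
    part: object     # patients in scope but not in the sub-cohort


def evaluate(expr, cohorts, scope):
    """Evaluate a cohort expression, never returning patients outside `scope`."""
    if isinstance(expr, Named):
        return cohorts[expr.name] & scope
    if isinstance(expr, All):
        result = set(scope)
        for part in expr.parts:
            result &= evaluate(part, cohorts, scope)
        return result
    if isinstance(expr, AnyOf):
        result = set()
        for part in expr.parts:
            result |= evaluate(part, cohorts, scope)
        return result
    if isinstance(expr, Not):
        return set(scope) - evaluate(expr.part, cohorts, scope)
    raise TypeError(f"unknown cohort expression: {expr!r}")


scope = {"p1", "p2", "p3", "p4", "p5"}   # patients visible to this role
cohorts = {
    "diabetes": {"p1", "p2", "p6"},     # p6 lies outside the role's scope
    "over_65": {"p1", "p3", "p5"},
    "smokers": {"p2", "p5"},
}

# Diabetic patients who are over 65 or smoke; diabetic non-smokers.
diabetic_high_risk = All((Named("diabetes"), AnyOf((Named("over_65"), Named("smokers")))))
diabetic_non_smokers = All((Named("diabetes"), Not(Named("smokers"))))
print(sorted(evaluate(diabetic_high_risk, cohorts, scope)))    # ['p1', 'p2']
print(sorted(evaluate(diabetic_non_smokers, cohorts, scope)))  # ['p1']
```

Taking the complement (`Not`) relative to the role scope, rather than to the whole database, means that no expression a user builds can reach patients outside their role.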

Over the past ten years, we have developed an entirely new technology designed specifically to facilitate data exploration and analysis. Below is a brief explanation of the underlying approach. To query a database, it is necessary to identify and access the data of interest and to take into account its relationships with other data. Conventionally, this is achieved by referring to the database's table structures, either directly through a querying tool or via a schema diagram or similar representation. We have instead taken a fundamentally different approach: a unified method of representing all standard data types.

We do this by converting all data items into ‘Atmofacts’: coded statements with a three-part structure comprising ‘Question’, ‘Answer type’ and ‘Answer’. As shown in Figure 1, each Atmofact is represented by a unique, fixed-length numeric code, which also has an associated range of attributes (not shown) that capture numeric values, natural-language terms, position in a hierarchy, metadata and other relationships, including mappings to standard health nomenclatures such as ICD-10, LOINC and SNOMED. This data format is also language-independent, so data collected in one language can, for example, be viewed in another.
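A minimal Python model of an Atmofact, based only on the structure described above, might look like this. The 12-digit code length, the example code and the field names are hypothetical; the text states only that codes are numeric and of fixed length. The ICD-10 code E11 (type 2 diabetes mellitus) is used as an example mapping.

```python
from dataclasses import dataclass, field
from typing import Optional

CODE_LENGTH = 12  # hypothetical fixed width, chosen for illustration


@dataclass
class Atmofact:
    code: str                     # unique, fixed-length numeric code
    question: str                 # part 1, e.g. "Diagnosis"
    answer_type: str              # part 2, e.g. "coded" or "numeric"
    answer: str                   # part 3, e.g. "Type 2 diabetes mellitus"
    value: Optional[float] = None                 # numeric value, if any
    labels: dict = field(default_factory=dict)    # language code -> display term
    hierarchy: tuple = ()                         # position in a hierarchy, root first
    mappings: dict = field(default_factory=dict)  # nomenclature name -> external code

    def __post_init__(self):
        # Enforce the fixed-length, purely numeric code.
        if len(self.code) != CODE_LENGTH or not self.code.isdigit():
            raise ValueError(f"Atmofact code must be {CODE_LENGTH} digits: {self.code!r}")

    def label(self, language):
        # Language independence: the code stays the same; only the display term changes.
        return self.labels.get(language, self.answer)


diabetes = Atmofact(
    code="000100200301",
    question="Diagnosis",
    answer_type="coded",
    answer="Type 2 diabetes mellitus",
    labels={"en": "Type 2 diabetes mellitus", "fr": "Diabète de type 2"},
    hierarchy=("Diagnosis", "Endocrine disorders", "Diabetes mellitus"),
    mappings={"ICD-10": "E11"},
)
print(diabetes.label("fr"))  # Diabète de type 2
```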

This approach removes the need for queries to refer to the physical structure of the database, which makes queries simpler to formulate and more flexible to modify. Atmofacts therefore enable both easy access to the data for display or manipulation in the user interface and the construction of highly generalised queries that work on any permutation of data elements. Because the meaning of each data element is displayed in everyday language, the full contents of a database are directly accessible to the non-technical user. As a result, a simple user interface can support highly specific user-generated queries without the end user needing any prior knowledge of data structures or query languages.
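The sketch below illustrates why a query need not refer to table structure: if every data item is stored as a (patient, Atmofact code, value) triple, one generic function can answer any combination of conditions. The storage layout, the codes and `patients_matching` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical 12-digit Atmofact codes, for illustration only.
DIABETES = "000100200301"
HBA1C = "000300100105"


def patients_matching(facts, conditions):
    """Return patients satisfying every condition.

    facts:      iterable of (patient_id, atmofact_code, value) triples
    conditions: atmofact_code -> predicate on the value, or None to require
                only that the fact is present
    """
    by_patient = defaultdict(lambda: defaultdict(list))
    for patient_id, code, value in facts:
        by_patient[patient_id][code].append(value)

    matches = set()
    for patient_id, values_by_code in by_patient.items():
        if all(
            code in values_by_code
            and (predicate is None or any(predicate(v) for v in values_by_code[code]))
            for code, predicate in conditions.items()
        ):
            matches.add(patient_id)
    return matches


facts = [
    ("p1", DIABETES, None), ("p1", HBA1C, 58),
    ("p2", DIABETES, None), ("p2", HBA1C, 71),
    ("p3", HBA1C, 40),
]

# The same function answers structurally different questions.
print(patients_matching(facts, {DIABETES: None, HBA1C: lambda v: v > 64}))  # {'p2'}
print(sorted(patients_matching(facts, {HBA1C: lambda v: v < 60})))          # ['p1', 'p3']
```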

Aggregating Data from Multiple Sources

Seamlessly bringing together data from multiple sources is one of the major challenges in working with large-scale health data, and it is an area where the use of Atmofacts offers considerable advantages. Importing data from one or more sources no longer requires complex mappings between database table structures and can be largely automated, greatly reducing the complexity of the task. In many circumstances, the import can be fully automated, whether the source is a simple flat-file database such as an Excel spreadsheet or a complex clinical database such as an electronic health record system. Leveraging this benefit of the technology is one of the key objectives of the Atmohealth Project.
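As an illustration of automated import from a flat file, the sketch below converts a CSV export (as might be produced from Excel) into (patient, Atmofact code, value) triples. The column-to-code lookup `COLUMN_TO_CODE` is a hypothetical stand-in for however the real system resolves source columns to Atmofacts, and `import_flat_file` is likewise illustrative.

```python
import csv
import io

# Hypothetical lookup from source column headers to 12-digit Atmofact codes.
COLUMN_TO_CODE = {
    "HbA1c (mmol/mol)": "000300100105",
    "Smoker": "000400200101",
}


def import_flat_file(text, id_column, column_to_code):
    """Turn each non-empty cell into a (patient_id, atmofact_code, raw_value) triple.

    Columns with no known Atmofact are collected in `unmapped` for review
    instead of being imported.
    """
    facts, unmapped = [], set()
    for row in csv.DictReader(io.StringIO(text)):
        patient_id = row[id_column]
        for column, raw in row.items():
            if column == id_column or raw in ("", None):
                continue  # skip the ID column and empty cells
            code = column_to_code.get(column)
            if code is None:
                unmapped.add(column)
                continue
            facts.append((patient_id, code, raw))
    return facts, unmapped


export = (
    "PatientID,HbA1c (mmol/mol),Smoker,Ward\n"
    "p1,58,No,A\n"
    "p2,71,Yes,B\n"
    "p3,,No,A\n"
)
facts, unmapped = import_flat_file(export, "PatientID", COLUMN_TO_CODE)
print(len(facts), unmapped)  # 5 {'Ward'}
```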

Integration: A Unified Journey from Form Design to Analytics
