HELCOM PLUS
1.0
Database Functional Design Document
Oct 2013
Revision History
Date Version Description Author
HELCOM PLUS Database Functional Design Document ii Dec 2013
Table of Contents
1. Introduction
1.1. Purpose
1.2. Scope, Approach and Methods
1.3. System Overview
1.4. Acronyms and Abbreviations
1.5. Points of Contact
2. System Overview
2.1. System Information
2.1.1. Database Management System Configuration
2.1.2. Database Software Utilities
2.1.3. Support Software
2.1.4. Security
2.2. Architecture
2.2.1. Hardware Architecture
2.2.2. Software Architecture
2.2.3. Interfaces
2.2.4. Data Stores
3. Database Specifications
3.1. Database Identification
3.2. Schema Information
3.2.1. Description
3.2.2. Physical Design
3.2.3. Physical Structure
3.2.4. Naming Convention
4. Database Design and Functionalities
4.1. Design & Functional Support
4.2. User Management
4.3. Performance Improvement
4.4. Assumptions
4.5. Issues
4.6. Constraints
5. Database Administrative Functions
5.1. Responsibility
5.2. Systems Using the Database
5.3. Relationship to Other Databases
5.4. Special Instructions
5.5. Storage
5.6. Recovery
6. Database Interfaces
6.1. Database Interfaces
6.2. Operational Implications
6.2.1. Data Transfer Requirements
6.2.2. Data Formats
6.3. Interface [Name]
6.4. Dependencies
7. Non-Functional Design
7.1. Security Design
7.2. Availability
7.3. Scalability
7.4. Performance
7.5. Error Processing
7.6. Backups and Recovery
7.7. Archiving
1. Introduction
The HELCOM PLUS project aims to modernize the HELCOM waterborne pollution load compilation
(PLC) database and to develop a web application to access the data. The design changes implemented
in the PLC Database will provide a more efficient data system for both reporting and retrieving data
derived from pollution discharges into the Baltic Sea.
1.1. Purpose
This Functional Database Design document provides detailed information on the PLC data model
implemented to support the functional requirements of the HELCOM PLUS target database management
system, with consideration of the system’s performance requirements.
The document describes how the database will support the [Application] Data Model, with details of
the logical and physical definitions. It also covers the functional and non-functional usage of the
tables, together with the related considerations and requirements.
Further, the document briefly describes the integration of the Database with the Web
Application, which provides users with easy access to PLC data.
1.2. Scope, Approach and Methods
The Database Design for the [Application] is composed of definitions for database objects derived by
mapping entities to tables, attributes to columns, unique identifiers to unique keys and relationships to
foreign keys.
During design, these definitions may be enhanced in order to support the requirements of the PLUS
application listed in the Requirements Traceability Matrix.
The document also describes the database changes pertaining to the requirements listed in the
Requirements Traceability Matrix and briefly describes how the specific requirements will be
designed and implemented structurally in the database.
1.3. System Overview
System Overview       Details
System name           HELCOM PLUS
System type           Client Server Application
Operational status    In development
Database Name         PLC Database
1.4. Acronyms and Abbreviations
Acronym / Abbreviation    Meaning
HELCOM                    Helsinki Commission
PLUS                      Pollution Load User System
PLC                       Pollution Load Compilation
DBA                       Database Administrator
1.5. Points of Contact
Identify the points of contact that may be needed for informational purposes.
Role                      Name                 Email               Telephone
Project Manager           Sriram Sethuraman    [email protected]
Data Manager              Pekka Kotilainen     [email protected]
System Specialist         Marco Manzi          [email protected]
Database Administrator
Table 1: POC Contact Information
2. System Overview
The diagram below shows the data flow for the PLUS Application. The National Data Reporters
use the Web Application to submit data to the PLC Database, using a standard reporting
template in the form of an Excel file. The data passes through a set of QA
processes before it becomes available in the database. The QA process involves a series of steps,
including data verification by the National Quality Assurers and the provision of estimates for
data gaps. At the end of the QA process, the data is approved.
For visualization, the approved data is made available to the end users (NGOs, scientific
institutes, decision makers etc.) in the form of tables, graphs and reports. Users can access the data
through a web interface at a public URL reachable from the HELCOM website.
Figure 1. Data Flow Diagram
2.1. Quality Assurance Process
The PLUS Application will provide a Quality Assurance system to ensure a minimum level of quality of
the reported data. The diagram below shows the stages of the QA process. QA
level 0 involves manual format and content verification by the national experts before the data is
reported. QA level 1 automatically verifies the format of the data and its conformity
with the database structure (logical schema). QA level 2 checks the content for questionable data
values, meaning possible outliers or other potentially incorrect values. QA level 3
gives the National Data Reporters the option of manually verifying, correcting and approving the
data. QA level 4 involves similar verification by the National Quality Assurers,
including the final approval of data to be used for assessments and made accessible to the public.
Figure 2. QA Process
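The QA levels described in this section can be summarised as an ordered enumeration. The Python sketch below is illustrative only: the names QaLevel, is_public and next_level are assumptions made for this example and are not identifiers from the PLC Database or the web application. It also assumes that a record carries the highest QA level it has passed.

```python
from enum import IntEnum


class QaLevel(IntEnum):
    """QA stages from Section 2.1 (member names are illustrative)."""

    MANUAL_PRE_REPORTING_CHECK = 0  # national experts, before reporting
    AUTOMATIC_FORMAT_CHECK = 1      # format and conformity with the schema
    AUTOMATIC_CONTENT_CHECK = 2     # questionable values, possible outliers
    REPORTER_VERIFICATION = 3       # National Data Reporter verifies/approves
    QUALITY_ASSURER_APPROVAL = 4    # National Quality Assurer, final approval


def is_public(last_passed: QaLevel) -> bool:
    """Assumption: data is public only once level 4 has been passed."""
    return last_passed >= QaLevel.QUALITY_ASSURER_APPROVAL


def next_level(last_passed: QaLevel) -> QaLevel:
    """Return the following QA stage; level 4 is treated as terminal."""
    return QaLevel(min(last_passed + 1, QaLevel.QUALITY_ASSURER_APPROVAL))
```

Using an ordered type keeps comparisons such as "has this record passed at least level 2" to a single expression.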
2.2. Database Software Utilities
Identify any utility software that will be used to support the use or maintenance of the database.
Vendor       Product                           Version    Comments
Microsoft    MS-SQL Server Standard Edition    2012       Database Management System
Table 2: Database Software Utilities
3. Database Specifications
3.1. Physical Design
Below is the entity relationship diagram, which shows the physical design of the database.
Figure 3. Entity Relationship Diagram
3.2. Physical Structure
The attached Excel file provides the exact physical structure of the database, including the tables,
their relationships and descriptions of the tables and fields.
4. Database Design and Functionalities
4.1. Design & Functional Support
The database has been designed to meet the functional requirements listed below.
A separate Use Case document covering all of these functional requirements is available in the
meeting portal.
Data retrieval
Requirement Id # 2 – Priority - High
Easy retrieval of information from new PLC Database about catchments, stations, point
sources (in order to check with my national information)
Information regarding catchments is stored in the tables TBL_RIVER_CATCHMENT and
TBL_SUBCATCHMENT. These include border and transboundary rivers.
Information regarding monitoring stations is found in the table TBL_STATION. A subcatchment
can be linked to zero stations (when it is a sea or coastal area) or to more, but at most one
station (none in an unmonitored area) can be active in a subcatchment during a given period of time.
Information about point sources is contained in TBL_POINT_SOURCE. During a
reporting period, a point source is either located in a monitored subcatchment (in which case it is
linked in a many-to-many relationship with TBL_SUBCATCHMENT and TBL_PERIOD) or it is
direct, in which case it belongs to a sub-basin for a specific country.
The information and related metadata can be obtained by querying the database. For more
details see Section 3.3 on the list of information pertaining to structural implementation.
The river catchment table contains the name of the river, the type of river (country, border or
transboundary) and the coordinates of the river mouth. These are necessary to identify the country
where the lowest (i.e. closest to the sea) monitoring station is established.
Nomenclature consistency over time for the established sub-basins (e.g. Baltic Proper,
Kattegat, Gulf of Finland)
The naming of the sub-basins is specified according to the definitions contained in the PLC-6
Guidelines, as shown below:
Sub-basins                          Abbreviation
1. GULF of BOTHNIA                  GUB
1.1 Bothnian Bay                    BOB
      The Quark
1.2 Bothnian Sea                    BOS
1.3 Archipelago Sea                 ARC
2. GULF of FINLAND                  GUF
3. GULF of RIGA                     GUR
4. BALTIC PROPER                    BAP
4.1 Northern Baltic Proper          BPN
      Western Gotland Basin
      Eastern Gotland Basin
4.2 Southern Baltic Proper          BPS
      Gulf of Gdansk
      Bornholm Basin
      Arkona Basin
5. BELT SEA and KATTEGAT            BSK
5.1 Belt Sea                        BES
5.1.1 Western Baltic                WEB
5.1.2 The Sound                     SOU
5.2 The Kattegat                    KAT
The database stores the codes (abbreviations) for each sub-basin as shown in the above table.
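As an illustration of how the stable codes can be used by client code, the table above can be held as a simple lookup. This is a sketch only: the parent/child links are inferred from the numbering in the table, and the names SUB_BASINS and lineage are assumptions made for this example, not objects in the PLC Database.

```python
from typing import Dict, List, Optional, Tuple

# Sub-basin codes from the table above: code -> (name, parent code or None).
# Parent links are inferred from the section numbering (e.g. 5.1.1 under 5.1).
SUB_BASINS: Dict[str, Tuple[str, Optional[str]]] = {
    "GUB": ("Gulf of Bothnia", None),
    "BOB": ("Bothnian Bay", "GUB"),
    "BOS": ("Bothnian Sea", "GUB"),
    "ARC": ("Archipelago Sea", "GUB"),
    "GUF": ("Gulf of Finland", None),
    "GUR": ("Gulf of Riga", None),
    "BAP": ("Baltic Proper", None),
    "BPN": ("Northern Baltic Proper", "BAP"),
    "BPS": ("Southern Baltic Proper", "BAP"),
    "BSK": ("Belt Sea and Kattegat", None),
    "BES": ("Belt Sea", "BSK"),
    "WEB": ("Western Baltic", "BES"),
    "SOU": ("The Sound", "BES"),
    "KAT": ("The Kattegat", "BSK"),
}


def lineage(code: str) -> List[str]:
    """Return the code followed by its ancestors, innermost first."""
    chain: List[str] = []
    current: Optional[str] = code
    while current is not None:
        chain.append(current)
        current = SUB_BASINS[current][1]
    return chain
```

For example, lineage("WEB") walks from the Western Baltic up through the Belt Sea to the Belt Sea and Kattegat group, which is the kind of roll-up needed when loads are aggregated by sub-basin.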
Requirement Id # 4 – Priority - High
Easy retrieval of information on a point source from PLC Database, even though its name
has changed
The database stores relevant information on point sources in
TBL_POINT_SOURCE and in the tables TBL_INDUSTRY, TBL_MWWTP and
TBL_FISH_FARM.
A point source is primarily identified by its PLANT_CODE, which is a combination of the
point source type (Fish Farm (F), Municipal Waste Water (M), Industrial Waste (I)), the country
code and a unique id number. The PERIOD_ID is used alongside it in order to identify changes in
relevant data over time. These data include, among others, the name of the point source.
As such, it is possible to retrieve information on the point source even if the name of the plant has
changed.
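The composition of the PLANT_CODE can be sketched as follows. The exact layout is not specified in this document, so the sketch assumes one type letter, a two-letter country code and a numeric id padded to five digits; the function names build_plant_code and parse_plant_code are likewise hypothetical.

```python
import re
from typing import NamedTuple

PLANT_TYPES = {"F": "Fish farm", "M": "Municipal waste water", "I": "Industrial waste"}

# Assumed layout: type letter + 2-letter country code + numeric id.
_PLANT_CODE_RE = re.compile(r"^([FMI])([A-Z]{2})(\d+)$")


class PlantCode(NamedTuple):
    plant_type: str
    country: str
    number: int


def build_plant_code(plant_type: str, country: str, number: int) -> str:
    """Combine type, country code and id number into one code."""
    if plant_type not in PLANT_TYPES:
        raise ValueError(f"unknown plant type: {plant_type!r}")
    return f"{plant_type}{country.upper()}{number:05d}"


def parse_plant_code(code: str) -> PlantCode:
    """Split a code back into its three parts."""
    match = _PLANT_CODE_RE.match(code)
    if match is None:
        raise ValueError(f"not a valid plant code: {code!r}")
    return PlantCode(match.group(1), match.group(2), int(match.group(3)))
```

Because none of the three parts depends on the plant name, a lookup by code keeps working after a rename, which is the property this requirement relies on.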
Requirement Id # 5 – Priority - Medium
Ability to check historical (previous) point sources from PLC Database
The table TBL_POINT_SOURCE contains the following fields:
ACTIVITY_START_DATE (start date of the monitoring activity related to a point source) and
ACTIVITY_END_DATE (end date of the monitoring activity related to a point source).
When a point source activity is no longer relevant for monitoring purposes (low emissions), or the
outlet is closed, the old data related to the previously entered point source remain available, as
they are stored in the database.
When a point source is “reopened” (or parameter-specific loads need to be monitored again) on a
certain date, this date becomes the new ACTIVITY_START_DATE value, and the
ACTIVITY_END_DATE is reset to NULL. This information, together with the PERIOD_ID,
makes it possible to track the relevant activities of a point source over time.
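The "reopen" rule described for this requirement can be sketched with an in-memory SQLite database. SQLite is used here only so that the example is self-contained (the target system is MS-SQL Server). The column types, the composite key on (PLANT_CODE, PERIOD_ID), the sample row and the helper names are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE TBL_POINT_SOURCE (
        PLANT_CODE          TEXT    NOT NULL,
        PERIOD_ID           INTEGER NOT NULL,
        ACTIVITY_START_DATE TEXT,
        ACTIVITY_END_DATE   TEXT,
        PRIMARY KEY (PLANT_CODE, PERIOD_ID)   -- assumed key
    )
    """
)
# Hypothetical point source whose monitoring activity has ended.
conn.execute(
    "INSERT INTO TBL_POINT_SOURCE VALUES (?, ?, ?, ?)",
    ("IFI00001", 2013, "2005-01-01", "2011-12-31"),
)


def reopen_point_source(conn, plant_code, period_id, reopen_date):
    """Set the new start date and reset the end date to NULL."""
    conn.execute(
        """
        UPDATE TBL_POINT_SOURCE
           SET ACTIVITY_START_DATE = ?, ACTIVITY_END_DATE = NULL
         WHERE PLANT_CODE = ? AND PERIOD_ID = ?
        """,
        (reopen_date, plant_code, period_id),
    )


def is_active(conn, plant_code, period_id):
    """A point source is treated as active when its end date is NULL."""
    row = conn.execute(
        "SELECT ACTIVITY_END_DATE FROM TBL_POINT_SOURCE "
        "WHERE PLANT_CODE = ? AND PERIOD_ID = ?",
        (plant_code, period_id),
    ).fetchone()
    return row is not None and row[0] is None
```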
Requirement Id # 7 – Priority - Medium
Able to calculate the normalized flow and loads based on aggregated data
The PLC database provides the load and flow data required to perform the flow
normalization calculations. These data are stored in the tables collecting load, flow and
concentration values:
VAL_SUBCATCHMENT_LOAD
VAL_STATION_FLOW_CONCENTRATION
For more detailed information, please refer to the PLC Database Structure defined in Section 3.2.
The normalized flows are calculated using the techniques provided in the PLC guidelines.
For more information, please see Section 5.3 of the document below:
http://helcom.fi/Lists/Publications/BSEP128.pdf
Requirement Id # 9 – Priority - Medium
Possibility to modify the content of the database
It will be possible to modify the data in the database using the Edit option available in the new
web application. However, modifications can only be carried out according to the respective user
rights and the quality assurance process, which is described in Requirement Id # 30.
National Data Reporters and National Quality Assurers will have the credentials to edit their
respective national data, with the latter having higher privileges. As such, a Data Reporter cannot
modify data which has previously been approved by the Quality Assurer. The PLC Data Manager
will have unrestricted access to the database, with the possibility to modify all the content as the
need arises.
For the detailed description of tables and fields, please refer to Section 3.2.2 of the database
specification document.
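The editing rules described for Requirement Id # 9 can be sketched as a single permission check. This is a minimal illustration, not the web application's implementation: the class names, the country attribute and the approved_by_qa flag are assumptions introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    DATA_REPORTER = "National Data Reporter"
    QUALITY_ASSURER = "National Quality Assurer"
    DATA_MANAGER = "PLC Data Manager"


@dataclass(frozen=True)
class User:
    role: Role
    country: str  # hypothetical: country code of the user's Contracting Party


@dataclass(frozen=True)
class Record:
    country: str          # hypothetical: country the data belongs to
    approved_by_qa: bool  # hypothetical: set once the Quality Assurer approves


def can_edit(user: User, record: Record) -> bool:
    """Apply the three rules stated above, most permissive role first."""
    if user.role is Role.DATA_MANAGER:
        return True   # unrestricted access to all content
    if user.country != record.country:
        return False  # national users edit only their own national data
    if user.role is Role.DATA_REPORTER and record.approved_by_qa:
        return False  # reporters cannot change QA-approved data
    return True
```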
Requirement Id # 10 – Priority - Low
Possibility to modify the structure of the database
Modifications of the database structure will be analyzed for costs and impact on the PLUS
application and, if approved, handled as a PLUS Change request or in a separate maintenance
release. This is because structural changes to the database will also affect the web application.
Requirement Id # 11 – Priority - High
Contracting Party to be able to upload data (partial data, or complete annual set) directly
into the database
Data upload (partial and/or a complete dataset) to the PLC database is possible via an upload
operation. The upload is performed using the web application and results in a
predefined sequence of INSERT, UPDATE or DELETE operations on the PLC Database.
Annual reports can be uploaded in Excel format via the web application user interface by the
national data reporter. The specifications and detailed instructions on how to fill in the Excel
reporting file correctly will be available in the updated PLC-6 Guidelines.
Data which have been reported in the correct format and within the constraints described in the
quality assurance process will either be inserted as new records in the PLC database or update
existing records. Other data are marked as rejected, to be deleted manually from the web
application at a later stage by the Data Reporter or the Quality Assurer, after verification.
Please refer to Req Id 11 in the Use Case document.
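The outcome of an upload for each reported row (inserted, updated or rejected) can be sketched as a small planning function. This is an assumption-laden illustration rather than the actual upload routine: plan_upload, key_of and is_valid are hypothetical names, and the real validity rules are those of the quality assurance process.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

Row = Dict[str, object]


def plan_upload(
    rows: Iterable[Row],
    existing_keys: Set[Hashable],
    key_of: Callable[[Row], Hashable],
    is_valid: Callable[[Row], bool],
) -> List[Tuple[str, Row]]:
    """Label each uploaded row as INSERT, UPDATE or REJECTED."""
    plan: List[Tuple[str, Row]] = []
    for row in rows:
        if not is_valid(row):
            plan.append(("REJECTED", row))  # kept for later manual deletion
        elif key_of(row) in existing_keys:
            plan.append(("UPDATE", row))    # record already in the database
        else:
            plan.append(("INSERT", row))    # new record
    return plan
```

Separating the planning step from the database writes makes it possible to show the reporter what an upload would do before any record is changed.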
Requirement Id # 12 – Priority - High
Able to add comments on data, when the data is missing or questionable.
The PLC database includes a Quality Assurance mechanism (Requirement #30) to ensure that the
data entered has at least a certain level of quality and reliability.
Missing data is data which is considered mandatory by the PLC Guidelines but which, for some
reason, has not yet been provided by the Contracting Party(ies). This differs from
“Not Available” data (N/A), which is instead coded as NULL in the database. The national
experts can, upon consultation with the Data Manager, change the flag of
missing data to “Not Available” when the data cannot be provided, stating the reason why.
In this way, national reporters will not be requested to fill in missing data which they are not able
to provide for some particular reason.
For missing data, the information is stored in the field DATA_STATUS_FLAG_ID. This field is
available in the following tables:
VAL_SUBCATCHMENT_LOAD
VAL_STATION_FLOW_CONCENTRATION
VAL_INDUSTRIAL_FLOW_LOAD
VAL_FISH_FARM_LOAD
VAL_MUNICIPAL_FLOW_LOAD
TBL_SOURCE_APPORTIONMENT
TBL_DIFFUSE_SOURCE
TBL_RETENTION
TBL_NATURAL_BACKGROUND
In addition to this, the user is able to add comments to the data when it is questionable.
This is managed in the PLC database using the following QA tables:
QA_LEVEL
QA_FLAG
QA_QUESTIONABLE_CATEGORY
QA_QUESTIONABLE_FLAG
QA_NOTE
The QA_FLAG and QA_NOTE tables are related to the different load, concentration and flow
tables using a QA_FLAG_ID and a QA_NOTE_ID. Data reporters and Quality Assurers, as well
as the Data Manager, are then able to add comments for questionable and missing data in the
load, flow and concentration tables.
For more details on the table relationships, please refer to the data model in Section 3.2.
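The link between a value table and QA_NOTE through QA_NOTE_ID can be sketched as follows. SQLite is used only to keep the example self-contained. Apart from the table names and the fields QA_NOTE_ID and DATA_STATUS_FLAG_ID, every column, every sample row and the helper commented_values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE QA_NOTE (
        QA_NOTE_ID INTEGER PRIMARY KEY,
        NOTE_TEXT  TEXT NOT NULL                  -- hypothetical column
    );
    CREATE TABLE VAL_SUBCATCHMENT_LOAD (
        LOAD_ID             INTEGER PRIMARY KEY,  -- hypothetical column
        LOAD_VALUE          REAL,                 -- hypothetical column
        DATA_STATUS_FLAG_ID INTEGER,
        QA_NOTE_ID          INTEGER REFERENCES QA_NOTE (QA_NOTE_ID)
    );
    INSERT INTO QA_NOTE VALUES (1, 'Value looks ten times too high');
    INSERT INTO VAL_SUBCATCHMENT_LOAD VALUES (10, 1250.0, NULL, 1);
    INSERT INTO VAL_SUBCATCHMENT_LOAD VALUES (11, 118.0, NULL, NULL);
    """
)


def commented_values(conn):
    """Return load values that carry a QA comment, with the comment text."""
    return conn.execute(
        """
        SELECT v.LOAD_ID, v.LOAD_VALUE, n.NOTE_TEXT
          FROM VAL_SUBCATCHMENT_LOAD AS v
          JOIN QA_NOTE AS n ON n.QA_NOTE_ID = v.QA_NOTE_ID
         ORDER BY v.LOAD_ID
        """
    ).fetchall()
```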
Requirement Id # 13 – Priority - High
Able to modify previously entered data in the database
Users shall be able to edit previously entered data using the edit option in the web
application. This translates to an update query on the database.
Users shall be allowed to edit only the data that they are authorized to edit, depending on their
nationality, role and/or respective user rights. For more details on user privileges, please see
Section 4.2 (User Management).
Requirement Id # 15 – Priority - Medium
Able to report different unmonitored areas or monitoring stations according to different
parameters (e.g. total N, NO23-N, Ni, discharge etc.)
NOTE: Varying unmonitored areas of a sub-basin have been added to the structure in order
to allow reporting of varying areas by parameter. Testing of the structure is ongoing.
Requirement Id # 19 – Priority - Medium
Allow reporting of data based on individual point sources including their coordinates.
Point sources are reported via the table TBL_POINT_SOURCE, which includes a field
PLANT_TYPE to identify the type of point source. In this way, each point source is
categorized as one of the following:
I = Industry
M = Municipal wastewater treatment plant
F = Fish farm
Specific information related to these plant types is collected in
TBL_INDUSTRY, TBL_MWWTP and TBL_FISH_FARM,
respectively.
The coordinates of a point source can be provided by the data
reporters through the fields PS_LAT (Point source Latitude) and
PS_LON (Point source Longitude), when the point sources are
reported as individual (and when it is allowed under national
legislation). Normally the coordinates would indicate the location of
the outlet, except in the Russian case, where these indicate the city
(or municipality) where the point source is located.
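A reported point source with optional coordinates can be sketched as a small validated record. The field names mirror PLANT_TYPE, PS_LAT and PS_LON, but the class itself, the range checks and the rule that latitude and longitude must be supplied together are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

PLANT_TYPES = {
    "I": "Industry",
    "M": "Municipal wastewater treatment plant",
    "F": "Fish farm",
}


@dataclass(frozen=True)
class PointSource:
    plant_type: str
    ps_lat: Optional[float] = None  # may be omitted, e.g. under national law
    ps_lon: Optional[float] = None

    def __post_init__(self) -> None:
        if self.plant_type not in PLANT_TYPES:
            raise ValueError(f"unknown PLANT_TYPE: {self.plant_type!r}")
        if (self.ps_lat is None) != (self.ps_lon is None):
            raise ValueError("PS_LAT and PS_LON must be given together")
        if self.ps_lat is not None and not -90.0 <= self.ps_lat <= 90.0:
            raise ValueError("PS_LAT out of range")
        if self.ps_lon is not None and not -180.0 <= self.ps_lon <= 180.0:
            raise ValueError("PS_LON out of range")

    @property
    def has_coordinates(self) -> bool:
        """True when the source was reported with a location."""
        return self.ps_lat is not None
```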