Skip to content

Commit

Permalink
Added minutes from the 5th meeting.
Browse files Browse the repository at this point in the history
  • Loading branch information
PeterKraus committed May 31, 2023
1 parent 18e3e31 commit 369fbbe
Showing 1 changed file with 94 additions and 0 deletions.
94 changes: 94 additions & 0 deletions meetings/5-meeting-2023-05-30/minutes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
# Progress meeting 5

30/05/2023

## Agenda

- Intro from Matthew Evans
- Overview of NOMAD Parsers Lauri Himanen
- Demo of API from Matthew Evans

## Minutes

### MaRDA WG status summary - Matthew Evans
- Quick summary of MaRDA Extractors WG
- Overview of the 3 repos, incl. [schema](https://github.com/marda-alliance/metadata_extractors_schema), [registry](https://github.com/marda-alliance/metadata_extractors_registry), and [api](https://github.com/marda-alliance/metadata_extractors_api)
- Overview of the current `Filetypes` and `Extractors` in the registry

**See also [Demo of API](#demo-of-api-videos) videos below!**

### NOMAD Parsers - Lauri Himanen

1. What is NOMAD:
- RDM platform
- covers simulations, experiments, workflows
- funded by German NFDI
- [nomad-lab.eu](nomad-lab.eu) is a freely available, open, central repository
- NOMAD Oasis is a self-hosted instance
2. Getting started with NOMAD:
- `upload` files
- add / edit using ELN interface
- publish to get a DOI
3. Parser infrastructure in NOMAD:
- act on uploaded files and turn them into `entries`
- `entries` can be searched, analysed, have a known structure -> implies a `schema`
- each parser has to define its own `schema`
- NOMAD has it's own Pydantic-like schema language called NOMAD metainfo
- parsers are triggered on upload of a file, matching by using:
- file extension
- file mimetype
- file contents (e.g. header)
- one file is usually one `entry`, but sometimes one file is many `entries`
- reading of auxilliary files in an upload is handled by the parser
4. Parser plugins
- basic NOMAD has ~60 parsers pre-installed, mostly for electronic structure calculations
- defining custom parsers is possible via a `plugin` mechanism
- `plugins` may be integrated into the central service after a review
- plugins have to have:
- a schema definition in a specified location
- the parsing code and file matching logic in a specified location
- the general `schema` can be extended by the `parser`
- the infrastructure is using a lot of regex to perform matching of quantities and filetypes
- parsing is performed by passing the path of the file
- `nomad.yaml`: a configuration file for the plugin

### Q/A:
- Peter: About auxiliary files - must be uploaded together, or can NOMAD ask for them?
- Lauri: They must be uploaded together. Their usage may be documented in the README. The parser can emit a debug/log message, but the user is responsible for uploading all files together.

---
- Nicolas: Use of regex on plaintext files does not sound efficient. Is this really the best way? Wouldn't it be better to fix QM codes upstream?
- Lauri: Some QM codes are moving away from text files, but progress is slow. There is also the important issue of legacy QM data, which has to be addressed somehow.

---
- Peter: Are you aware of QCSchema?
- Lauri: No, not yet.

---
- Matthew: Overlap with MaRDA WG. How would we go about validating plugins?
- Lauri: The plugin mechanism in NOMAD is very new. Registry and an authority marking/reviewing plugins would be useful.

---
- Matthew: How about sandboxing plugin code?
- Lauri: Sandboxing is tricky, as on an instance, it's a question for the Oasis (instance) admin


### Open Discussion:
- Matthew: Suggestion to skip next months (July) meeting in favour of writing & working.
- Peter: Agreed.

---
- Steffen: Multiple people have different goals. Focusing on a single extractor for a single filetype is perhaps too ambitious.
- Matthew: Yes, this is currently the goal of the WG.

---
- Steffen: Focus should be on getting more examples. This requires making the parser submission process to be simple and comfortable enough even for lazy people...
- Peter: A website frontend to avoid boilerplate is on the TODO list, however, manpower is a problem.

---
- Steffen: Review process of extractors should be currently very streamlined, as long as things don't overwrite other people's work, we should allow things in.
- Matthew: Sandboxing at some level might be necessary, as otherwise it's a big safety issue.

### Demo of API videos:
- [API Intro and `biologic-mpr` example](https://drive.google.com/file/d/10v6uE6By0bm3Z2Sfno1ULM6I8itX75uF/view?usp=drive_web)
- [Parsing `agilent-dx` using API](https://drive.google.com/file/d/1XeTYR14dJORUGIZnYzoc1FRM0eWUNSVu/view?usp=drive_web)

0 comments on commit 369fbbe

Please sign in to comment.