Data Model
Modules in the data_model section provide functionality for reading, writing, and validating data.
Data products ingested or produced by simtools generally follow the CTAO data model.
data_reader
Helper module for reading standardized simtools data products.
- data_model.data_reader.read_table_from_file(file_name, schema_file=None, validate=False, metadata_file=None)
Read an astropy table from file and optionally validate it against a schema.
Metadata is read from the metadata file or from the metadata section of the data file. The schema for validation can be given as an argument or is determined from the metadata associated with the file.
- Parameters:
- file_name: str or Path
Name of file to be read.
- schema_file: str or Path
Name of schema file to be used for validation.
- validate: bool
If True, validate data against the schema.
- metadata_file: str or Path
Name of metadata file to be read.
- Returns:
- astropy Table
Table read from file.
- Raises:
- FileNotFoundError
If file does not exist.
- data_model.data_reader.read_value_from_file(file_name, schema_file=None, validate=False)
Read a value from file and optionally validate it against a schema.
Expects data to follow the convention used for storing simulation model parameters in the simulation model database: a single value stored in the ‘value’ field (with optional units in the ‘units’ field). Metadata is read from the metadata file or from the metadata section of the data file. The schema for validation can be given as an argument or is determined from the metadata associated with the file.
- Parameters:
- file_name: str or Path
Name of file to be read.
- schema_file: str or Path
Name of schema file to be used for validation.
- validate: bool
If True, validate data against the schema.
- Returns:
- astropy Quantity or str
Value read from file. If units are given, return an astropy Quantity; otherwise return a string. Return None if no value is found in the file.
- Raises:
- FileNotFoundError
If file does not exist.
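The ‘value’/‘units’ convention can be illustrated with the standard library alone. The sketch below is not the simtools implementation: `parse_parameter_entry` is a hypothetical name, it parses a JSON string rather than a file, and it returns a plain `(value, unit)` tuple where `read_value_from_file` returns an astropy Quantity.

```python
import json


def parse_parameter_entry(text):
    """Hypothetical helper: extract 'value' and 'units' from a JSON document.

    Returns (value, unit) if units are present, the value as a string if not,
    and None if the document has no 'value' field.
    """
    data = json.loads(text)
    if "value" not in data:
        return None  # mirrors "Return None if no value is found"
    value = data["value"]
    unit = data.get("units")
    if unit:
        # read_value_from_file would return an astropy Quantity here.
        return value, unit
    return str(value)


print(parse_parameter_entry('{"value": 12.5, "units": "m"}'))  # (12.5, 'm')
print(parse_parameter_entry('{"name": "x"}'))  # None
```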
format_checkers
Custom format checkers for jsonschema validation.
- data_model.format_checkers.check_array_element(element)
Validate array elements for jsonschema.
- data_model.format_checkers.check_array_triggers_name(name)
Validate array trigger names for jsonschema.
- data_model.format_checkers.check_astropy_unit(unit_string)
Validate astropy units (including dimensionless) for jsonschema.
- data_model.format_checkers.check_astropy_unit_of_length(unit_string)
Validate that a unit string is an astropy unit of length.
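Format checkers of this kind are predicates that return True or False for a single schema value. The sketch below assumes a small hypothetical set of length units and a hypothetical format name `unit_of_length`; the documented checkers rely on astropy units instead, and with the jsonschema library such predicates would typically be registered on a `jsonschema.FormatChecker` rather than in a plain dict.

```python
# Hypothetical subset of length units; the real checker uses astropy units.
LENGTH_UNITS = frozenset({"m", "cm", "mm", "km", "um"})


def check_unit_of_length(unit_string):
    """Return True if unit_string names a known length unit (sketch)."""
    return isinstance(unit_string, str) and unit_string.strip() in LENGTH_UNITS


# Map format names (as used in a schema's "format" keyword) to checkers.
# The format name is hypothetical.
FORMAT_CHECKERS = {"unit_of_length": check_unit_of_length}


def check_format(format_name, instance):
    """Apply the checker registered for format_name."""
    checker = FORMAT_CHECKERS.get(format_name)
    if checker is None:
        return True  # unknown formats pass unchecked, as in JSON Schema
    return checker(instance)
```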
metadata_collector
Metadata collector for simtools.
This should be the only module in simtools with knowledge of the implementation of the observatory metadata model.
- class data_model.metadata_collector.MetadataCollector(args_dict=None, metadata_file_name=None, model_parameter_name=None, observatory='cta', clean_meta=True, schema_version='latest')
Collects metadata to describe the current simtools activity and its data products.
Collect metadata from command line configuration, input data, environment, and schema descriptions. Depends on the CTAO top-level metadata definition.
Two dictionaries store different types of metadata:
- top_level_meta: metadata for the current activity
- input_metadata: metadata from input data
- Parameters:
- args_dict: dict
Command line parameters
- metadata_file_name: str
Name of metadata file (only required when args_dict is None)
- model_parameter_name: str
Name of model parameter
- observatory: str
Name of observatory (default: “cta”)
- clean_meta: bool
Remove None values and empty lists from metadata (default: True)
- schema_version: str
Version of the metadata schema to use (default: ‘latest’)
- clean_meta_data(meta_dict)
Remove None values and empty lists from a metadata dictionary.
- Parameters:
- meta_dict: dict
Metadata dictionary.
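Because metadata dictionaries are nested, cleaning is naturally recursive. A minimal sketch, assuming metadata made of plain dicts, lists, and scalars; `clean_meta` is a hypothetical name, and recursing into list items is an assumption about the real method.

```python
def clean_meta(meta):
    """Recursively drop None values and empty lists from dicts (sketch)."""
    if isinstance(meta, dict):
        cleaned = {}
        for key, value in meta.items():
            value = clean_meta(value)  # clean nested structures first
            if value is None or value == []:
                continue  # drop None values and empty lists
            cleaned[key] = value
        return cleaned
    if isinstance(meta, list):
        # Assumption: list items are cleaned but not removed.
        return [clean_meta(item) for item in meta]
    return meta
```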
- static dump(args_dict, output_file, add_activity_name=False)
Write metadata to file (static method).
- Parameters:
- args_dict: dict
Command line parameters
- output_file: str or Path
Name of output file.
- add_activity_name: bool
Add activity name to file name.
- get_data_model_schema_dict()
Return data model schema dictionary.
- Returns:
- dict
Data model schema dictionary.
- get_data_model_schema_file_name()
Return data model schema file name.
The schema file name is taken (in this order) from the command line, from the metadata file, from the data model name, or from the input metadata file.
- Returns:
- str
Name of schema file.
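The lookup order above is a first-match rule over four candidate sources. A minimal sketch, in which the argument names of the hypothetical `schema_file_name` helper stand in for the command line, metadata file, data model name, and input metadata lookups:

```python
def first_available(*candidates):
    """Return the first candidate that is neither None nor empty."""
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def schema_file_name(cli_value=None, metadata_value=None,
                     data_model_value=None, input_metadata_value=None):
    """Hypothetical helper applying the documented lookup order."""
    return first_available(
        cli_value, metadata_value, data_model_value, input_metadata_value
    )
```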
- get_site(from_input_meta=False)
Get the site entry from metadata, either from the collected metadata or from the input metadata.
- Parameters:
- from_input_meta: bool
If True, get the site from the first entry of the input metadata (default: False)
- Returns:
- str
Site name
- get_top_level_metadata()
Return the top-level metadata dictionary (with updated activity end time).
- Returns:
- dict
Top-level metadata dictionary.
- write(yml_file=None, keys_lower_case=True, add_activity_name=False)
Write top-level metadata to file (YAML format).
- Parameters:
- yml_file: str
Name of output file.
- keys_lower_case: bool
Write YAML keys in lower case.
- add_activity_name: bool
Add activity name to file name.
- Returns:
- str
Name of output file
- Raises:
- FileNotFoundError
If yml_file is not found.
- TypeError
If yml_file is not defined.
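The `keys_lower_case` option can be pictured as a recursive key transformation applied before the YAML dump. The sketch covers only that transformation (writing YAML would need a library such as PyYAML); the helper name is hypothetical, and lowering nested keys as well as top-level keys is an assumption.

```python
def lower_case_keys(data):
    """Recursively convert all dictionary keys to lower case (sketch)."""
    if isinstance(data, dict):
        return {str(key).lower(): lower_case_keys(value) for key, value in data.items()}
    if isinstance(data, list):
        return [lower_case_keys(item) for item in data]
    return data  # scalars are returned unchanged
```

Note that keys differing only in case would collide after conversion; the last one wins in this sketch.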
metadata_model
Definition of the metadata model for input to and output of simtools.
Follows the CTAO top-level data model definition and covers:
- data products submitted to SimPipe (‘input’)
- data products generated by SimPipe (‘output’)
- data_model.metadata_model.get_default_metadata_dict(schema_file=None, observatory='CTA', schema_version='latest', lower_case=True)
Return metadata schema with default values.
Follows the CTA Top-Level Data Model.
- Parameters:
- schema_file: str
Schema file (jsonschema format) used for validation
- observatory: str
Observatory name
- schema_version: str, optional
Version of the schema to use. If not provided, the latest version is used.
- lower_case: bool, optional
If True, all keys in the returned dictionary will be converted to lower case.
- Returns:
- dict
Reference schema dictionary.
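One way to derive default values from a schema is to walk its `properties` and collect each `default` keyword. This sketch assumes a plain nested JSON-schema layout without `$ref` or `allOf`, uses the hypothetical name `collect_defaults`, and is not necessarily how simtools builds the dictionary; properties without a default become None.

```python
def collect_defaults(schema):
    """Build a nested dict of 'default' values from a JSON-schema-like dict."""
    if "default" in schema:
        return schema["default"]
    properties = schema.get("properties")
    if properties is None:
        return None  # leaf without a default
    return {name: collect_defaults(sub) for name, sub in properties.items()}
```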
model_data_writer
Model data writer module.
- class data_model.model_data_writer.ModelDataWriter(product_data_file=None, product_data_format=None, output_path=None, args_dict=None)
Writer for simulation model data and metadata.
- Parameters:
- product_data_file: str
Name of output file.
- product_data_format: str
Format of output file.
- output_path: str or Path
Path to output file.
- args_dict: dict
Dictionary with configuration parameters.
- check_db_for_existing_parameter(parameter_name, instrument, parameter_version, db_config)
Check if a parameter with the same version exists in the simulation model database.
- Parameters:
- parameter_name: str
Name of the parameter.
- instrument: str
Name of the instrument.
- parameter_version: str
Version of the parameter.
- db_config: dict
Database configuration.
- Raises:
- ValueError
If a parameter with the same version already exists in the database.
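The check amounts to looking for an entry with the same name, instrument, and version. In the sketch below a list of dicts stands in for the database query result; the function name and the dictionary keys are hypothetical.

```python
def check_for_existing_parameter(existing_entries, parameter_name,
                                 instrument, parameter_version):
    """Raise ValueError if the same parameter version is already present.

    existing_entries stands in for a database query result: a list of dicts
    with hypothetical keys 'parameter', 'instrument', 'parameter_version'.
    """
    for entry in existing_entries:
        if (
            entry.get("parameter") == parameter_name
            and entry.get("instrument") == instrument
            and entry.get("parameter_version") == parameter_version
        ):
            raise ValueError(
                f"Parameter {parameter_name} version {parameter_version} "
                f"already exists for {instrument}."
            )
```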
- static dump(args_dict, output_file=None, metadata=None, product_data=None, validate_schema_file=None)
Write model data and metadata (as static method).
- Parameters:
- args_dict: dict
Dictionary with configuration parameters (including output file name and path).
- output_file: str or Path
Name of output file (args_dict[“output_file”] is used if this parameter is not set).
- metadata: MetadataCollector object
Metadata to be written.
- product_data: astropy Table
Model data to be written
- validate_schema_file: str
Schema file used in validation of output data.
- static dump_model_parameter(parameter_name, value, instrument, parameter_version, output_file, output_path=None, metadata_input_dict=None, db_config=None, unit=None, meta_parameter=False)
Generate a DB-style model parameter dict and write it to a JSON file.
- Parameters:
- parameter_name: str
Name of the parameter.
- value: any
Value of the parameter.
- instrument: str
Name of the instrument.
- parameter_version: str
Version of the parameter.
- output_file: str
Name of output file.
- output_path: str or Path
Path to output file.
- metadata_input_dict: dict
Input to metadata collector.
- db_config: dict
Database configuration. If not None, check whether a parameter with the same version already exists.
- unit: str
Unit of the parameter value (if applicable and value is not an astropy Quantity).
- meta_parameter: bool
Setting for meta parameter flag.
- Returns:
- dict
Validated parameter dictionary.
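A minimal sketch of the write step, using only `json` and `pathlib`. The key names of the parameter dictionary are hypothetical, and unlike `dump_model_parameter` the sketch neither validates the dictionary against a schema nor collects metadata or checks the database.

```python
import json
from pathlib import Path


def dump_parameter_json(parameter_name, value, instrument, parameter_version,
                        output_file, output_path=".", unit=None,
                        meta_parameter=False):
    """Write a DB-style parameter dict to a JSON file and return it (sketch)."""
    # Hypothetical key names; the real layout is defined by the simtools schemas.
    parameter_dict = {
        "parameter": parameter_name,
        "instrument": instrument,
        "parameter_version": parameter_version,
        "value": value,
        "unit": unit,
        "meta_parameter": meta_parameter,
    }
    output = Path(output_path) / output_file
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text(json.dumps(parameter_dict, indent=4), encoding="utf-8")
    return parameter_dict
```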
- get_validated_parameter_dict(parameter_name, value, instrument, parameter_version, unique_id=None, schema_version=None, unit=None, meta_parameter=False)
Get validated parameter dictionary.
- Parameters:
- parameter_name: str
Name of the parameter.
- value: any
Value of the parameter.
- instrument: str
Name of the instrument.
- parameter_version: str
Version of the parameter.
- unique_id: str
Unique ID of the parameter set (from metadata).
- schema_version: str
Version of the schema.
- unit: str
Unit of the parameter value (if applicable and value is not an astropy Quantity).
- meta_parameter: bool
Setting for meta parameter flag.
- Returns:
- dict
Validated parameter dictionary.
- static prepare_data_dict_for_writing(data_dict)
Prepare data dictionary for writing to a JSON file.
Convert lists to sim_telarray-style strings, including the ‘type’ and ‘unit’ entries. Replace “None” with “null” in the unit field, and replace a list of identical units with a single unit string.
- Parameters:
- data_dict: dict
Dictionary with lists.
- Returns:
- dict
Dictionary with lists converted to strings.
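The three rules above can be sketched as follows. The sketch assumes that lists are joined with single spaces (the exact sim_telarray-style separator is an assumption) and that only the unit entry receives the None-to-null and identical-unit treatment; `prepare_for_writing` is a hypothetical name.

```python
def _null_if_none(unit):
    """Map None (or the string 'None') to 'null'."""
    return "null" if unit is None or unit == "None" else unit


def prepare_for_writing(data_dict):
    """Convert list entries to strings following the rules above (sketch)."""
    prepared = dict(data_dict)
    if "unit" in prepared:
        unit = prepared["unit"]
        if isinstance(unit, list):
            unit = [_null_if_none(u) for u in unit]
            if unit and all(u == unit[0] for u in unit):
                unit = unit[0]  # identical units collapse to a single string
        else:
            unit = _null_if_none(unit)
        prepared["unit"] = unit
    # Assumption: space-separated strings for all remaining lists.
    return {
        key: " ".join(str(item) for item in value) if isinstance(value, list) else value
        for key, value in prepared.items()
    }
```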
- validate_and_transform(product_data_table=None, product_data_dict=None, validate_schema_file=None, is_model_parameter=False)
Validate product data using the jsonschema given in the metadata.
If necessary, transform product data to match schema.
- Parameters:
- product_data_table: astropy Table
Model data to be validated.
- product_data_dict: dict
Model data to be validated.
- validate_schema_file: str
Schema file used in validation of output data.
- is_model_parameter: bool
True if data describes a model parameter.
schema
Module providing functionality to read and validate dictionaries using schemas.
- data_model.schema.get_model_parameter_schema_file(parameter)
Return schema file path for a given model parameter.
- Parameters:
- parameter: str
Model parameter name.
- Returns:
- Path
Schema file path.
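- data_model.schema.get_model_parameter_schema_files(schema_directory=PosixPath('/opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/simtools/schemas/model_parameters'))
Return the list of parameters and schema files located in the schema file directory.
- Parameters:
- schema_directory: Path
Directory containing the model parameter schema files (default: the simtools/schemas/model_parameters directory of the installed package).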
- Returns:
- list
List of parameters found in schema file directory.
- list
List of schema files found in schema file directory.
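Listing parameters from a schema directory is a directory glob plus a name transformation. The sketch assumes one file per parameter named `<parameter>.schema.yml`; that naming pattern and the helper name `list_schema_files` are assumptions.

```python
from pathlib import Path


def list_schema_files(schema_directory):
    """Return (parameter names, schema files) found in a directory (sketch).

    Assumes one file per parameter named '<parameter>.schema.yml'.
    """
    files = sorted(Path(schema_directory).glob("*.schema.yml"))
    parameters = [f.name.removesuffix(".schema.yml") for f in files]
    return parameters, files
```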
- data_model.schema.get_model_parameter_schema_version(schema_version=None)
Validate and return the schema version.
If no schema_version is given, the most recent version is provided.
- Parameters:
- schema_version: str
Schema version.
- Returns:
- str
Schema version.
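Selecting the most recent version needs a numeric comparison, since plain string comparison would rank ‘0.10.0’ below ‘0.2.0’. In this sketch the available versions are passed in explicitly, versions are assumed to be `MAJOR.MINOR.PATCH` strings, and raising ValueError for an unknown version is an assumption about how validation fails.

```python
def select_schema_version(available_versions, schema_version=None):
    """Return schema_version if available, else the most recent one (sketch)."""

    def as_tuple(version):
        # '0.10.0' -> (0, 10, 0), so versions compare numerically
        return tuple(int(part) for part in version.split("."))

    if schema_version is None:
        return max(available_versions, key=as_tuple)
    if schema_version not in available_versions:
        raise ValueError(f"Schema version {schema_version} not found.")
    return schema_version
```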
- data_model.schema.get_schema_version_from_data(data, observatory='cta')
Get schema version from data dictionary.
- Parameters:
- data: dict
Data dictionary.
- observatory: str
Name of observatory (default: “cta”).
- Returns:
- str
Schema version. If not found, returns ‘latest’.
- data_model.schema.load_schema(schema_file=None, schema_version='latest')
Load parameter schema from file.
- Parameters:
- schema_file: str
Path to schema file.
- schema_version: str
Schema version.
- Returns:
- schema: dict
Schema dictionary.
- Raises:
- FileNotFoundError
If the schema file is not found.
- data_model.schema.validate_dict_using_schema(data, schema_file=None, json_schema=None, ignore_software_version=False)
Validate a data dictionary against a schema.
- Parameters:
- data: dict
Dictionary to be validated.
- schema_file: str
Schema file used for validation.
- json_schema: dict
Schema dictionary used for validation.
- ignore_software_version: bool
If True, ignore software version check.
- Raises:
- jsonschema.exceptions.ValidationError
If validation fails.
validate_data
Validation of data using schema.
- class data_model.validate_data.DataValidator(schema_file=None, data_file=None, data_table=None, data_dict=None, check_exact_data_type=True)
Validate data types and units against a schema describing the data; convert or transform data where required.
Data can be of table or dict format (internally, all data is converted to astropy tables).
- Parameters:
- schema_file: Path
Schema file describing input data and transformations.
- data_file: Path
Input data file.
- data_table: astropy.table.Table
Input data table.
- data_dict: dict
Input data dict.
- check_exact_data_type: bool
Check for exact data type (default: True).
- validate_and_transform(is_model_parameter=False, lists_as_strings=False)
Validate data and data file.
- Parameters:
- is_model_parameter: bool
If True, treat the data as a model parameter (applies additional data preparation)
- lists_as_strings: bool
Convert lists to strings (as needed for model parameters)
- Returns:
- data: dict or astropy.table
Data dict or table
- Raises:
- TypeError
If no data dict or data table is available.
- validate_data_file(is_model_parameter=None)
Open the data file and read its data.
Successfully reading the file is treated as validating it.
- Parameters:
- is_model_parameter: bool
This is a model parameter file.