Workspace typed objects¶
The Workspace Service (WSS) provides storage, sharing, versioning, validation and provenance tracking of typed object (TO) data. This document describes basic information for developers who need to define and register TOs for use with the WSS.
Typed object basics¶
TOs in the WSS are hierarchical data objects that conform to type definitions specified in the KBase Interface Description Language (KIDL). Just as KIDL is used to specify the structure of data exchanged between KBase clients and servers (generated by the Type Compiler), KIDL is used to define the structure of data stored in the WSS.
Any structure defined in a KIDL-formatted file (e.g. typedef structure { … } StructureName;) can be registered with the WSS (see Typed object registration & versioning). Instances of these objects can then be saved to the WSS by any user. The WSS does not support storage of primitive or basic container types directly (i.e. string, int, float, list, mapping).
KIDL-defined modules provide namespacing for typed objects. Thus, both the module name and the type name are required to uniquely identify a type in the WSS, generally in the format ModuleName.TypeName.
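As a small illustration of the naming convention, the following Python sketch splits a fully qualified type name into its module and type parts. The helper name split_type_name is hypothetical and is not part of the WSS API; it only splits on the first dot and does not interpret anything else in the string.

```python
def split_type_name(full_name: str) -> tuple[str, str]:
    """Split 'ModuleName.TypeName' into (module, type).

    Hypothetical helper for illustration only; not a WSS API call.
    """
    # Split on the first dot: the module name comes first, the type name second.
    module, sep, type_name = full_name.partition(".")
    if not sep or not module or not type_name:
        raise ValueError(f"expected ModuleName.TypeName, got {full_name!r}")
    return module, type_name


print(split_type_name("KB.MicrobialGenome"))
```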
Typed object registration & versioning¶
TO definitions must be registered with the WSS before instances of the TOs can be saved. The basic process for registering a TO is:
1. Developer requests ownership of a module name via the Workspace API (see API method request_module_ownership(...)).
2. WSS admin approves the request.
3. Developer uploads (i.e. registers) a type specification file (typespec) in KIDL format where the module name is identical to the just approved module name in the WSS and indicates the names of the TOs which the developer wants the WSS to support (see API method register_typespec(...)).
4. Developer releases the module, which releases the latest version of all TO definitions in the module (see API method release_module(...)).
TO definitions marked for WSS usage are versioned with a major and minor version. Every time a new typespec is uploaded and registered, the TO definitions defined in the module automatically receive a new version number if changed. Minor versions are incremented if the change is backwards compatible (e.g. addition of a new optional field). Major versions are incremented if the change is not backwards compatible.
All versions of all registered TO definitions are available to WSS users, but to save an object instance of an old version, or an unreleased version, the exact version number must be provided by the user. If a WSS user saves an object instance without providing version numbers for the type, the latest released version of the TO definition is assumed. The process of releasing a module therefore indicates that the latest version of every typed object definition in the module is ready for public use, but does not limit the ability of users or developers to work with old or pre-released versions of TOs.
Before the first release of a module, repeated uploads of a module result in version numbers of TO definitions of 0.x and are assumed to be backwards incompatible. On first release of a module, all version numbers of TO definitions are updated to 1.0.
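The versioning rules above can be sketched in Python as follows. This is an illustrative model only, not the WSS implementation: the TypeVersion class and both function names are hypothetical, and two details are assumptions not stated above (that each pre-release upload increments x in 0.x by one, and that the minor version resets to 0 on a major bump).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeVersion:
    major: int
    minor: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


def next_version(current: TypeVersion, released: bool,
                 backwards_compatible: bool) -> TypeVersion:
    """Return the version a changed TO definition would receive."""
    if not released:
        # Before the first release, every change stays in the 0.x series.
        # Assumption: x is incremented by one per changed upload.
        return TypeVersion(0, current.minor + 1)
    if backwards_compatible:
        # Compatible change (e.g. a new optional field): bump the minor version.
        return TypeVersion(current.major, current.minor + 1)
    # Incompatible change: bump the major version.
    # Assumption: the minor version resets to 0.
    return TypeVersion(current.major + 1, 0)


def first_release() -> TypeVersion:
    """On first release of a module, TO definitions are set to 1.0."""
    return TypeVersion(1, 0)


print(next_version(first_release(), released=True, backwards_compatible=True))
```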
Users and developers can use the ws-typespec-list script or the API to list registered modules, type definitions, and versions of type definitions, and to retrieve the actual KIDL or JSON schema encoding of the typed object definition. End users will only be able to view the versions of TOs that are released. Owners of a module can list all versions of TOs in modules that they own.
Typed object validation¶
Instances of TOs can be validated against type definitions registered with the WSS. Instances of TOs must pass this validation process to be stored in the WSS, thereby guaranteeing that WSS data is structurally valid.
Todo
Update this document to use the kb-sdk tools.
The WSS validates the TO instance against a JSON Schema V4 encoding of the TO definition. The JSON Schema encoding can be generated by the KBase Type Compiler (currently in branch dev-prototypes). In addition to matching the structure and type of data, additional constraints can be placed on TO validation through the use of Annotations (see Typed object annotations).
To generate JSON encodings of your TOs for review, check out the dev-prototypes branch of the typecompiler and compile your typespec file with the --jsonschema option of the compile_typespec command. The JSON Schema encoding of each object definition is generated in the output location in a directory called jsonschema. The JSON Schema encoding is also available for all registered TO definitions via the WSS API or the ws-typespec-list command.
All TO instances pulled from the WSS are guaranteed to be valid instances of a TO definition as registered with the WSS. Therefore it is recommended that KBase services which require rigorous validation of complex data operate on data stored in the WSS (as opposed to passing the object by value and writing the validation code yourself). Note that full validation is not built into generated KBase client/server code, so it is not safe to assume that input data received directly from a type compiler generated client conforms to the specified type definitions in your API.
Additional technical details: The TO validation code is written in Java and is available in the workspace_deluxe KBase repo.
Typed object annotations¶
Annotations provide an infrastructure for attaching structured metadata to type definitions (and eventually to functions and modules). Such metadata is useful for specifying additional constraints on data types, interpreting data types within a particular context, and declaring structured information that can later be automatically indexed or searched, such as authorship of a function implementation.
Annotations are declared in the comment immediately preceding the definition of the TO. Thus, all annotations are always attached and viewable within the API documentation. Each annotation must be specified on its own line in the following format:
@[ANNOTATION] [INFO]
where [ANNOTATION] is the name of the annotation and [INFO] is any additional information required by the annotation. As a simple example, the following associates authorship information with a TO using the @author annotation:
/*
    Data type for my experimental data.
    @author John Scientist
*/
typedef structure {
    string name;
    list <int> results;
} MyExperimentData;
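To make the one-annotation-per-line format concrete, here is a minimal Python sketch that pulls annotations out of the text of a comment. It is not the Type Compiler's parser; the function name parse_annotations is hypothetical, and the sketch assumes the comment delimiters have already been removed.

```python
def parse_annotations(comment: str) -> list[tuple[str, str]]:
    """Return (annotation, info) pairs found in a comment body.

    Hypothetical helper: expects the text between /* and */.
    """
    annotations = []
    for line in comment.splitlines():
        stripped = line.strip()
        if not stripped.startswith("@"):
            continue  # ordinary documentation text
        # The first word after '@' is the annotation name; the rest is [INFO].
        parts = stripped[1:].split(None, 1)
        if not parts:
            continue  # a bare '@' carries no annotation
        info = parts[1].strip() if len(parts) > 1 else ""
        annotations.append((parts[0], info))
    return annotations


comment = """
    Data type for my experimental data.
    @author John Scientist
"""
print(parse_annotations(comment))
```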
Currently supported type definition annotations¶
Optional annotation¶
Mark a specific field of a structure as an optional field. The optional annotation can only be declared where a structure is first defined. On validation of TO instances by the WSS, missing optional fields are permitted. If an optional field is present, however, the value of the field will be validated normally. Optional fields are defined as:
@optional [FIELD_NAME_1] [FIELD_NAME_2] ...
For example, the following annotation declares that two fields of the structure are optional:
/*
    @optional alias functional_assignments
*/
typedef structure {
    string name;
    string alias;
    string sequence;
    list <string> functional_assignments;
} Feature;
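The effect of @optional on validation can be sketched in Python. This is an illustration of the presence rule only, not the WSS validator (which works from a JSON Schema encoding): the function name missing_required_fields is hypothetical, and the sketch does not check the values of fields that are present.

```python
def missing_required_fields(instance: dict, fields: set[str],
                            optional: set[str]) -> set[str]:
    """Return the non-optional fields that are absent from the instance."""
    required = fields - optional
    return required - set(instance)


# Field names taken from the Feature structure above.
FEATURE_FIELDS = {"name", "alias", "sequence", "functional_assignments"}
FEATURE_OPTIONAL = {"alias", "functional_assignments"}

# Omitting both optional fields is permitted.
print(missing_required_fields(
    {"name": "f1", "sequence": "ATG"}, FEATURE_FIELDS, FEATURE_OPTIONAL))

# Omitting a required field is reported.
print(missing_required_fields(
    {"name": "f1", "alias": "x"}, FEATURE_FIELDS, FEATURE_OPTIONAL))
```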
ID annotations¶
Mark a string as an ID that references another object or entity. ID annotations can only be associated to type definitions which resolve to a string. ID annotations are declared in the general form:
@id [ID_TYPE] [PARAMETERS]
where [ID_TYPE] specifies the type of ID and is required, and [PARAMETERS] provides additional information or constraints. [PARAMETERS] are always optional.
ID annotations are inherited when declaring a new typedef of a string that was already marked as an ID. If a new ID annotation is declared in a typedef, it overrides any previous ID declaration.
Note that although @id annotations may be specified as any ID_TYPE and associated to any typedef, applications that consume type specifications (primarily the workspace at the time of writing) may only recognize specific @id ID_TYPE / typedef combinations.
The ID types currently supported are described below.
Workspace ID
@id ws [TYPEDEF_NAME] ...
The ID must reference a TO instance stored in the WSS. There are multiple valid ways to specify a workspace object, and all are acceptable. A reference path into the object graph may be given as a string consisting of a list of references separated by semicolons.
Optionally, one or more type definition names can be specified, indicating that the ID must point to a TO instance that is one of the specified types. The typedef with which the @id annotation is associated must be a string.
Example:
/*
A reference to a genome.
@id ws KB.MicrobialGenome KB.PlantGenome
*/
typedef string genome_id;
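The two checks described for workspace IDs (a semicolon-separated reference path and an optional list of permitted types) might be modelled in Python as below. Both function names are hypothetical, and the individual reference strings used here are placeholders; the sketch does not assume any particular format for a single reference and does not contact the WSS.

```python
def split_reference_path(path: str) -> list[str]:
    """Split a semicolon-separated reference path into its references."""
    refs = [ref.strip() for ref in path.split(";")]
    if any(not ref for ref in refs):
        raise ValueError(f"empty reference in path {path!r}")
    return refs


def type_allowed(object_type: str, allowed_types: list[str]) -> bool:
    """Check a referenced object's type against the @id ws type list.

    An empty list means the annotation named no types, so any type passes.
    """
    return not allowed_types or object_type in allowed_types


# Placeholder references; real reference formats are not assumed here.
print(split_reference_path("refA;refB;refC"))
print(type_allowed("KB.PlantGenome", ["KB.MicrobialGenome", "KB.PlantGenome"]))
```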
KBase ID
@id kb
This annotation originally specified that the string must be a KBase ID which was typically registered in the ID service in a format such as “kb|type.XXX”. The ID server is no longer used in KBase and this field doesn’t have any particular meaning at this point.
No type checking on this field is performed, but the annotation may be used in the future so that users can automatically extract KBase IDs from typed objects.
Handle ID
@id handle
The ID must reference a Handle ID from the Handle Service. This is typically in the format KBH_XXX. When saving an object containing one or more handles to the WSS, the WSS checks that the handles are owned by the user before completing the save. Furthermore, the handle data is shared as the workspace object is shared. See Shock integration with the workspace service for more details.
Shock ID
@id bytestream
The ID must reference a Shock node that exists in the Shock instance configured for linking Shock nodes to WSS objects. When saving an object containing one or more Shock nodes to the WSS, the WSS checks that the nodes are owned by the user or owned by the workspace and readable by the user and (if necessary) takes ownership of the nodes. Furthermore, the nodes are shared as the workspace object is shared. See Shock integration with the workspace service for more details.
Sample ID
@id sample
The ID must reference a Sample service sample. When saving an object containing one or more sample IDs to the WSS, the WSS checks that the samples are administrated by the user. Furthermore, the samples are shared as the workspace object is shared. See Sample service integration with the workspace service for more details.
External ID
@id external [SOURCE] ...
The ID must reference an entity in an external (i.e. outside of KBase) data store. The IDs are not verified or validated, but may be used in the future to allow users to automatically extract external IDs from typed objects. [SOURCE] provides an optional way to specify the external source. Currently there is no standard dictionary of sources.
Deprecated annotation¶
@deprecated [REPLACEMENT_TYPE]
The deprecated annotation is used to mark a type definition as deprecated, and provides a structured mechanism for indicating a replacement type if one exists. The deprecated annotation so far is only for documentation purposes, but may be used by the Workspace in the future to better display, list, or query workspace objects (e.g. list all objects of a type that is not deprecated).
Range annotation¶
@range [RANGE SPECIFICATION]
The range annotation is associated with either a float or int typedef and specifies the minimum and / or maximum value of the int or float. The [RANGE SPECIFICATION] is a tuple of the minimum and maximum numbers, separated by a comma. Omit the minimum or maximum to specify an infinite negative or positive range, respectively. Bracketing the [RANGE SPECIFICATION] with parentheses indicates the range extents are exclusive; square brackets or no brackets indicates an inclusive range.
Examples:
Range | Explanation
---|---
0, 30 | Range from 0 to 30, inclusive
[0, 30] | Range from 0 to 30, inclusive
[0, 30) | Range from 0 to 30, including 0, excluding 30
(0, | Range from 0 to +inf, excluding 0
,30] | Range from -inf to 30, including 30
Example specification:
/*
@range -4.5,7.6)
*/
typedef float my_float;
/*
@range [2,10]
*/
typedef int my_int;
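As a way of checking one's reading of the bracket rules, the following Python sketch parses a [RANGE SPECIFICATION] string and tests values against it. It is not the parser used by the Type Compiler or the WSS; the Range class and parse_range function are hypothetical names, and the sketch assumes each end of the specification may be bracketed independently, as the table above suggests.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    minimum: float
    maximum: float
    min_inclusive: bool
    max_inclusive: bool

    def contains(self, value: float) -> bool:
        above = value >= self.minimum if self.min_inclusive else value > self.minimum
        below = value <= self.maximum if self.max_inclusive else value < self.maximum
        return above and below


def parse_range(spec: str) -> Range:
    """Parse a range specification such as '[0, 30)' or ',30]'."""
    text = spec.strip()
    # No bracket means inclusive, as does a square bracket.
    min_inclusive = max_inclusive = True
    if text.startswith(("(", "[")):
        min_inclusive = text[0] == "["
        text = text[1:]
    if text.endswith((")", "]")):
        max_inclusive = text[-1] == "]"
        text = text[:-1]
    low, sep, high = text.partition(",")
    if not sep:
        raise ValueError(f"range must contain a comma: {spec!r}")
    # An omitted bound means the range is unbounded on that side.
    minimum = float(low) if low.strip() else -math.inf
    maximum = float(high) if high.strip() else math.inf
    return Range(minimum, maximum, min_inclusive, max_inclusive)


half_open = parse_range("[0, 30)")
print(half_open.contains(0), half_open.contains(30))
```

Representing an omitted bound as an infinity keeps the containment check free of special cases.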
Metadata annotation¶
@metadata [CONTEXT] [ACTION] [as NAME]
The metadata annotation specifies data that an application should extract from a TO as metadata about the TO. Typically this metadata is very small compared to the TO and is therefore suitable for use when only a summary of the TO is necessary for an operation. As of this writing, the WSS uses the annotation to automatically generate user metadata for a TO.
The metadata annotation may only be associated with structure typedefs. Metadata annotations on nested structures are ignored.
[CONTEXT] specifies where the metadata annotation is applicable. In the case of the WSS, the [CONTEXT] is ws. [CONTEXT] is always required.
[ACTION] specifies what metadata should be extracted and any operations to perform on said metadata. At minimum, the [ACTION] must provide the path (dot separated) to the item of interest. Note that the path may only proceed through structure typedefs, not mappings or lists. A bare path must terminate at a primitive type: either a string, int, or float.
[ACTION]s may also specify a function to apply to the item specified by the path. Currently, the only available function is length(), which may be applied to lists, mappings, tuples, and strings. length() returns the number of items in a list, mapping, or tuple, or the length of a string.
[as NAME] allows specifying an optional NAME for the extracted metadata. If a NAME is not provided, the application will use the [ACTION] string as the metadata name. The NAME is the entirety of the remainder of the line after “as”.
Example:
/* Nested structure, metadata annotations have no effect here
   Cannot provide a path into the mapping in a metadata annotation
*/
typedef structure {
    mapping<string, string> strmap;
    int an_int;
} InnerStruct;
/*
    Specifies the metadata ("str" -> value of str in TO)
    @metadata ws str
    Specifies the metadata ("my rad string" -> value of str in TO)
    @metadata ws str as my rad string
    Specifies the metadata ("inner.an_int" -> value of inner.an_int in TO)
    @metadata ws inner.an_int
    Specifies the metadata ("length(str)" -> length of str in TO)
    @metadata ws length(str)
    Specifies the metadata ("num strings" -> # of items in inner.strmap)
    @metadata ws length(inner.strmap) as num strings
    Note that metadata paths cannot enter outerstrmap.
*/
typedef structure {
    InnerStruct inner;
    string str;
    mapping<string, string> outerstrmap;
} MyStruct;
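To show what the annotations on MyStruct would yield, here is a rough Python model of the extraction. It is a sketch and not the WSS code: all function names are hypothetical, values are converted to strings as an assumption about how metadata is stored, and the sketch does not enforce the rule that a path may only pass through structures.

```python
import re

# Matches an [ACTION] of the form length(path).
_LENGTH = re.compile(r"^length\((.+)\)$")


def parse_metadata_annotation(info: str) -> tuple[str, str, str]:
    """Parse the text after '@metadata' into (context, action, name)."""
    context, _, rest = info.strip().partition(" ")
    action, sep, name = rest.strip().partition(" as ")
    action = action.strip()
    # Without an 'as NAME' clause, the action string doubles as the name.
    name = name.strip() if sep else action
    return context, action, name


def follow_path(obj: dict, path: str):
    """Walk a dot-separated path through nested dictionaries."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def extract_metadata(obj: dict, annotations: list[str]) -> dict[str, str]:
    metadata = {}
    for info in annotations:
        context, action, name = parse_metadata_annotation(info)
        if context != "ws":
            continue  # annotation is meant for another application
        match = _LENGTH.match(action)
        if match:
            value = len(follow_path(obj, match.group(1)))
        else:
            value = follow_path(obj, action)
        metadata[name] = str(value)  # assumption: metadata values are strings
    return metadata


instance = {
    "inner": {"strmap": {"a": "x", "b": "y"}, "an_int": 7},
    "str": "hello",
    "outerstrmap": {},
}
annotations = [
    "ws str",
    "ws str as my rad string",
    "ws inner.an_int",
    "ws length(str)",
    "ws length(inner.strmap) as num strings",
]
print(extract_metadata(instance, annotations))
```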