Paging workspace objects¶
Workspaces can contain many objects, but the list_objects
command returns a maximum of
10,000 object_info
data structures (referred to as objects from now on) at a time. This
section of the documentation discusses how to handle the case where access to large amounts
of data is required.
If it is reasonable to expect that a list_objects
command will return no more than 10,000
objects then paging doesn’t need to be considered. For example if the command only encompasses
one workspace, only the most recent version of each object is required, and the max_objid
field in the workspace information returned by various API methods (for example
get_workspace_info
) is no more than 10,000, then a single list_objects
command can return
all the data requested. See Workspaces for more information about getting information
about workspaces and List objects for information about the list_objects
command.
There are currently two ways to page through workspace data - via object ID limits, and via reference limit.
Sort order¶
When paging via the two methods mentioned above, the objects are returned in a specific sort
order (but see the notes about ill-advised list_objects
parameters in
Paging by object ID limits):
the workspace ID, ascending
the object ID, ascending
the version, descending
This makes it easy to determine how many versions an object has the first time it is encountered in the list of returned data.
Object ID versus reference limits¶
Paging by object ID limits works well for paging through data in a single workspace where only the most recent version of each object is required. Its advantage over reference-based paging is that any range of object IDs can be accessed at any time, whereas in reference-based paging each set of objects returned depends on the reference of the last object in the prior set.
Paging by reference works well for any number of workspaces and when all object versions are required. The drawback is that data from the prior set of objects is required to get the next set.
Paging by object ID limits¶
To page by object ID limits, provide the minObjectID
and maxObjectID
parameters
to limit the object IDs, and thus the number of objects, returned in the result. Both parameters
are inclusive. maxObjectID - minObjectID
should always be <= 10,000. Setting a smaller
limit
can be done by specifying object ID parameters with a smaller difference.
Note that by default, hidden and deleted objects will not be returned in the set of objects.
For this reason, it is important to specify both a minimum and maximum object ID when
incrementing the minimum object ID by a set amount - otherwise
there may be duplicated objects in subsequent list_objects
calls.
Specifying that all object versions should be included in the results is not recommended, as
the limit
may be exceeded, and any excess objects will not be returned in the set.
Several parameters should be avoided when using the list_objects
command to page
through data with object ID limits:
Any of the time stamps (
after
,before
,after_epoch
, andbefore_epoch
)savedby
meta
Depending on the parameter set and the data to be listed, including any of these parameters may slow down the data lookup substantially and may cause the data not to be sorted.
The following example shows how to page through a large workspace <= 10,000 objects at a time:
In [3]: max_id = ws.get_workspace_info({'id': 19217})[4]
In [4]: max_id
Out[4]: 606146
In [5]: for i in range(0, max_id, 10000):
...: objs = ws.list_objects({
...: 'ids': [19217],
...: 'minObjectID': i + 1,
...: 'maxObjectID': i + 10000}
...: )
...: # do something with objs
If the result is sorted, an alternative is to omit the maximum object ID and generate the
minimum ID for each loop from the maximum object ID in the data from the prior loop. This
may result in fewer overall list_objects
calls, but has the disadvantage that getting each
data set depends on the prior set.
Paging by reference limit¶
To page by a reference limit, provide the startafter
parameter and optionally a limit
if less than 10,000 objects should be returned at a time. Paging with a reference limit is useful
for paging through multiple workspaces or object versions. The startafter
parameter looks
like a workspace object reference of the form X/Y/Z
where X
is the integer workspace ID,
Y
the integer object ID, and Z
the version. The version may be omitted, and and object ID
may be omitted if the version is omitted. To page through data, start by submitting only the
smallest workspace ID from the set of workspaces submitted to the list_objects
command and
the object ID of the starting object of interest (often 1). Construct further startafter
parameters from the last object in the list of returned objects.
The following list_objects
parameters are incompatible with the startafter
parameter
and will cause an error to be returned if used:
Any of the time stamps (
after
,before
,after_epoch
, andbefore_epoch
)savedby
meta
object ID limits (
minObjectID
andmaxObjectID
)
The following example shows how the paging proceeds, using a very small limit for clarity:
In [5]: def calc_startafter(objects):
...: if not objects:
...: return None
...: o = objects[-1]
...: return '/'.join([str(o[6]), str(o[0]), str(o[4])])
...:
In [6]: startafter = '1/1'
In [7]: while startafter:
...: objs = ws.list_objects({
...: 'ids': [1, 2],
...: 'startafter': startafter,
...: 'limit': 3,
...: 'showAllVersions': 1
...: })
...: for o in objs:
...: # do something more meaningful here
...: print(f'{o[6]}/{o[0]}/{o[4]}')
...: startafter = calc_startafter(objs)
...: print(f'startafter: {startafter}')
...:
1/1/3
1/1/2
1/1/1
startafter: 1/1/1
1/2/2
1/2/1
2/1/2
startafter: 2/1/2
2/1/1
startafter: 2/1/1
startafter: None
The last call to the workspace could be eliminated by checking that the number of objects
returned is less than limit
.