Discussion:
[Libstoragemgmt-devel] [RFC] Please review the first draft of Volume Replication(Snapshot, Clone, Mirror) API design.
Gris Ge
2014-04-01 08:48:44 UTC
Hi Team,

In order to support volume snapshot, clone, mirror, and other kinds of
replication in LSM, I drafted this wiki page for volume replication:
http://goo.gl/uHVVuC

In short:
* Class name: lsm.VolumeReplication
* Properties:
id, name, type, time_stamp, src_volume_id, tgt_volume_id,
sync_progress
* Terminology: (Based on SMI-S Copy Profile)[1]
UNSYNC_DELTA_RO
Read Only, delta based copy. AKA, Snapshot.
UNSYNC_DELTA_RW
Read write, delta based copy,
Example: NetApp clone or EMC Snapshot Mount Point.
UNSYNC_FULL_LOCAL
Read write, full copy, on local system.
Example: NetApp Copy or EMC VNX/CX SAN Copy, EMC VMAX/DMX Clone
UNSYNC_FULL_REMOTE
Read write, full copy, on remote system.
Example: EMC VNX/CX SAN Copy, EMC VMAX/DMX RClone
SYNC_LOCAL
Mirror data in local system. In sync mode (block I/O until
both target and source have committed the data).
Example: EMC VNX/CX SnapView Clone, EMC VMAX/CX BCV
SYNC_REMOTE
Mirror data in remote system. In sync mode.
Example: NetApp SnapMirror, EMC VNX/CX MirrorView, EMC VMAX/CX RDF
ASYNC_LOCAL
Mirror data in local system. In async mode (return I/O once the
source volume has committed the data, sync data at intervals).
No storage array supports this yet. Just a placeholder for the
future.
ASYNC_REMOTE
Mirror data in remote system. In async mode (return I/O once the
source volume has committed the data, sync data at intervals).
Example: EMC VNX/CX MirrorView, EMC VMAX/CX RDF/A
API users can translate them into their favorite terminology.

* Methods:
lsm.Client.volume_replications()
lsm.Client.volume_replication_create()
lsm.Client.volume_replication_delete()
lsm.Client.volume_replication_resync()
lsm.Client.volume_replication_restore()
lsm.Client.volume_replication_failover()
lsm.Client.volume_replication_giveback()
lsm.Client.volume_replication_split()
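
To make the summary above concrete, here is a minimal Python sketch of how the proposed class could look. It only illustrates this draft: the numeric constant values, the use of a dataclass, the field types, and the is_mirror() helper are assumptions, not the wiki's definition and not existing lsm code.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationType(Enum):
    """Replication types from the draft; the numeric values are assumed."""
    UNSYNC_DELTA_RO = 1     # read-only, delta-based copy (snapshot)
    UNSYNC_DELTA_RW = 2     # read-write, delta-based copy
    UNSYNC_FULL_LOCAL = 3   # read-write full copy on the local system
    UNSYNC_FULL_REMOTE = 4  # read-write full copy on a remote system
    SYNC_LOCAL = 5          # synchronous mirror, local
    SYNC_REMOTE = 6         # synchronous mirror, remote
    ASYNC_LOCAL = 7         # asynchronous mirror, local (placeholder)
    ASYNC_REMOTE = 8        # asynchronous mirror, remote


@dataclass
class VolumeReplication:
    """Hypothetical stand-in for the proposed lsm.VolumeReplication."""
    id: str
    name: str
    type: ReplicationType
    time_stamp: int       # assumed: seconds since the epoch
    src_volume_id: str
    tgt_volume_id: str
    sync_progress: int    # assumed: percentage, 0-100


def is_mirror(rep: VolumeReplication) -> bool:
    """True for the four SYNC_*/ASYNC_* mirror types."""
    return rep.type.name.startswith(("SYNC_", "ASYNC_"))


snap = VolumeReplication("r1", "nightly", ReplicationType.UNSYNC_DELTA_RO,
                         1396342124, "vol-src", "vol-snap", 100)
print(is_mirror(snap))  # False: a snapshot is not a mirror
```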

Vendor implementation notes:
* NetApp ONTAP SDK
Only supports UNSYNC_DELTA_RO, UNSYNC_DELTA_RW, UNSYNC_FULL_LOCAL,
SYNC_LOCAL.
In cluster mode we might be able to do more remote replication, but
I don't have such an array or simulator yet.
* SMI-S
The SMI-S 1.6 "Copy Service" profile is stable. We can build on that
first.
The SMI-S 1.6 "Replication" profile is experimental; we would only use
it for performance or feature coverage issues.
The EMC SMI-S provider supports this well, including the
experimental replication profile.
* Targetd.
LVM snapshot is not supported yet.
* Other plugins.
No time to investigate.

Clearly my initial design is buggy and unreasonable.
Please kindly provide your feedback to improve it, or maybe provide a
completely new one if my design sucks too much (it happens a lot).

Thank you in advance.

Best regards.

[1] I am tired of explaining what snapshot, mirror, and clone mean, so I
chose the names SNIA uses. SNIA is actually using the "snapshot, mirror,
clone" names in a newer profile -- the Replication Profile -- but that is
experimental.
--
Gris Ge
Tony Asleson
2014-04-01 16:05:00 UTC
Post by Gris Ge
Hi Team,
In order to support volume snapshot, clone, mirror, and other kinds of replication
http://goo.gl/uHVVuC
Thank you for this draft.
Post by Gris Ge
* Class name: lsm.VolumeReplication
Would VolumeAssociation or VolumeRelationship work better? Is this the
SNIA term?
Post by Gris Ge
id, name, type, time_stamp, src_volume_id, tgt_volume_id,
sync_progress
Should the tgt_volume be a list of targets? Don't some arrays allow you
to mirror to more than one destination? Or I guess we could just have
another instance of this class for each target too.
Post by Gris Ge
* Terminology: (Based on SMI-S Copy Profile)[1]
UNSYNC_DELTA_RO
Read Only, delta based copy. AKA, Snapshot.
UNSYNC_DELTA_RW
Read write, delta based copy,
Example: NetApp clone or EMC Snapshot Mount Point.
UNSYNC_FULL_LOCAL
Read write, full copy, on local system.
Example: NetApp Copy or EMC VNX/CX SAN Copy, EMC VMAX/DMX Clone
UNSYNC_FULL_REMOTE
Read write, full copy, on remote system.
Example: EMC VNX/CX SAN Copy, EMC VMAX/DMX RClone
SYNC_LOCAL
Mirror data in local system. In sync mode (block I/O until
both target and source have committed the data).
Example: EMC VNX/CX SnapView Clone, EMC VMAX/CX BCV
SYNC_REMOTE
Mirror data in remote system. In sync mode.
Example: NetApp SnapMirror, EMC VNX/CX MirrorView, EMC VMAX/CX RDF
ASYNC_LOCAL
Mirror data in local system. In async mode (return I/O once the
source volume has committed the data, sync data at intervals).
No storage array supports this yet. Just a placeholder for the
future.
ASYNC_REMOTE
Mirror data in remote system. In async mode (return I/O once the
source volume has committed the data, sync data at intervals).
Example: EMC VNX/CX MirrorView, EMC VMAX/CX RDF/A
API users can translate them into their favorite terminology.
It would be helpful to have proposed parameters for each of these
methods. Some are pretty obvious, but for example the
replication_create, does that take source and destination systems too or
perhaps a list of destination systems & targets?
Post by Gris Ge
lsm.Client.volume_replications()
lsm.Client.volume_replication_create()
When you create a mirror, does the target already need to exist or will
we create it for them?
Post by Gris Ge
lsm.Client.volume_replication_delete()
So if you have two volumes with an UNSYNC_DELTA_RO replication and you
call this method, what happens? Does the UNSYNC_DELTA_RO get
transformed into an UNSYNC_DELTA_RW?
Post by Gris Ge
lsm.Client.volume_replication_resync()
Does this only apply to mirror types?
Post by Gris Ge
lsm.Client.volume_replication_restore()
lsm.Client.volume_replication_failover()
lsm.Client.volume_replication_giveback()
Please elaborate on each of the three above. What do they do, and what
types would they work with?
Post by Gris Ge
lsm.Client.volume_replication_split()
So can this be used for all replicated types or just mirror types?
E.g., will we allow a user to split an UNSYNC_DELTA_RO or
UNSYNC_DELTA_RW?
Post by Gris Ge
* NetApp ONTAP SDK
Only supports UNSYNC_DELTA_RO, UNSYNC_DELTA_RW, UNSYNC_FULL_LOCAL,
SYNC_LOCAL.
In cluster mode we might be able to do more remote replication, but
I don't have such an array or simulator yet.
* SMI-S
The SMI-S 1.6 "Copy Service" profile is stable. We can build on that
first.
The SMI-S 1.6 "Replication" profile is experimental; we would only use
it for performance or feature coverage issues.
The EMC SMI-S provider supports this well, including the
experimental replication profile.
* Targetd.
LVM snapshot is not supported yet.
Today when you ask to copy a volume, targetd uses thinp snapshot which
is an UNSYNC_DELTA_RW.
Post by Gris Ge
* Other plugins.
No time to investigate.
NStor is based on zfs. I believe they might be like NetApp, where
logical units are files on the file system, but I need to double check.
Post by Gris Ge
Clearly my initial design is buggy and unreasonable.
Please kindly provide your feedback to improve it, or maybe provide a
completely new one if my design sucks too much (it happens a lot).
It's going to be difficult to provide an abstraction that works
seamlessly for every array vendor. Every design choice is a compromise
for some vendor.
Post by Gris Ge
[1] I am tired of explaining what snapshot, mirror, and clone mean, so I
chose the names SNIA uses. SNIA is actually using the "snapshot, mirror,
clone" names in a newer profile -- the Replication Profile -- but that is
experimental.
I like this. I too would prefer something less ambiguous, and these
names are distinct and descriptive.

Thanks!

Regards,
Tony
Gris Ge
2014-04-02 01:25:52 UTC
Post by Tony Asleson
Post by Gris Ge
* Class name: lsm.VolumeReplication
Would VolumeAssociation or VolumeRelationship work better? Is this the
SNIA term?
SNIA uses 'ReplicationGroup' in the 'Replication' profile (experimental)
and 'StorageSynchronized' in 'Copy'.

Maybe 'VolumeReplicationGroup' would be better, but it would make method
names too long (like: volume_replication_group_create).
Post by Tony Asleson
Post by Gris Ge
id, name, type, time_stamp, src_volume_id, tgt_volume_id,
sync_progress
Should the tgt_volume be a list of targets? Don't some arrays allow you
to mirror to more than one destination? Or I guess we could just have
another instance of this class for each target too.
With a list of targets, it might be hard for us to track the
sync_progress and replication type.
Yes. A new instance will be created for each target.
Mentioned in wiki.[1]
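
The "one instance per target" approach could look roughly like the sketch below. The helper name fan_out and the record layout are made up for illustration; the draft API does not define them.

```python
from typing import List, NamedTuple


class Replication(NamedTuple):
    """Simplified, hypothetical subset of the proposed properties."""
    src_volume_id: str
    tgt_volume_id: str
    type: str
    sync_progress: int  # assumed: percentage, 0-100


def fan_out(src: str, targets: List[str], rep_type: str) -> List[Replication]:
    """Create one replication record per target volume."""
    return [Replication(src, tgt, rep_type, 0) for tgt in targets]


reps = fan_out("vol-1", ["vol-2", "vol-3"], "SYNC_REMOTE")
# Each target has its own progress and type, tracked independently.
reps[0] = reps[0]._replace(sync_progress=80)
for rep in reps:
    print(rep.tgt_volume_id, rep.type, rep.sync_progress)
```

With a single record holding a list of targets, sync_progress would have to become a list as well, which is the tracking problem mentioned above.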
Post by Tony Asleson
It would be helpful to have proposed parameters for each of these
methods. Some are pretty obvious, but for example the
replication_create, does that take source and destination systems too or
perhaps a list of destination systems & targets?
Mentioned in wiki.[1]
Post by Tony Asleson
Post by Gris Ge
lsm.Client.volume_replications()
lsm.Client.volume_replication_create()
When you create a mirror, does the target already need to exist or will
we create it for them?
We create a new one. Wiki updated.
Post by Tony Asleson
Post by Gris Ge
lsm.Client.volume_replication_delete()
So if you have two volumes with an UNSYNC_DELTA_RO replication and you
call this method, what happens? Does the UNSYNC_DELTA_RO get
transformed into an UNSYNC_DELTA_RW?
Mentioned in wiki.[1]
Quote:
* For TYPE_UNSYNC_DELTA_RO and TYPE_UNSYNC_DELTA_RW, this method
will delete the target volume.

It seems we have no way to convert UNSYNC_DELTA_RO to UNSYNC_DELTA_RW.
I will check the documents and create a new method.
Post by Tony Asleson
Post by Gris Ge
lsm.Client.volume_replication_resync()
Does this only apply to mirror types?
Mentioned in wiki.[1]
Quote:
Resync a volume replication:
* For TYPE_UNSYNC_DELTA_RO and TYPE_UNSYNC_DELTA_RW, target volume
data will be erased; the target volume will then hold/point to the
identical data.
* For TYPE_UNSYNC_FULL_LOCAL and TYPE_UNSYNC_FULL_REMOTE, target
volume will be updated; the target will hold data identical to the
source volume once done.
* For TYPE_SYNC_LOCAL, TYPE_ASYNC_LOCAL, TYPE_SYNC_REMOTE and
TYPE_ASYNC_REMOTE, a manual resync command will be sent to the storage
system. Might be used after lsm.Client.volume_replication_split()
or error resolution.
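
The per-type resync rules quoted above can be summarised as a lookup table. This is only a sketch: the action labels ("reset_target", "update_target", "manual_resync") and the function name are invented here and are not part of the draft API.

```python
# Action labels are illustrative, not part of the draft API.
RESYNC_ACTION = {
    "TYPE_UNSYNC_DELTA_RO": "reset_target",
    "TYPE_UNSYNC_DELTA_RW": "reset_target",
    "TYPE_UNSYNC_FULL_LOCAL": "update_target",
    "TYPE_UNSYNC_FULL_REMOTE": "update_target",
    "TYPE_SYNC_LOCAL": "manual_resync",
    "TYPE_ASYNC_LOCAL": "manual_resync",
    "TYPE_SYNC_REMOTE": "manual_resync",
    "TYPE_ASYNC_REMOTE": "manual_resync",
}


def resync_action(rep_type: str) -> str:
    """Return what a resync would do for the given replication type."""
    try:
        return RESYNC_ACTION[rep_type]
    except KeyError:
        raise ValueError(f"unknown replication type: {rep_type}") from None


print(resync_action("TYPE_UNSYNC_DELTA_RW"))
print(resync_action("TYPE_ASYNC_REMOTE"))
```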
Post by Tony Asleson
Post by Gris Ge
lsm.Client.volume_replication_restore()
lsm.Client.volume_replication_failover()
lsm.Client.volume_replication_giveback()
Please elaborate on each of the three above. What do they do, and what
types would they work with?
Mentioned in wiki.[1]
Post by Tony Asleson
Post by Gris Ge
lsm.Client.volume_replication_split()
So can this be used for all replicated types or just mirror types?
E.g., will we allow a user to split an UNSYNC_DELTA_RO or
UNSYNC_DELTA_RW?
Mentioned in wiki.[1]
No support for UNSYNC_DELTA_RO or UNSYNC_DELTA_RW.
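
A plugin could enforce that restriction with a small guard like the one below. The exception class and function name are hypothetical stand-ins, not existing lsm names; only the two unsupported types come from the answer above.

```python
class NoSupportError(Exception):
    """Hypothetical error for operations a replication type cannot do."""


# Per the answer above, delta-based copies cannot be split.
UNSPLITTABLE = frozenset({"UNSYNC_DELTA_RO", "UNSYNC_DELTA_RW"})


def check_split_allowed(rep_type: str) -> None:
    """Raise NoSupportError if split is not defined for this type."""
    if rep_type in UNSPLITTABLE:
        raise NoSupportError(f"split is not supported for {rep_type}")


check_split_allowed("SYNC_REMOTE")  # passes silently
try:
    check_split_allowed("UNSYNC_DELTA_RO")
except NoSupportError as err:
    print(err)
```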

[1] http://goo.gl/uHVVuC
--
Gris Ge
Tony Asleson
2014-04-02 15:05:22 UTC
Post by Gris Ge
Post by Tony Asleson
So if you have two volumes with an UNSYNC_DELTA_RO replication and you
call this method, what happens? Does the UNSYNC_DELTA_RO get
transformed into an UNSYNC_DELTA_RW?
Mentioned in wiki.[1]
* For TYPE_UNSYNC_DELTA_RO and TYPE_UNSYNC_DELTA_RW, this method
will delete the target volume.
To me this new class represents a relationship between two or more
volumes. However, you are proposing that methods which work on these
relationships affect the volume(s) too, instead of just the relationship.

In the above example you are advocating that when we delete the
relationship the target is deleted too. To me this seems like a bad
side effect.

In general I would like to see this new class used to express the
relationships between volumes and the ability to modify the relationship
between them without the side effect of the target(s) being deleted. My
preference would be that a user does a volume_delete when they no longer
want the data which would have the side effect of removing the
relationship, not the other way around.
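
A toy in-memory model of that preference, using invented names (Array, relationship_remove, volume_delete), might look like this. It is not lsm code; it only shows the direction of the side effect: deleting a volume drops its relationships, while removing a relationship leaves both volumes alone.

```python
class Array:
    """Toy storage array: volume IDs plus (src, tgt) relationships."""

    def __init__(self):
        self.volumes = set()
        self.relationships = set()  # set of (src_id, tgt_id) tuples

    def relationship_remove(self, src, tgt):
        """Drop the relationship only; both volumes stay."""
        self.relationships.discard((src, tgt))

    def volume_delete(self, vol):
        """Delete the volume; relationships involving it go away as well."""
        self.volumes.discard(vol)
        self.relationships = {r for r in self.relationships if vol not in r}


array = Array()
array.volumes.update({"src", "snap1", "snap2"})
array.relationships.update({("src", "snap1"), ("src", "snap2")})

array.relationship_remove("src", "snap1")  # snap1 still exists
array.volume_delete("snap2")               # its relationship is removed too
print(sorted(array.volumes), array.relationships)
```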

Does this make sense?
Post by Gris Ge
It seems we have no way to convert UNSYNC_DELTA_RO to UNSYNC_DELTA_RW.
I will check the documents and create a new method.
Perhaps we don't need to support this as arrays don't support it?

Regards,
Tony
Gris Ge
2014-04-03 08:33:52 UTC
Post by Tony Asleson
To me this new class represents a relationship between two or more
volumes. However, you are proposing that methods which work on these
relationships affect the volume(s) too, instead of just the relationship.
In the above example you are advocating that when we delete the
relationship the target is deleted too. To me this seems like a bad
side effect.
I believe the split method is enough.
If users really want to destroy the replication, they can delete the
source volume or the target volume.
Post by Tony Asleson
In general I would like to see this new class used to express the
relationships between volumes and the ability to modify the relationship
between them without the side effect of the target(s) being deleted. My
preference would be that a user does a volume_delete when they no longer
want the data which would have the side effect of removing the
relationship, not the other way around.
Does this make sense?
Yes. Good point.
Post by Tony Asleson
Post by Gris Ge
It seems we have no way to convert UNSYNC_DELTA_RO to UNSYNC_DELTA_RW.
I will check the documents and create a new method.
Perhaps we don't need to support this as arrays don't support it?
EMC VNX supports this. I didn't try it on real storage, but their
documentation says so. We can add this one later if any user requests
this method.
Post by Tony Asleson
Regards,
Tony
VolumeReplication document updated:
1. The volume_replication_delete() method was removed. Documented
the alternative way to delete a replication:
A. lsm.Client.volume_replication_split() # pause the replication
B. Delete the target volume
2. lsm.Capabilities constants were allocated for every method.
3. Will document the volume_delete() dependency in the volume_delete()
section. Quick thoughts:
A. A target volume is deletable whether it is in a replication or
not, as long as other deletion dependencies are cleared. Once the
target volume is deleted, the replication will be in the
STATUS_FRACTURED state.
B. A source volume is deletable once all associated target volumes
are deleted and other deletion dependencies are cleared.
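
Rules A and B could be expressed as in the sketch below. It ignores the "other deletion dependencies" part, and everything except the STATUS_FRACTURED name (the Model class, STATUS_OK, the use of RuntimeError) is hypothetical.

```python
STATUS_OK = "STATUS_OK"                # assumed name
STATUS_FRACTURED = "STATUS_FRACTURED"  # name taken from the notes above


class Model:
    """Toy model of volumes and their replications."""

    def __init__(self):
        self.volumes = set()
        self.reps = {}  # (src_id, tgt_id) -> status

    def volume_delete(self, vol):
        # Rule B: a source can go only after all of its targets are gone.
        if any(src == vol and tgt in self.volumes for src, tgt in self.reps):
            raise RuntimeError(f"{vol} still has target volumes")
        self.volumes.remove(vol)
        # Rule A: deleting a target leaves the replication fractured.
        for src, tgt in list(self.reps):
            if tgt == vol:
                self.reps[(src, tgt)] = STATUS_FRACTURED


m = Model()
m.volumes = {"src", "tgt"}
m.reps[("src", "tgt")] = STATUS_OK
try:
    m.volume_delete("src")
except RuntimeError as err:
    print("refused:", err)
m.volume_delete("tgt")  # allowed; replication becomes fractured
m.volume_delete("src")  # allowed now that the target is gone
print(m.reps)
```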
--
Gris Ge
Tony Asleson
2014-04-03 13:45:02 UTC
Post by Gris Ge
1. The volume_replication_delete() method was removed. Documented
A. lsm.Client.volume_replication_split() # pause the replication
B. Delete the target volume
Could we just do the required functionality to delete the target volume
without requiring the user to call split first? Some arrays don't
require anything to delete the target. Either we can expose that logic,
or hide it and I would prefer to hide it if we can.
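
Hiding that logic could be as simple as the following sketch. The FakePlugin class, the needs_split_before_delete flag, and both method names are hypothetical; the point is only that the caller issues a single delete and the plugin decides whether a split is needed first.

```python
class FakePlugin:
    """Hypothetical plugin that records the array operations it issues."""

    def __init__(self, needs_split_before_delete: bool):
        self.needs_split_before_delete = needs_split_before_delete
        self.calls = []

    def _replication_split(self, tgt):
        self.calls.append(("split", tgt))

    def volume_delete(self, tgt):
        # Some arrays need the replication stopped first; others do not.
        if self.needs_split_before_delete:
            self._replication_split(tgt)
        self.calls.append(("delete", tgt))


strict = FakePlugin(needs_split_before_delete=True)
strict.volume_delete("tgt-1")
print(strict.calls)

simple = FakePlugin(needs_split_before_delete=False)
simple.volume_delete("tgt-1")
print(simple.calls)
```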
Post by Gris Ge
2. lsm.Capabilities constants were allocated for every method.
3. Will document the volume_delete() dependency in volume_delete()
A. A target volume is deletable whether it is in a replication or
not, as long as other deletion dependencies are cleared. Once the
target volume is deleted, the replication will be in the
STATUS_FRACTURED state.
B. A source volume is deletable once all associated target volumes
are deleted and other deletion dependencies are cleared.
These last two points seem to fit into what I was thinking for
child_dependency and child_dependency_delete. Determine if a volume has
them and the ability to remove them.

Regards,
Tony
Gris Ge
2014-04-04 03:14:32 UTC
Post by Tony Asleson
Post by Gris Ge
1. The volume_replication_delete() method was removed. Documented
A. lsm.Client.volume_replication_split() # pause the replication
B. Delete the target volume
Could we just do the required functionality to delete the target volume
without requiring the user to call split first? Some arrays don't
require anything to delete the target. Either we can expose that logic,
or hide it and I would prefer to hide it if we can.
Actually, I didn't express that precisely. My intention was to
suggest that users pause the replication instead of deleting it
in case they want to resync later. Clearly that does not fit the
expectation of a delete.

Anyway, I will remove the pause suggestion.

Thanks.
--
Gris Ge
Gris Ge
2014-04-04 03:17:40 UTC
Post by Gris Ge
Post by Tony Asleson
Post by Gris Ge
1. The volume_replication_delete() method was removed. Documented
A. lsm.Client.volume_replication_split() # pause the replication
B. Delete the target volume
Could we just do the required functionality to delete the target volume
without requiring the user to call split first? Some arrays don't
require anything to delete the target. Either we can expose that logic,
or hide it and I would prefer to hide it if we can.
Actually, I didn't express that precisely. My intention was to
suggest that users pause the replication instead of deleting it
in case they want to resync later. Clearly that does not fit the
expectation of a delete.
Anyway, I will remove the pause suggestion.
Just checked the wiki, it looks OK to me:

===
There is no delete method for lsm.VolumeReplication; you can use one of
these instead:

1. Use lsm.Client.volume_replication_split() to pause the copy or
synchronization.
2. Delete the target volume.
===
--
Gris Ge