A Scalable Sequence Encoding for Massive Collaborative Editing

Sequence CRDT: A Scalable Sequence Encoding for Massive Collaborative Editing
Brice Nédelec, Pascal Molli & Achour Mostefaoui
GDD – LINA – University of Nantes
Workshop on Highly-Scalable Distributed Systems
Wednesday 14 January 2015, Paris, France.
Distributed Collaborative Editors
• Distributed collaborative editors allow people to work together while distributed in space, time and organizations.
• Google Docs, Etherpad, Google Wave…
• 190M users on Google Drive (including Google Docs).
Google Docs is great, but...
• Single point of failure:
– If the provider is down -> no collaboration
• Privacy, economic intelligence:
– What if Google searches for ANR on 15 October ;) ?
• Mass editing:
– Google limits the number of simultaneous editors (50); beyond 50 -> just readers
Is it possible to build a fully decentralized editor that supports 1M simultaneous users?
• Why? Because it is hard ;)
– “We choose to go to the
moon in this decade and do
the other things, not
because they are easy, but
because they are hard.”
Kennedy 1962
• Because it can also be useful: mass collaboration -> MOOCs, webinars, events. Google Wave has already been used like that…
Distributed Collaborative Editors
Principles (OT or CRDT)
• Based on optimistic replication algorithms
– Operations are generated locally
• No lock, no communication with other sites
– Broadcast to other sites
• Every operation is eventually delivered
– Re-executed when received
• The system is correct if it ensures causality, convergence and “intention preservation” (OT definition), i.e. it preserves the partial orders in the sequence
Principles of Sequence CRDT
• Encode the order of the sequence in the IDs of elements (remember BASIC line numbers? ;)
10 LET B=A
15 FOR I=1 TO 27
20 LET A=A*A
21 NEXT I
• Arghh, I forgot LET B=B^2 before NEXT I: there is no free number between 20 and 21, and no way to use 20.5??
Insert alpha between p and q
Create an ID for alpha
Create a disambiguator for alpha (so that path + disambiguator is unique)
The space and time complexity of sequence CRDTs is mainly decided here!
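To make the "path + disambiguator" idea concrete, here is a minimal Python sketch. It assumes an identifier is a triple (path, site, counter) compared lexicographically; the `Ident` and `Replica` names and the choice of site name plus counter as disambiguator are illustrative assumptions, not taken from the slides. It shows that two concurrent inserts that picked the same path still get a total order, and that replicas converge whatever the delivery order.

```python
from bisect import insort
from typing import NamedTuple

class Ident(NamedTuple):
    path: tuple    # position between the neighbours, compared lexicographically
    site: str      # disambiguator, part 1: unique site name (assumption)
    counter: int   # disambiguator, part 2: per-site operation counter (assumption)

class Replica:
    def __init__(self):
        self.elements = []  # list of (Ident, char), kept sorted by Ident

    def apply_insert(self, ident, char):
        # Ignore an operation that was already applied (gossip may deliver twice).
        if (ident, char) not in self.elements:
            insort(self.elements, (ident, char))

    def text(self):
        return "".join(char for _, char in self.elements)

p = Ident((10,), "A", 1)
q = Ident((20,), "A", 2)
alpha = Ident((15,), "A", 3)   # inserted between p and q by site A
beta = Ident((15,), "B", 1)    # concurrent insert by site B, same path

ops = [(p, "p"), (q, "q"), (alpha, "a"), (beta, "b")]
r1, r2 = Replica(), Replica()
for op in ops:
    r1.apply_insert(*op)
for op in reversed(ops):       # same operations, opposite delivery order
    r2.apply_insert(*op)

print(r1.text(), r2.text())    # prints "pabq pabq"
```

Because the order lives in the identifiers, applying an insert is just a sorted insertion. The hard part, choosing a short path between p and q, is left out of this sketch.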
Scientific problem
• Design an ID allocation strategy for sequence elements that is independent of the insertion order
• There are many ways to type “QWERTY”: how do we compute the smallest IDs for each character, whatever the insertion order?
Problem: Order of Insertions
Typed: Q;W;E;R;T;Y
Typed: Y;T;R;E;W;Q
Combine Exponential tree &
random allocation
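The combination can be sketched in Python as follows. This is a simplified illustration in the spirit of the published LSEQ description, not the CRATE code. The constants `INITIAL_BITS` and `BOUNDARY`, the helper names, and the way the per-depth strategy is drawn and remembered are all assumptions made for the example. Each tree level has twice as many slots as the previous one (the exponential tree), and each depth randomly commits to allocating close to the left neighbour ("boundary+") or close to the right neighbour ("boundary-").

```python
import random

INITIAL_BITS = 4   # assumed: 2**4 = 16 slots at the first level
BOUNDARY = 10      # assumed: maximum distance from the chosen neighbour

def base(level):
    """Number of slots at a 0-indexed tree level; doubles at each level."""
    return 2 ** (INITIAL_BITS + level)

def prefix(path, depth):
    """First `depth` digits of path (zero-padded) as one mixed-radix integer."""
    value = 0
    for level in range(depth):
        digit = path[level] if level < len(path) else 0
        value = value * base(level) + digit
    return value

def to_path(value, depth):
    """Inverse of prefix(): split an integer back into `depth` digits."""
    digits = []
    for level in reversed(range(depth)):
        value, digit = divmod(value, base(level))
        digits.append(digit)
    return digits[::-1]

def alloc(p, q, strategies, rng):
    """Return a path strictly between paths p and q (requires p < q)."""
    depth, interval = 0, 0
    while interval < 1:                      # go deeper until a free slot exists
        depth += 1
        interval = prefix(q, depth) - prefix(p, depth) - 1
    step = min(BOUNDARY, interval)
    if depth not in strategies:              # random strategy, fixed per depth
        strategies[depth] = rng.choice(("boundary+", "boundary-"))
    offset = rng.randint(1, step)
    if strategies[depth] == "boundary+":
        value = prefix(p, depth) + offset    # stay close to the left neighbour
    else:
        value = prefix(q, depth) - offset    # stay close to the right neighbour
    return to_path(value, depth)

def type_text(n, at_front, seed=0):
    """Insert n elements always at the front or always at the end."""
    rng, strategies = random.Random(seed), {}
    doc = [[0], [base(0) - 1]]               # begin and end sentinels
    for _ in range(n):
        if at_front:
            doc.insert(1, alloc(doc[0], doc[1], strategies, rng))
        else:
            doc.insert(len(doc) - 1, alloc(doc[-2], doc[-1], strategies, rng))
    return doc

for at_front in (False, True):
    doc = type_text(1000, at_front)
    print("front" if at_front else "end", "longest path:", max(len(p) for p in doc))
```

The intuition is that a depth whose strategy does not match the editing direction is used up after a few insertions, and the loss is amortised by the next matching depth, which holds twice as many slots.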
LSEQ Complexities
O((log n)^2) -> avoids the need to rebalance IDs…
Experiments
• We built the CRATE editor (1)
– LSEQ for ID allocation
– Gossip for broadcast
– Anti-entropy for missed deliveries
– Interval version vectors for causal reception (2)
1 https://github.com/Chat-Wane/CRATE.git
2 M. Mukund, G. Shenoy R., S. Suresh, Optimized or-sets without ordering constraints, in: M. Chatterjee, J.-n. Cao, K. Kothapalli, S. Rajsbaum (Eds.), Distributed Computing and Networking, Vol. 8314 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2014, pp. 227-241. doi:10.1007/978-3-642-45249-9_15.
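Gossip can deliver an operation twice, late, or out of order, so each peer needs a compact record of what it has already received. The Python sketch below is a simplified stand-in for that bookkeeping, not the interval version vector structure of the cited paper nor CRATE's implementation. The class and method names are hypothetical. For each site it keeps the highest counter below which everything was received, plus the set of counters received ahead of a gap.

```python
class VersionVectorWithGaps:
    def __init__(self):
        self.contiguous = {}  # site -> n such that counters 1..n were all received
        self.extras = {}      # site -> counters received beyond a gap

    def seen(self, site, counter):
        return (counter <= self.contiguous.get(site, 0)
                or counter in self.extras.get(site, set()))

    def record(self, site, counter):
        """Record a delivery; return False if it is a duplicate."""
        if self.seen(site, counter):
            return False
        extras = self.extras.setdefault(site, set())
        extras.add(counter)
        n = self.contiguous.get(site, 0)
        while n + 1 in extras:       # absorb counters that close the gap
            n += 1
            extras.remove(n)
        self.contiguous[site] = n
        return True

    def missing(self, site):
        """Counters known to be missing, e.g. to request during anti-entropy."""
        extras = self.extras.get(site, set())
        if not extras:
            return []
        n = self.contiguous.get(site, 0)
        return [c for c in range(n + 1, max(extras)) if c not in extras]

vv = VersionVectorWithGaps()
vv.record("A", 1)
vv.record("A", 3)            # operation 2 from site A has not arrived yet
print(vv.record("A", 3))     # prints False: duplicate delivery is detected
print(vv.missing("A"))       # prints [2]
```

In this sketch `missing` is what an anti-entropy round would ask another peer for, and `seen` is the duplicate filter applied before re-executing a gossiped operation.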
1st Setup
• Objective: Validate the space complexity analysis of LSEQ.
– When the editing behaviour is monotonic, LSEQ has a polylogarithmic upper bound on space complexity with respect to the number of insert operations.
– When the editing behaviour is random, LSEQ has a logarithmic space complexity.
• Setup:
– A single machine with 2 peers
– Peers globally produce 166 char/s to create a document of 500,000 chars
– Monotonic editing behaviour
Evaluation
2nd Setup
• Objective: Show that CRATE scales in terms of the number of peers.
– In other words, the size of the network does not impact the space complexity upper bound of messages.
• Setup:
– On Grid'5000, the number of peers grows from 2 to 450
– 166 char/s uniformly distributed among peers
3rd Setup
• Objective: Show that concurrency does not negatively impact the size of identifiers. Hence, scenarios without concurrency show the upper bound on the size of identifiers.
• Setup:
– A single machine emulates 10 peers running the CRATE application.
– 10,000 chars at 3 ins/s uniformly distributed among the peers
– 5 runs with the following approximate latencies: 0.02 ms, 100 ms, 500 ms, 1 s, and 10 s.
Conclusions
• LSEQ computes IDs for sequence CRDTs with a size upper bound of O((log n)^2)
• The number of peers and concurrency do not negatively impact the performance of CRATE
• One million users is reachable…
Nédelec, B., Molli, P., Mostefaoui, A., & Desmontils, E. (2013, September). LSEQ: an adaptive structure
for sequences in distributed collaborative editing. In Proceedings of the 2013 ACM symposium on
Document engineering (pp. 37-46). ACM.
Nédelec, B., Molli, P., Mostefaoui, A., & Desmontils, E. (2013). Concurrency Effects Over Variable-size
Identifiers in Distributed Collaborative Editing. In Proceedings of the International workshop on
Document Changes: Modeling, Detection, Storage and Visualization, Florence, Italy, September 10,
2013 (Vol. 1008, pp. 0-7).
Perspectives
• Deploy a 1M-user editor on a network of browsers
– 1M users
– Editing 1M characters…
• And measure performance
• In progress, nearly ready…