LECTURE 27
Python and Redis

PYTHON AND REDIS

Today, we’ll be covering a useful, but not entirely Python-centered, topic: the in-memory datastore Redis. We’ll start by introducing Redis itself and then discuss the Python clients available for interfacing with Redis. Then, we’ll talk about some unique Python projects that enhance Redis’ functionality.

REDIS

Redis (REmote DIctionary Server) is often known as a “data-structure server”. Essentially, it is a key-value store which supports strings, lists, sets, sorted sets, hashes, and more. Each of these types has a number of supported atomic operations, like appending to a string, pushing an element to a list, computing set intersection, etc.

Compared to traditional databases, Redis is extremely fast. This is because the whole dataset is stored in memory, as opposed to being written to disk on every transaction. The trade-off is that high write and read speed comes with the limitation that data sets can't be larger than memory.

REDIS USAGE

Commonly, Redis is used to buffer write-heavy small data, which is then inserted in big blobs into an SQL or other on-disk database.

Redis can also be partitioned into multiple instances or set up in a Master-Slave configuration. There is also a newer method of partitioning instances into a Redis Cluster; it was in a kind of “beta” stage when these slides were written and has been stable since Redis 3.0.

Redis is not completely volatile: it will write to disk, but it should not be relied upon for integrity. From the default Redis configuration file:

    save 900 1      # after 900 sec (15 min) if at least 1 key changed
    save 300 10     # after 300 sec (5 min) if at least 10 keys changed
    save 60 10000   # after 60 sec if at least 10000 keys changed

REDIS BASICS

Assuming you have Redis installed (as easy as “sudo apt-get install redis-server” on Ubuntu), you can get a command line to the local instance with redis-cli.

Note that the INCR operation is completely atomic: this prevents the undefined behavior that might arise from two connections to the server updating the same key at once.
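To see why an atomic INCR matters, the sketch below replays one specific interleaving of two clients by hand. It is only an illustration: FakeStore is a made-up in-memory stand-in, not Redis or any Redis client, and the function names are hypothetical.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a Redis server (illustration only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def incr(self, key):
        # Read, add one, and write back as a single step, like Redis INCR.
        self.data[key] = int(self.data.get(key, 0)) + 1
        return self.data[key]


def racy_increments(store, key):
    """Two clients each do GET then SET, with their steps interleaved."""
    a = int(store.get(key))   # client A reads the current value
    b = int(store.get(key))   # client B reads the same value before A writes
    store.set(key, a + 1)     # A writes value + 1
    store.set(key, b + 1)     # B writes value + 1 too: one increment is lost
    return store.get(key)


def atomic_increments(store, key):
    """Two clients each call INCR; each call is one indivisible step."""
    store.incr(key)
    store.incr(key)
    return store.get(key)


store = FakeStore()
store.set("count", 10)
print(racy_increments(store, "count"))    # prints 11
store.set("count", 10)
print(atomic_increments(store, "count"))  # prints 12
```

With GET followed by SET, both clients can read 10 and both write 11. With INCR there is no gap between the read and the write for another client to slip into.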
Note: the maximum allowed key size is 512 MB.

    ~$ redis-cli
    redis 127.0.0.1:6379> SET somekey "somevalue"
    OK
    redis 127.0.0.1:6379> GET somekey
    "somevalue"
    redis 127.0.0.1:6379> SET count 10
    OK
    redis 127.0.0.1:6379> INCR count
    (integer) 11
    redis 127.0.0.1:6379> INCR count
    (integer) 12
    redis 127.0.0.1:6379> SETNX count 20
    (integer) 0
    redis 127.0.0.1:6379> INCR count
    (integer) 13
    redis 127.0.0.1:6379> DEL count
    (integer) 1
    redis 127.0.0.1:6379> GET count
    (nil)

REDIS BASICS

    redis 127.0.0.1:6379> SET countdown "This message will self destruct in 20 secs"
    OK
    redis 127.0.0.1:6379> EXPIRE countdown 20
    (integer) 1
    redis 127.0.0.1:6379> TTL countdown
    (integer) 14
    redis 127.0.0.1:6379> TTL countdown
    (integer) 9
    redis 127.0.0.1:6379> TTL countdown
    (integer) 5
    redis 127.0.0.1:6379> TTL countdown
    (integer) 2
    redis 127.0.0.1:6379> TTL countdown
    (integer) -2
    redis 127.0.0.1:6379> TTL somekey
    (integer) -1

EXPIRE will cause the key given as its first argument to become unavailable after the given number of seconds. TTL will display the time-to-live. A -2 sentinel value indicates that the key has expired and no longer exists. A -1 sentinel value indicates that the key is not scheduled to expire.

REDIS BASICS

One of the most useful Redis structures is the list, an ordered sequence of values. Redis structures do not need to be “declared”: you can immediately start working with a key as a certain structure by using that structure's built-in commands, as long as the key is not already defined as a different type.
    redis 127.0.0.1:6379> RPUSH fruit "apple"
    (integer) 1
    redis 127.0.0.1:6379> RPUSH fruit "orange"
    (integer) 2
    redis 127.0.0.1:6379> LPUSH fruit "banana"
    (integer) 3
    redis 127.0.0.1:6379> LRANGE fruit 0 -1
    1) "banana"
    2) "apple"
    3) "orange"
    redis 127.0.0.1:6379> LRANGE fruit 1 2
    1) "apple"
    2) "orange"
    redis 127.0.0.1:6379> LLEN fruit
    (integer) 3
    redis 127.0.0.1:6379> LPOP fruit
    "banana"
    redis 127.0.0.1:6379> RPOP fruit
    "orange"
    redis 127.0.0.1:6379> LRANGE fruit 0 -1
    1) "apple"

REDIS BASICS

Like a Python set, Redis sets are unordered and may only have one instance of any item.

    redis 127.0.0.1:6379> SADD veggies "potato"
    (integer) 1
    redis 127.0.0.1:6379> SADD veggies "cucumber"
    (integer) 1
    redis 127.0.0.1:6379> SADD veggies "brussel sprouts"
    (integer) 1
    redis 127.0.0.1:6379> SMEMBERS veggies
    1) "potato"
    2) "brussel sprouts"
    3) "cucumber"
    redis 127.0.0.1:6379> SADD veggies "potato"
    (integer) 0
    redis 127.0.0.1:6379> SISMEMBER veggies "cucumber"
    (integer) 1
    redis 127.0.0.1:6379> SREM veggies "brussel sprouts"
    (integer) 1
    redis 127.0.0.1:6379> SADD other_veggies "carrot"
    (integer) 1
    redis 127.0.0.1:6379> SUNION veggies other_veggies
    1) "carrot"
    2) "potato"
    3) "cucumber"

REDIS BASICS

Sorted sets are like sets but each value in the set has an associated ordering value.

    redis 127.0.0.1:6379> ZADD hackers 1940 "Alan Kay"
    (integer) 1
    redis 127.0.0.1:6379> ZADD hackers 1906 "Grace Hopper"
    (integer) 1
    redis 127.0.0.1:6379> ZADD hackers 1953 "Richard Stallman"
    (integer) 1
    redis 127.0.0.1:6379> ZADD hackers 1916 "Claude Shannon"
    (integer) 1
    redis 127.0.0.1:6379> ZADD hackers 1912 "Alan Turing"
    (integer) 1
    redis 127.0.0.1:6379> ZRANGE hackers 2 4
    1) "Claude Shannon"
    2) "Alan Kay"
    3) "Richard Stallman"

REDIS BASICS

Hashes map string fields to string values.
    redis 127.0.0.1:6379> HSET user:1000 name "John Smith"
    (integer) 1
    redis 127.0.0.1:6379> HSET user:1000 email "john.smith@example.com"
    (integer) 1
    redis 127.0.0.1:6379> HSET user:1000 password "s3cret"
    (integer) 1
    redis 127.0.0.1:6379> HGETALL user:1000
    1) "name"
    2) "John Smith"
    3) "email"
    4) "john.smith@example.com"
    5) "password"
    6) "s3cret"

REDIS BASICS

    redis 127.0.0.1:6379> HMSET user:1001 name "Mary Jones" password "hidden" email "mjones@example.com"
    OK
    redis 127.0.0.1:6379> HGET user:1001 name
    "Mary Jones"
    redis 127.0.0.1:6379> HSET user:1000 visits 10
    (integer) 1
    redis 127.0.0.1:6379> HINCRBY user:1000 visits 1
    (integer) 11
    redis 127.0.0.1:6379> HINCRBY user:1000 visits 10
    (integer) 21
    redis 127.0.0.1:6379> HDEL user:1000 visits
    (integer) 1
    redis 127.0.0.1:6379> HINCRBY user:1000 visits 1
    (integer) 1

REDIS BASICS

Redis has commands for a simple Pub/Sub, where publishers publish messages to specific channels and subscribers receive the messages for the channels they have subscribed to.

• SUBSCRIBE channel1 [channel2 …]
• UNSUBSCRIBE channel1 [channel2 …]
• PUBLISH channel message: broadcast message to subscribers of channel.
• PSUBSCRIBE pattern1 [pattern2 …]: subscribe to all channels matching the given glob-style patterns (not full regular expressions).
• PUNSUBSCRIBE pattern1 [pattern2 …]

REDIS AND PYTHON

There are a number of Python Redis clients available, but the recommended client is redis-py. As usual, installation is as simple as

    $ sudo pip install redis

Docs are available at readthedocs.

REDIS-PY

Redis connections are made by instantiating the Redis or StrictRedis classes. Redis provides backwards compatibility with older versions of redis-py. (In redis-py 3.0 and later the two are the same class: StrictRedis is kept as an alias of Redis.)

    class redis.StrictRedis(host='localhost', port=6379, password=None, …)

This abstract class provides a Python interface to all Redis commands and an implementation of the Redis protocol.
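The Pub/Sub commands listed earlier have direct counterparts in redis-py. The following is a minimal sketch, not a definitive pattern: it assumes conn is a redis-py client connected to a reachable server, and the helper name collect_messages and its wait parameter are made up for illustration.

```python
def collect_messages(conn, channel, payloads, wait=1.0):
    """Subscribe to `channel`, publish `payloads` to it, and return what arrives.

    `conn` is assumed to be a redis-py client, for example:
        conn = redis.StrictRedis(host='localhost', port=6379,
                                 decode_responses=True)
    """
    pubsub = conn.pubsub()
    pubsub.subscribe(channel)                # SUBSCRIBE channel
    pubsub.get_message(timeout=wait)         # consume the subscribe confirmation

    for payload in payloads:
        conn.publish(channel, payload)       # PUBLISH channel message

    received = []
    while len(received) < len(payloads):
        message = pubsub.get_message(timeout=wait)
        if message is None:                  # nothing arrived within `wait` seconds
            break
        if message["type"] == "message":     # ignore non-data events
            received.append(message["data"])

    pubsub.unsubscribe(channel)              # UNSUBSCRIBE channel
    pubsub.close()
    return received
```

In a real application the publisher and the subscriber would normally be separate processes; they are combined here only to keep the sketch short. For pattern subscriptions, pubsub.psubscribe('news.*') plays the role of PSUBSCRIBE, and matching messages arrive with type "pmessage".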
REDIS-PY

The following script issues the same sequence of commands as the first redis-cli session (SET, GET, INCR, SETNX, DEL), this time through redis-py.

    import redis

    # decode_responses=True makes the client return str instead of bytes on Python 3
    conn = redis.StrictRedis(host='localhost', port=6379, decode_responses=True)

    conn.set('somekey', "somevalue")               # SET somekey "somevalue"
    print("Somekey:", conn.get('somekey'))         # GET somekey

    conn.set('count', 10)                          # SET count 10
    print("Count incremented:", conn.incr('count'))   # INCR count
    print("Count incremented:", conn.incr('count'))   # INCR count
    conn.setnx('count', 20)                        # SETNX count 20 (no effect, key exists)
    print("Count incremented:", conn.incr('count'))   # INCR count
    conn.delete('count')                           # DEL count
    print("Count:", conn.get('count'))             # GET count

    $ python redis_ex.py
    Somekey: somevalue
    Count incremented: 11
    Count incremented: 12
    Count incremented: 13
    Count: None

REDIS-PY

The redis-py client has implemented a lot of the redis-cli commands identically.

• Take lists, for example:
    RPUSH list item         ->  conn.rpush(list, item)
    LPUSH list item         ->  conn.lpush(list, item)
    RPOP list               ->  conn.rpop(list)
    LPOP list               ->  conn.lpop(list)
    LLEN list               ->  conn.llen(list)
    LRANGE list start end   ->  conn.lrange(list, start, end)

REDIS AS A REQUEST/RESPONSE QUEUE

Say we want to use Redis as a queue in our application. Generally speaking, say our application processes request items in the form of dictionary objects.

Let’s implement a class called RedisQueue_hash, which manages queues of request ids as well as hashes of the corresponding JSON objects.
QUEUES

We have one queue called “request”. Each request has an associated id. The ids are stored, in order, in a queue with the name ‘request:queue’. The JSON object corresponding to the id is stored in a hash with the key ‘request:<id>’.

    import redis

    class RedisQueue_hash(object):
        def __init__(self, **redis_kwargs):
            # Return str rather than bytes, so popped ids can be reused as keys on Python 3.
            redis_kwargs.setdefault("decode_responses", True)
            self.__db = redis.Redis(**redis_kwargs)

        def put(self, id, data_dict, queue):
            # Only the "request" queue is supported.
            if queue == "request":
                key = "request:" + str(id)
                # Store the dictionary as a hash; newer redis-py versions
                # prefer hset(key, mapping=data_dict) over hmset.
                self.__db.hmset(key, data_dict)
                self.__db.rpush("request:queue", str(id))
                return True
            else:
                return False

QUEUES

Some other functions we might implement to make our lives better are qsize, empty, pop (for returning the oldest item in the queue), and get for retrieving the contents of a hash. These are further methods of RedisQueue_hash.

        def qsize(self, queue):
            return self.__db.llen(queue + ":queue")

        def empty(self, queue):
            return self.qsize(queue) == 0

        def pop(self, queue):
            # Ids are pushed on the right, so the oldest id is on the left.
            return self.__db.lpop(queue + ":queue")

        def get(self, id, queue):
            # Fetch the hash, then delete its fields so each item is retrieved only once.
            key = queue + ":" + str(id)
            item = self.__db.hgetall(key)
            keys = self.__db.hkeys(key)
            for each in keys:
                self.__db.hdel(key, each)
            return item

QUEUES

    >>> import RedisQueue_hash as rqh
    >>> conn = rqh.RedisQueue_hash()
    >>> import uuid
    >>> conn.put(uuid.uuid4(), {'color': 'red', 'shape': 'star'}, 'request')
    True
    >>> conn.qsize('request')
    1
    >>> conn.empty('request')
    False
    >>> new_request = conn.pop('request')
    >>> print(new_request)
    9e471e07-b334-42e8-a643-22bad18297c8
    >>> conn.get(new_request, 'request')
    {'color': 'red', 'shape': 'star'}
    >>> conn.empty('request')
    True

CACHING INFO

You can also use Redis as a quick and dirty cache for session-related information.

For example, when using a database like Cassandra, determining the next available row key for a record requires keeping track of inserted records manually. Implement a counter in Redis to store the currently available row key. Applications can atomically increment the key and grab its new value with INCR.

REDIS USAGE

• Show latest item listings on your home page.
  -- LPUSH is used to insert a content ID at the head of the list stored at a key. LTRIM is used to limit the number of items in the list to 5000. Only if the user needs to page beyond this cache are they sent to the database.

• Leaderboards and related problems. A leaderboard is a set sorted by score. The ZADD command implements this directly; ZREVRANGE can be used to get the top 100 users by score, and ZRANK (or ZREVRANK, for highest-first order) can be used to get a user's rank.

• Order by user votes and time. This is a leaderboard like Reddit's, where the score changes over time. LPUSH + LTRIM are used to add an article to a list. A background task polls the list and recomputes the order, and ZADD is used to populate a sorted set in the new order.

• Implement expires on items. To keep items sorted by time, use the Unix time as the sort key (the score of a sorted set). The difficult task of expiring items is implemented by indexing on current_time + time_to_live. Another background worker makes queries using ZRANGE ... WITHSCORES and deletes timed-out entries.

• Counting stuff. Keeping stats of all kinds is common; say you want to know when to block an IP address. The INCRBY command makes it easy to keep counters atomically, GETSET can be used to atomically clear a counter, and the expire attribute can be used to tell when a key should be deleted.

• Unique N items in a given amount of time. This is the unique visitors problem and can be solved using SADD for each pageview. SADD won't add a member to a set if it already exists.

• Queues. Queues are everywhere in programming. In addition to the push and pop type commands, Redis has blocking queue commands, so a program can wait on work being added to the queue by another program.

Source: http://highscalability.com/blog/2011/7/6/11-common-web-use-cases-solved-in-redis.html

REDIS+PYTHON

Besides the Python interface for manipulating Redis manually, Python has some libraries that are built on top of Redis but specialized for certain tasks.
• RQ (Redis Queue): library for queuing jobs and processing them in the background with workers.
• leaderboard: Redis-backed leaderboard library.
• runlog: Redis-backed logging library for recurring jobs.
• Celery, a message-passing library, can be backed with Redis.
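Before reaching for a specialized library, note that several of the usage patterns listed earlier (latest items, leaderboards, unique visitors) take only a few lines of plain redis-py. The sketch below is illustrative only: the function names and key names are made up, conn is assumed to be a redis-py client, and add_score assumes redis-py 3.0 or later, where zadd takes a mapping of member to score.

```python
def record_latest(conn, item_id, key="latest:items", keep=5000):
    """Latest items: push the new id to the head, then cap the list length."""
    conn.lpush(key, item_id)          # LPUSH key item_id
    conn.ltrim(key, 0, keep - 1)      # LTRIM key 0 keep-1


def latest(conn, count=10, key="latest:items"):
    """Return the `count` most recently recorded ids, newest first."""
    return conn.lrange(key, 0, count - 1)


def add_score(conn, user, score, key="leaderboard"):
    """Leaderboard: set the user's score in a sorted set."""
    conn.zadd(key, {user: score})     # mapping form (redis-py >= 3.0, assumed)


def top_users(conn, n=100, key="leaderboard"):
    """Return the top `n` (user, score) pairs, highest score first."""
    return conn.zrevrange(key, 0, n - 1, withscores=True)


def record_visit(conn, visitor_id, key="visitors:today"):
    """Unique visitors: SADD returns 1 for a new member, 0 for a repeat."""
    return conn.sadd(key, visitor_id)


def unique_visitors(conn, key="visitors:today"):
    """Number of distinct visitors recorded under `key`."""
    return conn.scard(key)
```

A typical call sequence, given a connected client, would be record_latest(conn, 42) followed by latest(conn, count=10), or add_score(conn, 'alice', 1200) followed by top_users(conn, n=100). Each helper maps onto one or two Redis commands, so the atomicity guarantees discussed earlier apply per command, not to the pair LPUSH + LTRIM as a whole.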