LECTURE 27 Python and Redis

PYTHON AND REDIS
Today, we’ll be covering a useful – but not entirely Python-centered – topic: the in-memory datastore Redis.
We’ll start by introducing Redis itself and then discuss the Python clients available
for interfacing with Redis.
Then, we’ll talk about some unique Python projects that enhance Redis’ functionality.
REDIS
Redis is often described as a “data-structure server”. The name stands for REmote DIctionary Server.
Essentially, it is a key-value store which supports strings, lists, sets, sorted sets, hashes,
and more.
Each of these types has a number of supported atomic operations like appending to
a string, pushing an element to a list, computing set intersection, etc.
Compared to traditional databases, Redis is extremely fast. This is because the whole
dataset is stored in memory, as opposed to being written to disk on every
transaction. The trade-off is that the high read and write speed comes with the
limitation that the dataset can't be larger than memory.
REDIS USAGE
Commonly, Redis is used to buffer many small, write-heavy pieces of data, which are then
inserted as big blobs into an SQL or other on-disk database.
Redis can also be partitioned into multiple instances or set up in a Master-Slave
configuration. Newer versions also support partitioning instances into a Redis Cluster
(stable since Redis 3.0).
Redis is not completely volatile – it will write to disk but should not be relied upon for
integrity. From the default Redis configuration file:
# after 900 sec (15 min) if at least 1 key changed
save 900 1
# after 300 sec (5 min) if at least 10 keys changed
save 300 10
# after 60 sec if at least 10000 keys changed
save 60 10000
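As a rough mental model of these save points, and not Redis's actual implementation, each save line can be read as a (seconds, changes) pair, and a snapshot is due once any pair is satisfied. The names SAVE_POINTS and should_snapshot below are made up for this sketch.

```python
# Simplified model of the default "save <seconds> <changes>" rules.
# SAVE_POINTS and should_snapshot are illustrative names, not part of Redis.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def should_snapshot(seconds_since_last_save, keys_changed, save_points=SAVE_POINTS):
    """Return True if at least one save point is satisfied.

    A save point is satisfied when enough time has passed AND
    enough keys have changed since the last snapshot.
    """
    return any(
        seconds_since_last_save >= seconds and keys_changed >= changes
        for seconds, changes in save_points
    )


if __name__ == "__main__":
    # A busy server qualifies after one minute, a quiet one after 15 minutes.
    for elapsed, changed in [(61, 10000), (299, 9999), (900, 1)]:
        print(elapsed, changed, should_snapshot(elapsed, changed))
```

Anything written after the most recent snapshot can be lost on a crash, which is why the slide warns against relying on this mechanism for integrity.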
REDIS BASICS
Assuming you have Redis installed (as easy as “sudo apt-get install redis-server” on
Ubuntu), you can get a command line to the local instance with redis-cli.
Note that the INCR operation is completely atomic – this prevents the race conditions
that might arise when two connections to the server update the same key at once.
Note: the maximum allowed key size is 512 MB.
~$ redis-cli
redis 127.0.0.1:6379> SET somekey "somevalue"
OK
redis 127.0.0.1:6379> GET somekey
"somevalue"
redis 127.0.0.1:6379> SET count 10
OK
redis 127.0.0.1:6379> INCR count
(integer) 11
redis 127.0.0.1:6379> INCR count
(integer) 12
redis 127.0.0.1:6379> SETNX count 20
(integer) 0
redis 127.0.0.1:6379> INCR count
(integer) 13
redis 127.0.0.1:6379> DEL count
(integer) 1
redis 127.0.0.1:6379> GET count
(nil)
REDIS BASICS
redis 127.0.0.1:6379> SET countdown "This message will self destruct in 20 secs"
OK
redis 127.0.0.1:6379> EXPIRE countdown 20
(integer) 1
redis 127.0.0.1:6379> TTL countdown
(integer) 14
redis 127.0.0.1:6379> TTL countdown
(integer) 9
redis 127.0.0.1:6379> TTL countdown
(integer) 5
redis 127.0.0.1:6379> TTL countdown
(integer) 2
redis 127.0.0.1:6379> TTL countdown
(integer) -2
redis 127.0.0.1:6379> TTL somekey
(integer) -1
EXPIRE will cause the key given as its first argument to become
unavailable after the given number of seconds.
TTL will display the remaining time-to-live. A -2 sentinel value
indicates that the key does not exist (here, because it has expired). A -1 sentinel
value indicates that the key is not scheduled to expire.
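The same EXPIRE/TTL flow can be sketched from Python. This is a minimal sketch that assumes a redis-py version whose ttl() returns the raw integer reply (including the -1 and -2 sentinels); describe_ttl and set_with_countdown are hypothetical helper names, and main() assumes a server on localhost:6379.

```python
def describe_ttl(ttl_seconds):
    """Translate a TTL reply into a human-readable string."""
    if ttl_seconds == -2:
        return "key does not exist (expired or never set)"
    if ttl_seconds == -1:
        return "key exists but has no expiry"
    return "expires in %d seconds" % ttl_seconds


def set_with_countdown(client, key, value, seconds):
    """SET a key, schedule it to EXPIRE, and describe its current TTL."""
    client.set(key, value)
    client.expire(key, seconds)
    return describe_ttl(client.ttl(key))


def main():
    import redis  # redis-py; needs a running Redis server

    client = redis.Redis(host="localhost", port=6379)
    print(set_with_countdown(client, "countdown", "self-destructing message", 20))
    print(describe_ttl(client.ttl("no-such-key")))
```

main() is left uncalled here because it needs a live server; call it yourself to try the sketch against a local instance.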
REDIS BASICS
One of the most useful Redis structures is
the list, an ordered sequence of values.
Redis structures do not need to be
“declared”: you can immediately start
working with a key as a certain structure
by using that structure's commands, as long as
the key does not already hold a value of a different type.
redis 127.0.0.1:6379> RPUSH fruit "apple"
(integer) 1
redis 127.0.0.1:6379> RPUSH fruit "orange"
(integer) 2
redis 127.0.0.1:6379> LPUSH fruit "banana"
(integer) 3
redis 127.0.0.1:6379> LRANGE fruit 0 -1
1) "banana"
2) "apple"
3) "orange"
redis 127.0.0.1:6379> LRANGE fruit 1 2
1) "apple"
2) "orange"
redis 127.0.0.1:6379> LLEN fruit
(integer) 3
redis 127.0.0.1:6379> LPOP fruit
"banana"
redis 127.0.0.1:6379> RPOP fruit
"orange"
redis 127.0.0.1:6379> LRANGE fruit 0 -1
1) "apple"
REDIS BASICS
Like a Python set, Redis sets are
unordered and may only have
one instance of any item.
redis 127.0.0.1:6379> SADD veggies "potato"
(integer) 1
redis 127.0.0.1:6379> SADD veggies "cucumber"
(integer) 1
redis 127.0.0.1:6379> SADD veggies "brussel sprouts"
(integer) 1
redis 127.0.0.1:6379> SMEMBERS veggies
1) "potato"
2) "brussel sprouts"
3) "cucumber"
redis 127.0.0.1:6379> SADD veggies "potato"
(integer) 0
redis 127.0.0.1:6379> SISMEMBER veggies "cucumber"
(integer) 1
redis 127.0.0.1:6379> SREM veggies "brussel sprouts"
(integer) 1
redis 127.0.0.1:6379> SADD other_veggies "carrot"
(integer) 1
redis 127.0.0.1:6379> SUNION veggies other_veggies
1) "carrot"
2) "potato"
3) "cucumber"
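For comparison, here is a sketch of the same set commands issued through a redis-py style client (sadd, sismember, srem and sunion are redis-py methods). The function name replay_veggie_session is made up for illustration, and main() assumes a server on localhost:6379.

```python
def replay_veggie_session(client):
    """Issue the set commands from the session above and collect the replies."""
    client.sadd("veggies", "potato", "cucumber", "brussel sprouts")
    added_again = client.sadd("veggies", "potato")       # 0: already a member
    is_member = client.sismember("veggies", "cucumber")  # truthy: it is in the set
    client.srem("veggies", "brussel sprouts")
    client.sadd("other_veggies", "carrot")
    union = client.sunion("veggies", "other_veggies")
    return added_again, is_member, union


def main():
    import redis  # redis-py; needs a running Redis server

    # decode_responses=True returns str instead of bytes on Python 3
    client = redis.Redis(host="localhost", port=6379, decode_responses=True)
    print(replay_veggie_session(client))
```

Like the Python set type, the union comes back unordered, so the order of members may differ from the redis-cli output.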
REDIS BASICS
Sorted sets are like sets but
each value in the set has an
associated ordering value, called its score.
redis 127.0.0.1:6379> ZADD hackers 1940 "Alan Kay"
(integer) 1
redis 127.0.0.1:6379> ZADD hackers 1906 "Grace Hopper"
(integer) 1
redis 127.0.0.1:6379> ZADD hackers 1953 "Richard Stallman"
(integer) 1
redis 127.0.0.1:6379> ZADD hackers 1916 "Claude Shannon"
(integer) 1
redis 127.0.0.1:6379> ZADD hackers 1912 "Alan Turing"
(integer) 1
redis 127.0.0.1:6379> ZRANGE hackers 2 4
1) "Claude Shannon"
2) "Alan Kay"
3) "Richard Stallman"
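The sorted-set session can be sketched in Python as well. This assumes redis-py 3.0 or later, where zadd takes a {member: score} mapping; HACKERS and load_hackers are names invented for this example, and main() assumes a server on localhost:6379.

```python
# Birth years act as scores, so the set is ordered oldest first.
HACKERS = {
    "Alan Kay": 1940,
    "Grace Hopper": 1906,
    "Richard Stallman": 1953,
    "Claude Shannon": 1916,
    "Alan Turing": 1912,
}


def load_hackers(client, key="hackers"):
    """ZADD all members in one call, then return ranks 2 through 4 (0-based)."""
    client.zadd(key, HACKERS)
    return client.zrange(key, 2, 4)


def main():
    import redis  # redis-py; needs a running Redis server

    client = redis.Redis(host="localhost", port=6379, decode_responses=True)
    print(load_hackers(client))
```

ZRANGE indexes are zero-based and inclusive, so 2 4 skips the two earliest birth years and returns the remaining three members.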
REDIS BASICS
Hashes map string fields to string
values.
redis 127.0.0.1:6379> HSET user:1000 name "John Smith"
(integer) 1
redis 127.0.0.1:6379> HSET user:1000 email "john.smith@example.com"
(integer) 1
redis 127.0.0.1:6379> HSET user:1000 password "s3cret"
(integer) 1
redis 127.0.0.1:6379> HGETALL user:1000
1) "name"
2) "John Smith"
3) "email"
4) "john.smith@example.com"
5) "password"
6) "s3cret"
REDIS BASICS
redis 127.0.0.1:6379> HMSET user:1001 name "Mary Jones" password "hidden" email "mjones@example.com"
OK
redis 127.0.0.1:6379> HGET user:1001 name
"Mary Jones"
redis 127.0.0.1:6379> HSET user:1000 visits 10
(integer) 1
redis 127.0.0.1:6379> HINCRBY user:1000 visits 1
(integer) 11
redis 127.0.0.1:6379> HINCRBY user:1000 visits 10
(integer) 21
redis 127.0.0.1:6379> HDEL user:1000 visits
(integer) 1
redis 127.0.0.1:6379> HINCRBY user:1000 visits 1
(integer) 1
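A per-user visit counter is a typical use of HINCRBY. The sketch below uses the redis-py methods hset, hincrby and hgetall; the helper names (user_key, create_user, record_visit) are hypothetical, and main() assumes a server on localhost:6379.

```python
def user_key(user_id):
    """Build the hash key used in the examples above, e.g. 'user:1000'."""
    return "user:%d" % user_id


def create_user(client, user_id, name, email):
    """Store the profile fields in a hash, one HSET per field."""
    key = user_key(user_id)
    client.hset(key, "name", name)
    client.hset(key, "email", email)
    return key


def record_visit(client, user_id, amount=1):
    """Atomically add to the 'visits' field and return the new total."""
    return client.hincrby(user_key(user_id), "visits", amount)


def main():
    import redis  # redis-py; needs a running Redis server

    client = redis.Redis(host="localhost", port=6379, decode_responses=True)
    key = create_user(client, 1000, "John Smith", "john.smith@example.com")
    record_visit(client, 1000)
    print(client.hgetall(key))
```

As in the last command of the session, HINCRBY on a missing field starts counting from zero, so no separate initialization step is needed.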
REDIS BASICS
Redis has commands for a simple Pub/Sub, where publishers publish messages to
specific channels and subscribers receive the messages of the channels they have subscribed to.
• SUBSCRIBE channel1 [channel2, …]
• UNSUBSCRIBE channel1 [channel2, …]
• PUBLISH channel message – Broadcast message to subscribers of channel.
• PSUBSCRIBE pattern1 [pattern2, …] – subscribe to channels matching glob-style patterns.
• PUNSUBSCRIBE pattern1 [pattern2, …]
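The commands above map onto redis-py roughly as follows. This is a sketch rather than a complete subscriber: announce and drain are made-up helper names, the channel names are examples, and main() assumes a server on localhost:6379. It relies on the redis-py calls publish(), pubsub(), subscribe(), psubscribe() and get_message().

```python
def announce(client, channel, text):
    """PUBLISH text to channel; the reply is the number of receivers."""
    return client.publish(channel, text)


def drain(pubsub):
    """Collect the payloads of all currently pending message events.

    Subscription confirmations are skipped; only real messages
    (from SUBSCRIBE or PSUBSCRIBE) are returned.
    """
    payloads = []
    while True:
        event = pubsub.get_message()
        if event is None:  # nothing more waiting right now
            break
        if event["type"] in ("message", "pmessage"):
            payloads.append(event["data"])
    return payloads


def main():
    import time
    import redis  # redis-py; needs a running Redis server

    client = redis.Redis(host="localhost", port=6379, decode_responses=True)
    pubsub = client.pubsub()
    pubsub.subscribe("news")        # exact channel name
    pubsub.psubscribe("alerts.*")   # glob-style pattern
    announce(client, "news", "hello")
    announce(client, "alerts.disk", "disk almost full")
    time.sleep(0.1)                 # give the server a moment to deliver
    print(drain(pubsub))
```

Messages are delivered only to clients that are subscribed at publish time; nothing is stored for subscribers that connect later.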
REDIS AND PYTHON
There are a number of Python Redis clients available but the recommended client is
redis-py.
As usual, installation is as simple as
$ sudo pip install redis
Docs are available at readthedocs.
REDIS-PY
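Redis connections are made by instantiating the Redis or StrictRedis classes. In older
versions of redis-py, the Redis class provides backwards compatibility with legacy versions
of the client; since redis-py 3.0, StrictRedis is simply an alias of Redis.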
class redis.StrictRedis(host='localhost', port=6379, password=None, …)
This abstract class provides a Python interface to all Redis commands and an
implementation of the Redis protocol.
REDIS-PY
# The same sequence as the earlier redis-cli session, issued through redis-py.
import redis

# decode_responses=True makes redis-py return str instead of bytes on Python 3
conn = redis.StrictRedis(host='localhost', port=6379, decode_responses=True)
conn.set('somekey', "somevalue")
print("Somekey:", conn.get('somekey'))
conn.set('count', 10)
print("Count incremented:", conn.incr('count'))
print("Count incremented:", conn.incr('count'))
conn.setnx('count', 20)  # no effect: 'count' already exists
print("Count incremented:", conn.incr('count'))
conn.delete('count')
print("Count:", conn.get('count'))
$ python redis_ex.py
Somekey: somevalue
Count incremented: 11
Count incremented: 12
Count incremented: 13
Count: None
REDIS-PY
The redis-py client has implemented a lot of the redis-cli commands identically.
Take lists, for example:
• RPUSH list item → conn.rpush(list, item), LPUSH list item → conn.lpush(list, item)
• RPOP list → conn.rpop(list), LPOP list → conn.lpop(list)
• LLEN list → conn.llen(list)
• LRANGE list start end → conn.lrange(list, start, end)
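To make the mapping concrete, the fruit list session from the earlier slide can be replayed with these methods. The function name replay_fruit_session is hypothetical, and main() assumes a server on localhost:6379.

```python
def replay_fruit_session(client, key="fruit"):
    """Run the RPUSH/LPUSH/LRANGE/LLEN/LPOP/RPOP sequence and collect replies."""
    client.rpush(key, "apple")
    client.rpush(key, "orange")
    client.lpush(key, "banana")           # goes to the head of the list
    return {
        "all": client.lrange(key, 0, -1),  # -1 means "through the last item"
        "middle": client.lrange(key, 1, 2),
        "length": client.llen(key),
        "left": client.lpop(key),          # removes the head
        "right": client.rpop(key),         # removes the tail
        "remaining": client.lrange(key, 0, -1),
    }


def main():
    import redis  # redis-py; needs a running Redis server

    client = redis.Redis(host="localhost", port=6379, decode_responses=True)
    client.delete("fruit")  # start from an empty list
    print(replay_fruit_session(client))
```

The dictionary entries are evaluated top to bottom, so the pops happen after the first two range queries, just as in the redis-cli session.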
REDIS AS A REQUEST/RESPONSE QUEUE
Say we want to use Redis as a queue in our application. Generally speaking, say our
application processes request items in the form of dictionary objects.
Let’s implement a class called RedisQueue_hash, which manages a queue of request
ids as well as hashes of the corresponding JSON objects.
QUEUES
We have one queue called “request”.
Each request has an associated id. The
ids are stored, in order, in a queue with
the name ‘request:queue’.
The JSON object corresponding to the
id is stored in a hash with the key
‘request:<id>’.
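import redis

class RedisQueue_hash(object):
    """A queue of request ids plus one hash per request."""

    def __init__(self, **redis_kwargs):
        # Keyword arguments (host, port, ...) are passed straight to redis-py.
        self.__db = redis.Redis(**redis_kwargs)

    def put(self, id, data_dict, queue):
        if queue == "request":
            key = "request:" + str(id)
            # Store the fields, then enqueue the id at the tail of the list.
            # Newer redis-py releases deprecate hmset in favor of hset(key, mapping=...).
            self.__db.hmset(key, data_dict)
            self.__db.rpush("request:queue", str(id))
            return True
        else:
            return False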
QUEUES
Some other functions we might implement
to make our lives better are qsize, empty,
pop (for returning the oldest item in the
queue), and get for retrieving the contents
of a hash.
    def qsize(self, queue):
        # Number of ids waiting in the queue.
        return self.__db.llen(queue + ":queue")

    def empty(self, queue):
        return self.qsize(queue) == 0

    def pop(self, queue):
        # Remove and return the oldest id (the head of the list).
        return self.__db.lpop(queue + ":queue")

    def get(self, id, queue):
        # Fetch the hash for this id, then delete its fields so the
        # request is consumed only once.
        key = queue + ":" + str(id)
        item = self.__db.hgetall(key)
        for each in self.__db.hkeys(key):
            self.__db.hdel(key, each)
        return item
QUEUES
>>> import RedisQueue_hash as rqh
>>> conn = rqh.RedisQueue_hash(decode_responses=True)  # return str instead of bytes on Python 3
>>> import uuid
>>> conn.put(uuid.uuid4(), {'color': 'red', 'shape': 'star'}, 'request')
True
>>> conn.qsize('request')
1
>>> conn.empty('request')
False
>>> new_request = conn.pop('request')
>>> print(new_request)
9e471e07-b334-42e8-a643-22bad18297c8
>>> conn.get(new_request, 'request')
{'color': 'red', 'shape': 'star'}
>>> conn.empty('request')
True
CACHING INFO
You can also use Redis as a quick and dirty cache for session-related information. For
example, when using a database like Cassandra, determining the next available row
key for a record requires keeping track of inserted records manually.
One solution is to implement a counter in Redis that stores the currently available row key.
Applications can atomically increment the key and grab it with INCR.
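A minimal sketch of such a counter follows. The key name "rowkey:next" and the helper next_row_key are assumptions made for this example, and main() assumes a server on localhost:6379.

```python
def next_row_key(client, counter_key="rowkey:next"):
    """Reserve and return the next row key.

    INCR both increments and returns the value in one atomic step,
    so two applications can never be handed the same key.
    """
    return client.incr(counter_key)


def main():
    import redis  # redis-py; needs a running Redis server

    client = redis.Redis(host="localhost", port=6379)
    row_key = next_row_key(client)
    print("insert the new record under row key", row_key)
```

A separate GET followed by SET would leave a window in which two clients read the same value, which is exactly what the single INCR avoids.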
REDIS USAGE
• Show latest item listings on your home page. LPUSH is used to insert a content ID at the head of the list stored at a key.
LTRIM is used to limit the number of items in the list to 5000. Only if the user needs to page beyond this cache are they sent
to the database.
• Leaderboards and related problems. A leaderboard is a set sorted by score. The ZADD command implements this directly,
the ZREVRANGE command can be used to get the top 100 users by score, and ZRANK can be used to get a user's rank.
• Order by user votes and time. This is a leaderboard like Reddit where the score changes over time. LPUSH + LTRIM are used to
add an article to a list. A background task polls the list and recomputes its order, and ZADD is used to populate the
list in the new order.
• Implement expires on items. To keep a list sorted by time, use Unix time as the sort key (the score). The difficult task of expiring items is
implemented by indexing current_time+time_to_live. Another background worker is used to make queries using ZRANGE ... WITHSCORES
and delete timed-out entries.
• Counting stuff. Keeping stats of all kinds is common; say you want to know when to block an IP address. The INCRBY command
makes it easy to atomically keep counters, GETSET can atomically clear a counter, and the expire attribute can be used to tell when
a key should be deleted.
• Unique N items in a given amount of time. This is the unique visitors problem and can be solved using SADD for each pageview.
SADD won't add a member to a set if it already exists.
• Queues. Queues are everywhere in programming. In addition to the push and pop type commands, Redis has blocking queue
commands so a program can wait on work being added to the queue by another program.
http://highscalability.com/blog/2011/7/6/11-common-web-use-cases-solved-in-redis.html
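The blocking queue commands mentioned in the last item can be sketched with redis-py's blpop, which waits until an item arrives or a timeout passes. The queue name "jobs", the JSON encoding of jobs, and the helpers enqueue and wait_for_job are assumptions made for this illustration; main() assumes a server on localhost:6379.

```python
import json


def enqueue(client, queue, job):
    """Producer side: append a JSON-encoded job to the tail of the list."""
    return client.rpush(queue, json.dumps(job))


def wait_for_job(client, queue, timeout=5):
    """Consumer side: block up to `timeout` seconds for the oldest job.

    BLPOP replies with a (queue_name, payload) pair, or with nothing
    if the timeout passes while the queue is still empty.
    """
    reply = client.blpop(queue, timeout=timeout)
    if reply is None:
        return None
    _queue_name, payload = reply
    return json.loads(payload)


def main():
    import redis  # redis-py; needs a running Redis server

    client = redis.Redis(host="localhost", port=6379)
    enqueue(client, "jobs", {"task": "resize", "image": "cat.png"})
    print(wait_for_job(client, "jobs"))
```

In a real deployment the producer and the consumer would be separate programs; the consumer then sleeps inside blpop instead of polling the list in a loop.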
REDIS+PYTHON
Besides the Python interface for manually manipulating Redis yourself, Python has
some libraries that are built on top of Redis but specialized for certain tasks.
• RQ (Redis Queue) – library for queuing jobs and processing them in the background
with workers.
• leaderboard – Redis-backed leaderboard library.
• runlog – Redis-backed logging library for recurring jobs.
• Celery, a distributed task queue, can use Redis as its message broker.