Over the weekend, I received an email from someone asking about database caching. He had heard that caching data in memory can boost an application’s performance, since the data is pulled from memory instead of from the file system on every request. Because file I/O can be very expensive, he was right. Throughout the email, I felt like he understood the benefits of caching, but by the end he explained that he did not know how caching is actually accomplished. In my reply, I told him how some of the larger players in the world use systems like Memcached to gain tremendous performance by caching their data. That was not exactly what he was looking for, though. He was interested in how the internals of systems like Memcached could be accomplished in code. So, I put together a super simple application in Python that demonstrates how in-memory caching can enhance the performance of an application, and now I want to share that example with you.

To begin with, I want to mention that this is not exactly the most efficient way to accomplish in-memory caching. Instead, this article will demonstrate a super simple mechanism for caching small amounts of data in-memory in a way that can be easily understood and can be extended when needed.

For this example of database caching, every time we insert a new record into our database, we will also insert it into our cache. Every time we update a record in the database, we will also update it in our cache. Every time we need to retrieve a record, we will retrieve it from the cache instead of the database, going to the database only when the record is not in the cache yet. With this approach, inserts and updates take a bit longer, since they have to upsert (update / insert) both the database and the cache. In exchange, retrieving data is much faster because a cached read involves no file I/O. Besides, let’s face it, the main purpose of caching is to return data to the user faster. Plus, if data isn’t upserted very often, this mechanism pays for itself within the first few retrievals, since each one is much faster than fetching the data directly from the database.

To extend this example even further, you could implement a mechanism that periodically retrieves new and updated data from the cache and writes it to the database. This would reduce the time taken for upserts at runtime. However, keep in mind that if your application goes down between database updates, any data that has been upserted in your cache but not yet persisted to your database will be lost. So, if you use this approach, scan your cache for updates often enough that the risk of lost data stays at an acceptable level. Don’t scan your cache only once an hour if that means you could lose more data than you can afford. Instead, shorten the time between scans, or use this approach only for data that is not very important or can be retrieved again later. Also make sure each cache entry includes a timestamp or other flag that indicates when the data was last persisted to your database, or that it has not been persisted at all. You’ll use this flag to fetch only the entries that need to be persisted rather than all of the data every time.
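To make the periodic approach a little more concrete, here is a minimal sketch of how the flag could work. It is only an illustration under some assumptions: the “fake_db” dictionary, the “write_to_db” placeholder, and the “persisted_at” field are names I made up for this sketch, and the code is written for Python 3.

```python
import time

cache = {}     # key -> {"record": ..., "persisted_at": timestamp or None}
fake_db = {}   # hypothetical stand-in for the real database

def save(key, record):
    """Upsert the cache only and flag the entry as not yet persisted."""
    if key is not None:
        cache[key] = {"record": record, "persisted_at": None}

def write_to_db(key, record):
    """Placeholder: add call to insert or update database."""
    fake_db[key] = record

def flush():
    """Persist only the entries flagged as unsaved and return their keys."""
    written = []
    for key, entry in cache.items():
        if entry["persisted_at"] is None:
            write_to_db(key, entry["record"])
            entry["persisted_at"] = time.time()
            written.append(key)
    return written
```

You would call “flush” on whatever schedule fits your tolerance for lost data. Note that with this layout, a fetch would need to return cache[key]["record"] rather than the whole entry.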

For this example, we will be using a dictionary to house all of our cached data. Dictionaries provide an easy way to store and retrieve data using a key. When working with data in a database, the easiest way to retrieve a record from a table is to fetch it using the primary key. Primary keys are unique identifiers that point to a specific record / row in a table. So, for the purposes of this tutorial, a dictionary behaves much like a table: each entry plays the role of a row, and its key plays the role of the primary key. The first thing you will need to do is create a dictionary. To keep things easy to follow, I will name my dictionary “cache”.

cache = {}

Next, you will need to create methods for saving, deleting, and retrieving data from your database and cache. The first method you will need to build is for saving data. The save method will accept 2 parameters. The first parameter is the key under which you want to store the record in the cache. The best practice here is to use the same fields that make up your primary key in the database. If you use an auto-generated key in your database, you will need to first insert the new record into the database and immediately retrieve it after the insert so that you have the auto-generated key. Then, use that key to create a new entry in your dictionary. The second parameter is the record itself. Since this article’s focus is on working with a cache, I will be leaving out the database specific parts but will make sure to note where communication with the database should occur. Here is an example of our save method:

def save(key, record):
    if key is not None:
        cache[key] = record
    # Add call to insert or update database

In the code above, you’ll notice that instead of inserting the record into the database and then fetching it immediately to get the primary key (which we’ve assumed is auto-generated by the database), I chose to check whether the key is None and update the cache only if it is not. This simple method takes care of inserting new records into our cache as well as updating existing ones. Depending on how the primary keys are created for your application, you will need to arrange the cache and database communications accordingly. If your primary keys are composite keys, meaning the primary key consists of multiple fields, you should combine those fields (for example, by concatenating them with a separator) to generate the key that you’ll use in your cache.
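Here is a rough sketch of both situations, in case it helps. It assumes SQLite (through Python’s built-in sqlite3 module) as the database, and the “notes” table, the “save_new” function, and the “composite_key” helper are hypothetical names used only for illustration. The code is written for Python 3.

```python
import sqlite3

cache = {}

# Hypothetical in-memory database and table, used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)")

def save_new(body):
    """Insert first, then cache the record under the key the database generated."""
    cursor = conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    conn.commit()
    key = cursor.lastrowid  # id that SQLite assigned to the new row
    cache[key] = body
    return key

def composite_key(*fields):
    """Build one cache key from several primary key fields."""
    return "|".join(str(field) for field in fields)
```

One caveat on the composite key: joining with a separator can produce collisions if a field itself contains that separator. Since Python tuples can be used as dictionary keys, using the tuple of fields directly is another option.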

The next method you will need to build is the method that will delete records from your database and cache. This method will only need 1 parameter, the key of the record you want to delete.

def delete(key):
    if key in cache:
        cache.pop(key, 0)
    # Add call to delete record from database

Again, in the code above, I have chosen to leave out the call to delete the record from the database. I’ll leave this up to you. The code I did provide simply checks whether the key being passed in exists in the cache (dictionary) and removes it if it does. To remove an item from a dictionary, you use the “pop” function and pass it the key of the record you want to remove. The second parameter is not a count of items to remove. It is a default value that “pop” returns when the key is missing, which prevents a KeyError. Since we already check that the key exists before calling “pop”, the zero is never actually used here, and calling “pop” with just the key would behave the same way.
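If you want to see for yourself what the second argument to “pop” does, here is a tiny experiment. It only uses Python’s built-in dictionary behavior, and the sample record is made up.

```python
cache = {"test": "some record"}

removed = cache.pop("test", 0)   # key exists: the entry is removed and its value returned
missing = cache.pop("test", 0)   # key is gone: the default (0) is returned instead

try:
    cache.pop("test")            # no default given: a missing key raises KeyError
    raised = False
except KeyError:
    raised = True
```

In other words, the second argument acts as a fallback return value, and “pop” only ever removes the one entry that matches the key.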

The next method you will be implementing is the function that returns items from the cache. In this method, we will take care of 2 things. Not only will we retrieve items from the cache, we will also retrieve items from the database that are not yet in our cache and add them to the cache as they are retrieved. The reason for this is that since we are not persisting our cache to the filesystem, we need a way to rebuild it. We can either rebuild the whole cache every time the application starts up, or we can use this approach and rebuild the cache on-the-fly. Both approaches have their pros and cons. The former will cause the application to take much longer to start up. The latter will cause each record to be retrieved directly from the database the first time it is requested and from the cache from then on. I like the second approach as it does not require me to cache the entire database every time the app starts up. Instead, it only caches data that’s actually being used, which keeps the size of my cache down.
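To show the difference between the two approaches side by side, here is a small sketch. The “fake_db” dictionary stands in for a real database table, and “warm_cache” and “fetch_lazy” are hypothetical names I picked for this comparison only.

```python
cache = {}

# Hypothetical stand-in for a database table: primary key -> record.
fake_db = {1: "alpha", 2: "beta", 3: "gamma"}

def warm_cache():
    """Eager approach: copy every record into the cache at startup."""
    for key, record in fake_db.items():
        cache[key] = record

def fetch_lazy(key):
    """Lazy approach: load a record into the cache the first time it is requested."""
    if key not in cache:
        cache[key] = fake_db[key]
    return cache[key]
```

With the lazy version, the cache only ever holds the records that were actually requested. With the eager version, it holds the whole table right after startup.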

To implement the “fetch” method, you only need to pass in one parameter, the key. Then, you need to check whether that key exists in the cache and return its record if it does. If it does not exist in the cache yet, you need to retrieve the record from the database, store it in the cache, and then return it to the user. Again, I am not implementing any of the database specific code. So, to keep things simple, I will build a second method that is responsible for retrieving data from the database. But, instead of actually communicating with a database, it will simply return a string containing the key and the current timestamp, which I will explain shortly.

from datetime import datetime

def fetch(key):
    # On a cache miss, load the record from the database and cache it
    if key not in cache:
        ret = get_from_db(key)
        cache[key] = ret
    return cache[key]

def get_from_db(key):
    """ Add code to fetch record from database """
    return "This is a test for key '%s' at %s" % (key, datetime.now())
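The “fetch” method above assumes that every key you ask for exists in the database. If you want to guard against missing records, one possible variation looks like the sketch below. The “fake_db” dictionary and the “default” parameter are assumptions of mine for illustration, and here “get_from_db” is assumed to return None when no record matches.

```python
cache = {}

# Hypothetical stand-in for a database table.
fake_db = {"test": "a stored record"}

def get_from_db(key):
    """Stand-in for a database query; returns None when no record matches."""
    return fake_db.get(key)

def fetch(key, default=None):
    """Return the cached record, loading it on a miss; never cache a missing record."""
    if key not in cache:
        record = get_from_db(key)
        if record is None:
            return default
        cache[key] = record
    return cache[key]
```

Not caching the miss means a record that is added to the database later will still be picked up on the next fetch.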

That’s it. You now have everything you need for a super simple database caching mechanism using Python. So, let’s jump straight into testing this code. The first step is to put something in the cache. Lucky for us, the “fetch” method already does this. So, go ahead and call “fetch” with any key (if using this code with an actual database, be sure to use a key that exists in the database, since we haven’t incorporated any kind of error handling).

print(fetch('test'))

The first time you run the previous line, the key does not yet exist in your cache, so the “get_from_db” method is invoked. It returns a string with the current timestamp in it, and that string is stored in our cache. Since we are working with a timestamp as example data, we can use the “time.sleep” method to pause the app for a short time and then re-run the previous command to see that the timestamp did not change, which shows that the second call was served from the cache. I’ve chosen to have my app pause for 5 seconds before re-running the previous command.

time.sleep(5)
print(fetch('test'))

Next, you will want to test the “save” method by passing the same key as before, but with different record information. Then, you will want to re-run the previous command to see that the cache did in fact get updated.

save('test', 'Hello world %s' % datetime.now())
print(fetch('test'))

As you can see, the cache did get updated. So, the next thing you will need to test is the “delete” method for removing items from the cache as well as the database. After you run the “delete” command, you will again want to run the “fetch” method so that you can see that the record did get removed from the cache and re-built in the “get_from_db” method by checking that the timestamp in the string has changed.

delete('test')
print(fetch('test'))

Here are the test results I got when I just ran the code.

This is a test for key ‘test’ at 2012-11-19 2:20:03.956000
This is a test for key ‘test’ at 2012-11-19 2:20:03.956000
Hello world 2012-11-19 2:20:08.962000
This is a test for key ‘test’ at 2012-11-19 2:20:08.964000

Below is the code that made all of this possible. If you have any questions or comments, please let us know in the comments below.

from datetime import datetime
import time

cache = {}

def get_from_db(key):
    """ Add code to fetch record from database """
    return "This is a test for key '%s' at %s" % (key, datetime.now())

def save(key, record):
    if key is not None:
        cache[key] = record
    # Add call to insert or update database

def delete(key):
    if key in cache:
        cache.pop(key, 0)
    # Add call to delete record from database

def fetch(key):
    if key not in cache:
        ret = get_from_db(key)
        cache[key] = ret
    return cache[key]

if __name__ == '__main__':
    print(fetch('test'))
    time.sleep(5)
    print(fetch('test'))
    save('test', 'Hello world %s' % datetime.now())
    print(fetch('test'))
    delete('test')
    print(fetch('test'))
