I'm doing bulk data processing and, for speed's sake, I first store the data in Redis before dumping it into a PostgreSQL database at 2-minute intervals. I use a Redis hash to store each row of data, and the hash fields correspond to the columns in the database table.
I use redis.scan() to get the list of Redis hashes storing the rows of data, then redis.hgetall() to fetch the data in each hash. From there I build an SQL INSERT statement in SQLAlchemy and perform a bulk insert into the database.
The issue I face is that I first have to extract the keys containing my data using SCAN:
redis_match = 'data:row_keys:*'
row_keys = rdb.scan_iter(match=redis_match, count=limit_no)
From there I fetch all data in each hash:
for index, row_id in enumerate(row_keys):
    row_data = rdb.hgetall(row_id)
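As an aside on the fetch step: one network round trip per hash is a large part of the cost here, and redis-py's pipeline() can batch all the HGETALL commands into a single round trip. A minimal sketch of that idea follows; `fetch_rows` only assumes the client exposes pipeline()/hgetall() as redis-py does, so a tiny in-memory stand-in (not a real Redis client) is used to make the sketch runnable without a server.

```python
def fetch_rows(client, keys):
    """Fetch many hashes in one round trip by queueing HGETALLs on a pipeline."""
    pipe = client.pipeline()
    for key in keys:
        pipe.hgetall(key)
    # execute() returns one result per queued command, in order
    return dict(zip(keys, pipe.execute()))


# --- minimal in-memory stand-in so the sketch runs without a Redis server ---
class _StubPipeline:
    def __init__(self, store):
        self._store, self._queued = store, []

    def hgetall(self, key):
        self._queued.append(key)  # queue the command, as redis-py does
        return self

    def execute(self):
        return [self._store[k] for k in self._queued]


class _StubRedis:
    def __init__(self, store):
        self._store = store

    def pipeline(self):
        return _StubPipeline(self._store)


store = {"data:row_keys:1": {"col_a": "1"}, "data:row_keys:2": {"col_a": "2"}}
rows = fetch_rows(_StubRedis(store), list(store))
# rows now maps each key to its hash, fetched via a single execute()
```

With a real redis.Redis client the same `fetch_rows` call replaces the per-key hgetall() loop above.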
The row_data comes back as key:value pairs, but it's stored in byte form, so I incur extra overhead decoding each key and value manually:
import ast

decoded = {}
for key, value in row_data.items():
    key = ast.literal_eval(key.decode('unicode_escape'))
    value = ast.literal_eval(value.decode('unicode_escape'))
    decoded[key] = value
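For what it's worth, redis-py can do the decoding for you: constructing the client with decode_responses=True makes hgetall() return str keys and values directly, so the ast.literal_eval loop disappears. A sketch (host/port are placeholder assumptions, and the one-pass helper assumes the fields are plain UTF-8 strings rather than Python literals):

```python
# With redis-py you would construct the client as:
#   import redis
#   rdb = redis.Redis(host="localhost", port=6379, decode_responses=True)
# after which rdb.hgetall(row_id) yields {'column_name': 'value', ...} as str.

# If you must keep a bytes-returning client, a single dict comprehension
# replaces the per-key loop (assumes values are plain UTF-8 strings):
def decode_hash(raw):
    return {k.decode("utf-8"): v.decode("utf-8") for k, v in raw.items()}
```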
I feel this is too much and there must be a more elegant way to:
- Get the data from Redis using hgetall() and be able to use that data immediately for a bulk SQL insert, since the keys in the Redis hash correspond to the column names in the PostgreSQL table
- Even if the first option is not possible, there must at least be a quicker way to get the data from Redis using hgetall() and decode the whole entry on the fly, i.e. all entries in the hash, instead of iterating over each key and value
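On the first point: because the hash fields already match the column names, the dicts coming back from hgetall() (once decoded) can be passed straight to SQLAlchemy's insert(), which runs an executemany-style bulk insert when given a list of dicts. A minimal sketch, assuming made-up table/column names; an in-memory SQLite engine stands in for the PostgreSQL connection so it runs anywhere:

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func, insert, select)

engine = create_engine("sqlite://")  # stand-in for your postgresql:// URL
meta = MetaData()
readings = Table(
    "readings", meta,
    Column("sensor_id", String),   # hypothetical columns matching hash fields
    Column("reading", Integer),
)
meta.create_all(engine)

# rows as they would come back from hgetall() after decoding
rows = [
    {"sensor_id": "s1", "reading": 7},
    {"sensor_id": "s2", "reading": 9},
]

with engine.begin() as conn:
    # a list of dicts triggers a single executemany-style bulk insert
    conn.execute(insert(readings), rows)

with engine.connect() as conn:
    inserted = conn.execute(select(func.count()).select_from(readings)).scalar()
```

So the only per-row work left between Redis and PostgreSQL is the decode step, not any key-to-column mapping.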
I've read about PostgreSQL's Foreign Data Wrappers, especially redis_fdw, and I'm wondering whether it's the right tool for getting data from Redis to PostgreSQL faster and with the least trouble possible.