Tuesday, 22 May 2018

Use ProxySQL to obfuscate data for development.

Before I begin let me put this out as a protection against the pitch fork guys 🙂 I do not in anyway recommend using live data (albeit sanitised) into dev, we should really have test data and keep updating the test data as application grows and is changed.

But in case when your developers are supporting a really fast moving, highly configurable system there may be a need to use live backups to be regularly sanitised and applied to dev systems. The usual way of doing this is take a backup from production servers, transfer to dev/stage servers, restore to Dev DB and then run some sanitisation scripts.

This seemed to be a bit of a problem as we did not want to have a situation that the mysqldump of live database sits on pre prod servers or may be left there my mistake. Privacy is a big thing today and due to introduction of GDPR we now have a need to eliminate any chances of collecting and storing personal data where we do not need it. 

Enter ProxySQL:

I stumbled upon a presentation from Frederic Descamps and Rene Cannao at FOSDEM 2017 where they talk about the data anonymisation for developers https://archive.fosdem.org/2017/schedule/event/proxysqldatamasking/ this is exactly what we needed. I wanted to try that out.

I also found an article from the proxysql blog http://www.proxysql.com/blog/obfuscate-data-from-mysqldump which the same authors show what they did to achieve the data obfuscation. What follows is a step by step guide on how i did this with docker.

First we need our actual database, this is going to represent our live database, for this i am using the official mysql-server container from hub.docker.com and then we need some data; for the data i have used the same database as recommended in the blogpost.

I pull the docker image for Mysql 5.7, expose the 3306 port so I can access the database from my host machine.

docker run --name=mysql1 -p 3306:3306 -e MYSQL_ROOT_PASSWORD=newpassword -e MYSQL_ROOT_HOST=% -d mysql/mysql-server:5.7

In short this will create a mysql 5.7 container with the database running, setting the 'e' params we set the password and MYSQL_ROOT_HOST variable needs to be set as % or the IP address of your docker machine if you need to access this database from outside the container.

For more information on other options and how this works please refer to the https://hub.docker.com/r/mysql/mysql-server/, beware the content will be volatile as we are not mounting any shares or volumes, which should suffice for our tests.

Now i will load this test data which is from https://github.com/datacharmer/test_db

mysql -u root -h -p < employees.sql

in this case is my docker host, as i am using docker toolbox for mac.

Now for create a proxy sql container.

docker run -d -p 6033:6033 nidash/proxysql:1.4.8

This will create a proxysql server and start it, exposing port 6033

Now proxysql uses 2 ports, 6032 for administering proxysql and 6033 where it acts as a proxy for the mysql servers, ie. all external users connect to the mysql server using port 6033.

docker exec -it mysql -u admin -padmin -h -P6032

After you enter the password, which my default is 'admin' you are presented with a mysql client prompt, beware this is not a true mysql command line interface but a special one for proxy sql administering, this is done so it is easier for DBAs to use a familiar interface.

There is an option to configure ProxySQL using a configuration file, when using with docker the above command can be modified to add a -v (volume mapping option to mount your config file over the default one at /etc/proxysql.cfg)

To allow us to connect to our mysql server through the proxysql proxy, we need a user created on mysql server that allows you to connect, in my case i have created a user as 'msandbox', we also need the actual user to be present on the server in our case we will be using the user 'muser' we need to give appropriate permissions to allow it to connect as well, for the proxy sql side of things we need to add the configuration in as follows:

Setup backend:

INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (1,'',3306);

Configure monitoring:

UPDATE global_variables SET variable_value='msandbox' WHERE variable_name='mysql-monitor_username';
UPDATE global_variables SET variable_value='msandbox' WHERE variable_name='mysql-monitor_password';

Load the variables to runtime:



Now, lets confirm proxysql can access the mysql server by looking at the monitoring log:

SELECT * FROM monitor.mysql_server_connect_log ORDER BY time_start_us DESC LIMIT 10;

should show something like
| hostname       | port | time_start_us    | connect_success_time_us | connect_error                                                       |
| | 3306 | 1526302545856645 | 1388                    | NULL                                                                |
| | 3306 | 1526302485856208 | 622                     | NULL                                                                |

which is good, and we are happy with the mysql backend configuration so we load this config into runtime and save to disk.


Lets create mysql users now:

INSERT INTO mysql_users(username,password,default_hostgroup) VALUES ('muser','muser',1);

and load users to runtime and update disk:

Now lets try a mysql dump (these instructions are not my own, these are excerpts from the proxysql blog, i have just reproduced this here for the sake of showing this working with my setup)

mysqldump --user=muser --password=muser -h -P6033 employees salaries --skip-extended-insert | less

notice the output shows insert statements as 

INSERT INTO `salaries` VALUES (10001,60117,'1986-06-26','1987-06-26');
INSERT INTO `salaries` VALUES (10001,62102,'1987-06-26','1988-06-25');
INSERT INTO `salaries` VALUES (10001,66074,'1988-06-25','1989-06-25');

Now lets create a rule to mask these salaries.

We will need to connect to our admin interface and add the rules:

INSERT INTO mysql_query_rules (rule_id,active,schemaname,match_pattern,replace_pattern) VALUES (1,1,'employees','^SELECT /\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries`','SELECT SQL_NO_CACHE emp_no, 99999, from_date, to_date FROM salaries');

Load the rules to runtime

and save to disk:

Exit the admin interaface and run the mysqldump again.

mysqldump --user=muser --password=muser -h -P6033 employees salaries --skip-extended-insert | less

notice the salary is now masked and output now contains:

INSERT INTO `salaries` VALUES (10001,99999,'1986-06-26','1987-06-26');
INSERT INTO `salaries` VALUES (10001,99999,'1987-06-26','1988-06-25');
INSERT INTO `salaries` VALUES (10001,99999,'1988-06-25','1989-06-25');

Now we can improve on these and add more rules, but do not forget to load rules to runtime and to save for future save them to disk. There is also a handy script that allows you to create obfuscation rules for the columns you specify, this can be found at: https://gist.github.com/lefred/c040fee7e9c60ff3ca80f1590c48572b

Using Wodby (drupal4docker) to create our local development environment.

Using Wodby https://wodby.com  to create a Drupal local development environment using docker. I have used the official Drupal ...