Tuesday, 22 May 2018

Use ProxySQL to obfuscate data for development.

Before I begin let me put this out as a protection against the pitch fork guys 🙂 I do not in anyway recommend using live data (albeit sanitised) into dev, we should really have test data and keep updating the test data as application grows and is changed.

But in case when your developers are supporting a really fast moving, highly configurable system there may be a need to use live backups to be regularly sanitised and applied to dev systems. The usual way of doing this is take a backup from production servers, transfer to dev/stage servers, restore to Dev DB and then run some sanitisation scripts.

This seemed to be a bit of a problem as we did not want to have a situation that the mysqldump of live database sits on pre prod servers or may be left there my mistake. Privacy is a big thing today and due to introduction of GDPR we now have a need to eliminate any chances of collecting and storing personal data where we do not need it. 

Enter ProxySQL:

I stumbled upon a presentation from Frederic Descamps and Rene Cannao at FOSDEM 2017 where they talk about the data anonymisation for developers https://archive.fosdem.org/2017/schedule/event/proxysqldatamasking/ this is exactly what we needed. I wanted to try that out.

I also found an article from the proxysql blog http://www.proxysql.com/blog/obfuscate-data-from-mysqldump which the same authors show what they did to achieve the data obfuscation. What follows is a step by step guide on how i did this with docker.

First we need our actual database, this is going to represent our live database, for this i am using the official mysql-server container from hub.docker.com and then we need some data; for the data i have used the same database as recommended in the blogpost.

I pull the docker image for Mysql 5.7, expose the 3306 port so I can access the database from my host machine.

docker run --name=mysql1 -p 3306:3306 -e MYSQL_ROOT_PASSWORD=newpassword -e MYSQL_ROOT_HOST=% -d mysql/mysql-server:5.7

In short this will create a mysql 5.7 container with the database running, setting the 'e' params we set the password and MYSQL_ROOT_HOST variable needs to be set as % or the IP address of your docker machine if you need to access this database from outside the container.

For more information on other options and how this works please refer to the https://hub.docker.com/r/mysql/mysql-server/, beware the content will be volatile as we are not mounting any shares or volumes, which should suffice for our tests.

Now i will load this test data which is from https://github.com/datacharmer/test_db

mysql -u root -h -p < employees.sql

in this case is my docker host, as i am using docker toolbox for mac.

Now for create a proxy sql container.

docker run -d -p 6033:6033 nidash/proxysql:1.4.8

This will create a proxysql server and start it, exposing port 6033

Now proxysql uses 2 ports, 6032 for administering proxysql and 6033 where it acts as a proxy for the mysql servers, ie. all external users connect to the mysql server using port 6033.

docker exec -it mysql -u admin -padmin -h -P6032

After you enter the password, which my default is 'admin' you are presented with a mysql client prompt, beware this is not a true mysql command line interface but a special one for proxy sql administering, this is done so it is easier for DBAs to use a familiar interface.

There is an option to configure ProxySQL using a configuration file, when using with docker the above command can be modified to add a -v (volume mapping option to mount your config file over the default one at /etc/proxysql.cfg)

To allow us to connect to our mysql server through the proxysql proxy, we need a user created on mysql server that allows you to connect, in my case i have created a user as 'msandbox', we also need the actual user to be present on the server in our case we will be using the user 'muser' we need to give appropriate permissions to allow it to connect as well, for the proxy sql side of things we need to add the configuration in as follows:

Setup backend:

INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (1,'',3306);

Configure monitoring:

UPDATE global_variables SET variable_value='msandbox' WHERE variable_name='mysql-monitor_username';
UPDATE global_variables SET variable_value='msandbox' WHERE variable_name='mysql-monitor_password';

Load the variables to runtime:



Now, lets confirm proxysql can access the mysql server by looking at the monitoring log:

SELECT * FROM monitor.mysql_server_connect_log ORDER BY time_start_us DESC LIMIT 10;

should show something like
| hostname       | port | time_start_us    | connect_success_time_us | connect_error                                                       |
| | 3306 | 1526302545856645 | 1388                    | NULL                                                                |
| | 3306 | 1526302485856208 | 622                     | NULL                                                                |

which is good, and we are happy with the mysql backend configuration so we load this config into runtime and save to disk.


Lets create mysql users now:

INSERT INTO mysql_users(username,password,default_hostgroup) VALUES ('muser','muser',1);

and load users to runtime and update disk:

Now lets try a mysql dump (these instructions are not my own, these are excerpts from the proxysql blog, i have just reproduced this here for the sake of showing this working with my setup)

mysqldump --user=muser --password=muser -h -P6033 employees salaries --skip-extended-insert | less

notice the output shows insert statements as 

INSERT INTO `salaries` VALUES (10001,60117,'1986-06-26','1987-06-26');
INSERT INTO `salaries` VALUES (10001,62102,'1987-06-26','1988-06-25');
INSERT INTO `salaries` VALUES (10001,66074,'1988-06-25','1989-06-25');

Now lets create a rule to mask these salaries.

We will need to connect to our admin interface and add the rules:

INSERT INTO mysql_query_rules (rule_id,active,schemaname,match_pattern,replace_pattern) VALUES (1,1,'employees','^SELECT /\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries`','SELECT SQL_NO_CACHE emp_no, 99999, from_date, to_date FROM salaries');

Load the rules to runtime

and save to disk:

Exit the admin interaface and run the mysqldump again.

mysqldump --user=muser --password=muser -h -P6033 employees salaries --skip-extended-insert | less

notice the salary is now masked and output now contains:

INSERT INTO `salaries` VALUES (10001,99999,'1986-06-26','1987-06-26');
INSERT INTO `salaries` VALUES (10001,99999,'1987-06-26','1988-06-25');
INSERT INTO `salaries` VALUES (10001,99999,'1988-06-25','1989-06-25');

Now we can improve on these and add more rules, but do not forget to load rules to runtime and to save for future save them to disk. There is also a handy script that allows you to create obfuscation rules for the columns you specify, this can be found at: https://gist.github.com/lefred/c040fee7e9c60ff3ca80f1590c48572b

Thursday, 17 August 2017

How to find differences between two docker images or containers.

Docker comes with a handy command called docker diff which allows you to find differences between the image and the container.

So if we make changes inside the container then we can find the difference from the image this is based on using the docker diff command.

But sometimes we might have to find differences between two running containers, in order to do this i found this helpful.

We can use the docker export command to export the filesystem of the container into a tarball and use our normal diff tools to compare.

This is how i did this:

Assume the image we are comparing is in our own docker registry docker.mysite.com:dev:0.25 and docker.mysite.com:dev:0.26 I would first create a container with the two images and then give the command

docker export dev025-instance -o dev05.tar


docker export dev026-instance -o dev06.tar

untar the files in separate folders and use an ide like PHPStorm to compare side by side.

For mac users there is a brew available to tap (https://github.com/moul/docker-diff) using

brew tap moul/moul

and then

brew install docker-diff

This does speed things up and you do not need to create containers etc, because docker-diff does that for you implicitly. However i felt it was easier with docker export commands due to large number of changes i was dealing with.

Docker diff works both with custom as well as official docker repos.

Monday, 7 March 2016

Setting up a drupal development env using docker image on mac os.

I have been trying to try various ways in which i can make building and destroying development environments quickly and also make them as close to production as possible, i have tried using vagrant before till i was bitten by the docker bug.

There is an official docker image for drupal 8 and drupal 8 on the docker hub, i have tried using them before it wasn't a great wow moment to be honest.

Creating a drupal image using official drupal docker image form docker hub.

The docker image did not have drush or drupal console so it was a bit disappointing. I searched and came across https://hub.docker.com/r/iiiepe/nginx-drupal/ The guys have done a lot of work in using docker in their deployment workflow, so i wanted to give this a try.

I could have used docker hub to pull the iiiepe image, but i wanted to see what is inside and if i need to make changes i could.

I first created a folder in my docker subfolder, I normally keep docker images and my projects separate. I checked out the git repo https://github.com/iiiepe/docker-nginx-drupal

Obviously you need to have docker installed and configured properly, you can find a lot of tutorials for this on the internet, but basically you need to start your default docker machine i.e.

docker-machine start default

then, set the env like so

docker-machine env default

and type the command with eval() in the output of docker-machine env default command.

Now i am ready to create the docker images as below:

I went to the folder which has the dockerfile, then gave the following command

docker build -t drupalprogrammer/nginx-drupal .

This will pull appropriate images and begin to build images, in a few mins this will be all done and you can check the images by giving the docker images command

We will also need a mysql image for our drupal container to work with, we could have gone for the official mysql image however it doesnt work very well with os x because we want to keep the data outside the container and there is some work required to create scripts to allow this to happen.

There is instructions on the official mysql images page i.e. https://hub.docker.com/_/mysql/ i tried to use the instructions to load the data outside but could not get it to work. Luckily little googling led me to another image we can use.

run the following command:

docker run --name mysql-data -v /Users/nidash/data:/var/lib/mysql -p 3306:3306 -d dgraziotin/mysql

This will pull dgraziotin/mysql image and expose the 3306 port to use.

Now give the following command docker ps this shows the some-mysql container working, great job. 

The first time you create the container this will also create an additional user called admin with a random generated password, to find the password you need to look at the docker logs like so.

docker logs mysql-data

So this shows the password like below:

You can now connect to this MySQL Server using: mysql -uadmin -p47nnf4FweaKu -h -P Please remember to change the above password as soon as possible! MySQL user 'root' has no password but only allows local connections

Using a mysql client connect to the mysql server and change your password eg.

SET PASSWORD FOR 'admin'@'%' = PASSWORD('password');
Now we need to the drupal container to be up as well, so here is the command to use.

docker run -d -p 8000:80 -v /Users/nidash/work/drupal7/:/var/www -v /Users/nidash/work/drupal7/sites/all/default/files:/var/www/sites/default/files  drupalprogrammer/nginx-drupal

We need to map 2 volumes, /Users/nidash/work/drupal7 or where the codebase is to /var/www and because drupal is drupal and has files inside the code base, we need to map and mount this separately hence the other -v.

Complete the installation and that is it. Ensure the database host name is set as the IP the docker-machine env default command reports.