Friday, 21 December 2012

Import data from MySQL to MongoDB


To import data from MySQL into MongoDB we first need to export the data from MySQL in a format Mongo understands: either a TSV or a CSV file. Ideally you would use the following SQL:

SELECT columns INTO OUTFILE '/path/to/csv'
   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
   LINES TERMINATED BY '\n' FROM table [WHERE clause];

The above SQL exports the data (data only, no headers) into the file at the path you provide; make sure MySQL has write access to that location. If you omit the path and use just a filename, the file is written to the directory where MySQL stores its data files.
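
To make this concrete, here is a minimal sketch of the export run through the mysql client; the orders table, its columns, the credentials and the /tmp path are all hypothetical placeholders:

# hypothetical example: export an "orders" table to CSV
# note: INTO OUTFILE needs the FILE privilege and refuses to overwrite an existing file
mysql -uXXX -pXXX database_name -e "
  SELECT customer_id, order_number, order_total
  INTO OUTFILE '/tmp/orders.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
  LINES TERMINATED BY '\n'
  FROM orders;"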

To get the field names into the data file as well, we can use the following (the mysql client emits tab-separated output with the column names on the first line):
   
echo "SELECT * FROM
;" | mysql -uXXX -pXXX > '/path/to/
.tsv'

Finally you want the file imported into a Mongo database. Since our file contains a header line we specify the --headerline option. For a complete list of options available with mongoimport please look at the MongoDB site (http://docs.mongodb.org/manual/reference/mongoimport/):

mongoimport -d database_name -c collection_name --type tsv --headerline --file /path/to/table_name.tsv

The --headerline option uses the first line of the file for field names. This is fine until your MySQL table is empty but you still want to import: an empty result set means no header line, so you are back to using INTO OUTFILE for the data, and you also need a separately generated header file to hand to mongoimport. Let's see how we can do this.

If you run into problems because of an empty result set (an empty set means no column names!), use the steps below:

echo "SELECT column_name FROM information_Schema.cloumns WHERE table_schema = DATABASE() AND table_name='';" | mysql -uXXX -pXXX > '/path/to/_headers.tsv'

Then when importing to Mongo use the following (the -N flag above stops the mysql client from writing its own "column_name" header into the field file):

mongoimport -d database_name -c collection_name --type tsv --fieldFile /path/to/table_name_headers.tsv --file /path/to/table_name.tsv
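
Putting it all together, here is a minimal end-to-end sketch for an import that also works when the table is empty; database_name, table_name and the credentials are hypothetical placeholders:

#!/bin/sh
# hypothetical names; adjust user, password and paths to taste
DB=database_name
TABLE=table_name

# 1. export the data only; with no FIELDS clause, INTO OUTFILE defaults to tab-separated
mysql -uXXX -pXXX $DB -e "SELECT * INTO OUTFILE '/tmp/$TABLE.tsv' FROM $TABLE;"

# 2. export the column names, one per line, in table order (-N drops the 'column_name' header)
echo "SELECT column_name FROM information_schema.columns
      WHERE table_schema = DATABASE() AND table_name='$TABLE'
      ORDER BY ordinal_position;" | mysql -N -uXXX -pXXX $DB > /tmp/${TABLE}_headers.tsv

# 3. import into mongo, taking field names from the generated header file
mongoimport -d $DB -c $TABLE --type tsv --fieldFile /tmp/${TABLE}_headers.tsv --file /tmp/$TABLE.tsv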

Sunday, 9 December 2012

Use feeds module to import XML data in Drupal 7

Recently I had to set up an XML import process to import data from an external supplier on a regular basis. The filename and format were not changing, and I could choose where to store the file. The following is how I set it up.

First we need to download the following modules:

feeds
feeds admin ui
feeds_xpathparser
ctools
job_scheduler

Enable the above modules.
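
If you use drush this step can be scripted; a quick sketch, assuming the machine names below are current (Feeds Admin UI ships inside the feeds project as feeds_ui):

# download and enable the modules in one go
drush dl feeds feeds_xpathparser ctools job_scheduler
drush en -y feeds feeds_ui feeds_xpathparser ctools job_scheduler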

We will need a sample XML file, so this is the one we will use for this demonstration:

<?xml version="1.0" encoding="utf-8" ?>
<!-- Sample for demonstration of processing XML feeds with feeds module -->
<Orders>
  <Order>
    <CustomerID>10</CustomerID>
    <Order_Number>1</Order_Number>
    <Order_Total>100</Order_Total>
    <Currency>£</Currency>
  </Order>
  <Order>
    <CustomerID>11</CustomerID>
    <Order_Number>2</Order_Number>
    <Order_Total>40</Order_Total>
    <Currency>$</Currency>
  </Order>
</Orders>
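
Before wiring anything up, it is worth checking that the file is well-formed; one quick way, assuming xmllint (shipped with libxml2) is available:

# silent on success, prints parse errors otherwise
xmllint --noout sample.xml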


Now create a content type to hold this information. I am creating a content type named "order_info"; give it a description. To make things easier I have removed the body field, disabled comments, and unchecked the 'Promoted to front page' option.

Now add fields. As per the above sample XML file we require the following (field name, type, size):

CustomerID, Text, 10
Order_Number, Text, 10
Order_Total, Decimal, 10.2
Currency, Text, 3

Now that this is done, create an importer via Structure -> Feeds importers -> Add importer. We will name the importer "Order info xml importer".

Configure the importer with the following settings:

Basic settings
===========
Attach to content type: Order_info
Periodic import: As often as possible
Import on submission: uncheck
Click "Save" important to do this at each step so you do not loose the settings.

Fetcher
======
File upload - select
Click "Save"
Go to Settings and select "Supply path to file or directory directly", as we want to set up a regular import from a fixed location. We also assume the filename will stay the same; the file will live in "sites/default/files/feeds/" and we will call it sample.xml.
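
A quick sketch of putting the file in place, assuming the standard public files path and that you are in the Drupal root:

# create the feeds directory and drop the sample file in
mkdir -p sites/default/files/feeds
cp /path/to/sample.xml sites/default/files/feeds/sample.xml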

Parser
=====
Select "XPath XML parser"
Click "Save"

Parser->Settings
=====   =======
This is where you specify your XPath information: define the context and the fields. The context is the row of data; for our example the context is "Order".

Context: Order

The following will depend on your content type; this is where you map your fields to the data in the XML:

field_order_number: Order_Number
field_customerid: CustomerID
field_order_total: Order_Total
field_currency: Currency
title: Order_Number

Here we use the node title as the identifying, unique piece of information, and we populate it with Order_Number.
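
If a mapping does not pull back what you expect, you can test the queries against the sample file from the command line; a sketch assuming xmllint with XPath support (the Order context is written here as the absolute query //Order):

# print every Order row matched by the context
xmllint --xpath '//Order' sites/default/files/feeds/sample.xml

# check a single field, e.g. the first order's Order_Number
xmllint --xpath 'string(//Order[1]/Order_Number)' sites/default/files/feeds/sample.xml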

Processor->Node processor->Settings
========   ============   ======
Content type: Order_info
Click "Save"

Processor->Node processor->Mapping
========   ============   ======
Use "XPath Expression" as the source and your node field as the target; each one you add creates a mapping of the form:

xpathparser:N => (target field)

After you have completed the assignments the mappings will look as follows:

xpathparser:0 => Order_Number
xpathparser:1 => CustomerID
xpathparser:2 => Order_Total
xpathparser:3 => Currency

xpathparser:4 => Title => Unique

As we need to identify an item in the XML as unique to ensure data integrity, we choose the node title and mark it as unique. Since Order_Number goes into the title, this also helps identify the content.

After saving this, let's import. Go to "http://yourlocal/import", select 'Order_info', and give the node the title 'Order XML Import'; this creates a piece of content that lets you import data into your Order_info content type. Set the file location to "public://feeds/sample.xml" and save the content.

To import manually, click 'Import'; to see how the import runs via cron, go to admin/config/system/cron, run cron, and watch new nodes being created from your XML.
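
You can also kick cron off from the command line, assuming drush is available:

# runs all cron hooks, including the periodic feeds import
drush cron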






Tuesday, 4 December 2012

Would you love apt-get on a Mac?

As Ubuntu kept maturing I gradually grew fond of its "apt-get install" command, to the point where its absence felt like a hindrance on other Unix-like systems, most notably the Mac.

To my surprise there is a tool that makes apt-get work on the Mac: Fink (http://www.finkproject.org). The binary installer is available for 10.5 and lower; for the rest of us who have moved on, the only option seems to be compiling and installing from source as per the instructions on the Fink website.

I had 10.8 (Mountain Lion) installed, so as the Fink site suggests I had to compile from source. Before I could install Fink I had to install Xcode and the Xcode developer tools (Xcode > Preferences > Downloads > Developer Tools), then point xcode-select at the Xcode developer directory:

xcode-select -switch /path/to/Xcode.app/Contents/Developer

I also had to accept the Xcode build license by reading it and typing 'agree':

sudo xcodebuild -license

Unpack the tar.gz either by double-clicking it in your browser's downloads list, or manually:

cd $HOME/Downloads
tar -xvf fink-0.34.4.tar.gz

or

tar -xvf fink-0.34.4.tar

where the choice depends on whether the archive has already been partially decompressed, e.g. by Safari. Then change into the unpacked directory (wherever you downloaded and unpacked Fink):

cd $HOME/Downloads/fink-0.34.4

and run the bootstrap:

sudo ./bootstrap /Applications/fink

Sit back, relax, and keep glancing at your terminal window to choose options at the prompts. For most options the default is fine; for the geographic location you may want to select the mirror closest to you.

After the installation is complete, run the following command without sudo; it brings up a prompt and configures your shell path so apt-get and the other Fink tools work from anywhere on your system:

/Applications/fink/bin/pathsetup.sh

After this do the following:

fink selfupdate-rsync
fink index -f

Once this has completed you should be able to use apt-get on the Mac!
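
For example, a quick test drive (the package name is just illustrative; availability depends on the Fink tree):

# refresh the package lists, then install something small
sudo apt-get update
sudo apt-get install wget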




