Wednesday, October 12, 2011

Loading Lists into SQL Server

We recently ran into a performance wall when loading objects of varying sizes (from 1 to, say, a million properties) into SQL Server 2008.

As you will find, an ADO.NET DataTable (passed to SQL Server as a Table-Valued Parameter, or TVP in SQL land) is generally the fastest way to load lists and arrays into SQL Server.

Table Value Parameter in SQL Server 2008 and Handling Arrays and List in SQL

OK, this was not fast enough for our needs. We were comparing our object write performance with that of a correspondingly long string, and we kept coming up short. For example, if you load an array of strings, say each 1k long, via a DataTable and then load the same data through one simple SProc that takes a single parameter (one string 10k long), the single insert is (predictably) the fastest.

But we would never know a priori the size of the array/list, so we could not create a static SProc that takes the correct number of parameters.

So, we tried breaking large arrays into chunks of, say, 5 values each and calling a single SProc that took 5 parameters. Of course, we had N threads doing this (each thread calling the SProc). Intuitively, we thought this would yield better throughput. But it did not! The DataTable (aka TVP) approach still won on final throughput, while still not meeting our write throughput expectations.
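To make the chunking idea concrete, here is a rough sketch in Python (used purely for illustration; it is not the ADO.NET code we ran). The names `chunk_for_sproc`, `call_sproc` and `load_in_parallel` are hypothetical, the database call is replaced by a stand-in, and padding the last short chunk with None (i.e. NULL) is an assumption about how a fixed 5-parameter SProc could be fed.

```python
from concurrent.futures import ThreadPoolExecutor

PARAMS_PER_CALL = 5  # assumed fixed number of parameters the SProc accepts


def chunk_for_sproc(values, size=PARAMS_PER_CALL):
    """Split values into lists of exactly `size` items, padding the last with None."""
    chunks = []
    for start in range(0, len(values), size):
        chunk = list(values[start:start + size])
        # Pad a short final chunk so every call passes the same parameter count.
        chunk.extend([None] * (size - len(chunk)))
        chunks.append(chunk)
    return chunks


def call_sproc(params):
    """Hypothetical stand-in for the real SProc call.

    A real version would execute the stored procedure with `params`;
    here we only count the non-NULL values that would be inserted.
    """
    return sum(p is not None for p in params)


def load_in_parallel(values, workers=4):
    """Send each chunk from a pool of worker threads and total the inserted values."""
    chunks = chunk_for_sproc(values)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(call_sproc, chunks))
```

With 12 values this makes three calls (5, 5 and 2 values plus 3 NULLs), so the number of round trips grows with the list size, which may be part of why this approach did not beat a single TVP call.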

So, we profiled the actual SProc getting called. Before we mention what we found, and to pre-empt the SQL experts' questions: we took almost everything out of the SProc except the insert of N parameters (columns) into 3 tables.

We discovered that ADO.NET basically creates dynamic SQL like "insert into @p1 values.." before calling our SProc. So, even the fastest way of loading lists and arrays into SQL Server has some dynamic SQL that needs to be compiled each time!

We asked the SQL Server team and they confirmed what we found. They did, however, suggest using SqlDataRecord, and we found it did yield a 5-10% improvement (but we were looking for a 100% improvement). So, we will wait for the next release of SQL Server, which promises better throughput.

Friday, September 9, 2011

Merging Facial Recognition with Data About the User

One has to, of course, worry about loss of privacy and valid legal challenges.

However, on a purely technical and business level, this is quite a convergence of targeting: being able to recognize a user (visually, in this case) and attach that identity to the data about the user (why stop at the ads they clicked on... go on to their consumer profile and find out what type of car they can afford).

Alessandro Acquisti, Ph.D., a researcher and instructor still at Carnegie Mellon, has designed an iPhone app that functions as a front end for PittPatt's facial recognition technology. As mentioned, it can identify strangers' Facebook profiles with startling accuracy.

And that's not all it can do. It also incorporates searches of public databases that allow it to make a good guess at your social security number. If it knows your date of birth (e.g. if your Facebook profile is public), there's a good chance it can ID your social security number.

The real question is how to make this technology available to users in a way that makes their lives a bit easier without sacrificing privacy. (Remember, a decade ago you did not have a cell phone, so your wife or your boss could not reach you when you travelled. Cell phones have made our lives easier, but we are now available all the time.)

Friday, August 12, 2011

Big Data - Visualizing the Exploding Data Growth

I found this blog post really illuminating. The actual data volume and growth curve may be off, but at least it captures in one place the sheer magnitude of the data.

http://blog.getsatisfaction.com/2011/07/13/big-data/?view=socialstudies