Wednesday, 24 March 2010
A little while ago, I delivered my master's thesis and finished my Master of Technology in Computer Science degree at the Norwegian University of Science and Technology. The thesis is titled iAD: Query Optimization in MARS and is about building a query optimizer for Fast's (now a subsidiary of Microsoft) new Enterprise Search solution.

The report itself can be found here: MasterReport.pdf, and the abstract is included below:

This document is the report for the authors' joint effort in researching and designing a query optimizer for Fast's next-generation search platform, known as MARS. The work was done during our master's thesis at the Department of Computer and Information Science at the Norwegian University of Science and Technology, spring 2009.

MARS does not currently employ any form of query optimizer, but it does have a parser and a runtime system. The report therefore focuses on the core query optimization aspects, such as plan generation and optimizer design. First, we give an introduction to query optimizers and selected problems. Then, we describe previous and ongoing efforts regarding query optimizers, before shifting focus to our own design and results.

MARS supports DAG-structured query plans for more efficient execution, which means that the optimizer must support them as well. This turned out to be a greater task than it might seem, since it requires algorithms that differ greatly from those used in the optimizers we were already familiar with. The optimizer also needed to be extensible, including the ability to deal with future query operators, as well as supporting arbitrary cost models.

During the course of the master's thesis, we have laid out the design of an optimizer that satisfies these goals. The optimizer is able to recognize common subexpressions and construct DAGs from non-DAG inputs. Extensibility is solved by loose coupling between optimizer components. Rules are used to model operators, and the cost model is a separate, customizable component. We have also implemented a prototype that demonstrates that the design actually works.

The optimizer itself is designed as a separate component, not tied to MARS. We have been able to inject it into the MARS query pipeline and run queries end-to-end with optimization enabled, improving query evaluation time. For now, the project depends on MARS assemblies, but reusing it for another engine and algebra is entirely feasible.

posted on Wednesday, 24 March 2010 04:07:19 (W. Europe Standard Time, UTC+01:00)  #    Comments [2]
 Tuesday, 07 July 2009
A while ago I developed a Windows Vista Sidebar Gadget for tracking packages shipped using the Norwegian Postal Service, Posten - or Bring, as they call themselves these days.

A couple of users (thanks, Haakon and Morten) made me aware that it's not working anymore. I've looked into it and found that Posten has changed the address of the page I'm downloading the tracking data from.

I've now updated it, so it should be working again. It looks like there's a delay for updating gadgets at the Live Gallery, so in the meantime you can download it here.

posted on Tuesday, 07 July 2009 14:25:57 (W. Europe Standard Time, UTC+01:00)  #    Comments [14]
 Friday, 30 January 2009
I'm in my last semester here at the Norwegian University of Science and Technology, which means that I'm starting on my master's thesis these days. Actually, I'm not really starting now - I'm continuing the "pre-project" I worked on together with my friend Alex Brasetvik from August to December last year.

Our master's thesis is about building a query optimizer for Fast's (now a subsidiary of Microsoft) new Enterprise Search solution.

The report itself can be found here: MasterProjectReport.pdf, and the abstract is included below:
This document is the report for the authors' joint effort in researching and designing a query optimizer for Fast's next-generation search platform, known as MARS. This work was done during the pre-project to the master's thesis at the Department of Computer and Information Science at the Norwegian University of Science and Technology, autumn 2008.

MARS does not currently employ any form of query optimizer, but it does have a parser and a runtime system. The report therefore focuses on the core query optimization aspects, such as plan generation and optimizer design. First, we give an introduction to query optimizers and selected problems. Then, we describe previous and ongoing efforts regarding query optimizers, before shifting focus to our own design and results.

MARS supports DAG-structured query plans, which means that the optimizer must support them as well. This turned out to be a greater task than it might seem. The optimizer also needed to be extensible, including the ability to deal with query operators it does not know, as well as supporting arbitrary cost models.

During the course of the project, we have laid out the design of an optimizer we believe satisfies these goals. DAGs are currently not fully supported, but the design can be extended to do so. Extensibility is solved by loose coupling between optimizer components. Rules are used to model operators, and the cost model is a separate, customizable component. We have also implemented a prototype that demonstrates that the design actually works.

posted on Friday, 30 January 2009 03:08:08 (W. Europe Standard Time, UTC+01:00)  #    Comments [1]
 Monday, 01 September 2008
Here are the slides and T-SQL code I used during Thursday's Norwegian .NET User Group presentation.
I presented basic transaction processing with emphasis on concurrency and isolation. I hope everyone had a good time. I definitely had a good time presenting.
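For those who couldn't make it, here's a tiny sketch of the kind of thing the demos covered - not the actual demo code (that's in the zip below), just an illustration against a hypothetical dbo.Accounts table - showing how the isolation level decides whether a concurrent reader blocks or reads dirty data:

-- Session 1: change a row inside a transaction and leave it open
BEGIN TRANSACTION;
UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = 1;

-- Session 2, READ COMMITTED (the default): this SELECT blocks until
-- session 1 commits or rolls back
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT Balance FROM dbo.Accounts WHERE AccountId = 1;

-- Session 2, READ UNCOMMITTED: no blocking, but the uncommitted
-- (dirty) balance may be returned
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Balance FROM dbo.Accounts WHERE AccountId = 1;

-- Session 1: undo the change
ROLLBACK TRANSACTION;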

I also got some really good questions, one of them being what would happen if SQL Server were to lose the log file for one of its databases during operation. Since I didn't give the full explanation at the presentation, I've written a blog post about it. It can be found here.

Also, I mentioned that SQL Server 2008 RTM'ed (it's done!) on or sometime before the 6th of August, with build number 10.0.1600.22. I didn't blog about it here since I was OOF at the time :-)

Transaksjoner, isolasjonsnivåer og låsing i SQL Server.pptx (151.12 KB)
NNUGDemos-2008-08-28-HON.zip (2.92 KB)


posted on Monday, 01 September 2008 21:32:15 (W. Europe Standard Time, UTC+01:00)  #    Comments [3]
I got a question at a .NET Community Event a few days ago about what would happen if SQL Server were to lose the log (LDF) or data (MDF/NDF) files for a database while in operation (e.g. the disk holding the data or log file crashes). If I've got my SQL Server disaster recovery right, this should be what would happen:

First, if both data and log are lost, it's simple - SQL Server will stop servicing requests for that DB and we'll need to restore everything from our last backup (possibly some minutes/hours/days old, depending on your backup scheme).

Second, if the data file is lost while the log is good, SQL Server will probably stop servicing requests pretty quickly here too, but we shouldn't lose any data (assuming we're running under the full recovery model, have taken at least one full backup, and have the log chain intact - that is, we haven't truncated the transaction log and we've got all log backups since the last full or differential backup ready for restore). We can simply restore the last full backup, then the last differential one, and then all log backups consecutively, up to and including a backup of the tail of the log, which is still good.
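To make that concrete, here's a rough sketch of the restore sequence - the database name (Sales) and the backup paths are made up for the example:

-- The data file is gone, but the log is intact: back up the tail of the log first
-- (NO_TRUNCATE lets us back up the log even though the database is damaged)
BACKUP LOG Sales TO DISK = N'\\backupserver\sql\Sales_tail.trn' WITH NO_TRUNCATE;

-- Restore the last full backup, but don't recover yet
RESTORE DATABASE Sales FROM DISK = N'\\backupserver\sql\Sales_full.bak' WITH NORECOVERY, REPLACE;

-- Restore the last differential backup, if there is one
RESTORE DATABASE Sales FROM DISK = N'\\backupserver\sql\Sales_diff.bak' WITH NORECOVERY;

-- Restore each log backup taken since the differential, in order
RESTORE LOG Sales FROM DISK = N'\\backupserver\sql\Sales_log_1.trn' WITH NORECOVERY;
RESTORE LOG Sales FROM DISK = N'\\backupserver\sql\Sales_log_2.trn' WITH NORECOVERY;

-- Finally, restore the tail and bring the database back online
RESTORE LOG Sales FROM DISK = N'\\backupserver\sql\Sales_tail.trn' WITH RECOVERY;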

Third, if the log file is lost while the data file is good, we may have bigger problems. SQL Server will at least stop servicing any requests that involve writing to the database, and we now have the potential to lose data.
But wait - we have the complete data file - why would we lose data? The reason is the way SQL Server handles buffering and recovery, using the ARIES algorithm. ARIES uses a so-called STEAL/NO-FORCE approach to optimize performance for the buffer pool (SQL Server's in-memory data cache), which basically means that data from uncommitted transactions can be written to the MDF/NDF files on disk (STEAL), while data from committed transactions may still reside only in memory, not yet written to the data files (NO-FORCE).

This means that if, at the time of the crash, there are open transactions, or any transactions have written data to the database since the last checkpoint (and possibly other scenarios), the data file is potentially in an inconsistent state. Losing the log file in such a situation can cause database corruption, broken constraints, half-finished transactions, lost data and all sorts of crap, since SQL Server will not be able to roll back uncommitted transactions or roll forward committed ones.
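A tiny illustration of the STEAL part (hypothetical table again, and obviously not something you'd run in production):

-- Dirty some pages inside a transaction, but don't commit
BEGIN TRANSACTION;
UPDATE dbo.Accounts SET Balance = 0 WHERE AccountId = 1;

-- CHECKPOINT flushes dirty pages to the data file, including pages changed
-- by this still-open transaction (that's the STEAL part)
CHECKPOINT;

-- If the log were lost right now, the data file would contain the uncommitted
-- change, and there would be nothing left to roll it back with
ROLLBACK TRANSACTION;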

If the log is lost, it can be rebuilt using Emergency Mode Repair, but as Paul S. Randal (who used to work on the SQL Server team at Microsoft) describes here, this is something that shouldn't be done unless you're out of other options.
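For completeness - and strictly as a last resort, as Paul points out - the emergency mode repair route looks roughly like this (database name made up):

-- Last resort only: rebuilding the log this way can lose transactions
ALTER DATABASE Sales SET EMERGENCY;
ALTER DATABASE Sales SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB (Sales, REPAIR_ALLOW_DATA_LOSS);
ALTER DATABASE Sales SET MULTI_USER;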

So, the only way to ensure you don't lose data is, once again, a plan for backup and disaster recovery. Murphy states that if you don't, you WILL find yourself in deep shit at some time in the future.

And while we're on the topic of losing the log - I've seen some pretty ridiculous ways of reducing the size of your log file suggested around different forums. I've seen posts advising people to just delete or rebuild the log file whenever it gets too big. That is a pretty bad piece of advice (unless you know what you're doing and are checkpointing or detaching the database first). Rebuilding the log is, for the reasons above, a pretty quick and handy way of introducing corruption into your database. To reduce the size of your transaction log, back it up using the BACKUP LOG statement, and optionally shrink the log files afterward.
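In practice that means something like this (again with a made-up database name, logical log file name and path):

-- Back up the log so the inactive portion can be reused
BACKUP LOG Sales TO DISK = N'\\backupserver\sql\Sales_log.trn';

-- Optionally shrink the physical log file afterwards, here down to about 512 MB
USE Sales;
DBCC SHRINKFILE (Sales_Log, 512);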

So, do you agree with me? Feel free to post comments if I've got something wrong.
posted on Monday, 01 September 2008 18:37:09 (W. Europe Standard Time, UTC+01:00)  #    Comments [1]
 Saturday, 07 June 2008
A few months ago I wrote that a SQL Server 2008 RC (Release Candidate) was scheduled for Q2 this year.

Looks like Microsoft is staying on their schedule - RC0 was just released to MSDN and TechNet subscribers!

EDIT: RC0 is now available here for the public as well.

posted on Saturday, 07 June 2008 04:58:05 (W. Europe Standard Time, UTC+01:00)  #    Comments [1]
 Wednesday, 04 June 2008
Microsoft has published SQL Server's new logo:

I think it looks good :-)

Courtesy of Wesley from http://blogs.msdn.com/wesleyb/archive/2008/06/03/sql-server-logo.aspx

posted on Wednesday, 04 June 2008 02:58:52 (W. Europe Standard Time, UTC+01:00)  #    Comments [1]
 Friday, 09 May 2008
I recently wrote a paper at school about how flash memory impacts the database world. Those who are interested can read it here: How Flash Memory Changes the DBMS World - An Introduction

posted on Friday, 09 May 2008 15:30:30 (W. Europe Standard Time, UTC+01:00)  #    Comments [4]