Thursday, November 29, 2012
Performance tips for SSAS | James Serra's Blog
The one that really helped us was to move parameters from the where clause to a subquery. This can improve performance drastically.
'via Blog this'
Ranting about Cumulative Updates
Sharepoint has had umpteen patches since Service Pack 1. Each one is outlined in separate KB articles by install file. This works out to about 100 tabs open in the browser to figure out what is in each cumulative update.
Is there not a database that has this stuff?
On another note, SQL 2012 SP1 is released, which doesn't include the fixes contained in CU3 & CU4, so download SQL 2012 SP1 and SQL 2012 SP1 CU1.
Monday, November 12, 2012
Big Data Vendor Landscape
http://www.rosebt.com/1/post/2012/06/big-data-vendor-landscape.html
Vertica is really eating everyone else's pie. They partner with many of the traditional front-end BI app developers, including Cognos, Microstrategy and Tableau.
http://www.vertica.com/partners/business-intelligence/
Friday, November 02, 2012
sql server 2008 - Table Driven WHERE Clause Using LIKE - Stack Overflow
A new one for me. Joining on another table that contains LIKE clauses.
SELECT * FROM dbo
JOIN PatterTable ON dbo.Field LIKE PatterTable.pattern'via Blog this'
Monday, October 29, 2012
LINKMEMBER MDX Syntax
The key issue I had was that the HideUnrelatedDimension property was set to false in the cube. This means that you need to explicitly specify the "all" member in the linkmember function, and exclude the all member from the scope of the calculation statement.
Details here.
http://rklh.blogspot.ca/2012/01/scope-statement-on-calculated-measure.html
Thursday, October 04, 2012
Interesting MDX Feature
select test on 0
from Adventureworks
Easter egg?
Wednesday, October 03, 2012
Humor in code
The Source was "Bowser"
The browser driver has received too many illegal datagrams from the remote computer ---- to name WORKGROUP on transport NetBT_Tcpip_{-----}. The data is the datagram. No more events will be generated until the reset frequency has expired.
http://blogs.msdn.com/b/larryosterman/archive/2011/05/02/reason-number-9-999-999-why-you-don-t-ever-use-humorous-elements-in-a-shipping-product.aspx
Apparently this means my computer is beyond repair, or at least the network card is sending out line noise.
Another one that happens every minute since I installed Windows 8:
A corrected hardware error has occurred.
Reported by component: Processor Core
Error Source: Corrected Machine Check
Error Type: No Error
Processor ID: 1
The details view of this entry contains further information.
If it's corrected, how come it keeps showing up every minute?
If it's not an error, why is it called a corrected hardware error?
I always thought the event viewer should have ads for software that fixes event viewer errors or offers up places to buy hardware. That could be a great idea for extra MS revenue...
Reading Larry's blog, I finally figured out how I fixed that insurance office network 17 years ago.
http://blogs.msdn.com/b/larryosterman/archive/2005/01/11/350800.aspx
Flipping the network cards from Auto Detect to Full Duplex solved the issue, since 1 card was blowing up the entire network.
Tuesday, September 25, 2012
Grouping in SQL RS
This solved our filtering issue.
Wednesday, September 12, 2012
Database documentation tool - Elsasoft SqlSpec
Database documentation tool - Elsasoft SqlSpec:
'via Blog this'
Thursday, August 30, 2012
Understanding Analysis Services Deployment Modes - Analysis Services and PowerPivot Team Blog - Site Home - MSDN Blogs
There are some key features which may force you to select both the Tabular and PowerPivot modes for different functional requirements.
Understanding Analysis Services Deployment Modes - Analysis Services and PowerPivot Team Blog - Site Home - MSDN Blogs: "Obviously, this way of connecting to a PowerPivot database is very different from connecting to a Tabular database, but the differences between Tabular and SharePoint modes go even deeper than that because the databases themselves are very different in nature. Tabular databases
are permanent databases. PowerPivot databases, on the other hand, are temporary. They can be deleted from the SSAS server and reloaded from their workbooks at any time."
'via Blog this'
Friday, August 24, 2012
Office Links
http://sqlblog.com/blogs/kevin_kline/archive/2012/08/23/resource-center-for-microsoft-office.aspx
Wednesday, August 08, 2012
Spider Schema
It sounds a bit like the Data Vault architecture promoted by Dan Linstedt, though targeted at reporting on vs vaulting the data.
More details here.
About Me | Spider Schema:
Tuesday, August 07, 2012
Excel 2013's Data Model Feature
The new Data Model feature lets you build out tables and relationships in your spreadsheet. No more vlookups!
http://office.microsoft.com/en-us/excel-help/create-a-data-model-in-excel-HA102923361.aspx
I have highlighted a new option in the create PivotTable dialog which is to “Add this data to the Data Model”. So what is this Data Model I speak of?
“A Data Model is a new approach for integrating data from multiple tables, effectively building a relational data source inside the Excel workbook. Within Excel, Data Models are used transparently, providing data used in PivotTables, PivotCharts, and Power View reports“. Read more here…
http://blogs.office.com/b/microsoft-excel/archive/2012/08/23/introduction-to-the-data-model-and-relationships.aspx
Data models utilize the features of PowerPivot, now embedded into Excel. They are managed by the PowerPivot add-in, and more structured than Excel tables.
It would be nice to see if there is some way of 'upsizing' or linking these data tables to SQL Server, as we had with Access. Being able to give users the ability to store large datasets on the server without admin intervention could change the way they work with Excel.
Wednesday, August 01, 2012
Oracle ORAOLEDB Data Type Hokeyness
The first quirk is just connecting. Using the ORAOLEDB.1 provider, we specify / as the user name with a blank password, and ensure connection is trusted. Under Advanced - Extended Properties, we specify a FetchSize=10000;defaultRowPrefech=10000; (perhaps larger). This is supposed to improve performance of processing the cube.
The second quirk is creating your DSV. When using the Oracle connection, it lists all tables in all schemas. The Oracle environment here is defined by separate schemas instead of separate databases. Microsoft likes the database segmentation approach. In a development environment, we have access to everything. It's not _that_ slow with 18,000 tables and 3x the amount of views, but preselecting a schema would be a nice thing to have.
This laundry list of schemas and tables has a larger impact on the audience using Power Pivot, which regulates Oracle to an "Other Data Source" while my client uses it as their primary data source. Searching through 18,000 tables, and also placing the views at the bottom in unsorted order is not the most friendly interface for building models. Exposing the data using OData links is probably the preferred approach anyway... but there has to be a better way to deal with using Oracle as a data source for Power Pivot.
One major quirk/show stopper using Analysis Services against Oracle is with data types. Sometimes the data types don't get reflected properly. A Numeric() data type in Oracle is a dynamic type, which SQL doesn't like. Casting this to NUMERIC(9,0) should theoretically provide you with Int, or Int32 in Analysis Services world.
It doesn't always. In an Analysis Services data source view, sometimes it doesn't pick up your changes. Casting a number to NUMERIC(9,0) in the Oracle View and refreshing doesn't appear to do anything. My data type is still Int64 or Decimal or whatever the Oracle decides is in my future.
The workaround is to use named queries everywhere. Any changes to the underlying data model requires changing something "case or space or anything" in the named query. Refresh View doesn't seem to work.
Analysis Services appears to then pick up data types properly.
NUMERIC(5,0) becomes Byte (SmallInt)
NUMERIC(6,0 - 9,0) Becomes Int32 (Int)
NUMERIC(10,0 - xxx,0) becomes Int64 (Bigint)
Anything with decimal places becomes a decimal. Cast your decimals and round them to ensure precision doesn't blow up the cube.
Seems like an acceptable workaround. The other way is to manually change the DSV xml. I will be avoiding that one at all costs.
When you build out your multidimensional cube, all surrogate keys and numeric identity keys should be of data type Int32. You can use smaller sizes for smaller dimension tables, but usually the tradeoff for size/performance vs. consistency is negligible. The exception to the Int rule is when you're expecting lots of rows in your dimensions or facts... 2 billion rows? Int64 or BigInt is what you want.
Would like to hear from others building cubes and PowerPivot models against Oracle data sources. What tricks or quirks do you see when dealing with non-Microsoft sources?
Wednesday, July 25, 2012
The beginning of the end of NoSQL — Too much information
Some NoSQL-categorized vendors now prefer to be called anything but NoSQL, since this connotation lumps together various technologies into a single buzzword acronym.
CouchDB is often categorized as a “NoSQL” database, a term that became increasingly popular in late 2009, and early 2010. While this term is a rather generic characterization of a database, or data store, it does clearly define a break from traditional SQL-based databases. A CouchDB database lacks a schema, or rigid pre-defined data structures such as tables. Data stored in CouchDB is a JSON document(s). The structure of the data, or document(s), can change dynamically to accommodate evolving needs.
So if the term NoSQL is dead, what is the replacement? NewSQL? The lack of creativity is amazing.
A database is only a database if it is an organized collection of data. Are NoSQL databases really organized or are they freeform? If it's not a database, what is this unstructured set of information called?
Another term that could be headed for the Furby pile is "Big Data" which is apparently a trademark of a corporation. Massive Data sounds better to me anyway...
The beginning of the end of NoSQL — Too much information
Speaking of Furbys...
http://www.engadget.com/2012/07/06/furby-hands-on-video/
Query Languages and technology Mashups
MDX (Multidimensional Expressions, or Mosha's Data Expressions) are used to query Analysis Services multidimensional and tabular cube models. Instead of a single axis of rows being returned, there is the potential to return 128 axes of data. Usually it's just rows, columns, and sometimes pages, since thinking in 128 dimensions hurts most peoples brains.
DMX (Data Mining Extensions) is a a SQL-like query language used for creating and pulling data from multidimensional mining models. The original team lead for Data Mining at MS is now CTO at a company called Predixions, implementing mining models using PowerPivot.
DAX (Data Analysis Expressions) supercedes MDX for the Analysis Services / PowerPivot tabular models. MDX is still used in the background for Excel when querying PowerPivot models, but DAX is the way you write expressions and calculated functions.
NoSQL (Not Only SQL) isn't really a language, it is a technology categorization. NoSQL databases like those used for Facebook and Google aren't necessarily relational in nature. Usually they are key-value stores. Think thin columns and deep rows. Microsoft has Windows Azure Blob Storage for it's NoSQL offering, with others on the way. HQL (Hive Query Language) is one way of querying a NoSQL database running on Hadoop. Not to be mistaken with Hibernate Query Language, used to query the ORM framework NHibernate. Most NoSQL databases fail the ACID test, and aren't necessarily good for transactional systems. They're great when you need to distribute and analyze massive amounts of data.
NewSQL is either a Sourceforge LDBC driver implementation created around 2003 that offers a simplified SQL syntax, or yet another buzzword coined last year to describe something that isn't SQL and isn't NoSQL. It's easier to use, more scalable, and performs better. It provides ACID support and all the features of new and old database technologies. Or so the vendors of this technology space suggest...
Some products in the NewSQL space include Akiban Server and (perhaps) Microsoft SQL Azure
More info here:
http://www.readwriteweb.com/cloud/2011/04/the-newsql-movement.php
Augmenting legacy SQL databases and technologies with their new "Big Data" columnstore or key-pair counterparts seems to be the Next Big Thing. Eventually one of the Big Dogs will fill this space, probably Microsoft, Oracle, IBM or SAP. For now the niche players will nibble at each other's dogfood, until somebody decides to buy out the dogfood supplier.
What we really need is a database to track and compare the companies, names, technologies, features and query languages used for each of these products. Maybe this database could just be a query language against Wikipedia. As more and more players enter the database market, research companies and consultants who understand the "Big Picture" become more important, as do content aggregators and search tools like Wikipedia, Linked-In and Google.
Wednesday, July 18, 2012
Using Sequence Identity Column in SQL 2012
It sounds like sequence is the way to go when performance and concurrency are issues. Sequence requires rewriting your application inserts to handle the syntax for sequence. eg. INSERT INTO x (id) VALUES (NEXT VALUE FOR seqname)
An insert trigger could potentially replicate the functionality of the identity column.
SQL Server 2012 Sequence Internal » John Huang's Blog
Further details on the internals of identity vs. sequence are here.
http://dba.stackexchange.com/questions/1635/why-are-denali-sequences-supposed-to-perform-better-than-identity-columns
The primary reasons I wouldn't use sequence - in some cases anyway.
1. No DISTINCT keyword allowed when using sequences in query.
2. Can't use in views.
3. Adds complexity vs. identity column.
4. Can't use in defaults for column.
5. Requires trigger to implement identity-style insert logic.
6. Can be updated without setting identity insert option.
Monday, July 16, 2012
Tuning Analysis Services
Set AggregationMemoryLimitMax and AggregationMemoryLimitMin for performance with multiple-cpu environments.
http://geekswithblogs.net/ManicArchitect/archive/2010/11/02/142558.aspx
Wednesday, July 11, 2012
Data Quality in an Enterprise Warehouse
Not adopting the latest technologies can also stop your business, though less like a train wreck and more like an old mall department store. If maintenance isn't kept up, prices don't appear competitive and new products don't keep filling the shelves, the store is probably doomed to stagnate to a slow death. If software patches and releases aren't kept up, support lifecycles expire and similar platforms become harder to integrate. Upgrading through 3 versions of an O/S while maintaining compatibility with your in-house proprietary apps becomes not just painful, but nearly impossible.
It's usually the same with data. Fixing problems immediately without doing root cause analysis can be just a band-aid solution. Not fixing problems at all could be even worse, and fixing historical data could cost more than it's worth.
Historians say that it is harder to predict the past than it is to predict the future. The internet is making predicting the past a bit easier, at least the past beyond the mid 90s.
Here's an article from 2005 that's still relevant, regarding data cleanliness in the warehouse.
Despite the great efforts from Inmon, Kimball, and the Others, the world of Data Warehousing is still facing great challenges. Even in 2005, after 14 years of Inmon explaining the concept, more than 50% of today’s data warehouse projects are anticipated to fail [Robert Jaques]. In fact, Ted Friedman, a Principal Analyst in Gartner wrote in 2005, “Many enterprises fail to recognize that they have an issue with data quality. They focus only on identifying, extracting, and loading data to the warehouse, but do not take the time to assess the quality.”
Today’s data warehouses suffer from poor quality of the data. Whether or not the poor quality of data existed a decade ago is a questionable hypothesis. In the 90s, the new breed of software products and the ease of implementing data-moving techniques have opened several avenues that resulted in data duplication. As a result, any data inconsistencies in source systems have been remedied by scrubbing and cleansing them on “local copies” of the data sets rather than taking efforts to correct them at the sources.
The premise behind many data warehousing strategies is to bring "all the things" into a central area for reporting, and perhaps analysis. More so just reporting, or "giving the users the numbers" instead of the Semantic Meaning behind them. Just like a real warehouse, the historical data starts to collect dust. Without a proper inventory control system, it becomes hard to find things. The older data elements appear a bit yellowed and sun-bleached, and perhaps have warped or had some design flaws when they were originally input that were only caught in later versions. The newest items may look shiny due to improved validation processes during input time, but could have critical defects due to ETL "Chinese Whispers" gone haywire.
The way companies deal with interpreting and fixing bad data is probably the number one reason why most BI projects fail.
As in a warehouse, it can be hard or just doesn't make ROI sense to go back and fix those old elements. It is much easier to write them off as being known bad data. This can cause grief and confusion for those trying to perform historical comparisons and trend analysis on the data.
In a dimensional model, a measure is also called a fact. What is a fact if it is not quite true? It doesn't become an opinion, it becomes an error. Loading errors into a fact table is not a good idea. So we have a conundrum.
The missing component here is being able to publish scores that tell us the data is flawed, and commentary describing why we are not going to fix it, or how data quality is being addressed, or at least how to interpret the data, and ways to ignore data based on a threshold of quality.
As we move toward the "big data" experience, identifying trust-levels within data sources becomes more important. Facts become opinions instead of errors. Opinion tables sound much nicer to me than Error tables, and for dealing with data that doesn't have to be down to the penny, like internet sentiment analysis, perhaps opinions may work.
I propose an approach to augmenting current star/snowflake dimension models. Opinionate your data models.
1. Add an opinion table to your fact table. Ensure it uses the same grain as the fact table.
2. Include status columns for each column in your fact that you wish to track errors.
3. Include an aggregated status column, and a checksum that can be matched against the source record for auditing purposes.
4. Include a type 2 with history slowly-changing dimension for managing when each fact table load occurred and what the most current load is.
5. Track all errors, questionable data, or data out of the statistical norm using this table.
6. Ensure you have dimensions for ratings, tags and link to a wiki for people to maintain commentary.
This table load could be automated using standard ETL tools, or manually augmented using data input forms. If you're using Excel 2007/2010 and Analysis Services cubes, you could use the writeback feature to assign scores to existing data and provide 1/0 values for true/false flagging. Some tools, like Oracle and SQL, automatically create statistics and histograms on tables which you could leverage to find out what data are outliers. Data Quality Services tools can be trained to understand your data and identify suggestions or even fix your data.
Gamify the process by giving out weekly prizes for those business users that find the most issues with the data, and those technical users that propose the best solutions for fixing the issues.
Sometimes automating data processes can uncover unsettling facts, especially with people & processes that resist automation. Improving data quality gives more power to the users of that data, and less power to the controllers of the data. This isn't always a good thing.... depending on your opinion.
Wednesday, June 20, 2012
SSIS SCD Wizard Performance Issue « Data Warehousing and Business Intelligence
If your dimension is 1000 rows and the source data is 1000 rows, the SCD takes a few minutes. But if your dimension is 500k and the source data is 500k, it’s 5-6 hours performance. The whole idea is fundamentally flawed. On the welcome screen of the SSIS box, there should be a warning: Warning, do not use SSIS SCD Wizard if your dimension is above 1000 rows.
So, if your dimension is 1m rows, what should you do?
SSIS SCD Wizard Performance Issue « Data Warehousing and Business Intelligence
Tuesday, June 19, 2012
Filtering by logged in user
WITH MEMBER Measures.UserID
as
'UserName()'
SELECT
[Measures].[UserID]
ON COLUMNS
FROM
Filtering by logged in user
Friday, June 08, 2012
Wednesday, June 06, 2012
Add Report Server Content Types to a Library (Reporting Services in SharePoint Integrated Mode)
To configure Reporting Services against an existing Sharepoint 2010 farm, you need to go to the root site and enable the proper features. Then configure libraries in subsites to expose the proper content types (Report Builder Model, Report, Data Source).
It can get a bit tricky. Here are some links.
Add Report Server Content Types to a Library (Reporting Services in SharePoint Integrated Mode)
SQL Error 909 when restoring ContosoRetailDW.bak using MS SQL Server Mgt. Studio
If you’re trying to restore the Contoso database samples on a BI or Standard edition of SQL 2012, you’ll get an error about Enterprise features being used. Here is a script that can be run on a trial Enterprise or Developer edition of SQL 2012 to remove that dependency.
ALTER INDEX ALL ON DimChannel REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON DimEntity REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON DimProduct REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON DimStore REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactExchangeRate REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactInventory REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactITMachine REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactOnlineSales REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactSales REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactSalesQuota REBUILD WITH (DATA_COMPRESSION = None);
ALTER INDEX ALL ON FactStrategyPlan REBUILD WITH (DATA_COMPRESSION = None);
SQL Error 909 when restoring ContosoRetailDW.bak using MS SQL Server Mgt. Studio
Thursday, May 31, 2012
Share cache across SSIS packages
Prior to SSIS 2012 it was not possible to re-use or share the same cache across packages. What that meant is if you created a cache you will be able to consume it only in the very package where the cache was instantiated. Therefore, a developer could take only two routes from here – either make as much heavy ETL processing as possible inside the same package where the cache resides, or populate yet another cache object in every other package where it is needed. The latter was
especially harmful leading to unnecessary heavy extra database calls and an extended development time. This limitation has been overcome in the SSIS 2012 release.
Monday, May 14, 2012
SQL Live Monitor
a .NET application that provides realtime performance data on the target SQL Server instance.
No installation required, data displayed in realtime, and can also be logged to CSV for offline analysis. Will also capture SQL data for processing uing PAL.
Thursday, May 10, 2012
Layerscape - Home Page
Layerscape is a fascinating Microsoft Research tool that integrates with Worldwide Telescope to provide deep spatial virtualizations.
Thursday, May 03, 2012
Download: Microsoft® SSIS Balanced Data Distributor - Microsoft Download Center - Download Details
Useful for distributing workloads and speeding up SSIS performance.
Microsoft® SSIS Balanced Data Distributor (BDD) is a new SSIS transform. This transform takes a single input and distributes the incoming rows to one or more outputs uniformly via multithreading. The transform takes one pipeline buffer worth of rows at a time and moves it to the next output in a round robin fashion. It’s balanced and synchronous so if one of the downstream transforms or destinations is slower than the others, the rest of the pipeline will stall so this transform works best if all of the outputs have identical transforms and destinations. The intention of BDD is to improve performance via multi-threading. Several characteristics of the scenarios BDD applies to: 1) the destinations would be uniform, or at least be of the same type. 2) the input is faster than the output, for example, reading from flat file to OleDB.
NOTE: An updated version of Microsoft SSIS Balanced Data Distributor (BDD) is available. This includes a fix for the problem causing BIDS to crash when removing the BDD transform. For more details, and download information, see KB 2616527.
Download: Microsoft® SSIS Balanced Data Distributor - Microsoft Download Center - Download Details
Monday, April 30, 2012
SQL Server Security - Site Home - MSDN Blogs
Instead of using CHECKSUM, timestamp, or other methods, hashing data may be a good alternative.
A common scenario in data warehousing applications is knowing what source system records to update, what data needs to be loaded and which data rows can be skipped as nothing has changed since they were last loaded. Another possible scenario is the need to facilitate searching data that is encrypted using cell level encryption or storing application passwords inside the database.
Data Hashing can be used to solve this problem in SQL Server.
Friday, April 27, 2012
SQL 2012 Labs
Some additional training content for SQL 2012, from John Weston’s site
MSDN Virtual Lab: SQL Server 2012: Working with the SQL Server 2012 Availability Groups
MSDN Virtual Lab Express: SQL Server 2012: Exploring SQL Server 2012 Reporting Services PowerView
Don’t forget SQL 2012 just RTM’ed get more info here: http://blogs.technet.com/b/jweston/archive/2012/04/02/sql-2012-rtm-available-for-technet-customers-to-download-today.aspx
Single Text Can Wreck Windows Phone 7 Devices | Fox Business
Apparently, sending a certain long message to a WP7 device kills the messaging app and requires a factory reset. No minor glitch here. Could carriers detect this message and filter it out?
This could get ugly. Back up your WP7 phones!
Thursday, April 26, 2012
Twitter Kills The Majesty of Pleasant Conversation
Words are powerful things. As of late, I have been working on text mining social data feeds, and investigating how Hadoop, R, Azure and SQL Server 2012 fit into the big picture. I wrote a SQL Integration Services package using a script task that pulls Twitter data from a syndication feed, parses out the words, hash tags, screen names and links, and stores the results in a SQL 2012 (or optionally SQL Azure) database. Performing some text mining of Twitter status keywords against a selection of “best” and “worst” words in the English language brings back some interesting and slightly depressing results.
I started by entering in the database the twitter screen names to follow. I targeted a few Canadian bike companies (don’t ask why). Each time the package is run, it adds the last 20 tweets, and parses screen names of users that are mentioned in each tweet. And so on. This recursion builds a very quick six-degrees-of-separation trail and some fairly random discussions.
Running this process for about 4 days, sometimes 2-3 times per day, produced 5599 tweets. Originally I was looking at using R and Hadoop to analyze the results, which is a bit like bringing a ballistic missile to a knife fight. To slice this data with SQL takes only a couple of seconds or less. Perhaps reading the entire Twitter firehose or analyzing historic tweet data might change the architecture in the future. For now, things are working pretty well.
Of a selection of 5599 individual tweets, 9 contain the “best words” and 2135 have the “worst words” as rated by Vocabula Review. That’s 38% of the sample that have an aura of foolishness or odium, and 0.1% that have an aura of fun and majesty. The sampling is fairly small, with the top word “valley” only coming up 3 times.
Another dataset with seeded with a more technology-centric list of twitter users like Robert Scoble some Microsoft folks I follow brought back similar results. Running this process over the course of a month saved 59,583 tweets containing 796,171 words, links, screen names, emoticons and hash tags.
Of the 796k words, 24,171 came up in the “worst words” filter. That’s about 30%. A measly 282 came up in the “best words” filter. That’s less than 0.001%.
The following Top 5 Words came up.
valley 74
azure 19
simplicity 11
bliss 10
recherche 8
- Valley makes sense, with Silicon Valley, Napa Valley, and those other west coast valleys being discussed.
- Azure makes sense, since a steady stream of Windows Azure propaganda continually bubbles from Microsoft.
- Simplicity comes up a few times when people talk about Apple or iPad.
- Bliss comes up because of Rob Bliss, a Washington Times columnist, and some comments about cranberry bliss bars.
- Recherche, well, let’s chalk that up to the fact that some of the best words in the English language are French. Mon dieu.
With only 140 characters to leverage, you would think that people would use words like “animadversion” or “cachinnation” to provide deep and meaningful expression. Instead, you get the logorrhea that is the Twitter dataset.
Check out www.vocabula.com to improve your tweets and amaze your followers with fun and majesty.
The BI Software Tool Belt
Excellent list of must-have software for SQL Server and everything else.
My must-have software is:
- Microsoft OneNote, Live Writer
- SSMS Tools
- BIDS Helper
- WinDirStat
Wednesday, April 25, 2012
SQL Server FineBuild
A wizard for installing SQL? Yes, it exists…
FineBuild provides 1-click install and best-practice configuration of SQL Server 2012, SQL Server 2008 R2, SQL Server 2008, and SQL Server 2005.
Tuesday, April 24, 2012
CQRS & Separation of Church and State
CQRS is a design pattern acronym which stands for Command Query and Responsibility Segregation. I would call this the R/RW pattern, or the separation of Church and State. In a reporting application design, it would silo the read queries from the write commands.
In its simplicity, it is designing two services for each application interface, one that reads and one that writes.
That is it. That is the entirety of the CQRS pattern.
CQRS, Task Based UIs, Event Sourcing agh! | Greg Young
Since the bulk of applications have highly concurrent reads and minimal or bulk writes, thinking about this pattern during design is very important. What if you were able to architect a system where you could spin up unlimited services that provide querying, and a few services that provide writes/updates, queuing them up for later? Would this improve performance and scalability? What if you could split the data for these systems into unlimited read-only models and a few read/write models?
What if the data in the read-only models was effortlessly in-synch with the read/write models, with minimal latency and lazy consistency?
This is one of the tenets behind Azure and cloud computing in general.
You can do this in a SQL Server environment, albeit not so effortlessly and with some gotchas, using Log Shipping, Undo and database snapshots for a read-only picture of the database. There is some latency, and keeping the databases in synch adds lots of overhead.
SQL 2012 appears to solve some of the limitations of SQL 2008 R2 log shipping with it’s AlwaysOn capabilities for rolling out read-only, distributed clusters. It still doesn’t seem as effortless as it should be.
Replication also offers a solution, though it can be brittle and has its limitations and maintenance overhead.
SANs also offer a high-cost solution for fast bulk loads, with some downtime.
You cannot use SAN snapshots of an active SQL Server database file(s) unless you detach them first. This is very intrusive.
I actually do this for a situation where I need 800GB of data from one server to another every night.
- Detach the database on ServerA
- SAN snapshot serverA drives F:,G:,H:,I:
- reattach database on ServerA
- detach database on ServerB
- detach drive L,M,N,O on ServerB
- mount new SAN snapshots L,M.N,O on ServerB
- attach database files from L,M,N,O on ServerB
It requires an outage on both, but is very fast to virtually move the data to the other server.
Change data capture is another solution for maintaining multiple concurrent copies of a data stream. SQL 2012 now has an Oracle CDC feature which improves the integration and reporting story.
The above solutions are very hammer-based approaches to the solution and don’t necessarily relate to CQRS. Message queuing, eventing, change data capture and pub/sub systems all seem to fall within the CQRS design pattern, though they are really just an extension to the thought process.
Chad Meyers has an excellent posting on Laws, Rules, Principles, Patterns, and Practices.
Another set of excellent posts on the concept of CQRS.
There doesn’t seem to be a silver bullet to the solution just yet, though I’m sure someone will get it right sooner or later. The CQRS pattern is merely that, a pattern. The devil is in the details.
Monday, April 23, 2012
Ctrl+E not working for executing statements | Microsoft Connect
Upgrading from Denali CTP1? Your keyboard shortcuts (CTRL-E, CTRL-R) are probably not working. Use Options – Keyboard – Reset to fix.
Ctrl+E not working for executing statements | Microsoft Connect
Reloading Agent Settings appears in SQL Agent Logs every 90 seconds | Microsoft Connect
One of the things you may want to do as soon as you install SQL 2012 is disable the auto refresh feature of SQL Agent, which will fill up your agent logs with superfluous messages.
We investigated this issue and it's indeed painful behavior from the user perspectives, we will address this issue in nearest Service Pack.
For now I can suggest you the the following workaround:
Go to registry on your SQL Server machine and navigate to corresponding SQL Instance registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\ MSSQL11.YOUR_INSTANCE_NAME\SQLServerAgent
Change registry settings called “AutoRegistryRefresh” from 1 to 0.
The side impact of this: if you change any default settings behavior for SQL Agent, you may need to restart SQL Agent to get it.
Reloading Agent Settings appears in SQL Agent Logs every 90 seconds | Microsoft Connect
Exporting via HTML instead of MHTML - CSS SQL Server Engineers - Site Home - MSDN Blogs
How to configure additional output formats (HTML) for Reporting Services.
There was a question on Twitter about how to display a report in HTML instead of MHTML due to some browser issues. Based on the fact that it was MHTML, I’m assuming we are talking about exporting a report as the default report view is HTML. First off, if we look at our export options for a report, we see the following:
Exporting via HTML instead of MHTML - CSS SQL Server Engineers - Site Home - MSDN Blogs
Thursday, April 19, 2012
Jonathan Kehayias | Installing and Configuring SQL Server 2012 Distributed Replay
The distributed replay feature of SQL 2012 allows for load testing a sampled workload against a different SQL server, and spinning up multiple workloads to simulate real activity.
Here’s the install instructions.
Jonathan Kehayias | Installing and Configuring SQL Server 2012 Distributed Replay
Monday, April 16, 2012
SQL Server 2012 Command Line Utilities — DatabaseJournal.com
The MAK talks about command line utilities in SQL 2012.
Microsoft provided many Command line utilities for SQL Server 2012. In this article I am going to explain the functionality and use of these command line utilities based on the feature it is dealing with.
SQL Server 2012 Command Line Utilities — DatabaseJournal.com
Also, the details on PowerShell cmdlets for SQL
Friday, April 13, 2012
Paging Function Performance in SQL Server 2012 | Ms SQL Girl
Beware the new fetch feature of SQL 2012. RBAR in disguise?
Although SQL Server 2012 new Paging function is intuitive for the purpose of paging a data set, it comes at a cost for large volume of data. The Top Batching and Direct ID Batching perform significantly better under similar circumstances, i.e. simple Source Data with continuous ID.
Paging Function Performance in SQL Server 2012 | Ms SQL Girl
There is also another bug around identity columns skipping increments after a SQL restart that might affect how paging functions in SQL 2012. Be sure to test any applications doing server-side paging!
Thursday, April 12, 2012
TechNet Log Parser Studio
Log Parser Studio is a utility that allows you to search through and create reports from your IIS, Event, EXADB and others types of logs. It builds on top of Log Parser 2.2 and has a full user interface for easy creation and management of related SQL queries.
Wednesday, April 11, 2012
Data Warehousing and Business Intelligence
Data Warehousing and Business Intelligence
Vincent Rainardi presents an excellent resource for a Business Intelligence project, with plenty of tips and tricks to succeed when building a data warehouse or reporting solution.
Monday, April 09, 2012
Aaron Bertrand : Two bugs you should be aware of
Questioning using SQL 2012 as a production environment? There are a few critical issues with the current platform that may make you want to think about when SP1 is coming out….
This one is critical for those that think identity should be an auto-incrementing sequential key.
Just to add to your list. There is one more critical bug wherein if you define identity column on a table and insert some values. Thereafter restart SQL server and insert further more values and do a restart again and then when u insert in that table there are large gaps in the identity values.
This is Sparta...n SQL!: Scripting out statistics
An interesting technique for building a VSDB (Very Small Database) with a very large query plan profile. If you have a huge database and don’t want to do performance testing of procedures in production, mirroring statistics could be a way to simulate this activity using tiny or even empty tables.
I recently read an excellent post by Kevin Kline regarding scripting out histograms from one database and applying them to another. What would have made it better is if there was a way to automate it. And so I pulled out powershell and got to work.
This is Sparta...n SQL!: Scripting out statistics
There should really be a tool that clones the schema of a database and the statistics of a database while clearing out the data or subsets of the data.
Thursday, March 29, 2012
"SYLK: File format is not valid" error message when you open file
Before deciding to use ID as a column name, consider this scenario when dealing with CSV files…
"SYLK: File format is not valid" error message when you open file
"SYLK: File format is not valid" error message when you open file
SAP Data Services | A small notes. . .
Some information on SAP Data Services here, and a refresher on Databases and Data Warehousing.
Also, an interesting post on ETL vs ELT. Would it be faster to use ETL to transform your data before loading, or would it be faster to just copy your data directly to the warehouse and manage it inside?
Perhaps this is the role the new FileTable type in SQL 2012 will play for you. Why not FTP your CSV files directly into SQL as blobs, and manage the transformations within?
Maybe not the best example, but it does centralize things a bit more…
Wednesday, March 28, 2012
New programming features in SQL 2012
SQL Server 2012 is released this weekend. SQL 2012, you can now use select-style statements to return customized results from a single stored procedure.
EXEC uspGetEmployeeManagers 16 WITH RESULT SETS ( ([Reporting Level] int NOT NULL, [ID of Employee] int NOT NULL, [Employee First Name] nvarchar(50) NOT NULL, [Employee Last Name] nvarchar(50) NOT NULL, [Employee ID of Manager] nvarchar(max) NOT NULL, [Manager First Name] nvarchar(50) NOT NULL, [Manager Last Name] nvarchar(50) NOT NULL ) );
Custom paging using an ORDER BY clause
-- Skip 0 rows and return only the first 10 rows from the sorted result set. SELECT DepartmentID, Name, GroupName FROM HumanResources.Department ORDER BY DepartmentID OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
Throw to negate the need for custom messages with RAISERROR
C#/VB style Format to format dates, times, numbers
OVER clause for windowed summarization / rolling totals
PARSE to convert strings to dates, currencies, etc
TRY_CONVERT to avoid errors when converting data types
TRY_PARSE to determine if strings = data types
DATEFROMPARTS to take a year, month and day and create a date
CHOOSE to return an index item from a list of selections. (Arrays in SQL?)
Monday, March 26, 2012
Bob's blog - SQL 2008 R2 MDS Easter Egg - wBob
Some helpful tools coming out of SQL MDS, including regex and XSLT straight from T-SQL.
I've been having a play with SQL 2008 R2 and its new Master Data Services (MDS). I was pleased to see it adds some additional functions when you create a new MDS database. These include mdq.Split and mdq.RegexReplace which implement handy Regular Expression functions. Most of these are documented at: http://msdn.microsoft.com/en-us/library/ee633712(SQL.105).aspx
I also discovered a scalar function called mdq.XmlTransform which allows you to use XSLT:
Wednesday, March 21, 2012
Microsoft SQL Server 2012 Virtual Labs Available To You Online, plus many more SQL Server 2012 resources - Microsoft SMS&P Partner Community Blog - By Eric Ligman - Site Home - MSDN Blogs
'via Blog this'


