Wednesday, March 12, 2014

uniqueidentifier [guid] vs int

The uniqueidentifier data type stores 16-byte binary values that operate as globally unique identifiers (GUIDs). A GUID is a unique binary number; no other computer in the world will generate a duplicate of that GUID value.
On other hand int data type store 4-byte value.

A GUID value for a uniqueidentifier column is usually obtained by one of the following ways:
  •     In a Transact-SQL statement, batch, or script by calling the NEWID function.
  •     In application code by calling an application API function or method that returns a GUID.
   
   
The uniqueidentifier data type has the following disadvantages:
  •     The values are long and obscure. This makes them difficult for users to type correctly, and more difficult for users to remember.
  •     The values are random and cannot accept any patterns that may make them more meaningful to users.
  •     There is no way to determine the sequence in which uniqueidentifier values were generated. They are not suited for existing applications that depend on incrementing key values serially.
  •     At 16 bytes, the uniqueidentifier data type is relatively larger than other data types, such as 4-byte integers. This means indexes that are built using uniqueidentifier keys might be relatively slower than indexes using an int key.
   
Cluster-index can be apply on:-
  1. Random GUIDs (using NEWID function)
  2. Sequential GUIDs (using NewSequentialGUID function)
  3. BIGINT identity value
  4. INT identity value

Below is the table for comparing and ranking for above datatypes:


Criteria

GUIDs

Seq. GUIDs

BIGINT

INT

Storage

16 bytes

16 bytes

8 bytes

4 bytes

Insert/Update performance

Slowest

Comparable but the index keys are larger in size. For wider tables, this would be slower than Integer values.

Faster than sequential GUIDs

Fastest

Hotspot contention

Very rare

Yes

Yes

Highest, due to smaller size of RIDs

Fragmentation/Page Splits

High

Minimal

Minimal

Minimal

JOIN Performance/SORT operations

Least performance
(Rank 4 = Least performance)

Better than random GUIDs due lesser fragmentation (Rank: 3)

High performance (Rank: 2)

High
Performance
(Rank: 1)

Logical reads

Rank 4=Highest

Rank 3

Rank 2

Rank 1=Least

Merging data across servers

Beneficial

Beneficial

Difficult

Difficult

Uniqueness

Rare chance of duplicates

Globally unique. Virtually no chance of collisions

Limited by range of BIGINT

Limited by range of INT


The above considerations make the use of GUIDs unfavorable for a clustered index in environments which have large number of queries performing JOIN operations in OLTP and when referential integrity is enforced on the database among multiple tables. Throw non-clustered indexes that you created on the table as covering indexes for the frequently run queries against the database, you can have a significant performance bottleneck.

Source:
http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/guid-vs-int-debate.aspx?Redirected=true

No comments:

Post a Comment

web stats