你的位置:首页 > 数据库

[数据库]【转载】GUID vs INT Debate


I recently read a blog post on what was better using GUIDs or Integer values. This is been an age long debate and there are advocates in both camps stressing on the disadvantages of the other. Well both implementations have their advantages and disadvantages. At the outset, I shall mention that the answer to this debate is: IT DEPENDS! J

It is highly dependent on your database design, migration needs and overall architecture.  There is a good reason why SQL Server replication uses GUIDs to track the changes to the replicated articles. So, it is not that the usage to GUIDs is necessarily a bad practice. SQL Server Books Online lists the following disadvantages for uniqueidentifier data type:

  • The values are long and obscure. This makes them difficult for users to type correctly, and more difficult for users to remember.
  • The values are random and cannot accept any patterns that may make them more meaningful to users.
  • There is no way to determine the sequence in which uniqueidentifier values were generated. They are not suited for existing applications that depend on incrementing key values serially.
  • At 16 bytes, the uniqueidentifier data type is relatively larger than other data types, such as 4-byte integers. This means indexes that are built using uniqueidentifier keys might be relatively slower than indexes using an int key.

If you are using NEWID function in SQL Server, then this generates random UUIDs which have a huge domain but the chances of GUID collisions are always there though the probability is very slim in nature. If you are using NEWID function to generate uniqueidentifiers as row identifiers in your table, then you need to think again! Uniqueness of the row should be enforced using a Unique or Primary Key constraint on the table. NewSequentialID function uses identification number of the computer network card plus a unique number from the CPU clock to generate the uniqueidentifier (Reference article). So the chance of getting a globally unique value is practically guaranteed as long as the machine has a network card. Moreover, possibility of a GUID collision while using NewSequentialID is virtually impossible.

 

Given that you have a beefy server, the above time difference would not make much of a difference unless and until you only have a high number of concurrent INSERT workload on the server or during a Data Load operation which would cause a significant impact. What is interesting to note is that the fragmentation on the tables after the first batch of 1 million inserts.

Object Name

Index Name

Pages

Average Record Size

Extents

Average Page Density

Logical Fragmentation

Extent Fragmentation

tblGUID

cidx_tblGUID

9608

51.89

1209.00

69.27

99.14

0.25

tblSeqGUID

cidx_tblSeqGUID

6697

51.89

845.00

99.39

0.76

0.12

tblBigINT

cidx_tblBigINT

5671

43.89

714.00

99.95

0.48

0.14

tblINT

cidx_tblINT

5194

39.89

653.00

99.62

0.37

0.15

 

If you look at the above data, you will see that the random GUIDs have 99% logical fragmentation in the tables. This is due to the random nature of the GUIDs generated which end up causing high number of page splits in the database.

--------------

原文地址:http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/guid-vs-int-debate.aspx

上面的表格说明,普通GUID 会发生很大的页分裂情况,这在一个表反复修改的情况下,可能会明显影响查询速度。