[数据库]【转载】GUID vs INT Debate
I recently read a blog post on what was better using GUIDs or Integer values. This is been an age long debate and there are advocates in both camps stressing on the disadvantages of the other. Well both implementations have their advantages and disadvantages. At the outset, I shall mention that the answer to this debate is: IT DEPENDS! J
It is highly dependent on your database design, migration needs and overall architecture. There is a good reason why SQL Server replication uses GUIDs to track the changes to the replicated articles. So, it is not that the usage to GUIDs is necessarily a bad practice. SQL Server Books Online lists the following disadvantages for uniqueidentifier data type:
- The values are long and obscure. This makes them difficult for users to type correctly, and more difficult for users to remember.
- The values are random and cannot accept any patterns that may make them more meaningful to users.
- There is no way to determine the sequence in which uniqueidentifier values were generated. They are not suited for existing applications that depend on incrementing key values serially.
- At 16 bytes, the uniqueidentifier data type is relatively larger than other data types, such as 4-byte integers. This means indexes that are built using uniqueidentifier keys might be relatively slower than indexes using an int key.
If you are using NEWID function in SQL Server, then this generates random UUIDs which have a huge domain but the chances of GUID collisions are always there though the probability is very slim in nature. If you are using NEWID function to generate uniqueidentifiers as row identifiers in your table, then you need to think again! Uniqueness of the row should be enforced using a Unique or Primary Key constraint on the table. NewSequentialID function uses identification number of the computer network card plus a unique number from the CPU clock to generate the uniqueidentifier (Reference article). So the chance of getting a globally unique value is practically guaranteed as long as the machine has a network card. Moreover, possibility of a GUID collision while using NewSequentialID is virtually impossible.
Given that you have a beefy server, the above time difference would not make much of a difference unless and until you only have a high number of concurrent INSERT workload on the server or during a Data Load operation which would cause a significant impact. What is interesting to note is that the fragmentation on the tables after the first batch of 1 million inserts.
Average Record Size
Average Page Density
If you look at the above data, you will see that the random GUIDs have 99% logical fragmentation in the tables. This is due to the random nature of the GUIDs generated which end up causing high number of page splits in the database.