current position:Home>True interview question: why does redis store a data type twice?

True interview question: why does redis store a data type twice?

2022-05-15 07:43:01Java architecture design

Preface

stay Redis in , There is a data type , When the data is stored, two data structures will be used to store them separately , that Redis Why do that ? Will this cause the same piece of data to take up twice as much space ?

Five basic types of set objects

Redis The collection object in is an unordered collection of string type elements , The elements in the set are unique and unrepeatable .

There are two underlying data structures for collection objects :intset and hashtable. The interior is differentiated by coding :

intset code

intset( Set of integers ) It can be saved as int16_t,int32_t,int64_t The integer value , And make sure there are no duplicate elements in the collection .

intset The data structure is defined as follows ( Source code inset.h Inside ):

typedef struct intset {
    uint32_t encoding;// Encoding mode 
    uint32_t length;// The number of elements in the current set 
    int8_t contents[];// Specific elements in a collection 
} intset;

Here is a intset Collection object storage diagram of :

encoding

stay intset Inside encoding Records the data storage type of the current integer set , There are three main types :

  • INTSET_ENC_INT16

here contents[] Every element in is a int16_t Integer value of type , The scope is :-32768 ~ 32767(-2 Of 15 Power ~ 2 Of 15 Power - 1).

  • INTSET_ENC_INT32

here contents[] Every element in is a int32_t Integer value of type , The scope is :-2147483648 ~ 2147483647(-2 Of 31 Power ~ 2 Of 31 Power - 1).

  • INTSET_ENC_INT64

here contents[] Every element in is a int64_t Integer value of type , The scope is :-9223372036854775808 ~ 9223372036854775807(-2 Of 63 Power ~ 2 Of 63 Power - 1).

contents[]

contents[] Although the definition of structure says int8_t type , But the actual storage type is determined by encoding To decide .

Upgrade of the collection of integers

If at first all the elements in the set of integers are 16 Bit , use int16_t Type to store , At this time, you need to store another one 32 An integer , Then you need to upgrade the original integer set , Only after upgrading can 32 Bit integers are stored in the set of integers . This involves the type upgrade of integer sets , The upgrade process mainly includes 4 A step :

  • Expand the size of the underlying array space according to the type of the newly added element , Allocate new space according to the number of digits of the upgraded existing elements .
  • Type conversion of existing elements , And put the elements of the converted type back into the array one by one from back to front .
  • Put new elements at the head or tail of the array ( Because the condition that triggers the upgrade is that the integer type of the current array cannot store new elements , So the new elements are either bigger than the existing ones , Or it's smaller than the existing elements ).
  • take encoding Property to the latest encoding , And synchronous modification length attribute .

PS: Just like the encoding of string objects , Once the type of integer set is upgraded , Will keep coding , Can't downgrade .

Upgrade example

1. Suppose we have a collective store of encoding yes int16_t, Internal storage 3 Elements :

2. In this case, you need to insert an integer 50000, Found that the storage does not go down , and 50000 It's a int32_t Type integer , So you need to apply for a new space , The application space is 4 * 32 - 48=80.

3. Now I'm going to put... In the new array 4 Elements , The original array is at number 3, So we need to upgrade the 3 Move to 64-95 position .

4. Continue to upgrade the 2 Move to 32-63 position .

5. Continue to upgrade the 1 Move to 0-31 position .

6. Then it will 50000 Put it in 96-127 position .

7. In the end, it will change encoding and length attribute , After modification, the upgrade is completed .

hashtable code

hashtable The structure was analyzed in detail when we talked about hash objects , If you want to know more, you can click here .

intset and hashtable Encoding conversion

When a set satisfies the following two conditions ,Redis Will choose to use intset code :

  • All elements stored in the collection object are integer values .
  • The number of elements held by a collection object is less than or equal to 512 individual ( This threshold can be set through the configuration file set-max-intset-entries To control ).

Once the elements in the set do not meet the above two conditions , Will choose to use hashtable code .

Common commands for collection objects

  • sadd key member1 member2: Put one or more elements member Add to set key among , And returns the number of successful additions , If the element already exists, it is ignored .
  • sismember key member: Element of judgement member Whether there is a collection key in .
  • srem key member1 member2: Remove set key The elements in , Elements that don't exist are ignored .
  • smove source dest member: Put the element member From the collection source Move to dest in , If member non-existent , Do nothing .
  • smembers key: Back to the assembly key All elements in .

Learn about common commands for manipulating collection objects , We can verify the type and encoding of the hash object mentioned above , Before testing, in order to prevent other key The interference of values , Let's do it first flushall Command clear Redis database .

Execute the following commands in turn :

sadd num 1 2 3  // Set up  3  A set of integers , Will use  intset  code 
type num // View type 
object encoding num   // View encoding 

sadd name 1 2 3 test  // Set up  3  Integers and  1  A collection of strings , Will use  hashtable  code 
type name // View type 
object encoding name // View encoding 

The results are as follows :

You can see , When there are only integers in the set element , What collections use is intset code , When the set element contains a non integer , What you use is hashtable code .

Five basic types of ordered set objects

Redis The difference between an ordered set in and a set is that each element in an ordered set is associated with a double Score of type , And then in the order of scores from small to large . let me put it another way , The order of ordered sets is determined by the fraction when we set our own values .

There are two underlying data structures for ordered collection objects :skiplist and ziplist. The interior is also distinguished by coding :

skiplist code

skiplist The jump table , Sometimes it's also called jump watch for short . Use skiplist Encoded ordered set objects use zset Structure as the underlying implementation , and zset It contains both a dictionary and a jump table .

Skip list

A skip table is an ordered data structure , Its main feature is to maintain multiple pointers to other nodes in each node , So as to achieve the purpose of fast access to nodes .

In most cases , The efficiency of jump table can be equal to that of balance tree , But the implementation of jump table is much simpler than that of balance tree , therefore Redis Choose to use jump table to realize ordered set .

The following is a common ordered list , If we want to find 35 This element , Can only traverse from the beginning to the end ( Random access is not supported for elements in linked lists , So you can't use binary search , The array can be accessed randomly by subscript , So binary search generally applies to ordered arrays ), The time complexity is O(n).

So if we can jump directly to the middle of the list , That would save a lot of resources , This is the principle of jump watch , As shown in the figure below is an example of the data structure of a jump table :

Above picture level1,level2,level3 It's the level of the jump table , every last level All levels have one point to the next, the same level Pointers to hierarchical elements , For example, in the figure above, we traverse to find elements 35 There are three ways to do it :

  • The first 1 One is execution level1 Pointer to hierarchy , Need to traverse 7 Time (1->8->9->12->15->20->35) To find the elements 35.
  • The first 2 One is execution level2 Pointer to hierarchy , Just traverse 5 Time (1->9->12->15->35) You can find the elements 35.
  • The first 3 One is execution level3 The elements of hierarchy , In this case, we just need to traverse 3 Time (1->12->35) You can find the elements 35 了 , Greatly improved efficiency .

skiplist Storage structure of

Each node in the jump table is a zskiplistNode node ( Source code server.h Inside ):

typedef struct zskiplistNode {
    sds ele;// Elements 
    double score;// The score is 
    struct zskiplistNode *backward;// Back pointer 
    struct zskiplistLevel {// layer 
        struct zskiplistNode *forward;// Forward pointer 
        unsigned long span;// The span from the current node to the next node ( The number of nodes crossed )
    } level[];
} zskiplistNode;
  • level( layer )

level That is, the layer in the jump table , It's an array , That is to say, an element of a node can have multiple layers , Multiple pointers to other nodes , Programs can choose the fastest path through different levels of pointers to improve access speed .

level Every time a new node is created, according to the power law (power law) A randomly generated one between 1~32 Number between .

  • forward( Forward pointer )

Each layer has a pointer to the end of the list , When traversing elements, you need to use the forward pointer .

  • span( span )

The span records the distance between two nodes , It should be noted that , If it points to NULL Words , Then the span is 0.

  • backward( Back pointer )

Unlike the forward pointer, there is only one backward pointer , So you can only go back to the previous node at a time ( The back pointer is not shown in the picture above ).

  • ele( Elements )

The element in the jump table is a sds object ( Earlier versions used redisObject object ), The element must be unique and cannot be repeated .

  • score( The score is )

The score of a node is a double Floating point number of type , In the jump table, the nodes will be ranked from small to large according to the score , The score of different nodes can be repeated .

This is just a node in the jump table , Multiple zskiplistNode Nodes make up a zskiplist object :

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;// Pointer to the head node and tail node of the jump table 
    unsigned long length;// The number of nodes in the jump table 
    int level;// The largest number of layers in all nodes 
} zskiplist;

Here you might think that's what ordered sets use zskiplist To achieve , But in fact Redis It's not used directly zskiplist To achieve , It's about using zset The object is wrapped again .

typedef struct zset {
    dict *dict;// A dictionary object 
    zskiplist *zsl;// Jump table objects 
} zset;

So the final , An ordered set if you use skiplist code , The data structure is shown in the figure below :

In the dictionary in the upper part of the picture above key It corresponds to the elements in an ordered set (member),value It corresponds to the score (score). In the lower part of the figure above, jump table integers 1,8,9,12 It also corresponds to the element (member), The last row double The type number is the score (score).

In other words, the data in dictionaries and jump tables point to the elements we store ( The two data structures ultimately point to the same address , So data doesn't have redundant storage ),Redis Why do that ?

Why choose to use both dictionary and jump table

The ordered set can be realized by using jump table or dictionary alone , But let's think about it , If you use the jump table alone , So although we can use a long-span pointer to traverse the elements to find the data we need , But its complexity still reaches O(logN), And the complexity of getting an element in a dictionary is O(1), And if you use the dictionary alone, it's fast to get elements , But dictionaries are out of order , So if you want to find a range, you need to sort it , It's another time-consuming operation , therefore Redis Two data structures are integrated to maximize performance , This is also Redis The subtlety of design .

ziplist code

Compressed lists are used in both list objects and hash objects , If you want to know more, you can click here .

https://blog.csdn.net/zwx900102/article/details/112651435

ziplist and skiplist Encoding conversion

When an ordered collection object satisfies both of the following conditions , Will use ziplist Code for storage :

  • The number of elements saved in an ordered collection object is less than 128 individual ( Can be configured by zset-max-ziplist-entries modify ).
  • The total length of all elements stored in an ordered collection object is less than 64 byte ( Can be configured by zset-max-ziplist-value modify ).

Common commands for ordered collection objects

  • zadd key score1 member1 score2 member2: Put one or more elements (member) And its score Add to ordered collection key in .
  • zscore key member: Return to ordered set key in member Members of the score.
  • zincrby key num member: To assemble in order key Medium member add num,num Can be negative .
  • zcount key min max: Return to ordered set key in score Values in [min,max] The interval of member Number .
  • zrange key start stop: Return to ordered set key in score From small to large, after [start,stop] All of the intervals member.
  • zrevrange key start stop: Return to ordered set key in score From big to small, after [start,stop] All of the intervals member.
  • zrangebyscore key min max: Return to the ordered set and press score From small to large, after [min,max] All elements of the interval . Note that the default here is the closed interval , But you can max and min Add... Before the value of ( perhaps [ To control the opening and closing section .
  • zrevrangebyscore key max min: Return to the ordered set and press score From big to small, after [min,max] All elements of the interval . Note that the default here is the closed interval , But you can max and min Add... Before the value of ( perhaps [ To control the opening and closing section .
  • zrank key member: Return to the ordered set member Ranking of the elements in ( From small to large ), The result returned is from 0 Start calculating .
  • zrevrank key member: Return to the ordered set member Ranking of the elements in ( From big to small ), The result returned is from 0 Start calculating .
  • zlexcount key min max: Return to the ordered set min and max Between member Number . Notice in this command min and max It must be preceded by ( perhaps [ To control the opening and closing section , Special values - and + Minus infinity and plus infinity .

Understand the common commands for operating ordered collection objects , We can verify the type and encoding of the hash object mentioned above , Before testing, in order to prevent other key The interference of values , Let's do it first flushall Command clear Redis database .

Before executing the order , Let's start with the parameters in the configuration file zset-max-ziplist-entries It is amended as follows 2, And then restart Redis service .

After the restart, execute the following commands in turn :

zadd name 1 zs 2 lisi // Set up  2  Two elements will use  ziplist
type name // View type 
object encoding name // View encoding 

zadd address 1 beijing 2 shanghai 3 guangzhou 4 shenzhen  // Set up 4 Elements will use  skiplist code 
type address  // View type 
object encoding address // View encoding 

The results are as follows :

summary

This paper mainly analyzes the underlying storage structure of set objects and ordered set objects intset and skiplist Implementation principle of , And it focuses on how to sort ordered sets and why to use two data structures at the same time ( Dictionary and jump table ) The reason for storing data at the same time .

copyright notice
author[Java architecture design],Please bring the original link to reprint, thank you.
https://en.chowdera.com/2022/131/202205102126510534.html

Random recommended