MySQL 8.0 Reference Manual (Reading Notes, Section 78: InnoDB Table and Page Compression (1))
Overview
This section provides information about the InnoDB table compression and InnoDB page compression features. The page compression feature is also referred to as transparent page compression.
Using the compression features of InnoDB, you can create tables where the data is stored in compressed form. Compression can help to improve both raw performance and scalability. Compression means less data is transferred between disk and memory, and the data takes up less space on disk and in memory. 【Note: once compressed, there is less data, so I/O shrinks, transfers between disk and memory are faster, and the data takes up less disk space when stored.】 The benefits are amplified【ˈæmplɪfaɪd: magnified, strengthened; elaborated on】 for tables with secondary indexes, because index data is compressed also. Compression can be especially important for SSD storage devices, because they tend to have lower capacity than HDD devices.【Note: given the cost of capacity, this feature pays off even more on SSDs.】
InnoDB Table Compression
This section describes InnoDB table compression, which is supported with InnoDB tables that reside in file-per-table tablespaces or general tablespaces. Table compression is enabled using the ROW_FORMAT=COMPRESSED attribute with CREATE TABLE or ALTER TABLE.
1 Overview of Table Compression
Because processors and cache memories have increased in speed more than disk storage devices, many workloads are disk-bound. Data compression enables smaller database size, reduced I/O, and improved throughput, at the small cost of increased CPU utilization. Compression is especially valuable for read-intensive applications, on systems with enough RAM to keep frequently used data in memory.
An InnoDB table created with ROW_FORMAT=COMPRESSED can use a smaller page size on disk than the configured innodb_page_size value. Smaller pages require less I/O to read from and write to disk, which is especially valuable for SSD devices.
The compressed page size is specified through the CREATE TABLE or ALTER TABLE KEY_BLOCK_SIZE parameter. The different page size requires that the table be placed in a file-per-table tablespace or general tablespace rather than in the system tablespace, as the system tablespace cannot store compressed tables.
The level of compression is the same regardless of the KEY_BLOCK_SIZE value. As you specify smaller values for KEY_BLOCK_SIZE, you get the I/O benefits of increasingly smaller pages. But if you specify a value that is too small, there is additional overhead【ˌoʊvərˈhed: extra cost; running expenses】 to reorganize the pages when data values cannot be compressed enough to fit multiple rows in each page. There is a hard limit on how small KEY_BLOCK_SIZE can be for a table, based on the lengths of the key columns for each of its indexes. Specify a value that is too small, and the CREATE TABLE or ALTER TABLE statement fails.【Note: a value that is too small is also a poor choice and can even make the statement fail.】
In the buffer pool, the compressed data is held in small pages, with a page size based on the KEY_BLOCK_SIZE value. For extracting【ɪkˈstræktɪŋ: pulling out; excerpting; obtaining】 or updating the column values, MySQL also creates an uncompressed page in the buffer pool with the uncompressed data. Within the buffer pool, any updates to the uncompressed page are also rewritten back to the equivalent【ɪˈkwɪvələnt: equal in value, amount, or meaning; corresponding】 compressed page. You might need to size your buffer pool to accommodate the additional data of both compressed and uncompressed pages, although the uncompressed pages are evicted from the buffer pool when space is needed, and then uncompressed again on the next access.
2 Creating Compressed Tables
Compressed tables can be created in file-per-table tablespaces or in general tablespaces. Table compression is not available for the InnoDB system tablespace. The system tablespace (space 0, the ibdata* files) can contain user-created tables, but it also contains internal system data, which is never compressed. Thus, compression applies only to tables (and indexes) stored in file-per-table or general tablespaces.【Note: there are requirements on the tablespace type; the system tablespace is not allowed.】
2.1 Creating a Compressed Table in File-Per-Table Tablespace
To create a compressed table in a file-per-table tablespace, innodb_file_per_table must be enabled (the default). You can set this parameter in the MySQL configuration file (my.cnf or my.ini) or dynamically, using a SET statement.
After the innodb_file_per_table option is configured, specify the ROW_FORMAT=COMPRESSED clause or KEY_BLOCK_SIZE clause, or both, in a CREATE TABLE or ALTER TABLE statement to create a compressed table in a file-per-table tablespace.
Example
SET GLOBAL innodb_file_per_table=1;
CREATE TABLE t1
(c1 INT PRIMARY KEY)
ROW_FORMAT=COMPRESSED
KEY_BLOCK_SIZE=8;
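The same clauses also work with ALTER TABLE, as mentioned above. The following is a minimal sketch of converting an existing table and checking the result; the table name t1_plain and the schema name test are assumptions made for illustration only.

```sql
-- Hypothetical uncompressed table; assumes innodb_file_per_table is enabled.
CREATE TABLE t1_plain (c1 INT PRIMARY KEY, c2 VARCHAR(255));

-- Rebuild the table in compressed form with 4KB compressed pages.
ALTER TABLE t1_plain ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

-- Check the row format and compressed page size that InnoDB recorded.
-- 'test' is an assumed schema name; adjust it to your own.
SELECT NAME, ROW_FORMAT, ZIP_PAGE_SIZE
FROM INFORMATION_SCHEMA.INNODB_TABLES
WHERE NAME = 'test/t1_plain';
```

Because ALTER TABLE rebuilds the table, it is worth trying this on a copy first when the table is large.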
2.2 Creating a Compressed Table in a General Tablespace
To create a compressed table in a general tablespace, FILE_BLOCK_SIZE must be defined for the general tablespace, which is specified when the tablespace is created. The FILE_BLOCK_SIZE value must be a valid compressed page size in relation【rɪˈleɪʃn: relationship; connection between things】 to the innodb_page_size value, and the page size of the compressed table, defined by the CREATE TABLE or ALTER TABLE KEY_BLOCK_SIZE clause, must be equal to FILE_BLOCK_SIZE/1024. For example, if innodb_page_size=16384 and FILE_BLOCK_SIZE=8192, the KEY_BLOCK_SIZE of the table must be 8.
The following example demonstrates creating a general tablespace and adding a compressed table. The example assumes a default innodb_page_size of 16K. The FILE_BLOCK_SIZE of 8192 requires that the compressed table have a KEY_BLOCK_SIZE of 8.
mysql> CREATE TABLESPACE `ts2` ADD DATAFILE 'ts2.ibd' FILE_BLOCK_SIZE = 8192 Engine=InnoDB;
mysql> CREATE TABLE t4 (c1 INT PRIMARY KEY) TABLESPACE ts2 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
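To confirm what was created, the Information Schema can be queried for the tablespace. This is a sketch that assumes the ts2 tablespace from the example above; the table name t5 is hypothetical and is only there to illustrate the FILE_BLOCK_SIZE/1024 rule.

```sql
-- Inspect the general tablespace created above.
-- With FILE_BLOCK_SIZE = 8192, ZIP_PAGE_SIZE should reflect the 8KB block size.
SELECT NAME, SPACE_TYPE, PAGE_SIZE, ZIP_PAGE_SIZE
FROM INFORMATION_SCHEMA.INNODB_TABLESPACES
WHERE NAME = 'ts2';

-- A KEY_BLOCK_SIZE that does not equal FILE_BLOCK_SIZE/1024 (here 4 instead of 8)
-- is expected to be rejected, according to the rule described above.
CREATE TABLE t5 (c1 INT PRIMARY KEY)
  TABLESPACE ts2 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
```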
Notes
• As of MySQL 8.0, the tablespace file for a compressed table is created using the physical page size instead of the InnoDB page size, which makes the initial size of a tablespace file for an empty compressed table smaller than in previous MySQL releases.【Note: the initial file for an empty compressed table is smaller than in older versions.】
• If you specify ROW_FORMAT=COMPRESSED, you can omit【əˈmɪt: leave out; skip; fail to do】 KEY_BLOCK_SIZE; the KEY_BLOCK_SIZE setting defaults to half the innodb_page_size value.【Note: relying on the default is allowed.】
• If you specify a valid KEY_BLOCK_SIZE value, you can omit ROW_FORMAT=COMPRESSED; compression is enabled automatically.【Note: relying on the default is allowed.】
• To determine the best value for KEY_BLOCK_SIZE, typically you create several copies of the same table with different values for this clause, then measure the size of the resulting .ibd files and see how well each performs with a realistic【riːəˈlɪstɪk: practical; true to life】 workload. For general tablespaces, keep in mind that dropping a table does not reduce the size of the general tablespace .ibd file, nor does it return disk space to the operating system.
• The KEY_BLOCK_SIZE value is treated as a hint【hɪnt: suggestion; indication; tip】; a different size could be used by InnoDB if necessary. For file-per-table tablespaces, the KEY_BLOCK_SIZE can only be less than or equal to the innodb_page_size value. If you specify a value greater than the innodb_page_size value, the specified value is ignored, a warning is issued, and KEY_BLOCK_SIZE is set to half of the innodb_page_size value. If innodb_strict_mode=ON, specifying an invalid KEY_BLOCK_SIZE value returns an error. For general tablespaces, valid KEY_BLOCK_SIZE values depend on the FILE_BLOCK_SIZE setting of the tablespace.
• InnoDB supports 32KB and 64KB page sizes but these page sizes do not support compression.
• The default uncompressed size of InnoDB data pages is 16KB. Depending on the combination of option values, MySQL uses a page size of 1KB, 2KB, 4KB, 8KB, or 16KB for the tablespace data file (.ibd file). The actual compression algorithm is not affected by the KEY_BLOCK_SIZE value; the value determines how large each compressed chunk is, which in turn affects how many rows can be packed into each compressed page.
• When creating a compressed table in a file-per-table tablespace, setting KEY_BLOCK_SIZE equal to the InnoDB page size does not typically result in much compression. For example, setting KEY_BLOCK_SIZE=16 typically would not result in much compression, since the normal InnoDB page size is 16KB. This setting may still be useful for tables with many long BLOB, VARCHAR or TEXT columns, because such values often do compress well, and might therefore require fewer overflow pages. For general tablespaces, a KEY_BLOCK_SIZE value equal to the InnoDB page size is not permitted.
• All indexes of a table (including the clustered index) are compressed using the same page size, as specified in the CREATE TABLE or ALTER TABLE statement. Table attributes such as ROW_FORMAT and KEY_BLOCK_SIZE are not part of the CREATE INDEX syntax for InnoDB tables, and are ignored if they are specified (although, if specified, they appear in the output of the SHOW CREATE TABLE statement).
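The two defaulting rules in the notes above can be tried out directly. This is a small sketch; the table names t2 and t3 and the schema name test are assumptions, and the comments describe what the notes lead one to expect rather than verified output.

```sql
-- ROW_FORMAT=COMPRESSED without KEY_BLOCK_SIZE:
-- per the note above, the compressed page size defaults to half of innodb_page_size.
CREATE TABLE t2 (c1 INT PRIMARY KEY) ROW_FORMAT=COMPRESSED;

-- KEY_BLOCK_SIZE without ROW_FORMAT:
-- per the note above, compression is enabled automatically.
CREATE TABLE t3 (c1 INT PRIMARY KEY) KEY_BLOCK_SIZE=4;

-- Compare what InnoDB actually recorded for both tables.
-- 'test' is an assumed schema name.
SELECT NAME, ROW_FORMAT, ZIP_PAGE_SIZE
FROM INFORMATION_SCHEMA.INNODB_TABLES
WHERE NAME IN ('test/t2', 'test/t3');

-- The table attributes also show up in the table definition.
SHOW CREATE TABLE t3;
```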
2.3 Restrictions on Compressed Tables
• Compressed tables cannot be stored in the InnoDB system tablespace.
• General tablespaces can contain multiple tables, but compressed and uncompressed tables cannot coexist【koʊɪɡˈzɪst: exist together, especially peacefully】 within the same general tablespace.【Note: the two kinds cannot be mixed in one tablespace.】
• Compression applies to an entire table and all its associated indexes, not to individual rows, despite the clause name ROW_FORMAT.
• InnoDB does not support compressed temporary tables. When innodb_strict_mode is enabled (the default), CREATE TEMPORARY TABLE returns errors if ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE is specified. If innodb_strict_mode is disabled, warnings are issued and the temporary table is created using a non-compressed row format. The same restrictions apply to ALTER TABLE operations on temporary tables.
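The temporary-table restriction can be observed with a short experiment. This is a sketch only; the table name tmp1 is hypothetical, and changing innodb_strict_mode at session level may require additional privileges depending on the server version.

```sql
-- With innodb_strict_mode=ON (the default), this statement is expected to fail.
CREATE TEMPORARY TABLE tmp1 (c1 INT PRIMARY KEY) ROW_FORMAT=COMPRESSED;

-- With strict mode disabled for the session, the statement is expected to succeed,
-- with warnings, and the table is created in a non-compressed row format.
SET SESSION innodb_strict_mode = OFF;
CREATE TEMPORARY TABLE tmp1 (c1 INT PRIMARY KEY) ROW_FORMAT=COMPRESSED;
SHOW WARNINGS;

-- Clean up and restore the default.
DROP TEMPORARY TABLE tmp1;
SET SESSION innodb_strict_mode = ON;
```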
3 Tuning Compression for InnoDB Tables
Most often, the internal optimizations described in InnoDB Data Storage and Compression ensure that the system runs well with compressed data. However, because the efficiency of compression depends on the nature of your data, you can make decisions that affect the performance of compressed tables:【Note: these are the factors to consider when tuning.】
• Which tables to compress.
• What compressed page size to use.
• Whether to adjust the size of the buffer pool based on run-time performance characteristics, such as the amount of time the system spends compressing and uncompressing data. Whether the workload is more like a data warehouse (primarily queries) or an OLTP system (mix of queries and DML).
• If the system performs DML operations on compressed tables, and the way the data is distributed leads to expensive compression failures at runtime, you might adjust additional advanced configuration options.
Use the guidelines in this section to help make those architectural and configuration choices.
3.1 When to Use Compression
In general, compression works best on tables that include a reasonable number of character string columns and where the data is read far more often than it is written. Because there are no guaranteed ways to predict whether or not compression benefits a particular situation, always test with a specific workload and data set running on a representative configuration. Consider the following factors when deciding which tables to compress.
3.2 Data Characteristics and Compression
A key determinant of the efficiency of compression in reducing the size of data files is the nature of the data itself. Recall that compression works by identifying【aɪˈdentɪfaɪɪŋ: recognizing; finding; establishing what something is】 repeated strings of bytes in a block of data. Completely randomized data is the worst case. Typical data often has repeated values, and so compresses effectively. Character strings often compress well, whether defined in CHAR, VARCHAR, TEXT or BLOB columns. On the other hand, tables containing mostly binary data (integers or floating point numbers) or data that is already compressed (for example JPEG or PNG images) may compress only slightly, or not at all.【Note: column types matter a lot too.】
You choose whether to turn on compression for each InnoDB table. A table and all of its indexes use the same (compressed) page size. It might be that the primary key (clustered) index, which contains the data for all columns of a table, compresses more effectively than the secondary indexes. For those cases where there are long rows, the use of compression might result in long column values being stored “off-page”. Those overflow pages may compress well. Given these considerations, for many applications, some tables compress more effectively than others, and you might find that your workload performs best only with a subset of tables compressed.
To determine whether or not to compress a particular table, conduct【kənˈdʌkt: carry out; organize; direct】 experiments. You can get a rough estimate of how efficiently your data can be compressed by using a utility that implements LZ77 compression (such as gzip or WinZip) on a copy of the .ibd file for an uncompressed table. You can expect less compression from a MySQL compressed table than from file-based compression tools, because MySQL compresses data in chunks based on the page size, 16KB by default. In addition to user data, the page format includes some internal system data that is not compressed. File-based compression utilities can examine much larger chunks of data, and so might find more repeated strings in a huge file than MySQL can find in an individual page.
Another way to test compression on a specific table is to copy some data from your uncompressed table to a similar, compressed table (having all the same indexes) in a file-per-table tablespace and look at the size of the resulting .ibd file.【Note: this is a good method, direct and clear.】
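The copy-and-compare approach could look roughly like the following. This is a sketch under stated assumptions: src is a hypothetical existing uncompressed table in a schema named test, stored in a file-per-table tablespace, and the copy names src_kbs8 and src_kbs4 are made up for the experiment.

```sql
-- CREATE TABLE ... LIKE copies the column and index definitions of src.
CREATE TABLE src_kbs8 LIKE src;
ALTER TABLE src_kbs8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

CREATE TABLE src_kbs4 LIKE src;
ALTER TABLE src_kbs4 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

-- Load the same rows into each copy (the full table here; a representative
-- sample also works as long as every copy receives identical rows).
INSERT INTO src_kbs8 SELECT * FROM src;
INSERT INTO src_kbs4 SELECT * FROM src;

-- Compare the data file sizes of the original and the compressed copies.
SELECT NAME, ZIP_PAGE_SIZE, FILE_SIZE, ALLOCATED_SIZE
FROM INFORMATION_SCHEMA.INNODB_TABLESPACES
WHERE NAME IN ('test/src', 'test/src_kbs8', 'test/src_kbs4');
```

File size alone does not settle the choice; the copies should also be exercised with a realistic workload before deciding.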
To see whether compression is efficient for your particular workload:【Note: check the compression statistics.】
• For simple tests, use a MySQL instance with no other compressed tables and run queries against the Information Schema INNODB_CMP table.
• For more elaborate【ɪˈlæbərət: complex; detailed; carefully worked out】 tests involving workloads with multiple compressed tables, run queries against the Information Schema INNODB_CMP_PER_INDEX table. Because the statistics in the INNODB_CMP_PER_INDEX table are expensive to collect, you must enable the configuration option innodb_cmp_per_index_enabled before querying that table, and you might restrict such testing to a development server or a non-critical replica server.
• Run some typical SQL statements against the compressed table you are testing.
• Examine the ratio of successful compression operations to overall compression operations by querying INFORMATION_SCHEMA.INNODB_CMP or INFORMATION_SCHEMA.INNODB_CMP_PER_INDEX, and comparing COMPRESS_OPS to COMPRESS_OPS_OK.
• If a high percentage of compression operations complete successfully, the table might be a good candidate for compression.
• If you get a high proportion of compression failures, you can adjust the innodb_compression_level, innodb_compression_failure_threshold_pct, and innodb_compression_pad_pct_max options, and try further tests.
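The checks in the list above can be sketched as a handful of queries. The ok_pct alias is an illustrative name, and the values in the final SET statements are the documented MySQL 8.0 defaults, shown only as a starting point for experiments, not as recommendations.

```sql
-- Overall success ratio per compressed page size.
-- NULLIF avoids a division by zero when no compression has happened yet.
SELECT page_size, compress_ops, compress_ops_ok,
       ROUND(100 * compress_ops_ok / NULLIF(compress_ops, 0), 2) AS ok_pct,
       compress_time, uncompress_ops, uncompress_time
FROM INFORMATION_SCHEMA.INNODB_CMP;

-- Per-index statistics are expensive to collect, so enable them only while testing.
SET GLOBAL innodb_cmp_per_index_enabled = ON;

-- ... run some typical SQL statements against the compressed tables here ...

SELECT database_name, table_name, index_name, compress_ops, compress_ops_ok,
       ROUND(100 * compress_ops_ok / NULLIF(compress_ops, 0), 2) AS ok_pct
FROM INFORMATION_SCHEMA.INNODB_CMP_PER_INDEX
ORDER BY ok_pct;

SET GLOBAL innodb_cmp_per_index_enabled = OFF;

-- Options to experiment with when the failure rate is high.
SET GLOBAL innodb_compression_level = 6;                 -- zlib level, 0 to 9
SET GLOBAL innodb_compression_failure_threshold_pct = 5; -- failure rate that triggers padding
SET GLOBAL innodb_compression_pad_pct_max = 50;          -- maximum free space reserved per page
```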
3.3 Database Compression versus Application Compression
Decide whether to compress data in your application or in the table; do not use both types of compression for the same data. When you compress the data in the application and store the results in a compressed table, extra space savings are extremely unlikely, and the double compression just wastes CPU cycles.
3.4 Compressing in the Database
When enabled, MySQL table compression is automatic and applies to all columns and index values. The columns can still be tested with operators such as LIKE, and sort operations can still use indexes even when the index values are compressed. Because indexes are often a significant【sɪɡˈnɪfɪkənt: important; notable; considerable in amount】 fraction【frækʃn: a small part; a portion】 of the total size of a database, compression could result in significant savings in storage, I/O or processor time. The compression and decompression operations happen on the database server, which likely is a powerful system that is sized to handle the expected load.
3.5 Compressing in the Application
If you compress data such as text in your application, before it is inserted into the database, you might save overhead for data that does not compress well by compressing some columns and not others. This approach uses CPU cycles for compression and uncompression on the client machine rather than the database server, which might be appropriate for a distributed application with many clients, or where the client machine has spare CPU cycles.
3.6 Hybrid【ˈhaɪbrɪd: mixed】 Approach【əˈproʊtʃ: way; method】
Of course, it is possible to combine these approaches. For some applications, it may be appropriate to use some compressed tables and some uncompressed tables. It may be best to externally compress some data (and store it in uncompressed tables) and allow MySQL to compress (some of) the other tables in the application. As always, up-front design and real-life testing are valuable in reaching the right decision.
3.7 Workload Characteristics and Compression
In addition to choosing which tables to compress (and the page size), the workload is another key determinant of performance. If the application is dominated【dɑːmɪneɪtɪd: controlled by; shaped mainly by】 by reads, rather than updates, fewer pages need to be reorganized and recompressed after the index page runs out of room for the per-page “modification log” that MySQL maintains for compressed data. If the updates predominantly【prɪˈdɑːmɪnəntli: mainly; in most cases】 change non-indexed columns or those containing BLOBs or large strings that happen to be stored “off-page”, the overhead of compression may be acceptable. If the only changes to a table are INSERTs that use a monotonically【always in one direction; without variation】 increasing primary key, and there are few secondary indexes, there is little need to reorganize and recompress index pages. Since MySQL can “delete-mark” and delete rows on compressed pages “in place” by modifying uncompressed data, DELETE operations on a table are relatively efficient.
For some environments, the time it takes to load data can be as important as run-time retrieval. Especially in data warehouse environments, many tables may be read-only or read-mostly. In those cases, paying the price of compression in increased load time may be acceptable only if the resulting savings in disk reads or storage cost are significant.
Fundamentally【ˌfʌndəˈmentəli: at root; basically】, compression works best when the CPU time is available for compressing and uncompressing data. Thus, if your workload is I/O bound, rather than CPU-bound, you might find that compression can improve overall performance. When you test your application performance with different compression configurations, test on a platform similar to the planned configuration of the production system.
3.8 Configuration Characteristics and Compression
Reading and writing database pages from and to disk is the slowest aspect of system performance. Compression attempts to reduce I/O by using CPU time to compress and uncompress data, and is most effective when I/O is a relatively scarce resource compared to processor cycles.
This is often especially the case when running in a multi-user environment with fast, multi-core CPUs. When a page of a compressed table is in memory, MySQL often uses additional memory, typically 16KB, in the buffer pool for an uncompressed copy of the page. The adaptive LRU algorithm attempts to balance the use of memory between compressed and uncompressed pages to take into account whether the workload is running in an I/O-bound or CPU-bound manner. Still, a configuration with more memory dedicated to the buffer pool tends to run better when using compressed tables than a configuration where memory is highly constrained.
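To see how much buffer pool memory compressed pages are taking, the INNODB_CMPMEM table can be consulted alongside the configured pool size. This is a sketch; the buffer_pool_mb alias and the 2GB target in the last statement are arbitrary illustrations, not recommendations.

```sql
-- Current buffer pool size in megabytes.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Compressed pages held in the buffer pool, per compressed page size
-- and buffer pool instance.
SELECT page_size, buffer_pool_instance, pages_used, pages_free,
       relocation_ops, relocation_time
FROM INFORMATION_SCHEMA.INNODB_CMPMEM;

-- If memory looks constrained, the pool can be resized online.
-- 2147483648 bytes = 2GB, a hypothetical value chosen for illustration.
SET GLOBAL innodb_buffer_pool_size = 2147483648;
```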
3.9 Choosing the Compressed Page Size
The optimal setting of the compressed page size depends on the type and distribution of data that the table and its indexes contain. The compressed page size should always be bigger than the maximum record size, or operations may fail as noted in Compression of B-Tree Pages.
Setting the compressed page size too large wastes some space, but the pages do not have to be compressed as often. If the compressed page size is set too small, inserts or updates may require time-consuming recompression, and the B-tree nodes may have to be split more frequently, leading to bigger data files and less efficient indexing.
Typically, you set the compressed page size to 8K or 4K bytes. Given that the maximum row size for an InnoDB table is around 8K, KEY_BLOCK_SIZE=8 is usually a safe choice.
---https://dev.mysql.com/doc/refman/8.0/en/innodb-compression-tuning.html