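While introducing Docker, I mentioned that LXC uses cgroups to enforce resource limits and control. This post covers how cgroups are used and the commands for operating them. Most of the content comes from
[2] https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt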
##cgroup
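The purpose of cgroups is to partition the resources of a single machine (CPU, memory, network) so that processes cannot starve one another of them.
Terminology
hierarchy: a set of subsystems plus a tree of cgroups. Unlike a cgroup, a hierarchy holds the subsystems it manages rather than concrete parameter values.
cgroups therefore manage resources as a tree, much like processes.
Similarity - both are hierarchical: a child process/cgroup inherits from its parent process/cgroup.
Difference - processes form a single-rooted tree (rooted at pid=0), whereas cgroups as a whole form a forest of trees (each rooted at a hierarchy).
A typical hierarchy mount directory looks like this: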
/cgroup/
├── blkio <--------------- hierarchy/root cgroup
│ ├── blkio.io_merged <--------------- subsystem parameter
... ...
│ ├── blkio.weight
│ ├── blkio.weight_device
│ ├── cgroup.event_control
│ ├── cgroup.procs
│ ├── lxc <--------------- cgroup
│ │ ├── blkio.io_merged <--------------- subsystem parameter
│ │ ├── blkio.io_queued
... ... ...
│ │ └── tasks <--------------- task list
│ ├── notify_on_release
│ ├── release_agent
│ └── tasks
...
Subsystem list
RHEL/CentOS supports the following subsystems:
##cgroup rules and operations
A single hierarchy can have one or more subsystems attached to it.
eg.
mount -t cgroup -o cpu,cpuset,memory cpu_and_mem /cgroup/cpu_and_mem
Any single subsystem (such as cpu) cannot be attached to more than one hierarchy if one of those hierarchies has a different subsystem attached to it already.
Each time a new hierarchy is created on the system, all tasks on the system are initially members of the default cgroup of that hierarchy, which is known as the root cgroup. For any single hierarchy you create, each task on the system can be a member of exactly one cgroup in that hierarchy. A single task may be in multiple cgroups, as long as each of those cgroups is in a different hierarchy. As soon as a task becomes a member of a second cgroup in the same hierarchy, it is removed from the first cgroup in that hierarchy. At no time is a task ever in two different cgroups in the same hierarchy.
Any process (task) on the system which forks itself creates a child task. A child task automatically inherits the cgroup membership of its parent but can be moved to different cgroups as needed. Once forked, the parent and child processes are completely independent.
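The inheritance rule can be checked from a shell. The sketch below is an illustration only; it assumes a Linux system where /proc/<PID>/cgroup is readable, and it compares the membership of the current shell with that of a child shell it starts.

```shell
#!/bin/sh
# Compare the cgroup membership of this shell with that of a forked child.
# Assumption: /proc/<PID>/cgroup exists (Linux); otherwise both values stay empty.
parent_membership=""
child_membership=""
if [ -r "/proc/$$/cgroup" ]; then
    # Here $$ is the PID of the current shell.
    parent_membership=$(cat "/proc/$$/cgroup")
    # Inside the single quotes, $$ is expanded by the child shell, so it is the child's PID.
    child_membership=$(sh -c 'cat "/proc/$$/cgroup"')
fi

if [ "$parent_membership" = "$child_membership" ]; then
    echo "child inherited the parent's cgroup membership"
else
    echo "child and parent are in different cgroups"
fi
```

Unless something moves the child in between, both reads should return the same lines, one per hierarchy.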
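To mount a hierarchy, use the cgconfig service and its configuration file /etc/cgconfig.conf
- mounted automatically when the service starts
mount {
    subsystem = /cgroup/hierarchy;
}
Command line
mount -t cgroup -o subsystems name /cgroup/name
Unmount
umount /cgroup/name
eg. Mount the four subsystems cpuset, cpu, cpuacct and memory on the directory /cgroup/cpu_and_mem (the hierarchy)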
mount {
    cpuset = /cgroup/cpu_and_mem;
    cpu = /cgroup/cpu_and_mem;
    cpuacct = /cgroup/cpu_and_mem;
    memory = /cgroup/cpu_and_mem;
}
or
mkdir -p /cgroup/cpu_and_mem
mount -t cgroup -o cpuset,cpu,cpuacct,memory cpu_and_mem /cgroup/cpu_and_mem
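For repeated use, the mount step can be wrapped in a small helper. The following is a minimal sketch, not a tool from libcgroup: `print_hierarchy_mount` is a hypothetical name, and the function only prints the commands (a dry run) so it can be tried without root.

```shell
#!/bin/sh
# print_hierarchy_mount NAME SUBSYSTEM...
# Prints (does not run) the commands that would mount the given subsystems
# as one hierarchy under /cgroup/NAME.
print_hierarchy_mount() {
    name=$1
    shift
    # Join the remaining arguments with commas: cpu cpuset -> cpu,cpuset
    subsystems=$(IFS=,; echo "$*")
    echo "mkdir -p /cgroup/$name"
    echo "mount -t cgroup -o $subsystems $name /cgroup/$name"
}

print_hierarchy_mount cpu_and_mem cpuset cpu cpuacct memory
```

Piping the output to `sh` as root would perform the mount; printing first makes it easy to review the subsystem list beforehand.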
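To create or delete a cgroup, use the cgconfig service and its configuration file /etc/cgconfig.conf
- applied automatically when the service starts
group <name> {
    [<permissions>]
    <controller> {
        <param name> = <param value>;
        …
    }
    …
}
Command line
Create:
cgcreate -t uid:gid -a uid:gid -g subsystems:path
mkdir /cgroup/hierarchy/name/child_name
Delete:
cgdelete subsystems:path (use -r to delete recursively)
rmdir /cgroup/hierarchy/name/child_name (when the cgconfig service is not running; the cgroup must have no tasks or child cgroups, and rm -rf fails because the parameter files cannot be unlinked)
To set the permissions of a cgroup, use the cgconfig service and its configuration file /etc/cgconfig.conf
- applied automatically when the service starts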
perm {
    task {
        uid = <task user>;
        gid = <task group>;
    }
    admin {
        uid = <admin name>;
        gid = <admin group>;
    }
}
Command line: chown
eg.
group daemons {
    cpuset {
        cpuset.mems = 0;
        cpuset.cpus = 0;
    }
}
group daemons/sql {
    perm {
        task {
            uid = root;
            gid = sqladmin;
        }
        admin {
            uid = root;
            gid = root;
        }
    }
    cpuset {
        cpuset.mems = 0;
        cpuset.cpus = 0;
    }
}
or
~]$ mkdir -p /cgroup/red/daemons/sql
~]$ chown root:root /cgroup/red/daemons/sql/*
~]$ chown root:sqladmin /cgroup/red/daemons/sql/tasks
~]$ echo 0 > /cgroup/red/daemons/cpuset.mems
~]$ echo 0 > /cgroup/red/daemons/cpuset.cpus
~]$ echo 0 > /cgroup/red/daemons/sql/cpuset.mems
~]$ echo 0 > /cgroup/red/daemons/sql/cpuset.cpus
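When many groups share the same shape, the configuration-file form can be generated instead of typed. This is a minimal sketch under stated assumptions: `print_group_stanza` is a hypothetical helper, it handles one controller per group, and it omits the perm block.

```shell
#!/bin/sh
# print_group_stanza GROUP CONTROLLER PARAM=VALUE...
# Prints a cgconfig.conf "group" block for a single controller.
print_group_stanza() {
    group=$1
    controller=$2
    shift 2
    echo "group $group {"
    echo "    $controller {"
    for setting in "$@"; do
        # Split PARAM=VALUE at the first "="
        param=${setting%%=*}
        value=${setting#*=}
        echo "        $param = $value;"
    done
    echo "    }"
    echo "}"
}

print_group_stanza daemons cpuset cpuset.mems=0 cpuset.cpus=0
```

The output can be appended to /etc/cgconfig.conf after review; the script itself never touches that file.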
To set subsystem parameters, use cgset or write to the parameter file directly:
cgset -r parameter=value path_to_cgroup
cgset --copy-from path_to_source_cgroup path_to_target_cgroup
echo value > path_to_cgroup/parameter
eg.
cgset -r cpuset.cpus=0-1 group1
cgset --copy-from group1/ group2/
echo 0-1 > /cgroup/cpuset/group1/cpuset.cpus
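When writing parameters with echo, it helps to read the file back, since the kernel may reject a value or store it in a different form. The sketch below assumes a hypothetical helper named `set_cgroup_param` and uses a temporary directory as a stand-in for a real cgroup directory, so it runs without root.

```shell
#!/bin/sh
# set_cgroup_param CGROUP_DIR PARAMETER VALUE
# Writes VALUE to CGROUP_DIR/PARAMETER, then reads the file back and reports
# whether the stored value matches what was written.
set_cgroup_param() {
    cgroup_dir=$1
    parameter=$2
    value=$3
    echo "$value" > "$cgroup_dir/$parameter" || return 1
    stored=$(cat "$cgroup_dir/$parameter")
    if [ "$stored" = "$value" ]; then
        echo "$parameter = $stored"
    else
        echo "$parameter: wrote '$value' but read back '$stored'" >&2
        return 1
    fi
}

# Stand-in for /cgroup/cpuset/group1 so the sketch runs without root.
demo_dir=$(mktemp -d)
set_cgroup_param "$demo_dir" cpuset.cpus 0-1
rm -rf "$demo_dir"
```

On a real system the first argument would be the cgroup directory itself, for example /cgroup/cpuset/group1.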
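To move running processes into a cgroup:
cgclassify -g subsystems:path_to_cgroup pidlist
echo pid > path_to_cgroup/tasks
To start a command in a cgroup:
cgexec -g subsystems:path_to_cgroup command arguments
To start a service in a cgroup:
echo 'CGROUP_DAEMON="subsystem:control_group"' >> /etc/sysconfig/<service>
Alternatively, let the cgrulesengd service classify processes using rules in the configuration file /etc/cgrules.conf:
user<:command> subsystems control_group
where:
+ all processes of user are placed in control_group for the listed subsystems
+ <:command> is optional and restricts the rule to a specific command
+ user can be written as @group to match a user group instead of a single user
+ * matches everything
+ % means the same value as the corresponding field on the previous line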
eg.
cgclassify -g cpu,memory:group1 1701 1138
echo -e "1701\n1138" |tee -a /cgroup/cpu/group1/tasks /cgroup/memory/group1/tasks
cgexec -g cpu:group1 lynx http://www.redhat.com
sh -c "echo \$$ > /cgroup/lab1/group1/tasks && lynx http://www.redhat.com"
Restricting specific services through /etc/cgrules.conf
maria devices /usergroup/staff
maria:ftp devices /usergroup/staff/ftp
@student cpu,memory /usergroup/student/
% memory /test2/
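The % shorthand is easier to audit once it is expanded. The following sketch is only an illustration of the rule format described above: `expand_cgrules` is a hypothetical helper that reads rules on stdin and replaces a leading % with the user field of the previous rule.

```shell
#!/bin/sh
# expand_cgrules: reads cgrules.conf-style lines on stdin and prints
# "user subsystems control_group" with a leading % replaced by the
# user field of the previous rule.
expand_cgrules() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }      # skip blank lines and comments
        {
            if ($1 == "%") $1 = previous_user
            previous_user = $1
            print $1, $2, $3
        }
    '
}

printf '%s\n' \
    'maria devices /usergroup/staff' \
    '@student cpu,memory /usergroup/student/' \
    '% memory /test2/' | expand_cgrules
```

With the three sample rules, the last line is printed as `@student memory /test2/`.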
cgsnapshot generates /etc/cgconfig.conf content from the current cgroup state.
cgsnapshot [-s] [-t] [-b FILE] [-w FILE] [-f FILE] [controller]
-b, --blacklist=FILE Set the blacklist configuration file (default /etc/cgsnapshot_blacklist.conf)
-f, --file=FILE Redirect the output to FILE
-s, --silent Ignore all warnings
-t, --strict Don't show the variables which are not on the whitelist
-w, --whitelist=FILE Set the whitelist configuration file (not used by default)
Find out which cgroup a process belongs to
ps -O cgroup
or
cat /proc/<PID>/cgroup
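Each line of /proc/<PID>/cgroup has the form hierarchy-ID:subsystems:path. The sketch below is a small illustration that splits those fields into a readable form; `describe_membership` is a hypothetical name and the sample input is made up.

```shell
#!/bin/sh
# describe_membership: reads /proc/<PID>/cgroup content on stdin and prints
# one readable line per hierarchy. Each input line is hierarchy-ID:subsystems:path.
describe_membership() {
    while IFS=: read -r hierarchy_id subsystems path; do
        # Show an empty subsystems field as "none".
        echo "hierarchy $hierarchy_id [${subsystems:-none}] -> $path"
    done
}

# Hypothetical sample input; on a live system use:
#   describe_membership < /proc/self/cgroup
printf '%s\n' '4:memory:/lxc/web' '2:cpu,cpuacct:/' | describe_membership
```

Because a task is in exactly one cgroup per hierarchy, the output has exactly one line for each mounted hierarchy.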
Check where subsystems are mounted
cat /proc/cgroups
lssubsys -m <subsystems>
lscgroup
Read cgroup parameter values
cgget -r parameter list_of_cgroups
cgget -g <controllers>:<path>
##subsystem configuration
###1. blkio - block I/O limits
blkio.time - device_types:node_numbers milliseconds
blkio.sectors - device_types:node_numbers sector_count
blkio.avg_queue_size (requires CONFIG_DEBUG_BLK_CGROUP=y)
blkio.group_wait_time, blkio.empty_time, blkio.idle_time (require CONFIG_DEBUG_BLK_CGROUP=y, unit: ns)
blkio.dequeue (requires CONFIG_DEBUG_BLK_CGROUP=y) - device_types:node_numbers number
blkio.io_serviced - device_types:node_numbers operation number
blkio.io_service_bytes - device_types:node_numbers operation bytes
blkio.io_service_time - device_types:node_numbers operation time
blkio.io_wait_time - device_types:node_numbers operation time
blkio.io_merged - number operation
blkio.io_queued - number operation
blkio.throttle.read_bps_device - device_types:node_numbers bytes_per_second
blkio.throttle.write_bps_device - device_types:node_numbers bytes_per_second
blkio.throttle.read_iops_device - device_types:node_numbers operations_per_second
blkio.throttle.write_iops_device - device_types:node_numbers operations_per_second
blkio.throttle.io_serviced - device_types:node_numbers operation operations_per_second
blkio.throttle.io_service_bytes - device_types:node_numbers operation bytes_per_second
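The throttle parameters take raw bytes per second, which is awkward to type by hand. As a rough sketch, the hypothetical helper `throttle_bps_rule` below builds a line in the `device_types:node_numbers bytes_per_second` format from a limit given in MiB/s; the device 8:0 and the 10 MiB/s limit are made-up values.

```shell
#!/bin/sh
# throttle_bps_rule MAJOR:MINOR MEBIBYTES_PER_SECOND
# Prints a line in the "device_types:node_numbers bytes_per_second" format,
# converting MiB/s to bytes per second.
throttle_bps_rule() {
    device=$1
    mib_per_second=$2
    echo "$device $((mib_per_second * 1024 * 1024))"
}

# 8:0 is a hypothetical device (major 8, minor 0); 10 MiB/s is an arbitrary limit.
throttle_bps_rule 8:0 10
```

The printed line (`8:0 10485760`) is what would be written with echo to blkio.throttle.write_bps_device in the target cgroup's directory.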
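###2. cpu - CPU time limits
cpu.rt_period_us, cpu.rt_runtime_us
Used together, these two parameters specify that, in every period of cpu.rt_period_us microseconds, the real-time tasks in the cgroup may run for at most cpu.rt_runtime_us microseconds.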
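The two values are easiest to reason about as a ratio: runtime divided by period gives the share of CPU time involved. A minimal sketch of that arithmetic, with a hypothetical helper name and made-up values:

```shell
#!/bin/sh
# rt_cpu_percent RT_RUNTIME_US RT_PERIOD_US
# Prints runtime as an integer percentage of the period (rounded down).
rt_cpu_percent() {
    runtime_us=$1
    period_us=$2
    echo "$((runtime_us * 100 / period_us))"
}

# Hypothetical values: 200000 us of runtime in every 1000000 us period.
rt_cpu_percent 200000 1000000
```

With these values the result is 20, meaning 0.2 seconds out of every second.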
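###3. cpuacct - CPU usage reporting
###4. cpuset - CPU binding
###5. devices - device access restrictions for a cgroup
###6. freezer - suspending/resuming the tasks in a cgroup
###7. memory - memory limits
###8. net_cls
###9. net_prio - network interface priority for tasks
<network_interface> <priority>
###10. Other
##Summary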