ceph tools
I find that some of the ceph subcommands, such as ceph osd df and ceph osd tree, are not very useful when you're managing a ceph cluster with a large number of OSDs. When I say "large", I'm talking about 300+ OSDs here, and trawling through the unordered ceph output looking for one particular OSD is often impractical, particularly when you have to do it often.
So, I wrote a Python module that can run ceph with the "-f xml" option (so it produces XML output) and parse the XML result. I then wrote a couple of Python programs to collect all the information I needed from ceph and present it in a form that is more useful to me.
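To give an idea of the approach, here is a minimal sketch (not the actual ceph_support.py; the run_ceph_xml function name is mine): run a ceph subcommand with "-f xml" via subprocess and hand the output to Python's standard XML parser.

#!/usr/bin/env python
# Sketch only: run a ceph subcommand with "-f xml" and parse the result.
# This is not the real ceph_support.py; run_ceph_xml is a name made up
# for this example.
import subprocess
import xml.etree.ElementTree as ET

def run_ceph_xml(*args):
    """Run 'ceph <args> -f xml' and return the parsed XML root element."""
    cmd = ["ceph"] + list(args) + ["-f", "xml"]
    return ET.fromstring(subprocess.check_output(cmd))

if __name__ == "__main__":
    # Dump every element's tag and text, roughly what a raw XML dump
    # of 'ceph osd df -f xml' looks like.
    root = run_ceph_xml("osd", "df")
    for elem in root.iter():
        if elem.text and elem.text.strip():
            print(elem.tag, elem.text.strip())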
The module and the programs can be downloaded via the links further down on this page. Download the files into the same directory and rename them so they don't have the ".txt" extension.
ceph_support.py (v1.1): the module with the XML parser and support functions
ceph_show_osd_detail (v1.0): OSDs listed by host or by OSD number
ceph_show_pool_distribution (v1.0): lists each placement group and its OSDs and hosts
ceph_xml_dump: tool to help when developing software with ceph_support.py; displays the result of raw XML parsing
ceph_show_osd_detail
This program outputs a list of all OSDs, one per line, with usage details and status. By default the list is ordered alphabetically by host name and then by ascending OSD number; the "-s" option orders the list by OSD number only. This makes finding a particular OSD easy. See below for examples from a small ceph cluster.
# cephbin/ceph_show_osd_detail -?
Outputs OSD information
Options:
    -s    Sort by OSD number instead of host name
    -h    This message
    -?    This message
# cephbin/ceph_show_osd_detail
Host     OSD  in/out  up/down  Weight    pgs  Utilization  Variance  Size     Available  Used
ceph       1  in      up       1.862     243  37.2941      0.931978  1862.1G  1167.7G    694.5G
ceph      11  in      up       1.862     244  43.0174      1.075     1862.1G  1061.1G    801.0G
ceph      13  in      up       0.931     100  34.769       0.868876  931.1G   607.3G     323.7G
hp        12  in      up       1.81898   172  44.2796      1.10655   1819.6G  1013.9G    805.7G
hp         5  in      up       1.862     153  36.6459      0.91578   1862.1G  1179.7G    682.4G
kvm        2  in      up       1.765     150  42.3609      1.0586    1765.7G  1017.7G    748.0G
kvm        3  in      up       0.594986   57  48.721       1.21754   595.9G   305.6G     290.3G
kvm        4  in      up       0.925995  100  38.7887      0.96933   926.1G   566.9G     359.2G
node1      0  in      up       1.81      235  37.6362      0.940528  1810.6G  1129.2G    681.5G
node1      8  in      up       1.862     233  39.8964      0.997011  1862.1G  1119.2G    742.9G
node2     10  in      up       0.931     131  43.2747      1.08143   931.1G   528.1G     402.9G
node2      9  in      up       1.80899   261  43.0949      1.07694   1809.6G  1029.8G    779.9G
pi        14  in      up       0.931     140  38.229       0.955341  931.1G   575.1G     355.9G
scruffy    6  in      up       1.85699   228  40.1493      1.00333   1857.1G  1111.5G    745.6G
scruffy    7  in      up       1.862     241  35.7383      0.893099  1862.1G  1196.6G    665.5G
# cephbin/ceph_show_osd_detail -s
Host     OSD  in/out  up/down  Weight    pgs  Utilization  Variance  Size     Available  Used
node1      0  in      up       1.81      235  37.6362      0.940528  1810.6G  1129.2G    681.5G
ceph       1  in      up       1.862     243  37.2941      0.931978  1862.1G  1167.7G    694.5G
kvm        2  in      up       1.765     150  42.3609      1.0586    1765.7G  1017.7G    748.0G
kvm        3  in      up       0.594986   57  48.721       1.21754   595.9G   305.6G     290.3G
kvm        4  in      up       0.925995  100  38.7887      0.96933   926.1G   566.9G     359.2G
hp         5  in      up       1.862     153  36.6459      0.91578   1862.1G  1179.7G    682.4G
scruffy    6  in      up       1.85699   228  40.1493      1.00333   1857.1G  1111.5G    745.6G
scruffy    7  in      up       1.862     241  35.7383      0.893099  1862.1G  1196.6G    665.5G
node1      8  in      up       1.862     233  39.8964      0.997011  1862.1G  1119.2G    742.9G
node2      9  in      up       1.80899   261  43.0949      1.07694   1809.6G  1029.8G    779.9G
node2     10  in      up       0.931     131  43.2747      1.08143   931.1G   528.1G     402.9G
ceph      11  in      up       1.862     244  43.0174      1.075     1862.1G  1061.1G    801.0G
hp        12  in      up       1.81898   172  44.2796      1.10655   1819.6G  1013.9G    805.7G
ceph      13  in      up       0.931     100  34.769       0.868876  931.1G   607.3G     323.7G
pi        14  in      up       0.931     140  38.229       0.955341  931.1G   575.1G     355.9G
#
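For what it's worth, the two orderings are just different sort keys. Here is a rough sketch of the idea (the row structure is assumed for this example, not the program's actual data):

# Sketch only: the default ordering sorts by host name then OSD number,
# while "-s" sorts by OSD number alone. The 'rows' structure is assumed.
rows = [
    {"host": "node1", "osd": 0, "used": "681.5G"},
    {"host": "ceph", "osd": 11, "used": "801.0G"},
    {"host": "ceph", "osd": 1, "used": "694.5G"},
    {"host": "scruffy", "osd": 7, "used": "665.5G"},
]

by_host = sorted(rows, key=lambda r: (r["host"], r["osd"]))  # default order
by_osd = sorted(rows, key=lambda r: r["osd"])                # "-s" order

for r in by_host:
    print("%-8s %3d %8s" % (r["host"], r["osd"], r["used"]))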
ceph_show_pool_distribution
This program is used to analyse the distribution of placement groups across OSDs and/or hosts. With very large numbers of OSDs distributed across a large number of hosts, it can be useful to see where placement groups are located, sometimes by host instead of by OSD, and this program allows that.
This first example shows a few lines of output with each PG's status and the OSDs on which it is located.
# cephbin/ceph_show_pool_distribution
pg    state         up        primary  acting    primary
14.0  active+clean  [13,3,6]  13       [13,3,6]  13
14.1  active+clean  [2,11,5]  2        [2,11,5]  2
14.2  active+clean  [2,1,6]   2        [2,1,6]   2
14.3  active+clean  [2,6,8]   2        [2,6,8]   2
14.4  active+clean  [13,9,7]  13       [13,9,7]  13
14.5  active+clean  [6,8,11]  6        [6,8,11]  6
14.6  active+clean  [5,6,0]   5        [5,6,0]   5
14.7  active+clean  [3,0,11]  3        [3,0,11]  3
14.8  active+clean  [5,9,11]  5        [5,9,11]  5
...
This next example is from the same ceph cluster, but this time the "-m" option is used to display the host names instead of OSD numbers. There are a number of variants of the "-m" option. Use the "-h" option to get a usage message.
# cephbin/ceph_show_pool_distribution -m*
pg    state         up                    primary  acting                primary
14.0  active+clean  [ceph,kvm,scruffy]    ceph     [ceph,kvm,scruffy]    ceph
14.1  active+clean  [kvm,ceph,hp]         kvm      [kvm,ceph,hp]         kvm
14.2  active+clean  [kvm,ceph,scruffy]    kvm      [kvm,ceph,scruffy]    kvm
14.3  active+clean  [kvm,scruffy,node1]   kvm      [kvm,scruffy,node1]   kvm
14.4  active+clean  [ceph,node2,scruffy]  ceph     [ceph,node2,scruffy]  ceph
14.5  active+clean  [scruffy,node1,ceph]  scruffy  [scruffy,node1,ceph]  scruffy
14.6  active+clean  [hp,scruffy,node1]    hp       [hp,scruffy,node1]    hp
14.7  active+clean  [kvm,node1,ceph]      kvm      [kvm,node1,ceph]      kvm
14.8  active+clean  [hp,node2,ceph]       hp       [hp,node2,ceph]       hp
...
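Conceptually, the "-m" output just substitutes each OSD number in a PG's up and acting sets with the host that carries that OSD. A rough sketch of the idea (the osd_to_host map is hard-coded here; the real program builds it from the cluster itself):

# Sketch only: replace OSD numbers in a PG's up/acting sets with host
# names. The osd_to_host map is hard-coded for this example.
osd_to_host = {13: "ceph", 3: "kvm", 6: "scruffy"}

def map_osds(osds, mapping):
    """Return host names for a list of OSD numbers."""
    return [mapping.get(o, "osd.%d" % o) for o in osds]

up = [13, 3, 6]   # PG 14.0's up set from the first example
print("[%s]" % ",".join(map_osds(up, osd_to_host)))  # prints [ceph,kvm,scruffy]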