[Python] Practice_pandas_and_matplotlib (Applied Examples)

2022-03-01

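1. Population Pyramid

The population-by-age data was found through a Google search (the 2021 figures are used here).
Drawing the same chart for earlier years would make it easy to compare the years side by side.

Defining the male data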

import pandas as pd
file_name = '202108_202108_연령별인구현황_월간.xlsx'
# Skip the 3 unneeded rows at the top of the sheet (skiprows)
# Load only the columns we need (usecols); handy when working with large files
df_m = pd.read_excel(file_name, skiprows=3, index_col='행정기관', usecols='B,E:Y') 
df_m.head(3)

C:\Users\Nadocoding\anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py:221: UserWarning: Workbook contains no default style, apply openpyxl's default
  warn("Workbook contains no default style, apply openpyxl's default")

	0~4세	5~9세	10~14세	15~19세	20~24세	25~29세	30~34세	35~39세	40~44세	45~49세	...	55~59세	60~64세	65~69세	70~74세	75~79세	80~84세	85~89세	90~94세	95~99세	100세 이상
행정기관
전국	807,150	1,154,806	1,215,483	1,229,848	1,652,371	1,875,014	1,655,155	1,829,373	2,024,616	2,139,842	...	2,075,439	2,039,854	1,423,351	982,527	691,904	448,750	189,426	47,982	9,047	2,230
서울특별시	129,766	176,433	189,993	201,088	292,214	402,129	364,322	358,270	362,416	377,526	...	348,510	346,019	257,783	183,490	133,498	81,360	31,954	8,564	1,819	623
부산광역시	46,491	68,329	68,191	69,493	102,324	116,387	99,345	110,248	125,965	128,387	...	132,304	145,246	112,633	79,465	53,942	32,142	11,705	2,744	442	147

3 rows × 21 columns

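# The values contain thousands separators (,), so they are read as strings; strip the commas and convert to int.
df_m.iloc[0] = df_m.iloc[0].str.replace(',', '').astype(int) # '1,195,951' -> 1195951 (integer)

# The first row now holds integers. Row 0 is 전국 (the national total), which is the only row used below.
# The dtype is still reported as object because the other rows remain strings.
df_m.iloc[0]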

0~4세        807150
5~9세       1154806
10~14세     1215483
15~19세     1229848
20~24세     1652371
25~29세     1875014
30~34세     1655155
35~39세     1829373
40~44세     2024616
45~49세     2139842
50~54세     2271863
55~59세     2075439
60~64세     2039854
65~69세     1423351
70~74세      982527
75~79세      691904
80~84세      448750
85~89세      189426
90~94세       47982
95~99세        9047
100세 이상       2230
Name: 전국  , dtype: object
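
Only row 0 is converted above, so the remaining rows stay as comma-formatted strings. If every region were needed later, all cells could be converted in one pass. The following is a minimal sketch on a small hypothetical frame (`sample` and `converted` are illustrative names; the values are copied from the first two columns of the output above), not the code used in this post:

```python
import pandas as pd

# Hypothetical miniature of df_m: comma-formatted strings, as read from the sheet.
sample = pd.DataFrame(
    {"0~4세": ["807,150", "129,766"], "5~9세": ["1,154,806", "176,433"]},
    index=["전국", "서울특별시"],
)

# Strip the thousands separators column by column, then convert to int.
converted = sample.apply(
    lambda col: col.str.replace(",", "", regex=False).astype(int)
)

print(converted.dtypes)       # every column now has an integer dtype
print(converted.loc["전국"])   # the national row as integers
```

Converting whole columns also gives them a numeric dtype, so arithmetic such as `// 1000` works on any row.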

Defining the female data

df_w = pd.read_excel(file_name, skiprows=3, index_col='행정기관', usecols='B,AB:AV')
df_w.head(3)
# The column names in the output below end in .1 because the sheet contains each age column twice (male, female)

C:\Users\Nadocoding\anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py:221: UserWarning: Workbook contains no default style, apply openpyxl's default
  warn("Workbook contains no default style, apply openpyxl's default")

	0~4세.1	5~9세.1	10~14세.1	15~19세.1	20~24세.1	25~29세.1	30~34세.1	35~39세.1	40~44세.1	45~49세.1	...	55~59세.1	60~64세.1	65~69세.1	70~74세.1	75~79세.1	80~84세.1	85~89세.1	90~94세.1	95~99세.1	100세 이상.1
행정기관
전국	765,718	1,099,511	1,146,252	1,142,146	1,521,873	1,672,118	1,523,976	1,744,685	1,954,674	2,073,661	...	2,020,720	2,084,814	1,523,981	1,119,798	908,817	731,836	429,062	163,137	37,655	8,705
서울특별시	122,200	168,049	179,786	192,489	322,918	422,032	365,483	358,167	368,511	384,837	...	359,036	381,456	292,733	215,754	163,571	113,723	61,339	24,590	6,093	1,893
부산광역시	44,214	64,756	64,561	64,967	96,558	105,853	93,317	106,076	122,036	127,847	...	140,679	161,288	126,921	93,874	68,059	50,639	26,473	9,739	2,095	605

3 rows × 21 columns

df_m.columns

Index(['0~4세', '5~9세', '10~14세', '15~19세', '20~24세', '25~29세', '30~34세',
       '35~39세', '40~44세', '45~49세', '50~54세', '55~59세', '60~64세', '65~69세',
       '70~74세', '75~79세', '80~84세', '85~89세', '90~94세', '95~99세', '100세 이상'],
      dtype='object')

df_w.columns # the names duplicate the male columns, so pandas appended .1 to tell them apart

Index(['0~4세.1', '5~9세.1', '10~14세.1', '15~19세.1', '20~24세.1', '25~29세.1',
       '30~34세.1', '35~39세.1', '40~44세.1', '45~49세.1', '50~54세.1', '55~59세.1',
       '60~64세.1', '65~69세.1', '70~74세.1', '75~79세.1', '80~84세.1', '85~89세.1',
       '90~94세.1', '95~99세.1', '100세 이상.1'],
      dtype='object')

df_w.columns = df_m.columns # unify the column names (the .1 suffix disappears)
df_w.columns

Index(['0~4세', '5~9세', '10~14세', '15~19세', '20~24세', '25~29세', '30~34세',
       '35~39세', '40~44세', '45~49세', '50~54세', '55~59세', '60~64세', '65~69세',
       '70~74세', '75~79세', '80~84세', '85~89세', '90~94세', '95~99세', '100세 이상'],
      dtype='object')
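
The `.1` suffix comes from how pandas de-duplicates repeated header names when reading a file. As a rough illustration, the sketch below reproduces the effect with `pd.read_csv` on an in-memory string (a stand-in for the Excel sheet; the CSV text and the names `male` and `female` are assumptions made for this example) and strips the suffix with a regular expression instead of copying the names from another frame:

```python
import io
import pandas as pd

# Hypothetical CSV with the same headers repeated for two groups (male, female).
csv_text = "region,0~4,5~9,0~4,5~9\nTotal,807150,1154806,765718,1099511\n"
df = pd.read_csv(io.StringIO(csv_text), index_col="region")
print(list(df.columns))  # ['0~4', '5~9', '0~4.1', '5~9.1']

# Split by position, then strip a trailing ".<number>" suffix from the second half.
male = df.iloc[:, :2]
female = df.iloc[:, 2:].copy()
female.columns = female.columns.str.replace(r"\.\d+$", "", regex=True)
print(list(female.columns))  # ['0~4', '5~9']
```

Assigning `df_m.columns` directly, as done above, is simpler when both frames are known to have the same columns in the same order; the regular expression is only useful when no reference frame is available.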

df_w.iloc[0] = df_w.iloc[0].str.replace(',', '').astype(int) # '1,195,951' -> 1195951 (integer)
df_w

	0~4세	5~9세	10~14세	15~19세	20~24세	25~29세	30~34세	35~39세	40~44세	45~49세	...	55~59세	60~64세	65~69세	70~74세	75~79세	80~84세	85~89세	90~94세	95~99세	100세 이상
행정기관
전국	765718	1099511	1146252	1142146	1521873	1672118	1523976	1744685	1954674	2073661	...	2020720	2084814	1523981	1119798	908817	731836	429062	163137	37655	8705
서울특별시	122,200	168,049	179,786	192,489	322,918	422,032	365,483	358,167	368,511	384,837	...	359,036	381,456	292,733	215,754	163,571	113,723	61,339	24,590	6,093	1,893
부산광역시	44,214	64,756	64,561	64,967	96,558	105,853	93,317	106,076	122,036	127,847	...	140,679	161,288	126,921	93,874	68,059	50,639	26,473	9,739	2,095	605
대구광역시	33,301	50,145	51,747	53,488	72,843	73,837	62,613	74,250	89,310	100,716	...	102,229	101,311	76,491	56,373	42,677	33,782	18,595	6,432	1,278	304
인천광역시	43,971	63,283	66,430	65,516	86,994	101,318	90,591	102,585	114,678	119,736	...	123,367	118,457	78,978	54,143	42,634	33,055	19,798	7,995	1,975	422
광주광역시	22,602	33,687	36,409	37,671	50,079	49,107	40,131	48,846	57,440	61,851	...	54,453	52,763	36,125	29,771	22,568	17,166	9,583	3,821	880	210
대전광역시	21,700	31,506	33,996	35,270	49,378	50,621	42,816	47,974	55,890	60,817	...	55,179	56,850	40,131	28,008	22,029	16,946	10,066	3,790	832	174
울산광역시	18,529	27,124	27,162	25,248	29,711	31,220	30,500	38,235	43,776	46,490	...	47,961	45,784	30,400	19,164	13,243	9,925	5,723	2,106	523	74
세종특별자치시	9,941	13,034	12,604	9,535	8,242	10,397	13,568	17,752	19,304	15,760	...	9,793	9,738	6,743	4,430	3,394	2,871	2,014	839	163	29
경기도	225,076	319,632	327,337	317,509	407,961	448,234	423,626	494,008	553,942	573,577	...	514,555	489,425	333,357	231,807	188,167	146,519	85,361	33,103	7,982	1,782
강원도	21,125	28,786	31,597	33,185	40,061	37,118	34,911	42,294	50,461	56,727	...	63,705	73,853	54,100	36,741	36,685	31,075	17,257	7,172	1,748	357
충청북도	24,103	34,144	35,652	35,008	43,448	43,605	40,542	49,091	54,562	59,938	...	62,954	68,454	49,989	33,766	31,763	27,412	17,103	6,020	1,326	251
충청남도	33,360	47,677	49,460	47,370	54,019	53,061	53,970	66,688	74,394	77,698	...	76,346	83,852	63,464	49,776	43,375	39,844	26,778	9,333	2,132	425
전라북도	23,371	36,218	40,646	41,997	50,520	45,466	39,634	49,429	59,077	68,134	...	69,420	76,246	58,091	50,816	44,006	37,765	24,453	9,313	1,906	445
전라남도	25,516	36,569	39,337	39,771	45,358	41,231	37,563	47,990	56,070	62,324	...	72,199	79,487	60,005	54,791	51,607	48,998	28,744	11,057	2,513	545
경상북도	36,267	52,093	53,066	52,840	62,146	59,051	57,674	74,002	85,110	95,918	...	109,065	119,672	95,407	71,283	62,194	57,612	35,736	12,904	2,848	564
경상남도	48,955	75,654	78,936	74,285	82,718	80,470	79,474	104,526	123,674	133,392	...	134,046	141,827	103,474	75,815	61,496	54,734	33,409	12,262	2,516	427
제주특별자치도	11,487	17,154	17,526	15,997	18,919	19,497	17,563	22,772	26,439	27,899	...	25,733	24,351	17,572	13,486	11,349	9,770	6,630	2,661	845	198

18 rows × 21 columns

Data visualization

import matplotlib.pyplot as plt
import matplotlib
matplotlib.rcParams['font.family'] = 'Malgun Gothic' # Windows
# matplotlib.rcParams['font.family'] = 'AppleGothic' # Mac
matplotlib.rcParams['font.size'] = 15 # font size
matplotlib.rcParams['axes.unicode_minus'] = False # keeps the minus sign from rendering as a broken glyph when a Korean font is used

plt.figure(figsize=(10, 7))
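# Unit: thousands of people. The male values are negated so the bars extend to the left;
# the parentheses make the floor division round the same way as for the female bars.
plt.barh(df_m.columns, -(df_m.iloc[0] // 1000))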
plt.barh(df_w.columns, df_w.iloc[0] // 1000)
plt.title('2021 대한민국 인구 피라미드')
plt.savefig('2021_인구피라미드.png', dpi=100)
plt.show()
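
Two details of a mirrored bar chart are easy to miss: unary minus binds tighter than `//`, so negating before floor division rounds away from zero, and the left half of the x-axis shows negative tick labels. The sketch below is one possible way to handle both, using the first three national values from the outputs above; the English labels, the `Agg` backend, and the name `abs_formatter` are assumptions made so the example runs anywhere:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# National totals for the first three age groups, taken from the outputs above.
ages = ["0~4", "5~9", "10~14"]
male = [807150, 1154806, 1215483]
female = [765718, 1099511, 1146252]

# Unary minus binds tighter than //, so -x // 1000 floors the negative number.
print(-807150 // 1000)    # -808
print(-(807150 // 1000))  # -807

fig, ax = plt.subplots(figsize=(10, 7))
ax.barh(ages, [-(m // 1000) for m in male], label="male")
ax.barh(ages, [f // 1000 for f in female], label="female")

# Show magnitudes on the x-axis instead of negative numbers on the male side.
abs_formatter = FuncFormatter(lambda value, pos: f"{abs(value):,.0f}")
ax.xaxis.set_major_formatter(abs_formatter)
ax.legend()
```

The bars themselves are unchanged by the formatter; only the tick text is affected.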

2. Number of Births and Total Fertility Rate

The data was found online by searching for the number of births and the total fertility rate.

import pandas as pd
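# Skip the first 2 rows of the sheet, which are not needed (skiprows)
# nrows=2 reads 2 data rows; the first row after the skipped ones is automatically used as the header (hence the 3 lines in the output below)
# index_col accepts a position as well as a column name, so 0 selects the first column as the index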
df = pd.read_excel('stat_142801.xls', skiprows=2, nrows=2, index_col=0)
df

	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020
출생아 수	471.300	484.600	436.500	435.400	438.400	406.200	357.800	326.800	302.700	272.30
합계 출산율	1.244	1.297	1.187	1.205	1.239	1.172	1.052	0.977	0.918	0.84
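
To see how `skiprows`, `nrows`, and `index_col` interact without the original spreadsheet, the same three parameters can be tried on `pd.read_csv`, which accepts them as well. This is only a sketch: the CSV text, the English row labels, and the name `demo` are made up for illustration, with a few values borrowed from the table above.

```python
import io
import pandas as pd

# Hypothetical file: two junk lines, a header line, two data rows, and a footnote row.
csv_text = (
    "Some title\n"
    "Unit: thousands\n"
    ",2011,2012,2013\n"
    "births,471.3,484.6,436.5\n"
    "tfr,1.244,1.297,1.187\n"
    "footnote,,,\n"
)

# skiprows=2 drops the junk lines, the next line becomes the header,
# nrows=2 stops after two data rows, and index_col=0 uses the first column as the index.
demo = pd.read_csv(io.StringIO(csv_text), skiprows=2, nrows=2, index_col=0)
print(demo)
```

Note that `nrows` counts data rows only, so the header line is read in addition to the two rows.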

df.index
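# => df.loc['출생아 수'] raised an error, so df.index was checked to see the actual keys
# (df.index.values below shows that the labels contain a non-breaking space, \xa0)
# Using df.iloc would also work, without having to inspect and rename the labels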

Index(['출생아 수', '합계 출산율'], dtype='object')

df.index.values

array(['출생아\xa0수', '합계\xa0출산율'], dtype=object)

df.rename(index={'출생아\xa0수':'출생아 수', '합계\xa0출산율':'합계 출산율'}, inplace=True)
df

	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020
출생아 수	471.300	484.600	436.500	435.400	438.400	406.200	357.800	326.800	302.700	272.30
합계 출산율	1.244	1.297	1.187	1.205	1.239	1.172	1.052	0.977	0.918	0.84

df.index.values

array(['출생아 수', '합계 출산율'], dtype=object)
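
Instead of listing each label in `rename`, the non-breaking spaces can be replaced across the whole index at once. A minimal sketch, assuming a small hypothetical frame (`df_demo`) with the same two labels and two of the year columns from above:

```python
import pandas as pd

# Hypothetical frame whose index labels contain non-breaking spaces (\xa0).
df_demo = pd.DataFrame(
    {2019: [302.7, 0.918], 2020: [272.3, 0.84]},
    index=["출생아\xa0수", "합계\xa0출산율"],
)

# Replace every non-breaking space in the labels with a regular space.
df_demo.index = df_demo.index.str.replace("\xa0", " ", regex=False)

print(df_demo.index.tolist())
print(df_demo.loc["출생아 수", 2020])
```

This scales better when a sheet has many labels, since none of them has to be typed out by hand.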

df.loc['출생아 수']

2011    471.3
2012    484.6
2013    436.5
2014    435.4
2015    438.4
2016    406.2
2017    357.8
2018    326.8
2019    302.7
2020    272.3
Name: 출생아 수, dtype: float64

df.iloc[0]

2011    471.3
2012    484.6
2013    436.5
2014    435.4
2015    438.4
2016    406.2
2017    357.8
2018    326.8
2019    302.7
2020    272.3
Name: 출생아 수, dtype: float64

df.iloc[1]

2011    1.244
2012    1.297
2013    1.187
2014    1.205
2015    1.239
2016    1.172
2017    1.052
2018    0.977
2019    0.918
2020    0.840
Name: 합계 출산율, dtype: float64

df

	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020
출생아 수	471.300	484.600	436.500	435.400	438.400	406.200	357.800	326.800	302.700	272.30
합계 출산율	1.244	1.297	1.187	1.205	1.239	1.172	1.052	0.977	0.918	0.84

df = df.T # transpose rows and columns (useful to remember)
df

	출생아 수	합계 출산율
2011	471.3	1.244
2012	484.6	1.297
2013	436.5	1.187
2014	435.4	1.205
2015	438.4	1.239
2016	406.2	1.172
2017	357.8	1.052
2018	326.8	0.977
2019	302.7	0.918
2020	272.3	0.840

import matplotlib.pyplot as plt
import matplotlib
matplotlib.rcParams['font.family'] = 'Malgun Gothic' # Windows
# matplotlib.rcParams['font.family'] = 'AppleGothic' # Mac
matplotlib.rcParams['font.size'] = 15 # font size
matplotlib.rcParams['axes.unicode_minus'] = False # keeps the minus sign from rendering as a broken glyph when a Korean font is used

plt.plot(df.index, df['출생아 수'])
plt.plot(df.index, df['합계 출산율'])
# Plotted on a single y-axis, the total fertility rate line is flattened near zero and cannot be compared, so create a twin axes that shares the x-axis.

[<matplotlib.lines.Line2D at 0x2958b564eb0>]

fig, ax1 = plt.subplots(figsize=(10, 7)) # without a grid such as (2, 2), subplots creates a single axes by default
ax1.plot(df.index, df['출생아 수'], color='#ff812d')

ax2 = ax1.twinx() # twin axes sharing the x-axis (so each series gets its own y scale)
ax2.plot(df.index, df['합계 출산율'], color='#ffd100')

[<matplotlib.lines.Line2D at 0x2958be419a0>]
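
With two axes on one figure, each axes only tracks its own lines, so a legend built from one of them lists only half of the series. One way to get a single combined legend is to merge the handles from both axes. The sketch below uses the last three years from the table above; the English labels and the `Agg` backend are assumptions for portability:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

years = ["2018", "2019", "2020"]
births = [326.8, 302.7, 272.3]  # thousands
tfr = [0.977, 0.918, 0.84]

fig, ax1 = plt.subplots(figsize=(10, 7))
ax1.plot(years, births, color="#ff812d", label="births (thousands)")

ax2 = ax1.twinx()  # second y-axis on the right, same x-axis
ax2.plot(years, tfr, color="#ffd100", label="total fertility rate")

# Each axes only knows its own artists, so merge the handles for one shared legend.
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(handles1 + handles2, labels1 + labels2, loc="upper right")
```

Because the two y-axes are scaled independently, both lines use the full height of the figure.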

fig, ax1 = plt.subplots(figsize=(13, 5))
ax1.set_ylabel('출생아 수 (천 명)')
ax1.set_ylim(250, 700) # limit the y range
ax1.set_yticks([300, 400, 500, 600]) # set the y tick values explicitly
ax1.bar(df.index, df['출생아 수'], color='#ff812d')

ax2 = ax1.twinx() # twin axes sharing the x-axis
ax2.set_ylabel('합계 출산율 (가임여성 1명당 명)')
ax2.set_ylim(0, 1.5)
ax2.set_yticks([0, 1])
ax2.plot(df.index, df['합계 출산율'], color='#ffd100')

[<matplotlib.lines.Line2D at 0x2958cd24640>]

fig, ax1 = plt.subplots(figsize=(13, 5))
ax1.set_ylabel('출생아 수 (천 명)')
ax1.set_ylim(250, 700)
ax1.set_yticks([300, 400, 500, 600])
ax1.bar(df.index, df['출생아 수'], color='#ff812d')
for idx, val in enumerate(df['출생아 수']): # add text (value labels above the bars)
    ax1.text(idx, val + 12, val, ha='center')

ax2 = ax1.twinx() # twin axes sharing the x-axis
ax2.set_ylabel('합계 출산율 (가임여성 1명당 명)')
ax2.set_ylim(0, 1.5)
ax2.set_yticks([0, 1])
ax2.plot(df.index, df['합계 출산율'], color='#ffd100')

[<matplotlib.lines.Line2D at 0x2958ff17be0>]

fig, ax1 = plt.subplots(figsize=(13, 5))
ax1.set_ylabel('출생아 수 (천 명)')
ax1.set_ylim(250, 700)
ax1.set_yticks([300, 400, 500, 600])
ax1.bar(df.index, df['출생아 수'], color='#ff812d')
for idx, val in enumerate(df['출생아 수']):
    ax1.text(idx, val + 12, val, ha='center')

ax2 = ax1.twinx() # twin axes sharing the x-axis
ax2.set_ylabel('합계 출산율 (가임여성 1명당 명)')
ax2.set_ylim(0, 1.5)
ax2.set_yticks([0, 1])
ax2.plot(df.index, df['합계 출산율'], color='#ffd100', marker='o', ms=15, lw=5, mec='w', mew=3)
# marker properties: ms = marker size, lw = line width, mec = marker edge color, mew = marker edge width

[<matplotlib.lines.Line2D at 0x29590425f70>]

fig, ax1 = plt.subplots(figsize=(13, 5))
ax1.set_ylabel('출생아 수 (천 명)')
ax1.set_ylim(250, 700)
ax1.set_yticks([300, 400, 500, 600])
ax1.bar(df.index, df['출생아 수'], color='#ff812d')
for idx, val in enumerate(df['출생아 수']):
    ax1.text(idx, val + 12, val, ha='center')

ax2 = ax1.twinx() # twin axes sharing the x-axis
ax2.set_ylabel('합계 출산율 (가임여성 1명당 명)')
ax2.set_ylim(0, 1.5)
ax2.set_yticks([0, 1])
ax2.plot(df.index, df['합계 출산율'], color='#ffd100', marker='o', ms=15, lw=5, mec='w', mew=3)
for idx, val in enumerate(df['합계 출산율']): # add text (value labels above the markers)
    ax2.text(idx, val + 0.08, val, ha='center')

fig, ax1 = plt.subplots(figsize=(13, 5))
fig.suptitle('출생아 수 및 합계출산율') # add the overall figure title to finish
ax1.set_ylabel('출생아 수 (천 명)')
ax1.set_ylim(250, 700)
ax1.set_yticks([300, 400, 500, 600])
ax1.bar(df.index, df['출생아 수'], color='#ff812d')
for idx, val in enumerate(df['출생아 수']):
    ax1.text(idx, val + 12, val, ha='center')

ax2 = ax1.twinx() # twin axes sharing the x-axis
ax2.set_ylabel('합계 출산율 (가임여성 1명당 명)')
ax2.set_ylim(0, 1.5)
ax2.set_yticks([0, 1])
ax2.plot(df.index, df['합계 출산율'], color='#ffd100', marker='o', ms=15, lw=5, mec='w', mew=3)
for idx, val in enumerate(df['합계 출산율']):
    ax2.text(idx, val + 0.08, val, ha='center')
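
The value labels above are positioned with `enumerate`, that is, at x = 0, 1, 2, and so on. This lines up with the bars when the x values are strings, because matplotlib places categories at those positions. If the year labels were read from the sheet as integers, the bars would sit at x = 2011 to 2020 instead and the labels would end up off-screen. The sketch below wraps the repeated chart code in a hypothetical helper (`plot_births_and_tfr`) that converts the index to strings first; the integer index, the English column names, and the `Agg` backend are assumptions made for this example:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Last three rows of the transposed table above; the integer year index is an assumption.
df_demo = pd.DataFrame(
    {"births": [326.8, 302.7, 272.3], "tfr": [0.977, 0.918, 0.84]},
    index=[2018, 2019, 2020],
)

def plot_births_and_tfr(data):
    """Bar chart of births with the fertility rate on a twin y-axis (hypothetical helper)."""
    x = [str(label) for label in data.index]  # categorical x values sit at 0, 1, 2, ...
    fig, ax1 = plt.subplots(figsize=(13, 5))
    ax1.set_ylim(250, 700)
    ax1.bar(x, data["births"], color="#ff812d")
    for pos, val in enumerate(data["births"]):
        ax1.text(pos, val + 12, val, ha="center")

    ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
    ax2.set_ylim(0, 1.5)
    ax2.plot(x, data["tfr"], color="#ffd100", marker="o")
    for pos, val in enumerate(data["tfr"]):
        ax2.text(pos, val + 0.08, val, ha="center")
    return fig, ax1, ax2

fig, ax1, ax2 = plot_births_and_tfr(df_demo)
```

Keeping the drawing code in one function also avoids re-typing the whole cell each time a single option changes.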
