1. Movie Recommendation System (Demographic and Content-Based Filtering)

This material draws on the content of the following link.

  • Reference : https://www.kaggle.com/code/ibtesama/getting-started-with-a-movie-recommendation-system

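  • The referenced notebook is a popular example taken from the many movie recommendation posts built on TMDB 5000.

  • TMDB 5000 is a dataset of roughly 5,000 movies from TMDB, packaged so that it can be used as data.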

Movie recommendation systems

  1. Demographic Filtering

  2. Content Based Filtering

  3. Collaborative Filtering

1. Demographic Filtering

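The formula used in the reference link:

$ WR = (\frac{v}{v+m}\cdot R) + (\frac{m}{v+m}\cdot C) $

Here v is the movie's vote count, m is the minimum vote count required to be listed, R is the movie's average rating, and C is the mean rating across all movies.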

# The CSV files also come from the reference link.
import pandas as pd
import numpy as np

df1 = pd.read_csv('tmdb_5000_credits.csv')
df2 = pd.read_csv('tmdb_5000_movies.csv')
df1.head()
movie_id title cast crew
0 19995 Avatar [{"cast_id": 242, "character": "Jake Sully", "... [{"credit_id": "52fe48009251416c750aca23", "de...
1 285 Pirates of the Caribbean: At World's End [{"cast_id": 4, "character": "Captain Jack Spa... [{"credit_id": "52fe4232c3a36847f800b579", "de...
2 206647 Spectre [{"cast_id": 1, "character": "James Bond", "cr... [{"credit_id": "54805967c3a36829b5002c41", "de...
3 49026 The Dark Knight Rises [{"cast_id": 2, "character": "Bruce Wayne / Ba... [{"credit_id": "52fe4781c3a36847f81398c3", "de...
4 49529 John Carter [{"cast_id": 5, "character": "John Carter", "c... [{"credit_id": "52fe479ac3a36847f813eaa3", "de...
df2.head(3)
budget genres homepage id keywords original_language original_title overview popularity production_companies production_countries release_date revenue runtime spoken_languages status tagline title vote_average vote_count
0 237000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.avatarmovie.com/ 19995 [{"id": 1463, "name": "culture clash"}, {"id":... en Avatar In the 22nd century, a paraplegic Marine is di... 150.437577 [{"name": "Ingenious Film Partners", "id": 289... [{"iso_3166_1": "US", "name": "United States o... 2009-12-10 2787965087 162.0 [{"iso_639_1": "en", "name": "English"}, {"iso... Released Enter the World of Pandora. Avatar 7.2 11800
1 300000000 [{"id": 12, "name": "Adventure"}, {"id": 14, "... http://disney.go.com/disneypictures/pirates/ 285 [{"id": 270, "name": "ocean"}, {"id": 726, "na... en Pirates of the Caribbean: At World's End Captain Barbossa, long believed to be dead, ha... 139.082615 [{"name": "Walt Disney Pictures", "id": 2}, {"... [{"iso_3166_1": "US", "name": "United States o... 2007-05-19 961000000 169.0 [{"iso_639_1": "en", "name": "English"}] Released At the end of the world, the adventure begins. Pirates of the Caribbean: At World's End 6.9 4500
2 245000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.sonypictures.com/movies/spectre/ 206647 [{"id": 470, "name": "spy"}, {"id": 818, "name... en Spectre A cryptic message from Bond’s past sends him o... 107.376788 [{"name": "Columbia Pictures", "id": 5}, {"nam... [{"iso_3166_1": "GB", "name": "United Kingdom"... 2015-10-26 880674609 148.0 [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... Released A Plan No One Escapes Spectre 6.3 4466
df1.shape, df2.shape
((4803, 4), (4803, 20))
# The two frames have different columns, so check that the titles match row by row
df1['title'].equals(df2['title'])
True
df1.columns
Index(['movie_id', 'title', 'cast', 'crew'], dtype='object')
df1.columns = ['id', 'title', 'cast', 'crew']
df1.columns
Index(['id', 'title', 'cast', 'crew'], dtype='object')
# title is identical in both frames, so leave it out
df1[['id', 'cast', 'crew']]
id cast crew
0 19995 [{"cast_id": 242, "character": "Jake Sully", "... [{"credit_id": "52fe48009251416c750aca23", "de...
1 285 [{"cast_id": 4, "character": "Captain Jack Spa... [{"credit_id": "52fe4232c3a36847f800b579", "de...
2 206647 [{"cast_id": 1, "character": "James Bond", "cr... [{"credit_id": "54805967c3a36829b5002c41", "de...
3 49026 [{"cast_id": 2, "character": "Bruce Wayne / Ba... [{"credit_id": "52fe4781c3a36847f81398c3", "de...
4 49529 [{"cast_id": 5, "character": "John Carter", "c... [{"credit_id": "52fe479ac3a36847f813eaa3", "de...
... ... ... ...
4798 9367 [{"cast_id": 1, "character": "El Mariachi", "c... [{"credit_id": "52fe44eec3a36847f80b280b", "de...
4799 72766 [{"cast_id": 1, "character": "Buzzy", "credit_... [{"credit_id": "52fe487dc3a368484e0fb013", "de...
4800 231617 [{"cast_id": 8, "character": "Oliver O\u2019To... [{"credit_id": "52fe4df3c3a36847f8275ecf", "de...
4801 126186 [{"cast_id": 3, "character": "Sam", "credit_id... [{"credit_id": "52fe4ad9c3a368484e16a36b", "de...
4802 25975 [{"cast_id": 3, "character": "Herself", "credi... [{"credit_id": "58ce021b9251415a390165d9", "de...

4803 rows × 3 columns

# Merge the selected df1 columns into df2 on 'id'
df2 = df2.merge(df1[['id', 'cast', 'crew']], on='id')
df2.head(3)
budget genres homepage id keywords original_language original_title overview popularity production_companies ... revenue runtime spoken_languages status tagline title vote_average vote_count cast crew
0 237000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.avatarmovie.com/ 19995 [{"id": 1463, "name": "culture clash"}, {"id":... en Avatar In the 22nd century, a paraplegic Marine is di... 150.437577 [{"name": "Ingenious Film Partners", "id": 289... ... 2787965087 162.0 [{"iso_639_1": "en", "name": "English"}, {"iso... Released Enter the World of Pandora. Avatar 7.2 11800 [{"cast_id": 242, "character": "Jake Sully", "... [{"credit_id": "52fe48009251416c750aca23", "de...
1 300000000 [{"id": 12, "name": "Adventure"}, {"id": 14, "... http://disney.go.com/disneypictures/pirates/ 285 [{"id": 270, "name": "ocean"}, {"id": 726, "na... en Pirates of the Caribbean: At World's End Captain Barbossa, long believed to be dead, ha... 139.082615 [{"name": "Walt Disney Pictures", "id": 2}, {"... ... 961000000 169.0 [{"iso_639_1": "en", "name": "English"}] Released At the end of the world, the adventure begins. Pirates of the Caribbean: At World's End 6.9 4500 [{"cast_id": 4, "character": "Captain Jack Spa... [{"credit_id": "52fe4232c3a36847f800b579", "de...
2 245000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.sonypictures.com/movies/spectre/ 206647 [{"id": 470, "name": "spy"}, {"id": 818, "name... en Spectre A cryptic message from Bond’s past sends him o... 107.376788 [{"name": "Columbia Pictures", "id": 5}, {"nam... ... 880674609 148.0 [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... Released A Plan No One Escapes Spectre 6.3 4466 [{"cast_id": 1, "character": "James Bond", "cr... [{"credit_id": "54805967c3a36829b5002c41", "de...

3 rows × 22 columns

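Movie 1: rated 10/10 by 5 voters

Movie 2: rated 8/10 by 500 voters

  • The rating backed by 500 voters is naturally the more reliable one.

  • The scoring formula from the reference link is used to account for this (this walkthrough follows the link).

    • $ WR = (\frac{v}{v+m}\cdot R) + (\frac{m}{v+m}\cdot C) $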
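As a rough worked example, the two hypothetical movies above can be plugged into the formula. The values of m and C are the rounded ones this notebook computes from the dataset (about 1838.4 and 6.092); the function name `weighted_rating_value` is only illustrative.

```python
def weighted_rating_value(v, R, m, C):
    """Weighted rating: WR = v/(v+m) * R + m/(v+m) * C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Rounded values taken from this notebook's outputs.
m = 1838.4   # vote-count cutoff
C = 6.092    # mean rating over all movies

wr_movie1 = weighted_rating_value(v=5, R=10.0, m=m, C=C)    # 10/10 from 5 voters
wr_movie2 = weighted_rating_value(v=500, R=8.0, m=m, C=C)   # 8/10 from 500 voters

print(round(wr_movie1, 2), round(wr_movie2, 2))
```

With so few votes, Movie 1 stays close to C (roughly 6.10), while Movie 2 is pulled further toward its own rating (roughly 6.50), so Movie 2 ranks higher despite the lower raw rating.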
C = df2['vote_average'].mean()
C # mean rating across all movies
6.092171559442011
m = df2['vote_count'].quantile(0.9)
m # 90th percentile of vote_count: the cutoff for the top 10% most-voted movies
1838.4000000000015
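The cutoff is not a whole number because pandas' `Series.quantile` interpolates linearly between neighbouring sorted values by default. A minimal pure-Python sketch of that interpolation follows; `linear_quantile` and the toy vote counts are illustrative assumptions, not part of the notebook.

```python
import math

def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q          # fractional position in the sorted list
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

toy_counts = [10, 20, 30, 40, 50]          # hypothetical vote counts
cutoff = linear_quantile(toy_counts, 0.9)  # falls between 40 and 50
kept = [c for c in toy_counts if c >= cutoff]
print(cutoff, kept)
```

Only values at or above the interpolated cutoff survive the filter, which mirrors how the `>= m` filter is applied to the movies.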
# Keep only the rows whose 'vote_count' is at least m, and copy them into a new frame
q_movies = df2.loc[df2['vote_count'] >= m].copy()
q_movies.shape
(481, 22)
# The smallest remaining vote count is 1840, the first value at or above the cutoff m (about 1838.4).
q_movies['vote_count'].sort_values()
2585     1840
195      1851
2454     1859
597      1862
1405     1864
        ...  
788     10995
16      11776
0       11800
65      12002
96      13752
Name: vote_count, Length: 481, dtype: int64
# Implement the formula from the reference link
def weighted_rating(x, m=m, C=C):
    v = x['vote_count']
    R = x['vote_average']
    return (v / (v + m) * R) + (m / (m + v) * C)
# Apply the function to each row and store the result in a new 'score' column
q_movies['score'] = q_movies.apply(weighted_rating, axis=1)
q_movies.head(3)
budget genres homepage id keywords original_language original_title overview popularity production_companies ... runtime spoken_languages status tagline title vote_average vote_count cast crew score
0 237000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.avatarmovie.com/ 19995 [{"id": 1463, "name": "culture clash"}, {"id":... en Avatar In the 22nd century, a paraplegic Marine is di... 150.437577 [{"name": "Ingenious Film Partners", "id": 289... ... 162.0 [{"iso_639_1": "en", "name": "English"}, {"iso... Released Enter the World of Pandora. Avatar 7.2 11800 [{"cast_id": 242, "character": "Jake Sully", "... [{"credit_id": "52fe48009251416c750aca23", "de... 7.050669
1 300000000 [{"id": 12, "name": "Adventure"}, {"id": 14, "... http://disney.go.com/disneypictures/pirates/ 285 [{"id": 270, "name": "ocean"}, {"id": 726, "na... en Pirates of the Caribbean: At World's End Captain Barbossa, long believed to be dead, ha... 139.082615 [{"name": "Walt Disney Pictures", "id": 2}, {"... ... 169.0 [{"iso_639_1": "en", "name": "English"}] Released At the end of the world, the adventure begins. Pirates of the Caribbean: At World's End 6.9 4500 [{"cast_id": 4, "character": "Captain Jack Spa... [{"credit_id": "52fe4232c3a36847f800b579", "de... 6.665696
2 245000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.sonypictures.com/movies/spectre/ 206647 [{"id": 470, "name": "spy"}, {"id": 818, "name... en Spectre A cryptic message from Bond’s past sends him o... 107.376788 [{"name": "Columbia Pictures", "id": 5}, {"nam... ... 148.0 [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... Released A Plan No One Escapes Spectre 6.3 4466 [{"cast_id": 1, "character": "James Bond", "cr... [{"credit_id": "54805967c3a36829b5002c41", "de... 6.239396

3 rows × 23 columns

q_movies = q_movies.sort_values('score', ascending=False) # descending order
q_movies[['title', 'vote_count', 'vote_average', 'score']].head(10)
title vote_count vote_average score
1881 The Shawshank Redemption 8205 8.5 8.059258
662 Fight Club 9413 8.3 7.939256
65 The Dark Knight 12002 8.2 7.920020
3232 Pulp Fiction 8428 8.3 7.904645
96 Inception 13752 8.1 7.863239
3337 The Godfather 5893 8.4 7.851236
95 Interstellar 10867 8.1 7.809479
809 Forrest Gump 7927 8.2 7.803188
329 The Lord of the Rings: The Return of the King 8064 8.1 7.727243
1990 The Empire Strikes Back 5879 8.2 7.697884
pop = df2.sort_values('popularity', ascending=False)
import matplotlib.pyplot as plt
plt.figure(figsize=(12,4))

plt.barh(pop['title'].head(10),pop['popularity'].head(10), align='center',
        color='skyblue')
plt.gca().invert_yaxis()
plt.xlabel("Popularity")
plt.title("Popular Movies")
Text(0.5, 1.0, 'Popular Movies')

2. Content Based Filtering

This approach measures how similar the text of each item is to the others and returns the items with the highest similarity.

Plot-based recommendation

‘overview’

df2['overview'].head(5)
0    In the 22nd century, a paraplegic Marine is di...
1    Captain Barbossa, long believed to be dead, ha...
2    A cryptic message from Bond’s past sends him o...
3    Following the death of District Attorney Harve...
4    John Carter is a war-weary, former military ca...
Name: overview, dtype: object

Bag Of Words - BOW

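Sentence 1: I am a boy

Sentence 2: I am a girl

Word counts across both sentences: I(2), am(2), a(2), boy(1), girl(1)

| | I | am | a | boy | girl | Vector |
|---|---|---|---|---|---|---|
| Sentence 1 (I am a boy) | 1 | 1 | 1 | 1 | 0 | (1,1,1,1,0) |
| Sentence 2 (I am a girl) | 1 | 1 | 1 | 0 | 1 | (1,1,1,0,1) |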
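The counting above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes whitespace tokenization and lowercasing; `bag_of_words` is a hypothetical helper, not a library function.

```python
from collections import Counter

def bag_of_words(sentences):
    """Return the vocabulary (in first-seen order) and one count vector per sentence."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = []
    for tokens in tokenized:
        for tok in tokens:
            if tok not in vocab:
                vocab.append(tok)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)               # missing words count as 0
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["I am a boy", "I am a girl"])
print(vocab)    # ['i', 'am', 'a', 'boy', 'girl']
print(vectors)  # [[1, 1, 1, 1, 0], [1, 1, 1, 0, 1]]
```

Note that scikit-learn's `CountVectorizer` would give a different vocabulary for these sentences, because its default token pattern ignores single-character tokens such as "I" and "a".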

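Feature vectorization.

Suppose there are 100 documents and 10,000 distinct words across all of them. The document-term matrix then has 100 * 10,000 = 1,000,000 cells.

| | word 1 | word 2 | word 3 | word 4 | ... | word 10000 |
|---|---|---|---|---|---|---|
| doc 1 | 1 | 1 | 3 | 0 | ... | |
| doc 2 | | | | | | |
| doc 3 | | | | | | |
| ... | | | | | | |
| doc 100 | | | | | | |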

  1. TfidfVectorizer (TF-IDF-based vectorization)

  2. CountVectorizer

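TfidfVectorizer (TF-IDF-based vectorization) weights each word by how often it appears in a document, scaled down by how many documents contain it, so words such as "a" and "the" that appear in almost every document carry little information. With stop_words='english', such common English words are dropped entirely, and the remaining words are vectorized as in the table above.

Movie overviews are full of these common words, so this technique is used here.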
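To make the weighting concrete, here is a minimal pure-Python sketch of TF-IDF. It assumes whitespace tokenization, raw term counts, and the smoothed form idf = ln((1 + n) / (1 + df)) + 1, which is the form I understand scikit-learn to use by default; the final L2 normalization is omitted, and the three documents are made up.

```python
import math

def tfidf_weights(docs):
    """Return per-word IDF and per-document TF-IDF weights (no normalization)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = {}                                   # number of documents containing each word
    for tokens in tokenized:
        for word in set(tokens):
            df[word] = df.get(word, 0) + 1
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
    weights = [
        {w: tokens.count(w) * idf[w] for w in set(tokens)}
        for tokens in tokenized
    ]
    return idf, weights

docs = [
    "the hero saves the city",
    "the villain attacks the city",
    "the detective solves a case",
]
idf, weights = tfidf_weights(docs)
print(idf["the"], idf["city"], idf["hero"])
```

In this toy corpus "the" appears in every document and gets the lowest IDF, while "hero" appears in only one and gets the highest. IDF lowers the relative weight of "the" but does not remove it, which is one reason to pass stop_words='english' as well.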

from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english') # drop common English stop words
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
ENGLISH_STOP_WORDS # inspect which words are excluded
frozenset({'a',
           'about',
           'above',
           'across',
           'after',
           'afterwards',
           'again',
           'against',
           'all',
           'almost',
           'alone',
           'along',
           'already',
           'also',
           'although',
           'always',
           'am',
           'among',
           'amongst',
           'amoungst',
           'amount',
           'an',
           'and',
           'another',
           'any',
           'anyhow',
           'anyone',
           'anything',
           'anyway',
           'anywhere',
           'are',
           'around',
           'as',
           'at',
           'back',
           'be',
           'became',
           'because',
           'become',
           'becomes',
           'becoming',
           'been',
           'before',
           'beforehand',
           'behind',
           'being',
           'below',
           'beside',
           'besides',
           'between',
           'beyond',
           'bill',
           'both',
           'bottom',
           'but',
           'by',
           'call',
           'can',
           'cannot',
           'cant',
           'co',
           'con',
           'could',
           'couldnt',
           'cry',
           'de',
           'describe',
           'detail',
           'do',
           'done',
           'down',
           'due',
           'during',
           'each',
           'eg',
           'eight',
           'either',
           'eleven',
           'else',
           'elsewhere',
           'empty',
           'enough',
           'etc',
           'even',
           'ever',
           'every',
           'everyone',
           'everything',
           'everywhere',
           'except',
           'few',
           'fifteen',
           'fifty',
           'fill',
           'find',
           'fire',
           'first',
           'five',
           'for',
           'former',
           'formerly',
           'forty',
           'found',
           'four',
           'from',
           'front',
           'full',
           'further',
           'get',
           'give',
           'go',
           'had',
           'has',
           'hasnt',
           'have',
           'he',
           'hence',
           'her',
           'here',
           'hereafter',
           'hereby',
           'herein',
           'hereupon',
           'hers',
           'herself',
           'him',
           'himself',
           'his',
           'how',
           'however',
           'hundred',
           'i',
           'ie',
           'if',
           'in',
           'inc',
           'indeed',
           'interest',
           'into',
           'is',
           'it',
           'its',
           'itself',
           'keep',
           'last',
           'latter',
           'latterly',
           'least',
           'less',
           'ltd',
           'made',
           'many',
           'may',
           'me',
           'meanwhile',
           'might',
           'mill',
           'mine',
           'more',
           'moreover',
           'most',
           'mostly',
           'move',
           'much',
           'must',
           'my',
           'myself',
           'name',
           'namely',
           'neither',
           'never',
           'nevertheless',
           'next',
           'nine',
           'no',
           'nobody',
           'none',
           'noone',
           'nor',
           'not',
           'nothing',
           'now',
           'nowhere',
           'of',
           'off',
           'often',
           'on',
           'once',
           'one',
           'only',
           'onto',
           'or',
           'other',
           'others',
           'otherwise',
           'our',
           'ours',
           'ourselves',
           'out',
           'over',
           'own',
           'part',
           'per',
           'perhaps',
           'please',
           'put',
           'rather',
           're',
           'same',
           'see',
           'seem',
           'seemed',
           'seeming',
           'seems',
           'serious',
           'several',
           'she',
           'should',
           'show',
           'side',
           'since',
           'sincere',
           'six',
           'sixty',
           'so',
           'some',
           'somehow',
           'someone',
           'something',
           'sometime',
           'sometimes',
           'somewhere',
           'still',
           'such',
           'system',
           'take',
           'ten',
           'than',
           'that',
           'the',
           'their',
           'them',
           'themselves',
           'then',
           'thence',
           'there',
           'thereafter',
           'thereby',
           'therefore',
           'therein',
           'thereupon',
           'these',
           'they',
           'thick',
           'thin',
           'third',
           'this',
           'those',
           'though',
           'three',
           'through',
           'throughout',
           'thru',
           'thus',
           'to',
           'together',
           'too',
           'top',
           'toward',
           'towards',
           'twelve',
           'twenty',
           'two',
           'un',
           'under',
           'until',
           'up',
           'upon',
           'us',
           'very',
           'via',
           'was',
           'we',
           'well',
           'were',
           'what',
           'whatever',
           'when',
           'whence',
           'whenever',
           'where',
           'whereafter',
           'whereas',
           'whereby',
           'wherein',
           'whereupon',
           'wherever',
           'whether',
           'which',
           'while',
           'whither',
           'who',
           'whoever',
           'whole',
           'whom',
           'whose',
           'why',
           'will',
           'with',
           'within',
           'without',
           'would',
           'yet',
           'you',
           'your',
           'yours',
           'yourself',
           'yourselves'})
# Returns True if at least one value is null
df2['overview'].isnull().values.any()
True
# Replace null values with an empty string
df2['overview'] = df2['overview'].fillna('')
tfidf_matrix = tfidf.fit_transform(df2['overview'])
tfidf_matrix.shape
(4803, 20978)
tfidf_matrix
<4803x20978 sparse matrix of type '<class 'numpy.float64'>'
	with 125840 stored elements in Compressed Sparse Row format>
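The numbers in this output show why a sparse format is used. A quick arithmetic sketch, using only the shape and stored-element count reported above:

```python
rows, cols = 4803, 20978      # shape reported for tfidf_matrix
stored = 125840               # non-zero entries reported by the sparse matrix

total_cells = rows * cols
density = stored / total_cells
avg_terms_per_overview = stored / rows

print(total_cells)                       # 100757334
print(f"{density:.4%}")                  # share of cells that are non-zero
print(round(avg_terms_per_overview, 1))  # distinct non-stop-word terms per overview
```

Only about 0.12% of the cells are non-zero, which works out to roughly 26 distinct terms per overview.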
# Similarity measure: cosine similarity (computed here with linear_kernel, a plain dot product)
from sklearn.metrics.pairwise import linear_kernel

cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
cosine_sim
array([[1.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 1.        , 0.        , ..., 0.02160533, 0.        ,
        0.        ],
       [0.        , 0.        , 1.        , ..., 0.01488159, 0.        ,
        0.        ],
       ...,
       [0.        , 0.02160533, 0.01488159, ..., 1.        , 0.01609091,
        0.00701914],
       [0.        , 0.        , 0.        , ..., 0.01609091, 1.        ,
        0.01171696],
       [0.        , 0.        , 0.        , ..., 0.00701914, 0.01171696,
        1.        ]])

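| | Sentence 1 | Sentence 2 | Sentence 3 |
|---|---|---|---|
| Sentence 1 | 1 | 0.3 | 0.8 |
| Sentence 2 | 0.3 | 1 | 0.5 |
| Sentence 3 | 0.8 | 0.5 | 1 |

# Which sentence is most similar to Sentence 1, excluding itself? Sentence 3
cosine_sim.shape # the matrix is symmetric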
(4803, 4803)
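`linear_kernel` computes plain dot products. That coincides with cosine similarity here because `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`). A small pure-Python check of that identity, with made-up vectors and illustrative helper names:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """cos(a, b) = a.b / (|a| * |b|)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_normalize(a):
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a]

a = [1.0, 2.0, 0.0]
b = [2.0, 1.0, 1.0]

cos_ab = cosine_similarity(a, b)
dot_of_normalized = dot(l2_normalize(a), l2_normalize(b))
print(cos_ab, dot_of_normalized)   # the two values agree up to rounding
```

The same identity explains the 1s on the diagonal: a normalized vector dotted with itself is 1.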
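# Note: a Series can be thought of as a one-dimensional array
# Caveat: drop_duplicates() compares the values (row positions), which are all unique,
# so any repeated titles stay in the index (the length below is still 4803)
indices = pd.Series(df2.index, index=df2['title']).drop_duplicates()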
indices
title
Avatar                                         0
Pirates of the Caribbean: At World's End       1
Spectre                                        2
The Dark Knight Rises                          3
John Carter                                    4
                                            ... 
El Mariachi                                 4798
Newlyweds                                   4799
Signed, Sealed, Delivered                   4800
Shanghai Calling                            4801
My Date with Drew                           4802
Length: 4803, dtype: int64
indices['The Dark Knight Rises']
3
df2.iloc[[3]]
budget genres homepage id keywords original_language original_title overview popularity production_companies ... revenue runtime spoken_languages status tagline title vote_average vote_count cast crew
3 250000000 [{"id": 28, "name": "Action"}, {"id": 80, "nam... http://www.thedarkknightrises.com/ 49026 [{"id": 849, "name": "dc comics"}, {"id": 853,... en The Dark Knight Rises Following the death of District Attorney Harve... 112.31295 [{"name": "Legendary Pictures", "id": 923}, {"... ... 1084939099 165.0 [{"iso_639_1": "en", "name": "English"}] Released The Legend Ends The Dark Knight Rises 7.6 9106 [{"cast_id": 2, "character": "Bruce Wayne / Ba... [{"credit_id": "52fe4781c3a36847f81398c3", "de...

1 rows × 22 columns

# Given a movie title, return the 10 movies with the highest cosine similarity to it
def get_recommendations(title, cosine_sim=cosine_sim):
    # Look up the movie's row index in the full dataset by title
    idx = indices[title]

    # Take row idx of the cosine similarity matrix as (index, similarity) pairs
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort by cosine similarity in descending order
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Slice out 10 recommendations, skipping the first entry (the movie itself)
    sim_scores = sim_scores[1:11]

    # Extract the row indices of the 10 recommended movies
    movie_indices = [i[0] for i in sim_scores]

    # Map the row indices back to movie titles
    return df2['title'].iloc[movie_indices]
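The same ranking logic can be tried on a tiny similarity matrix without pandas. This sketch is a variant rather than the function above: it excludes the query row by index instead of assuming it sorts first, which may matter if another row ties at similarity 1.0. `top_similar` and the matrix values are illustrative.

```python
def top_similar(idx, sim_matrix, k):
    """Return the k row indices most similar to idx, excluding idx itself."""
    scored = [(j, s) for j, s in enumerate(sim_matrix[idx]) if j != idx]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [j for j, _ in scored[:k]]

# Toy 3x3 similarity matrix: symmetric, with 1s on the diagonal.
toy_sim = [
    [1.0, 0.3, 0.8],
    [0.3, 1.0, 0.5],
    [0.8, 0.5, 1.0],
]

print(top_similar(0, toy_sim, k=1))  # [2]
print(top_similar(1, toy_sim, k=2))  # [2, 0]
```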
test_idx = indices['The Dark Knight Rises'] # look up the movie's row index in the full dataset by title
test_idx
3
cosine_sim[3] # similarity of movie 3 to every movie
array([0.02499512, 0.        , 0.        , ..., 0.03386366, 0.04275232,
       0.02269198])
test_sim_scores = list(enumerate(cosine_sim[3])) # take row 3 of the cosine similarity matrix as (index, similarity) pairs
test_sim_scores = sorted(test_sim_scores, key=lambda x: x[1], reverse=True) # sort by cosine similarity, descending
test_sim_scores[1:11] # slice out 10 recommendations, skipping the movie itself
[(65, 0.30151176591665485),
 (299, 0.29857045255396825),
 (428, 0.2878505467001694),
 (1359, 0.264460923827995),
 (3854, 0.18545003006561456),
 (119, 0.16799626199850706),
 (2507, 0.16682891043358278),
 (9, 0.1337400906655523),
 (1181, 0.13219702138476813),
 (210, 0.13045537014449818)]
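# A named function equivalent to the lambda used as the sort key
def get_second(x):
    return x[1]

lst = ['index', 'similarity']
print(get_second(lst))
similarity
# The same thing written as a lambda expression
# The lambda defines a function that returns x[1], and lst is passed in as x
(lambda x: x[1])(lst)
'similarity'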
# Extract the row indices of the 10 recommended movies
test_movie_indices = [i[0] for i in test_sim_scores[1:11]]
test_movie_indices
[65, 299, 428, 1359, 3854, 119, 2507, 9, 1181, 210]
# Map the row indices back to movie titles
df2['title'].iloc[test_movie_indices]
65                              The Dark Knight
299                              Batman Forever
428                              Batman Returns
1359                                     Batman
3854    Batman: The Dark Knight Returns, Part 2
119                               Batman Begins
2507                                  Slow Burn
9            Batman v Superman: Dawn of Justice
1181                                        JFK
210                              Batman & Robin
Name: title, dtype: object
df2['title'][:20]
0                                          Avatar
1        Pirates of the Caribbean: At World's End
2                                         Spectre
3                           The Dark Knight Rises
4                                     John Carter
5                                    Spider-Man 3
6                                         Tangled
7                         Avengers: Age of Ultron
8          Harry Potter and the Half-Blood Prince
9              Batman v Superman: Dawn of Justice
10                               Superman Returns
11                              Quantum of Solace
12     Pirates of the Caribbean: Dead Man's Chest
13                                The Lone Ranger
14                                   Man of Steel
15       The Chronicles of Narnia: Prince Caspian
16                                   The Avengers
17    Pirates of the Caribbean: On Stranger Tides
18                                 Men in Black 3
19      The Hobbit: The Battle of the Five Armies
Name: title, dtype: object
get_recommendations('Avengers: Age of Ultron')
16                    The Avengers
79                      Iron Man 2
68                        Iron Man
26      Captain America: Civil War
227                 Knight and Day
31                      Iron Man 3
1868            Cradle 2 the Grave
344                    Unstoppable
1922                    Gettysburg
531        The Man from U.N.C.L.E.
Name: title, dtype: object
get_recommendations('The Avengers')
7               Avengers: Age of Ultron
3144                            Plastic
1715                            Timecop
4124                 This Thing of Ours
3311              Thank You for Smoking
3033                      The Corruptor
588     Wall Street: Money Never Sleeps
2136         Team America: World Police
1468                       The Fountain
1286                        Snowpiercer
Name: title, dtype: object

Recommendation based on other features (genre, director, keywords, etc.)

Here we try features other than the overview used above.

df2.head(3)
budget genres homepage id keywords original_language original_title overview popularity production_companies ... revenue runtime spoken_languages status tagline title vote_average vote_count cast crew
0 237000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.avatarmovie.com/ 19995 [{"id": 1463, "name": "culture clash"}, {"id":... en Avatar In the 22nd century, a paraplegic Marine is di... 150.437577 [{"name": "Ingenious Film Partners", "id": 289... ... 2787965087 162.0 [{"iso_639_1": "en", "name": "English"}, {"iso... Released Enter the World of Pandora. Avatar 7.2 11800 [{"cast_id": 242, "character": "Jake Sully", "... [{"credit_id": "52fe48009251416c750aca23", "de...
1 300000000 [{"id": 12, "name": "Adventure"}, {"id": 14, "... http://disney.go.com/disneypictures/pirates/ 285 [{"id": 270, "name": "ocean"}, {"id": 726, "na... en Pirates of the Caribbean: At World's End Captain Barbossa, long believed to be dead, ha... 139.082615 [{"name": "Walt Disney Pictures", "id": 2}, {"... ... 961000000 169.0 [{"iso_639_1": "en", "name": "English"}] Released At the end of the world, the adventure begins. Pirates of the Caribbean: At World's End 6.9 4500 [{"cast_id": 4, "character": "Captain Jack Spa... [{"credit_id": "52fe4232c3a36847f800b579", "de...
2 245000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.sonypictures.com/movies/spectre/ 206647 [{"id": 470, "name": "spy"}, {"id": 818, "name... en Spectre A cryptic message from Bond’s past sends him o... 107.376788 [{"name": "Columbia Pictures", "id": 5}, {"nam... ... 880674609 148.0 [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... Released A Plan No One Escapes Spectre 6.3 4466 [{"cast_id": 1, "character": "James Bond", "cr... [{"credit_id": "54805967c3a36829b5002c41", "de...

3 rows × 22 columns

df2.loc[0, 'genres'] # genres of row 0
'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
s1 = [{"id": 28, "name": "Action"}]
s2 = '[{"id": 28, "name": "Action"}]'
type(s1), type(s2)
(list, str)
# How to convert a string like s2 into a list
from ast import literal_eval
s2 = literal_eval(s2)
s2, type(s2)
([{'id': 28, 'name': 'Action'}], list)
print(s1)
print(s2)
[{'id': 28, 'name': 'Action'}]
[{'id': 28, 'name': 'Action'}]
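Once a value is parsed, the useful part is usually the "name" field of each dict. A minimal sketch of that step, where `extract_names` is a hypothetical helper and the input string is a shortened version of the genres value shown earlier:

```python
from ast import literal_eval

raw_genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

def extract_names(raw):
    """Parse a stringified list of dicts and return the 'name' values."""
    parsed = literal_eval(raw)
    return [item["name"] for item in parsed]

print(extract_names(raw_genres))  # ['Action', 'Adventure']
```

`literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for strings read from a file.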
# Loop over the four columns and convert each one from strings to lists
features = ['cast', 'crew', 'keywords', 'genres']
for feature in features:
    df2[feature] = df2[feature].apply(literal_eval)
# The value is now a list
df2.loc[0, 'crew']
[{'credit_id': '52fe48009251416c750aca23',
  'department': 'Editing',
  'gender': 0,
  'id': 1721,
  'job': 'Editor',
  'name': 'Stephen E. Rivkin'},
 {'credit_id': '539c47ecc3a36810e3001f87',
  'department': 'Art',
  'gender': 2,
  'id': 496,
  'job': 'Production Design',
  'name': 'Rick Carter'},
 {'credit_id': '54491c89c3a3680fb4001cf7',
  'department': 'Sound',
  'gender': 0,
  'id': 900,
  'job': 'Sound Designer',
  'name': 'Christopher Boyes'},
 {'credit_id': '54491cb70e0a267480001bd0',
  'department': 'Sound',
  'gender': 0,
  'id': 900,
  'job': 'Supervising Sound Editor',
  'name': 'Christopher Boyes'},
 {'credit_id': '539c4a4cc3a36810c9002101',
  'department': 'Production',
  'gender': 1,
  'id': 1262,
  'job': 'Casting',
  'name': 'Mali Finn'},
 {'credit_id': '5544ee3b925141499f0008fc',
  'department': 'Sound',
  'gender': 2,
  'id': 1729,
  'job': 'Original Music Composer',
  'name': 'James Horner'},
 {'credit_id': '52fe48009251416c750ac9c3',
  'department': 'Directing',
  'gender': 2,
  'id': 2710,
  'job': 'Director',
  'name': 'James Cameron'},
 {'credit_id': '52fe48009251416c750ac9d9',
  'department': 'Writing',
  'gender': 2,
  'id': 2710,
  'job': 'Writer',
  'name': 'James Cameron'},
 {'credit_id': '52fe48009251416c750aca17',
  'department': 'Editing',
  'gender': 2,
  'id': 2710,
  'job': 'Editor',
  'name': 'James Cameron'},
 {'credit_id': '52fe48009251416c750aca29',
  'department': 'Production',
  'gender': 2,
  'id': 2710,
  'job': 'Producer',
  'name': 'James Cameron'},
 {'credit_id': '52fe48009251416c750aca3f',
  'department': 'Writing',
  'gender': 2,
  'id': 2710,
  'job': 'Screenplay',
  'name': 'James Cameron'},
 {'credit_id': '539c4987c3a36810ba0021a4',
  'department': 'Art',
  'gender': 2,
  'id': 7236,
  'job': 'Art Direction',
  'name': 'Andrew Menzies'},
 {'credit_id': '549598c3c3a3686ae9004383',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 6690,
  'job': 'Visual Effects Producer',
  'name': 'Jill Brooks'},
 {'credit_id': '52fe48009251416c750aca4b',
  'department': 'Production',
  'gender': 1,
  'id': 6347,
  'job': 'Casting',
  'name': 'Margery Simkin'},
 {'credit_id': '570b6f419251417da70032fe',
  'department': 'Art',
  'gender': 2,
  'id': 6878,
  'job': 'Supervising Art Director',
  'name': 'Kevin Ishioka'},
 {'credit_id': '5495a0fac3a3686ae9004468',
  'department': 'Sound',
  'gender': 0,
  'id': 6883,
  'job': 'Music Editor',
  'name': 'Dick Bernstein'},
 {'credit_id': '54959706c3a3686af3003e81',
  'department': 'Sound',
  'gender': 0,
  'id': 8159,
  'job': 'Sound Effects Editor',
  'name': 'Shannon Mills'},
 {'credit_id': '54491d58c3a3680fb1001ccb',
  'department': 'Sound',
  'gender': 0,
  'id': 8160,
  'job': 'Foley',
  'name': 'Dennie Thorpe'},
 {'credit_id': '54491d6cc3a3680fa5001b2c',
  'department': 'Sound',
  'gender': 0,
  'id': 8163,
  'job': 'Foley',
  'name': 'Jana Vance'},
 {'credit_id': '52fe48009251416c750aca57',
  'department': 'Costume & Make-Up',
  'gender': 1,
  'id': 8527,
  'job': 'Costume Design',
  'name': 'Deborah Lynn Scott'},
 {'credit_id': '52fe48009251416c750aca2f',
  'department': 'Production',
  'gender': 2,
  'id': 8529,
  'job': 'Producer',
  'name': 'Jon Landau'},
 {'credit_id': '539c4937c3a36810ba002194',
  'department': 'Art',
  'gender': 0,
  'id': 9618,
  'job': 'Art Direction',
  'name': 'Sean Haworth'},
 {'credit_id': '539c49b6c3a36810c10020e6',
  'department': 'Art',
  'gender': 1,
  'id': 12653,
  'job': 'Set Decoration',
  'name': 'Kim Sinclair'},
 {'credit_id': '570b6f2f9251413a0e00020d',
  'department': 'Art',
  'gender': 1,
  'id': 12653,
  'job': 'Supervising Art Director',
  'name': 'Kim Sinclair'},
 {'credit_id': '54491a6c0e0a26748c001b19',
  'department': 'Art',
  'gender': 2,
  'id': 14350,
  'job': 'Set Designer',
  'name': 'Richard F. Mays'},
 {'credit_id': '56928cf4c3a3684cff0025c4',
  'department': 'Production',
  'gender': 1,
  'id': 20294,
  'job': 'Executive Producer',
  'name': 'Laeta Kalogridis'},
 {'credit_id': '52fe48009251416c750aca51',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 17675,
  'job': 'Costume Design',
  'name': 'Mayes C. Rubeo'},
 {'credit_id': '52fe48009251416c750aca11',
  'department': 'Camera',
  'gender': 2,
  'id': 18265,
  'job': 'Director of Photography',
  'name': 'Mauro Fiore'},
 {'credit_id': '5449194d0e0a26748f001b39',
  'department': 'Art',
  'gender': 0,
  'id': 42281,
  'job': 'Set Designer',
  'name': 'Scott Herbertson'},
 {'credit_id': '52fe48009251416c750aca05',
  'department': 'Crew',
  'gender': 0,
  'id': 42288,
  'job': 'Stunts',
  'name': 'Woody Schultz'},
 {'credit_id': '5592aefb92514152de0010f5',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 29067,
  'job': 'Makeup Artist',
  'name': 'Linda DeVetta'},
 {'credit_id': '5592afa492514152de00112c',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 29067,
  'job': 'Hairstylist',
  'name': 'Linda DeVetta'},
 {'credit_id': '54959ed592514130fc002e5d',
  'department': 'Camera',
  'gender': 2,
  'id': 33302,
  'job': 'Camera Operator',
  'name': 'Richard Bluck'},
 {'credit_id': '539c4891c3a36810ba002147',
  'department': 'Art',
  'gender': 2,
  'id': 33303,
  'job': 'Art Direction',
  'name': 'Simon Bright'},
 {'credit_id': '54959c069251417a81001f3a',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 113145,
  'job': 'Visual Effects Supervisor',
  'name': 'Richard Martin'},
 {'credit_id': '54959a0dc3a3680ff5002c8d',
  'department': 'Crew',
  'gender': 2,
  'id': 58188,
  'job': 'Visual Effects Editor',
  'name': 'Steve R. Moore'},
 {'credit_id': '52fe48009251416c750aca1d',
  'department': 'Editing',
  'gender': 2,
  'id': 58871,
  'job': 'Editor',
  'name': 'John Refoua'},
 {'credit_id': '54491a4dc3a3680fc30018ca',
  'department': 'Art',
  'gender': 0,
  'id': 92359,
  'job': 'Set Designer',
  'name': 'Karl J. Martin'},
 {'credit_id': '52fe48009251416c750aca35',
  'department': 'Camera',
  'gender': 1,
  'id': 72201,
  'job': 'Director of Photography',
  'name': 'Chiling Lin'},
 {'credit_id': '52fe48009251416c750ac9ff',
  'department': 'Crew',
  'gender': 0,
  'id': 89714,
  'job': 'Stunts',
  'name': 'Ilram Choi'},
 {'credit_id': '54959c529251416e2b004394',
  'department': 'Visual Effects',
  'gender': 2,
  'id': 93214,
  'job': 'Visual Effects Supervisor',
  'name': 'Steven Quale'},
 {'credit_id': '54491edf0e0a267489001c37',
  'department': 'Crew',
  'gender': 1,
  'id': 122607,
  'job': 'Dialect Coach',
  'name': 'Carla Meyer'},
 {'credit_id': '539c485bc3a368653d001a3a',
  'department': 'Art',
  'gender': 2,
  'id': 132585,
  'job': 'Art Direction',
  'name': 'Nick Bassett'},
 {'credit_id': '539c4903c3a368653d001a74',
  'department': 'Art',
  'gender': 0,
  'id': 132596,
  'job': 'Art Direction',
  'name': 'Jill Cormack'},
 {'credit_id': '539c4967c3a368653d001a94',
  'department': 'Art',
  'gender': 0,
  'id': 132604,
  'job': 'Art Direction',
  'name': 'Andy McLaren'},
 {'credit_id': '52fe48009251416c750aca45',
  'department': 'Crew',
  'gender': 0,
  'id': 236696,
  'job': 'Motion Capture Artist',
  'name': 'Terry Notary'},
 {'credit_id': '54959e02c3a3680fc60027d2',
  'department': 'Crew',
  'gender': 2,
  'id': 956198,
  'job': 'Stunt Coordinator',
  'name': 'Garrett Warren'},
 {'credit_id': '54959ca3c3a3686ae300438c',
  'department': 'Visual Effects',
  'gender': 2,
  'id': 957874,
  'job': 'Visual Effects Supervisor',
  'name': 'Jonathan Rothbart'},
 {'credit_id': '570b6f519251412c74001b2f',
  'department': 'Art',
  'gender': 0,
  'id': 957889,
  'job': 'Supervising Art Director',
  'name': 'Stefan Dechant'},
 {'credit_id': '570b6f62c3a3680b77007460',
  'department': 'Art',
  'gender': 2,
  'id': 959555,
  'job': 'Supervising Art Director',
  'name': 'Todd Cherniawsky'},
 {'credit_id': '539c4a3ac3a36810da0021cc',
  'department': 'Production',
  'gender': 0,
  'id': 1016177,
  'job': 'Casting',
  'name': 'Miranda Rivers'},
 {'credit_id': '539c482cc3a36810c1002062',
  'department': 'Art',
  'gender': 0,
  'id': 1032536,
  'job': 'Production Design',
  'name': 'Robert Stromberg'},
 {'credit_id': '539c4b65c3a36810c9002125',
  'department': 'Costume & Make-Up',
  'gender': 2,
  'id': 1071680,
  'job': 'Costume Design',
  'name': 'John Harding'},
 {'credit_id': '54959e6692514130fc002e4e',
  'department': 'Camera',
  'gender': 0,
  'id': 1177364,
  'job': 'Steadicam Operator',
  'name': 'Roberto De Angelis'},
 {'credit_id': '539c49f1c3a368653d001aac',
  'department': 'Costume & Make-Up',
  'gender': 2,
  'id': 1202850,
  'job': 'Makeup Department Head',
  'name': 'Mike Smithson'},
 {'credit_id': '5495999ec3a3686ae100460c',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1204668,
  'job': 'Visual Effects Producer',
  'name': 'Alain Lalanne'},
 {'credit_id': '54959cdfc3a3681153002729',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1206410,
  'job': 'Visual Effects Supervisor',
  'name': 'Lucas Salton'},
 {'credit_id': '549596239251417a81001eae',
  'department': 'Crew',
  'gender': 0,
  'id': 1234266,
  'job': 'Post Production Supervisor',
  'name': 'Janace Tashjian'},
 {'credit_id': '54959c859251416e1e003efe',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1271932,
  'job': 'Visual Effects Supervisor',
  'name': 'Stephen Rosenbaum'},
 {'credit_id': '5592af28c3a368775a00105f',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 1310064,
  'job': 'Makeup Artist',
  'name': 'Frankie Karena'},
 {'credit_id': '539c4adfc3a36810e300203b',
  'department': 'Costume & Make-Up',
  'gender': 1,
  'id': 1319844,
  'job': 'Costume Supervisor',
  'name': 'Lisa Lovaas'},
 {'credit_id': '54959b579251416e2b004371',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1327028,
  'job': 'Visual Effects Supervisor',
  'name': 'Jonathan Fawkner'},
 {'credit_id': '539c48a7c3a36810b5001fa7',
  'department': 'Art',
  'gender': 0,
  'id': 1330561,
  'job': 'Art Direction',
  'name': 'Robert Bavin'},
 {'credit_id': '539c4a71c3a36810da0021e0',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 1330567,
  'job': 'Costume Supervisor',
  'name': 'Anthony Almaraz'},
 {'credit_id': '539c4a8ac3a36810ba0021e4',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 1330570,
  'job': 'Costume Supervisor',
  'name': 'Carolyn M. Fenton'},
 {'credit_id': '539c4ab6c3a36810da0021f0',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 1330574,
  'job': 'Costume Supervisor',
  'name': 'Beth Koenigsberg'},
 {'credit_id': '54491ab70e0a267480001ba2',
  'department': 'Art',
  'gender': 0,
  'id': 1336191,
  'job': 'Set Designer',
  'name': 'Sam Page'},
 {'credit_id': '544919d9c3a3680fc30018bd',
  'department': 'Art',
  'gender': 0,
  'id': 1339441,
  'job': 'Set Designer',
  'name': 'Tex Kadonaga'},
 {'credit_id': '54491cf50e0a267483001b0c',
  'department': 'Editing',
  'gender': 0,
  'id': 1352422,
  'job': 'Dialogue Editor',
  'name': 'Kim Foscato'},
 {'credit_id': '544919f40e0a26748c001b09',
  'department': 'Art',
  'gender': 0,
  'id': 1352962,
  'job': 'Set Designer',
  'name': 'Tammy S. Lee'},
 {'credit_id': '5495a115c3a3680ff5002d71',
  'department': 'Crew',
  'gender': 0,
  'id': 1357070,
  'job': 'Transportation Coordinator',
  'name': 'Denny Caira'},
 {'credit_id': '5495a12f92514130fc002e94',
  'department': 'Crew',
  'gender': 0,
  'id': 1357071,
  'job': 'Transportation Coordinator',
  'name': 'James Waitkus'},
 {'credit_id': '5495976fc3a36811530026b0',
  'department': 'Sound',
  'gender': 0,
  'id': 1360103,
  'job': 'Supervising Sound Editor',
  'name': 'Addison Teague'},
 {'credit_id': '54491837c3a3680fb1001c5a',
  'department': 'Art',
  'gender': 2,
  'id': 1376887,
  'job': 'Set Designer',
  'name': 'C. Scott Baker'},
 {'credit_id': '54491878c3a3680fb4001c9d',
  'department': 'Art',
  'gender': 0,
  'id': 1376888,
  'job': 'Set Designer',
  'name': 'Luke Caska'},
 {'credit_id': '544918dac3a3680fa5001ae0',
  'department': 'Art',
  'gender': 0,
  'id': 1376889,
  'job': 'Set Designer',
  'name': 'David Chow'},
 {'credit_id': '544919110e0a267486001b68',
  'department': 'Art',
  'gender': 0,
  'id': 1376890,
  'job': 'Set Designer',
  'name': 'Jonathan Dyer'},
 {'credit_id': '54491967c3a3680faa001b5e',
  'department': 'Art',
  'gender': 0,
  'id': 1376891,
  'job': 'Set Designer',
  'name': 'Joseph Hiura'},
 {'credit_id': '54491997c3a3680fb1001c8a',
  'department': 'Art',
  'gender': 0,
  'id': 1376892,
  'job': 'Art Department Coordinator',
  'name': 'Rebecca Jellie'},
 {'credit_id': '544919ba0e0a26748f001b42',
  'department': 'Art',
  'gender': 0,
  'id': 1376893,
  'job': 'Set Designer',
  'name': 'Robert Andrew Johnson'},
 {'credit_id': '54491b1dc3a3680faa001b8c',
  'department': 'Art',
  'gender': 0,
  'id': 1376895,
  'job': 'Assistant Art Director',
  'name': 'Mike Stassi'},
 {'credit_id': '54491b79c3a3680fbb001826',
  'department': 'Art',
  'gender': 0,
  'id': 1376897,
  'job': 'Construction Coordinator',
  'name': 'John Villarino'},
 {'credit_id': '54491baec3a3680fb4001ce6',
  'department': 'Art',
  'gender': 2,
  'id': 1376898,
  'job': 'Assistant Art Director',
  'name': 'Jeffrey Wisniewski'},
 {'credit_id': '54491d2fc3a3680fb4001d07',
  'department': 'Editing',
  'gender': 0,
  'id': 1376899,
  'job': 'Dialogue Editor',
  'name': 'Cheryl Nardi'},
 {'credit_id': '54491d86c3a3680fa5001b2f',
  'department': 'Editing',
  'gender': 0,
  'id': 1376901,
  'job': 'Dialogue Editor',
  'name': 'Marshall Winn'},
 {'credit_id': '54491d9dc3a3680faa001bb0',
  'department': 'Sound',
  'gender': 0,
  'id': 1376902,
  'job': 'Supervising Sound Editor',
  'name': 'Gwendolyn Yates Whittle'},
 {'credit_id': '54491dc10e0a267486001bce',
  'department': 'Sound',
  'gender': 0,
  'id': 1376903,
  'job': 'Sound Re-Recording Mixer',
  'name': 'William Stein'},
 {'credit_id': '54491f500e0a26747c001c07',
  'department': 'Crew',
  'gender': 0,
  'id': 1376909,
  'job': 'Choreographer',
  'name': 'Lula Washington'},
 {'credit_id': '549599239251412c4e002a2e',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1391692,
  'job': 'Visual Effects Producer',
  'name': 'Chris Del Conte'},
 {'credit_id': '54959d54c3a36831b8001d9a',
  'department': 'Visual Effects',
  'gender': 2,
  'id': 1391695,
  'job': 'Visual Effects Supervisor',
  'name': 'R. Christopher White'},
 {'credit_id': '54959bdf9251412c4e002a66',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1394070,
  'job': 'Visual Effects Supervisor',
  'name': 'Dan Lemmon'},
 {'credit_id': '5495971d92514132ed002922',
  'department': 'Sound',
  'gender': 0,
  'id': 1394129,
  'job': 'Sound Effects Editor',
  'name': 'Tim Nielsen'},
 {'credit_id': '5592b25792514152cc0011aa',
  'department': 'Crew',
  'gender': 0,
  'id': 1394286,
  'job': 'CG Supervisor',
  'name': 'Michael Mulholland'},
 {'credit_id': '54959a329251416e2b004355',
  'department': 'Crew',
  'gender': 0,
  'id': 1394750,
  'job': 'Visual Effects Editor',
  'name': 'Thomas Nittmann'},
 {'credit_id': '54959d6dc3a3686ae9004401',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1394755,
  'job': 'Visual Effects Supervisor',
  'name': 'Edson Williams'},
 {'credit_id': '5495a08fc3a3686ae300441c',
  'department': 'Editing',
  'gender': 0,
  'id': 1394953,
  'job': 'Digital Intermediate',
  'name': 'Christine Carr'},
 {'credit_id': '55402d659251413d6d000249',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1395269,
  'job': 'Visual Effects Supervisor',
  'name': 'John Bruno'},
 {'credit_id': '54959e7b9251416e1e003f3e',
  'department': 'Camera',
  'gender': 0,
  'id': 1398970,
  'job': 'Steadicam Operator',
  'name': 'David Emmerichs'},
 {'credit_id': '54959734c3a3686ae10045e0',
  'department': 'Sound',
  'gender': 0,
  'id': 1400906,
  'job': 'Sound Effects Editor',
  'name': 'Christopher Scarabosio'},
 {'credit_id': '549595dd92514130fc002d79',
  'department': 'Production',
  'gender': 0,
  'id': 1401784,
  'job': 'Production Supervisor',
  'name': 'Jennifer Teves'},
 {'credit_id': '549596009251413af70028cc',
  'department': 'Production',
  'gender': 0,
  'id': 1401785,
  'job': 'Production Manager',
  'name': 'Brigitte Yorke'},
 {'credit_id': '549596e892514130fc002d99',
  'department': 'Sound',
  'gender': 0,
  'id': 1401786,
  'job': 'Sound Effects Editor',
  'name': 'Ken Fischer'},
 {'credit_id': '549598229251412c4e002a1c',
  'department': 'Crew',
  'gender': 0,
  'id': 1401787,
  'job': 'Special Effects Coordinator',
  'name': 'Iain Hutton'},
 {'credit_id': '549598349251416e2b00432b',
  'department': 'Crew',
  'gender': 0,
  'id': 1401788,
  'job': 'Special Effects Coordinator',
  'name': 'Steve Ingram'},
 {'credit_id': '54959905c3a3686ae3004324',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401789,
  'job': 'Visual Effects Producer',
  'name': 'Joyce Cox'},
 {'credit_id': '5495994b92514132ed002951',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401790,
  'job': 'Visual Effects Producer',
  'name': 'Jenny Foster'},
 {'credit_id': '549599cbc3a3686ae1004613',
  'department': 'Crew',
  'gender': 0,
  'id': 1401791,
  'job': 'Visual Effects Editor',
  'name': 'Christopher Marino'},
 {'credit_id': '549599f2c3a3686ae100461e',
  'department': 'Crew',
  'gender': 0,
  'id': 1401792,
  'job': 'Visual Effects Editor',
  'name': 'Jim Milton'},
 {'credit_id': '54959a51c3a3686af3003eb5',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401793,
  'job': 'Visual Effects Producer',
  'name': 'Cyndi Ochs'},
 {'credit_id': '54959a7cc3a36811530026f4',
  'department': 'Crew',
  'gender': 0,
  'id': 1401794,
  'job': 'Visual Effects Editor',
  'name': 'Lucas Putnam'},
 {'credit_id': '54959b91c3a3680ff5002cb4',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401795,
  'job': 'Visual Effects Supervisor',
  'name': "Anthony 'Max' Ivins"},
 {'credit_id': '54959bb69251412c4e002a5f',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401796,
  'job': 'Visual Effects Supervisor',
  'name': 'John Knoll'},
 {'credit_id': '54959cbbc3a3686ae3004391',
  'department': 'Visual Effects',
  'gender': 2,
  'id': 1401799,
  'job': 'Visual Effects Supervisor',
  'name': 'Eric Saindon'},
 {'credit_id': '54959d06c3a3686ae90043f6',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401800,
  'job': 'Visual Effects Supervisor',
  'name': 'Wayne Stables'},
 {'credit_id': '54959d259251416e1e003f11',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401801,
  'job': 'Visual Effects Supervisor',
  'name': 'David Stinnett'},
 {'credit_id': '54959db49251413af7002975',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401803,
  'job': 'Visual Effects Supervisor',
  'name': 'Guy Williams'},
 {'credit_id': '54959de4c3a3681153002750',
  'department': 'Crew',
  'gender': 0,
  'id': 1401804,
  'job': 'Stunt Coordinator',
  'name': 'Stuart Thorp'},
 {'credit_id': '54959ef2c3a3680fc60027f2',
  'department': 'Lighting',
  'gender': 0,
  'id': 1401805,
  'job': 'Best Boy Electric',
  'name': 'Giles Coburn'},
 {'credit_id': '54959f07c3a3680fc60027f9',
  'department': 'Camera',
  'gender': 2,
  'id': 1401806,
  'job': 'Still Photographer',
  'name': 'Mark Fellman'},
 {'credit_id': '54959f47c3a3681153002774',
  'department': 'Lighting',
  'gender': 0,
  'id': 1401807,
  'job': 'Lighting Technician',
  'name': 'Scott Sprague'},
 {'credit_id': '54959f8cc3a36831b8001df2',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401808,
  'job': 'Animation Director',
  'name': 'Jeremy Hollobon'},
 {'credit_id': '54959fa0c3a36831b8001dfb',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401809,
  'job': 'Animation Director',
  'name': 'Orlando Meunier'},
 {'credit_id': '54959fb6c3a3686af3003f54',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1401810,
  'job': 'Animation Director',
  'name': 'Taisuke Tanimura'},
 {'credit_id': '54959fd2c3a36831b8001e02',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 1401812,
  'job': 'Set Costumer',
  'name': 'Lilia Mishel Acevedo'},
 {'credit_id': '54959ff9c3a3686ae300440c',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 1401814,
  'job': 'Set Costumer',
  'name': 'Alejandro M. Hernandez'},
 {'credit_id': '5495a0ddc3a3686ae10046fe',
  'department': 'Editing',
  'gender': 0,
  'id': 1401815,
  'job': 'Digital Intermediate',
  'name': 'Marvin Hall'},
 {'credit_id': '5495a1f7c3a3686ae3004443',
  'department': 'Production',
  'gender': 0,
  'id': 1401816,
  'job': 'Publicist',
  'name': 'Judy Alley'},
 {'credit_id': '5592b29fc3a36869d100002f',
  'department': 'Crew',
  'gender': 0,
  'id': 1418381,
  'job': 'CG Supervisor',
  'name': 'Mike Perry'},
 {'credit_id': '5592b23a9251415df8001081',
  'department': 'Crew',
  'gender': 0,
  'id': 1426854,
  'job': 'CG Supervisor',
  'name': 'Andrew Morley'},
 {'credit_id': '55491e1192514104c40002d8',
  'department': 'Art',
  'gender': 0,
  'id': 1438901,
  'job': 'Conceptual Design',
  'name': 'Seth Engstrom'},
 {'credit_id': '5525d5809251417276002b06',
  'department': 'Crew',
  'gender': 0,
  'id': 1447362,
  'job': 'Visual Effects Art Director',
  'name': 'Eric Oliver'},
 {'credit_id': '554427ca925141586500312a',
  'department': 'Visual Effects',
  'gender': 0,
  'id': 1447503,
  'job': 'Modeling',
  'name': 'Matsune Suzuki'},
 {'credit_id': '551906889251415aab001c88',
  'department': 'Art',
  'gender': 0,
  'id': 1447524,
  'job': 'Art Department Manager',
  'name': 'Paul Tobin'},
 {'credit_id': '5592af8492514152cc0010de',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 1452643,
  'job': 'Hairstylist',
  'name': 'Roxane Griffin'},
 {'credit_id': '553d3c109251415852001318',
  'department': 'Lighting',
  'gender': 0,
  'id': 1453938,
  'job': 'Lighting Artist',
  'name': 'Arun Ram-Mohan'},
 {'credit_id': '5592af4692514152d5001355',
  'department': 'Costume & Make-Up',
  'gender': 0,
  'id': 1457305,
  'job': 'Makeup Artist',
  'name': 'Georgia Lockhart-Adams'},
 {'credit_id': '5592b2eac3a36877470012a5',
  'department': 'Crew',
  'gender': 0,
  'id': 1466035,
  'job': 'CG Supervisor',
  'name': 'Thrain Shadbolt'},
 {'credit_id': '5592b032c3a36877450015f1',
  'department': 'Crew',
  'gender': 0,
  'id': 1483220,
  'job': 'CG Supervisor',
  'name': 'Brad Alexander'},
 {'credit_id': '5592b05592514152d80012f6',
  'department': 'Crew',
  'gender': 0,
  'id': 1483221,
  'job': 'CG Supervisor',
  'name': 'Shadi Almassizadeh'},
 {'credit_id': '5592b090c3a36877570010b5',
  'department': 'Crew',
  'gender': 0,
  'id': 1483222,
  'job': 'CG Supervisor',
  'name': 'Simon Clutterbuck'},
 {'credit_id': '5592b0dbc3a368774b00112c',
  'department': 'Crew',
  'gender': 0,
  'id': 1483223,
  'job': 'CG Supervisor',
  'name': 'Graeme Demmocks'},
 {'credit_id': '5592b0fe92514152db0010c1',
  'department': 'Crew',
  'gender': 0,
  'id': 1483224,
  'job': 'CG Supervisor',
  'name': 'Adrian Fernandes'},
 {'credit_id': '5592b11f9251415df8001059',
  'department': 'Crew',
  'gender': 0,
  'id': 1483225,
  'job': 'CG Supervisor',
  'name': 'Mitch Gates'},
 {'credit_id': '5592b15dc3a3687745001645',
  'department': 'Crew',
  'gender': 0,
  'id': 1483226,
  'job': 'CG Supervisor',
  'name': 'Jerry Kung'},
 {'credit_id': '5592b18e925141645a0004ae',
  'department': 'Crew',
  'gender': 0,
  'id': 1483227,
  'job': 'CG Supervisor',
  'name': 'Andy Lomas'},
 {'credit_id': '5592b1bfc3a368775d0010e7',
  'department': 'Crew',
  'gender': 0,
  'id': 1483228,
  'job': 'CG Supervisor',
  'name': 'Sebastian Marino'},
 {'credit_id': '5592b2049251415df8001078',
  'department': 'Crew',
  'gender': 0,
  'id': 1483229,
  'job': 'CG Supervisor',
  'name': 'Matthias Menz'},
 {'credit_id': '5592b27b92514152d800136a',
  'department': 'Crew',
  'gender': 0,
  'id': 1483230,
  'job': 'CG Supervisor',
  'name': 'Sergei Nevshupov'},
 {'credit_id': '5592b2c3c3a36869e800003c',
  'department': 'Crew',
  'gender': 0,
  'id': 1483231,
  'job': 'CG Supervisor',
  'name': 'Philippe Rebours'},
 {'credit_id': '5592b317c3a36877470012af',
  'department': 'Crew',
  'gender': 0,
  'id': 1483232,
  'job': 'CG Supervisor',
  'name': 'Michael Takarangi'},
 {'credit_id': '5592b345c3a36877470012bb',
  'department': 'Crew',
  'gender': 0,
  'id': 1483233,
  'job': 'CG Supervisor',
  'name': 'David Weitzberg'},
 {'credit_id': '5592b37cc3a368775100113b',
  'department': 'Crew',
  'gender': 0,
  'id': 1483234,
  'job': 'CG Supervisor',
  'name': 'Ben White'},
 {'credit_id': '573c8e2f9251413f5d000094',
  'department': 'Crew',
  'gender': 1,
  'id': 1621932,
  'job': 'Stunts',
  'name': 'Min Windle'}]
# Extract the director's name from each movie's crew list
def get_director(x):
    for i in x:
        if i['job'] == 'Director':
            return i['name']
    return np.nan
df2['director'] = df2['crew'].apply(get_director)
df2['director']
0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4          Andrew Stanton
              ...        
4798     Robert Rodriguez
4799         Edward Burns
4800          Scott Smith
4801          Daniel Hsia
4802     Brian Herzlinger
Name: director, Length: 4803, dtype: object
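
The lookup above can be tried on a small hand-made frame. This is only a sketch: the `toy` DataFrame and its trimmed crew entries (just `job` and `name` keys) are made up for illustration, and the `isinstance` guard for non-list values is an added assumption, not part of the original cell.

```python
import numpy as np
import pandas as pd

def get_director(crew):
    """Return the name of the first crew member whose job is 'Director', else NaN."""
    # Added guard (assumption): skip values that are not parsed lists, e.g. NaN or raw strings.
    if not isinstance(crew, list):
        return np.nan
    for member in crew:
        if member.get('job') == 'Director':
            return member['name']
    return np.nan

# Hypothetical toy data shaped like the parsed 'crew' column (most keys trimmed).
toy = pd.DataFrame({
    'title': ['Avatar', 'No crew', 'Crew without director'],
    'crew': [
        [{'job': 'Producer', 'name': 'Jon Landau'},
         {'job': 'Director', 'name': 'James Cameron'}],
        [],
        [{'job': 'Editor', 'name': 'John Refoua'}],
    ],
})
toy['director'] = toy['crew'].apply(get_director)
print(toy[['title', 'director']])
```

Rows with an empty crew list and rows whose crew has no 'Director' entry both end up as NaN, which is why a null check is the next step.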
# Check which rows have no director (missing directors were stored as np.nan)
df2[df2['director'].isnull()]
budget genres homepage id keywords original_language original_title overview popularity production_companies ... runtime spoken_languages status tagline title vote_average vote_count cast crew director
3661 0 [{'id': 18, 'name': 'Drama'}] NaN 19615 [] en Flying By A real estate developer goes to his 25th high ... 1.546169 [] ... 95.0 [{"iso_639_1": "en", "name": "English"}] Released It's about the music Flying By 7.0 2 [{'cast_id': 1, 'character': 'George', 'credit... [] NaN
3670 0 [{'id': 10751, 'name': 'Family'}] NaN 447027 [] en Running Forever After being estranged since her mother's death... 0.028756 [{"name": "New Kingdom Pictures", "id": 41671}] ... 88.0 [] Released NaN Running Forever 0.0 0 [] [] NaN
3729 3250000 [{'id': 18, 'name': 'Drama'}, {'id': 10751, 'n... http://www.paathefilm.com/ 26379 [] en Paa He suffers from a progeria like syndrome. Ment... 2.126139 [{"name": "A B Corp", "id": 4502}] ... 133.0 [{"iso_639_1": "hi", "name": "\u0939\u093f\u09... Released NaN Paa 6.6 19 [{'cast_id': 1, 'character': 'Auro', 'credit_i... [{'credit_id': '52fe44fec3a368484e042a29', 'de... NaN
3977 0 [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam... NaN 55831 [{'id': 10183, 'name': 'independent film'}] en Boynton Beach Club A handful of men and women of a certain age pi... 0.188870 [] ... 105.0 [{"iso_639_1": "en", "name": "English"}] Released NaN Boynton Beach Club 6.8 3 [{'cast_id': 1, 'character': 'Marilyn', 'credi... [] NaN
4068 0 [] NaN 371085 [] en Sharkskin The Post War II story of Manhattan born Mike E... 0.027801 [] ... 0.0 [] Released NaN Sharkskin 0.0 0 [] [] NaN
4105 2000000 [] NaN 48382 [] en The Book of Mormon Movie, Volume 1: The Journey The story of Lehi and his wife Sariah and thei... 0.031947 [] ... 120.0 [] Released 2600 years ago, one family began a remarkable ... The Book of Mormon Movie, Volume 1: The Journey 5.0 2 [{'cast_id': 1, 'character': 'Sam', 'credit_id... [] NaN
4118 0 [] NaN 325140 [] en Hum To Mohabbat Karega Raju, a waiter, is in love with the famous TV ... 0.001186 [] ... 0.0 [] Released NaN Hum To Mohabbat Karega 0.0 0 [] [] NaN
4123 7000000 [{'id': 16, 'name': 'Animation'}, {'id': 10751... http://www.roadsideromeo.com/ 20653 [] en Roadside Romeo This is the story of Romeo. A dude who was liv... 0.253595 [{"name": "Walt Disney Pictures", "id": 2}, {"... ... 93.0 [{"iso_639_1": "en", "name": "English"}, {"iso... Released NaN Roadside Romeo 6.7 3 [{'cast_id': 1, 'character': 'Romeo', 'credit_... [] NaN
4247 1 [{'id': 10749, 'name': 'Romance'}, {'id': 35, ... NaN 361505 [] en Me You and Five Bucks A womanizing yet lovable loser, Charlie, a wai... 0.094105 [] ... 90.0 [] Released A story about second, second chances Me You and Five Bucks 10.0 2 [] [] NaN
4305 0 [{'id': 35, 'name': 'Comedy'}, {'id': 10402, '... NaN 114065 [] en Down & Out With The Dolls The raunchy, spunky tale of the rise and fall ... 0.002386 [] ... 88.0 [] Released Ain't Rock 'N' Roll a bitch. Down & Out With The Dolls 0.0 0 [] [] NaN
4314 1200000 [] NaN 137955 [] en Crowsnest In late summer of 2011, five young friends on ... 0.057564 [] ... 84.0 [] Released NaN Crowsnest 4.8 12 [] [] NaN
4322 0 [{'id': 99, 'name': 'Documentary'}] NaN 102840 [] en Sex With Strangers For some married couples, sex is an obsession ... 0.014406 [] ... 0.0 [] Released NaN Sex With Strangers 5.0 1 [] [] NaN
4374 0 [{'id': 35, 'name': 'Comedy'}] NaN 47686 [{'id': 10183, 'name': 'independent film'}] en Dream with the Fishes Terry is a suicidal voyeur who treats a dying ... 0.948316 [] ... 97.0 [{"iso_639_1": "en", "name": "English"}] Released An oddball odyssey about voyeurism, LSD and nu... Dream with the Fishes 7.7 10 [{'cast_id': 1, 'character': 'Terry', 'credit_... [{'credit_id': '555e51909251417e5f000b42', 'de... NaN
4401 0 [{'id': 28, 'name': 'Action'}, {'id': 35, 'nam... NaN 43630 [] en The Helix... Loaded 0.020600 [] ... 97.0 [{"iso_639_1": "en", "name": "English"}] Rumored NaN The Helix... Loaded 4.8 2 [] [] NaN
4405 0 [{'id': 10751, 'name': 'Family'}, {'id': 35, '... https://www.epicbuzz.net/movies/karachi-se-lahore 357441 [] en Karachi se Lahore A road trip from Karachi to Lahore where 5 fri... 0.060003 [] ... 0.0 [{"iso_639_1": "ur", "name": "\u0627\u0631\u06... Released NaN Karachi se Lahore 8.0 1 [{'cast_id': 0, 'character': '', 'credit_id': ... [] NaN
4458 0 [] NaN 279759 [] en Harrison Montgomery Film from Daniel Davila 0.006943 [] ... 0.0 [] Released NaN Harrison Montgomery 0.0 0 [] [] NaN
4504 0 [] NaN 331493 [] en Light from the Darkroom Light in the Darkroom is the story of two best... 0.012942 [] ... 0.0 [] Released NaN Light from the Darkroom 0.0 0 [] [] NaN
4553 0 [] NaN 380097 [] en America Is Still the Place 1971 post civil rights San Francisco seemed li... 0.000000 [] ... 0.0 [] Released NaN America Is Still the Place 0.0 0 [] [] NaN
4562 500000 [] NaN 297100 [] en The Little Ponderosa Zoo The Little Ponderosa Zoo is preparing for thei... 0.073079 [] ... 84.0 [{"iso_639_1": "en", "name": "English"}] Released NaN The Little Ponderosa Zoo 2.0 1 [] [] NaN
4566 0 [] NaN 325579 [] en Diamond Ruff Action - Orphan, con artist, crime boss and mi... 0.165257 [] ... 0.0 [] Released NaN Diamond Ruff 2.4 4 [] [] NaN
4571 0 [] NaN 328307 [] en Rise of the Entrepreneur: The Search for a Bet... The world is changing faster than ever. Techno... 0.052942 [] ... 0.0 [] Released NaN Rise of the Entrepreneur: The Search for a Bet... 8.0 1 [] [] NaN
4583 0 [{'id': 99, 'name': 'Documentary'}] http://www.iwantyourmoney.net/ 47546 [] en I Want Your Money Two versions of the American dream now stand i... 0.084344 [] ... 92.0 [] Released The film contrasts two views of role that the ... I Want Your Money 3.8 5 [] [] NaN
4589 0 [{'id': 18, 'name': 'Drama'}, {'id': 9648, 'na... NaN 43743 [{'id': 10183, 'name': 'independent film'}] en Fabled Joseph just broke up with his girlfriend and i... 0.003352 [] ... 84.0 [] Released There once was a wolf named Lupold... Fabled 0.0 0 [] [] NaN
4633 0 [] NaN 300327 [] en Death Calls An action-packed love story on the Mexican bor... 0.005883 [] ... 0.0 [] Released NaN Death Calls 0.0 0 [] [] NaN
4638 300000 [{'id': 18, 'name': 'Drama'}, {'id': 28, 'name... NaN 378237 [] en Amidst the Devil's Wings Prequel to "5th of a Degree." 0.018087 [{"name": "Daniel Columbie Films & Productions... ... 90.0 [{"iso_639_1": "en", "name": "English"}] Released Prequel to "5th of a Degree." Amidst the Devil's Wings 0.0 0 [] [] NaN
4644 0 [{'id': 27, 'name': 'Horror'}] NaN 325123 [] en Teeth and Blood A beautiful diva is murdered on the set of hor... 0.055325 [] ... 96.0 [{"iso_639_1": "en", "name": "English"}] Released NaN Teeth and Blood 3.0 1 [{'cast_id': 0, 'character': 'Vincent Augustin... [] NaN
4657 0 [] NaN 320435 [] en UnDivided UnDivided documents the true story of how a su... 0.010607 [] ... 0.0 [] Released NaN UnDivided 0.0 0 [] [] NaN
4662 0 [{'id': 35, 'name': 'Comedy'}] NaN 40963 [{'id': 10183, 'name': 'independent film'}] en Little Big Top An aging out of work clown returns to his smal... 0.092100 [{"name": "Fly High Films", "id": 24248}] ... 0.0 [{"iso_639_1": "en", "name": "English"}] Rumored NaN Little Big Top 10.0 1 [{'cast_id': 0, 'character': 'Seymour', 'credi... [] NaN
4674 0 [] NaN 194588 [] en Short Cut to Nirvana: Kumbh Mela Every 12 years over 70 million pilgrims gather... 0.004998 [] ... 85.0 [] Released NaN Short Cut to Nirvana: Kumbh Mela 0.0 0 [] [] NaN
4716 0 [] NaN 38786 [] en The Blood of My Brother: A Story of Death in Iraq THE BLOOD OF MY BROTHER goes behind the scenes... 0.005256 [] ... 90.0 [] Released NaN The Blood of My Brother: A Story of Death in Iraq 0.0 0 [] [] NaN

30 rows × 23 columns
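
In the table above, most of the 30 rows have an empty `crew` list, while a few (for example Paa and Dream with the Fishes) have crew entries but none with the job 'Director'. A minimal sketch of how the two cases could be counted separately, using a hypothetical `toy` frame in place of `df2`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2 with already-parsed 'crew' lists.
toy = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'crew': [
        [{'job': 'Director', 'name': 'Someone'}],
        [],
        [{'job': 'Producer', 'name': 'Someone Else'}],
        [],
    ],
    'director': ['Someone', np.nan, np.nan, np.nan],
})

missing = toy['director'].isnull()          # rows without a director
empty_crew = toy['crew'].apply(len) == 0    # rows whose crew list is empty

n_missing = int(missing.sum())
n_empty = int((missing & empty_crew).sum())
n_no_director_job = int((missing & ~empty_crew).sum())
print(n_missing, n_empty, n_no_director_job)  # 3 2 1
```

On the real data, the same masks applied to `df2` would show how many of the 30 missing directors come from empty crew lists and how many from crew lists that simply lack a 'Director' entry.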

df2.loc[0, 'cast']
[{'cast_id': 242,
  'character': 'Jake Sully',
  'credit_id': '5602a8a7c3a3685532001c9a',
  'gender': 2,
  'id': 65731,
  'name': 'Sam Worthington',
  'order': 0},
 {'cast_id': 3,
  'character': 'Neytiri',
  'credit_id': '52fe48009251416c750ac9cb',
  'gender': 1,
  'id': 8691,
  'name': 'Zoe Saldana',
  'order': 1},
 {'cast_id': 25,
  'character': 'Dr. Grace Augustine',
  'credit_id': '52fe48009251416c750aca39',
  'gender': 1,
  'id': 10205,
  'name': 'Sigourney Weaver',
  'order': 2},
 {'cast_id': 4,
  'character': 'Col. Quaritch',
  'credit_id': '52fe48009251416c750ac9cf',
  'gender': 2,
  'id': 32747,
  'name': 'Stephen Lang',
  'order': 3},
 {'cast_id': 5,
  'character': 'Trudy Chacon',
  'credit_id': '52fe48009251416c750ac9d3',
  'gender': 1,
  'id': 17647,
  'name': 'Michelle Rodriguez',
  'order': 4},
 {'cast_id': 8,
  'character': 'Selfridge',
  'credit_id': '52fe48009251416c750ac9e1',
  'gender': 2,
  'id': 1771,
  'name': 'Giovanni Ribisi',
  'order': 5},
 {'cast_id': 7,
  'character': 'Norm Spellman',
  'credit_id': '52fe48009251416c750ac9dd',
  'gender': 2,
  'id': 59231,
  'name': 'Joel David Moore',
  'order': 6},
 {'cast_id': 9,
  'character': 'Moat',
  'credit_id': '52fe48009251416c750ac9e5',
  'gender': 1,
  'id': 30485,
  'name': 'CCH Pounder',
  'order': 7},
 {'cast_id': 11,
  'character': 'Eytukan',
  'credit_id': '52fe48009251416c750ac9ed',
  'gender': 2,
  'id': 15853,
  'name': 'Wes Studi',
  'order': 8},
 {'cast_id': 10,
  'character': "Tsu'Tey",
  'credit_id': '52fe48009251416c750ac9e9',
  'gender': 2,
  'id': 10964,
  'name': 'Laz Alonso',
  'order': 9},
 {'cast_id': 12,
  'character': 'Dr. Max Patel',
  'credit_id': '52fe48009251416c750ac9f1',
  'gender': 2,
  'id': 95697,
  'name': 'Dileep Rao',
  'order': 10},
 {'cast_id': 13,
  'character': 'Lyle Wainfleet',
  'credit_id': '52fe48009251416c750ac9f5',
  'gender': 2,
  'id': 98215,
  'name': 'Matt Gerald',
  'order': 11},
 {'cast_id': 32,
  'character': 'Private Fike',
  'credit_id': '52fe48009251416c750aca5b',
  'gender': 2,
  'id': 154153,
  'name': 'Sean Anthony Moran',
  'order': 12},
 {'cast_id': 33,
  'character': 'Cryo Vault Med Tech',
  'credit_id': '52fe48009251416c750aca5f',
  'gender': 2,
  'id': 397312,
  'name': 'Jason Whyte',
  'order': 13},
 {'cast_id': 34,
  'character': 'Venture Star Crew Chief',
  'credit_id': '52fe48009251416c750aca63',
  'gender': 2,
  'id': 42317,
  'name': 'Scott Lawrence',
  'order': 14},
 {'cast_id': 35,
  'character': 'Lock Up Trooper',
  'credit_id': '52fe48009251416c750aca67',
  'gender': 2,
  'id': 986734,
  'name': 'Kelly Kilgour',
  'order': 15},
 {'cast_id': 36,
  'character': 'Shuttle Pilot',
  'credit_id': '52fe48009251416c750aca6b',
  'gender': 0,
  'id': 1207227,
  'name': 'James Patrick Pitt',
  'order': 16},
 {'cast_id': 37,
  'character': 'Shuttle Co-Pilot',
  'credit_id': '52fe48009251416c750aca6f',
  'gender': 0,
  'id': 1180936,
  'name': 'Sean Patrick Murphy',
  'order': 17},
 {'cast_id': 38,
  'character': 'Shuttle Crew Chief',
  'credit_id': '52fe48009251416c750aca73',
  'gender': 2,
  'id': 1019578,
  'name': 'Peter Dillon',
  'order': 18},
 {'cast_id': 39,
  'character': 'Tractor Operator / Troupe',
  'credit_id': '52fe48009251416c750aca77',
  'gender': 0,
  'id': 91443,
  'name': 'Kevin Dorman',
  'order': 19},
 {'cast_id': 40,
  'character': 'Dragon Gunship Pilot',
  'credit_id': '52fe48009251416c750aca7b',
  'gender': 2,
  'id': 173391,
  'name': 'Kelson Henderson',
  'order': 20},
 {'cast_id': 41,
  'character': 'Dragon Gunship Gunner',
  'credit_id': '52fe48009251416c750aca7f',
  'gender': 0,
  'id': 1207236,
  'name': 'David Van Horn',
  'order': 21},
 {'cast_id': 42,
  'character': 'Dragon Gunship Navigator',
  'credit_id': '52fe48009251416c750aca83',
  'gender': 0,
  'id': 215913,
  'name': 'Jacob Tomuri',
  'order': 22},
 {'cast_id': 43,
  'character': 'Suit #1',
  'credit_id': '52fe48009251416c750aca87',
  'gender': 0,
  'id': 143206,
  'name': 'Michael Blain-Rozgay',
  'order': 23},
 {'cast_id': 44,
  'character': 'Suit #2',
  'credit_id': '52fe48009251416c750aca8b',
  'gender': 2,
  'id': 169676,
  'name': 'Jon Curry',
  'order': 24},
 {'cast_id': 46,
  'character': 'Ambient Room Tech',
  'credit_id': '52fe48009251416c750aca8f',
  'gender': 0,
  'id': 1048610,
  'name': 'Luke Hawker',
  'order': 25},
 {'cast_id': 47,
  'character': 'Ambient Room Tech / Troupe',
  'credit_id': '52fe48009251416c750aca93',
  'gender': 0,
  'id': 42288,
  'name': 'Woody Schultz',
  'order': 26},
 {'cast_id': 48,
  'character': 'Horse Clan Leader',
  'credit_id': '52fe48009251416c750aca97',
  'gender': 2,
  'id': 68278,
  'name': 'Peter Mensah',
  'order': 27},
 {'cast_id': 49,
  'character': 'Link Room Tech',
  'credit_id': '52fe48009251416c750aca9b',
  'gender': 0,
  'id': 1207247,
  'name': 'Sonia Yee',
  'order': 28},
 {'cast_id': 50,
  'character': 'Basketball Avatar / Troupe',
  'credit_id': '52fe48009251416c750aca9f',
  'gender': 1,
  'id': 1207248,
  'name': 'Jahnel Curfman',
  'order': 29},
 {'cast_id': 51,
  'character': 'Basketball Avatar',
  'credit_id': '52fe48009251416c750acaa3',
  'gender': 0,
  'id': 89714,
  'name': 'Ilram Choi',
  'order': 30},
 {'cast_id': 52,
  'character': "Na'vi Child",
  'credit_id': '52fe48009251416c750acaa7',
  'gender': 0,
  'id': 1207249,
  'name': 'Kyla Warren',
  'order': 31},
 {'cast_id': 53,
  'character': 'Troupe',
  'credit_id': '52fe48009251416c750acaab',
  'gender': 0,
  'id': 1207250,
  'name': 'Lisa Roumain',
  'order': 32},
 {'cast_id': 54,
  'character': 'Troupe',
  'credit_id': '52fe48009251416c750acaaf',
  'gender': 1,
  'id': 83105,
  'name': 'Debra Wilson',
  'order': 33},
 {'cast_id': 57,
  'character': 'Troupe',
  'credit_id': '52fe48009251416c750acabb',
  'gender': 0,
  'id': 1207253,
  'name': 'Chris Mala',
  'order': 34},
 {'cast_id': 55,
  'character': 'Troupe',
  'credit_id': '52fe48009251416c750acab3',
  'gender': 0,
  'id': 1207251,
  'name': 'Taylor Kibby',
  'order': 35},
 {'cast_id': 56,
  'character': 'Troupe',
  'credit_id': '52fe48009251416c750acab7',
  'gender': 0,
  'id': 1207252,
  'name': 'Jodie Landau',
  'order': 36},
 {'cast_id': 58,
  'character': 'Troupe',
  'credit_id': '52fe48009251416c750acabf',
  'gender': 0,
  'id': 1207254,
  'name': 'Julie Lamm',
  'order': 37},
 {'cast_id': 59,
  'character': 'Troupe',
  'credit_id': '52fe48009251416c750acac3',
  'gender': 0,
  'id': 1207257,
  'name': 'Cullen B. Madden',
  'order': 38},
 {'cast_id': 60,
  'character': 'Troupe',
  'credit_id': '52fe48009251416c750acac7',
  'gender': 0,
  'id': 1207259,
  'name': 'Joseph Brady Madden',
  'order': 39},
 {'cast_id': 61,
  'character': 'Troupe',
  'credit_id': '52fe48009251416c750acacb',
  'gender': 0,
  'id': 1207262,
  'name': 'Frankie Torres',
  'order': 40},
 {'cast_id': 62,
  'character': 'Troupe',
  'credit_id': '52fe48009251416c750acacf',
  'gender': 1,
  'id': 1158600,
  'name': 'Austin Wilson',
  'order': 41},
 {'cast_id': 63,
  'character': 'Troupe',
  'credit_id': '52fe48019251416c750acad3',
  'gender': 1,
  'id': 983705,
  'name': 'Sara Wilson',
  'order': 42},
 {'cast_id': 64,
  'character': 'Troupe',
  'credit_id': '52fe48019251416c750acad7',
  'gender': 0,
  'id': 1207263,
  'name': 'Tamica Washington-Miller',
  'order': 43},
 {'cast_id': 65,
  'character': 'Op Center Staff',
  'credit_id': '52fe48019251416c750acadb',
  'gender': 1,
  'id': 1145098,
  'name': 'Lucy Briant',
  'order': 44},
 {'cast_id': 66,
  'character': 'Op Center Staff',
  'credit_id': '52fe48019251416c750acadf',
  'gender': 2,
  'id': 33305,
  'name': 'Nathan Meister',
  'order': 45},
 {'cast_id': 67,
  'character': 'Op Center Staff',
  'credit_id': '52fe48019251416c750acae3',
  'gender': 0,
  'id': 1207264,
  'name': 'Gerry Blair',
  'order': 46},
 {'cast_id': 68,
  'character': 'Op Center Staff',
  'credit_id': '52fe48019251416c750acae7',
  'gender': 2,
  'id': 33311,
  'name': 'Matthew Chamberlain',
  'order': 47},
 {'cast_id': 69,
  'character': 'Op Center Staff',
  'credit_id': '52fe48019251416c750acaeb',
  'gender': 0,
  'id': 1207265,
  'name': 'Paul Yates',
  'order': 48},
 {'cast_id': 70,
  'character': 'Op Center Duty Officer',
  'credit_id': '52fe48019251416c750acaef',
  'gender': 0,
  'id': 1207266,
  'name': 'Wray Wilson',
  'order': 49},
 {'cast_id': 71,
  'character': 'Op Center Staff',
  'credit_id': '52fe48019251416c750acaf3',
  'gender': 2,
  'id': 54492,
  'name': 'James Gaylyn',
  'order': 50},
 {'cast_id': 72,
  'character': 'Dancer',
  'credit_id': '52fe48019251416c750acaf7',
  'gender': 0,
  'id': 1207267,
  'name': 'Melvin Leno Clark III',
  'order': 51},
 {'cast_id': 73,
  'character': 'Dancer',
  'credit_id': '52fe48019251416c750acafb',
  'gender': 0,
  'id': 1207268,
  'name': 'Carvon Futrell',
  'order': 52},
 {'cast_id': 74,
  'character': 'Dancer',
  'credit_id': '52fe48019251416c750acaff',
  'gender': 0,
  'id': 1207269,
  'name': 'Brandon Jelkes',
  'order': 53},
 {'cast_id': 75,
  'character': 'Dancer',
  'credit_id': '52fe48019251416c750acb03',
  'gender': 0,
  'id': 1207270,
  'name': 'Micah Moch',
  'order': 54},
 {'cast_id': 76,
  'character': 'Dancer',
  'credit_id': '52fe48019251416c750acb07',
  'gender': 0,
  'id': 1207271,
  'name': 'Hanniyah Muhammad',
  'order': 55},
 {'cast_id': 77,
  'character': 'Dancer',
  'credit_id': '52fe48019251416c750acb0b',
  'gender': 0,
  'id': 1207272,
  'name': 'Christopher Nolen',
  'order': 56},
 {'cast_id': 78,
  'character': 'Dancer',
  'credit_id': '52fe48019251416c750acb0f',
  'gender': 0,
  'id': 1207273,
  'name': 'Christa Oliver',
  'order': 57},
 {'cast_id': 79,
  'character': 'Dancer',
  'credit_id': '52fe48019251416c750acb13',
  'gender': 0,
  'id': 1207274,
  'name': 'April Marie Thomas',
  'order': 58},
 {'cast_id': 80,
  'character': 'Dancer',
  'credit_id': '52fe48019251416c750acb17',
  'gender': 0,
  'id': 1207275,
  'name': 'Bravita A. Threatt',
  'order': 59},
 {'cast_id': 81,
  'character': 'Mining Chief (uncredited)',
  'credit_id': '52fe48019251416c750acb1b',
  'gender': 0,
  'id': 1207276,
  'name': 'Colin Bleasdale',
  'order': 60},
 {'cast_id': 82,
  'character': 'Veteran Miner (uncredited)',
  'credit_id': '52fe48019251416c750acb1f',
  'gender': 0,
  'id': 107969,
  'name': 'Mike Bodnar',
  'order': 61},
 {'cast_id': 83,
  'character': 'Richard (uncredited)',
  'credit_id': '52fe48019251416c750acb23',
  'gender': 0,
  'id': 1207278,
  'name': 'Matt Clayton',
  'order': 62},
 {'cast_id': 84,
  'character': "Nav'i (uncredited)",
  'credit_id': '52fe48019251416c750acb27',
  'gender': 1,
  'id': 147898,
  'name': 'Nicole Dionne',
  'order': 63},
 {'cast_id': 85,
  'character': 'Trooper (uncredited)',
  'credit_id': '52fe48019251416c750acb2b',
  'gender': 0,
  'id': 1207280,
  'name': 'Jamie Harrison',
  'order': 64},
 {'cast_id': 86,
  'character': 'Trooper (uncredited)',
  'credit_id': '52fe48019251416c750acb2f',
  'gender': 0,
  'id': 1207281,
  'name': 'Allan Henry',
  'order': 65},
 {'cast_id': 87,
  'character': 'Ground Technician (uncredited)',
  'credit_id': '52fe48019251416c750acb33',
  'gender': 2,
  'id': 1207282,
  'name': 'Anthony Ingruber',
  'order': 66},
 {'cast_id': 88,
  'character': 'Flight Crew Mechanic (uncredited)',
  'credit_id': '52fe48019251416c750acb37',
  'gender': 0,
  'id': 1207283,
  'name': 'Ashley Jeffery',
  'order': 67},
 {'cast_id': 14,
  'character': 'Samson Pilot',
  'credit_id': '52fe48009251416c750ac9f9',
  'gender': 0,
  'id': 98216,
  'name': 'Dean Knowsley',
  'order': 68},
 {'cast_id': 89,
  'character': 'Trooper (uncredited)',
  'credit_id': '52fe48019251416c750acb3b',
  'gender': 0,
  'id': 1201399,
  'name': 'Joseph Mika-Hunt',
  'order': 69},
 {'cast_id': 90,
  'character': 'Banshee (uncredited)',
  'credit_id': '52fe48019251416c750acb3f',
  'gender': 0,
  'id': 236696,
  'name': 'Terry Notary',
  'order': 70},
 {'cast_id': 91,
  'character': 'Soldier (uncredited)',
  'credit_id': '52fe48019251416c750acb43',
  'gender': 0,
  'id': 1207287,
  'name': 'Kai Pantano',
  'order': 71},
 {'cast_id': 92,
  'character': 'Blast Technician (uncredited)',
  'credit_id': '52fe48019251416c750acb47',
  'gender': 0,
  'id': 1207288,
  'name': 'Logan Pithyou',
  'order': 72},
 {'cast_id': 93,
  'character': 'Vindum Raah (uncredited)',
  'credit_id': '52fe48019251416c750acb4b',
  'gender': 0,
  'id': 1207289,
  'name': 'Stuart Pollock',
  'order': 73},
 {'cast_id': 94,
  'character': 'Hero (uncredited)',
  'credit_id': '52fe48019251416c750acb4f',
  'gender': 0,
  'id': 584868,
  'name': 'Raja',
  'order': 74},
 {'cast_id': 95,
  'character': 'Ops Centreworker (uncredited)',
  'credit_id': '52fe48019251416c750acb53',
  'gender': 0,
  'id': 1207290,
  'name': 'Gareth Ruck',
  'order': 75},
 {'cast_id': 96,
  'character': 'Engineer (uncredited)',
  'credit_id': '52fe48019251416c750acb57',
  'gender': 0,
  'id': 1062463,
  'name': 'Rhian Sheehan',
  'order': 76},
 {'cast_id': 97,
  'character': "Col. Quaritch's Mech Suit (uncredited)",
  'credit_id': '52fe48019251416c750acb5b',
  'gender': 0,
  'id': 60656,
  'name': 'T. J. Storm',
  'order': 77},
 {'cast_id': 98,
  'character': 'Female Marine (uncredited)',
  'credit_id': '52fe48019251416c750acb5f',
  'gender': 0,
  'id': 1207291,
  'name': 'Jodie Taylor',
  'order': 78},
 {'cast_id': 99,
  'character': 'Ikran Clan Leader (uncredited)',
  'credit_id': '52fe48019251416c750acb63',
  'gender': 1,
  'id': 1186027,
  'name': 'Alicia Vela-Bailey',
  'order': 79},
 {'cast_id': 100,
  'character': 'Geologist (uncredited)',
  'credit_id': '52fe48019251416c750acb67',
  'gender': 0,
  'id': 1207292,
  'name': 'Richard Whiteside',
  'order': 80},
 {'cast_id': 101,
  'character': "Na'vi (uncredited)",
  'credit_id': '52fe48019251416c750acb6b',
  'gender': 0,
  'id': 103259,
  'name': 'Nikie Zambo',
  'order': 81},
 {'cast_id': 102,
  'character': 'Ambient Room Tech / Troupe',
  'credit_id': '52fe48019251416c750acb6f',
  'gender': 1,
  'id': 42286,
  'name': 'Julene Renee',
  'order': 82}]
df2.loc[0, 'genres']
[{'id': 28, 'name': 'Action'},
 {'id': 12, 'name': 'Adventure'},
 {'id': 14, 'name': 'Fantasy'},
 {'id': 878, 'name': 'Science Fiction'}]
df2.loc[0, 'keywords']
[{'id': 1463, 'name': 'culture clash'},
 {'id': 2964, 'name': 'future'},
 {'id': 3386, 'name': 'space war'},
 {'id': 3388, 'name': 'space colony'},
 {'id': 3679, 'name': 'society'},
 {'id': 3801, 'name': 'space travel'},
 {'id': 9685, 'name': 'futuristic'},
 {'id': 9840, 'name': 'romance'},
 {'id': 9882, 'name': 'space'},
 {'id': 9951, 'name': 'alien'},
 {'id': 10148, 'name': 'tribe'},
 {'id': 10158, 'name': 'alien planet'},
 {'id': 10987, 'name': 'cgi'},
 {'id': 11399, 'name': 'marine'},
 {'id': 13065, 'name': 'soldier'},
 {'id': 14643, 'name': 'battle'},
 {'id': 14720, 'name': 'love affair'},
 {'id': 165431, 'name': 'anti war'},
 {'id': 193554, 'name': 'power relations'},
 {'id': 206690, 'name': 'mind and soul'},
 {'id': 209714, 'name': '3d'}]
# Extract only the 'name' values, keeping at most the first three entries
def get_list(x):
    if isinstance(x, list): # first check that x is a list
        names = [i['name'] for i in x]
        if len(names) > 3:
            names = names[:3]
        return names
    return []
features = ['cast', 'keywords', 'genres']
for feature in features:
    df2[feature] = df2[feature].apply(get_list)
# The data is large, so check only head(3)
# Each row of cast, keywords, and genres now holds up to three names
df2[['title', 'cast', 'director', 'keywords', 'genres']].head(3)
title cast director keywords genres
0 Avatar [Sam Worthington, Zoe Saldana, Sigourney Weaver] James Cameron [culture clash, future, space war] [Action, Adventure, Fantasy]
1 Pirates of the Caribbean: At World's End [Johnny Depp, Orlando Bloom, Keira Knightley] Gore Verbinski [ocean, drug abuse, exotic island] [Adventure, Fantasy, Action]
2 Spectre [Daniel Craig, Christoph Waltz, Léa Seydoux] Sam Mendes [spy, based on novel, secret agent] [Action, Adventure, Crime]
# Lowercase the text and strip spaces
def clean_data(x):
    if isinstance(x, list): # list of names
        return [str.lower(i.replace(' ', '')) for i in x]
    else:
        if isinstance(x, str): # single string (e.g. director)
            return str.lower(x.replace(' ', ''))
        else: # anything else (e.g. NaN)
            return ''
features = ['cast', 'keywords', 'director', 'genres']
for feature in features:
    df2[feature] = df2[feature].apply(clean_data)
df2[['title', 'cast', 'director', 'keywords', 'genres']].head(3)
title cast director keywords genres
0 Avatar [samworthington, zoesaldana, sigourneyweaver] jamescameron [cultureclash, future, spacewar] [action, adventure, fantasy]
1 Pirates of the Caribbean: At World's End [johnnydepp, orlandobloom, keiraknightley] goreverbinski [ocean, drugabuse, exoticisland] [adventure, fantasy, action]
2 Spectre [danielcraig, christophwaltz, léaseydoux] sammendes [spy, basedonnovel, secretagent] [action, adventure, crime]
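Why strip the spaces? A minimal sketch, assuming the default `CountVectorizer` tokenizer and two illustrative names (not drawn from the outputs above): if the spaces are kept, two different people who share a first name end up sharing a token, which would inflate the similarity between unrelated movies.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative names only; any two people sharing a first name would do.
raw = ["Johnny Depp", "Johnny Galecki"]
joined = ["johnnydepp", "johnnygalecki"]

# With spaces, each name is split into two tokens and 'johnny' is shared.
raw_vocab = sorted(CountVectorizer().fit(raw).vocabulary_)
# Without spaces, each person becomes one unique token.
joined_vocab = sorted(CountVectorizer().fit(joined).vocabulary_)

print(raw_vocab)     # ['depp', 'galecki', 'johnny']
print(joined_vocab)  # ['johnnydepp', 'johnnygalecki']
```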
# Join the features above into one space-separated string (no commas)
def create_soup(x):
    return ' '.join(x['keywords']) + ' ' + ' '.join(x['cast']) + ' ' + x['director'] + ' ' + ' '.join(x['genres'])
df2['soup'] = df2.apply(create_soup, axis=1)
df2['soup'] # the newly created 'soup' column
0       cultureclash future spacewar samworthington zo...
1       ocean drugabuse exoticisland johnnydepp orland...
2       spy basedonnovel secretagent danielcraig chris...
3       dccomics crimefighter terrorist christianbale ...
4       basedonnovel mars medallion taylorkitsch lynnc...
                              ...                        
4798    unitedstates–mexicobarrier legs arms carlosgal...
4799     edwardburns kerrybishé marshadietlein edwardb...
4800    date loveatfirstsight narration ericmabius kri...
4801       danielhenney elizacoupe billpaxton danielhsia 
4802    obsession camcorder crush drewbarrymore brianh...
Name: soup, Length: 4803, dtype: object

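The TfidfVectorizer used earlier for the plot overviews dropped uninformative English words (stop words) and down-weights terms that occur in many documents.

Neither is needed here. We only want plain token counts, so that an actor or director who appears in many movies is not down-weighted, and therefore we use CountVectorizer.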

from sklearn.feature_extraction.text import CountVectorizer

count = CountVectorizer(stop_words='english') # stop_words added just in case
count_matrix = count.fit_transform(df2['soup'])
count_matrix
<4803x11520 sparse matrix of type '<class 'numpy.int64'>'
	with 42935 stored elements in Compressed Sparse Row format>
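To make the difference between the two vectorizers concrete, here is a small sketch on a made-up corpus (the three soups below are illustrative, not rows from the dataset). With raw counts, a director who appears in two movies gets the same weight as a keyword that appears once; with TF-IDF, the more common token receives a lower weight.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up soups: 'ridleyscott' appears in two documents, 'mars' in one.
soups = [
    "mars nasa ridleyscott",
    "alien space ridleyscott",
    "ocean pirate goreverbinski",
]

count_vec = CountVectorizer()
counts = count_vec.fit_transform(soups).toarray()

tfidf_vec = TfidfVectorizer()
weights = tfidf_vec.fit_transform(soups).toarray()

# Raw counts: both tokens count as 1 in the first soup.
count_director = counts[0, count_vec.vocabulary_["ridleyscott"]]
count_keyword = counts[0, count_vec.vocabulary_["mars"]]

# TF-IDF: the token shared by two documents is weighted lower.
tfidf_director = weights[0, tfidf_vec.vocabulary_["ridleyscott"]]
tfidf_keyword = weights[0, tfidf_vec.vocabulary_["mars"]]

print(count_director, count_keyword)   # 1 1
print(tfidf_director < tfidf_keyword)  # True
```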

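Earlier, cosine similarity was computed with the linear_kernel function; this time we use the cosine_similarity function. linear_kernel returns plain dot products, which equal cosine similarity only for L2-normalized vectors such as TfidfVectorizer output, and raw count vectors are not normalized.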

# Cosine similarity between every pair of movies
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim2 = cosine_similarity(count_matrix, count_matrix)
cosine_sim2
array([[1. , 0.3, 0.2, ..., 0. , 0. , 0. ],
       [0.3, 1. , 0.2, ..., 0. , 0. , 0. ],
       [0.2, 0.2, 1. , ..., 0. , 0. , 0. ],
       ...,
       [0. , 0. , 0. , ..., 1. , 0. , 0. ],
       [0. , 0. , 0. , ..., 0. , 1. , 0. ],
       [0. , 0. , 0. , ..., 0. , 0. , 1. ]])
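The following sketch, on two made-up soups, illustrates how `linear_kernel` and `cosine_similarity` differ on count vectors. On unnormalized counts the dot product is just the number of shared tokens, and the two functions agree only after the rows are L2-normalized.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel
from sklearn.preprocessing import normalize

# Made-up soups with 3 and 4 tokens, 2 of them shared.
soups = ["mars nasa adventure", "mars medallion adventure action"]
X = CountVectorizer().fit_transform(soups)

dot = linear_kernel(X, X)      # plain dot products
cos = cosine_similarity(X, X)  # dot products divided by the vector norms

print(dot[0, 1])            # 2.0 (number of shared tokens)
print(round(cos[0, 1], 3))  # 0.577, i.e. 2 / sqrt(3 * 4)

# After L2-normalizing the rows, linear_kernel gives the same result.
X_norm = normalize(X)
cos_via_dot = linear_kernel(X_norm, X_norm)
print(np.allclose(cos, cos_via_dot))  # True
```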
indices['Avatar'] # look up a movie's index by its title, as before
0
# Not strictly necessary, but rebuild indices in case the index got out of sync
df2 = df2.reset_index()
indices = pd.Series(df2.index, index=df2['title'])
indices
title
Avatar                                         0
Pirates of the Caribbean: At World's End       1
Spectre                                        2
The Dark Knight Rises                          3
John Carter                                    4
                                            ... 
El Mariachi                                 4798
Newlyweds                                   4799
Signed, Sealed, Delivered                   4800
Shanghai Calling                            4801
My Date with Drew                           4802
Length: 4803, dtype: int64
get_recommendations('The Dark Knight Rises', cosine_sim2)
65               The Dark Knight
119                Batman Begins
4638    Amidst the Devil's Wings
1196                The Prestige
3073           Romeo Is Bleeding
3326              Black November
1503                      Takers
1986                      Faster
303                     Catwoman
747               Gangster Squad
Name: title, dtype: object
get_recommendations('Up', cosine_sim2)
231                                        Monsters, Inc.
1983                                     Meet the Deedles
3403    Alpha and Omega: The Legend of the Saw Tooth Cave
3114                                          Elsa & Fred
1580                                          The Nut Job
3670                                      Running Forever
4709                            A Charlie Brown Christmas
40                                                 Cars 2
42                                            Toy Story 3
77                                             Inside Out
Name: title, dtype: object
get_recommendations('The Martian', cosine_sim2)
4                    John Carter
95                  Interstellar
365                      Contact
256                    Allegiant
1326                The 5th Wave
1958                 On the Road
3043            End of the Spear
3373    The Other Side of Heaven
3392                       Gerry
3698                   Moby Dick
Name: title, dtype: object
indices['The Martian'] # get the index so we can inspect the movie's row
270
df2.loc[270] # row for The Martian
index                                                                 270
budget                                                          108000000
genres                                 [drama, adventure, sciencefiction]
homepage                      http://www.foxmovies.com/movies/the-martian
id                                                                 286217
keywords                                       [basedonnovel, mars, nasa]
original_language                                                      en
original_title                                                The Martian
overview                During a manned mission to Mars, Astronaut Mar...
popularity                                                      167.93287
production_companies    [{"name": "Twentieth Century Fox Film Corporat...
production_countries    [{"iso_3166_1": "US", "name": "United States o...
release_date                                                   2015-09-30
revenue                                                         630161890
runtime                                                             141.0
spoken_languages        [{"iso_639_1": "en", "name": "English"}, {"iso...
status                                                           Released
tagline                                                    Bring Him Home
title                                                         The Martian
vote_average                                                          7.6
vote_count                                                           7268
cast                            [mattdamon, jessicachastain, kristenwiig]
crew                    [{'credit_id': '5607a7e19251413050003e2c', 'de...
director                                                      ridleyscott
soup                    basedonnovel mars nasa mattdamon jessicachasta...
Name: 270, dtype: object
# Comparing the two rows, the genres and keywords overlap, which explains the high similarity
df2.loc[4] # row for 'John Carter', which ranked high in the recommendations for The Martian
index                                                                   4
budget                                                          260000000
genres                                [action, adventure, sciencefiction]
homepage                             http://movies.disney.com/john-carter
id                                                                  49529
keywords                                  [basedonnovel, mars, medallion]
original_language                                                      en
original_title                                                John Carter
overview                John Carter is a war-weary, former military ca...
popularity                                                      43.926995
production_companies          [{"name": "Walt Disney Pictures", "id": 2}]
production_countries    [{"iso_3166_1": "US", "name": "United States o...
release_date                                                   2012-03-07
revenue                                                         284139100
runtime                                                             132.0
spoken_languages                 [{"iso_639_1": "en", "name": "English"}]
status                                                           Released
tagline                              Lost in our world, found in another.
title                                                         John Carter
vote_average                                                          6.1
vote_count                                                           2124
cast                          [taylorkitsch, lynncollins, samanthamorton]
crew                    [{'credit_id': '52fe479ac3a36847f813eaa3', 'de...
director                                                    andrewstanton
soup                    basedonnovel mars medallion taylorkitsch lynnc...
Name: 4, dtype: object
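The overlap can be checked by hand. The sketch below rebuilds the two soups from the keywords, cast, director, and genres shown in the two rows above and computes the cosine similarity with the standard library only. It assumes that splitting on whitespace matches what CountVectorizer does for these particular tokens.

```python
import math
from collections import Counter

# Soups rebuilt from the rows above (order: keywords, cast, director, genres).
martian = ("basedonnovel mars nasa mattdamon jessicachastain kristenwiig "
           "ridleyscott drama adventure sciencefiction")
john_carter = ("basedonnovel mars medallion taylorkitsch lynncollins samanthamorton "
               "andrewstanton action adventure sciencefiction")

def cosine(a, b):
    """Cosine similarity between two whitespace-tokenized strings."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b)

shared = sorted(set(martian.split()) & set(john_carter.split()))
similarity = cosine(martian, john_carter)

print(shared)                # ['adventure', 'basedonnovel', 'mars', 'sciencefiction']
print(round(similarity, 2))  # 0.4
```

Four of the ten tokens in each soup are shared, so the similarity is 4 / (sqrt(10) * sqrt(10)) = 0.4, even though the cast and director are completely different.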
get_recommendations('The Avengers', cosine_sim2)
7                  Avengers: Age of Ultron
26              Captain America: Civil War
79                              Iron Man 2
169     Captain America: The First Avenger
174                    The Incredible Hulk
85     Captain America: The Winter Soldier
31                              Iron Man 3
33                   X-Men: The Last Stand
68                                Iron Man
94                 Guardians of the Galaxy
Name: title, dtype: object
import pickle
df2.head(3)
index budget genres homepage id keywords original_language original_title overview popularity ... spoken_languages status tagline title vote_average vote_count cast crew director soup
0 0 237000000 [action, adventure, fantasy] http://www.avatarmovie.com/ 19995 [cultureclash, future, spacewar] en Avatar In the 22nd century, a paraplegic Marine is di... 150.437577 ... [{"iso_639_1": "en", "name": "English"}, {"iso... Released Enter the World of Pandora. Avatar 7.2 11800 [samworthington, zoesaldana, sigourneyweaver] [{'credit_id': '52fe48009251416c750aca23', 'de... jamescameron cultureclash future spacewar samworthington zo...
1 1 300000000 [adventure, fantasy, action] http://disney.go.com/disneypictures/pirates/ 285 [ocean, drugabuse, exoticisland] en Pirates of the Caribbean: At World's End Captain Barbossa, long believed to be dead, ha... 139.082615 ... [{"iso_639_1": "en", "name": "English"}] Released At the end of the world, the adventure begins. Pirates of the Caribbean: At World's End 6.9 4500 [johnnydepp, orlandobloom, keiraknightley] [{'credit_id': '52fe4232c3a36847f800b579', 'de... goreverbinski ocean drugabuse exoticisland johnnydepp orland...
2 2 245000000 [action, adventure, crime] http://www.sonypictures.com/movies/spectre/ 206647 [spy, basedonnovel, secretagent] en Spectre A cryptic message from Bond’s past sends him o... 107.376788 ... [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... Released A Plan No One Escapes Spectre 6.3 4466 [danielcraig, christophwaltz, léaseydoux] [{'credit_id': '54805967c3a36829b5002c41', 'de... sammendes spy basedonnovel secretagent danielcraig chris...

3 rows × 25 columns

movies = df2[['id', 'title']].copy()
movies.head(5)
id title
0 19995 Avatar
1 285 Pirates of the Caribbean: At World's End
2 206647 Spectre
3 49026 The Dark Knight Rises
4 49529 John Carter
# Movie data (id and title)
with open('movies.pickle', 'wb') as f:
    pickle.dump(movies, f)
# Cosine similarity matrix
with open('cosine_sim.pickle', 'wb') as f:
    pickle.dump(cosine_sim2, f)
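As a rough sketch of how the saved files might be used later (for example in a separate app), the snippet below writes two toy pickles to a temporary directory, loads them back, and looks up the most similar titles. The toy data, the temporary paths, and the `recommend` helper are all assumptions for illustration; they are not part of the original notebook.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Toy stand-ins for `movies` and `cosine_sim2` (made-up values).
movies = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})
cosine_sim = np.array([[1.0, 0.3, 0.7],
                       [0.3, 1.0, 0.1],
                       [0.7, 0.1, 1.0]])

# Save to a temporary directory so the sketch does not touch real files.
tmp_dir = tempfile.mkdtemp()
movies_path = os.path.join(tmp_dir, "movies.pickle")
sim_path = os.path.join(tmp_dir, "cosine_sim.pickle")
with open(movies_path, "wb") as f:
    pickle.dump(movies, f)
with open(sim_path, "wb") as f:
    pickle.dump(cosine_sim, f)

# Load both objects back, as a consumer of the pickles would.
with open(movies_path, "rb") as f:
    loaded_movies = pickle.load(f)
with open(sim_path, "rb") as f:
    loaded_sim = pickle.load(f)

def recommend(title, top_n=2):
    """Hypothetical helper: titles most similar to `title`, best first."""
    idx = loaded_movies.index[loaded_movies["title"] == title][0]
    order = np.argsort(-loaded_sim[idx])             # highest similarity first
    order = [i for i in order if i != idx][:top_n]   # skip the movie itself
    return loaded_movies["title"].iloc[order].tolist()

result = recommend("A")
print(result)  # ['C', 'B']
```

Note that the full 4803 x 4803 float64 similarity matrix takes about 185 MB (4803 * 4803 * 8 bytes), so the real `cosine_sim.pickle` will be considerably larger than `movies.pickle`.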

Reference: 나도코딩 (YouTube)
