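- Movie Recommendation System (Demographic and Content-Based Filtering)
This material follows the content at the link below.
- Reference : https://www.kaggle.com/code/ibtesama/getting-started-with-a-movie-recommendation-system
- This is one of the popular examples among the many movie recommendation notebooks built on TMDB 5000.
- TMDB 5000 is a dataset of about 5,000 TMDB movies, prepared so that it can be used directly as data.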
Movie recommendation systems
- Demographic Filtering
- Content Based Filtering
- Collaborative Filtering
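1. Demographic Filtering
Use the weighted rating (WR) formula from the reference link:
$ WR = (\frac{v}{v+m}\cdot R) + (\frac{m}{v+m}\cdot C) $
Here v is the number of votes for the movie, m is the minimum number of votes required to qualify, R is the movie's average rating, and C is the mean rating across all movies.
# The CSV files also come from the reference link.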
import pandas as pd
import numpy as np
df1 = pd.read_csv('tmdb_5000_credits.csv')
df2 = pd.read_csv('tmdb_5000_movies.csv')
df1.head()
movie_id | title | cast | crew | |
---|---|---|---|---|
0 | 19995 | Avatar | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
1 | 285 | Pirates of the Caribbean: At World's End | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
2 | 206647 | Spectre | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
3 | 49026 | The Dark Knight Rises | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
4 | 49529 | John Carter | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |
df2.head(3)
budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | [{"iso_3166_1": "US", "name": "United States o... | 2009-12-10 | 2787965087 | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 |
1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | [{"iso_3166_1": "US", "name": "United States o... | 2007-05-19 | 961000000 | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 |
2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | [{"iso_3166_1": "GB", "name": "United Kingdom"... | 2015-10-26 | 880674609 | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 |
df1.shape, df2.shape
((4803, 4), (4803, 20))
# The two frames have different columns, so check whether their titles match
df1['title'].equals(df2['title'])
True
df1.columns
Index(['movie_id', 'title', 'cast', 'crew'], dtype='object')
df1.columns = ['id', 'title', 'cast', 'crew']
df1.columns
Index(['id', 'title', 'cast', 'crew'], dtype='object')
# title is identical in both frames, so leave it out
df1[['id', 'cast', 'crew']]
id | cast | crew | |
---|---|---|---|
0 | 19995 | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
1 | 285 | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
2 | 206647 | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
3 | 49026 | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
4 | 49529 | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |
... | ... | ... | ... |
4798 | 9367 | [{"cast_id": 1, "character": "El Mariachi", "c... | [{"credit_id": "52fe44eec3a36847f80b280b", "de... |
4799 | 72766 | [{"cast_id": 1, "character": "Buzzy", "credit_... | [{"credit_id": "52fe487dc3a368484e0fb013", "de... |
4800 | 231617 | [{"cast_id": 8, "character": "Oliver O\u2019To... | [{"credit_id": "52fe4df3c3a36847f8275ecf", "de... |
4801 | 126186 | [{"cast_id": 3, "character": "Sam", "credit_id... | [{"credit_id": "52fe4ad9c3a368484e16a36b", "de... |
4802 | 25975 | [{"cast_id": 3, "character": "Herself", "credi... | [{"credit_id": "58ce021b9251415a390165d9", "de... |
4803 rows × 3 columns
# Merge the selected df1 columns into df2 on 'id'
df2 = df2.merge(df1[['id', 'cast', 'crew']], on='id')
df2.head(3)
budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | ... | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | cast | crew | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | ... | 2787965087 | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | ... | 961000000 | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | ... | 880674609 | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
3 rows × 22 columns
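Movie 1 : rated 10/10 by 5 people
Movie 2 : rated 8/10 by 500 people
- The rating backed by 500 people is clearly the more reliable one, so the raw average alone is not a fair ranking.
- Use the scoring formula from the reference link (this notebook follows the link's steps):
- $ WR = (\frac{v}{v+m}\cdot R) + (\frac{m}{v+m}\cdot C) $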
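To see what the formula does to the two movies above, here is a small sketch in plain Python. The values of C and m are rounded copies of the ones this notebook computes from the dataset; the two movies themselves are the made-up examples from the text, not rows of the data.

```python
# Values of C and m are rounded copies of this notebook's outputs.
C = 6.09      # mean rating across all movies
m = 1838.4    # minimum number of votes required to qualify


def weighted_rating(v, R, m=m, C=C):
    """Blend the movie's own rating R with the global mean C.

    The more votes v a movie has, the more weight its own rating gets.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


movie1 = weighted_rating(v=5, R=10.0)    # few votes: pulled almost entirely toward C
movie2 = weighted_rating(v=500, R=8.0)   # more votes: keeps more of its own rating

print(f"movie 1: {movie1:.2f}")
print(f"movie 2: {movie2:.2f}")
```

With only 5 votes, the perfect 10/10 barely moves the score away from the global mean, so Movie 2 ends up ranked above Movie 1.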
C = df2['vote_average'].mean()
C # mean rating across all movies
6.092171559442011
m = df2['vote_count'].quantile(0.9)
m # 90th percentile of vote_count: a movie needs at least this many votes to qualify
1838.4000000000015
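The fractional value of m comes from interpolation: by default pandas' `quantile` interpolates linearly between the two nearest sorted values. The sketch below reproduces that idea in plain Python on hypothetical vote counts (the helper name `quantile_linear` and the numbers are illustrative, not part of the notebook).

```python
def quantile_linear(values, q):
    """q-quantile with linear interpolation between the two nearest ranks."""
    data = sorted(values)
    pos = q * (len(data) - 1)              # fractional position in the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac


# Hypothetical vote counts for 11 movies (not taken from the dataset).
vote_counts = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 2000]

m_toy = quantile_linear(vote_counts, 0.9)
qualified = [v for v in vote_counts if v >= m_toy]

print(m_toy)
print(qualified)
```

Only the movies at or above the cut-off remain, which is the same filtering that the `vote_count >= m` condition performs on the real data.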
# Copy df2 and use loc to keep only the rows whose 'vote_count' is at least m
q_movies = df2.copy().loc[df2['vote_count'] >= m]
q_movies.shape
(481, 22)
# The smallest vote count kept is 1840, so every remaining movie has more votes than m (about 1838.4)
q_movies['vote_count'].sort_values()
2585     1840
195      1851
2454     1859
597      1862
1405     1864
        ...
788     10995
16      11776
0       11800
65      12002
96      13752
Name: vote_count, Length: 481, dtype: int64
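# Implement the formula from the reference link
def weighted_rating(x, m=m, C=C):
    v = x['vote_count']    # number of votes for this movie
    R = x['vote_average']  # average rating of this movie
    return (v / (v + m) * R) + (m / (m + v) * C)
# Apply the function to every row and store the result in a new 'score' column
q_movies['score'] = q_movies.apply(weighted_rating, axis=1)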
q_movies.head(3)
budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | ... | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | cast | crew | score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | ... | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... | 7.050669 |
1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | ... | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... | 6.665696 |
2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | ... | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... | 6.239396 |
3 rows × 23 columns
q_movies = q_movies.sort_values('score', ascending=False) # descending order
q_movies[['title', 'vote_count', 'vote_average', 'score']].head(10)
title | vote_count | vote_average | score | |
---|---|---|---|---|
1881 | The Shawshank Redemption | 8205 | 8.5 | 8.059258 |
662 | Fight Club | 9413 | 8.3 | 7.939256 |
65 | The Dark Knight | 12002 | 8.2 | 7.920020 |
3232 | Pulp Fiction | 8428 | 8.3 | 7.904645 |
96 | Inception | 13752 | 8.1 | 7.863239 |
3337 | The Godfather | 5893 | 8.4 | 7.851236 |
95 | Interstellar | 10867 | 8.1 | 7.809479 |
809 | Forrest Gump | 7927 | 8.2 | 7.803188 |
329 | The Lord of the Rings: The Return of the King | 8064 | 8.1 | 7.727243 |
1990 | The Empire Strikes Back | 5879 | 8.2 | 7.697884 |
# Sort by the 'popularity' column and plot the 10 most popular movies
pop = df2.sort_values('popularity', ascending=False)
import matplotlib.pyplot as plt
plt.figure(figsize=(12,4))
plt.barh(pop['title'].head(10), pop['popularity'].head(10), align='center',
         color='skyblue')
plt.gca().invert_yaxis()  # put the most popular movie at the top
plt.xlabel("Popularity")
plt.title("Popular Movies")
Text(0.5, 1.0, 'Popular Movies')
2. Content Based Filtering
This approach measures how similar the text describing each movie is and returns the movies with the highest similarity.
Plot-based recommendation
The plot summary is stored in the 'overview' column.
df2['overview'].head(5)
0    In the 22nd century, a paraplegic Marine is di...
1    Captain Barbossa, long believed to be dead, ha...
2    A cryptic message from Bond’s past sends him o...
3    Following the death of District Attorney Harve...
4    John Carter is a war-weary, former military ca...
Name: overview, dtype: object
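Bag of Words (BOW)
Sentence 1 : I am a boy
Sentence 2 : I am a girl
Word counts over both sentences: I(2), am(2), a(2), boy(1), girl(1)
| | I | am | a | boy | girl | vector |
|---|---|---|---|---|---|---|
| Sentence 1 (I am a boy) | 1 | 1 | 1 | 1 | 0 | (1,1,1,1,0) |
| Sentence 2 (I am a girl) | 1 | 1 | 1 | 0 | 1 | (1,1,1,0,1) |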
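The two vectors above can be rebuilt with a few lines of plain Python. This is only a sketch of the idea, using whitespace tokenization and a vocabulary in order of first appearance; it is not how scikit-learn is implemented.

```python
from collections import Counter

sentences = ["I am a boy", "I am a girl"]

# Tokenize by whitespace; real vectorizers also lowercase and strip punctuation.
tokenized = [s.split() for s in sentences]

# Vocabulary in order of first appearance: I, am, a, boy, girl
vocab = []
for tokens in tokenized:
    for tok in tokens:
        if tok not in vocab:
            vocab.append(tok)

# One count vector per sentence, one position per vocabulary word.
vectors = []
for tokens in tokenized:
    counts = Counter(tokens)
    vectors.append([counts[word] for word in vocab])

print(vocab)    # ['I', 'am', 'a', 'boy', 'girl']
print(vectors)  # [[1, 1, 1, 1, 0], [1, 1, 1, 0, 1]]
```

scikit-learn's `CountVectorizer` builds the same kind of matrix, but with its default settings it lowercases the text and ignores single-character tokens such as "I" and "a", so its vocabulary for these two sentences would be smaller.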
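Feature vectorization
Suppose there are 100 documents and 10,000 distinct words across all of them.
The document-term matrix then has 100 * 10,000 = 1,000,000 cells, with one row per document and one column per word:
| | word 1 | word 2 | word 3 | word 4 | ... | word 10000 |
|---|---|---|---|---|---|---|
| document 1 | 1 | 1 | 3 | 0 | | |
| document 2 | | | | | | |
| document 3 | | | | | | |
| ... | | | | | | |
| document 100 | | | | | | |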
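Two ways to vectorize:
- TfidfVectorizer (TF-IDF based vectorization)
- CountVectorizer
TfidfVectorizer weights each word by how often it appears in a document (TF) and scales that weight down the more documents the word appears in (IDF). Words such as "a" and "the" appear in almost every document and say little about any one of them, so their IDF is low; with stop_words='english' they are removed outright.
Movie overviews are full of such words, so this notebook uses TfidfVectorizer.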
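A minimal sketch of the TF-IDF weighting in plain Python, on three hypothetical one-line "overviews". It uses the smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which is the form scikit-learn documents for its default settings; the function and variable names here are illustrative only.

```python
import math
from collections import Counter

# Hypothetical mini-corpus (not taken from the dataset).
docs = [
    "the hero saves the city",
    "the villain attacks the city",
    "the hero meets the villain",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each word appear?
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))


def idf(word):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + doc_freq[word])) + 1


def tfidf(tokens):
    """TF-IDF weights for one document, scaled to unit L2 length."""
    tf = Counter(tokens)
    weights = {w: tf[w] * idf(w) for w in tf}
    norm = math.sqrt(sum(x * x for x in weights.values()))
    return {w: x / norm for w, x in weights.items()}


first = tfidf(tokenized[0])
for word, weight in sorted(first.items()):
    print(f"{word:>6}: idf={idf(word):.3f}  weight={weight:.3f}")
```

"the" appears in every document and gets the lowest possible IDF (1.0), while "saves" appears in only one and gets the highest. Because "the" occurs twice in the first document, its final weight is still sizeable, which is one reason to remove stop words as well.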
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english') # drop common English stop words
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
ENGLISH_STOP_WORDS # inspect which words are dropped
frozenset({'a', 'about', 'above', 'across', 'after', 'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along', 'already', 'also', 'although', 'always', 'am', 'among', 'amongst', 'amoungst', 'amount', 'an', 'and', 'another', 'any', 'anyhow', 'anyone', 'anything', 'anyway', 'anywhere', 'are', 'around', 'as', 'at', 'back', 'be', 'became', 'because', 'become', 'becomes', 'becoming', 'been', 'before', 'beforehand', 'behind', 'being', 'below', 'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom', 'but', 'by', 'call', 'can', 'cannot', 'cant', 'co', 'con', 'could', 'couldnt', 'cry', 'de', 'describe', 'detail', 'do', 'done', 'down', 'due', 'during', 'each', 'eg', 'eight', 'either', 'eleven', 'else', 'elsewhere', 'empty', 'enough', 'etc', 'even', 'ever', 'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fifty', 'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found', 'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt', 'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'i', 'ie', 'if', 'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last', 'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile', 'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must', 'my', 'myself', 'name', 'namely', 'neither', 'never', 'nevertheless', 'next', 'nine', 'no', 'nobody', 'none', 'noone', 'nor', 'not', 'nothing', 'now', 'nowhere', 'of', 'off', 'often', 'on', 'once', 'one', 'only', 'onto', 'or', 'other', 'others', 'otherwise', 'our', 'ours', 'ourselves', 'out', 'over', 'own', 'part', 'per', 'perhaps', 'please', 'put', 'rather', 're', 'same', 'see', 'seem', 'seemed', 'seeming', 'seems', 'serious', 'several', 'she', 'should', 'show', 'side', 'since', 'sincere', 'six', 'sixty', 'so', 
'some', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhere', 'still', 'such', 'system', 'take', 'ten', 'than', 'that', 'the', 'their', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'therefore', 'therein', 'thereupon', 'these', 'they', 'thick', 'thin', 'third', 'this', 'those', 'though', 'three', 'through', 'throughout', 'thru', 'thus', 'to', 'together', 'too', 'top', 'toward', 'towards', 'twelve', 'twenty', 'two', 'un', 'under', 'until', 'up', 'upon', 'us', 'very', 'via', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when', 'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'whereupon', 'wherever', 'whether', 'which', 'while', 'whither', 'who', 'whoever', 'whole', 'whom', 'whose', 'why', 'will', 'with', 'within', 'without', 'would', 'yet', 'you', 'your', 'yours', 'yourself', 'yourselves'})
# Returns True if at least one value is null
df2['overview'].isnull().values.any()
True
# Replace null values with the empty string ''
df2['overview'] = df2['overview'].fillna('')
tfidf_matrix = tfidf.fit_transform(df2['overview'])
tfidf_matrix.shape
(4803, 20978)
tfidf_matrix
<4803x20978 sparse matrix of type '<class 'numpy.float64'>' with 125840 stored elements in Compressed Sparse Row format>
# Similarity measure: cosine similarity
# (TfidfVectorizer L2-normalizes each row by default, so the dot product computed by linear_kernel equals the cosine similarity)
from sklearn.metrics.pairwise import linear_kernel
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
cosine_sim
array([[1.        , 0.        , 0.        , ..., 0.        , 0.        , 0.        ],
       [0.        , 1.        , 0.        , ..., 0.02160533, 0.        , 0.        ],
       [0.        , 0.        , 1.        , ..., 0.01488159, 0.        , 0.        ],
       ...,
       [0.        , 0.02160533, 0.01488159, ..., 1.        , 0.01609091, 0.00701914],
       [0.        , 0.        , 0.        , ..., 0.01609091, 1.        , 0.01171696],
       [0.        , 0.        , 0.        , ..., 0.00701914, 0.01171696, 1.        ]])
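Why a plain dot product (`linear_kernel`) can stand in for cosine similarity: once two vectors are scaled to unit length, their dot product is their cosine similarity. The sketch below checks this in plain Python on two hypothetical vectors; the helper names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_normalize(a):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


# Hypothetical TF-IDF-like vectors before normalization.
u = [1.0, 2.0, 0.0]
v = [2.0, 1.0, 1.0]

u_hat, v_hat = l2_normalize(u), l2_normalize(v)

print(cosine(u, v))        # cosine similarity of the raw vectors
print(dot(u_hat, v_hat))   # plain dot product of the normalized vectors
```

The two printed values agree up to floating-point rounding, and a normalized vector's dot product with itself is 1, which is why the diagonal of `cosine_sim` is 1.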
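| | Sentence 1 | Sentence 2 | Sentence 3 |
|---|---|---|---|
| Sentence 1 | 1 | 0.3 | 0.8 |
| Sentence 2 | 0.3 | 1 | 0.5 |
| Sentence 3 | 0.8 | 0.5 | 1 |
# Which sentence is most similar to Sentence 1, excluding itself? Sentence 3 (0.8)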
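The lookup described above can be sketched in plain Python on the same toy matrix. The helper name `most_similar` is illustrative; the numbers are the ones from the table.

```python
labels = ["Sentence 1", "Sentence 2", "Sentence 3"]
sim = [
    [1.0, 0.3, 0.8],
    [0.3, 1.0, 0.5],
    [0.8, 0.5, 1.0],
]


def most_similar(i):
    """Index of the row most similar to row i, ignoring i itself."""
    candidates = [j for j in range(len(sim)) if j != i]
    return max(candidates, key=lambda j: sim[i][j])


for i, name in enumerate(labels):
    print(name, "->", labels[most_similar(i)])
```

Recommending movies works the same way, only with 4803 rows instead of 3 and with the top 10 kept instead of the single best match.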
cosine_sim.shape # square and symmetric
(4803, 4803)
# Note: a Series can be thought of as a labeled one-dimensional array
# drop_duplicates() compares the values (the row positions), which are all unique here, so no entry is removed
indices = pd.Series(df2.index, index=df2['title']).drop_duplicates()
indices
title
Avatar                                         0
Pirates of the Caribbean: At World's End       1
Spectre                                        2
The Dark Knight Rises                          3
John Carter                                    4
                                            ...
El Mariachi                                 4798
Newlyweds                                   4799
Signed, Sealed, Delivered                   4800
Shanghai Calling                            4801
My Date with Drew                           4802
Length: 4803, dtype: int64
indices['The Dark Knight Rises']
3
df2.iloc[[3]]
budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | ... | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | cast | crew | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 250000000 | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | http://www.thedarkknightrises.com/ | 49026 | [{"id": 849, "name": "dc comics"}, {"id": 853,... | en | The Dark Knight Rises | Following the death of District Attorney Harve... | 112.31295 | [{"name": "Legendary Pictures", "id": 923}, {"... | ... | 1084939099 | 165.0 | [{"iso_639_1": "en", "name": "English"}] | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
1 rows × 22 columns
# Given a movie title, return the 10 movies with the highest cosine similarity to it
def get_recommendations(title, cosine_sim=cosine_sim):
    # Look up the movie's row index in the full dataset from its title
    idx = indices[title]
    # Take row idx of the cosine similarity matrix as (index, similarity) pairs
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort by cosine similarity in descending order
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Slice out 10 recommendations, skipping the first entry (the movie itself)
    sim_scores = sim_scores[1:11]
    # Extract the row indices of the 10 recommended movies
    movie_indices = [i[0] for i in sim_scores]
    # Use the indices to look up the movie titles
    return df2['title'].iloc[movie_indices]
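The slice `[1:11]` assumes the movie itself always sorts first. That holds when its self-similarity is 1, but a movie whose overview is empty has an all-zero TF-IDF row, so its self-similarity is 0 and it need not come first. A hedged alternative is to skip the movie by index instead of by position; `top_n_similar` below is a hypothetical helper in plain Python, not part of the reference notebook.

```python
def top_n_similar(sim_row, self_idx, n=10):
    """Return up to n (index, score) pairs, most similar first, skipping self_idx."""
    scored = [(j, s) for j, s in enumerate(sim_row) if j != self_idx]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical similarity row for item 2; its self-similarity is 0.0,
# as would happen for an empty overview whose TF-IDF vector is all zeros.
row = [0.10, 0.40, 0.0, 0.25]
print(top_n_similar(row, self_idx=2, n=2))  # [(1, 0.4), (3, 0.25)]
```

For movies with a non-empty overview, both approaches return the same list.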
test_idx = indices['The Dark Knight Rises'] # look up the movie's row index from its title
test_idx
3
cosine_sim[3] # similarity of movie 3 to every movie
array([0.02499512, 0. , 0. , ..., 0.03386366, 0.04275232, 0.02269198])
test_sim_scores = list(enumerate(cosine_sim[3])) # (index, similarity) pairs for row 3 of cosine_sim
test_sim_scores = sorted(test_sim_scores, key=lambda x: x[1], reverse=True) # sort by cosine similarity, descending
test_sim_scores[1:11] # slice out 10 recommendations, skipping the movie itself
[(65, 0.30151176591665485), (299, 0.29857045255396825), (428, 0.2878505467001694), (1359, 0.264460923827995), (3854, 0.18545003006561456), (119, 0.16799626199850706), (2507, 0.16682891043358278), (9, 0.1337400906655523), (1181, 0.13219702138476813), (210, 0.13045537014449818)]
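# A named function, to show what the lambda above does
def get_second(x):
    return x[1]
lst = ['index', 'similarity']
print(get_second(lst))
similarity
# The same thing written as a lambda expression
# lambda x: x[1] defines the function, and lst is passed in as x
(lambda x: x[1])(lst)
'similarity'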
# Extract the row indices of the 10 recommended movies
test_movie_indices = [i[0] for i in test_sim_scores[1:11]]
test_movie_indices
[65, 299, 428, 1359, 3854, 119, 2507, 9, 1181, 210]
# Use the indices to look up the movie titles
df2['title'].iloc[test_movie_indices]
65                              The Dark Knight
299                              Batman Forever
428                              Batman Returns
1359                                     Batman
3854    Batman: The Dark Knight Returns, Part 2
119                               Batman Begins
2507                                  Slow Burn
9            Batman v Superman: Dawn of Justice
1181                                        JFK
210                              Batman & Robin
Name: title, dtype: object
df2['title'][:20]
0                                          Avatar
1        Pirates of the Caribbean: At World's End
2                                         Spectre
3                           The Dark Knight Rises
4                                     John Carter
5                                    Spider-Man 3
6                                         Tangled
7                         Avengers: Age of Ultron
8          Harry Potter and the Half-Blood Prince
9              Batman v Superman: Dawn of Justice
10                               Superman Returns
11                              Quantum of Solace
12     Pirates of the Caribbean: Dead Man's Chest
13                                The Lone Ranger
14                                   Man of Steel
15       The Chronicles of Narnia: Prince Caspian
16                                   The Avengers
17    Pirates of the Caribbean: On Stranger Tides
18                                 Men in Black 3
19      The Hobbit: The Battle of the Five Armies
Name: title, dtype: object
get_recommendations('Avengers: Age of Ultron')
16                  The Avengers
79                    Iron Man 2
68                      Iron Man
26    Captain America: Civil War
227               Knight and Day
31                    Iron Man 3
1868          Cradle 2 the Grave
344                  Unstoppable
1922                  Gettysburg
531      The Man from U.N.C.L.E.
Name: title, dtype: object
get_recommendations('The Avengers')
7               Avengers: Age of Ultron
3144                            Plastic
1715                            Timecop
4124                 This Thing of Ours
3311              Thank You for Smoking
3033                      The Corruptor
588     Wall Street: Money Never Sleeps
2136         Team America: World Police
1468                       The Fountain
1286                        Snowpiercer
Name: title, dtype: object
Recommendation based on other features (genres, director, keywords, etc.)
This part tries features other than the overview used above.
df2.head(3)
budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | ... | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | cast | crew | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | ... | 2787965087 | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | ... | 961000000 | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | ... | 880674609 | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
3 rows × 22 columns
df2.loc[0, 'genres'] # the genres of row 0 (stored as a string)
'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
s1 = [{"id": 28, "name": "Action"}]
s2 = '[{"id": 28, "name": "Action"}]'
type(s1), type(s2)
(list, str)
# How to convert a string like s2 into a list
from ast import literal_eval
s2 = literal_eval(s2)
s2, type(s2)
([{'id': 28, 'name': 'Action'}], list)
print(s1)
print(s2)
[{'id': 28, 'name': 'Action'}]
[{'id': 28, 'name': 'Action'}]
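Once the string has been parsed into a list of dicts, the useful part is usually just the names. The sketch below shows that step on a shortened genres string; pulling out the `name` values is an assumption about what one would typically do next, and the variable names are illustrative.

```python
from ast import literal_eval

# A shortened copy of the kind of string stored in the 'genres' column.
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed = literal_eval(raw)                 # str -> list of dicts
names = [item["name"] for item in parsed]  # keep only the genre names

print(type(parsed).__name__)  # list
print(names)                  # ['Action', 'Adventure']
```

`literal_eval` only evaluates Python literals (lists, dicts, strings, numbers and so on), so it does not execute arbitrary code the way `eval` would.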
# Loop over the four columns and convert each one from a string to a list
features = ['cast', 'crew', 'keywords', 'genres']
for feature in features:
    df2[feature] = df2[feature].apply(literal_eval)
# Confirm that the column now holds lists
df2.loc[0, 'crew']
[{'credit_id': '52fe48009251416c750aca23', 'department': 'Editing', 'gender': 0, 'id': 1721, 'job': 'Editor', 'name': 'Stephen E. Rivkin'}, {'credit_id': '539c47ecc3a36810e3001f87', 'department': 'Art', 'gender': 2, 'id': 496, 'job': 'Production Design', 'name': 'Rick Carter'}, {'credit_id': '54491c89c3a3680fb4001cf7', 'department': 'Sound', 'gender': 0, 'id': 900, 'job': 'Sound Designer', 'name': 'Christopher Boyes'}, {'credit_id': '54491cb70e0a267480001bd0', 'department': 'Sound', 'gender': 0, 'id': 900, 'job': 'Supervising Sound Editor', 'name': 'Christopher Boyes'}, {'credit_id': '539c4a4cc3a36810c9002101', 'department': 'Production', 'gender': 1, 'id': 1262, 'job': 'Casting', 'name': 'Mali Finn'}, {'credit_id': '5544ee3b925141499f0008fc', 'department': 'Sound', 'gender': 2, 'id': 1729, 'job': 'Original Music Composer', 'name': 'James Horner'}, {'credit_id': '52fe48009251416c750ac9c3', 'department': 'Directing', 'gender': 2, 'id': 2710, 'job': 'Director', 'name': 'James Cameron'}, {'credit_id': '52fe48009251416c750ac9d9', 'department': 'Writing', 'gender': 2, 'id': 2710, 'job': 'Writer', 'name': 'James Cameron'}, {'credit_id': '52fe48009251416c750aca17', 'department': 'Editing', 'gender': 2, 'id': 2710, 'job': 'Editor', 'name': 'James Cameron'}, {'credit_id': '52fe48009251416c750aca29', 'department': 'Production', 'gender': 2, 'id': 2710, 'job': 'Producer', 'name': 'James Cameron'}, {'credit_id': '52fe48009251416c750aca3f', 'department': 'Writing', 'gender': 2, 'id': 2710, 'job': 'Screenplay', 'name': 'James Cameron'}, {'credit_id': '539c4987c3a36810ba0021a4', 'department': 'Art', 'gender': 2, 'id': 7236, 'job': 'Art Direction', 'name': 'Andrew Menzies'}, {'credit_id': '549598c3c3a3686ae9004383', 'department': 'Visual Effects', 'gender': 0, 'id': 6690, 'job': 'Visual Effects Producer', 'name': 'Jill Brooks'}, {'credit_id': '52fe48009251416c750aca4b', 'department': 'Production', 'gender': 1, 'id': 6347, 'job': 'Casting', 'name': 'Margery Simkin'}, {'credit_id': 
'570b6f419251417da70032fe', 'department': 'Art', 'gender': 2, 'id': 6878, 'job': 'Supervising Art Director', 'name': 'Kevin Ishioka'}, {'credit_id': '5495a0fac3a3686ae9004468', 'department': 'Sound', 'gender': 0, 'id': 6883, 'job': 'Music Editor', 'name': 'Dick Bernstein'}, {'credit_id': '54959706c3a3686af3003e81', 'department': 'Sound', 'gender': 0, 'id': 8159, 'job': 'Sound Effects Editor', 'name': 'Shannon Mills'}, {'credit_id': '54491d58c3a3680fb1001ccb', 'department': 'Sound', 'gender': 0, 'id': 8160, 'job': 'Foley', 'name': 'Dennie Thorpe'}, {'credit_id': '54491d6cc3a3680fa5001b2c', 'department': 'Sound', 'gender': 0, 'id': 8163, 'job': 'Foley', 'name': 'Jana Vance'}, {'credit_id': '52fe48009251416c750aca57', 'department': 'Costume & Make-Up', 'gender': 1, 'id': 8527, 'job': 'Costume Design', 'name': 'Deborah Lynn Scott'}, {'credit_id': '52fe48009251416c750aca2f', 'department': 'Production', 'gender': 2, 'id': 8529, 'job': 'Producer', 'name': 'Jon Landau'}, {'credit_id': '539c4937c3a36810ba002194', 'department': 'Art', 'gender': 0, 'id': 9618, 'job': 'Art Direction', 'name': 'Sean Haworth'}, {'credit_id': '539c49b6c3a36810c10020e6', 'department': 'Art', 'gender': 1, 'id': 12653, 'job': 'Set Decoration', 'name': 'Kim Sinclair'}, {'credit_id': '570b6f2f9251413a0e00020d', 'department': 'Art', 'gender': 1, 'id': 12653, 'job': 'Supervising Art Director', 'name': 'Kim Sinclair'}, {'credit_id': '54491a6c0e0a26748c001b19', 'department': 'Art', 'gender': 2, 'id': 14350, 'job': 'Set Designer', 'name': 'Richard F. Mays'}, {'credit_id': '56928cf4c3a3684cff0025c4', 'department': 'Production', 'gender': 1, 'id': 20294, 'job': 'Executive Producer', 'name': 'Laeta Kalogridis'}, {'credit_id': '52fe48009251416c750aca51', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 17675, 'job': 'Costume Design', 'name': 'Mayes C. 
Rubeo'}, {'credit_id': '52fe48009251416c750aca11', 'department': 'Camera', 'gender': 2, 'id': 18265, 'job': 'Director of Photography', 'name': 'Mauro Fiore'}, {'credit_id': '5449194d0e0a26748f001b39', 'department': 'Art', 'gender': 0, 'id': 42281, 'job': 'Set Designer', 'name': 'Scott Herbertson'}, {'credit_id': '52fe48009251416c750aca05', 'department': 'Crew', 'gender': 0, 'id': 42288, 'job': 'Stunts', 'name': 'Woody Schultz'}, {'credit_id': '5592aefb92514152de0010f5', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 29067, 'job': 'Makeup Artist', 'name': 'Linda DeVetta'}, {'credit_id': '5592afa492514152de00112c', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 29067, 'job': 'Hairstylist', 'name': 'Linda DeVetta'}, {'credit_id': '54959ed592514130fc002e5d', 'department': 'Camera', 'gender': 2, 'id': 33302, 'job': 'Camera Operator', 'name': 'Richard Bluck'}, {'credit_id': '539c4891c3a36810ba002147', 'department': 'Art', 'gender': 2, 'id': 33303, 'job': 'Art Direction', 'name': 'Simon Bright'}, {'credit_id': '54959c069251417a81001f3a', 'department': 'Visual Effects', 'gender': 0, 'id': 113145, 'job': 'Visual Effects Supervisor', 'name': 'Richard Martin'}, {'credit_id': '54959a0dc3a3680ff5002c8d', 'department': 'Crew', 'gender': 2, 'id': 58188, 'job': 'Visual Effects Editor', 'name': 'Steve R. Moore'}, {'credit_id': '52fe48009251416c750aca1d', 'department': 'Editing', 'gender': 2, 'id': 58871, 'job': 'Editor', 'name': 'John Refoua'}, {'credit_id': '54491a4dc3a3680fc30018ca', 'department': 'Art', 'gender': 0, 'id': 92359, 'job': 'Set Designer', 'name': 'Karl J. 
Martin'}, {'credit_id': '52fe48009251416c750aca35', 'department': 'Camera', 'gender': 1, 'id': 72201, 'job': 'Director of Photography', 'name': 'Chiling Lin'}, {'credit_id': '52fe48009251416c750ac9ff', 'department': 'Crew', 'gender': 0, 'id': 89714, 'job': 'Stunts', 'name': 'Ilram Choi'}, {'credit_id': '54959c529251416e2b004394', 'department': 'Visual Effects', 'gender': 2, 'id': 93214, 'job': 'Visual Effects Supervisor', 'name': 'Steven Quale'}, {'credit_id': '54491edf0e0a267489001c37', 'department': 'Crew', 'gender': 1, 'id': 122607, 'job': 'Dialect Coach', 'name': 'Carla Meyer'}, {'credit_id': '539c485bc3a368653d001a3a', 'department': 'Art', 'gender': 2, 'id': 132585, 'job': 'Art Direction', 'name': 'Nick Bassett'}, {'credit_id': '539c4903c3a368653d001a74', 'department': 'Art', 'gender': 0, 'id': 132596, 'job': 'Art Direction', 'name': 'Jill Cormack'}, {'credit_id': '539c4967c3a368653d001a94', 'department': 'Art', 'gender': 0, 'id': 132604, 'job': 'Art Direction', 'name': 'Andy McLaren'}, {'credit_id': '52fe48009251416c750aca45', 'department': 'Crew', 'gender': 0, 'id': 236696, 'job': 'Motion Capture Artist', 'name': 'Terry Notary'}, {'credit_id': '54959e02c3a3680fc60027d2', 'department': 'Crew', 'gender': 2, 'id': 956198, 'job': 'Stunt Coordinator', 'name': 'Garrett Warren'}, {'credit_id': '54959ca3c3a3686ae300438c', 'department': 'Visual Effects', 'gender': 2, 'id': 957874, 'job': 'Visual Effects Supervisor', 'name': 'Jonathan Rothbart'}, {'credit_id': '570b6f519251412c74001b2f', 'department': 'Art', 'gender': 0, 'id': 957889, 'job': 'Supervising Art Director', 'name': 'Stefan Dechant'}, {'credit_id': '570b6f62c3a3680b77007460', 'department': 'Art', 'gender': 2, 'id': 959555, 'job': 'Supervising Art Director', 'name': 'Todd Cherniawsky'}, {'credit_id': '539c4a3ac3a36810da0021cc', 'department': 'Production', 'gender': 0, 'id': 1016177, 'job': 'Casting', 'name': 'Miranda Rivers'}, {'credit_id': '539c482cc3a36810c1002062', 'department': 'Art', 'gender': 0, 'id': 
1032536, 'job': 'Production Design', 'name': 'Robert Stromberg'}, {'credit_id': '539c4b65c3a36810c9002125', 'department': 'Costume & Make-Up', 'gender': 2, 'id': 1071680, 'job': 'Costume Design', 'name': 'John Harding'}, {'credit_id': '54959e6692514130fc002e4e', 'department': 'Camera', 'gender': 0, 'id': 1177364, 'job': 'Steadicam Operator', 'name': 'Roberto De Angelis'}, {'credit_id': '539c49f1c3a368653d001aac', 'department': 'Costume & Make-Up', 'gender': 2, 'id': 1202850, 'job': 'Makeup Department Head', 'name': 'Mike Smithson'}, {'credit_id': '5495999ec3a3686ae100460c', 'department': 'Visual Effects', 'gender': 0, 'id': 1204668, 'job': 'Visual Effects Producer', 'name': 'Alain Lalanne'}, {'credit_id': '54959cdfc3a3681153002729', 'department': 'Visual Effects', 'gender': 0, 'id': 1206410, 'job': 'Visual Effects Supervisor', 'name': 'Lucas Salton'}, {'credit_id': '549596239251417a81001eae', 'department': 'Crew', 'gender': 0, 'id': 1234266, 'job': 'Post Production Supervisor', 'name': 'Janace Tashjian'}, {'credit_id': '54959c859251416e1e003efe', 'department': 'Visual Effects', 'gender': 0, 'id': 1271932, 'job': 'Visual Effects Supervisor', 'name': 'Stephen Rosenbaum'}, {'credit_id': '5592af28c3a368775a00105f', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 1310064, 'job': 'Makeup Artist', 'name': 'Frankie Karena'}, {'credit_id': '539c4adfc3a36810e300203b', 'department': 'Costume & Make-Up', 'gender': 1, 'id': 1319844, 'job': 'Costume Supervisor', 'name': 'Lisa Lovaas'}, {'credit_id': '54959b579251416e2b004371', 'department': 'Visual Effects', 'gender': 0, 'id': 1327028, 'job': 'Visual Effects Supervisor', 'name': 'Jonathan Fawkner'}, {'credit_id': '539c48a7c3a36810b5001fa7', 'department': 'Art', 'gender': 0, 'id': 1330561, 'job': 'Art Direction', 'name': 'Robert Bavin'}, {'credit_id': '539c4a71c3a36810da0021e0', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 1330567, 'job': 'Costume Supervisor', 'name': 'Anthony Almaraz'}, {'credit_id': 
'539c4a8ac3a36810ba0021e4', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 1330570, 'job': 'Costume Supervisor', 'name': 'Carolyn M. Fenton'}, {'credit_id': '539c4ab6c3a36810da0021f0', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 1330574, 'job': 'Costume Supervisor', 'name': 'Beth Koenigsberg'}, {'credit_id': '54491ab70e0a267480001ba2', 'department': 'Art', 'gender': 0, 'id': 1336191, 'job': 'Set Designer', 'name': 'Sam Page'}, {'credit_id': '544919d9c3a3680fc30018bd', 'department': 'Art', 'gender': 0, 'id': 1339441, 'job': 'Set Designer', 'name': 'Tex Kadonaga'}, {'credit_id': '54491cf50e0a267483001b0c', 'department': 'Editing', 'gender': 0, 'id': 1352422, 'job': 'Dialogue Editor', 'name': 'Kim Foscato'}, {'credit_id': '544919f40e0a26748c001b09', 'department': 'Art', 'gender': 0, 'id': 1352962, 'job': 'Set Designer', 'name': 'Tammy S. Lee'}, {'credit_id': '5495a115c3a3680ff5002d71', 'department': 'Crew', 'gender': 0, 'id': 1357070, 'job': 'Transportation Coordinator', 'name': 'Denny Caira'}, {'credit_id': '5495a12f92514130fc002e94', 'department': 'Crew', 'gender': 0, 'id': 1357071, 'job': 'Transportation Coordinator', 'name': 'James Waitkus'}, {'credit_id': '5495976fc3a36811530026b0', 'department': 'Sound', 'gender': 0, 'id': 1360103, 'job': 'Supervising Sound Editor', 'name': 'Addison Teague'}, {'credit_id': '54491837c3a3680fb1001c5a', 'department': 'Art', 'gender': 2, 'id': 1376887, 'job': 'Set Designer', 'name': 'C. 
Scott Baker'}, {'credit_id': '54491878c3a3680fb4001c9d', 'department': 'Art', 'gender': 0, 'id': 1376888, 'job': 'Set Designer', 'name': 'Luke Caska'}, {'credit_id': '544918dac3a3680fa5001ae0', 'department': 'Art', 'gender': 0, 'id': 1376889, 'job': 'Set Designer', 'name': 'David Chow'}, {'credit_id': '544919110e0a267486001b68', 'department': 'Art', 'gender': 0, 'id': 1376890, 'job': 'Set Designer', 'name': 'Jonathan Dyer'}, {'credit_id': '54491967c3a3680faa001b5e', 'department': 'Art', 'gender': 0, 'id': 1376891, 'job': 'Set Designer', 'name': 'Joseph Hiura'}, {'credit_id': '54491997c3a3680fb1001c8a', 'department': 'Art', 'gender': 0, 'id': 1376892, 'job': 'Art Department Coordinator', 'name': 'Rebecca Jellie'}, {'credit_id': '544919ba0e0a26748f001b42', 'department': 'Art', 'gender': 0, 'id': 1376893, 'job': 'Set Designer', 'name': 'Robert Andrew Johnson'}, {'credit_id': '54491b1dc3a3680faa001b8c', 'department': 'Art', 'gender': 0, 'id': 1376895, 'job': 'Assistant Art Director', 'name': 'Mike Stassi'}, {'credit_id': '54491b79c3a3680fbb001826', 'department': 'Art', 'gender': 0, 'id': 1376897, 'job': 'Construction Coordinator', 'name': 'John Villarino'}, {'credit_id': '54491baec3a3680fb4001ce6', 'department': 'Art', 'gender': 2, 'id': 1376898, 'job': 'Assistant Art Director', 'name': 'Jeffrey Wisniewski'}, {'credit_id': '54491d2fc3a3680fb4001d07', 'department': 'Editing', 'gender': 0, 'id': 1376899, 'job': 'Dialogue Editor', 'name': 'Cheryl Nardi'}, {'credit_id': '54491d86c3a3680fa5001b2f', 'department': 'Editing', 'gender': 0, 'id': 1376901, 'job': 'Dialogue Editor', 'name': 'Marshall Winn'}, {'credit_id': '54491d9dc3a3680faa001bb0', 'department': 'Sound', 'gender': 0, 'id': 1376902, 'job': 'Supervising Sound Editor', 'name': 'Gwendolyn Yates Whittle'}, {'credit_id': '54491dc10e0a267486001bce', 'department': 'Sound', 'gender': 0, 'id': 1376903, 'job': 'Sound Re-Recording Mixer', 'name': 'William Stein'}, {'credit_id': '54491f500e0a26747c001c07', 'department': 
'Crew', 'gender': 0, 'id': 1376909, 'job': 'Choreographer', 'name': 'Lula Washington'}, {'credit_id': '549599239251412c4e002a2e', 'department': 'Visual Effects', 'gender': 0, 'id': 1391692, 'job': 'Visual Effects Producer', 'name': 'Chris Del Conte'}, {'credit_id': '54959d54c3a36831b8001d9a', 'department': 'Visual Effects', 'gender': 2, 'id': 1391695, 'job': 'Visual Effects Supervisor', 'name': 'R. Christopher White'}, {'credit_id': '54959bdf9251412c4e002a66', 'department': 'Visual Effects', 'gender': 0, 'id': 1394070, 'job': 'Visual Effects Supervisor', 'name': 'Dan Lemmon'}, {'credit_id': '5495971d92514132ed002922', 'department': 'Sound', 'gender': 0, 'id': 1394129, 'job': 'Sound Effects Editor', 'name': 'Tim Nielsen'}, {'credit_id': '5592b25792514152cc0011aa', 'department': 'Crew', 'gender': 0, 'id': 1394286, 'job': 'CG Supervisor', 'name': 'Michael Mulholland'}, {'credit_id': '54959a329251416e2b004355', 'department': 'Crew', 'gender': 0, 'id': 1394750, 'job': 'Visual Effects Editor', 'name': 'Thomas Nittmann'}, {'credit_id': '54959d6dc3a3686ae9004401', 'department': 'Visual Effects', 'gender': 0, 'id': 1394755, 'job': 'Visual Effects Supervisor', 'name': 'Edson Williams'}, {'credit_id': '5495a08fc3a3686ae300441c', 'department': 'Editing', 'gender': 0, 'id': 1394953, 'job': 'Digital Intermediate', 'name': 'Christine Carr'}, {'credit_id': '55402d659251413d6d000249', 'department': 'Visual Effects', 'gender': 0, 'id': 1395269, 'job': 'Visual Effects Supervisor', 'name': 'John Bruno'}, {'credit_id': '54959e7b9251416e1e003f3e', 'department': 'Camera', 'gender': 0, 'id': 1398970, 'job': 'Steadicam Operator', 'name': 'David Emmerichs'}, {'credit_id': '54959734c3a3686ae10045e0', 'department': 'Sound', 'gender': 0, 'id': 1400906, 'job': 'Sound Effects Editor', 'name': 'Christopher Scarabosio'}, {'credit_id': '549595dd92514130fc002d79', 'department': 'Production', 'gender': 0, 'id': 1401784, 'job': 'Production Supervisor', 'name': 'Jennifer Teves'}, {'credit_id': 
'549596009251413af70028cc', 'department': 'Production', 'gender': 0, 'id': 1401785, 'job': 'Production Manager', 'name': 'Brigitte Yorke'}, {'credit_id': '549596e892514130fc002d99', 'department': 'Sound', 'gender': 0, 'id': 1401786, 'job': 'Sound Effects Editor', 'name': 'Ken Fischer'}, {'credit_id': '549598229251412c4e002a1c', 'department': 'Crew', 'gender': 0, 'id': 1401787, 'job': 'Special Effects Coordinator', 'name': 'Iain Hutton'}, {'credit_id': '549598349251416e2b00432b', 'department': 'Crew', 'gender': 0, 'id': 1401788, 'job': 'Special Effects Coordinator', 'name': 'Steve Ingram'}, {'credit_id': '54959905c3a3686ae3004324', 'department': 'Visual Effects', 'gender': 0, 'id': 1401789, 'job': 'Visual Effects Producer', 'name': 'Joyce Cox'}, {'credit_id': '5495994b92514132ed002951', 'department': 'Visual Effects', 'gender': 0, 'id': 1401790, 'job': 'Visual Effects Producer', 'name': 'Jenny Foster'}, {'credit_id': '549599cbc3a3686ae1004613', 'department': 'Crew', 'gender': 0, 'id': 1401791, 'job': 'Visual Effects Editor', 'name': 'Christopher Marino'}, {'credit_id': '549599f2c3a3686ae100461e', 'department': 'Crew', 'gender': 0, 'id': 1401792, 'job': 'Visual Effects Editor', 'name': 'Jim Milton'}, {'credit_id': '54959a51c3a3686af3003eb5', 'department': 'Visual Effects', 'gender': 0, 'id': 1401793, 'job': 'Visual Effects Producer', 'name': 'Cyndi Ochs'}, {'credit_id': '54959a7cc3a36811530026f4', 'department': 'Crew', 'gender': 0, 'id': 1401794, 'job': 'Visual Effects Editor', 'name': 'Lucas Putnam'}, {'credit_id': '54959b91c3a3680ff5002cb4', 'department': 'Visual Effects', 'gender': 0, 'id': 1401795, 'job': 'Visual Effects Supervisor', 'name': "Anthony 'Max' Ivins"}, {'credit_id': '54959bb69251412c4e002a5f', 'department': 'Visual Effects', 'gender': 0, 'id': 1401796, 'job': 'Visual Effects Supervisor', 'name': 'John Knoll'}, {'credit_id': '54959cbbc3a3686ae3004391', 'department': 'Visual Effects', 'gender': 2, 'id': 1401799, 'job': 'Visual Effects Supervisor', 
'name': 'Eric Saindon'}, {'credit_id': '54959d06c3a3686ae90043f6', 'department': 'Visual Effects', 'gender': 0, 'id': 1401800, 'job': 'Visual Effects Supervisor', 'name': 'Wayne Stables'}, {'credit_id': '54959d259251416e1e003f11', 'department': 'Visual Effects', 'gender': 0, 'id': 1401801, 'job': 'Visual Effects Supervisor', 'name': 'David Stinnett'}, {'credit_id': '54959db49251413af7002975', 'department': 'Visual Effects', 'gender': 0, 'id': 1401803, 'job': 'Visual Effects Supervisor', 'name': 'Guy Williams'}, {'credit_id': '54959de4c3a3681153002750', 'department': 'Crew', 'gender': 0, 'id': 1401804, 'job': 'Stunt Coordinator', 'name': 'Stuart Thorp'}, {'credit_id': '54959ef2c3a3680fc60027f2', 'department': 'Lighting', 'gender': 0, 'id': 1401805, 'job': 'Best Boy Electric', 'name': 'Giles Coburn'}, {'credit_id': '54959f07c3a3680fc60027f9', 'department': 'Camera', 'gender': 2, 'id': 1401806, 'job': 'Still Photographer', 'name': 'Mark Fellman'}, {'credit_id': '54959f47c3a3681153002774', 'department': 'Lighting', 'gender': 0, 'id': 1401807, 'job': 'Lighting Technician', 'name': 'Scott Sprague'}, {'credit_id': '54959f8cc3a36831b8001df2', 'department': 'Visual Effects', 'gender': 0, 'id': 1401808, 'job': 'Animation Director', 'name': 'Jeremy Hollobon'}, {'credit_id': '54959fa0c3a36831b8001dfb', 'department': 'Visual Effects', 'gender': 0, 'id': 1401809, 'job': 'Animation Director', 'name': 'Orlando Meunier'}, {'credit_id': '54959fb6c3a3686af3003f54', 'department': 'Visual Effects', 'gender': 0, 'id': 1401810, 'job': 'Animation Director', 'name': 'Taisuke Tanimura'}, {'credit_id': '54959fd2c3a36831b8001e02', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 1401812, 'job': 'Set Costumer', 'name': 'Lilia Mishel Acevedo'}, {'credit_id': '54959ff9c3a3686ae300440c', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 1401814, 'job': 'Set Costumer', 'name': 'Alejandro M. 
Hernandez'}, {'credit_id': '5495a0ddc3a3686ae10046fe', 'department': 'Editing', 'gender': 0, 'id': 1401815, 'job': 'Digital Intermediate', 'name': 'Marvin Hall'}, {'credit_id': '5495a1f7c3a3686ae3004443', 'department': 'Production', 'gender': 0, 'id': 1401816, 'job': 'Publicist', 'name': 'Judy Alley'}, {'credit_id': '5592b29fc3a36869d100002f', 'department': 'Crew', 'gender': 0, 'id': 1418381, 'job': 'CG Supervisor', 'name': 'Mike Perry'}, {'credit_id': '5592b23a9251415df8001081', 'department': 'Crew', 'gender': 0, 'id': 1426854, 'job': 'CG Supervisor', 'name': 'Andrew Morley'}, {'credit_id': '55491e1192514104c40002d8', 'department': 'Art', 'gender': 0, 'id': 1438901, 'job': 'Conceptual Design', 'name': 'Seth Engstrom'}, {'credit_id': '5525d5809251417276002b06', 'department': 'Crew', 'gender': 0, 'id': 1447362, 'job': 'Visual Effects Art Director', 'name': 'Eric Oliver'}, {'credit_id': '554427ca925141586500312a', 'department': 'Visual Effects', 'gender': 0, 'id': 1447503, 'job': 'Modeling', 'name': 'Matsune Suzuki'}, {'credit_id': '551906889251415aab001c88', 'department': 'Art', 'gender': 0, 'id': 1447524, 'job': 'Art Department Manager', 'name': 'Paul Tobin'}, {'credit_id': '5592af8492514152cc0010de', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 1452643, 'job': 'Hairstylist', 'name': 'Roxane Griffin'}, {'credit_id': '553d3c109251415852001318', 'department': 'Lighting', 'gender': 0, 'id': 1453938, 'job': 'Lighting Artist', 'name': 'Arun Ram-Mohan'}, {'credit_id': '5592af4692514152d5001355', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 1457305, 'job': 'Makeup Artist', 'name': 'Georgia Lockhart-Adams'}, {'credit_id': '5592b2eac3a36877470012a5', 'department': 'Crew', 'gender': 0, 'id': 1466035, 'job': 'CG Supervisor', 'name': 'Thrain Shadbolt'}, {'credit_id': '5592b032c3a36877450015f1', 'department': 'Crew', 'gender': 0, 'id': 1483220, 'job': 'CG Supervisor', 'name': 'Brad Alexander'}, {'credit_id': '5592b05592514152d80012f6', 'department': 'Crew', 
'gender': 0, 'id': 1483221, 'job': 'CG Supervisor', 'name': 'Shadi Almassizadeh'}, {'credit_id': '5592b090c3a36877570010b5', 'department': 'Crew', 'gender': 0, 'id': 1483222, 'job': 'CG Supervisor', 'name': 'Simon Clutterbuck'}, {'credit_id': '5592b0dbc3a368774b00112c', 'department': 'Crew', 'gender': 0, 'id': 1483223, 'job': 'CG Supervisor', 'name': 'Graeme Demmocks'}, {'credit_id': '5592b0fe92514152db0010c1', 'department': 'Crew', 'gender': 0, 'id': 1483224, 'job': 'CG Supervisor', 'name': 'Adrian Fernandes'}, {'credit_id': '5592b11f9251415df8001059', 'department': 'Crew', 'gender': 0, 'id': 1483225, 'job': 'CG Supervisor', 'name': 'Mitch Gates'}, {'credit_id': '5592b15dc3a3687745001645', 'department': 'Crew', 'gender': 0, 'id': 1483226, 'job': 'CG Supervisor', 'name': 'Jerry Kung'}, {'credit_id': '5592b18e925141645a0004ae', 'department': 'Crew', 'gender': 0, 'id': 1483227, 'job': 'CG Supervisor', 'name': 'Andy Lomas'}, {'credit_id': '5592b1bfc3a368775d0010e7', 'department': 'Crew', 'gender': 0, 'id': 1483228, 'job': 'CG Supervisor', 'name': 'Sebastian Marino'}, {'credit_id': '5592b2049251415df8001078', 'department': 'Crew', 'gender': 0, 'id': 1483229, 'job': 'CG Supervisor', 'name': 'Matthias Menz'}, {'credit_id': '5592b27b92514152d800136a', 'department': 'Crew', 'gender': 0, 'id': 1483230, 'job': 'CG Supervisor', 'name': 'Sergei Nevshupov'}, {'credit_id': '5592b2c3c3a36869e800003c', 'department': 'Crew', 'gender': 0, 'id': 1483231, 'job': 'CG Supervisor', 'name': 'Philippe Rebours'}, {'credit_id': '5592b317c3a36877470012af', 'department': 'Crew', 'gender': 0, 'id': 1483232, 'job': 'CG Supervisor', 'name': 'Michael Takarangi'}, {'credit_id': '5592b345c3a36877470012bb', 'department': 'Crew', 'gender': 0, 'id': 1483233, 'job': 'CG Supervisor', 'name': 'David Weitzberg'}, {'credit_id': '5592b37cc3a368775100113b', 'department': 'Crew', 'gender': 0, 'id': 1483234, 'job': 'CG Supervisor', 'name': 'Ben White'}, {'credit_id': '573c8e2f9251413f5d000094', 'department': 
'Crew', 'gender': 1, 'id': 1621932, 'job': 'Stunts', 'name': 'Min Windle'}]
# Extract the director's name from each movie's crew list
def get_director(x):
    for i in x:
        if i['job'] == 'Director':
            return i['name']
    return np.nan  # no crew member with the job 'Director'
df2['director'] = df2['crew'].apply(get_director)
df2['director']
0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4          Andrew Stanton
              ...
4798     Robert Rodriguez
4799         Edward Burns
4800          Scott Smith
4801          Daniel Hsia
4802     Brian Herzlinger
Name: director, Length: 4803, dtype: object
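To make the behavior of `get_director` concrete, here is a minimal sketch that applies the same logic to small hand-made crew lists. The lists are illustrative stand-ins trimmed to the two keys the function reads (`job` and `name`); they are not rows from the dataset, and the variable names are assumptions introduced for this example.

```python
import numpy as np

def get_director(crew):
    """Return the name of the first crew member whose job is 'Director', else NaN."""
    for member in crew:
        if member['job'] == 'Director':
            return member['name']
    return np.nan

# Hand-made sample crew lists (illustrative, not taken verbatim from the CSV)
crew_with_director = [
    {'job': 'Stunts', 'name': 'Ilram Choi'},
    {'job': 'Director', 'name': 'James Cameron'},
]
crew_without_director = [{'job': 'Stunts', 'name': 'Ilram Choi'}]

found = get_director(crew_with_director)       # a Director entry exists
missing = get_director(crew_without_director)  # crew present, but no Director
empty = get_director([])                       # empty crew list

print(found, missing, empty)
```

Returning `np.nan` rather than an empty string means the missing directors can later be found with the usual pandas null checks such as `isnull()`.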
# Check which rows have a null director (missing directors were stored as NaN)
df2[df2['director'].isnull()]
budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | ... | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | cast | crew | director | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3661 | 0 | [{'id': 18, 'name': 'Drama'}] | NaN | 19615 | [] | en | Flying By | A real estate developer goes to his 25th high ... | 1.546169 | [] | ... | 95.0 | [{"iso_639_1": "en", "name": "English"}] | Released | It's about the music | Flying By | 7.0 | 2 | [{'cast_id': 1, 'character': 'George', 'credit... | [] | NaN |
3670 | 0 | [{'id': 10751, 'name': 'Family'}] | NaN | 447027 | [] | en | Running Forever | After being estranged since her mother's death... | 0.028756 | [{"name": "New Kingdom Pictures", "id": 41671}] | ... | 88.0 | [] | Released | NaN | Running Forever | 0.0 | 0 | [] | [] | NaN |
3729 | 3250000 | [{'id': 18, 'name': 'Drama'}, {'id': 10751, 'n... | http://www.paathefilm.com/ | 26379 | [] | en | Paa | He suffers from a progeria like syndrome. Ment... | 2.126139 | [{"name": "A B Corp", "id": 4502}] | ... | 133.0 | [{"iso_639_1": "hi", "name": "\u0939\u093f\u09... | Released | NaN | Paa | 6.6 | 19 | [{'cast_id': 1, 'character': 'Auro', 'credit_i... | [{'credit_id': '52fe44fec3a368484e042a29', 'de... | NaN |
3977 | 0 | [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam... | NaN | 55831 | [{'id': 10183, 'name': 'independent film'}] | en | Boynton Beach Club | A handful of men and women of a certain age pi... | 0.188870 | [] | ... | 105.0 | [{"iso_639_1": "en", "name": "English"}] | Released | NaN | Boynton Beach Club | 6.8 | 3 | [{'cast_id': 1, 'character': 'Marilyn', 'credi... | [] | NaN |
4068 | 0 | [] | NaN | 371085 | [] | en | Sharkskin | The Post War II story of Manhattan born Mike E... | 0.027801 | [] | ... | 0.0 | [] | Released | NaN | Sharkskin | 0.0 | 0 | [] | [] | NaN |
4105 | 2000000 | [] | NaN | 48382 | [] | en | The Book of Mormon Movie, Volume 1: The Journey | The story of Lehi and his wife Sariah and thei... | 0.031947 | [] | ... | 120.0 | [] | Released | 2600 years ago, one family began a remarkable ... | The Book of Mormon Movie, Volume 1: The Journey | 5.0 | 2 | [{'cast_id': 1, 'character': 'Sam', 'credit_id... | [] | NaN |
4118 | 0 | [] | NaN | 325140 | [] | en | Hum To Mohabbat Karega | Raju, a waiter, is in love with the famous TV ... | 0.001186 | [] | ... | 0.0 | [] | Released | NaN | Hum To Mohabbat Karega | 0.0 | 0 | [] | [] | NaN |
4123 | 7000000 | [{'id': 16, 'name': 'Animation'}, {'id': 10751... | http://www.roadsideromeo.com/ | 20653 | [] | en | Roadside Romeo | This is the story of Romeo. A dude who was liv... | 0.253595 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | ... | 93.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | NaN | Roadside Romeo | 6.7 | 3 | [{'cast_id': 1, 'character': 'Romeo', 'credit_... | [] | NaN |
4247 | 1 | [{'id': 10749, 'name': 'Romance'}, {'id': 35, ... | NaN | 361505 | [] | en | Me You and Five Bucks | A womanizing yet lovable loser, Charlie, a wai... | 0.094105 | [] | ... | 90.0 | [] | Released | A story about second, second chances | Me You and Five Bucks | 10.0 | 2 | [] | [] | NaN |
4305 | 0 | [{'id': 35, 'name': 'Comedy'}, {'id': 10402, '... | NaN | 114065 | [] | en | Down & Out With The Dolls | The raunchy, spunky tale of the rise and fall ... | 0.002386 | [] | ... | 88.0 | [] | Released | Ain't Rock 'N' Roll a bitch. | Down & Out With The Dolls | 0.0 | 0 | [] | [] | NaN |
4314 | 1200000 | [] | NaN | 137955 | [] | en | Crowsnest | In late summer of 2011, five young friends on ... | 0.057564 | [] | ... | 84.0 | [] | Released | NaN | Crowsnest | 4.8 | 12 | [] | [] | NaN |
4322 | 0 | [{'id': 99, 'name': 'Documentary'}] | NaN | 102840 | [] | en | Sex With Strangers | For some married couples, sex is an obsession ... | 0.014406 | [] | ... | 0.0 | [] | Released | NaN | Sex With Strangers | 5.0 | 1 | [] | [] | NaN |
4374 | 0 | [{'id': 35, 'name': 'Comedy'}] | NaN | 47686 | [{'id': 10183, 'name': 'independent film'}] | en | Dream with the Fishes | Terry is a suicidal voyeur who treats a dying ... | 0.948316 | [] | ... | 97.0 | [{"iso_639_1": "en", "name": "English"}] | Released | An oddball odyssey about voyeurism, LSD and nu... | Dream with the Fishes | 7.7 | 10 | [{'cast_id': 1, 'character': 'Terry', 'credit_... | [{'credit_id': '555e51909251417e5f000b42', 'de... | NaN |
4401 | 0 | [{'id': 28, 'name': 'Action'}, {'id': 35, 'nam... | NaN | 43630 | [] | en | The Helix... Loaded | | 0.020600 | [] | ... | 97.0 | [{"iso_639_1": "en", "name": "English"}] | Rumored | NaN | The Helix... Loaded | 4.8 | 2 | [] | [] | NaN |
4405 | 0 | [{'id': 10751, 'name': 'Family'}, {'id': 35, '... | https://www.epicbuzz.net/movies/karachi-se-lahore | 357441 | [] | en | Karachi se Lahore | A road trip from Karachi to Lahore where 5 fri... | 0.060003 | [] | ... | 0.0 | [{"iso_639_1": "ur", "name": "\u0627\u0631\u06... | Released | NaN | Karachi se Lahore | 8.0 | 1 | [{'cast_id': 0, 'character': '', 'credit_id': ... | [] | NaN |
4458 | 0 | [] | NaN | 279759 | [] | en | Harrison Montgomery | Film from Daniel Davila | 0.006943 | [] | ... | 0.0 | [] | Released | NaN | Harrison Montgomery | 0.0 | 0 | [] | [] | NaN |
4504 | 0 | [] | NaN | 331493 | [] | en | Light from the Darkroom | Light in the Darkroom is the story of two best... | 0.012942 | [] | ... | 0.0 | [] | Released | NaN | Light from the Darkroom | 0.0 | 0 | [] | [] | NaN |
4553 | 0 | [] | NaN | 380097 | [] | en | America Is Still the Place | 1971 post civil rights San Francisco seemed li... | 0.000000 | [] | ... | 0.0 | [] | Released | NaN | America Is Still the Place | 0.0 | 0 | [] | [] | NaN |
4562 | 500000 | [] | NaN | 297100 | [] | en | The Little Ponderosa Zoo | The Little Ponderosa Zoo is preparing for thei... | 0.073079 | [] | ... | 84.0 | [{"iso_639_1": "en", "name": "English"}] | Released | NaN | The Little Ponderosa Zoo | 2.0 | 1 | [] | [] | NaN |
4566 | 0 | [] | NaN | 325579 | [] | en | Diamond Ruff | Action - Orphan, con artist, crime boss and mi... | 0.165257 | [] | ... | 0.0 | [] | Released | NaN | Diamond Ruff | 2.4 | 4 | [] | [] | NaN |
4571 | 0 | [] | NaN | 328307 | [] | en | Rise of the Entrepreneur: The Search for a Bet... | The world is changing faster than ever. Techno... | 0.052942 | [] | ... | 0.0 | [] | Released | NaN | Rise of the Entrepreneur: The Search for a Bet... | 8.0 | 1 | [] | [] | NaN |
4583 | 0 | [{'id': 99, 'name': 'Documentary'}] | http://www.iwantyourmoney.net/ | 47546 | [] | en | I Want Your Money | Two versions of the American dream now stand i... | 0.084344 | [] | ... | 92.0 | [] | Released | The film contrasts two views of role that the ... | I Want Your Money | 3.8 | 5 | [] | [] | NaN |
4589 | 0 | [{'id': 18, 'name': 'Drama'}, {'id': 9648, 'na... | NaN | 43743 | [{'id': 10183, 'name': 'independent film'}] | en | Fabled | Joseph just broke up with his girlfriend and i... | 0.003352 | [] | ... | 84.0 | [] | Released | There once was a wolf named Lupold... | Fabled | 0.0 | 0 | [] | [] | NaN |
4633 | 0 | [] | NaN | 300327 | [] | en | Death Calls | An action-packed love story on the Mexican bor... | 0.005883 | [] | ... | 0.0 | [] | Released | NaN | Death Calls | 0.0 | 0 | [] | [] | NaN |
4638 | 300000 | [{'id': 18, 'name': 'Drama'}, {'id': 28, 'name... | NaN | 378237 | [] | en | Amidst the Devil's Wings | Prequel to "5th of a Degree." | 0.018087 | [{"name": "Daniel Columbie Films & Productions... | ... | 90.0 | [{"iso_639_1": "en", "name": "English"}] | Released | Prequel to "5th of a Degree." | Amidst the Devil's Wings | 0.0 | 0 | [] | [] | NaN |
4644 | 0 | [{'id': 27, 'name': 'Horror'}] | NaN | 325123 | [] | en | Teeth and Blood | A beautiful diva is murdered on the set of hor... | 0.055325 | [] | ... | 96.0 | [{"iso_639_1": "en", "name": "English"}] | Released | NaN | Teeth and Blood | 3.0 | 1 | [{'cast_id': 0, 'character': 'Vincent Augustin... | [] | NaN |
4657 | 0 | [] | NaN | 320435 | [] | en | UnDivided | UnDivided documents the true story of how a su... | 0.010607 | [] | ... | 0.0 | [] | Released | NaN | UnDivided | 0.0 | 0 | [] | [] | NaN |
4662 | 0 | [{'id': 35, 'name': 'Comedy'}] | NaN | 40963 | [{'id': 10183, 'name': 'independent film'}] | en | Little Big Top | An aging out of work clown returns to his smal... | 0.092100 | [{"name": "Fly High Films", "id": 24248}] | ... | 0.0 | [{"iso_639_1": "en", "name": "English"}] | Rumored | NaN | Little Big Top | 10.0 | 1 | [{'cast_id': 0, 'character': 'Seymour', 'credi... | [] | NaN |
4674 | 0 | [] | NaN | 194588 | [] | en | Short Cut to Nirvana: Kumbh Mela | Every 12 years over 70 million pilgrims gather... | 0.004998 | [] | ... | 85.0 | [] | Released | NaN | Short Cut to Nirvana: Kumbh Mela | 0.0 | 0 | [] | [] | NaN |
4716 | 0 | [] | NaN | 38786 | [] | en | The Blood of My Brother: A Story of Death in Iraq | THE BLOOD OF MY BROTHER goes behind the scenes... | 0.005256 | [] | ... | 90.0 | [] | Released | NaN | The Blood of My Brother: A Story of Death in Iraq | 0.0 | 0 | [] | [] | NaN |
30 rows × 23 columns
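The table above lists the rows, but two follow-up questions are easier to answer with a count: how many directors are missing, and whether the crew list was empty or simply had no `Director` entry (as appears to be the case for rows 3729 and 4374). The sketch below shows one way to check this on a toy DataFrame. The frame `toy` and its rows are hypothetical stand-ins for `df2`; only the column names `crew` and `director` come from the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df2 (hypothetical rows; only 'title', 'crew', 'director' are modeled)
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'crew': [
        [{'job': 'Director', 'name': 'Someone'}],
        [],                                      # no crew recorded at all
        [{'job': 'Producer', 'name': 'Other'}],  # crew recorded, but no Director entry
    ],
    'director': ['Someone', np.nan, np.nan],
})

# Number of movies without a director
n_missing = toy['director'].isnull().sum()

# Split the missing cases by whether any crew was recorded
crew_len = toy['crew'].apply(len)
no_crew = toy[toy['director'].isnull() & (crew_len == 0)]
crew_but_no_director = toy[toy['director'].isnull() & (crew_len > 0)]

print(n_missing)
print(no_crew['title'].tolist(), crew_but_no_director['title'].tolist())
```

On the real data, the same expressions with `df2` in place of `toy` should give the count matching the "30 rows" summary above, assuming `crew` has already been parsed into Python lists as in the earlier cells.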
df2.loc[0, 'cast']
[{'cast_id': 242, 'character': 'Jake Sully', 'credit_id': '5602a8a7c3a3685532001c9a', 'gender': 2, 'id': 65731, 'name': 'Sam Worthington', 'order': 0}, {'cast_id': 3, 'character': 'Neytiri', 'credit_id': '52fe48009251416c750ac9cb', 'gender': 1, 'id': 8691, 'name': 'Zoe Saldana', 'order': 1}, {'cast_id': 25, 'character': 'Dr. Grace Augustine', 'credit_id': '52fe48009251416c750aca39', 'gender': 1, 'id': 10205, 'name': 'Sigourney Weaver', 'order': 2}, {'cast_id': 4, 'character': 'Col. Quaritch', 'credit_id': '52fe48009251416c750ac9cf', 'gender': 2, 'id': 32747, 'name': 'Stephen Lang', 'order': 3}, {'cast_id': 5, 'character': 'Trudy Chacon', 'credit_id': '52fe48009251416c750ac9d3', 'gender': 1, 'id': 17647, 'name': 'Michelle Rodriguez', 'order': 4}, {'cast_id': 8, 'character': 'Selfridge', 'credit_id': '52fe48009251416c750ac9e1', 'gender': 2, 'id': 1771, 'name': 'Giovanni Ribisi', 'order': 5}, {'cast_id': 7, 'character': 'Norm Spellman', 'credit_id': '52fe48009251416c750ac9dd', 'gender': 2, 'id': 59231, 'name': 'Joel David Moore', 'order': 6}, {'cast_id': 9, 'character': 'Moat', 'credit_id': '52fe48009251416c750ac9e5', 'gender': 1, 'id': 30485, 'name': 'CCH Pounder', 'order': 7}, {'cast_id': 11, 'character': 'Eytukan', 'credit_id': '52fe48009251416c750ac9ed', 'gender': 2, 'id': 15853, 'name': 'Wes Studi', 'order': 8}, {'cast_id': 10, 'character': "Tsu'Tey", 'credit_id': '52fe48009251416c750ac9e9', 'gender': 2, 'id': 10964, 'name': 'Laz Alonso', 'order': 9}, {'cast_id': 12, 'character': 'Dr. 
Max Patel', 'credit_id': '52fe48009251416c750ac9f1', 'gender': 2, 'id': 95697, 'name': 'Dileep Rao', 'order': 10}, {'cast_id': 13, 'character': 'Lyle Wainfleet', 'credit_id': '52fe48009251416c750ac9f5', 'gender': 2, 'id': 98215, 'name': 'Matt Gerald', 'order': 11}, {'cast_id': 32, 'character': 'Private Fike', 'credit_id': '52fe48009251416c750aca5b', 'gender': 2, 'id': 154153, 'name': 'Sean Anthony Moran', 'order': 12}, {'cast_id': 33, 'character': 'Cryo Vault Med Tech', 'credit_id': '52fe48009251416c750aca5f', 'gender': 2, 'id': 397312, 'name': 'Jason Whyte', 'order': 13}, {'cast_id': 34, 'character': 'Venture Star Crew Chief', 'credit_id': '52fe48009251416c750aca63', 'gender': 2, 'id': 42317, 'name': 'Scott Lawrence', 'order': 14}, {'cast_id': 35, 'character': 'Lock Up Trooper', 'credit_id': '52fe48009251416c750aca67', 'gender': 2, 'id': 986734, 'name': 'Kelly Kilgour', 'order': 15}, {'cast_id': 36, 'character': 'Shuttle Pilot', 'credit_id': '52fe48009251416c750aca6b', 'gender': 0, 'id': 1207227, 'name': 'James Patrick Pitt', 'order': 16}, {'cast_id': 37, 'character': 'Shuttle Co-Pilot', 'credit_id': '52fe48009251416c750aca6f', 'gender': 0, 'id': 1180936, 'name': 'Sean Patrick Murphy', 'order': 17}, {'cast_id': 38, 'character': 'Shuttle Crew Chief', 'credit_id': '52fe48009251416c750aca73', 'gender': 2, 'id': 1019578, 'name': 'Peter Dillon', 'order': 18}, {'cast_id': 39, 'character': 'Tractor Operator / Troupe', 'credit_id': '52fe48009251416c750aca77', 'gender': 0, 'id': 91443, 'name': 'Kevin Dorman', 'order': 19}, {'cast_id': 40, 'character': 'Dragon Gunship Pilot', 'credit_id': '52fe48009251416c750aca7b', 'gender': 2, 'id': 173391, 'name': 'Kelson Henderson', 'order': 20}, {'cast_id': 41, 'character': 'Dragon Gunship Gunner', 'credit_id': '52fe48009251416c750aca7f', 'gender': 0, 'id': 1207236, 'name': 'David Van Horn', 'order': 21}, {'cast_id': 42, 'character': 'Dragon Gunship Navigator', 'credit_id': '52fe48009251416c750aca83', 'gender': 0, 'id': 215913, 'name': 
'Jacob Tomuri', 'order': 22}, {'cast_id': 43, 'character': 'Suit #1', 'credit_id': '52fe48009251416c750aca87', 'gender': 0, 'id': 143206, 'name': 'Michael Blain-Rozgay', 'order': 23}, {'cast_id': 44, 'character': 'Suit #2', 'credit_id': '52fe48009251416c750aca8b', 'gender': 2, 'id': 169676, 'name': 'Jon Curry', 'order': 24}, {'cast_id': 46, 'character': 'Ambient Room Tech', 'credit_id': '52fe48009251416c750aca8f', 'gender': 0, 'id': 1048610, 'name': 'Luke Hawker', 'order': 25}, {'cast_id': 47, 'character': 'Ambient Room Tech / Troupe', 'credit_id': '52fe48009251416c750aca93', 'gender': 0, 'id': 42288, 'name': 'Woody Schultz', 'order': 26}, {'cast_id': 48, 'character': 'Horse Clan Leader', 'credit_id': '52fe48009251416c750aca97', 'gender': 2, 'id': 68278, 'name': 'Peter Mensah', 'order': 27}, {'cast_id': 49, 'character': 'Link Room Tech', 'credit_id': '52fe48009251416c750aca9b', 'gender': 0, 'id': 1207247, 'name': 'Sonia Yee', 'order': 28}, {'cast_id': 50, 'character': 'Basketball Avatar / Troupe', 'credit_id': '52fe48009251416c750aca9f', 'gender': 1, 'id': 1207248, 'name': 'Jahnel Curfman', 'order': 29}, {'cast_id': 51, 'character': 'Basketball Avatar', 'credit_id': '52fe48009251416c750acaa3', 'gender': 0, 'id': 89714, 'name': 'Ilram Choi', 'order': 30}, {'cast_id': 52, 'character': "Na'vi Child", 'credit_id': '52fe48009251416c750acaa7', 'gender': 0, 'id': 1207249, 'name': 'Kyla Warren', 'order': 31}, {'cast_id': 53, 'character': 'Troupe', 'credit_id': '52fe48009251416c750acaab', 'gender': 0, 'id': 1207250, 'name': 'Lisa Roumain', 'order': 32}, {'cast_id': 54, 'character': 'Troupe', 'credit_id': '52fe48009251416c750acaaf', 'gender': 1, 'id': 83105, 'name': 'Debra Wilson', 'order': 33}, {'cast_id': 57, 'character': 'Troupe', 'credit_id': '52fe48009251416c750acabb', 'gender': 0, 'id': 1207253, 'name': 'Chris Mala', 'order': 34}, {'cast_id': 55, 'character': 'Troupe', 'credit_id': '52fe48009251416c750acab3', 'gender': 0, 'id': 1207251, 'name': 'Taylor Kibby', 'order': 
35}, {'cast_id': 56, 'character': 'Troupe', 'credit_id': '52fe48009251416c750acab7', 'gender': 0, 'id': 1207252, 'name': 'Jodie Landau', 'order': 36}, {'cast_id': 58, 'character': 'Troupe', 'credit_id': '52fe48009251416c750acabf', 'gender': 0, 'id': 1207254, 'name': 'Julie Lamm', 'order': 37}, {'cast_id': 59, 'character': 'Troupe', 'credit_id': '52fe48009251416c750acac3', 'gender': 0, 'id': 1207257, 'name': 'Cullen B. Madden', 'order': 38}, {'cast_id': 60, 'character': 'Troupe', 'credit_id': '52fe48009251416c750acac7', 'gender': 0, 'id': 1207259, 'name': 'Joseph Brady Madden', 'order': 39}, {'cast_id': 61, 'character': 'Troupe', 'credit_id': '52fe48009251416c750acacb', 'gender': 0, 'id': 1207262, 'name': 'Frankie Torres', 'order': 40}, {'cast_id': 62, 'character': 'Troupe', 'credit_id': '52fe48009251416c750acacf', 'gender': 1, 'id': 1158600, 'name': 'Austin Wilson', 'order': 41}, {'cast_id': 63, 'character': 'Troupe', 'credit_id': '52fe48019251416c750acad3', 'gender': 1, 'id': 983705, 'name': 'Sara Wilson', 'order': 42}, {'cast_id': 64, 'character': 'Troupe', 'credit_id': '52fe48019251416c750acad7', 'gender': 0, 'id': 1207263, 'name': 'Tamica Washington-Miller', 'order': 43}, {'cast_id': 65, 'character': 'Op Center Staff', 'credit_id': '52fe48019251416c750acadb', 'gender': 1, 'id': 1145098, 'name': 'Lucy Briant', 'order': 44}, {'cast_id': 66, 'character': 'Op Center Staff', 'credit_id': '52fe48019251416c750acadf', 'gender': 2, 'id': 33305, 'name': 'Nathan Meister', 'order': 45}, {'cast_id': 67, 'character': 'Op Center Staff', 'credit_id': '52fe48019251416c750acae3', 'gender': 0, 'id': 1207264, 'name': 'Gerry Blair', 'order': 46}, {'cast_id': 68, 'character': 'Op Center Staff', 'credit_id': '52fe48019251416c750acae7', 'gender': 2, 'id': 33311, 'name': 'Matthew Chamberlain', 'order': 47}, {'cast_id': 69, 'character': 'Op Center Staff', 'credit_id': '52fe48019251416c750acaeb', 'gender': 0, 'id': 1207265, 'name': 'Paul Yates', 'order': 48}, {'cast_id': 70, 'character': 
'Op Center Duty Officer', 'credit_id': '52fe48019251416c750acaef', 'gender': 0, 'id': 1207266, 'name': 'Wray Wilson', 'order': 49}, {'cast_id': 71, 'character': 'Op Center Staff', 'credit_id': '52fe48019251416c750acaf3', 'gender': 2, 'id': 54492, 'name': 'James Gaylyn', 'order': 50}, {'cast_id': 72, 'character': 'Dancer', 'credit_id': '52fe48019251416c750acaf7', 'gender': 0, 'id': 1207267, 'name': 'Melvin Leno Clark III', 'order': 51}, {'cast_id': 73, 'character': 'Dancer', 'credit_id': '52fe48019251416c750acafb', 'gender': 0, 'id': 1207268, 'name': 'Carvon Futrell', 'order': 52}, {'cast_id': 74, 'character': 'Dancer', 'credit_id': '52fe48019251416c750acaff', 'gender': 0, 'id': 1207269, 'name': 'Brandon Jelkes', 'order': 53}, {'cast_id': 75, 'character': 'Dancer', 'credit_id': '52fe48019251416c750acb03', 'gender': 0, 'id': 1207270, 'name': 'Micah Moch', 'order': 54}, {'cast_id': 76, 'character': 'Dancer', 'credit_id': '52fe48019251416c750acb07', 'gender': 0, 'id': 1207271, 'name': 'Hanniyah Muhammad', 'order': 55}, {'cast_id': 77, 'character': 'Dancer', 'credit_id': '52fe48019251416c750acb0b', 'gender': 0, 'id': 1207272, 'name': 'Christopher Nolen', 'order': 56}, {'cast_id': 78, 'character': 'Dancer', 'credit_id': '52fe48019251416c750acb0f', 'gender': 0, 'id': 1207273, 'name': 'Christa Oliver', 'order': 57}, {'cast_id': 79, 'character': 'Dancer', 'credit_id': '52fe48019251416c750acb13', 'gender': 0, 'id': 1207274, 'name': 'April Marie Thomas', 'order': 58}, {'cast_id': 80, 'character': 'Dancer', 'credit_id': '52fe48019251416c750acb17', 'gender': 0, 'id': 1207275, 'name': 'Bravita A. 
Threatt', 'order': 59}, {'cast_id': 81, 'character': 'Mining Chief (uncredited)', 'credit_id': '52fe48019251416c750acb1b', 'gender': 0, 'id': 1207276, 'name': 'Colin Bleasdale', 'order': 60}, {'cast_id': 82, 'character': 'Veteran Miner (uncredited)', 'credit_id': '52fe48019251416c750acb1f', 'gender': 0, 'id': 107969, 'name': 'Mike Bodnar', 'order': 61}, {'cast_id': 83, 'character': 'Richard (uncredited)', 'credit_id': '52fe48019251416c750acb23', 'gender': 0, 'id': 1207278, 'name': 'Matt Clayton', 'order': 62}, {'cast_id': 84, 'character': "Nav'i (uncredited)", 'credit_id': '52fe48019251416c750acb27', 'gender': 1, 'id': 147898, 'name': 'Nicole Dionne', 'order': 63}, {'cast_id': 85, 'character': 'Trooper (uncredited)', 'credit_id': '52fe48019251416c750acb2b', 'gender': 0, 'id': 1207280, 'name': 'Jamie Harrison', 'order': 64}, {'cast_id': 86, 'character': 'Trooper (uncredited)', 'credit_id': '52fe48019251416c750acb2f', 'gender': 0, 'id': 1207281, 'name': 'Allan Henry', 'order': 65}, {'cast_id': 87, 'character': 'Ground Technician (uncredited)', 'credit_id': '52fe48019251416c750acb33', 'gender': 2, 'id': 1207282, 'name': 'Anthony Ingruber', 'order': 66}, {'cast_id': 88, 'character': 'Flight Crew Mechanic (uncredited)', 'credit_id': '52fe48019251416c750acb37', 'gender': 0, 'id': 1207283, 'name': 'Ashley Jeffery', 'order': 67}, {'cast_id': 14, 'character': 'Samson Pilot', 'credit_id': '52fe48009251416c750ac9f9', 'gender': 0, 'id': 98216, 'name': 'Dean Knowsley', 'order': 68}, {'cast_id': 89, 'character': 'Trooper (uncredited)', 'credit_id': '52fe48019251416c750acb3b', 'gender': 0, 'id': 1201399, 'name': 'Joseph Mika-Hunt', 'order': 69}, {'cast_id': 90, 'character': 'Banshee (uncredited)', 'credit_id': '52fe48019251416c750acb3f', 'gender': 0, 'id': 236696, 'name': 'Terry Notary', 'order': 70}, {'cast_id': 91, 'character': 'Soldier (uncredited)', 'credit_id': '52fe48019251416c750acb43', 'gender': 0, 'id': 1207287, 'name': 'Kai Pantano', 'order': 71}, {'cast_id': 92, 
'character': 'Blast Technician (uncredited)', 'credit_id': '52fe48019251416c750acb47', 'gender': 0, 'id': 1207288, 'name': 'Logan Pithyou', 'order': 72}, {'cast_id': 93, 'character': 'Vindum Raah (uncredited)', 'credit_id': '52fe48019251416c750acb4b', 'gender': 0, 'id': 1207289, 'name': 'Stuart Pollock', 'order': 73}, {'cast_id': 94, 'character': 'Hero (uncredited)', 'credit_id': '52fe48019251416c750acb4f', 'gender': 0, 'id': 584868, 'name': 'Raja', 'order': 74}, {'cast_id': 95, 'character': 'Ops Centreworker (uncredited)', 'credit_id': '52fe48019251416c750acb53', 'gender': 0, 'id': 1207290, 'name': 'Gareth Ruck', 'order': 75}, {'cast_id': 96, 'character': 'Engineer (uncredited)', 'credit_id': '52fe48019251416c750acb57', 'gender': 0, 'id': 1062463, 'name': 'Rhian Sheehan', 'order': 76}, {'cast_id': 97, 'character': "Col. Quaritch's Mech Suit (uncredited)", 'credit_id': '52fe48019251416c750acb5b', 'gender': 0, 'id': 60656, 'name': 'T. J. Storm', 'order': 77}, {'cast_id': 98, 'character': 'Female Marine (uncredited)', 'credit_id': '52fe48019251416c750acb5f', 'gender': 0, 'id': 1207291, 'name': 'Jodie Taylor', 'order': 78}, {'cast_id': 99, 'character': 'Ikran Clan Leader (uncredited)', 'credit_id': '52fe48019251416c750acb63', 'gender': 1, 'id': 1186027, 'name': 'Alicia Vela-Bailey', 'order': 79}, {'cast_id': 100, 'character': 'Geologist (uncredited)', 'credit_id': '52fe48019251416c750acb67', 'gender': 0, 'id': 1207292, 'name': 'Richard Whiteside', 'order': 80}, {'cast_id': 101, 'character': "Na'vi (uncredited)", 'credit_id': '52fe48019251416c750acb6b', 'gender': 0, 'id': 103259, 'name': 'Nikie Zambo', 'order': 81}, {'cast_id': 102, 'character': 'Ambient Room Tech / Troupe', 'credit_id': '52fe48019251416c750acb6f', 'gender': 1, 'id': 42286, 'name': 'Julene Renee', 'order': 82}]
df2.loc[0, 'genres']
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 878, 'name': 'Science Fiction'}]
df2.loc[0, 'keywords']
[{'id': 1463, 'name': 'culture clash'}, {'id': 2964, 'name': 'future'}, {'id': 3386, 'name': 'space war'}, {'id': 3388, 'name': 'space colony'}, {'id': 3679, 'name': 'society'}, {'id': 3801, 'name': 'space travel'}, {'id': 9685, 'name': 'futuristic'}, {'id': 9840, 'name': 'romance'}, {'id': 9882, 'name': 'space'}, {'id': 9951, 'name': 'alien'}, {'id': 10148, 'name': 'tribe'}, {'id': 10158, 'name': 'alien planet'}, {'id': 10987, 'name': 'cgi'}, {'id': 11399, 'name': 'marine'}, {'id': 13065, 'name': 'soldier'}, {'id': 14643, 'name': 'battle'}, {'id': 14720, 'name': 'love affair'}, {'id': 165431, 'name': 'anti war'}, {'id': 193554, 'name': 'power relations'}, {'id': 206690, 'name': 'mind and soul'}, {'id': 209714, 'name': '3d'}]
# Extract only the 'name' values, keeping at most the first three entries
def get_list(x):
    if isinstance(x, list):  # first check that x is a list
        names = [i['name'] for i in x]
        if len(names) > 3:
            names = names[:3]
        return names
    return []
features = ['cast', 'keywords', 'genres']
for feature in features:
    df2[feature] = df2[feature].apply(get_list)
# The data is large, so check it with head(3)
# Confirm that cast, keywords, and genres in each row now hold up to three names
df2[['title', 'cast', 'director', 'keywords', 'genres']].head(3)
| | title | cast | director | keywords | genres |
---|---|---|---|---|---|
0 | Avatar | [Sam Worthington, Zoe Saldana, Sigourney Weaver] | James Cameron | [culture clash, future, space war] | [Action, Adventure, Fantasy] |
1 | Pirates of the Caribbean: At World's End | [Johnny Depp, Orlando Bloom, Keira Knightley] | Gore Verbinski | [ocean, drug abuse, exotic island] | [Adventure, Fantasy, Action] |
2 | Spectre | [Daniel Craig, Christoph Waltz, Léa Seydoux] | Sam Mendes | [spy, based on novel, secret agent] | [Action, Adventure, Crime] |
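As a quick check of what `get_list` does, the sketch below runs it on the Avatar genres shown earlier and on two non-list inputs. The non-list inputs are assumptions added for illustration: a value that is still an unparsed JSON string, or a missing value, silently becomes an empty list, which is worth knowing if a column ever comes out empty.

```python
def get_list(x):
    """Return up to the first three 'name' values from a list of dicts."""
    if isinstance(x, list):
        return [i['name'] for i in x][:3]
    return []

# Same shape as df2.loc[0, 'genres'] after parsing
sample = [{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'},
          {'id': 14, 'name': 'Fantasy'}, {'id': 878, 'name': 'Science Fiction'}]

top3 = get_list(sample)                                  # list input
unparsed = get_list('[{"id": 28, "name": "Action"}]')    # still a string (assumed case)
missing = get_list(float('nan'))                         # missing value (assumed case)

print(top3, unparsed, missing)
```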
# Convert to lowercase and remove spaces
def clean_data(x):
    if isinstance(x, list):  # list of names
        return [i.replace(' ', '').lower() for i in x]
    elif isinstance(x, str):  # single string (e.g. director)
        return x.replace(' ', '').lower()
    else:  # anything else (e.g. a missing value)
        return ''
features = ['cast', 'keywords', 'director', 'genres']
for feature in features:
    df2[feature] = df2[feature].apply(clean_data)
df2[['title', 'cast', 'director', 'keywords', 'genres']].head(3)
| | title | cast | director | keywords | genres |
---|---|---|---|---|---|
0 | Avatar | [samworthington, zoesaldana, sigourneyweaver] | jamescameron | [cultureclash, future, spacewar] | [action, adventure, fantasy] |
1 | Pirates of the Caribbean: At World's End | [johnnydepp, orlandobloom, keiraknightley] | goreverbinski | [ocean, drugabuse, exoticisland] | [adventure, fantasy, action] |
2 | Spectre | [danielcraig, christophwaltz, léaseydoux] | sammendes | [spy, basedonnovel, secretagent] | [action, adventure, crime] |
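The reason for stripping spaces may be easier to see with a small sketch. The two names below are illustrative only (they are not taken from the outputs above): if the space is kept, a word-level vectorizer splits each name and treats the shared first name as a common token, whereas the joined form keeps each person distinct.

```python
from sklearn.feature_extraction.text import CountVectorizer

names = ['Johnny Depp', 'Johnny Galecki']  # illustrative names

def vocab(docs):
    """Return the sorted vocabulary a default CountVectorizer learns from docs."""
    return sorted(CountVectorizer().fit(docs).vocabulary_)

raw_vocab = vocab([n.lower() for n in names])
clean_vocab = vocab([n.lower().replace(' ', '') for n in names])

print(raw_vocab)    # first and last names become separate tokens
print(clean_vocab)  # one token per person
```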
# Join the features above into a single space-separated string (no commas)
def create_soup(x):
    return ' '.join(x['keywords']) + ' ' + ' '.join(x['cast']) + ' ' + x['director'] + ' ' + ' '.join(x['genres'])
df2['soup'] = df2.apply(create_soup, axis=1)
df2['soup']  # the newly created 'soup' column
0       cultureclash future spacewar samworthington zo...
1       ocean drugabuse exoticisland johnnydepp orland...
2       spy basedonnovel secretagent danielcraig chris...
3       dccomics crimefighter terrorist christianbale ...
4       basedonnovel mars medallion taylorkitsch lynnc...
        ...
4798    unitedstates–mexicobarrier legs arms carlosgal...
4799     edwardburns kerrybishé marshadietlein edwardb...
4800    date loveatfirstsight narration ericmabius kri...
4801     danielhenney elizacoupe billpaxton danielhsia
4802    obsession camcorder crush drewbarrymore brianh...
Name: soup, Length: 4803, dtype: object
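For the plot overviews we used TfidfVectorizer, which dropped unneeded English stop words and also down-weights terms that appear in many movies.
That weighting is not wanted here: an actor or director who appears in many movies should not count for less. We only want plain token counts, so this time we use CountVectorizer.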
from sklearn.feature_extraction.text import CountVectorizer
count = CountVectorizer(stop_words='english')  # stop_words added just in case
count_matrix = count.fit_transform(df2['soup'])
count_matrix
<4803x11520 sparse matrix of type '<class 'numpy.int64'>' with 42935 stored elements in Compressed Sparse Row format>
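To make the difference between the two vectorizers concrete, here is a minimal sketch on three toy soups (made up for illustration, not rows from the dataset). The director token appears in every document, so TF-IDF gives it a smaller weight than the rarer keyword, while the raw counts treat both the same.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy soups: the same director appears in all three (illustrative data)
docs = ['mars nasa ridleyscott', 'alien ridleyscott', 'gladiator ridleyscott']

counts = CountVectorizer().fit(docs)
tfidf = TfidfVectorizer().fit(docs)

# Vector for the first document under each scheme
count_row = counts.transform(docs[:1]).toarray()[0]
tfidf_row = tfidf.transform(docs[:1]).toarray()[0]

count_mars = count_row[counts.vocabulary_['mars']]
count_director = count_row[counts.vocabulary_['ridleyscott']]
tfidf_mars = tfidf_row[tfidf.vocabulary_['mars']]
tfidf_director = tfidf_row[tfidf.vocabulary_['ridleyscott']]

print('counts:', count_mars, count_director)
print('tf-idf:', round(tfidf_mars, 3), round(tfidf_director, 3))
```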
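Earlier we computed cosine similarity with the linear_kernel function. That shortcut works for TF-IDF vectors because they are already normalized to unit length, so the dot product equals the cosine.
Count vectors are not normalized, so this time we use the cosine_similarity function instead.
# Similarity measure: cosine similarity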
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim2 = cosine_similarity(count_matrix, count_matrix)
cosine_sim2
array([[1. , 0.3, 0.2, ..., 0. , 0. , 0. ],
       [0.3, 1. , 0.2, ..., 0. , 0. , 0. ],
       [0.2, 0.2, 1. , ..., 0. , 0. , 0. ],
       ...,
       [0. , 0. , 0. , ..., 1. , 0. , 0. ],
       [0. , 0. , 0. , ..., 0. , 1. , 0. ],
       [0. , 0. , 0. , ..., 0. , 0. , 1. ]])
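The relationship between `linear_kernel` and `cosine_similarity` can be checked on a small example. The sketch below uses three toy documents (illustrative only): on TF-IDF rows, which `TfidfVectorizer` L2-normalizes by default, the two functions agree, while on raw counts the plain dot product is no longer a cosine.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

docs = ['mars nasa mars', 'mars medallion', 'spy secretagent']  # toy documents

tfidf_m = TfidfVectorizer().fit_transform(docs)   # rows have unit L2 norm
count_m = CountVectorizer().fit_transform(docs)   # rows are raw counts

# Dot product vs. cosine on each representation
same_for_tfidf = np.allclose(linear_kernel(tfidf_m), cosine_similarity(tfidf_m))
same_for_counts = np.allclose(linear_kernel(count_m), cosine_similarity(count_m))

print(same_for_tfidf, same_for_counts)
```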
indices['Avatar']  # look up a row index by movie title, as before
0
# Not strictly necessary, but rebuild indices in case the index got out of sync
df2 = df2.reset_index()
indices = pd.Series(df2.index, index=df2['title'])
indices
title
Avatar                                         0
Pirates of the Caribbean: At World's End       1
Spectre                                        2
The Dark Knight Rises                          3
John Carter                                    4
                                            ...
El Mariachi                                 4798
Newlyweds                                   4799
Signed, Sealed, Delivered                   4800
Shanghai Calling                            4801
My Date with Drew                           4802
Length: 4803, dtype: int64
get_recommendations('The Dark Knight Rises', cosine_sim2)
65                The Dark Knight
119                 Batman Begins
4638     Amidst the Devil's Wings
1196                 The Prestige
3073            Romeo Is Bleeding
3326               Black November
1503                       Takers
1986                       Faster
303                      Catwoman
747                Gangster Squad
Name: title, dtype: object
get_recommendations('Up', cosine_sim2)
231                                        Monsters, Inc.
1983                                     Meet the Deedles
3403    Alpha and Omega: The Legend of the Saw Tooth Cave
3114                                          Elsa & Fred
1580                                          The Nut Job
3670                                      Running Forever
4709                            A Charlie Brown Christmas
40                                                 Cars 2
42                                            Toy Story 3
77                                             Inside Out
Name: title, dtype: object
get_recommendations('The Martian', cosine_sim2)
4                    John Carter
95                  Interstellar
365                      Contact
256                    Allegiant
1326                The 5th Wave
1958                 On the Road
3043            End of the Spear
3373    The Other Side of Heaven
3392                       Gerry
3698                   Moby Dick
Name: title, dtype: object
indices['The Martian']  # get the index so we can inspect the movie's data
270
df2.loc[270]  # data for The Martian
index                                                                  270
budget                                                           108000000
genres                                  [drama, adventure, sciencefiction]
homepage                       http://www.foxmovies.com/movies/the-martian
id                                                                  286217
keywords                                         [basedonnovel, mars, nasa]
original_language                                                       en
original_title                                                 The Martian
overview                 During a manned mission to Mars, Astronaut Mar...
popularity                                                       167.93287
production_companies     [{"name": "Twentieth Century Fox Film Corporat...
production_countries     [{"iso_3166_1": "US", "name": "United States o...
release_date                                                    2015-09-30
revenue                                                          630161890
runtime                                                              141.0
spoken_languages         [{"iso_639_1": "en", "name": "English"}, {"iso...
status                                                            Released
tagline                                                     Bring Him Home
title                                                          The Martian
vote_average                                                           7.6
vote_count                                                            7268
cast                             [mattdamon, jessicachastain, kristenwiig]
crew                     [{'credit_id': '5607a7e19251413050003e2c', 'de...
director                                                       ridleyscott
soup                     basedonnovel mars nasa mattdamon jessicachasta...
Name: 270, dtype: object
# Comparing the two, the genres, keywords, and so on overlap, which explains the high similarity
df2.loc[4]  # data for 'John Carter', which ranked high in the recommendations for The Martian
index                                                                    4
budget                                                           260000000
genres                                 [action, adventure, sciencefiction]
homepage                              http://movies.disney.com/john-carter
id                                                                   49529
keywords                                    [basedonnovel, mars, medallion]
original_language                                                       en
original_title                                                 John Carter
overview                 John Carter is a war-weary, former military ca...
popularity                                                       43.926995
production_companies           [{"name": "Walt Disney Pictures", "id": 2}]
production_countries     [{"iso_3166_1": "US", "name": "United States o...
release_date                                                    2012-03-07
revenue                                                          284139100
runtime                                                              132.0
spoken_languages                  [{"iso_639_1": "en", "name": "English"}]
status                                                            Released
tagline                               Lost in our world, found in another.
title                                                          John Carter
vote_average                                                           6.1
vote_count                                                            2124
cast                           [taylorkitsch, lynncollins, samanthamorton]
crew                     [{'credit_id': '52fe479ac3a36847f813eaa3', 'de...
director                                                     andrewstanton
soup                     basedonnovel mars medallion taylorkitsch lynnc...
Name: 4, dtype: object
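The overlap between the two movies can be turned into a number by hand. In the sketch below the two soup strings are rebuilt from the keywords, cast, director, and genres fields shown in the two records (the `soup` values themselves are truncated in the output, so this reconstruction is an assumption based on `create_soup`). Each soup has 10 distinct tokens and 4 are shared, so the cosine similarity works out to 4 / (√10 · √10) = 0.4.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Soups rebuilt from the displayed fields: keywords + cast + director + genres
martian = ('basedonnovel mars nasa mattdamon jessicachastain kristenwiig '
           'ridleyscott drama adventure sciencefiction')
john_carter = ('basedonnovel mars medallion taylorkitsch lynncollins '
               'samanthamorton andrewstanton action adventure sciencefiction')

# Tokens that appear in both soups
shared = sorted(set(martian.split()) & set(john_carter.split()))

# Same vectorizer settings as above, fitted on just these two soups
pair_matrix = CountVectorizer(stop_words='english').fit_transform([martian, john_carter])
score = cosine_similarity(pair_matrix)[0, 1]

print(shared)
print(round(score, 2))
```

Fitting on only two documents gives the same score as the full matrix would for this pair, since tokens that appear in neither soup contribute zeros to both vectors.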
get_recommendations('The Avengers', cosine_sim2)
7                  Avengers: Age of Ultron
26              Captain America: Civil War
79                              Iron Man 2
169     Captain America: The First Avenger
174                    The Incredible Hulk
85     Captain America: The Winter Soldier
31                              Iron Man 3
33                   X-Men: The Last Stand
68                                Iron Man
94                 Guardians of the Galaxy
Name: title, dtype: object
import pickle
df2.head(3)
| | index | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | ... | spoken_languages | status | tagline | title | vote_average | vote_count | cast | crew | director | soup |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 237000000 | [action, adventure, fantasy] | http://www.avatarmovie.com/ | 19995 | [cultureclash, future, spacewar] | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | ... | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 | [samworthington, zoesaldana, sigourneyweaver] | [{'credit_id': '52fe48009251416c750aca23', 'de... | jamescameron | cultureclash future spacewar samworthington zo... |
1 | 1 | 300000000 | [adventure, fantasy, action] | http://disney.go.com/disneypictures/pirates/ | 285 | [ocean, drugabuse, exoticisland] | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | ... | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 | [johnnydepp, orlandobloom, keiraknightley] | [{'credit_id': '52fe4232c3a36847f800b579', 'de... | goreverbinski | ocean drugabuse exoticisland johnnydepp orland... |
2 | 2 | 245000000 | [action, adventure, crime] | http://www.sonypictures.com/movies/spectre/ | 206647 | [spy, basedonnovel, secretagent] | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | ... | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 | [danielcraig, christophwaltz, léaseydoux] | [{'credit_id': '54805967c3a36829b5002c41', 'de... | sammendes | spy basedonnovel secretagent danielcraig chris... |
3 rows × 25 columns
movies = df2[['id', 'title']].copy()
movies.head(5)
| | id | title |
---|---|---|
0 | 19995 | Avatar |
1 | 285 | Pirates of the Caribbean: At World's End |
2 | 206647 | Spectre |
3 | 49026 | The Dark Knight Rises |
4 | 49529 | John Carter |
# Movie data (id and title)
with open('movies.pickle', 'wb') as f:
    pickle.dump(movies, f)
# Cosine similarity data
with open('cosine_sim.pickle', 'wb') as f:
    pickle.dump(cosine_sim2, f)
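For completeness, here is a small round-trip sketch showing how the saved files could be read back later, for example from a separate app. It uses tiny stand-in objects and a temporary directory (both assumptions for the demo) rather than the real `movies` and `cosine_sim2`, so it can run on its own.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Tiny stand-ins for `movies` and `cosine_sim2` (demo values only)
movies_demo = pd.DataFrame({'id': [19995, 285],
                            'title': ['Avatar', "Pirates of the Caribbean: At World's End"]})
sim_demo = np.array([[1.0, 0.3], [0.3, 1.0]])

tmp_dir = tempfile.mkdtemp()  # temporary folder so the demo leaves the notebook files alone
movies_path = os.path.join(tmp_dir, 'movies.pickle')
sim_path = os.path.join(tmp_dir, 'cosine_sim.pickle')

# Save
with open(movies_path, 'wb') as f:
    pickle.dump(movies_demo, f)
with open(sim_path, 'wb') as f:
    pickle.dump(sim_demo, f)

# Load
with open(movies_path, 'rb') as f:
    movies_loaded = pickle.load(f)
with open(sim_path, 'rb') as f:
    sim_loaded = pickle.load(f)

print(movies_loaded)
print(sim_loaded)
```

Note that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.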
Reference: 나도코딩 (YouTube)