This is the third article in a series. Links to the previous articles: first, second
In this article, I explain how to work with the Pandas library, which we will use to build a decision tree.
3.1 Importing the library
# Import pandas under the conventional alias pd
import pandas as pd
3.2 DataFrame and Series
Pandas is built around two structures: the DataFrame and the Series.
Let's take a look at the following Excel-like table.
A single row of data is called a Series, the columns are called the attributes of that data, and the whole table is called a DataFrame. A single column taken on its own is also a Series.
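To make the relationship between the two structures concrete, here is a minimal sketch. The column names and values are hypothetical placeholders chosen for illustration; they are not the data used in this article.

```python
import pandas as pd

# Hypothetical column names and values, used only for illustration
df = pd.DataFrame({
    "outlook": ["sunny", "cloudy", "rainy"],
    "play": ["×", "○", "○"],
})

row = df.iloc[0]       # the first row, returned as a Series
col = df["outlook"]    # one column, also returned as a Series

print(type(df).__name__)
print(type(row).__name__)
print(type(col).__name__)
```

The row Series is labeled by the column names, while the column Series is labeled by the row index.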
3.3 Creating a DataFrame
We read an Excel spreadsheet with read_excel and write one with ExcelWriter.
# Read the Excel file, placed in the same folder as the ipynb notebook
df0 = pd.read_excel("data_golf.xlsx")
# Display the DataFrame as HTML
from IPython.display import HTML
html = "<div style='font-family:\"メイリオ\";'>"+df0.to_html()+"</div>"
HTML(html)
# Write to an Excel file (the with block closes the writer, so no f.close is needed)
with pd.ExcelWriter("data_golf2.xlsx") as f:
    df0.to_excel(f)
Creating a DataFrame from a dictionary (associative array): the dictionary groups the data by DataFrame column
# Dictionary form: each key is a column name, each value is the list of that column's values
d = {
"":["","","","","","","","","","","","","",""],
"":["","","","","","","","","","","","","",""],
"":["","","","","","","","","","","","","",""],
"":["","","","","","","","","","","","","",""],
"":["×","×","○","○","○","×","○","×","○","○","○","○","○","×"],
}
df0 = pd.DataFrame(d)
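As a runnable illustration of the dictionary form, here is a small sketch. The column names and values below are assumptions made up for this example. Note that each key must be unique, since a repeated key would overwrite the earlier column.

```python
import pandas as pd

# Hypothetical column names and values, used only for illustration.
# Each key becomes a column; all lists must have the same length.
d = {
    "outlook": ["sunny", "cloudy", "rainy", "sunny"],
    "wind": ["no", "yes", "no", "yes"],
    "play": ["×", "○", "○", "×"],
}
df = pd.DataFrame(d)
print(df)
```

The column order follows the insertion order of the dictionary keys.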
Creating a DataFrame from a list of lists: the data is grouped by DataFrame row
# List form: each inner list is one row
d = [["","","","","×"],
["","","","","×"],
["","","","","○"],
["","","","","○"],
["","","","","○"],
["","","","","×"],
["","","","","○"],
["","","","","×"],
["","","","","○"],
["","","","","○"],
["","","","","○"],
["","","","","○"],
["","","","","○"],
["","","","","×"],
]
# columns sets the column names and index sets the row labels. If omitted, they are numbered from 0.
df0 = pd.DataFrame(d,columns=["","","","",""],index=range(len(d)))
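The effect of the columns and index arguments can be sketched as follows. The names, row labels and values are hypothetical and serve only to show the behavior.

```python
import pandas as pd

# Hypothetical rows and column names, used only for illustration
rows = [
    ["sunny", "no", "×"],
    ["cloudy", "yes", "○"],
    ["rainy", "no", "○"],
]
cols = ["outlook", "wind", "play"]

# Without index, rows are labeled 0, 1, 2, ...
df_default = pd.DataFrame(rows, columns=cols)

# With index, rows get the labels we pass in
df_labeled = pd.DataFrame(rows, columns=cols, index=["a", "b", "c"])

print(df_default.index)
print(df_labeled.loc["b", "play"])
```

Passing index=range(len(rows)) gives the same labels as the default, so it can be left out when integer labels are enough.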
3.4 Getting information from the table
# Getting information about the table
# Number of rows and columns
print(df0.shape) # (14, 5)
# Number of rows
print(df0.shape[0]) # 14
# Column names
print(df0.columns) # Index(['', '', '', '', ''], dtype='object')
# Index (the row labels of df0)
print(df0.index) # RangeIndex(start=0, stop=14, step=1)
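The same attributes can be tried on a small table. This is a sketch with hypothetical column names and values, not the article's data.

```python
import pandas as pd

# Hypothetical column names and values, used only for illustration
df = pd.DataFrame({
    "outlook": ["sunny", "cloudy", "rainy"],
    "wind": ["no", "yes", "no"],
    "play": ["×", "○", "○"],
})

n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(len(df))              # len gives the number of rows, like df.shape[0]
print(list(df.columns))     # column names as a plain list
print(list(df.index))       # row labels as a plain list
```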
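3.5 Retrieving values with loc and iloc
# Retrieving values
# Specify a row label and a column name to get a single value
# Row No. 1 (the second row)
print(df0.loc[1,""]) #
# Specify several rows and several columns
# Rows 1, 2 and 4 of two columns, returned as a DataFrame
df = df0.loc[[1,2,4],["",""]]
print(df)
# Output:
#
# 1 ×
# 2 ○
# 4 ○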
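# iloc selects rows and columns by position. Positions start at 0.
# Rows 1 to 3, all columns except the last. In iloc the slice 1:4 stops before position 4.
df = df0.iloc[1:4,:-1]
print(df)
# Output:
#
# 1
# 2
# 3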
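# Retrieving a row (Series)
# Get the first row. s is a Series
s = df0.iloc[0,:]
# A value in s is retrieved by its column label, as in s[" "]
print(s[""]) #
# Get all the values as a NumPy array (numpy.ndarray).
print(df0.values)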
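One detail worth seeing side by side is that a loc slice includes its end label, while an iloc slice excludes its end position. The sketch below uses hypothetical column names and values to show the difference.

```python
import pandas as pd

# Hypothetical column names and values, used only for illustration
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "cloudy", "rainy", "rainy"],
    "wind": ["no", "yes", "no", "no", "yes"],
    "play": ["×", "×", "○", "○", "○"],
})

single = df.loc[1, "outlook"]                # one value, by row label and column name
by_label = df.loc[1:3, ["outlook", "play"]]  # label slice: rows 1, 2 and 3 (end included)
by_pos = df.iloc[1:3, :-1]                   # position slice: rows 1 and 2 (end excluded), all but the last column
first_row = df.iloc[0, :]                    # first row as a Series
values = df.to_numpy()                       # all values as a numpy.ndarray

print(single)
print(by_label)
print(by_pos)
print(first_row["outlook"])
print(values.shape)
```

to_numpy() is used here in place of the values attribute; both return a NumPy array, and to_numpy() is the form the pandas documentation recommends.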
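3.6 Iterating over the data with iterrows and items
# Iterating over the data
# Row by row.
for i,row in df0.iterrows():
    # i is the row label (index), row is a Series
    print(i,row)
    pass
# Column by column. iteritems was removed in pandas 2.0; items is the replacement.
for i,col in df0.items():
    # i is the column name, col is a Series
    print(i,col)
    pass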
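Here is a small runnable sketch of both loops, again on hypothetical data. It uses items() for the column loop, which works on current pandas versions, whereas iteritems() is no longer available from pandas 2.0 onward.

```python
import pandas as pd

# Hypothetical column names and values, used only for illustration
df = pd.DataFrame({
    "outlook": ["sunny", "cloudy", "rainy"],
    "play": ["×", "○", "○"],
})

# Row by row: i is the row label, row is a Series indexed by column name
row_labels = []
for i, row in df.iterrows():
    row_labels.append(i)
    print(i, row["outlook"], row["play"])

# Column by column: name is the column name, col is a Series indexed by row label
col_names = []
for name, col in df.items():
    col_names.append(name)
    print(name, col.tolist())
```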
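3.7 Counting frequencies with value_counts
# Frequency of values
# Get one column. s is a Series
s = df0.loc[:,""]
# Count how many times each value occurs
print(s.value_counts())
# Output:
# 5
# 5
# 4
# Name: , dtype: int64
# To get the count for one value, index the result with that value
print(s.value_counts()[""]) # 5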
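The counting step can be sketched on a small hypothetical column as follows. The column name and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical column name and values, used only for illustration
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "cloudy", "rainy", "sunny"],
})

s = df["outlook"]
counts = s.value_counts()   # a Series: index = distinct values, values = occurrences

print(counts)
print(counts["sunny"])           # count for one value
print(counts.get("snowy", 0))    # a value that never occurs: fall back to 0
```

Indexing with a value that does not occur raises a KeyError, so get with a default is a safer choice when the value may be absent.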
3.8 Extracting specific rows with query
# Extracting data with query
# Rows where a column equals a given value.
print(df0.query("==''"))
# Output:
#
# 0 ×
# 1 ×
# 7 ×
# 8 ○
# 10 ○
# Rows that satisfy both conditions (and)
print(df0.query("=='' and =='○'"))
# Output:
#
# 8 ○
# 10 ○
# Rows that satisfy at least one of the conditions (or)
print(df0.query("=='' or =='○'"))
# Output:
#
# 0 ×
# 1 ×
# 2 ○
# 3 ○
# 4 ○
# 6 ○
# 7 ×
# 8 ○
# 9 ○
# 10 ○
# 11 ○
# 12 ○
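The three kinds of query can be tried on a small table. In this sketch the column names and values are hypothetical, and the last line shows an equivalent boolean-mask form for comparison.

```python
import pandas as pd

# Hypothetical column names and values, used only for illustration
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "cloudy", "rainy", "sunny", "rainy"],
    "play": ["×", "×", "○", "○", "○", "×"],
})

# Single condition
sunny = df.query("outlook == 'sunny'")

# Both conditions must hold
sunny_and_play = df.query("outlook == 'sunny' and play == '○'")

# At least one condition must hold
sunny_or_play = df.query("outlook == 'sunny' or play == '○'")

# The same "and" filter written with a boolean mask
mask_version = df[(df["outlook"] == "sunny") & (df["play"] == "○")]

print(sunny)
print(sunny_and_play)
print(sunny_or_play)
```

The filtered rows keep their original row labels, which is why the index in the output is not renumbered.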
Thanks for reading!