"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "N_MBLA_ZtV6z"
},
"source": [
"## Introduction to pandas: the Titanic dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QbG7tHAZvhMe"
},
"source": [
"Click here to open the notebook in Google Colab. \n",
"[Open in Colab](https://colab.research.google.com/github/mrBronnWow/Curso_Beginners/blob/master/1_Dataset_titanic/Introduccion_a_pandas.ipynb)\n",
"\n",
"Feel free to experiment; we promise it won't break. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "W5jMyPO73ZIp"
},
"source": [
"### What pandas is and why it is worth learning"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MsJ6weem3o8p"
},
"source": [
"`pandas` is a Python library, built on top of NumPy, for data manipulation and analysis. In particular, it offers data structures and operations for working with numerical tables and time series.\n",
"\n",
"Put more simply, pandas lets us perform operations on tables, much as we would in Excel. **Its big advantage is that it can handle much larger volumes of data than Excel, and do so faster.**\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting started"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UO5wgEntwkTT"
},
"source": [
"### Loading the pandas library\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1G1ev8z2xZfL"
},
"source": [
"To start using pandas, the first thing we need to do is import the library."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "s_sAIPnwxaxO"
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BMZO8Cp-xeqm"
},
"source": [
"Tip: to run a code cell, press `Shift+Enter`. \n",
"\n",
"\n",
"```\n",
"import pandas \n",
"```\n",
"tells the notebook that we are going to use this library, and with\n",
"```\n",
"as pd\n",
"```\n",
"we give it an alias, a nickname, so that we can refer to the library as `pd` and keep the code cleaner. We could have called it `panditas`, but by convention it is called `pd`.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WYtUddUA2KVl"
},
"source": [
"### Loading the CSV into pandas"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3lxWr_vlxE1J"
},
"source": [
"We have chosen the legendary Titanic dataset, which contains information about the passengers of the sunken ship. We picked it because it is one of the traditional datasets for getting started in Machine Learning (ML). For now, though, we will focus on doing an *Exploratory Data Analysis* (EDA).\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "p5YHCAvZ4XIz"
},
"source": [
"Let's load the file `train.csv` into the notebook. We will do it as follows. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 197
},
"id": "CRcFutTj_ErF",
"outputId": "bad5d2f8-25cd-41e0-de59-4bbbd954851f"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>0</td><td>3</td><td>Braund, Mr. Owen Harris</td><td>male</td><td>22.0</td><td>1</td><td>0</td><td>A/5 21171</td><td>7.2500</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>1</td><td>1</td><td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>PC 17599</td><td>71.2833</td><td>C85</td><td>C</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>1</td><td>3</td><td>Heikkinen, Miss. Laina</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>STON/O2. 3101282</td><td>7.9250</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>3</th><td>4</td><td>1</td><td>1</td><td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>113803</td><td>53.1000</td><td>C123</td><td>S</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>0</td><td>3</td><td>Allen, Mr. William Henry</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>373450</td><td>8.0500</td><td>NaN</td><td>S</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass ... Fare Cabin Embarked\n",
"0 1 0 3 ... 7.2500 NaN S\n",
"1 2 1 1 ... 71.2833 C85 C\n",
"2 3 1 3 ... 7.9250 NaN S\n",
"3 4 1 1 ... 53.1000 C123 S\n",
"4 5 0 3 ... 8.0500 NaN S\n",
"\n",
"[5 rows x 12 columns]"
]
},
"execution_count": 2,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('https://raw.githubusercontent.com/mrBronnWow/Curso_Beginners/main/1_Dataset_titanic/train.csv')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7aopYb6Z12Od"
},
"source": [
"This line reads the file from a URL in a GitHub repository, but don't worry about that for now. If you are curious about what each column means, you can check the Kaggle page [https://www.kaggle.com/c/titanic/overview](https://www.kaggle.com/c/titanic/overview)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MrnzA1tO2QN8"
},
"source": [
"With `df.head()` we can see the first 5 rows of our dataset, that is, the top of the table. Can you guess what `df.tail()` will do? Write it in the next cell and run it ;)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "lU1by5li4iwq"
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "SeWfVhgp3CBi"
},
"source": [
"We can check what type of variable `df` is:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "sKBJGDFh3Isg",
"outputId": "93fa6b30-017d-4738-ea60-b909d0bf0164"
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 3,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"type(df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SXDYKlQS3K5P"
},
"source": [
"It's a pandas DataFrame! This is the basic object in `pandas`: a way of working with tables in Python. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4gRmUBfY4m5b"
},
"source": [
"### Getting information about the dataset"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EiILt-TL514P",
"outputId": "ce57b8ae-2a40-47c1-e8eb-24015df0b231"
},
"outputs": [
{
"data": {
"text/plain": [
"(891, 12)"
]
},
"execution_count": 4,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pn7TgLtn52FW"
},
"source": [
"The `shape` property of the `df` object tells us the number of rows and columns in the dataset, as a `(rows, columns)` pair. Since it is a property rather than a method, it is written without parentheses at the end. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IDxvIJwj6KtI"
},
"source": [
"Tip: `shape` is a property of the DataFrame object, which is why it has no parentheses at the end. Methods, on the other hand, end in `()`, as we will see next. But don't worry about that for now."
],
"text/plain": [
" PassengerId Survived Pclass ... SibSp Parch Fare\n",
"count 891.000000 891.000000 891.000000 ... 891.000000 891.000000 891.000000\n",
"mean 446.000000 0.383838 2.308642 ... 0.523008 0.381594 32.204208\n",
"std 257.353842 0.486592 0.836071 ... 1.102743 0.806057 49.693429\n",
"min 1.000000 0.000000 1.000000 ... 0.000000 0.000000 0.000000\n",
"25% 223.500000 0.000000 2.000000 ... 0.000000 0.000000 7.910400\n",
"50% 446.000000 0.000000 3.000000 ... 0.000000 0.000000 14.454200\n",
"75% 668.500000 1.000000 3.000000 ... 1.000000 0.000000 31.000000\n",
"max 891.000000 1.000000 3.000000 ... 8.000000 6.000000 512.329200\n",
"\n",
"[8 rows x 7 columns]"
]
},
"execution_count": 5,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QmD6_coC2jDb"
},
"source": [
"If you are ever unsure how to use a command, you can type `?` right after it and run the cell; this opens its documentation. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "NIAHEPHR2sx1"
},
"outputs": [],
"source": [
"df.describe?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wgIEFIrL2z0v",
"outputId": "607318a0-b0c0-4767-9f54-241fe03cf223"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 891 entries, 0 to 890\n",
"Data columns (total 12 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 PassengerId 891 non-null int64 \n",
" 1 Survived 891 non-null int64 \n",
" 2 Pclass 891 non-null int64 \n",
" 3 Name 891 non-null object \n",
" 4 Sex 891 non-null object \n",
" 5 Age 714 non-null float64\n",
" 6 SibSp 891 non-null int64 \n",
" 7 Parch 891 non-null int64 \n",
" 8 Ticket 891 non-null object \n",
" 9 Fare 891 non-null float64\n",
" 10 Cabin 204 non-null object \n",
" 11 Embarked 889 non-null object \n",
"dtypes: float64(2), int64(5), object(5)\n",
"memory usage: 83.7+ KB\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5JV5J0jO5PYn"
},
"source": [
"This command tells us what type of data each column contains. In this case we have integer, float or object.\n",
"\n",
"`object` is the generic dtype that `pandas` falls back on when it cannot assign a more specific type; here those columns hold text (strings).\n",
"\n",
"The Non-Null Count column tells us how many non-empty cells each column has. Note that out of 891 rows (entries), column 11, `Embarked`, has only 889 non-null values. The `Age` and `Cabin` columns do not have 891 values either, but 714 and 204 respectively. The `Cabin` column is almost empty!"
]
},
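{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small sketch of how the non-null counts relate to missing values, the snippet below builds a tiny hand-typed sample (three columns from the first five rows shown earlier) and counts the missing entries per column. The names `sample`, `missing` and `non_null` are only for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hand-typed sample: three columns of the first five passengers (illustrative).\n",
"sample = pd.DataFrame({\n",
"    'Age': [22.0, 38.0, 26.0, 35.0, 35.0],\n",
"    'Cabin': [None, 'C85', None, 'C123', None],\n",
"    'Embarked': ['S', 'C', 'S', 'S', 'S'],\n",
"})\n",
"\n",
"# isna() marks missing cells as True; summing counts them per column.\n",
"missing = sample.isna().sum()\n",
"print(missing)\n",
"\n",
"# notna() is the opposite: it gives the non-null count per column.\n",
"non_null = sample.notna().sum()\n",
"print(non_null)\n",
"```\n",
"\n",
"Running `df.isna().sum()` on the full dataset should be consistent with the counts reported by `df.info()`."
]
},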
{
"cell_type": "markdown",
"metadata": {
"id": "brWbG28r8wBd"
},
"source": [
"### Accessing the data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fi66ewwI8zoJ"
},
"source": [
"In `pandas` there are two ways to access the data in a column. "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EQ6kNWp88zU4",
"outputId": "2cf840da-adcf-490b-b1fb-32ddf0fa4365"
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',\n",
" 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],\n",
" dtype='object')"
]
},
"execution_count": 8,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"# list the names of the columns in df\n",
"df.columns"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gPIvrItx9HJk"
},
"source": [
"If I want to access only the `Name` column, for example:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "OaihiKIo5HlX",
"outputId": "c2047664-2f85-4f4a-c607-2ca3a305f686"
},
"outputs": [
{
"data": {
"text/plain": [
"0 Braund, Mr. Owen Harris\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th...\n",
"2 Heikkinen, Miss. Laina\n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel)\n",
"4 Allen, Mr. William Henry\n",
" ... \n",
"886 Montvila, Rev. Juozas\n",
"887 Graham, Miss. Margaret Edith\n",
"888 Johnston, Miss. Catherine Helen \"Carrie\"\n",
"889 Behr, Mr. Karl Howell\n",
"890 Dooley, Mr. Patrick\n",
"Name: Name, Length: 891, dtype: object"
]
},
"execution_count": 9,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df.Name"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7iOVlYVx9NuO"
},
"source": [
"I can also access it this way. You will see it a lot. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FCJK9_j55F-T",
"outputId": "8cbbc252-8920-4063-adce-873d40eee97c"
},
"outputs": [
{
"data": {
"text/plain": [
"0 Braund, Mr. Owen Harris\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th...\n",
"2 Heikkinen, Miss. Laina\n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel)\n",
"4 Allen, Mr. William Henry\n",
" ... \n",
"886 Montvila, Rev. Juozas\n",
"887 Graham, Miss. Margaret Edith\n",
"888 Johnston, Miss. Catherine Helen \"Carrie\"\n",
"889 Behr, Mr. Karl Howell\n",
"890 Dooley, Mr. Patrick\n",
"Name: Name, Length: 891, dtype: object"
]
},
"execution_count": 10,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df['Name']"
]
},
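{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two forms are not always interchangeable. The following minimal sketch uses a made-up two-row table; the column name `Ticket number` is hypothetical and was chosen only because it contains a space.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Tiny illustrative table; 'Ticket number' is a hypothetical column name.\n",
"sample = pd.DataFrame({\n",
"    'Name': ['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'],\n",
"    'Ticket number': ['A/5 21171', 'STON/O2. 3101282'],\n",
"})\n",
"\n",
"# For a simple column name, both forms return the same Series.\n",
"same = sample.Name.equals(sample['Name'])\n",
"print(same)\n",
"\n",
"# A name containing a space cannot be written as an attribute,\n",
"# so bracket notation is the only option here.\n",
"tickets = sample['Ticket number']\n",
"print(tickets.tolist())\n",
"```\n",
"\n",
"Bracket notation works for any column name, which is one reason it is so common."
]
},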
{
"cell_type": "markdown",
"metadata": {
"id": "ZB6j79qc9UG8"
},
"source": [
"To access more than one column, we pack the names of the desired columns into a list:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 406
},
"id": "8ZOxVnAv9Wx_",
"outputId": "2f547c68-828d-4bf9-f377-77306c620dde"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Name</th><th>Age</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Braund, Mr. Owen Harris</td><td>22.0</td></tr>\n",
"    <tr><th>1</th><td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td><td>38.0</td></tr>\n",
"    <tr><th>2</th><td>Heikkinen, Miss. Laina</td><td>26.0</td></tr>\n",
"    <tr><th>3</th><td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td><td>35.0</td></tr>\n",
"    <tr><th>4</th><td>Allen, Mr. William Henry</td><td>35.0</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td></tr>\n",
"    <tr><th>886</th><td>Montvila, Rev. Juozas</td><td>27.0</td></tr>\n",
"    <tr><th>887</th><td>Graham, Miss. Margaret Edith</td><td>19.0</td></tr>\n",
"    <tr><th>888</th><td>Johnston, Miss. Catherine Helen \"Carrie\"</td><td>NaN</td></tr>\n",
"    <tr><th>889</th><td>Behr, Mr. Karl Howell</td><td>26.0</td></tr>\n",
"    <tr><th>890</th><td>Dooley, Mr. Patrick</td><td>32.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>891 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" Name Age\n",
"0 Braund, Mr. Owen Harris 22.0\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... 38.0\n",
"2 Heikkinen, Miss. Laina 26.0\n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 35.0\n",
"4 Allen, Mr. William Henry 35.0\n",
".. ... ...\n",
"886 Montvila, Rev. Juozas 27.0\n",
"887 Graham, Miss. Margaret Edith 19.0\n",
"888 Johnston, Miss. Catherine Helen \"Carrie\" NaN\n",
"889 Behr, Mr. Karl Howell 26.0\n",
"890 Dooley, Mr. Patrick 32.0\n",
"\n",
"[891 rows x 2 columns]"
]
},
"execution_count": 11,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df[['Name', 'Age']]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fljdCjMKJVid"
},
"source": [
"Once we have selected the data, we can run operations on it, such as `sum`, `mean` and `max`, among many others. "
]
},
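{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of these operations, assuming a tiny hand-typed sample with the `Age` and `Fare` values of the first five rows shown earlier (the variable names are only for illustration):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hand-typed sample: Age and Fare of the first five passengers.\n",
"sample = pd.DataFrame({\n",
"    'Age': [22.0, 38.0, 26.0, 35.0, 35.0],\n",
"    'Fare': [7.25, 71.2833, 7.925, 53.1, 8.05],\n",
"})\n",
"\n",
"# On a single column (a Series), each operation returns one number.\n",
"age_sum = sample['Age'].sum()\n",
"age_mean = sample['Age'].mean()\n",
"age_max = sample['Age'].max()\n",
"print(age_sum, age_mean, age_max)\n",
"\n",
"# On several columns, the result has one value per column.\n",
"col_max = sample[['Age', 'Fare']].max()\n",
"print(col_max)\n",
"```\n",
"\n",
"The same calls work unchanged on the full `df`, for example `df['Age'].mean()`."
]
},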
{
"cell_type": "markdown",
"metadata": {
"id": "wdPSUr-aJjFA"
},
"source": [
"Tip: Searching Google for \"how to do such-and-such in pandas\" will be your best ally at this stage of learning. If you have questions, don't hesitate to use the forum on the EducAltran platform."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "w9CgdsagF6UH"
},
"source": [
"### Extracting information from the dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qdSR4KjD_oXO"
},
"source": [
"The `Survived` column tells us whether a passenger survived the shipwreck (value 1) or died (value 0)."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 406
},
"id": "NzzD7j_zAGAB",
"outputId": "c7794f45-cf36-4bb4-d21f-e067be0e8018"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>0</td><td>3</td><td>Braund, Mr. Owen Harris</td><td>male</td><td>22.0</td><td>1</td><td>0</td><td>A/5 21171</td><td>7.2500</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>1</td><td>1</td><td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>PC 17599</td><td>71.2833</td><td>C85</td><td>C</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>1</td><td>3</td><td>Heikkinen, Miss. Laina</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>STON/O2. 3101282</td><td>7.9250</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>3</th><td>4</td><td>1</td><td>1</td><td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>113803</td><td>53.1000</td><td>C123</td><td>S</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>0</td><td>3</td><td>Allen, Mr. William Henry</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>373450</td><td>8.0500</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>886</th><td>887</td><td>0</td><td>2</td><td>Montvila, Rev. Juozas</td><td>male</td><td>27.0</td><td>0</td><td>0</td><td>211536</td><td>13.0000</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>887</th><td>888</td><td>1</td><td>1</td><td>Graham, Miss. Margaret Edith</td><td>female</td><td>19.0</td><td>0</td><td>0</td><td>112053</td><td>30.0000</td><td>B42</td><td>S</td></tr>\n",
"    <tr><th>888</th><td>889</td><td>0</td><td>3</td><td>Johnston, Miss. Catherine Helen \"Carrie\"</td><td>female</td><td>NaN</td><td>1</td><td>2</td><td>W./C. 6607</td><td>23.4500</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>889</th><td>890</td><td>1</td><td>1</td><td>Behr, Mr. Karl Howell</td><td>male</td><td>26.0</td><td>0</td><td>0</td><td>111369</td><td>30.0000</td><td>C148</td><td>C</td></tr>\n",
"    <tr><th>890</th><td>891</td><td>0</td><td>3</td><td>Dooley, Mr. Patrick</td><td>male</td><td>32.0</td><td>0</td><td>0</td><td>370376</td><td>7.7500</td><td>NaN</td><td>Q</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>891 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass ... Fare Cabin Embarked\n",
"0 1 0 3 ... 7.2500 NaN S\n",
"1 2 1 1 ... 71.2833 C85 C\n",
"2 3 1 3 ... 7.9250 NaN S\n",
"3 4 1 1 ... 53.1000 C123 S\n",
"4 5 0 3 ... 8.0500 NaN S\n",
".. ... ... ... ... ... ... ...\n",
"886 887 0 2 ... 13.0000 NaN S\n",
"887 888 1 1 ... 30.0000 B42 S\n",
"888 889 0 3 ... 23.4500 NaN S\n",
"889 890 1 1 ... 30.0000 C148 C\n",
"890 891 0 3 ... 7.7500 NaN Q\n",
"\n",
"[891 rows x 12 columns]"
]
},
"execution_count": 12,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "7PuN-hsdAlA3",
"outputId": "d94e3e34-b729-4ece-e46d-123a2e7f4cb5"
},
"outputs": [
{
"data": {
"text/plain": [
"0 549\n",
"1 342\n",
"Name: Survived, dtype: int64"
]
},
"execution_count": 13,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df['Survived'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oGTc1E2uAi6E"
},
"source": [
"Here we selected the `Survived` column and asked how many times each value occurs. The values are 0 and 1: there are 549 rows with value 0 and 342 with value 1. In other words, 342 people survived and 549 died."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rKANqS-N_oFy"
},
"source": [
"To finish, let's see whether the phrase \"Women and children first!\" held true. Did more women than men survive?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OkFZHqvQBfyd"
},
"source": [
"Now we can apply our first filter!"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 406
},
"id": "ddWxVomsBjRU",
"outputId": "97e84b54-c9fd-4cea-b3a6-44c02318c156"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><td>2</td><td>1</td><td>1</td><td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>PC 17599</td><td>71.2833</td><td>C85</td><td>C</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>1</td><td>3</td><td>Heikkinen, Miss. Laina</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>STON/O2. 3101282</td><td>7.9250</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>3</th><td>4</td><td>1</td><td>1</td><td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>113803</td><td>53.1000</td><td>C123</td><td>S</td></tr>\n",
"    <tr><th>8</th><td>9</td><td>1</td><td>3</td><td>Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)</td><td>female</td><td>27.0</td><td>0</td><td>2</td><td>347742</td><td>11.1333</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>9</th><td>10</td><td>1</td><td>2</td><td>Nasser, Mrs. Nicholas (Adele Achem)</td><td>female</td><td>14.0</td><td>1</td><td>0</td><td>237736</td><td>30.0708</td><td>NaN</td><td>C</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>880</th><td>881</td><td>1</td><td>2</td><td>Shelley, Mrs. William (Imanita Parrish Hall)</td><td>female</td><td>25.0</td><td>0</td><td>1</td><td>230433</td><td>26.0000</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>882</th><td>883</td><td>0</td><td>3</td><td>Dahlberg, Miss. Gerda Ulrika</td><td>female</td><td>22.0</td><td>0</td><td>0</td><td>7552</td><td>10.5167</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>885</th><td>886</td><td>0</td><td>3</td><td>Rice, Mrs. William (Margaret Norton)</td><td>female</td><td>39.0</td><td>0</td><td>5</td><td>382652</td><td>29.1250</td><td>NaN</td><td>Q</td></tr>\n",
"    <tr><th>887</th><td>888</td><td>1</td><td>1</td><td>Graham, Miss. Margaret Edith</td><td>female</td><td>19.0</td><td>0</td><td>0</td><td>112053</td><td>30.0000</td><td>B42</td><td>S</td></tr>\n",
"    <tr><th>888</th><td>889</td><td>0</td><td>3</td><td>Johnston, Miss. Catherine Helen \"Carrie\"</td><td>female</td><td>NaN</td><td>1</td><td>2</td><td>W./C. 6607</td><td>23.4500</td><td>NaN</td><td>S</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>314 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass ... Fare Cabin Embarked\n",
"1 2 1 1 ... 71.2833 C85 C\n",
"2 3 1 3 ... 7.9250 NaN S\n",
"3 4 1 1 ... 53.1000 C123 S\n",
"8 9 1 3 ... 11.1333 NaN S\n",
"9 10 1 2 ... 30.0708 NaN C\n",
".. ... ... ... ... ... ... ...\n",
"880 881 1 2 ... 26.0000 NaN S\n",
"882 883 0 3 ... 10.5167 NaN S\n",
"885 886 0 3 ... 29.1250 NaN Q\n",
"887 888 1 1 ... 30.0000 B42 S\n",
"888 889 0 3 ... 23.4500 NaN S\n",
"\n",
"[314 rows x 12 columns]"
]
},
"execution_count": 14,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df_females = df[ df['Sex'] == 'female']\n",
"df_females"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "foOP-kedEzjl"
},
"source": [
"The DataFrame `df_females` contains only the rows for women. "
]
},
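{
"cell_type": "markdown",
"metadata": {},
"source": [
"The filter above works in two steps, which the following minimal sketch separates. It uses a tiny hand-typed sample (the `Sex` and `Survived` values of the first five rows shown earlier); the names `sample`, `mask` and `females` are only for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hand-typed sample: Sex and Survived of the first five passengers.\n",
"sample = pd.DataFrame({\n",
"    'Sex': ['male', 'female', 'female', 'female', 'male'],\n",
"    'Survived': [0, 1, 1, 1, 0],\n",
"})\n",
"\n",
"# Step 1: the comparison gives one True/False value per row.\n",
"mask = sample['Sex'] == 'female'\n",
"print(mask.tolist())\n",
"\n",
"# Step 2: indexing with that mask keeps only the rows marked True.\n",
"females = sample[mask]\n",
"print(females)\n",
"```\n",
"\n",
"Note that the filtered rows keep their original index labels, which is why the index of `df_females` starts at 1 rather than 0."
]
},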
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IVksjSYZBfMa",
"outputId": "b270630b-237b-439c-d035-fc6fe01398ff"
},
"outputs": [
{
"data": {
"text/plain": [
"233"
]
},
"execution_count": 15,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df_females['Survived'].sum()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lbGQ1jjwE9Tn"
},
"source": [
"Of the 314 women we have information about, 233 survived. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4ST5RB7UKEzA"
},
"source": [
"To see how many men and women there are in the dataset:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gU2KecarKEN2",
"outputId": "55b5ad9d-551a-4b27-991e-8bfec85936c0"
},
"outputs": [
{
"data": {
"text/plain": [
"male 577\n",
"female 314\n",
"Name: Sex, dtype: int64"
]
},
"execution_count": 21,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df['Sex'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TpTjASjMCBId"
},
"source": [
"We can do this whole process in a single line, here for the men:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "m0PdnnhGCAsd",
"outputId": "bf5e05f4-48f5-4b60-aef6-b0f0ce4da2b6"
},
"outputs": [
{
"data": {
"text/plain": [
"109"
]
},
"execution_count": 16,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"df[ df['Sex'] == 'male']['Survived'].sum()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VV5TVFmMFFU0"
},
"source": [
"Of the 577 men we have information about, only 109 survived."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PE4zqCDbKNOL"
},
"source": [
"A simple calculation gives the survival rate for men and women:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "v0uADi5MKSjp",
"outputId": "2a2b26ea-196a-4dca-f92a-933c68356cdc"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Men had a survival probability of 0.189\n",
"while for women it was 0.742\n"
]
}
],
"source": [
"# Survivors divided by group size, using the counts computed above:\n",
"# 109 of the 577 men and 233 of the 314 women survived.\n",
"rat_men = 109/577\n",
"rat_women = 233/314\n",
"\n",
"print(f'Men had a survival probability of {rat_men:.3f}')\n",
"print(f'while for women it was {rat_women:.3f}')"
]
},
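{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same rates can be obtained without typing the counts by hand. Because `Survived` only contains 0 and 1, its mean is the fraction of survivors. The sketch below shows the idea on a tiny hand-typed sample (the `Sex` and `Survived` values of the first five rows shown earlier); `sample` and `rates` are illustrative names.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hand-typed sample: Sex and Survived of the first five passengers.\n",
"sample = pd.DataFrame({\n",
"    'Sex': ['male', 'female', 'female', 'female', 'male'],\n",
"    'Survived': [0, 1, 1, 1, 0],\n",
"})\n",
"\n",
"# Group the rows by Sex, then average the 0/1 column within each group.\n",
"rates = sample.groupby('Sex')['Survived'].mean()\n",
"print(rates)\n",
"```\n",
"\n",
"On the full `df`, the same line should reproduce the ratios 109/577 and 233/314 computed above."
]
},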
{
"cell_type": "markdown",
"metadata": {
"id": "GKIX6ta2KolH"
},
"source": [
"## (Optional) Going further"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xJY1j79qCUSl"
},
"source": [
"We could even build a pivot table. Don't worry about this for now; we show it so that you know what is possible."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 137
},
"id": "mfAWq-3sHuz4",
"outputId": "6f4b7627-4431-4ad0-954b-7b37a3cf1e98"
},
"outputs": [
{
"data": {
"text/html": [
"