You are building a predictive solution based on web server log data. The data is collected in a comma-separated values (CSV) format that always includes the following fields: date: string time: string client_ip: string server_ip: string url_stem: string url_query: string client_bytes: integer server_bytes: integer You want to load the data into a DataFrame for analysis. You must load the data in the correct format while minimizing the processing overhead on the Spark cluster. What should you do? Load the data as lines of text into an RDD, then split the text based on a comma-delimiter and load the RDD into a DataFrame. Define a schema for the data, then read the data from the CSV file into a DataFrame using the schema. Read the data from the CSV file into a DataFrame, infering the schema. Convert the data to tab-delimited format, then read the data from the text file into a DataFrame, infering the schema.

Question

litttyyyu8116 litttyyyu8116

20-05-2020
Computers and Technology

contestada

You are building a predictive solution based on web server log data. The data is collected in a comma-separated values (CSV) format that always includes the following fields: date: string time: string client_ip: string server_ip: string url_stem: string url_query: string client_bytes: integer server_bytes: integer You want to load the data into a DataFrame for analysis. You must load the data in the correct format while minimizing the processing overhead on the Spark cluster. What should you do? Load the data as lines of text into an RDD, then split the text based on a comma-delimiter and load the RDD into a DataFrame. Define a schema for the data, then read the data from the CSV file into a DataFrame using the schema. Read the data from the CSV file into a DataFrame, infering the schema. Convert the data to tab-delimited format, then read the data from the text file into a DataFrame, infering the schema.

Respuesta :

RELAXING NOICE

Otras preguntas

advantage of transport

What do Scientists think may happen to you if you entered a black hole?

What did Annie Easley do in collage?

Which characteristic is common to all three groups of mammals? They require little or no parental care. They develop partially or fully inside the mother’s body

What the answer for the bottom now now

Lana is putting lace trim around the border of a circular tablecloth. The tablecloth has a diameter of 1.2 meters. To the nearest meter, what is the least amoun

Use the binomial squares pattern to multiply (6u3−v)2.

I WILL GIVE BRAINLIEST!! Just need help for questions 88 and 90.

Can someone help me with a couple of words that I can use thank you

In the additive color model, what is the default color when nothing is added?? Grey White Brown or black

kendrich kendrich · Answer 1 · 2020-05-21T02:14:44+02:00

Answer:

see explaination

Explanation:

The data is collected in a comma-separated values (CSV) format that always includes the following fields:

? date: string

? time: string

? client_ip: string

? server_ip: string

? url_stem: string

? url_query: string

? client_bytes: integer

? server_bytes: integer

What should you do?

a. Load the data as lines of text into an RDD, then split the text based on a comma-delimiter and load the RDD into DataFrame.

# import the module csv

import csv

import pandas as pd

# open the csv file

with open(r"C:\Users\uname\Downloads\abc.csv") as csv_file:

# read the csv file

csv_reader = csv.reader(csv_file, delimiter=',')

# now we can use this csv files into the pandas

df = pd.DataFrame([csv_reader], index=None)

df.head()

b. Define a schema for the data, then read the data from the CSV file into a DataFrame using the schema.

from pyspark.sql.types import *

from pyspark.sql import SparkSession

newschema = StructType([

StructField("date", DateType(),true),

StructField("time", DateType(),true),

StructField("client_ip", StringType(),true),

StructField("server_ip", StringType(),true),

StructField("url_stem", StringType(),true),

StructField("url_query", StringType(),true),

StructField("client_bytes", IntegerType(),true),

StructField("server_bytes", IntegerType(),true])

c. Read the data from the CSV file into a DataFrame, infering the schema.

abc_DF = spark.read.load('C:\Users\uname\Downloads\new_abc.csv', format="csv", header="true", sep=' ', schema=newSchema)

d. Convert the data to tab-delimited format, then read the data from the text file into a DataFrame, infering the schema.

Import pandas as pd

Df2 = pd.read_csv(‘new_abc.csv’,delimiter="\t")

print('Contents of Dataframe : ')

print(Df2)