Define a function to analyze a NumPy array and a function to analyze a car dataset using pandas

Q1. Define a function to analyze a numpy array

Assume we have an array that contains the term frequency of each document, where each row is a document, each column is a word, and each value is the frequency of that word in that document. Define a function named “analyze_tf” which:

has two input parameters:
  - a rank 2 input array
  - a parameter “binary” with a default value set to False

does the following steps in sequence:
  1. If “binary” is True, binarizes the input array, i.e. if a value is greater than 1, change it to 1.
  2. Normalizes the frequency of each word as: word frequency divided by the length of the document (i.e. the sum of each row). Save the result as an array named tf (i.e. term frequency). The sum of each row of tf should be 1.
  3. Calculates the document frequency (df) of each word, i.e. how many documents contain a specific word.
  4. Calculates the inverse document frequency (idf) of each word as N/df (N divided by df), where N is the number of documents.
  5. Calculates the tf_idf array as tf * log(idf) (tf multiplied by the log, base e, of idf). The reason is that if a word appears in most documents, it has little discriminative power and is often called a “stop” word. The inverse of df downgrades the weight of such words.
  6. Returns the tf_idf array.

Note: for all the steps, do not use any loop. Use only array functions and broadcasting for high performance computation.

Q2. Define a function to analyze car dataset using pandas

Define a function named “analyze_cars” that does the following:
  1. Takes a csv file path string as an input. Assume the csv file is in the format of the provided sample file.
  2. Reads the csv file as a dataframe with the first row as column names.
  3. Finds the cars with the top 3 mpg among those of origin = 1. Print the names (i.e. the “car” column) and mpg of these three cars.
  4. Creates a new column called “brand” to store the brand name, taken as the first word in the “car” column (hint: use the “apply” function).
  5. Shows the mean, min, and max mpg values for each of these brands: “ford”, “buick” and “honda”.
  6. Creates a cross tab to show the average mpg of each brand and each origin value. Use “brand” as the row index and “origin” as the column index.

This function does not return anything. Just print out the result of each calculation step.

Submission Guideline

Follow the solution template provided below. Use a main block to test your functions.

    # Structure of your solution to Assignment 1
    import numpy as np
    import pandas as pd

    def analyze_tf(arr, binary=False):
        tf_idf = None
        # add your code
        return tf_idf

    def analyze_cars(filepath):
        # add your code
        pass
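The Q1 steps can be sketched with array operations only. The version below is one possible reading of the assignment, not a reference solution. It assumes every document has at least one word (no all-zero row) and every word appears in at least one document (no all-zero column); otherwise the divisions produce inf or nan. The sample array in the main block is made up for illustration.

```python
import numpy as np


def analyze_tf(arr, binary=False):
    # Work on a float copy so the caller's array is left untouched.
    arr = np.asarray(arr, dtype=float)

    # Step 1: optional binarization, values greater than 1 become 1.
    if binary:
        arr = np.where(arr > 1, 1.0, arr)

    # Step 2: divide each row by its sum; keepdims=True keeps a column
    # vector of shape (n_docs, 1) so broadcasting works row by row.
    tf = arr / arr.sum(axis=1, keepdims=True)

    # Step 3: document frequency, the number of rows where the word occurs.
    df = (arr > 0).sum(axis=0)

    # Step 4: inverse document frequency, N divided by df.
    n_docs = arr.shape[0]
    idf = n_docs / df

    # Step 5: weight tf by the natural log of idf (broadcast over rows).
    tf_idf = tf * np.log(idf)
    return tf_idf


if __name__ == "__main__":
    # Hypothetical 3-document, 3-word count matrix.
    sample = np.array([[2, 1, 0],
                       [1, 0, 1],
                       [3, 0, 0]])
    print(analyze_tf(sample))
    print(analyze_tf(sample, binary=True))
```

In this sample the first word occurs in all three documents, so its idf is 1 and its tf_idf weight is 0 in every row, which is the "stop" word effect described in step 5.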
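The Q2 steps map onto a handful of pandas calls. The sketch below is one way to do it, assuming the csv has columns named “car”, “mpg” and “origin” as the assignment text implies, that every “car” value is a non-empty string, and that brand names are lowercase in the file. The sample file itself is not shown in the assignment, so the file name in the main block is hypothetical.

```python
import pandas as pd


def analyze_cars(filepath):
    # header=0 treats the first row of the file as column names.
    cars = pd.read_csv(filepath, header=0)

    # Top 3 mpg among cars with origin == 1; print name and mpg only.
    top3 = cars[cars["origin"] == 1].nlargest(3, "mpg")
    print(top3[["car", "mpg"]])

    # Brand is the first whitespace-separated word of the car name.
    cars["brand"] = cars["car"].apply(lambda name: name.split()[0])

    # Mean, min and max mpg for the three requested brands.
    selected = cars[cars["brand"].isin(["ford", "buick", "honda"])]
    print(selected.groupby("brand")["mpg"].agg(["mean", "min", "max"]))

    # Average mpg per brand (rows) and origin (columns).
    print(pd.crosstab(index=cars["brand"], columns=cars["origin"],
                      values=cars["mpg"], aggfunc="mean"))


if __name__ == "__main__":
    # Hypothetical path; replace with the provided sample file.
    analyze_cars("cars.csv")
```

Brand and origin combinations that do not occur in the data show up as NaN in the cross tab, since there is no mpg value to average.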
