What I remember from my Data Science skills interviews - Part 2

This is a continuation of my previous post (What I remember from my Data Science skills interviews - Part 1).

Note: The example below is not an exact replica of any test I have done.

 

#A FUNCTION THAT FINDS THE FILES WITH A BYTE SIZE GREATER THAN, SAY, 5000 AND RETURNS THE NUMBER OF SUCH FILES IN ONE ROW AND THE TOTAL OF THEIR BYTES IN THE NEXT ROW.

IT TAKES AS INPUT A FILE THAT MIGHT LOOK SOMETHING LIKE THIS.

(All the code is in R)

file <- c('good "2021-12-31T19:27:58.900z" 200 117',
          'goodi "2021-12-31T19:08:31.760z" 200 4567',
          'goodie "2021-12-31T16:56:14.124z" 200 5436',
          'goods "2021-12-31T15:54:07.963z" 200 5453',
          'goodle "2021-12-31T15:35:30.504z" 200 143')


#The function

library(dplyr) #for filter() and the pipe operator

filename <- function(k){
  #Split the strings at white spaces
  st_split <- strsplit(k, split = " ")
  #Convert to a dataframe by rows using do.call; keep the columns as character
  #so that as.numeric() below parses the values rather than factor codes (R < 4.0)
  st_df <- data.frame(do.call(rbind, st_split), stringsAsFactors = FALSE)
  #Rename the columns - if you skip this step, the byte column is called X4
  names(st_df) <- c("Good", "Date_Plus", "Other", "Bytes")
  #Convert bytes into numeric for manipulation
  st_df$Bytes <- as.numeric(st_df$Bytes)
  #Filter for "files" of byte size greater than 5000
  large_bytes <- st_df %>%
    filter(Bytes > 5000)
  #Return the number of such files and the total byte size, one per row
  rbind(nrow(large_bytes), sum(large_bytes$Bytes))
}
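
The function above uses dplyr only for filter() and the pipe. As a minimal sketch, assuming the byte size is always the last space-separated field on each line, the same count and total could also be computed in base R alone. The names count_large_files, lines, threshold and log_lines are hypothetical and are not from any test I have done.

```r
# Hypothetical base-R variant: no packages needed
count_large_files <- function(lines, threshold = 5000) {
  # Split each line at single spaces
  fields <- strsplit(lines, split = " ", fixed = TRUE)
  # Assumption: the byte size is the last field on every line
  bytes <- as.numeric(vapply(fields, function(x) x[length(x)], character(1)))
  # Keep only the sizes above the threshold
  large <- bytes[bytes > threshold]
  # Row 1: number of such files, row 2: their total byte size
  rbind(length(large), sum(large))
}

# Same sample lines as in the input shown earlier
log_lines <- c('good "2021-12-31T19:27:58.900z" 200 117',
               'goodi "2021-12-31T19:08:31.760z" 200 4567',
               'goodie "2021-12-31T16:56:14.124z" 200 5436',
               'goods "2021-12-31T15:54:07.963z" 200 5453',
               'goodle "2021-12-31T15:35:30.504z" 200 143')

count_large_files(log_lines)
```

Making the threshold an argument means the 5000 cut-off can be changed without editing the function body.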


An example with the file we created earlier

filename(file)

The example returns the following one-column matrix (2 files, 10889 bytes in total):

      [,1]
[1,]     2
[2,] 10889




Happy exploring!!

~NMN
