What I remember from my Data Science skills interviews - Part 2
This is a continuation of my previous post (What I remember from my Data Science skills interviews - Part 1).
Note: The example below is not an exact replica of any test I have done.
Task: write a function that finds the number of files with a byte size greater than, say, 5000, and returns the number of such files on one row and the total of their bytes on the next row.
The function takes as input a file whose lines might look something like this:
(All the code is in R)
file<-c('good "2021-12-31T19:27:58.900z" 200 117',
'goodi "2021-12-31T19:08:31.760z" 200 4567',
'goodie "2021-12-31T16:56:14.124z" 200 5436',
'goods "2021-12-31T15:54:07.963z" 200 5453',
'goodle "2021-12-31T15:35:30.504z" 200 143')
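The task talks about a file, while the example above keeps the lines in a character vector. If the lines actually live in a log file on disk, readLines() returns exactly this kind of vector, with one element per line. The sketch below writes the sample lines to a temporary file with tempfile() purely so it runs on its own; log_lines, log_path and file_from_disk are hypothetical names, and in practice you would point readLines() at your own file.

```r
# Sample lines copied from the post (hypothetical variable name)
log_lines <- c('good "2021-12-31T19:27:58.900z" 200 117',
               'goodi "2021-12-31T19:08:31.760z" 200 4567',
               'goodie "2021-12-31T16:56:14.124z" 200 5436',
               'goods "2021-12-31T15:54:07.963z" 200 5453',
               'goodle "2021-12-31T15:35:30.504z" 200 143')

# Illustration only: write the lines to a temporary file
log_path <- tempfile(fileext = ".log")
writeLines(log_lines, log_path)

# readLines() gives one character string per line, the same shape as `file`
file_from_disk <- readLines(log_path)
unlink(log_path)  # clean up the temporary file

identical(file_from_disk, log_lines)  # TRUE
```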
#The function
library(dplyr) #for filter() and the %>% pipe operator
filename <- function(k) {
  #Split each line at single spaces
  st_split <- strsplit(k, split = " ")
  #Bind the pieces row-wise into a data frame; stringsAsFactors = FALSE keeps
  #the columns as character (R versions before 4.0 default to factors)
  st_df <- data.frame(do.call(rbind, st_split), stringsAsFactors = FALSE)
  #Name the columns; needed, because the steps below refer to the Bytes column
  names(st_df) <- c("File", "Timestamp", "Other", "Bytes")
  #Convert bytes to numeric for comparison and summing
  st_df$Bytes <- as.numeric(st_df$Bytes)
  #Keep only the "files" with a byte size greater than 5000
  large_bytes <- st_df %>%
    filter(Bytes > 5000)
  #Return the number of such files and their total byte size as two rows
  rbind(nrow(large_bytes), sum(large_bytes$Bytes))
}
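For comparison, here is a minimal base R sketch of the same idea with no dplyr dependency. The name count_large_files() and its threshold argument are hypothetical, not part of the original task. It also swaps strsplit() for read.table(), which splits on whitespace but treats a double-quoted field as a single value, so it would still work if a quoted field ever contained spaces.

```r
# Sample input copied from the post so the sketch runs on its own
log_lines <- c('good "2021-12-31T19:27:58.900z" 200 117',
               'goodi "2021-12-31T19:08:31.760z" 200 4567',
               'goodie "2021-12-31T16:56:14.124z" 200 5436',
               'goods "2021-12-31T15:54:07.963z" 200 5453',
               'goodle "2021-12-31T15:35:30.504z" 200 143')

# Hypothetical helper (not from the original post): same output shape as
# filename(), but base R only and with a configurable byte threshold
count_large_files <- function(lines, threshold = 5000) {
  # read.table() splits on whitespace, keeps the quoted timestamp as one
  # field, and parses the numeric columns automatically
  df <- read.table(text = lines,
                   col.names = c("File", "Timestamp", "Other", "Bytes"),
                   stringsAsFactors = FALSE)
  # Byte sizes strictly greater than the threshold
  large <- df$Bytes[df$Bytes > threshold]
  # Count on the first row, total bytes on the second
  rbind(length(large), sum(large))
}

count_large_files(log_lines)                   # 2 files, 10889 bytes
count_large_files(log_lines, threshold = 100)  # 5 files, 15716 bytes
```

A threshold larger than every byte size returns 0 on both rows, since sum() of an empty vector is 0.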
An example using the file vector created earlier:
filename(file)
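The example returns the following 2-by-1 matrix, with the number of files above 5000 bytes on the first row and their total byte size on the second:
      [,1]
[1,]     2
[2,] 10889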
Happy exploring!!
~NMN