Authorship Attribution – Data Analysis Methods

Investigation
14/04/2020
RAHUL GOYAL
GOYRY004@MYMAIL.UNISA.EDU.AU

Contents
Introduction
Methods for Data Analysis
    Rapid Miner
    R Analytics
    SAS
Advantages and Disadvantages
    Rapid Miner
        Advantages
        Disadvantages
    R Analytics
        Advantages
        Disadvantages
    SAS
        Advantages
        Disadvantages
Difference of these three Methods
Rapid Miner – Derives best outcome

Introduction
This study was conducted to identify the various data analytics methods that can be used for authorship attribution. It outlines three commonly used methods along with their strengths and weaknesses, and suggests which method should be chosen for the data analysis. (Juola, 2008)

Methods for Data Analysis
In this study I have selected three methods of data analysis for authorship attribution. As shown in the picture below, these methods are “Rapid Miner”, “R” and “SAS”.

Rapid Miner: Rapid Miner is software developed by the company of the same name, RapidMiner. It is a data science platform that provides integrated services ranging from machine learning, data preparation, text mining and deep learning to data analysis and predictive analytics.
[Figure: the three methods, “Rapid Miner”, “R Analytics” and “SAS”]

Generally, this method is adopted by government organizations and large businesses for training, education, research, application development, rapid prototyping and similar purposes. It also offers model validation, optimization and visualization of results. The company further claims that its template-based frameworks deliver 99% of an analytical solution, such as an authorship attribution process, which speeds up the entire workflow and reduces errors. (Kannapiran, 2016)

R Analytics: The second method of data analysis is “R Analytics”. It is considered a modern technique that leads the field and surpasses the features of SPSS and other traditional software packages. R is used not only for data analysis but also for creating applications and software. Reviewers note that it is somewhat difficult to adopt because it makes easy things hard and hard things easy, so understanding how R works is a challenge for researchers applying it to authorship attribution. (Patil, 2016)

SAS: SAS originally stood for “Statistical Analysis System”. It can alter, mine, manage and retrieve data from various sources and then perform statistical analysis on that data. Its data analysis process is built from two types of steps: DATA steps and PROC steps. A DATA step runs in two phases, compilation and execution, and is used to retrieve and manipulate data, whereas the analysis itself is performed in PROC steps. PROC steps are also used to display results, sort data and carry out various other operations. (Nielsen, 2011)

Advantages and Disadvantages

Rapid Miner

Advantages
a. Rapid Miner provides a rich selection of machine learning algorithms for data mining tasks, along with a dedicated set of functions for data pre-processing. Its repository is full of machine learning algorithms.
b. It is user friendly: the visual workflow designer makes the software easy to operate. In authorship attribution, this helps users with data preparation, modeling and similar tasks.
c. The software can also be extended: if an algorithm is missing, it can be installed from the repository. (Devipriya B, 2019)

Disadvantages
a. Rapid Miner's graphics are old-fashioned, so it does not present data attractively. It also has limited capabilities for integrating third-party applications.
b. Working with media is quite difficult in this software. It does not handle videos, audio, pictures, etc. well, and linking audio to its text for data analytics is a challenge. (Devipriya B, 2019)

R Analytics

Advantages
a. R is platform independent, which means it runs on any operating system, such as Mac, Linux and Windows.
b. R also supports data wrangling: it gives structure to messy data through packages such as readr and dplyr.

Disadvantages
a. The first limitation relates to data storage. R keeps data in physical memory and requires scattered data to be gathered in one place, so it is not considered a good option for big data.
b. The software has a steep learning curve, so it is hard for newcomers to understand and use. (Patil, 2016)

SAS

Advantages
a. SAS is commonly used to handle large databases, as it supports heavy data storage and processing. Authorship attribution can easily be done with it.
b. The software is thoroughly tested and analyzed by its team of developers, and the algorithms it implements are checked for correct operation. It also gives its users complete confidentiality.

Disadvantages
a. Text mining is one of the major problems of SAS: in R, text mining is available free of charge, whereas SAS requires SAS Enterprise for it.
b. The software's graphical representation is vivid when graphs, charts and diagrams are used. (Acock, 2005)

Difference of these three Methods
Each of these data analysis methods is strong in its own way and has its own weaknesses; they are compared as follows.
Rapid Miner vs R Analytics

Descriptor selection
Rapid Miner: The method offers descriptor selection, so it passes this test.
R Analytics: There are no wrapper methods, so there is no option for descriptor selection.

Parameter optimization
Rapid Miner: It excels at parameter optimization of machine learning and statistical methods.
R Analytics: R has no access to parameter optimization for machine learning, and there is no automatic tool for these options.

Data partitioning
Rapid Miner: Rapid Miner can partition data, but comparatively less effectively than R.
R Analytics: R is highly capable of partitioning data sets into small testing sets.

Error management and model validation
Rapid Miner: It has a number of error-management methods that thoroughly check the entire database.
R Analytics: R has only limited error-detection methods for model validation.
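The partitioning, parameter optimization and validation rows above describe general machine learning steps rather than features unique to one tool. The following Python sketch illustrates them under stated assumptions: it uses a hypothetical toy corpus and a simple character n-gram classifier (each author is represented by pooled n-gram counts, and a text is attributed to the author with the highest cosine similarity). This is a common authorship attribution baseline, not how Rapid Miner, R or SAS implement these features. The data are partitioned into k folds, and the n-gram length is chosen by the best cross-validated accuracy. The helper names (ngram_profile, cosine, predict, k_fold_accuracy, select_n) are hypothetical.

```python
import math
from collections import Counter

def ngram_profile(text, n):
    """Count overlapping character n-grams of length n in a lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(train, text, n):
    """Attribute `text` to the author whose pooled training profile is most similar."""
    profiles = {}
    for doc, author in train:
        profiles.setdefault(author, Counter()).update(ngram_profile(doc, n))
    query = ngram_profile(text, n)
    return max(profiles, key=lambda author: cosine(profiles[author], query))

def k_fold_accuracy(data, n, k):
    """Split `data` into k interleaved folds; train on k-1 folds, test on the held-out fold."""
    correct = 0
    for fold in range(k):
        test = [item for i, item in enumerate(data) if i % k == fold]
        train = [item for i, item in enumerate(data) if i % k != fold]
        correct += sum(predict(train, doc, n) == author for doc, author in test)
    return correct / len(data)

def select_n(data, candidates, k):
    """Grid search: return the n-gram length with the best k-fold accuracy, plus all scores."""
    scores = {n: k_fold_accuracy(data, n, k) for n in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy corpus of (text, author) pairs, for illustration only.
CORPUS = [
    ("the cat sat on the mat and the cat purred", "A"),
    ("quarterly revenue grew by twelve percent", "B"),
    ("a cat and a kitten sat on a warm mat", "A"),
    ("revenue and profit margins grew this quarter", "B"),
    ("the kitten chased the cat across the mat", "A"),
    ("profit grew while quarterly costs fell", "B"),
]

if __name__ == "__main__":
    best_n, scores = select_n(CORPUS, candidates=[2, 3, 4], k=3)
    print("selected n-gram length:", best_n)
    print("cross-validated accuracy per n:", scores)
```

Running the script prints the selected n-gram length and the accuracy for each candidate. Descriptor (feature) selection with wrapper methods would be a further step that this sketch does not show.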
Rapid Miner vs SAS

Ease of use
Rapid Miner: Rapid Miner is easier to use than SAS; even a person without much training can operate it.
SAS: SAS requires professional training before users can command it; it cannot be operated by a newcomer directly.

Range of algorithms
Rapid Miner: Rapid Miner offers a good range of algorithms but falls short in comparison with SAS.
SAS: SAS is a complete system of analytical algorithms, as nothing is left outside its scope. It is a full-fledged data analytics solution.

Data handling
Rapid Miner: Rapid Miner can handle and store large data sets, but problems can arise in doing so.
SAS: SAS handles data very well, as data storage, data mining, data keeping and other key functions are its specialty.
SAS vs R Analytics

Ease of use
SAS: Although SAS requires technical skills, it is user friendly and performs multiple functions in a short time.
R Analytics: In terms of ease of use, R is hard to operate; users need specialized skills.

Data handling
SAS: SAS is much better than R for data handling from every perspective.
R Analytics: Researchers and analysts have observed that R has some problems in data handling, such as data mining.

Speed
SAS: SAS is fast because of its pool of algorithms; it processes data very quickly and returns analysis results almost instantly.
R Analytics: R is reasonably fast, but not as fast as SAS.
Rapid Miner – Derives best outcome
Having analyzed the three methods that can be used for authorship attribution, I consider Rapid Miner the best method for conducting data analysis for authorship attribution. It has certain unique features that are not commonly available in the other methods. I have also looked at the reviews of technical experts on various websites and chat portals, and they all suggested that Rapid Miner is advantageous for authorship attribution.

I will adopt Rapid Miner for the following reasons:
a. Rapid Miner effectively simplifies data access for authorship attribution, as it allows users to access, load and evaluate all kinds of data. It also works well with texts, pictures and audio tracks.
b. Authorship attribution demands a method that can convert useless, disorganized and cluttered data into valuable information, and the selected software does this well.
c. The tools needed for authorship attribution are already installed in this system, and any other algorithms that are needed can be installed from the Rapid Miner repository. This will help me find and install the tools my task requires.
d. In terms of cost, it is highly cost effective, as a trial version of the software is available free of charge on the vendor's website. (Kannapiran, 2016)

Bibliography
Acock, A., 2005. SAS, Stata, SPSS: A Comparison. Journal of Marriage and Family, 67(4), pp. 1093-1095.
Devipriya B, K. Y., 2019. Evaluation of Sentiment Data using Classifier. International Journal of Engineering and Advanced Technology (IJEAT), 9(1).
Juola, P., 2008. Authorship attribution. Foundations and Trends® in Information Retrieval.
Kannapiran, T., 2016. Analysis and Comparison Study of Data Mining Algorithms Using Rapid Miner. Research Gate.
Nielsen, S. F., 2011. SAS for data analysis. Søren Feodor Nielsen, 38(8), pp. 1743-1744.
Patil, S., 2016. Big Data Analytics Using R. International Research Journal of Engineering and Technology (IRJET), 3(7).
