
Kuchena Celestino Academic Development in Doctoral Studies

Running head: KUCHENA CELESTINO STATISTICAL PACKAGE REVIEW
C. Kuchena
University of Zambia
GSB 8101
Dr. Rob Shah
Watching tutorials on statistical packages helped me decide which two I will mainly use to
analyse my results. I looked at five packages (Trifacta, RapidMiner, jamovi, PSPP, and JASP), each
with its own strengths and weaknesses. My choice was determined by cost of ownership,
user-friendliness, shareability of results, and the range of tests available. This paper discusses
Trifacta first, then RapidMiner, jamovi, and lastly JASP.
Trifacta
We use Trifacta within our organization for multiple purposes: simple reporting, data
profiling, issue detection, and data preparation for analytics. Its main benefit is that you can
reach the target data much more quickly than by writing code in a programming language.
Pros and Cons
Pros: data profiling, big data processing, data wrangling, and automation. Smart algorithms
suggest data manipulation steps when you simply select data.
Cons: Hadoop source data tables are presented in a flat searchable list, but I would rather see
them in their native hierarchy. The web interface can be flaky, and we often need to refresh the
page.
RapidMiner
We use RapidMiner to create ETL (Extract, Transform, Load) processes that load our BI
datamarts with data from operational databases. We have built complex load processes that
prepare the data to be fed into Business Intelligence dashboards. We also use RapidMiner for data
mining with techniques such as text processing, image processing, and algorithmic data analysis
(clustering, neural networks, correlation, etc.).
Pros and Cons
RapidMiner is very fast at reading all kinds of databases. We read and merge databases
such as SQL Server, Informix, MySQL, and Oracle. Configuring access is easy: some drivers are
built in, and it is not difficult to find Java drivers that let RapidMiner connect to other
databases.
It performs all kinds of transformations, calculations (dates, percentages, etc.), joins, and
filters without coding. We have several different databases, and this makes my life much easier.
Since cleaning and preparing data make up about 80% of an analyst's work, this leaves more time
for the analysis itself.
You can clone transformations to reuse in new analyses, which saves a lot of time.
There are many add-ons for different tasks (text and image analysis, recommender systems,
etc.).
Training is easy: the tool is intuitive, there are many videos on the internet, and the
community is very active.
Sharing RapidMiner Studio analyses is not easy. You might expect RapidMiner Server to
handle this, but it does not; it is oriented more toward automated jobs or running models on a
website. If you need Business Analytics dashboards, this is not the tool; it is more a backstage
tool for analysts. Some charts are good, but others are not.
The free edition limits you to 10,000 rows; beyond that it is not cheap (100,000 rows cost
USD 2,500/year and 1,000,000 rows cost USD 5,000/year).
The commercial team is not very responsive. I asked for a RapidMiner Education Program
and RapidMiner Server quotation and received no answer, perhaps because the company was
changing from an open-source model to a more commercial one.
Return on Investment
Very high positive impact, because working with data requires no coding, which saves a
lot of time.
Jamovi
jamovi is free and offers one complete package for introductory statistics.
jamovi is a gem of a package, one that looks so good I asked the developers if they had
an artist or user interface designer on the team. They don't, but clearly they have put a lot of
thought into making the software beautiful and easy to use. They have also chosen their
options carefully so that each analysis includes what a researcher would want to see.
Their creation of the jmv package is a bold move, one that promises to greatly reduce
the number of separate packages a coder would need to learn, though in doing so they challenge
the existing way of programming in R. Just as the tidyverse set of commands is controversial, the
jmv package is also likely to ruffle some feathers.
As nice as jamovi is, it also lacks significant features: it cannot show or save data
management syntax, handle date/time variables, perform many other fundamental data
management tasks, save new variables such as predicted values or factor scores, or save models
so they can be tested on hold-out samples or new data sets.
JASP
JASP is free, user-friendly software that performs frequentist and Bayesian analyses. It
also performs advanced analyses such as structural equation modelling. Its Summary Statistics
module allows published results to be analysed without the original data. It updates results as
you go. JASP allows tables to be copied and pasted into Word in APA style, and plots can be saved
in formats suitable for article submission. Results can be annotated in JASP, which helps
collaborators understand them fully. The data synchronisation feature allows data to be edited
within JASP. JASP can also publish results directly to the Open Science Framework (OSF).
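To illustrate the idea behind analysing published results without the raw data, the sketch below recomputes a classical two-sample t statistic from reported means, standard deviations, and sample sizes. It is a minimal illustration assuming equal group variances (Student's pooled t-test); the function name and the example numbers are hypothetical, and this is not JASP's own implementation.

```python
import math


def pooled_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Student two-sample t statistic (equal variances assumed) from summary statistics.

    m = group mean, s = group standard deviation, n = group size.
    Returns (t, degrees_of_freedom).
    """
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    # Standard error of the difference between the two means
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Hypothetical published summary: two groups of 10, means 10 and 8, both SDs 2
t, df = pooled_t_from_summary(10.0, 2.0, 10, 8.0, 2.0, 10)
print(f"t({df}) = {t:.3f}")
```

With these numbers the pooled SD is 2 and the standard error is about 0.894, so the script prints t(18) = 2.236, the same value a full-data analysis would give for groups with these summaries.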
Conclusion
At this point, I will use both JASP and jamovi. JASP does Bayesian analyses, network
analysis, and SEM, while jamovi does not (yet). On the other hand, jamovi does HLM,
confirmatory factor analysis, simple mediation and moderation, equivalence testing, sample size
estimation, simple main effects, and some extensions to R. Both can read SPSS and CSV files, but
jamovi can also read JASP workspaces.
I have tried both packages on Mac, Windows, and Linux, and they work perfectly. As both
are free, you have nothing to lose except disk space.
I am using version 0.9+ of both, and I expect great things to come.
References
McKiernan, P., & Tsui, A. S. (2019). Responsible management research: a senior scholar legacy
in doctoral education. Academy of Management Learning and Education, 18(2), 310-313.
Panigrahi, S. S., Bahinipati, B., & Jain, V. (2019). Sustainable supply chain management: a
review of literature and implications for future research. Management of Environmental
Quality: An International Journal.
Saunders, M., Lewis, P., & Thornhill, A. (2019). Research methods for business students.
Trifacta: https://www.trifacta.com/start-wrangling/
RapidMiner: https://rapidminer.com/get-started/
jamovi: https://www.jamovi.org/
JASP: https://jasp-stats.org/
Tutorial: Trifacta Wrangler
Tutorial: RapidMiner Studio
Tutorial: jamovi
Intro to JASP