All posts by John Little

Announcing Tidyverse workshops for Winter 2018

Coming this winter, the Data & Visualization Services Department will once again host a workshop series on the R programming language. Our winter offering is modeled on our well-received R we having fun yet‽ (Rfun) fall workshop series. The six-part series will introduce R as a language for modern data manipulation by highlighting a set of tidyverse packages that enable functional data science. We will approach R using the free RStudio IDE, an intent to make reproducible literate code, and a bias towards the tidyverse. We believe this open tool-set provides a context that enables and reinforces reproducible workflows, analysis, and reporting.


January Line-up

Title | Date | Registration | Past Workshop
Intro to R | Jan 19, 1-3pm | register | Resources
R Markdown (with Dr. Çetinkaya-Rundel) | Jan 23, 9am | register |
Shiny (with Dr. Çetinkaya-Rundel) | Jan 25, 9am | register |
Mapping with R | Jan 25, 1-3pm | register | Resources
Reproducibility & Git | Jan 29, 1-3pm | register | Resources
Visualization with ggplot2 | Feb 1, 9:30-11:30am | register | Resources

An official announcement with links to registration is forthcoming. Feel free to subscribe to the Rfun or DVS-Announce lists. Or look to the DVS Workshop page for official registration links as soon as they are available.

Workshop Arrangement

This workshop series is intended to be iterative and recursive. We recommend starting with the Introduction to R, then proceeding through the remaining workshops in any order of interest.

Recordings and Past Workshops

We presented a similar version of this workshop series last fall and recorded each session whenever possible. You can stream past workshops and engage with the shareable data sets at your own pace (see the Past Workshop resources links, above). Alternatively, all the past workshop resource links are presented in one listicle: Rfun recap.

Sharing Files: Your Duke Box.com

Last fall Duke University released its newest file sharing service, known as Duke’s Box. By partnering with Box.com, Duke offers a cloud-storage service that is intuitive, secure, and easy to use. Log in with your NetID, share files with colleagues, and have confidence that this cloud storage is compliant with all laws and regulations regarding data privacy and security.

Simple to Use

Duke’s Box is similar to other cloud-based file storage services that support collaboration, productivity, and synchronization. You can drag and drop files, identify collaborators, and set permissions (read, edit, comment, etc.). But unlike some services, such as Dropbox or Google Drive, Duke’s Box enables you to comply with data privacy and security requirements. Additionally, you can synchronize data across your devices, at your discretion and subject to Duke’s Security & Usage Practice restrictions.

You may have previously used CIFS, OIT’s NAS (Network Attached Storage) file storage service, for data storage. Duke’s Box is easier to use, although it serves slightly different use-cases. For example, CIFS might be more useful for accessing large files (e.g. video files that are larger than 5 GB). However, CIFS doesn’t enable collaboration or sharing. Depending on your needs, you may still want to use your departmental or OIT NAS. Either way, you can use both file storage services, and each service is free.

Check out this quick-start video:

50 GB of Space by Default

You are automatically provisioned 50 GB of space, and you can request more if you need it. See the Comparison of Document Management & Collaboration Tools at Duke for details.

Individual files are limited to less than 5 GB. This means Duke’s Box may be less than ideal for sharing very large files. NAS services may be more appropriate for large files, as the time to download or synchronize them can become inconvenient. But for many common file sharing cases, Duke’s Box is ideal, fast, and convenient.
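If you want to check a folder against these limits before uploading, a short script can do the arithmetic for you. The following is only a sketch, not an official Duke or Box tool: the function name `check_for_box` is made up for illustration, the limits are the figures quoted above (they may change), and it assumes 1 GB means 1024³ bytes.

```python
from pathlib import Path

# Figures quoted in this post; treat them as assumptions that may change.
# Assumption: 1 GB is counted as 1024**3 bytes.
MAX_FILE_BYTES = 5 * 1024**3        # individual files must be smaller than 5 GB
DEFAULT_QUOTA_BYTES = 50 * 1024**3  # default allocation of 50 GB


def check_for_box(folder, max_file=MAX_FILE_BYTES, quota=DEFAULT_QUOTA_BYTES):
    """Return (too_large, total_bytes, fits_quota) for all files under `folder`.

    too_large  -- list of paths whose size is at or above the per-file limit
    total_bytes -- combined size of every file found
    fits_quota -- True if the combined size is within the quota
    """
    too_large = []
    total = 0
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            total += size
            if size >= max_file:
                too_large.append(path)
    return too_large, total, total <= quota
```

For example, `check_for_box("my_project")` (a hypothetical folder name) lists any files that would be better kept on NAS storage and tells you whether the rest fits in the default allocation.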

Documentation, Restrictions & Use

While you can store many types of files, there are best practices and restrictions you will want to review.  For example, Duke Medicine users are required to complete an online training module prior to account activation.

Sharing Your Data With Us

One of the many use-cases for Duke’s Box is a more convenient way for you to share your data with us. As you know, we welcome questions about data analysis and visualization. We know that describing data can be difficult, and that sharing your dataset can clarify your question. But sharing your data via email consumes a lot of resources, both yours and ours. Now there’s a better way; please share your data with us via Duke’s Box.

Steps for Sharing Your Data with DVS Consultants

How to share your files (5-second animated loop)

  1. Log into Duke’s Box (use the blue “continue” button)
  2. Open your “home” folder
  3. Put your data in the “sharing” folder
  4. Use the “invite people” button (right-hand sidebar)
    • Using a consultant email address, invite the DVS Consultant to see your data.  (Don’t worry if you don’t have our email yet.  When you start your question at askData@duke.edu, an individual consultant will be back in touch.)

Access your Duke-Cloud from ANYWHERE

Say you’ve been making hella maps or data stories all day. Now you need to move to your comfy work spot and you need your data to come with you. If you use Duke’s CIFS, moving around is easy, and all of your files are already backed up.

In this example we follow the researcher, Ms. Stu Fac-Staff.  Stu is part student, part faculty, and part staff at Duke University.  She needs a portable place for her data and wants easy access from her home, lab, and devices.  Stu also needs to easily share data with colleagues.  No problem!  Stu uses CIFS.

Here’s the scenario. Ms. Stu Fac-Staff walks into the Data & GIS Lab in the Duke University Libraries with a flash drive full of data tables. She gathers more supporting data and some advice about crunching the numbers. Stu finishes her day with a visualization and map. (Proudly, Stu imagines this is going to get the A. “Is this grant worthy?” Stu asks herself. “You bet your NSF Application it is!”) Meanwhile, her flash drive is now full and all she wants is to SAVE THE DATA, CONVENIENTLY for later retrieval back home. So Stu stores the data on the Duke Cloud (CIFS).

How do I get the free CIFS Space and how much can I use/access?

How do I access the data from my device?

  • In the Data & GIS Lab, after using your NetID to login, open the Windows File Explorer and your CIFS space will be mapped as drive Z.
  • After you leave our Data & GIS Lab, all you have to do is “map the drive” on your own machine
  • Web – For easy distribution to colleagues, you might want to access or distribute your files through the web.  To do this, store the files in your ‘public_html’ directory inside of your CIFS space.  Now the files can be downloaded via a web browser.  This method is, by default, open to the world; you may want to take additional steps to secure this public_html directory (see below).

    http://people.duke.edu/~NetID
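To hand a colleague a direct link, you need the full address of a specific file. Here is a small sketch of how such a link could be assembled. It assumes the usual web-server convention that a file at `public_html/some/path` appears at `http://people.duke.edu/~NetID/some/path`; check this against OIT's documentation. The function name `public_url` and the example NetID and file name are hypothetical.

```python
from urllib.parse import quote

BASE_URL = "http://people.duke.edu"  # base address shown above


def public_url(netid, relative_path):
    """Build the web address for a file stored under public_html.

    `relative_path` is the file's location inside public_html,
    e.g. "maps/durham map.pdf". This mapping is an assumption, not
    something confirmed by OIT documentation.
    """
    # Percent-encode spaces and other unsafe characters; "/" is kept as-is.
    clean = quote(relative_path.lstrip("/"))
    return f"{BASE_URL}/~{netid}/{clean}"


# Hypothetical NetID and file name, for illustration only.
print(public_url("abc123", "maps/durham map.pdf"))
```

Remember that any link built this way points at a file the whole world can read unless you restrict access.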

     

Can I Secure the Data?

  • Are you trying to access your mapped drive from off campus?
    • Use the VPN directions
    • The CIFS protocol encrypts NetID/password but it does not encrypt your data stream over the Internet.  If you’re connecting from an unencrypted or untrusted network (e.g. wireless in the coffee shop), the VPN allows for a secure connection.
  • Did you put files in your public_html folder?
    • Unlike the default CIFS space, placing files in the ‘public_html’ directory means they become accessible to the world
    • You can control and limit access by following OIT’s “htaccess” instructions