How do you support 57,860 online students learning R and statistics? Late last fall, Data and GIS Services shared this challenge with Professor Mine Çetinkaya-Rundel and the staff of CIT as we sought to translate Professor Çetinkaya-Rundel’s successful Statistics 101 course to a Coursera class on Data Analysis and Statistical Inference. Data and GIS Services has for several years helped Statistics 101 students identify appropriate data and use the R statistical language for their assignments. The scale of the Coursera course introduced a new challenge: providing engaging data to a very large audience without the opportunity to offer direct support to everyone in the class.
In our initial meetings with Professor Çetinkaya-Rundel, she requested that Data and GIS create data collections for the course that would provide easy access in R and would include a range of statistical measures that would appeal to the diverse audience in the class. The first challenge, easy access in R, required some translation work. While R excels in its flexibility, graphics, and statistical power, it lacks some of the built-in data documentation features present in other statistical packages. This project prompted Data and GIS to reconsider how to provide documentation and pre-formatted R data to an audience that would likely be unfamiliar with R and data documentation.
The second challenge, finding data that covered a wide range of interesting topics, proved much easier. The General Social Survey, with its diverse and engaging questions on a wide range of topics, proved to be an easy choice for the class. The American National Election Studies also offered a diverse set of measures of public opinion that suited the course well. With these challenges identified and addressed, we spent the end of 2013 selecting portions of the data for class (subsetting), abridging the data documentation for instructional use, and transforming the data for use in an online setting (processing missing values for R, creating factor variables).
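The missing-value and factor processing mentioned above can be illustrated with a minimal sketch. It is written in Python rather than R, and the sentinel codes, labels, and sample responses are hypothetical; they are not taken from the GSS or ANES codebooks or from the actual course data.

```python
# Hypothetical sentinel codes a survey file might use for "no answer",
# "don't know", and similar responses (assumption, not from a codebook).
MISSING_CODES = {-1, 8, 9}

# Hypothetical mapping from numeric codes to category labels,
# playing the role of factor levels in R.
LABELS = {1: "Agree", 2: "Neutral", 3: "Disagree"}


def recode(values, missing_codes=MISSING_CODES, labels=LABELS):
    """Replace sentinel or unknown codes with None and valid codes with labels."""
    recoded = []
    for v in values:
        if v in missing_codes or v not in labels:
            recoded.append(None)  # treated as missing, like NA in R
        else:
            recoded.append(labels[v])
    return recoded


raw = [1, 3, 9, 2, -1, 1]  # made-up responses for illustration
clean = recode(raw)
print(clean)
```

Doing this recoding once, before distributing the files, means students see labeled categories and explicit missing values rather than bare numeric codes.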
As Professor Çetinkaya-Rundel’s class launches on February 17th, this project has given us a new appreciation of providing data and statistical services in a MOOC while also building course materials that we are using in Statistics 101 at Duke. While students begin the Coursera course on Data Analysis and Statistical Inference, students in Professor Kari Lock Morgan’s Statistics 101 class will use these data in their on-campus Duke course as well. We hope that both collections will reduce some of the technological hurdles that often confront courses using R and improve statistical literacy at Duke and beyond.
Confused about Data & GIS Services? Not sure what questions you should be asking us or what kind of services we provide? Here’s one handy chart we’ve come up with to explain what exactly we cover in our consultations and workshops.
When it comes to picking what day to stop by our walk-in hours or knowing how much of the data life cycle our consultants cover, this graphic might be your first stop. Whether it’s finding data, processing or analyzing that data, or mapping and visualizing that data, we have staff with expertise to help!
Still not sure who to approach or what kind of help you might need? Just email firstname.lastname@example.org to get in touch with all of us at once. Some questions can be answered quickly over email, but we’re also happy to schedule an appointment to talk in person.
Explore network analysis, text mining, online mapping, data visualization, and statistics in our spring 2014 workshop series. Our workshops provide a chance to explore new tools or refresh your memory on effective strategies for managing digital research. Interested in keeping up to date with workshops and events in Data and GIS? Subscribe to the dgs-announce listserv or follow us on Twitter (@duke_data).
Currently Scheduled Workshops
- Thu, Jan 9 2:00 PM – 3:30 PM: Data Management Plans – Grants, Strategies, and Considerations
- Mon, Jan 13 2:00 PM – 3:30 PM: Webinar: Social Science Data Management and Curation
- Mon, Jan 13 3:00 PM – 4:00 PM: Google Fusion Tables
- Tue, Jan 14 3:00 PM – 4:00 PM: Open (aka Google) Refine
- Wed, Jan 15 1:00 PM – 3:00 PM: Stata for Research
- Thu, Jan 16 3:00 PM – 5:00 PM: Analysis with R
- Tue, Jan 21 1:00 PM – 3:00 PM: Introduction to ArcGIS
- Wed, Jan 22 1:00 PM – 3:00 PM: ArcGIS Online
- Wed, Jan 22 3:00 PM – 4:00 PM: Open (aka Google) Refine
- Mon, Jan 27 2:00 PM – 3:30 PM: Introduction to Text Analysis
- Wed, Jan 29 1:00 PM – 3:00 PM: Analysis with R
- Thu, Jan 30 2:00 PM – 4:00 PM: Stata for Research
- Mon, Feb 3 1:00 PM – 2:00 PM: Data Visualization on the Web
- Mon, Feb 3 2:00 PM – 3:00 PM: Data Visualization on the Web (Advanced)
- Tue, Feb 11 2:00 PM – 4:00 PM: Using Gephi for Network Analysis and Visualization
- Wed, Feb 12 1:00 PM – 3:00 PM: Introduction to ArcGIS
- Tue, Feb 18 2:00 PM – 3:30 PM: Introduction to Tableau Public 8
- Tue, Feb 25 1:00 PM – 3:00 PM: ArcGIS Online
- Thu, Feb 27 1:00 PM – 3:00 PM: Historical GIS
- Mon, Mar 3 2:00 PM – 3:30 PM: Designing Academic Figures and Posters
- Tue, Mar 4 1:00 PM – 3:00 PM: Useful R Packages: Extensions for Data Analysis, Management, and Visualization
Say you’ve been making hella maps or data stories all day. Now you need to move to your comfy work spot and you need your data to come with you. If you use Duke’s CIFS, moving around is easy, and all of your files are already backed up.
In this example we follow the researcher, Ms. Stu Fac-Staff. Stu is part student, part faculty, and part staff at Duke University. She needs a portable place for her data and wants easy access from her home, lab, and devices. Stu also needs to easily share data with colleagues. No problem! Stu uses CIFS.
Here’s the scenario. Ms. Stu Fac-Staff walks into the Data & GIS Lab in the Duke University Libraries with a flash drive full of data tables. She gathers more supporting data and some advice about crunching the numbers. Stu finishes her day with a visualization and map. (Proudly, Stu imagines this is going to get the A. “Is this grant worthy?” Stu asks herself. “You bet your NSF Application it is!”) Meanwhile, her flash drive is now full and all she wants is to SAVE THE DATA, CONVENIENTLY for later retrieval back home. So Stu stores the data on the Duke Cloud (CIFS).
How do I get the free CIFS Space and how much can I use/access?
- Duke University provides 5 GB (at least!) of easily accessible Cloud-storage space to all faculty, students, and staff
- If you need more space, larger quantities are available upon request
- The space is called CIFS and is an OIT-supported personal home directory of portable file space; CIFS is a mappable drive on your device and the files are backed up
- Students are provisioned CIFS space automatically. Faculty & Staff must request the space through the OIT Service Desk
How do I access the data from my device?
- In the Data & GIS Lab, after using your NetID to log in, open the Windows File Explorer and your CIFS space will be mapped as drive Z.
- After you leave our Data & GIS Lab, all you have to do is “map the drive” on your own machine
- Web – For easy distribution to colleagues, you might want to access or distribute your files through the web. To do this, store the files in your ‘public_html’ directory inside of your CIFS space. Now the files can be downloaded via a web browser. This method is, by default, open to the world; you may want to take additional steps to secure this public_html directory (see below).
Can I Secure the Data?
- Are you trying to access your mapped drive from off campus?
  - Use the VPN directions
  - The CIFS protocol encrypts NetID/password but it does not encrypt your data stream over the Internet. If you’re connecting from an unencrypted or untrusted network (e.g. wireless in the coffee shop), the VPN allows for a secure connection.
- Did you put files in your public_html folder?
  - Unlike the default CIFS space, placing files in the ‘public_html’ directory means they become accessible to the world
  - You can control and limit access by following OIT’s “htaccess” instructions
Data & GIS Services will soon be accepting submissions to its 2nd annual student data visualization contest. If you have a course project that involves visualization, start thinking about your submission now!
The purpose of the contest is to highlight outstanding student data visualization work at Duke University. Data & GIS Services wants to give you a chance to showcase the hard work that goes into your visualization projects.
Data visualization here is broadly defined, encompassing everything from charts and graphs to 3D models to maps to data art. Data visualizations may be part of a larger research project or may be developed specifically to communicate a trend or phenomenon. Some are static images, while others may be animated simulations or interactive web experiences. Browse through last year’s submissions to get an idea of the range of work that counts as visualization.
The Student Data Visualization Contest is sponsored by Data & GIS Services, Perkins Library, Scalable Computing Support Center, Office of Information Technology, and the Office of the Vice Provost for Research.
For more details, see the 2014 Student Data Visualization Contest page. Please address all additional questions to Angela Zoss (email@example.com), Data Visualization Coordinator, 226 Perkins Library.
On Friday, October 4, Dr. Christopher G. Healey will visit Duke University to speak at the Visualization Friday Forum.
Christopher G. Healey is an Associate Professor in the Department of Computer Science at North Carolina State University. He received a B.Math from the University of Waterloo in Waterloo, Canada, and an M.Sc. and Ph.D. from the University of British Columbia in Vancouver, Canada. He is an Associate Editor for ACM Transactions on Applied Perception. His research interests include visualization, graphics, visual perception, and areas of applied mathematics, databases, artificial intelligence, and aesthetics related to visual analysis and data management.
We hope you can join us at the Friday Forum!
Analyze, discover, manage, map, and visualize your data with Duke Libraries Data and GIS Services. Our team of five consultants provides a broad range of support in areas including data analysis, data visualization, geographic information systems, financial data, statistical software, and data storage and management. Our lab provides 12 workstations with the latest data software and three Bloomberg Professional workstations nearly 24/7 for the Duke community.
Data and GIS Workshop Series
All are welcome to the Data and GIS Workshop Series. Analyze, communicate, clean, map, represent and visualize your data with a wide range of workshops on data-based research methods and tools. Details and registration for each class are available at the links that follow. (Interested in keeping up to date with workshops and events in Data and GIS? Just go to https://lists.duke.edu/sympa/info/dgs-announce and click on the “Subscribe” link at the bottom left.)
- Tue, Sep 3, 2013 1:00 PM - 3:00 PM: Introduction to ArcGIS
- Wed, Sep 4, 2013 10:00 AM - 11:30 AM: Stata for Research
- Wed, Sep 11, 2013 10:00 AM - 11:00 AM: Open (aka Google) Refine
- Thu, Sep 12, 2013 1:00 PM - 3:00 PM: Analysis with R
- Tue, Sep 17, 2013 1:00 PM - 2:30 PM: Introduction to Tableau Public 8
- Thu, Sep 19, 2013 10:00 AM - 11:00 AM: Google Fusion Tables
- Mon, Sep 23, 2013 1:00 PM - 2:30 PM: Introduction to Tableau Public 8
- Tue, Sep 24, 2013 1:00 PM - 2:30 PM: Stata for Research
- Mon, Sep 30, 2013 10:00 AM - 11:00 AM: Top 10 Dos and Don'ts for Charts and Graphs
- Mon, Sep 30, 2013 1:00 PM - 3:00 PM: Introduction to ArcGIS
- Tue, Oct 8, 2013 1:00 PM - 2:30 PM: Introduction to Text Analysis
- Thu, Oct 10, 2013 1:00 PM - 3:00 PM: ArcGIS Special Topics: Geocoding & Proximity Analysis
- Thu, Oct 17, 2013 1:00 PM - 3:00 PM: Historical GIS
- Mon, Oct 28, 2013 1:00 PM - 2:00 PM: Designing Academic Figures and Posters
- Tue, Oct 29, 2013 1:00 PM - 3:00 PM: Web GIS Applications
Data and GIS also offers instruction tailored to courses or research teams. Please contact firstname.lastname@example.org to schedule a session!
Data Management Planning – DMPTool – Get 24/7 online help for your next data management plan, including information about Duke resources available for your data work.
Statistical Software Updates
- Stata 13 is now available in our lab and training room
- QSR’s NVivo qualitative software is available on all machines
- 3 Bloomberg Professional Workstations are available for individual or group use
Job Opportunities in Data and GIS Services
Data & GIS Services is hiring! We have two open positions for student web programmers interested in working on data visualization projects. See the Library Student employment page (http://library.duke.edu/jobs/students.html) for more information on how to apply. (The job can be found by searching for requisition number “DUL14-AMZ02”.)
New Data and Map Collections
CPS on Web (CPS Utilities Online)
CPS on Web is a set of utilities enabling you to access CPS data and documentation from this website. You may make tables and graphs from the CPS data, download data extractions, make estimations, get summaries and statistical measures, search the documentation, and make your own variables as functions of the existing ones.
Global Financial Data
Global Financial Data is a collection of financial and economic data provided in ASCII or Excel format. Data includes: long-term historical indices on stock markets; Total Return data on stocks, bonds, and bills; interest rates; exchange rates; inflation rates; bond indices; commodity indices and prices; consumer price indices; gross domestic product; individual stocks; sector indices; treasury bill yields; wholesale price indices; and unemployment rates covering over 200 countries.
The LandScan Global Population Database provides global population distribution in a gridded GIS format at 30 arc-second resolution (approximately 1×1 km cells). Oak Ridge National Laboratory developed modeling techniques to disaggregate and interpolate census data within administrative boundaries to create a GIS layer showing population distribution as accurately and as currently as possible. EastView provides this data for use in GIS software as a WMS (Web Map Service) or as a WCS (Web Coverage Service), allowing a user to incorporate population distribution into GIS mapping and analysis.
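The "approximately 1×1 km" figure can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km, which is an approximation, and the function name and sample latitude are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of a spherical Earth (approximation)
ARCSEC_PER_DEGREE = 3600


def cell_size_km(arcsec, latitude_deg):
    """Return the (north-south, east-west) extent in km of a grid cell."""
    angle_rad = math.radians(arcsec / ARCSEC_PER_DEGREE)
    north_south = EARTH_RADIUS_KM * angle_rad
    # Meridians converge toward the poles, so east-west extent shrinks
    # with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_equator, ew_equator = cell_size_km(30, 0.0)
ns_mid, ew_mid = cell_size_km(30, 36.0)  # illustrative mid-latitude
print(f"Equator: {ns_equator:.3f} km x {ew_equator:.3f} km")
print(f"36 degrees: {ns_mid:.3f} km x {ew_mid:.3f} km")
```

Under these assumptions a 30 arc-second cell is about 0.93 km on a side at the equator and narrower east to west at higher latitudes, so "1×1 km" is a convenient rounding rather than an exact cell size.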
Data and GIS Services is happy to announce the launch of a new service designed to provide detailed data management planning help online. As an increasing number of granting agencies require a data management plan as part of the grant application process, the DMPTool provides “an open source, web application that assists researchers in producing data management plans and delivering them to funders.” For Duke researchers, the tool provides constantly updated advice about how to complete a data management plan while simultaneously highlighting Duke resources available from a variety of data support providers for the planning, maintenance, and sharing of research data.
We hope that the DMPTool will streamline the grant writing process and help researchers make the appropriate connections to resources available both at Duke and beyond for data management planning. We welcome your comments and suggestions on this resource.
MATLAB is an integrated technical computing environment that combines numeric computation, advanced graphics and visualization, and a high-level programming language. Duke’s license agreement offers MATLAB licenses to faculty and staff for work or personal computers, as well as to students for on-campus use. The Duke Office of Information Technology (OIT) maintains instructions on installing MATLAB at Duke. MATLAB is used by many communities at Duke, including Engineering, Econometrics, Medical Sciences, Computational Biology, and Business.
On Tuesday, June 18, OIT in partnership with Duke University Libraries will host a one-day course on MATLAB that focuses on using this software for Data Processing and Visualization. The course will cover importing data, organizing data, and visualizing data in a hands-on format (detailed outline). Seats are limited to 20; please register soon to reserve your spot.
MATLAB for Data Processing and Visualization
Laura Proctor, Academic Training Engineer at MathWorks
Tuesday, June 18
8:30 a.m. to 4:30 p.m. (lunch break from 12:00 p.m. to 1:00 p.m., lunch not provided)
Library Computer Classroom, Bostock 023
Registration (seats limited to 20)
The course assumes some existing familiarity with MATLAB. New potential MATLAB users may want to attend an overview seminar on the software that will be held on Thursday, May 30. This overview will not be hands on, but it will include live demonstrations and examples of both MATLAB and Simulink, an environment for multi-domain simulation and model-based design.
Introduction to Data Analysis and Visualization with MATLAB & Simulink
(details and registration)
Mehernaz Savai, Applications Engineer at MathWorks
Thursday, May 30
1:00 p.m. to 4:00 p.m.
FCIEMAS Building, Schiciano Auditorium – side A
If you would like to begin learning to use MATLAB, MathWorks offers a self-directed MATLAB Fundamentals course, and the Duke library collection also includes several introductory MATLAB texts, such as MATLAB Primer and MATLAB: A Practical Approach.
The finalists and winners of the 2013 Data Visualization Contest were announced at our recent Data & GIS Services open house. The judging panel selected the top five submissions as finalists, each of which was then converted into a poster for display in the Brandaleone Family Center for Data and GIS Services (Perkins 226). Of the five finalists, the panel also selected two grand prize winners, each of whom was awarded $250 in Amazon Gift Cards.
The grand prize winners were:
ACC Basketball Tournament Series Records, by Volodymyr Zavidovych
Limbique, by Pinar Yoldas and David Paulsen
The other three finalists were:
Mapping Chinatown, by Sabrina McCutchan
Duke Intellectual Climate Report 2012, by Amanda Peralta
spNavigate, by Benjamin Radford
Data and GIS Services would like to congratulate the finalists and winners and thank all of the student submitters for their impressive work! The full set of submissions to the contest is available on our growing Flickr gallery.
About
Data and GIS Services collects, curates, and supports numeric and geospatial data from a variety of sources for the Duke community.