Data Analysis of NSLDS
We will take a closer look at the National Student Loan Data System (NSLDS) to understand data analysis. NSLDS is the national database of information about grants and loans awarded to students. Government-sponsored student loan systems of this kind exist in more than 50 countries. Some schemes are open to students across the whole university level.
Others are limited to students enrolled in public-sector institutions. Some schemes are created for a few particular groups, while others are available to all. Many loan schemes, though not all, were introduced to facilitate increases in tuition fees, aimed at greater cost recovery.
Also, some loan schemes are heavily subsidized through favorable repayment conditions, while others operate on nearly commercial terms.
Accordingly, the extent of loan recovery, and with it the financial efficiency of the loan scheme, varies across schemes. These differences between national loan schemes stem mostly from the differing objectives pursued. In this article, we will focus mostly on the visual presentation of the collected data.
All data in this article were collected from data.gov (the U.S. Government’s open data portal), which offers free data on the following topics: agriculture, climate, consumer, ecosystem, education, energy, finance, health, local government, manufacturing, maritime, ocean, public safety, and science & research.
The main goal of this article is to present statistical data graphically, through charts, and to discuss the advantages and disadvantages of the visualization process.
Data Analysis: Rationale
Data visualization is the graphic representation of data to understand its structure and meaning. Large collections of data, such as databases, are usually expressed in tabular form. Tables can hide patterns and correlations that are easier to notice when the data is represented visually.
Raw numbers also cannot provide the context of quantitative relationships that helps us compare and contrast them effectively. Presenting the data visually greatly simplifies this process.
Data visualization involves making many choices about how data is presented. Researchers decide on the combination of points, areas, lines, dimensions, positions, colors, axes (x, y, z, etc.) and grid lines to create graphics of varying complexity and composition.
These graphics can represent different kinds of data and show different angles of analysis.
To begin presenting your data, you should think about the type of data, your analytical purpose, the main components of different chart types, and the principles that distinguish effective visualizations from the rest.
The ultimate goal of data visualization is to help us understand the data. To achieve this, data visualization should address the following three features.
First, charts should clearly indicate the values of the data under review. Graphics should make it easy for readers to decode various shapes, sizes and colors, turn visual presentations into values, and estimate the relationships between different values.
Second, charts should help readers understand the importance of the values in the chart. This includes making design choices that explain to readers what the titles, subtitles, colors, and other annotations on the visual display mean.
Third, data visualization should help readers understand what the graph adds to existing knowledge. Charts should indicate whether certain quantitative relationships are already known from the current literature.
Data Analysis: Data visualization methods
There are two methods for visualizing data: explanatory and exploratory. The explanatory method not only gives viewers a visual portrayal of the data but also conveys new insights about the relationships between variables. Its purpose is to guide viewers through a process of analysis and understanding by highlighting key points and drawing out hidden meanings in the data.
The explanatory method requires that the researcher have sufficient knowledge of the subject under study and be able to identify the most relevant, interesting, and worthwhile insights. In contrast, the exploratory method aims to help viewers reach their own insights. This process involves questioning and manipulating the data. Both of these processes are interactive and iterative. They let users modify their view of the charts, filter for the traits of interest, change the parameters, or rotate the view.
Finally, the process of data visualization consists of four planning and development steps: formulating the brief, working with the given data, establishing your own thinking, and developing design solutions.
The reason I chose the Federal Student Aid Portfolio Summary is that it contains a lot of numerical data. Numerical data is the key material of the visualization process.
Data Analysis: Assumptions
In our dataset, there are several similar tables with different titles. The main structure of those tables contains loan data by years, by quarters, and for some specific situations, which we’ll examine in the next parts. In our dataset, there are two types of variables. The first is the constant variable.
While creating charts and applying visualization methods across all these variables, some of them must stay fixed so that we can see the effect of a single variable.
The second is the changing variable, which is the key variable of the chart because it alone affects the chart. I believe bar charts will be the key charts of the visualization process.
The pie chart is also one of the main charts used for data analysis. Both are easy to use and help viewers see the results clearly. Both visualization methods are used for comparing values. As a data analyst, you see your fair share of datasets. These charts are ideal when you want to compare datasets and find the similarities between them. They readily reveal the high and low values of a particular set, so you can spot large differences or outliers. There are a few popular methods for creating a comparison chart.
Data Analysis: Most Popular Chart Types for Data visualization
Visual presentation is an important part of data analysis. Let’s take a look at these charts one by one.
A column chart is a simple, time-saving way to show the difference between datasets. It places data labels along the horizontal (X) axis and values along the vertical (Y) axis, on the left side of the chart. The Y axis starts at 0 and extends as high as the largest value you are tracking. You can use column charts to track monthly sales numbers, landing page revenue, or similar metrics. Consistent colors keep the focus on the data, but you can use accent colors to highlight important data points or changes over time.
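As a rough illustration of the axis rule (the Y axis starts at 0 and ends at the largest value), the following sketch draws a small text-only column chart. The function name `render_columns` and the `height` parameter are hypothetical, chosen only for this example. The three values are the 2007 to 2009 totals in billions of dollars, derived from the figures quoted in the descriptive analysis of this article.

```python
def render_columns(labels, values, height=8):
    """Return a text column chart whose Y axis runs from 0 to max(values)."""
    top = max(values)
    rows = []
    # Walk from the top of the Y axis down to the first level above zero.
    for level in range(height, 0, -1):
        threshold = top * level / height
        cells = ["##" if value >= threshold else "  " for value in values]
        rows.append("  ".join(cells))
    # Data labels go along the horizontal (X) axis (last two digits of the year).
    rows.append("  ".join(label[-2:] for label in labels))
    return "\n".join(rows)


years = ["2007", "2008", "2009"]
totals = [516, 577, 657]  # billions of dollars
chart = render_columns(years, totals)
print(chart)
```

Because every column is scaled against the same maximum, the tallest column always fills the full height and the others can be compared against it at a glance.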
Bullet charts are similar to the previous type but include additional visual elements. When using a bullet chart, you start with a primary measure and compare it with other measures to find more meaning and connections.
The Mekko chart differs from the bar chart in only one respect: the X axis measures another dimension of your datasets instead of tracking the progress of your data.
A pie chart represents a single total divided into categories, each of which makes up one slice of the whole.
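A minimal sketch of the arithmetic behind a pie chart: each slice is simply a category's share of the total. The helper name `pie_shares` is hypothetical, and the three yearly totals (in billions of dollars) are the 2007 to 2009 values derived from the figures quoted in the descriptive analysis of this article.

```python
def pie_shares(amounts):
    """Return each category's share of the whole, in percent."""
    total = sum(amounts.values())
    return {name: 100 * value / total for name, value in amounts.items()}


totals = {"2007": 516, "2008": 577, "2009": 657}  # billions of dollars
shares = pie_shares(totals)
for year, share in shares.items():
    print(f"{year}: {share:.1f}% of the pie")
```

The shares always add up to 100%, which is why a pie chart suits a fixed total split into parts and not a trend over time.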
Although column charts limit your label and comparison space, you can generally use a bar chart and a column chart in the same way. It is best to stick to a bar chart if you are:
- Working with longer labels
- Displaying negative numbers
- Comparing 10 or more items
A line chart is designed to reveal trends, progress, or changes over time. Because of that, it works best when your dataset is continuous rather than full of starts and stops. As in the column chart, the data labels in a line chart are on the X axis, while the measurements are on the Y axis.
It works best when you are analyzing many data points and looking for similarities in the dataset. You can also spot any outliers and better understand the overall distribution of your data.
Descriptive Analysis for Data Analysis
Descriptive analysis includes calculating the mean, mode, median and standard deviation of a sample. All calculations were done and added to the “Data_Analysis.xlsx” file with their explanations.
In our dataset, we compare results by year, so the year is the key parameter of our calculation. However, the first six years are given as whole years, and the later part is divided into quarters. There are two things we can do. We can treat the data as two different datasets, the first comparing by years and the second comparing by quarters. Alternatively, we can sum the quarters of each year and treat everything as one dataset. I will go with the second option. To do that, I will use the sum values on the right side of the table (column K) on Page 1. These are the results that I found:
Mean = $1,165.91
Mode = DNE (does not exist)
Median = 2,761.25
Standard Deviation = 2,224.801482
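For readers who want to repeat this kind of calculation outside a spreadsheet, here is a minimal sketch using Python's standard `statistics` module. The function name `describe` is hypothetical. The input is only an illustrative subset (the five yearly totals that can be derived from the figures quoted in this article, in billions of dollars), not the full column K, so its results will not match the values listed above.

```python
import statistics


def describe(values):
    """Return the mean, median, mode and sample standard deviation."""
    modes = statistics.multimode(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # None stands in for "DNE": no single value occurs most often.
        "mode": modes[0] if len(modes) == 1 else None,
        # Sample standard deviation (n - 1 in the denominator).
        "stdev": statistics.stdev(values),
    }


# Illustrative subset only: totals for 2007, 2008, 2009, 2012 and 2013.
totals = [516.0, 577.0, 657.0, 948.2, 4007.5]
summary = describe(totals)
for name, value in summary.items():
    print(name, value)
```

Since every value in such a series is distinct, no mode exists, which is consistent with a "DNE" result for the mode.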
In our dataset:
According to the dataset, the total outstanding amount in 2007 was $516 billion. Within one year, it increased by $61 billion, and it kept increasing in the following years. Between 2008 and 2009, it increased by another $80 billion, bringing the total change to $141 billion in two years. In 2012, the total outstanding amount reached $948.2 billion, which is $432.2 billion more than in 2007. After 2012, there is a significant jump in the data: from $948.2 billion in 2012 to $4,007.5 billion in 2013. Compared with the data between 2007 and 2012, that is a significant rise when we evaluate our data by year. After 2013, it continued to rise, and it is still rising today, in 2020.
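The arithmetic in this paragraph can be checked with a short sketch. The dictionary below holds only the totals stated or implied in the text (the 2008 and 2009 values are obtained by adding the stated increases to the 2007 figure); the helper name `change_between` is hypothetical.

```python
# Totals in billions of dollars, taken or derived from the paragraph above.
outstanding = {
    2007: 516.0,
    2008: 516.0 + 61.0,          # 577.0
    2009: 516.0 + 61.0 + 80.0,   # 657.0
    2012: 948.2,
    2013: 4007.5,
}


def change_between(data, start, end):
    """Return how much the total changed from year `start` to year `end`."""
    return data[end] - data[start]


for start, end in [(2007, 2008), (2008, 2009), (2007, 2009), (2007, 2012), (2012, 2013)]:
    delta = change_between(outstanding, start, end)
    print(f"{start} -> {end}: {delta:+.1f} billion")
```

Printing the differences side by side makes the size of the 2012 to 2013 jump stand out against the earlier year-to-year changes.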
Reading through all these numbers and years becomes boring and causes you to lose focus. This is exactly when visual presentation is needed. A graphical summary of the data makes it easier to recognize patterns and trends than looking through many rows in a spreadsheet. This is how the human brain works.
Since the purpose of data analysis is to gain a better understanding, data is more valuable when it is visualized. Even if a data analyst can draw insights from data without visualization, it will be more difficult to communicate the meaning without charts or graphics. Diagrams and charts make it easier to communicate data findings, even if you can identify the patterns without them.
In undergraduate business schools, students learn the value of presenting data results with visualization.
Without a visual representation of the data, it can be difficult for the viewer to grasp the real meaning of the results. For example, a block of numbers doesn’t tell your boss why he should care about the data, but you can be sure to get his attention by showing him a graph of how much he can save or earn.
Our dataset contains the Federal Student Aid Portfolio Summary by year from 2007 to 2020.
According to the chart, we can clearly see that the total outstanding amount increased every year. There is also a significant jump between 2012 and 2013. The chart draws our attention to the increase from year to year and to that jump. It also leads us to think about the overall growth, from where to where: at the beginning, the value was under $1000; after 2019, it reached over $60000. Now let’s look at the same data from a different perspective, with another chart.
Looking at this pie chart, we think in terms of years. The pie chart starts at 2007 (the smallest slice) and runs to 2019 (the biggest slice, in light blue). This chart leads us to think about which year has the largest amount among all these years.
We can repeat this for other kinds of charts and graphics, changing our perspective by changing the chart we use. I will give it another try.
Looking at this chart, we again clearly see the big jump after 2012 and the general year-over-year increase, just as in the column chart. Additionally, this chart leads us to think about what will happen in 2020 or further in the future. In other words, it leads us to forecast the values of the coming years. From the values in this chart, we can easily apply linear regression to forecast next year’s value, or even the value for 2060.
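To make the forecasting idea concrete, here is a minimal sketch of simple linear regression (ordinary least squares) written with plain Python. The function names `fit_line` and `predict` are hypothetical. As an assumption for illustration, it fits only the 2007 to 2009 totals (in billions of dollars) derived from the figures quoted in this article; a straight line fitted across the 2012 to 2013 jump would describe the data poorly.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Return the fitted value of the line at x."""
    return slope * x + intercept


years = [2007, 2008, 2009]
totals = [516.0, 577.0, 657.0]  # billions of dollars
slope, intercept = fit_line(years, totals)
forecast_2010 = predict(slope, intercept, 2010)
print(f"Average yearly increase: {slope:.1f} billion")
print(f"Forecast for 2010: {forecast_2010:.1f} billion")
```

The slope is the average yearly increase implied by the fitted line. Extrapolating as far as 2060 is possible with the same call, but the further the forecast is from the observed years, the less it should be trusted.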
According to the simple analysis we made in the previous section, each chart has a different effect on viewers. We can think of this as steering viewers: wherever you want to lead them, you should pick the right chart for it. Having seen all these charts with the same variables and the same dataset, we can say that the column chart is useful for analyzing data by groups (by years in our example).
The column chart fits our dataset very well. However, if there were too many values on the X axis (n years), the chart would become really hard to read.
The second one, the pie chart, is really easy to understand. It leads the viewer to think of each value as a part of the entire dataset. In other words, you focus on one value at a time and compare it with the others and with the total. Its only drawback is that it does not give much information about the dataset.
Finally, the last chart, the line graph, has the same features as the column chart. It also leads viewers to think about the future. In other words, it leads viewers to forecast future data.
It can be really hard to visualize a dataset in our minds and make decisions based on that mental picture. But when we see the data in a chart, we can analyze it easily. After I wrote this assignment, I realized that different types of charts have different effects on the viewer. Each leads the viewer to think about something different and to focus on a different point. I should learn the features, pros and cons of all of them.