Visualization of Air Traffic in World's Busiest Airports


The visualization above represents 24 hours of flights arriving at and departing from Dallas/Fort Worth International Airport. Each tiny line represents a single flight, and the circular shape comes from the fact that only planes within a 200 km radius of the airport have been considered. In the graphic on the left, the saturation of the color is proportional to the altitude of the planes: white means low altitude, while blue means high altitude. The graphic on the right is colored using the same concept (white = low altitude, saturated color = high altitude), but this time the hue of each line is green for arriving flights and red for departing flights.

I have collected 24 hours of data for flights arriving at and departing from the world's 10 busiest airports and produced 2 visualizations for each airport. The results can be seen below.


To build this visualization, I had to go through several steps:

First, I downloaded flight data at regular intervals for 24 hours using Python, keeping only the flights that were within a 200 km radius of one of the airports on my list.
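As a rough sketch of this step (the feed URL and the JSON field names below are placeholders, not the actual data source I used), the collector polls the feed at a fixed interval and keeps only the planes within 200 km of the airport, using the haversine formula to compute the distance:

 import math
 import time
 import requests

 FEED_URL = 'https://example.com/flights.json'  # placeholder for the real flight-data feed
 AIRPORT_LAT, AIRPORT_LON = 32.8968, -97.0380   # Dallas/Fort Worth, for example
 RADIUS_KM = 200.0

 def haversine_km(lat1, lon1, lat2, lon2):
   # great-circle distance in km between two (lat, lon) points
   lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
   a = math.sin((lat2 - lat1) / 2) ** 2 + \
       math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
   return 6371.0 * 2 * math.asin(math.sqrt(a))

 snapshots = []
 while True:
   for plane in requests.get(FEED_URL).json():
     if haversine_km(AIRPORT_LAT, AIRPORT_LON, plane['lat'], plane['lon']) <= RADIUS_KM:
       snapshots.append((plane['id'], plane['lat'], plane['lon'], plane['altitude']))
   time.sleep(60)  # one snapshot per minute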

The coordinates (latitude and longitude) of each point were translated at data collection time to x and y coordinates using the Mercator projection; I chose this projection because most of the other projections I tried (e.g. the equirectangular projection) did not preserve the shape of the circles.
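For reference, the spherical Mercator projection boils down to two lines of math; it is conformal (it preserves local angles and shapes), which is why the 200 km circles stay circular. A minimal version in Python (the scale factor is arbitrary and just sizes the output):

 import math

 def mercator(lat, lon, scale=100.0):
   # longitude maps linearly to x; latitude is stretched towards the poles
   x = scale * math.radians(lon)
   y = scale * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
   return x, y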

After this process, I had a bunch of lists (flights) containing lists of x and y coordinates (the positions of a plane in consecutive snapshots) that needed to be connected. To do so, I used Processing: it's very easy to use, it allows a lot of customization, and it has a built-in implementation of Catmull-Rom splines (a mathematical method to draw a smooth curve through a set of points). After obtaining the curves representing the flights, I changed the color and saturation of the lines using the HSB color space. This allowed me to define a color by setting its hue (blue for the visualization on the left; green for arriving flights and red for departing flights in the visualization on the right), setting maximum brightness (so that lines are clearly visible on the black background), and setting the saturation depending on the altitude (low saturation, i.e. white, for low altitude and high saturation, i.e. fully colored, for high altitude).
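The altitude-to-saturation mapping can be sketched in a few lines; here it is in Python with the standard colorsys module rather than Processing, with max_altitude as an assumed normalization constant:

 import colorsys

 def flight_color(altitude, hue, max_altitude=12000.0):
   # low altitude -> low saturation (white); high altitude -> fully saturated color
   saturation = min(altitude / max_altitude, 1.0)
   brightness = 1.0  # maximum brightness, so lines stand out on the black background
   return colorsys.hsv_to_rgb(hue, saturation, brightness)

 # e.g. a cruising plane in the left-hand graphic (hue 0.6 is blue in HSB)
 r, g, b = flight_color(11000.0, hue=0.6)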

It's interesting to notice how each airport has a different pattern and how arriving and departing flights take different routes. In some airports, a few green circles are present: they are formed by arriving planes waiting for their turn to land.

Most Common Country of Origin of Immigrants in the United States

According to the Department of Homeland Security, 1,031,631 people became Legal Permanent Residents (green card holders) of the United States in 2012. Many of them (146,406) came from Mexico, the most common country of origin of permanent immigrants in most U.S. states. But what happens if you don't take Mexico into account? The infographic below represents each of the 50 states with the flag of the country most of its immigrants come from (excluding Mexico).
Is this what you were expecting? As you can see, the most popular countries are India (19 states), the Philippines (7 states) and Burma/Myanmar (5 states).
So, now, you may be wondering why I decided to take Mexico out of the picture. The infographic below will give you the answer: it shows the most common country of origin of Legal Permanent Immigrants in each U.S. state (including Mexico).
As you can see, Mexico is by far the most popular country (27 states), followed by India (7 states).
If you focus on the countries you would never have guessed, it's not that hard to understand why they are there: most of them (Burma, Bhutan, Ethiopia, Iraq and Somalia) are the countries of origin of political refugees who were granted asylum.
P.S.: Thanks to Ben Blatt for the inspiration.

Create a Tag Cloud of Tweets Using Python

Here I will show you a simple script (the full version is on GitHub: https://github.com/johnbyron/pyTagTweets) that creates a tag cloud from the content of tweets matching a given query and saves it as a png.

Before starting you will need to install a few Python libraries: requests (to query the Twitter API), requests_oauthlib (to handle the authentication) and pytagcloud (to generate the tag cloud image).

First of all you will need to authenticate using the 4 secrets associated with your Twitter developer account (more info here: https://dev.twitter.com/discussions/631):

 from requests_oauthlib import OAuth1

 my_oauth = OAuth1('***',
           client_secret='***',
           resource_owner_key='***',
           resource_owner_secret='***')

Then you will need to create the URL associated with your query:

 complete_url = 'https://api.twitter.com/1.1/search/tweets.json?q=hello'

Here, for example, we are retrieving the most recent tweets containing the word "hello".
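Note that a query containing spaces or special characters must be URL-encoded; instead of building the URL by hand, you can let requests do it by passing the query as a params dictionary:

 r = requests.get('https://api.twitter.com/1.1/search/tweets.json',
                  params={'q': 'hello world'}, auth=my_oauth)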

After that, you will just need to create your tag cloud; here we use an infinite loop to regenerate the tag cloud every wait_time seconds.

 import time
 import operator
 from collections import Counter
 from string import punctuation

 import requests
 from pytagcloud import create_tag_image, make_tags

 # stop_words, max_words, max_word_size, width, height, layout,
 # background_color and wait_time are defined in the full script on GitHub

 while True:
   my_text = ''
   r = requests.get(complete_url, auth=my_oauth)
   tweets = r.json()
   for tweet in tweets['statuses']:
     text = tweet['text'].lower()
     text = ''.join(ch for ch in text if ch not in punctuation) # exclude punctuation from tweets
     my_text += text + ' ' # add a space so words of consecutive tweets don't merge
   words = my_text.split()
   counts = Counter(words)
   for word in stop_words:
     del counts[word] # Counter silently ignores deletions of missing keys
   for key in counts.keys(): # on Python 2, keys() returns a list, so deleting while looping is safe
     if len(key) < 3 or key.startswith('http'): # drop very short words and links
       del counts[key]
   final = counts.most_common(max_words)
   max_count = max(final, key=operator.itemgetter(1))[1]
   final = [(name, count / float(max_count)) for name, count in final]
   tags = make_tags(final, maxsize=max_word_size)
   create_tag_image(tags, 'cloud_large.png', size=(width, height), layout=layout, fontname='Lobster', background=background_color)
   print "new png created"
   time.sleep(wait_time)

If you want to download the complete version of the code and try it yourself you can find it here: https://github.com/johnbyron/pyTagTweets

Feel free to comment if you encounter any problems.

Collect Tweets from the Twitter API Using Python

Many of you have asked how to retrieve tweets from the Twitter API using Python. Well... it's extremely simple! First of all, if you don't already have one, you will have to open a Twitter developer account and obtain your "secrets" (as explained here: https://dev.twitter.com/discussions/631), which you will later use to authenticate.

The only external libraries that you will need are "requests" (get it here: http://docs.python-requests.org/en/latest), which makes it simpler to interact with the API, and "requests_oauthlib" (get it here: https://github.com/requests/requests-oauthlib), which simplifies the authorization process. Once you have installed them you can create an authorization variable using this simple code:

 from requests_oauthlib import OAuth1

 oauth = OAuth1(CONSUMER_KEY,
         client_secret=CONSUMER_SECRET,
         resource_owner_key=OAUTH_TOKEN,
         resource_owner_secret=OAUTH_TOKEN_SECRET)

The four parameters are the "secrets" that you obtained when you created your Twitter developer account.
We are almost done! You just need to make a request to the Twitter API using the authorization variable previously created:

 import requests

 r = requests.get(url="https://api.twitter.com/1.1/search/tweets.json?q=Obama", auth=oauth)

For example, in this case we retrieved the most recent tweets talking about Obama.
To have a look at the content of the retrieved tweets you can just print it:

 print r.json()  

or save the content to a csv file using the Python csv module.
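For example, here is a minimal sketch that writes the id, creation date and text of each retrieved tweet to a csv file (the field names come from the JSON structure returned by the search endpoint):

 import csv

 tweets = r.json()
 with open('tweets.csv', 'wb') as f:  # on Python 3, use open('tweets.csv', 'w', newline='')
   writer = csv.writer(f)
   writer.writerow(['id', 'created_at', 'text'])
   for tweet in tweets['statuses']:
     writer.writerow([tweet['id'],
                      tweet['created_at'],
                      tweet['text'].encode('utf-8')])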

I hope that everything was clear, feel free to comment if you need more help!

Visualize the Network of Your Friends on Facebook


Here I will teach you how to visualize the network of mutual friendships among your Facebook friends using a simple online tool called FriendsGraph.

First of all you will need to go to the FriendsGraph website (https://app.friendsgraph.me) and log in using your Facebook account. You will be asked to allow the application to access your list of friends, but don't worry: it's only needed to build the network.

Now you are almost ready to visualize your network. FriendsGraph will take up to a couple of minutes to compute the connections among your friends; in the meantime you will be presented with some interesting facts about them.


After approximately one minute you will be able to explore your network of friends. You can zoom in and out using your mouse wheel or the bar at the bottom left of the screen. Clicking on a friend will highlight the sub-network of common friendships and display his or her picture. The search bar at the top of the screen can be used to search for a particular friend within your network.


The application has another very interesting feature: as you may have noticed, FriendsGraph assigns each node (friend) a color based on a community-detection algorithm and a position based on its connections with the other nodes (friends). This means that friends who belong to the same "social group" are likely to be placed near each other and to share the same color.

Are you able to distinguish the different groups among your friends and give a name to them?

Visualize the Relationships among the Characters of a Movie




Here I will describe how to visualize the relationships among the characters of a movie or book using Gephi.

Gephi (https://gephi.org/) is a program for the analysis and visualization of graphs, available for Windows, Mac and Linux.

First of all you will need to choose a book or movie and write down the relationships between the characters. 

Then you will need to open Gephi and create a new project (File -> New Project). Switch to the "Data Laboratory" view, click on Nodes and start adding the nodes of the graph (that is, the characters of the novel/movie). When you have added all the nodes, click on Edges and start adding the edges (the relationships) between the nodes (characters), simply choosing them from a drop-down list (See Figure).

You can choose whether each relationship is directed, or undirected if the relationship is mutual.
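If adding many edges by hand gets tedious, Gephi can also load them through Import Spreadsheet in the Data Laboratory; an edges table is just a csv with a Source, a Target and a Type column, for example (the rows below are only an illustration):

 Source,Target,Type
 Jean Valjean,Javert,Directed
 Jean Valjean,Cosette,Undirected
 Cosette,Marius,Undirected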


If you go back to the Overview tab you will see your network visualized. In the example below I changed the color of the nodes in the Data Laboratory, making the males blue and the females pink.


You now just need to apply a layout algorithm (choose it from the Layout drop-down list on the left) and visualize the result in the Preview view. You can then export your visualization using File -> Export. In the example below I visualized the network of relationships in the book "Les Misérables", using the Alphabetical Sorter layout (https://marketplace.gephi.org/plugin/alphabetical-sorter/).





Visualize Tweets on a Map Using Java and Gephi


Some people have asked me how to visualize tweets related to a given topic on a map. In this tutorial you will learn how to do it using Java (to collect the tweets) and Gephi (to visualize them).

Phase 1 - Collecting the tweets


Here I will show you a sample program based on the Twitter4J library (see the Twitter4J documentation for more details on how to use it) to collect geolocated tweets from the Twitter API. If you want to use another language to collect the tweets, or you already have a dataset of tweets to visualize, skip to the next phase. The important thing is that the collected tweets have latitude and longitude information.

The following code shows how to collect geolocated tweets (in this case related to the topic "Obama").


 import java.util.List;

 import twitter4j.Query;
 import twitter4j.QueryResult;
 import twitter4j.Status;
 import twitter4j.Twitter;
 import twitter4j.TwitterException;
 import twitter4j.TwitterFactory;
 import twitter4j.conf.ConfigurationBuilder;

 int LIMIT = 5000; // the maximum number of retrieved tweets

 ConfigurationBuilder cb = new ConfigurationBuilder();
 cb.setOAuthConsumerKey("***");
 cb.setOAuthConsumerSecret("***");
 cb.setOAuthAccessToken("***");
 cb.setOAuthAccessTokenSecret("***");
 Twitter twitter = new TwitterFactory(cb.build()).getInstance();

 Query query = new Query("Obama");
 query.setCount(100); // tweets per page (the maximum allowed by the API)

 try {
   int count = 0;
   QueryResult r;
   do {
     r = twitter.search(query);
     List<Status> ts = r.getTweets();
     for (int i = 0; i < ts.size() && count < LIMIT; i++) {
       Status tweet = ts.get(i);
       if (tweet.getGeoLocation() != null) {
         count++;
         // add to a csv
         // for example three attributes:
         // tweet.getId()
         // tweet.getGeoLocation().getLatitude();
         // tweet.getGeoLocation().getLongitude();
       }
     }
   } while ((query = r.nextQuery()) != null && count < LIMIT);
 } catch (TwitterException te) {
   System.out.println("Couldn't connect: " + te);
 }


After this phase you should have a .csv file like this:
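(the ids and coordinates below are made up, just to show the expected format)

 id,latitude,longitude
 474370029273982976,40.7128,-74.0060
 474370031555684352,34.0522,-118.2437
 474370035728990208,51.5074,-0.1278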


Phase 2 - Visualizing the tweets



To visualize the tweets you will need Gephi (a graph visualization tool, available at https://gephi.org/).

  1. Open Gephi.
  2. Download the GeoLayout plugin: Tools -> Plugins -> Available plugins -> Select "GeoLayout"  -> Install. Then restart Gephi.
  3. Open a new project: File -> New Project
  4. Open the data table: Window -> Data Table
  5. Click on Import Spreadsheet and choose the separator you used and "Nodes Table" (See Figure).

  6. Click on "Next" and choose "Double" as the type of the "latitude" and "longitude" attributes and "String" for the ID.
  7. Go back to the "Overview" tab; you should see your graph, which has not been laid out yet (See Figure).
  8. On the left, in the "Layout" tab, select "Geo Layout" and the latitude and longitude attributes, then click on Run. You should obtain something like this:

  9. In the Data Laboratory you can set the size and color of all the nodes to obtain your desired result. Then you can switch to the "Preview" view and set the opacity of the nodes and the background of the canvas, obtaining the following result:

Language Communities and Information Flow on Twitter


Inspired by an awesome visualization by Eric Fischer, I decided to create a map of tweets which represents the language communities and the information flow on Twitter. Each tweet is represented on the map as a point whose color is determined by the language settings of its creator. The arcs represent retweets; in this way it was possible to visualize how the information flows between the different language communities on Twitter.

To achieve this result I downloaded geolocated tweets from the Twitter Streaming API for 4 days. The total number of retrieved tweets is 247,850 (4,632 of which are retweets). Gephi was used to produce the final visualization.
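As a minimal sketch of the preprocessing (assuming each collected tweet keeps the author's profile language in the usual user.lang field of the API), one color can be assigned per language community before loading the points into Gephi:

 # hypothetical palette: one hue per major language community
 LANG_COLORS = {'en': '#1f77b4', 'es': '#d62728', 'pt': '#2ca02c'}
 DEFAULT_COLOR = '#7f7f7f'  # everything else

 def node_color(tweet):
   # the 'lang' attribute of the user object is the creator's language setting
   return LANG_COLORS.get(tweet['user']['lang'], DEFAULT_COLOR)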

From this analysis, some interesting aspects emerge:


First of all, it's possible to notice that English, Spanish and Portuguese tweets make up 80% of the total number of tweets.

Moreover, the map visualization highlights some interesting patterns:
  • In some countries and areas (e.g. China, Africa, Australia, North Korea) Twitter is not widely used.
  • Information in English flows mainly from the UK and the US back to the UK and the US themselves, but also to the Philippines, Malaysia, India, Norway, and West and South Africa.
  • England has the highest density of tweets and of "outgoing" information.
  • Tweets in Spanish flow from and to Spain, South America and Mexico, but also California.
  • Most Japanese users have set their Twitter language to Japanese, but they "import" many tweets created by English-speaking users.