Heat Map Chart of Semantic Role Labels

While doing text analytics on a large document collection, an analyst is often looking for relationships between entities such as persons, organizations and locations. The existing approaches to finding related entities automatically are quite primitive. They are generally some variant of finding relationships by co-citation of entities and bibliographic coupling of documents. According to this scheme, two entities are considered “related” if they are mentioned in the same document. Conversely, two documents are considered related if they have entities in common. A major problem with this notion of “relatedness” is that the nature of the relationship (its semantics) between two entities remains unknown until the analyst reads the documents that mention them together.

Another problem with this approach is that it can generate a large number of false positives. It is possible that two entities are mentioned in the same document but they are not related in any meaningful way. The current approach will still show these entities as related along with other related entities that are actually useful for the analysis task. Since there is no information about the nature of relationship, the task of separating the useful relationships from the rest can be challenging in large document collections.

To tackle this problem, I have recently started looking into a natural language processing technique called Semantic Role Labeling (SRL) and how it can be used to support analytics for large document collections. Automated SRL, first developed by Daniel Gildea and Daniel Jurafsky [1], is an interesting machine learning technique that tries to identify the semantic arguments associated with the verb (predicate) in a sentence. These semantic roles capture a higher-level structure of text than a syntax tree does. Consider a simple sentence like this:

Joe sold the car.

For the sentence above, the task of semantic role labeling is to identify the predicate “to sell” and its associated semantic arguments: “Joe” as the seller and “car” as the thing_sold. In the NLP world, these arguments are also referred to as “A0/Proto-Agent” and “A1/Proto-Patient”; together with the predicate, they form what I refer to as an SRL triple (or just a triple). For the sentence above, SRL generates the triple (Joe, sell, car). The roles “seller” and “thing_sold” are called “roleA0” and “roleA1”, since they are the semantic roles of the A0 and A1 arguments of the triple.

The first idea that came to mind was simply to use the verbs to show the relationships. But that did not work. The problem is that most documents do not contain simple sentences like “Joe sold the car.” Instead, they contain much more complex sentences, which result in arguments that may be several words long and often contain several entities. In such a situation, it is difficult to identify which entities within a document are related by a given predicate. So, I started looking at other ways in which the information might be useful.

I tried to generate a heat map chart of the semantic roles in the document collection. The idea was to get a sense of the distribution of semantic roles over the collection. Admittedly, this was not aimed at solving the original problem of meaningless relationships. Rather, it was an attempt to discover whether there are any interesting patterns of semantic roles in the document collection.

Given a collection of triples for a document collection, I first calculated the number of triples for every possible pair of roleA0 and roleA1. Since this exercise was a shot in the dark, I resisted writing code to generate the visualization and used Excel instead. Once I had the number of triples for every pair, I put the roleA0s in the first column and the roleA1s in the first row. The cell at the intersection of a roleA0 row and a roleA1 column contained the number of triples with that pair of roles in the collection. I then used the conditional-formatting feature in Excel to generate the heat map. Here is the Excel file with the heat map that I got as the final output.

HeatMapChartExcel_26May2014
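The counting step described above can also be scripted. The Java sketch below is my own illustration, not the code or data behind the spreadsheet: it assumes each triple has already been reduced to hypothetical fields a0, predicate, a1, roleA0 and roleA1, and the sample triples are made up. It counts triples per (roleA0, roleA1) pair and writes the matrix as CSV, with roleA0s down the first column and roleA1s across the first row, so Excel can open it for conditional formatting.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class RolePairCounts {

    // One SRL triple plus the role names of its A0 and A1 arguments (illustrative fields).
    static final class Triple {
        final String a0, predicate, a1, roleA0, roleA1;

        Triple(String a0, String predicate, String a1, String roleA0, String roleA1) {
            this.a0 = a0;
            this.predicate = predicate;
            this.a1 = a1;
            this.roleA0 = roleA0;
            this.roleA1 = roleA1;
        }
    }

    // Counts triples per (roleA0, roleA1) pair: outer key = row (roleA0), inner key = column (roleA1).
    static Map<String, Map<String, Integer>> countPairs(List<Triple> triples) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Triple t : triples) {
            counts.computeIfAbsent(t.roleA0, k -> new TreeMap<>()).merge(t.roleA1, 1, Integer::sum);
        }
        return counts;
    }

    // Writes the matrix as CSV; pairs that never occur get 0.
    static String toCsv(Map<String, Map<String, Integer>> counts) {
        TreeSet<String> columns = new TreeSet<>();
        counts.values().forEach(row -> columns.addAll(row.keySet()));
        StringBuilder sb = new StringBuilder("roleA0");
        for (String col : columns) {
            sb.append(',').append(col);
        }
        sb.append('\n');
        for (Map.Entry<String, Map<String, Integer>> row : counts.entrySet()) {
            sb.append(row.getKey());
            for (String col : columns) {
                sb.append(',').append(row.getValue().getOrDefault(col, 0));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
                new Triple("Joe", "sell", "the car", "seller", "thing_sold"),
                new Triple("Ann", "sell", "a house", "seller", "thing_sold"),
                new Triple("Bob", "buy", "a bike", "buyer", "thing_bought"));
        System.out.print(toCsv(countPairs(triples)));
    }
}
```

Using TreeMap and TreeSet keeps rows and columns in alphabetical order, so the CSV layout is stable from run to run.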

If you look at the file, there are just too many rows and columns for data that is so sparse. Clearly, the heat map was not a great idea. A node-link diagram or some other visualization technique for sparse data might have been a better solution. I completely failed to anticipate what the heat map chart for my data would look like. In any case, this was not an exercise completely in vain. It definitely improved my understanding of the data that I am dealing with.

References:

  1. Gildea, Daniel, and Daniel Jurafsky. “Automatic labeling of semantic roles.” Computational Linguistics 28.3 (2002): 245-288.
  2. Semantic Role Labeling, http://en.wikipedia.org/wiki/Semantic_role_labeling

How to use Swing widgets in an Eclipse RCP application?

SwingSWTExample
This is what the application looks like at the end.

Eclipse RCP is a great platform for developing nice GUI applications with Java. It provides a wonderful framework for building applications, with its very useful publish-subscribe event model and annotation-based dependency injection. It feels almost like magic when you switch from developing hard-coded Swing applications to Eclipse.

However, it uses SWT as its widget set, which is independent of the Java AWT and Swing libraries. While it would be a little absurd to develop Swing widgets for an Eclipse RCP application considering the richness of the SWT platform, there are times when it is unavoidable. You may have an existing Java application with a lot of custom Swing components, or your application may use a third-party visualization library like JUNG, which is developed entirely in AWT/Swing. Converting every bit of Swing code to SWT in such a case is expensive and time consuming. A good time-and-money-saving approach is to move to SWT one step at a time, converting Swing code to SWT as and when possible. Thankfully, the good members of the Eclipse community have made this really simple.

In this post, I will show a simple application that uses SWT and Swing at the same time, uses a library that is not available as a bundle (at least to my knowledge), and sends events from an SWT widget to a Swing component (of course, events can be sent the other way around as well). That said, I will not explain the details of Eclipse RCP development. I am assuming that the reader has some idea about how to get started with an Eclipse RCP application. If not, Lars Vogel has a very nice tutorial on Eclipse 4 RCP development.

Before we begin, you will need to install Eclipse and e4 (here are the instructions). Once you do that, create an Eclipse 4 project (File > New > Project > Eclipse 4 > Eclipse 4 Application Project). At this point, you will have a project that looks like this:

PackageExplorer

 

Double-click on the SwingSWTTutorial.product file and select the Overview tab. Click on the link that reads “Launch an Eclipse application”. You should see an empty window appear. Close the window now.

Next, let's design the application by specifying the application model. To do that, open the Application.e4xmi file and select the Form tab. You will see something like this (with certain nodes on the left side missing; I will show how to add them):

ApplicationModel

If you look at our final application, you will see two panels side by side (similar to a JSplitPane in Swing). To create this, go to Windows/Trimmed Window. Right-click on Controls, choose Add child > PartSashContainer and select the Horizontal orientation. This is the container that will hold the SWT list and the JUNG/Swing visualization.

Next, right-click this container and add two children using Add child > Part. Click on the first Part and set its Label to SWTView, and set the second one's Label to SwingView (you can choose whatever labels you want).

At this point, we have specified the skeleton of the application. Now we need to add code to fill our views and do some event sending and handling. But before that we need to add JUNG libraries to the project so that we can create and visualize a graph.

Go to JUNG's download page on SourceForge. Click on the jung2-2_0_1.zip file to download it, then extract the zip. It contains several jar files, of which we need only a few. Create a lib folder in the project and copy the following jars to it:

  • jung-api-2.0.1.jar,
  • jung-graph-impl-2.0.1.jar,
  • collections-generic-4.01.jar,
  • jung-visualization-2.0.1.jar,
  • concurrent-1.3.4.jar,
  • jung-algorithms-2.0.1.jar

We need to add these libraries to the classpath. To do that, open META-INF/MANIFEST.MF and select the Runtime tab. Go to the Classpath section and click on Add… Navigate to the lib directory (which should contain the jars now) and add all the jars. Check that the MANIFEST.MF file has the following content for Bundle-ClassPath:

Bundle-ClassPath: lib/collections-generic-4.01.jar,
 lib/concurrent-1.3.4.jar,
 lib/jung-algorithms-2.0.1.jar,
 lib/jung-api-2.0.1.jar,
 lib/jung-graph-impl-2.0.1.jar,
 lib/jung-visualization-2.0.1.jar,
 .

Make sure that the period at the end is not missing. Try not to edit this file manually as it is sensitive to formatting.

Now that we have the external libraries in place, let us define our data structures. First create a package swingswttutorial.data and add two classes: GraphData and GraphRepository

package swingswttutorial.data;

import edu.uci.ics.jung.graph.Graph;

public class GraphData {
 
	private final Graph<String, Number> graph;
	private final String name;
 
	public GraphData(String name, Graph<String, Number> graph) {
		this.name = name.trim();
		this.graph = graph;
	}
 
	public Graph<String, Number> getGraph() {
		return graph;
	}
 
	public String getName() {
		return name;
	}
 
	// the list viewer uses toString() as the label of each item
	@Override
	public String toString() {
		return name;
	}
 
}

and

package swingswttutorial.data;

import java.util.ArrayList;
import java.util.List;

import edu.uci.ics.jung.graph.DirectedSparseMultigraph;
import edu.uci.ics.jung.graph.Graph;

public class GraphRepository {
 
	private final List<GraphData> graphs = new ArrayList<>();
 
	public GraphRepository() {
		initializeGraphs();
	}
 
	// builds two small directed cycles as sample data
	public void initializeGraphs() {
		Graph<String, Number> g1 = new DirectedSparseMultigraph<String, Number>();
		g1.addEdge(1, "A", "B");
		g1.addEdge(2, "B", "C");
		g1.addEdge(3, "C", "A");
		graphs.add(new GraphData("3-point graph", g1));
 
		Graph<String, Number> g2 = new DirectedSparseMultigraph<String, Number>();
		g2.addEdge(1, "A", "B");
		g2.addEdge(2, "B", "C");
		g2.addEdge(3, "C", "D");
		g2.addEdge(4, "D", "A");
		graphs.add(new GraphData("4-point graph", g2));
	}
 
	public Object[] getElements() {
		return graphs.toArray();
	}
}

GraphData is a simple structure containing a name for the graph and the JUNG Graph itself. GraphRepository is a collection of GraphData objects.

Once our data model is defined, we need to tell Eclipse about the repository so that it can be made accessible throughout the application. To do that, add the following lines to Activator.java:

IEclipseContext ctx = E4Workbench.getServiceContext();
ctx.set(GraphRepository.class, new GraphRepository());

Now we will create the views. Create the package swingswttutorial.views.

Let's first create SWTListView.java:

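package swingswttutorial.views;

import javax.annotation.PostConstruct;

import org.eclipse.e4.core.services.events.IEventBroker;
import org.eclipse.jface.viewers.ISelectionChangedListener;
import org.eclipse.jface.viewers.IStructuredContentProvider;
import org.eclipse.jface.viewers.ListViewer;
import org.eclipse.jface.viewers.SelectionChangedEvent;
import org.eclipse.jface.viewers.Viewer;
import org.eclipse.swt.widgets.Composite;

import swingswttutorial.data.GraphData;
import swingswttutorial.data.GraphRepository;

public class SWTListView {
 
	@PostConstruct
	public void postConstruct(Composite parent,
			final GraphRepository repository, final IEventBroker eventBroker) {
		final ListViewer viewer = new ListViewer(parent);
		viewer.setContentProvider(new IStructuredContentProvider() {
 
			@Override
			public void inputChanged(Viewer viewer, Object oldInput,
					Object newInput) {
				// nothing to do: the input is set once and never changes
			}
 
			@Override
			public void dispose() {
				// nothing to dispose
			}
 
			// the viewer's input is the repository; its elements are the GraphData objects
			@Override
			public Object[] getElements(Object inputElement) {
				return ((GraphRepository) inputElement).getElements();
			}
		});
 
		viewer.setInput(repository);
 
		// publish the selected GraphData on the GRAPH_SELECTED topic
		// (AppConstants is a small class of String constants, not shown here)
		viewer.addSelectionChangedListener(new ISelectionChangedListener() {
 
			@Override
			public void selectionChanged(SelectionChangedEvent event) {
				int selectionIndex = viewer.getList().getSelectionIndex();
				Object selectedData = viewer.getElementAt(selectionIndex);
				if (selectedData != null) {
					GraphData gData = (GraphData) selectedData;
					eventBroker.send(AppConstants.GRAPH_SELECTED, gData);
				}
			}
		});
	}
}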

Now go to the application model file Application.e4xmi. Select the SWTView part and click on Find next to the Class URI field. Find and select the SWTListView class that you just created. This tells Eclipse to use this class to construct the view. This is how it works:

Eclipse creates an object of SWTListView and then calls its postConstruct() method, because we have annotated it with @PostConstruct. You can name the method whatever you like, as long as it is annotated properly. Now the magic takes place. When calling this method, Eclipse sees that postConstruct() takes three arguments: a Composite, a GraphRepository and an IEventBroker. Remember, we added the GraphRepository object to the context earlier in Activator.java. Eclipse finds this object, the parent composite and the event broker, injects them as arguments and calls postConstruct(). The method body then creates the list, specifies its data and registers a selection change listener on the list.
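To make the “magic” a little less mysterious, here is a toy Java sketch of injection by type. It is not how the Eclipse injector is implemented; the annotation @AfterCreate and the classes Repo and DemoView are hypothetical. A type-keyed map plays the role of the context, and every parameter of the annotated method is resolved by looking up its declared type.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class TinyInjector {

    // Hypothetical stand-in for @PostConstruct.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface AfterCreate {}

    // Type-keyed store, similar in spirit to ctx.set(GraphRepository.class, ...).
    private final Map<Class<?>, Object> context = new HashMap<>();

    public <T> void set(Class<T> type, T value) {
        context.put(type, value);
    }

    // Creates the object, then calls each @AfterCreate method with arguments looked up by parameter type.
    public <T> T make(Class<T> type) throws ReflectiveOperationException {
        T instance = type.getDeclaredConstructor().newInstance();
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isAnnotationPresent(AfterCreate.class)) {
                continue;
            }
            Class<?>[] paramTypes = m.getParameterTypes();
            Object[] args = new Object[paramTypes.length];
            for (int i = 0; i < paramTypes.length; i++) {
                args[i] = context.get(paramTypes[i]);
                if (args[i] == null) {
                    throw new IllegalStateException("Nothing registered for " + paramTypes[i].getName());
                }
            }
            m.setAccessible(true);
            m.invoke(instance, args);
        }
        return instance;
    }

    // Hypothetical repository and view used for the demo.
    static class Repo {
        final String name;

        Repo(String name) {
            this.name = name;
        }
    }

    static class DemoView {
        Repo repo;

        @AfterCreate
        void init(Repo repo) {
            this.repo = repo;
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        TinyInjector injector = new TinyInjector();
        injector.set(Repo.class, new Repo("graphs"));
        DemoView view = injector.make(DemoView.class);
        System.out.println(view.repo.name); // prints graphs
    }
}
```

The real injector does much more (context hierarchies, @Optional, re-injection when values change), but the core idea of matching parameters to registered objects by type is the same.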

When the selection changes, we send an event with the following key (a topic, in Eclipse terms)

AppConstants.GRAPH_SELECTED // this is a String constant

and the selected GraphData object. Now let's create SwingGraphView, which will host JUNG's Swing visualization.

Here is the code for the class:

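package swingswttutorial.views;

import java.awt.Frame;

import javax.annotation.PostConstruct;
import javax.inject.Inject;
import javax.swing.SwingUtilities;

import org.eclipse.e4.core.di.annotations.Optional;
import org.eclipse.e4.ui.di.UIEventTopic;
import org.eclipse.swt.SWT;
import org.eclipse.swt.awt.SWT_AWT;
import org.eclipse.swt.widgets.Composite;

import swingswttutorial.data.GraphData;

public class SwingGraphView {
 
	private GraphPanel p; // JUNG/Swing JPanel from the complete code (not shown here)
 
	@PostConstruct
	public void postConstruct(Composite parent) {
		Composite composite = new Composite(parent, SWT.EMBEDDED);
		Frame new_Frame = SWT_AWT.new_Frame(composite);
		p = new GraphPanel();
		new_Frame.add(p);
	}
 
	@Inject
	@Optional
	public void onGraphSelected(
		@UIEventTopic(AppConstants.GRAPH_SELECTED) GraphData graphData) {
		// this runs on the SWT UI thread; Swing components should be
		// updated on the AWT event dispatch thread
		SwingUtilities.invokeLater(() -> p.view(graphData));
	}
 
}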

Let's see what is happening here. The following lines add the Swing JPanel to the SWT parent object. We create a new Composite object with the SWT.EMBEDDED style and the injected composite as its parent. Next, we create an AWT Frame inside the new composite using the SWT_AWT class, and add our panel to that frame. Register this class to the SwingView part in the application model, just like we did for SWTView, and it's done.

Composite composite = new Composite(parent, SWT.EMBEDDED); // create a composite that can embed AWT/Swing content
Frame new_Frame = SWT_AWT.new_Frame(composite);
p = new GraphPanel();
new_Frame.add(p);

The final piece is listening to the selection change event sent by SWTListView, so that we can show the corresponding graph. Look at the onGraphSelected method in SwingGraphView.java. This method is called when the selection in the list view changes. But how does Eclipse know to call this method? Another piece of magic! If you look at the method arguments, you can see that the parameter is annotated with the following annotation:

@UIEventTopic(AppConstants.GRAPH_SELECTED)

This annotation tells Eclipse to call the method whenever an event is sent with the key (topic) AppConstants.GRAPH_SELECTED. Note that this is the same constant we used to send the selected data from the list view. Eclipse passes the value sent with the event, i.e. the selected GraphData object, as the parameter of the method, and the method can then ask the Swing JPanel to visualize the graph.
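The sender and the receiver never reference each other; they only share the topic string. The following plain-Java sketch illustrates that idea. It is not IEventBroker; the class TinyEventBus and the topic value are hypothetical (Eclipse topics are conventionally '/'-separated strings).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Minimal topic-based publish/subscribe sketch, for illustration only.
public class TinyEventBus {

    private final Map<String, List<Consumer<Object>>> subscribers = new HashMap<>();

    // Register a handler for a topic (roughly what @UIEventTopic declares).
    public void subscribe(String topic, Consumer<Object> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // Deliver the data synchronously to every handler registered for the topic.
    public void send(String topic, Object data) {
        for (Consumer<Object> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(data);
        }
    }

    public static void main(String[] args) {
        // Hypothetical topic string standing in for AppConstants.GRAPH_SELECTED.
        final String graphSelected = "swingswttutorial/graph/selected";
        TinyEventBus bus = new TinyEventBus();
        List<Object> received = new ArrayList<>();
        bus.subscribe(graphSelected, received::add); // plays the role of the Swing view
        bus.send(graphSelected, "3-point graph");    // plays the role of the SWT list view
        bus.send("some/other/topic", "ignored");     // no subscriber, nothing happens
        System.out.println(received);                // prints [3-point graph]
    }
}
```

In Eclipse, IEventBroker.send() is the synchronous variant and post() the asynchronous one; the sketch only models the synchronous case.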

We created an Eclipse 4 RCP application in which Swing and SWT components live side by side (with the Swing component embedded in an SWT container). We were also able to communicate easily between the two. I did not paste the complete code in this post; you can find it on GitHub.

Now, go and start moving your Swing application to Eclipse.

SwingSWTTutorial

Don’t teach them calculus before they can add !!

Lately, there has been quite a discussion online about changing the way computer programming is taught. Programming is increasingly recognized as a very useful tool for practitioners in almost every domain. Yet the way it is taught continues to become more and more convoluted. As a result, more and more students are frightened of the word “programming”. We have managed to make a beast out of something that is amazingly simple in reality, and we have also managed to kill the fun and joy that programming has to offer.

Most programming courses seem to abandon the fundamentals of programming in their rush to teach students about object-oriented design, optimization and computer-generated graphics. In most cases, students are taught object-oriented design in the first or second week of their first programming class. It is as if we are in a hurry to teach them calculus even before they can add two numbers.

I am not the first one to write about the sorry state that we have created for programming. Many have written about it before and have come up with solutions and recommendations. While I agree with most of their analysis of the current state of programming, I do not entirely agree with their recommendations. The most common recommendation that I have come across is to teach programming in an environment where students can see the environment change as they code (something like LOGO). While this recommendation might be good for someone trying to make sense of a nightmarish library, my personal experience tells me that such an environment will be no better. An instructor who does not have patience with his students will still use the environment to teach them the wonderful principles of object-oriented design just after the students print “Hello World”. Another recommendation that I have come across is to teach students to read code before asking them to write code. While reading others' code might be a good exercise, it does not make programming any easier. Yet another recommendation, which somewhat overlaps with the previous one, is to train students in reading and writing documentation, reading code and all the other daily chores of a professional programmer. Again, it overlooks the fundamental need to make programming fun, easy and accessible.

When I look back to the days when I was a beginner, I find that I was one of the fortunate few who did not have programming shoved down their throats. Rather, I was taught to program in a way that made it a fun activity. My first encounters with programming were in 7th and 8th grade (as far as I remember), when I was introduced to LOGO and Microsoft BASIC (not Visual BASIC), respectively. In fact, the version of BASIC I was taught with was one in which the programmer had to write a line number before each instruction (sorry, I could not find a link for that).

Both environments were console-based, which made editing a program painful and nearly impossible compared with the ease that today's shiny editors and IDEs provide. I was not taught with any special interface or environment. Yet those initial programming years were fun and taught me a lot about programming fundamentals. I am certainly not claiming that a good environment does not make a difference. A good environment can definitely facilitate a better understanding of fundamentals. But that is only possible when the instructor does not stray from the fundamentals to things like OOP, optimization and graphics. In my opinion, the first programming course should be about showing the potential of programming to harness the power of machines, and showing that the purpose of a programming language is not that different from that of a natural language. Explicitly demonstrating the communication aspect of programming, without getting lost in the jungle of a programming language's syntax, is very important. As far as programming languages are concerned, I am not a fan of teaching an industry-standard language. That being said, I also do not promote using LOGO to teach programming to university students. We need to find something in between LOGO and modern languages like C# and Java that come with the bloat of object-oriented programming.

Fortunately, there are a lot of options available for first languages. For a beginning course, I recommend languages that allow communication with the machine with as little bloat as possible, and that make it as easy as possible to move from simple script-like code to object-oriented design in the same language or a close relative. Two languages that come to mind are Python and Groovy. I personally prefer Groovy because I prefer braces over indentation. Both languages allow you to write script-like code without any bloat and, at the same time, allow students to move on to more complex code once they are familiar with the fundamentals.

Again, an impatient instructor can mess up the learning experience even with such flexible languages. Hence, when introducing programming to a newbie, we should make sure that he/she understands why programming is useful and what it allows them to do that would otherwise be very difficult or impossible, and show them that programming is not about writing code. Programming is about interacting with a machine to achieve things that they could not do by other means.

First Date with Blender

Blender is a great free and open-source tool for creating 3D computer graphics. To my knowledge, it is the only open-source tool that gives real competition to its commercial counterparts. I had come across it several times and always wanted to learn it, but I kept procrastinating. Finally, I got some free time last week and got my hands on a very nice video tutorial on YouTube. The tutorial is easy to follow; I was able to learn the Blender interface and replicate the tutorials to produce some really nice graphics that I am proud of. Here are a couple of images that I have created so far.

Ice_reduced PoolBalls_reduced Football_reduced

Initializing a final variable when the method used to assign throws an exception.

A good design principle is to make a variable final whenever possible. However, following this principle can sometimes be tricky. One such case is when the variable/field is initialized with the return value of a method that throws an exception. Let me explain using an example. Suppose you have classes Foo and Bar as follows:

class Foo {
  private Foo() {}

  public static Foo createFoo() throws Exception {
    // creation logic that may throw an exception
    return new Foo();
  }
}

class Bar {
  private final Foo foo;

  // This will not compile. It gives the error "variable foo might not have been initialized".
  public Bar() {
    try {
      foo = Foo.createFoo();
    } catch (Exception e) {
      System.err.println("foo cannot be initialized");
    }
  }
}

As you can see, the field foo in a Bar object is final. The method createFoo() creates a Foo object. As this method can throw an exception, the code for class Bar will not compile, and the compiler reports the error “variable foo might not have been initialized”.

The problem is that the compiler has no way of making sure that the field foo will have been assigned by the time the constructor of Bar finishes (foo will not be initialized if createFoo() throws an exception). How can you solve this compilation error?

Here is how: write a helper method that wraps the call to createFoo() and the associated exception handling.

 

class Bar {
  ...
  public Bar() {
    foo = createFooWrapper();
  }

  // Returns the new Foo, or null if createFoo() throws.
  private static Foo createFooWrapper() {
    try {
      return Foo.createFoo();
    } catch (Exception e) {
      System.err.println("foo cannot be initialized");
    }
    return null;
  }
}

If the exception occurs, foo will be null; otherwise it will be whatever createFoo() returns.
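Another common way to satisfy the compiler, sketched below without a helper method, is to assign the result to an ordinary local variable inside the try block and copy it into the final field exactly once, after the try/catch. The nested Foo and Bar and the fail flag are simplified stand-ins for the classes above, added only so the example runs on its own.

```java
public class FinalFieldDemo {

    // Hypothetical stand-in for Foo: createFoo() throws when asked to fail.
    static class Foo {
        private Foo() {}

        static Foo createFoo(boolean fail) throws Exception {
            if (fail) {
                throw new Exception("creation failed");
            }
            return new Foo();
        }
    }

    static class Bar {
        private final Foo foo;

        Bar(boolean fail) {
            Foo tmp = null; // a non-final local may be assigned on any path
            try {
                tmp = Foo.createFoo(fail);
            } catch (Exception e) {
                System.err.println("foo cannot be initialized");
            }
            foo = tmp; // the final field is definitely assigned, exactly once
        }

        Foo getFoo() {
            return foo;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Bar(false).getFoo() != null); // true
        System.out.println(new Bar(true).getFoo() != null);  // false, after the error message
    }
}
```

Both versions end with foo being null on failure; which one reads better is mostly a matter of taste.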

What can you do with Java annotations? – Part 2

In the previous post I briefly wrote about Java annotations and some of their uses. There is plenty of documentation about syntax and features online and hence I don’t want to write about it again.

Instead, I will be showing how you can use annotations for something simple but useful. Consider the following scenario:

You are working on an application that has a menu. Each of the items in that menu implements an interface (with certain methods). You have several classes that implement that interface. However, you only want certain menu items to be displayed in the menu. In addition, you might also want to change the positioning of the menu items.

To keep the example simple, we will consider a simple console based menu. You might start with something like this:

public interface MyPlugin {
  public String getName();
}
...
public static void main(String[] args) {
  MyPlugin plugin1 = new PluginA();
  MyPlugin plugin2 = new PluginB();
  MyPlugin plugin3 = new PluginC();

  List<MyPlugin> plugins = new ArrayList<>();
  plugins.add(plugin1);
  plugins.add(plugin2);
  plugins.add(plugin3);

  // the menu order is hard-coded by the order of the add() calls
  for (int i = 0; i < plugins.size(); i++) {
    System.out.println(i + "  " + plugins.get(i).getName());
  }
}
...

This is all good, except that whenever you want to hide a menu item or change its position, you have to change the main application code. This is something you should try to avoid when the main application code is much more complex than just printing Strings. In fact, you might not even have the opportunity to change the main code if you are trying to extend an existing application. Hence, a better way would be what is called Dependency Injection. One way to do that in Java is using annotations. Let's see what it looks like:

First we define the interface for items:

public interface MyPlugin {
	public String getName();
}

Next we define the annotation that will be used to make the plugin visible and specify its position.

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
public @interface MyPluginAnnotation {
	int position();
}

As you can see, the declaration looks similar to that of an interface, except that we put the @ symbol in front of the interface keyword. There is a new annotation that you might not have seen before. @Retention is an annotation that can be added to your own annotation. It tells the compiler how the newly defined annotation is to be retained. RetentionPolicy.RUNTIME tells the compiler that this annotation should be available at runtime. This is required in our case, as we will be reading the annotations at runtime to decide the position and visibility of plugins.
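The effect of the retention policy can be checked directly. In this small sketch (the annotation names are made up), one annotation is declared with RetentionPolicy.RUNTIME and the other has no @Retention, so it gets the default policy, RetentionPolicy.CLASS: it is recorded in the class file but is not visible through reflection.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class RetentionDemo {

    // Kept in the class file and exposed through reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @interface VisibleAtRuntime {}

    // No @Retention: defaults to RetentionPolicy.CLASS, invisible to reflection.
    @interface DefaultRetention {}

    @VisibleAtRuntime
    @DefaultRetention
    static class Annotated {}

    public static void main(String[] args) {
        System.out.println(Annotated.class.getAnnotation(VisibleAtRuntime.class) != null); // true
        System.out.println(Annotated.class.getAnnotation(DefaultRetention.class) != null); // false
    }
}
```

Without RUNTIME retention, the plugin loader would never find MyPluginAnnotation on any class, and the menu would stay empty.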

Next, int position() declares an element of the annotation called position, of type int. Let's use it for our first plugin.

@MyPluginAnnotation(position = 0)
public class PluginA implements MyPlugin{
 
  @Override
  public String getName() {
    return "Menu Item A";
  }
}

The only new piece of code is

@MyPluginAnnotation(position = 0)

Here you can see that the annotation assigns a value of 0 to the position element. Similarly we can define another one with position 1.

@MyPluginAnnotation(position = 1)
public class PluginB implements MyPlugin{
 
  @Override
  public String getName() {
    return "Menu Item B";
  }
}

Now, let’s see how we can use them:

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.reflections.Reflections;

public class Main {

  public static void main(String[] args) {
    Main main = new Main();
    Map<Integer, MyPlugin> pluggedInMenuItems = main.loadPlugins();

    // sort the positions so that the menu is printed in order
    List<Integer> positions = new ArrayList<>(pluggedInMenuItems.keySet());
    Collections.sort(positions);

    for (Integer i : positions) {
      System.out.println(i + "   " + pluggedInMenuItems.get(i).getName());
    }
  }

  private Map<Integer, MyPlugin> loadPlugins() {
    Map<Integer, MyPlugin> positionMap = new HashMap<>();
    // find classes that implement the given interface
    Reflections reflections = new Reflections("");
    Set<Class<? extends MyPlugin>> subTypesOf = reflections.getSubTypesOf(MyPlugin.class);

    for (Class<? extends MyPlugin> c : subTypesOf) {
      MyPluginAnnotation annotation = c.getAnnotation(MyPluginAnnotation.class);
      // only add the plugin to the position map if it is annotated
      if (annotation != null) {
        try {
          // use the position element of the annotation to put the plugin instance at the right position
          positionMap.put(annotation.position(), c.getDeclaredConstructor().newInstance());
        } catch (NoSuchMethodException | InstantiationException e) {
          System.err.println("Plugin instantiation failed. Make sure that the plugin has a constructor without any arguments.");
        } catch (ReflectiveOperationException e) {
          e.printStackTrace();
        }
      }
    }

    return positionMap;
  }
}

This code contains the loadPlugins() method, which uses the annotation and its position element to return a map from each position to the corresponding plugin instance. loadPlugins() uses the Reflections library to find classes. Once it finds all the classes that implement the interface, it keeps just the ones with the MyPluginAnnotation annotation and puts an instance of each at the appropriate position. Finally, the main() method prints the menu. Now you can add any number of items to the menu without changing the code in the main method.
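If you want to try the idea without the Reflections dependency, the following JDK-only sketch replaces the classpath scan with an explicit list of candidate classes (an assumption made purely for the sketch) and uses a TreeMap, which keeps positions sorted so no separate sort is needed. The nested types mirror the ones above; PluginC is deliberately left unannotated.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PluginMenuSketch {

    interface MyPlugin {
        String getName();
    }

    @Retention(RetentionPolicy.RUNTIME)
    @interface MyPluginAnnotation {
        int position();
    }

    @MyPluginAnnotation(position = 1)
    public static class PluginB implements MyPlugin {
        public String getName() { return "Menu Item B"; }
    }

    @MyPluginAnnotation(position = 0)
    public static class PluginA implements MyPlugin {
        public String getName() { return "Menu Item A"; }
    }

    // Not annotated, so it is left out of the menu.
    public static class PluginC implements MyPlugin {
        public String getName() { return "Menu Item C"; }
    }

    // Keeps only annotated candidates, keyed (and therefore ordered) by position.
    static Map<Integer, MyPlugin> loadPlugins(List<Class<? extends MyPlugin>> candidates)
            throws ReflectiveOperationException {
        Map<Integer, MyPlugin> byPosition = new TreeMap<>();
        for (Class<? extends MyPlugin> c : candidates) {
            MyPluginAnnotation annotation = c.getAnnotation(MyPluginAnnotation.class);
            if (annotation != null) {
                byPosition.put(annotation.position(), c.getDeclaredConstructor().newInstance());
            }
        }
        return byPosition;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        List<Class<? extends MyPlugin>> candidates = List.of(PluginB.class, PluginC.class, PluginA.class);
        Map<Integer, MyPlugin> menu = loadPlugins(candidates);
        menu.forEach((pos, plugin) -> System.out.println(pos + "   " + plugin.getName()));
    }
}
```

Running it prints "Menu Item A" at position 0 and "Menu Item B" at position 1, even though the candidates are listed in a different order.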

There are many frameworks, like NetBeans and Spring, that allow developers to use annotations for dependency injection, and if you have worked with any of them before, now you know how it works!

NOTE: Complete code for this example is available at github.

What can you do with Java annotations? – Part 1

Java 5.0 introduced annotations, among many other things. Annotations are a way of adding metadata to Java elements such as classes, methods, variables, parameters and packages. An annotation in its simplest form looks something like this:

@MyAnnotation

The ‘@’ character indicates that what follows is an annotation. One use of annotations is to generate boilerplate code, but they can be used in several other interesting ways. Java defines certain built-in annotations such as @Override and @Deprecated. @Override is used to annotate methods that override a method of a supertype. When a method is annotated with it, the compiler makes sure that the method actually overrides something; otherwise, compilation fails. For example, the following code annotates a method myMethod() with @Override. When this code compiles, the compiler ensures that myMethod() actually overrides a method of a supertype.

@Override
public void myMethod(){
...
}

You might be wondering what we gain from @Override. Without the annotation, if I misspelled myMethod() as yourMethod() while overriding it, the compiler would not report any error. It would consider yourMethod() to be just another method, and detecting such a bug can be a bloody business. So you can see how useful a simple annotation can be. But that is not all. As mentioned earlier, annotations can be used for much more amazing stuff. One of my favourites is Deuce STM. Software Transactional Memory (STM) is one of several ways of developing concurrent applications in a clean fashion. Deuce STM provides support for STM via a “java agent” and an @Atomic annotation.
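The bug that @Override guards against is easy to reproduce. In the following sketch (Base, Child and the misspelled yourMethod() are made up for illustration), the subclass means to override myMethod() but misspells it. The code compiles without complaint, and the base version is still the one that runs. Putting @Override on yourMethod() would turn this silent bug into a compile error.

```java
public class OverrideDemo {

    static class Base {
        String myMethod() {
            return "base";
        }
    }

    static class Child extends Base {
        // Meant to override myMethod(), but the name is misspelled.
        // Without @Override this compiles and simply adds a new method.
        String yourMethod() {
            return "child";
        }
    }

    public static void main(String[] args) {
        Base b = new Child();
        System.out.println(b.myMethod()); // prints base, not child
    }
}
```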

Annotations are a very interesting feature, and there is a lot more to them, which you can learn by reading the Oracle documentation. The purpose of this post is not to rewrite what is already there. Rather, in the next post I would like to show you an interesting way to use annotations to give your application a plugin-like structure.

Defining PhD for myself !

I finally got admitted into the PhD program at SIAT – Simon Fraser University. I always knew that I did not want to leave academia. The idea of being involved in research and, at the same time, sharing my knowledge with students who will shape the future is something that I have always found myself drawn to.

However, since I got the admission, I have found myself utterly confused. Suddenly, it became difficult for me to answer “Why do I want to do a PhD?”, or to say to what end the next five years would serve. I was oscillating between two extreme points of view: whether research means developing complete solutions that others can use to make their lives easier, or just discovering new ideas that have the potential to make someone's life easier. But today, it seems I have found a better metaphor for what I want to do with my career, my research and my PhD.

The answer is quite simple. First of all, I was considering the wrong extremes. Although it is true that no research is of any use until it is converted into concrete products or solutions that make lives easier, that conversion is not the scientist’s responsibility (unless he wants to be an academic entrepreneur). It’s just too distracting. On the other hand, suggesting ideas is not enough either. Scientists don’t just talk; people talk.

The right analogy is to consider research as art. Just as an artist does not merely dream of the future but uses his brush to show the world what he sees, words to tell a story or music to share his feelings, a researcher needs to provide solutions to small problems and tell the world how those solutions fit together to solve the bigger puzzle: how his pieces form the complete picture, while leaving it to others to actually sew the pieces together. This is what I want to do with my research and my PhD: find the individual pieces and let someone else use them to complete the picture.

What does brushing and linking mean in information visualization?

Visualization systems generally consist of several independent visualizations, each allowing exploration of a different aspect of the data. Over the years, researchers have developed many interaction techniques that let users explore data. Two of these techniques are Brushing and Linking.

Brushing refers to several interaction techniques that allow the user to select a subset of data in a visualization. Consider the following node-link visualization:

 

A node-link graph with several nodes and images.

It contains many nodes and edges, which makes the graph difficult to make sense of. Suppose the user only wants to focus her exploration on a certain node and all its connected nodes. To do this, she would select that subset of nodes. This selection, shown in the following figure, is called brushing.

 

A subset of nodes selected in the graph.
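The brushing step in the figure (select a node together with every node connected to it) can be sketched in a few lines. This assumes a hypothetical data model in which the graph is an undirected adjacency list of node names; a real visualization tool would also handle rendering and the mouse gesture that triggers the brush.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BrushingDemo {

    // Undirected graph stored as an adjacency list (hypothetical data model).
    private final Map<String, Set<String>> adjacency = new HashMap<String, Set<String>>();

    void addEdge(String a, String b) {
        neighbours(a).add(b);
        neighbours(b).add(a);
    }

    // Returns the neighbour set of a node, creating an empty one on first use.
    private Set<String> neighbours(String node) {
        Set<String> set = adjacency.get(node);
        if (set == null) {
            set = new TreeSet<String>();
            adjacency.put(node, set);
        }
        return set;
    }

    // "Brush" a node: the selection is the node plus every node directly connected to it.
    Set<String> brush(String node) {
        Set<String> selection = new TreeSet<String>();
        selection.add(node);
        Set<String> direct = adjacency.get(node);
        if (direct != null) {
            selection.addAll(direct);
        }
        return selection;
    }

    public static void main(String[] args) {
        BrushingDemo g = new BrushingDemo();
        g.addEdge("A", "B");
        g.addEdge("A", "C");
        g.addEdge("C", "D");
        g.addEdge("E", "F");
        System.out.println(g.brush("A")); // prints [A, B, C]
    }
}
```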

 

As I mentioned, most visualization software consists of several different types of visualization, and it is often necessary to view the same subset of data in several of them. This is supported by the interaction technique called Linking: when a set of data elements is selected in one visualization, the same set is selected in the other visualizations as well. For instance, consider the document view in CZSaw for the VAST 2010 mini-challenge 1 data set.

 

Document view in CZSaw that brushes the entities selected in a list to a document.

In the document view of CZSaw, when a set of data elements (called entities) is selected in a list of entities, they are also selected in the corresponding document, as well as in any other visualization, such as the hybrid view below.
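One common way to implement linking is a single selection model shared by all views (the observer pattern): whichever view the user brushes in updates the model, and the model notifies every registered view. The sketch below assumes that design; the names SelectionModel, SelectionListener and View are hypothetical and are not taken from CZSaw.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LinkingDemo {

    // Implemented by every view that wants to stay in sync with the selection.
    interface SelectionListener {
        void selectionChanged(Set<String> selected);
    }

    // One selection shared by all views.
    static class SelectionModel {
        private final List<SelectionListener> listeners = new ArrayList<SelectionListener>();
        private Set<String> selected = Collections.emptySet();

        void addListener(SelectionListener l) {
            listeners.add(l);
        }

        // Called by the view the user brushes in; every registered view is notified.
        void select(Set<String> items) {
            selected = Collections.unmodifiableSet(new LinkedHashSet<String>(items));
            for (SelectionListener l : listeners) {
                l.selectionChanged(selected);
            }
        }

        Set<String> getSelected() {
            return selected;
        }
    }

    // A stand-in view that just records and prints what it was asked to highlight.
    static class View implements SelectionListener {
        final String name;
        Set<String> highlighted = Collections.emptySet();

        View(String name) {
            this.name = name;
        }

        public void selectionChanged(Set<String> selected) {
            highlighted = selected;
            System.out.println(name + " highlights " + selected);
        }
    }

    public static void main(String[] args) {
        SelectionModel model = new SelectionModel();
        model.addListener(new View("entity list"));
        model.addListener(new View("document view"));
        // Brushing two entities in one view links the selection to the other view.
        model.select(new LinkedHashSet<String>(Arrays.asList("Entity1", "Entity2")));
    }
}
```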

Brushing and Linking are two important interaction techniques supported by almost every visualization system. Used together, they form a very powerful way to interact with and explore data across different visualizations.

Using switch-case for String in Java 6

Before Java 7, developers could not use a String in a switch-case construct. The only option was a long if-else chain like this:

 
if(str.equals("Opt1")){
...
}else if(str.equals("Opt2")) {
...
}else if(str.equals("Opt3")){
...
}

In my opinion, switch-case, although somewhat more limited than if-else, is a cleaner construct.

With Java 7, one can use String in a switch-case just like an integer:

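switch(str){
case "Opt1": /* task1 */ break; // without break, execution falls through to the next case
case "Opt2": /* task2 */ break;
case "Opt3": /* task3 */ break;
}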

However, since a substantial amount of development still targets Java 6, I thought I would share a cleaner alternative for conditionals involving String. Note that this alternative applies only where the existing conditionals use the equals() method. Given an if-else block like the one shown above, you can convert it into a switch-case construct like this:

First, define an enum containing all the options. The constant names must match the strings exactly, because valueOf() is case-sensitive:

enum Options{
 Opt1, Opt2, Opt3
}

Then replace the if-else by switch-case like this:

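Options option = Options.valueOf(str); // throws IllegalArgumentException if str matches no constant
switch(option){
case Opt1: /* task1 */ break; // case labels must match the enum constant names
case Opt2: /* task2 */ break;
case Opt3: /* task3 */ break;
}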
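The snippet above can be turned into a complete, Java 6-compatible sketch. The method handle() and its returned labels are hypothetical stand-ins for the real tasks. One case the if-else chain handles implicitly is a string that matches none of the options; Options.valueOf() throws IllegalArgumentException for that, so the sketch catches it and treats it as "no branch matched". Returning "unknown" for a null string is likewise a design choice of this sketch (the original str.equals(...) chain would throw NullPointerException).

```java
public class StringSwitchDemo {

    // Constant names must match the expected strings exactly: valueOf() is case-sensitive.
    enum Options { Opt1, Opt2, Opt3 }

    // Returns a label for the task that would run; "unknown" means no option matched.
    static String handle(String str) {
        if (str == null) {
            return "unknown"; // design choice: Options.valueOf(null) would throw NullPointerException
        }
        Options option;
        try {
            option = Options.valueOf(str);
        } catch (IllegalArgumentException e) {
            // No constant with that name, i.e. none of the equals() checks would have matched.
            return "unknown";
        }
        switch (option) {
            case Opt1: return "task1";
            case Opt2: return "task2";
            case Opt3: return "task3";
            default:   return "unknown"; // javac does not treat an enum switch as exhaustive
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("Opt2")); // prints task2
        System.out.println(handle("opt2")); // prints unknown (case mismatch)
    }
}
```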

Update:
Although the switch-case alternative is cleaner, it can be slower in some cases. If the tasks (task1, task2, etc.) take very little time compared to Options.valueOf(), the switch-case version will appear slower, because valueOf() has to look the string up among the enum's constants before the switch can run (in OpenJDK, a lookup in a name-to-constant map that is populated reflectively on first use).

If you would like to compare the performance of the two alternatives, I have written a simple test available here.