How Lucene's MLT (MoreLikeThis) works

TL;DR: It takes the document, computes tf/idf, takes the top 25 (configurable) terms, and queries the DB with these terms, each with a boost.
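
The term-selection step described above can be sketched in plain Java. This is only an illustration of the idea, not Lucene's MoreLikeThis code: the class name `TopTermsSketch`, the `docFreq` map and the exact idf formula are assumptions made for this sketch, whereas Lucene reads these statistics from the index and uses its own similarity implementation.

```java
import java.util.*;
import java.util.stream.Collectors;

public class TopTermsSketch {

    /**
     * Scores every distinct term of a document by tf * idf and returns the
     * maxTerms best-scoring terms, highest score first.
     * docFreq (hypothetical input) maps a term to the number of documents containing it.
     */
    public static List<String> topTerms(List<String> docTerms,
                                        Map<String, Integer> docFreq,
                                        int numDocs,
                                        int maxTerms) {
        // Term frequency: how often each term occurs in this document
        Map<String, Integer> tf = new HashMap<>();
        for (String term : docTerms) {
            tf.merge(term, 1, Integer::sum);
        }

        // tf * idf per term; this idf formula is an assumption of the sketch
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }

        // Highest score first; ties are broken alphabetically to keep the order stable
        return score.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(maxTerms)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("the", 1000);
        docFreq.put("shark", 3);
        docFreq.put("fin", 10);
        List<String> doc = Arrays.asList("shark", "shark", "the", "fin", "the", "the");
        System.out.println(topTerms(doc, docFreq, 1000, 2)); // prints [shark, fin]
    }
}
```

The selected terms would then be OR-ed into a query. One plausible reading of "with a boost" is that each term's score is used as its query boost, so rarer, more characteristic terms weigh more.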


Apple Provisioning Hell

We’re developing an Ionic app for iOS and Android, and handling Apple’s provisioning/certificates/APN profiles is hell. Just pure hell.


Easy Executors and Callables in Groovy

Multithreading and general multi-tasking in Groovy is super easy, thanks to frameworks like GPars. However, even the basic Java frameworks can be easily used from Groovy for a rapid, no-brainer, task-driven design.
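
The "basic Java frameworks" here are presumably `ExecutorService` and `Callable` from `java.util.concurrent`. A minimal Java sketch of that task-driven style follows; the class name `ExecutorSketch`, the pool size and the squaring task are made up for illustration. Groovy can call the same API and coerce a closure with `as Callable`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorSketch {

    /** Squares every input on a small thread pool and returns the results in input order. */
    public static List<Integer> squareAll(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // One Callable per input; each one is an independent task
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (Integer n : inputs) {
                tasks.add(() -> n * n);
            }
            // invokeAll blocks until all tasks are done and returns the futures in task order
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task execution failed", e);
        } finally {
            pool.shutdown(); // always release the pool's threads
        }
    }

    public static void main(String[] args) {
        System.out.println(squareAll(Arrays.asList(1, 2, 3))); // prints [1, 4, 9]
    }
}
```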


URL-catching regexp

Not exactly a brilliant piece of engineering, but this is a useful dirty hack. If you want to clean URLs from a string, you can match them using this regexp:


Node JS for backend programmers

I started working at a new and exciting startup, and the thought of using Java for development crushed my soul, so I began shopping around for a fast-development, fast-deployment, easy ramp-up and heavily supported language. I wanted something that would be easy for new developers to learn, and most importantly, something that I’ll have fun writing in.


Web programming for server people

I’m mainly a back-end engineer. Through my not-so-long career, I’ve dealt with SQL, NoSQL, offline processes, ETL mechanisms, billing services, web crawlers, recommendation systems, offline learning algorithms, and so forth, and so forth. The only web-related thing I did was to write Struts hooks for specific server functions. So basically, I know nothing about HTTP requests, CSS, JavaScript and other web-thingies, which makes me feel a bit handicapped.


Finally! What’s hogging your RAM?

Some of you may have noticed that the Windows Task Manager doesn’t show you the real memory consumption of the applications you’re running. It’s quite obvious, especially when the actual memory consumption and the sum of the memory consumption presented by the Task Manager don’t add up!


Zorpia is a scam

I’ve been getting a lot of emails lately from a website called “Zorpia” claiming I have a message from someone named Katja. I’ve never signed up for this website, and I also don’t know anyone named Katja, so I ignored the first and the second emails. But I just kept getting more and more of them, so I decided to find out what the hell this website is.


SearchIndexer & TSVNCache : Two things that are hogging your hd – and your computer

I have a brand new Lenovo T510 (i7, 4 GB, 64-bit, Windows 7, with an Nvidia GPU), and it sucks. At this price, with these specifications, it really shouldn’t suck.
I’ve tried numerous things (including the regular defrag and scan disk sequence), and I did find a few problems (it was using only 3 GB out of 4, for example), but it still sometimes had the worst performance ever.


Scala 101 - Basic OOP: Writing a class

This is part 3 of my Scala tutorial. Read the first part and the second part for a more general Scala intro. All the examples you see here were run via the REPL (that’s the Scala interpreter).

Scala’s take on OOP:

Scala tends toward a pure object-oriented model:

  • There is no such thing as a “primitive”.
    In Java, you have objects and primitives. Scala, on the other hand, takes after languages like Python, Ruby and Smalltalk (and many others) in the sense that everything is an object, including the integer 1 and the string “Hello, World”. For many, this seems a reasonable evolution from the object/primitive dichotomy used in Java.
  • Functions and closures are also objects.
    This is something people with no background in functional programming sometimes find difficult to accept in the beginning.
    In Scala, a function is just another type of object, and as such it can do anything an object can: change, get sent as a parameter, be the return value of another function, and so forth. Even though the idea seems strange at first, you might recall that even in C/C++ you can pass a pointer to a function, or use it as a return value from a function.
  • Operators are methods, like in C++.

Let’s get technical:

Let’s build a simple class that will represent a fish. At first, a fish only has a name:

scala> class Fish(var name: String) {}
defined class Fish

Well, something looks a bit… off…, doesn’t it? There’s no constructor, no fields, no nothing! And yet, it works:

scala> var jaws = new Fish("Jaws")
jaws: Fish = Fish@530f243b

scala> jaws.name
res2: String = Jaws

scala> jaws.name = "Rex"

scala> jaws.name
res3: String = Rex

So what happened here, exactly? What we’ve used here is the “primary constructor”. Because the parameter is declared with var, it becomes a public, mutable property of the class, and you really don’t need to write any setters or getters.
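
For readers coming from Java, here is roughly what that one-line Scala class spares you from writing. This is a hand-written Java sketch for comparison only, not the code the Scala compiler emits; the names `JavaFish`, `getName` and `setName` simply follow Java conventions.

```java
public class JavaFish {

    // The field that Scala's (var name: String) declares implicitly
    private String name;

    // The constructor that Scala derives from the class parameter list
    public JavaFish(String name) {
        this.name = name;
    }

    // Read access, written as jaws.name in Scala
    public String getName() {
        return name;
    }

    // Write access, written as jaws.name = "Rex" in Scala
    public void setName(String name) {
        this.name = name;
    }

    public static void main(String[] args) {
        JavaFish jaws = new JavaFish("Jaws");
        jaws.setName("Rex");
        System.out.println(jaws.getName()); // prints Rex
    }
}
```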

What if I want an immutable field? Just use “val” instead of “var”:

scala> class Fish(val name: String) {}
defined class Fish

scala> var jaws = new Fish("Jaws")
jaws: Fish = Fish@2876b359

scala> jaws.name
res4: String = Jaws

scala> jaws.name = "Rex"
<console>:7: error: reassignment to val
jaws.name = "Rex"
^

And private fields?

scala> class Fish(private val name: String) {}
defined class Fish

scala> var jaws = new Fish("Jaws")
jaws: Fish = Fish@15664f1a

scala> jaws.name
<console>:8: error: value name cannot be accessed in Fish
jaws.name
^

scala> jaws.name = "Rex"
<console>:7: error: value name cannot be accessed in Fish
jaws.name = "Rex"
^

What about fields that aren’t in the constructor?

scala> class Fish(val name: String) {
val kind : String = "Shark"
}
defined class Fish

scala> val f = new Fish("Jaws")
f: Fish = Fish@69066caf

scala> f.kind
res12: String = Shark

Well, that’s great, but I want more than one constructor!

scala> class Fish(val name: String) {
def this() = this("SomeName")
}
defined class Fish

scala> var jaws = new Fish()
jaws: Fish = Fish@1dd0eb0b

scala> jaws.name
res6: String = SomeName

And something to tease you: how do you create a private primary constructor?

scala> class Fish private (val name: String) {}
defined class Fish

scala> var jaws = new Fish("Jaws")
<console>:6: error: constructor Fish cannot be accessed in object $iw
var jaws = new Fish("Jaws")
^

Scala has a primary constructor and zero or more auxiliary constructors. The primary constructor is the entire body of the class, so every line written in the class body is executed when an instance is created (not including the lines inside functions/methods, naturally).
Note: In Scala, any auxiliary constructor must call another constructor of the same class as its first action!
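
Java developers already know a close relative of this rule: a constructor that delegates with `this(...)` has traditionally had to do so in its first statement. The Java sketch below mirrors the Scala structure for comparison; the class name `ChainedFish` and the `log` list are made up for this illustration and only exist to make the execution order visible.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainedFish {

    // Records which constructor bodies ran, in order
    final List<String> log = new ArrayList<>();
    final String name;

    // Plays the role of Scala's primary constructor: every instance goes through it
    public ChainedFish(String name) {
        this.name = name;
        log.add("primary: " + name);
    }

    // Plays the role of an auxiliary constructor: it delegates first, then adds its own work
    public ChainedFish() {
        this("SomeName");
        log.add("auxiliary");
    }

    public static void main(String[] args) {
        System.out.println(new ChainedFish().log); // prints [primary: SomeName, auxiliary]
    }
}
```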

Let’s say we want a fish to print its name when it is created:

scala> class Fish(var name: String) {
println("I am " + name)
}
defined class Fish

scala> var jaws = new Fish("Jaws")
I am Jaws

The code line println("I am " + name), although it appears context-less, is actually part of the primary constructor.

Let’s sum everything up:

class Fish(var name: String, private var age: Int) {
   println("A new fish is born!")

   // This is accessible from outside the class
   val kind: String = "Shark"

   // This is not accessible from outside the class
   private val nickName = "Goldi"

   def this() = {
      // An auxiliary constructor MUST invoke another constructor as its first action!
      this("Fishi", 0)
      println("I'm an auxiliary constructor!")
   }

   def this(name: String) = this(name, 0)

   def swim() = println("Blo Blo")

   private def showUpperFin() = println("Dramatic music!")

   // If you override a method, you must declare it using the override keyword, unless the
   // method is abstract, and then it's kind of obvious to the compiler
   override def toString = "My name is " + name

   // An operator is a method just like any other
   def +(that: Fish): Fish = new Fish(this.name, this.age + that.age)
}

So what have we learned so far?

  • Scala has a primary constructor (which is the entire body of the class) and auxiliary constructors, which serve just like Java constructors.
  • The default visibility in Scala is public.
  • You can declare class parameters in the primary constructor.
  • You can access and change non-private class properties by accessing them directly.

The next post will touch on getters and setters in Scala, limiting the scope of variables, using default values, and more. Stay tuned 😉


Swing – Closing the window and finalizing stuff in the background

I’m writing a program that is a little data intensive. At first, when the user clicked the X and chose “yes” in the “are you sure” box, I would do the cleanup and then close the window. The problem was that the cleanup sometimes took up to a minute, during which the window was frozen and annoyingly stuck on screen.

I’ve tried several different approaches to make it close the window and keep the cleaning in the background, but nothing worked.
Why?

Because I used

this.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

instead of

this.setDefaultCloseOperation(JFrame.DO_NOTHING_ON_CLOSE);

Let’s see some code:

import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import javax.swing.JFrame;
import javax.swing.JOptionPane;

public class MyFrame extends JFrame {

    // viewManager is a field of the original application; its declaration is not shown in this post

    public MyFrame() {
        super("My Frame");
        // Clicking the X does nothing by itself; the listener below decides what happens
        this.setDefaultCloseOperation(JFrame.DO_NOTHING_ON_CLOSE);

        this.addWindowListener(new WindowAdapter() {

            // Invoked when the user clicks the close button (the X).
            // The window leaves the screen only if the user confirms, because we call
            // dispose() ourselves; the default close operation is DO_NOTHING_ON_CLOSE.
            @Override
            public void windowClosing(WindowEvent e) {
                int confirmed = JOptionPane.showConfirmDialog(null,
                        "Are you sure you want to exit?",
                        "Leaving so soon?", JOptionPane.YES_NO_OPTION);
                if (confirmed == JOptionPane.YES_OPTION) {
                    dispose();
                    System.out.println("The window is no longer visible");
                }
            }

            // Invoked once the window has been closed (i.e. as a result of the dispose()
            // call in windowClosing above).
            // If the frame's default close operation were EXIT_ON_CLOSE, this method wouldn't run.
            @Override
            public void windowClosed(WindowEvent e) {
                viewManager.cleanUpTheMess();
            }
        });
    }
}

What would have happened if we had used EXIT_ON_CLOSE instead of DO_NOTHING_ON_CLOSE?
Simple: the “windowClosed” function wouldn’t run. It’s as if there was a System.exit(0) at the end of the windowClosing function.
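
If the cleanup should not even occupy Swing's event thread, one possible variation (not part of the original program) is to hand it to a separate thread from inside windowClosed. The helper below is a minimal sketch under that assumption; the class name `BackgroundCleanup` and its `start` method are hypothetical. It deliberately contains no Swing code, so it can be tried without a display.

```java
public class BackgroundCleanup {

    /**
     * Starts the (possibly slow) cleanup on its own thread and returns immediately,
     * so the caller, e.g. a window listener, is not blocked while it runs.
     */
    public static Thread start(Runnable cleanup) {
        Thread worker = new Thread(cleanup, "cleanup-worker");
        // Non-daemon: the JVM stays alive until the cleanup has finished,
        // even after the last window has been disposed
        worker.setDaemon(false);
        worker.start();
        return worker;
    }

    public static void main(String[] args) {
        start(() -> System.out.println("cleaning up in the background"));
        System.out.println("the caller was not blocked");
    }
}
```

In the listener above, windowClosed could then call `BackgroundCleanup.start(() -> viewManager.cleanUpTheMess());`. This assumes the cleanup code does not touch Swing components, since those should only be accessed from the event thread.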


Grep on PowerShell

I must admit that I find Windows PowerShell a really-way-too-late pale substitute for the full capacity of the Unix shell, but since I sometimes need to work on Windows, there ain’t much I can do about it.


Using Log4Net in a C# web application

I came to Log4Net because I really loved Log4j, but I must say the documentation on this project is simply crappy, and half of the links on the project page are broken (conveniently, the ones to all the examples…).


Learning C# with a Java background

I’m a Java programmer, and I’m starting to work on a production project written in C#, so this is a great time to start learning this language. I’ve never used any of Microsoft’s languages, so this is basically my first time in the .NET world.


Closing notes

Unfortunately, we reached the conclusion that HBase is still not stable enough for us to use in a production environment. I think it’s a great project, and I’m certain that it will become a very important open-source tool in the not-so-far future, but currently it’s just not good enough for what we need. I’ll keep posting about our new solution (once we find it), and where it takes us.


Bug Update

After a lengthy discussion with St.Ack (on the HBase IRC channel) and Jean Daniel about this bug, we currently believe that what we’re seeing is “… a fumble of regionstate somehow. The master says its on regionserver X but when client goes there regionserver X says, I don’t have it” (St.Ack).


Can’t live with it

We assumed that the bug was caused mainly by the high load rate, and that once the bulk of the data is in HBase and the load drops considerably, we won’t see it again.


We’ll try to live with it

As you remember, HBase tends to collapse (returning “NotServingRegionException”) after a few million files (latest crash: 6 million).
Since we (want to) believe this only happens because of the rapid insertion rate (~500 a minute), we will try to load all the files into HBase, and then test it in production-like mode: meaning mainly read requests, and a much lower insertion rate.


Still unresolved

We’ve been working on the previous bug (last post) for a few days now, but unfortunately nothing is working. We suspected the error might be caused by a wrong insertion sequence on our side, but that seems very unlikely now (we checked that everything corresponds with the API, reviewed it with others, and all was fine. Besides, it’s really not that much code).


First Setback

We’ve tried loading the system with 10,000, 100,000 and 200,000 files, and everything worked perfectly.


Testing update

We are planning to move on to production soon, and we intend to build a test environment that will resemble the production environment as closely as possible.


Testing Hadoop - Starting HBase

Configuring and running HBase is very similar to doing the same for Hadoop. Not surprisingly, they also have a nice Getting Started page. The tricky part, though, is to understand what the hell they mean by all these “column families”, and why the syntax is not plain SQL.


Testing Hadoop - Starting Hadoop

First thing: it works. If you’re getting a lot of error messages and you start thinking “well, maybe it’s crap; still, 0.18 can’t be that good a version”, stop. It works.


Testing Hadoop - Problem Definition

We have a very large number of relatively small files (~5k avg, 41k max, 0k min) that we access a lot (20M times a day) for various computations. Currently, all the data is stored on a single server: a very ad-hoc solution that was OK until now, but is no longer acceptable in terms of service level, redundancy, backup, and so forth. We add approximately 5K new files a day, and data is never deleted.
