05.11.08

Interesting Non-Results in YouTube, Flickr, and MetaCafe

Posted in Search Engines at 2:05 pm by Brandon Wirtz

As a result of my recent projects, I’m building interesting maps of holes, voids if you will, in the Internet.

I was importing Webster’s Unabridged Dictionary, seeing interesting words I didn’t know existed, and thinking, "I wonder what that looks like."

Well, I now have my own video/picture dictionary at Digerat.com, so here are some words that I thought warranted a video but don’t return any.

Zamang (n.) An immense leguminous tree (Pithecolobium Saman) of Venezuela. Its branches form a hemispherical mass, often one hundred and eighty feet across. The sweet pulpy pods are used commonly for feeding cattle. Also called rain tree.

Palinurus (n.) An instrument for obtaining directly, without calculation, the true bearing of the sun, and thence the variation of the compass.

Debuscope (n.) A modification of the kaleidoscope; — used to reflect images so as to form beautiful designs.

These were just a few that caught my eye. Quick, rush off and make videos of these and you can be the top hit for them… I have thousands of others, and an interesting list of terms that return results in one service but not another. The list of Metacafe terms that are not in YouTube is one of the more fascinating ones.
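Once you have a hit count per term for each service, the "holes" are just set arithmetic. Here is a minimal Python sketch of that step, assuming the counts are already collected into dictionaries (gathering them is not shown). The function names and the sample numbers are hypothetical placeholders, not real YouTube or Metacafe results.

```python
def terms_only_in(service_a: dict[str, int], service_b: dict[str, int]) -> list[str]:
    """Terms with at least one hit on service_a but none on service_b."""
    hits_a = {term for term, count in service_a.items() if count > 0}
    hits_b = {term for term, count in service_b.items() if count > 0}
    return sorted(hits_a - hits_b)


def voids(*services: dict[str, int]) -> list[str]:
    """Terms that return nothing on any of the given services."""
    all_terms = set().union(*services)  # iterating a dict yields its keys
    return sorted(t for t in all_terms if all(s.get(t, 0) == 0 for s in services))


# Made-up hit counts, for illustration only.
metacafe = {"zamang": 0, "rain tree": 2, "kaleidoscope": 5}
youtube = {"zamang": 0, "rain tree": 0, "kaleidoscope": 40}

print(terms_only_in(metacafe, youtube))  # ['rain tree']
print(voids(metacafe, youtube))          # ['zamang']
```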

05.10.08

Powerset should Sell

Posted in Money, Responses, Search Engines at 5:29 pm by Brandon Wirtz

I like what I have seen of Powerset, but (and this is a big but) they have spent a lot of time learning how to search two sites, two sites that are well edited and are supposed to be reasonably encyclopedic.

That makes it cool, but a long way from done.

Consider the following passages, which mean exactly the same thing but are written in the styles of two very different bloggers.

Powerset leverages the power of Natural Language Search to discern what you are searching for. This allows it to determine the difference between positive and negative positions, so if you search for "Plants that are not Vegetables," Powerset will return articles about fruits and legumes.

The above is "encyclopedic," but the following is just as useful and more typical of web pages…

The dudes at Powerset have a Search Engine that pulls results using normal English.  So rather than answers that are bogus because they have all of the words "Plants that are not Vegetables" it knows you are looking for fruits, or legumes.

The first is a LOT easier for a machine to parse. Any Yoda-style post, which would be fine for a human, is going to wreak havoc with a "natural language" search.

Powerset searches with the Language of Nature they do.  Results from the meaning of your words they find. Seeking "Plants that are not Vegetables" yields Fruits and Legumes, not mis-placed pages with those words upon them. MMMM

Teaching a computer to understand context in sentences with regular word order is not particularly difficult. Working with a sample that is one millionth the size of the Internet is not easy, but it is a small feat compared to what is needed to index everything on the planet.

And to be entirely honest, searching Wikipedia is easy because pretty much everything in it is a noun. What do I mean by this?

Wikipedia is no help if you are looking for "Setting up Exchange Server." You don’t need natural language to parse this question, but finding the answer is hard, because in the real world you will encounter all sorts of things that look like the answer. "I need help setting up an Exchange server" is going to appear, surrounded by very technical-looking material, but it won’t be the answer to the question.

Conversely, finding the answer to "Who was the first president of the USA?" can be broken down:

Who = Person search

Was = In the past, not the present

The = the question is singular

First = the question implies there was more than one

President = a noun, likely what we are looking for

Of = modifier of President, so a subset of the answers

the USA = Specific Modifier

Run a search and no results are found, so you run parts of the query through a synonym engine:

the USA = United States

Poof, you now have an easy task: search for "a person" with "first president" and "United States," hopefully in the same sentence.
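For illustration, here is a toy Python sketch of that breakdown. It assumes a tiny hand-written hint table and a one-entry synonym table standing in for a real parser and synonym engine; all of the names are hypothetical. It only handles this one sentence shape, which is exactly why rephrased questions are the hard part.

```python
# Hypothetical hint and synonym tables; a real system would need far larger ones.
QUESTION_HINTS = {
    "who": "answer is a person",
    "was": "in the past, not the present",
    "first": "there was more than one",
}
SYNONYMS = {"the usa": "united states"}
STOP_WORDS = {"who", "was", "the"}


def decompose(question: str) -> dict:
    """Split a simple 'Who was the X of Y?' question into hints and search phrases."""
    text = question.lower().rstrip("?").strip()
    hints = [QUESTION_HINTS[w] for w in text.split() if w in QUESTION_HINTS]
    # "of" separates the noun phrase from the modifier that narrows the answers.
    head, _, modifier = text.partition(" of ")
    noun_phrase = " ".join(w for w in head.split() if w not in STOP_WORDS)
    modifier = SYNONYMS.get(modifier, modifier)  # the synonym-engine step
    return {"hints": hints, "phrases": [p for p in (noun_phrase, modifier) if p]}


def to_query(parts: dict) -> str:
    """Quote each phrase so the engine looks for it as a unit."""
    return " ".join(f'"{p}"' for p in parts["phrases"])


parts = decompose("Who was the first president of the USA?")
print(parts["hints"])   # ['answer is a person', 'in the past, not the present', 'there was more than one']
print(to_query(parts))  # "first president" "united states"
```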

I haven’t gotten to play with the tool, but would it get the answer to the equally "natural language" questions "Who was Voted the First American President?" or "Who First Filled the Role of US President?"

Don’t get me wrong, I think Powerset has a future. But I think in the near term that future is in answering questions about "small" sets of data, not the web in general. eHow, Microsoft Encarta, and Project Gutenberg would all benefit, and as more and more of these data sources are indexed, Powerset can get better and be ready to deal with news sites. From there it might be able to break into blogging, but it will take a very long time to make it work for the Internet in general.

Or perhaps making people write in a style that is easy for computers to parse will be a good thing for SEO in the future.

This is a response to:

Michael Arrington: Powerset’s Dilemma: Go For It, Or Sell

05.08.08

The Value of Facebook, Techmeme, or any Other Scoble Favorite

Posted in Google, Microsoft, Responses, Search Engines, Technology News, Yahoo at 12:00 am by Brandon Wirtz

Microsoft is moving on to Facebook. Poor MSFT, what are you thinking? Facebook isn’t getting rich on its ad strategy. If it were, it wouldn’t have wanted your money the last time you invested in it. If you are shopping to compete with GOOG, then quit looking at big players that aren’t competing and start looking at little players who might compete if they had the money.

I agree with Scoble that the Google model’s days are numbered, but not because social sites like Facebook are going to replace it. It is because eventually people are not going to need to go looking; the tools will just know what you want.

I have been working on a Techmeme-style product that works off of your OPML file, so anyone can have their own meme. I bought the domain, and it is practically ready to launch. The only trouble: the CPU requirements are HUGE. Maybe I’ll convert it to run on Google’s servers… Oh wait, that makes Google relevant again.
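The first step of such a product, reading the OPML file, is simple. Below is a minimal Python sketch assuming standard OPML, where each subscription is an `outline` element with an `xmlUrl` attribute. Fetching and parsing the feeds (where the CPU cost lives) is left out, and ranking links by how many of your feeds point at them is a hypothetical stand-in for the real ranking, not a description of how Techmeme works.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def feed_urls_from_opml(opml_text: str) -> list[str]:
    """Return the xmlUrl of every subscription in an OPML document."""
    root = ET.fromstring(opml_text)
    # iter() walks nested outlines too; folder outlines have no xmlUrl and are skipped.
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]


def rank_shared_links(links_by_feed: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank links by how many different feeds point at them (each feed counts once)."""
    counts = Counter()
    for links in links_by_feed.values():
        counts.update(set(links))
    return counts.most_common()


SAMPLE_OPML = """<opml version="1.0">
  <head><title>My subscriptions</title></head>
  <body>
    <outline text="Tech">
      <outline text="Blog A" type="rss" xmlUrl="http://example.com/a.xml"/>
      <outline text="Blog B" type="rss" xmlUrl="http://example.org/b.xml"/>
    </outline>
  </body>
</opml>"""

print(feed_urls_from_opml(SAMPLE_OPML))
# ['http://example.com/a.xml', 'http://example.org/b.xml']
```

In practice `links_by_feed` would be filled from the fetched feeds; the links that the most of your own subscriptions point at become the top of your personal meme.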

Robert is right: the current model will become more like Techmeme, more like Mahalo.com, and more like Facebook… But Google is not going to sit there and do nothing. Google adapts and grows.

I can see all of the pieces being made ready to jump on any of us who are testing the waters, after WE figure out what works. Google doesn’t need a finished product, just enough beta bits to make sure that whatever models start to look like they could work can be crushed.

Insert your favorite expletive: Google can crush me at any moment just by de-listing me, and the same goes for Mahalo. If Google said, "Hey Jason Calacanis, your site is just a very pretty splog, or at least that is what we are going to claim so our stock doesn’t fall on news that Techmeme’s crowd says you’re going to crush us in 3 years," well, Mahalo’s revenue would drop 80% overnight, and the 3 years it takes to recoup the cost of a post would now take 15. That would end the butt kicking.

I just found out about Google "Sets," which looks a lot like what I have been talking about for the last 3 months: figuring out what category a set of words belongs to, so that you can tune the results contextually. If I know that the last 100 searches you did were about food, I know a Chili Dog is just a Chili Dog. If the last 100 searches you did were X-rated, I know that you are looking for something a bit grosser.
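A minimal Python sketch of the Chili Dog idea: the most common category among your last 100 searches picks the sense of an ambiguous query. It assumes a hypothetical term-to-category table and a hypothetical sense table; it illustrates the idea and is not how Google Sets works.

```python
from collections import Counter
from typing import Optional

# Hypothetical term -> category table (a real one would be HUGE).
CATEGORY_OF = {"pizza": "food", "tacos": "food", "nachos": "food", "hot sauce": "food"}

# Hypothetical senses of an ambiguous query, keyed by the category that implies each one.
SENSES = {"chili dog": {"food": "a hot dog topped with chili", "slang": "the grosser meaning"}}


def dominant_category(searches: list[str], category_of: dict[str, str]) -> Optional[str]:
    """Most common category among the searches we know how to classify."""
    counts = Counter(category_of[q] for q in searches if q in category_of)
    return counts.most_common(1)[0][0] if counts else None


def pick_sense(query: str, history: list[str], category_of: dict[str, str],
               senses: dict[str, dict[str, str]], window: int = 100) -> Optional[str]:
    """Pick the sense of `query` that fits the user's last `window` searches."""
    options = senses.get(query.lower(), {})
    context = dominant_category(history[-window:], category_of)
    if context in options:
        return context
    return next(iter(options), None)  # no match: fall back to the first listed sense


print(pick_sense("Chili Dog", ["pizza", "tacos", "nachos"], CATEGORY_OF, SENSES))  # food
```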

This is just one of the Lego pieces that Google is building. And it is one of the public ones; we don’t get to see the ones they don’t care to share. Matt Cutts only tells us what it is in Google’s interest to tell us.

If you read my "rant" about Greatest Living American, you start to get where I’m going with this. This week the "right answer" to who is the Greatest Living American is "Brandon Wirtz" or "Stephen Colbert." This week is the only time it should be. If my mom gives her 3rd graders the task of writing an essay on who they think the Greatest Living American is, you don’t want them to write about Perry Como, Night of the Living Dead, or how to make grilled cheese because they used Mahalo. And I don’t want them to say it is me because they used Google. Really, Norman Borlaug may not be the "right" answer, but it is the only answer I could find that wasn’t me or Stephen.

Results that are free from the possibility of manipulation are part of the future of search. Results that are contextually relevant to the searcher are part of the future of search. Results that are contextually relevant to the events of the moment are part of the future of search.

When Oprah has someone on her show, for the next 12 hours a perfect search engine would know that 90% of the people looking for that person want to know about that person "through" Oprah. Similarly, when "Don’t Forget the Lyrics" is on, searches for things that look very similar to the words of a song are about the song, not about whatever the words are about. "Feet Down Below His Knee" is not about how one sprints, or about trousers (isayhello gets this wrong too, since I don’t have a large enough lyric library to reference, but I get it right with a YouTube video).

Google is starting to get this: two days ago, above the search results for Greatest Living American, was a link to CNN’s article about the Colbert Webby. Google is building toward all of the points Scoble and I make about the future of search and what our "Dashboards for Life" will look like.

The company that seemingly isn’t getting this is Microsoft. It is in such a race to show it gets it by cutting a check that it is missing all of the small, innovative companies that are making it work. I would encourage MSFT to buy Mahalo (Jason, if they cut you a check for $300 million, I want 1% for putting the idea out there), because Mahalo brings a piece of the pie that is needed: human-summarized results. There is not enough CPU, or a large enough database, to classify things the way an army of volunteers/cheap labor can. Mahalo can take the 100 things people are looking for each hour, sort them, compile a headline, and find the source of the trend in record time, and if MSFT wants me to use Live.com as a landing page, that is part of what it needs.

If Microsoft wants to buy Techmeme, I would root them on. Gabe Rivera has proven that he can create memes for various subjects, and I’m sure that if someone paid him, he could create 300 memes that fit the personality profiles of 80% of the people out there. That would make those people happy to start their day on a Microsoft site, and you know what? Most of their search results could be filtered through the sites in their meme.

I’m sorry, I don’t support the Facebook buy. They just don’t know how to make people happy. I thought Facebook Apps had potential, but they are all such time wasters. I really wanted to like them. On top of that, Facebook doesn’t really know how to monetize its users, and I don’t see anything in Facebook that isn’t in LinkedIn. If LinkedIn goes the way of Facebook, I’m dropping it.

This is a response to:

Anders Bylund: Microsoft is probing Facebook’s merger interest

SmoothSpan Blog: Is Microsoft Playing Possum for Yahoo? It Could Be Much Worse!

Jim Goldman: Microsoft’s About Face With Facebook—Is It In Writing?

MG Siegler: The Microsoft buying Facebook rumors commence, again

Joel Evans: Is Microsoft still shopping?

Nicholas Carlson: Microsoft’s plan for Web growth, minus Yahoo and Facebook

05.05.08

12-Step Program for Beating AA. What Microsoft’s Plan B Should Be: Building an Adsense/Adwords Competitor in Minimal Time

Posted in Advertising, Google, Microsoft, Search Engines, Yahoo at 10:26 am by Brandon Wirtz

I have to say that I thought buying Yahoo was the wrong move. Yahoo didn’t have a successful Adsense/Adwords product; its use of Adwords to raise its search engine earnings is a testament to that. So what should Microsoft do?

1. Convert the Microsoft-owned Building on Pear Street in Mountain View into the first headquarters for Microsoft’s new Ad Product.

It has to be in Silicon Valley, and it would be better if it were NOT on the overcrowded SVC campus, and there isn’t time to build a new campus. MSFT has this building; it is a nice size and within walking distance of Microsoft SVC and Google. That matters, because every time Google has a Blogger Event or an Ad Expo, there needs to be one at Microsoft as well. Microsoft culture doesn’t always merge well, so finding a new home near, but not on, an existing campus gives the acquisitions a better chance of bringing their expertise rather than being molded into Microsofties too quickly.

2. Acquire Adsdaq. Adsdaq is the best non-Google banner ad company out there. There are bigger ones, but Adsdaq has a simple, intuitive UI that makes sense, and with MSFT behind it, it could achieve the volume necessary to be a true success. I don’t entirely believe in the name-your-own-price ad serving model, but it would benefit MSFT early on because it would set the expectation that you might get less than a 100% fill rate, which would allow time to grow ad inventory to meet demand.

3. Acquire Compete. I don’t even like Compete, but there are a lot of people who trust it more than Alexa for traffic analysis, and that is an important component. Microsoft is going to need to build a better-than-Google-Analytics tool quickly, and Compete seems to have the biggest head start in this space.

4. Acquire ISayHello.com. Microsoft needs a deep understanding not just of keywords but of keyword relationships, plus the ability to mine data quickly and efficiently, and ISayHello.com brings that to the table. Until there is an ad for every keyword on the planet, Microsoft will need to be able to do contextual matches against categories and content types, and working out those relationships will also allow Microsoft to do something Google doesn’t: let you easily decide whether you want to run ads on pages that distinctly say your product sucks. Without the ability to determine the actual meaning of the words on a page, you could very well be spending money to advertise on sites that are bashing your product or already selling it, and both of these scenarios should be up to the ad buyer.

5. Don’t Force Dev onto a Microsoft Platform. It should get moved eventually, but time is critical, and many of the companies that are going to need to get bought won’t be running ASP and MSSQL. Encourage flexibility, and work on building APIs to connect what is out there to Microsoft platforms. This will do two things: shorten time to product, and make it easier to create tools for sale to end users. There will be a time in the near future when people will be ready for a Microsoft Ad Server for managing their inventory of in-house ads and their ads provided by third parties. Knowing how to talk to anything and everything will be part of building that.

6. Get Scoble Back. You don’t even have to hire him full time, but Scoble brings legitimacy with bloggers. Microsoft needs to be the exclusive ad provider for all of Scoble’s projects. While they are at it, they need to get Dave Winer, Om Malik, Fake Steve Jobs, and Perez Hilton, and for good measure the Technorati Top 100 by Authority and by Favorites.

7. Be an Omnimedia Company. Create products that don’t exist now that allow Microsoft to be a "One Stop Shop" for advertising. Cut a deal with Clear Channel, along with other radio stations, PBS, and NPR. Yes, PBS. Every time there is a "This show sponsored in part by," there is an opportunity to advertise. The cost is cheap, and the ads are simple, so they are easy to build. Sign up every small-town newspaper on the planet, even if it is just to sell classifieds. Add a reseller program for Adams Outdoor. You get the picture. The idea should be that every mom-and-pop shop can go to one place and get any type of ad it needs.

8. Build an MCSE equivalent for Advertising. By creating lots of local experts, Microsoft can build a partner network to offload support and make the ad buying experience more personal than any of the existing products. Many graphic designers would be happy to create and manage ads, and many companies that handle media buys would like to be able to say they are certified partners. This helps customers know who to trust.

9. Offer payment as products from Microsoft and Partners. A lot of "small fish" will take all year to get $100 from Google. So why not let the game blogger get a game at the Microsoft employee price every so often? Halo 3 for $20 and Blue Dragon for $20 is a great incentive to the high school kid who wants to get into blogging but is going to take 3 years to get a check from Google.

10. Allow the use of Ad revenue to buy more Ads on the network. I hate having to take my Adsense dollars into my bank account so that I can put them back into Adwords. Save some money and just let me move funds from one into the other.

11. Maintain Transparency. The most irksome thing about Google for ad buyers and ad sellers is that you don’t know what commission Google is getting. So are you paying $1 for a click that cost 10 cents? Are you being paid a penny for a click that Google charged a quarter for? I’m confident Google moves the slider around based on the volume you buy and the volume you sell, but it is hard to trust Google. You know you will get paid, but it is hard to anticipate your income because the CPMs seem to fluctuate with where Google thinks its earnings need to be.
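The number an advertiser or publisher would want disclosed is the network’s cut of each click. As a worked sketch, assuming "a click that cost 10 cents" means the publisher was paid 10 cents of the advertiser’s $1 (the function name is hypothetical):

```python
def take_rate(advertiser_paid: float, publisher_paid: float) -> float:
    """Share of the advertiser's spend on a click that the ad network keeps."""
    if advertiser_paid <= 0:
        raise ValueError("advertiser_paid must be positive")
    return (advertiser_paid - publisher_paid) / advertiser_paid


print(round(take_rate(1.00, 0.10), 2))  # 0.9  -> the network keeps 90 cents of the dollar
print(round(take_rate(0.25, 0.01), 2))  # 0.96 -> the network keeps 24 of the 25 cents
```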

12. Remember the Little Guy. Google got to the size it is by taking on both the big and the small. You can advertise with as little as a dollar, and you can get paid as little as $30 a year. Being a success means growing with your customers.

04.25.08

Keyword Search Death Knell Sounds

Posted in Search Engines at 1:01 pm by Brandon Wirtz

Is Keyword Search About To Hit Its Breaking Point? Yes. A few days earlier I made a post about the problem with keyword search: people aren’t all of the same background.

The example I used is: what happens if Oprah Winfrey gets pushed in front of the L in Chicago?

What would you search for? Oprah Pushed By Fan? Winfrey killed by Train? Oprah murdered on the L? They are all valid searches that should return the right results, but they won’t all work if you type them into Dmoz, Mahalo.com, Wikia, Wikipedia, or Ask.com, certainly not if you type them in during the first hour after the event.

A search engine needs to know "what" a thing is. And it needs to adapt to what you are trying to search for, or at least give you the tools to find it.

I think the intelligent web is closer than the 2020 date quoted in the TechCrunch article. The processing power required isn’t as insane as many believe, because 80% of searches are for 20% of topics. But it is not the searching that is hard… it is the finding. How will we know the "right" answer to a question?

Is it true that Brandon Wirtz is the Greatest Living American? Or is Stephen Colbert? Which truth has more truthiness?

When a story breaks, is the New York Times a better result, or the Chicago Tribune?

Once you get over the technological hurdle of dissecting a query into what the user wants, you need to get over the hurdle of determining which result has the most authority on the subject.

Imagine the following events happen on the same day, and consider the queries they would prompt.

In the morning: a pre-recorded episode of Oprah’s Favorite Things airs, and she gives everyone in the audience an iPhone 2.0.

Mid-day: a sex tape of Stedman and Lindsay Lohan is released, with photos on TMZ.

Afternoon: Oprah remarks to her new assistant that she wishes someone would just push her in front of the L, and the assistant does just that.

 

So the queries come in…

Oprah iPhone

Oprah boyfriend sex tape

Oprah train

 

The Oprah iPhone query should get a result from Wired about the iPhone 2.0, or the Apple iPhone page, and the Oprah website about today’s show, but it shouldn’t return the 3-month-old result about the Oprah website now being iPhone "Safari compatible," which was the top hit the day before.
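One simple way to keep yesterday’s stale top hit from winning is to decay relevance by age. The Python sketch below assumes exponential decay with a 24-hour half-life and made-up relevance scores; both are illustrative assumptions, and a real engine would need a much longer half-life for evergreen queries.

```python
def fresh_score(relevance: float, age_hours: float, half_life_hours: float = 24.0) -> float:
    """Halve a result's relevance for every `half_life_hours` of age."""
    return relevance * 0.5 ** (age_hours / half_life_hours)


def rank(results: list[tuple[str, float, float]], half_life_hours: float = 24.0) -> list[str]:
    """Order (title, relevance, age_hours) tuples by freshness-adjusted score, best first."""
    scored = sorted(results, key=lambda r: fresh_score(r[1], r[2], half_life_hours), reverse=True)
    return [title for title, _, _ in scored]


# Hypothetical results with made-up relevance scores and ages.
results = [
    ("Oprah site now Safari compatible", 0.9, 24 * 90),  # yesterday's top hit, ~3 months old
    ("Wired: iPhone 2.0 details", 0.6, 3),
    ("Oprah.com: today's Favorite Things show", 0.7, 5),
]
print(rank(results)[0])  # Oprah.com: today's Favorite Things show
```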

While Wired is the place for all things "tech," TMZ should have authority for "sex tape"… But an interview in the Chicago Tribune says that Oprah’s assistant helped with her suicide because of the Stedman sex tape.

And the search for Oprah Train? Well, the Tribune didn’t SEO its article because it was written for print, so it refers to the "L" and the Chicago Transit Authority but never uses the word train… So the right result would be the aforementioned Tribune article, but because the NYT headlined "Oprah killed in Train Accident," search results would normally go to that article. An intelligent search would know that the "L" is a train and would respond accordingly.
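The "L is a train" knowledge can come from plain lists rather than clever parsing. A minimal Python sketch, assuming a hypothetical is-a table where each known term carries the categories it implies; a document matches when every query word appears either in the text or among those implied categories.

```python
import re

# Hypothetical "is-a" lists: many simple entries rather than complex rules.
IS_A = {
    "the l": {"train", "public transit"},
    "chicago transit authority": {"public transit"},
}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def concepts(text: str) -> set[str]:
    """Words in the text plus every category implied by a known term."""
    clean = re.sub(r'["\u201c\u201d]', "", text.lower())  # drop the quotes around "L"
    found = words(clean)
    for term, parents in IS_A.items():
        # Word boundaries keep "the l" from matching "the last".
        if re.search(r"\b" + re.escape(term) + r"\b", clean):
            found |= {term} | parents
    return found


def matches(query: str, document: str) -> bool:
    """True when every query word appears in the document or its implied concepts."""
    return words(query) <= concepts(document)


tribune = 'Oprah Winfrey was pushed in front of the "L" by her assistant.'
print(matches("Oprah train", tribune))                     # True
print(matches("train", "The last episode aired tonight"))  # False
```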

Where do I come in? HUGE lists. If you can classify everything into lots of categories, you can start to build the relationships that we humans take for granted. I don’t have to write complex rules; I can use lots of simple rules to determine that things are related. A spike in Oprah traffic means that something newsworthy happened.
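The "spike in Oprah traffic" rule can be as simple as comparing this hour’s query count to the recent baseline. A minimal Python sketch, assuming hourly counts are available; the three-standard-deviation threshold and the sample counts are arbitrary illustrative choices.

```python
from statistics import mean, pstdev


def is_spike(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """True when `current` is at least `threshold` standard deviations above the history mean."""
    if not history:
        raise ValueError("need at least one past observation")
    baseline = mean(history)
    spread = pstdev(history) or 1.0  # avoid dividing by zero on a perfectly flat history
    return (current - baseline) / spread >= threshold


oprah_hourly = [100, 110, 95, 105, 90, 100]  # made-up hourly query counts
print(is_spike(oprah_hourly, 108))   # False: normal variation
print(is_spike(oprah_hourly, 1000))  # True: something newsworthy happened
```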

Some of the rules I’m building are obvious; some are only obvious after you point them out. One example is knowing the TV schedule, so that you know a search for Grey’s Anatomy Contest is not looking for an illustration in a book, but for information about how Meredith won the Glitter Pager.
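The TV-schedule rule can be sketched the same way. This Python version assumes a hypothetical schedule table of (show, start, end) rows and treats a query as show-related while the show is on and for 12 hours afterward, borrowing the 12-hour window from the Oprah example; the window, the row, and its air times are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical schedule rows: (show title, air start, air end). Times are made up.
SCHEDULE = [
    ("Grey's Anatomy", datetime(2008, 4, 24, 21, 0), datetime(2008, 4, 24, 22, 0)),
]


def show_context(query: str, now: datetime, schedule=SCHEDULE,
                 window: timedelta = timedelta(hours=12)) -> Optional[str]:
    """Return the show a query is probably about if that show is airing or just aired."""
    q = query.lower()
    for title, start, end in schedule:
        if title.lower() in q and start <= now <= end + window:
            return title
    return None


print(show_context("Grey's Anatomy Contest", datetime(2008, 4, 24, 23, 0)))  # Grey's Anatomy
print(show_context("Grey's Anatomy Contest", datetime(2008, 5, 2, 12, 0)))   # None
```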

 

Edit:

Hacking Cough is right about you having to know what needle you are looking for.

Words are there for themselves

 

Also, thanks to Jordan for pointing out the misspelling of Knell… This is what happens when you blog while coding.

Part of the XYHD.tv Content Network