Here again are 4 interesting links from the last few days:

  • After AWS Kinesis and the upcoming Google Cloud product for processing streaming data, MS Azure has now shown off its own product: it is called EventHub and is described in a little more detail at http://weblogs.asp.net/scottgu/azure-announcing-new-real-time-data-streaming-and-data-factory-services . The scope of the “managed service” in particular makes it look like quite a good package.
  • Google has started down the road of “office automation” in many different places, but this is still a very fragmented story (Google Docs automation vs. Google Mail plugins etc.). Microsoft Office365 is about to offer a very consistent framework for automating everything around Office… this will be a good spot for productivity tool providers. More details at http://www.programmableweb.com/news/microsoft-office-365-apis-and-sdks-provide-mobile-dev-opportunities/2014/10/29 .
  • Azure Automation now provides a proprietary way to write infrastructure “scripts”, called “RunBooks”… unfortunately they cannot yet support existing open source standards. They are PowerShell-based scripts, and the preview at http://azure.microsoft.com/blog/tag/azure-automation/ shows good potential. This is also interesting compared to other cloud providers, as they offer only limited ways to script whole infrastructure operations.
  • A company called LinguaSys debuted their product, an NLP API for 20 languages ( https://nlp.linguasys.com/ ). This is not the first API of its kind, but compared to most of the other APIs I’ve seen so far its language coverage is the best. Their storymapper functionality offers services such as sentiment analysis, entity recognition etc.


My 4 links for today:

– Sheetlabs is a web service that turns any spreadsheet into an API, which eases integration into workflows and avoids hardcoding spreadsheet structures. I have faced this challenge a few times already and am happy to see somebody come up with a solution… will test that!

– OpStarts is a SaaS solution that provides planning & forecasting for startups, especially in the SaaS area. It includes basic business planning with the special ‘entities’ of a SaaS business, such as subscriptions, plans etc.

– MS Azure now steps up with a search-as-a-service feature: they are not the first ones, but the offering looks good & easy to set up. Well… only once you have found out that the service can only be configured in the new Azure admin portal, and not in the ‘old’ one.

– Domino Data Labs published a piece on Integrating R with production systems using an HTTP API: it is essentially a service that lets you expose R logic as an API, which makes integration into other applications easy while keeping the calculation logic separated & maintainable in R. Previously you had to do your calculations in R and, once you were done, either cope with not-so-performant R wrappers or rewrite your logic in your app’s programming language.
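
To make the last point a bit more concrete, here is a minimal sketch of how an application could call R logic that sits behind an HTTP API. The endpoint URL and the JSON payload shape (`{"parameters": [...]}`) are my own assumptions for illustration and not necessarily what the Domino service expects; check its documentation for the real contract.

```python
import json
import urllib.request


def build_model_request(endpoint_url, parameters):
    """Build (but do not send) a JSON POST request for a remote model endpoint.

    The payload shape {"parameters": [...]} is a hypothetical convention.
    """
    body = json.dumps({"parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_model(endpoint_url, parameters, timeout=10):
    """Send the request and decode the JSON response of the remote R logic."""
    request = build_model_request(endpoint_url, parameters)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The application only needs to know the URL and the parameter list; the calculation itself stays in R on the other side of the HTTP boundary.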



My 4 links for today:

– Introducing the Data Pipeline of ChartIO shows one more toolset that lets data scientists set up data workflows. It joins existing tools like MS Azure Machine Learning (which I could see in beta stage) and the upcoming Google DataFlow. … looks like the work of data analysts will be supported even better in future.

– The BBC today open sourced a new toolset called DataStringer, which is targeted at data journalists and allows you to set up push notifications for changes in remote data repositories. As of now, JavaScript knowledge is needed to set up the system, but improvements might make this even more accessible.

– googleVis got a new release: it makes the powerful Google Chart tools accessible to scripts in R. Unlike e.g. ggplot2, googleVis can also generate interactive graphics.

– A comparison of different toolsets for interactive graphics from R helps you find the right library for your visualization case.
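
The idea behind notifications for changed remote data (as in the DataStringer item above) can be sketched in a few lines. This is not DataStringer's implementation, which is JavaScript based; it is only a minimal Python illustration of the general technique, assuming you fetch the data yourself and pass the raw bytes in. The class and function names are made up for this example.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable digest of the fetched data."""
    return hashlib.sha256(content).hexdigest()


class ChangeWatcher:
    """Remember the last digest per source and notify when it changes."""

    def __init__(self, notify):
        self._notify = notify   # callback that receives the source name
        self._last_seen = {}    # source name -> last digest

    def check(self, source_name: str, content: bytes) -> bool:
        digest = fingerprint(content)
        previous = self._last_seen.get(source_name)
        self._last_seen[source_name] = digest
        # The first sighting of a source is a baseline, not a change.
        changed = previous is not None and previous != digest
        if changed:
            self._notify(source_name)
        return changed
```

In practice `check` would run on a schedule, and the callback would send an email or push message instead of just recording the name.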

Again, my shameless self-plug: I have started gathering interest for a solution that creates a printed large-scale diagram based on your AWS cloud infrastructure. If you’re interested, feel free to head to http://clouddiagram0.datenprodukt.com/ .



My 3 links for today are:

– For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights: This NYT piece highlights the fact that, while incredibly fancy tech tools are out there to master the algorithmic and big data aspects of a data scientist’s work, the starting point of every project is a process step called ‘data wrangling’, which means preparing data to make it comparable and computable for processing. My hope is that by applying techniques from the Semantic Web area (ontologies etc.) people will in future have to spend less time on the basic wrangling and can focus more on the interesting parts of projects.

– AngelList is a database for the startup ecosystem, including startups, VCs, locations… (similar to Crunchbase, although Crunchbase focuses more on the financial transactions). Happily enough, AngelList (like Crunchbase) has a quite open API to retrieve their data… and GeekTime published “Which Technologies Do Startups Use? An Exploration of AngelList Data.” Not very surprisingly, AWS and Heroku are quite dominant in IT infrastructure; more surprising is the clear lead JavaScript/Node.js have among many startups.

– You might come to the point where your development capacity needs to scale… and you are not the first who wants to tap the enormous dev talent potential in Eastern Europe. Tips for Outsourcing Web Development to Eastern Europe has some good hints on this one.

A shameless self-plug: I have started gathering interest for a solution that creates a printed large-scale diagram based on your AWS cloud infrastructure. If you’re interested, feel free to head to http://clouddiagram0.datenprodukt.com/ .



My 4 links for today:

– CSV Fingerprint: Spot errors in your data at a glance links to a very interesting tool for anybody who works more or less regularly with potentially messy data. It helps to find anomalies in CSV files based on their format in a bird’s-eye overview.

– Some time ago I worked with master data management software, and such products are difficult to sell (and implement) because customers do not immediately see the tangible benefits. The MDM for Anything (MDM Summer Series Part 6) blog (and the series before it) sheds some light on this. Interestingly enough, even startups come up with quite specialized services in this area (such as product data as a service).

– In the eye of the storm contains a very well told story about typhoons in Hong Kong. Especially the second part (about how Hong Kong has adapted) has very instructive data visualizations, e.g. on the paths typhoons took over the year. (OK, http://hint.fm/wind/ is slightly more impressive, as it covers winds across the US with a good combination of historical & forecasted data.)

– Winston Chang’s talk “Interactive Graphics with ggvis” at useR! 2014 shows how the next evolution of the already well-known ggplot data visualization library for R can be used for interactive graphics. Outside of an R environment, I recently liked the interactive visualizations by SAP Lumira; find examples at http://www.saplumira.com/learn/boards.php .
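
The CSV Fingerprint idea from the first link above (judging a file by the format of its cells rather than by their values) can be sketched roughly as follows. This is my own minimal approximation, not the tool's code: the four cell categories and the one-character symbols are assumptions chosen for illustration.

```python
import csv
import io

# One symbol per cell category; the choice of symbols is arbitrary.
SYMBOLS = {"empty": ".", "integer": "i", "decimal": "d", "string": "s"}


def classify_cell(value: str) -> str:
    """Classify a single CSV cell by its format."""
    text = value.strip()
    if text == "":
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "string"


def fingerprint_csv(csv_text: str) -> list:
    """Return one string per row, with one symbol per cell."""
    rows = csv.reader(io.StringIO(csv_text))
    return ["".join(SYMBOLS[classify_cell(cell)] for cell in row) for row in rows]
```

Printing the rows below each other gives a compact map of the file: a column that is `i` almost everywhere but `s` or `.` in a few rows stands out immediately.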



My 4 links for today:

At least in the German media there was some coverage of how the German soccer team used SAP Hana to optimize their training strategy. The blog article Big Data & Spatial Analytics Help Germany Score the World Cup now gives some more technical notes on how this was done… and I’m delighted to see that startups also take this up as a business opportunity (e.g. Varsaty).

“Put ALL your Logs into Business Action” is a very interesting idea to use databases such as Hana for internal IT/cloud optimization. (Part of my fascination comes from the fact that I am doing something similar right now with a focus on one cloud provider… and I can already see the optimization potential there.) So in case you are managing a number of different IT and cloud-like services, this SAP initiative might be interesting to you.

Transactional HTML Email Templates is a bundle of good HTML email templates targeted at use in transactional emails. While sending this kind of email has been solved for a while (e.g. via services like SendGrid, which I like for their API coverage, or AWS SES), well-designed HTML templates that render consistently on different devices and in different applications have long been an area with room for improvement.

When it comes to distributed task handling in web applications, some terms come to mind immediately, such as RabbitMQ, Celery for Python/Django, AWS SQS etc. Nextdoor Taskworker: simple, efficient & scalable now gives a detailed overview of the architecture considerations they settled on… very inspiring to read!
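
For readers who have not worked with such systems: the basic pattern behind all of these tools is a queue that producers put tasks on and a pool of workers that take them off. The sketch below shows that pattern in-process with Python's standard library only. It is a stand-in for illustration, not how Nextdoor's Taskworker, Celery or SQS are implemented, and the function name and task tuple format are my own assumptions.

```python
import queue
import threading


def run_tasks(tasks, worker_count=2):
    """Run (name, function, args) tuples on a pool of worker threads.

    Returns a dict mapping each task name to its result, or to the
    exception instance if the task raised one.
    """
    task_queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = task_queue.get()
            if item is None:          # sentinel: no more work for this worker
                task_queue.task_done()
                break
            name, func, args = item
            try:
                outcome = func(*args)
            except Exception as error:  # keep the worker alive on task failure
                outcome = error
            with lock:
                results[name] = outcome
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)          # one sentinel per worker
    task_queue.join()
    for thread in threads:
        thread.join()
    return results
```

A real distributed setup replaces the in-memory queue with a broker and the threads with separate worker processes or machines, which is where the architecture questions from the article come in.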



Only 3 links for today:

Moving from Data Analysis to Product Management – A Personal Journey

… a good piece about data analytics techniques in product management.

John Chambers: Interfaces, Efficiency and Big Data

talks about the long way from S to R, which is now an increasingly common language for statistical analytics.

Today I had the experience of working with the VMware vCOps dashboard… and usability for editors was apparently not among the most important requirements. Then I read today

Building a Better Dashboard for Virtual Infrastructure

, which shows that at least equally appealing user interfaces can be built with much simpler tools.



Today I want to present four data-related links from my daily reading:

So You Wanna Try Deep Learning?

…is a stunning collection of documents & tutorials for everyone who wants to get into deep learning, which according to Wikipedia helps to find connections and relations in datasets.

Responsive Dashboard

is an AngularJS/Twitter Bootstrap based template for a browser-based dashboard. For non-designers like me, these templates help to quickly set up an OK or even good-looking site.

Free Microsoft eBooks

MS decided to make all of their product & platform related ebooks publicly available. End users of MS products will be happy about the Office stuff, but I remember an email today in which a colleague thanked another for his heroic skill in clicking through an estimated 50 Azure cloud-related portals, management sites etc. … maybe books like these either explain the mess or give some guidance to MS product managers on where to improve.

SAP & ApiGee

That’s going to be a huge story… for some time now I have had a HANA database in my hands. Already as a standalone tool it is powerful, but the additions on top of it, such as the meanwhile open-sourced OpenUI5 and XSEngine, open up a whole new set of options, especially for developers… and it is good to see that APIs get a more prominent role as business enablers.

That’s it for today… happy to get feedback on this new format!



At the end of September 2013 I changed the legal form of my IT consulting & development activities; the result is datenProdukt GmbH.

I would like to invite you to take a look at the portfolio; more information about products and other services will follow.