HPI-Information-Systems/Metanome

Name: Metanome

Owner: HPI-Information-Systems

Description: The source repository of the Metanome tool

Created: 2014-03-06 12:34:50.0

Updated: 2018-01-17 07:31:43.0

Pushed: 2018-01-22 10:13:30.0

Homepage: metanome.de

Size: 26093

Language: Java

README

Metanome

The Metanome project is a joint project between the Hasso-Plattner-Institut (HPI) and the Qatar Computing Research Institute (QCRI). Metanome provides a fresh view on data profiling by developing and integrating efficient algorithms into a common tool, expanding on the functionality of data profiling, and addressing performance and scalability issues for Big Data. A vision of the project appears in SIGMOD Record: “Data Profiling Revisited”.

The Metanome tool is supplied under the Apache License. You can use and extend the tool to develop your own profiling algorithms. The profiling algorithms provided on our download page have HPI copyright. You are free to use and distribute them for research purposes.

The Metanome platform itself is a backend service that communicates over an HTTP REST API endpoint. We provide a Metanome Frontend, maintained in a separate repository, that can be used to interact with the Metanome platform.

Building Metanome Locally

Metanome is a Java Maven project. To build the sources, the following development tools are needed:

  1. Java JDK 1.8 or later
  2. Maven 3.1.0
  3. Git

Make sure that all three are on your system's PATH variable when running the build.
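
A quick way to confirm this before starting is to look up each tool on the PATH. The following is a minimal sketch and not part of the Metanome repository: the function name `missing_tools` is made up for illustration, and `java`, `mvn`, and `git` are assumed to be the executable names of the three tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of Metanome): prints every tool from the
# argument list that cannot be found on the PATH, one name per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage: missing_tools java mvn git
# The output is empty if all three tools were found.
```

If nothing is printed, all three tools were found. `java -version`, `mvn -version`, and `git --version` then show whether the installed versions match the requirements listed above.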

Pull Metanome Frontend Submodule

Before executing the build you have to clone the Metanome Frontend into the project.

git submodule init
git submodule update

Build Metanome

Metanome can be built by executing:

mvn clean install

If the frontend build fails due to missing or incompatible Angular packages, it often helps to re-run the build.

Once the build has finished, Metanome can be packaged together with a Tomcat web server, some test data, and some test algorithms. To speed up builds, this package is not created in the default Maven profile. The deployment package can be created by executing the build with the deployment-local profile:

or by executing package on the deployment project directly:

Note that if Metanome has not been installed before creating the package (via mvn clean install), dependencies will be retrieved online, which can result in an outdated package!
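
The two packaging routes described above might be invoked as follows. This is a sketch under stated assumptions, not the project's documented commands: it relies on Maven's standard `-P` (activate a profile) and `-f` (use an alternate POM) options, assumes that `clean install` is the build to combine with the profile, and assumes that the deployment project's POM is located at `deployment/pom.xml`. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: the profile name and the deployment project come from the text
# above; the exact goals and the deployment/pom.xml path are assumptions.
MVN="${MVN:-mvn}"   # Maven executable; override if mvn is not on the PATH

# Route 1: run the regular build with the deployment-local profile enabled (-P).
build_with_deployment_profile() {
    "$MVN" clean install -P deployment-local
}

# Route 2: run the package phase on the deployment project's own POM (-f).
package_deployment_project() {
    "$MVN" -f deployment/pom.xml package
}
```

Running route 1 also installs the Metanome modules into the local Maven repository, which avoids the situation described in the note above where dependencies are fetched online.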

To start the Metanome frontend you then have to execute the following steps in the deployment folder:

  1. Unzip `deployment/target/deployment-1.1-SNAPSHOT-package_with_tomcat.zip`
  2. Go into the unzipped folder and start the run script, either `run.sh` or `run.bat` (Windows systems)
  3. Open a browser at [http://localhost:8080/](http://localhost:8080/)
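
On a Unix-like system, the first two steps can be sketched as a small script. This is an illustration under assumptions rather than a script shipped with Metanome: `start_metanome` and `TARGET_DIR` are hypothetical names, the `unzip` command is assumed to be installed, and because the folder layout inside the archive is not described here, the script searches for `run.sh` instead of hard-coding its location.

```shell
#!/bin/sh
# Sketch only. The zip path is taken from step 1; TARGET_DIR is a hypothetical
# extraction directory chosen for this example.
PACKAGE_ZIP="${PACKAGE_ZIP:-deployment/target/deployment-1.1-SNAPSHOT-package_with_tomcat.zip}"
TARGET_DIR="${TARGET_DIR:-metanome-local}"

start_metanome() {
    # Step 1: unzip the deployment package into TARGET_DIR.
    unzip -q "$PACKAGE_ZIP" -d "$TARGET_DIR" || return 1

    # Step 2: find the run script and start it from its own directory.
    run_script=$(find "$TARGET_DIR" -type f -name run.sh | head -n 1)
    if [ -z "$run_script" ]; then
        echo "run.sh not found under $TARGET_DIR" >&2
        return 1
    fi
    (cd "$(dirname "$run_script")" && sh ./run.sh)
}
```

Call `start_metanome` from the directory that contains the `deployment` folder. The run script may keep running in the foreground, so step 3 (opening http://localhost:8080/ in a browser) is left as a manual step.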

## Downloads
Metanome releases can be found on the [Metanome releases page](https://github.com/HPI-Information-Systems/Metanome/releases).

Current profiling algorithms are available at the [Algorithm releases page](https://hpi.de/naumann/projects/data-profiling-and-analytics/metanome-data-profiling/algorithms.html). The sources of all these algorithms are available on GitHub in the [metanome-algorithms](https://github.com/HPI-Information-Systems/metanome-algorithms) repository.

## Developing a profiling algorithm for Metanome
If you want to build your own profiling algorithm for the Metanome tool, the best way to get started is our [Skeleton Project](https://hpi.de/fileadmin/user_upload/fachgebiete/naumann/projekte/repeatability/DataProfiling/Metanome/MetanomeAlgorithmSkeleton.zip). It contains an algorithm frame and a test runner project, with which you can run and test your code (without a running Metanome tool instance). For more details, check out the contained README.txt file.

Since many profiling algorithms use similar techniques for the discovery of dependencies, it's worth checking out the following resources as well:

* [metanome-algorithms](https://github.com/HPI-Information-Systems/metanome-algorithms) including many implementations of novel and popular profiling algorithms for various types of metadata.
* [Metanome-Data-Structures](https://github.com/jakob-zwiener/Metanome-Data-Structures) including, for instance, position list indexes (PLIs), which many algorithms use for candidate validation; see also [pli-benchmarks](https://github.com/jakob-zwiener/pli-benchmarks)

## Documentation
Documentation of the Metanome tool, as well as information for algorithm developers and contributors to the project, can be found in the [GitHub wiki](https://github.com/HPI-Information-Systems/Metanome/wiki).

## Deploy Metanome Remote
It is possible to deploy Metanome using PaaS providers like Amazon Beanstalk, Heroku, or Google App Engine. We provide additional configs and documentation on how to deploy Metanome on these in the [GitHub wiki](https://github.com/HPI-Information-Systems/Metanome/wiki).

## Development
The Metanome modules are continuously deployed to Sonatype and can be used by adding the repository:

<repository>
    <id>snapshots-repo</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
</repository>

## Git Commit Hooks
The project uses [license-maintainer](https://github.com/NitorCreations/license-maintainer) as a pre-commit Git hook to keep the license information in all Java, XML and Python files up to date. To use it, execute the ```./add_hooks.sh``` shell script, which creates a pre-commit hook symlink to the license-maintainer script.
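
After running the script, one way to check that the hook is in place is to test whether `.git/hooks/pre-commit` is a symbolic link. The following is a minimal sketch, assuming the repository uses the default `.git/hooks` directory; the function name `pre_commit_hook_linked` is hypothetical and not part of the project.

```shell
#!/bin/sh
# Hypothetical check (not part of Metanome): succeeds if the repository at the
# given path (default: current directory) has a pre-commit hook that is a symlink.
# Assumes the default hooks location .git/hooks inside a regular clone.
pre_commit_hook_linked() {
    repo="${1:-.}"
    [ -L "$repo/.git/hooks/pre-commit" ]
}

# Usage: pre_commit_hook_linked . && echo "pre-commit hook is linked"
```

The check only confirms that a symlink exists; it does not verify that the link points to the license-maintainer script.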

### Coding style
The project follows the google-styleguide; please make sure that all contributions adhere to the correct format. Formatting settings for common IDEs can be found at: http://code.google.com/p/google-styleguide/
All files should contain the Apache copyright header. The header can be found in the ```COPYRIGHT_HEADER``` file.

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.