<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/syscall.c, branch v4.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.1</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.1'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-04-15T16:00:47Z</updated>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next</title>
<updated>2015-04-15T16:00:47Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-04-15T16:00:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6c373ca89399c5a3f7ef210ad8f63dc3437da345'/>
<id>urn:sha1:6c373ca89399c5a3f7ef210ad8f63dc3437da345</id>
<content type='text'>
Pull networking updates from David Miller:

 1) Add BQL support to via-rhine, from Tino Reichardt.

 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
    can support hw switch offloading.  From Floria Fainelli.

 3) Allow 'ip address' commands to initiate multicast group join/leave,
    from Madhu Challa.

 4) Many ipv4 FIB lookup optimizations from Alexander Duyck.

 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
    Borkmann.

 6) Remove the ugly compat support in ARP for ugly layers like ax25,
    rose, etc.  And use this to clean up the neigh layer, then use it to
    implement MPLS support.  All from Eric Biederman.

 7) Support L3 forwarding offloading in switches, from Scott Feldman.

 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
    up route lookups even further.  From Alexander Duyck.

 9) Many improvements and bug fixes to the rhashtable implementation,
    from Herbert Xu and Thomas Graf.  In particular, in the case where
    an rhashtable user bulk adds a large number of items into an empty
    table, we expand the table much more sanely.

10) Don't make the tcp_metrics hash table per-namespace, from Eric
    Biederman.

11) Extend EBPF to access SKB fields, from Alexei Starovoitov.

12) Split out new connection request sockets so that they can be
    established in the main hash table.  Much less false sharing since
    hash lookups go direct to the request sockets instead of having to
    go first to the listener then to the request socks hashed
    underneath.  From Eric Dumazet.

13) Add async I/O support for crytpo AF_ALG sockets, from Tadeusz Struk.

14) Support stable privacy address generation for RFC7217 in IPV6.  From
    Hannes Frederic Sowa.

15) Hash network namespace into IP frag IDs, also from Hannes Frederic
    Sowa.

16) Convert PTP get/set methods to use 64-bit time, from Richard
    Cochran.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
  fm10k: Bump driver version to 0.15.2
  fm10k: corrected VF multicast update
  fm10k: mbx_update_max_size does not drop all oversized messages
  fm10k: reset head instead of calling update_max_size
  fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
  fm10k: update xcast mode before synchronizing multicast addresses
  fm10k: start service timer on probe
  fm10k: fix function header comment
  fm10k: comment next_vf_mbx flow
  fm10k: don't handle mailbox events in iov_event path and always process mailbox
  fm10k: use separate workqueue for fm10k driver
  fm10k: Set PF queues to unlimited bandwidth during virtualization
  fm10k: expose tx_timeout_count as an ethtool stat
  fm10k: only increment tx_timeout_count in Tx hang path
  fm10k: remove extraneous "Reset interface" message
  fm10k: separate PF only stats so that VF does not display them
  fm10k: use hw-&gt;mac.max_queues for stats
  fm10k: only show actual queues, not the maximum in hardware
  fm10k: allow creation of VLAN on default vid
  fm10k: fix unused warnings
  ...
</content>
</entry>
<entry>
<title>tracing, perf: Implement BPF programs attached to kprobes</title>
<updated>2015-04-02T11:25:49Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-03-25T19:49:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2541517c32be2531e0da59dfd7efc1ce844644f5'/>
<id>urn:sha1:2541517c32be2531e0da59dfd7efc1ce844644f5</id>
<content type='text'>
BPF programs, attached to kprobes, provide a safe way to execute
user-defined BPF byte-code programs without being able to crash or
hang the kernel in any way. The BPF engine makes sure that such
programs have a finite execution time and that they cannot break
out of their sandbox.

The user interface is to attach to a kprobe via the perf syscall:

	struct perf_event_attr attr = {
		.type	= PERF_TYPE_TRACEPOINT,
		.config	= event_id,
		...
	};

	event_fd = perf_event_open(&amp;attr,...);
	ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);

'prog_fd' is a file descriptor associated with BPF program
previously loaded.

'event_id' is an ID of the kprobe created.

Closing 'event_fd':

	close(event_fd);

... automatically detaches BPF program from it.

BPF programs can call in-kernel helper functions to:

  - lookup/update/delete elements in maps

  - probe_read - wraper of probe_kernel_read() used to access any
    kernel data structures

BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
architecture dependent) and return 0 to ignore the event and 1 to store
kprobe event into the ring buffer.

Note, kprobes are a fundamentally _not_ a stable kernel ABI,
so BPF programs attached to kprobes must be recompiled for
every kernel version and user must supply correct LINUX_VERSION_CODE
in attr.kern_version during bpf_prog_load() call.

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Reviewed-by: Masami Hiramatsu &lt;masami.hiramatsu.pt@hitachi.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: allow extended BPF programs access skb fields</title>
<updated>2015-03-16T02:02:28Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-03-13T18:57:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9bac3d6d548e5cc925570b263f35b70a00a00ffd'/>
<id>urn:sha1:9bac3d6d548e5cc925570b263f35b70a00a00ffd</id>
<content type='text'>
introduce user accessible mirror of in-kernel 'struct sk_buff':
struct __sk_buff {
    __u32 len;
    __u32 pkt_type;
    __u32 mark;
    __u32 queue_mapping;
};

bpf programs can do:

int bpf_prog(struct __sk_buff *skb)
{
    __u32 var = skb-&gt;pkt_type;

which will be compiled to bpf assembler as:

dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type)

bpf verifier will check validity of access and will convert it to:

dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset))
dst_reg &amp;= 7

since skb-&gt;pkt_type is a bitfield.

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>cls_bpf: add initial eBPF support for programmable classifiers</title>
<updated>2015-03-01T19:05:19Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-03-01T11:31:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e2e9b6541dd4b31848079da80fe2253daaafb549'/>
<id>urn:sha1:e2e9b6541dd4b31848079da80fe2253daaafb549</id>
<content type='text'>
This work extends the "classic" BPF programmable tc classifier by
extending its scope also to native eBPF code!

This allows for user space to implement own custom, 'safe' C like
classifiers (or whatever other frontend language LLVM et al may
provide in future), that can then be compiled with the LLVM eBPF
backend to an eBPF elf file. The result of this can be loaded into
the kernel via iproute2's tc. In the kernel, they can be JITed on
major archs and thus run in native performance.

Simple, minimal toy example to demonstrate the workflow:

  #include &lt;linux/ip.h&gt;
  #include &lt;linux/if_ether.h&gt;
  #include &lt;linux/bpf.h&gt;

  #include "tc_bpf_api.h"

  __section("classify")
  int cls_main(struct sk_buff *skb)
  {
    return (0x800 &lt;&lt; 16) | load_byte(skb, ETH_HLEN + __builtin_offsetof(struct iphdr, tos));
  }

  char __license[] __section("license") = "GPL";

The classifier can then be compiled into eBPF opcodes and loaded
via tc, for example:

  clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
  tc filter add dev em1 parent 1: bpf cls.o [...]

As it has been demonstrated, the scope can even reach up to a fully
fledged flow dissector (similarly as in samples/bpf/sockex2_kern.c).

For tc, maps are allowed to be used, but from kernel context only,
in other words, eBPF code can keep state across filter invocations.
In future, we perhaps may reattach from a different application to
those maps e.g., to read out collected statistics/state.

Similarly as in socket filters, we may extend functionality for eBPF
classifiers over time depending on the use cases. For that purpose,
cls_bpf programs are using BPF_PROG_TYPE_SCHED_CLS program type, so
we can allow additional functions/accessors (e.g. an ABI compatible
offset translation to skb fields/metadata). For an initial cls_bpf
support, we allow the same set of helper functions as eBPF socket
filters, but we could diverge at some point in time w/o problem.

I was wondering whether cls_bpf and act_bpf could share C programs,
I can imagine that at some point, we introduce i) further common
handlers for both (or even beyond their scope), and/or if truly needed
ii) some restricted function space for each of them. Both can be
abstracted easily through struct bpf_verifier_ops in future.

The context of cls_bpf versus act_bpf is slightly different though:
a cls_bpf program will return a specific classid whereas act_bpf a
drop/non-drop return code, latter may also in future mangle skbs.
That said, we can surely have a "classify" and "action" section in
a single object file, or considered mentioned constraint add a
possibility of a shared section.

The workflow for getting native eBPF running from tc [1] is as
follows: for f_bpf, I've added a slightly modified ELF parser code
from Alexei's kernel sample, which reads out the LLVM compiled
object, sets up maps (and dynamically fixes up map fds) if any, and
loads the eBPF instructions all centrally through the bpf syscall.

The resulting fd from the loaded program itself is being passed down
to cls_bpf, which looks up struct bpf_prog from the fd store, and
holds reference, so that it stays available also after tc program
lifetime. On tc filter destruction, it will then drop its reference.

Moreover, I've also added the optional possibility to annotate an
eBPF filter with a name (e.g. path to object file, or something
else if preferred) so that when tc dumps currently installed filters,
some more context can be given to an admin for a given instance (as
opposed to just the file descriptor number).

Last but not least, bpf_prog_get() and bpf_prog_put() needed to be
exported, so that eBPF can be used from cls_bpf built as a module.
Thanks to 60a3b2253c41 ("net: bpf: make eBPF interpreter images
read-only") I think this is of no concern since anything wanting to
alter eBPF opcode after verification stage would crash the kernel.

  [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ebpf: move read-only fields to bpf_prog and shrink bpf_prog_aux</title>
<updated>2015-03-01T19:05:19Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-03-01T11:31:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=24701ecea76b0b93bd9667486934ec310825f558'/>
<id>urn:sha1:24701ecea76b0b93bd9667486934ec310825f558</id>
<content type='text'>
is_gpl_compatible and prog_type should be moved directly into bpf_prog
as they stay immutable during bpf_prog's lifetime, are core attributes
and they can be locked as read-only later on via bpf_prog_select_runtime().

With a bit of rearranging, this also allows us to shrink bpf_prog_aux
to exactly 1 cacheline.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: rcu lock must not be held when calling copy_to_user()</title>
<updated>2015-01-27T01:20:40Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-01-23T01:11:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8ebe667c41e054384df19f2f382bc415badfaee1'/>
<id>urn:sha1:8ebe667c41e054384df19f2f382bc415badfaee1</id>
<content type='text'>
BUG: sleeping function called from invalid context at mm/memory.c:3732
in_atomic(): 0, irqs_disabled(): 0, pid: 671, name: test_maps
1 lock held by test_maps/671:
 #0:  (rcu_read_lock){......}, at: [&lt;0000000000264190&gt;] map_lookup_elem+0xe8/0x260
Call Trace:
([&lt;0000000000115b7e&gt;] show_trace+0x12e/0x150)
 [&lt;0000000000115c40&gt;] show_stack+0xa0/0x100
 [&lt;00000000009b163c&gt;] dump_stack+0x74/0xc8
 [&lt;000000000017424a&gt;] ___might_sleep+0x23a/0x248
 [&lt;00000000002b58e8&gt;] might_fault+0x70/0xe8
 [&lt;0000000000264230&gt;] map_lookup_elem+0x188/0x260
 [&lt;0000000000264716&gt;] SyS_bpf+0x20e/0x840

Fix it by allocating temporary buffer to store map element value.

Fixes: db20fd2b0108 ("bpf: add lookup/update/delete/iterate methods to BPF maps")
Reported-by: Michael Holzheu &lt;holzheu@linux.vnet.ibm.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Acked-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix BPF_MAP_LOOKUP_ELEM command return code</title>
<updated>2014-11-18T18:43:59Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2014-11-14T01:36:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a1854d6ac0008518bfc45e791172ad250999c2a2'/>
<id>urn:sha1:a1854d6ac0008518bfc45e791172ad250999c2a2</id>
<content type='text'>
fix errno of BPF_MAP_LOOKUP_ELEM command as bpf manpage
described it in commit b4fc1a460f30("Merge branch 'bpf-next'"):
-----
BPF_MAP_LOOKUP_ELEM
    int bpf_lookup_elem(int fd, void *key, void *value)
    {
        union bpf_attr attr = {
            .map_fd = fd,
            .key = ptr_to_u64(key),
            .value = ptr_to_u64(value),
        };

        return bpf(BPF_MAP_LOOKUP_ELEM, &amp;attr, sizeof(attr));
    }
    bpf() syscall looks up an element with given key in  a  map  fd.
    If  element  is found it returns zero and stores element's value
    into value.  If element is not found  it  returns  -1  and  sets
    errno to ENOENT.

and further down in manpage:

   ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM,  indicates  that
          element with given key was not found.
-----

In general all BPF commands return ENOENT when map element is not found
(including BPF_MAP_GET_NEXT_KEY and BPF_MAP_UPDATE_ELEM with
 flags == BPF_MAP_UPDATE_ONLY)

Subsequent patch adds a testsuite to check return values for all of
these combinations.

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command</title>
<updated>2014-11-18T18:43:25Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2014-11-14T01:36:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3274f52073d88b62f3c5ace82ae9d48546232e72'/>
<id>urn:sha1:3274f52073d88b62f3c5ace82ae9d48546232e72</id>
<content type='text'>
the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
either update existing map element or create a new one.
Initially the plan was to add a new command to handle the case of
'create new element if it didn't exist', but 'flags' style looks
cleaner and overall diff is much smaller (more code reused), so add 'flags'
attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
 #define BPF_ANY	0 /* create new element or update existing */
 #define BPF_NOEXIST	1 /* create new element if it didn't exist */
 #define BPF_EXIST	2 /* update existing element */

bpf_update_elem(fd, key, value, BPF_NOEXIST) call can fail with EEXIST
if element already exists.

bpf_update_elem(fd, key, value, BPF_EXIST) can fail with ENOENT
if element doesn't exist.

Userspace will call it as:
int bpf_update_elem(int fd, void *key, void *value, __u64 flags)
{
    union bpf_attr attr = {
        .map_fd = fd,
        .key = ptr_to_u64(key),
        .value = ptr_to_u64(value),
        .flags = flags;
    };

    return bpf(BPF_MAP_UPDATE_ELEM, &amp;attr, sizeof(attr));
}

First two bits of 'flags' are used to encode style of bpf_update_elem() command.
Bits 2-63 are reserved for future use.

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: verifier (add ability to receive verification log)</title>
<updated>2014-09-26T19:05:15Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2014-09-26T07:17:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cbd357008604925355ae7b54a09137dabb81b580'/>
<id>urn:sha1:cbd357008604925355ae7b54a09137dabb81b580</id>
<content type='text'>
add optional attributes for BPF_PROG_LOAD syscall:
union bpf_attr {
    struct {
	...
	__u32         log_level; /* verbosity level of eBPF verifier */
	__u32         log_size;  /* size of user buffer */
	__aligned_u64 log_buf;   /* user supplied 'char *buffer' */
    };
};

when log_level &gt; 0 the verifier will return its verification log in the user
supplied buffer 'log_buf' which can be used by program author to analyze why
verifier rejected given program.

'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt
provides several examples of these messages, like the program:

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
  BPF_EXIT_INSN(),

will be rejected with the following multi-line message in log_buf:

  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 0
  4: (85) call 1
  5: (15) if r0 == 0x0 goto pc+1
   R0=map_ptr R10=fp
  6: (7a) *(u64 *)(r0 +4) = 0
  misaligned access off 4 size 8

The format of the output can change at any time as verifier evolves.

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: verifier (add docs)</title>
<updated>2014-09-26T19:05:14Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2014-09-26T07:17:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=51580e798cb61b0fc63fa3aa6c5c975375aa0550'/>
<id>urn:sha1:51580e798cb61b0fc63fa3aa6c5c975375aa0550</id>
<content type='text'>
this patch adds all of eBPF verfier documentation and empty bpf_check()

The end goal for the verifier is to statically check safety of the program.

Verifier will catch:
- loops
- out of range jumps
- unreachable instructions
- invalid instructions
- uninitialized register access
- uninitialized stack access
- misaligned stack access
- out of range stack access
- invalid calling convention

More details in Documentation/networking/filter.txt

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
